
[one more time] RE: MS Word and XML



While I was composing my previous brilliant missive, Dan Emory wrote:
 
> Moritz Berger et al. argue that very old legacy documents stored on
> floppies can easily be recovered. Apparently they haven't heard of the
> legacy doc nightmares experienced by many large companies and
> government agencies who discover, too late, that they still need those
> old docs, and find it to be virtually impossible to bring them into
> some "modern" DTP.

That's true, but their problems are typically with files from custom
mainframe apps using some odd variant of EBCDIC, or from obscure or
custom minicomputer or PC applications. People don't lose the data in
"legacy" Word, WP, or Lotus 1-2-3 files, and the like, precisely because
those applications were so popular that their modern replacements still
have the appropriate import filters.  

> There are large outfits like Data Conversion Laboratories (DCL) who
> make an excellent living by coming up with (very expensive) conversion
> solutions for such legacy docs. The last time I heard, DCL wouldn't
> touch anything less than a million pages.

Data Conversion Laboratory, Inc. (the name was wrong in Publishers
Weekly and other online articles) mainly converts unstructured text from
various sources (paper, Quark, Bookmaster, Interleaf, etc.) to
structured formats (SGML, XML, HTML). It also creates eBooks and PDFs for
publishing houses, etc. Although they have very large projects, a quick
scan of their client information turned up projects as small as a
thousand pages. 

Most of the project descriptions I skimmed had nothing to do with
"recovering" data from "legacy" docs. It's stuff like converting troff
to FrameMaker, hardcopy to SGML, WP to XML, etc. Naturally, converting a
large unstructured document to SGML or XML is difficult and expensive,
but this has nothing to do with proprietary file formats and everything
to do with the effort required to structure an unstructured document. 

<snip> 
> All of this is avoidable.

As I stated in reply to Larry, you can protect yourself against being
locked into a specific vendor or file format by storing your data in
formats that are _ubiquitous_. XML is certainly showing signs (finally)
of fitting the bill. But if you don't currently have or need structure,
you have to ask yourself whether this is the right time to incur the
expense of structuring all your docs. And will all those memos and
meeting minutes ever need to be structured?

For large manuals, give me FM or nothing. But, for people creating more
modest docs, there's nothing safer from obsolescence right now than MS
Word. Every word processing or desktop publishing program for the next
20 years is going to read Word files. There's safety in numbers :-)

Richard


------
Richard G. Combs
Senior Technical Writer
Voyant Technologies, Inc.
richardDOTcombs AT voyanttechDOTcom
303-223-5111
------
rgcombs AT freeDASHmarketDOTnet
303-777-0436
------







