Re: MS Word and XML

On Mon, 28 Apr 2003 11:15:12 -0700, Dov Isaacs <isaacs@Adobe.COM> wrote:

>Even RTF changed although in an upward-compatible manner.

Not entirely upward-compatible.  For example, in earlier versions
of Word, graphics sizes were expressed in twips (1440/inch).  But
starting with Word 8/97, the units were silently changed to 0.01mm
(2540/inch), MS "himetric".  There was *no* way in RTF to indicate
which unit size was being used, so Word 8 misinterprets Word 7 RTF
and makes the graphics look considerably smaller.  There are more
differences; we find them regularly while improving our Word RTF
export filter, a never-ending task... ;-)
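
To put rough numbers on the unit mix-up, here is a minimal Python
sketch.  It assumes only the two unit sizes given above (1440 twips
or 2540 himetric units per inch); the function names and the 2-inch
sample graphic are made up for illustration and are not taken from
any Word or RTF API.

```python
TWIPS_PER_INCH = 1440     # unit for graphic sizes in earlier Word RTF
HIMETRIC_PER_INCH = 2540  # 0.01 mm units, used from Word 8/97 on


def twips_to_himetric(twips):
    """Convert a size in twips to the same physical size in himetric."""
    return twips * HIMETRIC_PER_INCH / TWIPS_PER_INCH


def misread_twips_as_himetric(twips):
    """Size in inches that results when a raw twips value is read
    as if the number were already in himetric units."""
    return twips / HIMETRIC_PER_INCH


# Hypothetical graphic that is 2 inches wide in a Word 7 file.
width_twips = 2880

print(width_twips / TWIPS_PER_INCH)            # intended width, inches
print(misread_twips_as_himetric(width_twips))  # width after the misreading
print(twips_to_himetric(width_twips))          # value a converter should write
```

The misread size is always 1440/2540 of the intended one, a bit
under 57%, which matches the "considerably smaller" graphics.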

>I would disagree that there is ANYTHING straightforward about writing
>an import filter for Microsoft Word-format documents, whether binary
>or RTF. Not only is there the problem of physically parsing these
>formats, but there is the larger problem of INTERPRETATION of the
>formatting data therein. 

Amen.  The MS RTF specs are a travesty of technical documentation.
Aside from the outright errors and typos (many), they totally lack
examples of usage.  To understand what a given element *does*, you
need to examine Word RTF files and use inductive logic.  Heavily.

And some MS formats that are used throughout Windows apps, like the
Structured Storage (OLE object) format, are not documented at all.
Deliberately so, in the case of OLE, so that you are forced to use
MS licensed libraries to read or write them.  These libraries,
oddly <g>, are available only on certain platforms... excluding,
for example, UNIX.  We had to reverse-engineer that one ourselves.
It took weeks of work, just so that we could extract the preview
WMF from embedded OLE graphics in FrameMaker.
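
As a small illustration of the first step in handling such data,
the sketch below checks whether a block of bytes starts with the
8-byte signature that Structured Storage (compound) files begin
with.  The signature value comes from later public descriptions of
the format, not from anything in this post, and the function name
is hypothetical.  It says nothing about how to walk the storage
itself or locate a preview metafile.

```python
# Signature found at offset 0 of a Structured Storage (compound) file.
# Assumption: value taken from later public format descriptions.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_structured_storage(data: bytes) -> bool:
    """Return True if data begins with the compound-file signature."""
    return data[:8] == CFB_SIGNATURE


# A buffer starting with the signature passes; RTF text does not.
sample = CFB_SIGNATURE + b"\x00" * 504
print(looks_like_structured_storage(sample))
print(looks_like_structured_storage(b"{\\rtf1\\ansi"))
```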

>If Microsoft has trouble interpreting all the versions of their
>documents, what do you we and others have?!?

A hard row to hoe... ;-)  Nonetheless, we just might take a shot
at it sometime.  <bg>  We have some interesting ideas about how
a filter to import Word into Frame should work, rather different
from the design principles used in the present native filters,
and if we can get past the shuddering that we experience after
opening a current Word RTF file in a text editor, we may embody
those ideas in an import filter.  In our lifetime.  ;-)

-- Jeremy H. Griffith, at Omni Systems Inc.
  (jeremy@omsys.com)  http://www.omsys.com/
