
Re: Conversion of Word documents to structured frame documents



At 07:11 PM 4/8/99 +1000, Marcus Carr wrote:
>
>-------------------Snip
>No, it doesn't - I've looked into it further. For structured documents, the XML
>export maps the SGML elements to XML elements using the same name by default.
>You can change the name using read/write rules as you do in an SGML application.
>----------------Snip
+++++++++++++++++++++++++++
Thanks, Marcus for finally clearing this up. I'm still suspicious that
the devil is in the details, however. I guess I'll just have to break down
and get a copy of FM+SGML 5.5.6 to find out for myself.
++++++++++++++++++++++++++++++++++++
>> SEMA's RTF-DOC DTD and rtf2rdc filter can do that.
>> SEMA also has an rdc2rtf filter that converts RTF-DOC-conforming structured
>> docs back to RTF. I then waxed lyrically that such a round-trip capability
>> could solve a problem that's constantly coming up in postings on the two
>> Framers lists, namely the unreliability of document conversions between
>> FrameMaker and Word.
>
>The unreliability of document conversions between FrameMaker and Word is not
>the same as round tripping. Yes, people do complain about not being able to go
>in one direction or the other, but I have rarely seen postings from people who
>seriously want to round trip.
>--------------------------------Snip
>As you pointed out, "The subject of this thread, originated by
>wendy_ling@uk.ibm.com, was whether there was a way to convert Word docs to
>structured FM+SGML docs". The fact that the documents conform to a DTD when
>they come into FrameMaker+SGML doesn't mean that they're usefully structured.
>All recursion information (except possibly very high level things like
>sections) disappears when you convert back to RTF. In keeping with the lowest
>common denominator theory, that means not adding any structure in
>FrameMaker+SGML, as it will be blasted anyway. I don't consider these to be
>structured documents.
>--------------------------Snip
>I have written many filters to go from SGML to RTF, so my approach would be
>different. I would save the SGML out of FrameMaker+SGML and write a conversion
>that dealt with converting SGML conforming to a specific DTD to RTF. This
>typically only takes a couple of days, unless your DTD is huge. Now I have the
>SGML to RTF side covered.
++++++++++++++++++++++++++++++++++++++++++++++++
But if you do it the way you describe above, there's no way to include the
RTF formatting information (font definitions, style sheets, ad-hoc format
overrides, etc.)
++++++++++++++++++++++++++++++++++++++++++++++++++ 
>Can I get the RTF to SGML? No, because my SGML structure is more
>complicated than RTF is capable of representing. Your approach seems to involve
>dumbing your structure down to something that matches RTF - if the customer is
>satisfied with such a structure, then you do indeed have a solution. I don't
>however see this very narrow band of users as being the salvation of the
>product.
++++++++++++++++++++++++++++++++++++++++++++++
After further analysis, it appears that SEMA's round-trip filters have the
principal purpose of archival storage of unstructured documents from Word
(and other RTF-compatible WP products) in a neutral format (XML or SGML) that
preserves the formatting information, so that they can be recovered years
later when the original WP is no longer available.

However, the SEMA rtf2rdc filter's preservation (in attributes) of the
original font definitions, stylesheet, and document header, combined with
the preservation (again in attributes) of any ad hoc format variations in
XML/SGML paragraph and character style element instances, is a nice touch.
The fact that each paragraph (PARA) and character style (CS) element in the
RTF-DOC DTD has attributes that identify the applicable stylesheet instance
being used offers the opportunity to use SGML- or XML-aware tools to convert
RTF-DOC document instances to more elaborate structures if:

1. The stylesheet names are indicative of the structure, AND

2. Consistent tagging was utilized during the preparation of the original WP
document.
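
To make the idea concrete, here is a minimal Python sketch of that kind of
style-driven conversion. I have not checked it against the actual RTF-DOC DTD:
the STYLE attribute name, the RTFDOC root element, the target element names
(DOC, SECTION, TITLE, P, ITEM), and the style-name mapping are all
hypothetical, chosen only for illustration. It shows a flat run of PARA
elements being regrouped into sections whenever a heading style appears.

```python
import xml.etree.ElementTree as ET

# Hypothetical style-name-to-element mapping; a real one would come from
# the stylesheet actually used in the Word document.
STYLE_TO_ELEMENT = {"Heading 1": "TITLE", "Body Text": "P", "List Bullet": "ITEM"}


def restructure(rtfdoc_xml):
    """Build a nested DOC/SECTION tree from a flat run of PARA elements."""
    source = ET.fromstring(rtfdoc_xml)
    doc = ET.Element("DOC")
    parent = doc  # paragraphs before the first heading attach to DOC
    for para in source.iter("PARA"):
        style = para.get("STYLE", "")  # assumed attribute name
        target = STYLE_TO_ELEMENT.get(style)
        if target == "TITLE":
            # A heading style starts a new SECTION (condition 1).
            parent = ET.SubElement(doc, "SECTION")
        if target is None:
            # Unrecognized style (condition 2 not met): keep the paragraph
            # flat and carry the style name along so nothing is lost.
            child = ET.SubElement(parent, "PARA", {"STYLE": style})
        else:
            child = ET.SubElement(parent, target)
        # Flatten any character-style (CS) children into plain text.
        child.text = "".join(para.itertext())
    return doc


SAMPLE = """<RTFDOC>
<PARA STYLE="Heading 1">Scope</PARA>
<PARA STYLE="Body Text">This proposal covers <CS STYLE="Emphasis">all</CS> phases.</PARA>
<PARA STYLE="MyOddStyle">Hand-formatted text.</PARA>
</RTFDOC>"""

result = restructure(SAMPLE)
print(ET.tostring(result, encoding="unicode"))
```

The paragraph with the unmapped style passes through unchanged, which is
where inconsistent tagging in the original WP document would show up and
need manual attention.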
++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 8. What I was proposing, in my original post to this thread, was that the
>> RTF-DOC DTD, combined with the round-trip filters from SEMA, might offer a
>> solution for publications groups confronted with the (round-trip) dilemma...
----------------------Snip
>It would accomplish that, but I really don't believe that there's a large
>market for it. If it really was a winner, the good people at Adobe would have
>spent some of their fortunes on beefing up the RTF filters. After all, that
>cuts out the uncertainty involved with introducing SEMA into the loop.
+++++++++++++++++++++++++++++++++++++++++++++++++++++
I think Adobe's strategy is being driven mainly by what the existing major
license holders want. Most of those companies' businesses are in the
military, aerospace, semiconductor, pharmaceutical, and telecommunications
fields, where the Word vs. Frame debate is already resolved in favor of
Frame. They aren't much concerned about round-tripping.

But Frame, although it is ideally suited for producing proposal documents,
has little penetration of that market, and one of the main reasons is the
need for Word-to-Frame round-tripping. Most proposal input comes from people
who use Word. If the proposal group uses Frame or FM+SGML, their documents
must be converted back to Word for editing/updating by the proposal
contributors. And many US government agencies still require that proposals
be submitted in Word or WordPerfect, even when submittals in PDF or HTML are
also allowed. The increasing US government requirement for page-limited
proposals imposes even greater demands on round-tripping, since conversions
in one direction or the other might change the page count.

I know of several instances where major Frame license holders have
considered using Frame or FM+SGML in their proposal groups, but abandoned
the idea because of the unreliability of the round-trip conversion process.
In a pressure-cooker proposal environment, round-trip conversions must be
almost completely free of errors and must not require any post-conversion
clean-up.

Even when round-tripping is not a regular occurrence, it can still be a
vital requirement. For example:

1. Source data is often created in Word, and must be converted to Frame
without introducing errors or the need for extensive post-conversion clean-up.

2. Legacy documents in Word need to be converted to Frame, particularly when
an organization first acquires Frame.

3. An enterprise's tools for converting to on-line context-sensitive help
(e.g., HTML Help, WinHelp, RoboHelp, etc.) may require that the input be in
Word or RTF, necessitating error-free conversions of Frame docs to RTF or Word.

4. Documents created in Frame may have to be repurposed using Word (e.g.,
training materials produced by other departments that don't use Frame).

In summary, Frame is still (and probably always will be) a niche product,
which means that it is not widely distributed electronically in its native
format. Presently, the only reliable conversion available within the
FrameMaker product is to PDF, and even that conversion is often problematic.
Conversion of FM+SGML structured docs to SGML or XML often requires
extensive development. If the market for Frame products is to broaden, its
capability for round-trip conversions to/from other formats must be expanded
and improved. The most likely source of such conversion tools is third-party
software vendors like SEMA, Omni Systems, Blueberry, Quadralay, and (in the
case of SGML/XML) OmniMark.

It now appears that Adobe plans to issue a major new release of Frame about
once every two years (provided they can find a way to avoid bug-ridden point
releases such as 5.5). Third-party software vendors of conversion tools are
on a much shorter release schedule, because the nature of their business
demands it. Adobe would be better off if it subsidized, or in other ways
supported, those third-party vendors rather than trying to develop adequate
conversion tools within the Frame product itself. Promotional deals could be
struck that offered these third-party products at a deep discount to Frame
license holders.

 
     ====================
     | Nullius in Verba |
     ====================
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
   majordomo@omsys.com with "subscribe framers" (no quotes) in the body.

