
Re: Conversion of Word documents to structured frame documents



At 12:24 PM 4/6/99 +1000, Marcus Carr wrote:
>This is my third attempt at responding to this mail - continuous lines of equal
>signs wreaks havoc on my mail app (NS 4.08).
++++++++++++++++++++++++++++++++++++++++++++++++++++
I apologize for using equal signs. Maybe plus signs are better. In any
event, this demonstrates the kind of problem that gets in the way of
error-free electronic information exchange, which was the subject I was
trying to address in my earlier posts.
+++++++++++++++++++++++++++++++++++++++++++++++++++++
At 03:30 PM 4/6/99 +1000, Hedley_S_Finger@allegiance.com.au wrote:
>If you have created a structured document in FM+SGML, then you can export a
>DTD that maps the EDD. You can use an XML-aware editor to create a
>conforming instance document that FM+SGML can import and map the EDD
>element definitions and format rules to.  I do not see why you need to
>create a para/char tag for each element in the EDD -- or are we still
>pursuing the chimera of converting SGML<-->Word in a round trip?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FM+SGML V5.5.6's XML import/export capabilities are a side issue. But I
contend that FM+SGML is not an XML-aware editor, and thus cannot
create an XML-conforming document instance that can be exported as XML in
the manner used to export SGML. Instead, the FM+SGML V5.5.6 XML export
capability uses the same methodology employed to export HTML from an
unstructured doc, namely, paragraph mapping.
++++++++++++++++++++++++++++++++++++++++++++++++++++
The subject of this thread, originated by wendy_ling@uk.ibm.com, was whether
there was a way to convert Word docs to structured FM+SGML docs.

SEMA's RTF-DOC DTD and rtf2rdc filter can do that.
SEMA also has an rdc2rtf filter that converts RTF-DOC-conforming structured
docs back to RTF. I then waxed lyrical that such a round-trip capability
could solve a problem that's constantly coming up in postings on the two
Framers lists, namely the unreliability of document conversions between
FrameMaker and Word.

Here, for example, is the EDD definition for the RTF-DOC element named PARA:

Element: PARA
General Rule: (<TEXT> | CS | FOOTNOTE | BKMK-START | BKMK-END | FIELD-START
| FIELD-END | TAB | CR | SOFT-CR | PAGE | SOFT-PAGE)*
Attribute list:
1. Name: STYLE      String
2. Name: STYLE-NBR  Integer
3. Name: DFLT-PS    Integer
4. Name: ALIGN      String
5. Name: LI         String
6. Name: RI         String
7. Name: FI         String
8. Name: SA         String
9. Name: SB         String
10. Name: SL        String
11. Name: SM        String
12. Name: LEVEL     Integer
13. Name: TABTYPE   Strings
14. Name: TABLEAD   Strings
15. Name: TABPOS    Integers

The PARA element's 15 attributes all have RTF equivalents.
The BKMK-START, BKMK-END, FIELD-START, FIELD-END, TAB, CR, SOFT-CR, PAGE,
and SOFT-PAGE elements in the general rule are all EMPTY elements that have
RTF equivalents. (The BKMK-START/END and FIELD-START/END elements have NAME
attributes, and the FIELD-START element also has an INSTRUCTION attribute
for "field computing".)
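
To make the PARA content model concrete, here is a minimal sketch in Python
that builds one PARA instance of the kind the general rule allows. The element
and attribute names come from the EDD listing; every attribute value (the
style name, the indent strings, the bookmark name) is invented for
illustration, and the value formats SEMA's filters actually use may differ.

```python
import xml.etree.ElementTree as ET

def build_sample_para():
    """Build a PARA element mixing text with EMPTY child elements."""
    # Attribute VALUES are hypothetical; only the names come from the EDD.
    para = ET.Element(
        "PARA", {"STYLE": "Body", "ALIGN": "left", "LI": "720", "FI": "-360"}
    )
    para.text = "See "
    # BKMK-START and BKMK-END are EMPTY elements that carry a NAME attribute.
    start = ET.SubElement(para, "BKMK-START", {"NAME": "intro"})
    start.tail = "the introduction"
    ET.SubElement(para, "BKMK-END", {"NAME": "intro"})
    # TAB is an EMPTY element with no attributes in this sketch.
    tab = ET.SubElement(para, "TAB")
    tab.tail = "page 3"
    return para

if __name__ == "__main__":
    print(ET.tostring(build_sample_para(), encoding="unicode"))
```

The point of the sketch is only that formatting lives in attributes and EMPTY
elements, while the paragraph text stays as ordinary element content.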

The CS element is a text range character style container, whose 16
attributes have RTF equivalents that define a character format, as follows:
Attribute list:
1. Name: STYLE      String
2. Name: STYLE-NBR  Integer
3. Name: DFLT-CS    Integer
4. Name: DELETED    Integer
5. Name: REVISED    Integer
6. Name: INVISIBLE  Integer
7. Name: FONT       Integer
8. Name: SIZE       Integer
9. Name: BOLD       Integer
10. Name: ITALIC    Integer
11. Name: OUTLINE   Integer
12. Name: SHADOW    Integer
13. Name: CAPS      String
14. Name: SCRIPT    String
15. Name: STRIKE    Integer
16. Name: UNDERLINE String

The DTD/EDD also has:

1. A FONTDEF container whose general rule is FONT* for defining (in
attributes) each font.
2. A STYLE-SHEET container whose general rule is (CHAR-STYLE | STYLE)* for
defining (in attributes) each character and paragraph style.
3. A DOC-HEAD container whose general rule is (TITLE | AUTHOR | OPERATOR |
CREATED | REVISED | VERSION)* for the document header.
4. A SECTION container whose general rule is SECTION-HEAD?, SECTION-BODY.
5. A HEADER container (a child of SECTION-HEAD) for subsections under a section.
6. TABLE and PICTURE elements for tables and figures respectively. These
elements and their children contain the RTF formatting data in attributes.
7. A FOOTER element for running footers that has an attribute specifying
whether it is used on first, left, or right pages.

As you can see, the DTD/EDD is quite simple, and is capable of being used to
produce either SGML or XML document instances. All of the original RTF
formatting information is preserved in attributes and EMPTY elements. The
rtf2rdc filter recognizes what type of document object each RTF statement is
describing, wraps the document object contents (if any) in the corresponding
RTF-DOC element, and converts the RTF formatting information for that object
to element attribute values.
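
The wrapping step just described can be sketched in a few lines of Python.
This is not SEMA's rtf2rdc code, only a toy illustration of the idea. The RTF
control words (\li, \ri, \fi, \sa, \sb, \sl, \ql, \qr, \qc, \qj) are standard
RTF, but pairing them with the PARA attributes of the same names, and the
ALIGN values used here, are my assumptions. The sketch ignores RTF groups,
escaped characters, and everything except one flat paragraph.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pairing of RTF control words with PARA attribute names.
NUMERIC_WORDS = {"li": "LI", "ri": "RI", "fi": "FI",
                 "sa": "SA", "sb": "SB", "sl": "SL"}
# Assumed ALIGN values; the real RTF-DOC values are not given in this post.
ALIGN_WORDS = {"ql": "left", "qr": "right", "qc": "center", "qj": "justify"}

# A control word: backslash, letters, optional signed number, optional
# single delimiting space.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)? ?")

def rtf_para_to_element(rtf):
    """Turn one flat RTF paragraph into a PARA element with attributes."""
    attrs = {}

    def collect(match):
        word, number = match.group(1), match.group(2)
        if word in NUMERIC_WORDS and number is not None:
            attrs[NUMERIC_WORDS[word]] = number
        elif word in ALIGN_WORDS:
            attrs["ALIGN"] = ALIGN_WORDS[word]
        # Formatting moves into attributes, so drop it from the text.
        return ""

    text = CONTROL_WORD.sub(collect, rtf)
    para = ET.Element("PARA", attrs)
    para.text = text
    return para

if __name__ == "__main__":
    sample = r"\pard\qc\li720\fi-360\sb120 Hello world\par"
    print(ET.tostring(rtf_para_to_element(sample), encoding="unicode"))
```

Control words the sketch does not recognize (such as \pard and \par) are
simply discarded; a real filter would have to account for every one of them.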

Numbered or bulleted lists, for example, are created by specifying the
appropriate paragraph style in the STYLE attribute of the PARA element.

If a document instance were originated in FM+SGML using the RTF-DOC EDD, it
ought to be possible to use the FDK to develop an API client that would, on
export to SGML, insert all (or most) of the format-rule-specified formatting
properties into the applicable attributes of each instance of each element,
so that the formatting specified in the EDD would be preserved in the
exported document instance. Then, using the SEMA rdc2rtf filter, the
exported instance could be converted to RTF so it can be opened as a
faithfully reproduced, error-free, unstructured document in Word,
FrameMaker, or any other DTP that imports RTF.
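
The return leg of that round trip, from attribute values back to RTF, might
look something like the following Python sketch. Again, this is not the
rdc2rtf filter itself. The pairing of PARA attributes with the RTF control
words \li, \ri, \fi, \sa, \sb, and \sl, and the ALIGN values, are assumptions
on my part, and the sketch handles only a PARA that contains plain text.

```python
import xml.etree.ElementTree as ET

# Assumed pairing of PARA attribute names with RTF control words.
ATTR_TO_WORD = {"LI": "li", "RI": "ri", "FI": "fi",
                "SA": "sa", "SB": "sb", "SL": "sl"}
# Assumed ALIGN values mapped to the standard RTF alignment words.
ALIGN_TO_WORD = {"left": "ql", "right": "qr", "center": "qc", "justify": "qj"}

def escape_rtf_text(text):
    """Escape the three characters that are special in RTF text."""
    return (text.replace("\\", "\\\\")
                .replace("{", "\\{")
                .replace("}", "\\}"))

def para_element_to_rtf(para):
    """Emit one RTF paragraph from a text-only PARA element."""
    words = ["\\pard"]
    align = para.get("ALIGN")
    if align in ALIGN_TO_WORD:
        words.append("\\" + ALIGN_TO_WORD[align])
    for attr, word in ATTR_TO_WORD.items():
        value = para.get(attr)
        if value is not None:
            words.append("\\" + word + value)
    # A single space ends the last control word before the text begins.
    return "".join(words) + " " + escape_rtf_text(para.text or "") + "\\par"

if __name__ == "__main__":
    sample = ET.fromstring(
        '<PARA ALIGN="center" LI="720" FI="-360">Hello world</PARA>'
    )
    print(para_element_to_rtf(sample))
```

Because every formatting property travels as an attribute value, nothing has
to be guessed on the way back; that is the whole appeal of the approach.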
++++++++++++++++++++++++++++++++++++++++++++++
Finally, I want to make the following rebuttal to Marcus's and Hedley's
comments.

1. The purpose of electronic document interchange is not just to let people
read documents; it is also to facilitate creating, reviewing, and
changing those documents within Work Groups that may be widely dispersed
both geographically and departmentally.

2. In many large organizations, particularly in the US, Word has become the
de-facto tool for authoring, reviewing, and changing mission-critical
documents that use engineering source data also created in Word. The process
of creating such documents occurs in a Work Group environment that includes
many people outside of the publications group itself, and requires that the
source data and documents be frequently exchanged amongst the Work Group
members. 

3. The old method of distributing printed copies of such documents to Work
Group members for review by marking up their individual copies and returning
them to the publications group for consolidation is being replaced by a more
direct approach. This approach allows reviewers to insert their
corrections/changes/additions in a review copy of the electronic document
itself. Although this too is a messy process, it can be successfully managed
if reviewers adhere to a few simple rules, in which case it can become much
more efficient than the old way. Although it can be argued that documents
can be submitted for review in PDF format, reviewers would prefer to make
their changes directly in the text of the actual document, rather than in
sticky notes, which often do not suffice.

4. If, in an organization such as that described in items 2 and 3 above, a
publications group uses FrameMaker and the rest of the Work Group members
all use Word, there is a serious disconnect in the electronic interchange of
information. First, source data, created by other Work Group members in
Word, must be converted to FrameMaker before it can be used, and this
conversion often produces mis-translation, not only of formats, but also of
the information itself. Second, at each review point, the Frame documents
must be converted back to Word for distribution to Work Group members,
producing more translation errors. A publications group that justifies the
use of FrameMaker rather than Word on the basis of Frame's superior
capabilities is, as far as management is concerned, blowing smoke. To
management, the error-free electronic interchange of documents throughout
the enterprise is the first prerequisite. All other considerations must give
way to that vital requirement.

5. Unless it is resolved, the situation described in item 4 above will not
only begin to erode the existing installed base of FrameMaker licenses, it
will also diminish the chances that Adobe can add new license holders to the
installed base. The solution is unlikely to come from Adobe, so it will
have to come from third-party software developers.

6. If the conclusion in item 5 above describes what may happen to FrameMaker
in the US, it doesn't matter what's happening in the rest of the world. If
Adobe can't increase the installed base in the US, Adobe's support for the
product will sag, putting its future in jeopardy.

7. It is an unavoidable and permanent fact of life that all document
conversions are problematic. That includes conversions to HTML and PDF as
well as Word. Anyone who monitors postings to the two Framers lists can
attest that conversions from FrameMaker to PDF are fraught with peril, even
though both products are made by the same company. This fact of life will
ultimately convince most people that the only way to avoid such conversions
is to author in SGML/XML. In the meantime, patchwork solutions are needed.

8. What I was proposing, in my original post to this thread, was that the
RTF-DOC DTD, combined with the round-trip filters from SEMA, might offer a
solution for publications groups confronted with the dilemma described in
item 4 above. Although SGML/XML offers the ultimate solution for the
electronic interchange of information, I was suggesting something much more
limited than that. Namely: Replace FrameMaker with FM+SGML and the RTF-DOC
DTD/EDD. This is not really a structured document solution. It's simply a
solution that requires a structured document approach in order to carry out
error-free round-trip conversions between FrameMaker and Word.

     ====================
     | Nullius in Verba |
     ====================
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
   majordomo@omsys.com with "subscribe framers" (no quotes) in the body.


** To unsubscribe, send a message to majordomo@omsys.com **
** with "unsubscribe framers" (no quotes) in the body.   **