To: Marcus Carr <mrc@xxxxxxxxxxxxxx>, Hedley_S_Finger@xxxxxxxxxxxxxxxxx
Subject: Re: Conversion of Word documents to structured frame documents
From: Dan Emory <danemory@xxxxxxxxxxxx>
Date: Tue, 6 Apr 1999 03:12:35 -0700 (MST)
Sender: owner-framers@xxxxxxxxx
At 12:24 PM 4/6/99 +1000, Marcus Carr wrote:
>This is my third attempt at responding to this mail - continuous lines of equal
>signs wreaks havoc on my mail app (NS 4.08).
++++++++++++++++++++++++++++++++++++++++++++++++++++
I apologize for using equal signs. Maybe plus signs are better. In any event, what this demonstrates is the kind of problem that gets in the way of error-free electronic information exchange, which was the subject I was trying to address in my earlier posts.
+++++++++++++++++++++++++++++++++++++++++++++++++++++
At 03:30 PM 4/6/99 +1000, Hedley_S_Finger@allegiance.com.au wrote:
>If you have created a structured document in FM+SGML, then you can export a
>DTD that maps the EDD. You can use an XML-aware editor to create a
>conforming instance document that FM+SGML can import and map the EDD
>element definitions and format rules to. I do not see why you need to
>create a para/char tag for each element in the EDD -- or are we still
>pursuing the chimera of converting SGML<-->Word in a round trip?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FM+SGML V5.5.6's XML import/export capabilities are a side issue. But I contend that FM+SGML is not an XML-aware editor, and thus cannot create an XML-conforming document instance that can be exported as XML in the manner used to export SGML. Instead, the FM+SGML V5.5.6 XML export capability uses the same methodology employed to export HTML from an unstructured doc, namely, paragraph mapping.
++++++++++++++++++++++++++++++++++++++++++++++++++++
The subject of this thread, originated by wendy_ling@uk.ibm.com, was whether there was a way to convert Word docs to structured FM+SGML docs. SEMA's RTF-DOC DTD and rtf2rdc filter can do that. SEMA also has an rdc2rtf filter that converts RTF-DOC-conforming structured docs back to RTF.
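The round-trip idea can be sketched in a few lines of Python. This is only an illustration, not SEMA's code, and it does not parse RTF: `to_rdc` and `from_rdc` are hypothetical stand-ins for what the rtf2rdc and rdc2rtf filters do to a single paragraph. The PARA element and its attribute names are taken from the RTF-DOC definition quoted in this message; the attribute values are invented.

```python
import xml.etree.ElementTree as ET

# Attribute names come from the PARA definition quoted in this message.
# The values used further down are invented for illustration only.
PARA_ATTRS = ("STYLE", "ALIGN", "LI", "RI", "FI")


def to_rdc(props, text):
    """Hypothetical stand-in for rtf2rdc: wrap the paragraph text in a
    PARA element and carry the formatting properties as attributes."""
    attrs = {name: str(props[name]) for name in PARA_ATTRS if name in props}
    para = ET.Element("PARA", attrs)
    para.text = text
    return ET.tostring(para, encoding="unicode")


def from_rdc(instance):
    """Hypothetical stand-in for rdc2rtf: read the formatting properties
    and the text back out of a PARA element."""
    para = ET.fromstring(instance)
    return dict(para.attrib), para.text or ""


props = {"STYLE": "Body Text", "ALIGN": "left", "LI": "720"}
instance = to_rdc(props, "Unpack the unit.")
recovered_props, recovered_text = from_rdc(instance)
print(instance)
print(recovered_props == props)
```

Because every formatting property travels as an attribute value, nothing has to be inferred on the way back, which is what makes the conversion reversible in this toy case.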
I then waxed lyrical that such a round-trip capability could solve a problem that's constantly coming up in postings on the two Framers lists, namely the unreliability of document conversions between FrameMaker and Word.

Here, for example, is the EDD definition for the RTF-DOC element named PARA:

Element: PARA
General Rule: (<TEXT> | CS | FOOTNOTE | BKMK-START | BKMK-END | FIELD-START | FIELD-END | TAB | CR | SOFT-CR | PAGE | SOFT-PAGE)*
Attribute list:
 1. Name: STYLE      String
 2. Name: STYLE-NBR  Integer
 3. Name: DFLT-PS    Integer
 4. Name: ALIGN      String
 5. Name: LI         String
 6. Name: RI         String
 7. Name: FI         String
 8. Name: SA         String
 9. Name: SB         String
10. Name: SL         String
11. Name: SM         String
12. Name: LEVEL      Integer
13. Name: TABTYPE    Strings
14. Name: TABLEAD    Strings
15. Name: TABPOS     Integers

The PARA element's 15 attributes all have RTF equivalents. The BKMK-START, BKMK-END, FIELD-START, FIELD-END, TAB, CR, SOFT-CR, PAGE, and SOFT-PAGE elements in the general rule are all EMPTY elements that have RTF equivalents (the BKMK-START/END and FIELD-START/END elements have NAME attributes, and the FIELD-START element also has an INSTRUCTION attribute for "field computing").

The CS element is a text range character style container, whose 16 attributes have RTF equivalents that define a character format, as follows:

Attribute list:
 1. Name: STYLE      String
 2. Name: STYLE-NBR  Integer
 3. Name: DFLT-CS    Integer
 4. Name: DELETED    Integer
 5. Name: REVISED    Integer
 6. Name: INVISIBLE  Integer
 7. Name: FONT       Integer
 8. Name: SIZE       Integer
 9. Name: BOLD       Integer
10. Name: ITALIC     Integer
11. Name: OUTLINE    Integer
12. Name: SHADOW     Integer
13. Name: CAPS       String
14. Name: SCRIPT     String
15. Name: STRIKE     Integer
16. Name: UNDERLINE  String

The DTD/EDD also has:

1. A FONTDEF container whose general rule is FONT* for defining (in attributes) each font.
2. A STYLE-SHEET container whose general rule is (CHAR-STYLE | STYLE)* for defining (in attributes) each character and paragraph style.
3. A DOC-HEAD container whose general rule is (TITLE | AUTHOR | OPERATOR | CREATED | REVISED | VERSION)* for the document header.
4. A SECTION container whose general rule is SECTION-HEAD?, SECTION-BODY.
5. A HEADER container (a child of SECTION-HEAD) for subsections under a section.
6. TABLE and PICTURE elements for tables and figures respectively. These elements and their children contain the RTF formatting data in attributes.
7. A FOOTER element for running footers that has an attribute specifying whether it is used on first, left, or right pages.

As you can see, the DTD/EDD is quite simple, and is capable of being used to produce either SGML or XML document instances. All of the original RTF formatting information is preserved in attributes and EMPTY elements. The rtf2rdc filter recognizes what type of document object each RTF statement is describing, wraps the document object contents (if any) in the corresponding RTF-DOC element, and converts the RTF formatting information for that object to element attribute values. Numbered or bulleted lists, for example, are created by specifying the appropriate paragraph style in the STYLE attribute of the PARA element.

If a document instance were originated in FM+SGML using the RTF-DOC EDD, it ought to be possible to use the FDK to develop an API client that would, on export to SGML, insert all (or most) of the format-rule-specified formatting properties into the applicable attributes of each instance of each element, so that the formatting specified in the EDD would be preserved in the exported document instance. Then, using the SEMA rdc2rtf filter, the exported instance could be converted to RTF so it can be opened as a faithfully reproduced, error-free, unstructured document in Word, FrameMaker, or any other DTP that imports RTF.
++++++++++++++++++++++++++++++++++++++++++++++
Finally, I want to make the following rebuttal to Marcus's and Hedley's comments.

1. The purpose of electronic document interchange is not just to let people read documents; it is also to facilitate the creation, review, and revision of those documents within Work Groups that may be widely dispersed both geographically and departmentally.

2. In many large organizations, particularly in the US, Word has become the de facto tool for authoring, reviewing, and changing mission-critical documents that use engineering source data also created in Word. The process of creating such documents occurs in a Work Group environment that includes many people outside of the publications group itself, and requires that the source data and documents be frequently exchanged amongst the Work Group members.

3. The old method of distributing printed copies of such documents to Work Group members, who review them by marking up their individual copies and returning them to the publications group for consolidation, is being replaced by a more direct approach. This approach allows reviewers to insert their corrections/changes/additions in a review copy of the electronic document itself. Although this too is a messy process, it can be successfully managed if reviewers adhere to a few simple rules, in which case it can become much more efficient than the old way. Although it can be argued that documents can be submitted for review in PDF format, reviewers would prefer to make their changes directly in the text of the actual document, rather than in sticky notes, which often do not suffice.

4. If, in an organization such as that described in items 2 and 3 above, a publications group uses FrameMaker and the rest of the Work Group members all use Word, there is a serious disconnect in the electronic interchange of information. First, source data, created by other Work Group members in Word, must be converted to FrameMaker before it can be used, and this conversion often produces mis-translation, not only of formats, but also of the information itself.
Second, at each review point, the Frame documents must be converted back to Word for distribution to Work Group members, producing more translation errors. A publications group that justifies the use of FrameMaker rather than Word on the basis of Frame's superior capabilities is, as far as management is concerned, blowing smoke. To management, the error-free electronic interchange of documents throughout the enterprise is the first prerequisite. All other considerations must give way to that vital requirement.

5. Unless it is resolved, the situation described in item 4 above will not only begin to erode the existing installed base of FrameMaker licenses, it will also diminish the chances that Adobe can add new license holders to the installed base. The solution is unlikely to come from Adobe, thus it will have to come from third-party software developers.

6. If the conclusion in item 5 above describes what may happen to FrameMaker in the US, it doesn't matter what's happening in the rest of the world. If Adobe can't increase the installed base in the US, Adobe's support for the product will sag, putting its future in jeopardy.

7. It is an unavoidable and permanent fact of life that all document conversions are problematic. That includes conversions to HTML and PDF as well as Word. Anyone who monitors postings to the two Framers lists can attest that conversions from FrameMaker to PDF are fraught with peril, even though both products are made by the same company. This fact of life will ultimately convince most people that the only way to avoid such conversions is to author in SGML/XML. In the meantime, patchwork solutions are needed.

8. What I was proposing, in my original post to this thread, was that the RTF-DOC DTD, combined with the round-trip filters from SEMA, might offer a solution for publications groups confronted with the dilemma described in item 4 above.
Although SGML/XML offers the ultimate solution for the electronic interchange of information, I was suggesting something much more limited than that. Namely: Replace FrameMaker with FM+SGML and the RTF-DOC DTD/EDD. This is not really a structured document solution. It's simply a solution that requires a structured document approach in order to carry out error-free round-trip conversions between FrameMaker and Word.

====================
| Nullius in Verba |
====================
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
   majordomo@omsys.com with "subscribe framers" (no quotes) in the body.

** To unsubscribe, send a message to majordomo@omsys.com **
** with "unsubscribe framers" (no quotes) in the body. **