To: John Root <jroot@xxxxxxxxxxxx>, framers@xxxxxxxxxxxxxx (Framers List)
Subject: Re: Word to Structured Frame
From: DW Emory <danemory@xxxxxxxxxxxxxxxxxx>
Date: Sat, 15 Mar 2003 14:04:15 -0800
Cc: "FrameSGML List" <FrameSGML@xxxxxxxxxxx>, "Free Framers" <framers@xxxxxxxxx>
In-Reply-To: <LISTMANAGER-118537-38-2003.03.15-11.04.01--danemory#globalcrossing.net@lists.FrameUsers.com>
References: <LISTMANAGER-80659-17-2003.03.15-07.33.29--jroot#publisys.com@lists.FrameUsers.com>
Sender: owner-framers@xxxxxxxxx

Based on my experience with this kind of unstructured-to-structured process, I'd suggest an approach that is radically different from, and more complex than, the simple process described by John Root at the end of this post. The reason for my recommendations is that the likelihood of the tag structure in any set of Word documents matching your EDD structure is very low, and the likelihood that a set of Word documents has the required consistency of tagging is even lower. Actually, the procedure described below is just as applicable to converting a set of unstructured Frame documents to EDD-conformant structured ones.

1. Open the unstructured documents you want to convert in FrameMaker. Analyze them to determine whether they're consistently tagged and are not full of those hateful format overrides. Also determine whether formatted text strings within paragraphs are consistently formatted using character format tags, or (more likely) whether they have those hateful format overrides applied to them.

   Identify the key structural tags in the documents to be converted, and, to the extent possible, match these structural tags to the corresponding elements in your EDD. In particular, look for the following structural elements in the unstructured documents:

   A. Tags for titles and section headings which correspond to title and section heading elements in your EDD.

   B. Tags for the various types of lists (ordered, unordered) which correspond to matching elements in your EDD.

   C. Tags for the various types of ordinary paragraphs which appear under section headings, as well as under items within lists, etc., and their correspondence with matching elements in your EDD.

   D. Tags for formatted text ranges within section headings, items, ordinary paragraphs, etc., and their correspondence with matching text range elements in your EDD.

   E. Tags within tables (e.g., paragraph tags for table titles, heading rows, footing rows, and body rows), and their correspondence with matching elements in your EDD.

   F. Graphics, as well as graphic titles, which correspond to matching elements in your EDD.

   G. Any other structures (e.g., footnotes, notes, cautions and warnings) which correspond to matching elements in your EDD.

2. Upon completion of step 1, you will probably find that the extant tagging of the unstructured documents is inconsistent, and/or fails to sufficiently distinguish between elements in various contexts. In particular, you are likely to find many instances where a one-to-many relationship exists between a given tag in the unstructured document and the corresponding EDD elements applicable to the different contexts in which that tag appears. Each such one-to-many relationship will, in most cases, result in an inability to properly structure the document using structure rules tables. You may also find structures within the unstructured documents which do not correlate to anything in your EDD, in which case you may have to either modify your EDD or delete those aberrant portions of the unstructured documents.

3. If your EDD has any degree of complexity, and particularly if your EDD structure has numerous high-level "parent" or "anchor" elements that do not correlate to anything in the unstructured documents, then it will become apparent that you cannot produce an EDD-conformant structured document in a single conversion pass.

4. If, as is likely, you discover that your unstructured documents have many of the problems predicted in steps 2 and 3, it will become evident to you that:

   A. You must, after importing the Word documents into FrameMaker, massively re-tag them to minimize the kinds of problems described in steps 2 and 3.

   B. It will not be possible, using structure rules tables, to produce, in a single pass, a structured document that comes even close to conforming with your "real" EDD.

5. If reality bites you in step 4, then you will be forced to conclude that:

   A. You must create a "conversion" EDD which is much simpler than your "real" EDD. As much as possible, however, the "conversion" EDD should use the same element names as your "real" EDD, and, if possible, the same structure rules as well. The "conversion" EDD should (usually) not contain high-level "parent" or "anchor" elements which cannot be correlated to tags in the unstructured documents.

   B. Next, you must create structure rules tables for that "conversion" EDD. Be sure you carefully read and fully understand the appendix on structure rules tables in the Developer's Guide. The development of effective structure rules tables for complex EDDs is rocket science.

   C. Next, before converting the documents, you must open them in FrameMaker and re-tag them so that the tags therein fully correspond to the tags defined in the structure rules tables. Obviously, the most effective way to do this is first to create the structure-rules-table-defined paragraph, text range, and other tags in a FrameMaker template, and then import those tags into each Frame document to be converted.

   D. Next, using the structure rules tables produced in step 5B, you convert each unstructured document to a structured document which (hopefully) will conform to the "conversion" EDD. Upon completion of the conversion, you import the "conversion" EDD's element definitions into the converted document. The structure view will then almost assuredly reveal numerous red elements which violate the "conversion" EDD's structure rules, as well as instances where the wrong conversion occurred.

      You should be able to eliminate many of these problems by analyzing their cause and then refining the re-tagging in step 5C, and/or the "conversion" EDD in step 5A, and/or the structure rules tables in step 5B. In other words, successful conversion is typically an iterative process in which you progressively refine the re-tagging of the unstructured documents, the "conversion" EDD, and the structure rules tables.

6. After the iterative process described in step 5 is completed, you will still not have a document which fully conforms to your "real" EDD. So here's what you do next:

   A. Import all applicable EDD-defined format tags from the master template for your "real" EDD into the structured document produced in step 5. Then import the element definitions from that same template into that same structured document with Remove Format Overrides turned on. At the completion of this process, the structured document should be in accordance with the structure and format rules in your "real" EDD.

   B. Validate the structured document produced in step 6A. You will likely find many violations of the "real" EDD's structure, including:

      (1) required attributes which have no value;
      (2) elements having default attribute values which must be changed;
      (3) elements (or a contiguous set of elements) which are missing a parent element that was not defined in the "conversion" EDD, in which case you must wrap these elements in the correct parent element;
      (4) elements whose names must be changed to EDD-conformant names;
      (5) elements that do not have EDD-required children; and
      (6) other anomalies.

      Carefully analyze all of these problems.

7. If fixing the problems found in step 6 is too burdensome to do on each unstructured document to be converted, you may have luck by re-analyzing and refining substeps A thru D under step 5, and then repeating all of the substeps under step 6, which is another iterative process.

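To make the one-to-many check from step 2 a little more concrete, here is a minimal Python sketch. It is not part of FrameMaker and does not read Frame or Word files. It assumes you have already written down, by hand or otherwise, which tag appears in which context and which EDD element it ought to become. Every tag name, context, and element name below is invented for illustration.

```python
from collections import defaultdict

# Hypothetical inventory from analysing the unstructured documents:
# (paragraph tag, context in which it appears, EDD element it should become).
# None of these names come from a real template or EDD.
observations = [
    ("Heading1", "chapter start", "Title"),
    ("Heading1", "section start", "Head"),
    ("Body", "under a heading", "Para"),
    ("Body", "inside a list item", "ItemPara"),
    ("Bullet", "unordered list", "Item"),
    ("TableTitle", "table", "TableTitle"),
]


def find_one_to_many(obs):
    """Return {tag: sorted element names} for tags that map to more than one element."""
    elements_by_tag = defaultdict(set)
    for tag, _context, element in obs:
        elements_by_tag[tag].add(element)
    return {
        tag: sorted(elements)
        for tag, elements in elements_by_tag.items()
        if len(elements) > 1
    }


conflicts = find_one_to_many(observations)
for tag, elements in sorted(conflicts.items()):
    # Each tag listed here needs splitting into context-specific tags
    # before a single conversion pass can map it unambiguously.
    print(f"{tag}: re-tag needed, maps to {', '.join(elements)}")
```

Any tag this reports is a candidate for the re-tagging described in step 5C: split it into one tag per context so that each tag maps to exactly one element.
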
The degree to which these iterative processes are justified will (usually) be determined by the volume of documents to be converted and whether the amount of required manual fix-up work is too burdensome or difficult.

As you can see from the foregoing, this process ain't simple, and, in general, it cannot be justified from a cost standpoint unless:

(1) the unstructured documents have spectacularly consistent tagging that clearly differentiates between different element contexts (i.e., few, if any, one-to-many relationships); or
(2) there is a very substantial number of pages to be converted; or
(3) you gotta do it, no matter what the cost, and you've warned your management of what the likely cost will be.

At 10:03 AM 3/15/03 -0800, John Root wrote:
>David,
>
>The basic process is to set up a conversion table which maps the styles in
>the Word document to elements in your structured template. Depending on the
>complexity of the Word document (and your EDD) this can be rather straight
>forward or somewhat tedious. See Appendix A in the 'Structure Application
>Developer's Guide' for info on how to set up a conversion table in Frame.
>
>Once you have a conversion table that is working correctly:
>
>Import the Word document (copy) into an empty document that contains the
>structure information from your EDD.
>
>Select File:Utilities:Structure Current Document and choose your conversion
>table document from the drop down list. Note: Conversion table document must
>be open.
>
>If the conversion table is set up correctly the content from Word should now
>be structured. Copy or import this structured content into an existing
>document or build a new document around it.

FrameMaker/FrameMaker+SGML Document Design & Database Publishing
DW Emory <danemory@globalcrossing.net>

** To unsubscribe, send a message to majordomo@omsys.com **
** with "unsubscribe framers" (no quotes) in the body. **