
Re: Word to Structured Frame



Based on my experience with this kind of unstructured-to-structured 
process, I'd suggest an approach that is radically different and more 
complex than the simple process described below by John Root at the end of 
this post. The reason for my recommendations is that the likelihood of the 
tag structure in any set of Word documents matching your EDD structure is 
very low, and the likelihood that a set of Word documents has the required 
consistency of tagging is even lower. Actually, the procedure described 
below is just as applicable to converting a set of unstructured Frame 
documents to EDD-conformant structured ones.

1. Open the unstructured documents you want to convert in FrameMaker. 
Analyze them to determine whether they're consistently tagged, and are not 
full of those hateful format overrides. Also, determine whether formatted 
text strings within paragraphs are consistently formatted using character 
format tags, or (more likely) whether they have those hateful format 
overrides applied to them.

As part of this analysis, identify the key structural tags in the documents to be converted, and, 
to the extent possible, match these structural tags to the corresponding 
elements in your EDD. In particular, look for the following structural 
elements in the unstructured documents:

A. Tags in the unstructured documents for Titles and section headings which 
correspond to titles and section heading elements in your EDD.

B. Tags in the unstructured documents for various types of Lists (ordered, 
unordered) which correspond to matching elements in your EDD.

C.  Tags in the unstructured documents for various types of ordinary 
paragraphs which appear under section headings, as well as under items 
within lists, etc., and their correspondence with matching elements in your 
EDD.

D. Tags in the unstructured documents for formatted text ranges within 
section headings, items, ordinary paragraphs, etc., and their 
correspondence with matching text range elements in your EDD.

E. Tags within tables in the unstructured documents (e.g., paragraph tags 
for table titles, heading rows, footing rows, and body rows), and their 
correspondence with matching elements in your EDD.

F. Graphics in the unstructured documents, as well as graphic titles, which 
correspond to matching elements in your EDD.

G. Any other structures (e.g., footnotes, notes, cautions and warnings) in 
the unstructured documents which correspond to matching elements in your EDD.
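It can help to write the inventory from items A through G down as data before building anything in FrameMaker. The following is only a rough Python sketch of that bookkeeping; every tag name (Heading1, Body, Sidebar, and so on) and every element name (Title, Para, Item) is invented for illustration and is not taken from any real template or EDD.

```python
# Hypothetical inventory: unstructured tag -> (kind of object, EDD element or None).
# All tag and element names below are made up for illustration.
TAG_INVENTORY = {
    "Heading1":   ("paragraph", "Title"),
    "Bullet":     ("paragraph", "Item"),
    "Body":       ("paragraph", "Para"),
    "Emphasis":   ("character", "Emphasis"),
    "CellBody":   ("paragraph", "Para"),
    "TableTitle": ("paragraph", "Title"),
    "Sidebar":    ("paragraph", None),  # nothing in the EDD corresponds to this tag
}


def unmapped_tags(inventory):
    """Return the tags that have no corresponding EDD element, sorted by name."""
    return sorted(tag for tag, (_kind, element) in inventory.items() if element is None)


def tags_by_element(inventory):
    """Group the mapped tags under the EDD element each one corresponds to."""
    groups = {}
    for tag, (_kind, element) in inventory.items():
        if element is not None:
            groups.setdefault(element, []).append(tag)
    return groups


if __name__ == "__main__":
    print("No EDD match:", ", ".join(unmapped_tags(TAG_INVENTORY)))
    for element, tags in sorted(tags_by_element(TAG_INVENTORY).items()):
        print(f"{element} <- {', '.join(tags)}")
```

Tags that end up in the "no EDD match" list are the ones that will force a decision later: either the EDD changes or that content goes.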

2. Upon completion of step 1, you will probably find that the extant 
tagging of the unstructured documents is inconsistent, and/or fails to 
sufficiently distinguish between elements in various contexts. In 
particular, you are likely to find many instances where a one-to-many 
relationship exists between a given tag in the unstructured document and 
the corresponding EDD elements applicable to different contexts in which 
that tag appears. Each such one-to-many relationship you find will, in most 
cases, result in an inability to properly structure the document using 
structure rules tables. You may also find structures within unstructured 
documents which do not correlate to anything in your EDD, in which case you 
may have to either modify your EDD or delete those aberrant portions of the 
unstructured documents.
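To make the one-to-many problem concrete, here is a small Python sketch that flags tags which would have to become different elements depending on context. The observations are hypothetical, hand-recorded data (the tag, context, and element names are all invented); nothing here reads a FrameMaker file.

```python
from collections import defaultdict

# Hypothetical observations recorded while reviewing a document:
# (tag in the unstructured document, context it appears in, EDD element it should become).
OBSERVATIONS = [
    ("Body", "under a section heading", "Para"),
    ("Body", "inside a list item", "ItemPara"),
    ("Body", "inside a table cell", "CellPara"),
    ("Heading1", "top of a chapter", "Title"),
    ("Bullet", "unordered list", "Item"),
]


def one_to_many(observations):
    """Return only the tags that map to more than one EDD element."""
    targets = defaultdict(set)
    for tag, _context, element in observations:
        targets[tag].add(element)
    return {tag: sorted(elements) for tag, elements in targets.items() if len(elements) > 1}


if __name__ == "__main__":
    for tag, elements in one_to_many(OBSERVATIONS).items():
        print(f"{tag} must be split into {len(elements)} tags: {', '.join(elements)}")
```

Each tag this reports is a candidate for re-tagging into several context-specific tags, so that every tag ends up with exactly one target element.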

3. If your EDD has any degree of complexity, and particularly if your EDD 
structure has numerous high-level "parent" or "anchor" elements that do not 
correlate to anything in the unstructured documents, then it will become 
apparent that you cannot produce an EDD-conformant structured document in a 
single conversion pass.

4. If, as is likely, you discover that your unstructured documents have 
many of the problems predicted in steps 2 and 3, it will become evident 
to you that:

A. You must, after importing the Word documents into FrameMaker, massively 
re-tag them to minimize the kinds of problems described in steps 2 and 3.

B. It will not be possible, using structure rules tables, to produce, in a 
single pass, a structured document that comes even close to conforming with 
your "real" EDD.

5. If reality bites you in step 4, then you will be forced to conclude that:

A. You must create a "conversion" EDD which is much simpler than your 
"real" EDD. As much as possible, however, the "conversion" EDD should use 
the same element names as your "real" EDD, and, if possible, the same 
structure rules as well. The "conversion" EDD should 
(usually) not contain high-level "parent" or "anchor" elements which cannot 
be correlated to tags in the unstructured documents.

B. Next, you must create structure rules tables for that "conversion" EDD. 
Be sure you carefully read and fully understand the appendix on structure 
rules tables in the Developer's Guide. The development of effective 
structure rules tables for complex EDDs is rocket science.

C. Next, before converting the documents, you must open them in FrameMaker 
and re-tag them so that the tags therein fully correspond to the tags 
defined in the structure rules tables. Obviously, the most effective way to 
do this is first to create the structure-rules-table-defined paragraph, 
text range, and other tags in a FrameMaker template, and then import those 
tags into each Frame document to be converted.

D. Next, using the structure rules tables produced in step 5B, you convert 
each unstructured document to a structured document which (hopefully) will 
conform to the "conversion" EDD. Upon completion of the conversion, you 
import the "conversion" EDD's element definitions into the converted 
document.  You will then observe that the structure view will almost 
assuredly reveal numerous red elements which violate the "conversion" EDD's 
structure rules, and that there are also instances where the wrong 
conversion occurred. You should be able to eliminate many of these problems 
by analyzing their cause and then refining the re-tagging in step 5C above, 
and/or by refining the "conversion" EDD in step 5A above, and/or by refining 
the structure rules tables in step 5B above. In other words, successful 
conversion is typically an iterative process in which you progressively 
refine the re-tagging of the unstructured documents, refine the 
"conversion" EDD, and refine the structure rules tables.
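Between iterations it is useful to confirm that the re-tagged document and the structure rules table actually agree on the set of tags. A minimal Python sketch of that check follows; the two tag lists are hypothetical and typed in by hand, on the assumption that you have copied them out of the paragraph catalog and the table yourself.

```python
# Hypothetical tag lists. In practice they would be copied from the re-tagged
# document's catalogs and from the tags named in the structure rules table.
DOCUMENT_TAGS = ["Heading1", "Body", "ListBody", "Bullet", "Note"]
TABLE_TAGS = ["Heading1", "Body", "ListBody", "Bullet", "Caution"]


def coverage_report(document_tags, table_tags):
    """Compare the two tag sets and report what each side is missing."""
    doc, table = set(document_tags), set(table_tags)
    return {
        "not_in_table": sorted(doc - table),     # tags the table has no rule for
        "not_in_document": sorted(table - doc),  # rules that match nothing in the document
    }


if __name__ == "__main__":
    report = coverage_report(DOCUMENT_TAGS, TABLE_TAGS)
    print("Tags with no rule:", ", ".join(report["not_in_table"]) or "none")
    print("Rules with no tag:", ", ".join(report["not_in_document"]) or "none")
```

A non-empty "tags with no rule" list points at more re-tagging (step 5C) or more table rules (step 5B) before the next conversion attempt.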

6. After the iterative process described in step 5 is completed, you will 
still not have a document which fully conforms to your "real" EDD. So 
here's what you do next:

A. Import all applicable EDD-defined format tags from the master template 
for your "real" EDD into the structured document produced in step 5. Then, 
import the element definitions from that same template into that same 
structured document with Remove Format Overrides turned on. At the 
completion of this process, the structured document should be in accordance 
with the structure and format rules in your "real" EDD.

B. Validate the structured document produced in step 6A. You will likely 
find many violations of the "real" EDD's structure, including: (1) 
required attributes which have no value; (2) elements having default 
attribute values which must be changed; (3) elements (or a contiguous set 
of elements) which are missing a parent element that was not defined in the 
"conversion" EDD, in which case you must wrap these elements in the correct 
parent element; (4) elements whose names must be changed to 
EDD-conformant names; (5) elements that do not have EDD-required children; 
and (6) other anomalies. Carefully analyze all of these problems.

7. If fixing the problems found in step 6 is too burdensome to do for each 
unstructured document to be converted, you may have luck by re-analyzing 
and refining substeps A thru D under step 5, and then repeating all of the 
substeps under step 6--another iterative process. The degree to which these 
iterative processes are justified will (usually) be determined by the 
volume of documents to be converted and whether the amount of required 
manual fix-up work is too burdensome or difficult.
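One way to judge whether another iteration is worth it is to tally the violations found in step 6B by category and by document. The Python sketch below assumes you have logged the violations by hand as (document, category) pairs; the document names and the log itself are invented, and the category labels simply paraphrase the list in step 6B.

```python
from collections import Counter

# Hypothetical log of validation problems, recorded by hand after step 6B.
VIOLATIONS = [
    ("chapter1", "required attribute has no value"),
    ("chapter1", "missing parent element"),
    ("chapter1", "missing parent element"),
    ("chapter2", "missing parent element"),
    ("chapter2", "element must be renamed"),
]


def by_category(violations):
    """Count how often each category of violation occurs across all documents."""
    return Counter(category for _doc, category in violations)


def by_document(violations):
    """Count how many violations each document contains."""
    return Counter(doc for doc, _category in violations)


if __name__ == "__main__":
    for category, count in by_category(VIOLATIONS).most_common():
        print(f"{count:3d}  {category}")
```

A category that dominates the tally across many documents is a better target for refining the "conversion" EDD or the structure rules tables than one that shows up once and can be fixed by hand.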

As you can see from the foregoing, this process ain't simple, and, in 
general, it cannot be justified from a cost standpoint unless: (1) the 
unstructured documents have spectacularly consistent tagging that clearly 
differentiates between different element contexts (i.e., few, if any, 
one-to-many relationships); or (2) there is a very substantial number of 
pages to be converted; or (3) you gotta do it, no matter what the cost, and 
you've warned your management of what the likely cost will be.

At 10:03 AM 3/15/03 -0800, John Root wrote:
>David,
>
>The basic process is to set up a conversion table which maps the styles in
>the Word document to elements in your structured template. Depending on the
>complexity of the Word document (and your EDD) this can be rather
>straightforward or somewhat tedious. See Appendix A in the 'Structure Application
>Developer's Guide' for info on how to set up a conversion table in Frame.
>
>Once you have a conversion table that is working correctly:
>
>Import the Word document (copy) into an empty document that contains the
>structure information from your EDD.
>
>Select File:Utilities:Structure Current Document and choose your conversion
>table document from the drop down list. Note: Conversion table document must
>be open.
>
>If the conversion table is set up correctly the content from Word should now
>be structured. Copy or import this structured content into an existing
>document or build a new document around it.

FrameMaker/FrameMaker+SGML Document Design & Database Publishing
DW Emory <danemory@globalcrossing.net>

