To: "FrameSGML List" <FrameSGML@xxxxxxxxxxx>, "Free Framers" <framers@xxxxxxxxx>, "TECHWR-L" <techwr-l@xxxxxxxxxxxxxxxxx>
Subject: Structured Document Design for XML or SGML
From: Dan Emory <danemory@xxxxxxxxxxxx>
Date: Thu, 18 May 2000 11:19:55 -0700
In-Reply-To: <862568E2.007F517D.00@LNMAIL01.IS.NWA.COM>
Sender: owner-framers@xxxxxxxxx

STRUCTURED AND UNSTRUCTURED DOCUMENTS ARE THE SAME AT THE "ATOMIC" LEVEL

At this lowest level there are only document object types (e.g., paragraphs, text ranges within paragraphs, graphics, equations, tables, cross-references, markers) and sub-types (e.g., text paragraph, bulleted paragraph, numbered list paragraph, section head paragraph, figure caption, table caption, bolded or italicized text range, index marker), most of which, at least in FrameMaker, are represented by descriptive tags.

Now, SGML purists would argue that, in a structured document, these "atomic" object types and sub-types must be assigned names that describe their content. Thus, if there are 25 content types, there would have to be 25 element names for text paragraphs, 25 names for figure captions, 25 names for bulleted paragraphs, and so on (and on and on and on, reductio ad absurdum). I contend that this is not only unnecessary but also self-defeating.

Elements at the "atomic" level should be given names that describe their objectness (i.e., object type/subtype), which is distinctly different from formatting information. For example, Bullet_Item describes a paragraph of sub-type bulleted item. If there is a compelling need (unlikely) to describe the content type at this low level, then it should be done by assigning one or more attributes for that purpose.

Incidentally, one of the odd things about SGML purists is the way they cling to the idea that a single (usually cryptic) element name is sufficient to describe its content. Usually, content has many different facets. It makes more sense (to me at least) to provide attributes for this purpose. Not only does this approach to describing content make more sense, it also makes the DTD much simpler and less vulnerable to the impact of evolving technologies and processes.

STRUCTURED DOCUMENTS BEGIN TO DIFFER FROM UNSTRUCTURED ONES AT THE "MOLECULAR" LEVEL

Here, groups of "atomic" elements are wrapped in containers.
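
A minimal sketch of that wrapping step, in Python with the standard-library ElementTree module. It assumes a flat sequence of atomic elements under a parent and groups each run of consecutive Bullet_Item elements into a BulletList container. The Section parent, the wrap_runs helper, and the sample text are hypothetical names chosen for illustration; this is not how FrameMaker itself performs the wrapping.

```python
import xml.etree.ElementTree as ET


def wrap_runs(parent, item_tag, container_tag):
    """Wrap each run of consecutive item_tag children in a container_tag element."""
    regrouped = []
    container = None
    for child in list(parent):
        if child.tag == item_tag:
            # Start a new container at the beginning of each run.
            if container is None:
                container = ET.Element(container_tag)
                regrouped.append(container)
            container.append(child)
        else:
            # Any other element ends the current run.
            container = None
            regrouped.append(child)
    # Replace the parent's children with the regrouped sequence.
    for child in list(parent):
        parent.remove(child)
    parent.extend(regrouped)
    return parent


# Hypothetical flat input: atomic elements only, no containers yet.
flat = ET.fromstring(
    "<Section>"
    "<Para>Intro</Para>"
    "<Bullet_Item>One</Bullet_Item>"
    "<Bullet_Item>Two</Bullet_Item>"
    "<Para>Outro</Para>"
    "</Section>"
)
wrap_runs(flat, "Bullet_Item", "BulletList")
print(ET.tostring(flat, encoding="unicode"))
```

The container name says what kind of object the group is (a bulleted list), not what the list is about, which is the naming approach argued for in this post.
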
For example, a sequence of Bullet_Item elements would be wrapped in a BulletList container, the elements that compose a Figure (e.g., a Graphic element preceded or followed by a Figure_Caption element) would be wrapped in a Figure container, and so on. Although there are exceptions, most molecular-level container elements of the types I'm describing here are actually "super objects" that ought to also be given names that describe their objectness, not their content. If necessary at this level, attributes should be used to describe content.

THE ADVANTAGES OF UNIVERSAL BUILDING BLOCKS

The atomic and molecular elements described so far are the universal building blocks of any structured document, no matter what variations in content-oriented superstructure are imposed by different DTDs. Ideally, everyone would agree on definitions and naming conventions for them so that this core element set could become common to all future DTDs. That would yield the following benefits:

1. The conversion of vast libraries of unstructured legacy documents to structured ones would be greatly simplified. The first step is always the conversion of tagged document objects to this core set of elements. In the next pass, the core elements are wrapped in the superstructure of content-named elements peculiar to each document type.

2. The conversion of structured documents (or portions thereof) from one DTD to another would also be greatly simplified.

3. A quantum jump in authoring productivity would result, because, at the atomic and molecular levels of structure, authors think in terms of document objects, not content. If the same element naming conventions for these atomic and molecular elements were used in all DTDs, the learning curve for becoming proficient with a new DTD could be substantially reduced.

USING ATTRIBUTES TO SPECIFY FORMATTING

Element context alone is usually not enough to define formatting.
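
As a rough illustration of the idea, the Python sketch below resolves formatting for a Para element from both its context and its attribute values. The ParaStyle, TextSize, and Alignment attribute names, their values, and their defaults are taken from this post; the context table (a base point size per parent element), the parent names, and the resolve_format helper are assumptions made up for the example and do not describe how FrameMaker evaluates an EDD.

```python
# Attribute defaults, as listed in this post for the Para element.
DEFAULTS = {"ParaStyle": "Plain", "TextSize": "Regular", "Alignment": "Left"}

# TextSize values expressed as point offsets from the regular size.
SIZE_OFFSET = {"Large": 2, "Regular": 0, "Small": -2}

# Hypothetical context rule: the regular size depends on the parent element.
CONTEXT_BASE_SIZE = {"Section": 10, "TableCell": 8}


def resolve_format(parent_tag, attrs):
    """Combine element context (parent_tag) with attribute values (attrs)."""
    merged = {**DEFAULTS, **attrs}          # explicit attributes override defaults
    base = CONTEXT_BASE_SIZE.get(parent_tag, 10)  # assumed fallback of 10 points
    return {
        "style": merged["ParaStyle"],
        "size_pt": base + SIZE_OFFSET[merged["TextSize"]],
        "align": merged["Alignment"],
    }


# Same element name, two contexts, different attribute values.
print(resolve_format("Section", {}))
print(resolve_format("TableCell", {"ParaStyle": "Bold", "TextSize": "Large"}))
```

The point of the sketch is only that neither input is sufficient by itself: the context supplies the starting format, and the attributes adjust it.
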
In my EDD/DTD designs, I use formatting attributes at all levels of structure, and the combination of element context and attribute values determines the formatting. For example, formatting attributes for the ubiquitous Para element might include:

ParaStyle Attribute
   Plain (default)
   Bold
   Italics
   Underlined
   Message (uses Courier font)

TextSize Attribute
   Large (2 points larger than regular).
   Regular--the font size in the default paragraph format (default).
   Small (2 points smaller than regular).

Width Attribute
   Across All Columns--text spans the sidehead and normal text columns.
   Normal--the text appears in the normal text column (default).

Alignment Attribute
   Left (default)
   Centered
   Right

TblCellVertAlign Attribute - Para elements contained in a table cell have their vertical alignment within the cell specified, as follows:
   Top (default)
   Middle
   Bottom

I know this approach gives SGML purists fits, but it allows the author to deploy a single element named Para in virtually any context where a text paragraph is needed. This approach, at least to me, makes more sense than using processing instructions or other obtuse techniques to specify formatting.

FOR MORE INFORMATION

I have a 23-page PDF paper on this subject that elaborates on the issues discussed here. To get it, go to:

http://www.microtype.com

When the page opens, click on Resources in the frame at the left. Scroll down to Links to Tutorials and articles. You'll find it, as well as 4 or 5 other papers that I've written.

====================
| Nullius in Verba |
====================
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971
E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646

---Subscribe to the "Free Framers" list by sending a message to
majordomo@omsys.com with "subscribe framers" (no quotes) in the body.
** To unsubscribe, send a message to majordomo@omsys.com **
** with "unsubscribe framers" (no quotes) in the body. **