
Re: Structured Document Design for XML or SGML



A QUIZ FOR SGML PURISTS

1. Define "content" as it applies to element naming conventions. Define it 
in such a way that a clear and unambiguous distinction can be made between 
a name that conveys content and one that describes a document object such 
as a paragraph, a text range within a paragraph, a graphic, a table, or a 
list.
Define it also in such a way that a clear and unambiguous distinction can 
be made between a name that conveys nothing but content and one that 
conveys format. Your description shall not include any cant, SGML purist 
insider jargon, or other escape mechanisms that seek to avoid the many 
contradictions involved in making a workable DTD that adheres to the SGML 
purist rule that content must be separated from format.

2. An element named Para has many parents, but context alone cannot 
determine how the element should be formatted. If defining the formatting 
parameters in attributes is forbidden, how do you solve this problem so 
that a style sheet of some sort can produce the correct formatting? If your 
answer is to use Processing Instructions, explain how that is a better 
solution than using attributes for the same purpose.

3. If the name Para is forbidden because it describes a document object 
rather than content, would you change its name to P because that name is 
"formatting neutral", even though everyone who uses the DTD is supposed to 
know that P means Paragraph? If that would be your solution, what content 
information does the name P convey which makes it superior to Para?

4. A List element has the content model (Item, Item+). It is used to
produce four types of lists: bulleted, arabic-numbered, alpha-numbered, and
indented text with no prefix. An attribute named "Type" has a name token
group with the permitted values 1, 2, 3, and 4, where each numeric value
specifies one of the four list types. For authors to create such lists
properly, each value must be permanently associated with a particular
type, and any style sheet must format the lists according to the
attribute-specified type. Replacing the numeric values with names such
as bulleted, arabic, alpha, and indented would eliminate the need for
authors to memorize the meaning of each numeric value. Would you refuse to
make such a change on the grounds that it would introduce forbidden
formatting information into the DTD? If so, please explain why the numeric
values are anything more than a fig leaf to conceal the fact that the Type
attribute is specifying formatting information, no matter what kinds of
values are used.
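
To make question 4 concrete, here is a minimal Python sketch of what a
style sheet processor has to do with the Type attribute. The table
NUMERIC_TO_NAME, the prefix strings, and the function names are
illustrative assumptions, not part of any real DTD or formatter. The sketch
only shows that the codes 1 through 4 and the descriptive names end up
selecting exactly the same formatting.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the numeric tokens to descriptive ones.
NUMERIC_TO_NAME = {"1": "bulleted", "2": "arabic", "3": "alpha", "4": "indented"}


def prefix(list_type, index):
    """Return the item prefix for a zero-based index under a list type."""
    if list_type == "bulleted":
        return "* "
    if list_type == "arabic":
        return f"{index + 1}. "
    if list_type == "alpha":
        # Simplification: only handles the first 26 items (a through z).
        return f"{chr(ord('a') + index)}. "
    if list_type == "indented":
        return "  "
    raise ValueError(f"unknown list type: {list_type!r}")


def render(xml_text):
    """Format the Item children of a List according to its Type attribute."""
    root = ET.fromstring(xml_text)
    raw = root.get("Type")
    # Accept either the numeric code or the descriptive name.
    list_type = NUMERIC_TO_NAME.get(raw, raw)
    return [
        prefix(list_type, i) + (item.text or "")
        for i, item in enumerate(root.findall("Item"))
    ]
```

Under these assumptions, render() returns the same lines for Type="2" and
Type="arabic"; the formatter needs the same lookup either way, and only the
author's burden of memorizing the codes differs.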

5. Suppose that the content model of the Item element in the List element
above is:
  (#PCDATA, List?)
which would optionally allow another List to be nested under an item. But
suppose further that such nesting is not allowed under the Indented list
type, and that a Bulleted list may contain only nested lists of type
Bulleted. To make this possible, the content model for the List element
would have to be changed to (Bulleted | Arabic | Alpha | Indented) so that
there could be a separate content model for each list type. Also, the Type
attribute would be removed from the List element, since it is no longer
needed. Now, with this change, the content model for the Bulleted element
would be:
  ((Item, Bulleted?), (Item, Bulleted?)+)
whereas the content model for the Arabic element would be:
  ((Item, (Arabic | Bulleted | Alpha)?), (Item, (Arabic | Bulleted | Alpha)?)+)
Now you have element names (Arabic, Alpha, Bulleted, Indented) which are
clearly conveying formatting information. What would you do? Would you
change the element names for these list types to Type1, Type2, Type3, and
Type4 so as to once again conceal with a fig leaf the fact that these
elements are describing the forbidden formatting information?
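
The nesting rules in question 5 can also be written down as a lookup table
keyed by list type, which is roughly what the separate content models
encode. The following Python sketch is only an illustration: the table
ALLOWED_NESTED and the function nesting_errors are hypothetical names, and
the entry for Alpha is an assumption, since the text above gives rules only
for Bulleted, Arabic, and Indented.

```python
import xml.etree.ElementTree as ET

# Which list types may be nested directly inside each list type.
ALLOWED_NESTED = {
    "Bulleted": {"Bulleted"},
    "Arabic": {"Arabic", "Bulleted", "Alpha"},
    "Alpha": {"Arabic", "Bulleted", "Alpha"},  # assumption, not stated in the text
    "Indented": set(),
}


def nesting_errors(elem):
    """Return a message for every list nested where its parent forbids it."""
    errors = []
    allowed = ALLOWED_NESTED.get(elem.tag)  # None if elem is not a list
    for child in elem:
        is_list = child.tag in ALLOWED_NESTED
        if is_list and allowed is not None and child.tag not in allowed:
            errors.append(f"{child.tag} not allowed inside {elem.tag}")
        errors.extend(nesting_errors(child))
    return errors


bad = ET.fromstring(
    "<Bulleted><Item>a</Item>"
    "<Arabic><Item>b</Item><Item>c</Item></Arabic>"
    "<Item>d</Item></Bulleted>"
)
print(nesting_errors(bad))
```

Whether the keys of such a table are called Bulleted and Arabic or Type1
and Type2 makes no difference to the check; the names carry the same
information in both cases.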

6. Until the CALS table model came along, there was no viable way to 
describe how to build and display an SGML table. This apparently was 
because SGML purists could not bear the thought that any workable solution 
would inevitably introduce formatting into the DTD. The element names in 
the CALS table model describe document objects, not content, and most of 
the attributes for each element in the model describe how to format the 
table. The acceptance of the CALS table model is almost universal. How do
you explain this exception, and what makes it different from the many
needed exceptions which you reject?

7. The requirements specification for developing a DTD identifies certain
situations where four equally important facets (A, B, C, and D) of content
are present, which can appear singly or in any combination. Thus the
following 15 facet combinations can occur: A, B, C, D, AB, AC, AD, BC, BD,
CD, ABC, ABD, ACD, BCD, ABCD. Would you create and name an element for each
possible combination, or would you create a single element with attributes
to describe each facet, where the default for each attribute is no value,
or would you do something else?
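
Four facets that can appear singly or in any combination give
2^4 - 1 = 15 non-empty combinations. The short Python sketch below
enumerates them, to show how quickly the one-element-per-combination
approach grows compared with one attribute per facet. The facet names come
from the question; the function name is only illustrative.

```python
from itertools import combinations

FACETS = ["A", "B", "C", "D"]


def facet_combinations(facets):
    """List every non-empty combination of facets, shortest first."""
    return [
        "".join(combo)
        for size in range(1, len(facets) + 1)
        for combo in combinations(facets, size)
    ]


combos = facet_combinations(FACETS)
print(len(combos))  # 15 element names needed, versus 4 attributes
print(combos)
```

Adding a fifth facet would raise the count to 31 element names, while the
attribute approach would need only one more attribute.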

8. To further elaborate on my statement arguing the need for multiple facets
to describe information content, consider the new Resource Description
Framework (RDF), an XML-based W3C specification whose purpose is (among
others) to facilitate database search and retrieval of information. RDF
description patterns are applicable to individual nodes or elements within
documents as well as to whole documents. Each RDF description includes a
Uniform Resource Identifier that uniquely specifies where the resource is
located (e.g., within a database, a file, or an element whose ID attribute
specifies an absolute or relative XPointer location term).

RDF descriptions can be created independently, or they can be embedded in
the structure of the document, or both. I can think of no reason why this
could not be incorporated into SGML documents as well as XML ones. It is
possible to define many different description patterns, some more elaborate 
than others. If RDF offers a much better and more comprehensive way to 
describe information content at any level of structure, do you believe it 
might moderate the SGML purists' insistence that element names must always 
describe content? If not, why not?

9. The SGML purists' claim is that hardcoding formatting attribute values
into the data is wrong and that the application should be responsible for
rendering it, so that the data can be used with different media. But XML
is accompanied by a new style sheet standard, XSL. Using middleware, it
should be possible to extract XML data from a database, and build a
customized style sheet on the fly to fit the requirements of the user
(human or non-human) who initiated the database query. If style sheets
become dynamically generated, doesn't that make the purists' concern
irrelevant? Why not
hardcode the formatting for the most demanding formatting requirement 
(e.g., high-quality printed books), and let the middleware either ignore 
formatting attributes or modify how they are used, depending on the media 
and the end user?
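
As a rough illustration of the middleware idea in question 9, the Python
sketch below keeps or drops formatting attributes depending on the target
medium. Everything specific in it is an assumption made for the example:
the attribute names Font and Indent, the media names, the KEEP_FOR policy
table, and the function adapt. It is not a description of XSL or of any
existing product.

```python
import xml.etree.ElementTree as ET

# Hypothetical formatting attributes hardcoded for high-quality print.
FORMATTING_ATTRS = {"Font", "Indent"}

# Hypothetical policy: which formatting attributes each medium honors.
KEEP_FOR = {
    "print": {"Font", "Indent"},
    "screen": {"Indent"},
    "speech": set(),
}


def adapt(xml_text, medium):
    """Return the document with formatting attributes filtered for a medium."""
    root = ET.fromstring(xml_text)
    keep = KEEP_FOR[medium]
    for elem in root.iter():
        # Copy the names first, since attributes are deleted during the loop.
        for name in list(elem.attrib):
            if name in FORMATTING_ATTRS and name not in keep:
                del elem.attrib[name]
    return ET.tostring(root, encoding="unicode")


source = '<Para Font="Garamond" Indent="2" ID="p1">text</Para>'
print(adapt(source, "speech"))
```

Non-formatting attributes such as ID pass through untouched, so the same
stored data can serve every medium in the policy table.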

10. Why do most of the commonly used DTDs (J2008, ISO 12083, DocBook,
MIL-M38784, HTML, and even the ATA DTD) violate with wild abandon the
SGML purists' view of how a DTD should be built? Is it because the people
who developed them just don't get it right, or is it because, 
pragmatically, the reductionistic viewpoint of the purists is simply 
impractical in the real world?


====================
| Nullius in Verba |
====================
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
majordomo@omsys.com with "subscribe framers" (no quotes) in the body.


