
Re: [FrameSGML] Structured Document Design for XML or SGML (Long)



At 02:33 PM 5/22/00 -0400, you wrote:
>As for the brouhaha about formatting vs. content, I have to admit that
>I'm of two minds. My experiments with structured documents have been a
>learn-as-I-go thing as I try to add structure to certain reference
>material -- log reports, alarms, and CLI commands. In our documentation,
>all three of these have a fairly common format; I could rewrite them
>all using troff's manpage format. In this context, structure vs. format
>(IMO) boils down to something like this:
>
>    Strictly-structured
>    -------------------
>    <command>
>      <name>foobar</name>
>      <syntax>...</syntax>
>      <description>...</description>
>      <parameters>...</parameters>
>      <example>...</example>
>      <success_resp>...</success_resp>
>      <error_resp>...</error_resp>
>    </command>
>
>
>    Less-strict structuring
>    -----------------------
>    <command>
>      <name>foobar</name>
>      <subsection><name>Syntax</name>...</subsection>
>      <subsection><name>Description</name>...</subsection>
>      <subsection><name>Parameters</name>...</subsection>
>      <subsection><name>Example Command</name>...</subsection>
>      <subsection><name>Successful Response</name>...</subsection>
>      <subsection><name>Error Response</name>...</subsection>
>    </command>

=======================================
I would always use the strictly-structured one, provided the content model
makes some of the components optional (e.g., error_resp, success_resp).

Don't get me wrong. I'm wholeheartedly in favor of element names that
describe content as precisely as possible when it fits the situation.

What I would call the "molecular" structure you describe above pertains
to a particular kind of information type having "atomic" components that
can also be named to indicate their information sub-type.

Where I depart from the purists is that this is often not the case at the
molecular level. Instead, the "molecules" are simply wrappers for
collections of "ordinary" atoms (e.g., paragraphs, interspersed with
lists, graphics, tables, etc.) that have no classifiable information
type. However, the document object type of each such "atom" is always
classifiable (e.g., Para, List, Item (within a list), Graphic, figure
caption, Table), and these "atoms" should have names that describe their
objectness, not their unclassifiable information type.

So, some "molecules" (and perhaps their "atoms") should have names
that convey their information type, while other "molecules" and their
"atoms" should have names that describe their objectness.

It is at the level of superstructure above the molecular level where content
becomes important.


>In my mind, the strictly-structured version has the advantage of
>enforcing inclusion (and proper ordering) of all information --
>ensuring that the resulting document is both complete and consistent.
>A script could easily extract the name and syntax for automatically
>generating a quick reference.
>
>The not-so-structured document is easier to implement, and could
>apply to all three types of reference pages. A script could also
>pull the name and syntax, but would need a bit more intelligence.
>
>
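The quick-reference script mentioned above can be sketched for both
structures. This is only an illustration in Python, assuming the reference
pages are well-formed XML; the sample command text and the function names
quickref_strict and quickref_loose are hypothetical. It shows where the
"bit more intelligence" goes in the less-strict case: the script has to
match on the subsection's title text instead of on an element name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, modeled on the two structures quoted above.
STRICT = """
<command>
  <name>foobar</name>
  <syntax>foobar [-v] target</syntax>
  <description>Hypothetical command used for illustration.</description>
</command>
"""

LOOSE = """
<command>
  <name>foobar</name>
  <subsection><name>Syntax</name>foobar [-v] target</subsection>
  <subsection><name>Description</name>Hypothetical command.</subsection>
</command>
"""


def quickref_strict(xml_text):
    """Element names identify the content, so two lookups suffice."""
    cmd = ET.fromstring(xml_text.strip())
    return cmd.findtext("name"), cmd.findtext("syntax")


def quickref_loose(xml_text):
    """The subsection title must be inspected to find the syntax."""
    cmd = ET.fromstring(xml_text.strip())
    for sub in cmd.findall("subsection"):
        title = sub.find("name")
        if title is not None and (title.text or "").strip().lower() == "syntax":
            # The subsection body follows its <name> child as tail text.
            return cmd.findtext("name"), (title.tail or "").strip()
    return cmd.findtext("name"), None


if __name__ == "__main__":
    print(quickref_strict(STRICT))
    print(quickref_loose(LOOSE))
```

Note that the less-strict version silently depends on authors spelling the
subsection title consistently, which the strict content model enforces
through the element name itself.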
>What Dan proposes in his "Information Design" paper, to me, seems
>to suggest using attributes to produce a hybrid structure like so:
>
>    <reference type="command">
>      <name>foobar</name>
>      <subsection type="syntax">...</subsection>
>      <subsection type="description">...</subsection>
>      <subsection type="parameters">...</subsection>
>      <subsection type="example">...</subsection>
>      <subsection type="success_resp">...</subsection>
>      <subsection type="error_resp">...</subsection>
>    </reference>
====================================================
No, that's not what I propose. What I say is use content-oriented names
when it fits, and use object-oriented (or even format-oriented) names when
that fits.

When the SGML standard was developed, the idea of separating content
from format was simply intended to make SGML docs independent of any
proprietary DTP or WP software used to produce/display/print them.

Certainly the SGML standard does not dictate any kind of element naming
conventions (other than length and permitted characters), nor does it limit
in any way how attributes should be used. The purists have tried to
impose another layer on top of the standard that requires element names
to always convey content and content only, and forbids the inclusion of
formatting information in any form whether it be in element names or
attributes. They claim that structure must consist solely of a hierarchy
of content-named elements, and that element context is always sufficient
to describe formatting.

If formatting attributes are forbidden, then any style sheet (or EDD) for
formatting SGML document instances must rely solely on element context.
That means the original developer of the DTD has predetermined for all
time what formatting variations are possible, because there are no
author-specified "hooks" that can be used by the style sheet to reflect
the author's vision by transcending context when the author thinks
there is a need to do so.

And, if element names are the only allowable way to indicate information
content, then information content must be solely determined by context.
But I argue that information content has many facets.

In your strictly-structured example above, the primary information content
facet is Command, and the name of each child element of Command
describes an information sub-type within a Command. This is all well and
good as far as it goes. But I suggest the following additional facets
of information content could (and perhaps should) be represented by means of
attributes:

* RequirementTrace - Traces the command back to the particular paragraph
in the software requirements specification where the requirement for the
command originated.

* FunctionName - The name of the function in which the code module (where
the command is executed) is located.

* CodeModule - Identifies the module of code where the command is executed.

* CodeVersion - Identifies the version of the code at the time the command was
documented.

* ECOs - Identifies any Engineering Change Orders that have affected the
content of the Command element and its children.

* Rationale - Explains the rationale for the command (this may be important
during document reviews and other activities to inform people who might
otherwise be in the dark, particularly when the explanation of the command
does not immediately precede the Command element).

* Keywords - Lists keywords for the command, so that a user's search for
any listed keyword produces a hit even if that word does not actually
appear in the text itself. By elevating the listed keywords in this way,
the typical problem with keyword searches (i.e., too many hits, most of
which are inconsequential) is ameliorated.

I could think of other facets, but I think my point is made. All of the above
attributes, in my opinion, are describing additional facets of the command
information type.

The whole purpose of information facets (element names and their context
being one facet, and amplifying attributes being another) is to facilitate
user searches. Most search engines, however, cannot search for an
element name within a particular context (i.e., some chain of antecedent
parents of the Command element). Thus, if it is only possible to search on
an element name, every Command element in the entire document will be
found. But with amplifying attributes, I can search, say, for all Command
elements within the function named XYZ, or I can search for all
elements that reference a particular paragraph in the Software
Requirements Specification, and so on.
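To make the difference concrete, here is a minimal sketch in Python using
the standard xml.etree.ElementTree module. The sample document, the
command names, and the attribute values (including the "SRS-..." paragraph
identifiers) are hypothetical; only the attribute names FunctionName and
RequirementTrace come from the list above. It assumes the instance is
available as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: Command elements appear in different contexts.
DOC = """
<manual>
  <chapter>
    <command FunctionName="XYZ" RequirementTrace="SRS-3.2.1">
      <name>foobar</name>
    </command>
    <command FunctionName="ABC" RequirementTrace="SRS-4.1">
      <name>bazqux</name>
    </command>
  </chapter>
  <appendix>
    <command FunctionName="XYZ" RequirementTrace="SRS-3.2.4">
      <name>quux</name>
    </command>
  </appendix>
</manual>
"""

root = ET.fromstring(DOC.strip())


def names(elements):
    """Return the <name> text of each element, in document order."""
    return [e.findtext("name") for e in elements]


# Searching on the element name alone finds every command in the document.
all_commands = root.findall(".//command")

# An amplifying attribute narrows the search regardless of context.
in_xyz = root.findall(".//command[@FunctionName='XYZ']")

# Any element, whatever its name, that traces to one SRS paragraph.
traced = root.findall(".//*[@RequirementTrace='SRS-3.2.1']")

if __name__ == "__main__":
    print(names(all_commands))
    print(names(in_xyz))
    print(names(traced))
```

The attribute search does not care whether a command sits in a chapter or
an appendix, which is the point: the facet travels with the element.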

You can see, therefore, that multiple information facets offer much more
powerful search capabilities, thereby facilitating information reuse and
repurposing. Additionally, when design changes occur, these attributes
can facilitate document revision activities by locating all elements
within a document that might be affected by an Engineering Change Order,
or by a new version of the associated function or code module.
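The revision case can be sketched the same way. This Python fragment is
an illustration only: it assumes the ECOs attribute holds a
space-separated list of change-order identifiers, which is my assumption
for the example and not something any DTD dictates, and the identifiers,
module names, and the helper affected_by_eco are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with ECOs, CodeModule and CodeVersion attributes.
DOC = """
<manual>
  <command CodeModule="cli_parse" CodeVersion="2.1" ECOs="ECO-101 ECO-117">
    <name>foobar</name>
  </command>
  <command CodeModule="cli_exec" CodeVersion="2.0" ECOs="ECO-117">
    <name>bazqux</name>
  </command>
  <command CodeModule="cli_exec" CodeVersion="2.0">
    <name>quux</name>
  </command>
</manual>
"""


def affected_by_eco(root, eco_id):
    """Return every element whose ECOs attribute lists eco_id exactly."""
    hits = []
    for elem in root.iter():
        # Split into tokens so "ECO-1" does not match "ECO-101".
        if eco_id in elem.get("ECOs", "").split():
            hits.append(elem)
    return hits


def in_module(root, module):
    """Return every element documented against the given code module."""
    return root.findall(f".//*[@CodeModule='{module}']")


if __name__ == "__main__":
    root = ET.fromstring(DOC.strip())
    for elem in affected_by_eco(root, "ECO-117"):
        print(elem.findtext("name"), elem.get("CodeVersion"))
```

A reviser handed ECO-117 gets a worklist of the affected elements and the
code version each was documented against, without reading the whole
document.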






====================
| Nullius in Verba |
====================
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
majordomo@omsys.com with "subscribe framers" (no quotes) in the body.


