Re: FM+SGML Information Design

At 11:27 AM 6/3/99 -0500, Ted Goranson wrote:
>I am impressed, both by your kindness in sending this to my attention, and
>in the clarity of the document itself. Some of the ideas in the paper are
>new to me, but I see some possibilities and want to follow through.  If you
>wish, I will be happy to rewrite the message with a preface for posting to
>the Frame list. Or alternately, I'll write a summary.
Unfortunately, my posting privileges on frameusers.com were suspended
about 9 months ago for having offended the list "owner." He appears
disinclined to lift the suspension. I will, however, post it to the Free
Framers list (you might want to consider subscribing to that one
too--information on how to subscribe is in my signature block). You have my
permission to post this response to the "other" list.
>--I already have 500 pages or so in a well-formed Frame document. It is
>well-formed in the sense of everything being tagged and the tags making
>sense. When upgrading to SGML, I suppose it is not difficult to go through
>and reassign everything to new conventions, just time consuming.
The method FM+SGML uses to convert unstructured docs to structured ones is
quite similar to the one FrameMaker uses to convert to HTML. This method,
which relies on Structure Rules Tables, is quite robust, but it requires
consistent tagging of the unstructured doc and has the following limitations:

1. Any ad-hoc character formatting (e.g., making a word Bold without using a
character format tag for that purpose) will be lost.

2. Any ad-hoc overrides to paragraph tag formats will be lost.

3. Any significant amount of mistagging will almost always produce an
unwelcome outcome.

4. The method has no capability, at the lowest level of structure, to
properly wrap unstructured tagged objects in elements based on the objects'
context. However, when two or more different unstructured object tags
correlate to the same element, and that element can occur in different
contexts, it is often possible to use the unstructured tagname to "qualify"
the resulting element so as to indicate its context for higher-level
wrapping. In some cases, it may even be possible to assign attribute values
to elements at the lowest level of structure.

Despite the limitations described above, I've had quite a bit of success in
using structure rules tables to accomplish conversion to structured docs. On
one large project in which I'm presently involved, we're achieving something
close to 90% structural validity on the first pass.
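
To make limitation 4 and its workaround concrete, here is a rough Python
sketch of the idea. It is not FM+SGML's rules-table syntax; the tag names,
element names, and the RULES/WRAPPERS tables are all made up for
illustration. Two unstructured tags map to the same element, and the
original tagname is kept as a "qualifier" so a later pass can pick the
right higher-level wrapper.

```python
from itertools import groupby

# Hypothetical rules table: unstructured paragraph tag -> (element, qualifier).
# "Step" and "Bullet" both become "Item"; the qualifier records the context
# that the original tagname implied.
RULES = {
    "Heading1": ("Title", None),
    "Body": ("Para", None),
    "Step": ("Item", "ordered"),
    "Bullet": ("Item", "unordered"),
}

# Hypothetical higher-level rules: qualifier -> parent element for a run.
WRAPPERS = {"ordered": "Procedure", "unordered": "List"}


def convert(paragraphs):
    """Turn (tag, text) pairs into a list of (element, content) nodes.

    An unknown tag raises KeyError, which mirrors the point that
    mistagged input produces an unwelcome outcome.
    """
    # Pass 1: lowest level of structure, one element per tagged object.
    flat = [(RULES[tag][0], RULES[tag][1], text) for tag, text in paragraphs]

    # Pass 2: wrap each run of consecutive same-qualifier elements in a parent.
    tree = []
    for qualifier, run in groupby(flat, key=lambda node: node[1]):
        nodes = [(element, text) for element, _, text in run]
        if qualifier is None:
            tree.extend(nodes)
        else:
            tree.append((WRAPPERS[qualifier], nodes))
    return tree


doc = [
    ("Heading1", "Replacing the filter"),
    ("Step", "Shut off the pump."),
    ("Step", "Remove the cover."),
    ("Bullet", "Spare filters are in bin 4."),
]
result = convert(doc)
```

Without the qualifier, both kinds of Item would look identical after the
first pass, and there would be nothing left to tell the wrapping step
which parent element to use.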
>--The idea of information modeling the document is compelling. You seem to
>be dealing with "ordinary tech manuals" where procedures are described.
Not really. The methods described in the paper should be applicable
to almost any document type. In section 5, "Extensibility of the Modular
Structure", the paper describes how encapsulation wrappers can be used to
hold different information types.
>my content is related more deeply--in other words, I don't have a time
>sequence to fall back on.
Printed books are linear. Hypertexts and databases are not.
SGML was primarily intended for the former, which probably explains why its
hypertext linking capabilities fall well short of what is needed to implement
non-linear hypertexts, or to store content in a database with sufficient
metadata to permit reliable retrieval.

XML (hopefully) will remove these limitations. To facilitate information
access, a Uniform Resource Identifier (URI) and a Resource Description
Framework (RDF) description of the content can be assigned to each chunk.
XML links (or database queries) will retrieve the chunk by specifying the
chunk's unique URI, and even an anchor point (i.e., node) within that chunk.
>I'm prepared to go the extra distance and
>actually model the relationships using some entity-relationship tool. Do
>people have experience using your idea and modeling? This would make the
>notions of the wrapper and the contents closer.
In XML, each information chunk could be contained in an encapsulation
wrapper, and the wrapper would be assigned a unique URI and an RDF
description. The RDF describes the wrapper's information content, and
includes a pointer to the URI. The wrapper (with its contents), plus its RDF,
would be separately stored in a database. This would provide (at least) two
ways to retrieve the chunk:

1. A database search directed at the RDFs would deliver the information
chunks whose RDFs meet the search criteria.

2. Any hypertext link can retrieve a specific information chunk by
specifying its unique URI. The URI serves as a pointer to the chunk's
storage location (e.g., its location in the database).
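
The two retrieval paths can be sketched with a toy in-memory repository.
This is only an illustration under my own assumptions: the ChunkRepository
class, the example.com URIs, and the use of a plain dictionary as a
stand-in for a real RDF description are all hypothetical, and a real
system would use a proper database and RDF tooling.

```python
from urllib.parse import urldefrag


class ChunkRepository:
    """Toy store: wrapper content and its description, both keyed by URI."""

    def __init__(self):
        self._chunks = {}        # URI -> wrapper with its contents (XML text)
        self._descriptions = {}  # URI -> metadata dict (stand-in for RDF)

    def store(self, uri, xml_text, description):
        self._chunks[uri] = xml_text
        self._descriptions[uri] = dict(description)

    def search(self, **criteria):
        """Path 1: return URIs whose description matches every criterion."""
        return sorted(
            uri
            for uri, desc in self._descriptions.items()
            if all(desc.get(key) == value for key, value in criteria.items())
        )

    def resolve(self, link):
        """Path 2: follow a link; return (chunk, anchor or None)."""
        uri, anchor = urldefrag(link)  # split off any "#anchor" part
        return self._chunks[uri], anchor or None


repo = ChunkRepository()
repo.store(
    "http://example.com/chunks/pump-101",
    "<procedure><step id='s2'>Remove the cover.</step></procedure>",
    {"subject": "pumps", "type": "procedure"},
)
repo.store(
    "http://example.com/chunks/pump-200",
    "<concept><para>How the pump works.</para></concept>",
    {"subject": "pumps", "type": "concept"},
)

hits = repo.search(subject="pumps", type="procedure")
chunk, anchor = repo.resolve("http://example.com/chunks/pump-101#s2")
```

The search works only on the descriptions and never opens the chunks
themselves, while the link bypasses the descriptions and goes straight to
the stored wrapper.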
>--The result would be a collection of information that has three
>structures: XML (for whatever web capability I desire); ODMA (as the
>ultimate open standard in document accessibility) ; and as structured data.
>I would probably move this data into a Filemaker database (with which I
>have no current experience). So the question is have you seen small users
>link Frame and FileMakers this way, and do you think that maintaining three
>structures (XML/ODMA/E-R)is possible?
I would say that the ultimate purpose of XML is to play a role in a system
that can deliver information in any form needed by a human or a machine, and
which provides the capability to access not only documents but also
meaningful information packets, which may or may not be part of conventional
"documents". In other words, XML is one component of a system that could
provide the ultimate in information access and interchange, of which
document access is only one (possibly insignificant) capability.

Information access would embody not only access by humans, but also by
machines (e.g., computers, music players, process controllers). XML has (or
will soon have) many features (e.g., Unicode, RDF, XLink, XSL) that should
make it superior to any other method of achieving seamless information
retrieval and interchange.

XML is intended to be the best method of storing information packets and
their metadata in a database repository. An ideal system would be capable of
delivering information, not only in XML, but in almost any other form
specified by the human or machine requesting it.

I doubt very much whether FileMaker is robust enough to serve as the
information repository in such a system, even if the system requirements
were significantly relaxed from those described above. High-powered database
repositories with the needed capabilities currently range in price from the
middle 5 figures up to 7 figures.

We might expect that the price for an XML-aware database repository with the
minimum required capabilities will drop into the middle 4-figure range as
XML begins to catch on, and competition increases.
>--It appears that everything depends on intelligent initial specification
>of the EDD/DTD. One cannot be refining it as one goes, right?
In some ways XML is more adaptable to evolving structure than SGML. For
instance, well-formed XML does not have to be conformant to a DTD in order
for it to be readable by humans and machines. Nevertheless, intelligently
designed structure remains the crux of the matter, and a modular design
makes it much easier to deal with evolving structure.
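
As a small illustration of the well-formedness point, the Python standard
library's non-validating parser will read any well-formed document with no
DTD in sight, and will reject one that is not well-formed. The sample
documents and the element_names helper are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE and no DTD anywhere: the document is merely well-formed.
WELL_FORMED = "<chunk id='c1'><title>Pump</title><para>Text.</para></chunk>"

# Mismatched end tag: not well-formed, so no parser should accept it.
NOT_WELL_FORMED = "<chunk><title>Pump</para></chunk>"


def element_names(xml_text):
    """Parse the text and list element names in document order."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


names = element_names(WELL_FORMED)

try:
    element_names(NOT_WELL_FORMED)
    rejected = False
except ET.ParseError:
    rejected = True
```

The parser recovers the structure without knowing what elements are
allowed, which is what lets the structure evolve. Checking that the
structure is the one you intended still requires a DTD and a validating
parser.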

Although neither XML nor the XML capabilities of the latest release of
FM+SGML are quite ready for prime time, developing an EDD/DTD for SGML
should assure that your SGML documents can, when the time comes, be easily
convertible to XML. It remains to be seen, however, whether FM+SGML will
evolve into the tool of choice for importing and exporting XML.
     | Nullius in Verba |
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
   majordomo@omsys.com with "subscribe framers" (no quotes) in the body.
