Re: Variations in importing SGML docs into FM+SGML



At 12:11 PM 3/24/99 +1100, Marcus Carr wrote:
>
>Dan Emory wrote:
>
>> In my reply to your post, I reported the effects
>> on import time of the following additional changes from the
>> original test SGML file #2, which took 35 minutes to import.
>> This version had each of the 23 bio fields in a separately named
>> text range container, and used entity references to produce the
>> prefixes to each field:
>
>I'm not certain if the following is correct, but this is the way that I
>understand what you're saying:
>
>ELEMENTS      TIME     SEC/ELEMENT
>--------     ------    -----------
>  12,000     35 min       5.714
>   2,000      5 min       6.666
>   2,600      8 min       5.416
>
===================================================================
No, you left some things out, particularly file 1 below, which was described
in my first post on this subject, which you apparently didn't see. All of
the files listed above (as well as those listed in the corrected list below)
contain the same 600 records.

FILE      ELEMENTS      FILE SIZE      TIME      PREFIX TYPE
1          12,000         400K          7 MIN    Prefix Rules in EDD
2          12,000         510K         35 MIN    Entity Refs
3           2,000         360K          5 MIN    Entity Refs
4           2,000         345K          5 MIN    50% Fewer Entity Refs
5           2,600         390K          8 MIN    50% Fewer Entity Refs

The variations in structure and content were as follows:

1. Using Prefix rules in the EDD to produce lead-in titles for each field
   (file 1 only).

2. Using entity references converted to variable definitions to produce
   those same lead-in titles (files 2 thru 5).

3. Putting each of the 23 fields in a separately named text range container
   (files 1 & 2)

4. Concatenating all 23 fields in a single text range container (files 3 & 4)

5. Reducing the number of entity references by 50% (files 4 & 5)

6. Wrapping one field in a separate text range container, and concatenating
   the remaining 22 fields in a single text range container (file 5).

Each of the 5 files used an extremely simple DTD/EDD (the only differences
resulting from variations 1, 3, 4, and 6). Each file, upon import into
FM+SGML, produced identical 16-page printed outputs.


==============================================================================
>> The common denominators that seem to affect the import time
>> in these three test cases are file size and element count.
>> Import time seems to be proportional to file size, and above
>> a certain point (a file size somewhere in the 375 KB to 500KB
>> range), the import time appears to increase almost exponentially
>> with increasing file size.
>
>I don't see that. I would anticipate that there is some overhead with
>loading any file - perhaps that accounts for the high count on the document
>with the least number of elements. Between the first and third rows of the
>table above, the difference is less than 6% - is this what you're referring
>to? Surely there's no question that the complexity of the document will
>contribute to the amount of time required to open it?
===========================================================================
As a benchmark comparison, I used a very complex 53-page structured document
(tables, graphics, very complex text structures) created in FM+SGML, using
an extremely complex EDD (190 pages, including 35 pages of format change
lists). This document contains about 2,000 elements, many of which have
numerous attributes. I then exported this document to SGML, producing a 202K
SGML file. When this file was imported into FM+SGML (replicating the
original document), the import time was only 90 seconds. If there is some
overhead with loading any file, and/or if the complexity of the document
contributes to the load time, then it would have been most apparent in the
benchmark document, which is the most complex by far, and also the smallest
in SGML file size.

So, adding in the benchmark file, the element loading rate (elements/sec) is
as follows:

FILE            LOAD RATE (ELEMENTS/SEC)     FILE SIZE
Benchmark               22.22                   202K
1                       28.57                   400K
2                        5.714                  510K
3                        6.666                  360K
4                        6.666                  345K
5                        5.416                  390K
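
For anyone who wants to check the arithmetic, here is a minimal Python sketch
that recomputes the load rates from the element counts and import times given
in this thread. The names (IMPORTS, load_rate, RATES) are just illustrative,
and the benchmark count is only "about 2,000" elements, so its rate is
approximate.

```python
# Element counts and import times are taken from the tables in this thread.
# The benchmark's count is "about 2,000", so its rate is approximate.
IMPORTS = [
    # (file label, elements, import time in seconds)
    ("Benchmark", 2_000, 90),
    ("1", 12_000, 7 * 60),
    ("2", 12_000, 35 * 60),
    ("3", 2_000, 5 * 60),
    ("4", 2_000, 5 * 60),
    ("5", 2_600, 8 * 60),
]


def load_rate(elements, seconds):
    """Return the import rate in elements per second."""
    return elements / seconds


RATES = {label: load_rate(n, secs) for label, n, secs in IMPORTS}

for label, rate in RATES.items():
    print(f"{label:<10} {rate:6.3f} elements/sec")
```

The printed values are rounded to three decimals, so they can differ from the
table in the last digit (the table truncates, e.g. 6.666 rather than 6.667).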

Now it becomes apparent that file size is a major determinant, and that
FM+SGML may be hitting some kind of wall at a file size of around 400K. It's
also apparent that lots of entity references in an SGML doc increase load
time, but file size seems to be equally important, given that file 4 has 50%
fewer entity references than file 3, yet the load rate is the same.

File 1 seems to be anomalous, and the only explanation I have is that the
absence of entity references makes a big difference.
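
To make it easier to eyeball these comparisons, the following Python sketch
lists the same figures in order of file size and compares file 1 with the
four files that use entity references. It is only a rough aid, not a
statistical analysis: the rows are copied from the two tables in this post,
the names (ROWS, BY_SIZE, ENTITY_MEAN, RATIO) are illustrative, and the
benchmark is left out of the entity-reference comparison because its prefix
mechanism is not stated here.

```python
# Rows copied from the tables in this post:
# (file label, file size in KB, load rate in elements/sec, prefix type).
# The benchmark's prefix mechanism is not stated, so it is marked as such.
ROWS = [
    ("Benchmark", 202, 22.22, "not stated"),
    ("1", 400, 28.57, "Prefix Rules in EDD"),
    ("2", 510, 5.714, "Entity Refs"),
    ("3", 360, 6.666, "Entity Refs"),
    ("4", 345, 6.666, "50% Fewer Entity Refs"),
    ("5", 390, 5.416, "50% Fewer Entity Refs"),
]

# Order the files by SGML file size to inspect how rate varies with size.
BY_SIZE = sorted(ROWS, key=lambda row: row[1])

for name, size_kb, rate, prefix in BY_SIZE:
    print(f"{size_kb:>4}K  {rate:6.3f} el/sec  file {name:<9}  {prefix}")

# Mean load rate of the files whose prefixes come from entity references.
entity_rates = [rate for _, _, rate, prefix in ROWS if "Entity Refs" in prefix]
ENTITY_MEAN = sum(entity_rates) / len(entity_rates)

# Load rate of file 1, the only test file using prefix rules in the EDD.
file1_rate = next(rate for name, _, rate, _ in ROWS if name == "1")

# How many times faster file 1 loads than the entity-reference files.
RATIO = file1_rate / ENTITY_MEAN
print(f"File 1 loads about {RATIO:.1f}x faster than the entity-ref average")
```
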
     ____________________
     | Nullius in Verba |
     ********************
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
   majordomo@omsys.com with "subscribe framers" (no quotes) in the body.

