
Re: Variations in importing SGML docs into FM+SGML



At 07:14 AM 3/22/99 -0500, Janice McConnell wrote:
>In your email below, you suggested that I was wrong about
>bug#239927 being the cause of performance differences between
>your original test files 1 and 2. I went back and re-checked
>that bug report. Actually, it was reported against version
>5.5, so you may be correct that this is not the cause.
======================================================
OK, so this was a bug in 5.5.x which was fixed in
5.5.6.
==============================================================
>Running a timed test WITH THE SAME FILE on the same machine
>with each version of FM+SGML would resolve the question of
>whether number of entities converted to variables drastically
>slowed performance in versions before 5.5.6.
===========================================================
Yes, that's true, but it's even more interesting to find out
how import times vary when you use the same FM+SGML
version to import the exact same data content
while varying one or more factors in the SGML content model and/
or in the EDD format rules. I've created a test bed for doing that,
and the results of that testing are what I reported in my
original post on this subject, as well as in my reply to your post.
The import time for the benchmark SGML file served only as a
beginning reference point.

The data source is a database extract (i.e., ASCII flat file)
containing 600 records, each having 29 character-delimited
fields. I then use UniMerge and a FrameMaker report template
(unstructured) to merge the flat file and convert it to valid
SGML. Each time I change a factor, I modify the FM report
template to produce valid SGML with the changed factor,
and also modify the DTD and EDD accordingly. In all
cases, the resulting output produced by printing the
imported SGML instance from FM+SGML is identical in all respects.
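
To make the idea of "same data, different content model" concrete, here
is a minimal Python sketch. It is not the UniMerge/FrameMaker report
template procedure described above; it only illustrates how one
delimited record can be emitted either with one container per field or
with all fields concatenated in a single container. The delimiter, the
element names (record, f1, f2, ..., fields) and the entity names (p1,
p2, ...) are all hypothetical.

```python
import re
from xml.sax.saxutils import escape


def record_to_sgml(fields, separate_containers):
    """Render one record as an SGML-style fragment.

    Each field value is preceded by a prefix entity reference (&p1;, &p2;, ...).
    Element and entity names are made up for illustration only.
    """
    if separate_containers:
        # One container element per field.
        body = "".join(
            f"<f{i}>&p{i};{escape(value)}</f{i}>"
            for i, value in enumerate(fields, start=1)
        )
    else:
        # All fields concatenated inside a single container element.
        body = "<fields>" + "".join(
            f"&p{i};{escape(value)}" for i, value in enumerate(fields, start=1)
        ) + "</fields>"
    return f"<record>{body}</record>"


def count_elements(sgml):
    """Count start tags, i.e. '<' immediately followed by a letter."""
    return len(re.findall(r"<[A-Za-z]", sgml))


def strip_tags(sgml):
    """Remove all tags, leaving text and entity references."""
    return re.sub(r"<[^>]+>", "", sgml)


if __name__ == "__main__":
    line = "Smith|Jane|1950"          # hypothetical flat-file record
    fields = line.split("|")
    separate = record_to_sgml(fields, separate_containers=True)
    merged = record_to_sgml(fields, separate_containers=False)
    print(count_elements(separate), count_elements(merged))
    print(strip_tags(separate) == strip_tags(merged))
```

Both variants carry the same text and entity references; only the
element count differs, which is the factor being varied.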
=================================================================
>Unless my eyes are crossing again this morning, which they do
>sometimes (i.e. mis-labeling your test and benchmark files),
>the only test that you have run so far which compares files
>with only one difference is your original test of test file
>#1 against test file #2. In every other test that you ran,
>you changed more than one parameter between the test files.
>That's what I meant when I said that you were comparing
>apples and oranges. One cannot conclude that a particular
>factor is causing a slow-down in performance unless all other
>factors are the same.
============================================================
In my reply to your post, I reported the effects on import
time of the following additional changes to the original test
SGML file #2, which took 35 minutes to import. File #2 had each
of the 23 bio fields in a separately named text range container,
and used entity references to produce the prefixes to each field:

CASE 1. Concatenated all 23 of the bio fields in a single container
element, preserving all the entity references in the original file.
This reduced the element count from about 12,000 to about 2,000 and
produced a major reduction in file size.
RESULT: Reduced the import time from 35 minutes to 5 minutes.

CASE 2. Halved the number of entity references, with everything
else the same as CASE 1, producing a relatively small reduction
in file size.
RESULT: No change in import time from that observed in CASE 1.

CASE 3. Left 22 of the 23 bio fields as a concatenated string
in a single container element, with same entity references as
CASE 2 above, and wrapped the first bio field in a separate
container element, producing a 30% increase in element count
(from 2000 to 2600), and a relatively small increase in file
size.
RESULT: Increased import time to 8 minutes (a 62% increase
over that observed in CASES 1 and 2).
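
As a rough cross-check of the figures above, the following Python sketch
recomputes the relative changes and the import time per element from
the rounded numbers quoted in this post. The 600-record count comes from
the test bed description; everything else is simple arithmetic. Note
that the rounded times (5 and 8 minutes) give a 60% increase, so the
62% quoted above presumably reflects unrounded timings (an assumption).

```python
RECORDS = 600  # records in the database extract

# (approximate element count, import time in minutes), as quoted in the post
cases = {
    "test file #2": (12000, 35),
    "CASE 1": (2000, 5),
    "CASE 3": (2600, 8),
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


def seconds_per_element(elements, minutes):
    """Average import time per element, in seconds."""
    return minutes * 60.0 / elements


if __name__ == "__main__":
    e1, t1 = cases["CASE 1"]
    e3, t3 = cases["CASE 3"]
    # Wrapping one field per record adds one element per record.
    print("elements added:", e3 - e1, "records:", RECORDS)
    print("element count change: %.0f%%" % pct_change(e1, e3))
    print("import time change:   %.0f%%" % pct_change(t1, t3))
    for name, (elements, minutes) in cases.items():
        print("%s: %.3f s per element" % (name, seconds_per_element(elements, minutes)))
```

The 600 added elements in CASE 3 match one extra container per record,
and the per-element times for the three files all fall between 0.15
and 0.19 seconds.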

The common denominators that seem to affect the import time
in these three test cases are file size and element count.
Up to a certain point, import time seems to be roughly
proportional to file size; above that point (a file size
somewhere in the 375 KB to 500 KB range), import time appears
to increase almost exponentially with increasing file size.
     ____________________
     | Nullius in Verba |
     ********************
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
   majordomo@omsys.com with "subscribe framers" (no quotes) in the body.

