Re: Variations in importing SGML docs into FM+SGML



Well, I did further testing to see if issue #239927 (which Janice McConnell
identifies below as the cause) was the actual cause. Here's what I did:

1. In the original test SGML files #1 (the one that uses prefix rules) and
#2 (the one with the entity references), each of the 23 concatenated fields
in each record was in a separately named text range container.

2. So, for test SGML file #2, I changed the EDD and DTD so that all 23
fields were concatenated together in a single paragraph container element.
This reduced the element count in test SGML file #2 from about 12,000 to
about 2000, and reduced the SGML file size by about 130 KB. All of the
entity references that are converted to variables remained. The result was
that the import time was reduced from 35 minutes to 5 minutes (2 minutes
less than the import time for test SGML file #1, which uses prefix rules
instead of entity references).

3. Next, in the same SGML test file #2, I halved the number of entity
references in each record. This produced a further reduction in SGML file
size of about 30 KB. The import time, however, was still about 5 minutes,
suggesting that these entity references are not the main cause.

4. Next, I changed the EDD and DTD again to wrap one field in each record of
test SGML file #2 in a text range container (the remaining fields all
remained together in a single concatenated string, and all of the entity
references remained as in step 3). This increased the element count in test
SGML file #2 from 2000 to 2600, and increased its file size by about 12 KB
to a size of 350 KB. The import time increased from 5 minutes to 8 minutes.
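
As a rough cross-check on steps 2 and 4, the extra import time per added
element can be worked out from the figures above. The short Python sketch
below is only arithmetic on those approximate numbers, not a new
measurement; the function name and the `runs` list are made up for
illustration.

```python
# Element counts and import times as reported above (all approximate).
# Each entry: (description, element count, import time in minutes)
runs = [
    ("step 2: single paragraph container", 2000, 5),
    ("step 4: one field re-wrapped per record", 2600, 8),
    ("original test SGML file #2", 12000, 35),
]


def marginal_seconds_per_element(a, b):
    """Extra import seconds per extra element between two runs."""
    (_, elems_a, min_a), (_, elems_b, min_b) = a, b
    return (min_b - min_a) * 60 / (elems_b - elems_a)


step2, step4, original = runs
print(round(marginal_seconds_per_element(step2, step4), 2))     # 0.3
print(round(marginal_seconds_per_element(step2, original), 2))  # 0.18
```

With counts and times this approximate, the two rates are only a rough
guide.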

CONCLUSION:
1. Once test SGML file #2 was modified as described in steps 2 and 3 above
to drastically reduce the element count and the file size, leaving all
other factors the same, its import time was still about 3.3 times longer
than that of the benchmark SGML file (5 minutes versus 90 seconds), even
though the element count in the benchmark file is greater than that of the
modified test SGML file. However, the modified test SGML file #2 is still
about 140 KB larger than the benchmark SGML file.

2. Consequently, I conclude that the main factor influencing import time is
SGML file size, not the known issue about entity references that are
converted to variables. If this conclusion is correct, then there must be
some file size somewhere between 200 KB (the size of the benchmark file) and
350 KB (the approximate sizes of the original test SGML file #1 and the
modified test SGML file #2) where FM+SGML hits a wall and begins to slow
down the import process. Since it's hard for me to believe I'm running out
of real memory (32 MB) or virtual memory (111 MB), I conclude that the wall
must be internal to FM+SGML. Once FM+SGML hits that wall, it must begin to
bog everything down.
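
For anyone who wants to check the arithmetic, here is a small Python sketch
that puts the reported sizes and import times side by side as seconds per
KB. The sizes and times are the approximate figures given in this thread
(350 KB for the modified file #2 is itself an estimate); the dictionary
keys and the function name are just illustrative.

```python
# (SGML file size in KB, import time in seconds), as reported in this thread.
files = {
    "benchmark": (202, 90),
    "test #1 (prefix rules)": (400, 7 * 60),
    "test #2 original (entity refs)": (510, 35 * 60),
    "test #2 modified (steps 2-3)": (350, 5 * 60),
}


def seconds_per_kb(size_kb, seconds):
    """Import time normalized by SGML file size."""
    return seconds / size_kb


for name, (size_kb, seconds) in files.items():
    print(f"{name}: {seconds_per_kb(size_kb, seconds):.2f} s/KB")

# Ratio cited in conclusion 1: modified test file #2 versus the benchmark.
ratio = files["test #2 modified (steps 2-3)"][1] / files["benchmark"][1]
print(f"{ratio:.1f}")  # prints 3.3
```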
  
At 07:43 PM 3/17/99 -0500, Janice McConnell wrote:
>Dan,
>
>Concerning the different opening times for files #2 and #3,
>there is a known issue (#239927) with importing files with a
>lot of entity references which are converted to variables in
>version 5.1.1. This problem was resolved in version 5.5.6.
>
>You compared the opening times for files #1 and #2, and
>suggested that it is related to size of file. I don't believe
>this is a valid comparison since, as you pointed out, the
>files are extremely different in content.
======================================================
No, test SGML files #1 and #2 are identical in content and structure, except
for the addition of the entity references in #2, which increased the file
size by about 110 KB. Admittedly, the benchmark SGML file is entirely
different in content and structure, but its EDD is very complex in both
structure and formatting, and the EDDs for test SGML files #1 and #2 are
very simple. The only difference between the EDD used for test SGML file #2
and that used for test SGML file #1 is that the prefix rules were deleted
from the EDD for #2. Both test SGML files #1 and #2 use the same DTD, and
the 600 records contained in both are identical.
=================================================================
>Without examining
>the documents and their respective applications, I can't
>speculate about opening time performance.
>
>Janice
> 
>*****************************
>Wide variations in the time required to import SGML document instances into
>FM+SGML have been observed:
>
>WIN platform (266 MHz CPU)
>Memory Size: 32 MB
>Memory Read/Write Cache: 2048 KB
>Virtual Memory: 111 MB (temporary)
>FM+SGML version 5.1.1
>
>TEST FILES:
>Benchmark SGML file: 202 KB, containing about six graphic entities, plus
>complex tables and text structures, using a very complex DTD/EDD. The EDD
>has 190 pages, including about 35 pages of format change lists, and a file
>size of 2.4 MB.
>
>Test SGML file #1: 400 KB, containing 600 biographical records (about 6
>lines each) extracted from a conventional database and tagged to produce
>SGML. Each biographical record contains up to 23 concatenated data fields
>(each field is contained in a descriptively named SGML element). The DTD is
>quite simple. The EDD (10 pages, 176 KB) contains prefix rules that specify
>the lead-in titles that precede some of the biographical data fields (e.g.,
>"Education:", "Address:", "Phone:", "Fax:", "E-mail:").
>
>Test SGML file #2: 510 KB, containing the exact same 600 biographical
>records as Test File #1, and using the same DTD. However, instead of using
>EDD prefix rules to specify the lead-in titles, the SGML elements contain
>entity references (e.g., &Educ; &Addr; &Ph; &Fx; &Eml;) to produce those
>titles. The SGML document instance contains internal entity declarations for
>these entities of the form:
>
>	<!ENTITY Educ "FM variable: Educ">
>
>For each such entity, the template used for import has a variable definition
>that produces the corresponding lead-in title.
>
>ANALYSIS OF THE THREE FILES:
>Test Files #1 and #2 are identical, with the exception that Test File #2
>has the added entity references and entity declarations, which accounts for
>the 110 KB difference in the size of the two files.
>
>The Benchmark SGML file produces, on import to FM+SGML, a richly structured
>and formatted 53-page document.
>
>Test Files 1 and 2 both produce, on import to FM+SGML, identical documents
>containing the 600 biographical records in 6.5-point type. The structure and
>formatting are simple. The EDD has no format change lists, and very simple
>format rules.
>
>HERE ARE THE TIMES IT TAKES TO COMPLETE THE IMPORT-TO-FM+SGML ACTION:
>
>Benchmark SGML file: 53 pages in 90 seconds to produce a 1.0 MB FM+SGML
>file.
>
>Test SGML File #1: 16 pages in about 7 minutes to produce a 2.3 MB FM+SGML
>file.
>
>Test SGML File #2: 16 pages in about 35 minutes to produce a 2.3 MB FM+SGML
>file.
>
>All tests were conducted several times, with nothing running but FM+SGML.
>
>CONCLUSIONS
>From the foregoing, it would appear that:
>
>1. The complexity of the EDD seems to have little impact on import time.
>
>2. A doubling of SGML file size from 200 KB to 400 KB increases the import
>time by a factor of at least 4.6.
>
>3. The use of prefix rules in the EDD produces a 5-fold reduction in import
>time compared to the use of entity references for the exact same purpose.
>
>Does anyone have an explanation for these wide variations in import times?
>
>
>** To unsubscribe, send a message to majordomo@omsys.com **
>** with "unsubscribe framers" (no quotes) in the body.   **
>
>
     ____________________
     | Nullius in Verba |
     ********************
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
   majordomo@omsys.com with "subscribe framers" (no quotes) in the body.

