To: Marcus Carr <mrc@xxxxxxxxxxxxxx>, framers@xxxxxxxxx
Subject: Re: Variations in importing SGML docs into FM+SGML
From: Dan Emory <danemory@xxxxxxxxxxxx>
Date: Wed, 24 Mar 1999 09:59:02 -0700 (MST)
Sender: owner-framers@xxxxxxxxx
At 12:11 PM 3/24/99 +1100, Marcus Carr wrote:
>
>Dan Emory wrote:
>
>> In my reply to your post, I reported the effects
>> on import time of the following additional changes from the
>> original test SGML file #2, which took 35 minutes to import.
>> This version had each of the 23 bio fields in a separately named
>> text range container, and used entity references to produce the
>> prefixes to each field:
>
>I'm not certain if the following is correct, but this is the way
>that I understand what you're saying:
>
> ELEMENTS    TIME     SEC/ELEMENT
> --------    ------   -----------
>  12,000     35 min      5.714
>   2,000      5 min      6.666
>   2,600      8 min      5.416
>
===================================================================
No, you left some things out, particularly file 1 below, which was
described in my first post on this subject, which you apparently
didn't see. All of the files listed above (as well as those in the
corrected list below) contain the same 600 records.

FILE   ELEMENTS   FILE SIZE   TIME     PREFIX TYPE
 1      12,000      400K       7 MIN   Prefix Rules in EDD
 2      12,000      510K      35 MIN   Entity Refs
 3       2,000      360K       5 MIN   Entity Refs
 4       2,000      345K       5 MIN   50% Fewer Entity Refs
 5       2,600      390K       8 MIN   50% Fewer Entity Refs

The variations in structure and content were as follows:

1. Using Prefix rules in the EDD to produce lead-in titles for each
   field (file 1 only).
2. Using entity references converted to variable definitions to
   produce those same lead-in titles (files 2 thru 5).
3. Putting each of the 23 fields in a separately named text range
   container (files 1 & 2).
4. Concatenating all 23 fields in a single text range container
   (files 3 & 4).
5. Reducing the number of entity references by 50% (files 4 & 5).
6. Wrapping one field in a separate text range container, and
   concatenating the remaining 22 fields in a single text range
   container (file 5).

Each of the 5 files used an extremely simple DTD/EDD (the only
differences resulting from variations 1, 3, 4, and 6).
Each file, upon import into FM+SGML, produced identical 16-page
printed outputs.
==============================================================================
>> The common denominators that seem to affect the import time
>> in these three test cases are file size and element count.
>> Import time seems to be proportional to file size, and above
>> a certain point (a file size somewhere in the 375 KB to 500 KB
>> range), the import time appears to increase almost exponentially
>> with increasing file size.
>
>I don't see that. I would anticipate that there is some overhead
>with loading any file - perhaps that accounts for the high count on
>the document with the least number of elements. Between the first
>and third rows of the table above, the difference is less than 6% -
>is this what you're referring to? Surely there's no question that
>the complexity of the document will contribute to the amount of
>time required to open it?
===========================================================================
As a benchmark comparison, I used a very complex 53-page structured
document (tables, graphics, very complex text structures) created in
FM+SGML, using an extremely complex EDD (190 pages, including 35
pages of format change lists). This document contains about 2,000
elements, many of which have numerous attributes. I then exported
this document to SGML, producing a 202K SGML file. When this file
was imported into FM+SGML (replicating the original document), the
import time was only 90 seconds.

If there were some overhead in loading any file, and/or if the
complexity of the document contributed to the load time, that effect
would have been most apparent in the benchmark document, which is by
far the most complex, and also the smallest in SGML file size.
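[Editor's note: the load rates discussed in this thread are simply
element count divided by import time in seconds. The following short
script (not part of the original post) reproduces that arithmetic
from the figures given in this message, as a sanity check:]

```python
# Reproduce the load-rate arithmetic (rate = elements / seconds)
# using the element counts and import times stated in this message.

tests = {
    # name: (element count, import time in seconds)
    "Benchmark": (2_000, 90),        # complex 53-page doc, 202K SGML
    "File 1":    (12_000, 7 * 60),   # Prefix rules in EDD, 400K
    "File 2":    (12_000, 35 * 60),  # Entity refs, 510K
    "File 3":    (2_000, 5 * 60),    # Entity refs, 360K
    "File 4":    (2_000, 5 * 60),    # 50% fewer entity refs, 345K
    "File 5":    (2_600, 8 * 60),    # 50% fewer entity refs, 390K
}

for name, (elements, seconds) in tests.items():
    rate = elements / seconds
    print(f"{name:9s}  {rate:6.3f} elements/sec")
```

Running it yields approximately 22.2 elements/sec for the benchmark,
28.6 for file 1, and 5.4-6.7 for files 2 through 5, matching the
table below.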
So, adding in the benchmark file, the element loading rate
(elements/sec) is as follows:

FILE         LOAD RATE (ELEMENTS/SEC)   FILE SIZE
Benchmark           22.22                 202K
 1                  28.57                 400K
 2                   5.714                510K
 3                   6.666                360K
 4                   6.666                345K
 5                   5.416                390K

Now it becomes apparent that file size is a major determinant, and
that FM+SGML may be hitting some kind of wall at a file size of
around 400K. It's also apparent that SGML docs with lots of entity
references take longer to load, but file size seems to be equally
important, given that file 4 has 50% fewer entity references than
file 3, yet the load rate is the same. File 1 seems to be anomalous,
and the only explanation I have is that the absence of entity
references makes a big difference.

 ____________________
| Nullius in Verba |
********************
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971   E-Mail: danemory@primenet.com
10044 Adams Ave. #208, Huntington Beach, CA 92646

---Subscribe to the "Free Framers" list by sending a message to
majordomo@omsys.com with "subscribe framers" (no quotes) in the body.

** To unsubscribe, send a message to majordomo@omsys.com **
** with "unsubscribe framers" (no quotes) in the body.   **