
Re: Variations in importing SGML docs into FM+SGML



Dan,

I mis-labeled your files in my remarks. Replace file #1 with
"benchmark file", file #2 with file #1, and file #3 with file
#2, and my remarks will make more sense.

In your email below, you suggested that I was wrong about
bug #239927 being the cause of the performance differences between
your original test files #1 and #2. I went back and re-checked
that bug report. Actually, it was reported against version
5.5, so you may be correct that this is not the cause.
Running a timed test WITH THE SAME FILE on the same machine
with each version of FM+SGML would resolve the question of
whether number of entities converted to variables drastically
slowed performance in versions before 5.5.6.
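A repeatable timing procedure of the kind suggested above can be sketched in Python. This is a minimal sketch under stated assumptions: FM+SGML imports of this era were run interactively, so `time_runs` only helps if an import can be triggered from a script; otherwise, record stopwatch times by hand and pass them to `summarize`. Both function names are hypothetical.

```python
import statistics
import time
from typing import Callable, List


def time_runs(action: Callable[[], None], runs: int = 3) -> List[float]:
    """Call `action` `runs` times and return the wall-clock seconds of each call.

    `action` is a hypothetical callable that performs one import of the SAME
    SGML file; how it drives FM+SGML is an assumption left to the reader.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label: str, durations: List[float]) -> str:
    """Report the median of repeated runs, which a single slow run cannot skew much."""
    return f"{label}: median {statistics.median(durations):.1f} s over {len(durations)} runs"


if __name__ == "__main__":
    # Placeholder action; substitute a real import of one file per version.
    print(summarize("no-op", time_runs(lambda: None)))
```

Running the same file under each version, on the same machine, keeps everything constant except the version being compared.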

Unless my eyes are crossing again this morning, which they
sometimes do (as when I mis-labeled your test and benchmark files),
the only test that you have run so far which compares files
with only one difference is your original test of test file
#1 against test file #2. In every other test that you ran,
you changed more than one parameter between the test files.
That's what I meant when I said that you were comparing
apples and oranges. One cannot conclude that a particular
factor is causing a slow-down in performance unless all other
factors are the same.
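The rule that only one factor may change between compared test files can be expressed as a small check. A minimal Python sketch; the helper names are hypothetical, and the factor names and values are illustrative descriptions drawn from this thread, not measured parameters.

```python
from typing import Dict, List

Setup = Dict[str, str]


def differing_factors(a: Setup, b: Setup) -> List[str]:
    """Return the names of factors whose values differ between two test setups.

    A factor recorded for only one setup counts as different.
    """
    keys = sorted(set(a) | set(b))
    return [k for k in keys if a.get(k) != b.get(k)]


def is_controlled(a: Setup, b: Setup) -> bool:
    """A comparison can isolate a cause only if exactly one factor differs."""
    return len(differing_factors(a, b)) == 1


test1 = {"fm_version": "5.1.1", "edd": "simple", "content": "600 records",
         "lead_in_titles": "prefix rules"}
test2 = {**test1, "lead_in_titles": "entity references"}
benchmark = {"fm_version": "5.1.1", "edd": "complex", "content": "53-page document"}

print(differing_factors(test1, test2))      # ['lead_in_titles']
print(differing_factors(test1, benchmark))  # ['content', 'edd', 'lead_in_titles']
```

A listed factor can still drag others along as a side effect: adding the entity references to file #2 also grew it by about 110 KB, which is exactly the point in dispute here.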

Janice

PS I am not claiming that there is not a slowing of
performance due to increased SGML document file size. I don't know. 

At 08:08 PM 3/18/99 -0700, you wrote:
   Well, I did further testing to see if issue #239927 (which Janice McConnell
   identifies below as the cause) was the actual cause. Here's what I did:
   
   1. In the original test SGML files #1 (the one that uses prefix rules) and
   #2 (the one with the entity references), each of the 23 concatenated fields
   in each record was in a separately named text range container.
   
   2. So, for test SGML file #2, I changed the EDD and DTD so that all 23
   fields were concatenated together in a single paragraph container element.
   This reduced the element count in test SGML file #2 from about 12,000 to
   about 2000, and reduced the SGML file size by about 130 KB. All of the
   entity references that are converted to variables remained. The result was
   that the import time was reduced from 35 minutes to 5 minutes (2 minutes
   less than the import time for test SGML file #1, which uses prefix rules
   instead of entity references).
   
   3. Next, in the same test SGML file #2, I halved the number of entity
   references in each record. This produced a further reduction in SGML file
   size of about 30 KB. The import time, however, was still about 5 minutes,
   suggesting that these entity references are not the main cause.
   
   4. Next, I changed the EDD and DTD again to wrap one field in each record
   of test SGML file #2 in a text range container (the remaining fields stayed
   together in a single concatenated string, and all of the entity references
   remained as in step 3). This increased the element count in test SGML file
   #2 from 2000 to 2600, and increased its file size by about 12 KB to a size
   of 350 KB. The import time increased from 5 minutes to 8 minutes.
   
   CONCLUSION:
   1. Once test SGML file #2 was modified as described in steps 2 and 3 above
   to drastically reduce the element count and the file size, leaving all
   other factors the same, the import time was still about 3.3 times longer
   than that of the benchmark SGML file, even though the element count in the
   benchmark file is greater than that of the modified test SGML file.
   However, the modified test SGML file #2 is still about 140 KB larger than
   the benchmark SGML file.
   
   2. Consequently, I conclude that the main factor influencing import time is
   SGML file size, not the known issue about entity references that are
   converted to variables. If this conclusion is correct, then there must be
   some file size somewhere between 200 KB (the size of the benchmark file)
   and 350 KB (the approximate sizes of the original test SGML file #1 and the
   modified test SGML file #2) where FM+SGML hits a wall and begins to slow
   down the import process. Since it's hard for me to believe I'm running out
   of real memory (32 MB) or virtual memory (111 MB), I conclude that the wall
   must be internal to FM+SGML. Once FM+SGML hits that wall, it must begin to
   boggle everything.
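The per-step figures above can be tabulated to show where the "about 3.3 times longer" comparison comes from. A minimal Python sketch using only numbers stated in the message; element counts and times are the approximate values given, and step 3's element count is assumed unchanged from step 2, since only entity references were removed in that step.

```python
from typing import List, Tuple

BENCHMARK_SECONDS = 90  # reported import time for the 202 KB benchmark SGML file

# (description, approximate element count, import time in minutes), as reported
STEPS: List[Tuple[str, int, int]] = [
    ("original test SGML file #2", 12000, 35),
    ("step 2: fields concatenated", 2000, 5),
    ("step 3: entity references halved", 2000, 5),
    ("step 4: one field wrapped", 2600, 8),
]


def ratio_to_benchmark(minutes: float) -> float:
    """How many times longer an import took than the benchmark import."""
    return minutes * 60 / BENCHMARK_SECONDS


for name, elements, minutes in STEPS:
    print(f"{name}: ~{elements} elements, {minutes} min, "
          f"{ratio_to_benchmark(minutes):.1f}x the benchmark")
```

Because the reported times are rounded to whole minutes, the ratios are only as precise as those figures.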
     
   At 07:43 PM 3/17/99 -0500, Janice McConnell wrote:
   >Dan,
   >
   >Concerning the different opening times for files #2 and #3,
   >there is a known issue (#239927) with importing files with a
   >lot of entity references which are converted to variables in
   >version 5.1.1. This problem was resolved in version 5.5.6.
   >
   >You compared the opening times for files #1 and #2, and
   >suggested that it is related to size of file. I don't believe
   >this is a valid comparison since, as you pointed out, the
   >files are extremely different in content.
   ======================================================
   No, test SGML files #1 and #2 are identical in content and structure,
   except for the addition of the entity references in #2, which increased the
   file size by about 110 KB. Admittedly, the benchmark SGML file is entirely
   different in content and structure, but its EDD is very complex in both
   structure and formatting, and the EDDs for test SGML files #1 and #2 are
   very simple. The only difference between the EDD used for test SGML file #2
   and that used for test SGML file #1 is that the prefix rules were deleted
   from the EDD for #2. Both test SGML files #1 and #2 use the same DTD, and
   the 600 records contained in both are identical.
   =================================================================
   >Without examining
   >the documents and their respective applications, I can't
   >speculate about opening time performance.
   >
   >Janice
   > 
   >*****************************
   >Wide variations in the time required to import SGML document instances
   >into FM+SGML have been observed:
   >
   >WIN platform (266 MHz CPU)
   >Memory Size: 32 MB
   >Memory Read/Write Cache: 2048 KB
   >Virtual Memory: 111 MB (temporary)
   >FM+SGML version 5.1.1
   >
   >TEST FILES:
   >Benchmark SGML file: 202 KB, containing about six graphic entities, plus
   >complex tables and text structures, using a very complex DTD/EDD. The EDD
   >has 190 pages, including about 35 pages of format change lists, and a file
   >size of 2.4 MB.
   >
   >Test SGML file #1: 400 KB, containing 600 biographical records (about 6
   >lines each) extracted from a conventional database and tagged to produce
   >SGML. Each biographical record contains up to 23 concatenated data fields
   >(each field is contained in a descriptively named SGML element). The DTD
   >is quite simple. The EDD (10 pages, 176 KB) contains prefix rules that
   >specify the lead-in titles that precede some of the biographical data
   >fields (e.g., "Education:", "Address:", "Phone:", "Fax:", "E-mail:").
   >
   >Test SGML file #2: 510 KB, containing the exact same 600 biographical
   >records as Test File #1, and using the same DTD. However, instead of using
   >EDD prefix rules to specify the lead-in titles, the SGML elements contain
   >entity references (e.g., &Educ; &Addr; &Ph; &Fx; &Eml;) to produce those
   >titles. The SGML document instance contains internal entity declarations
   >for these entities of the form:
   >
   >	<!ENTITY Educ "FM variable: Educ">
   >
   >For each such entity, the template used for import has a variable
   >definition that produces the corresponding lead-in title.
   >
   >ANALYSIS OF THE THREE FILES:
   >Test Files #1 and #2 are identical, with the exception that Test File #2
   >has the added entity references and entity declarations, which account
   >for the 110 KB difference in the size of the two files.
   >
   >The Benchmark SGML file produces, on import to FM+SGML, a richly
   >structured and formatted 53-page document.
   >
   >Test Files #1 and #2 both produce, on import to FM+SGML, identical
   >documents containing the 600 biographical records in 6.5-point type. The
   >structure and formatting are simple. The EDD has no format change lists,
   >and very simple format rules.
   >
   >HERE ARE THE TIMES IT TAKES TO COMPLETE THE IMPORT-TO-FM+SGML ACTION:
   >
   >Benchmark SGML file: 53 pages in 90 seconds to produce a 1.0 MB FM+SGML
   >file.
   >
   >Test SGML File #1: 16 pages in about 7 minutes to produce a 2.3 MB FM+SGML
   >file.
   >
   >Test SGML File #2: 16 pages in about 35 minutes to produce a 2.3 MB
   >FM+SGML file.
   >
   >All tests were conducted several times, with nothing running but FM+SGML.
   >
   >CONCLUSIONS
   >From the foregoing, it would appear that:
   >
   >1. The complexity of the EDD seems to have little impact on import time.
   >
   >2. A doubling of SGML file size from 200 KB to 400 KB increases the import
   >time by a factor of at least 4.6.
   >
   >3. The use of prefix rules in the EDD produces a 5-fold reduction in import
   >time compared to the use of entity references for the exact same purpose.
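The factors in conclusions 2 and 3 follow directly from the reported sizes and times. A short Python check, using only the figures given in the test description:

```python
# Reported figures from the test description above
benchmark_kb, benchmark_s = 202, 90        # benchmark SGML file
test1_kb, test1_s = 400, 7 * 60            # test SGML file #1 (prefix rules)
test2_s = 35 * 60                          # test SGML file #2 (entity references)

size_ratio = test1_kb / benchmark_kb       # roughly a doubling of file size
time_ratio = test1_s / benchmark_s         # conclusion 2: import-time factor
entity_vs_prefix = test2_s / test1_s       # conclusion 3: entity refs vs prefix rules

print(f"size x{size_ratio:.2f}, time x{time_ratio:.2f}, "
      f"entity/prefix x{entity_vs_prefix:.1f}")
```

Since the test-file times are stated as approximate ("about 7 minutes"), these ratios are approximate as well.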
   >
   >Does anyone have an explanation for these wide variations in import times?
   >
   >
   >** To unsubscribe, send a message to majordomo@omsys.com **
   >** with "unsubscribe framers" (no quotes) in the body.   **
   >
   >
        ____________________
        | Nullius in Verba |
        ********************
   Dan Emory, Dan Emory & Associates
   FrameMaker/FrameMaker+SGML Document Design & Database Publishing
   Voice/Fax: 949-722-8971 E-Mail: danemory@primenet.com
   10044 Adams Ave. #208, Huntington Beach, CA 92646
   ---Subscribe to the "Free Framers" list by sending a message to
      majordomo@omsys.com with "subscribe framers" (no quotes) in the body.
   
   
   
