To: "William D. Garriott" <wdg@xxxxxxxxxxx>, FrameUsers List <Framers@xxxxxxxxxxxxxx>, Frame List <Framers@xxxxxxxxx>
Subject: Re: to Re: Conversion Utility Wanted (scanned text --> live text)
From: Jay Smith <jay@xxxxxxxxxxxx>
Date: Tue, 11 May 1999 18:03:10 -0400
Organization: Jay Smith & Associates
References: <007f01be9bf4$ee5ab4c0$08091681@buzz-lightyear.cwru.edu>
Sender: owner-framers@xxxxxxxxx

Bill,

You hit the nail on the head... And I have avoided doing OCR for three years myself!

HOWEVER...

1) From what I hear from people who do use it extensively, the OCR software has gotten a little better. Just the same, training the software is essential. And if you are scanning (for OCR) from old hand-set letterpress type, forget it -- type it yourself.

2) If the material being scanned is small type size, one can sometimes do a photocopy blowup of it first -- depending upon the page size of the original and the scanner bed size.

Lastly...and I don't want to start a thread here, but does anybody remember back when a version of OmniPage (OCR) would crash-hard a Win3.1 machine if the text being scanned included the character strings: "SS", "S.S.", or "S/S". Such as a ship named the "SS Victoria" or some such. I never got Omni to admit that somebody was "never forgetting". Bless that person, but don't crash my machine!

Jay

--
Jay Smith
e-mail: jay@jaysmith.com

The Press for History(tm), The Press for Education(tm),
The Press for [Your Industry](tm), The Press for....(tm)
On-demand printing and binding of hardbound books. Minimum run one copy.

P.O. Box 650
Snow Camp, NC 27349 USA

Phone: Int+US+336-376-9991
Toll-Free Phone in US & Canada: 1-800-447-8267
Fax: Int+US+336-376-6750

William D. Garriott wrote:
>
> Jay's information matches my experience (which was so painful that I haven't
> attempted it for a few years). We worked with 300 dpi TIFF files, but what I
> was not prepared for was the TRAINING required for EACH typeface. You have
> to help some of these programs recognize the difference between a "cl" and a
> "d", an "lo" and a "b". The tighter the kerning, and the smaller the text,
> the harder it is for the software to distinguish between a single letter and
> a combo.
>
> Once our OCR software was "trained," it produced 90-some percent accuracy.
> However, the accuracy was based on the recognition of a "letter" (if it
> guessed a letter, it was accurate!), not a correctly spelled word! We soon
> discovered that with the vast array of typefaces in all their iterations
> (each printer may print a little differently on spacing, software can set
> lines of text tighter or looser...), there was no way we could pull it off
> more cheaply than retyping the document!
>
> Oh, the unfulfilled promises of technology...
>
> Best wishes,
>
> Bill
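
To put rough numbers on the letter-versus-word point in the quoted message, here is a minimal Python sketch. It is not anything either poster ran. It assumes recognition errors on individual letters are independent and that a word counts as correct only if every letter in it is correct. The function names, the 0.95 figure, and the five-letter word length are hypothetical, picked only for illustration.

```python
def expected_word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that a whole word comes out right, assuming each letter is
    recognized independently with probability char_accuracy (an assumption)."""
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    if word_length < 1:
        raise ValueError("word_length must be at least 1")
    return char_accuracy ** word_length


def word_accuracy(reference: str, recognized: str) -> float:
    """Fraction of words that match exactly, comparing word by word.
    This sketch assumes both texts split into the same number of words."""
    ref_words = reference.split()
    out_words = recognized.split()
    if len(ref_words) != len(out_words):
        raise ValueError("this sketch assumes equal word counts")
    if not ref_words:
        return 1.0
    matches = sum(r == o for r, o in zip(ref_words, out_words))
    return matches / len(ref_words)


if __name__ == "__main__":
    # Illustrative figures only: 95% per-letter accuracy, five-letter words.
    print(expected_word_accuracy(0.95, 5))
    # A "cl" read as "d" spoils the whole word, however many letters were right.
    print(word_accuracy("the clock struck", "the dock struck"))
```

Under those assumptions, a per-letter score in the nineties still leaves roughly one five-letter word in four or five needing a manual fix, which is in line with the conclusion that retyping was cheaper.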