
Re: to Re: Conversion Utility Wanted (scanned text --> live text)



Bill,

You hit the nail on the head... And I have avoided doing OCR for three years
myself!  

HOWEVER...

1) From what I hear from people who do use it extensively, the OCR software
has gotten a little better.  Just the same, training the software is
essential.  And if you are scanning (for OCR) from old hand-set letterpress
type, forget it -- type it yourself.

2) If the material being scanned is set in a small type size, one can sometimes
make a photocopy blowup of it first -- depending upon the page size of the
original and the scanner bed size.

Lastly... and I don't want to start a thread here, but does anybody remember
back when a version of OmniPage (OCR) would hard-crash a Win3.1 machine if the
text being scanned included the character strings "SS", "S.S.", or "S/S",
as in a ship named the "SS Victoria" or some such?  I never got Omni to
admit that somebody was "never forgetting".  Bless that person, but don't
crash my machine!

Jay

-- 
Jay Smith

e-mail: jay@jaysmith.com

The Press for History(tm), The Press for Education(tm), 
The Press for [Your Industry](tm), The Press for....(tm)
  On-demand printing and binding of hardbound books.
  Minimum run one copy.

P.O. Box 650
Snow Camp, NC  27349  USA

Phone: Int+US+336-376-9991
Toll-Free Phone in US & Canada:
        1-800-447-8267
Fax: Int+US+336-376-6750



William D. Garriott wrote:
> 
> Jay's information matches my experience (which was so painful that I haven't
> attempted it for a few years). We worked with 300 dpi TIFF files, but what I
> was not prepared for was the TRAINING required for EACH typeface. You have
> to help some of these programs recognize the difference between a "cl" and a
> "d", an "lo" and a "b". The tighter the kerning, and the smaller the text,
> the harder it is for the software to distinguish between a single letter and
> a combo.
> 
> Once our OCR software was "trained," it produced 90-some percent accuracy.
> However, the accuracy was based on the recognition of a "letter" (if it
> guessed a letter, it was accurate!), not a correctly spelled word! We soon
> discovered that with the vast array of typefaces in all their iterations
> (each printer may print a little differently on spacing, software can set
> lines of text tighter or looser...), there was no way we could pull it off
> more cheaply than retyping the document!
> 
> Oh, the unfulfilled promises of technology...
> 
> Best wishes,
> 
> Bill
>
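
Bill's point about per-letter versus per-word accuracy can be put into rough
numbers. The sketch below is only an illustration, not a description of how
any OCR package measures itself: it assumes recognition errors are
independent from letter to letter, and the function names, the 5-letter word
length, and the 400-words-per-page figure are all made up for the example.

```python
def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every letter of a word is recognized correctly.

    Assumes (hypothetically) that each letter is right or wrong
    independently of its neighbors.
    """
    return char_accuracy ** word_length


def expected_bad_words(char_accuracy: float, word_length: int,
                       words_per_page: int) -> float:
    """Expected number of words per page containing at least one wrong letter."""
    return words_per_page * (1 - word_accuracy(char_accuracy, word_length))


# Compare a few per-letter accuracy figures for a typical 5-letter word.
for p in (0.90, 0.95, 0.99):
    print(f"{p:.0%} per letter -> {word_accuracy(p, 5):.0%} per 5-letter word, "
          f"about {expected_bad_words(p, 5, 400):.0f} bad words per 400-word page")
```

Under these assumptions, a "90-some percent" per-letter figure still leaves a
large share of words needing correction by hand, which fits the conclusion
that retyping can come out cheaper.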
