
Re: Conversion Utility Wanted (scanned text --> live text)



Emmy,

When you say that your scanner can only produce .bmp and .pdf files, I suspect
that it is the DRIVER, not the SCANNER, that has this limitation.  The first
thing that I would do is check with the scanner manufacturer to see whether
an updated scanner driver is available.

To get your "page images" into workable text, you need to run them through an
OCR (optical character recognition) program.  [Note that OCR is only going to
be 95-99.9% accurate; the result will need to be proofread.]  The only one with
which I am personally familiar is OmniPage (made by Caere, I think), but there
are several.  GOOD OCR programs are not cheap.  And that leads us back to your
scanner.  If your scanner is so low-end that you can only get .bmp and .pdf
files, then that scanner is probably not going to be any more productive or
faster (when you include the OCR time and cost as well) than actually TYPING
the content.
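To put the 95-99.9% range in concrete terms: the expected number of misread
characters on a page is roughly (1 - accuracy) times the number of characters
on the page.  The Python sketch below assumes a page of about 2,000 characters,
an illustrative round number rather than anything measured; `expected_errors`
is a hypothetical helper name.

```python
def expected_errors(accuracy, chars_per_page):
    """Expected number of misread characters on one page.

    accuracy is the fraction of characters recognized correctly, e.g. 0.95.
    """
    return (1.0 - accuracy) * chars_per_page


# ASSUMPTION: roughly 2,000 characters per typed page (illustrative only).
PAGE_CHARS = 2000

for acc in (0.95, 0.999):
    errors = round(expected_errors(acc, PAGE_CHARS))
    print(f"{acc:.1%} accurate -> about {errors} errors per page")
```

Even at the top of that range, a page can still hide a couple of errors, which
is why the OCR output has to be proofread.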

If you have a serious amount of this work to do, you may find that what you
need is a fast scanner (minimum $1000 for a good, FAST one) and decent OCR
software.  Last I checked, OmniPage was a few hundred bucks.

HOWEVER, if you use your existing scanner, you can still use OCR software. 
What you need to do is convert your .bmp files to 300 dpi .tif (TIFF) files --
which is what I believe most OCR programs prefer to use.  There are several
programs that can do such conversions, each with varying effectiveness.  Check
to see what is already in your library of image editing programs.  Note that
there are conversion programs specifically designed to do such batch
conversions -- shop around.
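If you have many pages, that kind of batch conversion can be sketched with
nothing but the Python standard library.  This is a minimal sketch under
stated assumptions, not a substitute for a real image editor or a dedicated
conversion program: it assumes uncompressed 24-bit .bmp files (anything else
is rejected), writes uncompressed RGB TIFFs, and the names `read_bmp_24`,
`write_tiff_rgb` and `convert_folder` are hypothetical.  Note that it only
tags the TIFF as 300 dpi; it cannot add detail the scan never captured, so
the pages should be scanned at 300 dpi or better to begin with.

```python
import struct
from pathlib import Path


def read_bmp_24(data):
    """Parse an uncompressed 24-bit BMP; return (width, height, rows).

    Each row is RGB bytes, top row first.
    """
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    pixel_offset = struct.unpack_from("<I", data, 10)[0]
    width, height = struct.unpack_from("<ii", data, 18)
    bpp, compression = struct.unpack_from("<HI", data, 28)
    if bpp != 24 or compression != 0:
        raise ValueError("this sketch only handles uncompressed 24-bit BMPs")
    bottom_up = height > 0          # positive height: bottom row stored first
    height = abs(height)
    stride = (width * 3 + 3) & ~3   # BMP rows are padded to a multiple of 4 bytes
    rows = []
    for y in range(height):
        start = pixel_offset + y * stride
        bgr = data[start:start + width * 3]
        rgb = bytearray(bgr)
        rgb[0::3], rgb[2::3] = bgr[2::3], bgr[0::3]  # BMP stores BGR; TIFF wants RGB
        rows.append(bytes(rgb))
    if bottom_up:
        rows.reverse()
    return width, height, rows


def write_tiff_rgb(width, height, rows, dpi=300):
    """Build an uncompressed little-endian baseline RGB TIFF tagged with dpi."""
    pixels = b"".join(rows)
    pad = len(pixels) % 2           # keep the following offsets on word boundaries
    # Layout: 8-byte header, pixel strip, BitsPerSample values,
    # X and Y resolution rationals, then the single IFD.
    bps_offset = 8 + len(pixels) + pad
    xres_offset = bps_offset + 6
    yres_offset = xres_offset + 8
    ifd_offset = yres_offset + 8
    # (tag, type, count, value-or-offset); types: 3 = SHORT, 4 = LONG, 5 = RATIONAL
    entries = [
        (256, 4, 1, width),          # ImageWidth
        (257, 4, 1, height),         # ImageLength
        (258, 3, 3, bps_offset),     # BitsPerSample: 8,8,8 stored at bps_offset
        (259, 3, 1, 1),              # Compression: none
        (262, 3, 1, 2),              # PhotometricInterpretation: RGB
        (273, 4, 1, 8),              # StripOffsets: pixels follow the header
        (277, 3, 1, 3),              # SamplesPerPixel
        (278, 4, 1, height),         # RowsPerStrip: whole image in one strip
        (279, 4, 1, len(pixels)),    # StripByteCounts
        (282, 5, 1, xres_offset),    # XResolution
        (283, 5, 1, yres_offset),    # YResolution
        (296, 3, 1, 2),              # ResolutionUnit: inch
    ]
    out = bytearray(b"II*\x00" + struct.pack("<I", ifd_offset))
    out += pixels + b"\x00" * pad
    out += struct.pack("<3H", 8, 8, 8)
    out += struct.pack("<2I", dpi, 1) * 2   # dpi/1 for X, then for Y
    out += struct.pack("<H", len(entries))
    for tag, typ, count, value in entries:
        out += struct.pack("<HHII", tag, typ, count, value)
    out += struct.pack("<I", 0)             # no further IFDs
    return bytes(out)


def convert_folder(src_dir, dst_dir, dpi=300):
    """Convert every .bmp in src_dir to a same-named .tif in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for bmp in sorted(Path(src_dir).glob("*.bmp")):
        width, height, rows = read_bmp_24(bmp.read_bytes())
        (dst / (bmp.stem + ".tif")).write_bytes(
            write_tiff_rgb(width, height, rows, dpi))
```

For example, `convert_folder("scans", "tiffs")` would turn scans/page1.bmp
into tiffs/page1.tif, ready to hand to the OCR program.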

AND HOWEVER AGAIN....  The problem with trying to do this scanning/OCR without
a better scanner (or a better scanner driver, probably TWAIN-compliant) is
that OCR programs that can acquire the pages directly from the scanner know
how to thread those page images together into a continuous FLOW of text.  If
you just work with individual .bmp files, YOU will have to connect the flows
after OCR'ing them.
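Connecting the flows by hand mostly means rejoining text at the page breaks.
As a rough illustration, here is a heuristic Python sketch, assuming the OCR
output for each page is already available as a string in page order
(`join_pages` is a hypothetical name, not part of any OCR product).  It
rejoins a word hyphenated across a page break, continues a sentence that runs
onto the next page, and otherwise starts a new paragraph.  It does not strip
running headers, footers, or page numbers, and it will wrongly merge a genuine
hyphenated compound split at a page break, so the result still needs
proofreading.

```python
def join_pages(pages):
    """Merge per-page OCR text into one flow of text (heuristic sketch)."""
    text = ""
    for page in pages:
        page = page.strip()
        if not page:
            continue                      # skip blank pages
        if not text:
            text = page
        elif text.endswith("-") and page[0].islower():
            # Word hyphenated across the page break, e.g. "docu-" + "ment".
            # ASSUMPTION: the hyphen is a line-break hyphen, not part of a compound.
            text = text[:-1] + page
        elif text[-1] in ".!?:":
            # Previous page ended a sentence: treat it as a paragraph break.
            text += "\n\n" + page
        else:
            # Sentence runs on from one page to the next.
            text += " " + page
    return text
```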

If this project is really worth doing via scanning/OCR, it is probably only
worth doing right: with a fully functional scanner, a TWAIN-compliant scanner
driver, and a good OCR program.

Jay

-- 
Jay Smith

e-mail: jay@jaysmith.com

The Press for History(tm), The Press for Education(tm), 
The Press for [Your Industry](tm), The Press for....(tm)
  On-demand printing and binding of hardbound books.
  Minimum run one copy.

P.O. Box 650
Snow Camp, NC  27349  USA

Phone: Int+US+336-376-9991
Toll-Free Phone in US & Canada:
        1-800-447-8267
Fax: Int+US+336-376-6750



EMMY_ARICIOGLU@hp-roseville-om3.om.hp.com wrote:
> 
>      Howdy Everyone,
> 
>      Through research we can find older material produced by our company
>      that is still useful to us, but the original files no longer exist. We
>      would like to be able to scan the old pages, then edit the text into a
>      new doc. The scanner we are using is only able to produce bmp and pdf
>      files. But we can't seem to manipulate these in any way. We can import
>      the images into Word and Frame, but what we want is the actual text so
>      that we can update and reuse it.
> 
>      A utility that would convert scanned text into Word would be fine. The
>      engineers could use it and the writers could import Word into Frame.
> 
>      Can anyone recommend software that would do the job for us?
> 
>      TIA,
>      Emmy
>      emmy_aricioglu@hp.com
>

** To unsubscribe, send a message to majordomo@omsys.com **
** with "unsubscribe framers" (no quotes) in the body.   **