[aklug] Re: We're going to let it go, thanks for responses - Re: Anyone interested in a job importing 24k printed emails in Juneau/Anchorage into a database?

From: Jim Gribbin <jimgribbin@gmail.com>
Date: Wed Jun 08 2011 - 22:16:09 AKDT

I would find it interesting to assist in this effort, but am otherwise
committed (maybe that's "should be" committed).

I have been playing around with OCR conversions for my real estate work.
Thus far I have found gscan2pdf with Tesseract doing the OCR conversion
to be the most workable.

Tesseract's accuracy far exceeded gocr's, and it also works from the
command line. So even though gscan2pdf is a GUI tool, you can do
everything from the command line as well.
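
For anyone who wants to try the command-line route, here is a minimal
Python sketch of how one run might be scripted. It assumes the
`tesseract` binary is on the PATH and relies on the standard
`tesseract IMAGE OUTPUTBASE -l LANG` invocation. The helper names
(build_tesseract_cmd, ocr_image) and the file names are made up for
illustration; they are not part of gscan2pdf or Tesseract.

```python
import subprocess
from pathlib import Path


def build_tesseract_cmd(image, output_base, lang="eng"):
    """Return the argv list for one Tesseract run.

    Tesseract appends ".txt" to output_base on its own.
    """
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_image(image, lang="eng"):
    """OCR one image and return the recognized text.

    Assumes tesseract is installed; raises CalledProcessError if it fails.
    """
    image = Path(image)
    output_base = image.with_suffix("")  # e.g. scan001.tif -> scan001
    subprocess.run(build_tesseract_cmd(image, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling ocr_image("scan001.tif") would leave scan001.txt next to the
image and hand back its contents, so looping over a directory of scans
is a few more lines.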

I spent a lot of time beating my head against walls before figuring out
that grayscale is significantly easier to run OCR against than 1-bit
(two-tone) b/w.
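
If the pages are already sitting in a PDF, one way to get grayscale
images out for OCR is Ghostscript's tiffgray device. The sketch below
only builds and runs that command; the function names, the output
pattern, and the 300 dpi default are my own assumptions, not anything
gscan2pdf does internally. It also will not help if the original scan
was made in 1-bit b/w, since the gray levels are already gone by then.

```python
import subprocess


def build_gs_gray_cmd(pdf_path, out_pattern="page-%03d.tif", dpi=300):
    """Return the argv list that renders each PDF page to a grayscale TIFF."""
    return [
        "gs",
        "-q",                 # suppress the startup banner
        "-sDEVICE=tiffgray",  # 8-bit grayscale TIFF output device
        f"-r{dpi}",           # rendering resolution in dpi (assumed default)
        "-o", out_pattern,    # one file per page; -o also implies batch mode
        str(pdf_path),
    ]


def pdf_to_gray_tiffs(pdf_path, out_pattern="page-%03d.tif", dpi=300):
    """Run Ghostscript (assumed to be installed as `gs`) on one PDF."""
    subprocess.run(build_gs_gray_cmd(pdf_path, out_pattern, dpi), check=True)
```

The resulting page-001.tif, page-002.tif, and so on can then be fed to
Tesseract one at a time.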

Jim G

On Wed, 2011-06-08 at 16:16 -0800, Jason McEachen wrote:
> Apparently every news outlet and their freakin' cousin are already there
> or soon going to be for both electronic and visual scanning, so we're
> going to save our money and just send some reporters with buckets of
> Alaska history down there to spot-check and listen to what everyone else
> is doing.
>
> Thanks for the suggestions, thoughts, recommendations and entertaining
> responses.
>
> --Jason
>
> On 06/08/2011 03:41 PM, Mark Neyhart wrote:
> > This was my first question as well...
> >
> > Does anybody know of a linux OCR tool which can convert images to
> > text? I've found references to Tesseract, but am not sure if it is
> > active. I've got a bunch of pages which have been scanned to PDF, and
> > would like to be able to make them searchable.
> >
> > Mark Neyhart
> >
> > Joshua J. Kugler wrote:
> >> First question: WHY ON EARTH are they printed? Why can't they give them
> >> to you on a CD or DVD?
> >>
> >> j
> >>
> >> On Wednesday 08 June 2011, Jason McEachen elucidated thus:
> >>> This Friday at 9am the State of Alaska is going to have a couple
> >>> boxes of printed emails in Juneau for me to have, and a hand truck to
> >>> help carry them. We could also pick them up at the Anchorage Airport
> >>> at 3pm.
> >>>
> >>> What I'd like to do is somehow import them into a database and set up
> >>> a quick and easy web-based interface to allow searches.
> >>>
> >>> Thanks for your help,
> >>>
> >>> --Jason
> >
> > ---------
> > To unsubscribe, send email to <aklug-request@aklug.org>
> > with 'unsubscribe' in the message body.
> >

Received on Wed Jun 8 22:16:21 2011

This archive was generated by hypermail 2.1.8 : Wed Jun 08 2011 - 22:16:21 AKDT