[aklug] Re: We're going to let it go, thanks for responses - Re: Anyone interested in a job importing 24k printed emails in Juneau/Anchorage into a database?

From: Jim Gribbin <jimgribbin@gmail.com>
Date: Fri Jun 10 2011 - 13:19:36 AKDT

300 dpi seems to be plenty. I also labored for a long time under the
misconception that 2 bit would be better. Then I attended a
presentation from a Windows-centric users' group that used to be around
town.

Jim G
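
For anyone wanting to script the Tesseract step mentioned in the quoted
messages below, here is a minimal sketch. It assumes the `tesseract`
executable is on the PATH and that each page has already been scanned
to an image file (for example a 300 dpi gray scale TIFF). The function
names and the "eng" language default are hypothetical choices for
illustration, not anything specified in this thread.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, output_base: Path, language: str = "eng") -> list[str]:
    """Build the argv for `tesseract IMAGE OUTPUTBASE -l LANG`.

    Tesseract appends ".txt" to OUTPUTBASE on its own.
    """
    return ["tesseract", str(image), str(output_base), "-l", language]


def ocr_page(image: Path, language: str = "eng") -> str:
    """Run Tesseract on one scanned page and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # Drop the image extension: page-001.tif -> page-001
    output_base = image.with_suffix("")
    subprocess.run(tesseract_command(image, output_base, language), check=True)
    # Tesseract wrote its result to OUTPUTBASE + ".txt"
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_page(Path("page-001.tif"))` (a hypothetical file name)
should leave `page-001.txt` next to the scan and return its contents.
Looping that over a directory of scans gives plain text that can then
be loaded into whatever search tool is preferred.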

On Thu, 2011-06-09 at 11:48 -0800, Mark Neyhart wrote:
> What scan resolution did you find to be effective? Your gray scale
> finding is interesting. I had been operating under the assumption
> that 2 bit black and white would be better.
>
> Mark
>
> Jim Gribbin wrote:
> > I would find it interesting to assist in this effort, but am otherwise
> > committed (maybe that's "should be" committed).
> >
> > I have been playing around with OCR conversions for my real estate work.
> > Thus far I have found gscan2pdf w/ Tesseract OCR conversion to be the
> > most workable.
> >
> > Tesseract's accuracy far exceeded gocr's, and it will also work from
> > the command line. So even though gscan2pdf is a GUI tool, you can do
> > everything from the command line as well.
> >
> > I spent a lot of time beating my head against walls before figuring out
> > that gray scale is significantly easier to run OCR against than 2-bit
> > b/w.
> >
> > Jim G
> >
> > On Wed, 2011-06-08 at 16:16 -0800, Jason McEachen wrote:
> >> Apparently every news outlet and their freakin' cousin are already there
> >> or soon going to be for both electronic and visual scanning, so we're
> >> going to save our money and just send some reporters with buckets of
> >> Alaska history down there to spot-check and listen to what everyone else
> >> is doing.
> >>
> >> Thanks for the suggestions, thoughts, recommendations and entertaining
> >> responses
> >>
> >> --Jason
> >>
> >> On 06/08/2011 03:41 PM, Mark Neyhart wrote:
> >>> This was my first question as well...
> >>>
> >>> Does anybody know of a linux OCR tool which can convert images to
> >>> text? I've found references to Tesseract, but am not sure if it is
> >>> active. I've got a bunch of pages which have been scanned to PDF, and
> >>> would like to be able to make them searchable.
> >>>
> >>> Mark Neyhart
> >>>
> >>> Joshua J. Kugler wrote:
> >>>> First question: WHY ON EARTH are they printed? Why can't they give them
> >>>> to you on a CD or DVD?
> >>>>
> >>>> j
> >>>>
> >>>> On Wednesday 08 June 2011, Jason McEachen elucidated thus:
> >>>>> This Friday at 9am the State of Alaska is going to have a couple
> >>>>> boxes of printed emails in Juneau for me to have, and a hand truck to
> >>>>> help carry them. We could also pick them up at the Anchorage Airport
> >>>>> at 3pm.
> >>>>>
> >>>>> What I'd like to do is somehow import them into a database and set up
> >>>>> a quick and easy web-based interface to allow searches.
> >>>>>
> >>>>> Thanks for your help,
> >>>>>
> >>>>> --Jason
> >>> ---------
> >>> To unsubscribe, send email to <aklug-request@aklug.org>
> >>> with 'unsubscribe' in the message body.
> >>>

Received on Fri Jun 10 13:19:50 2011

This archive was generated by hypermail 2.1.8 : Fri Jun 10 2011 - 13:19:51 AKDT