[aklug] Re: We're going to let it go, thanks for responses - Re: Anyone interested in a job importing 24k printed emails in Juneau/Anchorage into a database?

From: Mark Neyhart <Mark_Neyhart@legis.state.ak.us>
Date: Thu Jun 09 2011 - 11:48:27 AKDT

What scan resolution did you find to be effective? Your gray scale
finding is interesting. I had been operating under the assumption
that 1-bit black and white would be better.

Mark

Jim Gribbin wrote:
> I would find it interesting to assist in this effort, but am otherwise
> committed (maybe that's "should be" committed).
>
> I have been playing around with OCR conversions for my real estate work.
> Thus far I have found gscan2pdf w/ Tesseract OCR conversion to be the
> most workable.
>
> Tesseract's accuracy far exceeded gocr's, and it also works from the
> command line. So even though gscan2pdf is a GUI tool, you can do
> everything from the command line as well.
>
> I spent a lot of time beating my head against walls before figuring out
> that gray scale is significantly easier to run OCR against than 1-bit
> b/w.
>
> Jim G
>
> On Wed, 2011-06-08 at 16:16 -0800, Jason McEachen wrote:
>> Apparently every news outlet and their freakin' cousin are already there
>> or soon going to be for both electronic and visual scanning, so we're
>> going to save our money and just send some reporters with buckets of
>> Alaska history down there to spot-check and listen to what everyone else
>> is doing.
>>
>> Thanks for the suggestions, thoughts, recommendations and entertaining
>> responses
>>
>> --Jason
>>
>> On 06/08/2011 03:41 PM, Mark Neyhart wrote:
>>> This was my first question as well...
>>>
>>> Does anybody know of a Linux OCR tool which can convert images to
>>> text? I've found references to Tesseract, but am not sure if it is
>>> active. I've got a bunch of pages which have been scanned to PDF, and
>>> would like to be able to make them searchable.
>>>
>>> Mark Neyhart
>>>
>>> Joshua J. Kugler wrote:
>>>> First question: WHY ON EARTH are they printed? Why can't they give them
>>>> to you on a CD or DVD?
>>>>
>>>> j
>>>>
>>>> On Wednesday 08 June 2011, Jason McEachen elucidated thus:
>>>>> This Friday at 9am the State of Alaska is going to have a couple
>>>>> boxes of printed emails in Juneau for me to have, and a hand truck to
>>>>> help carry them. We could also pick them up at the Anchorage Airport
>>>>> at 3pm.
>>>>>
>>>>> What I'd like to do is somehow import them into a database and set up
>>>>> a quick and easy web-based interface to allow searches.
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> --Jason
>>> ---------
>>> To unsubscribe, send email to <aklug-request@aklug.org>
>>> with 'unsubscribe' in the message body.
>>>
>

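For anyone wanting to script the command-line Tesseract route mentioned
in the thread, here is a minimal sketch in Python. It assumes the
tesseract executable is installed and on the PATH, and that each scanned
page has already been saved as a gray scale image. The function names
and the file names in the usage note are hypothetical, chosen only for
illustration; nobody in the thread posted this workflow.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_page(image_path, output_base, lang="eng"):
    """Run Tesseract on one scanned page and return the recognized text.

    Tesseract appends ".txt" to output_base on its own, so the result
    is read back from that file once the command has finished.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling ocr_page("page-001.tif", "page-001") would write page-001.txt
next to the image and return its contents. Keeping the command builder
separate from the function that runs it makes it easy to check the
arguments without Tesseract installed.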
Received on Thu Jun 9 11:48:35 2011
