[aklug] Re: Anyone interested in a job importing 24k printed emails in Juneau/Anchorage into a database?

From: adam bultman <adamb@glaven.org>
Date: Wed Jun 08 2011 - 15:25:07 AKDT

Pro: Knowing what went on via Palin's email before anybody else.
Con: Processing 24,000 pages of printed email into a DB.

On 06/08/2011 02:55 PM, Jason McEachen wrote:
> This Friday at 9am the State of Alaska is going to have a couple boxes
> of printed emails in Juneau for me to have, and a hand truck to help
> carry them. We could also pick them up at the Anchorage Airport at 3pm.
>
> What I'd like to do is somehow import them into a database and set up a
> quick and easy web-based interface to allow searches.
>
> The problem is my first child is coming into this world that morning at
> Providence. My wife doesn't like the idea of me either being in Juneau
> to receive and scan/process these docs, or sitting at a machine that
> morning to write up a script to pull scans, parse them, and populate
> some tables.
>
> So we (AlaskaDispatch.com) are possibly interested in hiring someone to
> help us with this project.
>
> If you think this is a neat intellectual exercise, please respond to the
> group with your ideas or suggestions.
>
> If you're interested in doing this professionally (or can recommend
> someone), please contact me directly and let me know how you'd propose
> to do it and what you'd bill.
>
> My first thought is to find someone in Juneau (FedEx/Kinko's) with a big
> copier/scanner who can convert paper to PDF really quickly (there are,
> after all, ~24 thousand pages) and ftp/sftp them up to a server. That
> server would already have nice pdf->text tools (maybe pdftohtml?) and
> "your favorite script language interpreter" to parse them into a table
> (probably only need fields like index, datetime, from, to, cc, bcc,
> subject, body, attachments), plus a little web front end waiting for
> search/display.
>
> Has anyone on the list handled a similar task and can share what worked
> and what didn't?
>
> Thanks for your help,
>
> --Jason
> ---------
> To unsubscribe, send email to <aklug-request@aklug.org>
> with 'unsubscribe' in the message body.
>
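
For anyone who wants to poke at the parse-and-search part of the idea
quoted above, here is a rough sketch in Python. It is only an
illustration under some loud assumptions: that the scan/OCR step has
already produced one plain-text blob per email, and that each blob
starts with "Label: value" header lines followed by a blank line and
then the body. The real printouts may look nothing like that. The
function names, the table layout, and the column names are all made up
for this example; only the field list comes from the original message.

```python
import sqlite3

# Header labels we ASSUME appear at the top of each OCR'd email.
# The real printouts may use other labels (for example "Sent:").
FIELDS = ("from", "to", "cc", "bcc", "date", "subject", "attachments")


def parse_email_text(text):
    """Split one OCR'd email into header fields and a body.

    Assumes "Label: value" header lines, a blank line, then the body.
    Unknown labels are ignored; missing fields stay empty strings.
    """
    record = {name: "" for name in FIELDS}
    lines = text.strip().splitlines()
    body_start = len(lines)  # no blank line found means no body
    for i, line in enumerate(lines):
        if not line.strip():
            body_start = i + 1
            break
        # Split on the first colon only, so times like 15:25 survive.
        label, sep, value = line.partition(":")
        key = label.strip().lower()
        if sep and key in record:
            record[key] = value.strip()
    record["body"] = "\n".join(lines[body_start:]).strip()
    return record


def create_db(path=":memory:"):
    """Open (or create) the database with a hypothetical emails table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        "id INTEGER PRIMARY KEY, sent TEXT, sender TEXT, recipients TEXT, "
        "cc TEXT, bcc TEXT, subject TEXT, body TEXT, attachments TEXT)"
    )
    return conn


def insert_email(conn, record):
    """Store one parsed record; the id column is filled in by SQLite."""
    conn.execute(
        "INSERT INTO emails "
        "(sent, sender, recipients, cc, bcc, subject, body, attachments) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (
            record["date"], record["from"], record["to"], record["cc"],
            record["bcc"], record["subject"], record["body"],
            record["attachments"],
        ),
    )
    conn.commit()


def search(conn, term):
    """Substring search over subject and body using plain LIKE."""
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT id, sent, sender, subject FROM emails "
        "WHERE subject LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()
```

Usage would be roughly: call create_db("emails.db"), run
parse_email_text() on each text blob and pass the result to
insert_email(), then call search(conn, "some term") from whatever web
front end sits on top. Plain LIKE is used here only because it works
everywhere; at 24k pages a real full-text index would probably be
worth the trouble, and OCR errors in the header lines will need
hand-tuning that this sketch does not attempt.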

-- 
Adam
Received on Wed Jun 8 15:25:17 2011
