Re: extracting data from html files


Subject: Re: extracting data from html files
From: Christopher Swingley (cswingle@iarc.uaf.edu)
Date: Fri Jun 28 2002 - 11:46:16 AKDT


Bob,

* Bob Crosby <rcrosby@alaska.net> [2002-Jun-28 10:58 AKDT]:
> That works great. It returns just what I asked for. Now I'm wondering if
> sed can be used to do even more sophisticated editing? For example, given
> a bunch of files, each containing multiple blocks of text like the following:
>
> <TD width="300" valign="top"><FONT class=Price>$48.00</FONT><BR>
> <a href="JavaScript: funcname('../doit.asp', '', '', '',
> '100');">Widgetname</a><BR>
> Product_name<BR>
> Product_ID<BR>
>
> could I generate a csv file consisting of lines containing
> Price,Widgetname,Product_name,Product_ID?

sed and grep are stream processors that operate line by line. To do
what you are interested in (grabbing information from multiple lines),
you'll probably find Perl to be a much more useful tool. By default Perl
also reads its input line by line, but you can change the definition of
a "line" (the input record separator, $/) so that a regular expression
can match and extract data from more than one line at a time.
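
As a rough sketch of that idea (written in Python here rather than Perl,
purely for illustration; in Perl the analogous step is to slurp the whole
file, for example by undefining $/, and match with the /s modifier), the
following treats the text as one string and lets a single pattern span
several lines. The pattern, the names extract_rows and to_csv, and the
assumption that every block looks exactly like your sample are mine and
have not been tried against your real files.

```python
import csv
import io
import re

# Sample block copied from the quoted message; a real file would hold many.
SAMPLE = """<TD width="300" valign="top"><FONT class=Price>$48.00</FONT><BR>
<a href="JavaScript: funcname('../doit.asp', '', '', '',
'100');">Widgetname</a><BR>
Product_name<BR>
Product_ID<BR>
"""

# One pattern covering the whole block. re.DOTALL lets "." cross newlines
# (needed because the href wraps onto a second line), and \s* absorbs the
# line breaks between fields.
BLOCK = re.compile(
    r'<FONT class=Price>(\$[\d.,]+)</FONT><BR>\s*'
    r'<a href=".*?">(.*?)</a><BR>\s*'
    r'(.*?)<BR>\s*'
    r'(.*?)<BR>',
    re.DOTALL,
)


def extract_rows(html):
    """Return one (price, widget, product_name, product_id) tuple per block."""
    return [tuple(field.strip() for field in m.groups())
            for m in BLOCK.finditer(html)]


def to_csv(rows):
    """Format the extracted tuples as CSV text, one line per block."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE)), end="")
```

For real files you would read each file's full contents into a string
and pass it to extract_rows. If the markup varies from block to block,
the pattern will need loosening.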

Once you start using Perl's regular expression syntax, you'll wonder how
you did without it.

Chris

-- 
Christopher S. Swingley           phone: 907-474-2689
Computer Systems Manager          email: cswingle@iarc.uaf.edu
IARC -- Frontier Program          GPG and PGP keys at my web page:
University of Alaska Fairbanks    www.frontier.iarc.uaf.edu/~cswingle
