Re: extracting data from html files

Subject: Re: extracting data from html files
From: Christopher Swingley (cswingle@iarc.uaf.edu)
Date: Fri Jun 28 2002 - 11:46:16 AKDT

Next message: James Zuelow: "RE: apache sec hole.."
Previous message: FeLoNiouS MoNK: "Re: apache sec hole.."
In reply to: Bob Crosby: "Re: extracting data from html files"
Next in thread: Michael Fowler: "Re: extracting data from html files"
Reply: Christopher Swingley: "Re: extracting data from html files"

Bob,

* Bob Crosby <rcrosby@alaska.net> [2002-Jun-28 10:58 AKDT]:
> That works great. It returns just what I asked for. Now I'm wondering if
> sed can be used to do even more sophisticated editing? For example, given
> a bunch of files, each containing multiple blocks of text like the following:
>
> <TD width="300" valign="top">$48.00 
> <a href="JavaScript: funcname('../doit.asp', '', '', '',
> '100');">Widgetname</a> 
> Product_name 
> Product_ID 
>
> could I generate a csv file consisting of lines containing
> Price,Widgetname,Product_name,Product_ID?

sed and grep are stream processors that operate line by line. To do
what you are interested in (grabbing information from multiple lines),
you'll probably find Perl to be a much more useful tool. Its regular
expression matching is also line oriented by default, but you can change
the definition of a "line" (the input record separator, $/) so that a
single match can extract data from more than one line at a time.
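
As a rough illustration of the multi-line idea, here is a minimal
sketch written in Python rather than Perl (Python's re module uses
Perl-style regular expression syntax). It reads the whole input as one
string instead of line by line, so a single pattern can span the line
breaks inside a block. The names SAMPLE, BLOCK and blocks_to_csv are
hypothetical, and the pattern assumes every block looks like the one
quoted above: a price in the TD cell, one link, then the product name
and product ID each on its own line.

```python
import csv
import io
import re

# Sample block copied from the quoted message; real files may differ.
SAMPLE = """<TD width="300" valign="top">$48.00
<a href="JavaScript: funcname('../doit.asp', '', '', '',
'100');">Widgetname</a>
Product_name
Product_ID
"""

# One pattern for a whole block. Negated classes such as [^>] also match
# newlines, so the href that wraps onto a second line is no obstacle.
BLOCK = re.compile(
    r"""
    <TD[^>]*>\s*\$(?P<price>[\d,.]+)\s*     # price after the opening cell tag
    <a\b[^>]*>(?P<widget>[^<]+)</a>\s*      # link text is the widget name
    (?P<name>[^\n<]+?)\s*\n\s*              # next non-empty line
    (?P<pid>[^\n<]+?)[ \t\r]*(?=\n|<|$)     # line after that, not consumed
    """,
    re.IGNORECASE | re.VERBOSE,
)


def blocks_to_csv(html: str) -> str:
    """Return one CSV line per matched block: price, widget, name, ID."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for m in BLOCK.finditer(html):
        writer.writerow(
            [m["price"], m["widget"].strip(), m["name"], m["pid"]]
        )
    return out.getvalue()


if __name__ == "__main__":
    print(blocks_to_csv(SAMPLE), end="")
```

The price is written without the leading dollar sign; move the \$ inside
the price group if you want to keep it. For HTML that is less regular
than this sample, a real HTML parser is likely to be more robust than a
regular expression.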

Once you start using Perl's regular expression syntax, you'll wonder how
you did without it.

Chris

-- 
Christopher S. Swingley          phone: 907-474-2689
Computer Systems Manager         email: cswingle@iarc.uaf.edu
IARC -- Frontier Program         GPG and PGP keys at my web page:
University of Alaska Fairbanks   www.frontier.iarc.uaf.edu/~cswingle

This archive was generated by hypermail 2a23 : Fri Jun 28 2002 - 11:45:52 AKDT