Re: Help with sorting another file


Subject: Re: Help with sorting another file
From: David J. Weller-Fahy (dave-lists-aklug@weller-fahy.com)
Date: Thu Feb 26 2004 - 07:19:25 AKST


* jonr@destar.net <jonr@destar.net> [2004-02-25 22:55 -0900]:
> > cat $file | sed -n '/^<programme start=.*stop=.*/,/<\/programme>/p' > ${file}.new
>
> cat TV.xml1 | sed -n '/^<programme start=.*stop=.*/,/<\/programme>/p' > TV.xml.new
> (This is the way I run it)
>
> Ok, so I am studying this line to delete any line between <programme
> start=* and </programme> that does not have a stop= in it. When this
> is ran it outputs to a file named TV.xml.new but the file is empty.
> Here is what I understand so far:
>
> cat reads the file into ?memory? or a ?buffer? it then pipes this into
> sed.

Right, cat reads the file and writes it to 'standard output', which is
then piped (using the '|' symbol) to the next program. The cat isn't
needed, though: sed can read the file itself if you give it the file
name as an argument. You could probably accomplish the same thing by
using the following command line:

sed -n -e '/^<programme start=".*AKST" stop=".*AKST".*>$/,/<\/programme>/p' testit > testit2
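
To see that the cat really is optional, here is a minimal sketch. The
temporary file and its three lines are made up for illustration (they
are not your TV.xml1); both forms hand sed the same input, so the
output should be identical.

```shell
# Hypothetical test file, created only for this demonstration
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$sample"

# With cat: the file goes to standard output and is piped into sed
with_cat=$(cat "$sample" | sed -n '/beta/p')

# Without cat: sed opens the file itself
without_cat=$(sed -n '/beta/p' "$sample")

echo "$with_cat"      # beta
echo "$without_cat"   # beta

rm -f "$sample"
```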

> The first '/ is the beginning of the encapsulation of the regexp that
> is to be evaluated.

Actually, the first ' marks the beginning of the string that will
represent a sed program.

The / immediately following the ' indicates the beginning of the first
pattern.

> The . at the end of <programme start=. matches any character that comes
> after it until a new modifier is found. (I think)
>
> The * before and after *stop=.* is to match all characters on a line.

The . (dot) matches any single character (exactly once if it's not
followed by a modifier). The * tells sed to match the previous character
(in this case the dot, or "any character") any number of times (so it
could match 0 times, or one million, or anything in between).
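
A quick way to try the difference between a lone dot and dot-star, as
a sketch with three made-up input lines:

```shell
# '.' alone must match exactly one character between a and b
one=$(printf '%s\n' 'ab' 'axb' 'axyzb' | sed -n '/^a.b$/p')

# '.*' matches zero or more characters between a and b
any=$(printf '%s\n' 'ab' 'axb' 'axyzb' | sed -n '/^a.*b$/p')

echo "$one"   # only axb
echo "$any"   # ab, axb and axyzb
```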

> The rest I can't quite figure out, I think the next / after the stop
> closes the evaluation of the line.

Right, this particular flavour of sed expression works like this: The
first /[regular expression goes here]/ tells sed where to start. Once a
line matches it, the command that follows (here, p) is applied to that
line and to every line after it, until a line matches the next pattern.
The comma separates the patterns.

The second /[regular expression goes here]/ is the pattern that ends the
range. The line that matches it is the last one the command is applied
to.
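
Here is a small sketch of the two-pattern form on its own. BEGIN and
END are made-up marker lines, chosen only to make the range easy to
see:

```shell
# Print only the lines from the first match of /BEGIN/ through the
# next match of /END/, both included
range=$(printf '%s\n' 'before' 'BEGIN' 'inside' 'END' 'after' |
        sed -n '/BEGIN/,/END/p')

echo "$range"   # BEGIN, inside, END (one per line)
```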

> The forward slash inside the <\/programme> I have no idea what this means.

The back slash makes sure the forward slash isn't interpreted by sed
as the beginning or end of a pattern. ;]
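
If the escaping looks ugly, sed also lets you pick a different
delimiter for a pattern by starting it with a back slash, so the
forward slash needs no escape at all. A sketch with made-up input (the
choice of | as the delimiter is just for illustration):

```shell
# Escaped form: \/ is a literal forward slash inside /.../
escaped=$(printf '%s\n' '</programme>' 'other' |
          sed -n '/<\/programme>/p')

# Alternate delimiter: \| starts a pattern that ends at the next |
alternate=$(printf '%s\n' '</programme>' 'other' |
            sed -n '\|</programme>|p')

echo "$escaped"     # </programme>
echo "$alternate"   # </programme>
```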

> The p I also have no idea what this does.

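That prints the pattern space (the line sed is currently working on).
Because of the -n option, sed prints nothing unless a p tells it to.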
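
To see what p and -n do together, compare the same command with and
without -n (a sketch with two made-up input lines):

```shell
# Without -n, sed prints every line anyway, and p prints the
# matching line a second time
without_n=$(printf '%s\n' 'one' 'two' | sed '/two/p')

# With -n, only what p prints comes out
with_n=$(printf '%s\n' 'one' 'two' | sed -n '/two/p')

echo "$without_n"   # one, two, two
echo "$with_n"      # two
```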

> Where in all of this is it being told to delete and how does it know
> to just delete the line it is evaluating if the stop= isn't in there
> but not to if it is? Is this what the p does at the end of the line?
> And what is the significance of the back slash inside the
> <\/programme>?

Nothing is actually deleted. When sed encounters a line matching the
first pattern in the above command line it starts printing lines, and it
keeps printing them up to and including the line that matches the second
pattern. The output goes to standard output (which is redirected to a
file). Because of -n, every line outside such a range is simply not
printed. The range never starts for the entries that don't have a "stop"
in them, so those items don't get written to the new file.
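
Putting it together, here is a sketch you can run on a tiny test file.
The two programme entries below are invented for the demonstration
(your real file will look different); the sed pattern is the one from
your original command.

```shell
# Hypothetical two-entry listing: the first has a stop=, the second
# does not
listing=$(mktemp)
cat > "$listing" <<'EOF'
<programme start="20040226070000 AKST" stop="20040226080000 AKST" channel="2">
  <title>Morning News</title>
</programme>
<programme start="20040226080000 AKST" channel="2">
  <title>No Stop Time</title>
</programme>
EOF

# Only ranges whose first line has both start= and stop= are printed
kept=$(sed -n '/^<programme start=.*stop=.*/,/<\/programme>/p' "$listing")

echo "$kept"   # the three lines of the Morning News entry

rm -f "$listing"
```

If a run like this prints nothing at all, the first pattern is
probably not matching any line, so it is worth checking that the
start lines in the real file begin exactly the way the pattern says.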

Got to get to work, now.

Hope that helps...

Regards,

-- 
dave [ please don't CC me ]