Subject: RE: Help with sorting another file
From: Troy Melhase (tmelhase@gci.com)
Date: Thu Feb 26 2004 - 12:04:03 AKST
> The neat thing is that sed makes a great data extraction tool for XML.
> For functions like this I find using validating XML parsers to be a tad
> overkill.
The reason the original data is encoded as XML is (or should be) that the
author thinks the format will vary, and yet wants to provide something that
will not break external apps when the format changes.
Put another way, the person providing the data is saying "here's my stuff,
but don't count on it to always look like this." He or she might add an
attribute somewhere. Maybe add an element, remove an element, or change
its relative position in the data. He or she expects the way the data is
encoded to change.
(The other possibility is that the data author is using XML only because
it's nifty or because they've over-designed their app. Given the nature of
the data, I would guess they thought about what they're doing.)
So sed is a fine way to solve the problem today. Hopefully, or luckily, the
regexes will be constructed in such a way as to handle the future
modifications to the source format.
The obvious objection to using an XML parser or XSLT transformation is "the
sed solution is only one line, but if I have to parse or transform the XML,
that will be many more lines of code, custom tools, and a big pain in the
rear!" Which is all true, of course, but that cost is also an inherent facet
of the source data.
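To make the trade-off concrete, here is a rough sketch in Python. The sample
data, element names, and attribute names are all made up for illustration;
they are not the data from this thread. The regex stands in for what a
one-line sed expression would do, and the parser is the standard library's
non-validating ElementTree, so this is not the validating setup mentioned in
the quote.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are invented.
TODAY = (
    "<files>"
    "<file><name>b.txt</name></file>"
    "<file><name>a.txt</name></file>"
    "</files>"
)

# The same data after the author adds attributes and a new element.
LATER = (
    "<files>"
    '<file id="1"><size>10</size><name lang="en">b.txt</name></file>'
    '<file id="2"><name lang="en">a.txt</name></file>'
    "</files>"
)


def names_by_regex(text):
    # Roughly what a sed one-liner does: match the literal <name> tag.
    return sorted(re.findall(r"<name>([^<]*)</name>", text))


def names_by_parser(text):
    # Ask for every <name> element, whatever its attributes or position.
    root = ET.fromstring(text)
    return sorted(el.text for el in root.iter("name"))


if __name__ == "__main__":
    for label, data in (("today", TODAY), ("later", LATER)):
        print(label, names_by_regex(data), names_by_parser(data))
```

With the made-up data above, both functions agree on the first document. On
the second, the regex finds nothing because the literal `<name>` no longer
appears, while the parser still returns both names. A more careful regex
would survive this particular change, but each new variation in the format
needs another guess.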
If it's a pure one-off, that's different. But to expect the format to
remain constant when the author is using an extensible format is only
setting up your application to break.
Apologies for sounding like a software preacher.
troy