Regular expressions


Subject: Regular expressions
From: Barsalou (mbarsalou@aidea.org)
Date: Tue Mar 26 2002 - 22:04:55 AKST


This article came out of Tech Republic.

INTRODUCING REGULAR EXPRESSIONS

Regular expressions provide a means for matching strings and characters
in order to obtain only the desired information (for example, from a
large input list). Also called regexps, regular expressions can be very
powerful and complex; however, they don't need to be.

Regexp syntax is largely the same everywhere, but there are differences
between implementations. For instance, regexps used on the command line
with grep may be slightly different from those used by Perl. The following
are some of the character sequences used in regexps:

* . Matches any single character

* ^ Matches the empty string at the beginning of a line

* $ Matches the empty string at the end of a line

* \< Matches the empty string at the beginning of a word

* \> Matches the empty string at the end of a word

* ? Matches the preceding item zero or one time (the item is optional)

* * Matches the preceding item zero or more times

* + Matches the preceding item one or more times

* [x] Matches any single character listed in the brackets
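
The list above can be tried out directly on the command line. The
following is only a sketch: the sample words and the helper functions
"sample" and "match" are made up for illustration and are not part of the
original article. It assumes a grep that understands -E and the \< and \>
word anchors (GNU grep does).

```shell
# Hypothetical sample data, one entry per line.
sample() {
  printf '%s\n' ct cat cot coot 'the cat sat' concat
}

# Print the sample lines that match the extended regexp given as $1.
match() {
  sample | grep -E -- "$1"
}

match '^c.t$'     # .     exactly one character between "c" and "t"
match '^co?t$'    # ?     zero or one "o"
match '^co*t$'    # *     zero or more "o"
match '^co+t$'    # +     one or more "o"
match '^c[ao]t$'  # [ao]  either "a" or "o"
match '\<cat\>'   # \< \> "cat" as a whole word, so "concat" is skipped
```

The ^ and $ anchors pin each pattern to the whole line, which makes the
effect of the single character under test easier to see.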

If, for example, you had a list of files and wanted to find only those
files starting with a lowercase letter and ending in ".bz2" or
".gz", you might use:

# grep -E '\<[a-z].*\.(bz2|gz)\>' myfile

This executes grep against the file "myfile" with extended regexp
syntax enabled (the -E option). The regexp here is:

\<[a-z].*\.(bz2|gz)\>

This tells grep to look for a word that starts (\<) with a lowercase
letter ([a-z]), followed by any number of arbitrary characters (.*),
then a literal period and the string "bz2" OR "gz" (\.(bz2|gz)) at the
end of the word (\>). You'll also notice that, because the period is
itself a regexp metacharacter, it is 'escaped' with a backslash so that
it is matched literally.
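
To see what the escaped period buys you, the same regexp can be run
against a small listing. This is a sketch under stated assumptions: the
file names and the "listing" helper are hypothetical stand-ins for the
contents of "myfile", and LC_ALL=C is set only so that [a-z] means plain
lowercase ASCII letters regardless of the local language settings.

```shell
# Make [a-z] behave predictably (plain ASCII ordering).
export LC_ALL=C

# Hypothetical file listing standing in for the contents of "myfile".
listing() {
  printf '%s\n' backup.tar.gz notes.txt kernel.bz2 README \
                archive.gzip 2002.gz targz
}

# The regexp from the article: the period before bz2/gz is literal.
listing | grep -E '\<[a-z].*\.(bz2|gz)\>'

# Same regexp with the period left unescaped: it now matches any
# character, so "targz" is accepted as well.
listing | grep -E '\<[a-z].*.(bz2|gz)\>'
```

With this listing, the first command should print only backup.tar.gz and
kernel.bz2, while the second also lets targz through. Neither command
prints archive.gzip (the word does not end after "gz") or 2002.gz (no
word in it starts with a lowercase letter and ends in ".gz").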


