Re: bash syntax


Subject: Re: bash syntax
From: James Gibson (twistedhammer@subdimension.com)
Date: Sat Mar 30 2002 - 02:07:10 AKST


Nothing like a good puzzler at 1 in the morning... But it didn't keep me
up all night like I was afraid it would. =)

[Whoops! Spoke too soon. Was up til 3:30 typing this up. =) ]

Fair warning to those who value their sanity: This is, if anything, LESS
comprehensible than Bryan's original post. Don't blame me if you start
rambling about bash to your drinking buddies or dreaming in shell-script..

That being said, I have tried to make this as plain as possible, with
perhaps a touch of humour.

Oh.. and the [#] marks are references to the footnotes. (Yes. This is THAT
long...)

On Fri, 29 Mar 2002 bryan@ak.net wrote:
>
> I've got a question only a bash guru can answer. Fair warning --
> this is rather esoteric, so most of you can just 'delete' now.

I guess this makes me a bash guru then.. =) Odd.. when did THAT happen?!

> There's a command I like to use as a filter to escape single
> quote marks from shell interpretation. I use it so infrequently
> that each time I want it, I have to spend 5-10 minutes getting
> all the characters in the right place. So I decided I'd put it
> in an alias, or a simple shell script, to always have it available.
>
> First, I tried to alias the word quote to exactly the following:
> sed "s/'/\\\'/g"
> I tried a couple ways of putting the above in quotes, to make it
> a single string for alias. The best way seemed to be to use double
> quotes around the whole thing, and \ escape the existing two.
> Try it. Where the hell are those extra quotes coming from?
>
> Since that didn't work, I used a shell script, and it works nicely.
> I'm only asking now out of curiosity. I have one more question --
> in the original string, why is it neccessary to \ escape the second
> quote, but not the first one?
>
> I'm using bash 2.03.0(1), btw. If anyone's still reading at this
> point, I challenge you to make sense out of it. :)

I puzzled over this one for an agonising 20 minutes.. fingers flying over
the keyboard, I arrived at the answer!

Puzzle #1: 'Where did'' those quotes'' come from?!'

For those who walked in late the puzzle is this:

bash$ alias quote="sed \"s/'/\\\'/g\""
bash$ alias -p
alias quote='sed "s/'\''/\\'\''/g"'

so.. where did all the quotes come from?

The parser has un-nested the quotes.. spacing things out will make this
clearer.. I've added " _ " between the separate bits of the string:

alias quote='sed "s/' _ \' _ '/\\' _ \' _ '/g"'

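Notice that there are no more messy nested single quotes: each '\'' closes
the current single-quoted string, adds one backslash-escaped literal quote,
and opens a new single-quoted string. The string could be safely truncated
after any of the '' pairs or the \'s without leaving any dangling quotes..
also notice that the more malleable double-quotes are held safely within
single-quotes.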
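
A quick way to convince yourself that the un-nested form is the same string as
the one originally typed. This is only a sketch; the variable names nested and
unnested are made up for the illustration and have nothing to do with bash
itself:

```shell
# The alias body as it was typed (inside double quotes)...
nested="sed \"s/'/\\\'/g\""
# ...and as alias -p printed it (un-nested single quotes).
unnested='sed "s/'\''/\\'\''/g"'

# Print both, then compare them character for character.
printf '%s\n' "$nested"
printf '%s\n' "$unnested"
[ "$nested" = "$unnested" ] && echo "identical"
```

Both printf lines show sed "s/'/\\'/g", which is the text bash re-parses every
time the alias is used.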

Puzzle #2: Why doesn't this alias work right?

The easiest way to demonstrate this is to force bash to apply the
expansions that it does at run-time before hand so we can watch.. so,
assuming that we have our quote alias set as per the above example we do
this:

bash$ quote
 
Now press 'Ctrl-Alt-e' (readline's shell-expand-line). This causes bash to
perform almost all of its expansion on the current input line. So our line
is magically transformed into:

bash$ sed s/'/\'/g

Now it gets kind of hairy here. What you see is what gets passed, without
further expansion, to the interpretive portion of bash at which point
quotes are just another character [1]. However, sed does its own
expansion, but only of backslash-escaped characters [2].. so the command
that sed finally ends up running looks like this:

s/'/'/g

which simply replaces single quotes with single quotes... DOH!
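
The same thing can be watched from a script, without the key binding. A
minimal sketch, assuming that eval re-parsing the alias body is a fair
stand-in for what bash does when it expands the alias; show_args and body are
hypothetical names invented for this example:

```shell
# Print each argument on its own line, wrapped in brackets.
show_args() { for arg in "$@"; do printf '[%s]\n' "$arg"; done; }

# The alias body exactly as alias -p reports it.
body='sed "s/'\''/\\'\''/g"'

# eval re-parses the text, so the double quotes are processed once more.
# Prints [sed] and then [s/'/\'/g], the script sed really receives.
eval "show_args $body"

# Run it for real: the quote comes back unchanged (it's).
result=$(printf '%s\n' "it's" | eval "$body")
printf '%s\n' "$result"
```

The second bracketed line is the interesting one: only a single backslash
survives, and sed then treats \' in the replacement as a plain quote.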

What we really need here is an additional escaped backslash in that final
command. This is easier to add in to the un-nested code than the nested
version.. So THIS alias command does what you want it to do:

bash$ alias quote2='sed "s/'\''/\\\\'\''/g"'
bash$ quote2
which transforms to:
bash$ sed s/'/\\'/g

and performs as expected.
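
To check the repaired body end to end, here is a small sketch. It again uses
eval as a stand-in for alias expansion, and quote_fn is a hypothetical function
added only to show that a function (like the shell script Bryan ended up with)
skips one layer of quoting:

```shell
# The repaired alias body, as given to alias in single quotes.
fixed='sed "s/'\''/\\\\'\''/g"'

# Re-parse and run it on a sample line; each quote gains a backslash.
result=$(printf '%s\n' "it's" | eval "$fixed")
printf '%s\n' "$result"      # it\'s

# Hypothetical alternative: a function body is parsed only once,
# so there is no outer layer of single quotes to un-nest.
quote_fn() { sed "s/'/\\\\'/g"; }
result_fn=$(printf '%s\n' "it's" | quote_fn)
printf '%s\n' "$result_fn"   # it\'s
```

Whether the alias or the function is nicer is a matter of taste; the function
just has fewer backslashes to count.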

Puzzle #3: Why does only one ' need to be escaped?

I'll admit this one took me a while to figure out..
It has everything to do with sed and basically nothing to do with bash,
but I'll elaborate anyway. We are specifically looking at sed's "s"
command, and the man-page for sed shows this as the syntax:

s/regexp/replacement/

Basically what this says is that the first argument (everything between
the first and second slashes) is expected to be a regular expression, and
the second argument (everything between the second and third slashes) is
expected to be a string to replace matches to the first argument with
[3].

The trick is that the regexp and the string are treated completely
differently. In a regexp the \ character should only be used to escape
the smaller subset of the regexp's special characters, while in the string
the \ can escape anything. In fact you can remove one of the backslashes I
added into the alias above: inside double quotes bash only treats \ as an
escape before $, `, ", \ or a newline, so \\\' and \\\\' both reach sed as
\\'. That is to say that both of the following have the same effect:

alias quote3='sed "s/'\''/\\\\'\''/g"'
alias quote4='sed "s/'\''/\\\'\''/g"'

which produce the following:

sed s/'/\\'/g

as you can see, neither ' is escaped at this point.
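
A sketch that compares the two bodies directly, assuming once more that eval
models the alias expansion closely enough; show_args, q3, q4, args3 and args4
are names made up for the example:

```shell
# Print each argument on its own line, wrapped in brackets.
show_args() { for arg in "$@"; do printf '[%s]\n' "$arg"; done; }

# The two alias bodies, four backslashes versus three.
q3='sed "s/'\''/\\\\'\''/g"'
q4='sed "s/'\''/\\\'\''/g"'

# Capture the argument list each one hands to sed.
args3=$(eval "show_args $q3")
args4=$(eval "show_args $q4")

printf '%s\n' "$args3"       # [sed] then [s/'/\\'/g]
[ "$args3" = "$args4" ] && echo "same arguments"
```

The odd backslash in the three-backslash version sits in front of a ', which
is not special inside double quotes, so bash leaves that backslash alone.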

Puzzle #4: James's quandary

This leaves me with a puzzle of my own.. In playing with this I
found that

bash$ sed "s/\'/end/g"

will append the word 'end' to the end of every line.. which means that
"\'" as a regexp matches the end of the line. I can't come up with any
reason for this.. logical or otherwise. I will double check the man pages
in the morning, but I'm afraid I may have to start combing through
source-code to figure this one out.
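
One possible lead, assuming the sed in question is GNU sed: its manual lists
\` and \' among the regexp extensions, matching only at the start and at the
end of the pattern space respectively. Other sed implementations may treat \'
differently, so take this as a sketch to experiment with rather than a final
answer; end_demo and start_demo are just illustrative names:

```shell
# \' as a regexp: with GNU sed it matches at the end of the pattern space,
# so "end" is appended to every line (oneend, twoend).
end_demo=$(printf 'one\ntwo\n' | sed "s/\'/end/g")
printf '%s\n' "$end_demo"

# The companion operator \` matches at the start of the pattern space,
# so "start" is prepended (startone, starttwo).
start_demo=$(printf 'one\ntwo\n' | sed 's/\`/start/')
printf '%s\n' "$start_demo"
```

Since sed normally holds one line at a time in the pattern space, \' ends up
behaving much like $ here.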

Footnotes:
[1] While this is, in fact, the string that would have been passed
verbatim to the part that runs commands, pressing enter now will NOT have
the same effect. The reason is that although we just performed the
expansion stuff, bash will perform it AGAIN if you hit enter..

[2] This is not, strictly speaking, true. The & character is expanded, and
perhaps several others. But it's close enough to understand what we are
dealing with.

[3] The advanced student of the sed Dojo knows that the delimiting
characters need not be slashes. s#'#\\'#g and sA'A\\'Ag are equally
acceptable ways of writing the same command. Slashes are just the general
use defaults, but hashes are common when manipulating strings or regexps
with many slashes in them (e.g. file and directory listings).
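
Footnotes [2] and [3] are easy to try out. A small sketch, run from a script so
the double quotes are parsed exactly once; the variable names are invented for
the example:

```shell
input="it's"

# Footnote [3]: three delimiters, one command. Inside double quotes the
# four backslashes reach sed as two, i.e. s/'/\\'/g and friends.
with_slash=$(printf '%s\n' "$input" | sed "s/'/\\\\'/g")
with_hash=$(printf '%s\n' "$input" | sed "s#'#\\\\'#g")
with_A=$(printf '%s\n' "$input" | sed "sA'A\\\\'Ag")
printf '%s\n' "$with_slash" "$with_hash" "$with_A"   # it\'s three times

# Footnote [2]: & in the replacement stands for the matched text.
amp=$(printf '%s\n' "$input" | sed "s/'/[&]/g")
printf '%s\n' "$amp"                                 # it[']s
```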

++----------------

I hope this was informative, or at least mildly entertaining. And if
anyone figures out that \' regexp thing before I do you will have my
eternal gratitude..

Sincerely,
James Gibson


