Distclean part 2: some useful zsh tricks (Shallow Thoughts)

Akkana's Musings on Open Source Computing and Technology, Science, and Nature.

Fri, 04 Dec 2015

Distclean part 2: some useful zsh tricks

I wrote recently about a zsh shell function to run make distclean on a source tree even if something in autoconf is messed up. In order to save any arguments you've previously passed to configure or autogen.sh, my function parsed the arguments from a file called config.log.

But it might be a bit more reliable to use config.status -- I'm guessing this is the file that make uses when it finds it needs to re-run autogen.sh. However, the syntax in that file is more complicated, and parsing it taught me some useful zsh tricks.

I can see the relevant line from config.status like this:

$ grep '^ac_cs_config' config.status
ac_cs_config="'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"

--enable-foo --disable-bar are options I added purely for testing. I wanted to make sure my shell function would work with multiple arguments.

Ultimately, I want my shell function to call autogen.sh --prefix=/usr/local/gimp-git --enable-foo --disable-bar The goal is to end up with $args being a zsh array containing those three arguments. So I'll need to edit out those quotes and split the line into an array.

Sed tricks

The first thing to do is to get rid of that initial ac_cs_config= in the line from config.status. That's easy with sed:

$ grep '^ac_cs_config' config.status | sed -e 's/ac_cs_config=//'
"'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"

But since we're using sed anyway, there's no need to use grep to get the line: we can do it all with sed. First try:

sed -n '/^ac_cs_config/s/ac_cs_config=//p' config.status

Search for the line that starts with ac_cs_config (^ matches the beginning of a line); then replace ac_cs_config= with nothing, and p print the resulting line. -n tells sed not to print anything except when told to with a p.

But it turns out that if you give a sed substitution a blank pattern, it uses the last pattern it was given. So a more compact version, using the search pattern ^ac_cs_config, is:

sed -n '/^ac_cs_config=/s///p' config.status

But there's also another way of doing it:

sed '/^ac_cs_config=/!d;s///' config.status

! after a search pattern matches every line that doesn't match the pattern. d deletes those lines. Then for lines that weren't deleted (the one line that does match), do the substitution. Since there's no -n, sed will print all lines that weren't deleted.

I find that version more difficult to read. But I'm including it because it's useful to know how to chain several commands in sed, and how to use ! to search for lines that don't match a pattern.

You can also use sed to eliminate the double quotes:

sed '/^ac_cs_config=/!d;s///;s/"//g' config.status
'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
But it turns out that zsh has a better way of doing that.

Zsh parameter substitution

I'm still relatively new to zsh, but I got some great advice on #zsh. The first suggestion:

sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args

I'll be using final print -rl - $args for all these examples: it prints an array variable with one member per line. For the actual distclean function, of course, I'll be passing the variable to autogen.sh, not printing it out.

First, let's look at the heart of that expression: the args=( ${(Q)${(z)${(Q)REPLY}}}.

The heart of this is the expression ${(Q)${(z)${(Q)x}}} The zsh parameter substitution syntax is a bit arcane, but each of the parenthesized letters does some operation on the variable that follows.

The first (Q) strips off a level of quoting. So:

$ x='"Hello world"'; print $x; print ${(Q)x}
"Hello world"
Hello world

(z) splits an expression and stores it in an array. But to see that, we have to use print -l, so array members will be printed on separate lines.

$ x="a b c"; print -l $x; print "....."; print -l ${(z)x}
a b c
.....
a
b
c

Zsh is smart about quotes, so if you have quoted expressions it will group them correctly when assigning array members:

$ 
x="'a a' 'b b' 'c c'"; print -l $x; print "....."; print -l ${(z)x} 'a a' 'b b' 'c c' ..... 'a a' 'b b' 'c c'

So let's break down the larger expression: this is best read from right to left, inner expressions to outer.

${(Q) ${(z) ${(Q) x }}}
   |     |     |   \
   |     |     |    The original expression, 
   |     |     |   "'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'"
   |     |     \
   |     |      Strip off the double quotes:
   |     |      '--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'
   |     \
   |      Split into an array of three items
   \
    Strip the single quotes from each array member,
    ( --prefix=/usr/local/gimp-git --enable-foo --disable-bar )
Neat!

For more on zsh parameter substitutions, see the Zsh Guide, Chapter 5: Substitutions.

Passing the sed results to the parameter substitution

There's still a little left to wonder about in our expression, sed -n '/^ac_cs_config=/s///p' config.status | IFS= read -r; args=( ${(Q)${(z)${(Q)REPLY}}} ); print -rl - $args

The IFS= read -r seems to be a common idiom in zsh scripting. It takes standard input and assigns it to the variable $REPLY. IFS is the input field separator: you can split variables into words by spaces, newlines, semicolons or any other character you want. IFS= sets it to nothing. But because the input expression -- "'--prefix=/usr/local/gimp-git' '--enable-foo' '--disable-bar'" -- has quotes around it, IFS is ignored anyway.

So you can do the same thing with this simpler expression, to assign the quoted expression to the variable $x. I'll declare it a local variable: that makes no difference when testing it in the shell, but if I call it in a function, I won't have variables like $x and $args cluttering up my shell afterward.

local x=$(sed -n '/^ac_cs_config=/s///p' config.status); local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args

That works in the version of zsh I'm running here, 5.1.1. But I've been warned that it's safer to quote the result of $(). Without quotes, if you ever run the function in an older zsh, $x might end up being set only to the first word of the expression. Second, it's a good idea to put "local" in front of the variable; that way, $x won't end up being set once you've returned from the function. So now we have:

local x="$(sed -n '/^ac_cs_config=/s///p' config.status)"; local args=( ${(Q)${(z)${(Q)x}}} ); print -rl - $args

You don't even need to use a local variable. For added brevity (making the function even more difficult to read! -- but we're way past the point of easy readability), you could say:

args=( ${(Q)${(z)${(Q)"$(sed -n '/^ac_cs_config=/s///p' config.status)"}}} ); print -rl - $args
or even
print -rl - ${(Q)${(z)${(Q)"$(sed -n '/^ac_cs_config=/s///p' config.status)"}}}
... but that final version, since it doesn't assign to a variable at all, isn't useful for the function I'm writing.

Tags: , , , ,
[ 13:25 Dec 04, 2015    More linux/cmdline | permalink to this entry | ]

Comments via Disqus:

blog comments powered by Disqus