How can you parse (split, search) a string of text to find the last word, the second column, and so on? There are a lot of different ways. Pick the one that works best for you, or invent another one! (UNIX has lots of ways to work with strings of text.)
The expr command (45.28) can grab part of a string with a regular expression. The example below is from a shell script whose last command-line argument is a filename. The two commands use expr to grab the last argument and all arguments except the last one. The "$*" gives expr a list of all command-line arguments in a single word. (Using "$@" (44.15) here wouldn't work because it gives individually quoted arguments. expr needs all arguments in one word.)
last=`expr "$*" : '.* \(.*\)'`     # LAST ARGUMENT
first=`expr "$*" : '\(.*\) .*'`    # ALL BUT LAST ARGUMENT
Let's look at the regular expression that gets the last word. The leading part of the expression, .* and the space after it, matches as many characters as it can, followed by a space. This includes all words up to and including the last space. After that, the end of the expression, \(.*\), matches the last word.
The regular expression that grabs the first words is the same as the previous one, but I've moved the \( \) pair. Now it grabs all words up to but not including the last space. The end of the regular expression, a space followed by .*, matches the last space and last word, and expr ignores them. So the final .* really isn't needed here (though the space is). I've included that final .* because it follows from the first example.
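To try the two expr commands without writing a whole script, you can load the command-line parameters by hand with set. This is a minimal sketch: the arguments report, draft, and notes.txt are made-up stand-ins for a real command line (chosen without leading dashes, which some versions of expr may misread). It also checks that the final .* in the second pattern can be dropped as long as the space stays.

```shell
#!/bin/sh
# Hypothetical command line; the last argument plays the part of the filename.
set -- report draft notes.txt

# "$*" joins all the arguments into one word, separated by spaces.
last=`expr "$*" : '.* \(.*\)'`     # last argument
first=`expr "$*" : '\(.*\) .*'`    # all but the last argument

# Same result without the trailing .*; only the space is required.
first2=`expr "$*" : '\(.*\) '`

echo "last:  $last"
echo "first: $first"
```

Here $last holds notes.txt, and both $first and $first2 hold report draft.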
expr is great when you want to split a string into just two parts. The .* also makes expr good for skipping a variable number of words when you don't know how many words a string will have. But expr is lousy for getting, say, the fourth word in a string. And it's almost useless for handling more than one line of text at a time.
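Here's a sketch of both sides of that claim, using a made-up six-word string. The first pattern lets .* skip an unknown number of leading words to reach the next-to-last word. The second has to spell out one [^ ]* (a run of non-spaces) plus a space for every word it skips to reach the fourth word.

```shell
#!/bin/sh
str="one two three four five six"   # hypothetical sample string

# .* skips however many words come first, so this works for any word count:
# grab the next-to-last word.
next_to_last=`expr "$str" : '.* \([^ ]*\) [^ ]*$'`

# The fourth word needs one "[^ ]* " per skipped word; clumsy, and expr
# gives back an empty string if the line has fewer than four words.
fourth=`expr "$str" : '[^ ]* [^ ]* [^ ]* \([^ ]*\)'`

echo "next to last: $next_to_last"
echo "fourth: $fourth"
```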
awk can split lines into words. But awk has a lot of overhead and can take some time to execute, especially on a busy system. The cut (35.14) and colrm (35.15) commands start more quickly than awk but they can't do as much.
All of those utilities are designed to handle multiple lines of text. You can tell awk to handle a single line with its pattern-matching operators and its NR variable. You can also run those utilities with a single line of text, fed to the standard input through a pipe from echo (8.6). For example, to get the third field from a colon-separated string:
string="this:is:just:a:dummy:string"
field3_awk=`echo "$string" | awk -F: '{print $3}'`
field3_cut=`echo "$string" | cut -d: -f3`
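When the text has more than one line, awk's NR variable or a pattern can pick out the line you want before splitting it. In this sketch the two-line value of lines is invented for illustration.

```shell
#!/bin/sh
# Hypothetical two-line sample, separated by a real newline.
lines="alpha:beta:gamma
delta:epsilon:zeta"

# NR == 2 limits awk to the second input line; -F: splits it on colons.
field3_line2=`echo "$lines" | awk -F: 'NR == 2 {print $3}'`

# A pattern does the same job by content instead of by line number.
field2_delta=`echo "$lines" | awk -F: '/^delta:/ {print $2}'`

echo "$field3_line2 $field2_delta"
```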
Let's combine two echo commands. One sends text to awk, cut, or colrm through a pipe; the utility ignores all the text from columns 1-24, then prints columns 25 to the end of the value of $text. The outer echo prints "The answer is" followed by that answer. Notice that the inner double quotes are escaped with backslashes to keep the Bourne shell from interpreting them before the inner echo runs:
echo "The answer is `echo \"$text\" | awk '{print substr($0,25)}'`"
echo "The answer is `echo \"$text\" | cut -c25-`"
echo "The answer is `echo \"$text\" | colrm 1 24`"
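To see those commands produce something, give text a value. In this sketch the sample string is hypothetical, laid out so the answer starts in column 25. It uses the awk and cut versions and saves each message in a variable instead of printing it directly; colrm 1 24 should give the same result on systems that have colrm.

```shell
#!/bin/sh
# Hypothetical sample: columns 1-24 hold a label, the answer starts in column 25.
text="Result of the long job: 42"

# Same nested-echo form as above; \" becomes " inside the backquotes.
msg_awk="The answer is `echo \"$text\" | awk '{print substr($0,25)}'`"
msg_cut="The answer is `echo \"$text\" | cut -c25-`"

echo "$msg_awk"
echo "$msg_cut"
```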
The Bourne shell set (44.19) command can be used to parse a single-line string and store it in the command-line parameters (44.15) "$@", $*, $1, $2, and so on. Then you can also loop through the words with a for loop (44.16) and use everything else the shell has for dealing with command-line parameters. Also, you can set the IFS variable (35.21) to control how the shell splits the string.
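Here's a minimal sketch of set and IFS working together on the colon-separated string from the awk and cut example. It saves and restores IFS so later commands split on whitespace again, and it uses set -- so a string that starts with a dash can't be taken as an option to set. The unquoted $string is also subject to filename wildcard expansion, which doesn't matter for this sample string.

```shell
#!/bin/sh
string="this:is:just:a:dummy:string"

oldifs=$IFS      # save the current field separators
IFS=:            # split on colons only
set -- $string   # unquoted on purpose: the shell splits it into $1, $2, ...
IFS=$oldifs      # restore normal splitting

field3=$3        # third field: "just"
count=$#         # number of fields: 6

# "for word" with no "in" list loops through "$@"
for word
do
    echo "field: $word"
done
```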
The UNIX sed (34.24) utility is good at parsing input that you may or may not be able to split into words otherwise, at finding a single line of text in a group and outputting it, and many other things. In this example, I want to get the percentage-used of the filesystem mounted on /home. That information is buried in the output of the df (24.9) command. On my system, df output looks like:
% df
Filesystem            kbytes    used   avail capacity  Mounted on
   ...
/dev/sd3c            1294854  914230  251139    78%    /work
/dev/sd4c             597759  534123    3861    99%    /home
   ...
I want the number 99 from the line ending with /home. The sed address / \/home$/ will find that line (including a space before the /home makes sure the address doesn't match a line ending with /something/home). The -n option keeps sed from printing any lines except the line we ask it to print (with its p command). I know that the "capacity" is the only word on the line that ends with a percent sign (%). A space after the first .* makes sure that .* doesn't "eat" the first digit of the number that we want to match with [0-9][0-9]*. The sed escaped-parenthesis operators (34.10) grab that number. Here goes:
usage=`df | sed -n '/ \/home$/s/.* \([0-9][0-9]*\)%.*/\1/p'`
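That command needs a df with a filesystem mounted on /home. To try the sed part anywhere, this sketch substitutes a shell function, fake_df (a hypothetical name), that prints the sample df output shown above without the "..." lines.

```shell
#!/bin/sh
# fake_df is a hypothetical stand-in for df: it prints the sample output
# so the sed command can be tried on any system.
fake_df() {
    cat <<'EOF'
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/sd3c            1294854  914230  251139    78%    /work
/dev/sd4c             597759  534123    3861    99%    /home
EOF
}

# -n prints only what p asks for; the address picks the line ending in " /home";
# the space before \( keeps the greedy .* from eating the first digit.
usage=`fake_df | sed -n '/ \/home$/s/.* \([0-9][0-9]*\)%.*/\1/p'`
echo "/home is $usage% full"
```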
Combining sed with eval (8.10) lets you set several shell variables at once from parts of the same line. Here's a command line that sets two shell variables from the df output:
eval `df | sed -n '/ \/home$/s/^[^ ]* *\([0-9]*\) *\([0-9]*\).*/kb=\1 u=\2/p'`
The left-hand side of that substitution command has a regular expression that uses sed's escaped parenthesis operators. They grab the "kbytes" and "used" columns from the df output. The right-hand side outputs the two df values with Bourne shell variable-assignment commands to set the kb and u variables. After sed finishes, the resulting command line looks like this:
eval kb=597759 u=534123
Now $kb contains 597759 and $u contains 534123.
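The same stand-in makes the eval technique easy to test. In this sketch fake_df is again a hypothetical helper that prints the sample output. The substitution is extended with a third pair of escaped parentheses to capture the "avail" column as well, in a variable named av (also a made-up name).

```shell
#!/bin/sh
# Hypothetical stand-in for df, printing the sample output from the text.
fake_df() {
    cat <<'EOF'
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/sd3c            1294854  914230  251139    78%    /work
/dev/sd4c             597759  534123    3861    99%    /home
EOF
}

# sed rewrites the /home line as "kb=... u=... av=..."; eval runs those assignments.
eval `fake_df | sed -n '/ \/home$/s/^[^ ]* *\([0-9]*\) *\([0-9]*\) *\([0-9]*\).*/kb=\1 u=\2 av=\3/p'`

echo "kbytes=$kb used=$u avail=$av"
```

Each extra column costs one more " *\([0-9]*\)" on the left-hand side and one more name=\N on the right.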