In making global replacements, UNIX editors such as vi allow you to search not just for fixed strings of characters, but also for variable patterns of words, referred to as regular expressions.
When you specify a literal string of characters, the search
might turn up other occurrences that you didn't want to match.
The problem with searching for words in a file is that a word
can be used in different ways.
Regular expressions
help you conduct a search for words in context.
Note that regular expressions can be used with the vi search commands / and ?, as well as in the ex :g and :s commands. For the most part, the same regular expressions work with other UNIX programs such as grep, sed, and awk.
Regular expressions are made up by combining normal characters with a number of special characters called metacharacters. The metacharacters and their uses are listed below.
.
Matches any single character except a newline (carriage return). Remember that spaces are treated as characters. For example, p.p matches character strings such as pep, pip, and pcp.
*
Matches any number (or none) of the single character that immediately precedes it. For example, bugs* will match bugs (one s) or bug (no s's).
The character preceding the * can be one that is specified by a regular expression. For example, since . (dot) means any character, .* means "match any number of any character." Here's a specific example of this. The command :s/End.*/End/ removes all characters after End (it replaces the remainder of the line with nothing).
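The same . and * notation exists in Python's re module, so the examples above can be tried outside the editor. This is a minimal sketch, assuming Python 3; re.sub with count=1 stands in for ex's :s, which by default changes only the first match on a line.

```python
import re

# "." matches any single character except a newline; a space counts too.
dot_hits = re.findall(r"p.p", "pep pip pcp p p")
print(dot_hits)  # ['pep', 'pip', 'pcp', 'p p']

# "*" repeats only the character right before it (here "s"), zero or more times.
star = re.compile(r"bugs*")
star_hits = [w for w in ["bug", "bugs", "bugss", "bu"] if star.fullmatch(w)]
print(star_hits)  # ['bug', 'bugs', 'bugss']

# ".*" is "any number of any character": a Python analogue of :s/End.*/End/.
trimmed = re.sub(r"End.*", "End", "The End of the story, The End", count=1)
print(trimmed)  # The End
```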
^
Requires that the following regular expression be found at the beginning of the line; for example, ^Part matches Part when it occurs at the beginning of a line, and ^... matches the first three characters of a line.
$
Requires that the preceding regular expression be found at the end of the line; for example, here:$ matches here: only when it occurs at the end of a line.
\
Treats the following special character as an ordinary character. For example, \. matches an actual period instead of "any single character," and \* matches an actual asterisk instead of "any number of a character." The \ (backslash) prevents the interpretation of a special character. This prevention is called "escaping the character."
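A comparable sketch for ^, $, and \ in Python's re module, assuming Python 3. vi matches one line at a time; passing re.MULTILINE makes Python's ^ and $ anchor at every line of a multi-line string, which approximates that behavior.

```python
import re

text = "Part one\nSee Part two\nwrite here:\nhere: not last"

# With re.MULTILINE, ^ and $ anchor at the start and end of each line.
starts = re.findall(r"^Part", text, flags=re.MULTILINE)
ends = re.findall(r"here:$", text, flags=re.MULTILINE)
first_three = re.findall(r"^...", text, flags=re.MULTILINE)
print(starts)       # ['Part']
print(ends)         # ['here:']
print(first_three)  # ['Par', 'See', 'wri', 'her']

# A backslash escapes a metacharacter: "\." is a literal period,
# while an unescaped "." matches any character (here, the "2").
escaped = re.findall(r"1\.5", "1.5 and 125")
unescaped = re.findall(r"1.5", "1.5 and 125")
print(escaped, unescaped)  # ['1.5'] ['1.5', '125']
```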
[ ]
Matches any one of the characters enclosed between the brackets. For example, [AB] matches either A or B, and p[aeiou]t matches pat, pet, pit, pot, or put.
A range of consecutive characters can be specified by separating the first and last characters in the range with a hyphen. For example, [A-Z] will match any uppercase letter from A to Z, and [0-9] will match any digit from 0 to 9.
You can include more than one range inside brackets, and you can specify a mix of ranges and separate characters. For example, [:;A-Za-z()] will match four different punctuation marks, plus all letters.
Most metacharacters lose their special meaning inside brackets, so you don't need to escape them if you want to use them as ordinary characters. Within brackets, the three metacharacters you still need to escape are \, -, and ]. (The hyphen (-) acquires meaning as a range specifier; to use an actual hyphen, you can also place it as the first character inside the brackets.)
A caret (^) has special meaning only when it is the first character inside the brackets, but in this case the meaning differs from that of the normal ^ metacharacter. As the first character within brackets, a ^ reverses their sense: the brackets will match any one character not in the list. For example, [^a-z] matches any character that is not a lowercase letter.
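Bracket expressions behave the same way in Python's re module, so the cases above can be checked with a short sketch (assuming Python 3; the sample strings are made up for illustration).

```python
import re

# Any one of the listed characters.
vowels = re.findall(r"p[aeiou]t", "pat pet pit pot put pyt")
print(vowels)  # ['pat', 'pet', 'pit', 'pot', 'put']

# A mix of single characters and ranges: four punctuation marks plus letters.
mixed = re.findall(r"[:;A-Za-z()]", "f(x): 1;")
print(mixed)  # ['f', '(', 'x', ')', ':', ';']

# Ranges: an uppercase letter followed by a digit.
caps_digits = re.findall(r"[A-Z][0-9]", "A1 b2 C3 D")
print(caps_digits)  # ['A1', 'C3']

# A hyphen placed first inside the brackets is an ordinary character.
signs = re.findall(r"[-+]", "3-2+1")
print(signs)  # ['-', '+']

# A leading caret negates the list: anything except a lowercase letter.
non_lower = re.findall(r"[^a-z]", "abC1d")
print(non_lower)  # ['C', '1']
```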
\( \)
Saves the pattern enclosed between \( and \) into a special holding space or "hold buffer." Up to nine patterns can be saved in this way on a single line. For example, the pattern:
\(That\) or \(this\)
saves That in hold buffer number 1 and saves this in hold buffer number 2. The patterns held can be "replayed" in substitutions by the sequences \1 to \9. For example, to rephrase That or this to read this or That, you could enter:
:%s/\(That\) or \(this\)/\2 or \1/
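Python's re module has the same idea, with one notational twist worth knowing: it writes a saved group as (That) rather than vi's \(That\), while the replacement still refers to groups as \1 through \9. A minimal sketch, assuming Python 3:

```python
import re

# vi: \(That\) or \(this\)   Python: (That) or (this)
m = re.search(r"(That) or (this)", "That or this")
print(m.group(1), m.group(2))  # That this

# Replay the saved groups in swapped order, as in :%s/\(That\) or \(this\)/\2 or \1/
swapped = re.sub(r"(That) or (this)", r"\2 or \1", "That or this")
print(swapped)  # this or That
```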
\< \>
Matches characters at the beginning (\<) or at the end (\>) of a word. The end or beginning of a word is determined either by a punctuation mark or by a space. For example, the expression \<ac will match only words that begin with ac, such as action. The expression ac\> will match only words that end with ac, such as maniac. Neither expression will match react.
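Python's re module has no \< or \>; the closest substitute is \b, which matches at either edge of a word. Note that \b treats letters, digits, and underscores as word characters, which is not quite the punctuation-or-space rule described above. A sketch under those assumptions:

```python
import re

words = "action maniac react"

# Roughly \<ac : "ac" right after a word boundary, then the rest of the word.
starts_with_ac = re.findall(r"\bac\w*", words)
# Roughly ac\> : the word up to an "ac" that is followed by a word boundary.
ends_with_ac = re.findall(r"\w*ac\b", words)

print(starts_with_ac)  # ['action']
print(ends_with_ac)    # ['maniac']
```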
~
Matches whatever regular expression was used in the last search. For example, if you searched for The, you could search for Then with /~n. Note that you can use this pattern only in a regular search (with /). It won't work as the pattern in a substitute command. It does, however, have a similar meaning in the replacement portion of a substitute command.
When you make global replacements, the regular expressions above carry their special meaning only within the search portion (the first part) of the command. For example, when you type this:
:%s/1\. Start/2. Next, start with $100/
note that the replacement string treats the characters . and $ as ordinary characters, without your having to escape them.
By the same token, let's say you enter:
:%s/[ABC]/[abc]/g
If you're hoping to replace A with a, B with b, and C with c, you're in for a surprise. Since brackets behave like ordinary characters in a replacement string, this command will change every occurrence of A, B, or C to the five-character string [abc].
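Python's re.sub shows the same surprise, because brackets are ordinary characters in its replacement strings too. One way it offers a variable replacement is to pass a function instead of a string; the function receives each match and returns the text to insert. A minimal sketch, assuming Python 3:

```python
import re

text = "A B C"

# Brackets in the replacement are literal: every match becomes "[abc]".
surprise = re.sub(r"[ABC]", "[abc]", text)
print(surprise)  # [abc] [abc] [abc]

# Compute each replacement from the matched character instead.
fixed = re.sub(r"[ABC]", lambda m: m.group(0).lower(), text)
print(fixed)  # a b c
```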
To solve problems like this, you need a way to specify variable replacement strings. Fortunately, there are additional regular expressions that have special meaning in a replacement string.
\n
Inserts the text matched by the nth pattern previously saved by \( and \), where n is a number from 1 to 9, and previously saved patterns are counted from the left on the line. See the explanation for \( and \) in the previous section.
\
Treats the following special character as an ordinary character. Backslashes are metacharacters in replacement strings as well as in search patterns. To specify a real backslash, type two in a row (\\).
&
Stands for the entire text matched by the search pattern when used in a replacement string. This is useful when you want to avoid retyping text:
:%s/Yazstremski/&, Carl/
The replacement will say Yazstremski, Carl. The & can also replace a variable pattern (as specified by a regular expression). For example, to surround each line from 1 to 10 with parentheses, type:
:1,10s/.*/(&)/
The search pattern matches the whole line, and the & "replays" the line, enclosed in the parentheses you typed.
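In Python's re module, the counterpart of & in a replacement string is \g<0>, the whole match. This sketch, assuming Python 3, mirrors both examples above; the list of lines is a stand-in for lines 1 to 10 of a buffer.

```python
import re

# Like :%s/Yazstremski/&, Carl/
named = re.sub(r"Yazstremski", r"\g<0>, Carl", "Yazstremski")
print(named)  # Yazstremski, Carl

# Like :1,10s/.*/(&)/ applied line by line. count=1 stops Python from also
# replacing the empty match that ".*" finds at the end of the string.
lines = ["first line", "second line"]
wrapped = [re.sub(r".*", r"(\g<0>)", line, count=1) for line in lines]
print(wrapped)  # ['(first line)', '(second line)']
```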
~
Has a meaning similar to the one it has in a search pattern: the string found is replaced with the replacement text specified in the last substitute command. This is useful for repeating an edit. For example, you could say :s/thier/their/ on one line and repeat the change on another with :s/thier/~/.
The search pattern doesn't need to be the same, though. For example, you could say :s/his/their/ on one line and repeat the replacement on another with :s/her/~/.
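Python's re module has no equivalent of ~, because it keeps no memory between calls. The mechanism is easy to picture with a small stateful wrapper, though. Substituter below is a hypothetical helper written for illustration, not part of any library, and it deliberately ignores escaped tildes, &, and backreferences.

```python
import re


class Substituter:
    """Hypothetical helper mimicking ex's :s with "~" in the replacement.

    A "~" in the replacement stands for the previous replacement text.
    Simplified: escaped tildes, "&", and backreferences are not handled.
    """

    def __init__(self):
        self.last_replacement = ""

    def substitute(self, line, pattern, replacement):
        # Expand "~" to the previous replacement, then remember the result.
        expanded = replacement.replace("~", self.last_replacement)
        self.last_replacement = expanded
        # A function inserts the expanded text literally; count=1 changes
        # only the first match, as :s does without the g flag.
        return re.sub(pattern, lambda m: expanded, line, count=1)


ed = Substituter()
first = ed.substitute("his book", "his", "their")
second = ed.substitute("her pen", "her", "~")
print(first)   # their book
print(second)  # their pen
```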
\u or \l
Causes the next character in the replacement string to be changed to uppercase or lowercase, respectively. For example, to change yes, doctor into Yes, Doctor, you could say:
:%s/yes, doctor/\uyes, \udoctor/
This is a pointless example, though, since it's easier just to type the replacement string with initial caps in the first place. As with any regular expression, \u and \l are most useful with a variable string. Take, for example, the command we used earlier:
:%s/\(That\) or \(this\)/\2 or \1/
The result is this or That, but we need to adjust the cases. We'll use \u to uppercase the first letter in this (currently saved in hold buffer 2); we'll use \l to lowercase the first letter in That (currently saved in hold buffer 1):
:s/\(That\) or \(this\)/\u\2 or \l\1/
The result is This or that. (Don't confuse the number one with the lowercase l; the one comes after.)
\U or \L
Similar to \u or \l, but all following characters are converted to uppercase or lowercase until the end of the replacement string or until \e or \E is reached. If there is no \e or \E, all remaining characters of the replacement text are affected by the \U or \L.
For example, to uppercase Fortran, you could say:
:%s/Fortran/\UFortran/
or, using the &
character to repeat the search string:
:%s/Fortran/\U&/
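Python's re replacement strings do not support \u, \l, \U, or \L, so a Python version of these edits has to compute the replacement in a function. A sketch, assuming Python 3; swap_and_fix_case is a hypothetical name chosen for this example. Slicing is used instead of str.capitalize because capitalize would also lowercase the rest of the word.

```python
import re


def swap_and_fix_case(text):
    """Mirror :s/\\(That\\) or \\(this\\)/\\u\\2 or \\l\\1/ (hypothetical helper)."""
    def repl(m):
        first, second = m.group(1), m.group(2)
        # \u\2: uppercase only the first letter of group 2.
        # \l\1: lowercase only the first letter of group 1.
        return second[:1].upper() + second[1:] + " or " + first[:1].lower() + first[1:]
    return re.sub(r"(That) or (this)", repl, text, count=1)


swapped = swap_and_fix_case("That or this")
print(swapped)  # This or that

# Like :%s/Fortran/\U&/ : uppercase the whole matched text.
shouted = re.sub(r"Fortran", lambda m: m.group(0).upper(), "Fortran code")
print(shouted)  # FORTRAN code
```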
All pattern searches are case-sensitive. That is, a search for the will not find The. You can get around this by specifying both uppercase and lowercase in the pattern:
/[Tt]he
You can also instruct vi to ignore case by typing :set ic. See Chapter 7, Advanced Editing, for additional details.
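The same two approaches carry over to Python's re module: list both cases in brackets, or pass re.IGNORECASE, which plays roughly the role of :set ic. A minimal sketch, assuming Python 3 and a made-up sample sentence:

```python
import re

text = "The cat saw the dog. Then it left."

case_sensitive = re.findall(r"the", text)
bracketed = re.findall(r"[Tt]he", text)
ignoring = re.findall(r"the", text, flags=re.IGNORECASE)

print(case_sensitive)  # ['the']
print(bracketed)       # ['The', 'the', 'The']
print(ignoring)        # ['The', 'the', 'The']
```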