The UNIX formatter nroff produces output for line printers and CRT displays. To achieve such special effects as emboldening, it outputs the character followed by a backspace and then outputs the same character again. A sample of it viewed with a text editor or cat -v (25.7) might look like:
N^HN^HN^HNA^HA^HA^HAM^HM^HM^HME^HE^HE^HE
which emboldens the word "NAME." There are three overstrikes for each character output. Similarly, underlining is achieved by outputting an underscore, a backspace, and then the character to be underlined. Some pagers, such as less (25.4), take advantage of overstruck text. But there are many times when it's necessary to strip these special effects; for example, if you want to grep through formatted man pages (as we do in article 50.3). There are a number of ways to get rid of these decorations. The easiest way to do it is to use a utility like col, colcrt, or ul:
% col -b < nroffoutput > strippedoutput
The -b option tells col to strip all backspaces (and the character preceding each backspace) from the file. col doesn't read from files; you need to redirect input from a pipe or, as above, with the shell < (13.1) file-redirection character.
col is available on System V and BSD UNIX. Under System V, add the -x option to avoid changing spaces to TABs.
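The effect of col -b on the bold and underline sequences can be modeled in a few lines. The sketch below is our own approximation in Python, not col's source: it treats a backspace as "move back one column" and keeps the last character written to each column. The function name col_b_line is hypothetical, and the sketch ignores tabs, half linefeeds, and the other cursor motions real col handles.

```python
def col_b_line(line: str) -> str:
    """Approximate `col -b` on one line (without its newline): keep only
    the last character written to each column position."""
    cells: list[str] = []  # one entry per output column
    pos = 0                # current column
    for ch in line:
        if ch == "\b":
            pos = max(pos - 1, 0)  # back up one column, never past column 0
            continue
        if pos < len(cells):
            cells[pos] = ch        # overstrike: the later character wins
        else:
            cells.append(ch)       # first character written in this column
        pos += 1
    return "".join(cells)


if __name__ == "__main__":
    print(col_b_line("N\bN\bN\bNA\bA\bA\bAM\bM\bM\bME\bE\bE\bE"))  # NAME
    print(col_b_line("_\bf_\bo_\bo"))                           # foo
```

To filter a whole file you would loop over sys.stdin and write col_b_line(line.rstrip("\n")) for each line, but in practice col -b itself is the simpler choice.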
With colcrt, use a command like:
% colcrt - nroffoutput > strippedoutput
The - (dash) option (yes, that's an option) says "ignore underlining." If you omit it, colcrt tries to save underlining by putting the underscores on a separate line. For example:
Refer to Installing System V for information about
         ---------- ------ -
installing optional software.
colcrt is only available under BSD; in any case, col is probably preferable.
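The way colcrt moves underlining onto a line of its own, as in the example above, can be sketched the same way. This is a rough Python model of that output, not colcrt's real algorithm; colcrt_line is a hypothetical name, and the sketch assumes underlining arrives as underscore, backspace, character, which is how nroff writes it.

```python
def colcrt_line(line: str) -> tuple[str, str]:
    """Return (text, marks): the line with backspaces resolved, and a
    second line with '-' under each character that was underlined.
    Assumes the underscore is written before the character it underlines."""
    cells: list[str] = []
    underlined: list[bool] = []
    pos = 0
    for ch in line:
        if ch == "\b":
            pos = max(pos - 1, 0)  # back up one column
            continue
        if pos < len(cells):
            if cells[pos] == "_" and ch != "_":
                underlined[pos] = True  # an underscore was overstruck by ch
            cells[pos] = ch
        else:
            cells.append(ch)
            underlined.append(False)
        pos += 1
    marks = "".join("-" if u else " " for u in underlined).rstrip()
    return "".join(cells), marks


if __name__ == "__main__":
    text, marks = colcrt_line("see _\bf_\bo_\bo now")
    print(text)  # see foo now
    print(marks)
```

Running colcrt with the - option corresponds, in this model, to printing only the text and discarding the marks line.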
ul reads your TERM environment variable and tries to translate backspaces (underlining and overstriking) into something your terminal can understand. It's used like this:
% ul nroffoutput
The -t term option lets you specify a terminal type; it overrides the TERM (5.10) variable.
I think that ul is probably the least useful of these commands; it tries to be too intelligent, and doesn't always do what you want.
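The idea behind ul, turning overstrikes into whatever your terminal uses for bold and underline, can also be sketched. This version is a deliberate simplification: instead of looking up the terminal's capabilities from TERM as ul does, it assumes an ANSI-compatible terminal and hard-codes the SGR sequences ESC[1m (bold), ESC[4m (underline), and ESC[0m (reset). The name to_ansi is hypothetical.

```python
import re

# Assumption: an ANSI-compatible terminal. Real ul consults the terminal
# description selected by TERM instead of hard-coding these sequences.
BOLD, UNDERLINE, RESET = "\x1b[1m", "\x1b[4m", "\x1b[0m"

UNDER_SEQ = re.compile(r"_\x08(.)")       # underscore, backspace, character
BOLD_SEQ = re.compile(r"(.)(?:\x08\1)+")  # character, then (backspace, same character)+


def to_ansi(line: str) -> str:
    """Replace nroff overstrike sequences with ANSI bold/underline codes."""
    line = UNDER_SEQ.sub(lambda m: UNDERLINE + m.group(1) + RESET, line)
    return BOLD_SEQ.sub(lambda m: BOLD + m.group(1) + RESET, line)


if __name__ == "__main__":
    print(to_ansi("N\bN\bN\bNA\bA\bA\bAM\bM\bM\bME\bE\bE\bE"))
    print(to_ansi("_\bf_\bo_\bo"))
```

Hard-coding the codes is exactly the kind of shortcut ul avoids; it works on most modern terminal emulators but not on terminals with different conventions.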
Both col and colcrt attempt to handle "half linefeeds" (used to print superscripts and subscripts) reasonably. Many printers handle half linefeeds correctly, but most terminals can't deal with them.
Here's one other solution to the problem: a simple sed (34.24) script. The virtue of this solution is that you can elaborate on it, adding other features that you'd like, or integrating it into larger sed scripts. The following sed command removes the sequences for emboldening and underscoring:
s/.^H//g
It removes any character preceding the backspace along with the backspace itself. In the case of underlining, "." matches the underscore; for emboldening, it matches the overstrike character. Because the g flag makes the substitution apply to every match on the line, the repeated overstrike characters are all removed, leaving a single character for each sequence.
Note that ^H is the single character CTRL-h. If you're a vi user, enter this character by typing CTRL-v followed by CTRL-h (31.6). If you're an emacs user, type CTRL-q followed by CTRL-h (32.10).
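If you would rather do this cleanup from a script than from sed, the same substitution carries over almost unchanged. In the minimal Python sketch below (strip_overstrikes is a hypothetical name), \x08 stands for the CTRL-h character, and re.sub, like sed's g flag, replaces every non-overlapping match on the line.

```python
import re

# Equivalent of the sed command s/.^H//g: any character followed by
# a backspace (\x08, CTRL-h) is deleted along with the backspace.
OVERSTRIKE = re.compile(r".\x08")


def strip_overstrikes(line: str) -> str:
    """Remove bold and underline overstrike sequences from one line."""
    return OVERSTRIKE.sub("", line)


if __name__ == "__main__":
    print(strip_overstrikes("N\bN\bN\bNA\bA\bA\bAM\bM\bM\bME\bE\bE\bE"))  # NAME
    print(strip_overstrikes("_\bf_\bo_\bo"))                           # foo
```

As with the sed version, you can elaborate on it, for example by chaining further re.sub calls for other cleanups before each line is printed.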