[Chapter 45] 45.22 Handling Files Line-by-Line

It isn't easy to see how to read a file line-by-line in a shell script. And while you can write a file line-by-line by using the file-appending operator >> (two right angle brackets) with each command that should add to the file, there's a more efficient way to do that as well.

The trick is to open the file and associate a file descriptor number (3, 4, ..., 9) with it. UNIX keeps a file pointer, like a bookmark in a book, that tells it where the next read or write should be in each open file. For example, if you open a file for reading and read the first line, the file pointer will be left at the start of the second line. The next read from that same open file will return the second line and move the pointer to the start of the third line. This trick only works with files that stay open; each time you open a file, the file pointer is set to the start of the file. [1]

The Bourne shell exec command (45.7) can open a file and associate a file descriptor with it. For example, this exec command makes the standard input of all following commands come from the file formfile:

[1] The file-appending operator >> sets the pointer to the end of the file before the first write.

...all commands read their stdin from default place
exec < formfile
   ...all commands will read their stdin from formfile
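The bookmark behavior is easy to try. Here is a minimal sketch, assuming a POSIX-style shell whose read accepts a redirection such as <&3 (not every old Bourne shell does). The scratch file name and the variable names are made up for the demonstration:

```shell
#!/bin/sh
# Sketch: a file kept open on descriptor 3 remembers its position.
demo=${TMPDIR:-/tmp}/fpdemo.$$        # hypothetical scratch file
printf 'first\nsecond\nthird\n' > "$demo"

exec 3< "$demo"       # open once; the pointer starts at the top of the file
read line1 <&3        # reads "first"
read line2 <&3        # continues where the last read stopped
exec 3<&-             # close descriptor 3

# Reopening the file for each read resets the pointer to the start:
read again1 < "$demo"
read again2 < "$demo"

echo "same open file: $line1, $line2"
echo "reopened:       $again1, $again2"
rm -f "$demo"
```

The two reads through descriptor 3 return successive lines; the two reads that reopen the file both return the first line.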

There's another way to rearrange file descriptors: put the redirection on the last line of a while loop, an if statement, or a case statement. For example, all commands in the while loop below will take their standard input from the file formfile. The standard input outside the while loop isn't changed:

...all commands read their stdin from default place
while ...
do
   ...all commands will read their stdin from formfile
done < formfile
   ...all commands read their stdin from default place

I call those "redirected-I/O loops." Those and other Bourne shell structures have some problems (45.23), but they're usually worth the work to solve.
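As a concrete version of the loop outline above, this sketch feeds a pipe to a group of commands and redirects only the loop from a file. It assumes a POSIX shell (for the $(...) substitution); the file name and the variable names are invented for the example:

```shell
#!/bin/sh
# Sketch: a redirection on "done" affects only the loop.
list=${TMPDIR:-/tmp}/loopdemo.$$      # hypothetical scratch file
printf 'alpha\nbeta\n' > "$list"

result=$(echo "from the original stdin" | {
    while read word
    do
        echo "loop read: $word"       # stdin here is $list
    done < "$list"
    read after                        # stdin is the pipe again
    echo "after loop: $after"
})
rm -f "$list"
echo "$result"
```

The read after the loop gets the line from the pipe, which shows that the standard input outside the loop was left alone.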

We'll use all that to make a shell script for filling in forms. The script, formprog, reads an empty form file like this one, line by line:

Name:
Address:
City:
State/Province:
Phone:
FAX: 
Project: Corporate Decision
Comments:

If a line has just a label, like Name:, the script will prompt you to fill it in. If you do, the script will add the completed line to an output file; otherwise, no output line is written. If a form line is already completed, like:

Project: Corporate Decision

the script doesn't prompt you; it just writes the line to the output file:

% formprog formfile completed
Name: Jerry Peek
Address: 123 Craigie St.
City: Cambridge
State/Province: MA
Phone: (617)456-7890
FAX: 
Project: Corporate Decision
Comments: 
% cat completed
Name: Jerry Peek
Address: 123 Craigie St.
City: Cambridge
State/Province: MA
Phone: (617)456-7890
Project: Corporate Decision

Here's the formprog script. The line numbers are for reference only; don't type them into the file. There's more explanation after the script:


 1  #!/bin/sh
 2  # formprog - fill in template form from $1, leave completed form in $2
 3  # TABSTOPS ARE SET AT 4 IN THIS SCRIPT
 4  
 5  template="$1"   completed="$2"   errors=/tmp/formprog$$
 6  myname=`basename $0`    # BASENAME OF THIS SCRIPT (NO LEADING PATH)
 7  trap 'rm -f $errors; exit' 0 1 2 15
 8  
 9  # READ $template LINE-BY-LINE, WRITE COMPLETED LINES TO $completed:
10  exec 4<&0   # SAVE ORIGINAL stdin (USUALLY TTY) AS FD 4
11  while read label text
12  do
13      case "$label" in
14      ?*:) # FIRST WORD ENDS WITH A COLON; LINE IS OKAY
15          case "$text" in
16          ?*) # SHOW LINE ON SCREEN AND PUT INTO completed FILE:
17              echo "$label $text"
18              echo "$label $text" 1>&3
19              ;;
20          *)  # FILL IT IN OURSELVES:
21              echo -n "$label "
22              exec 5<&0   # SAVE template FILE FD; DO NOT CLOSE!
23              exec 0<&4   # RESTORE ORIGINAL stdin TO READ ans
24              read ans
25              exec 0<&5   # RECONNECT template FILE TO stdin
26              case "$ans" in
27              "") ;;      # EMPTY; DO NOTHING
28              *)  echo "$label $ans" 1>&3 ;;
29              esac
30              ;;
31          esac
32          ;;
33      *)  echo "$myname: bad $1 line:   '$label $text'" 1>&2; break;;
34      esac
35  done <"$template" 2>$errors 3>"$completed"
36  
37  if [ -s $errors ]; then
38      /bin/cat $errors 1>&2
39      echo "$myname: should you remove '$completed' file?" 1>&2
40  fi

Line 10 uses the 4<&0 operator (45.21) to save the location of the original standard input - usually your terminal, but not always - as file descriptor 4. [2] (We'll need to read that original stdin in line 24.)

[2] We can't assume that standard input is coming from a terminal. If we do, it prevents you from running formprog this way:
% command-generator-program | formprog
% formprog < command-file
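To see why the original standard input has to be saved, compare a loop that reads its answers without the saved descriptor against one that uses it. This is only a sketch: it assumes a shell whose read accepts <&4, and the form contents, answers, and variable names are invented:

```shell
#!/bin/sh
# Sketch: why the original stdin is saved before the loop starts.
form=${TMPDIR:-/tmp}/stdindemo.$$     # hypothetical two-line form
printf 'Name:\nCity:\n' > "$form"

# Without a saved descriptor, the inner read eats the next form line:
wrong=$(printf 'Jerry\nCambridge\n' | {
    while read label
    do
        read ans                      # stdin is the form here, not the pipe
        echo "$label $ans"
    done < "$form"
})

# With the original stdin saved as fd 4, answers come from the pipe:
right=$(printf 'Jerry\nCambridge\n' | {
    exec 4<&0                         # save the original stdin (a pipe here)
    while read label
    do
        read ans <&4
        echo "$label $ans"
    done < "$form"
})
rm -f "$form"
echo "wrong: $wrong"
echo "right:"
echo "$right"
```

Because the answers arrive on a pipe instead of a terminal, the same sketch also shows the non-interactive use that the footnote describes.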

Throughout the redirected-I/O while loop in lines 11-35, all commands' standard input comes from the file named in $template, all standard error goes to the $errors file, and anything written to file descriptor 3 goes into the $completed file. UNIX keeps file pointers for all those open files - so each read and write is done just past the end of the previous one.
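Here is a small sketch of a loop with the same three redirections on its last line. The input data, the good/bad labels, and the file names are all invented; only the redirection pattern comes from formprog:

```shell
#!/bin/sh
# Sketch: one loop, three open files, each with its own file pointer.
input=${TMPDIR:-/tmp}/multi_in.$$     # hypothetical scratch files
out=${TMPDIR:-/tmp}/multi_out.$$
err=${TMPDIR:-/tmp}/multi_err.$$
printf 'good 1\nbad\ngood 2\n' > "$input"

while read status num
do
    case "$status" in
    good) echo "kept $num" 1>&3 ;;            # goes to $out
    *)    echo "rejected '$status'" 1>&2 ;;   # goes to $err
    esac
done < "$input" 2> "$err" 3> "$out"

kept=$(cat "$out")
complaints=$(cat "$err")
rm -f "$input" "$out" "$err"
echo "$kept"
echo "$complaints"
```

Each file is opened once, when the loop starts, so the second write to descriptor 3 lands after the first one without any use of >>.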

Here's what happens each time the loop is executed:

The read command (44.13) in line 11 reads the next line from its standard input - that's the open $template file.
The nested case (44.5) statements in lines 13-34 check the text from the $template file:
- If the text has both a label (ending with a colon (:)) and some other text (stored in $text), the complete line is written two places. Line 17 writes the line to the standard output - which is probably your screen (it's not redirected by the script, anyway). Line 18 writes the line to file descriptor 3, the open $completed file.
- If the text has just a label, line 21 writes the label to standard output (usually your terminal) without a newline. We want to read the answer, at line 24, but there's a problem: on some Bourne shells, the read command can only read from file descriptor 0 and won't let you use operators like <&4 on its command line.
  So, in line 22, we save a copy of the open $template file descriptor and the location of the open file pointer in file descriptor 5. Line 23 changes standard input so the read in line 24 will read from the right place (usually the terminal). Line 25 adjusts standard input so the next read at the top of the loop (line 11) will come from the $template file.
  If line 24 doesn't read an answer, line 27 does not write a line. Otherwise, line 28 writes the line to file descriptor 3, the open $completed file.
- If the template label doesn't end with a colon, line 33 writes a message to stderr (file descriptor 2). These messages, together with messages to stderr from any other command in the loop, are redirected into the $errors file. After the loop, if the test (44.20) in line 37 sees any text in the file, the text is displayed in line 38 and the script prints a warning.
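The descriptor swap in lines 22-25 can be tried on its own. The sketch below keeps the same three exec commands but replaces the terminal with a pipe and writes to standard output instead of descriptor 3, so the form contents, answers, and result variable are assumptions made for the demonstration:

```shell
#!/bin/sh
# Sketch: swap stdin between the form and the saved original, then back.
form=${TMPDIR:-/tmp}/swapdemo.$$      # hypothetical three-line form
printf 'Name:\nProject: Demo\nCity:\n' > "$form"

result=$(printf 'Jerry\n\n' | {
    exec 4<&0                 # save original stdin (here a pipe) as fd 4
    while read label text
    do
        case "$text" in
        ?*) echo "$label $text" ;;    # line is already filled in
        *)  exec 5<&0         # save the form's descriptor and position
            exec 0<&4         # stdin is the original input again
            read ans
            exec 0<&5         # reconnect the form for the next loop read
            case "$ans" in
            "") ;;            # empty answer; write nothing
            *)  echo "$label $ans" ;;
            esac
            ;;
        esac
    done < "$form"
})
rm -f "$form"
echo "$result"
```

The second answer is an empty line, so no City: line appears in the result, just as an unanswered prompt leaves no line in the completed file.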

The loop keeps reading and writing line by line until the read at the top of the loop reaches the end-of-file of $template.
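The error check after the loop (lines 37-40) depends on collecting stderr in a file and then testing whether that file has anything in it. A minimal sketch of the pattern, with an invented list of labels standing in for the template:

```shell
#!/bin/sh
# Sketch: gather a loop's stderr in a file, then test the file with -s.
errors=${TMPDIR:-/tmp}/errdemo.$$     # hypothetical scratch file

for label in Name: Phone City:
do
    case "$label" in
    ?*:) ;;                                   # label is okay
    *)   echo "bad label: '$label'" 1>&2; break ;;
    esac
done 2> "$errors"

if [ -s "$errors" ]; then     # true only if the file exists and is not empty
    status="problems found"
else
    status="clean"
fi
saved=$(cat "$errors")
rm -f "$errors"
echo "$status: $saved"
```

The break stops the loop at the first bad label, so the file holds a single complaint.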

- JP

