UNIX Power Tools

Chapter 24: Other Ways to Get Disk Space

24.7 Compressing Files to Save Space

Most files can be "squeezed" to take up less space. Say you have a text file. Each character occupies a byte, but almost all of the characters in the file are letters, digits, or punctuation, and there are only about 70 such characters. What's more, most letters are (usually) lowercase, the letter "e" turns up more often than "z," the letter "e" often shows up in pairs, and so on. All in all, you don't really need a full eight-bit byte per character. If you're clever, you can reduce the amount of space a file occupies by 50 percent or more.
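Here is a rough back-of-the-envelope check of the "you don't need a full byte" claim. It assumes, unrealistically, that all of the roughly 70 characters are equally likely. Under that assumption, an ideal code needs only about log2 70 bits per character instead of 8:

```latex
\log_2 70 \approx 6.13 \text{ bits per character},
\qquad
1 - \frac{6.13}{8} \approx 0.23
```

That alone saves roughly 23 percent. Getting to 50 percent or more depends on the uneven letter frequencies and repeated patterns described above, which is what compression programs exploit.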

Compression algorithms are a complex topic that we can't discuss here. Fortunately, you don't need to know anything about them to use them. Many UNIX systems have a good compression utility built in, called compress. Unfortunately, the compress algorithm seems to be covered by software patents, and many users avoid it for that reason. A newer utility that's even better, and doesn't have patent problems, is GNU's gzip. Those of you who don't have gzip can find it on the CD-ROM.

To compress a file, just give the command:

% gzip filename

gzip replaces the file with a compressed version named filename.gz. The -v option asks gzip to tell you how much space you saved; the savings are usually between 40 and 90 percent.
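To see this for yourself, here is a minimal sketch that builds a repetitive text file and compresses it with -v. The file name sample.txt is made up for the demo, and the script works in a scratch directory from mktemp so it doesn't touch your own files. The exact percentage gzip reports will depend on your data.

```shell
#!/bin/sh
# Sketch: compress a sample text file and report the savings.
# sample.txt is a hypothetical file created just for this demo.
set -e
cd "$(mktemp -d)"

# Build a repetitive text file (200 identical lines); repetition compresses well.
i=0
while [ "$i" -lt 200 ]; do
    echo "The quick brown fox jumps over the lazy dog." >> sample.txt
    i=$((i + 1))
done

ls -l sample.txt          # original size
gzip -v sample.txt        # -v prints the percentage saved
ls -l sample.txt.gz       # sample.txt itself is gone; only the .gz remains
```

Real prose is far less repetitive than this file, so expect smaller savings on ordinary text.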

If the file shouldn't be compressed, that is, if it has hard links (18.4) or a file named filename.gz already exists, gzip prints a message instead. You can use the -f option to "force" gzip to compress such a file. That can be handy when you're running gzip from a shell script and don't want to worry about files being left uncompressed.
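The sketch below walks through both cases and shows -f overriding them. The file names are hypothetical. Standard input is redirected from /dev/null so that gzip can't stop to ask whether to overwrite an existing .gz file, which is the situation a shell script is usually in. The wording of gzip's messages varies between versions, so the script doesn't depend on it.

```shell
#!/bin/sh
# Sketch: the two cases where gzip declines to compress, and how -f
# overrides them. File names here are hypothetical, made up for the demo.
cd "$(mktemp -d)" || exit 1

# Case 1: the .gz file already exists.
printf 'first version\n' > notes.txt
gzip notes.txt                          # notes.txt -> notes.txt.gz
printf 'second version\n' > notes.txt
gzip notes.txt < /dev/null || echo "gzip left notes.txt alone"
gzip -f notes.txt                       # replaces the old notes.txt.gz
gzip -dc notes.txt.gz                   # prints: second version

# Case 2: the file has another hard link.
printf 'shared data\n' > a.txt
ln a.txt b.txt                          # a.txt and b.txt are the same file
gzip a.txt < /dev/null || echo "gzip left a.txt alone"
gzip -f a.txt                           # compresses it; b.txt keeps the data
```

Note that forcing compression of a hard-linked file only replaces that one name with a .gz file. The other link still points to the uncompressed data, so no disk space is saved until it is removed too.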

Compressed files are always binary files; even if they started out as text files, you can't read them. To get back the original file, use the gunzip utility:

% gunzip filename

You can omit the .gz at the end of the filename. (gunzip also handles files made by compress, or you can use uncompress for those if you'd rather.) If you just want to read the file but don't want to restore the original version, use the command gzcat. It decodes the file and writes the result to standard output, leaving the compressed file as it was. It's particularly convenient to pipe gzcat into more (25.3) or grep (27.1). (There's a zcat for files made by compress, but gzcat can handle those files too.)
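Here is a minimal sketch of both operations: reading a compressed file through grep without restoring it, then restoring it with gunzip. The file name report.txt is made up for the demo. The reader program is installed as gzcat on some systems and as zcat on systems with GNU gzip under its usual names. The sketch uses gzip -dc instead (-d decompresses, -c writes to standard output), which does the same job wherever gzip is installed.

```shell
#!/bin/sh
# Sketch: read a compressed file without restoring it, then restore it.
# report.txt is a hypothetical file made up for this demo.
set -e
cd "$(mktemp -d)"

printf 'alpha\nbeta\ngamma\n' > report.txt
cp report.txt original.txt      # keep a plain copy to compare against later
gzip report.txt                 # report.txt -> report.txt.gz

# Read without uncompressing; report.txt.gz is left untouched.
gzip -dc report.txt.gz | grep beta      # prints: beta

# Restore the original; the .gz suffix may be omitted.
gunzip report.txt
cmp report.txt original.txt && echo "restored file is identical"
```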

The CD-ROM has several scripts that work on compressed files, uncompressing and recompressing them automatically: editing with zvi, zex, and zed (24.11); viewing with zmore, zless, and zpg (25.5); or running almost any command that can read from a pipe with zloop (24.10).
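The CD-ROM scripts themselves aren't reproduced here. As a rough illustration of the idea behind them, the following is a hypothetical shell function, zedit, written for this example only. It uncompresses a file, runs an editor (or any other command) on it, and compresses it again. It defaults to $EDITOR, falling back to vi. The demo passes a non-interactive command so the script can run unattended.

```shell
#!/bin/sh
# zedit: a hypothetical sketch of a zvi-style wrapper; it is NOT the
# CD-ROM script, just an illustration of the uncompress/recompress idea.
# Usage: zedit file.gz [command ...]
# Runs the command (default: $EDITOR, else vi) on the uncompressed file.
zedit() {
    gz=$1
    shift
    plain=${gz%.gz}
    gunzip "$gz" || return 1        # file.gz -> file
    if [ $# -eq 0 ]; then
        set -- "${EDITOR:-vi}"
    fi
    "$@" "$plain"                   # edit or otherwise change the file
    status=$?
    gzip "$plain" || return 1       # file -> file.gz again
    return "$status"
}

# Demo with a non-interactive "editor" that appends a line.
cd "$(mktemp -d)" || exit 1
printf 'line one\n' > notes.txt
gzip notes.txt
zedit notes.txt.gz sh -c 'echo "line two" >> "$1"' sh
gzip -dc notes.txt.gz           # prints: line one, then line two
```

A production version would also need to handle interrupts and names that don't end in .gz. The sketch leaves both out to keep the uncompress/run/recompress cycle visible.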

There are a number of other compression utilities floating around the UNIX world, but gzip is reliable, freely available, and works on other operating systems too. That's why more and more people choose it.

- ML, JP

