[Chapter 24] 24.16 Trimming a Huge Directory

Some implementations of the BSD fast filesystem never truncate directories. That is, when you delete a file, the filesystem marks its directory entry as "invalid," but doesn't actually delete the entry. The old entry can be re-used when someone creates a new file, but will never go away. Therefore, the directories themselves can only get larger with time. Directories usually don't occupy a huge amount of space, but searching through a large directory is noticeably slow. So you should avoid letting directories get too large.

On many UNIX systems, the only way to "shrink a directory" is to move all of its files somewhere else and then remove it; for example:

ls -lgd old                                # Get old owner, group, and mode
mkdir new; chown user new; chgrp group new; chmod mode new
mv old/.??* old/.[^A--/-^?] old/* new      # ^A and ^? are CTRL-a and DEL
rmdir old
mv new old

This method also works on V7-ish filesystems. It cannot be applied to the root of a filesystem.
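The same steps can be wrapped in a small function. This is only a sketch, not part of the original article: the name rebuild_dir, its argument order, and the temporary name old.new.$$ are all made up for illustration, and it assumes a shell that understands the POSIX pattern .[!.] (any two-character dot name other than ..) in place of the control-character range shown above. You still have to read the owner, group, and mode from ls yourself and pass them in.

```shell
#!/bin/sh
# rebuild_dir OLD OWNER GROUP MODE   (hypothetical helper)
# Recreate directory OLD as a fresh, compact directory with the same contents.
rebuild_dir() {
    old=$1 owner=$2 group=$3 mode=$4
    new=$old.new.$$                     # assumed-unused sibling name
    mkdir "$new" || return 1
    chown "$owner" "$new" && chgrp "$group" "$new" && chmod "$mode" "$new" || return 1
    # Dot names of three or more characters, two-character dot names
    # other than "..", and finally all the ordinary names.
    for f in "$old"/.??* "$old"/.[!.] "$old"/*
    do
        # A pattern that matched nothing is left as a literal string; skip it.
        [ -e "$f" ] || [ -L "$f" ] || continue
        mv "$f" "$new"/ || return 1
    done
    rmdir "$old" && mv "$new" "$old"
}
```

A call might look like rebuild_dir /usr/spool/junk root daemon 755 (the path and names here are only examples). If any step fails, the function stops and leaves both directories in place so you can sort things out by hand.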

Other implementations of the BSD fast filesystem do truncate directories. They do this after a complete scan of the directory has shown that some number of trailing fragments are empty. Complete scans are forced for any operation that places a new name into the directory - such as creat(2) or link(2). In addition, new names are always placed in the earliest possible free slot. Hence, on these systems there is another way to shrink a directory. [How do you know if your BSD filesystem truncates directories? Try the pseudo-code below (but use actual commands), and see if it has an effect. -ML ]

while (the directory can be shrunk) {
    mv (file in last slot) (some short name)
    mv (the short name) (original name)
}

This works on the root of a filesystem as well as subdirectories.
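If you would rather experiment in a scratch directory than on a live one, something like the following may help. It is a rough sketch under several assumptions: the function names dir_size and probe_truncation, the scratch name dirprobe.$$, and the count of 2000 files are all invented here, and it relies on ls -lnd printing the directory's size in bytes as the fifth field. It fills a new directory, empties it, creates one more name (the operation described above as forcing a complete scan), and reports the directory's size at each stage.

```shell
#!/bin/sh
# dir_size DIR: print the size in bytes that ls reports for DIR itself.
dir_size() {
    # -n prints numeric IDs, so the size is reliably the fifth field
    ls -lnd "$1" | awk '{ print $5 }'
}

# probe_truncation [PARENT]   (hypothetical helper)
# Grow and empty a scratch directory under PARENT, then report its sizes.
probe_truncation() {
    probe=${1:-.}/dirprobe.$$
    mkdir "$probe" || return 1
    empty=$(dir_size "$probe")
    i=0
    while [ "$i" -lt 2000 ]
    do
        : > "$probe/a_fairly_long_file_name_$i"
        i=$((i + 1))
    done
    full=$(dir_size "$probe")
    rm -f "$probe"/a_fairly_long_file_name_*
    # Placing a new name in the directory is what may trigger truncation.
    : > "$probe/x"
    after=$(dir_size "$probe")
    rm -rf "$probe"
    echo "empty=$empty full=$full after=$after"
}
```

Run it as probe_truncation /some/directory on the filesystem you care about. If the "after" figure is smaller than "full", the directory was shrunk. If the two are equal, this filesystem probably does not truncate, at least not in this situation. The numbers themselves vary from one filesystem to another.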

Neither method should be used if some external agent (for example, a daemon) is busy looking at the directory. The first method will also fail if the external agent is quiet but will resume and hold the existing directory open (for example, a daemon program, like sendmail, that rescans the directory, but which is currently stopped or idle). The second method requires knowing a "safe" short name - i.e., a name that doesn't duplicate any other name in the directory.
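One way to pick a "safe" short name is to try a few candidates and take the first one that does not exist. The sketch below is an illustration only: the function name safe_name and the candidate list are arbitrary, and the check is only meaningful while nothing else is creating files in the directory, which the warning above already requires.

```shell
#!/bin/sh
# safe_name DIR   (hypothetical helper)
# Print the first short candidate name that does not exist in DIR.
safe_name() {
    dir=$1
    for n in _ __ ___ _a _b
    do
        # -L also catches a dangling symbolic link, which -e reports as missing
        if [ ! -e "$dir/$n" ] && [ ! -L "$dir/$n" ]
        then
            echo "$n"
            return 0
        fi
    done
    return 1        # every candidate is taken
}
```

For example, name=`safe_name .` leaves the chosen name in $name, and the function returns a nonzero status if it could not find one.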

I have found the second method useful enough to write a shell script to do the job. I call the script squoze:

#! /bin/sh
#
# squoze
last=
ls -ldg
IFS='
'
while :
do
    set `ls -f | tail -10r`
    for i
    do
        case "$i" in
        "$last"|.|..) break 2;;
        esac
        # _ (underscore) is the "safe, short" filename
        /bin/mv -i "$i" _ && /bin/mv _ "$i"
    done
    last="$i"
done
ls -ldg

[The ls -f option lists entries in the order they appear in the directory; it doesn't sort. -JP ] This script does not handle filenames with embedded newlines. It is, however, safe to apply to a sendmail queue while sendmail is stopped.

- CT in comp.unix.admin on Usenet, 22 August 1991

