Perl Cookbook

Chapter 1: Strings

1.16. Soundex Matching

Problem

You have two English surnames and want to know whether they sound somewhat similar, regardless of spelling. This would let you offer users a "fuzzy search" of names in a telephone book to catch "Smith" and "Smythe" and others within the set, such as "Smite" and "Smote."

Solution

Use the Text::Soundex module (bundled with Perl through version 5.18, and available from CPAN for later releases):

 use Text::Soundex;

 $CODE  = soundex($STRING);
 @CODES = soundex(@LIST);

Discussion

The soundex algorithm hashes words (particularly English surnames) into a small space using a simple model that approximates an English speaker's pronunciation of the words. Roughly speaking, each word is reduced to a four-character string. The first character is an uppercase letter; the remaining three are digits. By comparing the soundex values of two strings, we can guess whether they sound similar.
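To make the reduction concrete, here is a minimal pure-Perl sketch of one common variant of the algorithm. The function name simple_soundex is hypothetical and is not part of Text::Soundex. The sketch treats H and W like vowels, which is an assumption: soundex variants disagree on that point, so its output may differ from the module's for names such as "Ashcraft".

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the Text::Soundex API.
sub simple_soundex {
    my ($word) = @_;
    my $s = uc $word;
    $s =~ tr/A-Z//cd;                  # keep ASCII letters only
    return if $s eq '';                # nothing left to encode

    my $first = substr($s, 0, 1);      # the leading letter is kept verbatim

    # Map every letter to its digit class; vowels and H, W, Y become 0.
    (my $codes = $s)
        =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/;

    $codes =~ tr/0-6//s;               # squeeze runs of the same digit
    $codes = substr($codes, 1);        # drop the digit of the leading letter
    $codes =~ tr/0//d;                 # drop the vowel placeholders

    # Pad with zeros, then truncate to one letter plus three digits.
    return substr($first . $codes . '000', 0, 4);
}

for my $name (qw(Smith Smythe Robert)) {
    printf "%-7s %s\n", $name, simple_soundex($name);
}
# Smith   S530
# Smythe  S530
# Robert  R163
```

"Smith" and "Smythe" collapse to the same code because the vowels and the trailing H contribute nothing after the first letter, leaving only M (5) and T (3).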

The following program prompts for a name and looks for similar-sounding names in the password file. The same approach works on any database of names, so you could key the database on the soundex values if you wanted to. Such a key wouldn't be unique, of course.

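use strict;
use warnings;
use Text::Soundex;
use User::pwent;

print "Lookup user: ";
my $user = <STDIN>;
exit unless defined $user;              # end of file on input
chomp $user;
my $name_code = soundex($user);
exit unless defined $name_code && length $name_code;   # no letters to encode

while (my $uent = getpwent()) {
    # first and last words of the full name that leads the GECOS field
    my ($firstname, $lastname) = $uent->gecos =~ /(\w+)[^,]*\b(\w+)/;

    # compare the login name and both parts of the real name
    for my $candidate ($uent->name, $firstname, $lastname) {
        next unless defined $candidate;
        my $code = soundex($candidate);
        next unless defined $code && $code eq $name_code;
        printf "%s: %s %s\n", $uent->name, $firstname // '', $lastname // '';
        last;                           # report each user only once
    }
}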
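
The idea of keying a database on soundex values can be sketched with an in-memory hash of arrays. This is only an illustration under stated assumptions: the sample names, the %by_code hash, and the sounds_like function are all hypothetical, and the code assumes Text::Soundex is installed. Because the key is not unique, each code maps to a list of names rather than to a single record.

```perl
use strict;
use warnings;
use Text::Soundex;    # assumed to be installed (from CPAN on newer Perls)

# Hypothetical sample data standing in for a real database of names.
my @names = qw(Smith Smythe Smite Jones Johns Knuth);

# Map each soundex code to the list of names that share it.
my %by_code;
for my $name (@names) {
    my $code = soundex($name);
    next unless defined $code && length $code;   # skip names with no letters
    push @{ $by_code{$code} }, $name;
}

# Return every stored name whose code matches that of the query.
sub sounds_like {
    my ($query) = @_;
    my $code = soundex($query);
    return unless defined $code && length $code;
    return @{ $by_code{$code} || [] };
}

print join(", ", sounds_like("Smyth")), "\n";
```

A lookup then costs one hash access instead of a scan over every record, at the price of recomputing the index whenever names are added or changed.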

See Also

The documentation for the Text::Soundex and User::pwent modules (also in Chapter 7 of Programming Perl); your system's passwd(5) manpage; Volume 3, Chapter 6 of The Art of Computer Programming.

