License Python Software Foundation License (Python 2.x)

# Convert Strings to Soundex Equivalents

 Language Python
```python
"""Soundex algorithm

This program is part of "Dive Into Python", a free Python book for
experienced programmers.  Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim (mark@diveintopython.org)"
__version__ = "$Revision: 1.5 $"
__date__ = "$Date: 2004/05/11 19:11:21 $"
__copyright__ = "Copyright (c) 2004 Mark Pilgrim"
__license__ = "Python"

import string

allChar = string.uppercase + string.lowercase
charToSoundex = string.maketrans(allChar, "91239129922455912623919292" * 2)

def soundex(source):
    "convert string to Soundex equivalent"

    # Soundex requirements:
    # source string must be at least 1 character
    # and must consist entirely of letters
    if (not source) or (not source.isalpha()):
        return "0000"

    # Soundex algorithm:
    # 1. make first character uppercase
    # 2. translate all other characters to Soundex digits
    digits = source[0].upper() + source[1:].translate(charToSoundex)

    # 3. remove consecutive duplicates
    digits2 = digits[0]
    for d in digits[1:]:
        if digits2[-1] != d:
            digits2 += d

    # 4. remove all "9"s
    # 5. pad end with "0"s to 4 characters
    return (digits2.replace('9', '') + '000')[:4]

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print soundex(sys.argv[1])
    else:
        from timeit import Timer
        names = ('Woo', 'Pilgrim', 'Flingjingwaller')
        for name in names:
            statement = "soundex('%s')" % name
            t = Timer(statement, "from __main__ import soundex")
            print name.ljust(15), soundex(name), min(t.repeat())
```

Converts strings to their Soundex equivalents. Soundex is a phonetic algorithm for indexing English names by sound, so that similarly pronounced but differently spelled names can be matched.

Check out the Wikipedia entry on Soundex for more details; this sounds pretty cool.

Here's a neat little unittest to make sure everything works, and to give you an idea what translated strings look like:

```python
"""Unit test for soundex.py

This program is part of "Dive Into Python", a free Python book for
experienced programmers.  Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim (mark@diveintopython.org)"
__version__ = "$Revision: 1.1 $"
__date__ = "$Date: 2004/05/06 17:18:17 $"
__copyright__ = "Copyright (c) 2004 Mark Pilgrim"
__license__ = "Python"

import soundex
import unittest

class KnownValues(unittest.TestCase):
    # (input name, expected Soundex code) pairs
    knownValues = (('', '0000'),
                   ('Woo', 'W000'),
                   ('Pilgrim', 'P426'),
                   ('Radiohead', 'R330'),
                   ('Flingjingwaller', 'F452'),
                   ('Euler', 'E460'),
                   ('Ellery', 'E460'),
                   ('Gauss', 'G200'),
                   ('Ghosh', 'G200'),
                   ('Hilbert', 'H416'),
                   ('Heilbronn', 'H416'),
                   ('Knuth', 'K530'),
                   ('Kant', 'K530'),
                   ('Lukasiewicz', 'L222'),
                   ('Lissajous', 'L222')
                  )

    def testKnownValues(self):
        """soundex should give known result with known input"""
        for name, result in self.knownValues:
            self.assertEqual(soundex.soundex(name), result)

if __name__ == "__main__":
    unittest.main()
```
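Both listings target Python 2: `string.uppercase`, `string.maketrans`, and the `print` statement are not available in Python 3. Below is a minimal sketch of the same five steps for Python 3, not the author's own port. It assumes input is plain ASCII (the extra `isascii()` check is an addition, and needs Python 3.7 or later), and `group_by_sound` is a hypothetical helper added only to illustrate how similarly pronounced names end up under the same code.

```python
import string
from collections import defaultdict

# Same letter-to-digit table as the original listing. "9" marks letters
# (vowels plus H, W, and Y) that are dropped in the final step.
_LETTERS = string.ascii_uppercase + string.ascii_lowercase
_CHAR_TO_SOUNDEX = str.maketrans(_LETTERS, "91239129922455912623919292" * 2)


def soundex(source):
    """Return the 4-character Soundex code for source, or "0000" if invalid."""
    # Assumption: only non-empty, ASCII-only alphabetic input is valid.
    # Without the isascii() check, letters such as "é" would pass isalpha()
    # in Python 3 and slip through the translation table untranslated.
    if not source or not (source.isascii() and source.isalpha()):
        return "0000"

    # Steps 1-2: uppercase the first letter, translate the rest to digits.
    digits = source[0].upper() + source[1:].translate(_CHAR_TO_SOUNDEX)

    # Step 3: collapse runs of identical consecutive characters.
    collapsed = digits[0]
    for d in digits[1:]:
        if collapsed[-1] != d:
            collapsed += d

    # Steps 4-5: drop the "9" placeholders, pad with zeros, keep 4 characters.
    return (collapsed.replace("9", "") + "000")[:4]


def group_by_sound(names):
    """Hypothetical helper: bucket names by their Soundex code."""
    buckets = defaultdict(list)
    for name in names:
        buckets[soundex(name)].append(name)
    return dict(buckets)


if __name__ == "__main__":
    sample = ["Euler", "Ellery", "Gauss", "Ghosh", "Knuth", "Kant"]
    for code, names in sorted(group_by_sound(sample).items()):
        print(code, names)
```

With the sample names taken from the unit test, each pair lands in one bucket (`E460`, `G200`, `K530`), which is the kind of matching the algorithm is meant to enable.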

## Comments

27 Jun 2008 at 03:58 AM, by Stou S.
There is a PostgreSQL contrib module called Fuzzystrmatch that implements that algorithm.

http://www.postgresql.org/docs/8.3/interactive/fuzzystrmatch.html