License: Python Software Foundation License (Python 2.x)

Convert Strings to Soundex Equivalents

"""Soundex algorithm

This program is part of "Dive Into Python", a free Python book for
experienced programmers. Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim (mark@diveintopython.org)"
__version__ = "$Revision: 1.5 $"
__date__ = "$Date: 2004/05/11 19:11:21 $"
__copyright__ = "Copyright (c) 2004 Mark Pilgrim"
__license__ = "Python"

import string

allChar = string.uppercase + string.lowercase
charToSoundex = string.maketrans(allChar, "91239129922455912623919292" * 2)

def soundex(source):
    "convert string to Soundex equivalent"

    # Soundex requirements:
    # source string must be at least 1 character
    # and must consist entirely of letters
    if (not source) or (not source.isalpha()):
        return "0000"

    # Soundex algorithm:
    # 1. make first character uppercase
    # 2. translate all other characters to Soundex digits
    digits = source[0].upper() + source[1:].translate(charToSoundex)

    # 3. remove consecutive duplicates
    digits2 = digits[0]
    for d in digits[1:]:
        if digits2[-1] != d:
            digits2 += d

    # 4. remove all "9"s
    # 5. pad end with "0"s to 4 characters
    return (digits2.replace('9', '') + '000')[:4]

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print soundex(sys.argv[1])
    else:
        from timeit import Timer
        names = ('Woo', 'Pilgrim', 'Flingjingwaller')
        for name in names:
            statement = "soundex('%s')" % name
            t = Timer(statement, "from __main__ import soundex")
            print name.ljust(15), soundex(name), min(t.repeat())
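The listing above is Python 2 only: string.uppercase, string.lowercase, string.maketrans and the print statement are all gone in Python 3. Below is a rough sketch of how the same five steps might be ported to Python 3 (3.7 or later, for str.isascii). It is not part of the original "Dive Into Python" code. The name CHAR_TO_SOUNDEX and the extra ASCII check are assumptions added here, since Python 3's str.isalpha() also accepts non-ASCII letters that the translation table does not cover.

```python
import string

# Same letter-to-digit table as the Python 2 listing: one digit per letter a-z,
# with "9" standing for letters that are dropped at the end.
_CODES = "91239129922455912623919292"

# Assumed name; str.maketrans replaces the Python 2 string.maketrans.
CHAR_TO_SOUNDEX = str.maketrans(
    string.ascii_uppercase + string.ascii_lowercase, _CODES * 2
)


def soundex(source):
    """Convert a string to its Soundex equivalent (Python 3 sketch)."""
    # Input must be non-empty and consist entirely of ASCII letters.
    # The isascii() check is an addition, not in the original listing.
    if not source or not (source.isascii() and source.isalpha()):
        return "0000"

    # 1. make the first character uppercase
    # 2. translate all other characters to Soundex digits
    digits = source[0].upper() + source[1:].translate(CHAR_TO_SOUNDEX)

    # 3. remove consecutive duplicates
    collapsed = digits[0]
    for d in digits[1:]:
        if collapsed[-1] != d:
            collapsed += d

    # 4. remove all "9"s
    # 5. pad the end with "0"s and cut to 4 characters
    return (collapsed.replace("9", "") + "000")[:4]


if __name__ == "__main__":
    for name in ("Woo", "Pilgrim", "Flingjingwaller"):
        print(name.ljust(15), soundex(name))
```

Like the original, this sketch treats "h" and "w" the same way as vowels and does not compare the first letter's code with the second letter's, so a few names may come out differently from other Soundex implementations.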

Converts strings to their soundex equivalent. Soundex is a phonetic algorithm for indexing names by sound in English. This allows similarly-pronounced but differently-spelled words to be matched.

Check out the Wikipedia entry for more details; this sounds pretty cool.
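To make the "matching" part concrete, here is a small Python 3 sketch that buckets names by their Soundex code, so differently-spelled names that sound alike land in the same list. The helper names soundex_code and group_by_code are made up for this example, and the compact encoder is an assumed re-implementation of the steps in the listing, not the original code.

```python
from collections import defaultdict
from itertools import groupby
import string

# Letter-to-digit table taken from the listing; "9" marks letters that get dropped.
_TABLE = str.maketrans(string.ascii_lowercase, "91239129922455912623919292")


def soundex_code(name):
    """Hypothetical compact encoder following the same steps as the listing."""
    if not name or not (name.isascii() and name.isalpha()):
        return "0000"
    # Keep the first letter (uppercased), translate the rest to digits.
    digits = name[0].upper() + name[1:].lower().translate(_TABLE)
    # groupby yields one key per run of equal characters, which collapses
    # consecutive duplicates.
    collapsed = "".join(key for key, _ in groupby(digits))
    return (collapsed.replace("9", "") + "000")[:4]


def group_by_code(names):
    """Return a dict mapping each Soundex code to the names that share it."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex_code(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_code(["Euler", "Ellery", "Knuth", "Kant", "Gauss", "Ghosh"]))
    # {'E460': ['Euler', 'Ellery'], 'K530': ['Knuth', 'Kant'], 'G200': ['Gauss', 'Ghosh']}
```

A dict keyed by code is the simplest way to do the lookup in memory; in a database the same idea would be an index on the code column.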

Here's a neat little unittest to make sure everything works, and to give you an idea what translated strings look like:

"""Unit test for soundex.py

This program is part of "Dive Into Python", a free Python book for
experienced programmers. Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim (mark@diveintopython.org)"
__version__ = "$Revision: 1.1 $"
__date__ = "$Date: 2004/05/06 17:18:17 $"
__copyright__ = "Copyright (c) 2004 Mark Pilgrim"
__license__ = "Python"

import soundex
import unittest

class KnownValues(unittest.TestCase):
    knownValues = (('', '0000'),
                   ('Woo', 'W000'),
                   ('Pilgrim', 'P426'),
                   ('Radiohead', 'R330'),
                   ('Flingjingwaller', 'F452'),
                   ('Euler', 'E460'),
                   ('Ellery', 'E460'),
                   ('Gauss', 'G200'),
                   ('Ghosh', 'G200'),
                   ('Hilbert', 'H416'),
                   ('Heilbronn', 'H416'),
                   ('Knuth', 'K530'),
                   ('Kant', 'K530'),
                   ('Lukasiewicz', 'L222'),
                   ('Lissajous', 'L222')
                   )

    def testKnownValues(self):
        """soundex should give known result with known input"""
        for name, result in self.knownValues:
            self.assertEqual(soundex.soundex(name), result)

if __name__ == "__main__":
    unittest.main()

Comments

over 6 years ago (27 Jun 2008 at 03:58 AM) by Stou S.
There is a PostgreSQL contrib module called Fuzzystrmatch that implements that algorithm.

http://www.postgresql.org/docs/8.3/interactive/fuzzystrmatch.html