Siafoo – the intersection of pastebin, help desk, version control, and social networking
Convert Strings to Soundex Equivalents
| Language | Python |
"""Soundex algorithm

This program is part of "Dive Into Python", a free Python book for
experienced programmers.  Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim (mark@diveintopython.org)"
__version__ = "$Revision: 1.5 $"
__date__ = "$Date: 2004/05/11 19:11:21 $"
__copyright__ = "Copyright (c) 2004 Mark Pilgrim"
__license__ = "Python"

import string

allChar = string.uppercase + string.lowercase
charToSoundex = string.maketrans(allChar, "91239129922455912623919292" * 2)

def soundex(source):
    "convert string to Soundex equivalent"

    # Soundex requirements:
    # source string must be at least 1 character
    # and must consist entirely of letters
    if (not source) or (not source.isalpha()):
        return "0000"

    # Soundex algorithm:
    # 1. make first character uppercase
    # 2. translate all other characters to Soundex digits
    digits = source[0].upper() + source[1:].translate(charToSoundex)

    # 3. remove consecutive duplicates
    digits2 = digits[0]
    for d in digits[1:]:
        if digits2[-1] != d:
            digits2 += d

    # 4. remove all "9"s
    # 5. pad end with "0"s to 4 characters
    return (digits2.replace('9', '') + '000')[:4]

if __name__ == '__main__':
    import sys
    if sys.argv[1:]:
        print soundex(sys.argv[1])
    else:
        from timeit import Timer
        names = ('Woo', 'Pilgrim', 'Flingjingwaller')
        for name in names:
            statement = "soundex('%s')" % name
            t = Timer(statement, "from __main__ import soundex")
            print name.ljust(15), soundex(name), min(t.repeat())
Converts strings to their Soundex equivalents. Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English, so that similarly pronounced but differently spelled names can be matched. For example, this implementation maps both Euler and Ellery to E460.
See the Wikipedia entry on Soundex for more details.
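The listing above targets Python 2: `string.uppercase`, `string.maketrans`, and the `print` statement are not available in Python 3. As a rough sketch, the same five steps could be written for Python 3 roughly like this. The `str.isascii()` check (Python 3.7+) and the `group_by_soundex` helper are assumptions added for illustration and are not part of the original snippet.

```python
from collections import defaultdict

# Same digit table as the Python 2 listing: one digit per letter A-Z,
# with "9" as a placeholder for letters that are dropped in step 4.
_DIGITS = "91239129922455912623919292"
_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_TO_SOUNDEX = str.maketrans(_UPPER + _UPPER.lower(), _DIGITS * 2)


def soundex(source):
    """Convert a string to its Soundex equivalent (Python 3 sketch)."""
    # Assumption: accept ASCII letters only, because the table covers A-Z
    # and Python 3's isalpha() alone would also accept non-ASCII letters.
    if not source or not (source.isascii() and source.isalpha()):
        return "0000"

    # 1-2. uppercase the first character, translate the rest to digits
    digits = source[0].upper() + source[1:].translate(CHAR_TO_SOUNDEX)

    # 3. remove consecutive duplicates
    deduped = digits[0]
    for d in digits[1:]:
        if deduped[-1] != d:
            deduped += d

    # 4-5. drop the "9" placeholders, pad with zeros, keep 4 characters
    return (deduped.replace("9", "") + "000")[:4]


def group_by_soundex(names):
    """Hypothetical helper: bucket names that share a Soundex code."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)
```

Calling `group_by_soundex(["Euler", "Ellery", "Gauss"])` puts Euler and Ellery in the same `E460` bucket and Gauss alone under `G200`, which is the kind of matching the description refers to.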
Here's a neat little unit test to make sure everything works and to give you an idea of what the translated strings look like:
"""Unit test for soundex.py

This program is part of "Dive Into Python", a free Python book for
experienced programmers.  Visit http://diveintopython.org/ for the
latest version.
"""

__author__ = "Mark Pilgrim (mark@diveintopython.org)"
__version__ = "$Revision: 1.1 $"
__date__ = "$Date: 2004/05/06 17:18:17 $"
__copyright__ = "Copyright (c) 2004 Mark Pilgrim"
__license__ = "Python"

import soundex
import unittest

class KnownValues(unittest.TestCase):
    knownValues = (('', '0000'),
                   ('Woo', 'W000'),
                   ('Pilgrim', 'P426'),
                   ('Radiohead', 'R330'),
                   ('Flingjingwaller', 'F452'),
                   ('Euler', 'E460'),
                   ('Ellery', 'E460'),
                   ('Gauss', 'G200'),
                   ('Ghosh', 'G200'),
                   ('Hilbert', 'H416'),
                   ('Heilbronn', 'H416'),
                   ('Knuth', 'K530'),
                   ('Kant', 'K530'),
                   ('Lukasiewicz', 'L222'),
                   ('Lissajous', 'L222')
                   )

    def testKnownValues(self):
        """soundex should give known result with known input"""
        for name, result in self.knownValues:
            self.assertEqual(soundex.soundex(name), result)

if __name__ == "__main__":
    unittest.main()
Comments
http://www.postgresql.org/docs/8.3/interactive/fuzzystrmatch.html