License Public Domain
Keywords
HTML (3) strip tags (1) XML (1)
Permissions
Owner: David

Strip HTML and XML tags

In Brief: Strips tags from a string, leaving only the text bits.
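import re

def striptags(string):
    """Strip HTML/XML tags from string, keeping the text and any alt attribute values."""
    # Note: this will mess up on tags of the form <img /> so make sure you don't have any of those
    # Replace each tag that carries a quoted alt attribute with the alt text.
    string = re.sub(r'(?sx) <[^>]*?(\s alt \s* = \s* ([\'"]) ([^>]*?) \2) [^>]*? >', r'\3', string)
    # Remove every remaining opening, closing, or self-closing tag.
    string = re.sub(r'(?sx) </?[^>]*? >', '', string)
    #TODO: there must be some way to not do 2 re's
    # Collapse runs of whitespace left behind by the removed tags.
    string = re.sub(r'\s{2,}', ' ', string)

    return string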
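
One possible answer to the TODO, reading it as referring to the two tag-stripping passes: a single pattern with an optional alt group, combined with a replacement function. This is only a sketch, not part of the original snippet. The names TAG_RE and striptags_single_pass are made up for illustration, and it is assumed (not verified on odd input) to behave like striptags on ordinary markup.

```python
import re

# Hypothetical single-pattern variant: match any tag, and capture the value
# of a quoted alt attribute when the tag has one.
TAG_RE = re.compile(r'''(?sx)
    <                                   # start of a tag
    (?: [^>]*? \s alt \s* = \s*         # optional: anything up to alt =
        (['"]) (?P<alt> [^>]*? ) \1     # quoted alt value, same closing quote
    )?
    [^>]*? >                            # rest of the tag
''')

def striptags_single_pass(text):
    """Strip tags in one substitution, keeping alt text where present."""
    def replace(match):
        # The alt group is None when the tag has no alt attribute.
        return match.group('alt') or ''

    text = TAG_RE.sub(replace, text)
    # Collapse runs of whitespace left behind by the removed tags.
    return re.sub(r'\s{2,}', ' ', text)
```

For example, striptags_single_pass('<p>Hello <b>world</b></p>') gives 'Hello world', and striptags_single_pass('<img src="a.png" alt="A cat" /> sits') gives 'A cat sits'. The whitespace cleanup is still a separate substitution, so this removes only one of the regular-expression passes.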
