Strip HTML and XML tags

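In Brief: Strips HTML and XML tags from a string, leaving only the text. Tags that carry an alt attribute are replaced by their alt text, and runs of whitespace are collapsed to a single space.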
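import re

def striptags(string):
    """Return `string` with HTML/XML tags removed, leaving only the text."""
    # Note: this will mess up on tags of the form <img /> so make sure you don't have any of those
    # Replace any tag that carries an alt attribute with its alt text.
    string = re.sub(r'(?sx) <[^>]*?(\s alt \s* = \s* ([\'"]) ([^>]*?) \2) [^>]*? >', r'\3', string)
    # Remove all remaining opening and closing tags.
    string = re.sub(r'(?sx) </?[^>]*? >', '', string)
    # TODO: there must be some way to not do 2 re's
    # Collapse runs of two or more whitespace characters into a single space.
    string = re.sub(r'\s{2,}', ' ', string)

    return string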
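
The TODO in the snippet asks whether the two tag-stripping passes can be merged. One possible approach, sketched here as an assumption and not as the original author's solution, is a single alternation combined with a replacement function. The names striptags_single_pass, _TAG and _alt_or_empty, and the named group alt, are hypothetical, and malformed markup may behave differently from the two-pass version.

```python
import re

# Hypothetical single-pass variant (an illustrative sketch, not the original code).
# The first alternative matches a tag carrying an alt attribute and captures its
# alt text; the second alternative matches any other opening or closing tag.
_TAG = re.compile(r'''
      < [^>]*? \s alt \s* = \s* (['"]) (?P<alt> [^>]*? ) \1 [^>]*? >
    | < /? [^>]*? >
''', re.S | re.X)


def _alt_or_empty(match):
    # The alt group is None when the second alternative matched.
    return match.group('alt') or ''


def striptags_single_pass(string):
    # One substitution handles both cases: alt-bearing tags become their
    # alt text, and every other tag is dropped.
    string = _TAG.sub(_alt_or_empty, string)
    # Collapse runs of two or more whitespace characters into a single space.
    return re.sub(r'\s{2,}', ' ', string)


if __name__ == '__main__':
    print(striptags_single_pass('<p>Hello   <b>there</b></p>'))
    print(striptags_single_pass('<img src="a.png" alt="logo">'))
```

The whitespace cleanup is still a separate substitution, so this removes only one of the three re.sub calls.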
