Hide
Writing an article is easy - try our reStructured Text demo Join Siafoo Now or Learn More

PostgreSQL function for stripping HTML (or XML) tags from text Atom Feed 0

In Brief A simple function for stripping XML and HTML tags from a PostgreSQL field.
# 's
1CREATE OR REPLACE FUNCTION strip_tags(TEXT) RETURNS TEXT AS $$
2 SELECT regexp_replace(regexp_replace($1, E'(?x)<[^>]*?(\s alt \s* = \s* ([\'"]) ([^>]*?) \2) [^>]*? >', E'\3'), E'(?x)(< [^>]*? >)', '', 'g')
3$$ LANGUAGE SQL;

A simple function for stripping XML and HTML tags from a PostgreSQL field.

Comments

over 7 years ago (22 Dec 2008 at 09:06 AM) by semartin
I tried to use your function on PostgreSQL 8.3 and it works like a charm!
But no such luck on PostgreSQL 8.2.6 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21).
I gives me the following answer:
ERROR: invalid option of regexp_replace: x

Please help!

semartin
over 7 years ago (22 Dec 2008 at 02:49 PM) by Stou S.
Thanks StuckMojo from #postgresql I got it working again. It turned out that in 8.3 you could put things in the flags section that used to belong in the beginning of the regex, such as the 'x' flag.

Please test it and make sure that it does actually strip tags correctly on various document types =)
over 7 years ago (22 Dec 2008 at 02:50 PM) by Stou S.
"Thanks to StuckMojo" I mean... geez we really need comment editing.
over 7 years ago (22 Dec 2008 at 04:16 PM) by Spencer Ruport
ha! agreed.
over 7 years ago (22 Dec 2008 at 11:08 PM) by semartin
Thanks for prompt reply.
It works like a charm!!!
over 7 years ago (06 Jan 2009 at 06:52 AM) by leifbk
Great function, just what I needed. I searched for "postgresql strip html tags" on Google, and this link came up as no. 5.