Thursday, April 21, 2011

Whitelisting certain HTML tags in Python?

Let's say allowed_bits = ['a', 'p']

re.compile(r'<(%s)[^>]*(/>|.*?</\1>)' % ('|'.join(allowed_bits)))

matches:

<a href="blah blah">blah</a>
<p />

and not:

<html>blah blah blah</html>

What I want to do is turn it on its head, so that it matches:

<html>blah blah</html>
<script type="text/javascript">blah blah</script>

and not:

<p>Hello</p>

My thinking was to do something like:

re.compile(r'<(^%s)[^>]*(/>|.*?</\1>)' % ('|'.join(allowed_bits)))

but this doesn't work.

Any ideas? I want to negatively match.
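
One possible approach, offered as a sketch rather than a definitive answer: inside a group, ^ is a start-of-string anchor, not a negation (it only negates inside a character class such as [^>]), which is why the attempt above never matches. A negative lookahead, (?!...), can express "a tag name that is not one of these". The names disallowed and find_disallowed below are hypothetical, introduced only for illustration. The \b after the lookahead alternatives is there so that a tag like <pre> is not mistaken for <p>.

```python
import re

allowed_bits = ['a', 'p']

# Escape each tag name, then join them into an alternation such as "a|p".
alternatives = '|'.join(re.escape(tag) for tag in allowed_bits)

# (?!(?:a|p)\b) rejects an allowed tag name right after "<".
# (\w+)\b then captures the full name of any other tag for the \1 backreference.
disallowed = re.compile(
    r'<(?!(?:%s)\b)(\w+)\b[^>]*(/>|.*?</\1>)' % alternatives
)


def find_disallowed(html):
    """Return the text of every element whose tag is not in allowed_bits."""
    return [m.group(0) for m in disallowed.finditer(html)]


if __name__ == '__main__':
    for sample in ('<html>blah blah</html>',
                   '<script type="text/javascript">blah blah</script>',
                   '<p>Hello</p>',
                   '<a href="blah blah">blah</a>'):
        print(sample, '->', find_disallowed(sample))
```

Like the original pattern, this assumes the element sits on a single line (. does not match newlines unless re.DOTALL is passed) and it does not handle nested elements with the same tag name.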

From stackoverflow
