Simplifying HTML by removing "invisible" parts

From XPUB & Lens-Based wiki

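Use lxml's clean_html function to simplify an HTML page. By default it removes scripts and comments entirely, and drops the html, head and title tags while keeping their text.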

import lxml.html.clean
# source is the HTML page as a string; the cleaned HTML is returned as a string
lxml.html.clean.clean_html(source)

Example:

lxml.html.clean.clean_html("<html><head><title>Hello</title><script>var foo=3;</script></head><body><p>This is <u>some crazy text</u>. OK!</body></html>")

Result:

'<div>Hello<body><p>This is <u>some crazy text</u>. OK!</p></body></div>'
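
Note that from lxml 5.2 onward the lxml.html.clean module is no longer bundled with lxml; it lives in the separate lxml_html_clean package. If that package is not installed, a rough substitute can be sketched with lxml's core API alone. The helper name strip_invisible below is hypothetical, and it only removes script, style and comment nodes. It is not a full equivalent of clean_html: it keeps the html, head and title tags and does not touch attributes.

```python
import lxml.html
from lxml import etree


def strip_invisible(source):
    """Return the HTML in source with script, style and comment nodes removed.

    Hypothetical helper, not part of lxml; a minimal sketch only.
    """
    tree = lxml.html.fromstring(source)
    # with_tail=False keeps any text that directly follows a removed node.
    etree.strip_elements(tree, "script", "style", etree.Comment, with_tail=False)
    return lxml.html.tostring(tree, encoding="unicode")


page = ("<html><head><title>Hello</title><script>var foo=3;</script></head>"
        "<body><p>This is <u>some crazy text</u>. OK!</body></html>")
simplified = strip_invisible(page)
print(simplified)
```

The printed page no longer contains the script element, while the visible paragraph text is unchanged.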