Simplifying HTML by removing "invisible" parts
Use lxml to simplify an HTML page
import lxml.html.clean
lxml.html.clean.clean_html(source)
example: lxml.html.clean.clean_html("<html><head><title>Hello</title><script>var foo=3;</script></head><body><p>This is <u>some crazy text</u>. OK!</body></html>")
result:
'
Hello<body>
This is some crazy text. OK!
</body>'