Turning part of a page back into code (aka serialization)

From Media Design: Networked & Lens-Based wiki
Jump to navigation Jump to search

Imagine you want to print out the full code of part of a page. Use lxml.etree.tostring. This converts any node back into source code -- a process called serialization.

htmlsource="<html><body><p>Example page.</p><p>More stuff with <i>markup</i>.</p></body></html>"
htmlparser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("lxml"), namespaceHTMLElements=False)
page = htmlparser.parse(htmlsource)
selector = lxml.cssselect.CSSSelector("p")
p = selector(page)[1]
print lxml.etree.tostring(p)