Web Spider in Python
Revision as of 18:28, 4 March 2014 by Michael Murtaugh
Using html5lib
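import html5lib
from urllib.request import urlopen  # Python 3 home of urlopen (was urllib.urlopen in Python 2)

url = "http://wikipedia.org/"
# Download the raw page bytes
html = urlopen(url).read()
# Parse into an ElementTree element; namespaceHTMLElements=False keeps tag names plain ("a")
tree = html5lib.parse(html, namespaceHTMLElements=False)
# Visit every link element in the document
for a in tree.findall(".//a"):
    print("a element", a)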
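The loop above only prints each element object. A spider usually needs the link targets as absolute URLs so that they can be fetched next. The following is a minimal sketch of that step, assuming Python 3. The function name extract_links and the sample document are hypothetical and not part of the original page. The function relies only on the ElementTree interface (findall, get), which is what html5lib.parse returns by default, so a small hand-built tree stands in for a downloaded page here.

```python
from urllib.parse import urljoin, urldefrag
import xml.etree.ElementTree as ET


def extract_links(tree, base_url):
    """Return the absolute URLs of all <a href="..."> elements under tree.

    Hypothetical helper: tree is any ElementTree element, for example
    the result of html5lib.parse(html, namespaceHTMLElements=False).
    """
    links = []
    for a in tree.findall(".//a"):
        href = a.get("href")
        if not href:
            continue  # skip anchors that have no link target
        absolute = urljoin(base_url, href)          # resolve relative links
        absolute, _fragment = urldefrag(absolute)   # drop any "#fragment" part
        if absolute not in links:                   # keep first occurrence only
            links.append(absolute)
    return links


# Stand-in for a parsed page (assumed sample data, not fetched from the web)
sample = ET.fromstring(
    "<html><body>"
    '<a href="/wiki/Python">Python</a>'
    '<a href="http://example.org/page#top">Example</a>'
    '<a name="no-target">skipped</a>'
    "</body></html>"
)
print(extract_links(sample, "http://wikipedia.org/"))
```

With a real page, the same call would be extract_links(tree, url), using the tree and url variables from the snippet above. Feeding the returned URLs back into the download step, while remembering which ones were already visited, is one way to turn this into a crawl loop.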