Web Spider in Python

From XPUB & Lens-Based wiki
Revision as of 18:28, 4 March 2014 by Michael Murtaugh

Using html5lib

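import html5lib
from urllib.request import urlopen  # Python 3; in Python 2 this was urllib.urlopen

url = "http://wikipedia.org/"
html = urlopen(url).read()
# namespaceHTMLElements=False keeps tag names plain ("a" rather than "{http://www.w3.org/1999/xhtml}a")
tree = html5lib.parse(html, namespaceHTMLElements=False)
# Walk every <a> element anywhere in the parsed document
for a in tree.findall(".//a"):
    print("a element", a)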
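
A spider usually needs the link targets rather than the element objects themselves. The following is a minimal sketch of one way to get them, assuming Python 3 and html5lib's default ElementTree tree builder. The function name extract_links and the sample markup are made up for illustration and are not part of the original page. It reads each element's href attribute and resolves relative links against the page URL with urljoin.

```python
import html5lib
from urllib.parse import urljoin


def extract_links(html, base_url):
    """Return absolute URLs for every <a href="..."> found in html.

    extract_links is a hypothetical helper name used only for this sketch.
    """
    tree = html5lib.parse(html, namespaceHTMLElements=False)
    links = []
    for a in tree.findall(".//a"):
        href = a.get("href")
        if href is None:
            continue  # skip anchors without an href, e.g. <a name="top">
        # Resolve relative links such as "/wiki/Main_Page" against the page URL
        links.append(urljoin(base_url, href))
    return links


# Illustrative markup (an assumption, not fetched from the network)
sample = (
    '<p><a href="/wiki/Main_Page">Main</a> '
    '<a name="top">no href</a> '
    '<a href="http://example.org/x">X</a></p>'
)

for link in extract_links(sample, "http://wikipedia.org/"):
    print(link)
```

Working on an in-memory string keeps the example independent of the network; the same function could be given the bytes returned by urlopen(url).read().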