Scraping
Revision as of 14:01, 11 April 2009
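Scraping (also called screen scraping) is the process of extracting data from something that was not designed to provide it in machine-readable form, such as an HTML page meant for human readers.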
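To make the definition concrete, here is a minimal sketch that scrapes link targets and link texts out of a small HTML snippet using only Python's standard-library `html.parser` module, so no third-party library is assumed. The `LinkExtractor` class, the `scrape_links` helper and the sample HTML are illustrative only and are not part of the course material.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []          # finished (href, text) pairs
        self._href = None        # href of the <a> currently open, if any
        self._text_parts = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; <a> without href gives None
            self._href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text while we are inside an <a> that has an href
        if self._href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._href, text))
            self._href = None


def scrape_links(html):
    """Hypothetical helper: return all (href, text) pairs found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


sample = """
<html><body>
  <p>Parsers: <a href="http://codespeak.net/lxml/">lxml</a>
  and <a href="http://code.google.com/p/html5lib/">html5lib</a></p>
</body></html>
"""

print(scrape_links(sample))
# → [('http://codespeak.net/lxml/', 'lxml'), ('http://code.google.com/p/html5lib/', 'html5lib')]
```

Writing a parser subclass by hand like this works for simple pages, but it quickly becomes tedious, which is why dedicated libraries are usually preferred.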
In the course, we have used the BeautifulSoup library to parse and manipulate HTML pages in Python.
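A comparable sketch with BeautifulSoup might look like the following. It assumes the current `bs4` package (installed with `pip install beautifulsoup4`), whose import path differs from the older BeautifulSoup 3 module that was current in 2009, so the version used in the course may look slightly different. The sample table and the `table_rows` helper are hypothetical.

```python
from bs4 import BeautifulSoup

html = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>0.50</td></tr>
  <tr><td>Pear</td><td>0.75</td></tr>
</table>
"""


def table_rows(markup, table_id):
    """Hypothetical helper: return each row of the table with the given id
    as a list of cell strings, or [] if no such table exists."""
    # "html.parser" is Python's built-in parser; bs4 can also use lxml or html5lib.
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header (<th>) and data (<td>) cells are treated alike here.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)
    return rows


print(table_rows(html, "prices"))
# → [['Item', 'Price'], ['Apple', '0.50'], ['Pear', '0.75']]
```

Passing `"lxml"` or `"html5lib"` instead of `"html.parser"` changes only the underlying parser, provided the corresponding package is installed.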
Other interesting libraries to consider: