Scraping

From XPUB & Lens-Based wiki
Scraping (also screen scraping) is the process of extracting data from output meant for human readers, such as web pages, rather than from a structured source designed for programs.

In the course, we have used the [[BeautifulSoup]] library to parse and manipulate HTML pages in [[Python]].
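As a minimal sketch of this kind of work, assuming the current `bs4` package (Beautiful Soup 4) with Python's built-in `html.parser` backend, the example below parses a page and pulls out the text and address of every link. The sample HTML and the `extract_links` helper are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h1>Reading list</h1>
  <ul>
    <li><a href="http://example.org/one">First article</a></li>
    <li><a href="http://example.org/two">Second article</a></li>
    <li>No link here</li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")  # use the stdlib parser backend
    links = []
    # href=True restricts the search to <a> tags that actually carry an href.
    for anchor in soup.find_all("a", href=True):
        links.append((anchor.get_text(strip=True), anchor["href"]))
    return links


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text, "->", href)
```

The same `soup` object can also be edited in place (for example by changing a tag's attributes) and written back out with `str(soup)`, which is what "manipulating" a page usually amounts to.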

Other Python libraries for parsing HTML that are worth considering:
* [http://codespeak.net/lxml/ lxml]
* [http://code.google.com/p/html5lib/ html5lib]
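For comparison, here is a minimal sketch of the same link extraction with lxml's `lxml.html` module and an XPath query, assuming lxml is installed. The sample HTML and the `extract_links_lxml` name are hypothetical. html5lib is not shown; it is mainly a parser that follows the HTML5 parsing algorithm, and Beautiful Soup 4 can use it as a backend via `BeautifulSoup(html, "html5lib")`.

```python
import lxml.html

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li><a href="http://example.org/one">First article</a></li>
    <li><a href="http://example.org/two">Second article</a></li>
    <li>No link here</li>
  </ul>
</body></html>
"""


def extract_links_lxml(html):
    """Return (text, href) pairs for every <a> element with an href, found via XPath."""
    root = lxml.html.fromstring(html)  # parse into an element tree
    return [
        # text_content() joins all text inside the element, including nested tags.
        (anchor.text_content().strip(), anchor.get("href"))
        for anchor in root.xpath("//a[@href]")
    ]


if __name__ == "__main__":
    for text, href in extract_links_lxml(SAMPLE_HTML):
        print(text, "->", href)
```

XPath lets a whole selection be expressed as one query string, which is handy when the structure of the target page is known in advance.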

Revision as of 15:00, 11 April 2009
