Sniff, Scrape, Crawl (Prototyping)
Revision as of 15:30, 19 May 2014
In 2011, Sniff, Scrape, Crawl was a thematic project led by Aymeric Mansoux, Renee Turner, and Michael Murtaugh.
This prototyping module covers some of the core themes and tools around the practice of "scraping". The goal is to become more familiar with what the technique makes possible and to develop strategic uses of the tools for your own research.
Meeting 1
Scraping Tools
- S: Simple Web Spider in Python
- M: Scrapy (http://scrapy.org/)
- L: Heritrix (http://en.wikipedia.org/wiki/Heritrix)
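To give a feel for the smallest of these tools, here is a minimal sketch of a web spider using only the Python standard library. It is not the code from the Simple Web Spider in Python page; the names `LinkParser`, `extract_links`, `fetch`, and `crawl`, as well as the `max_pages` limit, are assumptions made for illustration. The sketch pulls the links out of each page it visits and follows them breadth-first until the page limit is reached.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs (fragments removed) linked from html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch(url):
    """Download a page as text (hypothetical helper, minimal on purpose)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl from start_url, visiting at most max_pages pages."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to load
        visited.append(url)
        for link in extract_links(url, html):
            # only follow web links, and never queue the same URL twice
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://example.org/", max_pages=5)` would return the list of pages visited, in order. A spider meant for real use would also need to respect robots.txt and pause between requests, which this sketch leaves out; those are among the things the larger tools in the list handle for you.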
Afternoon: Meeting to discuss / develop / brainstorm project ideas
Some Examples
- focused_crawls (https://archive.org/search.php?query=collection%3A%22focused_crawls%22)
- Lasse's Tumblr Jumper
- Birgit Bachler's Bonus Card Friends