Sniff, Scrape, Crawl (Prototyping): Difference between revisions

From XPUB & Lens-Based wiki
In 2011, [[Sniff, Scrape, Crawl]] was a thematic project led by Aymeric Mansoux, Renee Turner, and Michael Murtaugh.


This prototyping module revisits some of the themes of that project, focusing on the core tools and practices of "scraping". The goal is to become more familiar with the possibilities of the technique and to develop strategic uses of the tools for your specific research.


== Elements ==
* Spidering
* Crawling
* Indexing
* Summarizing
* Break up the steps of Whoosh's indexing ()


== Meeting 1 ==
=== Scraping Tools ===
* S: [[Simple Web Spider in Python]]
* M: [http://scrapy.org/ Scrapy]
* L: [http://en.wikipedia.org/wiki/Heritrix Heritrix]
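
To give a feel for what the small ("S") end of this list involves, here is a minimal sketch of a breadth-first spider using only Python's standard library. It is not the code from the [[Simple Web Spider in Python]] page. The names LinkParser, extract_links, fetch and crawl, and the max_pages limit, are assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs, with #fragments removed, for the links in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def fetch(url):
    """Download a page over the network and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl from start_url; returns {url: [outgoing links]}."""
    queue = deque([start_url])
    seen = {start_url}
    graph = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to load
        links = extract_links(html, url)
        graph[url] = links
        for link in links:
            # only follow web links, and visit each URL at most once
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return graph
```

Calling crawl("http://example.org/", max_pages=5) would fetch real pages. Passing a different fetch_page function (for example one that reads saved HTML files) lets you experiment without touching the network. A real spider should also respect robots.txt and pause between requests, which this sketch leaves out.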


=== Afternoon: Meeting to discuss / develop / brainstorm project ideas ===
 
== Some Examples ==
* [https://archive.org/search.php?query=collection%3A%22focused_crawls%22 focused_crawls]
* Lasse's Tumblr Jumper

Revision as of 15:30, 19 May 2014


Links