User:Tash/grad prototyping
Revision as of 11:28, 2 October 2018
Prototyping Session 1 & 2
Possible topics to explore:
- anonymity
- creating safe, temporary, local networks (freedom of speech - freedom of connection!) http://rhizome.org/editorial/2018/sep/11/rest-in-peace-ethira-an-interview-with-amalia-ulman/
- censorship
- scraping, archiving
- documenting redactions
- steganography?
- meme culture
Learning to use Scrapy
Scrapy is an application framework for crawling web sites and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival. Even though Scrapy was originally designed for web scraping, it can also be used to extract data using APIs (such as Amazon Associates Web Services) or as a general purpose web crawler.
Documentation: https://docs.scrapy.org/en/latest/index.html
Scraping headlines from an Indonesian news site:
Using a spider to extract heading elements (h5) from: http://www.thejakartapost.com/news/index
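import scrapy


class TitlesSpider(scrapy.Spider):
    # Spider name used on the command line: scrapy crawl titles
    name = "titles"

    def start_requests(self):
        urls = [
            'http://www.thejakartapost.com/news/index',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Yield one item per h5 element found on the page
        for title in response.css('h5'):
            yield {
                # extract() returns a list of the matching text nodes
                'text': title.css('h5::text').extract()
            }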
Crawling and saving the results to a JSON file:
scrapy crawl titles -o titles.json
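Since each scraped item stores its 'text' as a list of raw text fragments, the output probably needs some cleaning before it can be analysed. A minimal sketch, assuming titles.json contains a JSON array of items shaped like {"text": [...]}; the function names flatten_titles and load_titles are hypothetical helpers, not part of Scrapy:

```python
import json


def flatten_titles(items):
    """Collect cleaned, non-empty headline strings from scraped items."""
    headlines = []
    for item in items:
        # Each item is assumed to look like {"text": ["fragment", ...]}
        for fragment in item.get("text", []):
            # Collapse newlines and repeated spaces into single spaces
            cleaned = " ".join(fragment.split())
            if cleaned:
                headlines.append(cleaned)
    return headlines


def load_titles(path="titles.json"):
    """Read the file written by the crawl and return a flat headline list."""
    with open(path, encoding="utf-8") as f:
        return flatten_titles(json.load(f))


if __name__ == "__main__":
    # Made-up sample data, standing in for real crawl output
    sample = [
        {"text": ["\n   First headline  "]},
        {"text": []},
        {"text": ["Second\nheadline"]},
    ]
    print(flatten_titles(sample))
```

Note that -o appends to an existing file, so re-running the crawl without deleting titles.json first may leave a file that no longer parses as JSON.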
To explore
- NewsDiffs – as a way to expose the historiography of an article
- how about looking at comments? what can you scrape (and analyse) from social media?
- how far can you go without using an API?
- self-censorship: can you track the things people write but then retract?
- An Anthem to Open Borders
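One way to start on the NewsDiffs and retraction ideas might be to save the same page at different times and compare the snapshots. A minimal sketch using Python's standard difflib module, assuming two versions of an article's text have already been saved as strings; the function name diff_versions, its labels, and the sample text are hypothetical, and this is not how NewsDiffs itself is implemented:

```python
import difflib


def diff_versions(old_text, new_text, old_label="earlier", new_label="later"):
    """Return unified-diff lines showing what changed between two versions."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # lineterm="" keeps the header lines free of trailing newlines
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up snapshots, standing in for two scrapes of the same article
    first = "Minister denies report\nProtest planned for Friday"
    second = "Minister responds to report\nProtest planned for Friday"
    for line in diff_versions(first, second):
        print(line)
```

Lines starting with "-" only exist in the earlier snapshot (possible retractions) and lines starting with "+" only exist in the later one.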