User:Fako Berkers/project2
Sniff, Scrape, Crawl
WikiAPI
I have had a look at Wikipedia and I'm interested in categories especially when they include people. You have for instance a category of Marxist Theorist (to stay a little bit in the same genre as last trimester). This page lists all people categorized as Marxist Theorist and nothing else.
I find categories exiting whenever I regard them as communities. The persons listed there may not even be aware of this community, but as a fact some common ideal or subject or whatever binds these persons together.
I would like to sniff, scrape and crawl in a number of ways to reveal these communities to themselves and others. The following possibilities occurred to me when viewing the Wiki API
- try to fetch jargon used by a community (or their wiki users/pages)
- try different kinds of mapping like (most quoted, highest rank by Google, most backlinks, voted most important by own community, voted most important by critics)
- fetch total bibliography of community and make up sorting algorithms
- create a "fieldview" by relating the communities of critics to the community being portrait
- try a community kickstart by putting email addresses associated with the names on a mailinglist
In the long run small aps like these might build up to article validation. For instance if a text called text.A contains jargon from community.13 then a computer could see to whom described ideas belong to and how these are regarded by other communities (critiques) and the rest of the world (popularity measured through Google ranking)
Article validation may be useful to counter information overload, but I do think that users should always be able to favor certain writers manually. This is to make sure that people choose to ignore or favor certain writing instead of a computer telling people what to read because most people read that.
First results
I've played around with the API and got some interesting results. By using a simple algorithm on the Wiki data I'm able to relate people. If you give a name to the program, it will calculate who is most likely some kind of colleague and indeed if you're interested in person A the computer can guess you also like G and I (for example). Here's one printout:
Slavoj Zizek:
[(u'Slavoj \u017di\u017eek', 41), (u'Jacques Lacan', 9), (u'Antonio Negri', 8), (u'Kojin Karatani', 8), (u'Judith Butler', 7), (u'Rosa Luxemburg', 7), (u'Jacques Derrida', 7), (u'Chopper Read', 7),
(u'Bo\u017eidar Debenjak', 7), (u'Victor Menezes', 6), (u'Julia Kristeva', 6), (u'Alexander Toradze', 6), (u'Ale\u0161 Debeljak', 6), (u'Jean-Pierre Jeunet', 6), (u'Luce Irigaray', 6),
(u'Boeing 727', 6), (u'Stephen Bronner', 6), (u'Rastko Mo\u010dnik', 6), (u'Steve Brookstein', 6), (u'Alain Badiou', 6)]