User:Fako Berkers/project2: Difference between revisions

From XPUB & Lens-Based wiki
 
(27 intermediate revisions by the same user not shown)
==Sniff, Scrape, Crawl==


===WikiAPI===


I have had a look at Wikipedia and I'm interested in categories, especially when they include people. There is, for instance, a category of Marxist theorists ([http://en.wikipedia.org/wiki/Category:Marxist_theorists to stay a little bit in the same genre as last trimester]). This page lists all people categorized as Marxist theorists and nothing else.


I find categories exciting whenever I regard them as communities. The persons listed there may not even be aware of this community, but in fact some common ideal, subject or other bond ties them together.


I would like to sniff, scrape and crawl in a number of ways to reveal these communities to themselves and others. The following possibilities occurred to me when viewing the Wiki API:
* try to fetch jargon used by a community (or their wiki users/pages)
* try different kinds of mapping like (most quoted, highest rank by Google, most backlinks, voted most important by own community, voted most important by critics)
* fetch total bibliography of community and make up sorting algorithms
* create a "fieldview" by relating the communities of critics to the community being portrayed
* try a community kickstart by putting email addresses associated with the names on a mailing list


In the long run small apps like these might build up to article validation. For instance, if a text called text.A contains jargon from community.13, then a computer could see to whom the described ideas belong and how these are regarded by other communities (critiques) and the rest of the world (popularity measured through Google ranking).


Article validation may be useful to counter information overload, but I do think that users should always be able to favor certain writers manually. This is to make sure that people choose to ignore or favor certain writing instead of a computer telling people what to read because most people read that.


===First results===


I've played around with the API and got some interesting results. By using a simple algorithm on the Wiki data I'm able to relate people. If you give a name to the program, it will calculate who is most likely some kind of colleague and indeed if you're interested in person A the computer can guess you also like G and I (for example). Here's one printout:


<source lang="python">
Slavoj Zizek:
[(u'Slavoj \u017di\u017eek', 41), (u'Jacques Lacan', 9), (u'Antonio Negri', 8), (u'Kojin Karatani', 8), (u'Judith Butler', 7), (u'Rosa Luxemburg', 7), (u'Jacques Derrida', 7), (u'Chopper Read', 7),
(u'Bo\u017eidar Debenjak', 7), (u'Victor Menezes', 6), (u'Julia Kristeva', 6), (u'Alexander Toradze', 6), (u'Ale\u0161 Debeljak', 6), (u'Jean-Pierre Jeunet', 6), (u'Luce Irigaray', 6), (u'Boeing 727', 6), (u'Stephen Bronner', 6),
(u'Rastko Mo\u010dnik', 6), (u'Steve Brookstein', 6), (u'Alain Badiou', 6)]
</source>


It's interesting that the algorithm can easily predict itself whether results will be reasonable or bad.
The algorithm can use some fine tuning to get rid of the nonsense like Boeing 727 :)
I do have ideas on how to do that, but making the calculations already takes up 10 to 20 minutes, so imagine with an improved version ... I'm optimizing before expanding for sure. Django could be my best friend in this.


Without being aware of it the results lead to some kind of new search engine. I like the emotions that I get while viewing the results. It seems like my attention is brought to interesting new people by using it.


===Stage two plans===


The most important thing now is to optimize. I assume the URL requests are taking the most time. The program will often make more than a thousand requests, because Category:Living_people is often fully investigated. This means it has to go through half a million names. If I saved the category listing with Django in a sort of cache, I could create the same results without overloading the connection.
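
The caching idea could look roughly like the sketch below. It keeps the listings in an in-memory dictionary instead of Django storage, and the names `make_cached_fetcher` and `fake_fetch` are made up for illustration; `fake_fetch` only stands in for the slow URL request.

```python
def make_cached_fetcher(fetch):
    """Wrap `fetch` so each category listing is requested only once."""
    cache = {}

    def cached(category):
        # Only call the slow fetch function on a cache miss.
        if category not in cache:
            cache[category] = fetch(category)
        return cache[category]

    return cached


calls = []  # records every real (uncached) request


def fake_fetch(category):
    # Hypothetical stand-in for the URL request to the wiki.
    calls.append(category)
    return ["Person A", "Person B"]


get_members = make_cached_fetcher(fake_fetch)
first = get_members("Category:Living_people")
second = get_members("Category:Living_people")
print(len(calls))
```

The second lookup is answered from the cache, so only one real request is made. A version that survives between runs would need to store the listings on disk or in a database.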


====Improved ranking====


There are a few interesting things I can do with the code to improve the ordering of the results once I have optimized it with Django.
* I can set up a “control group” for each search and use that data to make commonly used Categories less important than rare Categories. This tool can easily be transformed to filter a vocabulary from Categories (for example the rare Categories), which further expands the possibilities.
* I could distinguish between related and unrelated categories which may improve the ordering of results (pushing Boeing 727 and irrelevant people to the back) especially when dealing with people less documented.
* An alternative to improve results is relating categories and names to categories and names used on the page itself. This may delete results like Boeing 727 and some irrelevant people (like Michael Jackson in results for Albert Einstein).
All options will complement each other, and the first potentially draws the idea in another direction (searching on words instead of names). If the Django-cache code is flexible enough it might optimize these improvements as well.
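
The first option (making common Categories count less than rare ones) could be sketched as follows. The inverse-frequency weighting is an assumption borrowed from text search, not something the tool already does, and the page names and categories are invented toy data.

```python
import math
from collections import Counter


def category_weights(control_group):
    """Weight each category by how rare it is in the control group."""
    total = len(control_group)
    counts = Counter(cat for cats in control_group.values() for cat in cats)
    # Inverse-frequency weight: a category found on every page gets weight 0.
    return {cat: math.log(total / n) for cat, n in counts.items()}


def weighted_score(own, other, weights):
    """Sum the weights of the categories that two pages share."""
    return sum(weights.get(cat, 0.0) for cat in own & other)


# Hypothetical control group: page name -> set of categories.
control_group = {
    "Person A": {"Living people", "Marxist theorists"},
    "Person B": {"Living people", "Singers"},
    "Person C": {"Living people", "Singers"},
    "Person D": {"Living people", "Marxist theorists", "Philosophers"},
}

weights = category_weights(control_group)
common_only = weighted_score({"Living people"}, {"Living people"}, weights)
rare_match = weighted_score(
    control_group["Person A"], control_group["Person D"], weights
)
print(common_only, rare_match)
```

With this weighting, two pages that share only a category everyone has score zero, while a shared rare category still counts.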


However I'm doubting whether I want to reorder the results, because I kinda like the dirtiness (makes me pleasantly surprised). I would then however like to know why something like Boeing 727 was associated with Zizek. This could be done with an improved printing procedure.
 
====Application one: community kickstarter====
 
====Application two: personal persons====
 
====Application three: search engine====
 
====Prospects: crowd sourcing====
 
RSS feedback loop system within community
Feedback to Wiki community (critiques?)
Improve English with Dutch grammar
hCard??
 
====Critique====
 
A point of possible critique is that Wikipedia is not for the common people and the same may be true for this algorithm. It might only be useful for people like me, who only know a little of a lot and are curious for more.
 
Michael Jackson
<source lang="python">
[(u'Michael Jackson', 60), (u'Jermaine Jackson', 20), (u'Janet Jackson', 19), (u'Stevie Wonder', 19), (u'Prince (musician)', 18), (u'Madonna (entertainer)', 18), (u'Justin Timberlake', 17), (u'Bob Dylan', 16), (u'Paul McCartney', 16), (u'Tina Turner', 16), (u'Marlon Jackson', 16), (u'La Toya Jackson', 15), (u'Mariah Carey', 15), (u'Lionel Richie', 15), (u'Britney Spears', 15), (u'Whitney Houston', 15), (u'Diana Ross', 15), (u'Little Richard', 15), (u'Usher (entertainer)', 14), (u'Christina Aguilera', 14)]
</source>

Latest revision as of 18:54, 22 May 2011

Sniff, Scrape, Crawl

Wicked Wiki

I started the year with a scraper/tool that can investigate persons on Wikipedia. At the moment it can find colleagues based on similarities in categorization. This is a great tool, because if you find one person interesting you are likely to appreciate the others. Investigating Gandhi, for instance, delivers some unknown Indian rulers and peace activists. I'm still working on improvements. You can read more here.
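
As a rough illustration of finding colleagues through categorization, the sketch below ranks pages by the number of categories they share with the person investigated. It is a minimal guess at the idea, not the actual Wicked Wiki code; the function name and the toy data are hypothetical, and the real tool reads its categories from Wikipedia.

```python
from collections import Counter


def rank_by_shared_categories(person, categories_of, top=5):
    """Rank other pages by how many categories they share with `person`."""
    own = categories_of[person]
    scores = Counter()
    for other, cats in categories_of.items():
        if other == person:
            continue  # skip the person being investigated
        shared = len(own & cats)
        if shared:
            scores[other] = shared
    return scores.most_common(top)


# Hypothetical toy data: page name -> set of categories.
categories_of = {
    "Person A": {"Philosophers", "Marxist theorists", "Living people"},
    "Person B": {"Philosophers", "Marxist theorists"},
    "Person C": {"Living people", "Singers"},
    "Person D": {"Aircraft"},
}

ranking = rank_by_shared_categories("Person A", categories_of)
print(ranking)
```

Pages without any shared category drop out of the ranking, and pages that share only a very broad category end up at the bottom.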

Anecdote: from WW to AA

When I was on my way to Berlin to visit Transmediale I was thinking about my Wicked Wiki idea. The program was slow but functional, and I was looking for a next step. For me this was in visualizing results. Something that concerned me was that a lot of people on Wikipedia disappear into the unknown. That's why I wanted to make superstars out of people that were on Wikipedia but not famous.

When I got back from Berlin the results of my test subject (Slavoj Zizek) were polluted with "Deutschland sucht den Superstar", the German Idol program on television. Quite baffled by this coincidence, I started to work on a program that visualized the battle between Zizek and Lombardi (one of the German Idol participants). This grew into the much bigger tool that is now Attention Arena.

Attention Arena

I'm interested in how the internet regulates our attention. Some subjects may be very interesting, but because the majority doesn't watch them they fade away on the internet, replaced by what others are watching. Attention for some interesting subjects will disappear because of this.

I started to visualize this by putting two YouTube videos on top of each other and fading out the less popular one. This worked well with videos of Zizek and Lombardi, a philosopher and a teenage superstar respectively.

Spurred on by this nice result I started to work on a tool that does the following:

  • It allows a user to create a list of YouTube videos in Django Administration.
  • These videos get downloaded automatically and converted for manipulation purposes if you run a script.
  • The videos can be shown in openFrameworks.
  • With Python you can save viewing preferences in Django, like size, position, transparency and number of videos.

My plan is to create a pyramid of six videos. The top video will be the most popular one and the lower ones less popular. The lower the video is on the pyramid, the more faded the video appears. This setup shows what is popular and what is not. It reveals where the attention of the public is heading and what will slowly be forgotten.
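
A minimal sketch of such a pyramid is given below. It assumes rows of one, two and three videos and a linear fade from full opacity at the top to 0.2 at the bottom; the row sizes, the `min_alpha` value, the function name and the view counts are all assumptions for illustration.

```python
def pyramid_layout(videos, rows=(1, 2, 3), min_alpha=0.2):
    """Sort videos by views and give each a pyramid row and an opacity."""
    ranked = sorted(videos, key=lambda v: v["views"], reverse=True)
    layout = []
    index = 0
    last_row = len(rows) - 1
    for row, count in enumerate(rows):
        # Opacity falls linearly from 1.0 (top row) to min_alpha (bottom row).
        if last_row:
            alpha = 1.0 - (1.0 - min_alpha) * row / last_row
        else:
            alpha = 1.0
        for video in ranked[index:index + count]:
            layout.append(
                {"title": video["title"], "row": row, "alpha": round(alpha, 2)}
            )
        index += count
    return layout


# Hypothetical view counts for six videos.
videos = [
    {"title": "video 1", "views": 1200},
    {"title": "video 2", "views": 90000},
    {"title": "video 3", "views": 450},
    {"title": "video 4", "views": 30000},
    {"title": "video 5", "views": 7000},
    {"title": "video 6", "views": 52},
]

layout = pyramid_layout(videos)
for item in layout:
    print(item)
```

The most viewed video lands alone on the top row at full opacity, and the three least viewed ones share the bottom row at the lowest opacity.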

Writing an algorithm that creates this pyramid will not be too hard. The challenge is to find a good collection of YouTube videos that strengthens the concept behind the form. This is where the work is at now.

How Bieber is Your Hero??

For the open day I created a website where you can search for your hero. I got inspired by this comment on YouTube:

NetaJi was inspired with Swami Vivekananda , who do we make our role models?...Actors or Sports stars...and its not our fault , the actors at least pretend to be heroes , and sports stars bring some pride( however small) to the country, we choose them because there are no real Heroes left , Look what the politicians have made of us , Netaji was considered a terrorist up till the 70's by the Cong. Govt. while Rajiv & Indira are considered Great DeshBhaktas.

This is very vague for a non-Indian person, but what I get from it is that our heroes only consist of artists and sports people. This is exactly what I found when I was looking at Category:Living_people on Wikipedia. The best documented people were part of popular culture, either sports or singing. Yet they don't really inspire me, and they are part of a culture that you will get to know anyway, whether you want to or not.

The website is a place where you can spend your time with your real heroes (from popular culture or not). Through the Wicked Wiki software it finds colleagues of your hero. This is exciting, because you may never have heard of these people! At the same time these heroes are put into the perspective of popular culture by comparing their popularity with the most popular artist on YouTube at this moment. Every hero is a certain percentage of Bieber on the website, which is calculated by relating the YouTube views of Bieber and your hero. Bieber has more than 500 million YouTube views, and because of this most heroes don't make a 1% Bieber rating.
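
The Bieber rating itself is simple arithmetic, sketched below. How the website collects its view counts is not covered here; the round figure of 500 million and the hero's view count are assumed values for illustration.

```python
def bieber_rating(hero_views, bieber_views):
    """Return the hero's YouTube views as a percentage of Bieber's views."""
    if bieber_views <= 0:
        raise ValueError("bieber_views must be positive")
    return 100.0 * hero_views / bieber_views


BIEBER_VIEWS = 500_000_000  # assumed round figure ("more than 500 million")
hero_views = 2_000_000      # hypothetical view count for a hero

rating = bieber_rating(hero_views, BIEBER_VIEWS)
print(f"{rating:.2f}% Bieber")
```

With these numbers a hero needs 5 million views to reach a 1% Bieber rating, which shows why most heroes stay below it.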

The "How Bieber is Your Hero" website is a tool that can help you reduce information overload. At the same time it makes you aware of the relation between your references and the icons in popular culture.