Mail to Alexandre

From XPUB & Lens-Based wiki

Hello Alexandre,

How are you?

The rest of this email will be in English, since a copy is also going to the students of the Media Design master's program at the Piet Zwart Institute. Over two days with them, we worked on questions related to user data, and we took Goodiff as the starting point for the workshop. We thought you might find it interesting to see how the tool you set up could be used.

The workshop participants were divided into four groups. Each group used the Goodiff dataset as the starting point for a quick project (only a day and a half). Here is the breakdown of the groups/themes:


Goodiff Terms of Service word frequency

The code returns the frequencies of selected words in the TOS documents defined in the path variable. The idea is to see which terms matter most in the selected pages and how this evolves over time. http://pzwart3.wdka.hro.nl/wiki/Goodiff_TOS_word_frequency
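A minimal Python sketch of this kind of count follows. It is not the group's code. It assumes the snapshots have already been loaded as (label, HTML) pairs instead of being read from a path variable. `SELECTED_WORDS`, `strip_tags`, `word_frequencies` and `frequencies_over_time` are hypothetical names, and the tag stripping is a crude regex rather than a real HTML parser.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical selection of words to track; adjust per document.
SELECTED_WORDS = ["privacy", "information", "content", "license"]


def strip_tags(html: str) -> str:
    """Crudely drop HTML tags so markup is not counted as words."""
    return re.sub(r"<[^>]+>", " ", html)


def word_frequencies(html: str, words: Iterable[str]) -> Dict[str, int]:
    """Count occurrences of each selected word (case-insensitive, whole words)."""
    counts = Counter(re.findall(r"[a-z]+", strip_tags(html).lower()))
    return {w: counts[w] for w in words}


def frequencies_over_time(snapshots: Mapping[str, str],
                          words: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map each snapshot label (e.g. a crawl date) to its word frequencies."""
    selected = list(words)  # allow a one-shot iterable to be reused per snapshot
    return {label: word_frequencies(html, selected) for label, html in snapshots.items()}


if __name__ == "__main__":
    # Placeholder snippets, not real Goodiff data.
    snapshots = {
        "v1": "<p>We respect your privacy and your information.</p>",
        "v2": "<p>You grant us a license to your content and information.</p>",
    }
    for label, freqs in frequencies_over_time(snapshots, SELECTED_WORDS).items():
        print(label, freqs)
```

Labeling each snapshot with its crawl date produces one row per version, which can then be plotted to follow how each term rises or falls over time.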


Ambiguity

The legal terminology used in terms and conditions is often ambiguous and arbitrary. They wanted to highlight this ambiguity by showing how incongruous definitions become when they depend on other factors that are often not explicitly explained. The code on the page checks the most frequently used terms in the legal documents against WordNet to find the multiple meanings of these words. http://pzwart3.wdka.hro.nl/wiki/16-03-2011_Laura_Amy_Laurier
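A minimal Python sketch of this approach follows. It assumes NLTK and its WordNet corpus are installed (`pip install nltk`, then `nltk.download("wordnet")`). It is not the group's code: the small stopword list and the names `top_terms`, `wordnet_senses` and `ambiguity_report` are hypothetical. The lookup function is passed in as a parameter, so it can be swapped for another dictionary.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# Small illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "and", "you", "your", "our", "for", "that", "this",
             "with", "any", "are", "may", "not", "will", "use"}


def top_terms(text: str, n: int = 10) -> List[str]:
    """Return the n most frequent non-stopword terms longer than two letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(n)]


def wordnet_senses(word: str) -> List[str]:
    """Return the definition of every WordNet synset for `word` (needs NLTK)."""
    from nltk.corpus import wordnet as wn  # imported lazily: optional dependency
    return [synset.definition() for synset in wn.synsets(word)]


def ambiguity_report(text: str, n: int = 10,
                     lookup: Callable[[str], List[str]] = wordnet_senses
                     ) -> Dict[str, List[str]]:
    """Map each of the n most used terms to the list of meanings found for it."""
    return {term: lookup(term) for term in top_terms(text, n)}
```

Calling `ambiguity_report(tos_text)` on a TOS snapshot returns the WordNet definitions for each of its ten most frequent terms. A term with several senses is a candidate for exactly the kind of ambiguity the group set out to expose.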


Timeline

Assuming Goodiff has a valid reason to monitor a particular document, we set out to visualize the differences it records between updates of that document. Rather than highlighting the changes, our initial proposal was to white out whatever was added or changed. Conceptually, theoretically, and ideally, the document would slowly disappear over time. http://pzwart3.wdka.hro.nl/wiki/User:Inge_Hoonte/workshop_TOS
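The whiting-out idea can be sketched in Python with `difflib` from the standard library. This sketch rests on several assumptions and is not the group's code. It works on plain text already extracted from each snapshot, and it diffs word by word rather than reusing the diffs Goodiff records. The function name `fade` is hypothetical. A word that is whited out once stays blank in every later version, which is what makes the document disappear cumulatively.

```python
import difflib
from typing import List


def fade(versions: List[str]) -> str:
    """Return the latest version with every word that was ever added or
    changed replaced by blank space of the same width."""
    words = versions[0].split()
    hidden = [False] * len(words)          # the first version starts fully visible
    for nxt in versions[1:]:
        new_words = nxt.split()
        new_hidden = [True] * len(new_words)   # default: added/changed -> hidden
        matcher = difflib.SequenceMatcher(a=words, b=new_words, autojunk=False)
        # Words in unchanged runs inherit their earlier visibility.
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                new_hidden[block.b + k] = hidden[block.a + k]
        words, hidden = new_words, new_hidden
    return " ".join(" " * len(w) if h else w for w, h in zip(words, hidden))
```

For example, `fade(["a b c", "a x c"])` returns `"a   c"`. The changed word keeps its width as blank space, so the page layout survives while its content empties out.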


Game Labyrinth

The prototype works as a text-based adventure game set in a fantasy world. The player is lost in Facebook's terms of service and tries to navigate through them. http://pzwart3.wdka.hro.nl/wiki/16-03-2011_Danny_Fabien_Mirjam

We were all very excited to use the dataset and very interested in working with such rich material. With a view to using Goodiff as a basis for visualization and interpretation, the workshop helped us formulate a sort of wishlist:

  • find a way to deal with the changing nature of the web, e.g. redirects: Google's TOS is not crawled, see dataset/google/www.google.com/terms_of_service.html (which redirects to accounts/TOS, which is empty)
  • cope with language-settings detection: the last three years of Facebook's TOS were crawled in French only
  • post-processing scripts that could help peel off navigation bars and other secondary elements (when navigation links change, the document is considered changed even if the legal text has not been modified)
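One crude way to approach the post-processing wish is to compare only the text outside navigation-like elements. Under that rule, a snapshot counts as changed only when that text differs. The Python sketch below assumes navigation lives in `<nav>`, `<header>`, `<footer>` or `<aside>` (plus `<script>`/`<style>`). Real pages often mark navigation up with ordinary `<div>`s, so the hypothetical `SKIP_TAGS` set would need tuning per site. A content-extraction library such as boilerpipe would be far more robust.

```python
from html.parser import HTMLParser
from typing import List

# Assumption: navigation and other secondary elements sit inside these tags.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any tag listed in SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0               # number of open SKIP_TAGS around the cursor
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def main_text(html: str) -> str:
    """Return the page text minus navigation, with whitespace normalised."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so re-indentation alone does not count as a change.
    return " ".join(" ".join(parser.chunks).split())


def legal_text_changed(old_html: str, new_html: str) -> bool:
    """True only if the text outside navigation-like elements differs."""
    return main_text(old_html) != main_text(new_html)
```

With this check, a snapshot whose only difference is a renamed menu link would no longer be recorded as a new version of the terms.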

We see great potential in Goodiff and are eager to launch other experiments. Is there any way we can help further?


Alex and Michael Feedback

Open Issues/Ideas

Critical (open points)

  • Correct handling of HTTP redirects for tracked URLs
  • Automatic language detection
    • Especially with URL with GeoIP detection (facebook or

paypal have the tendency to switch languages when their GeoIP techniques change)

  • Extract relevant content, i.e. remove navigation bars etc.
    -> testing http://code.google.com/p/boilerpipe/
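Until automatic language detection is in place, a cheap safeguard is to flag any snapshot whose language appears to differ from the previous one. That is the pattern seen when Facebook's TOS ended up crawled in French only. The Python sketch below guesses the language by counting common function words. The word lists, `guess_language` and `language_switched` are hypothetical and illustrative only. A dedicated language-identification library would be more reliable than this heuristic, and more languages would need adding for sites whose GeoIP switches to other locales.

```python
import re
from typing import Dict, Optional, Set

# Tiny illustrative function-word lists; extend with more languages as needed.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "of", "to", "you", "your", "we", "is", "that", "for"},
    "fr": {"le", "la", "les", "et", "de", "des", "vous", "votre", "nous", "est"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the language whose function words occur most, or None if none occur."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def language_switched(previous: str, current: str) -> bool:
    """Flag a snapshot whose guessed language differs from the previous one,
    e.g. when a GeoIP change makes a site serve a different translation."""
    before, after = guess_language(previous), guess_language(current)
    return None not in (before, after) and before != after
```

A flagged snapshot could be set aside rather than recorded as a change to the terms, so a translation switch does not look like a rewrite of the legal text.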

New functionalities