Calendars:Networked Media Calendar/Networked Media Calendar/16-03-2011 -Event 1
11-18 | Nicolas Maleve - Thematic Project
Cookbook Recipes for Goodiff Workshop
- Simplifying HTML by removing "invisible" parts
- Stripping all the tags from HTML to get pure text
- Looking up synonym-sets for a word
- Splitting text into sentences
- Removing common words / stopwords
- Finding capitalized words
- Extracting parts of an HTML document
- Extracting the text contents of a node
- Turning part of a page back into code (aka serialization)
- Frequency of selected TOS words over time (by Dusan and Natasa)
- Simple TOS statistics
- TOS Game
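
As a rough illustration of what a few of the recipes above cover (removing "invisible" parts, stripping tags, splitting text into sentences, removing stopwords, finding capitalized words), here is a minimal sketch using only the Python standard library. It is not the workshop's own code: the function names, the tiny stopword list, and the regular expressions are all assumptions made for this example, and the sentence splitter in particular is deliberately naive.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "is", "are", "you"}


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    INVISIBLE = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.INVISIBLE:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.INVISIBLE and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside any "invisible" element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def strip_tags(html):
    """Return the visible text of an HTML string, with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps adjacent blocks apart, at the cost of
    # sometimes splitting a word that contains inline markup.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def split_sentences(text):
    """Naive split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def remove_stopwords(text):
    """Return the words of text that are not in STOPWORDS (case-insensitive)."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if w.lower() not in STOPWORDS]


def capitalized_words(text):
    """Return words that start with an uppercase letter followed by lowercase."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


if __name__ == "__main__":
    page = (
        "<html><head><style>p {color: red}</style>"
        "<script>var x = 1;</script></head>"
        "<body><p>Google may change these Terms. "
        "You agree to the Terms.</p></body></html>"
    )
    text = strip_tags(page)
    for sentence in split_sentences(text):
        print(sentence, remove_stopwords(sentence), capitalized_words(sentence))
```

The regex-based sentence splitter will break on abbreviations such as "e.g." and the capitalized-word pattern picks up every sentence-initial word, so both are starting points rather than finished tools. Recipes such as synonym-set lookup or node serialization would need a dedicated library and are not sketched here.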