Calendars:Networked Media Calendar/Networked Media Calendar/16-03-2011 -Event 1
Revision as of 17:32, 16 March 2011
11-18 | Nicolas Maleve - Thematic Project
Cookbook Recipes for Goodiff Workshop
- Simplifying HTML by removing "invisible" parts
- Stripping all the tags from HTML to get pure text
- Looking up synonym-sets for a word
- Splitting text into sentences
- Removing common words / stopwords
- Finding capitalized words
- Extracting parts of an HTML document
- Extracting the text contents of a node
- Turning part of a page back into code (aka serialization)
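- History TOS mashup
  - Facebook TOS: https://spreadsheets.google.com/pub?key=0AgT6KLPteXsOdF84Y0F3RWpxQnQ2ODFOLVA3RG9XWFE&output=html
  - Skype TOS: https://spreadsheets.google.com/pub?key=0AgT6KLPteXsOdHRuczQxUEU4dWxjWmNjaUtKb2JfM1E&single=true&gid=0&output=html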
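As a rough illustration of what four of the recipes above involve (removing "invisible" parts and stripping tags, splitting text into sentences, removing stopwords, finding capitalized words), here is a minimal sketch using only Python's standard library. It is not the cookbook's own code: the set of "invisible" tags, the tiny stopword list, the regular expressions and all function names are assumptions chosen for illustration.

```python
import re
from html.parser import HTMLParser

# Assumed set of tags whose contents never show up as page text.
INVISIBLE_TAGS = {"script", "style", "title"}

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are",
             "we", "you", "your", "or", "for", "on", "with", "by", "this", "that"}


class TextExtractor(HTMLParser):
    """Collects text data, skipping anything inside INVISIBLE_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many invisible elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in INVISIBLE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in INVISIBLE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Strip all tags (and invisible parts) and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join chunks with spaces so adjacent block elements do not run together.
    return " ".join(" ".join(parser.chunks).split())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Split text into words made of letters and apostrophes."""
    return re.findall(r"[A-Za-z']+", text)


def remove_stopwords(words):
    """Drop words found (case-insensitively) in STOPWORDS."""
    return [w for w in words if w.lower() not in STOPWORDS]


def capitalized_words(text):
    """Return every word that starts with an uppercase letter."""
    return re.findall(r"\b[A-Z][a-zA-Z]*\b", text)


if __name__ == "__main__":
    page = ("<html><head><title>Terms</title><style>p{color:red}</style></head>"
            "<body><p>We may share your data. Facebook keeps it!</p></body></html>")
    text = html_to_text(page)
    print(text)
    print(split_sentences(text))
    print(remove_stopwords(tokenize(text)))
    print(capitalized_words(text))
```

The regex sentence splitter will break on abbreviations such as "e.g.", and sentence-initial words are counted as capitalized; a fuller stopword list (for example NLTK's stopwords corpus) would also be needed for real TOS texts.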