Calendars:Networked Media Calendar/Networked Media Calendar/16-03-2011 -Event 1: Difference between revisions
Revision as of 14:03, 16 March 2011
11-18 | Nicolas Maleve - Thematic Project
Cookbook Recipes for Goodiff Workshop
- Simplifying HTML by removing "invisible" parts
- Stripping all the tags from HTML to get pure text
- Looking up synonym-sets for a word
- Splitting text into sentences
- Removing common words / stopwords
- Finding capitalized words
- Extracting parts of an HTML document
- Extracting the text contents of a node
- Turning part of a page back into code (aka serialization)