User:Laurier Rochon/prototyping/npl

From XPUB & Lens-Based wiki
Revision as of 16:54, 18 November 2010 by Laurier Rochon

NLP with NLTK/Python

  • Count number of words (word tokens) : len(text)
  • Count number of distinct words (word types) : len(set(text))
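  • The lexical diversity of a text can be measured with : len(text) / len(set(text)) (the average number of times each distinct word is used; the inverse, len(set(text)) / len(text), gives the fraction of words that are distinct)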
  • Dispersion plot : shows where certain words occur across the length of a text (useful for quick overviews) (e.g. text.dispersion_plot(['of','the']))
  • Collocations : sequences of words that occur together unusually often (e.g. red wine) : text.collocations()
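  • Join/split to convert between strings and lists using a delimiter : ' '.join(words) builds a string from a list, 'a b c'.split(' ') builds a list from a string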
  • All the words starting with 'b' in text5 (case-sensitive), unique and sorted : sorted([w for w in set(text5) if w.startswith('b')])
  • Find the position (index) of the first occurrence of a word : text.index('word')
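
The list operations above can be tried without downloading any corpus. The following is a minimal sketch, assuming a plain Python list of tokens stands in for an NLTK text; the sample sentence and variable names are made up for illustration, and splitting on spaces is only a crude substitute for a real tokenizer.

```python
# Hypothetical sample sentence; a plain token list stands in for an NLTK text.
sentence = "the red wine and the white wine are both on the big blue table"

# split: string -> list of tokens (naive whitespace tokenization, an assumption)
text = sentence.split(" ")

num_tokens = len(text)        # number of words (word tokens)
num_types = len(set(text))    # number of distinct words (word types)

# Average number of uses per distinct word (true division in Python 3)
diversity = num_tokens / num_types

# Unique words starting with lowercase 'b', sorted alphabetically
b_words = sorted(w for w in set(text) if w.startswith("b"))

# Index of the first occurrence of a word in the token list
first_wine = text.index("wine")

# join: list of tokens -> string
rejoined = " ".join(text)

print(num_tokens, num_types, round(diversity, 2))
print(b_words, first_wine)
```

With NLTK installed, wrapping the same list as nltk.Text(text) should keep len() and index() working and adds dispersion_plot() and collocations(); the plot additionally needs matplotlib.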