User:Laurier Rochon/prototyping/npl: Difference between revisions
From XPUB & Lens-Based wiki
== NLP with NLTK/Python ==

*Count the number of words (word tokens): len(text)
*Count the number of distinct words (word types): len(set(text))
*Average number of uses per distinct word, a rough measure of how varied a text is: len(text) / len(set(text)) (in Python 2, use float(len(text)) to avoid integer division)
*Dispersion plot: shows where given words occur across the length of a text, useful for a quick overview (e.g. text.dispersion_plot(['of','the']))
*Collocations: pairs of words that occur together unusually often (e.g. red wine): text.collocations()
*Join/split: ' '.join(words) builds a string from a list, and s.split(' ') builds a list from a string, using the given delimiter
*All the words starting with 'b' in text5, unique and sorted: sorted([w for w in set(text5) if w.startswith('b')])
*Find the position of the first occurrence of a word: text.index('word')
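The counting, filtering, and lookup notes above can be tried without loading a corpus. What follows is a minimal sketch that assumes a plain Python list of tokens in place of an NLTK text; the sample sentence and all variable names are made up for illustration. As far as I know, an nltk.text.Text object supports the same len(), set(), and index() calls, but dispersion_plot() and collocations() need NLTK itself and are not shown here.

```python
# Hypothetical sample data: a plain list of tokens standing in for an NLTK text.
sentence = "the big red wine and the bright red barn by the bay"

# split: string -> list of tokens, using a space as the delimiter
tokens = sentence.split(" ")

# Word tokens (every word counts) versus word types (distinct words only)
token_count = len(tokens)
type_count = len(set(tokens))

# Average number of uses per distinct word; float() keeps the
# division exact under Python 2 as well as Python 3.
uses_per_type = float(token_count) / type_count

# Unique words starting with 'b', sorted alphabetically
b_words = sorted([w for w in set(tokens) if w.startswith("b")])

# Position of the first occurrence of a word (raises ValueError if absent)
first_red = tokens.index("red")

# join: list of tokens -> string, using a space as the delimiter
rejoined = " ".join(tokens)

print(token_count, type_count, uses_per_type)
print(b_words)
print(first_red)
```

Note that index() only reports the first match, so finding every position of a word would need a loop or a list comprehension over enumerate(tokens).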
Latest revision as of 18:49, 18 November 2010