2009 206

From XPUB & Lens-Based wiki

Toward a navigable text

Acquiring

Today we are working with the text of 10 poems by Edgar Allan Poe, from Project Gutenberg.

Processing

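import sys, re

# wc maps each lowercased word to the number of times it occurs
wc = {}

for line in sys.stdin:
    line = line.rstrip()
    # split on runs of non-letters; "+" (not "*") so the pattern never matches an empty string
    words = re.split("[^a-zA-Z]+", line)
    for word in words:
        word = word.lower()
        if word:
            wc[word] = wc.get(word, 0) + 1

# print the words in alphabetical order, each followed by its count
for word in sorted(wc):
    print(word, wc[word])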

Now we make a function that takes a file and turns it into a "word count dictionary". Then we can use this function on different poems.

Visualising

Interacting