User:Joca/python-experiments: Difference between revisions
Revision as of 09:46, 28 March 2018
Word Tagger V1
This script reads an input text, tokenizes it into words and runs a Part-of-Speech tagger on them. The tags are translated into human-readable equivalents, which are saved in a list. The script then joins the list items into a single string and prints it in the terminal.
import nltk

# Define the input file and read its contents
# (the NLTK tokenizer and tagger data must be installed with nltk.download() first)
input_path = 'input/kittler.txt'
with open(input_path, 'r') as txtfile:
    string = txtfile.read()

# Tokenize the text and tag every word with its Part-of-Speech
words = nltk.word_tokenize(string)
taggedwordlist = nltk.pos_tag(words)
for word, pos in taggedwordlist:
    print('{0} is a {1}'.format(word, pos))

# Keep only the tags
taglist = [pos for word, pos in taggedwordlist]
#print(taglist)
# Translate each Penn Treebank tag into a human-readable equivalent
readabletaglist = []
for tag in taglist:
    if tag in {'NNP', 'NNS', 'NN', 'NNPS'}:
        readabletag = 'noun'
    elif tag in {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}:
        readabletag = 'verb'
    elif tag in {'RB', 'RBR', 'RBS', 'WRB'}:
        readabletag = 'adverb'
    elif tag in {'PRP', 'PRP$'}:
        readabletag = 'pronoun'
    elif tag in {'JJ', 'JJR', 'JJS'}:
        readabletag = 'adjective'
    elif tag == 'IN':
        readabletag = 'preposition'
    elif tag == 'WDT':
        readabletag = 'determiner'
    elif tag in {'WP', 'WP$'}:
        readabletag = 'pronoun'
    elif tag == 'UH':
        readabletag = 'interjection'
    elif tag == 'POS':
        readabletag = 'possessive ending'
    elif tag == 'SYM':
        readabletag = 'symbol'
    elif tag == 'EX':
        readabletag = 'existential there'
    elif tag == 'DT':
        readabletag = 'determiner'
    elif tag == 'MD':
        readabletag = 'modal'
    elif tag == 'LS':
        readabletag = 'list item marker'
    elif tag == 'FW':
        readabletag = 'foreign word'
    elif tag == 'CC':
        readabletag = 'coordinating conjunction'
    elif tag == 'CD':
        readabletag = 'cardinal number'
    elif tag == 'TO':
        readabletag = 'to'
    elif tag == '.':
        readabletag = 'line ending'
    elif tag == ',':
        readabletag = 'comma'
    else:
        # Unknown tags are passed through unchanged
        readabletag = tag
    readabletaglist.append(readabletag)

# Join the readable tags into one string and print it
print(' '.join(readabletaglist))
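The if/elif chain above can also be written as a dictionary lookup, which keeps the tag table in one place. This is a minimal sketch rather than the script's actual implementation: READABLE_TAGS and readable_tag are hypothetical names, the table covers only part of the tag set, and the tagged words are hard-coded here instead of coming from nltk.pos_tag.

```python
# Hypothetical lookup table: Penn Treebank tag -> human-readable label.
# Only a subset of the tags handled by the script is listed here.
READABLE_TAGS = {
    'NN': 'noun', 'NNS': 'noun', 'NNP': 'noun', 'NNPS': 'noun',
    'VB': 'verb', 'VBD': 'verb', 'VBG': 'verb',
    'VBN': 'verb', 'VBP': 'verb', 'VBZ': 'verb',
    'PRP': 'pronoun', 'PRP$': 'pronoun',
    'JJ': 'adjective', 'JJR': 'adjective', 'JJS': 'adjective',
    'DT': 'determiner',
    '.': 'line ending',
}


def readable_tag(tag):
    """Return the readable label for a tag, or the tag itself if it is unknown."""
    return READABLE_TAGS.get(tag, tag)


# Hard-coded example input in the (word, tag) format that nltk.pos_tag returns
tagged = [('Media', 'NNS'), ('determine', 'VBP'), ('our', 'PRP$'),
          ('situation', 'NN'), ('.', '.')]
readable = [readable_tag(pos) for word, pos in tagged]
print(' '.join(readable))
```

Unknown tags fall through unchanged, which matches the else branch of the original chain.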
Wordtagger V2
Building on V1, Wordtagger V2 tags the text for Part-of-Speech, stopwords and sentiment. The tags are saved in a Python dictionary, from which an HTML page is generated using Jinja2. Using JavaScript and data attributes, the content is swapped when the user clicks. I presented this version at the beta launch at Varia. Based on the feedback, I changed the way the words and tags are visualized in the reading interface to improve readability.
Example of the output: https://madebyjoca.com/xpub/wordtagger/index.html
#code
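As a rough illustration of the approach described above, the sketch below stores a Part-of-Speech label, a stopword flag and a sentiment label for each word in a dictionary, and writes them into an HTML span as data attributes. Everything in it is an assumption rather than the actual V2 code: the tiny STOPWORDS and SENTIMENT tables, the function names and the attribute names are hypothetical, the input is hard-coded in the format that nltk.pos_tag returns, and plain string formatting stands in for the Jinja2 template so that the example runs with the standard library only.

```python
from html import escape

# Hypothetical stand-ins for a real stopword list and sentiment lexicon
STOPWORDS = {'the', 'is', 'a', 'our'}
SENTIMENT = {'good': 'positive', 'bad': 'negative'}


def tag_word(word, pos):
    """Collect all tags for one word in a dictionary."""
    lower = word.lower()
    return {
        'word': word,
        'pos': pos,
        'stopword': lower in STOPWORDS,
        'sentiment': SENTIMENT.get(lower, 'neutral'),
    }


def render_word(entry):
    """Render one tagged word as a span that carries its tags as data attributes."""
    template = ('<span class="word" data-pos="{pos}" '
                'data-stopword="{stop}" data-sentiment="{sent}">{word}</span>')
    return template.format(
        pos=escape(entry['pos']),
        stop=str(entry['stopword']).lower(),
        sent=escape(entry['sentiment']),
        word=escape(entry['word']),
    )


# Hard-coded example input in the (word, tag) format that nltk.pos_tag returns
tagged = [('The', 'DT'), ('interface', 'NN'), ('is', 'VBZ'), ('good', 'JJ')]
entries = [tag_word(word, pos) for word, pos in tagged]
html_output = ' '.join(render_word(entry) for entry in entries)
print(html_output)
```

A click handler in JavaScript could then read these attributes, for example through element.dataset, to swap the displayed word for one of its tags.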