User:Joca/python-experiments: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 20:52, 27 March 2018

Word Tagger V1

This script reads an input text, tokenizes it into words and runs a part-of-speech tagger on them. The tags are converted into human-readable equivalents and saved in a list. The script then joins the list items into a single string and prints it in the terminal.

import nltk

# Step 1: define the input file (named inputfile to avoid shadowing the built-in input())
inputfile = 'input/kittler.txt'

# Step 2: read the text, split it into words and tag every word once
with open(inputfile, 'r') as txtfile:
    string = txtfile.read()
words = nltk.word_tokenize(string)
taggedwordlist = nltk.pos_tag(words)

# Step 3: print each word with its raw part-of-speech tag
for word, pos in taggedwordlist:
    print('{0} is a {1}'.format(word, pos))

taglist = [ pos for word,pos in taggedwordlist ]

#print(taglist)

readabletaglist = []

for tag in taglist:
    if tag in {"NNP","NNS","NN","NNPS"}:
        readabletag = 'noun'
    elif tag in {'VB','VBD','VBG','VBN','VBP','VBZ'}:
        readabletag = 'verb'
    elif tag in {'RB','RBR','RBS','WRB'}:
        readabletag = 'adverb'
    elif tag in {'PRP','PRP$'}:
        readabletag = 'pronoun'
    elif tag in {'JJ','JJR','JJS'}:
        readabletag = 'adjective'
    elif tag == 'IN':
        readabletag = 'preposition'
    elif tag == 'WDT':
        readabletag = 'determiner'
    elif tag in {'WP','WP$'}:
        readabletag = 'pronoun'
    elif tag == 'UH':
        readabletag = 'interjection'
    elif tag == 'POS':
        readabletag = 'possessive ending'
    elif tag == 'SYM':
        readabletag = 'symbol'
    elif tag == 'EX':
        readabletag = 'existential there'
    elif tag == 'DT':
        readabletag = 'determiner'
    elif tag == 'MD':
        readabletag = 'modal'
    elif tag == 'LS':
        readabletag = 'list item marker'
    elif tag == 'FW':
        readabletag = 'foreign word'
    elif tag == 'CC':
        readabletag = 'coordinating conjunction'
    elif tag == 'CD':
        readabletag = 'cardinal number'
    elif tag == 'TO':
        readabletag = 'to'
    elif tag == '.':
        readabletag = 'sentence ending'
    elif tag == ',':
        readabletag = 'comma'
    else:
        readabletag = tag

    readabletaglist.append(readabletag)

print(' '.join(readabletaglist))
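
The long if/elif chain above could also be written as a dictionary lookup. The following is only a minimal sketch of that idea, not the script's actual method: the names READABLE_TAGS and readable_tags are hypothetical, the table lists just a subset of the tags, and the sample is a hand-written list in the (word, tag) shape that nltk.pos_tag returns, so the sketch runs without NLTK installed.

```python
# Hypothetical lookup table: part-of-speech tag -> readable label.
# Only a subset of the tags handled in the script above is listed here.
READABLE_TAGS = {
    'NN': 'noun', 'NNS': 'noun', 'NNP': 'noun', 'NNPS': 'noun',
    'VB': 'verb', 'VBD': 'verb', 'VBG': 'verb',
    'VBN': 'verb', 'VBP': 'verb', 'VBZ': 'verb',
    'RB': 'adverb', 'RBR': 'adverb', 'RBS': 'adverb', 'WRB': 'adverb',
    'JJ': 'adjective', 'JJR': 'adjective', 'JJS': 'adjective',
    'PRP': 'pronoun', 'PRP$': 'pronoun', 'WP': 'pronoun', 'WP$': 'pronoun',
    'DT': 'determiner', 'WDT': 'determiner',
    'IN': 'preposition',
}


def readable_tags(tagged_words):
    """Return one readable label per (word, tag) pair.

    Tags that are missing from READABLE_TAGS are passed through unchanged,
    which mirrors the final else branch of the if/elif chain.
    """
    return [READABLE_TAGS.get(tag, tag) for _word, tag in tagged_words]


# Hand-written sample data (an assumption, not output from a real tagger run)
sample = [('The', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ'), ('quietly', 'RB')]
print(' '.join(readable_tags(sample)))
```

With a table like this, adding or renaming a label means changing one dictionary entry instead of adding another elif branch.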