User:Bohye Woo/nltk-Terms of Service: Difference between revisions

From XPUB & Lens-Based wiki
No edit summary
 
(11 intermediate revisions by the same user not shown)
Line 9: Line 9:
cd to the folder where "venv" is and...
cd to the folder where "venv" is and...
<source lang="python">
<source lang="python">
     source venb/bin/activate
     source venv/bin/activate
</source>
</source>


===NLTK===
==NLTK==
===NLTK beginner===
I started using FaceApp's ToS for NLTK experiments.
I started using FaceApp's ToS for NLTK experiments.
Original Terms of Service from FaceApp:
Original Terms of Service from FaceApp:
Line 70: Line 71:


<source lang="python">
<source lang="python">
//////resrult///////
//////result///////
Displaying 11 of 11 matches:
Displaying 11 of 11 matches:
t about FaceApp or our products or Services ( collectively , “ Feedback ” ) ,  
t about FaceApp or our products or Services ( collectively , “ Feedback ” ) ,  
Line 98: Line 99:


<source lang="python">
<source lang="python">
//////resrult///////
//////result///////
rights operation
rights operation
</source>
</source>
Line 146: Line 147:
</source>
</source>
<source lang="python">
<source lang="python">
//////resrult///////
//////result///////
content;16
content;16
user;14
user;14
Line 199: Line 200:
</source>
</source>


===NLTK POS-tagger===
*List
CC coordinating conjunction <br>
CD cardinal digit<br>
DT determiner<br>
EX existential there (like: “there is” … think of it like “there exists”)<br>
FW foreign word<br>
IN preposition/subordinating conjunction<br>
JJ adjective ‘big’<br>
JJR adjective, comparative ‘bigger’<br>
JJS adjective, superlative ‘biggest’<br>
LS list marker 1)<br>
MD modal could, will<br>
NN noun, singular ‘desk’<br>
NNS noun plural ‘desks’<br>
NNP proper noun, singular ‘Harrison’<br>
NNPS proper noun, plural ‘Americans’<br>
PDT predeterminer ‘all the kids’<br>
POS possessive ending parent’s<br>
PRP personal pronoun I, he, she<br>
PRP$ possessive pronoun my, his, hers<br>
RB adverb very, silently,<br>
RBR adverb, comparative better<br>
RBS adverb, superlative best<br>
RP particle give up<br>
TO, to go ‘to’ the store.<br>
UH interjection, errrrrrrrm <br>
VB verb, base form take<br>
VBD verb, past tense, took<br>
VBG verb, gerund/present participle taking<br>
VBN verb, past participle is taken<br>
VBP verb, sing. present, known-3d take<br>
VBZ verb, 3rd person sing. present takes<br>
WDT wh-determiner which<br>
WP wh-pronoun who, what<br>
WP$ possessive wh-pronoun whose<br>
WRB wh-adverb where, when<br>
*Using a Tagger: Pos_tagger
<source lang="python">
>>> from nltk import sent_tokenize, word_tokenize, pos_tag
>>> text = "You hereby grant to FaceApp a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive, and fully sublicensable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all Feedback, and to sublicense the foregoing rights, in connection with the operation and maintenance of the Services and/or FaceApp’s business."
>>> sent_tokenize(text)
['You hereby grant to FaceApp a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive, and fully sublicensable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all Feedback, and to sublicense the foregoing rights, in connection with the operation and maintenance of the Services and/or FaceApp’s business.']
>>> [word_tokenize(sent) for sent in sent_tokenize(text)]
[['You', 'hereby', 'grant', 'to', 'FaceApp', 'a', 'fully', 'paid', ',', 'royalty-free', ',', 'perpetual', ',', 'irrevocable', ',', 'worldwide', ',', 'non-exclusive', ',', 'and', 'fully', 'sublicensable', 'right', 'and', 'license', 'to', 'use', ',', 'reproduce', ',', 'perform', ',', 'display', ',', 'distribute', ',', 'adapt', ',', 'modify', ',', 're-format', ',', 'create', 'derivative', 'works', 'of', ',', 'and', 'otherwise', 'commercially', 'or', 'non-commercially', 'exploit', 'in', 'any', 'manner', ',', 'any', 'and', 'all', 'Feedback', ',', 'and', 'to', 'sublicense', 'the', 'foregoing', 'rights', ',', 'in', 'connection', 'with', 'the', 'operation', 'and', 'maintenance', 'of', 'the', 'Services', 'and/or', 'FaceApp', '’', 's', 'business', '.']]
>>> [pos_tag(word_tokenize(sent))for sent in sent_tokenize(text)]
[[('You', 'PRP'), ('hereby', 'VBP'), ('grant', 'JJ'), ('to', 'TO'), ('FaceApp', 'VB'), ('a', 'DT'), ('fully', 'RB'), ('paid', 'VBN'), (',', ','), ('royalty-free', 'JJ'), (',', ','), ('perpetual', 'JJ'), (',', ','), ('irrevocable', 'JJ'), (',', ','), ('worldwide', 'JJ'), (',', ','), ('non-exclusive', 'JJ'), (',', ','), ('and', 'CC'), ('fully', 'RB'), ('sublicensable', 'JJ'), ('right', 'NN'), ('and', 'CC'), ('license', 'NN'), ('to', 'TO'), ('use', 'VB'), (',', ','), ('reproduce', 'VB'), (',', ','), ('perform', 'VB'), (',', ','), ('display', 'NN'), (',', ','), ('distribute', 'NN'), (',', ','), ('adapt', 'NN'), (',', ','), ('modify', 'VB'), (',', ','), ('re-format', 'JJ'), (',', ','), ('create', 'JJ'), ('derivative', 'JJ'), ('works', 'NNS'), ('of', 'IN'), (',', ','), ('and', 'CC'), ('otherwise', 'RB'), ('commercially', 'RB'), ('or', 'CC'), ('non-commercially', 'RB'), ('exploit', 'NNS'), ('in', 'IN'), ('any', 'DT'), ('manner', 'NN'), (',', ','), ('any', 'DT'), ('and', 'CC'), ('all', 'DT'), ('Feedback', 'NNP'), (',', ','), ('and', 'CC'), ('to', 'TO'), ('sublicense', 'VB'), ('the', 'DT'), ('foregoing', 'NN'), ('rights', 'NNS'), (',', ','), ('in', 'IN'), ('connection', 'NN'), ('with', 'IN'), ('the', 'DT'), ('operation', 'NN'), ('and', 'CC'), ('maintenance', 'NN'), ('of', 'IN'), ('the', 'DT'), ('Services', 'NNPS'), ('and/or', 'NN'), ('FaceApp', 'NNP'), ('’', 'NNP'), ('s', 'NN'), ('business', 'NN'), ('.', '.')]]
</source>
<source lang="python">
</source>
<source lang="python">
</source>




<source lang="python">
<source lang="python">
</source>
</source>

Latest revision as of 15:08, 30 March 2020

Virtual Environment

To create a virtual environment

cd to the place you want to make it and...

    python3 -m venv venv

To activate a virtual environment

cd to the folder where "venv" is and...

    source venv/bin/activate

NLTK

NLTK beginner

I started using FaceApp's ToS for NLTK experiments. Original Terms of Service from FaceApp:

Any questions, comments, suggestions, ideas, original or creative materials or other information you submit about FaceApp or our products or Services (collectively, “Feedback”), is non-confidential and we have no obligations (including without limitation obligations of confidentiality) with respect to such Feedback. You hereby grant to FaceApp a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive, and fully sublicensable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all Feedback, and to sublicense the foregoing rights, in connection with the operation and maintenance of the Services and/or FaceApp’s business.

- If you choose to login to the Services via a third-party platform or social media network, you will need to use your credentials (e.g., username and password) from a third-party online platform. You must maintain the security of your third party account and promptly notify us if you discover or suspect that someone has accessed your account without your permission. If you permit others to use your account credentials, you are responsible for the activities of such users that occur in connection with your account.

- 4. User Content
Our Services may allow you and other users to create, post, store and share content, including photos, videos, messages, text, software and other materials (collectively, “User Content”). User Content does not include user-generated filters. Subject to this Agreement and the Privacy Policy, you retain all rights in and to your User Content, as between you and FaceApp. Further, FaceApp does not claim ownership of any User Content that you post on or through the Services. You grant FaceApp a nonexclusive, royalty-free, worldwide, fully paid license to use, reproduce, modify, adapt, create derivative works from, distribute, perform and display your User Content during the term of this Agreement solely to provide you with the Services.

You acknowledge that some of the Services are supported by advertising revenue and may display advertisements and promotions, and you hereby agree that FaceApp may place such advertising and promotions on the Services or on, about, or in conjunction with your User Content. The manner, mode and extent of such advertising and promotions are subject to change without specific notice to you. You acknowledge that we may not always identify paid services, sponsored content, or commercial communications as such.

You represent and warrant that: (i) you own or otherwise have the right to use the User Content modified by you on or through the Services in accordance with the rights and licenses set forth in this Agreement; (ii) you agree to pay for all royalties, fees, and any other monies owed by reason of User Content you stylize on or through the Services; and (iii) you have the legal right and capacity to enter into this Agreement in your jurisdiction.

You may not create, post, store or share any User Content that violates this Agreement or for which you do not have all the rights necessary to grant us the license described above. Although we have no obligation to screen, edit or monitor User Content, we may delete or remove User Content at any time and for any reason.

FaceApp is not a backup service and you agree that you will not rely on the Services for the purposes of User Content backup or storage. FaceApp will not be liable to you for any modification, suspension, or discontinuation of the Services, or the loss of any User Content.


Tokenize

>>> import nltk
>>> text = "If you choose to login to the Services via a third-party platform or social media network, you will need to use your credentials."
>>> token = nltk.word_tokenize(text)
>>> token
['If', 'you', 'choose', 'to', 'login', 'to', 'the', 'Services', 'via', 'a', 'third-party', 'platform', 'or', 'social', 'media', 'network', ',', 'you', 'will', 'need', 'to', 'use', 'your', 'credentials', '.']

sort

>>> token.sort()
>>> token
[',', '.', 'If', 'Services', 'a', 'choose', 'credentials', 'login', 'media', 'need', 'network', 'or', 'platform', 'social', 'the', 'third-party', 'to', 'to', 'to', 'use', 'via', 'will', 'you', 'you', 'your']

collections

>>> import collections
>>> collections.Counter(token)
Counter({'to': 3, 'you': 2, ',': 1, '.': 1, 'If': 1, 'Services': 1, 'a': 1, 'choose': 1, 'credentials': 1, 'login': 1, 'media': 1, 'need': 1, 'network': 1, 'or': 1, 'platform': 1, 'social': 1, 'the': 1, 'third-party': 1, 'use': 1, 'via': 1, 'will': 1, 'your': 1})

Concordance

nltk.download("stopwords")

file=open('faceapp.txt','r')
raw=file.read()
tokens = nltk.word_tokenize(raw)
faceapp = nltk.Text(tokens)

faceapp.concordance('services')
//////result///////
Displaying 11 of 11 matches:
t about FaceApp or our products or Services ( collectively ,  Feedback  ) , 
e operation and maintenance of the Services and/or FaceApp  s business . - If
 . - If you choose to login to the Services via a third-party platform or soci
connection with your account . Our Services may allow you and other users to c
nt that you post on or through the Services . You grant FaceApp a nonexclusive
ent solely to provide you with the Services . You acknowledge that some of the
. You acknowledge that some of the Services are supported by advertising reven
 advertising and promotions on the Services or on , about , or in conjunction 
at we may not always identify paid services , sponsored content , or commercia
 modified by you on or through the Services in accordance with the rights and 
tent you stylize on or through the Services ; and ( iii ) you have the legal r

Similar

nltk.download("stopwords")

file=open('faceapp.txt','r')
raw=file.read()
tokens = nltk.word_tokenize(raw)
faceapp = nltk.Text(tokens)

faceapp.concordance('services')
//////result///////
rights operation


Output top 50 words

import sys
import codecs
import nltk
from nltk.corpus import stopwords

# NLTK's default English stopwords
default_stopwords = set(nltk.corpus.stopwords.words('english'))

#read stop words from a file (one stopword per line, UTF-8)
stopwords_file = './stopwords.txt'
custom_stopwords = set(codecs.open('stopwords.txt', 'r', 'utf-8').read().splitlines())

all_stopwords = default_stopwords | custom_stopwords

file = open('faceapp.txt','r')
raw = file.read()
tokens = nltk.word_tokenize(raw)
faceapp = nltk.Text(tokens)

# Remove single-character tokens (mostly punctuation)
tokens = [word for word in tokens if len(word) > 1]

# Remove numbers
tokens = [word for word in tokens if not word.isnumeric()]

# Lowercase all words (default_stopwords are lowercase too)
tokens = [word.lower() for word in tokens]

# Remove stopwords
tokens = [word for word in tokens if word not in all_stopwords]

# Calculate frequency distribution
fdist = nltk.FreqDist(tokens)

# Output top 50 words
for word, frequency in fdist.most_common(10):
    print(u'{};{}'.format(word, frequency))
//////result///////
content;16
user;14
services;13
faceapp;9
may;6
use;5
agreement;5
create;4
rights;4
account;4
feedback;3
without;3
grant;3
fully;3
paid;3
right;3
license;3
display;3
post;3
advertising;3
promotions;3
agree;3
materials;2
collectively;2
obligations;2
including;2
hereby;2
royalty-free;2
worldwide;2
reproduce;2
perform;2
distribute;2
adapt;2
modify;2
derivative;2
works;2
otherwise;2
manner;2
connection;2
third-party;2
platform;2
credentials;2
us;2
users;2
store;2
share;2
subject;2
acknowledge;2
reason;2
backup;2


NLTK POS-tagger

  • List

CC coordinating conjunction
CD cardinal digit
DT determiner
EX existential there (like: “there is” … think of it like “there exists”)
FW foreign word
IN preposition/subordinating conjunction
JJ adjective ‘big’
JJR adjective, comparative ‘bigger’
JJS adjective, superlative ‘biggest’
LS list marker 1)
MD modal could, will
NN noun, singular ‘desk’
NNS noun plural ‘desks’
NNP proper noun, singular ‘Harrison’
NNPS proper noun, plural ‘Americans’
PDT predeterminer ‘all the kids’
POS possessive ending parent’s
PRP personal pronoun I, he, she
PRP$ possessive pronoun my, his, hers
RB adverb very, silently,
RBR adverb, comparative better
RBS adverb, superlative best
RP particle give up
TO, to go ‘to’ the store.
UH interjection, errrrrrrrm
VB verb, base form take
VBD verb, past tense, took
VBG verb, gerund/present participle taking
VBN verb, past participle is taken
VBP verb, sing. present, known-3d take
VBZ verb, 3rd person sing. present takes
WDT wh-determiner which
WP wh-pronoun who, what
WP$ possessive wh-pronoun whose
WRB wh-adverb where, when

  • Using a Tagger: Pos_tagger
>>> from nltk import sent_tokenize, word_tokenize, pos_tag

>>> text = "You hereby grant to FaceApp a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive, and fully sublicensable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all Feedback, and to sublicense the foregoing rights, in connection with the operation and maintenance of the Services and/or FaceApp’s business."

>>> sent_tokenize(text)
['You hereby grant to FaceApp a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive, and fully sublicensable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all Feedback, and to sublicense the foregoing rights, in connection with the operation and maintenance of the Services and/or FaceApp’s business.']

>>> [word_tokenize(sent) for sent in sent_tokenize(text)]
[['You', 'hereby', 'grant', 'to', 'FaceApp', 'a', 'fully', 'paid', ',', 'royalty-free', ',', 'perpetual', ',', 'irrevocable', ',', 'worldwide', ',', 'non-exclusive', ',', 'and', 'fully', 'sublicensable', 'right', 'and', 'license', 'to', 'use', ',', 'reproduce', ',', 'perform', ',', 'display', ',', 'distribute', ',', 'adapt', ',', 'modify', ',', 're-format', ',', 'create', 'derivative', 'works', 'of', ',', 'and', 'otherwise', 'commercially', 'or', 'non-commercially', 'exploit', 'in', 'any', 'manner', ',', 'any', 'and', 'all', 'Feedback', ',', 'and', 'to', 'sublicense', 'the', 'foregoing', 'rights', ',', 'in', 'connection', 'with', 'the', 'operation', 'and', 'maintenance', 'of', 'the', 'Services', 'and/or', 'FaceApp', '’', 's', 'business', '.']]

>>> [pos_tag(word_tokenize(sent))for sent in sent_tokenize(text)]
[[('You', 'PRP'), ('hereby', 'VBP'), ('grant', 'JJ'), ('to', 'TO'), ('FaceApp', 'VB'), ('a', 'DT'), ('fully', 'RB'), ('paid', 'VBN'), (',', ','), ('royalty-free', 'JJ'), (',', ','), ('perpetual', 'JJ'), (',', ','), ('irrevocable', 'JJ'), (',', ','), ('worldwide', 'JJ'), (',', ','), ('non-exclusive', 'JJ'), (',', ','), ('and', 'CC'), ('fully', 'RB'), ('sublicensable', 'JJ'), ('right', 'NN'), ('and', 'CC'), ('license', 'NN'), ('to', 'TO'), ('use', 'VB'), (',', ','), ('reproduce', 'VB'), (',', ','), ('perform', 'VB'), (',', ','), ('display', 'NN'), (',', ','), ('distribute', 'NN'), (',', ','), ('adapt', 'NN'), (',', ','), ('modify', 'VB'), (',', ','), ('re-format', 'JJ'), (',', ','), ('create', 'JJ'), ('derivative', 'JJ'), ('works', 'NNS'), ('of', 'IN'), (',', ','), ('and', 'CC'), ('otherwise', 'RB'), ('commercially', 'RB'), ('or', 'CC'), ('non-commercially', 'RB'), ('exploit', 'NNS'), ('in', 'IN'), ('any', 'DT'), ('manner', 'NN'), (',', ','), ('any', 'DT'), ('and', 'CC'), ('all', 'DT'), ('Feedback', 'NNP'), (',', ','), ('and', 'CC'), ('to', 'TO'), ('sublicense', 'VB'), ('the', 'DT'), ('foregoing', 'NN'), ('rights', 'NNS'), (',', ','), ('in', 'IN'), ('connection', 'NN'), ('with', 'IN'), ('the', 'DT'), ('operation', 'NN'), ('and', 'CC'), ('maintenance', 'NN'), ('of', 'IN'), ('the', 'DT'), ('Services', 'NNPS'), ('and/or', 'NN'), ('FaceApp', 'NNP'), ('’', 'NNP'), ('s', 'NN'), ('business', 'NN'), ('.', '.')]]