User:Pleun/grad/altrightlexicon-practice: Difference between revisions

Latest revision as of 16:51, 5 December 2017

The implementation and partial creation of a tool that gathers jargon from one or more main right-wing forums using Pattern (a Natural Language Processing library for Python), because I believe this jargon captures the core of their ideology. I will focus first on Reddit's subforum The Red Pill, and then on 4chan's subforum /pol/. I believe these are two of the most influential subforums in the Manosphere and the Alt-Right media bubble.

I plan to scrape the forum text and then filter the content so that I am left with non-dictionary words. Michael told me I could use a text-minus-text method, where you, for instance, subtract all the words used in a New York Times article from the words used in a forum thread. I also want to look at which words (nouns and adjectives in particular) are used most, not only to get insight into popularity but also to filter out spelling mistakes. Next, I can look at sentiment: is a text negative or positive, and by which standards? The difficulty of a text could be tested with the Flesch/Kincaid Readability Test, which takes word length, syllable count and sentence length into account. This test was used in a few news articles to grade the vocabulary of the candidates in the 2016 US election. As expected, Drumpf scored lowest with a 4, while Bernie scored highest with an 8.
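The text-minus-text idea and the frequency filter could be sketched roughly as follows. This is only a minimal sketch using Python's standard library rather than Pattern; the function names, the tokenizer and the `min_count` threshold are assumptions made for illustration, and the sample words in the usage note are invented placeholders.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def text_minus_text(forum_text, reference_text):
    """Count the forum words that never occur in the reference text."""
    reference_vocabulary = set(tokenize(reference_text))
    return Counter(
        word for word in tokenize(forum_text)
        if word not in reference_vocabulary
    )


def likely_jargon(forum_text, reference_text, min_count=2):
    """Keep leftover words used at least min_count times.

    One-off leftovers are more likely to be spelling mistakes than jargon,
    so they are dropped (min_count is an arbitrary, hypothetical threshold).
    """
    counts = text_minus_text(forum_text, reference_text)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

Called with a forum thread as `forum_text` and a news article as `reference_text`, `likely_jargon` returns the remaining words sorted by how often they occur. A single article is a small reference vocabulary, so in practice a larger reference corpus would probably be needed.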

Urban Dictionary could provide meaning to the jargon that's found.

Natural Language Processing

– https://pypi.python.org/pypi/redditnlp/0.1.3
– Python Pattern Library: Sentiment

Bag-of-words model

Urban Dictionary

http://api.urbandictionary.com/v0/define?term=word
https://market.mashape.com/community/urban-dictionary
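Looking up a found term through the first endpoint above might look something like this. It is a hedged sketch: the helper names are hypothetical, and the assumed response shape (a JSON object with a `list` of entries that each carry a `definition` field) is an assumption that should be checked against a real response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://api.urbandictionary.com/v0/define"


def build_url(term):
    """Build the lookup URL, escaping spaces and special characters."""
    return API_URL + "?" + urlencode({"term": term})


def extract_definitions(payload):
    """Pull definition texts out of a decoded response.

    Assumes the payload looks like {"list": [{"definition": "..."}, ...]}.
    """
    return [
        entry["definition"]
        for entry in payload.get("list", [])
        if "definition" in entry
    ]


def define(term, timeout=10):
    """Fetch and return the definitions for one term (needs network access)."""
    with urlopen(build_url(term), timeout=timeout) as response:
        return extract_definitions(json.load(response))
```

Keeping the URL building and the parsing separate from the network call makes it possible to try the parsing on a saved response first.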

Flesch/Kincaid Readability Test

https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
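A rough sketch of the Flesch-Kincaid grade level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, is given below. The syllable counter is a crude vowel-group heuristic and the sentence splitter only looks at end punctuation; both are simplifying assumptions, so the scores will differ somewhat from those of dedicated tools.

```python
import re


def count_syllables(word):
    """Approximate syllables as groups of consecutive vowels (at least one)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Return the Flesch-Kincaid grade level of a text."""
    # Split on runs of end punctuation and ignore empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Short sentences made of one-syllable words give a low (even negative) grade, while long sentences with many-syllable words push the grade up.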

The Scunthorpe Problem

https://en.wikipedia.org/wiki/Scunthorpe_problem
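The Scunthorpe problem matters here because a jargon term can show up as a substring of an unrelated word. The sketch below contrasts naive substring matching with whole-word matching; the function names and the example term are assumptions chosen for illustration.

```python
import re


def naive_match(text, term):
    """Substring search: also fires when the term sits inside another word."""
    return term.lower() in text.lower()


def whole_word_match(text, term):
    """Match the term only when it is bounded by word boundaries."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

For example, searching for "pill" with `naive_match` also flags "caterpillar" and "pillow", while `whole_word_match` only flags the word itself.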

@@ Line 1: / Line 1: @@
 The implementation and partial creation of a tool that gathers jargon from one or more main right-wing forums using Pattern (a Natural Language Processing library for Python), because I believe this jargon captures the core of their ideology. I will focus first on Reddit's subforum The Red Pill, and then on 4chan's subforum /pol/. I believe these are two of the most influential subforums in the Manosphere and the Alt-Right media bubble.
-I plan to scrape the forum text and then filter the content so that I am left with non-dictionary words. Michael told me I could use a text-minus-text method, where you, for instance, subtract all the words used in a New York Times article from the words used in a forum thread. I also want to look at which words (nouns and adjectives in particular) are used most, not only to get insight into popularity but also to filter out spelling mistakes. Next, I can look at sentiment: is a text negative or positive, and by which standards? The difficulty of a text could be tested with the Flesch/Kincaid Readability Test, which takes word length, syllable count and sentence length into account. This test was used in a few news articles to grade the vocabulary of the candidates in the 2016 US election. As expected, Trump scored lowest with a 4, while Bernie scored highest with an 8.
+I plan to scrape the forum text and then filter the content so that I am left with non-dictionary words. Michael told me I could use a text-minus-text method, where you, for instance, subtract all the words used in a New York Times article from the words used in a forum thread. I also want to look at which words (nouns and adjectives in particular) are used most, not only to get insight into popularity but also to filter out spelling mistakes. Next, I can look at sentiment: is a text negative or positive, and by which standards? The difficulty of a text could be tested with the Flesch/Kincaid Readability Test, which takes word length, syllable count and sentence length into account. This test was used in a few news articles to grade the vocabulary of the candidates in the 2016 US election. As expected, Drumpf scored lowest with a 4, while Bernie scored highest with an 8.
@@ Line 19: / Line 19: @@
 * Flesch/Kincaid Readability Test<br />
 https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
+* The Scunthorpe Problem<br />
+https://en.wikipedia.org/wiki/Scunthorpe_problem