User:Pleun/grad/altrightlexicon-practice
Latest revision as of 15:51, 5 December 2017
This project is the implementation and partial creation of a tool that gathers jargon from one or more major right-wing forums using Pattern (a Natural Language Processing library for Python), because I believe this jargon captures the core of their ideology. I will focus first on Reddit's subforum The Red Pill, and then on 4chan's subforum /pol/. I believe these are two of the most influential subforums in the Manosphere and the Alt-Right media bubble.
I plan to scrape the forum text and then filter the content until I am left with non-dictionary words. Michael told me I could use a text-minus-text method, where you, for instance, subtract all the words used in a New York Times article from the words used in a forum thread. I also want to look at which words, especially nouns and adjectives, are used most, not only to get insight into popularity but also to filter out spelling mistakes. Next, I can look at sentiment: is a text negative or positive, and by which standards? The difficulty of a text could be tested with the Flesch/Kincaid Readability Test, which takes word length, syllables and sentence length into account. A few news articles used this test to grade the vocabulary of the candidates in the 2016 US election. As expected, Drumpf scored lowest with a 4, while Bernie scored highest with an 8.
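The text-minus-text idea and the frequency filter could be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not Pattern; the function names, the `min_count` threshold and the two sample texts are made up for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def text_minus_text(forum_text, reference_text, min_count=2):
    """Count forum words that never appear in the reference text.

    Words seen fewer than min_count times are dropped, as a crude way
    to filter out one-off spelling mistakes (hypothetical threshold).
    """
    reference_vocab = set(tokenize(reference_text))
    counts = Counter(w for w in tokenize(forum_text) if w not in reference_vocab)
    return Counter({w: c for w, c in counts.items() if c >= min_count})

# Invented sample texts, standing in for a forum thread and a news article.
forum = "The hamster keeps spinning. Classic hamster logic, and teh usual excuses."
reference = "The usual excuses keep coming, and the logic is classic."

jargon = text_minus_text(forum, reference)
print(jargon.most_common())  # [('hamster', 2)]
```

Here the misspelling "teh" and the ordinary words "keeps" and "spinning" survive the subtraction but occur only once, so the frequency threshold removes them. A real reference text would need to be much larger than one article to avoid flagging ordinary vocabulary as jargon.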
Urban Dictionary could provide meaning to the jargon that's found.
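A lookup against the Urban Dictionary endpoint listed below might look something like this sketch. The shape of the JSON response (a top-level "list" of entries, each with a "definition" field) is an assumption and should be checked against a real response; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the resource list in these notes.
API_URL = "http://api.urbandictionary.com/v0/define"

def build_url(term):
    """Build the lookup URL for a single term, escaping spaces and symbols."""
    return API_URL + "?" + urlencode({"term": term})

def first_definition(payload):
    """Return the first definition in a decoded response, or None.

    ASSUMPTION: the response looks like {"list": [{"definition": ...}, ...]}.
    """
    entries = payload.get("list", [])
    return entries[0].get("definition") if entries else None

def lookup(term):
    """Fetch and decode the definition of one term (performs a network request)."""
    with urlopen(build_url(term), timeout=10) as response:
        return first_definition(json.load(response))

# Example (not run here because it needs network access):
# print(lookup("red pill"))
```

Keeping the URL building and the response parsing in separate functions makes it possible to test both without calling the API.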
- Natural Language Processing
– https://pypi.python.org/pypi/redditnlp/0.1.3
– Python Pattern Library: Sentiment
- Bag-of-words model
- Urban Dictionary
http://api.urbandictionary.com/v0/define?term=word
https://market.mashape.com/community/urban-dictionary
- Flesch/Kincaid Readability Test
https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
- The Scunthorpe Problem
https://en.wikipedia.org/wiki/Scunthorpe_problem