User:Pleun/grad/altrightlexicon-practice: Difference between revisions

From XPUB & Lens-Based wiki
Latest revision as of 15:51, 5 December 2017

I will implement, and partly build, a tool that gathers jargon from one or more major right-wing forums using Pattern (a Natural Language Processing library in Python), because I believe this jargon captures the core of their ideology. I will focus first on Reddit's subforum The Red Pill, and then on 4chan's subforum /pol/. I believe these are two of the most influential subforums in the Manosphere and the Alt-Right media bubble.

I plan to scrape the forum and then filter the content until I am left with non-dictionary words. Michael suggested a text-minus-text method: subtract, for instance, all the words used in a New York Times article from the words used in a forum thread. I also want to look at which words (nouns and adjectives in particular) are used most, not only to get insight into popularity but also to filter out spelling mistakes, which should occur only rarely. Next, I can look at sentiment: is a text negative or positive, and by which standards? The difficulty of a text could be tested with the Flesch/Kincaid Readability Test, which takes word length, syllable count and sentence length into account. This test was used in a few news articles to grade the vocabulary of the candidates in the 2016 US election. As expected, Trump scored lowest with a 4, while Bernie Sanders scored highest with an 8.
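The text-minus-text idea and the frequency count can be sketched in a few lines of plain Python. This is only a rough illustration, not the final tool: the function names, the `min_count` threshold and the two sample sentences are made up for the example, and a real run would use scraped forum threads and a much larger reference text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def text_minus_text(forum_text, reference_text, min_count=2):
    """Count forum words that never occur in the reference text.

    Words seen fewer than min_count times are dropped, on the assumption
    that one-off tokens are more likely typos than shared jargon.
    """
    reference_vocab = set(tokenize(reference_text))
    counts = Counter(w for w in tokenize(forum_text) if w not in reference_vocab)
    return Counter({w: n for w, n in counts.items() if n >= min_count})


# Made-up sample texts, standing in for a forum thread and a news article.
forum = "The normies took the bluepill. Normies never question the bluepill, lol."
reference = "The committee took the report. They never question the findings."

jargon = text_minus_text(forum, reference)
print(jargon.most_common())
```

Matching on whole tokens rather than substrings also sidesteps the Scunthorpe problem listed in the links, where a harmless word is flagged because it contains another word.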


Urban Dictionary could provide definitions for the jargon that is found.
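A lookup against the Urban Dictionary endpoint listed in the links could look roughly like the sketch below. The endpoint URL is the one given on this page; the shape of the JSON response (a top-level `list` of entries, each with a `definition` field) is an assumption and should be checked against a real response. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as listed in the links on this page.
API_URL = "http://api.urbandictionary.com/v0/define"


def build_url(term):
    """Return the lookup URL for a term, with the term safely URL-encoded."""
    return API_URL + "?" + urlencode({"term": term})


def extract_definitions(payload):
    """Pull definition strings out of a decoded response.

    Assumption: the response looks like {"list": [{"definition": "..."}, ...]}.
    Entries without a "definition" field are skipped.
    """
    return [entry["definition"]
            for entry in payload.get("list", [])
            if "definition" in entry]


def lookup(term, timeout=10):
    """Fetch and return the definitions for one term (needs network access)."""
    with urlopen(build_url(term), timeout=timeout) as response:
        return extract_definitions(json.load(response))
```

Calling `lookup("some jargon word")` for every word that survives the filtering step would give a first, crowd-sourced gloss for each term.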


  • Natural Language Processing

https://pypi.python.org/pypi/redditnlp/0.1.3
– Python Pattern Library: Sentiment

  • Bag-of-words model
  • Urban Dictionary

http://api.urbandictionary.com/v0/define?term=word
https://market.mashape.com/community/urban-dictionary

  • Flesch/Kincaid Readability Test

https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests



  • The Scunthorpe Problem

https://en.wikipedia.org/wiki/Scunthorpe_problem
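
The Flesch/Kincaid grade level mentioned above can be computed with the standard formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below is a minimal version of that: the syllable counter is a crude vowel-group heuristic chosen for this example (an assumption, not what the news articles used), so scores will differ somewhat from those of dedicated tools.

```python
import re


def count_syllables(word):
    """Rough estimate: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Return the Flesch-Kincaid grade level of a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


print(flesch_kincaid_grade("The cat sat. The dog ran."))
```

Very short, simple texts can score below zero; the score only reads as a school grade on longer passages.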