User:Rita Graca/gradproject/prototyping/twittertrends: Difference between revisions

Latest revision as of 22:32, 3 December 2019

Twitter trends

API

Cancel culture happens on Twitter through design features such as hashtags and trending topics.
To investigate this movement better, I realised I needed to know which topics, people, and things were being cancelled, how much engagement each topic received, and which language and strategies were used.

1. Using the Twitter API I could get the current trends in the US.

Steps:

Create a Twitter developer account
Get keys and tokens from Twitter
Install Ruby
Install Twurl
Install jq to read the JSON output
Use the command line

  twurl "/1.1/trends/place.json?id=23424977" | jq
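The command above returns a JSON list containing one object, whose "trends" key holds the individual trends. As a minimal sketch of how that output could be read in Python, the snippet below pulls out only the trend names. The sample payload and the helper name `trend_names` are made up for illustration; a real response carries more fields per trend.

```python
import json

# Hypothetical, shortened sample in the shape returned by trends/place
sample_response = """
[
  {
    "trends": [
      {"name": "#ExampleIsOverParty", "query": "%23ExampleIsOverParty", "tweet_volume": 12345},
      {"name": "Example Topic", "query": "%22Example+Topic%22", "tweet_volume": null}
    ],
    "locations": [{"name": "United States", "woeid": 23424977}]
  }
]
"""


def trend_names(raw_json):
    """Return the trend names from a trends/place JSON response (hypothetical helper)."""
    data = json.loads(raw_json)
    # The response is a list with one object; its "trends" key holds the trends
    return [trend["name"] for trend in data[0]["trends"]]


names = trend_names(sample_response)
print(names)
```

On the command line, the same extraction can be done by piping the twurl output through `jq '.[0].trends[].name'`.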

2. I was only interested in the trends related to cancel culture, so I used Python to develop the script a bit more.

Steps:

Use the Python library Tweepy
Get trends
Look for trends containing words related to cancel culture
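The filtering step can be sketched on its own, without the Twitter API. The function below is an illustration only: the name `find_cancel_trends` and the sample trends are assumptions, and the word list is a shortened version of the one used in the full script further down.

```python
# Words that hint at cancel culture; "cancel" also covers "cancelled" and "canceled"
CANCEL_WORDS = ["cancel", "isoverparty", "booed", "boycott"]


def find_cancel_trends(trend_names, words=CANCEL_WORDS):
    """Return the trends that contain at least one of the words (hypothetical helper)."""
    matches = []
    for name in trend_names:
        lowered = name.lower()  # compare in lowercase so hashtags in any case match
        if any(word in lowered for word in words) and name not in matches:
            matches.append(name)
    return matches


# Made-up trend names for illustration
sample = ["#ExampleIsOverParty", "Example Topic", "Boycott Example", "#exampleiscancelled"]
matches = find_cancel_trends(sample)
print(matches)
```

Using `any()` means a trend is reported once even when it contains several of the words.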

3. It was useful to save the trends. Instead of saving them in a .txt file, it made more sense to post them to a Twitter account.

Steps:

Create a status with the search results (a status is a tweet in the library)
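A rough sketch of this step, kept independent of Tweepy: the text of each status is built first, and the actual posting is passed in as a function. The names `build_status` and `post_all` are hypothetical. In the real script the `post` argument would be something like Tweepy's `api.update_status`; here a plain list stands in for it so the example runs offline.

```python
MAX_TWEET_LENGTH = 280  # Twitter's character limit for a tweet


def build_status(trend, prefix="Who are we fighting today? "):
    """Build the tweet text for one trend, cut to the tweet length limit."""
    return (prefix + trend)[:MAX_TWEET_LENGTH]


def post_all(trends, post, already_posted=()):
    """Post one status per trend through `post`, skipping duplicates."""
    seen = set(already_posted)
    posted = []
    for trend in trends:
        status = build_status(trend)
        if status in seen:
            continue  # Twitter rejects a status identical to a recent one
        post(status)
        seen.add(status)
        posted.append(status)
    return posted


# Stand-in for the Twitter account: statuses are collected in a list
outbox = []
posted = post_all(["#exampleisoverparty", "#exampleisoverparty", "boycott example"], outbox.append)
print(outbox)
```

Skipping duplicates before posting avoids relying on the API error for repeated statuses.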

4. To make it look for trends regularly I created a cron job on my computer.

  46 * * * * /usr/local/bin/python3 /Users/0972516/desktop/ritaiscancelled/trends.py
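The five fields in front of the command are minute, hour, day of month, month, and day of week, so this line runs the script at minute 46 of every hour. The snippet below is only an illustration that splits such a line into labelled fields; the helper name `describe_cron_line` is hypothetical and it handles only plain five-field lines.

```python
CRON_FIELDS = ["minute", "hour", "day of month", "month", "day of week"]


def describe_cron_line(line):
    """Split a five-field crontab line into its schedule and its command."""
    parts = line.split(None, 5)  # five schedule fields, then the rest is the command
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    command = parts[5]
    return schedule, command


schedule, command = describe_cron_line(
    "46 * * * * /usr/local/bin/python3 /Users/0972516/desktop/ritaiscancelled/trends.py"
)
print(schedule)
print(command)
```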

Outcome:
The account @CancelledWho looks for trends related to my topic and posts them. This way I can continuously monitor an important topic of my research.

#!/usr/bin/env python3

import time

import tweepy  # written against Tweepy 3.x (trends_place, TweepError)
import key  # this is a Python file with my API keys and tokens

# OAuth authentication with the keys and tokens
auth = tweepy.OAuthHandler(key.consumer_key, key.consumer_secret)
auth.set_access_token(key.access_token, key.access_token_secret)
api = tweepy.API(auth)


trends1 = api.trends_place(23424977)  # 23424977 is the WOEID of the United States

# keep only the name of each trend, lowercased so it can be matched against cancelwords
trends = {trend['name'].lower() for trend in trends1[0]['trends']}

#print('\n'.join(trends))

cancelwords = ["cancelled", "canceled", "cancel", "isoverparty", "booed", "boycott"]
#print(cancelwords)


for trend in trends:
    # any() posts a trend once, even if it contains several cancelwords
    if any(word in trend for word in cancelwords):
        status = "Who are we fighting today? " + trend
        print(status)
        try:
            api.update_status(status)  # creates a tweet, a status is a tweet
            time.sleep(5)
        except tweepy.TweepError as e:  # raised, for example, when the last status is the same
            print("oops, could not tweet this (probably already tweeted):", e)
[Image: Bot timeline.png, captioned "Outcome"]

Get trends from historical archive

I also wanted access to old trending topics, e.g. trending topics from last year.

Steps:

Have a sandbox subscription from Twitter (note: this allows only 50 requests)
Install the searchtweets Python library (a wrapper for the Twitter premium search APIs)

From what I could see, getting old trends had been possible before, but with the current limitations of the Twitter API I could only search for tweets, not trends.
So, mission failed.

Script for tweets in general:

from searchtweets import gen_rule_payload, load_credentials, collect_results

# read the premium search credentials from a YAML file
premium_search_args = load_credentials("~/twitter_keys.yaml",
                                       yaml_key="search_tweets_premium",
                                       env_overwrite=False)


# testing with a sandbox account, so only 10 results per call
rule = gen_rule_payload("isoverparty", from_date="2019-09-07", to_date="2019-09-09", results_per_call=10)

print(rule)

tweets = collect_results(rule,
                         max_results=10,
                         result_stream_args=premium_search_args)

# print the full text of each collected tweet
for tweet in tweets[0:10]:
    print(tweet.all_text, end='\n\n')

Scrape from existing website

(less accurate, abandoned)

When the API archive approach failed, I started looking for existing projects that scrape trends. There's a website that has been saving daily trends from Twitter.
Using Selenium I could go through the website and scrape the trends related to cancel culture.
I stopped the prototype because I couldn't rely on the website's accuracy.

Script to go through different pages with Selenium:

from selenium import webdriver
import calendar
import os
import time

# written for Selenium 3; geckodriver sits in the same folder as this script
driver = webdriver.Firefox(executable_path=os.path.dirname(os.path.realpath(__file__)) + '/geckodriver')
# Implicit wait tells Selenium how long it should keep looking for an element before it throws an exception
driver.implicitly_wait(5)

for m in range(1, 13):  # for every month of 2019
    days_in_month = calendar.monthrange(2019, m)[1]  # avoids invalid dates such as 2019-02-31

    for d in range(1, days_in_month + 1):  # for every day of that month
        url = "https://us.trend-calendar.com/trend/2019-{0:02d}-{1:02d}.html".format(m, d)
        print(url)
        driver.get(url)  # open the page for that day
        time.sleep(3)

        #driver.find_element_by_xpath("/html/body/div[1]/div/div/main/article/div/section/div[1]/a").click() # click the 'More' button
        #print('opening day......')
        #time.sleep(3)

    print("Month finished")

driver.quit()  # close the browser once all pages have been visited
print("Year finished")

Use archives

Looking for archives:
https://archive.org/details/twitterstream
