User:Rita Graca/gradproject/prototyping/twittertrends: Difference between revisions
Latest revision as of 22:32, 3 December 2019
Twitter trends
API
Cancel culture happens on Twitter through design features such as hashtags and trending topics.
To investigate this movement better, I needed to find out which topics, people and things were being cancelled, how much engagement each one received, and what language and strategies were used.
1. Using the Twitter API I could get the current trends in the US.
Steps:
- Create a Twitter developer account
- Get keys and tokens from Twitter
- Install Ruby
- Install Twurl
- Install JQ to read JSON
- Use the command line
twurl "/1.1/trends/place.json?id=23424977" | jq
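The JSON returned by this endpoint can also be read in Python instead of jq. The sketch below is only illustrative: the sample response is made up (it is not real Twitter data) and only mirrors the general shape of a trends/place reply, and the helper name trend_names is hypothetical.

```python
import json

# Made-up sample in the general shape of a trends/place response; not real data.
SAMPLE_RESPONSE = """
[
  {
    "trends": [
      {"name": "#ExampleIsOverParty", "tweet_volume": 12345},
      {"name": "Example Trend", "tweet_volume": null}
    ],
    "as_of": "2019-12-03T22:00:00Z",
    "locations": [{"name": "United States", "woeid": 23424977}]
  }
]
"""


def trend_names(raw_json):
    """Return the trend names from a trends/place JSON response."""
    payload = json.loads(raw_json)
    # The response is a list with one object per requested location.
    return [trend["name"] for trend in payload[0]["trends"]]


if __name__ == "__main__":
    for name in trend_names(SAMPLE_RESPONSE):
        print(name)
```

With twurl, the output of the command could be saved to a file and passed to the same function.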
2. I was only interested in the trends related to cancel culture, so I used Python to develop the script a bit more.
Steps:
- Use Python library Tweepy
- Get trends
- Look for trends with words related to cancel culture
3. It was useful to save the trends. Instead of saving them in a .txt file, it made more sense to post them to a Twitter account.
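The keyword matching in these steps can be tried without the API. This is a minimal sketch, assuming the trends are already available as a list of strings; the function name matching_trends and the sample trend names are hypothetical, and the keyword list is the one used in the script on this page.

```python
CANCEL_WORDS = ["cancelled", "canceled", "cancel", "isoverparty", "booed", "boycott"]


def matching_trends(trend_names, words=CANCEL_WORDS):
    """Return lowercased trend names that contain at least one keyword.

    Order is kept and duplicates (after lowercasing) are dropped.
    """
    matches = []
    for name in trend_names:
        lowered = name.lower()
        if any(word in lowered for word in words) and lowered not in matches:
            matches.append(lowered)
    return matches


if __name__ == "__main__":
    sample = ["#ExampleIsOverParty", "Monday Motivation", "Boycott Example"]
    print(matching_trends(sample))
```

Because this is a plain substring match, a keyword such as "cancel" also matches unrelated trends (for example a cancelled flight), so the results still need a manual look.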
Steps:
- Create a status with the search results (in Tweepy, a status is a tweet)
4. To make it look for trends regularly I created a cron job on my computer.
46 * * * * /usr/local/bin/python3 /Users/0972516/desktop/ritaiscancelled/trends.py
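The five fields of the cron entry are minute, hour, day of month, month and day of week, so "46 * * * *" runs the script at minute 46 of every hour. As a small illustration of that schedule (not part of the bot itself), the hypothetical helper next_run below computes the next time such an entry would fire.

```python
from datetime import datetime, timedelta


def next_run(now, minute=46):
    """Return the next time a 'MINUTE * * * *' cron entry fires strictly after `now`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed, so the next one is an hour later.
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    print(next_run(datetime.now()))
```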
Outcome:
The account @CancelledWho looks for trends related to my topic and posts them. This way I can continuously monitor an important topic of my research.
#!/usr/bin/python
import time

import tweepy

import key  # this is a Python file with my API keys and tokens

# OAuth authentication using the keys and tokens
auth = tweepy.OAuthHandler(key.consumer_key, key.consumer_secret)
auth.set_access_token(key.access_token, key.access_token_secret)
api = tweepy.API(auth)

# Tweepy 3.x names; Tweepy 4 renamed trends_place to get_place_trends and TweepError to TweepyException
trends1 = api.trends_place(23424977)  # 23424977 is the WOEID of the United States
trends = set(trend['name'] for trend in trends1[0]['trends'])  # just getting the name, not the other fields
trendsLower = [item.lower() for item in trends]  # makes everything lowercase, important to match with cancelwords
#print('\n'.join(trendsLower))

cancelwords = ["cancelled", "canceled", "cancel", "isoverparty", "booed", "boycott"]
#print(cancelwords)

for line in trendsLower:
    #print(line)
    for word in cancelwords:
        if word in line:
            try:
                status = "Who are we fighting today? " + line
                print(status)
                api.update_status(status)  # creates a tweet, a status is a tweet
                time.sleep(5)
            except tweepy.TweepError as e:  # this error occurs when the last status is the same
                print("ups, you already tweeted this")
            break  # stop after the first matching word so a trend is posted only once
# time.sleep(3600) # so the script will wait 1h to run again if it catches an error
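The script relies on Twitter rejecting a status that repeats the previous one. A possible alternative, shown here only as a sketch, is to remember locally which trends were already posted and skip them before calling the API. The file name posted_trends.txt and the helper names load_posted and filter_new are hypothetical.

```python
import os


def load_posted(path):
    """Return the set of trends already recorded in the file at `path`."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_new(trends, path):
    """Return the trends not yet recorded in `path` and append them to the file."""
    posted = load_posted(path)
    fresh = []
    for trend in trends:
        if trend not in posted:
            fresh.append(trend)
            posted.add(trend)
    with open(path, "a", encoding="utf-8") as handle:
        for trend in fresh:
            handle.write(trend + "\n")
    return fresh


if __name__ == "__main__":
    # Only the trends that were not seen in an earlier run are printed.
    print(filter_new(["#exampleisoverparty"], "posted_trends.txt"))
```

Only the trends returned by filter_new would then be passed on to the posting step.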
Get trends from historical archive
I also wanted access to old trending topics, e.g. trending topics from last year.
Steps:
- Have a sandbox subscription from Twitter (attention: this only allows 50 requests)
- Install the searchtweets Python library (wrapper for the Twitter premium search APIs)
It seems this was possible before, but with the current limitations of the Twitter API I could only search for tweets, not trends.
So, mission failed.
Script for tweets in general:
from searchtweets import gen_rule_payload, load_credentials, collect_results

# read the premium search credentials from a YAML file
premium_search_args = load_credentials("~/twitter_keys.yaml",
                                       yaml_key="search_tweets_premium",
                                       env_overwrite=False)

rule = gen_rule_payload("isoverparty", from_date="2019-09-07", to_date="2019-09-09", results_per_call=10)  # testing with a sandbox account
print(rule)

tweets = collect_results(rule,
                         max_results=10,
                         result_stream_args=premium_search_args)

# print the full text of each collected tweet
for tweet in tweets[0:10]:
    print(tweet.all_text, end='\n\n')
Scrape from existing website
(less accurate, abandoned)
When the archive search through the API failed, I started looking for existing projects that scrape trends. There's a website that has been saving daily trends from Twitter.
Using Selenium I could go through the website and scrape the trends related to cancel culture.
I stopped the prototype because I couldn't rely on the website's accuracy.
Script to go through different pages with Selenium:
import calendar
import os
import time

from selenium import webdriver

# geckodriver is expected to be in the same folder as this script (Selenium 3 style call)
driver = webdriver.Firefox(executable_path=os.path.dirname(os.path.realpath(__file__)) + '/geckodriver')
# Implicit wait tells Selenium how long it should wait for an element before it throws an exception
driver.implicitly_wait(5)

for m in range(1, 13):  # for every month of 2019
    days_in_month = calendar.monthrange(2019, m)[1]
    for d in range(1, days_in_month + 1):  # for every day that exists in that month
        url = "https://us.trend-calendar.com/trend/2019-{0:02d}-{1:02d}.html".format(m, d)
        print(url)
        # driver.get(url)
        time.sleep(3)
        #driver.find_element_by_xpath("/html/body/div[1]/div/div/main/article/div/section/div[1]/a").click() # click the 'More' button
        #print('opening day......')
        #time.sleep(3)
        #driver.close()
        #print("DONE! Closing Window")
    print("Month finished")
print("Year finished")
Use archives
Looking for archives:
https://archive.org/details/twitterstream
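An archive of raw tweets does not contain trends, but counting hashtags in it could give a rough proxy. The sketch below assumes the archive files, once decompressed, hold one JSON tweet per line with hashtags under entities.hashtags[].text, which is the usual v1.1 tweet layout; I have not checked this against the archive itself. The function name count_cancel_hashtags and the sample lines are made up.

```python
import json
from collections import Counter

CANCEL_WORDS = ["cancelled", "canceled", "cancel", "isoverparty", "booed", "boycott"]


def count_cancel_hashtags(lines, words=CANCEL_WORDS):
    """Count hashtags containing a keyword in an iterable of JSON tweet lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(tweet, dict):
            continue
        # Entries without hashtags (e.g. delete notices) simply yield nothing here.
        for tag in tweet.get("entities", {}).get("hashtags", []):
            text = tag.get("text", "").lower()
            if any(word in text for word in words):
                counts[text] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, not real tweets.
    sample_lines = [
        '{"text": "one", "entities": {"hashtags": [{"text": "ExampleIsOverParty"}]}}',
        '{"text": "two", "entities": {"hashtags": [{"text": "exampleisoverparty"}, {"text": "cats"}]}}',
        '{"delete": {"status": {"id": 1}}}',
    ]
    print(count_cancel_hashtags(sample_lines).most_common())
```

Hashtag counts from a sample stream are not the same thing as Twitter's trending topics, so this would only hint at which cancel-related tags were active in a given period.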