Real-Time Twitter Sentiment Analysis
Donald Trump vs Warren Twitter Sentiment | US Election 2020
In this project we are going to extract live data from Twitter related to Donald Trump and Elizabeth Warren, analyze the sentiment of that data, and plot both sentiments on a single graph that updates in real time.
Before starting this project you will have to create a new application by visiting this link.
https://developer.twitter.com/app/new
Here you will have to create a developer account. After you sign up, your Twitter profile will be reviewed, and you will receive a confirmation email once the review is complete.
After that you can create a new application. You will have to give details about the project you are using the API for. Once you create the application you will get four keys: API key, API secret key, Access token, and Access token secret.
Now we will install and import tweepy, which will help us get data from Twitter.
!pip install tweepy
import tweepy
Here we have stored the four keys we obtained. You should use your own keys.
consumer_key = "SaMSYfPnpQXsgeEFywA8pLg1C"
consumer_secret = "o9BjVfJHhxWmPOAT39f7i0KHJuwGb8r9k1VjHQvl4q51Gaz5I5"
access_token = "2238923408-XqfhQ40evLEZSVDUYAlZmLJF6IJXR0SCtp04Xy7"
access_token_secret = "BKqoLg2YglIqm0txjSI9oahhecxWox6a4q7g66iQe5ZvA"
Now we will do the authorization. We create an OAuthHandler instance in which we pass the consumer key and secret, set the access tokens using set_access_token(), and finally get the API object using tweepy.API().
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
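Before moving on, you can optionally confirm that the credentials work. This sanity check is not part of the original walkthrough; it is a small sketch assuming tweepy 3.x (the version this tutorial uses, where StreamListener still exists).

# Optional sanity check (not in the original walkthrough), assuming tweepy 3.x
if api.verify_credentials():
    print('Authentication OK')
else:
    print('Authentication failed')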
Get my timeline
home_timeline() returns a collection of the most recent Tweets and Retweets posted by the authenticating user and the users they follow. Each returned status also contains all the details about that tweet. We are going to use .text to extract only the text from the tweets.
tweets = api.home_timeline()
for tweet in tweets:
    print(tweet.text)
Get a tweet stream and do sentiment analysis
The above method only returns a limited number of tweets. Now we will see how to get a continuous stream of tweets.
First we import the necessary libraries:
- Stream and StreamListener are imported from tweepy to get continuous data from Twitter.
- json is used to parse the raw tweet payload.
- TextBlob is imported from textblob for sentiment analysis.
- re is used to clean up the tweet text.
- csv is used to write the sentiment data to a CSV file.
You can install textblob by running the following command:
!pip install textblob
To learn more about textblob and sentiment analysis using textblob, you can watch this video.
from tweepy import Stream
from tweepy import StreamListener
import json
from textblob import TextBlob
import re
import csv
First we create a CSV file, sentiment.csv, to save the data extracted from Twitter and draw the plot from. csv.DictWriter() creates an object which operates like a regular writer but maps dictionaries onto output rows. The writeheader() method writes the headers to the CSV file.
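As a quick standalone illustration of how DictWriter maps a dictionary onto a row (the file name example.csv and the values are used purely for demonstration):

import csv

# Standalone sketch of DictWriter, separate from the streaming code
with open('example.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['Trump', 'Warren'])
    writer.writeheader()                             # writes the header row: Trump,Warren
    writer.writerow({'Trump': 0.5, 'Warren': -0.2})  # dict keys are matched to the header fields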
We are going to create a class Listener(). This simple stream listener prints status text. The on_data method of Tweepy's StreamListener conveniently passes data from statuses to the on_status method. We are going to override on_data().
json.loads() takes a JSON string and returns the corresponding Python object, which contains the data in the form of key/value pairs.
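For example, a stripped-down tweet payload can be parsed like this (this is a hypothetical, minimal example; real stream payloads contain many more fields):

import json

# Hypothetical minimal payload; a real Twitter status has many more fields
raw = '{"id": 1, "text": "RT @someone: Warren leads in the latest poll", "lang": "en"}'
status = json.loads(raw)   # parse the JSON string into a Python dict
print(status['text'])      # -> RT @someone: Warren leads in the latest poll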
We only need plain text for sentiment analysis, so using re we will remove all the links, mentions, emojis, special characters, punctuation, etc. We also remove 'RT', which appears in every retweet.
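Here is what that cleaning step looks like on a made-up tweet (the text is invented purely to illustrate the two re.sub() calls used below):

import re

# Made-up tweet used only to illustrate the cleaning step
raw_text = "RT @user: Check this out! https://t.co/abc123 #Election2020"
clean = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", raw_text).split())
clean = ' '.join(re.sub('RT', ' ', clean).split())
print(clean)   # -> Check this out Election2020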
Now we will perform sentiment analysis using textblob. The sentiment property returns a tuple of the form (polarity, subjectivity), where polarity is a float in the range [-1.0, 1.0] (-1.0 is very negative and 1.0 is very positive) and subjectivity is a float in the range [0.0, 1.0] (0.0 is very objective and 1.0 is very subjective). For every tweet we check whether each sentence relates to Trump or to Warren, add its polarity to the running total for that candidate, and write the totals to the csv file under the appropriate headers.
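Before wiring this into the stream listener, here is a minimal standalone sketch of how TextBlob scores sentences. The example text is invented, and the exact numbers depend on TextBlob's lexicon:

from textblob import TextBlob

# Invented text purely to show per-sentence polarity and subjectivity
blob = TextBlob("Trump gave a great speech. Warren's plan was terrible.")
for sent in blob.sentences:
    print(sent, sent.sentiment)
# First sentence -> positive polarity, second -> negative polarity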
trump = 0
warren = 0
header_name = ['Trump', 'Warren']

# Create the CSV file and write the header row
with open('sentiment.csv', 'w') as file:
    writer = csv.DictWriter(file, fieldnames=header_name)
    writer.writeheader()

class Listener(StreamListener):
    def on_data(self, data):
        try:
            raw_twitts = json.loads(data)
            tweets = raw_twitts['text']
            # Remove mentions, links, emojis, and special characters, then the 'RT' marker
            tweets = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweets).split())
            tweets = ' '.join(re.sub('RT', ' ', tweets).split())
            blob = TextBlob(tweets.strip())

            global trump
            global warren
            trump_sentiment = 0
            warren_sentiment = 0

            # Sum the polarity of each sentence for the candidate it mentions
            for sent in blob.sentences:
                if "Trump" in sent and "Warren" not in sent:
                    trump_sentiment = trump_sentiment + sent.sentiment.polarity
                elif "Warren" in sent:
                    warren_sentiment = warren_sentiment + sent.sentiment.polarity

            trump = trump + trump_sentiment
            warren = warren + warren_sentiment

            # Append the running totals to the CSV file
            with open('sentiment.csv', 'a') as file:
                writer = csv.DictWriter(file, fieldnames=header_name)
                info = {
                    'Trump': trump,
                    'Warren': warren
                }
                writer.writerow(info)

            print(tweets)
            print()
        except Exception:
            print('Found an Error')

    def on_error(self, status):
        print(status)
We need an API to stream. We have an authorization handler auth and a status listener Listener(), so now we can create our stream object twitter_stream.
Then we will use filter to stream all tweets containing the word Trump or Warren. The track parameter is an array of search terms to stream.
twitter_stream = Stream(auth, Listener())
twitter_stream.filter(track=['Trump', 'Warren'])
Sample output (cleaned text of a few incoming tweets printed by the listener):

Honored to be quoted in this piece along with amp Agree with Tribe s conclusion that obstructing
Boom BREAKING New York Gov Andrew Cuomo just responded to Trump switching residency to Florida HE SAID THIS Good rid
I suspected but President Trump showed us how deep and dangerous it is
yall mad at yg for kicking a grown ass man off his stage at HIS event but yall won t be mad at donald trump for kicking
I see many long breadlines in the future under a President Warren
The greatest traitors to the Ummah in our time are the Saudi Regime and those who support it What other regime in thi
It has been 3 years and Donald Trump hasn t done anything wrong Donald Trump hasn t done a single thing of which he
Plotting the data
Now we are going to plot the sentiment data we have collected from Twitter.
We will begin by importing the necessary libraries:
- pandas is used for data manipulation.
- pyplot from matplotlib is used for drawing plots.
- FuncAnimation from matplotlib.animation is used for real-time plots.
%matplotlib notebook ensures that the plot is rendered inside the Jupyter notebook itself.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
%matplotlib notebook
Here we have set the style to fivethirtyeight, which is a built-in matplotlib style.
plt.style.use('fivethirtyeight')
- We are going to show the most recent 10000 data points on the plot, so we have set frame_len = 10000.
- We have set the canvas size to 9x6.
- Then we create a function animate() and, inside it, read the sentiment.csv file which contains the live data.
- After that, we copy the Trump sentiment data to y1 and the Warren sentiment data to y2.
- Then we check the length: if it is at most 10000 we plot the data as it is, otherwise we plot only the most recent 10000 points.

FuncAnimation() makes an animation by repeatedly calling the function animate(). Its first parameter is the figure object used to get needed events, such as draw or resize; plt.gcf() gets the current figure. The interval parameter sets the delay between frames in milliseconds.
frame_len = 10000
fig = plt.figure(figsize=(9, 6))

def animate(i):
    data = pd.read_csv('sentiment.csv')
    y1 = data['Trump']
    y2 = data['Warren']
    if len(y1) <= frame_len:
        plt.cla()
        plt.plot(y1, label='Donald Trump')
        plt.plot(y2, label='Elizabeth Warren')
    else:
        plt.cla()
        plt.plot(y1[-frame_len:], label='Donald Trump')
        plt.plot(y2[-frame_len:], label='Elizabeth Warren')
    plt.legend(loc='upper left')
    plt.tight_layout()

ani = FuncAnimation(plt.gcf(), animate, interval=1000)
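If you run this as a plain Python script instead of in a Jupyter notebook (where %matplotlib notebook is not available), you also need to call plt.show() to start the event loop. A minimal sketch, assuming the same animate() function defined above:

# Running as a plain script (assumption: the animate() function above is already defined)
ani = FuncAnimation(plt.gcf(), animate, interval=1000)
plt.show()   # opens the window and drives the animation loop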
Summary
We have seen many things in this project. We started by getting access to the keys. Then we authorized our API using the keys. After that, we scraped live filtered tweets from Twitter, pre-processed them, found their sentiment, and stored the sentiment in a CSV file in a proper format. Then we read the CSV file and created a function that continuously plots the data. Going further, you can change the keywords, filter tweets according to your needs, and draw various plots.
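For example, tracking a different pair of keywords only requires changing the track list passed to filter (the names below are just a hypothetical illustration):

# Hypothetical variation: reuse the same Listener with different keywords
twitter_stream = Stream(auth, Listener())
twitter_stream.filter(track=['Biden', 'Sanders'])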