Twitter Sentiment Analysis

Goal

Given a set of tweets with sentiment labels (or sentiment scores), the project covers three tasks:

Clean text
Explore sentiment distribution
Visualize sentiment over time (if timestamps exist)

Step 1: Load

Load tweets

import pandas as pd
 
df = pd.read_csv("data/tweets.csv")
print(df.head())
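Before cleaning, it can help to confirm that the columns the later steps rely on are actually present. The sketch below is only an illustration: it reads a tiny in-memory CSV in place of data/tweets.csv, the rows are made up, and the timestamp column name "created_at" is an assumption (the columns "text" and "sentiment" are the ones used in the steps of this page).

```python
import io
import pandas as pd

# Tiny in-memory stand-in for data/tweets.csv (hypothetical rows).
sample_csv = io.StringIO(
    "text,sentiment,created_at\n"
    "I love this!,positive,2024-01-01 10:00:00\n"
    "Worst service ever,negative,2024-01-01 11:30:00\n"
    ",neutral,2024-01-02 09:15:00\n"
)
df = pd.read_csv(sample_csv)

# "created_at" is an assumed column name; adjust to your dataset.
expected = ["text", "sentiment", "created_at"]
missing_cols = [c for c in expected if c not in df.columns]

# Count missing values in the expected columns that do exist.
present = [c for c in expected if c in df.columns]
null_counts = df[present].isna().sum()

print("Missing columns:", missing_cols)
print(null_counts)
```

With the sample rows above, no column is missing and the empty tweet shows up as one null in "text".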

Step 2: Basic text cleaning

Clean text

import re
 
def clean_tweet(s: str) -> str:
    s = re.sub(r"http\S+", "", s)       # remove URLs
    s = re.sub(r"@\w+", "", s)         # remove mentions
    s = re.sub(r"#", "", s)            # remove hashtag symbol
    s = re.sub(r"\s+", " ", s).strip() # normalize whitespace
    return s
 
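if "text" in df.columns:
    # Fill missing tweets with "" first; astype(str) alone would turn NaN into the string "nan".
    df["text_clean"] = df["text"].fillna("").astype(str).apply(clean_tweet)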
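To see what the cleaning function does to a single tweet, here is a small standalone check. The function is repeated so the snippet runs on its own, and the sample tweet is made up for illustration.

```python
import re

def clean_tweet(s: str) -> str:
    s = re.sub(r"http\S+", "", s)       # remove URLs
    s = re.sub(r"@\w+", "", s)          # remove mentions
    s = re.sub(r"#", "", s)             # remove hashtag symbol, keep the word
    s = re.sub(r"\s+", " ", s).strip()  # normalize whitespace
    return s

# Hypothetical tweet with a mention, a hashtag, extra spaces, and a URL.
raw = "@alice  Loving the new #Python release! https://example.com/notes"
cleaned = clean_tweet(raw)
print(cleaned)  # Loving the new Python release!
```

Note that the mention pattern is deliberately simple: it also strips the part after "@" in an email address, so "bob@example.com" becomes "bob.com". If your tweets contain emails, a stricter pattern may be needed.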

Step 3: Sentiment distribution

Sentiment counts

import seaborn as sns
import matplotlib.pyplot as plt
 
plt.figure(figsize=(7, 4))
sns.countplot(data=df, x="sentiment")
plt.title("Sentiment distribution")
plt.tight_layout()
plt.show()
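The plot shows the shape of the distribution; the same information as numbers can be obtained with value_counts. This is a minimal sketch on made-up labels, assuming the label column is named "sentiment" as in the plot above.

```python
import pandas as pd

# Hypothetical labels standing in for the real "sentiment" column.
df = pd.DataFrame(
    {"sentiment": ["positive", "negative", "positive", "neutral", "positive"]}
)

counts = df["sentiment"].value_counts()                          # absolute counts
shares = df["sentiment"].value_counts(normalize=True).round(2)   # fraction of all tweets
most_common = counts.idxmax()                                    # label with the highest count

print(counts)
print(shares)
print("Most common:", most_common)
```

On this sample, "positive" is the most common label with 3 of 5 tweets.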

Deliverable

Most common sentiment
Examples of strongly positive/negative tweets
Trend over time (optional)
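The last two deliverables can be sketched as follows. This is one possible approach, not the only one: it assumes a numeric sentiment column named "score" (roughly in the range -1 to 1) and a timestamp column named "created_at". Both names and all rows are hypothetical; if your data only has categorical labels, the "strongly positive/negative" step needs a different criterion.

```python
import pandas as pd

# Hypothetical data; "score" and "created_at" are assumed column names.
df = pd.DataFrame({
    "text_clean": ["Love it", "Terrible", "It is fine", "Best day ever", "Never again"],
    "score": [0.8, -0.9, 0.1, 0.95, -0.7],
    "created_at": [
        "2024-01-01 10:00", "2024-01-01 12:00",
        "2024-01-02 09:00", "2024-01-02 18:00",
        "2024-01-03 08:00",
    ],
})

# Examples of strongly positive / negative tweets: the two extremes on each side.
top_positive = df.nlargest(2, "score")[["text_clean", "score"]]
top_negative = df.nsmallest(2, "score")[["text_clean", "score"]]

# Trend over time: parse timestamps, drop unparseable ones, average score per day.
df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
daily_mean = (
    df.dropna(subset=["created_at"])
      .set_index("created_at")["score"]
      .resample("D")
      .mean()
)

print(top_positive)
print(top_negative)
print(daily_mean)
```

The resulting daily_mean series has one value per calendar day and can be plotted directly, for example with daily_mean.plot() followed by plt.show().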
