Twitter Sentiment Analysis

Goal

Given tweets and sentiment labels (or sentiment scores):

  • Clean text
  • Explore sentiment distribution
  • Visualize sentiment over time (if timestamps exist)

Step 1: Load

Load tweets
import pandas as pd
 
df = pd.read_csv("data/tweets.csv")
print(df.head())
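
The later steps depend on specific columns, so it can help to check them right after loading. The following is a minimal sketch, assuming the file has `text` and `sentiment` columns (the names used in Steps 2 and 3) plus an optional timestamp column called `created_at`, which is a hypothetical name. A small in-memory frame with made-up rows stands in for `data/tweets.csv` so the snippet runs on its own.

```python
import pandas as pd

# Stand-in for pd.read_csv("data/tweets.csv"); the rows are made up.
df = pd.DataFrame({
    "text": ["Great launch! https://t.co/x", "@support this is broken", None],
    "sentiment": ["positive", "negative", "neutral"],
    "created_at": ["2024-01-01 10:00", "2024-01-02 12:30", "not a date"],
})

# Columns the later steps expect ("created_at" is an assumed, optional name).
required = ["text", "sentiment"]
missing = [c for c in required if c not in df.columns]
if missing:
    raise KeyError(f"Missing expected columns: {missing}")

# Parse timestamps if present; unparseable values become NaT instead of raising.
if "created_at" in df.columns:
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")

print(df.dtypes)
print(df.isna().sum())  # missing values per column
```

Counting missing values here shows up front how many tweets have no text or no usable timestamp before any cleaning or plotting is attempted.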

Step 2: Basic text cleaning

Clean text
import re
 
def clean_tweet(s: str) -> str:
    s = re.sub(r"http\S+", "", s)       # remove URLs
    s = re.sub(r"@\w+", "", s)         # remove mentions
    s = re.sub(r"#", "", s)            # remove hashtag symbol
    s = re.sub(r"\s+", " ", s).strip() # normalize whitespace
    return s
 
if "text" in df.columns:
    # Fill missing values first so NaN is not turned into the literal string "nan"
    df["text_clean"] = df["text"].fillna("").astype(str).apply(clean_tweet)
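
To see what the cleaning rules actually do, it may help to run `clean_tweet` on a single string. The sketch below repeats the function so it is self-contained; the sample tweets are invented for illustration.

```python
import re

def clean_tweet(s: str) -> str:
    s = re.sub(r"http\S+", "", s)       # remove URLs
    s = re.sub(r"@\w+", "", s)          # remove mentions
    s = re.sub(r"#", "", s)             # remove hashtag symbol, keep the word
    s = re.sub(r"\s+", " ", s).strip()  # normalize whitespace
    return s

sample = "Loving the new update! https://t.co/abc123 @acme #happy   days"
cleaned = clean_tweet(sample)
print(cleaned)  # Loving the new update! happy days

# Limitation: the mention pattern also matches inside email addresses.
email_case = clean_tweet("mail me at bob@example.com")
print(email_case)  # mail me at bob.com
```

The second case shows that the mention rule is deliberately simple: anything that follows an `@` is dropped, which is usually fine for tweets but can mangle other text.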

Step 3: Sentiment distribution

Sentiment counts
import seaborn as sns
import matplotlib.pyplot as plt
 
plt.figure(figsize=(7, 4))
sns.countplot(data=df, x="sentiment")
plt.title("Sentiment distribution")
plt.tight_layout()
plt.show()
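
The plot shows the shape of the distribution; the same information can be read as numbers with `value_counts`. The goal also allows numeric scores instead of labels, in which case the scores can be bucketed into labels first. The following is a minimal sketch, assuming a hypothetical `score` column in the range [-1, 1] and illustrative cut-offs at -0.05 and 0.05; neither the column name nor the thresholds come from the dataset.

```python
import pandas as pd

# Made-up scores; "score" is an assumed column name.
df = pd.DataFrame({"score": [0.8, -0.6, 0.0, 0.3, -0.9, 0.6]})

# Bucket numeric scores into labels; the bin edges are illustrative assumptions.
df["sentiment"] = pd.cut(
    df["score"],
    bins=[-1.0, -0.05, 0.05, 1.0],
    labels=["negative", "neutral", "positive"],
    include_lowest=True,  # so a score of exactly -1.0 falls in the first bin
)

counts = df["sentiment"].value_counts()                # absolute counts
shares = df["sentiment"].value_counts(normalize=True)  # fractions of all rows
print(counts)
print(shares.round(2))
```

Reporting shares alongside counts makes class imbalance easier to spot than the bar chart alone.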

Deliverable

  • Most common sentiment
  • Examples of strongly positive/negative tweets
  • Trend over time (optional)
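
The three deliverables above can be sketched with a few pandas calls. This is one possible approach, not the only one, and it assumes a numeric `score` column and a timestamp column named `created_at`; both names are hypothetical, and the rows are made up so the snippet runs on its own.

```python
import pandas as pd

# Made-up rows; "score" and "created_at" are assumed column names.
df = pd.DataFrame({
    "text_clean": ["love it", "terrible service", "it is fine",
                   "best ever", "never again", "pretty good"],
    "sentiment": ["positive", "negative", "neutral",
                  "positive", "negative", "positive"],
    "score": [0.7, -0.8, 0.0, 0.9, -0.6, 0.4],
    "created_at": pd.to_datetime([
        "2024-01-01", "2024-01-01", "2024-01-02",
        "2024-01-02", "2024-01-03", "2024-01-03",
    ]),
})

# 1. Most common sentiment label.
most_common = df["sentiment"].value_counts().idxmax()

# 2. Examples of strongly positive / negative tweets (needs numeric scores).
top_positive = df.nlargest(2, "score")[["text_clean", "score"]]
top_negative = df.nsmallest(2, "score")[["text_clean", "score"]]

# 3a. Trend over time from scores: mean score per day.
daily = df.set_index("created_at")["score"].resample("D").mean()

# 3b. Trend over time from labels only: tweets per day and label.
daily_counts = (
    df.groupby([pd.Grouper(key="created_at", freq="D"), "sentiment"])
      .size()
      .unstack(fill_value=0)
)

print(most_common)
print(top_positive)
print(top_negative)
print(daily)
print(daily_counts)
```

Calling `daily.plot()` or `daily_counts.plot()` would draw these trends as line charts with the same matplotlib setup used in Step 3. For sparse data, a coarser frequency such as `"W"` may give a smoother picture than daily bins.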
