Twitter Sentiment Analysis
Goal
Given tweets and sentiment labels (or sentiment scores):
- Clean text
- Explore sentiment distribution
- Visualize sentiment over time (if timestamps exist)
Step 1: Load
Load tweets
import pandas as pd
df = pd.read_csv("data/tweets.csv")
print(df.head())
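Before moving on, it can help to confirm that the columns used later are actually present. The following is a minimal sketch, not part of the original walkthrough: `check_columns` is a hypothetical helper, `created_at` is an assumed name for the timestamp column, and the two rows are made up so the snippet runs without `data/tweets.csv`.

```python
import io
import pandas as pd

def check_columns(df, required=("text", "sentiment")):
    """Return the required column names that are missing from df."""
    return [c for c in required if c not in df.columns]

# Tiny in-memory stand-in for data/tweets.csv (made-up rows)
sample_csv = io.StringIO(
    "text,sentiment,created_at\n"
    "Loving the new update!,positive,2024-01-01 10:00:00\n"
    "Worst service ever,negative,2024-01-02 12:30:00\n"
)
sample = pd.read_csv(sample_csv)

missing = check_columns(sample)
if missing:
    print("Missing columns:", missing)

# "created_at" is an assumed column name; parse it only if it exists
if "created_at" in sample.columns:
    sample["created_at"] = pd.to_datetime(sample["created_at"], errors="coerce")
```

With `errors="coerce"`, unparseable timestamps become `NaT` instead of raising, which is usually easier to handle during exploration.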
Step 2: Basic text cleaning
Clean text
import re
def clean_tweet(s: str) -> str:
    s = re.sub(r"http\S+", "", s)  # remove URLs
    s = re.sub(r"@\w+", "", s)  # remove mentions
    s = re.sub(r"#", "", s)  # remove hashtag symbol, keep the tag text
    s = re.sub(r"\s+", " ", s).strip()  # normalize whitespace
    return s
if "text" in df.columns:
    df["text_clean"] = df["text"].astype(str).apply(clean_tweet)
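To see what the cleaning does, here is a small sketch that runs `clean_tweet` on a made-up tweet. The function is repeated so the snippet runs on its own. The `fillna("")` step is an optional variation, not part of the code above: `astype(str)` alone turns missing values into the literal string "nan", which may or may not be what you want.

```python
import re
import pandas as pd

def clean_tweet(s: str) -> str:
    s = re.sub(r"http\S+", "", s)       # remove URLs
    s = re.sub(r"@\w+", "", s)          # remove mentions
    s = re.sub(r"#", "", s)             # remove hashtag symbol
    s = re.sub(r"\s+", " ", s).strip()  # normalize whitespace
    return s

# Made-up example tweet
example = "@user Check this out https://t.co/abc #NLP   is great!"
print(clean_tweet(example))  # Check this out NLP is great!

# Optional variation: replace missing text with "" before cleaning
raw = pd.Series(["hi @bob", None])
cleaned = raw.fillna("").astype(str).apply(clean_tweet)
print(cleaned.tolist())  # ['hi', '']
```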
Step 3: Sentiment distribution
Sentiment counts
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(7, 4))
sns.countplot(data=df, x="sentiment")
plt.title("Sentiment distribution")
plt.tight_layout()
plt.show()
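The plot shows the distribution visually. The same information can be read off as numbers with `value_counts`. This is a sketch on made-up labels; on the real data you would likely use `df["sentiment"]` in place of `labels`.

```python
import pandas as pd

# Made-up labels standing in for df["sentiment"]
labels = pd.Series(
    ["positive", "negative", "positive", "neutral", "positive"],
    name="sentiment",
)

counts = labels.value_counts()                            # absolute counts per label
shares = labels.value_counts(normalize=True).round(2)     # fraction of tweets per label
most_common = counts.idxmax()                             # label with the highest count

print(counts)
print(shares)
print("Most common sentiment:", most_common)
```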
Deliverable
- Most common sentiment
- Examples of strongly positive/negative tweets
- Trend over time (optional)
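The last two items could be approached along these lines. This is only a sketch: `created_at` (timestamp) and `score` (numeric sentiment score) are assumed column names that your dataset may not have, and the rows are made up.

```python
import pandas as pd

# Made-up rows; "created_at" and "score" are assumed column names
demo = pd.DataFrame({
    "text": ["great day", "awful", "fine I guess", "love it", "meh"],
    "sentiment": ["positive", "negative", "neutral", "positive", "neutral"],
    "score": [0.9, -0.8, 0.1, 0.7, 0.0],
    "created_at": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 15:00", "2024-01-02 08:00",
        "2024-01-02 11:00", "2024-01-02 20:00",
    ]),
})

# Trend over time: number of tweets per day and sentiment label
daily = (
    demo.groupby([pd.Grouper(key="created_at", freq="D"), "sentiment"])
    .size()
    .unstack(fill_value=0)
)
print(daily)

# Strongly positive / negative examples, ranked by score
top_positive = demo.nlargest(2, "score")[["text", "score"]]
top_negative = demo.nsmallest(2, "score")[["text", "score"]]
print(top_positive)
print(top_negative)
```

Calling `daily.plot()` would then draw one line per sentiment label. If the data has labels but no scores, the ranking step does not apply and examples have to be picked by label instead.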
