# Sentiment Analysis Tutorial

## Goal

Classify text as positive or negative sentiment.

This is a good first NLP project because:

- preprocessing matters
- classic models work well

## A baseline approach

TF-IDF features + Logistic Regression:
```mermaid
flowchart LR
    T[Text] --> V[TF-IDF]
    V --> C[Classifier]
    C --> S[Sentiment]
```
## Example code (template)

Sentiment model (TF-IDF + LogisticRegression):

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

texts = [
    "I love this product",
    "This is terrible",
    "Amazing quality",
    "Worst purchase ever",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# With only 4 examples, test_size=0.5 keeps one sample of each
# class in the test split; a smaller test_size would give a
# 1-sample test set, which stratify rejects.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

model = Pipeline(
    steps=[
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```

## Practical tips
- include bigrams (`ngram_range=(1, 2)`) to capture phrases like "not good"
- avoid removing negation words like "not" during stop-word filtering
- check class balance before trusting accuracy
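Checking class balance can be as simple as counting labels; the `labels` list below is a hypothetical example:

```python
from collections import Counter

labels = [1, 0, 1, 0, 1]  # 1 = positive, 0 = negative

counts = Counter(labels)
print(counts)  # Counter({1: 3, 0: 2})

# Ratio of the rarest to the most common class; values far below 1.0
# suggest plain accuracy will be misleading.
balance = min(counts.values()) / max(counts.values())
print(f"balance ratio: {balance:.2f}")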
## Mini-checkpoint

Why is "not good" tricky for bag-of-words?

- Without bigrams, "not" and "good" are independent features, so the positive weight of "good" is not canceled by the adjacent "not".
