Credit Card Fraud Detection
Goal
Fraud datasets are usually highly imbalanced: fraudulent transactions typically make up only a small fraction of all records.
You will:
- Explore class imbalance
- Check feature distributions
- Build a baseline model evaluation plan
Step 1: Load
Load
import pandas as pd
df = pd.read_csv("data/fraud.csv")
print(df.shape)
print(df.head())
Step 2: Class imbalance
Imbalance
print(df["fraud"].value_counts())
print(df["fraud"].value_counts(normalize=True))
Step 3: Visualize distributions
Feature distribution
import seaborn as sns
import matplotlib.pyplot as plt
# Example: compare one feature by class
feature = "amount"
plt.figure(figsize=(7, 4))
sns.kdeplot(data=df, x=feature, hue="fraud", common_norm=False)
plt.title(f"{feature} distribution by class")
plt.tight_layout()
plt.show()
Step 4: Baseline modeling note
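For fraud, accuracy is misleading: on an imbalanced dataset, a model that labels every transaction as legitimate can score very high accuracy while catching no fraud at all.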
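A small worked example may make the accuracy problem concrete. The labels below are made up for illustration (10 fraud cases in 1,000 transactions) and are not taken from data/fraud.csv; the "model" simply never flags fraud.

```python
# Toy labels, invented for illustration: 1 = fraud, 0 = legitimate.
y_true = [1] * 10 + [0] * 990

# A "model" that never flags fraud.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of actual fraud cases that were caught.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

The accuracy looks excellent only because 99% of the toy labels are legitimate; the recall of zero shows that no fraud was detected.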
Prefer:
- Precision / Recall
- F1 score
- ROC-AUC
- PR-AUC
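The metrics listed above can be computed with scikit-learn, as in the rough sketch below. Several things here are assumptions rather than part of the original walkthrough: the data is synthetic (generated with make_classification at roughly 2% positives), the model is a logistic regression with class_weight="balanced", and PR-AUC is estimated with average_precision_score, which is one common summary of the precision-recall curve.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): roughly 2% positives.
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    weights=[0.98, 0.02],
    random_state=0,
)

# Stratify so the train and test splits keep the same fraud rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" is one common way to handle imbalance, not the only one.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard labels at the default 0.5 threshold
y_score = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

metrics = {
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_score),
    "pr_auc": average_precision_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
```

Precision, recall, and F1 depend on the decision threshold, while ROC-AUC and PR-AUC are computed from the scores and summarize performance across thresholds. On the real dataset, X and y would come from df rather than from make_classification.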
Deliverable
- How imbalanced is the dataset?
- Which features differ between classes?
- What metric will you optimize?
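One possible way to start answering the first two questions with pandas is sketched below. The tiny DataFrame is made up so the snippet runs on its own; the "fraud" and "amount" column names follow the earlier steps, while "hour" is a hypothetical extra feature. Comparing per-class medians is only one of several reasonable checks.

```python
import pandas as pd

# Tiny invented frame standing in for the real dataset.
df = pd.DataFrame(
    {
        "amount": [12.0, 30.5, 8.2, 950.0, 15.0, 1200.0, 22.1, 9.9],
        "hour": [10, 14, 9, 3, 16, 2, 11, 13],  # hypothetical feature
        "fraud": [0, 0, 0, 1, 0, 1, 0, 0],
    }
)

# How imbalanced is the dataset? The mean of a 0/1 label is the fraud rate.
fraud_rate = df["fraud"].mean()
print(f"fraud rate: {fraud_rate:.1%}")

# Which features differ between classes? Compare per-class medians.
numeric_cols = [c for c in df.select_dtypes("number").columns if c != "fraud"]
medians = df.groupby("fraud")[numeric_cols].median()

# Absolute gap between the fraud (1) and legitimate (0) medians, largest first.
gap = (medians.loc[1] - medians.loc[0]).abs().sort_values(ascending=False)
print(medians)
print(gap)
```

The raw gaps are in each feature's own units, so they are not directly comparable across features; treat the ranking as a prompt for which distributions to plot, not as a feature-importance score.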
