Customer Churn Analysis

Goal

Given a customer dataset with a churn column (0/1), analyze:

Overall churn rate
Churn by segment (plan, region)
Numeric differences (tenure, usage)

Step 1: Load

Load

import pandas as pd
 
df = pd.read_csv("data/churn.csv")
print(df.shape)
print(df.head())

Step 2: Churn rate

Overall churn

# churn is coded 0/1, so its mean is the fraction of customers who churned
rate = df["churn"].mean()
print(f"Churn rate: {rate:.1%}")
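
The mean only works as a rate if the column is strictly 0/1 with no missing values, so it can be worth checking that first. A minimal sketch, assuming a made-up toy column in place of df["churn"]; the values are illustrative only and not from the real dataset.

```python
import pandas as pd

# Hypothetical toy column standing in for df["churn"] (values are made up).
churn = pd.Series([0, 1, 0, 0, 1, 0, 0, 0], name="churn")

# The mean is a valid rate only if every value is 0 or 1 and none are missing.
assert churn.notna().all(), "churn has missing values"
assert churn.isin([0, 1]).all(), "churn has values other than 0/1"

# Raw counts per class, alongside the rate itself.
counts = churn.value_counts().sort_index()
rate = churn.mean()

print(counts)
print(f"Churn rate: {rate:.1%}")
```

Looking at the counts next to the rate also shows how imbalanced the two classes are, which matters later when splitting train/test.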

Step 3: Churn by category

Churn by segment

import seaborn as sns
import matplotlib.pyplot as plt
 
plt.figure(figsize=(7, 4))
sns.barplot(data=df, x="plan", y="churn")
plt.title("Churn rate by plan")
plt.tight_layout()
plt.show()
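
The plot above covers plan only. The same comparison for region, along with the underlying numbers, can be read off a grouped table. A minimal sketch, assuming columns named plan, region, and churn as listed in the goal; the small DataFrame is made-up data standing in for data/churn.csv, and churn_by is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical toy data standing in for data/churn.csv (values are made up).
df = pd.DataFrame({
    "plan":   ["basic", "basic", "pro", "pro", "basic", "pro"],
    "region": ["north", "south", "north", "south", "north", "north"],
    "churn":  [1, 0, 0, 0, 1, 1],
})

def churn_by(frame, column):
    """Return churn rate and customer count per category of `column`."""
    return (
        frame.groupby(column)["churn"]
        .agg(churn_rate="mean", customers="count")
        .sort_values("churn_rate", ascending=False)
    )

by_plan = churn_by(df, "plan")
by_region = churn_by(df, "region")

print(by_plan)
print(by_region)
```

Keeping the customer count next to the rate helps avoid over-reading a high churn rate that comes from a very small segment.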

Step 4: Numeric differences

Tenure by churn

import seaborn as sns
import matplotlib.pyplot as plt
 
plt.figure(figsize=(7, 4))
sns.boxplot(data=df, x="churn", y="tenure")
plt.title("Tenure vs churn")
plt.tight_layout()
plt.show()
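
The box plot shows tenure only. To compare several numeric features at once, including usage, a grouped summary table is one option. A minimal sketch, assuming numeric columns named tenure and usage (usage is an assumed column name taken from the goal); the data is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; "usage" is an assumed column name, values are made up.
df = pd.DataFrame({
    "churn":  [0, 0, 0, 1, 1, 1],
    "tenure": [24, 36, 30, 3, 6, 9],
    "usage":  [50.0, 60.0, 55.0, 20.0, 25.0, 30.0],
})

numeric_cols = ["tenure", "usage"]

# Mean and median of each numeric feature, split by churn status.
summary = df.groupby("churn")[numeric_cols].agg(["mean", "median"])
print(summary)

# Difference in group means: churned (1) minus retained (0).
means = df.groupby("churn")[numeric_cols].mean()
diff = means.loc[1] - means.loc[0]
print(diff)
```

A raw difference in means depends on each feature's scale, so it is only a first look; the box plots remain useful for seeing spread and outliers.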

Step 5: Create a simple model-ready dataset

Handle missing values
Encode categories
Split train/test

This connects back to Phase 4 preprocessing.
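
The three steps above might look roughly like this. This is a sketch under assumptions, not the Phase 4 code: the column names, the median and "unknown" fill strategy, the one-hot encoding, and the 80/20 split are all illustrative choices, and the DataFrame is made-up data. It uses pandas only; scikit-learn's train_test_split is a common alternative for the last step.

```python
import pandas as pd

# Hypothetical toy data with one missing value per column type (values are made up).
df = pd.DataFrame({
    "plan":   ["basic", "pro", None, "basic", "pro",
               "basic", "pro", "basic", "pro", "basic"],
    "region": ["north", "south", "north", "south", "north",
               "south", "north", "south", "north", "south"],
    "tenure": [24, 36, 30, None, 6, 9, 12, 18, 3, 27],
    "usage":  [50.0, 60.0, 55.0, 20.0, 25.0, 30.0, 35.0, 40.0, 15.0, 45.0],
    "churn":  [0, 0, 0, 1, 1, 1, 0, 0, 1, 0],
})

numeric_cols = ["tenure", "usage"]
categorical_cols = ["plan", "region"]

clean = df.copy()

# 1. Handle missing values: median for numeric columns,
#    an explicit "unknown" label for categorical columns.
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())
clean[categorical_cols] = clean[categorical_cols].fillna("unknown")

# 2. Encode categories as one-hot indicator columns.
encoded = pd.get_dummies(clean, columns=categorical_cols)

# 3. Split train/test: random 80% for training, the remaining rows for testing.
train = encoded.sample(frac=0.8, random_state=42)
test = encoded.drop(train.index)

X_train, y_train = train.drop(columns="churn"), train["churn"]
X_test, y_test = test.drop(columns="churn"), test["churn"]

print(X_train.shape, X_test.shape)
```

The order here follows the list above for readability. In a real pipeline, fill values such as the median are usually computed on the training split only, so that nothing from the test rows leaks into preprocessing.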

Deliverable

Write a short summary of your insights:

Which plan/segment churns more?
Which features differ strongly?
What interventions might reduce churn?
