Exploratory Data Analysis (EDA) on Titanic
Goal
Perform EDA on the Titanic dataset and produce:
- Data quality findings (missing values, types)
- A handful of clear plots
- Insights about survival patterns
Dataset
Common sources:
- Kaggle: Titanic - Machine Learning from Disaster
Typical columns:
Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked
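Copies of the dataset sometimes rename or drop columns, so it can help to confirm which of the typical columns are present before relying on them. The following is a minimal sketch; the helper name `missing_columns` and the constant `EXPECTED_COLUMNS` are illustrative, not part of any library.

```python
import pandas as pd

# Column names taken from the "Typical columns" list above.
EXPECTED_COLUMNS = [
    "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked",
]


def missing_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from df, in order."""
    return [col for col in expected if col not in df.columns]
```

Once the CSV is loaded in Step 1, `missing_columns(df)` returns an empty list when every typical column is available.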
Step 1: Load data
Load Titanic CSV
import pandas as pd
df = pd.read_csv("data/titanic.csv")
print(df.shape)
print(df.head())
Step 2: Schema and missingness
Info + missing
df.info()  # info() prints the summary itself and returns None, so no print() is needed
missing = (df.isna().mean() * 100).sort_values(ascending=False)
print(missing)
Focus on missing in:
- Age
- Cabin
- Embarked
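To see counts, percentages, and types side by side for the columns that have gaps, the missingness check can be wrapped in a small report. This is a sketch under the assumption that a per-column table is what you want; `missing_report` is a hypothetical helper name.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise missing values per column: dtype, count, and percentage."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    # Keep only columns that actually have gaps, worst first.
    report = report[report["n_missing"] > 0]
    return report.sort_values("pct_missing", ascending=False)
```

Calling `print(missing_report(df))` lists only the columns with at least one missing value, which keeps the output short on a mostly complete dataset.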
Step 3: Clean minimal issues
Handle missing Embarked (small)
Embarked fill
if "Embarked" in df.columns:
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode().iloc[0])
Keep Cabin as “has cabin” flag
Cabin flag
if "Cabin" in df.columns:
    df["has_cabin"] = df["Cabin"].notna()
Step 4: Univariate plots
Survival distribution
Survival count
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(6, 4))
sns.countplot(data=df, x="Survived")
plt.title("Survival counts")
plt.tight_layout()
plt.show()
Age distribution
Age distribution
import seaborn as sns
import matplotlib.pyplot as plt
if "Age" in df.columns:
    plt.figure(figsize=(7, 4))
    sns.histplot(df["Age"].dropna(), bins=30, kde=True)
    plt.title("Age distribution")
    plt.tight_layout()
    plt.show()
Step 5: Survival by category
Survival by sex
Survival by sex
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(7, 4))
sns.barplot(data=df, x="Sex", y="Survived")
plt.title("Survival rate by sex")
plt.tight_layout()
plt.show()
Survival by passenger class
Survival by class
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(7, 4))
sns.barplot(data=df, x="Pclass", y="Survived")
plt.title("Survival rate by class")
plt.tight_layout()
plt.show()
Step 6: Numeric relationships
Fare vs survival (boxplot)
Fare vs survival
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(7, 4))
sns.boxplot(data=df, x="Survived", y="Fare")
plt.title("Fare vs survival")
plt.tight_layout()
plt.show()
Step 7: Write insights (example)
Write 5–10 bullet insights such as:
- Females have a higher survival rate than males.
- Passengers in higher classes (1st over 2nd over 3rd) survived at higher rates.
- Passengers who paid higher fares tended to survive at higher rates.
- Missingness is high in Cabin; rather than imputing it, treat its presence as a feature (“has_cabin”).
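Insights like these are easier to defend when each one is backed by a number rather than read off a plot. One possible sketch is a helper that returns the survival rate and group size for any categorical column; `survival_rate_by` is a hypothetical name, and it assumes `Survived` is coded as 0/1 so that its mean is the survival rate.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Survival rate and passenger count for each value of `column`.

    Assumes df["Survived"] is 0/1, so its mean is the share of survivors.
    """
    summary = df.groupby(column)["Survived"].agg(rate="mean", n="count")
    return summary.sort_values("rate", ascending=False)
```

For example, `survival_rate_by(df, "Sex")`, `survival_rate_by(df, "Pclass")`, and `survival_rate_by(df, "has_cabin")` give the figures behind the first, second, and fourth bullets. For the fare bullet, `df.groupby("Survived")["Fare"].median()` compares typical fares between the two outcome groups. Reporting `n` alongside the rate helps flag groups that are too small to support a claim.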
Deliverable
Save a cleaned dataset version:
Save output
df.to_csv("output/titanic_cleaned.csv", index=False)
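The save step assumes the output/ directory already exists; `to_csv` raises an error when the parent directory is missing. A minimal sketch that creates the directory first is shown here, with `save_cleaned` as a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd


def save_cleaned(df: pd.DataFrame, path: str = "output/titanic_cleaned.csv") -> Path:
    """Write df to CSV, creating the parent directory if it is missing."""
    out_path = Path(path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, index=False)
    return out_path
```

Calling `save_cleaned(df)` writes to the same location as the line above and returns the path, which is convenient for logging where the file ended up.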
