Netflix Movies & TV Shows Analysis
Goal
Analyze the Netflix titles dataset to answer:
- Movies vs TV shows split
- Top countries producing content
- Content additions per year
- Popular genres/categories
Common columns (Netflix titles dataset)
type (Movie/TV Show), title, director, cast, country, date_added, release_year, rating, duration, listed_in
Step 1: Load and parse dates
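Load dataset
import pandas as pd
# Read the raw CSV; adjust the path to wherever the file lives.
df = pd.read_csv("data/netflix_titles.csv")
# Parse date_added into datetimes; values that cannot be parsed become NaT.
if "date_added" in df.columns:
    df["date_added"] = pd.to_datetime(df["date_added"], errors="coerce")
print(df.shape)
print(df.head())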
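Because errors="coerce" silently turns anything it cannot parse into NaT, it is worth checking how many dates were lost. The sketch below uses toy rows rather than the real file. It strips whitespace before parsing and counts non-missing values that failed to parse. The format string "%B %d, %Y" is an assumption about how date_added is written (for example "September 25, 2021"), and the leading space in the second row is a made-up illustration of the kind of noise that can appear; adjust both to match your copy of the data.

```python
import pandas as pd

# Toy rows standing in for the real file (hypothetical values).
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "date_added": ["September 25, 2021", " August 4, 2017", None],
})

# Strip surrounding whitespace so every string has the same shape, then parse.
cleaned = sample["date_added"].str.strip()
parsed = pd.to_datetime(cleaned, format="%B %d, %Y", errors="coerce")

# Count values that were present in the source but did not parse.
n_failed = int((parsed.isna() & sample["date_added"].notna()).sum())
print(parsed)
print("unparsed non-missing dates:", n_failed)
```

If the unparsed count is well above zero on the real data, inspect those rows before trusting the yearly counts in Step 5.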
Step 2: Clean multi-valued country/genre
The dataset often stores multiple values separated by commas.
Split and explode
# Countries: one row per (title, country) pair
if "country" in df.columns:
    countries = (
        df.dropna(subset=["country"])
        .assign(country=df["country"].str.split(","))
        .explode("country")
    )
    # Remove the space left after each comma.
    countries["country"] = countries["country"].str.strip()
# Genres: one row per (title, genre) pair
if "listed_in" in df.columns:
    genres = (
        df.dropna(subset=["listed_in"])
        .assign(genre=df["listed_in"].str.split(","))
        .explode("genre")
    )
    genres["genre"] = genres["genre"].str.strip()
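To see what split-and-explode does to a single row, here is a minimal sketch on a toy frame (the titles and countries are made up). It follows the same pattern as the step above, with two small variations that are my own additions: a lambda inside assign so the split runs on the already filtered frame, and a filter that drops empty strings left by trailing commas.

```python
import pandas as pd

# Toy data: one multi-country title, one single-country title, one missing value.
toy = pd.DataFrame({
    "title": ["X", "Y", "Z"],
    "country": ["United States, India", "Japan", None],
})

exploded = (
    toy.dropna(subset=["country"])
    # Turn "United States, India" into ["United States", " India"].
    .assign(country=lambda d: d["country"].str.split(","))
    # Emit one row per list element; other columns are repeated.
    .explode("country")
)
exploded["country"] = exploded["country"].str.strip()
# Drop empty entries that a trailing comma would produce.
exploded = exploded[exploded["country"] != ""]
print(exploded)
```

Note that title X now appears twice, so counts taken from the exploded frame are counts of title-country pairs, not of unique titles.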
Step 3: Movies vs TV shows
Type counts
import seaborn as sns
import matplotlib.pyplot as plt
# One bar per value of the type column (Movie, TV Show).
plt.figure(figsize=(6, 4))
sns.countplot(data=df, x="type")
plt.title("Movies vs TV Shows")
plt.tight_layout()
plt.show()
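The chart shows which type is larger, but the write-up usually needs the actual numbers. A small sketch, using a toy frame in place of df, that puts counts and percentage shares side by side:

```python
import pandas as pd

# Toy stand-in for df; replace with the real frame.
toy = pd.DataFrame({"type": ["Movie", "Movie", "TV Show", "Movie"]})

counts = toy["type"].value_counts()
# normalize=True returns fractions; convert to percentages.
share = (toy["type"].value_counts(normalize=True) * 100).round(1)

# Both Series share the same index, so they line up by type.
summary = pd.DataFrame({"count": counts, "percent": share})
print(summary)
```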
Step 4: Top countries
Top countries
import seaborn as sns
import matplotlib.pyplot as plt
# Uses the exploded countries frame built in Step 2.
if "country" in df.columns:
    top_c = countries["country"].value_counts().head(10).reset_index()
    top_c.columns = ["country", "count"]
    plt.figure(figsize=(10, 4))
    sns.barplot(data=top_c, x="count", y="country")
    plt.title("Top 10 countries by number of titles")
    plt.tight_layout()
    plt.show()
Step 5: Additions over time
Titles added by year
import matplotlib.pyplot as plt
if "date_added" in df.columns:
    # Drop rows with no parsed date, then count titles per calendar year.
    added = df.dropna(subset=["date_added"])
    yearly = added.groupby(added["date_added"].dt.year).size()
    plt.figure(figsize=(8, 4))
    plt.plot(yearly.index, yearly.values, marker="o")
    plt.title("Titles added by year")
    plt.xlabel("Year")
    plt.ylabel("Count")
    plt.tight_layout()
    plt.show()
Deliverable
Summarize 5–10 insights:
- Which type dominates?
- Which countries dominate?
- Any clear growth periods?
- Which genres show up most?
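One way to back these questions with numbers is to collect a few headline figures in one place. The sketch below is only an illustration on toy data: the helper top_values and the insights dictionary are hypothetical names, not part of the walkthrough above, and the sample rows are made up. It also covers genres, which the earlier steps prepare but do not chart.

```python
import pandas as pd

def top_values(series: pd.Series, n: int = 3) -> pd.Series:
    """Split a comma-separated column, strip whitespace, and count the top n values."""
    return (
        series.dropna()
        .str.split(",")
        .explode()
        .str.strip()
        .loc[lambda s: s != ""]
        .value_counts()
        .head(n)
    )

# Toy stand-in for df with date_added already parsed.
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "country": ["United States, India", "Japan", "India"],
    "listed_in": ["Dramas, Comedies", "Anime Series", "Dramas"],
    "date_added": pd.to_datetime(["2019-01-05", "2020-03-10", "2020-07-21"]),
})

insights = {
    "dominant_type": toy["type"].value_counts().idxmax(),
    "top_countries": top_values(toy["country"]).index.tolist(),
    "top_genres": top_values(toy["listed_in"]).index.tolist(),
    "peak_year": int(toy["date_added"].dt.year.value_counts().idxmax()),
}
for name, value in insights.items():
    print(f"{name}: {value}")
```

Swapping toy for the real df should give a starting list of figures to turn into written insights; the interpretation (why a country or year stands out) still has to come from you.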
