Covid-19 Data Analysis & Visualization
Goal
Build a small Covid-19 analysis project:
- Load time series case data
- Clean/parse dates
- Plot trends and moving averages
- Compare regions
Data sources
Possible sources:
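- Our World in Data (OWID); the code in this project uses its column names (location, date, new_cases)
- Johns Hopkins University CSSE dataset; it uses a different layout, so the code would need adapting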
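Whichever source is used, it may help to confirm that the file has the columns the later steps rely on before going further. The following is a minimal sketch on a made-up two-row sample; the helper name missing_columns and the sample values are hypothetical, and the required column names are the ones used in the code of this project.

```python
import io
import pandas as pd

# Hypothetical two-row sample in a long (one row per location and day) layout.
# The values are made up for illustration.
SAMPLE_CSV = """location,date,new_cases
India,2021-05-01,100
Brazil,2021-05-01,80
"""

# Columns the later steps rely on.
REQUIRED_COLUMNS = {"location", "date", "new_cases"}


def missing_columns(frame: pd.DataFrame) -> set:
    """Return the required column names that are absent from the frame."""
    return REQUIRED_COLUMNS - set(frame.columns)


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
absent = missing_columns(sample)
print("missing columns:", sorted(absent))  # prints: missing columns: []
```

With a real file, the same check would run on the DataFrame returned by pd.read_csv("data/covid.csv").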
Step 1: Load and inspect
Load data
import pandas as pd
df = pd.read_csv("data/covid.csv")
print(df.shape)
print(df.head())
Step 2: Parse dates and subset
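This step has two parts: turning the date strings into real datetimes, and narrowing the data to the rows of interest. A minimal sketch of both on made-up rows follows; the data, the chosen country, and the start date are assumptions for illustration only.

```python
import pandas as pd

# Made-up rows; the third date is deliberately malformed.
raw = pd.DataFrame({
    "location": ["India", "India", "India", "Brazil"],
    "date": ["2021-05-01", "2021-05-02", "not a date", "2021-05-01"],
    "new_cases": [100, 120, 90, 80],
})

# errors="coerce" turns unparseable strings into NaT instead of raising.
raw["date"] = pd.to_datetime(raw["date"], errors="coerce")
n_bad = int(raw["date"].isna().sum())

# Drop rows without a usable date, then keep one country from a start date on.
clean = raw.dropna(subset=["date"])
mask = (clean["location"] == "India") & (clean["date"] >= pd.Timestamp("2021-05-01"))
subset = clean.loc[mask].sort_values("date")

print(n_bad)        # 1 row had an unparseable date
print(len(subset))  # 2 rows remain for India
```

Counting the NaT values right after parsing gives an early number for the data quality part of the report.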
Parse date
import pandas as pd
if "date" in df.columns:
    # Unparseable values become NaT instead of raising an error
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
Step 3: Trend plot
Trend
import matplotlib.pyplot as plt
country = "India"
sub = df[df["location"] == country].sort_values("date")
plt.figure(figsize=(10, 4))
plt.plot(sub["date"], sub["new_cases"], label="new_cases")
plt.title(f"New cases over time - {country}")
plt.xlabel("Date")
plt.ylabel("New cases")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()
Step 4: Moving average
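A 7-day moving average replaces each day's value with the mean of that day and the six days before it. The sketch below uses made-up counts to show how rolling(7).mean() behaves at the start of a series; the min_periods variant is an optional alternative, not something the steps in this project require.

```python
import pandas as pd

# Made-up daily counts for ten consecutive days.
cases = pd.Series(
    [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    index=pd.date_range("2021-05-01", periods=10, freq="D"),
)

# Default behaviour: the result is NaN until a full 7-day window exists.
ma7 = cases.rolling(7).mean()

# min_periods=1 averages whatever is available, so early days are not NaN.
ma7_partial = cases.rolling(7, min_periods=1).mean()

print(int(ma7.isna().sum()))   # 6 leading NaN values
print(ma7.iloc[6])             # 40.0, the mean of 10..70
print(ma7_partial.iloc[0])     # 10.0, a one-value window
```

The six leading NaN values explain why the smoothed line in a plot starts later than the daily line.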
Rolling average
sub = sub.copy()
sub["new_cases_ma7"] = sub["new_cases"].rolling(7).mean()
plt.figure(figsize=(10, 4))
plt.plot(sub["date"], sub["new_cases"], alpha=0.4, label="daily")
plt.plot(sub["date"], sub["new_cases_ma7"], label="7-day avg")
plt.title(f"New cases (7-day avg) - {country}")
plt.legend()
plt.tight_layout()
plt.show()
Step 5: Compare regions
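When several regions share one DataFrame, a plain rolling(7) over the whole column would mix values from different regions at the boundaries. One possible approach, sketched here on made-up data, is to compute the average per location with groupby and transform; the location names "A" and "B" and all values are placeholders.

```python
import pandas as pd

# Made-up data: two locations, eight days each.
dates = pd.date_range("2021-05-01", periods=8, freq="D")
frame = pd.DataFrame({
    "location": ["A"] * 8 + ["B"] * 8,
    "date": list(dates) * 2,
    "new_cases": list(range(1, 9)) + [100] * 8,
})

# Sort so each location's rows are in date order before rolling.
frame = frame.sort_values(["location", "date"])

# transform returns a result aligned to frame's index, one window per location.
frame["new_cases_ma7"] = (
    frame.groupby("location")["new_cases"]
    .transform(lambda s: s.rolling(7).mean())
)

ma_a = frame.loc[frame["location"] == "A", "new_cases_ma7"]
ma_b = frame.loc[frame["location"] == "B", "new_cases_ma7"]
print(ma_a.iloc[7])              # 5.0, the mean of 2..8
print(int(ma_b.isna().sum()))    # 6, B's window starts fresh
```

The resulting new_cases_ma7 column could then be passed as y to a line plot with hue="location" instead of the raw daily counts.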
Compare countries
import seaborn as sns
import matplotlib.pyplot as plt
countries = ["India", "United States", "Brazil"]
sub = df[df["location"].isin(countries)].copy()
sub = sub.sort_values("date")
plt.figure(figsize=(10, 4))
sns.lineplot(data=sub, x="date", y="new_cases", hue="location")
plt.title("New cases comparison")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()
Deliverable
Write a short report:
- When did peaks occur?
- Is the trend seasonal?
- What data quality issues exist (missing dates/values)?
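Two of these questions can be answered partly in code. The sketch below, on a made-up series for one country, finds the date of the highest daily count and lists calendar days that have no row at all; the variable names and values are illustrative assumptions, and judging seasonality is left to inspection of the plots.

```python
import pandas as pd

# Made-up series for one country: 2021-05-04 has no row,
# and 2021-05-03 has a row but no value.
sub = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-05-01", "2021-05-02", "2021-05-03", "2021-05-05", "2021-05-06"]
    ),
    "new_cases": [100.0, 250.0, None, 180.0, 90.0],
})

# Date of the highest daily count (idxmax skips NaN by default).
peak_date = sub.loc[sub["new_cases"].idxmax(), "date"]

# Calendar days between the first and last date that are absent from the data.
full_range = pd.date_range(sub["date"].min(), sub["date"].max(), freq="D")
missing_dates = full_range.difference(pd.DatetimeIndex(sub["date"]))

# Rows that exist but carry no value.
n_missing_values = int(sub["new_cases"].isna().sum())

print(peak_date.date())                  # 2021-05-02
print([d.date() for d in missing_dates]) # [datetime.date(2021, 5, 4)]
print(n_missing_values)                  # 1
```

On real data, the peak of the 7-day average is usually a steadier answer than the peak of the raw daily counts, since single-day reporting spikes can dominate the latter.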
