Covid-19 Data Analysis & Visualization

Goal

Build a small Covid-19 analysis project:

  • Load time series case data
  • Clean/parse dates
  • Plot trends and moving averages
  • Compare regions

Data sources

Possible sources (the code in this guide uses OWID-style column names: location, date, new_cases):

  • Our World in Data (OWID)
  • Johns Hopkins dataset
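
If you start from the Johns Hopkins time series files instead of OWID, the layout is different: one row per region and one column per date, holding cumulative counts. The sketch below reshapes that wide layout into the long location/date/new_cases form used in the rest of this guide. The two regions and their numbers are made up for illustration, and total_cases is simply a name chosen here for the cumulative column.

```python
import io
import pandas as pd

# Tiny made-up sample in the JHU "wide" layout: one row per region,
# one column per date, values are cumulative confirmed cases.
wide_csv = io.StringIO(
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20\n"
    ",Atlantis,0.0,0.0,0,2,5\n"
    ",Erewhon,1.0,1.0,1,1,4\n"
)
wide = pd.read_csv(wide_csv)

# Turn the date columns into rows (wide -> long).
id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
long_df = wide.melt(id_vars=id_cols, var_name="date", value_name="total_cases")
long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")

# Rename to the column name the rest of this guide uses.
long_df = long_df.rename(columns={"Country/Region": "location"})
long_df = long_df.sort_values(["location", "date"]).reset_index(drop=True)

# Daily new cases are the day-over-day difference of the cumulative count.
long_df["new_cases"] = long_df.groupby("location")["total_cases"].diff()

print(long_df[["location", "date", "total_cases", "new_cases"]])
```

The first day of each location has no previous value, so its new_cases is NaN. In the real files some countries are split across several province rows, so you would probably sum by country and date before taking the difference.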

Step 1: Load and inspect

Load data
import pandas as pd
 
df = pd.read_csv("data/covid.csv")
print(df.shape)
print(df.head())
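
Beyond shape and head, it can help to check column types and missing values right after loading. The sketch below is one possible variant, not the only way to do it: it reads an in-memory sample (made-up rows standing in for data/covid.csv), keeps only the columns used later, and parses the date column at load time.

```python
import io
import pandas as pd

# Made-up rows standing in for data/covid.csv.
sample = io.StringIO(
    "location,date,new_cases,extra\n"
    "Atlantis,2021-01-01,10,x\n"
    "Atlantis,2021-01-02,,y\n"
)

# usecols drops columns we do not need; parse_dates converts "date" while reading.
df_small = pd.read_csv(
    sample,
    usecols=["location", "date", "new_cases"],
    parse_dates=["date"],
)

print(df_small.dtypes)        # "date" should be a datetime type
print(df_small.isna().sum())  # missing values per column
```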

Step 2: Parse dates and subset

Parse date
import pandas as pd
 
if "date" in df.columns:
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
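
With errors="coerce", any value that cannot be parsed becomes NaT instead of raising an error, so it is worth counting those rows before subsetting. The sketch below shows one way to drop unparsed dates and keep a date window. The toy frame and the chosen start and end dates are made up for illustration.

```python
import pandas as pd

# Made-up rows, including one value that is not a date.
toy = pd.DataFrame({
    "location": ["Atlantis"] * 4,
    "date": ["2021-01-01", "not a date", "2021-02-15", "2021-03-01"],
    "new_cases": [5, 7, 9, 11],
})
toy["date"] = pd.to_datetime(toy["date"], errors="coerce")

# Count and drop rows whose date could not be parsed (NaT).
n_bad = int(toy["date"].isna().sum())
toy = toy.dropna(subset=["date"])

# Keep only rows inside a date window (both ends inclusive).
start, end = pd.Timestamp("2021-01-01"), pd.Timestamp("2021-02-28")
window = toy[toy["date"].between(start, end)]

print(f"dropped {n_bad} unparsed row(s)")
print(window)
```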

Step 3: Trend plot

Trend
import matplotlib.pyplot as plt
 
country = "India"
sub = df[df["location"] == country].sort_values("date")
 
plt.figure(figsize=(10, 4))
plt.plot(sub["date"], sub["new_cases"], label="new_cases")
plt.title(f"New cases over time - {country}")
plt.xlabel("Date")
plt.ylabel("New cases")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()

Step 4: Moving average

Rolling average
sub = sub.copy()
sub["new_cases_ma7"] = sub["new_cases"].rolling(7).mean()
 
plt.figure(figsize=(10, 4))
plt.plot(sub["date"], sub["new_cases"], alpha=0.4, label="daily")
plt.plot(sub["date"], sub["new_cases_ma7"], label="7-day avg")
plt.title(f"New cases (7-day avg) - {country}")
plt.legend()
plt.tight_layout()
plt.show()
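
Note that rolling(7).mean() leaves the first six rows as NaN, because a full 7-day window is not yet available, and it is a trailing average, so the smoothed curve lags the daily one. The sketch below compares the default with two common variants on a small made-up series. Which one suits your report is a judgment call.

```python
import pandas as pd

# Made-up daily counts, just to show how the window options behave.
s = pd.Series([7, 14, 21, 28, 35, 42, 49, 56], name="new_cases")

# Default: NaN until 7 values are available (trailing window).
strict = s.rolling(7).mean()

# min_periods=1: average whatever is available so far, so no leading NaN.
lenient = s.rolling(7, min_periods=1).mean()

# center=True: window centred on each day (3 days before, 3 after).
centered = s.rolling(7, center=True).mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient, "centered": centered}))
```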

Step 5: Compare regions

Compare countries
import seaborn as sns
import matplotlib.pyplot as plt
 
countries = ["India", "United States", "Brazil"]
sub = df[df["location"].isin(countries)].copy()
sub = sub.sort_values("date")
 
plt.figure(figsize=(10, 4))
sns.lineplot(data=sub, x="date", y="new_cases", hue="location")
plt.title("New cases comparison")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()
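
Raw counts are hard to compare between countries of very different size. One option is to normalize by population before plotting. The sketch below assumes a population column is available (OWID exports have typically included one, but check your file). The locations, counts, and populations here are made up.

```python
import pandas as pd

# Made-up data: two locations with very different populations.
toy = pd.DataFrame({
    "location": ["Atlantis", "Atlantis", "Erewhon", "Erewhon"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 2),
    "new_cases": [100, 200, 50, 60],
    "population": [2_000_000, 2_000_000, 500_000, 500_000],
})

# New cases per million inhabitants.
toy["new_cases_per_million"] = toy["new_cases"] * 1_000_000 / toy["population"]

# One column per location makes side-by-side comparison easy.
wide = toy.pivot(index="date", columns="location", values="new_cases_per_million")
print(wide)
```

In this toy data Atlantis reports more cases in absolute terms, but Erewhon has the higher rate per million. You could pass y="new_cases_per_million" to the same lineplot call to plot the normalized values.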

Deliverable

Write a short report:

  • When did peaks occur?
  • Is the trend seasonal?
  • What data quality issues exist (missing dates/values)?
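
A few of these questions can be answered with short checks. The sketch below, on a made-up single-country series, finds the peak date, counts missing values, and lists calendar dates that are absent from the data. It does not address seasonality, which usually needs a longer series and more careful analysis.

```python
import pandas as pd

# Made-up series for one location: 2021-01-03 is absent and one value is missing.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04", "2021-01-05"]),
    "new_cases": [10.0, 40.0, None, 25.0],
})
series = toy.set_index("date")["new_cases"]

# Date of the highest daily count (NaN values are skipped).
peak_date = series.idxmax()

# Rows that exist but have no value.
n_missing_values = int(series.isna().sum())

# Calendar days between first and last date that have no row at all.
full_range = pd.date_range(series.index.min(), series.index.max(), freq="D")
missing_dates = full_range.difference(series.index)

print("peak:", peak_date.date())
print("missing values:", n_missing_values)
print("missing dates:", [d.date() for d in missing_dates])
```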
