Skip to content

E-commerce Sales Analysis

Goal

Given an e-commerce orders dataset, answer:

  • Overall revenue and order trends
  • Top products / categories
  • Average order value (AOV)
  • Revenue by segment (city, category)

Example columns

  • order_idorder_id, order_dateorder_date, customer_idcustomer_id
  • productproduct, categorycategory, qtyqty, priceprice
  • citycity, statusstatus

Step 1: Load and basic prep

Load
import pandas as pd
 
df = pd.read_csv("data/orders.csv")
 
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
 
df["revenue"] = df["qty"] * df["price"]
 
print(df.shape)
print(df.head())
Load
import pandas as pd
 
df = pd.read_csv("data/orders.csv")
 
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
 
df["revenue"] = df["qty"] * df["price"]
 
print(df.shape)
print(df.head())

Step 2: KPIs

KPIs
orders = df["order_id"].nunique()
customers = df["customer_id"].nunique()
revenue = df["revenue"].sum()
aov = revenue / orders
 
print("orders:", orders)
print("customers:", customers)
print("revenue:", revenue)
print("aov:", aov)
KPIs
orders = df["order_id"].nunique()
customers = df["customer_id"].nunique()
revenue = df["revenue"].sum()
aov = revenue / orders
 
print("orders:", orders)
print("customers:", customers)
print("revenue:", revenue)
print("aov:", aov)

Step 3: Revenue trend

Monthly revenue
import matplotlib.pyplot as plt
 
monthly = (
    df.dropna(subset=["order_date"])
      .groupby(df["order_date"].dt.to_period("M"))
      .agg(revenue=("revenue", "sum"), orders=("order_id", "nunique"))
      .reset_index()
)
 
monthly["order_date"] = monthly["order_date"].astype(str)
 
plt.figure(figsize=(10, 4))
plt.plot(monthly["order_date"], monthly["revenue"], marker="o")
plt.title("Monthly revenue")
plt.xticks(rotation=30)
plt.tight_layout()
plt.show()
Monthly revenue
import matplotlib.pyplot as plt
 
monthly = (
    df.dropna(subset=["order_date"])
      .groupby(df["order_date"].dt.to_period("M"))
      .agg(revenue=("revenue", "sum"), orders=("order_id", "nunique"))
      .reset_index()
)
 
monthly["order_date"] = monthly["order_date"].astype(str)
 
plt.figure(figsize=(10, 4))
plt.plot(monthly["order_date"], monthly["revenue"], marker="o")
plt.title("Monthly revenue")
plt.xticks(rotation=30)
plt.tight_layout()
plt.show()

Step 4: Top products

Top products
import seaborn as sns
import matplotlib.pyplot as plt
 
top = (
    df.groupby("product")
      .agg(revenue=("revenue", "sum"))
      .sort_values("revenue", ascending=False)
      .head(10)
      .reset_index()
)
 
plt.figure(figsize=(10, 4))
sns.barplot(data=top, x="revenue", y="product")
plt.title("Top 10 products by revenue")
plt.tight_layout()
plt.show()
Top products
import seaborn as sns
import matplotlib.pyplot as plt
 
top = (
    df.groupby("product")
      .agg(revenue=("revenue", "sum"))
      .sort_values("revenue", ascending=False)
      .head(10)
      .reset_index()
)
 
plt.figure(figsize=(10, 4))
sns.barplot(data=top, x="revenue", y="product")
plt.title("Top 10 products by revenue")
plt.tight_layout()
plt.show()

Deliverable

Summarize:

  • Revenue trend (up/down)
  • Main revenue drivers (top products/categories)
  • Any anomalies (returns, cancellations)

If this helped you, consider buying me a coffee ☕

Buy me a coffee

Was this page helpful?

Let us know how we did