Skip to content

Grouping and Aggregations (groupby, agg)

The idea

groupbygroupby answers questions like:

  • Average revenue per city?
  • Total orders per customer?
  • Highest score per class?

Think: split โ†’ apply โ†’ combine.

Example dataset

Sales sample
import pandas as pd
 
sales = pd.DataFrame({
    "city": ["pune", "pune", "delhi", "delhi", "delhi"],
    "category": ["A", "B", "A", "A", "B"],
    "amount": [100, 200, 150, 120, 300],
})
 
print(sales)
Sales sample
import pandas as pd
 
sales = pd.DataFrame({
    "city": ["pune", "pune", "delhi", "delhi", "delhi"],
    "category": ["A", "B", "A", "A", "B"],
    "amount": [100, 200, 150, 120, 300],
})
 
print(sales)

Basic aggregations

Sum per city

Sum per city
print(sales.groupby("city")["amount"].sum())
Sum per city
print(sales.groupby("city")["amount"].sum())

Mean per city

Mean per city
print(sales.groupby("city")["amount"].mean())
Mean per city
print(sales.groupby("city")["amount"].mean())

Multiple aggregations with aggagg

Multiple metrics
summary = (
    sales.groupby("city")
    .agg(
        total_amount=("amount", "sum"),
        avg_amount=("amount", "mean"),
        orders=("amount", "count"),
    )
)
 
print(summary)
Multiple metrics
summary = (
    sales.groupby("city")
    .agg(
        total_amount=("amount", "sum"),
        avg_amount=("amount", "mean"),
        orders=("amount", "count"),
    )
)
 
print(summary)

Group by multiple keys

Group by city and category
summary = (
    sales.groupby(["city", "category"])
    .agg(total_amount=("amount", "sum"), orders=("amount", "count"))
    .reset_index()
)
 
print(summary)
Group by city and category
summary = (
    sales.groupby(["city", "category"])
    .agg(total_amount=("amount", "sum"), orders=("amount", "count"))
    .reset_index()
)
 
print(summary)

Common gotchas

  • After groupbygroupby, the grouped columns become the index. Use .reset_index().reset_index() if you want them as normal columns.
  • Aggregation functions ignore missing numeric values by default (NaNNaN) in many cases.

Mental model

When you write:

sales.groupby("city")["amount"].sum()
sales.groupby("city")["amount"].sum()

You are saying:

  • Split the rows by citycity
  • Select the amountamount column
  • Sum within each group

This pattern comes up constantly in analytics.

๐Ÿงช Try It Yourself

Exercise 1 โ€“ Create a DataFrame

Exercise 2 โ€“ Select a Column

Exercise 3 โ€“ Filter Rows

If this helped you, consider buying me a coffee โ˜•

Buy me a coffee

Was this page helpful?

Let us know how we did