Grouping and Aggregations (groupby, agg)
The idea
groupbygroupby answers questions like:
- Average revenue per city?
- Total orders per customer?
- Highest score per class?
Think: split โ apply โ combine.
Example dataset
Sales sample
import pandas as pd
sales = pd.DataFrame({
"city": ["pune", "pune", "delhi", "delhi", "delhi"],
"category": ["A", "B", "A", "A", "B"],
"amount": [100, 200, 150, 120, 300],
})
print(sales)Sales sample
import pandas as pd
sales = pd.DataFrame({
"city": ["pune", "pune", "delhi", "delhi", "delhi"],
"category": ["A", "B", "A", "A", "B"],
"amount": [100, 200, 150, 120, 300],
})
print(sales)Basic aggregations
Sum per city
Sum per city
print(sales.groupby("city")["amount"].sum())Sum per city
print(sales.groupby("city")["amount"].sum())Mean per city
Mean per city
print(sales.groupby("city")["amount"].mean())Mean per city
print(sales.groupby("city")["amount"].mean())Multiple aggregations with aggagg
Multiple metrics
summary = (
sales.groupby("city")
.agg(
total_amount=("amount", "sum"),
avg_amount=("amount", "mean"),
orders=("amount", "count"),
)
)
print(summary)Multiple metrics
summary = (
sales.groupby("city")
.agg(
total_amount=("amount", "sum"),
avg_amount=("amount", "mean"),
orders=("amount", "count"),
)
)
print(summary)Group by multiple keys
Group by city and category
summary = (
sales.groupby(["city", "category"])
.agg(total_amount=("amount", "sum"), orders=("amount", "count"))
.reset_index()
)
print(summary)Group by city and category
summary = (
sales.groupby(["city", "category"])
.agg(total_amount=("amount", "sum"), orders=("amount", "count"))
.reset_index()
)
print(summary)Common gotchas
- After
groupbygroupby, the grouped columns become the index. Use.reset_index().reset_index()if you want them as normal columns. - Aggregation functions ignore missing numeric values by default (
NaNNaN) in many cases.
Mental model
When you write:
sales.groupby("city")["amount"].sum()sales.groupby("city")["amount"].sum()You are saying:
- Split the rows by
citycity - Select the
amountamountcolumn - Sum within each group
This pattern comes up constantly in analytics.
๐งช Try It Yourself
Exercise 1 โ Create a DataFrame
Exercise 2 โ Select a Column
Exercise 3 โ Filter Rows
If this helped you, consider buying me a coffee โ
Buy me a coffeeWas this page helpful?
Let us know how we did
