Data Inspection (head, tail, info, describe)

The first 60 seconds with a new dataset

Whenever you load a new dataset, run these checks before doing anything else.

1) Check shape

Shape
import pandas as pd
 
df = pd.read_csv("data/sales.csv")
print(df.shape)  # (rows, columns)
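Since df.shape is a plain (rows, columns) tuple, you can unpack it. A minimal sketch that uses a small made-up DataFrame in place of data/sales.csv so it runs anywhere:

```python
import pandas as pd

# Made-up stand-in for data/sales.csv so the example is self-contained
df = pd.DataFrame({
    "city": ["Lagos", "Accra", "Lagos", "Nairobi"],
    "amount": [120.0, 80.5, 99.9, 150.0],
})

# shape is a (rows, columns) tuple, so it unpacks directly
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# len(df) is another common way to get the row count
print(len(df) == n_rows)  # True
```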

2) Preview rows

Head / Tail
print(df.head())
print(df.tail())
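Both head() and tail() return 5 rows by default and take a row count as their first argument. A quick check on a made-up 10-row frame:

```python
import pandas as pd

# Made-up 10-row frame to show how many rows each call returns
df = pd.DataFrame({
    "order_id": range(1, 11),
    "amount": [10 * i for i in range(1, 11)],
})

first = df.head()    # first 5 rows by default
last_3 = df.tail(3)  # last 3 rows (order_id 8, 9, 10)

print(first)
print(last_3)
```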

3) Random sample

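Head and tail only show the edges of the file; a random sample pulls rows from anywhere, which makes it great for spotting odd values.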

Sample rows
print(df.sample(5, random_state=42))
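Passing random_state makes the draw reproducible: the same seed returns the same rows every time you rerun the notebook. You can also ask for a fraction of rows with frac instead of a fixed count. A small sketch on a made-up frame:

```python
import pandas as pd

# Made-up 100-row frame
df = pd.DataFrame({"order_id": range(1, 101), "amount": range(100, 200)})

# Same seed -> same rows, which keeps notebooks reproducible
a = df.sample(5, random_state=42)
b = df.sample(5, random_state=42)
print(a.equals(b))  # True

# frac=0.1 draws 10% of the rows instead of a fixed count
tenth = df.sample(frac=0.1, random_state=42)
print(len(tenth))  # 10
```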

4) Columns and dtypes

Info
df.info()

This tells you:

  • Column names
  • Non-null counts
  • Dtypes
  • Memory usage (helpful when data grows)
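For object (string) columns, the memory figure that info() prints by default is typically an estimate marked with a trailing +, because the strings themselves are not measured. Passing memory_usage="deep" measures them. If you need these facts in code rather than on screen, df.dtypes, df.notna().sum() and df.memory_usage(deep=True) return them as Series. A sketch with a made-up frame:

```python
import pandas as pd

# Made-up frame with one missing value per column
df = pd.DataFrame({
    "city": ["Lagos", "Accra", "Lagos", None],
    "amount": [120.0, 80.5, None, 150.0],
})

# Deep introspection counts the actual string contents
df.info(memory_usage="deep")

# The same facts as Series you can use programmatically
print(df.dtypes)                    # dtype per column
print(df.notna().sum())             # non-null counts per column
print(df.memory_usage(deep=True))   # bytes per column, plus the index
```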

5) Descriptive stats

Describe
print(df.describe())
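By default describe() covers numeric columns only and returns a DataFrame with the rows count, mean, std, min, 25%, 50% (the median), 75% and max, so individual numbers are easy to pull out. A sketch with made-up amounts:

```python
import pandas as pd

# Made-up data: one text column, one numeric column
df = pd.DataFrame({
    "city": ["Lagos", "Accra", "Lagos", "Nairobi"],
    "amount": [10.0, 20.0, 30.0, 40.0],
})

stats = df.describe()  # "city" is left out: numeric columns only by default
print(stats)

# describe() returns a DataFrame, so single numbers are easy to look up
print(stats.loc["mean", "amount"])  # 25.0
print(stats.loc["50%", "amount"])   # 25.0 (the median)
```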

For text (object) columns, which describe() leaves out by default when numeric columns are present:

Describe object columns
print(df.describe(include=["object"]))
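For text columns the summary switches to count, unique, top (the most frequent value) and freq (how often top appears). Calling describe() on a single string column shows the same four rows; a sketch with a made-up city column:

```python
import pandas as pd

# Made-up city column where "Lagos" is the most frequent value
cities = pd.Series(["Lagos", "Accra", "Lagos", "Nairobi"], name="city")

summary = cities.describe()
print(summary)
# count, unique, top, freq -> 4, 3, "Lagos", 2
```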

Core sanity checks

Missing values per column

Missing values
missing = df.isna().sum().sort_values(ascending=False)
print(missing)
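Raw counts are hard to compare across datasets of different sizes. Because isna() returns booleans, taking the mean instead of the sum gives the fraction of missing values per column. A sketch with a made-up frame:

```python
import pandas as pd

# Made-up frame: city is half missing, amount a quarter missing
df = pd.DataFrame({
    "city": ["Lagos", None, "Lagos", None],
    "amount": [120.0, 80.5, None, 150.0],
})

# The mean of True/False values is the share of True (missing) cells
missing_pct = (df.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct)  # city 50.0, amount 25.0
```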

Duplicates

Duplicate rows
print("duplicate rows:", df.duplicated().sum())
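duplicated() flags each repeat after the first occurrence, so the count above does not include the originals. To check duplicates on a key column rather than on whole rows, pass subset; to see every copy side by side, pass keep=False. A sketch where order_id is a hypothetical key:

```python
import pandas as pd

# Made-up orders: order 2 is an exact repeat, order 3 repeats with a new amount
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 3],
    "amount": [10, 20, 20, 30, 35],
})

# Whole-row duplicates: only the second (2, 20) row is flagged
print(df.duplicated().sum())  # 1

# Key-based duplicates: the second order 2 and the second order 3
print(df.duplicated(subset=["order_id"]).sum())  # 2

# keep=False flags every copy, handy for eyeballing them together
print(df[df.duplicated(subset=["order_id"], keep=False)])
```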

Value counts for a categorical column

Value counts
print(df["city"].value_counts(dropna=False).head(10))
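With normalize=True, value_counts() returns shares instead of counts, which makes it easier to spot a dominant or suspiciously rare category; dropna=False still keeps missing values as their own bucket. A sketch on a made-up city column:

```python
import pandas as pd

# Made-up city column with one missing value
cities = pd.Series(["Lagos", "Lagos", "Accra", None], name="city")

# dropna=False keeps the missing value as its own bucket;
# normalize=True turns counts into shares of all rows
shares = cities.value_counts(dropna=False, normalize=True)
print(shares)
```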

Tip: create an inspection helper

Quick inspection helper
import pandas as pd
 
def inspect(df: pd.DataFrame, n: int = 5) -> None:
    """Print shape, column names, the first n rows, and missing counts per column."""
    print("shape:", df.shape)
    print("columns:", list(df.columns))
    print("\nhead:")
    print(df.head(n))
    print("\nmissing:")
    print(df.isna().sum())
 
# Usage: inspect(df)
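As the helper grows, it can be handy to collect per-column facts into one table instead of printing them piece by piece. column_summary below is a hypothetical extension, not part of pandas; it builds one row per column with its dtype, missing count and share, and number of distinct values:

```python
import pandas as pd


def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with dtype, missing and unique counts."""
    # Every Series below is indexed by column name, so they align row by row
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),  # missing values are not counted
    })


# Made-up data for illustration
df = pd.DataFrame({
    "city": ["Lagos", "Accra", None, "Lagos"],
    "amount": [120.0, 80.5, 99.9, 150.0],
})
print(column_summary(df))
```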

🧪 Try It Yourself

Exercise 1 – Create a DataFrame

Exercise 2 – Select a Column

Exercise 3 – Filter Rows
