Introduction to Pandas

What is Pandas?

Pandas is a Python library for working with structured data.

It gives you two core data structures:

Series: a 1‑dimensional labeled array (like a single column)
DataFrame: a 2‑dimensional labeled table (like a spreadsheet / SQL table)
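
To make the two structures concrete, here is a minimal sketch. The fruit names, prices, and stock counts are made-up sample data, not part of any real dataset.

```python
import pandas as pd

# A Series: one labeled column of values (the index supplies the labels).
prices = pd.Series(
    [1.20, 0.50, 3.00],
    index=["apple", "banana", "cherry"],
    name="price",
)

# A DataFrame: a table whose columns share one index.
fruit = pd.DataFrame(
    {"price": [1.20, 0.50, 3.00], "stock": [10, 25, 4]},
    index=["apple", "banana", "cherry"],
)

print(prices["banana"])      # look up one value by its label
print(fruit.shape)           # (number of rows, number of columns)
print(type(fruit["price"]))  # selecting one column gives back a Series
```

Selecting a single column of a DataFrame returns a Series, which is why the two structures are usually learned together.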

Pandas is built on top of NumPy, so many NumPy concepts (arrays, vectorized operations, and NaN as a marker for missing values) show up again here.
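
A short sketch of how those NumPy ideas carry over. The temperature readings are invented for illustration, and the missing reading is represented with NumPy's NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical readings in Celsius; the second one is missing.
temps_c = pd.Series([21.5, np.nan, 19.0], name="temp_c")

# Vectorized arithmetic: applied to every element, no explicit loop.
temps_f = temps_c * 9 / 5 + 32

# The underlying values can be pulled out as a NumPy array.
as_array = temps_c.to_numpy()

print(temps_f.tolist())
print(type(as_array))
print(temps_c.isna().sum())  # number of missing values
print(temps_c.mean())        # pandas skips NaN by default when averaging
```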

Why Pandas is so popular in Data Analytics

In real data work, you spend a lot of time:

Reading data from CSV/Excel/JSON/APIs
Cleaning messy values
Handling missing data
Filtering and transforming rows/columns
Aggregating data into summaries
Preparing datasets for visualization and machine learning
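
The steps above can be sketched end to end in a few lines. This is only an illustration: the sales data is made up and read from an in-memory string so the example runs without any file on disk, and the column names are assumptions.

```python
import io
import pandas as pd

# Hypothetical raw data; in practice this would come from a file or an API.
raw_csv = io.StringIO(
    "city,product,units,price\n"
    "Oslo,widget,3,10.0\n"
    " oslo ,gadget,,25.0\n"
    "Bergen,widget,5,10.0\n"
    "Bergen,gadget,2,25.0\n"
)

sales = pd.read_csv(raw_csv)                            # read the data
sales["city"] = sales["city"].str.strip().str.title()   # clean messy values
sales["units"] = sales["units"].fillna(0).astype(int)   # handle missing data
sales["revenue"] = sales["units"] * sales["price"]      # transform columns
sold = sales[sales["units"] > 0]                        # filter rows
summary = sold.groupby("city")["revenue"].sum()         # aggregate into a summary

print(summary)
```

Filling the missing unit count with 0 is a choice made for this sketch; depending on the data, dropping the row or leaving it as missing may be more appropriate.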

Pandas is designed for exactly this.

When (not) to use Pandas

Pandas is great for:

Small-to-medium datasets that fit in memory
Exploratory analysis
Data cleaning and feature engineering

Pandas may not be ideal for:

Huge datasets that don’t fit in memory (consider DuckDB, Polars, or Spark instead)
Highly parallel compute workloads (most Pandas operations run on a single core)
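
One rough way to judge whether a dataset "fits in memory" is to ask Pandas how many bytes a DataFrame occupies. The sketch below builds a made-up table of 100,000 rows; the size and column names are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical table: one numeric column and one text column.
n_rows = 100_000
df = pd.DataFrame(
    {
        "value": np.arange(n_rows, dtype="float64"),
        "label": ["a"] * n_rows,
    }
)

# deep=True asks pandas to measure the text data itself, not just references to it.
bytes_used = df.memory_usage(deep=True).sum()
print(f"{bytes_used / 1e6:.1f} MB in memory")
```

If the number reported for a sample of your data, scaled up to the full row count, approaches the RAM available on your machine, one of the alternatives listed above is likely a better fit.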

Installing Pandas

If you already installed libraries in Phase 1, you probably have it.

Install with pip

Install pandas with pip

pip install pandas

Install with conda

Install pandas with conda

conda install pandas

Your first Pandas import

Import pandas

import pandas as pd  # "pd" is the conventional alias

print(pd.__version__)  # confirms the install and shows which version you have

Quick mental model: DataFrame thinking

A DataFrame is basically:

Rows (observations / records)
Columns (features / fields)

Common questions you’ll ask:

What columns do I have?
How many rows?
Are there missing values?
How do I filter rows?
How do I compute group summaries?

We’ll answer all of these in this phase.
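
As a preview, each of those questions maps to roughly one line of Pandas. A minimal sketch, assuming a small made-up table of names, teams, and scores:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
people = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cho", "Dev"],
        "team": ["red", "blue", "red", "blue"],
        "score": [88.0, np.nan, 95.0, 70.0],
    }
)

print(people.columns.tolist())                 # What columns do I have?
print(len(people))                             # How many rows?
print(people.isna().sum())                     # Are there missing values? (count per column)
print(people[people["score"] > 80])            # How do I filter rows?
print(people.groupby("team")["score"].mean())  # How do I compute group summaries?
```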

🧪 Try It Yourself

Exercise 1 – Create a DataFrame

Exercise 2 – Select a Column

Exercise 3 – Filter Rows
