Introduction to Pandas
What is Pandas?
Pandas is a Python library for working with structured data.
It gives you two core data structures:
- Series: a 1-dimensional labeled array (like a single column)
- DataFrame: a 2-dimensional labeled table (like a spreadsheet or SQL table)
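Here is a minimal sketch of both structures side by side. The fruit names, prices and the `in_stock` column are made-up sample data, not anything from a real dataset.

```python
import pandas as pd

# A Series: one labeled column of values (the labels form its index)
prices = pd.Series([2.5, 3.0, 1.75], index=["apple", "banana", "cherry"], name="price")

# A DataFrame: a table of labeled columns that share one row index
df = pd.DataFrame(
    {
        "fruit": ["apple", "banana", "cherry"],
        "price": [2.5, 3.0, 1.75],
        "in_stock": [True, False, True],
    }
)

print(prices["banana"])   # look up a value by its label: 3.0
print(type(df["price"]))  # selecting one DataFrame column gives back a Series
print(df.shape)           # (rows, columns): (3, 3)
```

Selecting a single column from a DataFrame returning a Series is why the two structures feel so closely related in practice.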
Pandas is built on top of NumPy, so many NumPy concepts (like arrays, vectorization, and missing values) show up again here.
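A small sketch of those NumPy ideas showing up in pandas, using an illustrative Series of four numbers with one missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0])

# Vectorization: the arithmetic applies to every element at once, no Python loop
doubled = s * 2

# Missing float values are stored as NumPy's NaN; most reductions skip them by default
print(s.isna().sum())  # number of missing values: 1
print(s.mean())        # mean of 1, 2 and 4 only (the NaN is skipped)

# The underlying values are available as a plain NumPy array
arr = s.to_numpy()
print(type(arr))       # <class 'numpy.ndarray'>
```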
Why Pandas is so popular in Data Analytics
In real data work, you spend a lot of time:
- Reading data from CSV/Excel/JSON/APIs
- Cleaning messy values
- Handling missing data
- Filtering and transforming rows/columns
- Aggregating data into summaries
- Preparing datasets for visualization and machine learning
Pandas is designed for exactly this.
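To make the list above concrete, here is a minimal end-to-end sketch covering reading, cleaning, handling missing data, filtering and aggregating. The CSV text is hypothetical and is read from an in-memory `io.StringIO` buffer so the example runs without a file; real data would usually come from a path or URL.

```python
import io
import pandas as pd

# Hypothetical messy CSV: stray spaces, inconsistent casing, one missing sales value
raw = io.StringIO(
    "region,product,sales\n"
    " North ,widget,100\n"
    "south,widget,\n"
    "North,gadget,250\n"
    "SOUTH,gadget,80\n"
)

df = pd.read_csv(raw)                                  # read the data
df["region"] = df["region"].str.strip().str.title()    # clean messy text values
df["sales"] = df["sales"].fillna(0)                    # handle missing data
big = df[df["sales"] > 50]                             # filter rows
summary = big.groupby("region")["sales"].sum()         # aggregate into a summary
print(summary)
```

Each line maps to one of the bullet points; in real projects each step is usually much longer, but the overall shape stays the same.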
When (not) to use Pandas
Pandas is great for:
- Small-to-medium datasets that fit in memory
- Exploratory analysis
- Data cleaning and feature engineering
Pandas may not be ideal for:
- Huge datasets that don't fit in memory (consider DuckDB, Polars, or Spark instead)
- Highly parallel compute workloads
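One rough way to judge whether a dataset "fits in memory" is to ask pandas how much memory a DataFrame uses. The sketch below builds a hypothetical 1-million-row table; with your own data you would call `memory_usage` on the DataFrame you actually loaded and compare the result with your machine's RAM. Many pandas operations return new copies, so leave generous headroom.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-million-row table, only for illustrating the check
n = 1_000_000
df = pd.DataFrame(
    {
        "id": np.arange(n, dtype="int64"),             # 8 bytes per value
        "score": np.random.default_rng(0).random(n),   # float64: 8 bytes per value
    }
)

# deep=True also measures the contents of object (e.g. string) columns
total_bytes = df.memory_usage(deep=True).sum()
print(f"{total_bytes / 1e6:.1f} MB")
```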
Installing Pandas
If you already installed the libraries from Phase 1, you probably have pandas.
Install with pip
pip install pandas
Install with conda
conda install pandas
Your first Pandas import
import pandas as pd  # "pd" is the conventional alias
print(pd.__version__)  # confirm pandas is installed and see which version you have
Quick mental model: DataFrame thinking
A DataFrame is basically:
- Rows (observations / records)
- Columns (features / fields)
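A small sketch of the rows-and-columns view, using a made-up three-person table:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Chloe"],
        "age": [34, 28, 45],
        "city": ["Lisbon", "Berlin", "Paris"],
    }
)

n_rows, n_cols = df.shape     # rows = observations, columns = features
first_record = df.iloc[0]     # one row: a single observation (returned as a Series)
ages = df["age"]              # one column: a single feature across all rows

print(n_rows, n_cols)         # 3 3
print(first_record["name"])   # Ana
print(ages.tolist())          # [34, 28, 45]
```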
Common questions you'll ask:
- What columns do I have?
- How many rows?
- Are there missing values?
- How do I filter rows?
- How do I compute group summaries?
We'll answer all of these in this phase.
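As a preview, each of those questions maps to roughly one line of pandas. This is only a sketch on a small, made-up `team`/`points` table with one missing value; later sections cover each operation properly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["A", "B", "A", "B", "A"],
        "points": [10, 7, np.nan, 12, 5],
    }
)

cols = list(df.columns)                         # What columns do I have?
n_rows = len(df)                                # How many rows?
missing = df.isna().sum()                       # Are there missing values? (count per column)
high = df[df["points"] >= 10]                   # How do I filter rows?
per_team = df.groupby("team")["points"].mean()  # How do I compute group summaries?

print(cols, n_rows)
print(missing)
print(high)
print(per_team)
```

Note that the NaN row simply fails the `>= 10` filter and is skipped by `mean()`, so missing values affect results quietly unless you check for them first.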
🧪 Try It Yourself
Exercise 1: Create a DataFrame
Exercise 2: Select a Column
Exercise 3: Filter Rows
