Virtual Environments for Data Science
What is a virtual environment?
A virtual environment is an isolated Python setup for a specific project.
It keeps:
- Python version (conda installs one per environment; venv reuses the interpreter that created it)
- Installed libraries
- Tooling (Jupyter, linters, etc.)
separate from other projects.
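One way to see the isolation in practice is to ask the interpreter where it lives. The sketch below uses only the standard library; the helper name in_virtual_environment is made up for illustration. It relies on the documented behavior that, inside an environment created by venv, sys.prefix points at the environment while sys.base_prefix points at the Python installation it was created from. A conda environment is a full installation of its own, so this particular check normally reports False there.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix is the environment directory and
    # sys.base_prefix is the installation the environment was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Environment prefix:", sys.prefix)
    print("Inside a virtual environment:", in_virtual_environment())
```

Running this once with the environment activated and once without should show the prefix change.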
Why it’s essential in data analytics
Data analytics projects often depend on:
- Specific versions of NumPy/Pandas/Matplotlib
- Jupyter
- Database drivers
- Visualization libraries
If you install everything globally, you will eventually face:
- Version conflicts
- “It worked yesterday” problems
- Broken notebooks after updates
Virtual environments prevent most of those issues.
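When a notebook breaks after an update, the first question is usually which versions are actually installed. As a rough sketch, the standard library's importlib.metadata can report that for the active environment. The function name installed_versions and the package list are illustrative choices, not part of any tool mentioned here.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["numpy", "pandas", "matplotlib"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this report between two environments (or between today and last week) makes version drift easy to spot.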
Two popular choices
Option 1: venv (built-in)
- Comes with Python
- Lightweight
- Uses pip for packages
Option 2: conda environments
- Great for data science libraries
- Handles compiled packages easily
- Works with both conda install and pip install
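The two kinds of environment leave different markers on disk, which gives a simple way to tell them apart. The sketch below assumes the usual layouts: venv writes a pyvenv.cfg file at the environment root, and conda keeps a conda-meta directory there. The function name classify_environment is hypothetical, and the result is a guess based on those markers, not an authoritative answer.

```python
import sys
from pathlib import Path


def classify_environment(env_dir):
    """Guess which tool created the environment rooted at env_dir."""
    root = Path(env_dir)
    # conda keeps its package records in a conda-meta directory.
    if (root / "conda-meta").is_dir():
        return "conda"
    # venv writes a pyvenv.cfg file at the environment root.
    if (root / "pyvenv.cfg").is_file():
        return "venv"
    return "unknown"


if __name__ == "__main__":
    # sys.prefix is the root of whichever environment is running this script.
    print(classify_environment(sys.prefix))
```

A system-wide Python that is not managed by either tool will typically come back as "unknown".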
Using venv (recommended for pure pip projects)
Create a new environment
From your project folder:
python -m venv .venv
Activate the environment
- macOS/Linux:
source .venv/bin/activate
- Windows (PowerShell):
.\.venv\Scripts\Activate.ps1
Install packages
pip install numpy pandas matplotlib seaborn jupyter
Freeze requirements
This records the exact version of every package installed in the environment:
pip freeze > requirements.txt
Later, someone can recreate the same set of packages in a fresh environment:
pip install -r requirements.txt
Using conda environments (recommended for analytics stacks)
Create and activate
conda create -n analytics python=3.12
conda activate analytics
Install packages
conda install numpy pandas matplotlib seaborn jupyter
Mixing conda + pip safely
Sometimes a package is not available in conda.
Recommended approach:
- Install as much as possible with conda
- Then install remaining packages with pip
Example:
conda install numpy pandas
pip install yfinance
Environment naming conventions
Good names:
- analytics
- titanic-eda
- viz
Avoid generic names like test or newenv.
Best practices for data analytics projects
- Create one environment per project
- Pin versions for important packages (especially for long projects)
- Keep a requirements.txt (pip) or environment.yml (conda)
- Store notebooks inside a project folder
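To make the pinning advice concrete, here is a small sketch that flags requirements without an exact == pin. It assumes a simple requirements.txt with one requirement per line, and it does not handle URLs, environment markers, or other advanced syntax. The function name unpinned_requirements and the version numbers in the sample are illustrative only.

```python
def unpinned_requirements(requirements_text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Illustrative file contents; the version numbers are placeholders.
    sample = "numpy==2.1.0\npandas>=2.0   # lower bound only\nseaborn\n"
    for requirement in unpinned_requirements(sample):
        print("Not pinned:", requirement)
```

For a long-running project, a check like this could run before sharing the file so that collaborators install the same versions.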
Example conda environment file (environment.yml)
This is a common way to share environment configuration:
name: analytics
channels:
  - conda-forge
dependencies:
  - python=3.12
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - jupyter
  - pip
  - pip:
      - yfinance
Then create it with:
conda env create -f environment.yml
Next
Continue to: Installing Data Science Libraries (pip & conda) to learn the best ways to install and verify common analytics libraries.
