Virtual Environments for Data Science

What is a virtual environment?

A virtual environment is an isolated Python setup for a specific project.

It keeps:

  • Python version (conda can install one per environment; venv reuses the interpreter that created it)
  • Installed libraries
  • Tooling (Jupyter, linters, etc.)

separate from other projects.

Why it’s essential in data analytics

Data analytics projects often depend on:

  • Specific versions of NumPy/Pandas/Matplotlib
  • Jupyter
  • Database drivers
  • Visualization libraries

If you install everything globally, you will eventually face:

  • Version conflicts
  • “It worked yesterday” problems
  • Broken notebooks after updates

Virtual environments prevent most of those issues.

Option 1: venv (built-in)

  • Comes with Python
  • Lightweight
  • Uses pip for packages

Option 2: conda environments

  • Great for data science libraries
  • Handles compiled packages easily
  • Works with both conda install and pip install

Create a new environment with venv

From your project folder:

command
python -m venv .venv
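
The same step can also be driven from Python through the standard-library venv module, the module that python -m venv runs. A minimal sketch, assuming a throwaway project folder (the demo-project name is made up for illustration) and with_pip=False to keep it quick; the command above also bootstraps pip, which corresponds to with_pip=True:

```python
import tempfile
import venv
from pathlib import Path


def create_env(project_dir: Path) -> Path:
    """Create a .venv folder inside project_dir and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False skips bootstrapping pip; pass with_pip=True to get
    # an environment comparable to the one made by `python -m venv .venv`.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "demo-project"  # hypothetical project folder
        project.mkdir()
        env_dir = create_env(project)
        # pyvenv.cfg records which base interpreter the environment uses.
        print((env_dir / "pyvenv.cfg").read_text())
```

The pyvenv.cfg file it prints is a quick way to see which Python installation an existing .venv was built from.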

Activate the environment

  • macOS/Linux:
command
source .venv/bin/activate
  • Windows (PowerShell):
command
.\.venv\Scripts\Activate.ps1
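
To confirm that activation worked, it can help to ask the interpreter itself. The sketch below relies on the fact that inside a venv sys.prefix differs from sys.base_prefix; the function name in_virtual_env is just an illustrative choice:

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", in_virtual_env())
```

This check is specific to venv-style environments. A conda environment ships its own interpreter, so both prefixes are usually the same there.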

Install packages with pip

command
pip install numpy pandas matplotlib seaborn jupyter
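
After installing, one way to verify what actually landed in the environment is to query package metadata with the standard-library importlib.metadata module. This is a small sketch, not a required step; installed_versions is a helper name chosen for this example:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["numpy", "pandas", "matplotlib", "seaborn", "jupyter"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:12} {version or 'NOT INSTALLED'}")
```

Run it with the environment activated; if it reports packages as missing, the script is probably running under a different interpreter than the one pip installed into.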

Freeze requirements

This records the exact version of every package installed in the environment, which makes the setup reproducible:

command
pip freeze > requirements.txt

Later, someone can recreate the same installs in a fresh environment:

command
pip install -r requirements.txt
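
To check whether an environment still matches its requirements.txt, the pinned versions can be compared with what is installed. The sketch below is deliberately simplified: it only understands name==version lines of the kind pip freeze writes, and it is not a full requirements parser. The helper names parse_pins and drift, and the version numbers in the sample, are made up for illustration:

```python
from importlib import metadata


def parse_pins(text):
    """Return {name: version} for every 'name==version' line in text."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks, options such as -r, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def drift(pins):
    """Return {name: (pinned, installed)} for packages that do not match."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # pinned but not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    # Illustrative content; in practice read requirements.txt from disk.
    sample = "numpy==1.26.4\n# plotting\nmatplotlib==3.8.4\nrequests>=2.0\n"
    print(drift(parse_pins(sample)))
```

An empty result means every pinned package is installed at the pinned version.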

Create and activate a conda environment

command
conda create -n analytics python=3.12
command
conda activate analytics
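
Inside a script or notebook, the active conda environment can be read from the environment variables that conda activate sets. A minimal sketch, assuming the CONDA_DEFAULT_ENV variable is present after activation; active_conda_env is a helper name chosen here:

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is active."""
    env = os.environ if environ is None else environ
    # `conda activate` exports CONDA_DEFAULT_ENV (the environment name)
    # alongside CONDA_PREFIX (the environment's folder).
    return env.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("active conda environment:", active_conda_env())
```

After the commands above, this would be expected to report analytics. A value of None suggests the shell that launched Python never activated a conda environment.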

Install packages with conda

command
conda install numpy pandas matplotlib seaborn jupyter

Mixing conda + pip safely

Sometimes a package is not available in conda.

Recommended approach (conda does not track what pip installs, so run pip last):

  1. Install as much as possible with conda
  2. Then install remaining packages with pip

Example:

command
conda install numpy pandas
pip install yfinance
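
In a mixed environment it is sometimes useful to see which tool installed a given package. pip records its name in an INSTALLER file in each package's metadata. Conda-installed packages often carry "conda" there, but treat that as an assumption and use conda list as the authoritative view. The helper name installer_of is illustrative:

```python
from importlib import metadata


def installer_of(name):
    """Return the recorded installer for a package, or None if unknown."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None  # package is not installed in this environment
    # read_text returns None when the metadata has no INSTALLER file.
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


if __name__ == "__main__":
    for name in ["numpy", "pandas", "yfinance"]:
        print(f"{name:10} installed by: {installer_of(name)}")
```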

Environment naming conventions

Good names:

  • analytics
  • titanic-eda
  • viz

Avoid generic names like test or newenv.

Best practices for data analytics projects

  • Create one environment per project
  • Pin versions for important packages (especially for long projects)
  • Keep a requirements.txt (pip) or environment.yml (conda)
  • Store notebooks inside a project folder

Example conda environment file (environment.yml)

This is a common way to share environment configuration:

environment.yml
name: analytics
channels:
  - conda-forge
dependencies:
  - python=3.12
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - jupyter
  - pip
  - pip:
      - yfinance

Then create it with:

command
conda env create -f environment.yml

Next

Continue to: Installing Data Science Libraries (pip & conda) to learn the best ways to install and verify common analytics libraries.
