Custom Conda Environments for Data Science on HPC Clusters

Step 1: Install miniconda in user space.

Miniconda is a minimal version of Anaconda that includes just conda and its dependencies, so it is a very small download. If you want to use Python 3 (recommended), you can call

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

Step 2: Run the Miniconda Installer

Now run the installer to set up the package manager. The trick is to specify an install directory inside your home directory, rather than the default system-wide location (which you won't have permission to write to). You then have to add this directory to your path.

bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
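
That export only lasts for the current login session. A small sketch of making it persistent, assuming a bash login shell that reads ~/.bashrc (some clusters use ~/.bash_profile instead, so check your site's documentation):

```shell
# Append the PATH change to the shell startup file, but only if it is
# not already there, so repeated runs stay idempotent.
RC="$HOME/.bashrc"          # assumption: your shell sources ~/.bashrc
LINE='export PATH="$HOME/miniconda/bin:$PATH"'
grep -qxF "$LINE" "$RC" 2>/dev/null || echo "$LINE" >> "$RC"
```

After this, new login sessions will pick up conda automatically.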

Step 3: Create a custom conda environment specification

You now have to define what packages you actually want to install. A good way to do this is with a custom conda environment file. The contents of this file will differ for each project. Below is the environment.yml that I use for my xmitgcm project.

name: xmitgcm
dependencies:
  - numpy
  - scipy
  - xarray
  - netcdf4
  - dask
  - jupyter
  - matplotlib
  - pip:
    - pytest
    - xmitgcm
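
Since you are likely working over SSH, one convenient way to create this file without an editor is a heredoc. The package list mirrors the specification above, with the conda packages nested under the dependencies: key that conda expects:

```shell
# Write environment.yml in the current directory from the shell.
cat > environment.yml <<'EOF'
name: xmitgcm
dependencies:
  - numpy
  - scipy
  - xarray
  - netcdf4
  - dask
  - jupyter
  - matplotlib
  - pip:
    - pytest
    - xmitgcm
EOF
```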

Step 4: Create the conda environment

You should now be able to run the following command

conda env create --file environment.yml

Step 5: Activate The Environment

The environment you created needs to be activated before you can actually use it. To do this, you call

source activate xmitgcm

(On recent conda versions, the equivalent command is conda activate xmitgcm.)

Step 6: Use Python!

You can now call ipython on the command line or launch a jupyter notebook. On most clusters, this should be done from a compute node rather than the head node. Connecting to a notebook running on a cluster is a bit more involved, but it can definitely be done. That will be the topic of my next post.
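
As a sketch of what launching from a compute node can look like: everything below is an assumption, since schedulers differ. This assumes SLURM; the job name, time limit, and port are illustrative, and PBS or LSF clusters will need different directives.

```shell
#!/bin/bash
# Illustrative SLURM batch script: start a notebook server on a
# compute node (directives and values are placeholders -- adapt to
# your cluster's scheduler and policies).
#SBATCH --job-name=jupyter
#SBATCH --time=02:00:00
#SBATCH --nodes=1

export PATH="$HOME/miniconda/bin:$PATH"
source activate xmitgcm

# Bind to the compute node's hostname so you can tunnel to it later.
jupyter notebook --no-browser --ip="$(hostname)" --port=8888
```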


