Setting up Conda

One of the easiest ways to set up your environment that works across platforms is to use conda. Conda creates 'virtual environments' that don't break the rest of your system, and uses a comprehensive package manager. It has a dedicated bioconda channel that makes it easy to install software for biomedical research.

Installing conda

The recommended way is to install miniconda, a minimal installation that lets you add only the packages you want. To get it:

If you are on Mac OS X or linux, download the appropriate installer from the miniconda download page.

If you are on Windows, download the Linux 64-bit version anyway. This is because we will install it into the Windows Subsystem for Linux (WSL).

Note. The installer is a bash (.sh) file. On Mac OS X, package (.pkg) installers are also available; run one of these instead if you prefer, and skip to the next section.

Note. Because this is an installer downloaded from the internet, you should check it's the real thing before installing it. See the page on cryptographic hash verification and compare the output to the SHA256 hash listed in the table on the download page. If it's different, don't install!
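
As a rough sketch of that check, the shell function below computes a file's SHA256 digest and compares it with the value you expect. The function name check_sha256 is made up for this illustration, and it assumes that either sha256sum (usual on Linux) or shasum (usual on Mac OS X) is installed.

```shell
# Hypothetical helper: compare a file's SHA256 digest with an expected value.
check_sha256() {
    file="$1"
    expected="$2"
    # Use whichever hashing tool is available on this system.
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{print $1}')
    else
        actual=$(shasum -a 256 "$file" | awk '{print $1}')
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: hashes match"
    else
        echo "MISMATCH: do not install" >&2
        return 1
    fi
}
```

Once you are in the folder containing the installer, you would call it with the installer's file name followed by the hash copied from the download page.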

To install, start a terminal and change directory to the downloads folder:

  • on Mac OS X:
$ cd Downloads
  • on Windows:
$ cd /mnt/c/Users/<username>/Downloads
  • on Linux: probably
$ cd Downloads

You can check what's there by running ls. Now run the installer:

$ bash Miniconda3-latest-<platform>.sh

You will be asked to accept the license and choose an install location. If in doubt, the defaults are fine. Say 'yes' when asked whether you want the installer to initialise conda.

Activating and deactivating conda

If you read the blurb the installer outputs, you'll see it says the conda environment will be activated by default on startup. This means that when you start a new terminal, conda is managing your environment for you. You can tell because in new terminals the command prompt will look like this:

(base) <username>@<computer>:~$

You can deactivate the environment (going back to normal) with the conda deactivate command:

$ conda deactivate

And you can reactivate it with - you guessed it!

$ conda activate

This is a downside of using conda: you have to remember what environment you're in at any one time.
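
If you lose track, one quick check is the CONDA_DEFAULT_ENV environment variable, which conda sets to the name of the active environment. A minimal sketch (the function name current_conda_env is just for illustration):

```shell
# Print the name of the active conda environment,
# or a notice if the variable is not set.
current_conda_env() {
    echo "${CONDA_DEFAULT_ENV:-(no conda environment active)}"
}

current_conda_env
```

Running conda env list also lists your environments and marks the active one with an asterisk.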

Using conda to install software

Conda makes installing stuff easy. The first thing we'll want is a better (faster) version of conda itself, called mamba:

$ conda install -c conda-forge mamba

The mamba package lives in the conda-forge channel, hence the -c above. Type 'y' and press <enter> to install.

Now let's try installing samtools, which is a workhorse tool for handling next-generation sequencing data. You could download the source code and compile it yourself, but conda makes this much easier. You'll want a fairly recent version, so let's get version 1.15 or later, which is available from the bioconda channel:

$ mamba install -c conda-forge -c bioconda 'samtools>=1.15'

Note. This may not work if you are on an M1 Mac. If so don't worry, we'll find a workaround later.

If you look at the output you'll see that this is getting htslib and samtools from bioconda, but also libdeflate from conda-forge. Go ahead and install. Running samtools now gives you some output:

$ samtools

Program: samtools (Tools for alignments in the SAM format)
Version: 1.15.1

Usage: samtools <command> [options]
...

Note. For biomedical work you will use bioconda and conda-forge a great deal. To avoid version issues it's best to go ahead and set these channels up at the start. The bioconda page explains how to do this, namely, run:

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda config --set channel_priority strict

This bit of configuration says: "search conda-forge first, then bioconda, then the default conda repository" for packages. (Each --add puts its channel at the top of the list, so the last one added is searched first.) This will help conda find up-to-date versions of the software we need.
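
To check that the settings took effect, you can ask conda to print them back. The sketch below wraps this in a small function (show_channels is a made-up name) that also copes with conda not being on your PATH. Assuming you had no earlier channel configuration, the channels should be listed as conda-forge, bioconda, defaults, in that order.

```shell
# Hypothetical helper: print the channel settings conda is currently using.
show_channels() {
    if command -v conda >/dev/null 2>&1; then
        conda config --show channels
        conda config --show channel_priority
    else
        echo "conda not found on PATH"
    fi
}

show_channels
```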

Aside: what even is an 'environment'?

UNIX figures out how to find programs and other things using so-called 'environment variables'. You can see them all using the env command:

$ env

All conda is really doing is changing environment variables to point to its own copies of files.

For example the HOME environment variable points at your home folder:

$ echo ${HOME}
/users/<username> (or similar)

Let's go there now and see what's there:

$ cd ${HOME}
$ ls

If you've followed the above, you should see that conda has created a directory called miniconda3 in there where it puts the things it installs. For example the executable programs go in miniconda3/bin:

$ ls miniconda3/bin

If you look there you will see (among many other things) the samtools executable - because we just installed it.

To make this work, when you activate conda it sets relevant environment variables to point into this folder. In particular it adds this bin directory to your PATH environment variable, which the terminal uses to know where to look for programs. Look:

$ echo ${PATH}
/users/<username>/miniconda3/bin: (other stuff here...)

So if you type samtools, the first place the terminal looks is in that folder.
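
To see the search order for yourself, you can split PATH into one directory per line and ask the shell which file it would run for a given name. This is a small sketch; split_path is a hypothetical helper name, and ls is used only because it exists on every system (try samtools instead once it is installed).

```shell
# Hypothetical helper: print a colon-separated list one entry per line.
split_path() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Directories are searched in the order they are printed.
split_path "$PATH"

# Show which file the shell would actually run for a command name.
command -v ls
```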

If you deactivate the conda environment, PATH changes to remove that folder and samtools will no longer work:

$ conda deactivate
$ samtools

Command 'samtools' not found...

However samtools is still there on your filesystem - as it happens, you can still run it by specifying its full path:

$ ${HOME}/miniconda3/bin/samtools

In other words conda isn't doing anything magical here: it's just managing your environment variables for you. This is basically how 'environments' work: they are systems of environment variables including PATH that tell the UNIX shell where to look for things.
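
You can convince yourself of this without conda at all. The sketch below builds a toy 'environment' by hand: a temporary folder with a bin directory containing a made-up program called toytool, which is then 'activated' and 'deactivated' purely by editing PATH. All the names here are invented for the illustration, and real conda activation sets a number of other variables as well.

```shell
# A toy 'environment': a temporary folder holding one hypothetical tool.
toy_env=$(mktemp -d)
mkdir -p "$toy_env/bin"
printf '#!/bin/sh\necho "hello from toy env"\n' > "$toy_env/bin/toytool"
chmod +x "$toy_env/bin/toytool"

old_path="$PATH"

# 'Activate': put the toy bin directory at the front of PATH.
PATH="$toy_env/bin:$PATH"
toytool    # found because its folder is now on PATH

# 'Deactivate': restore the previous PATH.
PATH="$old_path"
command -v toytool || echo "toytool not found"

# The file is still on disk, so giving its full path still works.
"$toy_env/bin/toytool"
```

When you are done, the temporary folder can simply be deleted.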