Getting set up

In this tutorial we will use a set of command-line tools to look at some next-generation sequencing data:

  • samtools, a program for manipulating next-generation sequencing reads
  • fastqc, a program for performing quality control checks on sequence data
  • bwa, a program for aligning reads to a reference sequence

For the purpose of today's tutorial we have pre-installed this software onto the JupyterHub instance.
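Once you are logged in and have a Terminal open on JupyterHub (see below), you may want to confirm that the three tools are visible to your shell. The following is a minimal sketch rather than part of the official setup: check_tools is a hypothetical helper name introduced here for illustration, and it assumes a POSIX-style shell.

```shell
# check_tools is a hypothetical helper (not part of the tutorial software).
# It reports whether each named program can be found on the PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Return non-zero if at least one tool was not found.
  return "$missing"
}

check_tools samtools fastqc bwa || echo "Ask the instructor about any missing tools."
```

If all three are reported as found, nothing further is needed here.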

Logging in to JupyterHub

If you are logging into JupyterHub for the first time, we will have allocated you a username (usually in the format 'surname_initial'), which the instructor can confirm for you if needed. Enter this username on the login page https://mscjupyter.bmrc.ox.ac.uk and specify a password of your choice (please choose a sensibly secure one that is not the same as your University password). Make sure to remember it, as you will use this same password to access the site in future.

If you have previously logged in, go to https://mscjupyter.bmrc.ox.ac.uk and log in using the username and password you set in previous sessions.

Note

Your JupyterHub instance is private to you. You can log out and log back in at any time and your session should still be in place.

Getting IGV

We will also use the IGV desktop application. To get this, download it from the IGV download page and install it on your laptop.

Getting the data

To get the data for this tutorial, open a Terminal on the JupyterHub instance and download the file called sequence_data_analysis.tgz from this folder.

For example using curl:

curl -O https://www.chg.ox.ac.uk/bioinformatics/training/msc_gm/2024/data/sequence_data_analysis.tgz

This will probably take a minute or two to download. Once the download is finished, extract it:

tar -xzf sequence_data_analysis.tgz
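If you would like to see what the archive contains, tar can list its contents without extracting anything (the -t option). The sketch below wraps this in a hypothetical helper, list_top_level, which is introduced here for illustration only and prints the distinct top-level names inside a gzipped tarball.

```shell
# list_top_level is a hypothetical helper (not part of the tutorial).
# It lists the archive (-t) and keeps only the first path component of each entry.
list_top_level() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Only run the check if the tarball is present in the current directory.
if [ -f sequence_data_analysis.tgz ]; then
  list_top_level sequence_data_analysis.tgz
fi
```

Going by the description here, you would expect to see sequence_data_analysis, though the exact listing depends on how the archive was built.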

You should now have a folder called sequence_data_analysis, which has 3 sub-folders with various files in them. Since the contents are now available in that folder, we can delete the tarball (i.e. the .tgz file we originally downloaded) and then change into the new directory:

rm sequence_data_analysis.tgz
cd sequence_data_analysis
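To check that the extraction worked, you can list the sub-folders of the directory you are now in. This is a minimal sketch, assuming a POSIX-style shell; list_subfolders is a hypothetical helper name introduced here for illustration, and plain ls would do the job just as well.

```shell
# list_subfolders is a hypothetical helper (not part of the tutorial).
# It prints the name of each directory directly inside the given directory
# (the current directory by default).
list_subfolders() {
  dir="${1:-.}"
  for entry in "$dir"/*/; do
    if [ -d "$entry" ]; then
      basename "$entry"
    fi
  done
}

list_subfolders .
```

As described above, there should be 3 sub-folders listed.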

Now you're ready to start.