Getting set up
In this tutorial we will use a set of command-line tools to look at some next-generation sequencing data:
- samtools, a program for manipulating next-generation sequencing reads
- fastqc, a program for performing quality control checks on sequence data
- bwa, a program for aligning reads to a reference sequence
For today's tutorial, this software has been pre-installed on the JupyterHub instance.
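Before starting, you may want to confirm that the three tools can be found from your Terminal. The sketch below defines a hypothetical helper, `check_tools` (not part of any of these packages), which uses the shell built-in `command -v` to look each program up on the PATH and reports any that are missing. Individual tools can also report their version, e.g. `samtools --version` or `fastqc --version`.

```shell
# Hypothetical helper: report whether each named program is on the PATH.
# Returns 0 if every tool was found, 1 if any is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the tools used in this tutorial.
check_tools samtools fastqc bwa || echo "Ask the instructor if anything is missing."
```

Using `command -v` rather than running each program avoids depending on how each tool behaves when called without arguments (bwa, for example, prints a usage message).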
Logging in to JupyterHub
If you are logging into JupyterHub for the first time, we will have allocated you a username (usually in the format 'surname_initial'), which the instructor can confirm for you if needed. Enter this username on the login page at https://mscjupyter.bmrc.ox.ac.uk and choose a password (please pick a reasonably secure one that is not the same as your University password). Make sure to remember it, as you will use the same password to access the site in future.
If you have previously logged in, go to https://mscjupyter.bmrc.ox.ac.uk and log in using the username and password you set in previous sessions.
Your JupyterHub instance is private to you. You can log out and log back in at any time and your session should still be in place.
Getting IGV
We will also use the IGV desktop application. To get it, download it from the IGV download page and install it on your laptop.
Getting the data
To get the data for this tutorial, open a Terminal on the JupyterHub instance and download the file called sequence_data_analysis.tgz from the folder https://www.chg.ox.ac.uk/bioinformatics/training/msc_gm/2024/data/.
For example, using curl:
curl -O https://www.chg.ox.ac.uk/bioinformatics/training/msc_gm/2024/data/sequence_data_analysis.tgz
This will probably take a minute or two to download. Once the download is finished, extract it:
tar -xzf sequence_data_analysis.tgz
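If `tar` stops with an error such as "unexpected end of file", the download was most likely interrupted. The sketch below defines a hypothetical helper, `check_tarball`, that lists the archive with `tar -tzf` (which reads the whole archive without extracting anything) and reports whether it could be read completely. If it reports a problem, download the file again; adding curl's `-f` option (fail on HTTP errors instead of saving the error page) and `-L` option (follow redirects), i.e. `curl -fLO` with the same URL, can make that step more robust.

```shell
# Hypothetical helper: check that a .tgz archive is readable before (re)extracting it.
# tar -tzf lists the contents without writing anything to disk, and exits
# non-zero if the file is missing, not gzip-compressed, or truncated.
check_tarball() {
    if tar -tzf "$1" > /dev/null 2>&1; then
        echo "$1 looks complete"
    else
        echo "$1 is missing or damaged; try downloading it again" >&2
        return 1
    fi
}

# Usage (run this before deleting the .tgz file):
# check_tarball sequence_data_analysis.tgz
```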
You should now have a folder called sequence_data_analysis, which has 3 sub-folders with various files in them. Because the contents are now available in that folder, you can delete the tarball (i.e. the .tgz file we originally downloaded). Then change into the sequence_data_analysis directory:
rm sequence_data_analysis.tgz
cd sequence_data_analysis
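To confirm that you are in the right place, you can list the folder and count its sub-folders; according to the description above there should be 3. The sketch below uses a hypothetical helper, `count_subdirs`, built on `find` with `-mindepth 1 -maxdepth 1 -type d` so that only the immediate sub-folders are counted (hidden ones included).

```shell
# Hypothetical helper: count the immediate sub-folders of a directory
# (defaults to the current directory). Hidden folders are included.
count_subdirs() {
    # tr strips the padding that some versions of wc add to the count
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Show the folder contents (sub-folders are marked with a trailing /)
ls -F
echo "sub-folders: $(count_subdirs)"
```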
Now you're ready to start.