Inspecting raw data using FastQC
================================

a) Download the "rawdata" files from http://www.well.ox.ac.uk/bioinformatics/training/RNASeq_Data_Analysis/Monday_am/rawdata/
b) Download FastQC from https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
c) Run the software by clicking on the "run_fastqc.bat" or "fastqc" file.
d) From the "File" menu, click "Open" and select the downloaded files.
e) Once loaded, inspect both files and compare the results.

Questions
---------

1) Open both example dataset files provided and compare what good and bad data look like.
2) Which file contains better quality data?
3) In each of the plots, try to identify a cutoff or pattern that could be used to classify the data as good.
4) Do you think the raw data metrics for RNA will differ from those for DNA? If so, which of them might change, and how?

Inspecting raw data using PrinSeq
=================================

a) Go to http://prinseq.sourceforge.net/
b) Click on "Use PRINSEQ"
c) Click on "Access data"
d) Click on different example datasets and compare them.

Questions
---------

1) Compare an Illumina dataset against a 454 dataset. Can you see the patterns that identify each sequencing technology?
2) Can you interpret the results of the "Tag Sequence Check" plot?

Inspecting aligned data using bam.iobio.io
==========================================

a) Go to http://bam.iobio.io/
b) Click on "choose bam url"
c) Click on "Go"

Questions
---------

1) How many chromosomes does the reference have?
2) What is the proportion of unmapped reads?
3) a) What is the approximate insert length?
   b) Does it change if you increase the number of reads sampled?
   c) Is it different for chromosome 2?
4) What is the proportion of duplicates in chromosome 16?

Conceptual questions
--------------------

1) What is a "singleton"?
2) What is a "paired-end" read? Why are paired-end reads useful?
3) What are "proper pairs"?
4) What is considered a duplicate read?
5) Why is there a "Forward strand" metric?
6) What is the difference between mapping quality and base quality?
7) What is the "insert length"?
8) What is the read coverage?

Inspecting aligned data using Integrative Genomics Viewer (IGV)
===============================================================

a) Download the "alignment" files from http://www.well.ox.ac.uk/bioinformatics/training/RNASeq_Data_Analysis/Monday_am/practical/alignment/
b) Go to https://software.broadinstitute.org/software/igv/download
c) Download the version that is appropriate for your operating system.
d) Open IGV by following the instructions provided on the download page.
e) Once opened, from the "File" menu, click "Load from file", select both recently downloaded BAM files and click "Open".
f) Explore the tool by navigating to different regions of the genome and zooming in. In the blank text box, you can also type the name of a gene to jump straight to it.

Questions
---------

1) What do the colors of the reads mean?
2) Why are some reads connected by a line?
3) How could you quickly compare the expression levels in a particular region in both loaded samples without scrolling to count the reads?
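
Hint for the conceptual questions
---------------------------------

As a starting point for the conceptual questions on base quality, mapping quality, proper pairs and duplicates, the short Python sketch below may help. It is only an illustration and is not part of any tool used in this practical; the function names are made up for this example. It assumes that base qualities in the FASTQ files use the Phred+33 encoding, and it relies on the flag bit values defined in the SAM format specification. Both base quality and mapping quality are Phred-scaled, Q = -10 * log10(P), but P refers to a wrong base call in one case and a wrong alignment position in the other.

```python
# Illustrative sketch only: function names are hypothetical and the
# Phred+33 offset is an assumption about how the FASTQ files are encoded.

def phred_to_error_prob(q):
    """Convert a Phred-scaled quality Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_base_qualities(quality_string, offset=33):
    """Turn a FASTQ quality string into a list of per-base Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


# A subset of the FLAG bits defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x400: "duplicate",
}


def decode_sam_flag(flag):
    """List the names of the bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


if __name__ == "__main__":
    # Per-base qualities of a made-up five-base read.
    print(decode_base_qualities("II5+!"))
    # Error probability that corresponds to Q30.
    print(phred_to_error_prob(30))
    # Properties encoded in two example FLAG values.
    print(decode_sam_flag(99))
    print(decode_sam_flag(1107))
```

Running the script prints the decoded scores and flag names, which can be compared with the "Forward strand", "Proper pairs" and "Duplicates" metrics shown by bam.iobio.io.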