Inspecting raw data using FastQC

Instructions

  1. Download the raw data files from http://www.well.ox.ac.uk/bioinformatics/training/RNASeq_Data_Analysis/Monday_am/practical/rawdata
  2. Download FastQC
  3. Start the software by double-clicking the "run_fastqc.bat" file (on Windows) or the "fastqc" file.
  4. From the "File" menu, click "Open" and select the downloaded files.
  5. Once loaded, inspect both files and compare the results.

Questions

Open both of the example dataset files provided and identify the attributes of good and bad data.

  1. Which file contains better quality data?
  2. For each plot, try to identify a cutoff or pattern that separates good data from bad data.
  3. Would you expect the raw data quality metrics for RNA to differ from those for DNA? If so, which metrics might change, and how?
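
To make the per-base quality plot less of a black box, here is a minimal Python sketch of the kind of summary it is built on. It assumes the FASTQ files use the Phred+33 quality encoding, and the quality strings are made up for illustration; they are not taken from the practical's data. The function names are hypothetical and are not part of FastQC.

```python
from statistics import mean


def phred33_scores(quality_string):
    """Convert a FASTQ quality string to Phred scores (Phred+33 encoding assumed)."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_quality_per_position(quality_strings):
    """Return the mean Phred score at each read position across all reads."""
    scores = [phred33_scores(q) for q in quality_strings]
    longest = max(len(s) for s in scores)
    # Reads shorter than the longest one simply do not contribute to later positions.
    return [mean(s[i] for s in scores if i < len(s)) for i in range(longest)]


# Made-up quality strings, for illustration only.
example_qualities = ["IIIIIIII##", "IIIIII5555", "IIIII55###"]

for position, quality in enumerate(mean_quality_per_position(example_qualities), start=1):
    print(position, round(quality, 1))
```

In this toy input the mean quality falls towards the end of the reads, which is the kind of pattern to look for when comparing the two files.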

Inspecting raw data using PRINSEQ

Instructions

  1. Download PRINSEQ
  2. Click on "Use PRINSEQ"
  3. Click on "Access data"
  4. Click on different example datasets and compare them.

Questions

  1. Compare an Illumina dataset against a 454 dataset. Can you see patterns that distinguish the two sequencing technologies?
  2. Can you interpret the results of the "Tag Sequence Check" plot?

Inspecting aligned data using bam.iobio.io

Instructions

  1. Go to http://bam.iobio.io/
  2. Click on "launch with demo data"
  3. Click on "Go"

Questions

  1. How many chromosomes does the reference have?
  2. What is the proportion of unmapped reads?
  3. What is the approximate insert length?
  4. What is the proportion of duplicates on chromosome 16?
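
Proportions like the ones asked about above can be derived from the FLAG field that every alignment record in a SAM/BAM file carries. The sketch below is only an illustration of that idea, not a description of how bam.iobio.io computes its numbers. The bit values 0x4 (read unmapped) and 0x400 (duplicate) come from the SAM specification; the list of flags is made up and the function name is hypothetical.

```python
UNMAPPED = 0x4     # SAM flag bit: the read itself is unmapped
DUPLICATE = 0x400  # SAM flag bit: PCR or optical duplicate


def proportion_with_flag(flags, bit):
    """Fraction of reads whose FLAG value has the given bit set."""
    if not flags:
        return 0.0
    return sum(1 for flag in flags if flag & bit) / len(flags)


# Made-up FLAG values for eight reads, for illustration only.
example_flags = [99, 147, 77, 141, 1123, 1171, 83, 163]

print("unmapped:", proportion_with_flag(example_flags, UNMAPPED))
print("duplicates:", proportion_with_flag(example_flags, DUPLICATE))
```

To get a per-chromosome figure, the same calculation would be restricted to the reads aligned to that chromosome.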

Conceptual questions

  1. What does "singleton" refer to?
  2. What are "paired-end" reads? Why are they useful?
  3. What are "proper pairs"?
  4. What is considered a duplicate read?
  5. Why is there a "forward strand" metric?
  6. What is the difference between mapping quality and base quality?
  7. What is the "insert length"?
  8. What is the read coverage?
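
Two of the concepts above come down to simple arithmetic, which the following Python sketch may help with. It assumes that both base quality and mapping quality are reported on the Phred scale, Q = -10 * log10(p), where p is an error probability, and it uses the common rule of thumb that mean coverage is the number of sequenced bases divided by the reference length. The function names and numbers are hypothetical.

```python
def phred_to_error_probability(q):
    """Convert a Phred-scaled quality value to the error probability it encodes."""
    return 10 ** (-q / 10)


def mean_coverage(n_reads, read_length, reference_length):
    """Approximate mean coverage: total sequenced bases over reference length."""
    return n_reads * read_length / reference_length


# Illustrative values only, not taken from the demo data.
print(phred_to_error_probability(20))
print(mean_coverage(n_reads=1_000_000, read_length=100, reference_length=10_000_000))
```

The conversion is the same for both quality types. What differs is the event whose probability is being described, which is worth keeping in mind for question 6.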

Inspecting aligned data using Integrative Genomics Viewer (IGV)

Instructions

  1. Download the alignment files from http://www.well.ox.ac.uk/bioinformatics/training/RNASeq_Data_Analysis/Monday_am/practical/alignment/
  2. Go to the IGV page
  3. Download the version that is appropriate for your operating system.
  4. Open IGV by following the instructions provided on the download page.
  5. Once IGV is open, choose "Human hg19" from the dropdown menu at the top left.
  6. Then, from the "File" menu, click "Load from file", select the BAM files you just downloaded, and click "Open".
  7. Explore the tool by navigating to different regions of the genome and zooming in on them. You can also type the name of a gene in the blank box to jump straight to it.

Questions

  1. What do the colors of the reads mean?
  2. Why are some reads connected by a line?
  3. How could you quickly compare the expression levels of a particular region in both loaded samples without scrolling to count the reads?
  4. Find the gene "GNB1" and compare its expression levels in each sample. Can you see any differentially expressed regions?