Version: 0.1

1 Project Summary

Summary          Value
Samples          12
Genome           None
Genes            124370106
Detected genes   15771

2 Library Complexity

Poor-quality samples often have a smaller number of genes represented than the rest, or a disproportionately large number of reads mapping to only a subset of genes.

2.1 Diversity of Reads

These curves show how many of the total reads map to the most-expressed genes for each sample. Samples that start at a high value, or follow a markedly different curve to the rest, are likely to be unreliable.
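A minimal sketch of how one such curve could be computed, assuming per-gene read counts are available for a sample. The function name `cumulative_read_fraction` and the toy counts are hypothetical and are not taken from the report's own pipeline.

```python
def cumulative_read_fraction(counts, top_ns):
    """Fraction of a sample's reads that map to its top-N most-expressed genes."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in top_ns]
    ordered = sorted(counts, reverse=True)  # most-expressed genes first
    return [sum(ordered[:n]) / total for n in top_ns]


# Hypothetical toy data: per-gene read counts for two samples.
balanced = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
dominated = [910, 10, 10, 10, 10, 10, 10, 10, 10, 10]

top_ns = [1, 2, 5, 10]
balanced_curve = cumulative_read_fraction(balanced, top_ns)
dominated_curve = cumulative_read_fraction(dominated, top_ns)
```

In this toy case the second sample's curve starts at 0.91 because a single gene absorbs most of its reads, which is the "starts at a high value" pattern described above.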

2.2 Detected Genes

These curves show the total number of detected genes for each sample as a function of the counts-per-million threshold used to call a gene detected.

3 Normalisation Performance

For the following plots, the gene counts are normalised using a variance-stabilising technique.[1] In some projects with poor-quality data, the normalisation is less effective than is ideal. The ideal result is for the red trend line to be close to horizontal. For projects where it is not, the rest of the QC may not be reliable.

Each cell is coloured depending on how many genes fall within it.
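As a rough illustration of what a "close to horizontal" trend means, the sketch below ranks genes by their mean normalised value and takes the median standard deviation in successive bins. It assumes the values have already been normalised. It does not reproduce the variance-stabilising technique itself, and `mean_sd_trend` and the toy values are hypothetical.

```python
import statistics


def mean_sd_trend(genes, n_bins):
    """Median per-gene standard deviation in bins of genes ranked by mean.

    `genes` is a list of per-gene value lists (one value per sample).
    Roughly `n_bins` bins are produced; a trailing partial bin is kept.
    """
    stats = sorted((statistics.fmean(g), statistics.stdev(g)) for g in genes)
    bin_size = max(1, len(stats) // n_bins)
    trend = []
    for start in range(0, len(stats), bin_size):
        chunk = stats[start:start + bin_size]
        trend.append(statistics.median(sd for _, sd in chunk))
    return trend


# Hypothetical well-stabilised data: spread is the same at every mean.
stabilised = [[1, 2, 3], [5, 6, 7], [9, 10, 11], [13, 14, 15]]
# Hypothetical unstabilised data: spread grows with the mean.
unstabilised = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1000, 2000, 3000]]

flat_trend = mean_sd_trend(stabilised, n_bins=2)
rising_trend = mean_sd_trend(unstabilised, n_bins=2)
```

A flat list such as `flat_trend` corresponds to a horizontal trend line, while a steadily increasing one such as `rising_trend` suggests the variance still depends on the mean.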

4 Hierarchical Clustering

Inter-sample similarity can be used to confirm that different experimental groups form clusters in the data. It can also reveal unexpected features, such as batch effects, that may need to be adjusted for in downstream analysis.

Two methods of measuring this similarity are given here: Euclidean distance and Spearman’s Rank correlation. The former is more sensitive to outlier samples but typically provides more detail as to clustering.

The heatmaps show sample-to-sample similarity scores. The dendrograms show how samples cluster in a binary tree.
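The two similarity measures can be sketched in plain Python as follows. This is an illustrative implementation under the assumption that each sample is a list of normalised per-gene values. The function names and toy samples are hypothetical, and the clustering step that builds the dendrogram is not shown.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two samples' expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical samples: same gene ordering, but one extreme value in sample_b.
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 100.0]

dist = euclidean_distance(sample_a, sample_b)
rho = spearman(sample_a, sample_b)
```

Here the single extreme value dominates the Euclidean distance, whereas the rank correlation stays at its maximum because the gene ordering is unchanged. This is one way to see why the two measures can respond differently to outliers.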

Distance Heatmap

Distance Dendrogram

Spearman’s Rank Heatmap

Spearman’s Rank Dendrogram

5 Principal Components Analysis

PCA aims to simplify the display of a high-dimensional data set in such a way as to make any clustering immediately apparent. Each of the most significant components is shown here as one of the axes of a plot. Often, only two components are significant and hence only a single plot is required.

The amount of the total variation accounted for by each component is shown.

Ideally each experimental group will be its own cluster, on at least one of these plots.
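The sketch below shows, on toy data, how the first principal component and its share of the total variation could be obtained. It uses power iteration on the sample-by-sample Gram matrix purely for illustration. The method, function names and data are assumptions and are not the report's own implementation.

```python
import math


def centre_columns(matrix):
    """Subtract each gene's mean across samples (rows are samples)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in matrix]


def gram_matrix(matrix):
    """Matrix of dot products between every pair of samples."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


def leading_eigenpair(sym, iterations=100):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    n = len(sym)
    vec = [1.0 + i for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        new = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0:
            break
        vec = [v / norm for v in new]
    product = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


# Hypothetical data: two control and two treated samples, two genes.
samples = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]

gram = gram_matrix(centre_columns(samples))
eigenvalue, vector = leading_eigenpair(gram)

# Share of the total variation carried by the first component.
explained = eigenvalue / sum(gram[i][i] for i in range(len(gram)))
# Position of each sample along the first component (sign is arbitrary).
pc1_scores = [math.sqrt(eigenvalue) * v for v in vector]
```

In this toy case the first component accounts for about 96% of the variation, and the control and treated samples fall on opposite sides of zero along it. Further components would need an additional step such as deflation, which is omitted here.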

5.1 By Sample Names

Each sample is represented by its name.

5.2 By Groups

Each grouping category is shown separately, if more than one is given. Each dot is one sample.

Expt_batch

Expt_condition

6 Tool documentation


  1. The details can be found in the DESeq2 documentation.