Assessing linkage disequilibrium
The -compute-ld-with option can be used to compute LD metrics between
the genotype data and a separate set of genotypes contained in a second file. E.g.:
For each biallelic variant in example.bgen, and each biallelic variant in second.bgen,
QCTOOL constructs the pairwise table of non-missing genotypes, uses an EM algorithm to resolve the phase
of the double heterozygotes, and then outputs the frequency of each haplotype and the D'
and r² statistics. (Multiallelic variants are currently not handled.)
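The steps above can be sketched in Python. This illustrates the standard two-locus EM approach and is not QCTOOL's own code. The function names (em_haplotype_frequencies, ld_stats), the 3x3 table layout, the starting values and the convergence tolerance are all assumptions made for the example, and no prior is applied.

```python
import math

def em_haplotype_frequencies(table, max_iter=1000, tol=1e-10):
    """Estimate haplotype frequencies [p00, p01, p10, p11] from a 3x3 genotype table.

    table[i][j] = number of samples carrying i copies of the second allele at
    variant 1 and j copies at variant 2 (assumed layout; missing genotypes excluded).
    """
    n = table
    total = sum(sum(row) for row in n)
    if total == 0:
        raise ValueError("no non-missing genotype pairs")
    # Haplotype counts that are unambiguous (every cell except the double heterozygotes).
    known = [
        2 * n[0][0] + n[0][1] + n[1][0],  # haplotype 00
        2 * n[0][2] + n[0][1] + n[1][2],  # haplotype 01
        2 * n[2][0] + n[1][0] + n[2][1],  # haplotype 10
        2 * n[2][2] + n[2][1] + n[1][2],  # haplotype 11
    ]
    double_hets = n[1][1]
    p = [0.25, 0.25, 0.25, 0.25]  # arbitrary starting point (assumption)
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries 00/11 rather than 01/10.
        cis, trans = p[0] * p[3], p[1] * p[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        new_p = [
            (known[0] + w * double_hets) / (2 * total),
            (known[1] + (1 - w) * double_hets) / (2 * total),
            (known[2] + (1 - w) * double_hets) / (2 * total),
            (known[3] + w * double_hets) / (2 * total),
        ]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p

def ld_stats(p):
    """Return (D, D', r2) from haplotype frequencies [p00, p01, p10, p11]."""
    p00, p01, p10, p11 = p
    pa = p00 + p01  # frequency of the first allele at variant 1
    pb = p00 + p10  # frequency of the first allele at variant 2
    d = p00 - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return d, math.nan, math.nan  # at least one variant is monomorphic
    if d >= 0:
        d_max = min(pa * (1 - pb), (1 - pa) * pb)
    else:
        d_max = min(pa * pb, (1 - pa) * (1 - pb))
    return d, d / d_max, d * d / denom

# Example call on a small, made-up genotype table:
freqs = em_haplotype_frequencies([[10, 0, 0], [0, 20, 0], [0, 0, 10]])
d, d_prime, r2 = ld_stats(freqs)
```

Here D' keeps the sign of D; some tools report its absolute value instead.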
Note: to compute LD, samples are matched between datasets using the primary ID (the first column
in the sample files) by default. To alter this, use the -match-sample-ids option.
This is described further on the page on merging datasets.
Note: Currently LD output is always written to a sqlite database file. In the above command,
results are placed in the file ld.sqlite, in a table called LD; a convenience view, called LDView,
is also constructed. The simplest way to view that data is to use the sqlite3 command-line client
(which is already installed on most Linux systems by default). E.g. the command:
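The results can also be read with Python's standard sqlite3 module. The following is a minimal sketch that relies only on the file name ld.sqlite and the view name LDView mentioned above. The helper name preview_ld is hypothetical, and the sketch makes no assumption about which columns the view contains.

```python
import sqlite3

def preview_ld(db_path="ld.sqlite", view="LDView", limit=10):
    """Return (column_names, rows) for the first `limit` rows of the given view."""
    # The view name is interpolated into the SQL, so accept plain identifiers only.
    if not view.isidentifier():
        raise ValueError("unexpected view name: %r" % view)
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute('SELECT * FROM "%s" LIMIT ?' % view, (limit,))
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        con.close()
    return columns, rows

# Typical use, once QCTOOL has written ld.sqlite:
#   columns, rows = preview_ld("ld.sqlite")
#   print("\t".join(columns))
#   for row in rows:
#       print("\t".join(str(value) for value in row))
```

Note that sqlite3.connect creates an empty database if the file does not exist, in which case the query fails because the view is missing.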
The -min-r2 and -max-ld-distance options can be used, e.g.
The distance can be specified in base pairs (e.g. -max-ld-distance 1000), in kilobases
(e.g. -max-ld-distance 1kb), or in megabases as in the example above.
The -prior-ld-weight option can be used to adjust the strength of this prior, e.g. the command:
As with -snp-stats, the -stratify option tells QCTOOL to compute LD statistics stratified over
subsets of the data. E.g. suppose the sample file contains a column called POP
which reflects the population group of each sample. Then the command:
will compute LD statistics for all pairs of variants in each population. Output columns will be named
in the form POP=<level>:<variable>.
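When post-processing stratified output, it can help to split these column names back into their parts. The sketch below assumes only the POP=<level>:<variable> form described above. The helper name parse_stratified_column and the example level and variable names (GBR, r2) are hypothetical, and the sketch assumes that levels contain no ":" character.

```python
def parse_stratified_column(name):
    """Split 'POP=GBR:r2' into ('POP', 'GBR', 'r2'); return None for other names."""
    head, colon, variable = name.partition(":")
    column, equals, level = head.partition("=")
    if not (colon and equals and column and level and variable):
        return None  # not of the form <column>=<level>:<variable>
    return column, level, variable

def group_by_level(column_names):
    """Map each level to the list of (variable, full column name) pairs found for it."""
    groups = {}
    for name in column_names:
        parsed = parse_stratified_column(name)
        if parsed is not None:
            _, level, variable = parsed
            groups.setdefault(level, []).append((variable, name))
    return groups
```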
For example, suppose we've extracted data for the O blood group mutation rs8176719 from the
1000 Genomes Project data into a file named rs8176719.vcf. This command will
find all tagging SNPs in the flanking region:
Note that currently it's first necessary to make a sample file in the format described here for this to work.
Note: This will also take a while to run, since at present it scans the whole of chromosome 9. A quicker way would be to use bgenix together with QCTOOL in a pipeline.