QCTOOL v2

Manipulating the sample file

Add genotype dosages for given variants to the sample file

The -condition-on option extracts genotype dosages for specified variants and places them as columns in the sample file. E.g.:

$ qctool -g example_#.gen -s example.sample -os example.sample -condition-on rs1234

This adds a column with name rs1234:additive_dosage, containing the additive dosage from the SNP. You can also select SNPs by position:

$ qctool -g example_#.gen -s example.sample -os example.sample -condition-on pos~03:10001

It is also possible to select dominant, recessive, or heterozygote dosages from the SNP. For example, to select additive and heterozygote dosages:

$ qctool -g example_#.gen -s example.sample -os example.sample -condition-on "rs1234(add|het)"

Note: as shown above, it is sometimes necessary to place the argument in quotation marks to avoid problems with the system shell.

The full syntax for this option is: -condition-on [selector~]value[(dosage1[|dosage2...])], where selector is one of "rsid", "snpid", "position", or "pos"; value is a string specifying the ID or position of the required variant; and each dosage is one of add, dom, rec, or het. The value gen adds both additive and heterozygote dosages.
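To make the syntax concrete, here is a minimal Python sketch that splits a -condition-on argument into its parts. It is an illustration only, not QCTOOL's own parser: the function name parse_condition_on is hypothetical, and the sketch assumes that an argument without a dosage list requests the additive dosage only (as in the first example above) and that gen expands to add plus het.

```python
import re

# Selector and dosage names as listed in the syntax description above.
SELECTORS = {"rsid", "snpid", "position", "pos"}
DOSAGES = {"add", "dom", "rec", "het", "gen"}

_PATTERN = re.compile(
    r"^(?:(?P<selector>[^~()]+)~)?"   # optional "selector~" prefix
    r"(?P<value>[^~()]+)"             # variant ID or position
    r"(?:\((?P<dosages>[^()]*)\))?$"  # optional "(dosage1|dosage2...)" suffix
)


def parse_condition_on(arg):
    """Split a -condition-on argument into (selector, value, dosages).

    Hypothetical helper for illustration; selector is None when omitted.
    """
    m = _PATTERN.match(arg)
    if m is None:
        raise ValueError(f"malformed argument: {arg!r}")
    selector = m.group("selector")
    if selector is not None and selector not in SELECTORS:
        raise ValueError(f"unknown selector: {selector!r}")
    raw = m.group("dosages")
    # Assumption: no dosage list means the additive dosage only.
    names = raw.split("|") if raw is not None else ["add"]
    dosages = []
    for name in names:
        if name not in DOSAGES:
            raise ValueError(f"unknown dosage: {name!r}")
        # Assumption: "gen" is shorthand for additive plus heterozygote.
        for d in (["add", "het"] if name == "gen" else [name]):
            if d not in dosages:
                dosages.append(d)
    return selector, m.group("value"), dosages
```

For example, parse_condition_on("pos~03:10001") separates the selector pos from the value 03:10001, and parse_condition_on("rs1234(add|het)") yields the two requested dosages in order.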

Note: this option behaves in a similar way to the -condition_on option found in SNPTEST, but may be preferable because it writes the data directly into the sample file, where it can be inspected later.

Quantile-normalise columns of the sample file

The -quantile-normalise option takes a numerical column in the sample file and adds a new column containing the same data after mapping its empirical quantiles to those of a standard Gaussian distribution. E.g.:

$ qctool -g example.bgen -s example.sample -os example.sample -quantile-normalise column1

This adds a column with name column1:quantile-normalised to the sample file. The specified columns must be continuous, i.e. of type 'P' or 'C'. (Note that even though the SNPs are not used here, currently you must still specify the -g option.)

The screen output (and log file) will contain details of the column values before and after normalisation.

Technical note: suppose there are N samples. This option first computes the quantiles q_j = Φ^-1(j/(N+1)) for each j=1,...,N, where Φ is the cumulative distribution function of a standard Gaussian. The normalised value for sample i is then n_i = q_r(i), where r(i) is the rank of sample i's value among the values of the given column. If several samples are tied, n_i is instead set to the average of q_r(i) across all samples with the same column value. This makes the normalised value well-defined (i.e. independent of how the tied samples are ordered), and means that samples with identical values in the input column also have identical values in the quantile-normalised column.
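The procedure in the technical note can be sketched in a few lines of Python. This is an illustrative reimplementation based on the description above, not QCTOOL's source code: the function name quantile_normalise is hypothetical, and the sketch assumes that every sample has a non-missing numerical value.

```python
from collections import defaultdict
from statistics import NormalDist


def quantile_normalise(values):
    """Map values to standard Gaussian quantiles, averaging over ties.

    Hypothetical helper; assumes no missing values in the input.
    """
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # q_j = inverse CDF evaluated at j / (N + 1), for j = 1..N
    quantiles = [std_normal.inv_cdf(j / (n + 1)) for j in range(1, n + 1)]
    # Sample indices in ascending order of value: order[k] has rank k + 1
    order = sorted(range(n), key=lambda i: values[i])
    # Collect the quantiles assigned to each distinct input value
    by_value = defaultdict(list)
    for rank0, i in enumerate(order):
        by_value[values[i]].append(quantiles[rank0])
    # Tied samples all receive the average of their quantiles
    mean_q = {v: sum(qs) / len(qs) for v, qs in by_value.items()}
    return [mean_q[v] for v in values]
```

Calling quantile_normalise([1.0, 1.0, 2.0, 2.0]) gives the two tied samples in each group the same normalised value, whichever order the sort happens to place them in.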

Note that -quantile-normalise can be given a list of columns:

$ qctool -g example.bgen -s example.sample -os example.sample -quantile-normalise column1 column2...

This produces a set of quantile-normalised columns, one for each specified input column.