Manipulating the sample file
The -condition-on option can be used to extract genotype dosages for specified variants
and place them as columns in the sample file. E.g.:
Note: as shown above, it is sometimes necessary to place the argument in quotation marks to avoid problems with the system shell.
The full syntax for this option is:
selector is one of "rsid", "snpid", "position" or "pos",
is a string specifying the ID or position of the required variant, and dosage can be
gen adds both additive and heterozygote dosages.
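As a rough illustration of what additive and heterozygote dosages mean, the sketch below computes both from a triple of genotype probabilities. It is not QCTOOL's code: the function name dosages_from_probs is hypothetical, and the convention that the additive dosage counts copies of the second allele (B) is an assumption made for this example.

```python
def dosages_from_probs(p_aa, p_ab, p_bb):
    """Return (additive, heterozygote) dosages for one sample at one variant.

    Assumption for illustration: the additive dosage is the expected number
    of copies of the second allele (B), and the heterozygote dosage is the
    probability of the AB genotype.
    """
    additive = p_ab + 2.0 * p_bb
    heterozygote = p_ab
    return additive, heterozygote


# A confidently called heterozygote carries one copy of B.
print(dosages_from_probs(0.0, 1.0, 0.0))
# A confidently called BB homozygote carries two copies and is not heterozygous.
print(dosages_from_probs(0.0, 0.0, 1.0))
```

Under these assumptions, asking for both dosages amounts to adding two columns per variant, one for each value in the returned pair.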
Note: these options behave in a similar way to the
found in SNPTEST,
but may be preferable because they can write the data into the sample file directly where it can be
The -quantile-normalise option takes a numerical column in the sample file
and adds a new column containing the same data after mapping its empirical quantiles to those of a
standard Gaussian distribution. E.g.:
This adds a column named column1:quantile-normalised to the sample file.
The specified columns must be continuous, i.e. of type 'P' or 'C'.
(Note that even though the SNPs are not used here, currently you must still specify the -g option.)
The screen output (and log file) will contain details of the column values before and after normalisation.
Technical note: specifically, suppose there are N samples. This option first computes the quantiles $q_j = \Phi^{-1}\left(\frac{j}{N+1}\right)$ for each $j = 1, \dots, N$, where $\Phi$ is the cumulative distribution function of a standard Gaussian. The normalised value for sample $i$ is then $n_i = q_{r(i)}$, where $r(i)$ is the rank of sample $i$ among the values of the given column. If several samples are tied, $n_i$ is instead set to the average of $q_{r(i)}$ across all samples with the same column value. This makes the normalised value well-defined (i.e. independent of how the tied samples are ranked), and means that samples with identical values in the input column also have identical values in the quantile-normalised column.
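The procedure in the technical note can be sketched in a few lines of Python. This is an illustrative reimplementation based only on the description above, not QCTOOL's own code. The function name quantile_normalise is hypothetical, and the sketch assumes every value is present (it does not handle missing data).

```python
from collections import defaultdict
from statistics import NormalDist


def quantile_normalise(values):
    """Map values to standard Gaussian quantiles, averaging over ties."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1

    # q[j - 1] is the inverse CDF evaluated at j / (N + 1), for j = 1..N.
    q = [std_normal.inv_cdf(j / (n + 1)) for j in range(1, n + 1)]

    # Sort sample indices by value; the position in this order is the rank.
    order = sorted(range(n), key=lambda i: values[i])

    # Gather the quantiles assigned to each distinct value, so that tied
    # samples can share the average of their quantiles.
    quantiles_by_value = defaultdict(list)
    for rank, i in enumerate(order):
        quantiles_by_value[values[i]].append(q[rank])

    return [
        sum(quantiles_by_value[v]) / len(quantiles_by_value[v])
        for v in values
    ]


# Three distinct values map to the quantiles at 1/4, 1/2 and 3/4,
# i.e. roughly 0.674, -0.674 and 0.0 in the input order below.
print(quantile_normalise([3.0, 1.0, 2.0]))

# The two tied samples receive the same normalised value.
print(quantile_normalise([1.0, 1.0, 2.0]))
```

Averaging per distinct value means the result does not depend on how the sort happens to order tied samples, which is the property the note describes.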
-quantile-normalise can be given a list of columns: