Merging variants from one dataset into another
The -merge-in option can be used to merge the variants from one dataset into another. For example:
This command produces a dataset that contains a record for each variant
from first.bgen and a record for each variant from second.bgen,
i.e. it has L1+L2 variants, where L1 and L2 are the numbers of variants
in the two datasets.
Data is output for the set of samples in the first dataset; any other samples in the merged-in dataset are ignored.
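The variant and sample behaviour described above can be illustrated with a small Python sketch. It is a toy model, not the tool's implementation: the merge_in function and the dictionary layout are hypothetical, output ordering is ignored, and treating first-dataset samples that are absent from the merged-in dataset as missing (None) is an assumption made for illustration.

```python
def merge_in(first, second):
    """Toy model of merging the variants of `second` into `first`.

    Each dataset is a dict with a "samples" list and a "variants" list of
    (variant_id, {sample_id: genotype}) pairs. This layout is hypothetical.
    """
    samples = first["samples"]  # output samples come from the first dataset only
    merged = []
    for dataset in (first, second):
        for variant_id, genotypes in dataset["variants"]:
            # Samples not present in this dataset are assumed to be missing (None).
            row = [genotypes.get(sample) for sample in samples]
            merged.append((variant_id, row))
    return samples, merged


first = {
    "samples": ["s1", "s2"],
    "variants": [("rs1", {"s1": 0, "s2": 1}), ("rs3", {"s1": 2, "s2": 0})],
}
second = {
    "samples": ["s2", "s3"],
    "variants": [("rs2", {"s2": 1, "s3": 2})],
}

samples, merged = merge_in(first, second)
print(samples)      # ['s1', 's2']  (s3 is ignored)
print(len(merged))  # 3  (2 variants from first + 1 from second)
```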
By default, samples are matched by the first ID column in each dataset.
The -match-sample-ids option can be used to change this. For example:
column1 and column2 are columns in first.sample and second.sample respectively,
containing the fields to match on.
We recommend that sample file columns used to match samples should contain unique sample identifiers.
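To show why unique identifiers matter, here is a minimal Python sketch of matching samples on one column from each sample file. The match_samples function is hypothetical and does not reflect the tool's internals; rejecting duplicate identifiers with an error is an assumption of this sketch, chosen because a duplicated identifier makes the match ambiguous.

```python
from collections import Counter


def match_samples(first_ids, second_ids):
    """For each ID in first_ids, return the index of the same ID in second_ids, or None."""
    for name, ids in (("first", first_ids), ("second", second_ids)):
        duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
        if duplicates:
            # With repeated IDs there is no single correct match, so refuse to guess.
            raise ValueError(f"non-unique IDs in {name} dataset: {duplicates}")
    lookup = {sample_id: index for index, sample_id in enumerate(second_ids)}
    return [lookup.get(sample_id) for sample_id in first_ids]


mapping = match_samples(["A", "B", "C"], ["C", "A", "D"])
print(mapping)  # [1, None, 0]
```

Sample D appears only in the second list and so has no entry in the mapping, consistent with samples outside the first dataset being ignored.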
The -merge-strategy option controls what happens when the same variant appears in both
datasets. Possible values are keep-all (the default) or drop-duplicates.
For example:
In this command, if the same variant appears in first.bgen and in second.bgen,
only the first will be output. As when combining datasets,
the -compare-variants-by option is used to control how variants are compared, and it is assumed
that variants are sorted by these fields in each input dataset.
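The two behaviours, and the reason sorted input is assumed, can be sketched as a standard two-way merge of sorted lists. This is an illustration only: merge_variants, its drop_duplicates flag, and the tuple layout (chromosome, position, allele1, allele2) are hypothetical, and the comparison key stands in for whatever fields are selected for comparing variants.

```python
def merge_variants(first, second, drop_duplicates=False, key=lambda v: v):
    """Merge two lists of variants that are each sorted by `key`.

    Returns (source, variant) pairs. On ties the variant from `first` is
    emitted before the one from `second`; with drop_duplicates=True a variant
    from `second` that compares equal to one from `first` is skipped.
    """
    merged = []
    last_first_key = None
    i = j = 0
    while i < len(first) or j < len(second):
        take_first = j >= len(second) or (
            i < len(first) and key(first[i]) <= key(second[j])
        )
        if take_first:
            last_first_key = key(first[i])
            merged.append(("first", first[i]))
            i += 1
        else:
            # Because both inputs are sorted, a duplicate can only match the
            # most recently emitted variant from the first dataset.
            is_duplicate = last_first_key is not None and key(second[j]) == last_first_key
            if not (drop_duplicates and is_duplicate):
                merged.append(("second", second[j]))
            j += 1
    return merged


first = [("01", 100, "A", "G"), ("01", 200, "C", "T")]
second = [("01", 100, "A", "G"), ("01", 150, "G", "T")]

print(len(merge_variants(first, second)))                        # 4
print(len(merge_variants(first, second, drop_duplicates=True)))  # 3
```

If either input were unsorted, the single-pass comparison above could miss duplicates, which is why the sort order matters.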
To further help disambiguate the source of data in the output file,
the -merge-prefix option can also be used to add a prefix to the identifier of each merged-in variant, e.g.:
Currently this only affects the 'alternate' identifier fields (e.g. the SNPID field of GEN or BGEN files).
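As a rough illustration of the effect, the Python sketch below prefixes only the alternate identifier of a merged-in variant and leaves the other fields untouched. The add_merge_prefix function, the field names in the dictionary, and the example prefix are all hypothetical; plain string concatenation is assumed.

```python
def add_merge_prefix(variant, prefix):
    """Return a copy of `variant` with the prefix added to its SNPID field only."""
    prefixed = dict(variant)
    prefixed["SNPID"] = prefix + variant["SNPID"]  # rsid and position are unchanged
    return prefixed


variant = {"SNPID": "snp_0001", "rsid": "rs123", "position": 1000}
result = add_merge_prefix(variant, "second:")
print(result["SNPID"])  # second:snp_0001
print(result["rsid"])   # rs123
```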