The number of human genomes being genotyped or sequenced is increasing exponentially, and efficient haplotype estimation methods able to handle this amount of data are now required. Here we present a method, SHAPEIT4, which substantially improves upon other methods for processing large genotype and high-coverage sequencing datasets. Notably, its running time scales sub-linearly with sample size, it provides highly accurate haplotypes, and it allows the integration of external phasing information such as large reference panels of haplotypes, collections of pre-phased variants and long sequencing reads. We provide SHAPEIT4 in an open source format and demonstrate its performance in terms of accuracy and running times on two gold standard datasets: the UK Biobank data and the Genome in a Bottle data.

Original publication




Journal article


Nature Communications

Department of Computational Biology, University of Lausanne, Génopode, 1015, Lausanne, Switzerland.


Humans; Data Interpretation, Statistical; Sample Size; Sequence Analysis, DNA; Genotype; Haplotypes; Polymorphism, Single Nucleotide; Software; Biological Specimen Banks; High-Throughput Nucleotide Sequencing; Datasets as Topic