Cookies on this website

We use cookies to ensure that we give you the best experience on our website. If you click 'Accept all cookies' we'll assume that you are happy to receive all cookies and you won't see this message again. If you click 'Reject all non-essential cookies' only necessary cookies providing core functionality such as security, network management, and accessibility will be enabled. Click 'Find out more' for information on how to change your cookie settings.

<jats:title>Abstract</jats:title><jats:sec><jats:title>Background</jats:title><jats:p>Non-coding variants have emerged as important contributors to the pathogenesis of human diseases, not only as common susceptibility alleles but also as rare high-impact variants. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging.</jats:p></jats:sec><jats:sec><jats:title>Results</jats:title><jats:p>We integrated 24 data sources to develop a standardized collection of 2.4 million regulatory elements in the human genome, transcription factor binding sites, DNase peaks, ultra-conserved non-coding elements, and super-enhancers. Information on controlled gene(s), tissue(s) and associated phenotype(s) are provided for regulatory elements when possible. We also calculated a variation constraint metric for regulatory regions and showed that genes controlled by constrained regions are more likely to be disease-associated genes and essential genes from mouse knock-out screenings. Finally, we evaluated 16 non-coding impact prediction scores providing suggestions for variant prioritization. The companion tool allows for annotation of VCF files with information about the regulatory regions as well as non-coding prediction scores to inform variant prioritization. The proposed annotation framework was able to capture previously published disease-associated non-coding variants and its integration in a routine prioritization pipeline increased the number of candidate genes, including genes potentially correlated with patient phenotype, and established clinically relevant genes.</jats:p></jats:sec><jats:sec><jats:title>Conclusion</jats:title><jats:p>We have developed a resource for the annotation and prioritization of regulatory variants in WGS analysis to support the discovery of candidate disease-associated variants in the non-coding genome.</jats:p></jats:sec>

Original publication

DOI

10.1101/2020.09.17.301960

Type

Journal article

Publisher

Cold Spring Harbor Laboratory

Publication Date

20/09/2020