A molecular barcode and online tool to identify and map imported infection with Plasmodium vivax
Trimarsanto H., Amato R., Pearson RD., Sutanto E., Noviyanti R., Trianty L., Marfurt J., Pava Z., Echeverry DF., Lopera-Mesa TM., Montenegro LM., Tobón-Castaño A., Grigg MJ., Barber B., William T., Anstey NM., Getachew S., Petros B., Aseffa A., Assefa A., Rahim AG., Chau NH., Hien TT., Alam MS., Khan WA., Ley B., Thriemer K., Wangchuck S., Hamedi Y., Adam I., Liu Y., Gao Q., Sriprawat K., Ferreira MU., Barry A., Mueller I., Drury E., Goncalves S., Simpson V., Miotto O., Miles A., White NJ., Nosten F., Kwiatkowski DP., Price RN., Auburn S.
<jats:title>Abstract</jats:title><jats:p>Imported cases present a considerable challenge to the elimination of malaria. Traditionally, patient travel history has been used to identify imported cases, but the long-latency liver stages confound this approach in <jats:italic>Plasmodium vivax</jats:italic>. Molecular tools to identify and map imported cases offer a more robust approach that can be combined with drug resistance and other surveillance markers in high-throughput, population-based genotyping frameworks. Using a machine learning approach incorporating hierarchical FST (HFST) and decision tree (DT) analysis applied to 831 <jats:italic>P. vivax</jats:italic> genomes from 20 countries, we identified a 28-single nucleotide polymorphism (SNP) barcode with high capacity to predict the country of origin. The Matthews correlation coefficient (MCC), which measures the quality of the classifications on a scale from −1 (total disagreement) to 1 (perfect prediction), exceeded 0.9 in 15 countries in cross-validation evaluations. When combined with an existing 37-SNP <jats:italic>P. vivax</jats:italic> barcode, the 65-SNP panel exhibits MCC scores exceeding 0.9 in 17 countries with up to 30% missing data. As a secondary objective, several genes were identified with moderate MCC scores (median MCC ranging from 0.54 to 0.68), making them suitable as markers for rapid testing using low-throughput genotyping approaches. A likelihood-based classifier framework was established that supports analysis of missing data and polyclonal infections. To facilitate investigator-led analyses, the likelihood framework is provided as a web-based, open-access platform (vivaxGEN-geo) to support the analysis and interpretation of data produced with either the 28-SNP core or the full 65-SNP barcode. These tools can be used by malaria control programs to identify the main reservoirs of infection so that resources can be focused where they are needed most.</jats:p>
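For readers unfamiliar with the MCC, the following minimal Python sketch illustrates how a per-country score could be computed from true and predicted countries of origin. It assumes a one-vs-rest treatment of each country, which the abstract does not specify, and it is not the authors' evaluation code. The function names `mcc` and `country_mcc` and the toy labels are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def country_mcc(true_labels, predicted_labels, country):
    """One-vs-rest MCC for a single country (assumed evaluation scheme)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == country and pred == country:
            tp += 1
        elif truth != country and pred != country:
            tn += 1
        elif truth != country and pred == country:
            fp += 1
        else:
            fn += 1
    return mcc(tp, tn, fp, fn)


# Toy labels, not data from the study.
true_labels = ["A", "A", "B", "B", "C", "C"]
predicted_labels = ["A", "A", "B", "C", "C", "C"]

for country in ("A", "B", "C"):
    print(country, round(country_mcc(true_labels, predicted_labels, country), 3))
```

Because the MCC uses all four cells of the confusion matrix, it remains informative when a country contributes only a small share of the samples, a setting in which plain accuracy can look deceptively high.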