MinION Analysis and Reference Consortium: Phase 2 data release and analysis of R9.0 chemistry
Jain M., Tyson JR., Loose M., Ip CLC., Eccles DA., O'Grady J., Malla S., Leggett RM., Wallerman O., Jansen HJ., Zalunin V., Birney E., Brown BL., Snutch TP., Olsen HE.
<ns4:p>Background: Long-read sequencing is rapidly evolving and reshaping the suite of opportunities for genomic analysis. For the MinION in particular, as both the platform and chemistry develop, the user community requires reference data to set performance expectations and maximally exploit third-generation sequencing. We performed an analysis of MinION data derived from whole genome sequencing of <ns4:italic>Escherichia</ns4:italic> <ns4:italic>coli</ns4:italic> K-12 using the R9.0 chemistry, comparing the results with the older R7.3 chemistry.</ns4:p><ns4:p> Methods: We computed error-rate estimates for insertions, deletions, and mismatches in MinION reads.</ns4:p><ns4:p> Results: Run-time characteristics of the flow cell and run scripts for R9.0 were similar to those observed for R7.3 chemistry, but with an 8-fold increase in the number of bases per second (bps) processed by individual nanopores (from 30 bps with R7.3 and the SQK-MAP005 library preparation to 250 bps with R9.0), and less drop-off in yield over time. The 2-dimensional (“2D”) N50 read length was unchanged from the prior chemistry. Using the proportion of alignable reads as a measure of base-call accuracy, 99.9% of “pass” template reads from 1-dimensional (“1D”) experiments were mappable and ~97% from 2D experiments. The median identity of reads was ~89% for 1D and ~94% for 2D experiments. The total error rate (miscall + insertion + deletion) decreased for 2D “pass” reads from 9.1% in R7.3 to 7.5% in R9.0 and for template “pass” reads from 26.7% in R7.3 to 14.5% in R9.0.</ns4:p><ns4:p> Conclusions: These Phase 2 MinION experiments serve as a baseline by providing estimates for read quality, throughput, and mappability. 
The datasets further enable the development of bioinformatic tools tailored to the new R9.0 chemistry and the design of novel biological applications for this technology.</ns4:p><ns4:p> Abbreviations: K: thousand, Kb: kilobase (one thousand base pairs), M: million, Mb: megabase (one million base pairs), Gb: gigabase (one billion base pairs).</ns4:p>
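<ns4:p> Illustration: The two summary statistics quoted above, the total error rate (miscall + insertion + deletion) and the N50 read length, can be sketched in a few lines of Python. This is a minimal sketch and not the consortium's analysis pipeline. The names <ns4:italic>AlignmentCounts</ns4:italic>, <ns4:italic>error_rates</ns4:italic> and <ns4:italic>n50</ns4:italic> are hypothetical, and the choice to normalise error rates by alignment length (matches + mismatches + insertions + deletions) is an assumption, since the abstract does not state the denominator.</ns4:p>

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    """Alignment column counts for one read (hypothetical container)."""
    matches: int
    mismatches: int
    insertions: int
    deletions: int


def error_rates(c: AlignmentCounts) -> dict:
    """Return mismatch, insertion, deletion and total error rates.

    Assumption: each rate is normalised by alignment length, i.e.
    matches + mismatches + insertions + deletions.
    """
    length = c.matches + c.mismatches + c.insertions + c.deletions
    if length == 0:
        raise ValueError("alignment has no columns")
    return {
        "mismatch": c.mismatches / length,
        "insertion": c.insertions / length,
        "deletion": c.deletions / length,
        # Total error rate = miscall + insertion + deletion.
        "total": (c.mismatches + c.insertions + c.deletions) / length,
    }


def n50(read_lengths) -> int:
    """Largest length L such that reads of length >= L contain
    at least half of all sequenced bases."""
    ordered = sorted(read_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no reads supplied")
```

<ns4:p> With made-up counts of 850 matches, 60 mismatches, 30 insertions and 60 deletions, <ns4:italic>error_rates</ns4:italic> gives a total error rate of 0.15. These numbers are purely illustrative and are not taken from the Phase 2 data.</ns4:p>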