STITCH
Last update: October 2, 2016
STITCH is an R program for reference-panel-free, read-aware genotype imputation from low-coverage sequencing data. STITCH runs on a set of samples with sequencing reads in BAM format, together with a list of positions to genotype, and outputs imputed genotypes in VCF format.
STITCH works by modelling each chromosome in the set of samples as a mosaic of K unknown founder (ancestral) haplotypes. STITCH employs a hidden Markov model whose parameters are iteratively updated using expectation maximization. Both steps are read aware and require no external reference haplotype sets.
STITCH has been tested on low-coverage mouse and human data. STITCH has also been run on mouse genotyping-by-sequencing (GBS) data, although it may encounter issues when the number of reads approaches 0. For guidance on parameter options (such as K), please see the supplementary note in Davies et al., cited below.
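To make the mosaic idea concrete, here is a toy R sketch. It is not STITCH's implementation: it assumes haploid samples, alleles observed directly rather than through reads, known founder haplotypes, and fixed parameters. The names and values switch_prob, err and nSNPs are illustrative assumptions. It runs a single forward-backward pass, which corresponds to the expectation step that EM would alternate with parameter updates.

```r
# Toy haploid mosaic HMM (illustrative only, not STITCH's model or code)
set.seed(1)
K <- 4               # number of founder haplotypes
nSNPs <- 50          # number of SNPs (assumed value)
switch_prob <- 0.05  # assumed per-SNP probability of re-drawing the founder
err <- 0.01          # assumed allele mismatch probability used by the model

# Founder haplotypes: K x nSNPs matrix of 0/1 alleles
founders <- matrix(rbinom(K * nSNPs, 1, 0.5), nrow = K)

# Simulate a hidden founder path and the haplotype copied from it
# (in this toy the haplotype is copied without error)
state <- integer(nSNPs)
state[1] <- sample(K, 1)
for (s in 2:nSNPs) {
  state[s] <- if (runif(1) < switch_prob) sample(K, 1) else state[s - 1]
}
hap <- founders[cbind(state, seq_len(nSNPs))]

# Transition matrix: stay with prob 1 - switch_prob, else pick a founder uniformly
trans <- matrix(switch_prob / K, K, K) + diag(1 - switch_prob, K)
# Emission probability of the observed allele at SNP s for each founder
emis <- function(s) ifelse(founders[, s] == hap[s], 1 - err, err)

# Scaled forward pass
alpha <- matrix(0, K, nSNPs)
a <- emis(1) / K
alpha[, 1] <- a / sum(a)
for (s in 2:nSNPs) {
  a <- as.vector(crossprod(trans, alpha[, s - 1])) * emis(s)
  alpha[, s] <- a / sum(a)
}

# Scaled backward pass
beta <- matrix(1, K, nSNPs)
for (s in (nSNPs - 1):1) {
  b <- as.vector(trans %*% (emis(s + 1) * beta[, s + 1]))
  beta[, s] <- b / sum(b)
}

# Posterior probability of each founder at each SNP
gamma <- alpha * beta
gamma <- sweep(gamma, 2, colSums(gamma), "/")
```

Calling apply(gamma, 2, which.max) gives the most probable founder at each SNP, which can be compared against the simulated state vector.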
License
STITCH is free for academic use. For commercial inquiries, please contact Robert Davies (robertwilliamdavies at gmail dot com) and Simon Myers (myers at stats dot ox dot ac dot uk).
Changelog
- v1.1.4
- Can work from CRAM files as well as BAM files. To use CRAM files, see the cramlist and reference variables, or see the examples script
- Output VCF now reports genotype posterior probabilities (GP) instead of genotype likelihoods (GL)
- v1.1.3
- Changed R example script to work on provided example data
- Changed default downsampleToCov to 50 to reduce likelihood of overflow at high coverage SNPs
- Miscellaneous small fixes to BAM conversion script to better handle samples with very few reads
- Changed default region where reads from BAM are loaded (chrStart and chrEnd) to NA, to be inferred from posfile and the region to be imputed, rather than to grab reads from the whole chromosome
- Fixed ability to use high coverage validation samples (genfile) when using generateInputOnly and regenerateInput
- v1.1.2
- Fixed typo in header of outputted VCF
- v1.1.1
- Fixed bug where samples with no reads on a chromosome gave an old input format
- v1.1.0
- Changed default output to VCF (added option outputBlockSize to control how it is written)
- Removed two package dependencies
- Removed ability to write output to .gen format (removed outputGenFormat)
- Added change log to README and made miscellaneous other changes
- v1.0.1
- Added ability to use soft clipped bases
- v1.0.0
Downloads
- STITCH_1.1.4.tar.gz STITCH program. For installation see README or example below
- STITCH_README_1.1.4.txt README file. Includes description, change log, installation information, included files, description of required or important program options and output
- STITCH_1.1.4.pdf PDF with command options available in R to run STITCH
- STITCH_examples_1.1.4.R Some examples of commands that can be used to run STITCH, illustrated on the example data
- STITCH_example_2016_05_10.tgz Example data to test installation of STITCH and demonstrate functionality
- mm10_2016_10_02.fa.gz Mouse reference fasta used for aligning example data. Useful to test CRAM functionality
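As a hedged sketch of how the CRAM option might be exercised with the reference above: the cramlist and reference arguments are the variables named in the change log, but the file names "cramlist.txt" and "mm10_2016_10_02.fa" are hypothetical, and whether the reference must be decompressed or indexed first should be checked against the README. The remaining arguments mirror the BAM example.

```r
# Assumed inputs: a text file listing CRAM paths and a local reference fasta.
# Both file names below are hypothetical placeholders.
cram_args <- list(
  chr = "chr19",
  cramlist = "cramlist.txt",           # used in place of bamlist
  reference = "mm10_2016_10_02.fa",    # fasta the CRAM files were aligned to
  posfile = "pos.txt",
  outputdir = paste0(getwd(), "/"),
  K = 4,
  nGen = 100,
  nCores = 1
)

# Only run if STITCH is installed and the (hypothetical) cramlist exists
if (requireNamespace("STITCH", quietly = TRUE) && file.exists(cram_args$cramlist)) {
  do.call(STITCH::STITCH, cram_args)
}
```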
Complete working example
- Install R if not already installed. Install the R dependencies Rcpp and RcppArmadillo from CRAN (using install.packages within R) and Rsamtools from Bioconductor. The parallel package is also required but ships with R
- Download STITCH from above. Install it by opening R and passing the path of the downloaded STITCH tar.gz file to install.packages
- Download the example tar.gz from above and extract it using a command such as tar -xzvf
- Run STITCH. Open R, load the package with library(STITCH), change your working directory using setwd() to the directory where the example tar.gz was extracted, and then run STITCH(tempdir = tempdir(), chr = "chr19", bamlist = "bamlist.txt", posfile = "pos.txt", genfile = "gen.txt", outputdir = paste0(getwd(), "/"), K = 4, nGen = 100, nCores = 1). Once complete, a VCF named stitch.chr19.vcf.gz should appear in the current working directory
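The steps above can be collected into one R script, sketched below. The STITCH call and file names are taken from this page; the run_install switch is a hypothetical convenience, and BiocManager is the current Bioconductor installer, which is an assumption relative to the 2016 instructions. The script assumes the downloads sit in the working directory and that you have moved to the folder containing bamlist.txt before running.

```r
run_install <- FALSE  # hypothetical switch: set to TRUE for a first-time setup

if (run_install) {
  # Step 1: dependencies (parallel ships with R, so it is not installed here)
  install.packages(c("Rcpp", "RcppArmadillo"))
  install.packages("BiocManager")
  BiocManager::install("Rsamtools")
  # Step 2: install STITCH from the downloaded source tarball
  install.packages("STITCH_1.1.4.tar.gz", repos = NULL, type = "source")
  # Step 3: extract the example data (equivalent to tar -xzvf)
  untar("STITCH_example_2016_05_10.tgz")
}

# Step 4: arguments exactly as given in the example on this page.
# Use setwd() first so that bamlist.txt, pos.txt and gen.txt are in getwd().
stitch_args <- list(
  tempdir = tempdir(),
  chr = "chr19",
  bamlist = "bamlist.txt",
  posfile = "pos.txt",
  genfile = "gen.txt",
  outputdir = paste0(getwd(), "/"),
  K = 4,
  nGen = 100,
  nCores = 1
)

# Run only when STITCH is installed and the example inputs are present
if (requireNamespace("STITCH", quietly = TRUE) && file.exists(stitch_args$bamlist)) {
  do.call(STITCH::STITCH, stitch_args)
  # TRUE if the expected output was written to the working directory
  print(file.exists("stitch.chr19.vcf.gz"))
}
```

Building the argument list separately from the call makes it easy to change a single option, such as K or nCores, and rerun.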
Contact
If you have any problems running STITCH, please contact Robert Davies (robertwilliamdavies at gmail dot com).
Citation
If you use STITCH, please cite:
Davies, R. W., Flint, J., Myers, S. & Mott, R. Rapid genotype imputation from sequence without reference panels. Nat. Genet. 48, 965-969 (2016).