Introduction. bingwa is a command-line tool used to carry out meta-analysis of genome-wide association studies. It is particularly suited for use with the program SNPTEST, but can also be used with other association testing programs such as MMM. Features of bingwa include:
- Direct support for SNPTEST output files.
- Support for fixed-effect frequentist meta-analysis and a flexible Bayesian meta-analysis method.
- Support for both univariate and multivariate meta-analysis (including support for meta-analysing tests performed using SNPTEST's multinomial test).
This page documents version 2 of bingwa.
Citing bingwa.
If you make use of bingwa for multivariate analysis, please cite:
- "Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia and Oceania", doi:10.1038/s41467-019-13480-z, Malaria Genomic Epidemiology Network, Nature Communications (2019)
If you make use of bingwa for univariate analysis please cite the following paper:
- "A novel locus of resistance to severe malaria in a region of ancient balancing selection", doi:10.1038/nature15390, Malaria Genomic Epidemiology Network, Nature (2015)
The methodological ideas underlying bingwa were developed and applied in several other papers:
- "Bayesian meta-analysis across genome-wide association studies of diverse phenotypes", doi:10.1002/gepi.22202, Genetic Epidemiology (2019).
- "Reappraisal of known malaria resistance loci in a large multi-centre study", doi:10.1038/ng.3107, Rockett K. A. et al, Nat. Genetics (2014).
- "Imputation-Based Meta-Analysis of Severe Malaria in Three African Populations", doi:10.1371/journal.pgen.1003509, Band G. et al, PLOS Genetics (2013)
- "Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke", doi:10.1038/ng.1081, International Stroke Genetics Consortium (ISGC) and Wellcome Trust Case Control Consortium 2 (WTCCC2), Nat. Genetics (2012)
Acknowledgements. The bingwa software was developed by Gavin Band at the Wellcome Centre for Human Genetics. Several people contributed to development of the underlying methods - including Chris C. A. Spencer, Matti Pirinen, Quang Si Le, Geraldine C. Clarke, and Holly Trochet.
(This page is under construction.)
Briefly, bingwa proceeds by:
- Loading association test data from a set of cohorts (e.g. SNPTEST files).
- Matching variants between cohorts.
- Computing frequentist meta-analysis and one or more Bayes factors.
- Writing results.
We'll update this page with more details on the method soon. In the meantime the tutorial has usage examples.
Running bingwa
The option -o sqlite://<filename> can be used to force sqlite3 output format.
Bayes factors for the two models are reported in the fixed:bf and independent:bf columns.
Here, the independent model assumes that true effects are uncorrelated between the cohorts - i.e. they have a diagonal variance-covariance matrix.
In addition to the two model-specific Bayes factors, bingwa will compute an equally weighted model-averaged Bayes factor over all the models specified, named mean_bf.
Note: In the above command we use a shorthand version of the prior specification, identified because it is of the form tau=... .
Here, the first part (tau) denotes the between-cohort correlation, while the rest denotes the prior covariance matrix within each cohort. In this case there is one effect parameter in each cohort, so the prior covariance is the 1x1 matrix (0.2^2), and the correlation between cohorts is 1 (for the fixed-effect model) or 0 (for the independent-effect model).
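As a rough illustration of what the shorthand describes, the sketch below builds the prior covariance matrix across cohorts for the univariate case, using the prior standard deviation of 0.2 from the example. The function name `shorthand_prior_covariance` is hypothetical and this is not bingwa's own code; it only spells out the matrix implied by the description above, assuming one effect parameter per cohort.

```python
def shorthand_prior_covariance(n_cohorts, tau, sd):
    """Prior covariance of the per-cohort effects for a univariate shorthand prior.

    Assumptions: one effect per cohort, prior variance sd^2 in every cohort,
    and correlation tau between every pair of cohorts.
    """
    variance = sd * sd
    return [
        [variance if i == j else tau * variance for j in range(n_cohorts)]
        for i in range(n_cohorts)
    ]


# Fixed-effect model: effects perfectly correlated between cohorts (tau = 1).
fixed = shorthand_prior_covariance(2, tau=1.0, sd=0.2)

# Independent-effect model: diagonal covariance matrix (tau = 0).
independent = shorthand_prior_covariance(2, tau=0.0, sd=0.2)
```

With tau=1 every entry of the matrix equals 0.2^2, while with tau=0 only the diagonal entries are non-zero.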
While the differences in this example are small, this feature comes into its own when many cohorts are analysed. However, we always advise checking the screen output to debug model definitions.
The -priors option tells bingwa to load prior specifications from a file.
Priors in the file should be specified as on the command line, but with a couple of additional features:
- Any line starting with # is ignored - this allows models to be commented.
- Line breaks (newlines) within model specifications are ignored if they follow one of the 'punctuation' characters - i.e. immediately after a colon (:), slash (/), or comma character. This allows long models to be formatted across lines.
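The two rules above can be sketched as a small reader. This is only an illustration of the rules as described, not bingwa's parser: the function name `read_prior_specs` is hypothetical, the example model strings are made up in the `<name>:tau=.../sd=.../cor=...` shape, and stripping surrounding whitespace from each line is an assumption.

```python
def read_prior_specs(text):
    """Return one string per prior specification found in `text`."""
    specs = []
    pending = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        pending += line
        # A trailing ':', '/' or ',' means the specification continues.
        if pending[-1] in ":/,":
            continue
        specs.append(pending)
        pending = ""
    if pending:
        specs.append(pending)
    return specs


example = """
# Fixed-effect model, formatted across several lines
fixed:
tau=1/
sd=0.2/cor=1
# Independent-effect model on one line
independent:tau=0/sd=0.2/cor=1
"""
specs = read_prior_specs(example)
```

Here `specs` holds two strings, one per model, with the line breaks after `:` and `/` removed.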
Using a priors file has the advantage of allowing priors to be stored and edited separately from the command lines.
Note: By default all models are equally weighted in the model-averaged Bayes factor. Use the -prior-weights option to reweight them. Weights are renormalised to sum to one in the mean Bayes factor computation - i.e. the mean Bayes factor is computed as
$$\text{BF}_{\text{avg}} = \frac{\sum_i \text{weight}_i \cdot \text{BF}_i}{\sum_i \text{weight}_i}$$
where $\text{weight}_i$ and $\text{BF}_i$ are the specified weight and Bayes factor for model $i$.
If -prior-weights is specified, any model not specified in the argument is given weight zero in the mean Bayes factor.
The argument to -prior-weights can also be the name of a file, in which case bingwa will open the file and read prior weights from it (in the same format, i.e. <model name>=<weight>).
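The weighting rule can be written out directly. The sketch below is a plain transcription of the formula for the mean Bayes factor, with models missing from the weights given weight zero; the function name `model_averaged_bf` is hypothetical, and the two Bayes factors are the fixed:bf and independent:bf values from the example output shown later on this page.

```python
def model_averaged_bf(bfs, weights=None):
    """Weighted mean of per-model Bayes factors.

    bfs: dict mapping model name -> Bayes factor.
    weights: dict mapping model name -> weight; None means equal weights.
    Models absent from `weights` get weight zero.
    """
    if weights is None:
        weights = {name: 1.0 for name in bfs}
    unknown = set(weights) - set(bfs)
    if unknown:
        raise KeyError("weights given for unknown models: %s" % sorted(unknown))
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights.get(name, 0.0) * bf for name, bf in bfs.items())
    return weighted / total


bfs = {"fixed": 0.573458567964815, "independent": 0.517094908522291}
equal = model_averaged_bf(bfs)                                    # equal weights
reweighted = model_averaged_bf(bfs, {"fixed": 3.0, "independent": 1.0})
fixed_only = model_averaged_bf(bfs, {"fixed": 2.0})               # independent gets weight 0
```

With equal weights this gives the simple average of the two Bayes factors, which agrees with the mean_bf value in the example output.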
The -extra-columns option tells bingwa to read data from the specified columns in the input file and include it in the output.
Understanding bingwa sqlite output files
Below we've given some examples of using the shell or various programming languages to extract data from a bingwa sqlite file.
We assume the file is named bingwa.sqlite and the meta-analysis results were stored using the default prefix ('Bingwa') for table names.
Note: The sqlite3 program is installed by default on most UNIX-like systems. It's quite flexible, and can be used in interactive mode, or to run queries directly from the command-line. We'll use both methods below. See the sqlite3 command shell documentation for a full list of options.
Inspecting the contents of the results file
sqlite files can be inspected using the sqlite3 command shell. Open the file by typing:
$ sqlite3 -column -header bingwa.sqlite
You should see the sqlite prompt (sqlite>). Let's see what tables are stored in this file:
sqlite> .tables
Analysis               BingwaCounts           BingwaMetaView
AnalysisProperty       BingwaCountsView       Variant
AnalysisPropertyView   BingwaDetail           VariantIdentifier
AnalysisStatus         BingwaDetailView       VariantView
AnalysisStatusView     BingwaMeta
Here, the main meta-analysis results are stored in the BingwaMeta table, and bingwa has also created the BingwaMetaView view which provides a nicer view of the results.
The BingwaCounts and BingwaDetail tables contain, respectively, genotype counts taken from the input files, and the raw summary statistics (beta and standard error) that bingwa operated on.
The Analysis* tables are used to store metadata about the analyses you have run.
Let's see what's stored in this file:
sqlite> SELECT * FROM Analysis ;
id          name             chunk
----------  ---------------  ----------
1           bingwa analysis  NA
sqlite> SELECT COUNT(*) AS count FROM BingwaMetaView ;
count
-------
200
So this file stores meta-analysis results for 200 variants from a single analysis called "bingwa analysis".
In a large, real analysis, it's often useful to use the -analysis-name and -analysis-chunk options to give the analysis a more descriptive name and explain what part of the data it corresponds to. It's possible to store several analyses in the same sqlite file (or even in the same table, provided analysis columns are compatible), so this information becomes important to understand the results.
Let's have a look at what the results actually look like. Because there are lots of columns we'll switch sqlite3 to line-based output:
sqlite> .mode line
sqlite> SELECT * FROM BingwaMetaView LIMIT 1 ;
rsid = RSID_2
alleleA = A
alleleB = -
analysis = bingwa analysis
analysis_id = 1
variant_id = 2
chromosome = NA
position = 2
cohort 1:bf = 0.685505148530258
cohort 2:bf = 0.754326805029774
FixedEffectMetaAnalysis:included_betas = 11
FixedEffectMetaAnalysis:N = 998.000188827515
FixedEffectMetaAnalysis:beta_1:add/bin2=1 = -0.048782749136173
FixedEffectMetaAnalysis:se_1 = 0.130323119437487
FixedEffectMetaAnalysis:wald_pvalue_1 = 0.708165117723267
FixedEffectMetaAnalysis:pvalue = 0.708165117723267
fixed:bf = 0.573458567964815
independent:bf = 0.517094908522291
mean_bf = 0.545276738243553
max_bf_model = fixed
max_bf = 0.573458567964815
best_posterior_model = fixed
best_posterior = 0.525841767807702
2nd_best_posterior_model = independent
2nd_best_posterior = 0.474158232192298
This analysis contains the results of both a fixed-effect frequentist and a Bayesian analysis of two cohorts, by default called "cohort 1" and "cohort 2". (The -cohort-names option could have been used to change these names.)
The first eight columns are metadata that identify the variant being tested and link to the Analysis table (which is useful if several analyses are stored in the same table).
The chromosome information here is unspecified (the -assume-chromosome option could have been used to fill in chromosome information during the analysis).
The next two columns contain Bayes factors (BFs) computed based on the data in each cohort independently. These are not strictly meta-analysis results, but are stored here for comparison with the meta-analysis Bayes factors below. (This might change in a future version of the software.)
The columns labelled 'FixedEffectMetaAnalysis' reflect the frequentist analysis. They include 'N' (here equal to the sum of the two cohorts' sample sizes), a variable ('included_betas') which here indicates that data from both cohorts informed the analysis, and the combined effect size estimate and standard error from meta-analysis. Here, because there is only one effect size being estimated, the two p-value columns (pvalue and wald_pvalue_1) are the same.
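For orientation, the combined estimate, standard error and p-value in this example can be reproduced with the standard inverse-variance weighted fixed-effect formula and a two-sided Wald test. The sketch below is an illustration under that assumption, not bingwa's implementation; the function name `fixed_effect_meta` is hypothetical, and the per-cohort betas and standard errors are the BingwaDetail values for this variant shown further down this page.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of one effect.

    Returns (combined beta, combined standard error, two-sided Wald p-value).
    """
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    pvalue = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, pvalue


# Per-cohort estimates for variant_id 2, taken from the BingwaDetail table.
betas = [0.0237287990748882, -0.117560997605324]
ses = [0.186789005994797, 0.181916996836662]
beta, se, pvalue = fixed_effect_meta(betas, ses)
```

The result agrees with the beta_1, se_1 and pvalue columns above to within rounding, which suggests (but does not prove) that the frequentist columns follow this formula in the univariate case.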
The following columns reflect the Bayesian meta-analysis. In order, they represent the BF for each model, the model-averaged BF, the model with the maximum BF together with that BF, and details of the model with the highest posterior mass (which might differ from the model with the maximum BF if the weighting is not uniform). Similarly, the last two columns reflect the model with the 2nd highest posterior mass.
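The Bayes factor columns in this example can be reproduced with the standard normal-approximation Bayes factor, assuming a N(0, 0.2^2) prior on the effect (the 0.2 from the tutorial prior). Whether bingwa computes them exactly this way is an assumption; the sketch below only shows one calculation that matches the numbers. The function names are hypothetical, and the inputs are the BingwaDetail values for this variant shown further down this page.

```python
import math


def approximate_bf(beta, se, prior_sd):
    """Normal-approximation Bayes factor for beta ~ N(0, prior_sd^2) versus beta = 0."""
    v = se * se              # sampling variance of the estimate
    w = prior_sd * prior_sd  # prior variance of the true effect
    z = beta / se
    return math.sqrt(v / (v + w)) * math.exp(0.5 * z * z * w / (v + w))


def inverse_variance_combine(betas, ses):
    """Fixed-effect (inverse-variance weighted) estimate and its standard error."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


betas = [0.0237287990748882, -0.117560997605324]
ses = [0.186789005994797, 0.181916996836662]
prior_sd = 0.2

# Per-cohort Bayes factors ("cohort 1:bf", "cohort 2:bf").
cohort_bfs = [approximate_bf(b, s, prior_sd) for b, s in zip(betas, ses)]

# Independent effects: the cohorts contribute separately, so the BFs multiply.
independent_bf = cohort_bfs[0] * cohort_bfs[1]

# Fixed effect: one shared effect, so the BF depends only on the combined estimate.
fixed_bf = approximate_bf(*inverse_variance_combine(betas, ses), prior_sd)

# Equal model weights: mean BF and posterior mass of each model given the alternatives.
mean_bf = 0.5 * (fixed_bf + independent_bf)
posterior_fixed = fixed_bf / (fixed_bf + independent_bf)
posterior_independent = independent_bf / (fixed_bf + independent_bf)
```

Under these assumptions the values agree with the cohort, fixed:bf, independent:bf, mean_bf and posterior columns above to within rounding.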
These particular results don't look very interesting (i.e. a p-value of P=0.7, or a Bayes factor less than one, suggest very little evidence that the indel is associated). Nevertheless, if we were interested we could take a look at the data underlying this meta-analysis:
sqlite> SELECT * FROM BingwaDetail WHERE analysis_id == 1 AND variant_id == 2;
analysis_id = 1
variant_id = 2
chromosome = NA
position = 2
cohort 1:beta_1:add/bin2=1 = 0.0237287990748882
cohort 1:se_1 = 0.186789005994797
cohort 1:pvalue = 0.898887991905212
cohort 2:beta_1:add/bin2=1 = -0.117560997605324
cohort 2:se_1 = 0.181916996836662
cohort 2:pvalue = 0.517305016517639
Estimated effects have opposite directions in the two cohorts. Or we could look at the genotype counts that SNPTEST reported for this test:
sqlite> SELECT * FROM BingwaCounts WHERE analysis_id == 1 AND variant_id == 2;
analysis_id = 1
variant_id = 2
chromosome = NA
position = 2
cohort 1:A = 0.0
cohort 1:B = 0.0
cohort 1:AA = 24.1889991760254
cohort 1:AB = 149.192001342773
cohort 1:BB = 325.618988037109
cohort 1:NULL = 3.98792002025139e-13
cohort 1:N = 498.999988555908
cohort 1:B_allele_frequency = 0.802034063901899
cohort 1:trusted = 1
cohort 2:A = 0.0
cohort 2:B = 0.0
cohort 2:AA = 28.2371997833252
cohort 2:AB = 142.957000732422
cohort 2:BB = 327.805999755859
cohort 2:NULL = 5.11590986587707e-13
cohort 2:N = 499.000200271606
cohort 2:B_allele_frequency = 0.800169017777426
cohort 2:trusted = 1
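As a quick consistency check on these counts, the B allele frequency can be recomputed from the (expected) diploid genotype counts. This is a sketch under the assumption that only the AA, AB and BB counts contribute, which fits this example because the haploid A and B counts are zero and the NULL count is negligible; the function name `b_allele_frequency` is hypothetical.

```python
def b_allele_frequency(aa, ab, bb):
    """Frequency of the B allele from diploid genotype counts (possibly fractional)."""
    n = aa + ab + bb
    return (ab + 2.0 * bb) / (2.0 * n)


# Genotype counts for variant_id 2, taken from the BingwaCounts output.
cohort1 = b_allele_frequency(24.1889991760254, 149.192001342773, 325.618988037109)
cohort2 = b_allele_frequency(28.2371997833252, 142.957000732422, 327.805999755859)
```

Both values agree with the B_allele_frequency columns above to within rounding.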
Processing bingwa sqlite output files
Below are some example commands that process bingwa output in the command-line or in R. See above for a full description of bingwa's sqlite output.
> library( RSQLite )
> db = dbConnect( dbDriver( "SQLite" ), "bingwa.sqlite" )
> D = dbGetQuery( db, "SELECT * FROM BingwaMetaView" )
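The same kind of query can be run from Python with the standard-library sqlite3 module. The sketch below is illustrative only: `top_variants` and `make_demo_db` are hypothetical helpers, and the demo database is a tiny in-memory stand-in with a handful of the BingwaMetaView columns so that the example runs without a real results file. Its first row copies values from the example output above; the second row is made up purely for the demonstration. Note that column names containing a colon, such as fixed:bf, need to be double-quoted in SQL.

```python
import sqlite3


def top_variants(conn, limit=10):
    """Return the variants with the largest model-averaged Bayes factor."""
    query = (
        'SELECT rsid, position, "fixed:bf", "independent:bf", mean_bf '
        "FROM BingwaMetaView ORDER BY mean_bf DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


def make_demo_db():
    """Build an in-memory stand-in for a bingwa results file (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE BingwaMetaView (rsid TEXT, position INTEGER, '
        '"fixed:bf" REAL, "independent:bf" REAL, mean_bf REAL)'
    )
    conn.executemany(
        "INSERT INTO BingwaMetaView VALUES (?, ?, ?, ?, ?)",
        [
            ("RSID_2", 2, 0.573458567964815, 0.517094908522291, 0.545276738243553),
            ("EXAMPLE_1", 10, 3.0, 1.0, 2.0),  # made-up row, not real output
        ],
    )
    return conn


conn = make_demo_db()
rows = top_variants(conn, limit=1)
```

With a real results file, `sqlite3.connect("bingwa.sqlite")` would take the place of `make_demo_db()`.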
Input file formats
.txt, .csv or .tsv).
Output file formats
bingwa can either output to a flat file or to a table in a sqlite database. Both have pros and cons.
Flat files are simple and easy to use. However, they can become difficult to read for very large analyses (such as genome-wide analyses of datasets imputed into recent reference panels), which may require writing special file-handling code.
Flat files are written when the -flat-file option is specified.
Sqlite files are easy to use via the sqlite3 command shell, which is installed on most UNIX-like systems. They can also be read directly from programming languages, e.g. by using the RSQLite package in R or the sqlite3 module in Python. An attractive feature is that they are indexed, so that data for specific regions of the genome or specific variants can be easily found. See the examples page for code snippets for processing bingwa-produced sqlite files.
Sqlite is the default output format for bingwa.
Specifying priors
The file format used for the -priors option obeys the following rules.
- Blank lines and lines starting with # (comment lines) are ignored.
- Every other line is part of a prior specification.
- Each prior must start with a name specification of the form '<name>:'.
- Two forms of prior specification are permissible, full and shorthand prior specification.
  - A full prior specification is of the form sd=a,b,c,.../cor=1,x,y,... . Here a,b,c,...,x,y,... are real numbers. The number of sds specified must equal the combined number of parameters in all cohorts put together, denoted d; the number of correlations specified must then be equal to d×(d+1)/2. Conceptually, suppose the values a,b,c,... specify the diagonal entries of a matrix denoted σ, and suppose the values 1,x,y,... specify the upper triangle of the correlation matrix Ρ (which should have 1s on the diagonal as in the example). The prior matrix used for analysis is then σ×Ρ×σ.
  - A shorthand prior specification is of the form tau=z/sd=a,b,c,d,.../cor=1,x,y,... . In this form, the number of sds specified must equal the number of parameters in each cohort (e.g. 1 for univariate analysis), denoted d; the number of correlations specified must then be equal to d×(d+1)/2.
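To make the σ×Ρ×σ construction for the full form concrete, here is a minimal sketch that turns a list of sds and a list of upper-triangle correlations into the prior covariance matrix. The function name `full_prior_covariance` is hypothetical, and the row-by-row ordering of the upper triangle (diagonal included) is an assumption rather than something stated above.

```python
def full_prior_covariance(sds, cors):
    """Prior covariance sigma x P x sigma from a full prior specification.

    sds:  the d standard deviations (diagonal of sigma).
    cors: the d*(d+1)/2 upper-triangle entries of the correlation matrix P,
          assumed to be listed row by row, diagonal included.
    """
    d = len(sds)
    expected = d * (d + 1) // 2
    if len(cors) != expected:
        raise ValueError("expected %d correlations, got %d" % (expected, len(cors)))
    corr = [[0.0] * d for _ in range(d)]
    k = 0
    for i in range(d):
        for j in range(i, d):
            corr[i][j] = corr[j][i] = cors[k]  # fill both triangles symmetrically
            k += 1
    # sigma is diagonal, so (sigma x P x sigma)[i][j] = sds[i] * P[i][j] * sds[j].
    return [[sds[i] * corr[i][j] * sds[j] for j in range(d)] for i in range(d)]


# Two parameters in total: sd=0.2,0.4 and cor=1,0.5,1.
cov = full_prior_covariance([0.2, 0.4], [1.0, 0.5, 1.0])
```

The diagonal of `cov` holds the prior variances and the off-diagonal entry is 0.2 × 0.5 × 0.4.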
Binaries
Pre-compiled binaries are available for the following platforms. (See here for other builds.)
Version | Platform | File |
---|---|---|
v2.0.6 | CentOS Linux 7.8 x86-64 | bingwa_v2.0.6-CentOS_Linux7.8.2003-x86_64.tgz (1.6Mb) |
v2.0.6 | Mac OS X | bingwa_v2.0.6-osx.tgz (1.5Mb) |
To run bingwa, download the relevant file and extract it as follows.
$ tar -xzf bingwa_[version].tgz
$ cd bingwa_[version]
$ ./bingwa -help
Source
The source code to bingwa can be found in the bingwa repository at code.enkre.net. A tarball of the latest release is available via the download link.