Introduction

bingwa is a command-line tool for meta-analysis of genome-wide association studies. It is particularly suited to use with the program SNPTEST, but can also be used with other association testing programs, such as MMM. Features of bingwa include:

Direct support for SNPTEST output files.
Support for fixed-effect frequentist meta-analysis and a flexible Bayesian meta-analysis method.
Support for both univariate and multivariate meta-analysis (including support for meta-analysing tests performed using SNPTEST's multinomial test).

This page documents version 2 of bingwa.

Citing bingwa.

If you make use of bingwa for multivariate analysis, please cite:

"Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia and Oceania", doi:10.1038/s41467-019-13480-z, Malaria Genomic Epidemiology Network, Nature Communications (2019)

If you make use of bingwa for univariate analysis, please cite the following paper:

"A novel locus of resistance to severe malaria in a region of ancient balancing selection", doi:10.1038/nature15390, Malaria Genomic Epidemiology Network, Nature (2015)

The methodological ideas underlying bingwa were developed and applied in several other papers:

"Bayesian meta-analysis across genome-wide association studies of diverse phenotypes", doi:10.1002/gepi.22202, Genetic Epidemiology (2019).
"Reappraisal of known malaria resistance loci in a large multi-centre study", doi:10.1038/ng.3107, Rockett K. A. et al, Nat. Genetics (2014).
"Imputation-Based Meta-Analysis of Severe Malaria in Three African Populations", doi:10.1371/journal.pgen.1003509, Band G. et al, PLOS Genetics (2013).
"Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke", doi:10.1038/ng.1081, International Stroke Genetics Consortium (ISGC) and Wellcome Trust Case Control Consortium 2 (WTCCC2), Nat. Genetics (2012).

Acknowledgements

The bingwa software was developed by Gavin Band at the Wellcome Centre for Human Genetics. Several people contributed to the development of the underlying methods, including Chris C. A. Spencer, Matti Pirinen, Quang Si Le, Geraldine C. Clarke, and Holly Trochet.

Running bingwa

This page shows some common bingwa command lines. The examples assume the program is being run from a directory containing two SNPTEST output files, cohort1.snptest and cohort2.snptest. Example files can be found in the examples directory.

View the program usage page

$ bingwa -help | less

Compute a fixed-effect meta-analysis of the two files

$ bingwa -data cohort1.snptest cohort2.snptest -o output.sqlite

$ bingwa -data cohort1.snptest cohort2.snptest -o output.csv

Note: bingwa determines the output format from the file extension. The first command above writes output to a sqlite3 file (more details on this below), while the second writes a csv file. Space-separated (.txt) and tab-separated (.tsv) files are also supported. Use -o sqlite://<filename> to force sqlite3 output regardless of the extension.

Compute a fixed-effect Bayesian meta-analysis

$ bingwa -data cohort1.snptest cohort2.snptest -o output.sqlite -prior fixed:sd=0.2,0.2/cor=1,1,1

Note: The above specifies a multivariate normal (MVN) prior on the 'true' effects that is centred at zero and has a variance-covariance matrix with every entry equal to 0.2². (Bingwa prints this model in its screen output, so you can double-check it.) Under this model the true effects in the two populations are assumed to be perfectly correlated; the only variation between cohorts comes from estimation error, represented by the within-study standard errors. The first part of the prior specification (before the colon) is a user-specified name for the model. In this case the Bayes factor for the model will be stored in a column named fixed:bf.
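As a rough illustration of how a full prior specification becomes a covariance matrix, the Python sketch below builds σ×Ρ×σ from the sd and cor values, following the rules given under 'Specifying priors' further down this page. The function names are hypothetical and this is not bingwa's own code; it assumes that the cor values list the upper triangle of the correlation matrix row by row.

```python
def upper_triangle_to_matrix(values, d):
    """Build a symmetric d x d matrix from its upper triangle, listed row by row."""
    if len(values) != d * (d + 1) // 2:
        raise ValueError("expected d*(d+1)/2 values for a d x d matrix")
    matrix = [[0.0] * d for _ in range(d)]
    it = iter(values)
    for i in range(d):
        for j in range(i, d):
            matrix[i][j] = matrix[j][i] = float(next(it))
    return matrix


def full_prior_covariance(sds, cors):
    """Hypothetical helper: covariance = diag(sds) x correlation x diag(sds)."""
    d = len(sds)
    rho = upper_triangle_to_matrix(cors, d)
    return [[sds[i] * rho[i][j] * sds[j] for j in range(d)] for i in range(d)]


# fixed:sd=0.2,0.2/cor=1,1,1 gives every entry 0.2^2 = 0.04 (up to rounding)
fixed = full_prior_covariance([0.2, 0.2], [1, 1, 1])
# independent:sd=0.2,0.2/cor=1,0,1 gives 0.04 on the diagonal and 0 elsewhere
independent = full_prior_covariance([0.2, 0.2], [1, 0, 1])
```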

Compute both fixed- and independent-effect Bayesian meta-analyses

$ bingwa -data cohort1.snptest cohort2.snptest -o output.sqlite -prior fixed:sd=0.2,0.2/cor=1,1,1 -prior independent:sd=0.2,0.2/cor=1,0,1

Note: In the above command the output will contain fixed:bf and independent:bf columns. Here, the independent model assumes that true effects are uncorrelated between the cohorts, i.e. they have a diagonal variance-covariance matrix. In addition to the two model-specific Bayes factors, bingwa computes an equally weighted, model-averaged Bayes factor over all the specified models, stored in a column named mean_bf.
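To make the models concrete, the Python sketch below computes one standard form of Bayes factor for an MVN prior: the density of the estimated effects beta_hat under N(0, V + Σ), divided by their density under N(0, V). Here V is the diagonal matrix of squared within-study standard errors and Σ is the prior covariance. The function names are hypothetical and we do not claim this is bingwa's implementation. However, with the betas and standard errors from the BingwaDetail example later on this page, it gives values that agree closely with the fixed:bf, independent:bf and mean_bf reported there.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvn_logpdf_zero_mean(x, cov):
    """log N(x; 0, cov), computed via a Cholesky factorisation."""
    L = cholesky(cov)
    n = len(x)
    # Forward substitution solves L y = x, so that x' cov^-1 x = y'y.
    y = [0.0] * n
    for i in range(n):
        y[i] = (x[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))


def approx_bayes_factor(betas, ses, prior_cov):
    """BF = N(beta; 0, V + prior_cov) / N(beta; 0, V), where V = diag(se^2)."""
    n = len(betas)
    V = [[ses[i] ** 2 if i == j else 0.0 for j in range(n)] for i in range(n)]
    V_plus_prior = [[V[i][j] + prior_cov[i][j] for j in range(n)] for i in range(n)]
    log_bf = mvn_logpdf_zero_mean(betas, V_plus_prior) - mvn_logpdf_zero_mean(betas, V)
    return math.exp(log_bf)


# Per-cohort estimates for one variant (values from the BingwaDetail example).
betas = [0.0237287990748882, -0.117560997605324]
ses = [0.186789005994797, 0.181916996836662]
fixed = [[0.04, 0.04], [0.04, 0.04]]        # sd=0.2,0.2/cor=1,1,1
independent = [[0.04, 0.0], [0.0, 0.04]]    # sd=0.2,0.2/cor=1,0,1

bf_fixed = approx_bayes_factor(betas, ses, fixed)
bf_indep = approx_bayes_factor(betas, ses, independent)
mean_bf = (bf_fixed + bf_indep) / 2  # equal model weights
```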

Using model shorthand

$ bingwa -data cohort1.snptest ... cohort<n>.snptest -o output.sqlite -prior fixed:tau=1/sd=0.2/cor=1 -prior independent:tau=0/sd=0.2/cor=1

Note: In the above command we use a shorthand version of the prior specification, identified because it is of the form tau=.... Here, the first part (tau) denotes the correlation of true effects between cohorts, while the sd and cor parts now specify the prior covariance matrix within each cohort. In this case there is one effect parameter in each cohort, so the within-cohort prior covariance is the 1x1 matrix (0.2²), and the correlation between cohorts is 1 (for the fixed-effect model) or 0 (for the independent-effect model).
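One way to read the shorthand is as a Kronecker product. A between-cohort matrix, with 1 on the diagonal and tau elsewhere, is combined with the within-cohort covariance built from sd and cor. The Python sketch below implements this reading. It is our interpretation rather than a description of bingwa's code, and the helper names are hypothetical. It does reproduce the fixed and independent models above, but the screen output remains the authoritative check.

```python
def kronecker(a, b):
    """Kronecker product of two square matrices stored as lists of lists."""
    n, m = len(a), len(b)
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
        for i in range(n * m)
    ]


def expand_shorthand(tau, within_cov, n_cohorts):
    """Hypothetical expansion of tau=.../sd=.../cor=... into a full prior matrix."""
    between = [
        [1.0 if i == j else float(tau) for j in range(n_cohorts)]
        for i in range(n_cohorts)
    ]
    return kronecker(between, within_cov)


within = [[0.04]]  # sd=0.2/cor=1, i.e. the 1x1 matrix (0.2^2)
fixed = expand_shorthand(1, within, n_cohorts=3)        # every entry 0.04
independent = expand_shorthand(0, within, n_cohorts=3)  # 0.04 on the diagonal only
```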

While the differences in this example are small, this feature comes into its own when many cohorts are analysed. However, we always advise checking the screen output to debug model definitions.

Loading priors from a file

$ bingwa -data cohort1.snptest ... cohort<n>.snptest -o output.sqlite -priors priors.txt

The -priors option tells bingwa to load prior specifications from a file. Priors in the file should be specified as on the command line, but with a couple of additional features:

Any line starting with # is ignored - this allows models to be commented.
Line breaks (newlines) within model specifications are ignored if they follow one of the 'punctuation' characters - i.e. immediately after a :, /, or comma character. This allows long models to be formatted across lines.
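The two rules above can be mimicked by a small Python function that turns the text of a priors file into one specification string per model. This is only an illustrative sketch: the function name is hypothetical. It also assumes that blank lines are skipped (as stated under 'Specifying priors' below) and that leading and trailing whitespace on each line can be ignored.

```python
def read_prior_specs(text):
    """Split priors-file text into one specification string per model (a sketch)."""
    specs = []
    current = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        current += line
        # A line ending in ':', '/' or ',' continues on the next line.
        if current.endswith((":", "/", ",")):
            continue
        specs.append(current)
        current = ""
    if current:
        specs.append(current)  # trailing, unterminated specification
    return specs


example = """# A fixed-effect model, split across lines
fixed:sd=0.2,0.2/
    cor=1,1,1

independent:sd=0.2,0.2/cor=1,0,1
"""
specs = read_prior_specs(example)
# specs == ['fixed:sd=0.2,0.2/cor=1,1,1', 'independent:sd=0.2,0.2/cor=1,0,1']
```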

Using a priors file has the advantage of allowing priors to be stored and edited separately from the command line.

Getting nicer output

$ bingwa -data cohort1.snptest cohort2.snptest -o output.sqlite -table-name Meta -cohort-names set1 set2 -analysis-name "My first meta-analysis"

Note: Bingwa includes several options for giving more interpretable names to the output. In the above command, we set the name of the table in which the primary output is stored (see the file formats page for details), a name for the analysis overall, and names for each of the cohorts.

Reweighting models

$ bingwa -data cohort1.snptest cohort2.snptest -o output.sqlite -prior fixed:sd=0.2,0.2/cor=1,1,1 -prior independent:sd=0.2,0.2/cor=1,0,1 -prior-weights fixed=1 independent=0.5

Note: By default all models are equally weighted in the model-averaged Bayes factor. Use the -prior-weights option to reweight them. Weights are renormalised to sum to one in the mean Bayes factor computation, i.e. the mean Bayes factor is computed as $$\text{BF}_{\text{avg}} = \frac{\sum_i \text{weight}_i \cdot \text{BF}_i}{\sum_i \text{weight}_i}$$ where $\text{weight}_i$ and $\text{BF}_i$ are the specified weight and Bayes factor for model $i$. If -prior-weights is specified, any model not listed in the argument is given weight zero in the mean Bayes factor.
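The weighted average is easy to reproduce by hand. The Python sketch below (hypothetical function names) parses arguments of the form <model name>=<weight>, gives weight zero to any model that is not listed, and renormalises. The Bayes factors used are the fixed:bf and independent:bf values from the example BingwaMetaView output later on this page.

```python
def parse_prior_weights(args):
    """Parse ['fixed=1', 'independent=0.5'] into {'fixed': 1.0, 'independent': 0.5}."""
    weights = {}
    for arg in args:
        name, _, value = arg.partition("=")
        weights[name] = float(value)
    return weights


def mean_bayes_factor(bfs, weights=None):
    """Weighted mean of per-model Bayes factors; unlisted models get weight zero."""
    if weights is None:
        weights = {name: 1.0 for name in bfs}  # default: equal weights
    w = {name: weights.get(name, 0.0) for name in bfs}
    total = sum(w.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive weight")
    return sum(w[name] * bfs[name] for name in bfs) / total


bfs = {"fixed": 0.573458567964815, "independent": 0.517094908522291}
equal = mean_bayes_factor(bfs)  # matches mean_bf in the example output
reweighted = mean_bayes_factor(bfs, parse_prior_weights(["fixed=1", "independent=0.5"]))
```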

The argument to -prior-weights can also be the name of a file, in which case bingwa will read prior weights from that file (in the same <model name>=<weight> format).

Getting extra columns in the output

$ bingwa -data cohort1.snptest cohort2.snptest -o output.sqlite -priors priors.txt -extra-columns controls_AA controls_AB controls_BB cases_AA cases_AB cases_BB frequentist_add_null_ll frequentist_add_alternative_ll

Note: It's often useful to record extra information from the input files in the output. The -extra-columns option tells bingwa to read data from the specified columns in the input file and include it in the output.

Understanding bingwa sqlite output files

Below we've given some examples of using the shell or various programming languages to extract data from a bingwa sqlite file. We assume the file is named bingwa.sqlite and the meta-analysis results were stored using the default prefix ('Bingwa') for table names.

Note: The sqlite3 program is installed by default on most UNIX-like systems. It's quite flexible, and can be used in interactive mode, or to run queries directly from the command-line. We'll use both methods below. See the sqlite3 command shell documentation for a full list of options.

Inspecting the contents of the results file

sqlite files can be inspected using the sqlite3 command shell. Open the file by typing:

$ sqlite3 -column -header bingwa.sqlite

You should see the sqlite prompt (sqlite>). Let's see which tables are stored in this file:

sqlite> .tables
Analysis              BingwaCounts          BingwaMetaView
AnalysisProperty      BingwaCountsView      Variant
AnalysisPropertyView  BingwaDetail          VariantIdentifier
AnalysisStatus        BingwaDetailView      VariantView
AnalysisStatusView    BingwaMeta

Here, the main meta-analysis results are stored in the BingwaMeta table, and bingwa has also created the BingwaMetaView view, which presents the results in a more readable form. The BingwaCounts and BingwaDetail tables contain, respectively, genotype counts taken from the input files and the raw summary statistics (beta and standard error) that bingwa operated on.

The Analysis* tables are used to store metadata about the analyses you have run. Let's see what's stored in this file:

sqlite> SELECT * FROM Analysis;
id          name             chunk
----------  ---------------  ----------
1           bingwa analysis  NA

sqlite> SELECT COUNT(*) AS count FROM BingwaMetaView;
count
-------
200

So this file stores meta-analysis results for 200 variants from a single analysis called "bingwa analysis".
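The same checks can be made from Python with the standard sqlite3 module. The sketch below lists tables and views from sqlite_master (SQLite's built-in schema table), counts the rows in the results view, and reads the Analysis table. The function names are hypothetical, and the sketch assumes the default 'Bingwa' table prefix.

```python
import sqlite3


def list_tables_and_views(conn):
    """Return (name, type) pairs, similar to the sqlite3 shell's .tables command."""
    return conn.execute(
        "SELECT name, type FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()


def count_results(conn, view="BingwaMetaView"):
    """Count result rows; the view name is quoted because it cannot be a bound parameter."""
    return conn.execute(f'SELECT COUNT(*) FROM "{view}"').fetchone()[0]


def list_analyses(conn):
    """Return the rows of the Analysis table as dictionaries keyed by column name."""
    cursor = conn.execute("SELECT * FROM Analysis")
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Usage against a real results file:
# conn = sqlite3.connect("bingwa.sqlite")
# print(list_tables_and_views(conn))
# print(count_results(conn))
# print(list_analyses(conn))
```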

In a large, real analysis, it's often useful to use the -analysis-name and -analysis-chunk options to give the analysis a more descriptive name and to record which part of the data it corresponds to. Several analyses can be stored in the same sqlite file (or even in the same table, provided their columns are compatible), so this information becomes important for interpreting the results.

Let's have a look at what the results actually look like. Because there are lots of columns we'll switch sqlite3 to line-based output:

sqlite> .mode line
sqlite> SELECT * FROM BingwaMetaView LIMIT 1;
                                    rsid = RSID_2
                                  alleleA = A
                                  alleleB = -
                                 analysis = bingwa analysis
                              analysis_id = 1
                               variant_id = 2
                               chromosome = NA
                                 position = 2
                              cohort 1:bf = 0.685505148530258
                              cohort 2:bf = 0.754326805029774
   FixedEffectMetaAnalysis:included_betas = 11
                FixedEffectMetaAnalysis:N = 998.000188827515
FixedEffectMetaAnalysis:beta_1:add/bin2=1 = -0.048782749136173
             FixedEffectMetaAnalysis:se_1 = 0.130323119437487
    FixedEffectMetaAnalysis:wald_pvalue_1 = 0.708165117723267
           FixedEffectMetaAnalysis:pvalue = 0.708165117723267
                                 fixed:bf = 0.573458567964815
                           independent:bf = 0.517094908522291
                                  mean_bf = 0.545276738243553
                             max_bf_model = fixed
                                   max_bf = 0.573458567964815
                     best_posterior_model = fixed
                           best_posterior = 0.525841767807702
                 2nd_best_posterior_model = independent
                       2nd_best_posterior = 0.474158232192298

This analysis contains the results of both a fixed-effect frequentist and a bayesian analysis of two cohorts, by default called "cohort 1" and "cohort 2". (The -cohort-names option could have been used to change these names).

The first eight columns contain metadata identifying the variant being tested and linking to the Analysis table (which is useful if several analyses are stored in the same table). The chromosome is unspecified here (the -assume-chromosome option could have been used to fill in chromosome information during the analysis).

The next two columns contain Bayes factors (BFs) computed based on the data in each cohort independently. These are not strictly meta-analysis results, but are stored here for comparison with the meta-analysis Bayes factors below. (This might change in a future version of the software.)

The columns labelled 'FixedEffectMetaAnalysis' reflect the frequentist analysis. They include the sample size 'N' (here the sum of the two cohort sample sizes), a variable ('included_betas') which here indicates that data from both cohorts informed the analysis, and the combined effect size estimate and standard error. Because only one effect size is being estimated here, the two p-value columns (pvalue and wald_pvalue_1) are the same.
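For a single effect parameter, these frequentist columns are consistent with a standard inverse-variance-weighted fixed-effect meta-analysis. The Python sketch below uses a hypothetical function name, and bingwa's own implementation may differ, for example in how it handles several parameters. It recomputes the combined estimate, standard error and two-sided Wald p-value from the per-cohort betas and standard errors shown in the BingwaDetail output below. The results are close to the values in the BingwaMetaView output above.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect estimate, SE and two-sided p-value."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvalue = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, pvalue


beta, se, pvalue = fixed_effect_meta(
    betas=[0.0237287990748882, -0.117560997605324],
    ses=[0.186789005994797, 0.181916996836662],
)
```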

The following columns reflect the Bayesian meta-analysis. In order, they give the BF for each model, the model-averaged BF, the model with the maximum BF and that BF, and the model with the highest posterior probability together with that probability (which may differ from the maximum-BF model if the model weights are not uniform). Similarly, the last two columns give the model with the second-highest posterior probability.
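The Python sketch below recomputes the best_posterior and 2nd_best_posterior columns (function name hypothetical). It assumes that posterior model probabilities are proportional to the prior weight times the Bayes factor. This is the usual definition, and it reproduces the example output above.

```python
def posterior_model_probabilities(bfs, weights=None):
    """Posterior probability of each model, proportional to weight * BF."""
    if weights is None:
        weights = {name: 1.0 for name in bfs}  # uniform prior over models
    unnormalised = {name: weights.get(name, 0.0) * bf for name, bf in bfs.items()}
    total = sum(unnormalised.values())
    return {name: value / total for name, value in unnormalised.items()}


bfs = {"fixed": 0.573458567964815, "independent": 0.517094908522291}
posteriors = posterior_model_probabilities(bfs)
ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
best_model, best_posterior = ranked[0]
second_model, second_posterior = ranked[1]
```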

These particular results don't look very interesting: a p-value of P=0.7 and Bayes factors below one provide very little evidence that the indel is associated. Nevertheless, if we were interested we could take a look at the data underlying this meta-analysis:

sqlite> SELECT * FROM BingwaDetail WHERE analysis_id == 1 AND variant_id == 2;
               analysis_id = 1
                variant_id = 2
                chromosome = NA
                  position = 2
cohort 1:beta_1:add/bin2=1 = 0.0237287990748882
             cohort 1:se_1 = 0.186789005994797
           cohort 1:pvalue = 0.898887991905212
cohort 2:beta_1:add/bin2=1 = -0.117560997605324
             cohort 2:se_1 = 0.181916996836662
           cohort 2:pvalue = 0.517305016517639

The estimated effects have opposite directions in the two cohorts. We could also look at the genotype counts that SNPTEST reported for this test:

sqlite> SELECT * FROM BingwaCounts WHERE analysis_id == 1 AND variant_id == 2;
                analysis_id = 1
                 variant_id = 2
                 chromosome = NA
                   position = 2
                 cohort 1:A = 0.0
                 cohort 1:B = 0.0
                cohort 1:AA = 24.1889991760254
                cohort 1:AB = 149.192001342773
                cohort 1:BB = 325.618988037109
              cohort 1:NULL = 3.98792002025139e-13
                 cohort 1:N = 498.999988555908
cohort 1:B_allele_frequency = 0.802034063901899
           cohort 1:trusted = 1
                 cohort 2:A = 0.0
                 cohort 2:B = 0.0
                cohort 2:AA = 28.2371997833252
                cohort 2:AB = 142.957000732422
                cohort 2:BB = 327.805999755859
              cohort 2:NULL = 5.11590986587707e-13
                 cohort 2:N = 499.000200271606
cohort 2:B_allele_frequency = 0.800169017777426
           cohort 2:trusted = 1

Processing bingwa sqlite output files

Below are some example commands that process bingwa output on the command line or in R. See above for a full description of bingwa's sqlite output.

Extract a csv file of all results.

$ sqlite3 -header -csv bingwa.sqlite "SELECT * FROM BingwaMetaView" > my_file.csv

Extract results from a specific region.

$ sqlite3 -header -csv bingwa.sqlite "SELECT * FROM BingwaMetaView WHERE chromosome == '05' AND position BETWEEN 120000000 AND 121000000" > my_file.csv

Note: bingwa always stores chromosome identifiers as strings. Currently these will be two-character strings with single-digit chromosomes appearing as 01, 02, etc.
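In Python, the region query can be written with bound parameters using the standard sqlite3 and csv modules. The function names in this sketch are hypothetical. The zero-padding helper assumes that the two-character convention described in the note above applies only to numeric chromosome names.

```python
import csv
import sqlite3


def chromosome_id(chrom):
    """Pad numeric chromosome names to two characters, e.g. 5 -> '05'."""
    chrom = str(chrom)
    return chrom.zfill(2) if chrom.isdigit() else chrom


def query_region(conn, chrom, start, end):
    """Return (header, rows) from BingwaMetaView for positions in [start, end]."""
    cursor = conn.execute(
        "SELECT * FROM BingwaMetaView "
        "WHERE chromosome = ? AND position BETWEEN ? AND ?",
        (chromosome_id(chrom), start, end),
    )
    header = [column[0] for column in cursor.description]
    return header, cursor.fetchall()


def write_csv(path, header, rows):
    """Write a header line followed by the result rows as a csv file."""
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)


# Usage, equivalent to the sqlite3 command line above:
# conn = sqlite3.connect("bingwa.sqlite")
# header, rows = query_region(conn, 5, 120000000, 121000000)
# write_csv("my_file.csv", header, rows)
```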

Read bingwa results into R

# R code to load bingwa results
> library( RSQLite )
> db = dbConnect( RSQLite::SQLite(), "bingwa.sqlite" )
> D = dbGetQuery( db, "SELECT * FROM BingwaMetaView" )
> dbDisconnect( db )

Note: You need the RSQLite package installed to process sqlite files from R.

Input file formats

Bingwa supports reading SNPTEST output files. Files are assumed to be space-, comma- or tab-delimited according to the file extension (.txt, .csv or .tsv, respectively).

Output file formats

Bingwa can either output to a flat file or to a table in a sqlite database. Both have pros and cons.

Flat files are simple and easy to use. However, for very large analyses (such as genome-wide analyses of datasets imputed into recent reference panels) they can become unwieldy to read, and may require writing special file-handling code.

Flat files are written when the -flat-file option is specified.

Sqlite files are easy to use via the sqlite3 command shell, which is installed on most UNIX-like systems. They can also be read directly from programming languages, e.g. by using the RSQLite package in R or the sqlite3 module in Python. An attractive feature is that they are indexed, so that data for specific regions of the genome or specific variants can be found quickly. See the examples page for code snippets for processing bingwa-produced sqlite files.

Sqlite is the default output format for bingwa.

Specifying priors

The file format used for the -priors option obeys the following rules.

Blank lines and lines starting with # (comment lines) are ignored.
Every other line is part of a prior specification.
Each prior must start with a name specification of the form '<name>:'.
Two forms of prior specification are permissible, full and shorthand prior specification.
A full prior specification is of the form sd=a,b,c,.../cor=1,x,y,.... Here a,b,c,...,x,y,... are real numbers. The number of sds specified must equal the total number of parameters across all cohorts, denoted d; the number of correlations specified must then be equal to d×(d+1)/2. Conceptually, suppose the values a,b,c,... specify the diagonal entries of a diagonal matrix denoted σ, and suppose the values 1,x,y,... specify the upper triangle of the correlation matrix Ρ (which should have 1s on the diagonal, as in the example). The prior matrix used for analysis is then σ×Ρ×σ.
A shorthand prior specification is of the form tau=z/sd=a,b,c,.../cor=1,x,y,..., where z is the correlation of true effects between cohorts. In this form, the number of sds specified must equal the number of parameters in each cohort (e.g. 1 for univariate analysis), denoted d; the number of correlations specified must then be equal to d×(d+1)/2.
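The counting rules above can be checked before running bingwa. The Python sketch below parses one specification string (for example, one produced from a priors file) into its name, form and numbers, and checks how many sd and cor values it contains. It is a hypothetical helper, not part of bingwa, and it has to be told the number of cohorts and the number of parameters per cohort.

```python
def parse_prior_spec(spec, n_cohorts, params_per_cohort):
    """Parse '<name>:[tau=z/]sd=.../cor=...' and check the number of values."""
    name, sep, body = spec.partition(":")
    if not sep or not name:
        raise ValueError("a prior must start with '<name>:'")
    fields = {}
    for part in body.split("/"):
        key, _, values = part.partition("=")
        fields[key.strip()] = [float(v) for v in values.split(",")]
    if "sd" not in fields or "cor" not in fields:
        raise ValueError("a prior needs both sd= and cor= parts")
    sds, cors = fields["sd"], fields["cor"]
    if "tau" in fields:
        # Shorthand form: sd and cor describe a single cohort.
        form, tau, expected_sds = "shorthand", fields["tau"][0], params_per_cohort
    else:
        # Full form: sd and cor describe all cohorts together.
        form, tau, expected_sds = "full", None, params_per_cohort * n_cohorts
    d = len(sds)
    if d != expected_sds:
        raise ValueError(f"{form} prior needs {expected_sds} sd values, got {d}")
    if len(cors) != d * (d + 1) // 2:
        raise ValueError(f"expected {d * (d + 1) // 2} cor values, got {len(cors)}")
    return name, form, tau, sds, cors
```

For example, parse_prior_spec("fixed:tau=1/sd=0.2/cor=1", 5, 1) accepts the shorthand fixed-effect model for five cohorts in a univariate analysis.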

Version  Platform                  File
v2.0.6   CentOS Linux 7.8 x86-64   bingwa_v2.0.6-CentOS_Linux7.8.2003-x86_64.tgz (1.6Mb)
v2.0.6   Mac OS X                  bingwa_v2.0.6-osx.tgz (1.5Mb)