This website contains current versions of the software packages and working code resulting from our published research. Please contact us if you have any questions or concerns regarding the use of the methods or if you encounter any errors or bugs. Early releases of our code will be periodically updated.
Sparse Linear Discriminant Analysis (sLDA)
Fast, Permutation-Free PERMANOVA
Kernelize: Computation of Useful Kernel Matrices
Descriptions:
The Sequence Kernel Association Test (SKAT) is a tool for region-based testing of rare variants from sequencing data. In particular, SKAT is designed for testing the association of rare (and common) variants from sequence data with a dichotomous or quantitative trait. We also provide tools for estimating power and sample size in order to design future sequencing studies. Although we focus on rare variants within a region, the method is applicable to any set of rare variants and can accurately estimate p-values even at low (e.g. 10^-6) levels.
The method was developed and tailored towards rare variants. It can be applied to other types of data, e.g. gene expression data or common variants, but the tests can be slightly conservative. For other types of data, we recommend using the KM Test (below).
Downloads:
R packages: (NOTE: More up-to-date versions are available on CRAN (see below))
Windows
Linux
Manual
Reference:
Wu, M.C.#, Lee, S.#, Cai, T., Li, Y., Boehnke, M., Lin, X. (2011). "Rare variant association testing for sequencing data with the sequence kernel association test (SKAT)". The American Journal of Human Genetics, 89, 82-93 PDF
Additional Resources:
Most recent versions of the code as well as some examples can be found here.
Descriptions:
The Multi-Kernel SKAT (MK-SKAT) is a practical framework built on the Sequence Kernel Association Test (SKAT) for conducting region-based testing of rare variants from sequencing data. Specifically, MK-SKAT takes a pragmatic approach to answering two questions: (1) which group of variants in the region should I test, and (2) which of the many existing rare variant tests should I use? Since the answer to both questions depends on the true probabilistic genetic model underlying the trait value (which is never known), MK-SKAT tests across a range of candidate groupings and candidate rare variant tests and uses perturbation to generate a single p-value for the significance of the region. The method allows for covariates and for either quantitative or dichotomous traits.
Downloads:
R packages: Coming Soon!
Reference:
Coming Soon!
Descriptions:
The logistic kernel machine test is used for testing the association of a SNP set with a dichotomous outcome. Here, we define a SNP set to be multiple SNPs which have been grouped based on some criterion: proximity to a gene, pathway/function grouping membership, or location within a window of the genome. The method is developed for SNP data, but can, in principle, be applied to a wide range of genomic data types.
Note that the SKAT method (above) is built on the same framework, but is tailored towards rare variants and may be slightly conservative for common variants at larger alpha-levels.
The software for conducting the logistic KMT has been superseded by the SKAT software (above), but modifications to the default SKAT parameters are necessary.
Downloads:
The previous software for the Logistic Kernel Machine Test has been superseded by the Sequence Kernel Association Test (SKAT) software (above). IMPORTANT: modifications to the default SKAT settings are needed since the defaults are aimed towards rare variants. (1) Please change the "kernel" parameter to "linear" or "IBS" since the weighted versions are primarily designed for rare variants. (2) One can set "method" equal to "liu" in order to more closely mimic the results of the original Logistic Kernel Machine Test.
Reference:
Wu, M.C., Kraft, P., Epstein, M.P., Taylor, D.M., Chanock, S.J., Hunter, D.J., and Lin, X. (2010). "Powerful SNP set analysis for case-control genome wide association studies". The American Journal of Human Genetics, 86, 929-942. PDF
Additional Resources:
500 Simulated data sets based on Model 1: Download
Descriptions:
The SNP-set Kernel Interaction Test (SKIT, not to be confused with SKAT) is a tool for conducting gene- or region-based testing of gene-gene interactions. In particular, SKIT is used to test whether the SNPs in one SNP-set (the SNPs within a particular region or gene) interact with the SNPs in a second SNP-set. Currently, the method is only applied to quantitative traits, but extensions to dichotomous traits are possible and under development.
Downloads:
Working Code (in R)
Reference:
Clark, J.J., Maity, A., Harmon, Q.E., Engel, S.E., Epstein, M.P., Wu, M.C. (2013). "Gene and Region Based Testing of Gene-Gene Interactions for Quantitative Traits with the SNP-Set Kernel Interaction Test (SKIT)". Submitted.
Descriptions:
The Dual Kernel-Based Association Test (DKAT) is a tool for associating a multivariate (possibly high-dimensional or structured) outcome with one or more genetic variants of interest. Both the outcome and genetic variant(s) are embedded within kernels to accommodate structures.
Downloads:
R packages:
Source/Linux
Manual
Reference:
Zhan, X., Zhao, N., Plantinga, A., Thornton, T.A., Conneely, K.N., Epstein, M.P., Wu, M.C. (2017). "Powerful genetic association analysis for common or rare variants with high-dimensional structured traits". Genetics, 206(4): 1779-1790. PDF
Descriptions:
This package is designed to conduct "global analysis" of DNA methylation data, particularly from the Illumina 450k Infinium platform. Instead of examining the effect of individual CpGs, the idea is to compare the overall profile or distribution of CpG measurements across individuals.
Briefly, each individual's methylation profile is summarized by approximating the density of the methylation distribution OR the cumulative distribution function (CDF) of the methylation distribution using B-splines. The B-spline coefficients are used to represent each individual's overall methylation distribution. To test for association between the overall distribution and a continuous or dichotomous variable of interest, we apply the SKAT test (above) to the spline coefficients. A single p-value is generated.
Although the method is developed for DNA methylation data, it can be adapted to other types of data as well; however, the current software assumes that input values are between 0 and 1 (corresponding to percent methylation).
This package depends on the fda and SKAT R packages.
Downloads:
R packages:
Windows/Linux
Manual
Reference:
Zhao, N., Bell, D.A., Maity, A., Staicu, A.-M., Joubert, B.R., London, S.J., Wu, M.C. (2015). "Global analysis of methylation profiles from high resolution CpG data". Genetic Epidemiology, 39:53-64. PDF
Descriptions:
This package is designed to test for differential microbiome composition in reference to a continuous or dichotomous variable of interest at the community level. The main MiRKAT function can be used to analyze microbiome data under a single kernel (dissimilarity metric) or under multiple dissimilarities (optimal MiRKAT) by specifying multiple kernels.
Note: the package no longer requires the BiasedUrn Package.
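Since the description above treats a kernel and a dissimilarity metric as interchangeable, a small illustration of one common way to turn a dissimilarity matrix D into a kernel matrix may help: Gower centering, K = -1/2 J D^2 J with J = I - 11'/n. This is a minimal sketch in plain Python, not the MiRKAT package code; the function name dissimilarity_to_kernel is hypothetical, D is assumed to be symmetric, and any further correction for negative eigenvalues that a real implementation might apply is omitted.

```python
def dissimilarity_to_kernel(D):
    """Gower-center a symmetric dissimilarity matrix: K = -1/2 * J * D^2 * J.

    Hypothetical helper for illustration only; assumes D is symmetric
    with a zero diagonal. No positive semi-definiteness correction is done.
    """
    n = len(D)
    # Element-wise -1/2 * d_ij^2
    A = [[-0.5 * D[i][j] ** 2 for j in range(n)] for i in range(n)]
    # Because D is symmetric, row means equal column means
    row_mean = [sum(row) / n for row in A]
    grand_mean = sum(row_mean) / n
    # Double centering: subtract row and column means, add back the grand mean
    return [
        [A[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Toy example: Euclidean distances between three points on a line
points = [0.0, 1.0, 3.0]
D = [[abs(a - b) for b in points] for a in points]
K = dissimilarity_to_kernel(D)
```

For a Euclidean dissimilarity, as in the toy example, the result is the Gram matrix of the mean-centered points; for non-Euclidean dissimilarities the centered matrix can have negative eigenvalues, which is why real software typically adds a correction step.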
Downloads:
R Package:
Linux/Mac
Windows
Manual
Vignette
Reference:
Zhao, N., Chen, J., Carroll, I.M., Ringel-Kulka, T., Epstein, M.P., Zhou, H., Zhou, J.J., Ringel, Y., Li, H., Wu, M.C. (2015). "Testing in microbiome profiling studies with MiRKAT, the Microbiome Regression-based Kernel Association Test (MiRKAT)". The American Journal of Human Genetics, 96(5): 797-807. PDF
Descriptions:
This package is designed to test for differential microbiome composition in reference to a multivariate outcome at the community level.
MMiRKAT assumes that the dimensionality of the outcome is not excessively large. Please contact us for an alternative approach if interest is in high-dimensional or structured outcomes.
Downloads:
R Package:
Linux/Mac
Windows
Manual
Reference:
Zhan, X., Tong, X., Zhao, N., Maity, A., Wu, M.C.#, Chen, J.# (2017). "A small-sample multivariate kernel machine test for microbiome association studies". Genetic Epidemiology, 41(3): 210-220. PDF
[#Joint Corresponding Author]
Descriptions:
This package is designed to test for differential microbiome composition in reference to a censored survival outcome at the community level.
Downloads:
R Package:
Linux/Mac
Windows
Manual
Vignette
Reference:
Plantinga, A., Zhan, X., Zhao, N., Chen, J., Jenq, R.R., Wu, M.C. (2017). "MiRKAT-S: A community-level test of association between the microbiota and survival times". Microbiome, 5(1): 17. Open Access
Descriptions:
We apply Sparse Linear Discriminant Analysis (sLDA) to test the significance of gene pathways when the signal is relatively weak. Also included is general code for running two-group L1-penalized linear discriminant analysis. The current software is working code only. Please contact us if you have any questions or concerns.
Downloads:
Working code (in R)
Reference:
Wu, M.C., Zhang, L., Wang, Z., Christiani, D.C., Lin, X. (2009). "Sparse linear discriminant analysis for simultaneous gene set/pathway significance test and gene selection". Bioinformatics, 25, 1145-1151. PDF
Descriptions:
This package facilitates testing for association between a group of omics features and one or more other variables of interest. The approach is generally equivalent to PERMANOVA (see below) but uses parametric resampling instead of permutation in order to improve computational efficiency.
PERMANOVA continues to represent a powerful approach for analyzing gene expression, germline genotyping, and microbiome data by associating groups of features (e.g. transcripts, polymorphisms, taxa, etc.) with other metadata. However, the reliance on permutation is computationally expensive when the sample size is large, when the number of permutations is large (as needed for accurate p-values at low alpha levels), or when many analyses need to be done (for simulations or power calculations).
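To make the cost of permutation concrete, the following is a minimal sketch of a standard one-factor, permutation-based PERMANOVA on a distance matrix. It illustrates the baseline procedure that the paragraph above describes as expensive, not the parametric resampling used by this package. The function names pseudo_f and permanova, the default of 999 permutations, and the toy data are all assumptions made for illustration.

```python
import random


def pseudo_f(D, labels):
    """Pseudo-F statistic for a one-factor design, from a distance matrix D."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares from all pairwise squared distances
    ss_total = sum(D[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, one term per group
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        pair_sum = sum(D[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pair_sum / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(D, labels, n_perm=999, seed=0):
    """Permutation p-value: the statistic is recomputed for every shuffle."""
    rng = random.Random(seed)
    f_obs = pseudo_f(D, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute group labels, keep D fixed
        if pseudo_f(D, shuffled) >= f_obs:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero
    return f_obs, (hits + 1) / (n_perm + 1)


# Toy data: two well-separated groups of three samples on a line
x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = [0, 0, 0, 1, 1, 1]
D = [[abs(u - v) for v in x] for u in x]
f_obs, p_value = permanova(D, labels)
```

Each permutation costs on the order of n^2 operations, and the smallest attainable p-value is 1/(n_perm + 1), so resolving p-values near 10^-6 requires roughly a million passes over the distance matrix.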
Downloads:
R packages:
Windows
Linux
Manual
Reference:
Zhan, X., Wu, M.C. "FPERMANOVA: A fast, permutation-free PERMANOVA procedure". Submitted.
Descriptions:
The Kernelize package is for computation of certain commonly used kernel matrices. This includes standard polynomial kernels, Gaussian kernels, as well as the IBS kernel. Since standard computation involves "for loops", the main contribution here is the use of a back end in C.
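For readers unfamiliar with these kernels, here is a minimal sketch of the loop-based computations that the description refers to, written in plain Python. It is not the Kernelize API: the function names, the offset and rho parameters, and the 0/1/2 genotype coding assumed for the IBS kernel are illustrative assumptions.

```python
import math


def polynomial_kernel(X, degree=1, offset=0.0):
    """K[i][j] = (x_i . x_j + offset) ** degree; degree=1, offset=0 is linear."""
    return [
        [(sum(a * b for a, b in zip(xi, xj)) + offset) ** degree for xj in X]
        for xi in X
    ]


def gaussian_kernel(X, rho=1.0):
    """K[i][j] = exp(-||x_i - x_j||^2 / rho), with rho a scale parameter."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / rho) for xj in X]
        for xi in X
    ]


def ibs_kernel(G):
    """IBS kernel for genotypes coded as minor-allele counts 0, 1, or 2.

    K[i][j] = sum_k (2 - |g_ik - g_jk|) / (2 * p), where p is the number of SNPs.
    """
    p = len(G[0])
    return [
        [sum(2 - abs(a - b) for a, b in zip(gi, gj)) / (2 * p) for gj in G]
        for gi in G
    ]


# Toy genotype matrix: 3 individuals (rows) by 3 SNPs (columns)
G = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
K_ibs = ibs_kernel(G)
K_lin = polynomial_kernel(G)
K_quad = polynomial_kernel(G, degree=2, offset=1.0)
K_gau = gaussian_kernel(G, rho=4.0)
```

Every entry requires a pass over all features, so the nested loops cost on the order of n^2 p operations for n individuals and p features, which is the kind of workload that benefits from a compiled back end.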
Downloads: