
Software

This website contains current versions of the software packages and working code resulting from our published research. Please contact us if you have any questions or concerns regarding the use of the methods or if you encounter any errors or bugs. Early releases of our code will be periodically updated.

Software for Analysis of Genotype Data

Sequence Kernel Association Test (SKAT)

Descriptions:

The Sequence Kernel Association Test (SKAT) is a tool for region-based testing of rare variants from sequencing data. In particular, SKAT is designed for testing the association of rare (and common) variants from sequence data with a dichotomous or quantitative trait. We also provide tools for estimating power and sample size in order to design future sequencing studies. Although we focus on rare variants within a region, the method is applicable to any set of rare variants and can accurately estimate p-values even at low levels (e.g. 10^-6).

The method was developed and tailored towards rare variants. It can be applied to other types of data, e.g. gene expression data or common variants, but the tests can be slightly conservative. For other types of data, we recommend using the KM Test (below).
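To make the idea of a region-level test concrete, the following is a minimal sketch of a SKAT-style score statistic for a quantitative trait, Q = sum_j w_j (sum_i g_ij r_i)^2, where r_i are residuals under the null model. It assumes an intercept-only null model (no covariates) and Beta(MAF; 1, 25) weights; the function names and toy data are hypothetical, and the sketch is not the SKAT package's code. It stops at Q: the p-value requires the null distribution of Q, which is not computed here.

```python
from math import gamma


def beta_pdf(x, a=1.0, b=25.0):
    """Beta(a, b) density at x; larger for rarer variants when a=1, b=25."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def skat_q(genotypes, trait):
    """Score-type statistic Q = sum_j w_j * (sum_i g_ij * r_i)^2.

    genotypes: one list of minor-allele counts (0, 1, 2) per subject.
    trait: quantitative trait values. Residuals r_i come from an
    intercept-only null model, which is an assumption of this sketch.
    """
    n = len(trait)
    mean = sum(trait) / n
    resid = [y - mean for y in trait]
    q = 0.0
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2 * n)          # sample minor allele frequency
        weight = beta_pdf(maf) ** 2          # w_j, with sqrt(w_j) = Beta(maf; 1, 25)
        score = sum(g * r for g, r in zip(column, resid))
        q += weight * score ** 2
    return q


# Hypothetical toy region: 6 subjects, 3 variants.
region = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
trait = [1.2, 0.4, 2.5, 1.9, 0.8, 0.2]
q_stat = skat_q(region, trait)
print(q_stat)
```

Because each variant's score is squared before summing, variants with opposite effect directions do not cancel each other out in Q.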

Downloads:

R packages: (NOTE: more recent versions are available on CRAN; see below)
Windows
Linux
Manual

Reference:

Wu, M.C.#, Lee, S.#, Cai, T., Li, Y., Boehnke, M., Lin, X. (2011). "Rare variant association testing for sequencing data with the sequence kernel association test (SKAT)". The American Journal of Human Genetics, 89, 82-93. PDF

Additional Resources:

Most recent versions of the code as well as some examples can be found here.

Multi-Kernel Sequence Kernel Association Test (MK-SKAT)

Descriptions:

The Multi-Kernel SKAT (MK-SKAT) is a practical framework built on the Sequence Kernel Association Test (SKAT) for conducting region-based testing of rare variants from sequencing data. Specifically, MK-SKAT takes a pragmatic approach to answering two questions: (1) which group of variants in the region should I test, and (2) which of the many existing rare variant tests should I use? Since the answer to both questions depends on the true probabilistic genetic model underlying the trait value (which is never known), MK-SKAT tests across a range of candidate groupings and candidate rare variant tests, and uses perturbation to generate a single p-value for the significance of the region. The method allows for covariates and for either quantitative or dichotomous traits.

Downloads:

R packages: Coming Soon!

Reference:

Coming Soon!

Logistic Kernel Machine Test

Descriptions:

The logistic kernel machine test is used for testing the association of a SNP set with a dichotomous outcome. Here, we define a SNP set to be multiple SNPs that have been grouped based on some criterion: proximity to a gene, membership in a pathway or functional grouping, or location within a window of the genome. The method was developed for SNP data, but can, in principle, be applied to a wide range of genomic data types.

Note that the SKAT method (above) is built on the same framework, but is tailored towards rare variants and may be a little bit conservative for common variants at larger alpha-levels.

The software for conducting the logistic KMT has been superseded by the SKAT software (above), but modifications to the default SKAT parameters are necessary.

Downloads:

The previous software for the Logistic Kernel Machine Test has been superseded by the Sequence Kernel Association Test (SKAT) software (above). IMPORTANT: modifications to the default SKAT settings are needed since the defaults are aimed towards rare variants. (1) Please change the "kernel" parameter to "linear" or "IBS" since the weighted versions are primarily designed for rare variants. (2) One can set "method" equal to "liu" in order to more closely mimic the results of the original Logistic Kernel Machine Test.
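As an illustration of what the "linear" and "IBS" kernel choices measure, here is a minimal sketch that builds both unweighted kernel matrices from a small genotype matrix. The function names and the toy SNP set are hypothetical and this is not the SKAT package's code; it only shows the pairwise similarity each kernel encodes.

```python
def linear_kernel(genotypes):
    """Unweighted linear kernel: K[i][k] = sum_j g_ij * g_kj."""
    return [[sum(a * b for a, b in zip(gi, gk)) for gk in genotypes]
            for gi in genotypes]


def ibs_kernel(genotypes):
    """IBS kernel: fraction of alleles shared identical by state (0 to 1)."""
    p = len(genotypes[0])
    return [[sum(2 - abs(a - b) for a, b in zip(gi, gk)) / (2 * p)
             for gk in genotypes] for gi in genotypes]


# Hypothetical SNP set: 4 subjects, 3 SNPs coded as minor-allele counts.
snp_set = [[0, 1, 2], [0, 1, 2], [2, 1, 0], [1, 0, 1]]
k_lin = linear_kernel(snp_set)
k_ibs = ibs_kernel(snp_set)
print(k_lin[0])
print(k_ibs[0])
```

Neither kernel applies allele-frequency weights, so common and rare SNPs contribute on the same scale, which is why these choices suit common-variant SNP sets.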

Reference:

Wu, M.C., Kraft, P., Epstein, M.P., Taylor, D.M., Chanock, S.J., Hunter, D.J., and Lin, X. (2010). "Powerful SNP set analysis for case-control genome wide association studies". The American Journal of Human Genetics, 86, 929-942. PDF

Additional Resources:

500 Simulated data sets based on Model 1: Download

SNP-Set Kernel Interaction Test (SKIT)

Descriptions:

The SNP-set Kernel Interaction Test (SKIT, not to be confused with SKAT) is a tool for conducting gene- or region-based testing of gene-gene interactions. In particular, SKIT is used to test whether the SNPs in one SNP set (the SNPs within a particular region or gene) interact with the SNPs in a second SNP set. Currently, the method applies only to quantitative traits, but extensions to dichotomous traits are possible and under development.

Downloads:

Working Code (in R)

Reference:

Clark, J.J., Maity, A., Harmon, Q.E., Engel, S.E., Epstein, M.P., Wu, M.C. (2013). "Gene and Region Based Testing of Gene-Gene Interactions for Quantitative Traits with the SNP-Set Kernel Interaction Test (SKIT)". Submitted.

Dual Kernel-Based Association Test (DKAT)

Descriptions:

The Dual Kernel-Based Association Test (DKAT) is a tool for associating a multivariate (possibly high-dimensional or structured) outcome with one or more genetic variants of interest. Both the outcome and genetic variant(s) are embedded within kernels to accommodate structures.
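To illustrate what comparing two kernels can look like, the sketch below computes a centered kernel-alignment statistic, tr(Kc Lc) / sqrt(tr(Kc Kc) tr(Lc Lc)), between a genotype kernel K and an outcome kernel L. This page does not give DKAT's exact statistic or normalization, so treating this alignment as representative is an assumption; the function names, the choice of linear kernels, and the toy data are all hypothetical.

```python
def gram(rows):
    """Linear kernel (Gram matrix) of a list of feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def center(kernel):
    """Double-center a square kernel: H K H with H = I - 11'/n."""
    n = len(kernel)
    row_means = [sum(row) / n for row in kernel]
    col_means = [sum(kernel[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[kernel[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n)] for i in range(n)]


def trace_product(a, b):
    """tr(A B) for square matrices of equal size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def kernel_alignment(k, l):
    """Normalized similarity between two centered kernels (0 to 1 here)."""
    kc, lc = center(k), center(l)
    denom = (trace_product(kc, kc) * trace_product(lc, lc)) ** 0.5
    return trace_product(kc, lc) / denom


# Hypothetical data: one variant and a bivariate outcome for 4 subjects.
variant = [[0], [1], [2], [1]]
outcome = [[0.1, 1.0], [0.9, 2.1], [2.2, 2.9], [1.0, 2.0]]
alignment = kernel_alignment(gram(variant), gram(outcome))
print(alignment)
```

Swapping `gram` for a different kernel on either side is how structure in the outcome or in the variants would enter such a statistic.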

Downloads:

R packages:
Source/Linux
Manual

Reference:

Zhan, X., Zhao, N., Plantinga, A., Thornton, T.A., Conneely, K.N., Epstein, M.P., Wu, M.C. (2017). "Powerful genetic association analysis for common or rare variants with high-dimensional structured traits". Genetics, 206(4): 1779-1790. PDF

Software for Analysis of DNA Methylation Data

Global Analysis of Methylation Profiles (GAMP)

Descriptions:

This package is designed to conduct "global analysis" of DNA methylation data, particularly from the Illumina 450k Infinium platform. Instead of examining the effect of individual CpGs, the idea is to compare the overall profile or distribution of CpG measurements across individuals.

Briefly, each individual's methylation profile is summarized by using B-splines to approximate either the density or the cumulative distribution function (CDF) of the methylation distribution. The B-spline coefficients are then used to represent each individual's overall methylation distribution. To test for association between the overall distribution and a continuous or dichotomous variable of interest, we apply SKAT (above) to the spline coefficients. A single p-value is generated.
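The summarization step can be sketched as follows. For brevity, this example substitutes the empirical CDF evaluated on a fixed grid for the B-spline coefficients, which is a simpler stand-in and not what GAMP itself computes. The function name, grid, and toy methylation values are hypothetical. Each individual ends up with one short feature vector, and those vectors play the role the spline coefficients play in the description above.

```python
from bisect import bisect_right


def ecdf_profile(beta_values, grid):
    """Empirical CDF of one individual's methylation values at each grid point."""
    ordered = sorted(beta_values)
    n = len(ordered)
    return [bisect_right(ordered, t) / n for t in grid]


# Hypothetical methylation beta values (between 0 and 1) for two individuals.
grid = [0.25, 0.5, 0.75, 1.0]
sample_a = [0.05, 0.10, 0.80, 0.90, 0.95]   # mostly extreme values
sample_b = [0.40, 0.45, 0.50, 0.55, 0.60]   # mostly intermediate values

# One row per individual; this matrix would be passed to the association test.
features = [ecdf_profile(s, grid) for s in (sample_a, sample_b)]
print(features)
```

The two individuals have similar mean methylation but clearly different feature vectors, which is the kind of distributional difference a global analysis is meant to pick up.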
