This is an illustration for using the package MiSTi to implement set-based GxE interaction test under mixed effects models. To form the input data, make sure that the columns of the data are arranged in the following order [Y E X G], where Y is an outcome variable, E is a matrix of environmental risk factors, X is a matrix of confounder(s), and G is a matrix of genotype(s). The current version of MiSTi allows multiple environmental risk factors (dimension is specified by option m), multiple confounders (dimension is specified by option d), and multiple variants (dimension is specified by option p).
Several R packages are called in MiSTi and should be installed beforehand. These include:
penalized
Matrix
psych
CompQuadForm (note: built under R version 3.3.2)
parallel
The main function to implement the integrative GxE analysis is MiSTi. The required inputs include
#’ @param
data: a nx(p+m+d+1) [Y,E,X,G] with rows represent subjects and columns are Y (outcome), E (environmental risk factors), X (confounders), and G (genotypes).
outcome_type: Either “Continuous” for quantitative trait, or “Binary” for dichotomous trait.
p: Number of genetic variants in the set.
m: Number of environmental risk factors.
d: Number of confounders. If d is 0, there is no confounder, and the input data should be arranged as [Y E G].
R: Number of burden scores (R>=0). When is 0, only the variance component test is performed and more than one terms are required in this situation.
Other options to customize the analysis are available, including
glm_use: A logical value with TRUE indicating the conventional generalized linear regression is used to fit the null model, and FALSE (default) indicating a ridge regression is used.
weight_method: One of “User”, “No weight”, “ScrnStat1”, and “ScrnStat2”. This specifies weights for calculating burden scores. “User”(default) allows for user-defined weights for each variant (e.g., functional annotations or equal weights). The weights are specified by the input option “user_weight”. “No weight” sets all weights = 0 and tests only the variance component. “ScrnStat1” and “ScrnStat2” use screening test statistics as weights, where “ScrnStat1” uses the maximum of G-E correlation and marginal association of G test statistics (Jiao et al. 2015) and “ScrnStat2” uses the sum of chi-squares of correlation and marginal association test statistics (Gauderman et al. 2013).
user_weight: A vector or a matrix specifying weights for calculating weighted burden scores. This option only works when the weight_method is set as “User”. If the weight_method is set as “User”" and no user_weight is specified, the weight is set as 1.
ind_burden_return: A logical value (default: TRUE) indicating if the test statistics and p-values for individual interaction terms between each E and each of burden scores are returned. This option only works when there are more than one ExBurden Scores terms (i.e. multiple E’s and/or multiple weights for each varaint).
combinations_return: A logical value (default: TRUE) indicating if the combination methods, including the optimal linear combination, data-adpative weighted combination and Fisher’s combination method, are returned. By assigning FALSE, only the burden and variance component tests are returned.
combination_preference: Either of “All” (default), or a vector containing “OptMin”, “AdptWt”, or “Fisher” to specify the combination method(s).
mac_lower_bound: The lower bound of minor allele count (MAC) for individual SNPs to be included in the analysis. Default is 6. SNPs with MAC<=5 are excluded from the analysis. A warning message is given when any SNPs are removed from the set.
chisq_app: Either “3M” (default) or “4M” for the moment matching (Liu’s) method in the quantile approximation in the optimal linear combination of burden and variance components. “3M” matches the 3rd moment and “4M” matches the 4th moment of the target and approximate distributions.
acc: A numerical value indicating the precision of the Davies method for p-value calculation. Default is 5e-10.
acc_auto: A logical value (default: TRUE) indicating if data adaptive precision is used in optimal linear combination. We recommend to set this as TRUE for computational efficiency.
accurate_app_threshold: A numerical value specifying the threshold to determine when the Liu or Davies method is used in the quantile approximation in the optimal linear combination of burden and variance components. Default is -log10(0.05).
max_core: An integer specifying the maximum number of cores that can be recruited in the parallel package. Default is 4 cores.
The references for the methods are:
Su, Y., Di, C. and Hsu, L. (2016) A unified powerful set-based test for sequencing data analysis of GxE interactions. Biostatistics, 18(1):119-131.
Jiao, S., Peters, U., Berndt, S., Bézieau, S., Brenner, H., Campbell, P.T., Chan, A.T., Chang‐Claude, J., Lemire, M., Newcomb, P.A. and Potter, J.D., 2015. Powerful Set‐Based Gene‐Environment Interaction Testing Framework for Complex Diseases. Genetic epidemiology, 39(8), pp.609-618.
Gauderman, W.J., Zhang, P., Morrison, J.L. and Lewinger, J.P., 2013. Finding Novel Genes by Testing G× E Interactions in a Genome‐Wide Association Study. Genetic epidemiology, 37(6), pp.603-613.
The following is an example of having simple burden score (weight being 1’s for all variants) and variance component test in MiSTi.
library(MiSTi)
# Y: binary outcome variable. a vector of length n.
# E: environmental risk factors. a vector of length n or a matrix of n rows.
# X: d confounders
# Either a vector of length n or a n*d matrix if d is greather than 1.
# G: genotypes of varaints. a n*p matrix.
# A special case with 1 environmental factor.
# Data generation
n = 2000
X = rbinom(n,size=1,prob=0.5)
E = rbinom(n,size=1,prob=0.5)
MAF = runif(10,min=0.001,max=0.01)
G = sapply(MAF,function(maf) rbinom(n,size=1,prob=maf))
eta = E*0.5 + G%*%rnorm(10,mean=0,sd=0.5) + (G*E)%*%rnorm(10,mean=0.5,sd=5)
Y = rbinom(n,size=1,prob=exp(eta)/(1+exp(eta)))
d = 1
p = 10
data = data.frame(Y=Y,E=E,X=X,G=G)
misti = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = 1,
R = 1,
weight_method = "User"
)
misti
# If only the burden and variance component tests are of interest:
misti.NoCombination = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = 1,
R = 1,
weight_method = "User",
combinations_return = FALSE
)
misti.NoCombination
# If only the Fisher's combination method is of interests:
misti.FisherCombination = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = 1,
R = 1,
weight_method = "User",
combinations_return = TRUE,
combination_preference = "Fisher"
)
misti.FisherCombination
If a functional annotation for the genetic variants is available, one can incorporate the functional annotation into the GxE test. Below is the sample code.
# FA: a vector of length p containing functional annotation for variants in consideration.
FA = runif(p)
misti.FAOnly = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = 1,
R = 1,
weight_method = "User",
user_weight = FA
)
misti.FAOnly
If the user would like to jointly test the effects of two weighted burden scores (e.g., a vector of 1’s and functional annotation), the sample code is shown below. In this case, the p-value of the joint BxE effects of the two burden scores and their individual p-values after adjusting for the presence of each other will be returned by default.
misti.2burdens = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = 1,
R = 2,
weight_method = "User",
user_weight = cbind(rep(1,p),FA)
)
misti.2burdens
Below we provide sample code for situations under which there are multiple environmental factors to be tested, and their joint interaction effect with the genetic varaints are of interest. Extension to situations with functional annotations is straightforward by assigning values to the option ‘user_weight’.
E = sapply(c(0.5,0.5,0.5),function(p_e) rbinom(n,size=1,prob=p_e))
GE = do.call(cbind,sapply(1:ncol(E),simplify=FALSE,function(j) G*E[,j]))
eta = E%*%c(0,0,0) + G%*%rnorm(10,mean=0,sd=0.5) + GE%*%rnorm(ncol(GE),mean=0.5,sd=5)
Y = rbinom(n,size=1,prob=exp(eta)/(1+exp(eta)))
m = 3
data = data.frame(Y=Y,E=E,X=X,G=G)
misti.3Es = MiSTi(data = data,
outcome_type = "Binary",
d = d,
p = p,
m = m,
R = 1,
weight_method = "User"
)
misti.3Es