This is an illustration of how to use the MiST package to perform set-based genetic association tests under mixed effects models. To form the input data, arrange the columns in the order [Y X G], where Y is the outcome variable, X is a matrix of confounder(s), and G is a matrix of genotype(s). MiST allows for multiple confounders (their number is specified by the option d) and multiple variants (their number is specified by the option p).
Several R packages are called in MiST and should be installed beforehand. These include:
The main function to implement the set-based test is MiST. The required inputs include:
data: An n x (p+d+1) data set [Y, X, G] whose rows represent subjects and whose columns are Y (outcome), X (confounders), and G (genotypes).
outcome_type: Either “Continuous” for quantitative trait, or “Binary” for dichotomous trait.
p: Number of genetic variants in the set.
d: Number of confounders. If d is 0, there is no confounder, and the input data should be arranged as [Y G].
R: Number of burden scores (R>=0). When R is 0, only the variance component test is performed, and more than one term is required in this situation.
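The column layout described above can be checked before calling MiST. The following is a minimal sketch in base R, assuming the data are held in a data frame; the helper check_layout is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of MiST): verify the [Y, X, G] layout
# and split the columns back into outcome, confounders, and genotypes.
check_layout = function(data, d, p) {
  stopifnot(ncol(data) == 1 + d + p)
  list(
    Y = data[, 1],
    X = if (d > 0) data[, 2:(1 + d), drop = FALSE] else NULL,
    G = data[, (2 + d):(1 + d + p), drop = FALSE]
  )
}

# Small simulated data set with d = 2 confounders and p = 3 variants
set.seed(1)
n = 100; d = 2; p = 3
Y = rnorm(n)
X = matrix(rnorm(n * d), n, d)
G = matrix(rbinom(n * p, size = 1, prob = 0.05), n, p)
data = data.frame(Y = Y, X = X, G = G)

parts = check_layout(data, d, p)
dim(parts$X)  # n rows, d columns
dim(parts$G)  # n rows, p columns
```

If the values of d and p passed to the helper do not match the number of columns, stopifnot raises an error, which catches a miscounted confounder or variant before the test is run.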
Other options to customize the analysis are available, including
weight_method: One of “User” or “No weight”. This specifies the weights used to calculate burden scores. “User” (default) allows for user-defined weights for each variant (e.g., functional annotations or equal weights). The weights are specified by the input option “user_weight”. “No weight” sets all weights to 0 and tests only the variance component.
user_weight: A vector or a matrix specifying weights for calculating weighted burden scores. This option only takes effect when weight_method is set to “User”. If weight_method is set to “User” and no user_weight is specified, all weights are set to 1.
combinations_return: A logical value (default: TRUE) indicating whether the combination methods, including the optimal linear combination, the data-adaptive weighted combination, and Fisher’s combination method, are returned. If set to FALSE, only the burden and variance component tests are returned.
combination_preference: Either “All” (default) or a vector containing “OptMin”, “AdptWt”, or “Fisher” to specify the combination method(s).
chisq_app: Either “3M” (default) or “4M” for the moment matching (Liu’s) method in the quantile approximation in the optimal linear combination of burden and variance components. “3M” matches the 3rd moment and “4M” matches the 4th moment of the target and approximate distributions.
acc: A numerical value indicating the precision of the Davies method for p-value calculation. Default is 5e-10.
acc_auto: A logical value (default: TRUE) indicating whether data-adaptive precision is used in the optimal linear combination. We recommend setting this to TRUE for computational efficiency.
accurate_app_threshold: A numerical value specifying the threshold to determine when the Liu or Davies method is used in the quantile approximation in the optimal linear combination of burden and variance components. Default is -log10(0.05).
max_core: An integer specifying the maximum number of cores that can be recruited in the parallel package. Default is 4 cores.
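As a rough illustration of one of the combination methods named under combinations_return, Fisher's method combines k independent p-values through the statistic -2*sum(log(p)), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is generic base R and is not MiST's internal code; the helper fisher_combine and the two input p-values are made up for illustration.

```r
# Hypothetical helper (not part of MiST): Fisher's combination of
# independent p-values.
fisher_combine = function(pvals) {
  stat = -2 * sum(log(pvals))
  pchisq(stat, df = 2 * length(pvals), lower.tail = FALSE)
}

# Illustrative p-values for a burden test and a variance component test
p_burden = 0.04
p_variance = 0.20
fisher_combine(c(p_burden, p_variance))
```

The calculation assumes the p-values being combined are independent; a single p-value passed to the helper is returned unchanged.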
The references for the methods are:
The following is an example with a simple burden score (weights equal to 1 for all variants) and the variance component test in MiST.
library(MiSTi)
# Y: binary outcome variable, a vector of length n.
# X: d confounders, either an n x 1 vector or an n x d matrix if d > 1.
# G: genotypes of variants, an n x p matrix.
# Data generation
n = 2000
set.seed(1234)
X = rbinom(n,size=1,prob=0.5)
MAF = runif(10,min=0.001,max=0.01)
G = sapply(MAF,function(maf) rbinom(n,size=1,prob=maf))
eta = X*0.5 + G%*%rnorm(10,mean=0,sd=0.5)
Y = rbinom(n,size=1,prob=exp(eta)/(1+exp(eta)))
d = 1
p = 10
data = data.frame(Y=Y, X=X,G=G)
mist = MiST(data = data,
            outcome_type = "Binary",
            d = d,
            p = p,
            R = 1,
            weight_method = "User"
)
print(mist)
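Because the simulated variants are rare (minor allele frequencies between 0.001 and 0.01), it can be useful to check how many carriers each variant has before interpreting the test. The sketch below repeats the genotype simulation from the example with the same seed and counts carriers in base R; it is a suggested sanity check, not a step required by MiST.

```r
# Repeat the genotype simulation from the example above
n = 2000
set.seed(1234)
X = rbinom(n, size = 1, prob = 0.5)
MAF = runif(10, min = 0.001, max = 0.01)
G = sapply(MAF, function(maf) rbinom(n, size = 1, prob = maf))

# Number of carriers of each variant
carriers = colSums(G)
carriers

# Number of subjects carrying at least one variant in the set
sum(rowSums(G) > 0)
```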
If a functional annotation for the genetic variants is available, one can incorporate the functional annotation into the set-based test. Below is the sample code.
# FA: a vector of length p containing functional annotation for variants in consideration.
FA = runif(p)
mist.FAOnly = MiST(data = data,
                   outcome_type = "Binary",
                   d = d,
                   p = p,
                   R = 1,
                   weight_method = "User",
                   user_weight = FA
)
If the user would like to jointly test the effects of two weighted burden scores (e.g., a vector of 1’s and functional annotation), the sample code is shown below. In this case, the p-value of the joint effects of the two burden scores and their individual p-values after adjusting for the presence of each other will be returned by default.
mist.2burdens = MiST(data = data,
                     outcome_type = "Binary",
                     d = d,
                     p = p,
                     m = 1,
                     R = 2,
                     weight_method = "User",
                     user_weight = cbind(rep(1,p),FA)
)
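To make the two-column weight matrix more concrete: a weighted burden score is commonly computed as a weighted sum of a subject's genotypes, so p variants and R weight vectors give an n x R matrix of scores. The sketch below shows that calculation in base R on a small simulated data set. It assumes this standard definition and is not taken from MiST's source; the names W and burden are illustrative.

```r
# Small simulated genotype matrix and functional annotation
set.seed(1)
n = 5; p = 4
G = matrix(rbinom(n * p, size = 1, prob = 0.3), n, p)
FA = runif(p)

# Weight matrix: column 1 is all 1's, column 2 is the annotation
W = cbind(rep(1, p), FA)

# One burden score per subject (row) and per weight vector (column)
burden = G %*% W
dim(burden)

# With unit weights, the burden score is the variant count per subject
all.equal(burden[, 1], rowSums(G))
```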