simulateRiskPerf {mgrp}    R Documentation

Simulate genotypes and risk prediction performance.

Description

Simulates a set of genotype profiles and disease states for each member of a sample and estimates risk prediction performance.

simMultRiskPerf allows the disease odds ratio ("or"), the frequency of the risk allele ("f"), the disease prevalence ("p") and the number of genes in the model ("nog") to be vectors. A separate simulation is run for the genotype profile specified by each combination of these parameters.
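As a rough illustration of "every combination of the parameters", the grid of profiles can be enumerated with base R's expand.grid. This is only a sketch of the idea, not the mgrp implementation; the parameter values are borrowed from the Table 2 call in the Examples section.

```r
# Hypothetical illustration of the parameter grid; not the mgrp implementation.
or  <- c(1.05, 1.1, 1.25, 1.5)
f   <- c(0.05, 0.1, 0.3)
p   <- 0.1
nog <- c(50, 150, 250, 350)

# One row per genotype profile to simulate
grid <- expand.grid(or = or, f = f, p = p, nog = nog)
nrow(grid)      # number of profiles: 4 * 3 * 1 * 4
head(grid, 3)
```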

simulateRiskPerf takes the same parameters but does not allow for multiple genotype profiles (or, f, nog, p). Instead, simulateRiskPerf supports genotype profiles whose effect strengths vary across the genes in the profile (different "f" and "or" within one profile). In that case, the "f" and "or" vectors together specify a single genotype profile. (NOTE: simulateRiskPerf is used by simMultRiskPerf.)

Usage

simMultRiskPerf(or, f, p, n = 1e+05, nog = 400, 
                highRisk = 0.2, roundDigits = 3, seed = NULL, silent=FALSE)

simulateRiskPerf(or, f, p, n = 1e+05, nog = 400, varyEffects=FALSE, 
                 highRisk = 0.2, roundDigits = 3, seed = NULL, silent=FALSE )

Arguments

or            The disease odds ratio for the heterozygote relative to the homozygote for the non-risk allele (may be a vector in either simMultRiskPerf or simulateRiskPerf, BUT the vectors are used differently).
f             The frequency of the risk allele in the population (may be a vector in either simMultRiskPerf or simulateRiskPerf, BUT the vectors are used differently).
p             The prevalence of the disease in the population (may be a vector in simMultRiskPerf).
n             (default 100,000) Total sample size.
nog           Number of genes in the model (may be a vector in either simMultRiskPerf or simulateRiskPerf, and is used in the same way in both).
varyEffects   TRUE or FALSE (default). Specifies whether or and f are allowed to vary across the genes (see help for simulateGenotypes). NOTE: varyEffects is ONLY supported in simulateRiskPerf(), NOT in simMultRiskPerf().
highRisk      (default 0.2) The risk cutpoint that defines "high risk"; used to compute the percentage at high risk, TPR and FPR (may be a vector in either simMultRiskPerf or simulateRiskPerf, and is used in the same way in both).
roundDigits   (default 3) The number of digits to which the results table is rounded.
seed          (default NULL) A random seed, set before any random number generation. Using the same seed produces reproducible results.
silent        (default FALSE) Passed to simulateGenotypes() (see help for simulateGenotypes). Controls whether memory allocation issues are reported to the user.

Details

Given an odds ratio, a frequency of the risk allele (in the population) and an overall disease prevalence, a disease likelihood ratio can be calculated (assuming Hardy-Weinberg equilibrium, and that each instance of the risk allele confers the same multiplicative risk).
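The calculation described above can be sketched for a single gene as follows. This is a hedged illustration, not the code used by mgrp: the helper name genotypeLR is hypothetical, and it assumes that the multiplicative effect acts on the odds scale and that the baseline odds are chosen so the population-average risk equals p.

```r
# Illustrative sketch only; genotypeLR is a hypothetical helper, not part of mgrp.
genotypeLR <- function(or, f, p) {
  # Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of the risk allele
  freq <- c((1 - f)^2, 2 * f * (1 - f), f^2)
  # Assumption: each risk allele multiplies the disease odds by `or`
  oddsMult <- or^(0:2)
  # Disease risk per genotype for a given baseline odds
  risk <- function(baseOdds) {
    odds <- baseOdds * oddsMult
    odds / (1 + odds)
  }
  # Solve for the baseline odds that reproduce the prevalence p
  baseOdds <- uniroot(function(b) sum(freq * risk(b)) - p,
                      interval = c(1e-12, 1e6), tol = 1e-12)$root
  r <- risk(baseOdds)
  # Likelihood ratio: P(genotype | case) / P(genotype | control)
  lr <- (freq * r / p) / (freq * (1 - r) / (1 - p))
  data.frame(copies = 0:2, freq = freq, risk = r, LR = lr)
}

res <- genotypeLR(or = 1.5, f = 0.3, p = 0.1)
res
```

Under these assumptions, the ratio of likelihood ratios between adjacent genotypes equals the odds ratio, which gives a quick sanity check on the output.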

Genetic profiles can be simulated given the overall frequency of the risk allele for a panel of genes. Assuming a known, correct risk model, (1) disease states can be simulated and (2) the performance of the panel of genes as a combined marker for risk prediction can be assessed.
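A minimal sketch of these two simulation steps is given below, assuming every gene shares the same "or" and "f" and that effects combine additively on the log-odds scale. The helper simulateProfiles is hypothetical and is not how simulateGenotypes or simulateRiskPerf are necessarily implemented.

```r
# Illustrative sketch only; simulateProfiles is a hypothetical helper, not part of mgrp.
simulateProfiles <- function(or, f, p, n = 1000, nog = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Genotypes: number of risk alleles (0, 1, 2) per person and gene under HWE
  geno <- matrix(rbinom(n * nog, size = 2, prob = f), nrow = n, ncol = nog)
  # Assumption: each risk allele adds log(or) to the log odds of disease
  score <- rowSums(geno) * log(or)
  # Choose the intercept so the mean risk in the sample matches the prevalence p
  intercept <- uniroot(function(a) mean(plogis(a + score)) - p,
                       interval = c(-50, 50))$root
  risk <- plogis(intercept + score)
  # Disease status drawn from each person's true risk
  disease <- rbinom(n, size = 1, prob = risk)
  list(geno = geno, risk = risk, disease = disease)
}

sim <- simulateProfiles(or = 1.25, f = 0.3, p = 0.1, n = 5000, nog = 50, seed = 1)
mean(sim$disease)   # observed prevalence, close to p up to sampling noise
```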

Available measures of risk marker performance include:
[1] % identified as high risk in the full population
[2] % identified as high risk in the controls
[3] % identified as high risk in the cases
[4] AUC of the Receiver Operating Characteristic curve
[5] NRI (net reclassification improvement) as defined by Pencina et al. (2008)
[6] reclassification % as defined by Cook et al. (2007)
[7] expected benefit as defined by Vickers et al. (2006)
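Measures [1] to [4] can be sketched directly from a vector of risks and a vector of disease states. The helper riskPerfSummary below is hypothetical and only illustrates the definitions; in particular, treating "high risk" as risk strictly greater than the cutpoint is an assumption, and the AUC is computed with the standard rank-sum (Mann-Whitney) identity rather than whatever method mgrp uses.

```r
# Illustrative sketch only; riskPerfSummary is a hypothetical helper, not part of mgrp.
riskPerfSummary <- function(risk, disease, highRisk = 0.2) {
  flagged <- risk > highRisk      # assumption: strictly above the cutpoint
  cases <- disease == 1
  # AUC via the Mann-Whitney rank-sum identity (ties receive average ranks)
  r <- rank(risk)
  nCase <- sum(cases)
  nCtrl <- sum(!cases)
  auc <- (sum(r[cases]) - nCase * (nCase + 1) / 2) / (nCase * nCtrl)
  c(pctHighRisk         = 100 * mean(flagged),
    pctHighRiskControls = 100 * mean(flagged[!cases]),  # false positive rate
    pctHighRiskCases    = 100 * mean(flagged[cases]),   # true positive rate
    AUC                 = auc)
}

# Toy data: three controls followed by three cases
risk    <- c(0.05, 0.10, 0.25, 0.30, 0.15, 0.40)
disease <- c(0, 0, 0, 1, 1, 1)
res <- riskPerfSummary(risk, disease)
res
```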

Author(s)

Daryl Morris

Examples

######################################################
# Generate a single row of Table 2 from the paper.
simulateRiskPerf(or = 1.05, f = 0.05, p = 0.1, nog = 50, seed = 834234, highRisk = 0.2)

###################################################
# The code below generates Table 2 from the paper.
#   (It is commented out because it takes about 3-5 minutes to execute.)
#simMultRiskPerf(or=c(1.05,1.1,1.25,1.5),f=c(.05,.1,.3),p=.1,
#                nog=c(50,150,250,350), seed=834234)

###################################################
# The code below generates Table 3 from the paper
#  (an example in which f and or vary across genes).
# f and or each have 400 entries, one per gene in the profile.
f <- c(0.05 + 0.005 * (1:50), 0.3 + 0.0005 * (1:350))
maxOR <- 1.5
or <- c(seq(maxOR, 1.15, length.out = 20), 1.15 - 0.1 / 380 * (1:380))
simulateRiskPerf(or = or, f = f, p = 0.1, nog = c(20, 50, 150, 250, 350),
                 seed = 834234, varyEffects = TRUE)



[Package mgrp version 1.0 Index]