| simulateRiskPerf {mgrp} | R Documentation |
Simulates a set of genotype profiles and disease states for each member of a sample and estimates risk prediction performance.
simMultRiskPerf allows the disease odds ratio ("or"), the frequency of the risk allele ("f"), the disease prevalence ("p") and the number of genes in the model ("nog") to be vectors. A separate simulation is run for the genotype profile specified by each combination of these parameters.
simulateRiskPerf takes the same parameters but does not generate multiple genotype profiles from combinations of (or, f, nog, p). Instead, simulateRiskPerf supports a genotype profile whose genes differ in effect strength (different "f" and "or" within one profile). In this case, simulateRiskPerf takes vectors "f" and "or" and uses them to specify a single genotype profile. (NOTE: simulateRiskPerf is called by simMultRiskPerf.)
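To make "every combination of the parameters" concrete, the sketch below enumerates the grid behind the Table 2 call shown in the examples. It assumes the combinations form the full Cartesian product of the vector arguments; expand.grid is base R, and the variable name grid is illustrative only, not part of mgrp.

```r
# Illustrative only: assumes simMultRiskPerf runs one simulation per row
# of the full Cartesian product of its vector arguments (or, f, p, nog).
grid <- expand.grid(or  = c(1.05, 1.1, 1.25, 1.5),
                    f   = c(.05, .1, .3),
                    p   = .1,
                    nog = c(50, 150, 250, 350))
nrow(grid)  # 4 * 3 * 1 * 4 = 48 genotype profiles
```

Under this assumption, adding a second prevalence value would double the number of simulations, which is why the full Table 2 call takes several minutes.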
simMultRiskPerf(or, f, p, n = 1e+05, nog = 400,
highRisk = 0.2, roundDigits = 3, seed = NULL, silent = FALSE)
simulateRiskPerf(or, f, p, n = 1e+05, nog = 400, varyEffects = FALSE,
highRisk = 0.2, roundDigits = 3, seed = NULL, silent = FALSE)
| or | The disease odds ratio for the heterozygote relative to the homozygote for the non-risk allele (may be a vector in either simMultRiskPerf or simulateRiskPerf, BUT the vectors are used differently) |
| f | The frequency of the risk allele in the population (may be a vector in either simMultRiskPerf or simulateRiskPerf, BUT the vectors are used differently) |
| p | The prevalence of the disease in the population (may be a vector in simMultRiskPerf) |
| n | (default 100,000) Total sample size |
| nog | Number of genes in the model (may be a vector in either simMultRiskPerf or simulateRiskPerf and is used in the same way by both) |
| varyEffects | TRUE or FALSE (default). Specifies whether or and f may vary across the genes (see help for simulateGenotypes). NOTE: varyEffects is supported ONLY in simulateRiskPerf(), NOT in simMultRiskPerf() |
| highRisk | (default 0.2) The risk threshold that defines "high risk"; used as the cutpoint for the percentage at high risk, the TPR and the FPR (may be a vector in either simMultRiskPerf or simulateRiskPerf and is used in the same way by both) |
| roundDigits | (default 3) The number of digits to which the results table is rounded |
| seed | (default NULL) A random seed set before any random numbers are generated. Using the same seed reproduces the same results. |
| silent | (default FALSE) Passed to simulateGenotypes() (see help for simulateGenotypes). Controls whether memory allocation issues are reported to the user. |
Given an odds ratio, a frequency of the risk allele (in the population)
and an overall disease prevalence, a disease likelihood ratio can be
calculated (assuming Hardy-Weinberg equilibrium, and that each instance of
the risk allele confers the same multiplicative risk).
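The calculation described above can be sketched for a single gene as follows. This is a hypothetical helper (genotypeLR is not an mgrp function), and it assumes the per-allele effect is multiplicative on the odds scale, so 0, 1 and 2 copies of the risk allele have odds odds0, odds0 * or and odds0 * or^2. The baseline odds are solved numerically with uniroot so that the population prevalence equals p.

```r
# Hypothetical helper for illustration; not part of mgrp.
# Assumption: each risk allele multiplies the disease odds by `or`.
genotypeLR <- function(or, f, p) {
  k <- 0:2                                    # copies of the risk allele
  freq <- c((1 - f)^2, 2 * f * (1 - f), f^2)  # Hardy-Weinberg frequencies
  riskAt <- function(odds0) odds0 * or^k / (1 + odds0 * or^k)
  # Choose baseline odds so that the population prevalence equals p
  odds0 <- uniroot(function(o) sum(freq * riskAt(o)) - p,
                   c(1e-12, 1e6), tol = 1e-12)$root
  risk <- riskAt(odds0)
  # Likelihood ratio P(genotype | case) / P(genotype | control)
  lr <- (risk / p) / ((1 - risk) / (1 - p))
  data.frame(copies = k, freq = freq, risk = risk, lr = lr)
}

genotypeLR(or = 1.05, f = 0.05, p = 0.1)
```

Each likelihood ratio converts the prior odds p / (1 - p) into the posterior odds for that genotype, which is what allows genes to be combined into a single risk score.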
Genetic profiles can be simulated given the population frequency of the
risk allele for a panel of genes. Assuming the risk model is known and
correct, (1) disease states can be simulated and (2) the performance of the
gene panel as a combined marker for risk prediction can be assessed.
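Steps (1) and (2) can be sketched as below. The function simulateProfilesSketch is hypothetical and is not the mgrp implementation (which uses simulateGenotypes). It assumes genotypes follow Hardy-Weinberg equilibrium independently across genes, that risk alleles act multiplicatively on the odds, and that the intercept is calibrated so the mean simulated risk equals the prevalence p. Scalar or and f are recycled to all nog genes; vectors of length nog give each gene its own effect, loosely mirroring varyEffects = TRUE.

```r
# Hypothetical sketch; not the mgrp implementation.
simulateProfilesSketch <- function(n, nog, or, f, p, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  f  <- rep_len(f, nog)   # one risk-allele frequency per gene
  or <- rep_len(or, nog)  # one per-allele odds ratio per gene
  # Under HWE the risk-allele count at gene j is Binomial(2, f[j]);
  # matrix() fills column by column, so column j uses f[j]
  alleles <- matrix(rbinom(n * nog, size = 2, prob = rep(f, each = n)),
                    nrow = n, ncol = nog)
  # Assumed multiplicative-odds model: log-odds = a + sum_j k_j * log(or_j)
  score <- drop(alleles %*% log(or))
  # Calibrate the intercept a so that the mean risk equals p
  gap <- function(a) mean(plogis(a + score)) - p
  a <- uniroot(gap, c(-50, 50), tol = 1e-10)$root
  risk <- plogis(a + score)
  # Step (1): simulate disease states from the known risk model
  disease <- rbinom(n, size = 1, prob = risk)
  list(risk = risk, disease = disease)
}

sim <- simulateProfilesSketch(n = 10000, nog = 50, or = 1.05, f = 0.05,
                              p = 0.1, seed = 834234)
mean(sim$disease)  # should be near 0.1
```

The simulated risk then serves as the combined marker whose performance is assessed in step (2).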
Available measures of risk marker performance include:
| [1] | % identified as high risk in the full population |
| [2] | % identified as high risk in the controls |
| [3] | % identified as high risk in the cases |
| [4] | AUC of the Receiver Operating Characteristic curve |
| [5] | NRI (net reclassification improvement) as defined by Pencina et al. (2008) |
| [6] | reclassification % as defined by Cook et al. (2007) |
| [7] | expected benefit as defined by Vickers et al. (2006) |
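Measures [1] to [4] can be sketched from a vector of predicted risks and observed disease states. The function riskPerfSketch is hypothetical, not part of mgrp, and it assumes "high risk" means risk >= highRisk; the package may treat the boundary differently. The percentage at high risk among cases is the TPR and among controls the FPR. The AUC is computed with the Mann-Whitney rank-sum identity, giving half credit to ties. Measures [5] to [7] are not sketched here.

```r
# Hypothetical sketch; not the mgrp code. Computes measures [1]-[4] only.
riskPerfSketch <- function(risk, disease, highRisk = 0.2) {
  high <- risk >= highRisk            # assumed cutpoint convention
  n1 <- sum(disease == 1)
  n0 <- sum(disease == 0)
  # AUC via the Mann-Whitney rank-sum statistic (ties count as 1/2)
  r <- rank(risk)
  auc <- (sum(r[disease == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  c(pctHighAll      = 100 * mean(high),                # [1]
    pctHighControls = 100 * mean(high[disease == 0]),  # [2] FPR
    pctHighCases    = 100 * mean(high[disease == 1]),  # [3] TPR
    auc             = auc)                             # [4]
}

out <- riskPerfSketch(risk = c(0.05, 0.1, 0.3, 0.25, 0.15, 0.4),
                      disease = c(0, 0, 1, 0, 1, 1), highRisk = 0.2)
```

In this toy input, 3 of 6 people are at high risk (50%), and 8 of the 9 case/control pairs are correctly ordered, so the AUC is 8/9.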
Daryl Morris
######################################################
# Generate a single row of Table 2.
simulateRiskPerf(or=1.05,f=.05,p=0.1,nog=50,seed=834234,highRisk=.2)
###################################################
# The code below generates Table 2 from the paper
# (the code takes about 3-5 minutes to execute)
#simMultRiskPerf(or=c(1.05,1.1,1.25,1.5),f=c(.05,.1,.3),p=.1,
# nog=c(50,150,250,350), seed=834234)
###################################################
# The code below generates Table 3 from the paper
# (an example in which f and or vary across genes)
f<-c(.05+.005*(1:50),.3+.0005*(1:350))
maxOR <- 1.5
or <- c(seq(maxOR,1.15,length=20),1.15-.1/380*(1:380))
simulateRiskPerf(or=or,f=f,p=0.1,nog=c(20,50,150,250,350),
seed=834234, varyEffects=TRUE)
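The per-gene vectors in the Table 3 example can be inspected before running the simulation. The check below reuses the example's own definitions of f and or and only uses base R; it shows that they describe 400 genes (more than the largest nog value of 350), with odds ratios falling from 1.5 to 1.05 while risk-allele frequencies rise from 0.055 to 0.475, so the strongest effects belong to the rarest alleles.

```r
# Same per-gene definitions as the Table 3 example above
f <- c(.05 + .005 * (1:50), .3 + .0005 * (1:350))
maxOR <- 1.5
or <- c(seq(maxOR, 1.15, length = 20), 1.15 - .1 / 380 * (1:380))

length(f); length(or)  # 400 genes each
range(f)               # 0.055 0.475
range(or)              # 1.05 1.50
all(diff(or) < 0)      # odds ratios strictly decrease across genes
```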