mgrp-package {mgrp}                R Documentation

Multiple Gene Risk Prediction Performance Simulation

Description

This package supports simulations of large groups of independent markers each conferring (multiplicative) risk of disease, where the risk conferred by each genotype is assumed to be known. For each simulation, various summary measures of the performance of the risk prediction algorithm are calculated.

Given an odds ratio, a frequency of the risk allele (in the population) and an overall disease prevalence, a disease likelihood ratio can be calculated (assuming Hardy-Weinberg equilibrium, and that each instance of the risk allele confers the same multiplicative risk).
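
The calculation described above can be sketched as follows. This is a minimal illustration, not the package's own code: `genotype_lr` is a hypothetical helper, and it assumes the odds ratio acts per allele on the odds scale, with the baseline odds chosen so that the population-average risk equals the prevalence.

```r
# Hypothetical helper (not part of mgrp): genotype-specific likelihood
# ratios for one biallelic marker under Hardy-Weinberg equilibrium.
# Assumption: each copy of the risk allele multiplies the odds by `or`.
genotype_lr <- function(or, f, p) {
  g <- 0:2                                     # copies of the risk allele
  freq <- c((1 - f)^2, 2 * f * (1 - f), f^2)   # HWE genotype frequencies

  # Disease risk per genotype for a given baseline (0-copy) odds
  risk <- function(odds0) {
    odds <- odds0 * or^g
    odds / (1 + odds)
  }

  # Pick the baseline odds so the population-average risk equals p
  odds0 <- uniroot(function(o) sum(freq * risk(o)) - p,
                   interval = c(1e-12, 1e6), tol = 1e-12)$root
  r <- risk(odds0)

  # LR = P(genotype | diseased) / P(genotype | not diseased)
  lr <- (r / p) / ((1 - r) / (1 - p))
  data.frame(copies = g, freq = freq, risk = r, lr = lr)
}

res <- genotype_lr(or = 1.05, f = 0.05, p = 0.1)
print(res)
```

Multiplying an individual's prior odds, p / (1 - p), by the likelihood ratio of the observed genotype gives the posterior odds of disease for that genotype.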

Genetic profiles can be simulated given the overall frequency of the risk allele for each gene in a panel. Assuming a known, correct risk model, (1) disease states can be simulated and (2) the performance of the panel of genes as a combined marker for risk prediction can be assessed.
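
As a rough sketch of this idea (written in base R, independent of the package functions), the code below simulates a panel of independent markers and then draws disease states from the resulting risks. The variable names and the intercept-calibration step are assumptions made for illustration; the package may combine markers differently.

```r
set.seed(1)
n   <- 10000   # simulated individuals
nog <- 50      # number of genes in the panel
or  <- 1.25    # per-allele odds ratio (same for every gene here)
f   <- 0.3     # risk-allele frequency
p   <- 0.1     # overall disease prevalence

# n x nog matrix of risk-allele counts; markers are independent and each
# genotype is Binomial(2, f), i.e. in Hardy-Weinberg equilibrium
geno <- matrix(rbinom(n * nog, size = 2, prob = f), nrow = n)

# Log-odds score: every risk allele multiplies the odds by `or`
score <- rowSums(geno) * log(or)

# Calibrate the intercept so the mean simulated risk matches prevalence p
a <- uniroot(function(a) mean(plogis(a + score)) - p, c(-50, 50))$root
risk <- plogis(a + score)

# (1) Disease state drawn from each individual's own risk
disease <- rbinom(n, size = 1, prob = risk)
```

Because the disease states are drawn from the same risks that are later used for prediction, the risk model is correct by construction, which is the setting described above.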

Available measures of risk marker performance include:
[1] % identified as high risk in the full population
[2] % identified as high risk in the controls
[3] % identified as high risk in the cases
[4] AUC of the Receiver Operating Characteristic curve
[5] NRI (net reclassification improvement) as defined by Pencina et al. (2008)
[6] reclassification % as defined by Cook (2007)
[7] expected benefit as defined by Vickers et al. (2006)
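
Measures [1] to [4] can be illustrated on toy data with a few lines of base R. This is only a sketch: the risks below are simulated arbitrarily, and treating "high risk" as an absolute risk of at least 0.2 is an assumption about the threshold, not a statement of how the package defines it.

```r
set.seed(42)
n <- 5000
risk    <- plogis(rnorm(n, mean = -2.2, sd = 0.8))  # toy predicted risks
disease <- rbinom(n, size = 1, prob = risk)          # toy disease states

# Assumed rule: "high risk" means predicted risk of at least 0.2
high <- risk >= 0.2

pct_high_all      <- 100 * mean(high)                 # [1] full population
pct_high_controls <- 100 * mean(high[disease == 0])   # [2] controls
pct_high_cases    <- 100 * mean(high[disease == 1])   # [3] cases

# [4] AUC via the Mann-Whitney rank-sum identity: the probability that a
# randomly chosen case has a higher predicted risk than a random control
n1 <- sum(disease == 1)
n0 <- sum(disease == 0)
auc <- (sum(rank(risk)[disease == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```

The percentage in the full population is the weighted average of the percentages in cases and controls, which gives a quick consistency check on the three values.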

Author(s)

Daryl Morris (Wen Jessie Gu)

Maintainer: Daryl Morris <darylm@u.washington.edu>

References

Cook. Circulation. 2007 Vol. 115 No. 7: 928-35.
Janssens et al. Genetics in Medicine. July 2006 Vol. 8 No. 7: 395-400.
Pencina et al. Statistics in Medicine. 2008 Vol. 27 No. 2: 157-72.
Pepe et al. Submitted to Cancer Epidemiology Biomarkers & Prevention.
Vickers et al. Medical Decision Making. 2006 Vol. 25 No. 6: 565-74.

Examples

######################################################
# to generate a single row of Table 2.
simMultRiskPerf(or=1.05,f=.05,p=0.1,nog=50,seed=834234,highRisk=.2)

###################################################
# The code below generates Table 2 from the paper
#   (the code takes 5-10 minutes to execute, depending on memory)
#simMultRiskPerf(or=c(1.05,1.1,1.25,1.5),f=c(.05,.1,.3),p=.1,
#                nog=c(50,150,250,350), seed=834234)

###################################################
# The code below generates Table 3 from the paper
#  (an example with genes with varying f and or)
f <- c(.05 + .005 * (1:50), .3 + .0005 * (1:350))
maxOR <- 1.5
nog <- c(20, 50, 150, 250, 350)
or <- c(seq(maxOR, 1.15, length.out = 20), 1.15 - .1 / 380 * (1:380))
simulateRiskPerf(or = or, f = f, p = 0.1, nog = nog,
                 seed = 834234, varyEffects = TRUE)
                 
######################################################
# the datasets for models A and B can be generated and summarized using:
set.seed(123456)
s1 <- simulateGenotypes(or=1.05,f=.1,p=.1,nog=290)
s2 <- simulateGenotypes(or=1.1,f=.1,p=.1,nog=360)
summarizeRiskSet(s1,highRisk=.2)               
summarizeRiskSet(s2,highRisk=.2) 


[Package mgrp version 1.0 Index]