simulateGenotypes {mgrp}    R Documentation

Simulate risk prediction genotypes and disease states and calculate performance of the risk markers using various measures.

Description

These functions (1) simulate a set of genotypes according to the specified odds ratio (OR), risk-allele frequency, and disease prevalence, (2) simulate disease states according to the calculated risk, and (3) report the performance of the risk markers.

Usage

simulateGenotypes(or, f, p, n = 1e+05, nog = 400, varyEffects = FALSE,
                  seed = NULL, silent = FALSE)

summarizeRiskSet(geneSimulation, highRisk = .2, roundDigits = 3)

Arguments

or The disease odds ratio for the heterozygote relative to the homozygote for the non-risk allele (may be a vector if varyEffects is TRUE)
f The frequency of the risk allele in the population (may be a vector if varyEffects is TRUE)
p The prevalence of the disease in the population
n Total sample size
nog Number of genes in the model (may be a vector)
varyEffects TRUE or FALSE. Specifies whether or and f are allowed to vary over the different genes making up a sample's full genotype. If TRUE, the function checks whether f and or are vectors, and expands (or subsets) them so that an or and an f are specified for each gene. For example, or might be a vector and f a single number; the function then uses the same f for each value of or, extending the last value of or as necessary to reach the maximum model size specified in nog.
seed (default NULL) A random seed which is set prior to the random number generation. Setting the seed consistently will produce consistent results.
silent (default FALSE) The function attempts to process as many samples at a time as it can, but may run into memory allocation issues. It detects such issues and breaks the job into smaller chunks if necessary. silent controls whether these memory allocation issues are reported to the user.
geneSimulation The structure returned by simulateGenotypes
highRisk The risk cutpoint that defines high risk; it is used to report the percentage at high risk, the true positive rate (TPR), and the false positive rate (FPR)
roundDigits The number of digits to which to round the results table.
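
The expansion described for varyEffects can be pictured with a small sketch. The helper expandToGenes below is hypothetical and is not part of mgrp; it only illustrates the documented behavior (subset a long vector, or extend the last value of a short one, up to the largest model size in nog), not the package's actual code.

```r
# Hypothetical helper (not part of mgrp): return a per-gene vector of length
# nGenes by truncating x or by repeating its last value.
expandToGenes <- function(x, nGenes) {
  if (length(x) >= nGenes) {
    x[seq_len(nGenes)]
  } else {
    c(x, rep(x[length(x)], nGenes - length(x)))
  }
}

nog <- c(20, 50, 150)
or  <- c(1.5, 1.3, 1.2)   # fewer values than genes: the last value is extended
f   <- 0.05               # single value: reused for every gene

orPerGene <- expandToGenes(or, max(nog))
fPerGene  <- expandToGenes(f,  max(nog))
```

Here orPerGene and fPerGene both have max(nog) entries, so each gene in the largest model has its own or and f.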

Details

Given an odds ratio, a frequency of the risk allele (in the population) and an overall disease prevalence, a disease likelihood ratio can be calculated (assuming Hardy-Weinberg equilibrium, and that each instance of the risk allele confers the same multiplicative risk).
Genetic profiles can be simulated given the overall frequency of the risk allele for a panel of genes. Assuming a known correct risk model, disease states can be simulated.
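
The calculation outlined above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: genotypeLR is a hypothetical helper (it is not the package's calcLR), genes are assumed independent, and per-gene likelihood ratios are combined multiplicatively on the odds scale.

```r
# Hypothetical helper (not the package's calcLR): likelihood ratios for the
# genotypes with 0, 1 and 2 risk alleles, assuming Hardy-Weinberg equilibrium
# and the same multiplicative odds ratio for each copy of the risk allele.
genotypeLR <- function(or, f, p) {
  freq <- c((1 - f)^2, 2 * f * (1 - f), f^2)
  riskFor <- function(a) plogis(a + (0:2) * log(or))
  # Choose the baseline log-odds so the population-average risk equals p.
  a <- uniroot(function(a) sum(freq * riskFor(a)) - p,
               interval = c(-30, 30), tol = 1e-10)$root
  risk <- riskFor(a)
  # LR = P(genotype | diseased) / P(genotype | not diseased)
  (risk / p) / ((1 - risk) / (1 - p))
}

set.seed(1)
n <- 1000; nGenes <- 50
or <- 1.05; f <- 0.05; p <- 0.1
lr <- genotypeLR(or, f, p)

# n x nGenes matrix of risk-allele counts (0, 1 or 2) drawn under HWE.
geno <- matrix(rbinom(n * nGenes, size = 2, prob = f), nrow = n)

# Posterior odds = prior odds times the product of the per-gene LRs
# (assumes the genes are independent).
logLR    <- matrix(log(lr)[geno + 1], nrow = n)
postOdds <- p / (1 - p) * exp(rowSums(logLR))
risk     <- postOdds / (1 + postOdds)

# Simulate disease states from the calculated risks.
disease <- rbinom(n, size = 1, prob = risk)
```

Each subject gets one risk and one simulated disease state for a single model size; repeating the last steps for each entry of nog would give one column per model size.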

Value

simulateGenotypes returns a list(parms,risk,disease):

parms A list of the parameters passed to the function, together with the calculated likelihood ratios and risks for each genotype (calcLR, which includes the input parameters as well as the likelihood ratios for each genotype).
risk A matrix of calculated risks, one for each subject for each size of model
disease A matrix of simulated disease states, one for each subject for each size of model
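
As an illustration of how a highRisk cutpoint relates to the reported measures, the sketch below computes the percentage at high risk, TPR, and FPR from one column of risks and the matching column of disease states. highRiskSummary is a hypothetical helper, not the package's summarizeRiskSet, and treating risk >= highRisk (rather than a strict inequality) as high risk is an assumption.

```r
# Hypothetical helper (not the package's summarizeRiskSet): summarize one
# model size given a vector of risks and a vector of 0/1 disease states.
highRiskSummary <- function(risk, disease, highRisk = 0.2) {
  flagged <- risk >= highRisk   # assumption: the cutpoint itself counts as high risk
  c(pctHighRisk = mean(flagged),
    TPR = sum(flagged & disease == 1) / sum(disease == 1),
    FPR = sum(flagged & disease == 0) / sum(disease == 0))
}

# Toy data, made up for illustration only.
risk    <- c(0.05, 0.10, 0.25, 0.30, 0.15, 0.40)
disease <- c(0,    0,    1,    0,    1,    1)
highRiskSummary(risk, disease, highRisk = 0.2)
```

With the toy data, three of six subjects are flagged, two of the three cases are among them, and one of the three non-cases is flagged.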

Author(s)

Daryl Morris

Examples

geneSimulation <- simulateGenotypes(or=1.05,f=0.05,p=0.1,
                   nog=c(50,150,250,350), seed=834234)
summarizeRiskSet(geneSimulation,highRisk=.2)


######
# This example uses a genotype profile with varying effects (f and or differ across genes).

f      = c(.05+.005*(1:50),.3+.0005*(1:350))
maxOR  = 1.5
or     = c(seq(maxOR,1.15,length.out=20),1.15-.1/380*(1:380))
p      = .1
nog    = c(20,50,150) 
    
sg <- simulateGenotypes(or=or,f=f,p=p,n=100000,nog=nog,
                          varyEffects=TRUE,seed=834234)
summarizeRiskSet(sg,highRisk=.2,roundDigits=3)


[Package mgrp version 1.0 Index]