Here is a catalogue of software available for biomarker evaluation. We welcome contributions in the form of documents describing what a program does and how to access it.
Stata 8 provides a limited set of ROC commands within the main package. Documentation and examples from the reference manuals are here. Briefly, the following tasks can be done:
After fitting a logistic regression model in Stata, an ROC curve for the fitted model can be plotted using the lroc postestimation command. Related postestimation commands for logistic regression are:
To access this set of programs, from within Stata, link to the DABS center:
net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/
Install the package:
net install pcvsuite
This currently installs four commands: roccurve, comproc, rocreg and inroc. Each command is documented in the help facility; e.g., help roccurve opens a window that explains the syntax and options for the roccurve command. Copies of the help files can be obtained here: roccurve, comproc, rocreg and incroc.
Briefly, the roccurve command plots an estimate of the ROC curve for one or more diagnostic tests (or biomarkers). Confidence intervals can be displayed for the TPF (true positive fraction) corresponding to a specified FPF (false positive fraction). Confidence intervals are calculated using the bootstrap. The comproc command calculates summary ROC indices for two tests along with confidence intervals for each and for the difference. A p-value for testing equality of the ROCs based on the summary indices is output. The rocreg command fits an ROC-GLM regression model. Covariate adjustment is accommodated in all three commands.
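The idea behind the TPF confidence intervals described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pcvsuite implementation: the function names tpf_at_fpf and bootstrap_ci are hypothetical, larger marker values are assumed to indicate disease, cases and controls are resampled separately (case-control sampling), a simple percentile interval is used, and the data are simulated.

```python
import random

def tpf_at_fpf(cases, controls, fpf):
    """Empirical TPF at the threshold whose empirical FPF does not exceed `fpf`.

    Assumes larger marker values indicate disease (hypothetical helper).
    """
    ordered = sorted(controls, reverse=True)
    k = int(fpf * len(ordered))  # number of controls allowed above the threshold
    threshold = ordered[k] if k < len(ordered) else float("-inf")
    # A subject is called positive when the marker is strictly above the threshold.
    return sum(y > threshold for y in cases) / len(cases)

def bootstrap_ci(cases, controls, fpf, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for TPF at a fixed FPF.

    Cases and controls are resampled separately, which assumes a
    case-control design.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        stats.append(tpf_at_fpf(boot_cases, boot_controls, fpf))
    stats.sort()
    alpha = 1 - level
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Simulated example data (not from any real study).
rng = random.Random(1)
controls = [rng.gauss(0.0, 1.0) for _ in range(200)]
cases = [rng.gauss(1.5, 1.0) for _ in range(200)]

estimate = tpf_at_fpf(cases, controls, fpf=0.2)
lower, upper = bootstrap_ci(cases, controls, fpf=0.2, n_boot=500)
print("TPF at FPF=0.2:", estimate, "95% bootstrap CI:", (lower, upper))
```

Repeating the threshold calculation inside each bootstrap sample lets the interval reflect uncertainty in the control quantile as well as in the case distribution.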
To update the pcvsuite commands at a later time, in Stata type:
adoupdate, update
To uninstall the pcvsuite type:
ado uninstall pcvsuite
Additionally, two articles have been published in the Stata Journal describing these commands:
Pepe, M.S., Longton G, Janes, H. 2009. Estimation and comparison of receiver operating characteristic curves. Stata Journal 9(1),1-16.
Janes, H., Longton G, Pepe, M.S. 2009. Accommodating covariates in receiver operating characteristic analysis. Stata Journal 9(1),17-39.
To access this set of programs, from within Stata, link to the DABS center:
net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/
You will be asked which package you wish to install. Click on screensize. This will install three Stata commands: rocsize, aucsize and srcsize. These commands perform simulation studies to assess the statistical power of a phase 2 study, i.e., a study to evaluate the performance of a single biomarker. See section 8.2 of Pepe (2003) for formulas to calculate sample sizes for a phase 2 study. The formulas are based on asymptotic theory and will not provide exactly the right power in practice. After doing the calculations, one should run simulation studies and adjust sample sizes until the power is at the desired level. See Pepe (2003, page 220) for discussion. Running these simulations is fun. Try it!
Excerpt from Pepe, MS (2003) The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.
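To make the "simulate, then adjust the sample size" advice concrete, here is a minimal Python sketch of a power simulation for a study of a single marker. It is not what rocsize, aucsize or srcsize do internally. The assumptions are ours: marker values are binormal with unit variances, the summary index is the AUC, its standard error comes from the Hanley-McNeil approximation, and the study is declared positive when a one-sided lower confidence limit exceeds a null AUC. All function names are hypothetical.

```python
import math
import random
import statistics

def empirical_auc(cases, controls):
    """Mann-Whitney estimate of the AUC; ties count one half."""
    wins = 0.0
    for y in cases:
        for x in controls:
            wins += 1.0 if y > x else 0.5 if y == x else 0.0
    return wins / (len(cases) * len(controls))

def hanley_mcneil_se(auc, n_cases, n_controls):
    """Hanley-McNeil approximation to the standard error of the AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_cases - 1) * (q1 - auc ** 2)
           + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)
    return math.sqrt(max(var, 0.0))

def simulated_power(n_cases, n_controls, true_auc, null_auc,
                    alpha=0.05, n_sim=500, seed=0):
    """Fraction of simulated studies whose lower limit exceeds null_auc."""
    rng = random.Random(seed)
    normal = statistics.NormalDist()
    # Binormal model with unit variances: AUC = Phi(delta / sqrt(2)).
    delta = math.sqrt(2) * normal.inv_cdf(true_auc)
    z = normal.inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sim):
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        cases = [rng.gauss(delta, 1.0) for _ in range(n_cases)]
        auc = empirical_auc(cases, controls)
        se = hanley_mcneil_se(auc, n_cases, n_controls)
        if auc - z * se > null_auc:
            rejections += 1
    return rejections / n_sim

# Illustrative design values only.
power = simulated_power(n_cases=50, n_controls=50,
                        true_auc=0.75, null_auc=0.60, n_sim=200)
print("Simulated power:", power)
```

If the simulated power falls short of the target, increase n_cases and n_controls and rerun until it reaches the desired level.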
To access this set of programs, from within Stata, link to the DABS center:
net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/
You will be prompted with a list of packages you could net describe. Click on risk_prediction. This will currently install two commands, incrisk and predcurve. A program description is available by using the help facility, e.g., help predcurve opens a window that explains the syntax and options for the predcurve command. PDFs of the help files can be obtained here for predcurve and incrisk.
To update the risk_prediction package at a later time, in Stata type:
adoupdate risk_prediction, update
S-plus version 7 does not appear to have ROC analysis commands as part of its standard package. We searched the CRAN archive for contributed R programs to perform ROC analysis and found none to recommend as user friendly and comprehensive. Software for time-dependent ROC analysis (not yet on CRAN as of 12/16/2006) can be accessed through the website of Dr. Patrick Heagerty.
When the outcome variable of interest is an event that occurs some time after the test is measured, ROC curves must be time dependent. Two approaches have been proposed by Patrick Heagerty and colleagues. Software and documentation can be obtained here. Alternative approaches have been described in the literature.
Heagerty PJ and Zheng Y (2005). Survival model predictive accuracy and ROC curves. Biometrics 61:92-105.
Heagerty PJ, Lumley T and Pepe MS (2000). Time dependent ROC curves for censored survival data and diagnostic markers. Biometrics 56:337-344.
The pcvsuite package for R can be downloaded here:
The package contains four main functions: roccurve, comproc, rocreg and predcurve. Briefly, the roccurve command plots an estimate of the ROC curve for one or more diagnostic tests (or biomarkers). Confidence intervals can be displayed for the TPF (true positive fraction) corresponding to a specified FPF (false positive fraction). Confidence intervals are calculated using the bootstrap. The comproc command calculates summary ROC indices for two tests along with confidence intervals for each and for the difference. A p-value for testing equality of the ROCs based on the summary indices is output. The rocreg command fits an ROC-GLM regression model. Covariate adjustment is accommodated in all three commands.
The predcurve command uses the risk distribution associated with a marker or model to evaluate marker utility when applied to the population. The classification performance is optionally included in an integrated display of predictiveness and classification measures. Alternate graphical outputs include CDFs and densities of the estimated risk. Support for nested models and for testing differences between two models is provided. Documentation on all three commands is also contained here:
Additionally, these articles (Stata J 2009;1:1-16, Stata J 2009;1:17-39) published in the Stata Journal explain both commands and their options in more detail. Note that the syntax in the articles is Stata-specific; however, the methods and rationale used to implement the functions in R are the same, and the arguments are the same in R and Stata.
Predictiveness curves are described in this article: AmJEpid 2008;3:362-368.
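As a rough illustration of what a predictiveness curve displays, the Python sketch below orders estimated risks and pairs each with its population percentile, so that the curve shows the risk R(v) at the v-th quantile. This is not the predcurve code. The function names and the logistic risk model with its coefficients are hypothetical, chosen only to generate example risks.

```python
import math
import random

def predictiveness_curve(risks):
    """Return (v, R(v)) pairs: risk quantile R(v) at population fraction v."""
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]

def fraction_above(risks, threshold):
    """Proportion of the population whose estimated risk exceeds a threshold."""
    return sum(r > threshold for r in risks) / len(risks)

# Hypothetical example: risks from an assumed logistic model in one marker.
rng = random.Random(0)
marker = [rng.gauss(0.0, 1.0) for _ in range(1000)]
risks = [1 / (1 + math.exp(-(-2.0 + 1.0 * y))) for y in marker]

curve = predictiveness_curve(risks)
print("Risk at the 90th percentile:", curve[int(0.9 * len(curve)) - 1][1])
print("Fraction with risk above 0.2:", fraction_above(risks, 0.2))
```

A steep curve means the marker separates the population into clearly low-risk and high-risk groups; a flat curve near the prevalence means it adds little.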
This package supports simulations of large groups of independent markers each conferring (multiplicative) risk of disease, where the risk conferred by each genotype is assumed to be known. For each simulation, various summary measures of the performance of the risk prediction algorithm are calculated.
Given an odds ratio, a frequency of the risk allele (in the population) and an overall disease prevalence, a disease likelihood ratio can be calculated (assuming Hardy-Weinberg equilibrium, and that each instance of the risk allele confers the same multiplicative risk).
Genetic profiles can be simulated given the overall frequency of the risk allele for a panel of genes. Assuming a known correct risk model (1) disease states can be simulated and (2) the performance of the panel of genes as a combined marker for risk prediction can be assessed.
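The two calculations just described can be sketched in Python as follows. This is our own illustration and not the package's code. It assumes a biallelic locus in Hardy-Weinberg equilibrium, that each risk allele multiplies the disease odds by the odds ratio, and that risks from several independent markers are combined by multiplying their likelihood ratios into the prior odds. All function names are hypothetical.

```python
import math
import random

def genotype_freqs(p):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the risk allele."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def genotype_likelihood_ratios(odds_ratio, p, prevalence):
    """P(genotype | diseased) / P(genotype | not diseased) for g = 0, 1, 2."""
    freqs = genotype_freqs(p)

    def implied_prevalence(log_odds0):
        total = 0.0
        for g, f in enumerate(freqs):
            odds = math.exp(log_odds0) * odds_ratio ** g
            total += f * odds / (1 + odds)
        return total

    # Bisection for the baseline log odds that reproduce the prevalence.
    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if implied_prevalence(mid) < prevalence:
            lo = mid
        else:
            hi = mid
    base_odds = math.exp((lo + hi) / 2)
    prior_odds = prevalence / (1 - prevalence)
    # The likelihood ratio equals genotype-specific odds over prior odds.
    return [base_odds * odds_ratio ** g / prior_odds for g in range(3)]

def simulate_panel(markers, prevalence, n_people, seed=0):
    """Simulate (risk, diseased) pairs for a panel of independent markers.

    `markers` is a list of (odds_ratio, risk_allele_frequency) tuples.
    Combining markers by multiplying likelihood ratios is an assumption.
    """
    rng = random.Random(seed)
    lrs = [genotype_likelihood_ratios(o, p, prevalence) for o, p in markers]
    prior_odds = prevalence / (1 - prevalence)
    people = []
    for _ in range(n_people):
        odds = prior_odds
        for (_, p), lr in zip(markers, lrs):
            g = (rng.random() < p) + (rng.random() < p)  # two alleles under HWE
            odds *= lr[g]
        risk = odds / (1 + odds)
        people.append((risk, rng.random() < risk))
    return people

# Illustrative panel: ten markers with odds ratio 1.5 and allele frequency 0.3.
people = simulate_panel([(1.5, 0.3)] * 10, prevalence=0.1, n_people=2000)
observed = sum(d for _, d in people) / len(people)
print("Observed disease fraction in the simulation:", observed)
```

Summary measures such as the AUC or the proportion of people above a risk threshold can then be computed from the simulated (risk, diseased) pairs.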
The package can be downloaded here.
Documentation on how to use the package is here.
Additionally, Pepe, Gu and Morris (2010) report results obtained with the package. The appendix of that article also describes the software.
Pepe MS, Gu W, Morris DE (2010). The Potential of Genes and Other Markers to Inform about Risk. Cancer Epidemiology, Biomarkers and Prevention 19(3):655-665.
SPSS provides basic ROC commands within the main package. By following Graphs -> ROC Curves, you can plot multiple ROC curves and get the AUC (with optional confidence intervals) for each curve. You need to specify a test variable and a state variable. For example, you can save predicted probabilities from a logistic regression command and use those as your test variable. AUCs can be estimated nonparametrically or under the assumption of a binegative exponential distribution. You also have the option of getting each of the coordinates of the empirical ROC.
SAS 9.1 provides basic ROC commands within the main package through the Proc Logistic command. Documentation and examples from the reference manuals are here. Briefly, the following tasks can be done:
When fitting a model with Proc Logistic, specify an output dataset with the outroc= option on the model statement. Then use Proc Gplot to plot the ROC curve.
Dr. Charles Metz of the radiology department at the University of Chicago pioneered the development of software for ROC analysis, particularly for radiology reading studies. Binormal ROC curves are emphasized. Analysis methods for multi-reader multi-case studies are also available here.
The Cleveland Clinic has Fortran programs to:
MedCalc is a commercial software package designed to analyze several different types of biomedical data. It includes an ROC component. MedCalc provides an online user's manual with a chapter on their ROC features.
MedCalc provides the following capabilities for analyzing ROC curves.
Most are found under the Statistics->ROC Curves menu: