Software

Here is a catalogue of software available for biomarker evaluation. Please note that Gary Longton and Dr. Pepe have both retired and some code and packages may no longer be supported.

Stata Programs

Stata Corp

Stata 8 provides a limited set of ROC commands within the main package. Documentation and examples from the reference manuals are here. Briefly, the following tasks can be done:

plot one or more empirical ROC curves
estimate and compare nonparametric AUCs
plot one or more binormal ROC curves for ordinal test data
estimate and compare binormal AUCs for ordinal test data
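As a rough illustration of the first two tasks (and not Stata's own code), the Python sketch below computes the empirical ROC curve and the nonparametric AUC. It assumes that larger marker values indicate disease and that a value at or above the threshold counts as test-positive; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def empirical_roc(cases: Sequence[float], controls: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPF, TPF) points of the empirical ROC curve, from (0, 0) to (1, 1).

    Assumption: larger values indicate disease; positive means value >= threshold.
    """
    points = [(0.0, 0.0)]
    for c in sorted(set(cases) | set(controls), reverse=True):
        tpf = sum(y >= c for y in cases) / len(cases)
        fpf = sum(x >= c for x in controls) / len(controls)
        points.append((fpf, tpf))
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def mann_whitney_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Nonparametric AUC: P(case > control) + 0.5 * P(case == control)."""
    score = 0.0
    for y in cases:
        for x in controls:
            score += 1.0 if y > x else (0.5 if y == x else 0.0)
    return score / (len(cases) * len(controls))


# Hypothetical toy data.
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.5, 0.3, 0.2]
print(mann_whitney_auc(cases, controls))               # 0.8125
print(trapezoid_auc(empirical_roc(cases, controls)))   # 0.8125
```

For the toy data both calculations give 0.8125: the trapezoidal area under the empirical ROC curve equals the Mann-Whitney statistic, with ties between a case and a control counted as one half.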

After fitting a logistic regression model in Stata, the ROC curve for the model's predicted probabilities can be plotted using the lroc postestimation command. Related postestimation commands for logistic regression are:

estat classification reports various summary statistics, including the classification table.
estat gof reports the Pearson goodness-of-fit test or the Hosmer-Lemeshow goodness-of-fit test.
lroc graphs the ROC curve and calculates the area under the curve.
lsens graphs sensitivity and specificity versus probability cutoff and optionally creates new variables containing these data.
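The classification table and the sensitivity/specificity-versus-cutoff curve can be sketched in a few lines of Python. This is not Stata's implementation: the helper names are hypothetical, and we assume, as we understand estat classification and lsens do, that an observation is classified positive when its predicted probability is at or above the cutoff (0.5 by default here).

```python
from typing import Dict, List, Sequence, Tuple


def classification_table(probs: Sequence[float], outcomes: Sequence[int],
                         cutoff: float = 0.5) -> Dict[str, float]:
    """Cross-tabulate observed outcome (1 = disease) against predicted Pr(D) >= cutoff."""
    tp = fn = fp = tn = 0
    for p, d in zip(probs, outcomes):
        positive = p >= cutoff  # assumed classification rule
        if d == 1 and positive:
            tp += 1
        elif d == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "correctly_classified": (tp + tn) / (tp + fn + fp + tn),
    }


def sens_spec_by_cutoff(probs: Sequence[float], outcomes: Sequence[int],
                        cutoffs: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(cutoff, sensitivity, specificity) rows: the quantities an lsens-style plot shows."""
    rows = []
    for c in cutoffs:
        t = classification_table(probs, outcomes, c)
        rows.append((c, t["sensitivity"], t["specificity"]))
    return rows


probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]   # hypothetical fitted probabilities
outcomes = [1, 1, 1, 0, 0, 0]
print(classification_table(probs, outcomes))
```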

Basic ROC Analysis

Gary Longton and Margaret Pepe

To access this set of programs, from within Stata, link to the DABS center:

net from https://research.fhcrc.org/content/dam/research/diagnostic-biomarkers-statistical-center/files/stata/

Install the package:

net install pcvsuite

This will currently install four commands: roccurve, comproc, rocreg and incroc. A description of how to use each one is available through the help facility; e.g., help roccurve opens a window that explains the syntax and options for the roccurve command. A copy of the help files can be obtained here: roccurve, comproc, rocreg and incroc.

Briefly, the roccurve command plots an estimate of the ROC curve for one or more diagnostic tests (or biomarkers). Confidence intervals can be displayed for the TPF (true positive fraction) corresponding to a specified FPF (false positive fraction). Confidence intervals are calculated using the bootstrap. The comproc command calculates summary ROC indices for two tests along with confidence intervals for each and for the difference. A p-value for testing equality of the ROCs based on the summary indices is output. The rocreg command fits an ROC-GLM regression model. Covariate adjustment is accommodated in all three commands.
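To illustrate what a bootstrap interval for the TPF at a fixed FPF involves, here is a minimal Python sketch; it is not the roccurve code, and the threshold rule, the percentile interval and the function names are our assumptions. Cases and controls are resampled separately, which matches the usual case-control sampling design.

```python
import random
from typing import Sequence, Tuple


def tpf_at_fpf(cases: Sequence[float], controls: Sequence[float], t: float) -> float:
    """Empirical ROC(t): fraction of cases above a threshold that at most a
    fraction t of controls exceed (assumes larger values indicate disease)."""
    if not 0 < t < 1:
        raise ValueError("t must lie strictly between 0 and 1")
    ordered = sorted(controls, reverse=True)
    k = min(int(t * len(ordered) + 1e-9), len(ordered) - 1)
    threshold = ordered[k]  # at most k controls lie strictly above this value
    return sum(y > threshold for y in cases) / len(cases)


def bootstrap_ci(cases: Sequence[float], controls: Sequence[float], t: float,
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap CI for ROC(t), resampling cases and controls separately."""
    rng = random.Random(seed)
    stats = sorted(
        tpf_at_fpf([rng.choice(cases) for _ in cases],
                   [rng.choice(controls) for _ in controls], t)
        for _ in range(n_boot)
    )
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


cases = [0.9, 0.8, 0.6, 0.4]        # hypothetical marker values
controls = [0.7, 0.5, 0.3, 0.2]
print(tpf_at_fpf(cases, controls, 0.25), bootstrap_ci(cases, controls, 0.25))
```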

To update the pcvsuite commands at a later time, in Stata type:

adoupdate, update

To uninstall the pcvsuite type:

ado uninstall pcvsuite

Additionally, two articles describing these commands have been published in the Stata Journal:
Pepe MS, Longton G, Janes H. 2009. Estimation and comparison of receiver operating characteristic curves. Stata Journal 9(1):1-16.
Janes H, Longton G, Pepe MS. 2009. Accommodating covariates in receiver operating characteristic analysis. Stata Journal 9(1):17-39.

Sample Size Calculations

Gary Longton and Margaret Pepe

To access this set of programs, from within Stata, link to the DABS center:

net from https://research.fhcrc.org/content/dam/research/diagnostic-biomarkers-statistical-center/files/stata/

You will be asked which package you wish to install. Click on screensize. This will install three Stata commands: rocsize, aucsize and srcsize. These commands run simulation studies to assess the statistical power of a phase 2 study, i.e., a study that evaluates the performance of a single biomarker. See section 8.2 of Pepe (2003) for formulas to calculate sample sizes for a phase 2 study. Because the formulas are based on asymptotic theory, they will not provide exactly the nominal power in practice. After the initial calculations, one should run simulation studies and adjust the sample sizes until the power reaches the desired level. See Pepe (2003, page 220) for discussion. Running these simulations is fun. Try it!

Excerpt from Pepe, MS (2003) The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.
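The simulation approach can be sketched as follows. This hypothetical Python example is not rocsize, aucsize or srcsize: it assumes a binormal marker (controls N(0, 1), cases N(mu, 1)), tests H0: AUC <= auc0 with a one-sided Wald test using the DeLong standard error, and reports the fraction of simulated studies that reject.

```python
import random
from statistics import NormalDist, variance


def auc_with_se(cases, controls):
    """Empirical AUC and its DeLong (placement-value) standard error."""
    def psi(y, x):
        return 1.0 if y > x else (0.5 if y == x else 0.0)
    v_cases = [sum(psi(y, x) for x in controls) / len(controls) for y in cases]
    v_controls = [sum(psi(y, x) for y in cases) / len(cases) for x in controls]
    auc = sum(v_cases) / len(cases)
    var = variance(v_cases) / len(cases) + variance(v_controls) / len(controls)
    return auc, var ** 0.5


def simulated_power(n_cases, n_controls, mu, auc0, alpha=0.05, n_sims=200, seed=2003):
    """Fraction of simulated binormal studies rejecting H0: AUC <= auc0 (one-sided).

    Assumptions: binormal data with unit variances and a Wald test; not the
    rocsize/aucsize algorithm.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sims):
        cases = [rng.gauss(mu, 1.0) for _ in range(n_cases)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        auc, se = auc_with_se(cases, controls)
        if se == 0.0:  # perfect separation in this sample
            rejections += auc > auc0
        elif (auc - auc0) / se > z:
            rejections += 1
    return rejections / n_sims


# mu = 1 gives a true AUC of about 0.76; increase n until the power is adequate.
for n in (20, 40, 80):
    print(n, simulated_power(n, n, mu=1.0, auc0=0.6))
```

The printed power should grow with the sample size; under this sketch one would keep increasing n until it reaches the target (for example, 0.9).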

Evaluating Risk Prediction Markers

Gary Longton and Margaret Pepe

To access this set of programs, from within Stata, link to the DABS center:

net from https://research.fhcrc.org/content/dam/research/diagnostic-biomarkers-statistical-center/files/stata/

You will be prompted with a list of packages that you can net describe. Click on risk_prediction. This will currently install two commands, incrisk and predcurve. A description of each program is available through the help facility; e.g., help predcurve opens a window that explains the syntax and options for the predcurve command. PDFs of the help files can be obtained here for predcurve and incrisk.

To update the risk_prediction package at a later time, in Stata type:

adoupdate risk_prediction, update

Decision Curve Analysis

Andrew Vickers

View Decision Curve Analysis information - Memorial Sloan Kettering Cancer Center

R/S-plus Programs

S-plus version 7 does not appear to include ROC analysis commands in its standard package. We searched the CRAN archive for contributed R programs that perform ROC analysis and found none that we could recommend as both user-friendly and comprehensive. Software for time-dependent ROC analysis (not yet on CRAN as of 12/16/2006) can be accessed through the website of Dr. Patrick Heagerty.

Time-dependent ROC Curves

Patrick Heagerty

When the outcome variable of interest is an event that occurs some time after the test is measured, ROC curves must be time dependent. Two approaches have been proposed by Patrick Heagerty and colleagues. Software and documentation can be obtained here. Alternative approaches have been described in the literature.
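For a flavour of how censoring is handled, the Python sketch below estimates the cumulative sensitivity P(X > c | T <= t) and the dynamic specificity P(X <= c | T > t) from Kaplan-Meier estimates via Bayes' rule, in the spirit of the estimator in Heagerty, Lumley and Pepe (2000). It is an illustrative assumption, not Dr. Heagerty's software; it requires 0 < S(t) < 1, and this simple estimator is not guaranteed to give monotone curves or values in [0, 1] under heavy censoring.

```python
from typing import Sequence, Tuple


def km_survival(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of S(t) = P(T > t); events[i] = 1 if observed, 0 if censored."""
    s = 1.0
    event_times = sorted({u for u, e in zip(times, events) if e == 1 and u <= t})
    for u in event_times:
        at_risk = sum(v >= u for v in times)
        deaths = sum(v == u and e == 1 for v, e in zip(times, events))
        s *= 1.0 - deaths / at_risk
    return s


def cumulative_dynamic_roc_point(marker, times, events, c, t) -> Tuple[float, float]:
    """(sensitivity, specificity) at marker threshold c and time t.

    sensitivity = [1 - S(t | X > c)] P(X > c) / [1 - S(t)]
    specificity = S(t | X <= c) P(X <= c) / S(t)
    """
    high = [i for i, x in enumerate(marker) if x > c]
    low = [i for i, x in enumerate(marker) if x <= c]
    p_high = len(high) / len(marker)
    s_all = km_survival(times, events, t)
    s_high = km_survival([times[i] for i in high], [events[i] for i in high], t)
    s_low = km_survival([times[i] for i in low], [events[i] for i in low], t)
    sensitivity = (1.0 - s_high) * p_high / (1.0 - s_all)
    specificity = s_low * (1.0 - p_high) / s_all
    return sensitivity, specificity


# Hypothetical uncensored data: higher marker values fail earlier.
marker = [5, 4, 3, 2, 1, 0]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
print(cumulative_dynamic_roc_point(marker, times, events, c=3.5, t=3.5))
```

Without censoring these estimates reduce to the ordinary empirical proportions among subjects who have and have not failed by time t.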

References

Heagerty PJ, Zheng Y (2005). Survival model predictive accuracy and ROC curves. Biometrics 61:92-105.

Heagerty PJ, Lumley T, Pepe MS (2000). Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56:337-344.

Basic ROC Analysis and Evaluation of Risk Prediction Markers

Aasthaa Bansal, Daryl Morris and Margaret Pepe

The pcvsuite package for R can be downloaded here:

The package contains four main functions: roccurve, comproc, rocreg and predcurve. Briefly, the roccurve function plots an estimate of the ROC curve for one or more diagnostic tests (or biomarkers). Confidence intervals can be displayed for the TPF (true positive fraction) corresponding to a specified FPF (false positive fraction). Confidence intervals are calculated using the bootstrap. The comproc function calculates summary ROC indices for two tests along with confidence intervals for each and for the difference. A p-value for testing equality of the ROCs based on the summary indices is output. The rocreg function fits an ROC-GLM regression model. Covariate adjustment is accommodated in all three of these functions.
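A paired comparison in the spirit of comproc can be sketched as follows, assuming both tests were measured on the same subjects (index i refers to the same person for test A and test B) and using the AUC as the summary index. The percentile interval and the Wald p-value built from the bootstrap standard error are our assumptions, not necessarily what comproc computes.

```python
import random
from statistics import NormalDist, stdev


def auc(cases, controls):
    """Nonparametric (Mann-Whitney) AUC; ties count one half."""
    total = sum(1.0 if y > x else 0.5 if y == x else 0.0 for y in cases for x in controls)
    return total / (len(cases) * len(controls))


def compare_aucs(cases_a, controls_a, cases_b, controls_b, n_boot=500, seed=7):
    """Paired comparison of two tests measured on the same cases and controls.

    Returns (AUC_A - AUC_B, percentile 95% CI, two-sided Wald p-value that uses
    the bootstrap standard error).
    """
    rng = random.Random(seed)
    diff = auc(cases_a, controls_a) - auc(cases_b, controls_b)
    n_d, n_c = len(cases_a), len(controls_a)
    boots = []
    for _ in range(n_boot):
        i = [rng.randrange(n_d) for _ in range(n_d)]  # resample cases
        j = [rng.randrange(n_c) for _ in range(n_c)]  # resample controls
        boots.append(auc([cases_a[k] for k in i], [controls_a[k] for k in j])
                     - auc([cases_b[k] for k in i], [controls_b[k] for k in j]))
    boots.sort()
    ci = (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot) - 1])
    se = stdev(boots)
    if se == 0.0:
        p_value = 1.0 if diff == 0 else 0.0
    else:
        p_value = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    return diff, ci, p_value


# Hypothetical data: test A separates perfectly, test B is uninformative.
print(compare_aucs([2, 3, 4, 5], [0, 0.5, 1, 1.5], [1, 0, 1, 0], [1, 0, 1, 0]))
```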

The predcurve function uses the risk distribution associated with a marker or model to evaluate marker utility when applied to the population. Classification performance can optionally be included in an integrated display of predictiveness and classification measures. Alternative graphical outputs include CDFs and densities of the estimated risk. Support is provided for nested models and for testing differences between two models. Documentation on all of these functions is also contained here:

Additionally, two articles published in the Stata Journal (Stata J 2009;9(1):1-16 and Stata J 2009;9(1):17-39) explain the commands and their options in more detail. Note that the syntax in the articles is Stata-specific; however, the methods and rationale used to implement the functions in R are the same, and the arguments are also the same in R and Stata.

Predictiveness curves are described in this article: Am J Epidemiol 2008;167(3):362-368.
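As we understand the cited article, the predictiveness curve plots the v-th quantile of risk, R(v), against v. The hypothetical Python sketch below computes the empirical curve, plus the proportions of the population, of cases (TPF) and of controls (FPF) whose risk exceeds a chosen threshold, which is the kind of classification summary predcurve can display alongside it. It is not the pcvsuite code.

```python
from typing import List, Sequence, Tuple


def predictiveness_curve(risks: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical predictiveness curve: (v, R(v)) with R(v) the v-th quantile of risk."""
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def above_threshold_summary(risks: Sequence[float], outcomes: Sequence[int],
                            threshold: float) -> Tuple[float, float, float]:
    """Proportion of the population, of cases (TPF) and of controls (FPF)
    whose estimated risk exceeds the threshold."""
    high = [r > threshold for r in risks]
    n_cases = sum(outcomes)
    n_controls = len(outcomes) - n_cases
    pop = sum(high) / len(high)
    tpf = sum(h for h, d in zip(high, outcomes) if d == 1) / n_cases
    fpf = sum(h for h, d in zip(high, outcomes) if d == 0) / n_controls
    return pop, tpf, fpf


risks = [0.1, 0.4, 0.2, 0.8]   # hypothetical estimated risks
outcomes = [0, 1, 0, 1]
print(predictiveness_curve(risks))
print(above_threshold_summary(risks, outcomes, threshold=0.3))
```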

Multiple Gene Risk Prediction Performance

Daryl Morris, Jessie Wen Gu and Margaret Pepe

This package supports simulations of large groups of independent markers, each conferring a (multiplicative) risk of disease, where the risk conferred by each genotype is assumed to be known. For each simulation, various summary measures of the performance of the risk prediction algorithm are calculated.

Given an odds ratio, a frequency of the risk allele (in the population) and an overall disease prevalence, a disease likelihood ratio can be calculated (assuming Hardy-Weinberg equilibrium, and that each instance of the risk allele confers the same multiplicative risk).
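One way to do this calculation is sketched below in Python. The modelling choices are assumptions: Hardy-Weinberg proportions in the whole population, a log-additive logistic model in which each risk allele multiplies the odds of disease by the given odds ratio, and a bisection search for the baseline odds that reproduces the prevalence. The package itself may parameterise things differently.

```python
import math
from typing import List


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def genotype_likelihood_ratios(odds_ratio: float, allele_freq: float,
                               prevalence: float) -> List[float]:
    """LR(g) = P(G=g | D=1) / P(G=g | D=0) for g = 0, 1, 2 copies of the risk allele.

    Assumptions: Hardy-Weinberg genotype frequencies in the population and a
    per-allele odds ratio acting multiplicatively on the odds of disease.
    """
    p = allele_freq
    freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    beta = math.log(odds_ratio)

    def implied_prevalence(alpha: float) -> float:
        return sum(f * expit(alpha + g * beta) for g, f in enumerate(freqs))

    # Bisection for the baseline log-odds alpha that reproduces the prevalence.
    lo, hi = -30.0, 30.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if implied_prevalence(mid) < prevalence:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    prior_odds = prevalence / (1 - prevalence)
    # LR(g) = posterior odds / prior odds = exp(alpha + g * beta) / prior_odds
    return [math.exp(alpha + g * beta) / prior_odds for g in range(3)]


print(genotype_likelihood_ratios(odds_ratio=1.5, allele_freq=0.3, prevalence=0.1))
```

Because the likelihood ratio equals posterior odds divided by prior odds, consecutive genotypes differ by a factor of exactly the odds ratio under these assumptions.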

Genetic profiles can be simulated given the overall frequency of the risk allele for a panel of genes. Assuming a known correct risk model (1) disease states can be simulated and (2) the performance of the panel of genes as a combined marker for risk prediction can be assessed.
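The simulation steps can be sketched in Python as follows. Everything here is hypothetical: the panel (10 independent SNPs with allele frequency 0.3 and per-allele odds ratio 1.3), the intercept, and the use of the AUC of the true risk as the performance summary; the package reports its own set of summary measures.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_panel(n: int, allele_freqs: Sequence[float], log_odds_ratios: Sequence[float],
                   intercept: float, seed: int = 0) -> Tuple[List[float], List[int]]:
    """Simulate risk under a known (assumed) logistic model, then disease given risk.

    Each genotype is the number of risk alleles from two independent draws, so
    genotypes follow Hardy-Weinberg proportions; genes are independent.
    """
    rng = random.Random(seed)
    risks, disease = [], []
    for _ in range(n):
        log_odds = intercept
        for p, beta in zip(allele_freqs, log_odds_ratios):
            alleles = (rng.random() < p) + (rng.random() < p)
            log_odds += alleles * beta
        risk = 1.0 / (1.0 + math.exp(-log_odds))
        risks.append(risk)
        disease.append(1 if rng.random() < risk else 0)
    return risks, disease


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney rank-sum formula, with mid-ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n1 = sum(labels)
    n0 = len(labels) - n1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


# Hypothetical panel: 10 independent SNPs, allele frequency 0.3, per-allele OR 1.3.
risks, disease = simulate_panel(5000, [0.3] * 10, [math.log(1.3)] * 10, intercept=-3.5, seed=42)
print("prevalence:", sum(disease) / len(disease))
print("AUC of the true risk:", rank_auc(risks, disease))
```

Repeating the simulation with different seeds gives the sampling distribution of whichever performance measure is of interest.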

The package can be downloaded here.

Pepe, Gu and Morris (2010) present results obtained with the package, and the appendix of that paper also describes the software.

References

Pepe MS, Gu W, Morris DE (2010). The Potential of Genes and Other Markers to Inform about Risk. Cancer Epidemiology, Biomarkers and Prevention 19(3):655-665.

SPSS

Versions 9.0 and above

SPSS provides basic ROC commands within the main package. By following Graphs -> ROC Curves, you can plot multiple ROC curves and get the AUC (with optional confidence intervals) for each curve. You need to specify a test variable and a state variable. For example, you can save predicted probabilities from a logistic regression command and use those as your test variable. The standard error of the AUC, and hence its confidence interval, can be estimated nonparametrically or under the assumption of a bi-negative exponential distribution. You also have the option of getting each of the coordinates of the empirical ROC.
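For intuition about the confidence interval options, here is a Python sketch of the Hanley and McNeil (1982) standard error, whose approximations Q1 = A/(2 - A) and Q2 = 2A^2/(1 + A) come from a bi-negative exponential model. That this is exactly what SPSS computes under its bi-negative exponential option is our assumption, not something we have verified; the function names are hypothetical.

```python
import math
from typing import Tuple


def hanley_mcneil_se(auc: float, n_cases: int, n_controls: int) -> float:
    """Standard error of the AUC from Hanley and McNeil (1982).

    Q1 and Q2 use the bi-negative exponential approximations.
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_cases - 1) * (q1 - auc ** 2)
           + (n_controls - 1) * (q2 - auc ** 2)) / (n_cases * n_controls)
    return math.sqrt(var)


def wald_ci(auc: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Symmetric normal-approximation interval, truncated to [0, 1]."""
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


se = hanley_mcneil_se(0.80, n_cases=50, n_controls=50)
print(round(se, 4), wald_ci(0.80, se))  # se is about 0.0445
```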

Learn more about the SPSS Inc. software.

SAS

SAS 9.1 provides basic ROC commands within the main package through Proc Logistic. Documentation and examples from the reference manuals are here. Briefly, the following tasks can be done:

plot one empirical ROC curve from a logistic regression model
estimate nonparametric AUCs through a c-statistic

When fitting a model with Proc Logistic, specify an output dataset with the outroc= option on the model statement. Then use Proc Gplot to plot the ROC curve from that dataset.
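The c statistic reported by Proc Logistic is a concordance index over all pairs of observations with different responses. The Python sketch below (a hypothetical helper, not SAS code) counts concordant, discordant and tied pairs and derives Somers' D and c from them, which is how we understand the statistics in the "Association of Predicted Probabilities and Observed Responses" table are defined.

```python
from typing import Dict, Sequence


def association_stats(probs: Sequence[float], outcomes: Sequence[int]) -> Dict[str, float]:
    """Concordance statistics over all (event, non-event) pairs of observations."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = discordant = tied = 0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1
            elif pe < pn:
                discordant += 1
            else:
                tied += 1
    pairs = concordant + discordant + tied
    return {
        "percent_concordant": 100 * concordant / pairs,
        "percent_discordant": 100 * discordant / pairs,
        "percent_tied": 100 * tied / pairs,
        "somers_d": (concordant - discordant) / pairs,
        "c": (concordant + 0.5 * tied) / pairs,
    }


probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.4]   # hypothetical predicted probabilities
outcomes = [1, 1, 1, 0, 0, 0]
print(association_stats(probs, outcomes))
```

The c statistic equals the nonparametric AUC of the predicted probabilities, and Somers' D = 2c - 1.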

User articles and documentation

Estimating the Area under a Receiver Operating Characteristic (ROC) Curve For Repeated Measures Design
Honghu Liu and Tongtong Wu
Statistical Methods in Diagnostic Medicine using SAS® Software
Jay N. Mandrekar and Sumithra J. Mandrekar
Includes comparing ROC curves when two tests are taken on the same measurements, ROC curves from a logistic regression, and power calculations for the AUC.
How to Display Correlated ROC Curves with the SAS System
Barbara Schneider
New capabilities for ROC curve plotting in SAS version 9.2
Dale McLerran
(SAS datasets: compare markers, remission test, remission train)
The Cleveland Clinic has SAS macros to calculate ROC sample sizes (1 reader, or 1-2 ROC curves), calculate ROC sample sizes with multiple readers, and plot ROC curves using SAS/Graph.

MedCalc

MedCalc is a commercial software package designed to analyze several different types of biomedical data. It includes an ROC component. MedCalc provides an online user's manual with a chapter on their ROC features.

MedCalc provides the following capabilities for analyzing ROC curves.

Most are found under the Statistics->ROC Curves menu:

Enter data through manual input into an internal spreadsheet or import data through Excel, SPSS, or several other programs.
Under "ROC analysis", plot a single ROC curve (with optional confidence bounds). The output is a graph, with the optimal cutpoint marked on the graph. Additional output includes AUC and its 95% confidence interval, and a table with the sensitivity and specificity at several cut-points, with their 95% confidence intervals.
Under "Interactive dot-diagram", plot the observed values as points, and interactively move the cut-point around to find the sensitivity and specificity of the test.
Under "Plot versus criterion", plot the sensitivity and specificity versus different cut-points, with optional 95% confidence intervals.
Under "Comparison of ROC curves", plot up to 6 different ROC curves and get the AUC for each curve and the pairwise comparisons between curves.
Under "Predictive values", manually enter sensitivity, specificity, and prevalence to obtain the positive and negative predictive values.
It is possible to alter the format of graphs and export as an image.
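Two of these calculations are easy to reproduce by hand. The Python sketch below assumes the Youden index J = sensitivity + specificity - 1 as the criterion for the "optimal" cutpoint (MedCalc may also allow costs and prevalence to enter its criterion) and uses Bayes' theorem for the predictive values; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def youden_cutpoint(cases: Sequence[float],
                    controls: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (cutpoint, J, sensitivity, specificity) maximising J = Se + Sp - 1.

    Assumption: a value >= cutpoint is called positive; ties in J keep the lowest cutpoint.
    """
    best = None
    for c in sorted(set(cases) | set(controls)):
        sens = sum(y >= c for y in cases) / len(cases)
        spec = sum(x < c for x in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


print(youden_cutpoint([0.9, 0.8, 0.6, 0.4], [0.7, 0.5, 0.3, 0.2]))
print(predictive_values(0.90, 0.80, 0.10))  # PPV about 0.333, NPV about 0.986
```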

Software

Stata Programs

Commercial Package Stata 8

Stata Corp

Basic ROC Analysis

Gary Longton and Margaret Pepe

Sample Size Calculations

Gary Longton and Margaret Pepe

Evaluating Risk Prediction Markers

Gary Longton and Margaret Pepe

Decision Curve Analysis

Andrew Vickers

R/S-plus Programs

Commercial Package S-Plus 7

Time-dependent ROC Curves

Patrick Heagerty

Basic ROC Analysis and Evaluation of Risk Prediction Markers

Aasthaa Bansal, Daryl Morris and Margaret Pepe

Multiple Gene Risk Prediction Performance (Windows Binary Package)

Daryl Morris, Jessie Wen Gu and Margaret Pepe

SPSS

Base SPSS

Versions 9.0 and above

SAS

Base SAS 9.1

User articles and documentation

MedCalc

MedCalc Info and Resources