Software

Here is a catalogue of software available for biomarker evaluation. We welcome contributions in the form of documents describing what a program does and how to access it.

Stata Programs

Commercial Package Stata 8

StataCorp

Stata 8 provides a limited set of ROC commands within the main package. Documentation and examples from the reference manuals are here. Briefly, the following tasks can be done:

  • plot one or more empirical ROC curves
  • estimate and compare nonparametric AUCs
  • plot one or more binormal ROC curves for ordinal test data
  • estimate and compare binormal AUCs for ordinal test data
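
The two nonparametric quantities in this list can be illustrated outside Stata. The Python sketch below is not Stata's implementation. The function names and the convention that a result at or above the threshold counts as test-positive are assumptions. It builds the empirical ROC curve from case and control values, then computes the AUC in two ways that agree: the trapezoidal area under the empirical curve, and the Mann-Whitney probability P(case > control) + 0.5 P(tie).

```python
def empirical_roc(cases, controls):
    """(FPF, TPF) points of the empirical ROC curve.

    Assumed convention: a result >= threshold is called test-positive.
    """
    points = [(0.0, 0.0)]
    for c in sorted(set(cases) | set(controls), reverse=True):
        fpf = sum(y >= c for y in controls) / len(controls)
        tpf = sum(x >= c for x in cases) / len(cases)
        points.append((fpf, tpf))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def mann_whitney_auc(cases, controls):
    """Nonparametric AUC: P(case > control) + 0.5 * P(tie)."""
    score = sum(1.0 if x > y else 0.5 if x == y else 0.0
                for x in cases for y in controls)
    return score / (len(cases) * len(controls))


# Hypothetical test results for four cases and four controls.
cases = [0.8, 0.6, 0.9, 0.4]
controls = [0.3, 0.5, 0.2, 0.6]
print(trapezoid_auc(empirical_roc(cases, controls)))  # 0.84375
print(mann_whitney_auc(cases, controls))              # 0.84375
```

The tied pair (0.6, 0.6) produces a diagonal segment on the empirical curve. That segment is why the trapezoidal area matches the Mann-Whitney value with ties counted as one half.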

After fitting a logistic regression model in Stata, you can plot an ROC curve for the model's predicted probabilities using the lroc postestimation command. Related postestimation commands for logistic regression are:

  • estat classification reports various summary statistics, including the classification table.
  • estat gof reports the Pearson goodness-of-fit test or the Hosmer-Lemeshow goodness-of-fit test.
  • lroc graphs the ROC curve and calculates the area under the curve.
  • lsens graphs sensitivity and specificity versus probability cutoff and optionally creates new variables containing these data.
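
The table behind an lsens-style plot can be sketched as follows. This Python example is hypothetical and is not Stata code. It takes the predicted probabilities as given instead of fitting the logistic model, and it assumes an observation is classified positive when its predicted probability is at or above the cutoff.

```python
def sens_spec_by_cutoff(probs, outcomes, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each cutoff.

    probs: predicted probabilities, e.g. from a fitted logistic model.
    outcomes: 1 for diseased, 0 for non-diseased.
    Assumed rule: positive if predicted probability >= cutoff.
    """
    n_pos = sum(1 for y in outcomes if y == 1)
    n_neg = len(outcomes) - n_pos
    rows = []
    for c in cutoffs:
        tp = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p >= c)
        tn = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p < c)
        rows.append((c, tp / n_pos, tn / n_neg))
    return rows


# Hypothetical predicted probabilities and observed outcomes.
probs = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
outcomes = [0, 0, 1, 1, 1, 0]
for cutoff, sens, spec in sens_spec_by_cutoff(probs, outcomes, [0.3, 0.5]):
    print(f"cutoff={cutoff:.1f}  sensitivity={sens:.3f}  specificity={spec:.3f}")
```

Sweeping the cutoff over all observed probabilities and plotting sensitivity against 1 - specificity traces the empirical ROC curve of the fitted model.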

Basic ROC Analysis

Gary Longton and Margaret Pepe

To access this set of programs, from within Stata, link to the DABS center:

net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/

Install the package:

net install pcvsuite 

This currently installs four commands: roccurve, comproc, rocreg and inroc. The help facility describes how to use each one. For example, help roccurve opens a window that explains the syntax and options for the roccurve command. Copies of the help files can be obtained here: roccurve, comproc, rocreg and incroc.

Briefly, the roccurve command plots an estimate of the ROC curve for one or more diagnostic tests (or biomarkers). It can display bootstrap confidence intervals for the TPF (true positive fraction) corresponding to a specified FPF (false positive fraction). The comproc command calculates summary ROC indices for two tests, with confidence intervals for each index and for their difference. It also reports a p-value, based on the summary indices, for testing equality of the ROCs. The rocreg command fits an ROC-GLM regression model. Covariate adjustment is accommodated in all three commands.
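
A generic version of the bootstrap step can be sketched in Python. This is not the pcvsuite code. The following choices are assumptions made for illustration:

- the empirical estimator of the TPF at FPF = t, which sets the threshold so that at most a fraction t of controls exceed it;
- separate resampling of cases and controls;
- the percentile interval;
- all function names.

```python
import random


def tpf_at_fpf(cases, controls, t):
    """Empirical TPF at FPF = t.

    Assumed convention: the threshold q is the control value that at most a
    fraction t of controls exceed, and a case is positive if it exceeds q.
    """
    desc = sorted(controls, reverse=True)
    k = int(t * len(desc))           # number of controls allowed above q
    if k >= len(desc):
        return 1.0
    q = desc[k]
    return sum(x > q for x in cases) / len(cases)


def bootstrap_ci(cases, controls, t, n_boot=1000, alpha=0.05, seed=1):
    """Point estimate and percentile bootstrap CI for the TPF at FPF = t."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample cases and controls separately, keeping both sample sizes.
        boot_cases = [rng.choice(cases) for _ in cases]
        boot_controls = [rng.choice(controls) for _ in controls]
        stats.append(tpf_at_fpf(boot_cases, boot_controls, t))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return tpf_at_fpf(cases, controls, t), (lower, upper)


gen = random.Random(0)
cases = [gen.gauss(1.0, 1.0) for _ in range(100)]     # hypothetical marker values
controls = [gen.gauss(0.0, 1.0) for _ in range(100)]
est, (lo, hi) = bootstrap_ci(cases, controls, t=0.1)
print(f"TPF at FPF=0.1: {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Resampling within each group mirrors a case-control design, in which the numbers of cases and controls are fixed by the study rather than random.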

To update the pcvsuite commands at a later time, in Stata type:

adoupdate, update

To uninstall the pcvsuite type:

ado uninstall pcvsuite

Additionally, two articles have been published in the Stata Journal describing these commands:
Pepe MS, Longton G, Janes H (2009). Estimation and comparison of receiver operating characteristic curves. Stata Journal 9(1):1-16.
Janes H, Longton G, Pepe MS (2009). Accommodating covariates in receiver operating characteristic analysis. Stata Journal 9(1):17-39.

Sample Size Calculations

Gary Longton and Margaret Pepe

To access this set of programs, from within Stata, link to the DABS center:

net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/

You will be asked which package you wish to install. Click on screensize. This installs three Stata commands: rocsize, aucsize and srcsize. These commands run simulation studies to assess the statistical power of a phase 2 study, i.e., a study that evaluates the performance of a single biomarker. See section 8.2 of Pepe (2003) for formulas to calculate sample sizes for a phase 2 study. The formulas rest on asymptotic theory, so in finite samples they will not deliver exactly the nominal power. After applying the formulas, run simulation studies and adjust the sample sizes until the power reaches the desired level. See Pepe (2003, page 220) for discussion. Running these simulations is fun. Try it!
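
The simulate-and-adjust approach can be sketched in Python. This is not the rocsize, aucsize or srcsize code, and it does not use the formulas in Pepe (2003). The function names are hypothetical, and the sketch makes the following assumptions:

- The data are equal-variance binormal. With a mean shift delta, AUC = Phi(delta / sqrt(2)).
- The test is a one-sided test of H0: AUC = 0.5.
- The test uses the Mann-Whitney null variance (n_cases + n_controls + 1) / (12 n_cases n_controls).

```python
import math
import random
from statistics import NormalDist


def mann_whitney_auc(cases, controls):
    """Nonparametric AUC: P(case > control) + 0.5 * P(tie)."""
    score = sum(1.0 if x > y else 0.5 if x == y else 0.0
                for x in cases for y in controls)
    return score / (len(cases) * len(controls))


def simulated_power(n_cases, n_controls, true_auc, alpha=0.05, n_sim=500, seed=2003):
    """Monte Carlo power of a one-sided test of H0: AUC = 0.5.

    Assumptions: equal-variance binormal data (cases shifted by delta, so
    AUC = Phi(delta / sqrt(2))) and the Mann-Whitney null variance
    (n_cases + n_controls + 1) / (12 * n_cases * n_controls).
    """
    std = NormalDist()
    delta = math.sqrt(2) * std.inv_cdf(true_auc)
    se0 = math.sqrt((n_cases + n_controls + 1) / (12 * n_cases * n_controls))
    z_crit = std.inv_cdf(1 - alpha)
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        cases = [rng.gauss(delta, 1.0) for _ in range(n_cases)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]
        z = (mann_whitney_auc(cases, controls) - 0.5) / se0
        if z > z_crit:
            rejections += 1
    return rejections / n_sim


# Try a few hypothetical sample sizes for a marker with true AUC = 0.7.
for n in (20, 40):
    print(n, simulated_power(n, n, true_auc=0.7))
```

Increase n until the simulated power reaches the target. This mirrors the iterate-and-adjust procedure described above.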

Excerpt from Pepe MS (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.

Evaluating Risk Prediction Markers

Gary Longton and Margaret Pepe

To access this set of programs, from within Stata, link to the DABS center:

net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/ 

You will be shown a list of packages that you can net describe. Click on risk_prediction. This currently installs two commands, incrisk and predcurve. The help facility describes each program. For example, help predcurve opens a window that explains the syntax and options for the predcurve command. PDFs of the help files for predcurve and incrisk can be obtained here.

To update the risk_prediction package at a later time, in Stata type:

adoupdate risk_prediction, update


R/S-plus Programs

Commercial Package S-Plus 7

S-plus version 7 does not appear to include ROC analysis commands in its standard package. We searched the CRAN archive for contributed R programs that perform ROC analysis. We found none that we could recommend as both user-friendly and comprehensive. Software for time-dependent ROC analysis (not yet on CRAN as of 12/16/2006) can be accessed through the website of Dr. Patrick Heagerty.

Time-dependent ROC Curves

Patrick Heagerty

When the outcome variable of interest is an event that occurs some time after the test is measured, ROC curves must be time dependent. Two approaches have been proposed by Patrick Heagerty and colleagues. Software and documentation can be obtained here. Alternative approaches have been described in the literature.
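
The cumulative/dynamic definitions used in the Heagerty references below are:

- sensitivity at time t: P(M > c | T <= t)
- specificity at time t: P(M <= c | T > t)

The Python sketch below computes both quantities empirically. It ignores censoring entirely, which is the main problem the published estimators solve. It assumes every subject's event status at time t is known. The function names and data are hypothetical.

```python
def cumulative_dynamic_point(markers, times, c, t):
    """Sensitivity P(M > c | T <= t) and specificity P(M <= c | T > t).

    Assumes no censoring: every subject either has an event by t or is
    known to be event-free at t.
    """
    cases = [m for m, s in zip(markers, times) if s <= t]
    controls = [m for m, s in zip(markers, times) if s > t]
    sens = sum(m > c for m in cases) / len(cases)
    spec = sum(m <= c for m in controls) / len(controls)
    return sens, spec


def auc_at_time(markers, times, t):
    """Mann-Whitney AUC(t) comparing cases (T <= t) with controls (T > t)."""
    cases = [m for m, s in zip(markers, times) if s <= t]
    controls = [m for m, s in zip(markers, times) if s > t]
    score = sum(1.0 if x > y else 0.5 if x == y else 0.0
                for x in cases for y in controls)
    return score / (len(cases) * len(controls))


markers = [2.1, 0.5, 1.7, 0.2, 1.2, 0.9]   # hypothetical baseline marker values
times = [1, 8, 2, 10, 6, 3]                # hypothetical event times
print(cumulative_dynamic_point(markers, times, c=1.0, t=4))
print(auc_at_time(markers, times, t=4), auc_at_time(markers, times, t=7))
```

The same subjects move from the control group to the case group as t increases. This is why the ROC curve and its AUC are functions of time.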

References

Heagerty PJ and Zheng Y (2005). Survival model predictive accuracy and ROC curves. Biometrics 61:92-105.

Heagerty PJ, Lumley T and Pepe MS (2000). Time dependent ROC curves for censored survival data and diagnostic markers. Biometrics 56:337-344.

Basic ROC Analysis and Evaluation of Risk Prediction Markers

Aasthaa Bansal, Daryl Morris and Margaret Pepe

The pcvsuite package for R can be downloaded here.

The package contains four main functions: roccurve, comproc, rocreg and predcurve. Briefly, the roccurve function plots an estimate of the ROC curve for one or more diagnostic tests (or biomarkers). It can display bootstrap confidence intervals for the TPF (true positive fraction) corresponding to a specified FPF (false positive fraction). The comproc function calculates summary ROC indices for two tests, with confidence intervals for each index and for their difference. It also reports a p-value, based on the summary indices, for testing equality of the ROCs. The rocreg function fits an ROC-GLM regression model. Covariate adjustment is accommodated in roccurve, comproc and rocreg.

The predcurve function uses the risk distribution associated with a marker or model to evaluate the marker's utility when applied to the population. It can optionally add classification performance to an integrated display of predictiveness and classification measures. Alternative graphical outputs include CDFs and densities of the estimated risk. Nested models are supported, as is testing for differences between two models. Documentation on these functions is also contained here.

Additionally, two articles published in the Stata Journal (Stata J 2009;9(1):1-16 and Stata J 2009;9(1):17-39) explain the commands and their options in more detail. The syntax in the articles is Stata-specific. However, the R functions implement the same methods and rationale, and they take the same arguments as the Stata commands.

Predictiveness curves are described in this article: Am J Epidemiol 2008;3:362-368.
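
A predictiveness curve plots the risk R(v) at each population quantile v of the risk distribution. The Python sketch below is a hypothetical illustration, not the predcurve implementation. It assumes risks have already been estimated, for example by a fitted model. It computes the empirical curve and the fraction of the population whose risk exceeds a chosen threshold.

```python
def predictiveness_curve(risks):
    """Empirical predictiveness curve: points (v, R(v)), where R(v) is the
    v-th quantile of the risk distribution, for v = 1/n, 2/n, ..., 1."""
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def fraction_above(risks, threshold):
    """Proportion of the population whose risk exceeds the threshold."""
    return sum(r > threshold for r in risks) / len(risks)


# Hypothetical estimated risks for five subjects.
risks = [0.05, 0.30, 0.10, 0.60, 0.20]
for v, r in predictiveness_curve(risks):
    print(f"v={v:.1f}  R(v)={r:.2f}")
print(fraction_above(risks, 0.25))   # 0.4
```

A steeper curve indicates that the marker separates the population into more distinct risk groups. The fraction above a treatment threshold is read directly off the horizontal axis where the curve crosses that risk.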

Multiple Gene Risk Prediction Performance (Windows Binary Package)

Daryl Morris, Jessie Wen Gu and Margaret Pepe

This package supports simulations of large panels of independent markers, each conferring a (multiplicative) risk of disease, where the risk conferred by each genotype is assumed to be known. For each simulation, various summary measures of the performance of the risk prediction algorithm are calculated.

Given an odds ratio, a frequency of the risk allele (in the population) and an overall disease prevalence, a disease likelihood ratio can be calculated (assuming Hardy-Weinberg equilibrium, and that each instance of the risk allele confers the same multiplicative risk).
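
One way to do this calculation is sketched below. The sketch assumes a logistic (multiplicative odds) model in which each copy of the risk allele multiplies the odds of disease by the odds ratio. The package's own parameterization may differ, and the function names are hypothetical. The calculation proceeds in three steps:

1. Under Hardy-Weinberg equilibrium, the genotype frequencies are (1 - p)^2, 2p(1 - p) and p^2.
2. Bisection finds the baseline odds at which the average risk equals the prevalence.
3. The likelihood ratio of genotype g is odds(g) divided by the prevalence odds. This follows because P(g | D) / P(g | not D) = [risk(g) / prev] / [(1 - risk(g)) / (1 - prev)].

```python
import math


def genotype_freqs(p):
    """Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of the risk allele."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def baseline_logodds(odds_ratio, p, prevalence):
    """Log odds of disease with 0 risk alleles, chosen so mean risk = prevalence."""
    freqs = genotype_freqs(p)
    beta = math.log(odds_ratio)
    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection: mean risk increases with the baseline log odds
        mid = (lo + hi) / 2
        mean_risk = sum(f * _sigmoid(mid + g * beta) for g, f in enumerate(freqs))
        if mean_risk < prevalence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def genotype_likelihood_ratios(odds_ratio, p, prevalence):
    """LR(g) = P(genotype g | diseased) / P(genotype g | non-diseased), g = 0, 1, 2."""
    alpha = baseline_logodds(odds_ratio, p, prevalence)
    prev_logodds = math.log(prevalence / (1 - prevalence))
    beta = math.log(odds_ratio)
    return [math.exp(alpha + g * beta - prev_logodds) for g in range(3)]


# Hypothetical inputs: OR 1.3 per allele, allele frequency 0.25, prevalence 5%.
print(genotype_likelihood_ratios(odds_ratio=1.3, p=0.25, prevalence=0.05))
```

Under this model, successive likelihood ratios differ by exactly the odds ratio. Only the baseline level depends on the allele frequency and the prevalence.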

Genetic profiles for a panel of genes can be simulated given the population frequency of each risk allele. Assuming the risk model is known and correct, (1) disease states can be simulated and (2) the performance of the panel of genes as a combined marker for risk prediction can be assessed.
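
The two simulation steps can be sketched as follows. This is a hypothetical Python illustration, not the package's code. It assumes independent markers in Hardy-Weinberg equilibrium and a known logistic risk model, logit(risk) = intercept + sum of g_j log(OR_j). It summarizes performance by the AUC of the true risk. The intercept, panel size and odds ratios are made up.

```python
import math
import random


def simulate_panel(n, allele_freqs, odds_ratios, intercept, seed=0):
    """Simulate n subjects: genotypes, true risks, and disease states.

    Assumptions (illustrative only): independent markers in Hardy-Weinberg
    equilibrium and a known logistic risk model
    logit(risk) = intercept + sum_j g_j * log(OR_j), where g_j is 0, 1 or 2.
    """
    rng = random.Random(seed)
    betas = [math.log(o) for o in odds_ratios]
    risks, diseased = [], []
    for _ in range(n):
        # Two independent allele draws per marker give Binomial(2, p) genotypes.
        genos = [(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
        logit = intercept + sum(g * b for g, b in zip(genos, betas))
        risk = 1.0 / (1.0 + math.exp(-logit))
        risks.append(risk)
        diseased.append(rng.random() < risk)   # disease state drawn from true risk
    return risks, diseased


def auc(scores, labels):
    """Mann-Whitney AUC of the scores for separating labels True vs False."""
    cases = [s for s, d in zip(scores, labels) if d]
    controls = [s for s, d in zip(scores, labels) if not d]
    score = sum(1.0 if x > y else 0.5 if x == y else 0.0
                for x in cases for y in controls)
    return score / (len(cases) * len(controls))


# Hypothetical panel: 20 markers, each with allele frequency 0.3 and OR 1.5.
risks, diseased = simulate_panel(2000, [0.3] * 20, [1.5] * 20, intercept=-6.0)
print(f"prevalence={sum(diseased) / len(diseased):.3f}  AUC={auc(risks, diseased):.3f}")
```

Repeating the simulation with different seeds gives the distribution of each performance measure, not just a single value.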

The package can be downloaded here.

Documentation on how to use the package is here.

Additionally, Pepe, Gu and Morris (2010) report results obtained with the package, and the paper's appendix describes the software.

References

Pepe MS, Gu W, Morris DE (2010). The Potential of Genes and Other Markers to Inform about Risk. Cancer Epidemiology, Biomarkers and Prevention 19(3):655-665.


SPSS

Base SPSS

Versions 9.0 and above

SPSS provides basic ROC commands within the main package. By following Graphs -> ROC Curves, you can plot multiple ROC curves and get the AUC (with optional confidence intervals) for each curve. You need to specify a test variable and a state variable. For example, you can save the predicted probabilities from a logistic regression and use them as the test variable. The standard error of the AUC can be computed nonparametrically or under the assumption of a bi-negative exponential distribution. You can also output the coordinates of the empirical ROC curve.
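
SPSS's bi-negative exponential option is commonly associated with the Hanley and McNeil (1982) standard error. That formula assumes Q1 = A/(2 - A) and Q2 = 2A^2/(1 + A). We have not checked this against SPSS's algorithm documentation, so treat the Python sketch below as an illustration under that assumption. The function name is hypothetical.

```python
import math


def auc_with_hanley_mcneil_ci(cases, controls, z=1.96):
    """Nonparametric AUC with a Hanley-McNeil standard error and Wald CI.

    Q1 and Q2 use the negative exponential forms, assumed here to match
    SPSS's bi-negative exponential option.
    """
    n1, n0 = len(cases), len(controls)
    a = sum(1.0 if x > y else 0.5 if x == y else 0.0
            for x in cases for y in controls) / (n1 * n0)
    q1 = a / (2 - a)          # P(two random cases both exceed a random control)
    q2 = 2 * a * a / (1 + a)  # P(a random case exceeds two random controls)
    var = (a * (1 - a) + (n1 - 1) * (q1 - a * a) + (n0 - 1) * (q2 - a * a)) / (n1 * n0)
    se = math.sqrt(var)
    return a, se, (max(0.0, a - z * se), min(1.0, a + z * se))


# Hypothetical test results.
print(auc_with_hanley_mcneil_ci([0.8, 0.6, 0.9, 0.4], [0.3, 0.5, 0.2, 0.6]))
```

The CI is truncated to [0, 1]. With small samples and an AUC near 1, a Wald interval can otherwise extend past the boundary.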

Learn more about the SPSS Inc. software.


SAS

Base SAS 9.1

SAS 9.1 provides basic ROC capabilities within the main package through PROC LOGISTIC. Documentation and examples from the reference manuals are here. Briefly, the following tasks can be done:

  • plot one empirical ROC curve from a logistic regression model
  • estimate the nonparametric AUC through the c-statistic

When running PROC LOGISTIC, name an output data set with the OUTROC= option on the MODEL statement. Then use PROC GPLOT to plot the ROC curve, i.e., _SENSIT_ against _1MSPEC_ from that data set.
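
PROC LOGISTIC reports the c-statistic alongside counts of concordant, discordant and tied pairs. The Python sketch below reproduces that pair-counting definition. The function name is hypothetical. It computes c = (concordant + 0.5 * tied) / pairs and Somers' D = (concordant - discordant) / pairs. It counts pairs exactly, without any grouping of nearby predicted probabilities that SAS may apply, so small differences from SAS output are possible.

```python
def association_stats(probs, outcomes):
    """Pair-based concordance statistics for predicted probabilities.

    Each (event, non-event) pair is concordant if the event has the higher
    predicted probability, discordant if it has the lower, and tied otherwise.
    """
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    nc = nd = nt = 0
    for e in events:
        for ne in nonevents:
            if e > ne:
                nc += 1
            elif e < ne:
                nd += 1
            else:
                nt += 1
    pairs = nc + nd + nt
    return {"pairs": pairs, "concordant": nc, "discordant": nd, "tied": nt,
            "c": (nc + 0.5 * nt) / pairs, "somers_d": (nc - nd) / pairs}


# Hypothetical predicted probabilities for four events and four non-events.
probs = [0.8, 0.6, 0.9, 0.4, 0.3, 0.5, 0.2, 0.6]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
print(association_stats(probs, outcomes))
```

The c-statistic is the nonparametric AUC of the predicted probabilities, and Somers' D equals 2c - 1.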

User articles and documentation

Fortran Programs

ROCKIT

Dr. Charles Metz of the Department of Radiology at the University of Chicago pioneered the development of software for ROC analysis, particularly for radiology reading studies. His software emphasizes binormal ROC curves. Analysis methods for multi-reader, multi-case studies are also available here.
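
A binormal ROC curve has the closed form TPF = Phi(a + b * Phi^-1(FPF)), where Phi is the standard normal CDF. Its area is AUC = Phi(a / sqrt(1 + b^2)). If test results are normal in both groups, then a = (mean_cases - mean_controls) / sd_cases and b = sd_controls / sd_cases. Programs such as ROCKIT estimate a and b from data. This Python sketch only evaluates the curve for given values, and its function names and parameters are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def binormal_tpf(fpf, a, b):
    """Binormal ROC curve: TPF = Phi(a + b * Phi^-1(FPF)), for 0 < FPF < 1."""
    return _STD_NORMAL.cdf(a + b * _STD_NORMAL.inv_cdf(fpf))


def binormal_auc(a, b):
    """Area under the binormal ROC curve: Phi(a / sqrt(1 + b^2))."""
    return _STD_NORMAL.cdf(a / math.sqrt(1 + b * b))


# Hypothetical parameters a = 1.0, b = 0.8, evaluated at a few FPF values.
for t in (0.05, 0.1, 0.2, 0.5):
    print(f"FPF={t:.2f}  TPF={binormal_tpf(t, 1.0, 0.8):.3f}")
print(f"AUC={binormal_auc(1.0, 0.8):.3f}")
```

The FPF must lie strictly between 0 and 1 because inv_cdf is undefined at the endpoints. For b > 0 the curve runs from (0, 0) to (1, 1).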

Cleveland Clinic Resources

The Cleveland Clinic has Fortran programs to:

  • Estimate the sample size needed to compare two ROC curves using the partial AUC (pAUC) or the sensitivity at a fixed false positive rate.
  • Make inferences on pAUCs through parametric methods.
  • Make inferences on AUCs with clustered data, rating data, and continuous data, using parametric and nonparametric methods.
  • Analyze multi-reader, multi-modality ROC datasets.
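
The partial AUC referred to above is the area under the ROC curve over a restricted FPF range [0, t0]. The Python sketch below computes it nonparametrically from the empirical ROC curve, clipping the last segment by linear interpolation. This is a generic illustration with hypothetical function names and data, not the Cleveland Clinic programs, which the list describes as parametric for pAUC inference.

```python
def empirical_roc(cases, controls):
    """(FPF, TPF) points of the empirical ROC; a result >= threshold is positive."""
    points = [(0.0, 0.0)]
    for c in sorted(set(cases) | set(controls), reverse=True):
        points.append((sum(y >= c for y in controls) / len(controls),
                       sum(x >= c for x in cases) / len(cases)))
    return points


def partial_auc(points, t0):
    """Trapezoidal area under the ROC curve for FPF in [0, t0]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= t0:
            break
        if x1 > t0:  # clip the last segment at FPF = t0 by linear interpolation
            y1 = y0 + (y1 - y0) * (t0 - x0) / (x1 - x0)
            x1 = t0
        area += (x1 - x0) * (y0 + y1) / 2
    return area


cases = [0.8, 0.6, 0.9, 0.4]
controls = [0.3, 0.5, 0.2, 0.6]
roc = empirical_roc(cases, controls)
print(partial_auc(roc, 0.25), partial_auc(roc, 1.0))   # 0.15625 0.84375
```

Dividing the pAUC by t0 gives the average TPF over the restricted FPF range. This puts pAUCs for different t0 on a common scale.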

MedCalc

MedCalc Info and Resources

MedCalc is a commercial software package designed to analyze several different types of biomedical data, and it includes an ROC component. MedCalc provides an online user's manual with a chapter on its ROC features.

MedCalc provides the following capabilities for analyzing ROC curves, most of which are found under the Statistics -> ROC Curves menu:

  • Enter data through manual input into an internal spreadsheet or import data through Excel, SPSS, or several other programs.
  • Under "ROC analysis", plot a single ROC curve (with optional confidence bounds). The output is a graph, with the optimal cutpoint marked on the graph. Additional output includes AUC and its 95% confidence interval, and a table with the sensitivity and specificity at several cut-points, with their 95% confidence intervals.
  • Under "Interactive dot-diagram", plot the observed values as points, and interactively move the cut-point around to find the sensitivity and specificity of the test.
  • Under "Plot versus criterion", plot the sensitivity and specificity versus different cut-points, with optional 95% confidence intervals.
  • Under "Comparison of ROC curves", plot up to 6 different ROC curves, get the AUC for each curve, and obtain pairwise comparisons between curves.
  • Under "Predictive values", manually enter sensitivity, specificity, and prevalence to obtain the positive and negative predictive values.
  • Graphs can be reformatted and exported as images.
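
Two of the calculations in this list can be sketched in Python: predictive values from Bayes' theorem, and a cutpoint that maximizes the Youden index J = sensitivity + specificity - 1. The text does not state MedCalc's exact definition of the optimal cutpoint. This sketch assumes it is the Youden index and that values above the cutpoint are called positive. The function names are hypothetical.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    fn = (1 - sens) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


def youden_cutpoint(cases, controls):
    """(cutpoint, J, sensitivity, specificity) maximizing J = sens + spec - 1.

    Assumed rule: a value strictly greater than the cutpoint is positive.
    """
    best = None
    for c in sorted(set(cases) | set(controls)):
        sens = sum(x > c for x in cases) / len(cases)
        spec = sum(y <= c for y in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best


ppv, npv = predictive_values(sens=0.90, spec=0.80, prevalence=0.10)
print(f"PPV={ppv:.3f}  NPV={npv:.3f}")   # PPV=0.333  NPV=0.986
print(youden_cutpoint([0.5, 0.7, 0.8, 0.9], [0.1, 0.2, 0.3, 0.4, 0.6]))
```

Even with 90% sensitivity and 80% specificity, a 10% prevalence gives a PPV of only one third. This is why predictive values depend on prevalence, while the ROC curve does not.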