Stata 8 provides a limited set of ROC commands within the main package. Documentation and examples from the reference manuals are here. Briefly, the following tasks can be done:

- plot one or more empirical ROC curves
- estimate and compare nonparametric AUCs
- plot one or more binormal ROC curves for ordinal test data
- estimate and compare binormal AUCs for ordinal test data
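
To make the first two tasks concrete, here is a minimal Python sketch of an empirical ROC curve and the nonparametric (Mann-Whitney) AUC. It is an illustration only, not Stata's implementation; the function names and the toy data are hypothetical, and it assumes that larger test values indicate disease.

```python
from typing import List, Tuple


def empirical_roc(cases: List[float], controls: List[float]) -> List[Tuple[float, float]]:
    """Return (FPF, TPF) points of the empirical ROC curve.

    A result counts as positive when it is >= the threshold.
    """
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        tpf = sum(y >= c for y in cases) / len(cases)
        fpf = sum(y >= c for y in controls) / len(controls)
        points.append((fpf, tpf))
    return points


def empirical_auc(cases: List[float], controls: List[float]) -> float:
    """Mann-Whitney estimate: P(case > control) + 0.5 * P(case == control)."""
    total = 0.0
    for yd in cases:
        for yn in controls:
            if yd > yn:
                total += 1.0
            elif yd == yn:
                total += 0.5
    return total / (len(cases) * len(controls))


# Hypothetical toy data
cases = [3.1, 4.2, 5.0, 2.4]
controls = [1.0, 2.4, 2.0, 3.5]
print(empirical_roc(cases, controls))
print(empirical_auc(cases, controls))
```

The Mann-Whitney estimate equals the trapezoidal area under the empirical ROC points, which is why the two are usually reported together.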

After fitting a logistic regression model in Stata, an ROC curve for the fitted model can be plotted using the lroc postestimation command. Related postestimation commands for logistic regression are:

- estat classification reports various summary statistics, including the classification table.
- estat gof reports the Pearson goodness-of-fit test or the Hosmer-Lemeshow goodness-of-fit test.
- lroc graphs the ROC curve and calculates the area under the curve.
- lsens graphs sensitivity and specificity versus probability cutoff and optionally creates new variables containing these data.
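
The quantities behind a classification table and a sensitivity/specificity-versus-cutoff plot can be sketched in a few lines of Python. This is a hedged illustration rather than the Stata code; `sens_spec` and the toy predicted probabilities are hypothetical, and a subject is classified positive when the predicted probability is at least the cutoff.

```python
def sens_spec(probs, outcomes, cutoff):
    """Return (sensitivity, specificity) when 'positive' means prob >= cutoff."""
    pairs = list(zip(probs, outcomes))
    tp = sum(p >= cutoff and y == 1 for p, y in pairs)
    fn = sum(p < cutoff and y == 1 for p, y in pairs)
    tn = sum(p < cutoff and y == 0 for p, y in pairs)
    fp = sum(p >= cutoff and y == 0 for p, y in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities and observed outcomes (1 = diseased)
probs = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
outcomes = [1, 1, 1, 0, 0, 0]

for cutoff in (0.25, 0.5, 0.75):
    sens, spec = sens_spec(probs, outcomes, cutoff)
    print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Raising the cutoff trades sensitivity for specificity; tracing that trade-off over all cutoffs gives the ROC curve.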

To access this set of programs, **from within Stata,** link to the DABS center:

net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/

Install the package:

net install pcvsuite

This will currently install four commands: roccurve, comproc, rocreg and incroc. A description of how to use them is available through the help facility, e.g., help roccurve opens a window that explains the syntax and options for the roccurve command. A copy of the help files can be obtained here: roccurve, comproc, rocreg and incroc.

Briefly, the roccurve command plots an estimate of the ROC curve for one or more diagnostic tests (or biomarkers). Confidence intervals can be displayed for the TPF (true positive fraction) corresponding to a specified FPF (false positive fraction). Confidence intervals are calculated using the bootstrap. The comproc command calculates summary ROC indices for two tests along with confidence intervals for each and for the difference. A p-value for testing equality of the ROCs based on the summary indices is output. The rocreg command fits an ROC-GLM regression model. Covariate adjustment is accommodated in all three commands.
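
The idea of a bootstrap confidence interval for the TPF at a fixed FPF can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the roccurve algorithm: it uses a percentile interval, resamples cases and controls separately, and ignores covariates. All function names and data are hypothetical.

```python
import random


def tpf_at_fpf(cases, controls, fpf):
    """Empirical ROC(f): fraction of cases above a control-based threshold.

    The threshold is chosen so that at most a fraction `fpf` of controls
    lie strictly above it.
    """
    ranked = sorted(controls, reverse=True)
    k = int(fpf * len(ranked))  # number of controls allowed above the threshold
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    return sum(y > threshold for y in cases) / len(cases)


def bootstrap_ci(cases, controls, fpf, reps=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for ROC(fpf)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        stats.append(tpf_at_fpf(boot_cases, boot_controls, fpf))
    stats.sort()
    lower = stats[int((1 - level) / 2 * reps)]
    upper = stats[int((1 + level) / 2 * reps) - 1]
    return lower, upper


# Hypothetical marker values
cases = [2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 2.5, 3.1, 1.2, 4.4]
controls = [0.5, 1.1, 1.9, 0.8, 2.2, 1.4, 0.3, 1.6, 2.7, 1.0]

print(tpf_at_fpf(cases, controls, 0.2))
print(bootstrap_ci(cases, controls, 0.2))
```

Resampling cases and controls separately mirrors a case-control design; for a cohort design one would resample subjects without regard to disease status.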

To update the pcvsuite commands at a later time, in Stata type:

adoupdate, update

To uninstall the pcvsuite type:

ado uninstall pcvsuite

Additionally, two articles have been published in the Stata Journal describing these commands:

Pepe MS, Longton G, Janes H (2009). Estimation and comparison of receiver operating characteristic curves. *Stata Journal* 9(1):1-16.

Janes H, Longton G, Pepe MS (2009). Accommodating covariates in receiver operating characteristic analysis. *Stata Journal* 9(1):17-39.

To access this set of programs, **from within Stata,** link to the DABS center:

net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/

You will be asked which package you wish to install. Click on screensize. This will install three Stata commands: rocsize, aucsize and srcsize. These commands run simulation studies to assess the statistical power of a phase 2 study, i.e., a study that evaluates the performance of a single biomarker. See section 8.2 of Pepe (2003) for formulas to calculate sample sizes for a phase 2 study. The formulas are based on asymptotic theory and will not provide exactly the right power in practice. After doing the calculations, one should run simulation studies and adjust the sample sizes until the power reaches the desired level. See Pepe (2003, page 220) for discussion. Running these simulations is fun. Try it!
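
As a rough illustration of simulation-based power assessment, the Python sketch below estimates the power of a one-sided test that the AUC exceeds a minimally acceptable value. It is not the rocsize, aucsize or srcsize algorithm. The assumptions are mine: binormal markers with unit variances, the empirical AUC as the test statistic, and the Hanley-McNeil approximation for its standard error. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def auc(cases, controls):
    """Mann-Whitney AUC estimate; ties count one half."""
    wins = sum((yd > yn) + 0.5 * (yd == yn) for yd in cases for yn in controls)
    return wins / (len(cases) * len(controls))


def hanley_mcneil_se(a, n_d, n_n):
    """Approximate standard error of an AUC estimate (Hanley and McNeil, 1982)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a) + (n_d - 1) * (q1 - a * a) + (n_n - 1) * (q2 - a * a)) / (n_d * n_n)
    return math.sqrt(max(var, 0.0))


def simulated_power(true_auc, null_auc, n_d, n_n, reps=300, alpha=0.05, seed=1):
    """Fraction of simulated studies whose lower confidence limit exceeds null_auc."""
    rng = random.Random(seed)
    # For N(shift, 1) cases and N(0, 1) controls, AUC = Phi(shift / sqrt(2)).
    shift = math.sqrt(2) * NormalDist().inv_cdf(true_auc)
    z = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        cases = [rng.gauss(shift, 1) for _ in range(n_d)]
        controls = [rng.gauss(0, 1) for _ in range(n_n)]
        a = auc(cases, controls)
        if a - z * hanley_mcneil_se(a, n_d, n_n) > null_auc:
            rejections += 1
    return rejections / reps


print(simulated_power(true_auc=0.8, null_auc=0.7, n_d=50, n_n=50))
```

If the simulated power falls short of the target, increase `n_d` and `n_n` and rerun, which is the adjustment loop described above.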

Excerpt from Pepe, MS (2003) *The Statistical Evaluation of Medical Tests for Classification and Prediction*. Oxford University Press.

To access this set of programs, **from within Stata,** link to the DABS center:

net from https://research.fhcrc.org/content/dam/stripe/diagnostic-biomarkers-statistical-center/files/stata/

You will be prompted with a list of packages you could net describe. Click on risk_prediction. This will currently install two commands, incrisk and predcurve. A program description is available by using the help facility, e.g., help predcurve opens a window that explains the syntax and options for the predcurve command. PDFs of the help files can be obtained here for predcurve and incrisk.

To update the risk_prediction package at a later time, in Stata type:

adoupdate risk_prediction, update

When the outcome variable of interest is an event that occurs some time after the test is measured, ROC curves must be time dependent. Two approaches have been proposed by Patrick Heagerty and colleagues. Software and documentation can be obtained here. Alternative approaches have been described in the literature.

Heagerty PJ and Zheng Y (2005). Survival model predictive accuracy and ROC curves. *Biometrics* 61:92-105.

Heagerty PJ, Lumley T and Pepe MS (2000). Time dependent ROC curves for censored survival data and diagnostic markers. *Biometrics* 56:337-344.

The pcvsuite package for R can be downloaded here:

The package contains four main functions: roccurve, comproc, rocreg and predcurve. Briefly, the roccurve command plots an estimate of the ROC curve for one or more diagnostic tests (or biomarkers). Confidence intervals can be displayed for the TPF (true positive fraction) corresponding to a specified FPF (false positive fraction). Confidence intervals are calculated using the bootstrap. The comproc command calculates summary ROC indices for two tests along with confidence intervals for each and for the difference. A p-value for testing equality of the ROCs based on the summary indices is output. The rocreg command fits an ROC-GLM regression model. Covariate adjustment is accommodated in all three commands.

The predcurve function uses the risk distribution associated with a marker or model to evaluate marker utility when applied to the population. Classification performance can optionally be included in an integrated display of predictiveness and classification measures. Alternative graphical outputs include CDFs and densities of the estimated risk. Support is provided for nested models and for testing differences between two models. Documentation on all three commands is also contained here:

Additionally, these articles (Stata J 2009;1:1-16, Stata J 2009;1:17-39) published in the Stata Journal explain the commands and their options in more detail. Note that the syntax in the articles is all Stata-specific; however, the methods and rationale used to implement the functions in R remain the same. The arguments are also the same between R and Stata.

Predictiveness curves are described in this article: AmJEpid 2008;3:362-368.
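
For readers unfamiliar with the idea, a predictiveness curve plots the risk quantile R(v) against the population percentile v. The Python sketch below shows only that basic construction from a list of estimated risks; it is not the predcurve implementation, and the function names and risk values are hypothetical.

```python
def predictiveness_curve(risks):
    """Return (percentile, risk) pairs: the v-th risk quantile against v."""
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def proportion_above(risks, threshold):
    """Fraction of the population whose estimated risk exceeds a threshold."""
    return sum(r > threshold for r in risks) / len(risks)


# Hypothetical estimated risks for five subjects
risks = [0.05, 0.30, 0.10, 0.60, 0.20]
print(predictiveness_curve(risks))
print(proportion_above(risks, 0.25))
```

A marker that separates the population into very low and very high risks gives a steep curve; an uninformative marker gives a flat line at the disease prevalence.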

This package supports simulations of large groups of independent markers each conferring (multiplicative) risk of disease, where the risk conferred by each genotype is assumed to be known. For each simulation, various summary measures of the performance of the risk prediction algorithm are calculated.

Given an odds ratio, a frequency of the risk allele (in the population) and an overall disease prevalence, a disease likelihood ratio can be calculated (assuming Hardy-Weinberg equilibrium, and that each instance of the risk allele confers the same multiplicative risk).
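
One way to carry out such a calculation is sketched below in Python. It is an illustration under explicit assumptions and not necessarily what the package does: the odds ratio is taken to be per risk allele and multiplicative on the odds scale, genotype frequencies follow Hardy-Weinberg equilibrium, and the baseline odds are found by bisection so that the average risk equals the prevalence. The function names are hypothetical.

```python
def genotype_frequencies(p):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the risk allele."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def genotype_likelihood_ratios(odds_ratio, p, prevalence):
    """Return P(genotype | diseased) / P(genotype | not diseased) for 0, 1, 2 copies."""
    freqs = genotype_frequencies(p)

    def mean_risk(odds0):
        # Population-average risk when non-carriers have disease odds `odds0`.
        total = 0.0
        for copies, freq in enumerate(freqs):
            odds = odds0 * odds_ratio ** copies
            total += freq * odds / (1.0 + odds)
        return total

    # Bisection: mean_risk is increasing in odds0.
    lo, hi = 0.0, 1.0
    while mean_risk(hi) < prevalence:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < prevalence:
            lo = mid
        else:
            hi = mid
    odds0 = (lo + hi) / 2.0

    # By Bayes' rule, the likelihood ratio is posterior odds over prior odds.
    prior_odds = prevalence / (1.0 - prevalence)
    return [odds0 * odds_ratio ** copies / prior_odds for copies in range(3)]


print(genotype_likelihood_ratios(odds_ratio=1.5, p=0.3, prevalence=0.1))
```

The final step follows from Bayes' rule: the posterior odds of disease given a genotype equal the prior odds times the likelihood ratio for that genotype.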

Genetic profiles can be simulated given the overall frequency of the risk allele for a panel of genes. Assuming a known correct risk model (1) disease states can be simulated and (2) the performance of the panel of genes as a combined marker for risk prediction can be assessed.
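
The two simulation steps can be sketched in Python as follows. This is a hedged illustration, not the package's code. It assumes independent loci in Hardy-Weinberg equilibrium and a logistic risk model in which each risk allele multiplies the odds of disease by a known odds ratio. The intercept, allele frequencies and odds ratios are arbitrary values chosen for the example.

```python
import math
import random


def simulate_panel(n, allele_freqs, odds_ratios, intercept, seed=1):
    """Simulate true risks and disease states for n subjects."""
    rng = random.Random(seed)
    risks, disease = [], []
    for _ in range(n):
        logit = intercept
        for p, oratio in zip(allele_freqs, odds_ratios):
            # Two independent allele draws give 0, 1 or 2 risk alleles.
            genotype = (rng.random() < p) + (rng.random() < p)
            logit += genotype * math.log(oratio)
        risk = 1.0 / (1.0 + math.exp(-logit))
        risks.append(risk)
        disease.append(1 if rng.random() < risk else 0)
    return risks, disease


def auc(scores, labels):
    """Mann-Whitney AUC of the risk score for discriminating cases from controls."""
    cases = [s for s, d in zip(scores, labels) if d == 1]
    controls = [s for s, d in zip(scores, labels) if d == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


risks, disease = simulate_panel(2000, [0.3] * 10, [1.5] * 10, intercept=-4.5)
print("simulated prevalence:", sum(disease) / len(disease))
print("AUC of the combined marker:", auc(risks, disease))
```

Because the risk model used to score subjects is the same one that generated the disease states, the resulting AUC reflects the best performance the panel could achieve under these assumptions.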

The package can be downloaded here.

Documentation on how to use the package is here.

Additionally, Pepe, Gu and Morris (2010) report results obtained with the package. The appendix of that article also describes the software.

Pepe MS, Gu W, Morris DE (2010). The Potential of Genes and Other Markers to Inform about Risk. *Cancer Epidemiology, Biomarkers and Prevention* 19(3):655-665.

SPSS provides basic ROC commands within the main package. By following Graphs -> ROC Curves, you can plot multiple ROC curves and get the AUC (with optional confidence intervals) for each curve. You need to specify a test variable and a state variable. For example, you can save predicted probabilities from a logistic regression command and use those as your test variable. AUCs can be estimated nonparametrically or under the assumption of a binegative exponential distribution. You also have the option of getting each of the coordinates of the empirical ROC.

SAS 9.1 provides basic ROC commands within the main package through the Proc Logistic command. Documentation and examples from the reference manuals are here. Briefly, the following tasks can be done:

- plot one empirical ROC curve from a logistic regression model
- estimate nonparametric AUCs through a c-statistic

When fitting a model with Proc Logistic, specify an output dataset with the outroc= option on the model statement. Then use Proc Gplot to plot the ROC curve from that dataset.

- Estimating the Area under a Receiver Operating Characteristic (ROC) Curve For Repeated Measures Design, by Honghu Liu and Tongtong Wu
- Statistical Methods in Diagnostic Medicine using SAS® Software, by Jay N. Mandrekar and Sumithra J. Mandrekar. Includes comparing ROC curves when two tests are taken on the same measurements, ROC curves from a logistic regression, and power calculations for the AUC.
- How to Display Correlated ROC Curves with the SAS System, by Barbara Schneider
- Selecting and Combining Biomarkers for Binary Outcomes, by Ziding Feng
- New capabilities for ROC curve plotting in SAS version 9.2, by Dale McLerran (SAS datasets: compare markers, remission test, remission train)
- The Cleveland Clinic has SAS macros to calculate ROC sample sizes (1 reader, or 1-2 ROC curves), calculate ROC sample sizes with multiple readers, and plot ROC curves using SAS/Graph.

The Cleveland Clinic has Fortran programs to:

- Estimate the sample size to compare two curves by estimating the pAUC or the sensitivity at a fixed false positive rate.
- Make inferences on pAUCs through parametric methods
- Make inferences on AUCs with clustered data, rating data, and continuous data, based on parametric and non-parametric methods.
- Analyze multi-reader and multi-modal ROC datasets.
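
For orientation, the partial AUC (pAUC) restricts the area under the ROC curve to false positive rates between 0 and some maximum t0. The Python sketch below computes a simple nonparametric estimate. It is unrelated to the Fortran programs, which the list above describes as using parametric methods for pAUC inference. It assumes continuous data with no ties among the controls; `partial_auc` and the data are hypothetical.

```python
def partial_auc(cases, controls, t0):
    """Empirical area under the ROC curve for FPF in [0, t0].

    Uses the k = floor(t0 * n_controls) largest control values: each one
    contributes (fraction of cases above it) / n_controls to the area.
    """
    k = int(t0 * len(controls))
    top_controls = sorted(controls, reverse=True)[:k]
    total = 0.0
    for yn in top_controls:
        above = sum((yd > yn) + 0.5 * (yd == yn) for yd in cases)
        total += above / len(cases)
    return total / len(controls)


# Hypothetical toy data
cases = [3.1, 4.2, 5.0, 2.6]
controls = [1.0, 2.4, 2.0, 3.5]

print(partial_auc(cases, controls, 0.5))  # area over FPF in [0, 0.5]
print(partial_auc(cases, controls, 1.0))  # reduces to the full AUC
```

The largest possible value of the pAUC is t0 itself, so it is often reported alongside t0 or rescaled by it.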

MedCalc is a commercial software package designed to analyze several different types of biomedical data. It includes an ROC component. MedCalc provides an online user's manual with a chapter on their ROC features.

MedCalc provides the following capabilities for analyzing ROC curves.

Most are found under the Statistics->ROC Curves menu:

- Enter data through manual input into an internal spreadsheet or import data through Excel, SPSS, or several other programs.
- Under "ROC analysis", plot a single ROC curve (with optional confidence bounds). The output is a graph, with the optimal cutpoint marked on the graph. Additional output includes AUC and its 95% confidence interval, and a table with the sensitivity and specificity at several cut-points, with their 95% confidence intervals.
- Under "Interactive dot-diagram", plot the observed values as points, and interactively move the cut-point around to find the sensitivity and specificity of the test.
- Under "Plot versus criterion", plot the sensitivity and specificity versus different cut-points, with optional 95% confidence intervals.
- Under "Comparison of ROC curves", plot up to 6 different ROC curves, get the AUCs for each plot, and the pairwise-comparisons between curves.
- Under "Predictive values", manually enter sensitivity, specificity, and prevalence to get out the positive predictive value and negative predictive value.
- It is possible to alter the format of graphs and export as an image.
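
The predictive values mentioned in the list above follow from Bayes' theorem. The Python sketch below shows that calculation; it is a generic illustration and not MedCalc's code, and the function name and input values are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ppv, npv = predictive_values(sensitivity=0.9, specificity=0.8, prevalence=0.1)
print(f"PPV={ppv:.3f} NPV={npv:.3f}")
```

With a low prevalence, even a fairly accurate test yields a modest positive predictive value, which is why prevalence must be entered along with sensitivity and specificity.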

Fred Hutchinson Cancer Center | 1100 Fairview Ave. N., Seattle, WA 98109

© 2023 Fred Hutchinson Cancer Center, a 501(c)(3) nonprofit organization.
