
Book Materials

Datasets and Stata Programs used in "The Statistical Evaluation of Medical Tests for Classification and Prediction"

The book The Statistical Evaluation of Medical Tests for Classification and Prediction describes statistical concepts and techniques for evaluating medical diagnostic tests and biomarkers for detecting disease. More generally, the techniques apply to the statistical classification problem of predicting a dichotomous outcome. The book describes measures for quantifying test accuracy, including sensitivity, specificity, predictive values, diagnostic likelihood ratios, and the receiver operating characteristic (ROC) curve, which is commonly used for continuous and ordinal-valued tests. It presents statistical procedures for estimating and comparing these measures, along with regression frameworks for assessing factors that influence test accuracy and for comparing tests while adjusting for such factors.

This book presents many worked examples using real data and should be of interest to practicing statisticians and quantitative researchers involved in developing tests for classification or prediction in medicine.
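The accuracy measures for a binary test named above can be illustrated with a short Python sketch. The 2x2 counts below are hypothetical and are not taken from any of the book's datasets; the class and function names are illustrative only. The predictive values computed this way are meaningful only when the sample reflects the disease prevalence of the target population (for example, a cohort design).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    fp: int  # non-diseased, test positive
    tn: int  # non-diseased, test negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Compute standard accuracy measures for a binary test."""
    sens = t.tp / (t.tp + t.fn)   # true positive fraction
    spec = t.tn / (t.tn + t.fp)   # 1 - false positive fraction
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),      # positive predictive value
        "npv": t.tn / (t.tn + t.fn),      # negative predictive value
        "dlr_pos": sens / (1 - spec),     # positive diagnostic likelihood ratio
        "dlr_neg": (1 - sens) / spec,     # negative diagnostic likelihood ratio
    }


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=80, fn=20, fp=30, tn=170)
measures = accuracy_measures(table)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The function does not guard against empty rows or columns of the table (division by zero); a real analysis would also report confidence intervals, which the book covers.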

Introduction
Measures of Accuracy for Binary Tests
Comparing Binary Tests and Regression Analysis
The Receiver Operating Characteristic Curve
Estimating the ROC Curve
Covariate Effects on Continuous and Ordinal Tests
Incomplete Data and Imperfect Reference Tests
Study Design and Hypothesis Testing
More Topics and Conclusions
References/Bibliography
Index

Datasets

Study	Reference	Stata File	ASCII File
CASS	Leisenring et al. (2000) Weiner et al. (1979)	est1.dta	est1.csv est1_desc.txt
Pancreatic Ca biomarkers	Wieand et al. (1989)	wiedat2b.dta	wiedat2b.csv wiedat2b_desc.txt
Ultrasound for hepatic mets	Tosteson and Begg (1988)	tostbegg2.dta	tostbegg2.csv tostbegg2_desc.txt
CARET PSA	Etzioni et al. (1999)	psa2b.dta	psa2b.csv psa2b_desc.txt
Gene expression array	Pepe et al. (2003)	orchratio2.dta	orchratio2.csv orchratio2_desc.txt
Norton neonatal audiology	Norton et al. (2000)	nnhs.dta	nnhs.csv nnhs_desc.txt
Leisenring neonatal audiology	Leisenring et al. (1997)	lplaudio_b.dta	lplaudio_b.csv lplaudio_b_desc.txt
Prostate Ca - St. Louis	Smith et al. (1997)	psa_dre_v2.dta	psa_dre_v2.csv psa_dre_v2_desc.txt
Stover audiology	Stover et al. (1996)	dp2.dta	dp2.csv dp2_desc.txt
Scintigraphy study	Muller et al. (1989)	mlt1.dta	mlt1.csv mlt1_desc.txt
59 Pap screen studies	Fahey et al. (1995)	fim.dta	fim.csv fim_desc.txt
Prenatal screen data (hypothetical)		hpns.dta	hpns.csv hpns_desc.txt

Stata-format data files can be read with Stata version 8 and above.
Comma-separated ASCII (csv) files include variable names on the first row.
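Because the csv files carry variable names in the first row, they can be read outside Stata with any header-aware CSV reader. The following Python sketch uses an in-memory stand-in for a file. The column names `d` (disease status) and `y` (test result) are assumptions made for illustration; the actual variable names for each dataset are documented in its `_desc.txt` file.

```python
import csv
import io

# Hypothetical stand-in for a downloaded csv file; the column names
# `d` and `y` are assumptions, not the names used in the real files.
sample = "d,y\n1,2.4\n0,0.7\n1,1.9\n0,1.1\n"


def read_rows(handle):
    """Read a header-first CSV into a list of dicts with numeric values."""
    reader = csv.DictReader(handle)  # first row supplies the field names
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = read_rows(io.StringIO(sample))

# Split test results by disease status.
cases = [r["y"] for r in rows if r["d"] == 1]
controls = [r["y"] for r in rows if r["d"] == 0]
print(cases, controls)

# For a real file, open it by name instead, for example:
#   with open("est1.csv", newline="") as fh:
#       rows = read_rows(fh)
# Real files may contain missing or non-numeric fields, which this
# sketch does not handle.
```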

Dataset References

Etzioni R, Pepe M, Longton G, Hu C, Goodman G (1999). Incorporating the time dimension in receiver operating characteristic curves: A case study of prostate cancer. Medical Decision Making 19:242-51.

Fahey MT, Irwig LM, Macaskill P (1995). Meta-analysis of Pap test accuracy. American Journal of Epidemiology 141:680-9.

Leisenring W, Alonzo T, Pepe MS (2000). Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics 56:345-51.

Leisenring W, Pepe MS, Longton G (1997). A marginal regression modelling framework for evaluating medical diagnostic tests. Statistics in Medicine 16:1263-81.

Muller C, Wasserman HJ, Erlank P, Klopper JF, Morkel HR, Ellmann A (1989). Optimisation of density and contrast yielded by multiformat photographic images used for scintigraphy. Physics in Medicine and Biology 34:473-81.

Norton SJ, Gorga MP, Widen JE, Folsom RC, Sininger Y, Cone-Wesson B, Vohr BR, Mascher K, Fletcher K (2000). Identification of neonatal hearing impairment: Evaluation of transient evoked otoacoustic emission, distortion product otoacoustic emission, and auditory brain stem response test performance. Ear and Hearing 21:508-28.

Pepe MS, Longton G, Anderson G, Schummer M (2003). Selecting differentially expressed genes from microarray experiments. Biometrics (in press).

Smith DS, Bullock AD, Catalona WJ (1997). Racial differences in operating characteristics of prostate cancer screening tests. The Journal of Urology 158:1861-66.

Stover L, Gorga MP, Neely T (1996). Towards optimizing the clinical utility of distortion product otoacoustic emission measurements. Journal of the Acoustical Society of America 100:956-967.

Tosteson AN, Begg CB (1988). A general regression methodology for ROC curve estimation. Medical Decision Making 8:204-15.

Weiner DA, Ryan TJ, McCabe CH, Kennedy JW, Schloss M, Tristani F, Chaitman BR, Fisher LD (1979). Exercise stress testing. Correlations among history of angina, ST-segment response and prevalence of coronary-artery disease in the Coronary Artery Surgery Study (CASS). New England Journal of Medicine 301(5):230-5.

Wieand S, Gail MH, James BR, James KL (1989). A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76:585-92.

Programs

Downloadable Stata programs and help files

Most programs require Stata version 7 or higher; some require version 8 or 9 as updates and additions become available.

emroc.ado, emroc.hlp - Plot the empirical ROC curve and optionally return plot coordinates. Calculate a nonparametric estimate of the area under the ROC curve (AUC) or partial AUC.
dfroc.ado, dfroc.hlp - Calculate the distribution-free estimator of the ROC curve within a GLM binary regression framework. Obtain bootstrap standard error estimates for the binormal ROC parameters and corresponding AUC.
aucbs.ado, aucbs.hlp - Calculate a nonparametric estimate of the area under the ROC curve (AUC) and bootstrapped standard error estimates. Optionally calculate the partial AUC or empirical ROC(t) for specified t and corresponding standard error estimates. With data for two test measures, difference statistics for the AUC, pAUC, and ROC(t) are calculated.
rocsize.ado, rocsize.hlp - Determine power for a one-sample screening study; continuous data.
aucsize.ado, aucsize.hlp - Determine power for a one-sample screening study based on ROC area under the curve (AUC) improvement.
scrsize.ado, scrsize.hlp - Determine power for a one-sample screening study; binary test outcome data.
binscrn1.ado, binscrn1.hlp - Calculates summary screening measures for a test with binary outcome.
binscrn2.ado, binscrn2.hlp - Comparison of 2 binary screening tests; for unpaired data.
binscrn3.ado, binscrn3.hlp - Comparison of 2 binary screening tests; for paired data.
lrreg.ado, lrreg_ll.ado, lrreg.hlp - Diagnostic Likelihood Ratio (DLR) regression.
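To make the quantities behind the ROC programs above concrete, here is a minimal Python sketch of the empirical ROC curve, the nonparametric (Mann-Whitney) AUC, and a bootstrap standard error. It is not a port of emroc or aucbs: the function names are hypothetical, the data are made up, and the bootstrap scheme (resampling cases and controls separately) is an assumption that may differ from what the Stata programs do. The sketch assumes larger test values indicate disease.

```python
import random


def empirical_roc(cases, controls):
    """(FPF, TPF) points for the rule 'positive if y >= c' at each observed c."""
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        tpf = sum(y >= c for y in cases) / len(cases)
        fpf = sum(y >= c for y in controls) / len(controls)
        points.append((fpf, tpf))
    return points


def auc_mann_whitney(cases, controls):
    """Proportion of case-control pairs ordered correctly; ties count 1/2."""
    total = 0.0
    for yd in cases:
        for yn in controls:
            total += 1.0 if yd > yn else 0.5 if yd == yn else 0.0
    return total / (len(cases) * len(controls))


def auc_trapezoid(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def bootstrap_se(cases, controls, reps=500, seed=1):
    """Bootstrap SE of the AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        bc = rng.choices(cases, k=len(cases))
        bn = rng.choices(controls, k=len(controls))
        stats.append(auc_mann_whitney(bc, bn))
    mean = sum(stats) / reps
    return (sum((s - mean) ** 2 for s in stats) / (reps - 1)) ** 0.5


# Made-up test results, for illustration only.
cases = [2.4, 1.9, 3.1, 0.9, 2.0]
controls = [0.7, 1.1, 2.0, 0.5, 1.3]

auc = auc_mann_whitney(cases, controls)
auc_trap = auc_trapezoid(empirical_roc(cases, controls))
se = bootstrap_se(cases, controls)
print(f"AUC = {auc:.2f} (trapezoid {auc_trap:.2f}), bootstrap SE = {se:.3f}")
```

The trapezoidal area under the empirical curve and the Mann-Whitney statistic agree (both 0.82 for these data), which is why either can serve as the nonparametric AUC estimate.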

Utility programs used by text example do-files

binormroc.ado, binormroc.hlp - Plots the binormal ROC for specified normal case and control distributions of a test measure.
bvnellip.ado, bvnellip.hlp - Calculates a confidence ellipse for the joint distribution of 2 parameters. The parameters are assumed to have a bivariate normal distribution.
semt_profile.ado - Specifies data directory path and log file target path for text example do-files.
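The binormal ROC curve plotted by binormroc arises when test results are normally distributed in both cases and controls. Under that model, ROC(t) = Φ(a + b Φ⁻¹(t)) with a = (μ_D − μ_N)/σ_D and b = σ_N/σ_D, and AUC = Φ(a/√(1 + b²)). The Python sketch below evaluates these formulas; the function name and parameter values are hypothetical, and it is not a reimplementation of the Stata program.

```python
from statistics import NormalDist


def binormal_roc(mu_d, sd_d, mu_n, sd_n):
    """Return (a, b, roc, auc) for normal case (d) and control (n) distributions.

    Assumes larger test values indicate disease.
    """
    a = (mu_d - mu_n) / sd_d   # binormal intercept
    b = sd_n / sd_d            # binormal slope
    std = NormalDist()         # standard normal

    def roc(t):
        # True positive fraction at false positive fraction t, 0 < t < 1.
        return std.cdf(a + b * std.inv_cdf(t))

    auc = std.cdf(a / (1 + b * b) ** 0.5)
    return a, b, roc, auc


# Hypothetical parameters: cases ~ N(1, 1), controls ~ N(0, 1).
a, b, roc, auc = binormal_roc(mu_d=1.0, sd_d=1.0, mu_n=0.0, sd_n=1.0)
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"ROC({t}) = {roc(t):.3f}")
print(f"a = {a}, b = {b}, AUC = {auc:.3f}")
```

`NormalDist.inv_cdf` is defined only on the open interval (0, 1), so the endpoints t = 0 and t = 1 must be handled separately (ROC(0) = 0 and ROC(1) = 1).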

Examples

Stata do-files for selected text examples and corresponding figures

Chapter 2

Example 2.3 (figure 2.2)

Chapter 3

Chapter 4

Example 4.1 (figure 4.4)

Chapter 5

Chapter 6

Example 6.1 (figure 6.1)
Example 6.6 (incl figure 6.5)
Example 6.7 (incl figure 6.6)
Example 6.11
Example 6.12 (figure 6.8)
Example 6.13 (figure 6.9)

Chapter 7

Chapter 9

Example 9.1 (incl figure 9.1)

Book Errata

errata.pdf

Fred Hutchinson Cancer Center | 1100 Fairview Ave. N., Seattle, WA 98109
© 2025 Fred Hutchinson Cancer Center, a 501(c)(3) nonprofit organization.

Diagnostic and Biomarkers Statistical (DABS) Center