Datasets

The published datasets listed below are available for download. Users may use them to practice implementing statistical techniques. We welcome contributions of additional datasets to this resource.

Study | Reference | Stata file | ASCII files
--- | --- | --- | ---
CASS | Leisenring et al. (2000); Weiner et al. (1979) | est1.dta | est1.csv, est1_desc.txt
Pancreatic Ca biomarkers | Wieand et al. (1989) | wiedat2b.dta | wiedat2b.csv, wiedat2b_desc.txt
Ultrasound for hepatic mets | Tosteson and Begg (1988) | tostbegg2.dta | tostbegg2.csv, tostbegg2_desc.txt
CARET PSA | Etzioni et al. (1999) | psa2b.dta | psa2b.csv, psa2b_desc.txt
Gene expression array | Pepe et al. (2003) | orchratio2.dta | orchratio2.csv, orchratio2_desc.txt
Norton neonatal audiology | Norton et al. (2000) | nnhs2.dta | nnhs2.csv, nnhs2_desc.txt
Leisenring neonatal audiology | Leisenring et al. (1997) | lplaudio_b.dta | lplaudio_b.csv, lplaudio_b_desc.txt
Prostate Ca - St. Louis | Smith et al. (1997) | psa_dre_v2.dta | psa_dre_v2.csv, psa_dre_desc_v2_.txt
Stover audiology | Stover et al. (1996) | dp2.dta | dp2.csv, dp2_desc.txt
Scintigraphy study | Muller et al. (1989) | mlt1.dta | mlt1.csv, mlt1_desc.txt
59 Pap screen studies | Fahey et al. (1995) | fim.dta | fim.csv, fim_desc.txt
Prenatal screen data (hypothetical) | | hpns.dta | hpns.csv, hpns_desc.txt
Ovarian Ca markers (hypothetical) | | ocdata_b.dta | ocdata_b.csv, ocdata_b_desc.txt
Covariate adjustment datasets: Figure 1, scenario 1 | Janes et al. (2009) | | .csv file and .txt file
Covariate adjustment datasets: Figure 1, scenario 2 | Janes et al. (2009) | | .csv file and .txt file
ROC regression dataset: Figure 4 | Janes et al. (2009) | | .csv file and .txt file
Simulated AKI data | Pepe et al. (2007, 2008) | aki_sim.dta | aki_sim.csv, aki_sim_desc.txt
Two frameworks for ordinal ratings | Morris et al. (2010) | two_marker_sim.dta | two_marker_sim.csv, two_marker_sim_desc.txt
Multiple Gene Risk Prediction | Pepe, Gu, Morris (2010) | modelA.dta, modelB.dta | modelA.csv, modelB.csv
Simulated Risk Reclassification dataset | Pepe (2011) | risk_reclass_b.dta | risk_reclass_b.csv, risk_reclass_b_desc.txt

Stata-format (.dta) data files can be read by Stata version 8 and later.
Comma-separated ASCII (.csv) files include the variable names in the first row.
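Because each .csv file carries its variable names in the first row, it can be loaded outside Stata with a header-aware reader. The following is a minimal sketch, assuming Python 3 and only the standard library; the function name load_csv_dataset is hypothetical and is not part of any software distributed with these datasets. Values are returned as strings, since the coding of missing values in each file is not stated here; consult the matching _desc.txt file before converting columns to numbers.

```python
import csv
from pathlib import Path


def load_csv_dataset(path):
    """Read one comma-separated dataset file (hypothetical helper).

    Assumes the first row holds the variable names, as stated above.
    Returns (variable_names, rows), where each row is a dict keyed by
    variable name. All values are left as strings.
    """
    with Path(path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is filled in from the header row during reading
        return reader.fieldnames, rows
```

For example, `names, rows = load_csv_dataset("wiedat2b.csv")` gives the variable names and one dict per observation. Users who prefer pandas can call pandas.read_csv on the .csv files or pandas.read_stata on the .dta files instead.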


Dataset References

Etzioni R, Pepe M, Longton G, Hu C, Goodman G (1999). Incorporating the time dimension in receiver operating characteristic curves: A case study of prostate cancer. Medical Decision Making 19:242-51.

Fahey MT, Irwig LM, Macaskill P (1995). Meta-analysis of Pap test accuracy. American Journal of Epidemiology 141:680-9.

Janes H, Longton G, Pepe MS (2009). Accommodating Covariates in ROC Analysis. Stata Journal 9(1):17-39.

Leisenring W, Alonzo T, Pepe MS (2000). Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics 56:345-51.

Leisenring W, Pepe MS, Longton G (1997). A marginal regression modelling framework for evaluating medical diagnostic tests. Statistics in Medicine 16:1263-81.

Morris DE, Pepe MS, Barlow WE (2010). Contrasting two frameworks for ROC analysis of ordinal ratings. Medical Decision Making (in press).

Muller C, Wasserman HJ, Erlank P, Klopper JF, Morkel HR, Ellmann A (1989). Optimisation of density and contrast yielded by multiformat photographic images used for scintigraphy. Physics in Medicine and Biology 34:473-81.

Norton SJ, Gorga MP, Widen JE, Folsom RC, Sininger Y, Cone-Wesson B, Vohr BR, Mascher K, Fletcher K (2000). Identification of neonatal hearing impairment: Evaluation of transient evoked otoacoustic emission, distortion product otoacoustic emission, and auditory brain stem response test performance. Ear and Hearing 21:508-28.

Pepe MS (2011). Problems with Risk Reclassification Methods for Evaluating Prediction Models. American Journal of Epidemiology 173:1327-1335.

Pepe MS (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction (Oxford Statistical Science Series). Oxford University Press.

Pepe MS, Gu W, Morris DE (2010). The Potential of Genes and Other Markers to Inform about Risk. Cancer Epidemiology, Biomarkers and Prevention 19(3):655-665.

Pepe MS, Longton G, Anderson G, Schummer M (2003). Selecting differentially expressed genes from microarray experiments. Biometrics 59:133-42.

Pepe MS, Longton G, Janes H (2007). Estimation and Comparison of Receiver Operating Characteristic Curves. Stata Journal 9(1):1.

Pepe M, Zheng Y, Jin Y, Huang Y, Parikh C, Levy W (2008). Evaluating the ROC performance of markers for future events. Lifetime Data Analysis 14(1):86-113.

Smith DS, Bullock AD, Catalona WJ (1997). Racial differences in operating characteristics of prostate cancer screening tests. The Journal of Urology 158:1861-66.

Stover L, Gorga MP, Neely T (1996). Towards optimizing the clinical utility of distortion product otoacoustic emission measurements. Journal of the Acoustical Society of America 100:956-967.

Tosteson AN, Begg CB (1988). A general regression methodology for ROC curve estimation. Medical Decision Making 8:204-15.

Weiner DA, Ryan TJ, McCabe CH, Kennedy JW, Schloss M, Tristani F, Chaitman BR, Fisher LD (1979). Exercise stress testing. Correlations among history of angina, ST-segment response and prevalence of coronary-artery disease in the Coronary Artery Surgery Study (CASS). New England Journal of Medicine 301(5):230-5.

Wieand S, Gail MH, James BR, James KL (1989). A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76:585-92.