comproc {pcvsuite} | R Documentation |
Estimate and compare ROC summary statistics between two markers. Choices for summary statistics are: ROC(f), the true positive rate corresponding to false positive rate f; ROC^(-1)(t), the false positive rate corresponding to true positive rate t; AUC, the area under the ROC curve; and pAUC(f), the partial area under the ROC curve from 0 to f. Algorithms use the percentile value formulation of the ROC curve. When percentile values are calculated empirically, the estimates are the standard non-parametric estimates of the ROC summary indices. Optional covariate adjustment is available.
comproc(dataset = NULL, d, markers, auc = FALSE, pauc = NULL, roc = NULL, rocinv = NULL, pvcmeth = "empirical", tiecorr = FALSE, adjcov = NULL, adjmodel = "stratified", nsamp = 1000, nobstrap = FALSE, noccsamp = FALSE, nostsamp = FALSE, cluster = NULL, resfile = NULL, replace = FALSE, level = 95)
dataset |
optional character string specifying the name of the dataset to be used for analysis. |
d |
character string specifying the name of the 0/1 outcome vector. |
markers |
vector of character strings specifying the names of the test markers/variables. |
auc |
logical. If TRUE, compare markers with respect to the area under the curve (AUC). This is the default if no summary statistics are specified. |
pauc |
specify FPR, f, such that the markers are compared with respect to the partial area under the curve (pAUC) for the false positive range FPR < f. The argument must be between 0 and 1. A tie correction is included in the percentile value (PV) calculation if this option is among the specified summary statistics and pvcmeth = "empirical". |
roc |
specify FPR, f, such that the markers are compared with respect to the ROC at the specified FPR = f. The argument must be between 0 and 1. |
rocinv |
specify TPR, t, such that the markers are compared with respect to the inverse ROC, ROC^(-1)(t), at the specified TPR = t. The argument must be between 0 and 1. |
pvcmeth |
character string specifying the PV calculation method as "empirical" (default) or "normal". "empirical" uses the empirical distribution of the test measure among controls (D=0) as the reference distribution for the calculation of case PVs. The PV for the case measure y_i is the proportion of control measures that are smaller than y_i. "normal" models the test measure among controls with a normal distribution. The PV for the case measure y_i is the standard normal cumulative distribution function evaluated at (y_i - mean)/sd, where the mean and the standard deviation (sd) are calculated from the control sample. |
tiecorr |
logical. If FALSE (default), no correction for ties. If TRUE, it indicates that a correction for ties between case and control values is included in the empirical PV calculation. The correction is relevant only in calculating summary indices, such as the area under the ROC curve. The tie-corrected PV for a case with the marker value y_i is the proportion of control values Y_Db < y_i plus one half the proportion of control values Y_Db = y_i, where Y_Db denotes controls. By default, the PV calculation includes only the first term, i.e. the proportion of control values Y_Db < y_i. This option applies only to the empirical PV calculation method. |
adjcov |
character string vector specifying covariates to adjust for. |
adjmodel |
character string specifying how the covariate adjustment is to be done: "stratified" (default), "linear", "oprobit" (ordered probit), or "ologit" (ordered logit). If "stratified", PVs are calculated separately for each stratum defined by adjcov. This is the default if adjmodel is not specified and adjcov is. Each case-containing stratum must include at least two controls. Strata that do not include cases are excluded from calculations. "linear" fits a linear regression of the marker on the adjustment covariates among controls. Standardized residuals based on this fitted linear model are used in place of the marker values for cases and controls. "oprobit" calculates PVs based on the fit of an ordered probit regression model of the marker on the adjustment covariates among controls. "ologit" calculates PVs based on the fit of an ordered logit regression model of the marker on the adjustment covariates among controls. "oprobit" and "ologit" assume that markers consist of ordinal-valued marker variables. |
nsamp |
number of bootstrap samples to be drawn for estimating sampling variability of summary measures; default is nsamp=1000. |
nobstrap |
logical. If TRUE, omit bootstrap sampling and estimation of standard errors and CIs. If nsamp is also specified, nobstrap overrides it. Default is FALSE. |
noccsamp |
logical. If TRUE, bootstrap samples are drawn from the combined sample (cohort sampling) rather than sampling separately from cases and controls (case-control sampling); default is FALSE (case-control sampling). |
nostsamp |
logical. If TRUE, bootstrap samples are drawn without respect to covariate strata. By default (FALSE), samples are drawn from within covariate strata when stratified covariate adjustment is requested via the adjcov and adjmodel options. |
cluster |
character string specifying variables that identify bootstrap resampling clusters. |
resfile |
character string specifying the filename in which to save bootstrap results for the included statistics. The .txt file is called [filename].txt if a single marker is specified, or [filename]#.txt for the #th marker if more than one marker is included in markers. |
replace |
logical. If TRUE, overwrite the specified bootstrap results file if it already exists; default is FALSE. |
level |
specify confidence level for CIs as a percentage; default is level=95. |
comproc compares two continuous marker or test variables with respect to one or more ROC statistics: the AUC, the pAUC for FPR < f, the ROC at FPR = f, and the inverse ROC at TPR = t. d is the 0/1 outcome indicator variable.
Alternatively, a single marker variable can be specified, in which case the requested ROC statistics are returned without comparison statistics.
All ROC statistics are calculated by using PVs of the disease case measures relative to the corresponding marker distribution among controls (Pepe and Longton (2005), Huang and Pepe (in press)).
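The PV formulation can be illustrated with a minimal base-R sketch. This is not the package's internal code: the helper names `pv_empirical`, `pv_normal`, and `roc_at`, and the toy data, are hypothetical. It assumes the definitions given for pvcmeth and tiecorr above, takes the AUC as the mean case PV, and takes ROC(f) as the proportion of cases whose PV is at least 1 - f.

```r
# Toy data (hypothetical): marker values for controls (D = 0) and cases (D = 1).
controls <- c(1, 2, 2, 3, 4)
cases    <- c(2, 3, 5, 5)

# Empirical PV: proportion of control values strictly below each case value,
# optionally adding half the proportion of tied control values.
pv_empirical <- function(cases, controls, tiecorr = FALSE) {
  sapply(cases, function(y) {
    pv <- mean(controls < y)
    if (tiecorr) pv <- pv + 0.5 * mean(controls == y)
    pv
  })
}

# Normal PV: standard normal CDF of the case value, standardized by the
# control mean and standard deviation.
pv_normal <- function(cases, controls) {
  pnorm((cases - mean(controls)) / sd(controls))
}

# ROC(f) under the assumption stated above: share of cases with PV >= 1 - f.
roc_at <- function(pv, f) mean(pv >= 1 - f)

pv_raw <- pv_empirical(cases, controls)                  # 0.2 0.6 1.0 1.0
pv_tc  <- pv_empirical(cases, controls, tiecorr = TRUE)  # 0.4 0.7 1.0 1.0
auc    <- mean(pv_tc)                                    # 0.775
roc50  <- roc_at(pv_raw, 0.5)                            # 0.75
```

With the tie correction, the mean case PV equals the Mann-Whitney form of the empirical AUC: here 15.5 concordant-or-half-tied pairs out of 20.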
Optional covariate adjustment can be achieved either by stratification or with a linear regression approach (Janes and Pepe (2008); Janes and Pepe (2009)). Ordered regression covariate adjustment options are available if the marker measures are ordinal (Morris, Pepe, Barlow (in press)).
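The stratified and linear adjustments can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's implementation: `pv_stratified` and `pv_linear` are hypothetical helpers, the data frame is made up, and the residuals are standardized by the residual standard error of the control-only fit.

```r
# Hypothetical toy data: outcome d, marker y, and one binary covariate.
dat <- data.frame(
  d   = c(0, 0, 0, 1, 1, 0, 0, 0, 1),
  y   = c(1, 2, 3, 2.5, 4, 11, 12, 13, 12.5),
  sex = c("f", "f", "f", "f", "f", "m", "m", "m", "m")
)

# Stratified adjustment: each case is compared only with the controls
# that share its covariate stratum.
pv_stratified <- function(dat, stratum) {
  cases <- dat[dat$d == 1, ]
  pv <- mapply(function(y, s) {
    ref <- dat$y[dat$d == 0 & dat[[stratum]] == s]
    mean(ref < y)
  }, cases$y, cases[[stratum]])
  unname(pv)
}

# Linear adjustment: fit the marker on the covariates among controls only,
# then compute PVs on the standardized residuals of cases and controls.
pv_linear <- function(dat, formula) {
  ctrl <- dat[dat$d == 0, ]
  fit  <- lm(formula, data = ctrl)
  resp <- dat[[all.vars(formula)[1]]]
  z    <- (resp - predict(fit, newdata = dat)) / summary(fit)$sigma
  unname(sapply(z[dat$d == 1], function(zi) mean(z[dat$d == 0] < zi)))
}

pv_unadjusted <- sapply(dat$y[dat$d == 1], function(y) mean(dat$y[dat$d == 0] < y))
pv_strat      <- pv_stratified(dat, "sex")
pv_lin        <- pv_linear(dat, y ~ sex)
```

In this toy example the marker is shifted by 10 units between strata, so the unadjusted PVs mostly reflect the covariate, while both adjusted versions place each case relative to comparable controls.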
Bootstrap standard errors and confidence intervals (CIs) for the requested statistics and marker differences are calculated. Percentile CIs are displayed.
Wald test results for marker comparisons are based on the bootstrap standard errors for the difference between markers.
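The bootstrap inference described above can be sketched for an AUC difference. This is a simplified illustration, not the package's code: the data are simulated, `auc_emp` and `boot_delta` are hypothetical helpers, the number of replicates is kept small for speed, and resampling is done separately within cases and controls (case-control sampling).

```r
set.seed(1)  # make this toy example reproducible

# Simulated data (hypothetical): two markers, y2 more informative than y1.
n0 <- 100; n1 <- 100
d  <- rep(c(0, 1), c(n0, n1))
y1 <- rnorm(n0 + n1, mean = 0.5 * d)
y2 <- rnorm(n0 + n1, mean = 1.0 * d)

# Empirical AUC: mean tie-corrected case PV relative to controls.
auc_emp <- function(y, d) {
  ctrl <- y[d == 0]
  mean(sapply(y[d == 1], function(v) mean(ctrl < v) + 0.5 * mean(ctrl == v)))
}

# One bootstrap draw: resample controls and cases separately, keeping both
# markers from the same resampled subjects so the comparison stays paired.
boot_delta <- function() {
  idx <- c(sample(which(d == 0), replace = TRUE),
           sample(which(d == 1), replace = TRUE))
  auc_emp(y2[idx], d[idx]) - auc_emp(y1[idx], d[idx])
}

nsamp  <- 200
deltas <- replicate(nsamp, boot_delta())

delta_hat <- auc_emp(y2, d) - auc_emp(y1, d)    # estimated difference
se_delta  <- sd(deltas)                         # bootstrap standard error
ci        <- quantile(deltas, c(0.025, 0.975))  # 95% percentile CI
z         <- delta_hat / se_delta               # Wald statistic
p_value   <- 2 * pnorm(-abs(z))                 # two-sided Wald p-value
```

Resampling subjects rather than marker values preserves the correlation between the two markers, which is what makes the standard error of the difference appropriate for a paired comparison.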
A companion program for the Stata software package is available. Detailed descriptions of the methods and algorithms are provided in two articles in the Stata Journal, which can be obtained upon request from Gary Longton (glongton@fhcrc.org). Corresponding articles for this program are forthcoming.
List containing properties for requested summary statistics, where stat is one or more of auc, pauc, roc, or rocinv. Returned list items include the following:
[stat]1 |
statistic estimate for first marker |
[stat]2 |
statistic estimate for second marker |
[stat]delta |
estimate difference, [stat]2 - [stat]1 |
se_[stat]1 |
bootstrap standard-error estimate for first marker statistic |
se_[stat]2 |
bootstrap standard-error estimate for second marker statistic |
se_[stat]delta |
bootstrap standard-error estimate for the difference, [stat]2 - [stat]1 |
Aasthaa Bansal, University of Washington, Seattle, WA. abansal@u.washington.edu
Daryl Morris, University of Washington, Seattle, WA. darylm@u.washington.edu
Gary Longton, Fred Hutchinson Cancer Research Center, Seattle, WA. glongton@fhcrc.org
Margaret Pepe, Fred Hutchinson Cancer Research Center and University of Washington, Seattle, WA. mspepe@u.washington.edu
Holly Janes, Fred Hutchinson Cancer Research Center and University of Washington, Seattle, WA. hjanes@fhcrc.org
Dodd, L., Pepe, M.S. 2003. Partial AUC estimation and regression. Biometrics 59,614–623.
Huang, Y., Pepe, M.S. 2009. Biomarker evaluation using the controls as a reference population. Biostatistics 2,228–44.
Janes, H., Pepe, M.S. 2008. Adjusting for covariates in studies of diagnostic, screening, or prognostic markers: an old concept in a new setting. American Journal of Epidemiology 168,89–97.
Janes, H., Pepe, M.S. 2009. Adjusting for covariate effects on classification accuracy using the covariate-adjusted ROC curve. Biometrika 96,383–398.
Janes, H., Longton, G., Pepe, M.S. 2009. Accommodating covariates in receiver operating characteristic analysis. Stata Journal 9(1),17–39.
Morris, D.E., Pepe, M.S., Barlow, W.E. Contrasting Two Frameworks for ROC Analysis of Ordinal Ratings. Medical Decision Making (in press)
Pepe, M.S., Longton, G. 2005. Standardizing markers to evaluate and compare their performances. Epidemiology 16(5),598–603.
Pepe, M.S., Longton, G., Janes, H. 2009. Estimation and comparison of receiver operating characteristic curves. Stata Journal 9(1),1–16.
Pepe, M.S. 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.
nnhs2 <- read.csv("http://labs.fhcrc.org/pepe/book/data/nnhs2.csv", header = TRUE, sep = ",")
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"))
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"), pauc=0.10)
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"), auc=TRUE, pauc=0.10, level=90)
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"), auc=TRUE, pauc=0.20, roc=0.20, nsamp=5000)
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"), pauc=0.20, pvcmeth="normal", resfile="rfile1", replace=TRUE)
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"), pvcmeth="normal", noccsamp=TRUE, cluster="y1")
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"), adjcov=c("currage","gender"))
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"), adjcov=c("currage","gender"), adjmodel="linear")
comproc(dataset="nnhs2", d="d", markers=c("y1","y2"), adjcov="currage", adjmodel="linear", pvcmeth="normal")