Statistical inference for diagnostic test accuracy studies with multiple comparisons

Max Westphal; Antonia Zapf

doi:10.1177/09622802241236933

Standard

Statistical inference for diagnostic test accuracy studies with multiple comparisons. / Westphal, Max; Zapf, Antonia.

In: STAT METHODS MED RES, Vol. 33, No. 4, 04.2024, p. 669-680.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Westphal, M & Zapf, A 2024, 'Statistical inference for diagnostic test accuracy studies with multiple comparisons', STAT METHODS MED RES, vol. 33, no. 4, pp. 669-680. https://doi.org/10.1177/09622802241236933

APA

Westphal, M., & Zapf, A. (2024). Statistical inference for diagnostic test accuracy studies with multiple comparisons. STAT METHODS MED RES, 33(4), 669-680. https://doi.org/10.1177/09622802241236933

Vancouver

Westphal M, Zapf A. Statistical inference for diagnostic test accuracy studies with multiple comparisons. STAT METHODS MED RES. 2024 Apr;33(4):669-680. https://doi.org/10.1177/09622802241236933

Bibtex

@article{f5a73b4c132042519ee7b7fe3e87dbf3,

title = "Statistical inference for diagnostic test accuracy studies with multiple comparisons",

abstract = "Diagnostic accuracy studies assess the sensitivity and specificity of a new index test in relation to an established comparator or the reference standard. The development and selection of the index test are usually assumed to be conducted prior to the accuracy study. In practice, this is often violated, for instance, if the choice of the (apparently) best biomarker, model or cutpoint is based on the same data that is used later for validation purposes. In this work, we investigate several multiple comparison procedures which provide family-wise error rate control for the emerging multiple testing problem. Due to the nature of the co-primary hypothesis problem, conventional approaches for multiplicity adjustment are too conservative for the specific problem and thus need to be adapted. In an extensive simulation study, five multiple comparison procedures are compared with regard to statistical error rates in least-favourable and realistic scenarios. This covers parametric and non-parametric methods and one Bayesian approach. All methods have been implemented in the new open-source R package cases which allows us to reproduce all simulation results. Based on our numerical results, we conclude that the parametric approaches (maxT and Bonferroni) are easy to apply but can have inflated type I error rates for small sample sizes. The two investigated Bootstrap procedures, in particular the so-called pairs Bootstrap, allow for a family-wise error rate control in finite samples and in addition have a competitive statistical power. Keywords: Diagnosis, medical testing, multiple testing, model selection, prediction, prognosis",

keywords = "Bayes Theorem, Data Interpretation, Statistical, Computer Simulation, Sample Size, Diagnostic Tests, Routine",

author = "Max Westphal and Antonia Zapf",

year = "2024",

month = apr,

doi = "10.1177/09622802241236933",

language = "English",

volume = "33",

pages = "669--680",

journal = "STAT METHODS MED RES",

issn = "0962-2802",

publisher = "SAGE Publications",

number = "4",

}

RIS

TY - JOUR

T1 - Statistical inference for diagnostic test accuracy studies with multiple comparisons

AU - Westphal, Max

AU - Zapf, Antonia

PY - 2024/4

Y1 - 2024/4

N2 - Diagnostic accuracy studies assess the sensitivity and specificity of a new index test in relation to an established comparator or the reference standard. The development and selection of the index test are usually assumed to be conducted prior to the accuracy study. In practice, this is often violated, for instance, if the choice of the (apparently) best biomarker, model or cutpoint is based on the same data that is used later for validation purposes. In this work, we investigate several multiple comparison procedures which provide family-wise error rate control for the emerging multiple testing problem. Due to the nature of the co-primary hypothesis problem, conventional approaches for multiplicity adjustment are too conservative for the specific problem and thus need to be adapted. In an extensive simulation study, five multiple comparison procedures are compared with regard to statistical error rates in least-favourable and realistic scenarios. This covers parametric and non-parametric methods and one Bayesian approach. All methods have been implemented in the new open-source R package cases which allows us to reproduce all simulation results. Based on our numerical results, we conclude that the parametric approaches (maxT and Bonferroni) are easy to apply but can have inflated type I error rates for small sample sizes. The two investigated Bootstrap procedures, in particular the so-called pairs Bootstrap, allow for a family-wise error rate control in finite samples and in addition have a competitive statistical power. Keywords: Diagnosis, medical testing, multiple testing, model selection, prediction, prognosis

KW - Bayes Theorem

KW - Data Interpretation, Statistical

KW - Computer Simulation

KW - Sample Size

KW - Diagnostic Tests, Routine

U2 - 10.1177/09622802241236933

DO - 10.1177/09622802241236933

M3 - Journal article

C2 - 38490184

VL - 33

SP - 669

EP - 680

JO - STAT METHODS MED RES

JF - STAT METHODS MED RES

SN - 0962-2802

IS - 4

ER -