Impact of cross-calibration methods on the interpretation of a treatment comparison study using 2 depression scales

Standard

Impact of cross-calibration methods on the interpretation of a treatment comparison study using 2 depression scales. / Fischer, Herbert Felix; Wahl, Inka; Fliege, Herbert; Klapp, Burghard F; Rose, Matthias.

In: MED CARE, Vol. 50, No. 4, 1 April 2012, pp. 320-326.

Publications: SCORING: Contribution to journal/newspaper; SCORING: Journal article; Research; Peer-reviewed

Bibtex

@article{3330c6a46a4b4bb687d4b37a2d15514f,
title = "Impact of cross-calibration methods on the interpretation of a treatment comparison study using 2 depression scales",
abstract = "BACKGROUND: Many questionnaires assessing depressive symptoms are available. Most of these questionnaires are constructed based on classical test theory, making comparisons of individual scores difficult. Item response theory (IRT) allows the comparison of scores from different instruments. In this study, the impact of IRT-based cross-calibration methods on the results of a treatment outcome study was evaluated using 2 instruments. METHODS: Data collected during admission and discharge procedures from 1066 inpatients in 2 psychosomatic clinics using different depression measures were analyzed. To achieve comparability across the applied depression measures, we used an IRT-based conversion table to transform scores from one instrument's scale to the other. Latent trait values were also estimated using different instruments in each clinic. We compared these methods to the traditional approach of using the same instrument in both clinics and examined their effects on the statistical analyses. RESULTS: There was no substantial change in the interpretation of the study results when different instruments were used. However, F values, P values, and effect sizes in the analysis of variance changed significantly. This might be attributed to differences in the content or measurement properties of the instruments. Interestingly, no difference was observed between use of transformed sum scores and latent trait values. CONCLUSIONS: IRT cross-calibration methods are a convenient way to enhance the comparability of questionnaire data in applied clinical settings but seem not to be able to overcome differences in measurement properties of the instruments. As these differences can lead to biased results, there is a need for further research into more advanced techniques.",
keywords = "Adult, Calibration, Data Interpretation, Statistical, Depression, Female, Humans, Male, Psychiatric Status Rating Scales, Psychometrics, Questionnaires, Severity of Illness Index, Treatment Outcome",
author = "Fischer, {Herbert Felix} and Inka Wahl and Herbert Fliege and Klapp, {Burghard F} and Matthias Rose",
year = "2012",
month = apr,
day = "1",
doi = "10.1097/MLR.0b013e31822945b4",
language = "English",
volume = "50",
pages = "320--326",
journal = "MED CARE",
issn = "0025-7079",
publisher = "Lippincott Williams and Wilkins",
number = "4",

}

RIS

TY - JOUR

T1 - Impact of cross-calibration methods on the interpretation of a treatment comparison study using 2 depression scales

AU - Fischer, Herbert Felix

AU - Wahl, Inka

AU - Fliege, Herbert

AU - Klapp, Burghard F

AU - Rose, Matthias

PY - 2012/4/1

Y1 - 2012/4/1

AB - BACKGROUND: Many questionnaires assessing depressive symptoms are available. Most of these questionnaires are constructed based on classical test theory, making comparisons of individual scores difficult. Item response theory (IRT) allows the comparison of scores from different instruments. In this study, the impact of IRT-based cross-calibration methods on the results of a treatment outcome study was evaluated using 2 instruments. METHODS: Data collected during admission and discharge procedures from 1066 inpatients in 2 psychosomatic clinics using different depression measures were analyzed. To achieve comparability across the applied depression measures, we used an IRT-based conversion table to transform scores from one instrument's scale to the other. Latent trait values were also estimated using different instruments in each clinic. We compared these methods to the traditional approach of using the same instrument in both clinics and examined their effects on the statistical analyses. RESULTS: There was no substantial change in the interpretation of the study results when different instruments were used. However, F values, P values, and effect sizes in the analysis of variance changed significantly. This might be attributed to differences in the content or measurement properties of the instruments. Interestingly, no difference was observed between use of transformed sum scores and latent trait values. CONCLUSIONS: IRT cross-calibration methods are a convenient way to enhance the comparability of questionnaire data in applied clinical settings but seem not to be able to overcome differences in measurement properties of the instruments. As these differences can lead to biased results, there is a need for further research into more advanced techniques.

KW - Adult

KW - Calibration

KW - Data Interpretation, Statistical

KW - Depression

KW - Female

KW - Humans

KW - Male

KW - Psychiatric Status Rating Scales

KW - Psychometrics

KW - Questionnaires

KW - Severity of Illness Index

KW - Treatment Outcome

U2 - 10.1097/MLR.0b013e31822945b4

DO - 10.1097/MLR.0b013e31822945b4

M3 - SCORING: Journal article

C2 - 22422054

VL - 50

SP - 320

EP - 326

JO - MED CARE

JF - MED CARE

SN - 0025-7079

IS - 4

ER -