Impact of cross-calibration methods on the interpretation of a treatment comparison study using 2 depression scales
Standard
Impact of cross-calibration methods on the interpretation of a treatment comparison study using 2 depression scales. / Fischer, Herbert Felix; Wahl, Inka; Fliege, Herbert; Klapp, Burghard F; Rose, Matthias.
In: MED CARE, Vol. 50, No. 4, 01.04.2012, p. 320-6.
Research output: SCORING: Contribution to journal › SCORING: Journal article › Research › peer-review
RIS
TY - JOUR
T1 - Impact of cross-calibration methods on the interpretation of a treatment comparison study using 2 depression scales
AU - Fischer, Herbert Felix
AU - Wahl, Inka
AU - Fliege, Herbert
AU - Klapp, Burghard F
AU - Rose, Matthias
PY - 2012/4/1
Y1 - 2012/4/1
AB - BACKGROUND: Many questionnaires assessing depressive symptoms are available. Most of these questionnaires are constructed based on classical test theory, making comparisons of individual scores difficult. Item response theory (IRT) allows the comparison of scores from different instruments. In this study, the impact of IRT-based cross-calibration methods on the results of a treatment outcome study was evaluated using 2 instruments. METHODS: Data collected during admission and discharge procedures from 1066 inpatients in 2 psychosomatic clinics using different depression measures were analyzed. To achieve comparability across the applied depression measures, we used an IRT-based conversion table to transform scores from one instrument's scale to the other. Latent trait values were also estimated using different instruments in each clinic. We compared these methods to the traditional approach of using the same instrument in both clinics and examined their effects on the statistical analyses. RESULTS: There was no substantial change in the interpretation of the study results when different instruments were used. However, F values, P values, and effect sizes in the analysis of variance changed significantly. This might be attributed to differences in the content or measurement properties of the instruments. Interestingly, no difference was observed between use of transformed sum scores and latent trait values. CONCLUSIONS: IRT cross-calibration methods are a convenient way to enhance the comparability of questionnaire data in applied clinical settings but seem not to be able to overcome differences in measurement properties of the instruments. As these differences can lead to biased results, there is a need for further research into more advanced techniques.
KW - Adult
KW - Calibration
KW - Data Interpretation, Statistical
KW - Depression
KW - Female
KW - Humans
KW - Male
KW - Psychiatric Status Rating Scales
KW - Psychometrics
KW - Questionnaires
KW - Severity of Illness Index
KW - Treatment Outcome
U2 - 10.1097/MLR.0b013e31822945b4
DO - 10.1097/MLR.0b013e31822945b4
M3 - SCORING: Journal article
C2 - 22422054
VL - 50
SP - 320
EP - 326
JO - MED CARE
JF - MED CARE
SN - 0025-7079
IS - 4
ER -