Statistical model building: Background "knowledge" based on inappropriate preselection causes misspecification

Standard

Statistical model building: Background "knowledge" based on inappropriate preselection causes misspecification. / Hafermann, Lorena; Becher, Heiko; Herrmann, Carolin; Klein, Nadja; Heinze, Georg; Rauch, Geraldine.

In: BMC MED RES METHODOL, Vol. 21, No. 1, 29.09.2021, p. 196.

Research output: SCORING: Contribution to journal > SCORING: Journal article > Research > peer-review

Bibtex

@article{896d0401298a44d5b014a05c6a384f06,
title = "Statistical model building: Background {"}knowledge{"} based on inappropriate preselection causes misspecification",
abstract = "BACKGROUND: Statistical model building requires selection of variables for a model depending on the model's aim. In descriptive and explanatory models, a common recommendation often met in the literature is to include all variables in the model which are assumed or known to be associated with the outcome independent of their identification with data driven selection procedures. An open question is, how reliable this assumed {"}background knowledge{"} truly is. In fact, {"}known{"} predictors might be findings from preceding studies which may also have employed inappropriate model building strategies. METHODS: We conducted a simulation study assessing the influence of treating variables as {"}known predictors{"} in model building when in fact this knowledge resulting from preceding studies might be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered as a {"}known{"} predictor if a predefined number of preceding studies identified it as relevant. RESULTS: Even if several preceding studies identified a variable as a {"}true{"} predictor, this classification is often false positive. Moreover, variables not identified might still be truly predictive. This especially holds true if the preceding studies employed inappropriate selection methods such as univariable selection. CONCLUSIONS: The source of {"}background knowledge{"} should be evaluated with care. Knowledge generated on preceding studies can cause misspecification.",
author = "Lorena Hafermann and Heiko Becher and Carolin Herrmann and Nadja Klein and Georg Heinze and Geraldine Rauch",
note = "{\textcopyright} 2021. The Author(s).",
year = "2021",
month = sep,
day = "29",
doi = "10.1186/s12874-021-01373-z",
language = "English",
volume = "21",
pages = "196",
journal = "BMC MED RES METHODOL",
issn = "1471-2288",
publisher = "BioMed Central Ltd.",
number = "1",

}

RIS

TY - JOUR

T1 - Statistical model building: Background "knowledge" based on inappropriate preselection causes misspecification

AU - Hafermann, Lorena

AU - Becher, Heiko

AU - Herrmann, Carolin

AU - Klein, Nadja

AU - Heinze, Georg

AU - Rauch, Geraldine

N1 - © 2021. The Author(s).

PY - 2021/9/29

Y1 - 2021/9/29

N2 - BACKGROUND: Statistical model building requires selection of variables for a model depending on the model's aim. In descriptive and explanatory models, a common recommendation often met in the literature is to include all variables in the model which are assumed or known to be associated with the outcome independent of their identification with data driven selection procedures. An open question is, how reliable this assumed "background knowledge" truly is. In fact, "known" predictors might be findings from preceding studies which may also have employed inappropriate model building strategies. METHODS: We conducted a simulation study assessing the influence of treating variables as "known predictors" in model building when in fact this knowledge resulting from preceding studies might be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered as a "known" predictor if a predefined number of preceding studies identified it as relevant. RESULTS: Even if several preceding studies identified a variable as a "true" predictor, this classification is often false positive. Moreover, variables not identified might still be truly predictive. This especially holds true if the preceding studies employed inappropriate selection methods such as univariable selection. CONCLUSIONS: The source of "background knowledge" should be evaluated with care. Knowledge generated on preceding studies can cause misspecification.

AB - BACKGROUND: Statistical model building requires selection of variables for a model depending on the model's aim. In descriptive and explanatory models, a common recommendation often met in the literature is to include all variables in the model which are assumed or known to be associated with the outcome independent of their identification with data driven selection procedures. An open question is, how reliable this assumed "background knowledge" truly is. In fact, "known" predictors might be findings from preceding studies which may also have employed inappropriate model building strategies. METHODS: We conducted a simulation study assessing the influence of treating variables as "known predictors" in model building when in fact this knowledge resulting from preceding studies might be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered as a "known" predictor if a predefined number of preceding studies identified it as relevant. RESULTS: Even if several preceding studies identified a variable as a "true" predictor, this classification is often false positive. Moreover, variables not identified might still be truly predictive. This especially holds true if the preceding studies employed inappropriate selection methods such as univariable selection. CONCLUSIONS: The source of "background knowledge" should be evaluated with care. Knowledge generated on preceding studies can cause misspecification.

U2 - 10.1186/s12874-021-01373-z

DO - 10.1186/s12874-021-01373-z

M3 - SCORING: Journal article

C2 - 34587892

VL - 21

SP - 196

JO - BMC MED RES METHODOL

JF - BMC MED RES METHODOL

SN - 1471-2288

IS - 1

ER -