On stability issues in deriving multivariable regression models

Standard

On stability issues in deriving multivariable regression models. / Sauerbrei, Willi; Buchholz, Anika; Boulesteix, Anne-Laure; Binder, Harald.

In: BIOMETRICAL J, Vol. 57, No. 4, 2015, p. 531-555.

Research output: Contribution to journal › Journal article › Research › peer-review

Bibtex

@article{31b1ba3892d64cb2923046a421e8ab7e,
title = "On stability issues in deriving multivariable regression models",
abstract = "In many areas of science where empirical data are analyzed, a task is often to identify important variables with influence on an outcome. Most often this is done by using a variable selection strategy in the context of a multivariable regression model. Using a study on ozone effects in children (n = 496, 24 covariates), we will discuss aspects relevant for deriving a suitable model. With an emphasis on model stability, we will explore and illustrate differences between predictive models and explanatory models, the key role of stopping criteria, and the value of bootstrap resampling (with and without replacement). Bootstrap resampling will be used to assess variable selection stability, to derive a predictor that incorporates model uncertainty, to check for influential points, and to visualize the variable selection process. For the latter two tasks we adapt and extend recent approaches, such as stability paths, to serve our purposes. Based on earlier experiences and on results from the example, we will argue for simpler models and that predictions are usually very similar, irrespective of the selection method used. Important differences exist for the corresponding variances, and the model uncertainty concept helps to protect against serious underestimation of the variance of a predictor derived data-dependently. Results of stability investigations illustrate severe difficulties in the task of deriving a suitable explanatory model. It seems possible to identify a small number of variables with an important and probably true influence on the outcome, but too often several variables are included whose selection may be a result of chance or may depend on a small number of observations.",
author = "Willi Sauerbrei and Anika Buchholz and Anne-Laure Boulesteix and Harald Binder",
note = "{\textcopyright} 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.",
year = "2015",
doi = "10.1002/bimj.201300222",
language = "English",
volume = "57",
pages = "531--555",
journal = "BIOMETRICAL J",
issn = "0323-3847",
publisher = "Wiley-VCH Verlag GmbH",
number = "4",

}

RIS

TY - JOUR

T1 - On stability issues in deriving multivariable regression models

AU - Sauerbrei, Willi

AU - Buchholz, Anika

AU - Boulesteix, Anne-Laure

AU - Binder, Harald

N1 - © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

PY - 2015

Y1 - 2015

AB - In many areas of science where empirical data are analyzed, a task is often to identify important variables with influence on an outcome. Most often this is done by using a variable selection strategy in the context of a multivariable regression model. Using a study on ozone effects in children (n = 496, 24 covariates), we will discuss aspects relevant for deriving a suitable model. With an emphasis on model stability, we will explore and illustrate differences between predictive models and explanatory models, the key role of stopping criteria, and the value of bootstrap resampling (with and without replacement). Bootstrap resampling will be used to assess variable selection stability, to derive a predictor that incorporates model uncertainty, to check for influential points, and to visualize the variable selection process. For the latter two tasks we adapt and extend recent approaches, such as stability paths, to serve our purposes. Based on earlier experiences and on results from the example, we will argue for simpler models and that predictions are usually very similar, irrespective of the selection method used. Important differences exist for the corresponding variances, and the model uncertainty concept helps to protect against serious underestimation of the variance of a predictor derived data-dependently. Results of stability investigations illustrate severe difficulties in the task of deriving a suitable explanatory model. It seems possible to identify a small number of variables with an important and probably true influence on the outcome, but too often several variables are included whose selection may be a result of chance or may depend on a small number of observations.

U2 - 10.1002/bimj.201300222

DO - 10.1002/bimj.201300222

M3 - Journal article

C2 - 25501529

VL - 57

SP - 531

EP - 555

JO - BIOMETRICAL J

JF - BIOMETRICAL J

SN - 0323-3847

IS - 4

ER -
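The abstract's idea of using bootstrap resampling to assess variable selection stability can be illustrated with bootstrap inclusion frequencies. A selection procedure is repeated on many resamples, and the fraction of resamples in which each variable is chosen is recorded. The Python sketch below is not the authors' code. It assumes backward elimination with AIC as the stopping criterion, simulated data in place of the ozone study, and a subsample size of 0.632n when resampling without replacement. The function names (`aic_ols`, `backward_elimination_aic`, `inclusion_frequencies`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def aic_ols(X, y):
    """Gaussian AIC, up to an additive constant, for an OLS fit with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept plus candidate columns
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def backward_elimination_aic(X, y):
    """Hypothetical selection rule: drop variables while AIC keeps improving."""
    selected = list(range(X.shape[1]))
    current = aic_ols(X[:, selected], y)
    while selected:
        # AIC of each model with one remaining variable removed
        trials = [(aic_ols(X[:, [j for j in selected if j != d]], y), d)
                  for d in selected]
        best_aic, best_drop = min(trials)
        if best_aic >= current:  # stopping criterion: no removal lowers AIC
            break
        selected.remove(best_drop)
        current = best_aic
    return selected


def inclusion_frequencies(X, y, n_boot=100, replace=True, seed=0):
    """Share of resamples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumption: subsamples of size 0.632*n when drawing without replacement
    m = n if replace else int(0.632 * n)
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.choice(n, size=m, replace=replace)
        for j in backward_elimination_aic(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_boot


# Simulated example: only x0 and x1 influence y
rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
print("with replacement:   ", np.round(inclusion_frequencies(X, y, n_boot=50), 2))
print("without replacement:", np.round(inclusion_frequencies(X, y, n_boot=50, replace=False), 2))
```

In a setup like this, the two simulated influential variables should be selected in nearly every resample, while the noise variables enter only occasionally. Swapping in a different stopping criterion, such as a stricter significance level, would shift those frequencies. This is one way to see the role of stopping criteria that the abstract emphasizes.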