Class imbalance in gradient boosting classification algorithms: Application to experimental stroke data

Standard

Class imbalance in gradient boosting classification algorithms: Application to experimental stroke data. / Lyashevska, Olga; Malone, Fiona; MacCarthy, Eugene; Fiehler, Jens; Buhk, Jan-Hendrik; Morris, Liam.

In: STAT METHODS MED RES, Vol. 30, No. 3, 03.2021, p. 916-925.

Research output: SCORING: Contribution to journal › SCORING: Journal article › Research › peer-review

Bibtex

@article{f81a031fcadf4c6f988fa2b2aeabb4a2,
title = "Class imbalance in gradient boosting classification algorithms: Application to experimental stroke data",
abstract = "Imbalance between positive and negative outcomes, a so-called class imbalance, is a problem generally found in medical data. Imbalanced data hinder the performance of conventional classification methods which aim to improve the overall accuracy of the model without accounting for uneven distribution of the classes. To rectify this, the data can be resampled by oversampling the positive (minority) class until the classes are approximately equally represented. After that, a prediction model such as gradient boosting algorithm can be fitted with greater confidence. This classification method allows for non-linear relationships and deep interactive effects while focusing on difficult areas by iterative shifting towards problematic observations. In this study, we demonstrate application of these methods to medical data and develop a practical framework for evaluation of features contributing into the probability of stroke.",
author = "Olga Lyashevska and Fiona Malone and Eugene MacCarthy and Jens Fiehler and Jan-Hendrik Buhk and Liam Morris",
year = "2021",
month = mar,
doi = "10.1177/0962280220980484",
language = "English",
volume = "30",
pages = "916--925",
journal = "STAT METHODS MED RES",
issn = "0962-2802",
publisher = "SAGE Publications",
number = "3",
}

RIS

TY  - JOUR
T1  - Class imbalance in gradient boosting classification algorithms: Application to experimental stroke data
AU  - Lyashevska, Olga
AU  - Malone, Fiona
AU  - MacCarthy, Eugene
AU  - Fiehler, Jens
AU  - Buhk, Jan-Hendrik
AU  - Morris, Liam
PY  - 2021/3
Y1  - 2021/3
N2  - Imbalance between positive and negative outcomes, a so-called class imbalance, is a problem generally found in medical data. Imbalanced data hinder the performance of conventional classification methods which aim to improve the overall accuracy of the model without accounting for uneven distribution of the classes. To rectify this, the data can be resampled by oversampling the positive (minority) class until the classes are approximately equally represented. After that, a prediction model such as gradient boosting algorithm can be fitted with greater confidence. This classification method allows for non-linear relationships and deep interactive effects while focusing on difficult areas by iterative shifting towards problematic observations. In this study, we demonstrate application of these methods to medical data and develop a practical framework for evaluation of features contributing into the probability of stroke.
AB  - Imbalance between positive and negative outcomes, a so-called class imbalance, is a problem generally found in medical data. Imbalanced data hinder the performance of conventional classification methods which aim to improve the overall accuracy of the model without accounting for uneven distribution of the classes. To rectify this, the data can be resampled by oversampling the positive (minority) class until the classes are approximately equally represented. After that, a prediction model such as gradient boosting algorithm can be fitted with greater confidence. This classification method allows for non-linear relationships and deep interactive effects while focusing on difficult areas by iterative shifting towards problematic observations. In this study, we demonstrate application of these methods to medical data and develop a practical framework for evaluation of features contributing into the probability of stroke.
U2  - 10.1177/0962280220980484
DO  - 10.1177/0962280220980484
M3  - SCORING: Journal article
C2  - 33356965
VL  - 30
SP  - 916
EP  - 925
JO  - STAT METHODS MED RES
JF  - STAT METHODS MED RES
SN  - 0962-2802
IS  - 3
ER  - 