Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency

Standard

Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency. / Deng, Lihua; Ly, Cedric; Abdollahi, Sina; Zhao, Yu; Prinz, Immo; Bonn, Stefan.

In: FRONT IMMUNOL, Vol. 14, 18.04.2023, p. 1128326.

Publications: SCORING: Contribution to journal/newspaper, SCORING: Journal article, Research, Peer-reviewed

BibTeX

@article{1216292c5e2b45b6baa347d3186df5b4,
title = "Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency",
abstract = "The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.",
keywords = "Peptides, Receptors, Antigen, T-Cell, Histocompatibility Antigens, Major Histocompatibility Complex, Protein Binding",
author = "Lihua Deng and Cedric Ly and Sina Abdollahi and Yu Zhao and Immo Prinz and Stefan Bonn",
note = "Copyright {\textcopyright} 2023 Deng, Ly, Abdollahi, Zhao, Prinz and Bonn.",
year = "2023",
month = apr,
day = "18",
doi = "10.3389/fimmu.2023.1128326",
language = "English",
volume = "14",
pages = "1128326",
journal = "FRONT IMMUNOL",
issn = "1664-3224",
publisher = "Lausanne : Frontiers Research Foundation",

}

RIS

TY - JOUR

T1 - Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency

AU - Deng, Lihua

AU - Ly, Cedric

AU - Abdollahi, Sina

AU - Zhao, Yu

AU - Prinz, Immo

AU - Bonn, Stefan

N1 - Copyright © 2023 Deng, Ly, Abdollahi, Zhao, Prinz and Bonn.

PY - 2023/4/18

Y1 - 2023/4/18

AB - The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.

KW - Peptides

KW - Receptors, Antigen, T-Cell

KW - Histocompatibility Antigens

KW - Major Histocompatibility Complex

KW - Protein Binding

U2 - 10.3389/fimmu.2023.1128326

DO - 10.3389/fimmu.2023.1128326

M3 - SCORING: Journal article

C2 - 37143667

VL - 14

SP - 1128326

JO - FRONT IMMUNOL

JF - FRONT IMMUNOL

SN - 1664-3224

ER -