Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency
Standard
Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency. / Deng, Lihua; Ly, Cedric; Abdollahi, Sina; Zhao, Yu; Prinz, Immo; Bonn, Stefan.
in: FRONT IMMUNOL, Vol. 14, 18.04.2023, p. 1128326.
Research output: SCORING: Contribution to journal/newspaper › SCORING: Journal article › Research › peer-review
RIS
TY - JOUR
T1 - Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency
AU - Deng, Lihua
AU - Ly, Cedric
AU - Abdollahi, Sina
AU - Zhao, Yu
AU - Prinz, Immo
AU - Bonn, Stefan
N1 - Copyright © 2023 Deng, Ly, Abdollahi, Zhao, Prinz and Bonn.
PY - 2023/4/18
Y1 - 2023/4/18
N2 - The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.
KW - Peptides
KW - Receptors, Antigen, T-Cell
KW - Histocompatibility Antigens
KW - Major Histocompatibility Complex
KW - Protein Binding
U2 - 10.3389/fimmu.2023.1128326
DO - 10.3389/fimmu.2023.1128326
M3 - SCORING: Journal article
C2 - 37143667
VL - 14
SP - 1128326
JO - FRONT IMMUNOL
JF - FRONT IMMUNOL
SN - 1664-3224
ER -