Robust classification using average correlations as features (ACF)

Abstract

MOTIVATION: In single-cell transcriptomics and other omics technologies, large fractions of missing values commonly occur. Researchers often either consider only those features that were measured for each instance of their dataset, thereby accepting severe loss of information, or use imputation which can lead to erroneous results. Pairwise metrics allow for imputation-free classification with minimal loss of data.

RESULTS: Using pairwise correlations as metric, state-of-the-art approaches to classification would include the K-nearest-neighbor- (KNN) and distribution-based-classification-classifier. Our novel method, termed average correlations as features (ACF), significantly outperforms those approaches by training tunable machine learning models on inter-class and intra-class correlations. Our approach is characterized in simulation studies and its classification performance is demonstrated on real-world datasets from single-cell RNA sequencing and bottom-up proteomics. Furthermore, we demonstrate that variants of our method offer superior flexibility and performance over KNN classifiers and can be used in conjunction with other machine learning methods. In summary, ACF is a flexible method that enables missing value tolerant classification with minimal loss of data.

Bibliografische Daten

OriginalspracheEnglisch
Aufsatznummer101
ISSN1471-2105
DOIs
StatusVeröffentlicht - 20.03.2023

Anmerkungen des Dekanats

© 2023. The Author(s).

PubMed 36941542