HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values
Standard
HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values. / Voß, Hannah; Schlumbohm, Simon; Barwikowski, Philip; Wurlitzer, Marcus; Dottermusch, Matthias; Neumann, Philipp; Schlüter, Hartmut; Neumann, Julia E; Krisp, Christoph.
In: NAT COMMUN, Vol. 13, No. 1, 3523, 20.06.2022.
Research output: SCORING: Contribution to journal › SCORING: Journal article › Research › peer-review
RIS
TY - JOUR
T1 - HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values
AU - Voß, Hannah
AU - Schlumbohm, Simon
AU - Barwikowski, Philip
AU - Wurlitzer, Marcus
AU - Dottermusch, Matthias
AU - Neumann, Philipp
AU - Schlüter, Hartmut
AU - Neumann, Julia E
AU - Krisp, Christoph
N1 - © 2022. The Author(s).
PY - 2022/6/20
Y1 - 2022/6/20
N2 - Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of available data and matrix dissection for minimal data loss, without data imputation. This strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). The HarmonizR strategy, evaluated on four example datasets with up to 23 batches, demonstrated successful data harmonization for different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and performed better in the detection of significant proteins. HarmonizR is an efficient tool for missing-data-tolerant experimental variance reduction and is easily adjustable for individual dataset properties and user preferences.
AB - Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of available data and matrix dissection for minimal data loss, without data imputation. This strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). The HarmonizR strategy, evaluated on four example datasets with up to 23 batches, demonstrated successful data harmonization for different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and performed better in the detection of significant proteins. HarmonizR is an efficient tool for missing-data-tolerant experimental variance reduction and is easily adjustable for individual dataset properties and user preferences.
KW - Algorithms
KW - Chromatography, Liquid
KW - Proteome
KW - Proteomics/methods
KW - Research Design
KW - Tandem Mass Spectrometry
U2 - 10.1038/s41467-022-31007-x
DO - 10.1038/s41467-022-31007-x
M3 - SCORING: Journal article
C2 - 35725563
VL - 13
JO - NAT COMMUN
JF - NAT COMMUN
SN - 2041-1723
IS - 1
M1 - 3523
ER -
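The abstract describes avoiding imputation by dissecting the data matrix and then applying a batch effect reduction method to each piece. The Python sketch below is a simplified illustration under stated assumptions and is not the HarmonizR implementation, which is an R package. The function name `dissect_and_adjust` and the `min_per_batch` threshold are hypothetical. Proteins are grouped by the set of batches in which they have at least `min_per_batch` observed values. Each group is then corrected with a location-only adjustment in the spirit of limma's removeBatchEffect() with no covariates, which stands in here for ComBat or limma themselves. In this sketch, values from batches where a protein falls below the threshold are dropped (set to NaN). Proteins that qualify in fewer than two batches are also dropped.

```python
from collections import defaultdict

import numpy as np


def dissect_and_adjust(x, batches, min_per_batch=2):
    """Hypothetical sketch of missing-value-tolerant batch adjustment.

    x: proteins x samples array, np.nan marks missing values.
    batches: one batch label per sample (column).
    min_per_batch: assumed threshold (>= 1); a protein counts as present in
        a batch only if it has at least this many observed values there.
    """
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    labels = list(dict.fromkeys(batches.tolist()))  # batch labels, first-seen order
    observed = ~np.isnan(x)

    # Dissection: group proteins by the set of batches in which they are present.
    groups = defaultdict(list)
    for i in range(x.shape[0]):
        present = tuple(b for b in labels
                        if observed[i, batches == b].sum() >= min_per_batch)
        groups[present].append(i)

    out = np.full_like(x, np.nan)  # anything not adjusted stays NaN (dropped)
    for present, rows in groups.items():
        if len(present) < 2:
            continue  # nothing to harmonize across; this sketch drops the row
        sub = x[rows]
        # Per-batch means over observed values only (no imputation).
        means = np.stack([np.nanmean(sub[:, batches == b], axis=1)
                          for b in present], axis=1)
        grand = means.mean(axis=1)  # mean of the batch means, per protein
        for k, b in enumerate(present):
            cols = batches == b
            # Shift each batch so its mean equals the mean of the batch means.
            out[np.ix_(rows, cols)] = sub[:, cols] - (means[:, k] - grand)[:, None]
    return out
```

For example, with batches `["A", "A", "A", "B", "B", "B"]`, a protein observed as 1, 2 in batch A and 5, 6, 7 in batch B is shifted so that both batch means become 3.75. A protein with only one observed value in batch A qualifies in batch B alone, so it is dropped. Grouping by presence pattern means that each sub-matrix is complete enough at the batch level for the adjustment to run without filling in missing values.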