HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values

Hannah Voß; Simon Schlumbohm; Philip Barwikowski; Marcus Wurlitzer; Matthias Dottermusch; Philipp Neumann; Hartmut Schlüter; Julia E Neumann; Christoph Krisp

doi:10.1038/s41467-022-31007-x

Standard

HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values. / Voß, Hannah; Schlumbohm, Simon; Barwikowski, Philip; Wurlitzer, Marcus; Dottermusch, Matthias; Neumann, Philipp; Schlüter, Hartmut; Neumann, Julia E; Krisp, Christoph.

In: NAT COMMUN, Vol. 13, No. 1, 3523, 20.06.2022.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Voß, H, Schlumbohm, S, Barwikowski, P, Wurlitzer, M, Dottermusch, M, Neumann, P, Schlüter, H, Neumann, JE & Krisp, C 2022, 'HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values', NAT COMMUN, vol. 13, no. 1, 3523. https://doi.org/10.1038/s41467-022-31007-x

APA

Voß, H., Schlumbohm, S., Barwikowski, P., Wurlitzer, M., Dottermusch, M., Neumann, P., Schlüter, H., Neumann, J. E., & Krisp, C. (2022). HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values. NAT COMMUN, 13(1), [3523]. https://doi.org/10.1038/s41467-022-31007-x

Vancouver

Voß H, Schlumbohm S, Barwikowski P, Wurlitzer M, Dottermusch M, Neumann P et al. HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values. NAT COMMUN. 2022 Jun 20;13(1):3523. https://doi.org/10.1038/s41467-022-31007-x

Bibtex

@article{e20f95dc9c63444296a6a86842cd1f59,

title = "HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values",

abstract = "Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of available data and matrix dissection for minimal data loss, without data imputation. This strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). The HarmonizR strategy, evaluated on four exemplarily analyzed datasets with up to 23 batches, demonstrated successful data harmonization for different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and performed superior regarding the detection of significant proteins. HarmonizR is an efficient tool for missing data tolerant experimental variance reduction and is easily adjustable for individual dataset properties and user preferences.",

keywords = "Algorithms, Chromatography, Liquid, Proteome, Proteomics/methods, Research Design, Tandem Mass Spectrometry",

author = "Hannah Vo{\ss} and Simon Schlumbohm and Philip Barwikowski and Marcus Wurlitzer and Matthias Dottermusch and Philipp Neumann and Hartmut Schl{\"u}ter and Neumann, {Julia E} and Christoph Krisp",

note = "{\textcopyright} 2022. The Author(s).",

year = "2022",

month = jun,

day = "20",

doi = "10.1038/s41467-022-31007-x",

language = "English",

volume = "13",

journal = "NAT COMMUN",

issn = "2041-1723",

publisher = "NATURE PUBLISHING GROUP",

number = "1",

}

RIS

TY - JOUR

T1 - HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values

AU - Voß, Hannah

AU - Schlumbohm, Simon

AU - Barwikowski, Philip

AU - Wurlitzer, Marcus

AU - Dottermusch, Matthias

AU - Neumann, Philipp

AU - Schlüter, Hartmut

AU - Neumann, Julia E

AU - Krisp, Christoph

PY - 2022/6/20

Y1 - 2022/6/20

AB - Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of available data and matrix dissection for minimal data loss, without data imputation. This strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). The HarmonizR strategy, evaluated on four exemplarily analyzed datasets with up to 23 batches, demonstrated successful data harmonization for different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and performed superior regarding the detection of significant proteins. HarmonizR is an efficient tool for missing data tolerant experimental variance reduction and is easily adjustable for individual dataset properties and user preferences.

KW - Algorithms

KW - Chromatography, Liquid

KW - Proteome

KW - Proteomics/methods

KW - Research Design

KW - Tandem Mass Spectrometry

U2 - 10.1038/s41467-022-31007-x

DO - 10.1038/s41467-022-31007-x

M3 - Journal article

C2 - 35725563

VL - 13

JO - NAT COMMUN

JF - NAT COMMUN

SN - 2041-1723

IS - 1

M1 - 3523

ER -
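
The abstract describes matrix dissection only in outline, so the following is a minimal sketch of the general idea rather than the HarmonizR implementation. It groups features (proteins) by the set of batches in which they have enough observed values, then corrects each group using only those batches, without imputing anything. Everything here is an assumption made for illustration: the function names, the `min_values=2` threshold, the choice to set values from excluded batches to NaN, and the per-batch mean centering, which stands in for ComBat or limma's removeBatchEffect() and is not equivalent to either.

```python
import math
from collections import defaultdict


def batch_pattern(row, batches, min_values=2):
    """Return the sorted tuple of batches in which this feature has at
    least `min_values` observed (non-NaN) values. Threshold is assumed."""
    counts = defaultdict(int)
    for value, batch in zip(row, batches):
        if not math.isnan(value):
            counts[batch] += 1
    return tuple(sorted(b for b, n in counts.items() if n >= min_values))


def dissect(matrix, batches, min_values=2):
    """Group feature names by batch-presence pattern. Each group is one
    sub-matrix that has no fully missing batch among its own batches."""
    groups = defaultdict(list)
    for name, row in matrix.items():
        groups[batch_pattern(row, batches, min_values)].append(name)
    return dict(groups)


def center_per_batch(row, batches, pattern):
    """Stand-in correction (NOT ComBat or limma): shift each usable batch
    so that its mean equals the feature's mean over all usable batches.
    Values from batches outside `pattern` become NaN (sketch choice)."""
    usable = [(v, b) for v, b in zip(row, batches)
              if b in pattern and not math.isnan(v)]
    if not usable:
        return [math.nan] * len(row)
    overall = sum(v for v, _ in usable) / len(usable)
    sums, counts = defaultdict(float), defaultdict(int)
    for v, b in usable:
        sums[b] += v
        counts[b] += 1
    means = {b: sums[b] / counts[b] for b in sums}
    return [v - means[b] + overall
            if b in pattern and not math.isnan(v) else math.nan
            for v, b in zip(row, batches)]


def harmonize(matrix, batches, min_values=2):
    """Dissect the matrix, correct each sub-matrix separately, reassemble."""
    result = {}
    for pattern, names in dissect(matrix, batches, min_values).items():
        for name in names:
            result[name] = center_per_batch(matrix[name], batches, pattern)
    return result


# Toy data: three hypothetical proteins measured in three batches.
nan = math.nan
batches = ["A", "A", "B", "B", "C", "C"]
matrix = {
    "P1": [1.0, 3.0, 11.0, 13.0, 5.0, 7.0],  # observed in every batch
    "P2": [2.0, 4.0, nan, nan, 6.0, 8.0],    # missing entirely in batch B
    "P3": [nan, nan, 10.0, 12.0, 1.0, nan],  # only batch B has two values
}

groups = dissect(matrix, batches)
harmonized = harmonize(matrix, batches)
print(groups)
print(harmonized)
```

Because the stand-in correction works feature by feature, the grouping step is only illustrative here. It matters for methods such as ComBat, which estimate batch parameters jointly across the features of a complete sub-matrix.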