Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data

Standard

Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data. / Müller, Christian; Schillert, Arne; Röthemeier, Caroline; Trégouët, David-Alexandre; Proust, Carole; Binder, Harald; Pfeiffer, Norbert; Beutel, Manfred; Lackner, Karl J; Schnabel, Renate B; Tiret, Laurence; Wild, Philipp S; Blankenberg, Stefan; Zeller, Tanja; Ziegler, Andreas.

In: PLOS ONE, Vol. 11, No. 6, 2016, p. e0156594.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Müller, C, Schillert, A, Röthemeier, C, Trégouët, D-A, Proust, C, Binder, H, Pfeiffer, N, Beutel, M, Lackner, KJ, Schnabel, RB, Tiret, L, Wild, PS, Blankenberg, S, Zeller, T & Ziegler, A 2016, 'Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data', PLOS ONE, vol. 11, no. 6, p. e0156594. https://doi.org/10.1371/journal.pone.0156594

APA

Müller, C., Schillert, A., Röthemeier, C., Trégouët, D-A., Proust, C., Binder, H., Pfeiffer, N., Beutel, M., Lackner, K. J., Schnabel, R. B., Tiret, L., Wild, P. S., Blankenberg, S., Zeller, T., & Ziegler, A. (2016). Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data. PLOS ONE, 11(6), e0156594. https://doi.org/10.1371/journal.pone.0156594

Bibtex

@article{febfec2d0f234d198a6900215b411f30,
title = "Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data",
abstract = "Technical variation plays an important role in microarray-based gene expression studies, and batch effects explain a large proportion of this noise. It is therefore mandatory to eliminate technical variation while maintaining biological variability. Several strategies have been proposed for the removal of batch effects, although they have not been evaluated in large-scale longitudinal gene expression data. In this study, we aimed at identifying a suitable method for batch effect removal in a large study of microarray-based longitudinal gene expression. Monocytic gene expression was measured in 1092 participants of the Gutenberg Health Study at baseline and 5-year follow up. Replicates of selected samples were measured at both time points to identify technical variability. Deming regression, Passing-Bablok regression, linear mixed models, non-linear models as well as ReplicateRUV and ComBat were applied to eliminate batch effects between replicates. In a second step, quantile normalization prior to batch effect correction was performed for each method. Technical variation between batches was evaluated by principal component analysis. Associations between body mass index and transcriptomes were calculated before and after batch removal. Results from association analyses were compared to evaluate maintenance of biological variability. Quantile normalization, separately performed in each batch, combined with ComBat successfully reduced batch effects and maintained biological variability. ReplicateRUV performed perfectly in the replicate data subset of the study, but failed when applied to all samples. All other methods did not substantially reduce batch effects in the replicate data subset. Quantile normalization plus ComBat appears to be a valuable approach for batch correction in longitudinal gene expression data. ",
keywords = "Adult, Aged, Female, Gene Expression Profiling/methods, Humans, Longitudinal Studies, Male, Middle Aged, Monocytes/chemistry, Nonlinear Dynamics, Oligonucleotide Array Sequence Analysis/methods, Principal Component Analysis, Prospective Studies",
author = "Christian M{\"u}ller and Arne Schillert and Caroline R{\"o}themeier and David-Alexandre Tr{\'e}gou{\"e}t and Carole Proust and Harald Binder and Norbert Pfeiffer and Manfred Beutel and Lackner, {Karl J} and Schnabel, {Renate B} and Laurence Tiret and Wild, {Philipp S} and Stefan Blankenberg and Tanja Zeller and Andreas Ziegler",
year = "2016",
doi = "10.1371/journal.pone.0156594",
language = "English",
volume = "11",
pages = "e0156594",
journal = "PLOS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "6",
}

RIS

TY - JOUR
T1 - Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data
AU - Müller, Christian
AU - Schillert, Arne
AU - Röthemeier, Caroline
AU - Trégouët, David-Alexandre
AU - Proust, Carole
AU - Binder, Harald
AU - Pfeiffer, Norbert
AU - Beutel, Manfred
AU - Lackner, Karl J
AU - Schnabel, Renate B
AU - Tiret, Laurence
AU - Wild, Philipp S
AU - Blankenberg, Stefan
AU - Zeller, Tanja
AU - Ziegler, Andreas
PY - 2016
Y1 - 2016
AB - Technical variation plays an important role in microarray-based gene expression studies, and batch effects explain a large proportion of this noise. It is therefore mandatory to eliminate technical variation while maintaining biological variability. Several strategies have been proposed for the removal of batch effects, although they have not been evaluated in large-scale longitudinal gene expression data. In this study, we aimed at identifying a suitable method for batch effect removal in a large study of microarray-based longitudinal gene expression. Monocytic gene expression was measured in 1092 participants of the Gutenberg Health Study at baseline and 5-year follow up. Replicates of selected samples were measured at both time points to identify technical variability. Deming regression, Passing-Bablok regression, linear mixed models, non-linear models as well as ReplicateRUV and ComBat were applied to eliminate batch effects between replicates. In a second step, quantile normalization prior to batch effect correction was performed for each method. Technical variation between batches was evaluated by principal component analysis. Associations between body mass index and transcriptomes were calculated before and after batch removal. Results from association analyses were compared to evaluate maintenance of biological variability. Quantile normalization, separately performed in each batch, combined with ComBat successfully reduced batch effects and maintained biological variability. ReplicateRUV performed perfectly in the replicate data subset of the study, but failed when applied to all samples. All other methods did not substantially reduce batch effects in the replicate data subset. Quantile normalization plus ComBat appears to be a valuable approach for batch correction in longitudinal gene expression data.

KW - Adult
KW - Aged
KW - Female
KW - Gene Expression Profiling/methods
KW - Humans
KW - Longitudinal Studies
KW - Male
KW - Middle Aged
KW - Monocytes/chemistry
KW - Nonlinear Dynamics
KW - Oligonucleotide Array Sequence Analysis/methods
KW - Principal Component Analysis
KW - Prospective Studies
U2 - 10.1371/journal.pone.0156594
DO - 10.1371/journal.pone.0156594
M3 - Journal article
C2 - 27272489
VL - 11
SP - e0156594
JO - PLOS ONE
JF - PLOS ONE
SN - 1932-6203
IS - 6
ER -