Bias-invariant RNA-sequencing metadata annotation

Standard

Bias-invariant RNA-sequencing metadata annotation. / Wartmann, Hannes; Heins, Sven; Kloiber, Karin; Bonn, Stefan.

in: GIGASCIENCE, Vol. 10, No. 9, 22.09.2021, p. giab064.

Publications: SCORING: Contribution to journal/newspaper > SCORING: Journal article > Research > Peer-reviewed

Bibtex

@article{3156b4cc308e4241be475ca502daa468,
title = "Bias-invariant RNA-sequencing metadata annotation",
abstract = "BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations makes it impossible for researchers to find datasets specific to their needs. FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation making them more searchable.",
author = "Hannes Wartmann and Sven Heins and Karin Kloiber and Stefan Bonn",
note = "{\textcopyright} The Author(s) 2021. Published by Oxford University Press GigaScience.",
year = "2021",
month = sep,
day = "22",
doi = "10.1093/gigascience/giab064",
language = "English",
volume = "10",
pages = "giab064",
journal = "GIGASCIENCE",
issn = "2047-217X",
publisher = "Oxford University Press",
number = "9",

}

RIS

TY - JOUR

T1 - Bias-invariant RNA-sequencing metadata annotation

AU - Wartmann, Hannes

AU - Heins, Sven

AU - Kloiber, Karin

AU - Bonn, Stefan

N1 - © The Author(s) 2021. Published by Oxford University Press GigaScience.

PY - 2021/9/22

Y1 - 2021/9/22

N2 - BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations makes it impossible for researchers to find datasets specific to their needs. FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation making them more searchable.

AB - BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations makes it impossible for researchers to find datasets specific to their needs. FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation making them more searchable.

U2 - 10.1093/gigascience/giab064

DO - 10.1093/gigascience/giab064

M3 - SCORING: Journal article

C2 - 34553213

VL - 10

SP - giab064

JO - GIGASCIENCE

JF - GIGASCIENCE

SN - 2047-217X

IS - 9

ER -