Bias-invariant RNA-sequencing metadata annotation

Hannes Wartmann; Sven Heins; Karin Kloiber; Stefan Bonn

doi:10.1093/gigascience/giab064

Standard

Bias-invariant RNA-sequencing metadata annotation. / Wartmann, Hannes; Heins, Sven; Kloiber, Karin; Bonn, Stefan.

In: GigaScience, Vol. 10, No. 9, 22.09.2021, p. giab064.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Wartmann, H, Heins, S, Kloiber, K & Bonn, S 2021, 'Bias-invariant RNA-sequencing metadata annotation', GigaScience, vol. 10, no. 9, giab064. https://doi.org/10.1093/gigascience/giab064

APA

Wartmann, H., Heins, S., Kloiber, K., & Bonn, S. (2021). Bias-invariant RNA-sequencing metadata annotation. GigaScience, 10(9), giab064. https://doi.org/10.1093/gigascience/giab064

Vancouver

Wartmann H, Heins S, Kloiber K, Bonn S. Bias-invariant RNA-sequencing metadata annotation. GigaScience. 2021 Sep 22;10(9):giab064. https://doi.org/10.1093/gigascience/giab064

Bibtex

@article{3156b4cc308e4241be475ca502daa468,

title = "Bias-invariant RNA-sequencing metadata annotation",

abstract = "BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs. FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7\% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation, making them more searchable.",

author = "Hannes Wartmann and Sven Heins and Karin Kloiber and Stefan Bonn",

note = "{\textcopyright} The Author(s) 2021. Published by Oxford University Press GigaScience.",

year = "2021",

month = sep,

day = "22",

doi = "10.1093/gigascience/giab064",

language = "English",

volume = "10",

pages = "giab064",

journal = "GigaScience",

issn = "2047-217X",

publisher = "Oxford University Press",

number = "9",

}

RIS

TY  - JOUR

T1  - Bias-invariant RNA-sequencing metadata annotation

AU  - Wartmann, Hannes

AU  - Heins, Sven

AU  - Kloiber, Karin

AU  - Bonn, Stefan

PY  - 2021/9/22

Y1  - 2021/9/22

AB  - BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs. FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation, making them more searchable.

U2  - 10.1093/gigascience/giab064

DO  - 10.1093/gigascience/giab064

M3  - Journal article

C2  - 34553213

VL  - 10

SP  - giab064

JO  - GigaScience

JF  - GigaScience

SN  - 2047-217X

IS  - 9

ER  - 
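The abstract says only that the model uses "a model architecture similar to Siamese networks"; the record gives no layers, loss function, or training details. As a hedged illustration of the general Siamese idea (not the authors' model), the sketch below passes two expression vectors through one shared encoder and scores the pair with a common form of the contrastive loss. The fixed weight matrix, the function names (encode, contrastive_loss), and the margin value are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence

# Hypothetical toy encoder weights: rows are output dimensions.
# In a real Siamese network these would be learned; here they are fixed constants.
Weights = List[List[float]]


def encode(x: Sequence[float], w: Weights) -> List[float]:
    """Project an expression vector x with the shared weight matrix w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(x1: Sequence[float], x2: Sequence[float],
                     same_label: bool, w: Weights, margin: float = 1.0) -> float:
    """Contrastive loss for one pair of samples.

    Both inputs go through the *same* encoder (the Siamese property).
    Same-label pairs (e.g. same tissue from different studies) are pulled
    together; different-label pairs are pushed at least `margin` apart.
    """
    d = euclidean(encode(x1, w), encode(x2, w))
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Distant same-label pair: large penalty pulls the embeddings together.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], True, identity))
    # Distant different-label pair beyond the margin: no penalty.
    print(contrastive_loss([0.0, 0.0], [3.0, 4.0], False, identity))
```

Sharing one encoder between both branches is what lets a pairwise objective compare samples from different studies in a common embedding space. Whether and how the published model uses such a loss to become invariant to study bias is described in the full article, not in this record.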