Disease gene prioritization using network and feature

Standard

Disease gene prioritization using network and feature. / Xie, Bingqing; Agam, Gady; Balasubramanian, Sandhya; Xu, Jinbo; Gilliam, T Conrad; Maltsev, Natalia; Börnigen, Daniela.

In: Journal of Computational Biology, Vol. 22, No. 4, 04.2015, p. 313-323.

Research output: Contribution to journal › Journal article › Research › peer-review

Harvard

Xie, B, Agam, G, Balasubramanian, S, Xu, J, Gilliam, TC, Maltsev, N & Börnigen, D 2015, 'Disease gene prioritization using network and feature', Journal of Computational Biology, vol. 22, no. 4, pp. 313-323. https://doi.org/10.1089/cmb.2015.0001

APA

Xie, B., Agam, G., Balasubramanian, S., Xu, J., Gilliam, T. C., Maltsev, N., & Börnigen, D. (2015). Disease gene prioritization using network and feature. Journal of Computational Biology, 22(4), 313-323. https://doi.org/10.1089/cmb.2015.0001

Vancouver

Xie B, Agam G, Balasubramanian S, Xu J, Gilliam TC, Maltsev N, et al. Disease gene prioritization using network and feature. J Comput Biol. 2015 Apr;22(4):313-323. https://doi.org/10.1089/cmb.2015.0001

BibTeX

@article{658c9b42ad6749b4956a8c35219d7a39,
title = "Disease gene prioritization using network and feature",
abstract = "Identifying high-confidence candidate genes that are causative for disease phenotypes, from the large lists of variations produced by high-throughput genomics, can be both time-consuming and costly. The development of novel computational approaches, utilizing existing biological knowledge for the prioritization of such candidate genes, can improve the efficiency and accuracy of the biomedical data analysis. It can also reduce the cost of such studies by avoiding experimental validations of irrelevant candidates. In this study, we address this challenge by proposing a novel gene prioritization approach that ranks promising candidate genes that are likely to be involved in a disease or phenotype under study. This algorithm is based on the modified conditional random field (CRF) model that simultaneously makes use of both gene annotations and gene interactions, while preserving their original representation. We validated our approach on two independent disease benchmark studies by ranking candidate genes using network and feature information. Our results showed both high area under the curve (AUC) value (0.86), and more importantly high partial AUC (pAUC) value (0.1296), and revealed higher accuracy and precision at the top predictions as compared with other well-performed gene prioritization tools, such as Endeavour (AUC-0.82, pAUC-0.083) and PINTA (AUC-0.76, pAUC-0.066). We were able to detect more target genes (9/18/19/27) on top positions (1/5/10/20) compared to Endeavour (3/11/14/23) and PINTA (6/10/13/18). To demonstrate its usability, we applied our method to a case study for the prediction of molecular mechanisms contributing to intellectual disability and autism. Our approach was able to correctly recover genes related to both disorders and provide suggestions for possible additional candidates based on their rankings and functional annotations.",
keywords = "Area Under Curve, Autism Spectrum Disorder, Gene Regulatory Networks, Genetic Association Studies, Genetic Predisposition to Disease, Humans, Intellectual Disability, Models, Genetic, Molecular Sequence Annotation, Phenotype, ROC Curve",
author = "Bingqing Xie and Gady Agam and Sandhya Balasubramanian and Jinbo Xu and Gilliam, {T Conrad} and Natalia Maltsev and Daniela B{\"o}rnigen",
year = "2015",
month = apr,
doi = "10.1089/cmb.2015.0001",
language = "English",
volume = "22",
pages = "313--323",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "4",

}

RIS

TY - JOUR

T1 - Disease gene prioritization using network and feature

AU - Xie, Bingqing

AU - Agam, Gady

AU - Balasubramanian, Sandhya

AU - Xu, Jinbo

AU - Gilliam, T Conrad

AU - Maltsev, Natalia

AU - Börnigen, Daniela

PY - 2015/4

Y1 - 2015/4

AB - Identifying high-confidence candidate genes that are causative for disease phenotypes, from the large lists of variations produced by high-throughput genomics, can be both time-consuming and costly. The development of novel computational approaches, utilizing existing biological knowledge for the prioritization of such candidate genes, can improve the efficiency and accuracy of the biomedical data analysis. It can also reduce the cost of such studies by avoiding experimental validations of irrelevant candidates. In this study, we address this challenge by proposing a novel gene prioritization approach that ranks promising candidate genes that are likely to be involved in a disease or phenotype under study. This algorithm is based on the modified conditional random field (CRF) model that simultaneously makes use of both gene annotations and gene interactions, while preserving their original representation. We validated our approach on two independent disease benchmark studies by ranking candidate genes using network and feature information. Our results showed both high area under the curve (AUC) value (0.86), and more importantly high partial AUC (pAUC) value (0.1296), and revealed higher accuracy and precision at the top predictions as compared with other well-performed gene prioritization tools, such as Endeavour (AUC-0.82, pAUC-0.083) and PINTA (AUC-0.76, pAUC-0.066). We were able to detect more target genes (9/18/19/27) on top positions (1/5/10/20) compared to Endeavour (3/11/14/23) and PINTA (6/10/13/18). To demonstrate its usability, we applied our method to a case study for the prediction of molecular mechanisms contributing to intellectual disability and autism. Our approach was able to correctly recover genes related to both disorders and provide suggestions for possible additional candidates based on their rankings and functional annotations.

KW - Area Under Curve

KW - Autism Spectrum Disorder

KW - Gene Regulatory Networks

KW - Genetic Association Studies

KW - Genetic Predisposition to Disease

KW - Humans

KW - Intellectual Disability

KW - Models, Genetic

KW - Molecular Sequence Annotation

KW - Phenotype

KW - ROC Curve

U2 - 10.1089/cmb.2015.0001

DO - 10.1089/cmb.2015.0001

M3 - Journal article

C2 - 25844670

VL - 22

SP - 313

EP - 323

JO - J Comput Biol

JF - Journal of Computational Biology

SN - 1066-5277

IS - 4

ER -
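The abstract compares methods by AUC, partial AUC (pAUC) and the number of target genes ranked within the top 1, 5, 10 and 20 positions. The Python sketch below shows how such figures are commonly computed from a ranked candidate list. It is not the authors' CRF code: the function names (`top_k_hits`, `roc_points`, `partial_auc`) are hypothetical, and the `max_fpr` cutoff is an assumption, since this record does not state which false-positive-rate limit the reported pAUC uses.

```python
from typing import Sequence

# Hypothetical evaluation helpers for illustration only; not the authors' code.

Point = tuple[float, float]


def top_k_hits(target_ranks: Sequence[int],
               cutoffs: Sequence[int] = (1, 5, 10, 20)) -> list[int]:
    """Count held-out target genes ranked at or above each cutoff (ranks are 1-based)."""
    return [sum(1 for r in target_ranks if r <= k) for k in cutoffs]


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> list[Point]:
    """Return (FPR, TPR) points, lowering the score threshold one distinct value at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points: list[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Consume all candidates sharing this score so ties form one diagonal step.
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points: Sequence[Point], max_fpr: float = 1.0) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr], unnormalized."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate TPR linearly at the cutoff and clip the segment there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    print(top_k_hits([1, 3, 7, 12, 25]))  # [1, 2, 3, 4]
    pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 0, 1, 0, 0, 1])
    print(round(partial_auc(pts), 4))       # 0.5556 (full AUC)
    print(round(partial_auc(pts, 0.5), 4))  # 0.2222
```

With a cutoff below 1.0, `partial_auc` returns the unnormalized area, so its maximum equals `max_fpr`. This is one common convention; pAUC values reported elsewhere may be normalized differently.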