Disease gene prioritization using network and feature
Standard
Disease gene prioritization using network and feature. / Xie, Bingqing; Agam, Gady; Balasubramanian, Sandhya; Xu, Jinbo; Gilliam, T Conrad; Maltsev, Natalia; Börnigen, Daniela.
In: J COMPUT BIOL, Vol. 22, No. 4, 04.2015, p. 313-323.
Research output: Contribution to journal › Journal article › Research › peer-review
RIS
TY - JOUR
T1 - Disease gene prioritization using network and feature
AU - Xie, Bingqing
AU - Agam, Gady
AU - Balasubramanian, Sandhya
AU - Xu, Jinbo
AU - Gilliam, T Conrad
AU - Maltsev, Natalia
AU - Börnigen, Daniela
PY - 2015/4
Y1 - 2015/4
N2 - Identifying high-confidence candidate genes that are causative for disease phenotypes, from the large lists of variations produced by high-throughput genomics, can be both time-consuming and costly. The development of novel computational approaches, utilizing existing biological knowledge for the prioritization of such candidate genes, can improve the efficiency and accuracy of biomedical data analysis. It can also reduce the cost of such studies by avoiding experimental validations of irrelevant candidates. In this study, we address this challenge by proposing a novel gene prioritization approach that ranks promising candidate genes that are likely to be involved in a disease or phenotype under study. This algorithm is based on a modified conditional random field (CRF) model that simultaneously makes use of both gene annotations and gene interactions, while preserving their original representation. We validated our approach on two independent disease benchmark studies by ranking candidate genes using network and feature information. Our results showed both a high area under the curve (AUC) value (0.86) and, more importantly, a high partial AUC (pAUC) value (0.1296), and revealed higher accuracy and precision at the top predictions as compared with other well-performing gene prioritization tools, such as Endeavour (AUC 0.82, pAUC 0.083) and PINTA (AUC 0.76, pAUC 0.066). We were able to detect more target genes (9/18/19/27) in the top positions (1/5/10/20) compared to Endeavour (3/11/14/23) and PINTA (6/10/13/18). To demonstrate its usability, we applied our method to a case study for the prediction of molecular mechanisms contributing to intellectual disability and autism. Our approach was able to correctly recover genes related to both disorders and provide suggestions for possible additional candidates based on their rankings and functional annotations.
KW - Area Under Curve
KW - Autism Spectrum Disorder
KW - Gene Regulatory Networks
KW - Genetic Association Studies
KW - Genetic Predisposition to Disease
KW - Humans
KW - Intellectual Disability
KW - Models, Genetic
KW - Molecular Sequence Annotation
KW - Phenotype
KW - ROC Curve
KW - Journal Article
U2 - 10.1089/cmb.2015.0001
DO - 10.1089/cmb.2015.0001
M3 - Journal article
C2 - 25844670
VL - 22
SP - 313
EP - 323
JO - J COMPUT BIOL
JF - J COMPUT BIOL
SN - 1066-5277
IS - 4
ER -
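
The abstract reports three kinds of ranking measures: AUC, partial AUC (pAUC), and the number of target genes recovered within the top 1/5/10/20 positions. The following is a minimal sketch of how such measures can be computed from ranked candidate lists. It is not the authors' evaluation code: the function names and toy data are hypothetical, and the false-positive-rate cutoff used for pAUC (`max_fpr`) is an assumption, since the abstract does not state which cutoff was used.

```python
from typing import Dict, List, Sequence, Tuple


def top_k_hits(target_ranks: Sequence[int], cutoffs: Sequence[int]) -> Dict[int, int]:
    """Count validation runs whose target gene is ranked within each cutoff.

    target_ranks holds the 1-based rank of the known target gene in each run.
    """
    return {k: sum(1 for r in target_ranks if r <= k) for k in cutoffs}


def roc_points(labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Build ROC points from labels ordered best-ranked first.

    labels[i] is 1 for a target gene and 0 for a non-target candidate.
    Assumes no tied scores and at least one label of each class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for y in labels:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points: Sequence[Tuple[float, float]], max_fpr: float = 1.0) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    With max_fpr=1.0 this is the ordinary AUC. The value is not normalized.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at the cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): ranks of the target gene in five validation runs.
hits = top_k_hits([1, 3, 7, 12, 25], cutoffs=[1, 5, 10, 20])

# Toy ranked list (hypothetical): targets at positions 1 and 3 of 5 candidates.
points = roc_points([1, 0, 1, 0, 0])
auc = partial_auc(points)               # full AUC
pauc = partial_auc(points, max_fpr=0.5)  # area up to the assumed FPR cutoff
```

Because pAUC integrates only the low-false-positive region of the ROC curve, it emphasizes the top of the ranking, which is why the abstract treats it as the more informative measure for prioritization.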