Report time: December 9, 2016 10:00 am
Report location: Shenghua Building 215, Main Campus of CSU
Speaker: Associate Professor Chen Bolin (Northwestern Polytechnical University)
Report synopsis: Integrating multiple data sources is indispensable for improving disease gene identification. This is not only because disease genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also because gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performance and computational time still need further improvement. In this report, I mainly discuss two high-performance multiple data integration algorithms for identifying disease-related genes. To start with, a logistic-regression-based method is introduced for multiple data integration. The posterior probability that each candidate gene is associated with an individual disease is calculated using a Bayesian analysis method and a binary logistic regression model. The method not only generates predictions with high AUC scores but also runs very fast. However, the number of known disease-related genes is far smaller than the number of unknown genes, which makes it very hard to detect novel predictions from such imbalanced training samples. To overcome this issue, a training sample re-balancing strategy is further introduced, using a two-step logistic regression and a random re-sampling method. The issue of imbalanced classification is circumvented by first randomly adding positive instances related to other cancers, and then excluding unrelated predictions according to the overall performance in the following step. Numerical experiments show that the proposed methods are able to identify disease-related genes with high predictive accuracy, which is very promising compared with existing methods.
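To illustrate the general idea of re-balancing imbalanced training samples before fitting a logistic regression, here is a minimal sketch. It is not the speaker's algorithm: it swaps in a generic random undersampling ensemble (drawing as many unlabeled genes as there are known positives in each round), it does not add positive instances from other cancers, and it has no second filtering step. All function names, the two-feature toy data, and the parameters (`rounds`, `lr`, `epochs`) are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a binary logistic regression by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def rebalanced_ensemble(positives, unlabeled, rounds=10, seed=0):
    """Train one model per round on a balanced random re-sample.

    Hypothetical helper: each round pairs all known positives with an
    equally sized random draw from the (much larger) unlabeled set.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(rounds):
        negatives = rng.sample(unlabeled, len(positives))
        X = positives + negatives
        y = [1] * len(positives) + [0] * len(negatives)
        models.append(fit_logistic(X, y))
    return models


def score(models, x):
    # Average the predicted probabilities over all re-sampled models.
    return sum(predict_proba(m, x) for m in models) / len(models)


# Toy data (assumed): 10 "known disease genes" vs. 200 unlabeled genes,
# each described by two made-up similarity features.
data_rng = random.Random(42)
positives = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)] for _ in range(10)]
unlabeled = [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(200)]

models = rebalanced_ensemble(positives, unlabeled)
high = score(models, [2.0, 2.0])  # candidate resembling known positives
low = score(models, [0.0, 0.0])   # candidate resembling the unlabeled bulk
print(high, low)
```

Averaging over several balanced re-samples keeps every known positive in each training set while still using many different unlabeled genes across rounds, which is one common way to keep the majority class from dominating the fit.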
Biography: Bolin Chen received his B.Sc. degree in statistical science in 2007 and his M.Sc. degree in operational research and cybernetics in 2010, both from Northwestern Polytechnical University, Xi’an, China. He received his Ph.D. degree in Biomedical Engineering in 2014 from the University of Saskatchewan (U of S), Saskatoon, Canada, under the supervision of Dr. Fang-Xiang Wu. He is currently an associate professor at the School of Computer Science, Northwestern Polytechnical University, Xi’an, China.
His research interests include computational and systems biology, proteomic data analysis, disease gene identification, and deep learning. He has published about 20 technical papers in refereed journals and conference proceedings, such as Briefings in Bioinformatics, Proteomics, BMC Medical Genomics, IEEE Transactions on NanoBioscience, and SCIENCE CHINA Life Sciences.