Inferring Disease Associations of the Long non-coding RNAs through Non-negative Matrix Factorization

Ashis Kumer Biswas, Mingon Kang, Dong-Chul Kim, Chris H.Q. Ding, Baoju Zhang, Xiaoyong Wu and Jean X. Gao


Long non-coding RNAs (lncRNAs) have been implicated in various biological processes and are linked to many dysregulations. Over the past decade, researchers have reported a large number of human disease associations for lncRNAs, both intergenic lncRNAs (lincRNAs) and non-intergenic lncRNAs. Thanks to the next-generation sequencing platform RNA-seq, researchers have also been able to quantify the expression profile of each lncRNA in human tissue samples. In this article, we adapted the Non-negative Matrix Factorization (NMF) method to develop a low-rank computational model that describes the existing knowledge about both non-intergenic and intergenic lncRNA-disease associations, represented as a two-dimensional association matrix, and that provides a way of ranking disease-causing lncRNAs. We proposed several NMF formulations for the problem and found that the sparsity-constrained NMF yielded the best model among them. By exploiting the inherent bi-clustering ability of the NMF models, we extracted several lncRNA groups and disease groups that possess biological significance. Moreover, we proposed an integrative NMF formulation that combines the coding gene and lincRNA disease association data with prior knowledge about relationship networks among the coding genes and lincRNAs and with RNA-seq expression profile data, in order to identify potential lincRNA-coding gene co-modules. With these co-modules we further enhanced the lincRNA-disease associations and untangled mysteries about the functional chemistry of the intergenic lncRNAs. Experimental results show the superiority of our proposed method over two state-of-the-art clustering algorithms: k-means and hierarchical clustering.
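
To make the idea in the abstract concrete, the following is a minimal sketch of a sparsity-penalized NMF applied to a toy lncRNA-disease association matrix. It is not the released source code, and the exact formulation in the paper may differ. The function names (`sparse_nmf`, `bicluster_labels`, `rank_rows_for_column`), the L1 penalty weight `l1`, the multiplicative update rule, and the toy matrix are all assumptions chosen for illustration.

```python
import numpy as np


def sparse_nmf(X, rank, l1=0.1, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (lncRNAs x diseases) as W @ H.

    Uses multiplicative updates for the objective
        0.5 * ||X - W H||_F^2 + l1 * (sum(W) + sum(H)),
    where the L1 term encourages sparse factors. All names and default
    values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ X) / (W.T @ W @ H + l1 + eps)
        W *= (X @ H.T) / (W @ H @ H.T + l1 + eps)
    return W, H


def bicluster_labels(W, H):
    """Assign each row (lncRNA) and column (disease) to its dominant factor."""
    return W.argmax(axis=1), H.argmax(axis=0)


def rank_rows_for_column(W, H, col):
    """Return row indices sorted by reconstructed score for one column (disease)."""
    scores = W @ H[:, col]
    return np.argsort(-scores)


# Hypothetical toy association matrix: 6 lncRNAs x 4 diseases, two clean blocks.
X = np.zeros((6, 4))
X[:3, :2] = 1.0
X[3:, 2:] = 1.0

W, H = sparse_nmf(X, rank=2)
row_labels, col_labels = bicluster_labels(W, H)
ranking = rank_rows_for_column(W, H, col=0)
```

In this sketch, the low-rank product `W @ H` plays the role of the association model, the dominant-factor assignment illustrates the bi-clustering of lncRNAs and diseases, and sorting the reconstructed scores for one disease column illustrates how candidate lncRNAs could be ranked.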


Inferring disease associations of the long non-coding RNAs through non-negative matrix factorization
Ashis Kumer Biswas, Mingon Kang, Dong-Chul Kim, Chris H. Q. Ding, Baoju Zhang, Xiaoyong Wu and Jean X. Gao
Network Modeling Analysis in Health Informatics and Bioinformatics (NetMAHIB) 4:1, 2015. [DOI][bib]


Source files

The source code is released under the MIT License; please acknowledge its use by citing the publication.

Data files (provided separately)