Determining the Number of Latent Factors in Statistical Multi-Relational Learning.

Title: Determining the Number of Latent Factors in Statistical Multi-Relational Learning.
Publication Type: Journal Article
Year of Publication: 2019
Authors: Shi, Chengchun, Wenbin Lu, and Rui Song
Journal: J Mach Learn Res
Date Published: 2019

Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed a RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared to other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. The latent factors can be further used for solving relational learning tasks, such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Due to the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish its rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistencies when the number of relations is either bounded or diverges at a proper rate of the number of entities. Simulations and real data examples show that our proposed information criteria have good finite sample properties.
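
As a rough illustration of the model structure described in the abstract, the sketch below builds an s-dimensional latent vector per entity and scores every (relation, entity, entity) triple with a bilinear form. It is a minimal sketch under stated assumptions, not the authors' implementation: the logistic link, the names `rescal_logits` and `bernoulli_log_likelihood`, the array layout, and the toy sizes `n`, `K`, `s` are all chosen here for illustration.

```python
import numpy as np


def rescal_logits(A, R):
    """Bilinear RESCAL-style scores.

    A: (n, s) array, one s-dimensional latent vector per entity.
    R: (K, s, s) array, one s-by-s interaction matrix per relation.
    Returns a (K, n, n) array whose [k, i, j] entry is a_i^T R_k a_j.
    """
    return np.einsum("is,kst,jt->kij", A, R, A)


def bernoulli_log_likelihood(Y, A, R):
    """Log-likelihood of binary links Y (K, n, n) under an assumed logistic link."""
    logits = rescal_logits(A, R)
    # log P(y | logit) = y * logit - log(1 + exp(logit)), computed stably.
    return float(np.sum(Y * logits - np.logaddexp(0.0, logits)))


# Toy data (hypothetical sizes): n entities, K relations, s latent factors.
rng = np.random.default_rng(0)
n, K, s = 5, 2, 3
A = rng.normal(size=(n, s))
R = rng.normal(size=(K, s, s))

# Simulate binary links from the model itself.
probs = 1.0 / (1.0 + np.exp(-rescal_logits(A, R)))
Y = (rng.random((K, n, n)) < probs).astype(float)

print(bernoulli_log_likelihood(Y, A, R))
```

Choosing the number of latent factors then amounts to comparing fitted versions of this likelihood across candidate values of s, with a penalty for model size; the specific penalty terms are defined in the paper and are not reproduced here.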

Alternate Journal: J Mach Learn Res
Original Publication: Determining the number of latent factors in statistical multi-relational learning.
PubMed ID: 31983896
PubMed Central ID: PMC6980192
Grant List: P01 CA142538 / CA / NCI NIH HHS / United States