R/dist_from_corpus.R
dist_from_corpus.Rd
Calculates the Hellinger distance between each topic's token probabilities (betas) and the overall probability of each token in the corpus.
dist_from_corpus(topic_model, dtm_data)
a fitted topic model object from one of the following:
tm-class
a document-term matrix of token counts coercible to simple_triplet_matrix
A vector of distances with length equal to the number of topics in the fitted model
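The calculation described above can be sketched in a few lines of base R. This is a minimal illustration rather than the package's implementation: the helper name `hellinger_from_corpus` is hypothetical, `beta` is assumed to be a topics-by-vocabulary matrix of probabilities whose rows sum to 1, and `dtm` is assumed to be a dense documents-by-vocabulary count matrix with columns in the same order as `beta`.

```r
# Hypothetical helper for illustration; not the package's own code.
# beta: topics x vocabulary matrix of token probabilities (rows sum to 1)
# dtm:  documents x vocabulary matrix of token counts (dense, base R)
hellinger_from_corpus <- function(beta, dtm) {
  stopifnot(ncol(beta) == ncol(dtm))

  # Overall probability of each token across the whole corpus
  corpus_prob <- colSums(dtm) / sum(dtm)

  # Hellinger distance between two discrete distributions p and q:
  # sqrt(0.5 * sum((sqrt(p) - sqrt(q))^2)), which lies in [0, 1]
  apply(beta, 1, function(p) {
    sqrt(0.5 * sum((sqrt(p) - sqrt(corpus_prob))^2))
  })
}

# Toy data: 2 topics, 2 documents, 3 tokens
beta <- rbind(c(0.50, 0.50, 0.00),
              c(0.25, 0.25, 0.50))
dtm  <- rbind(c(2, 2, 0),
              c(1, 1, 2))

hellinger_from_corpus(beta, dtm)
```

The result has one distance per topic. A topic whose token distribution matches the corpus distribution gets a distance of 0, and larger values indicate topics that diverge more from the corpus as a whole. For a sparse `simple_triplet_matrix`, `slam::col_sums()` would likely replace `colSums()`.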
Jordan Boyd-Graber, David Mimno, and David Newman, 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, Florida.