Calculate the distance of each topic from the overall corpus token distribution

Calculates the Hellinger distance between each topic's token probabilities (betas) and the overall token probabilities in the corpus.
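
As a rough illustration of the calculation described above, the following base R sketch computes a Hellinger distance between each row of a topic-by-token probability matrix and the corpus-wide token distribution. The helper `hellinger()`, the toy `beta` and `dtm` matrices, and the 1/sqrt(2) normalization (which bounds the distance in [0, 1]) are assumptions made for illustration; this is not necessarily how dist_from_corpus is implemented internally.

```r
# Hypothetical helper: Hellinger distance between two discrete probability
# vectors p and q of equal length (normalized so the result lies in [0, 1]).
hellinger <- function(p, q) {
  sqrt(sum((sqrt(p) - sqrt(q))^2) / 2)
}

# Toy topic-token probabilities (made up): 2 topics x 4 tokens, rows sum to 1.
beta <- rbind(c(0.70, 0.10, 0.10, 0.10),
              c(0.25, 0.25, 0.25, 0.25))

# Toy document-term count matrix (made up): 3 documents x 4 tokens.
dtm <- rbind(c(2, 1, 0, 1),
             c(1, 1, 1, 1),
             c(0, 2, 1, 2))

# Overall probability of each token in the corpus.
corpus_prob <- colSums(dtm) / sum(dtm)

# One distance per topic: each row of beta compared with corpus_prob.
dists <- apply(beta, 1, hellinger, q = corpus_prob)
dists
```

A topic whose distance is close to 0 has a token distribution nearly identical to that of the corpus as a whole, while larger values indicate a topic that departs more from the corpus distribution.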

dist_from_corpus(topic_model, dtm_data)

Arguments

topic_model: a fitted topic model object from one of the following: tm-class
dtm_data: a document-term matrix of token counts coercible to simple_triplet_matrix

Value

A vector of distances with length equal to the number of topics in the fitted model

References

Jordan Boyd-Graber, David Mimno, and David Newman, 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, Florida.

Examples


# Using the example from the LDA function
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
dist_from_corpus(lda, AssociatedPress[1:20,])
#> [1] 0.4396003 0.4562296