R/topic_coherence.R
topic_coherence.Rd
Calculate the coherence of each topic in a fitted model, using the N highest-probability tokens of that topic.
topic_coherence(topic_model, dtm_data, top_n_tokens = 10, smoothing_beta = 1)
topic_model: a fitted topic model object from one of the following:
tm-class
dtm_data: a document-term matrix of token counts coercible to simple_triplet_matrix
top_n_tokens: an integer giving the number of top tokens to consider per topic; the default is 10
smoothing_beta: a numeric value used to smooth the document frequencies in order to avoid taking the log of zero; the default is 1
A vector of topic coherence scores with length equal to the number of topics in the fitted model
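To make the score concrete, here is a minimal base R sketch of the coherence measure described by Mimno et al. (2011): for the ordered top tokens v_1, ..., v_M of a topic, sum log((D(v_m, v_l) + beta) / D(v_l)) over all pairs with l < m, where D counts the documents containing the token(s). It is an illustration under assumptions, not this function's actual implementation: `coherence_sketch`, the toy matrix, and the topic lists are hypothetical, the matrix is a dense base R matrix rather than a simple_triplet_matrix, and every top token is assumed to occur in at least one document.

```r
# Hypothetical helper (not part of the package): coherence of one topic.
# dtm:       dense document-term count matrix with named term columns
# top_terms: the topic's top tokens, ordered by decreasing probability
coherence_sketch <- function(dtm, top_terms, smoothing_beta = 1) {
  # 1 where a token occurs in a document, 0 otherwise
  present <- (dtm[, top_terms, drop = FALSE] > 0) * 1
  # Document frequency D(v) of each top token
  doc_freq <- colSums(present)
  # Co-document frequency D(v, v'): documents containing both tokens
  co_doc_freq <- crossprod(present)

  score <- 0
  for (m in seq_along(top_terms)[-1]) {
    for (l in seq_len(m - 1)) {
      score <- score +
        log((co_doc_freq[m, l] + smoothing_beta) / doc_freq[[l]])
    }
  }
  score
}

# Toy data: 4 documents, 3 terms (made up for illustration)
dtm <- matrix(
  c(1, 1, 0,
    2, 0, 1,
    0, 3, 1,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("cat", "dog", "fish"))
)

# One score per topic, mirroring the shape of the documented return value
top_terms_by_topic <- list(
  topic1 = c("cat", "dog", "fish"),
  topic2 = c("fish", "dog", "cat")
)
scores <- vapply(top_terms_by_topic, coherence_sketch, numeric(1), dtm = dtm)
```

Scores are at most zero when smoothing_beta is 1 and every token appears in some document; values closer to zero indicate that a topic's top tokens tend to occur in the same documents.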
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011, July). "Optimizing semantic coherence in topic models." In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 262-272). Association for Computational Linguistics.
McCallum, A. K. (2002). "MALLET: A Machine Learning for Language Toolkit." https://mallet.cs.umass.edu