Generate a dataframe containing the diagnostics for each topic in a topic model
top_n_tokens = 10,
method = c("gamma_threshold", "largest_gamma"),
gamma_threshold = 0.2
a fitted topic model object from one of the following:
a document-term matrix of token counts coercible to slam_triplet_matrix
where each row is a document, each column is a token,
and each entry is the frequency of the token in a given document
an integer indicating the number of top words to consider for mean token length
a string indicating which method to use - "gamma_threshold" or "largest_gamma"
a number between 0 and 1 indicating the gamma threshold to use with the "gamma_threshold" method; the default is 0.2
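To illustrate how the two method options can differ, here is a minimal base R sketch on a made-up document-topic (gamma) matrix. It assumes that "gamma_threshold" counts the documents in which a topic's gamma exceeds the threshold, and that "largest_gamma" counts the documents in which the topic has the highest gamma. The matrix values and the helper name count_prominent_docs are hypothetical and are not part of the package.

```r
# Toy gamma matrix: 4 documents (rows) x 2 topics (columns); each row sums to 1.
gamma <- matrix(
  c(0.90, 0.10,
    0.55, 0.45,
    0.15, 0.85,
    0.70, 0.30),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("topic", 1:2))
)

# Hypothetical helper, for illustration only (not the package's implementation).
count_prominent_docs <- function(gamma,
                                 method = c("gamma_threshold", "largest_gamma"),
                                 gamma_threshold = 0.2) {
  method <- match.arg(method)
  if (method == "gamma_threshold") {
    # A document counts for every topic whose gamma exceeds the threshold.
    colSums(gamma > gamma_threshold)
  } else {
    # A document counts only for its single highest-gamma topic.
    top_topic <- max.col(gamma, ties.method = "first")
    tabulate(top_topic, nbins = ncol(gamma))
  }
}

by_threshold <- count_prominent_docs(gamma, "gamma_threshold")
by_largest <- count_prominent_docs(gamma, "largest_gamma")
```

Under the threshold assumption a single document can count towards several topics, so the counts may add up to more than the number of documents; under the largest-gamma assumption each document counts exactly once.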
A dataframe where each row is a topic and each column contains the associated diagnostic values
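As a rough illustration of how a column such as mean_token_length could depend on top_n_tokens, the base R sketch below averages the character counts of each topic's first few tokens. The token lists are made up, and the sketch assumes they are already sorted from most to least probable within each topic; it is not the package's own code.

```r
# Made-up top tokens for two topics (not taken from any fitted model).
top_tokens <- list(
  topic1 = c("police", "court", "state", "trial"),
  topic2 = c("new", "year", "bank", "percent")
)
top_n_tokens <- 3

# Average number of characters across each topic's first top_n_tokens tokens.
mean_token_length <- sapply(
  top_tokens,
  function(tokens) mean(nchar(head(tokens, top_n_tokens)))
)
```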
Jordan Boyd-Graber, David Mimno, and David Newman, 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, Florida.
# Using the example from the LDA function
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
topic_diagnostics(lda, AssociatedPress[1:20,])
#>   topic_num topic_size mean_token_length dist_from_corpus tf_df_dist
#> 1         1   5327.414               5.8        0.4044253   5.450792
#> 2         2   5145.586               4.9        0.5072421   4.777808
#>   doc_prominence topic_coherence topic_exclusivity
#> 1             11       -29.42471          9.734567
#> 2              9       -29.32065          9.777371
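Because the result is an ordinary data frame with one row per topic, standard base R tools can be used to rank or subset topics. The sketch below rebuilds the printed result by hand, so that it runs without fitting a model, and then orders the topics by topic_coherence; the object names diagnostics and ranked are illustrative only.

```r
# Diagnostic values copied from the printed output above.
diagnostics <- data.frame(
  topic_num = c(1, 2),
  topic_size = c(5327.414, 5145.586),
  mean_token_length = c(5.8, 4.9),
  dist_from_corpus = c(0.4044253, 0.5072421),
  tf_df_dist = c(5.450792, 4.777808),
  doc_prominence = c(11, 9),
  topic_coherence = c(-29.42471, -29.32065),
  topic_exclusivity = c(9.734567, 9.777371)
)

# Order topics from the largest to the smallest coherence value.
ranked <- diagnostics[order(diagnostics$topic_coherence, decreasing = TRUE), ]

# Keep only the columns needed for a quick side-by-side comparison.
ranked[, c("topic_num", "topic_coherence", "topic_exclusivity")]
```

In practice the same steps would be applied directly to the value returned by topic_diagnostics() rather than to a hand-built copy.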