The first article of this series explains how to process free-text entries in survey comments so that a law firm or law department can use the tools of natural language processing (NLP). The second article explores how to gain insights about written comments in surveys from word clouds, associations, sentiment, and bigrams. In this third article, we focus on topic modeling, a popular tool for ferreting out elusive themes in text comments.
Latent Dirichlet Allocation (LDA) is an unsupervised machine learning algorithm for topic modeling, which means identifying sets of words that characterize topics that might not be obvious; they are “latent.” The algorithm uses a probability distribution, the Dirichlet distribution, to allocate topics to comments and words to topics: thus, Latent Dirichlet Allocation. Under the hood, LDA unleashes complex mathematics (Bayesian inference, typically by variational methods or Gibbs sampling, if you insist), which estimates those allocations from the word counts in each comment. Topic modeling saves a law firm, law department, or legal vendor from coding written responses by hand, a laborious undertaking that introduces bias and often overlooks aspects of the text as someone combs through it.
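To make the idea concrete, here is a minimal sketch of LDA in plain Python. It rests on several assumptions: it fits the model with collapsed Gibbs sampling (one common approach, not necessarily what any given package does), the function name `lda_gibbs` and its parameters are our own, and the six survey comments are invented purely for illustration. A real project would more likely rely on an established library.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized comments with collapsed Gibbs sampling (a sketch)."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)

    # Count tables: topics per comment, words per topic, total words per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by assigning every word in every comment to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the word's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1

                # Weight each topic by how common it is in this comment and
                # how strongly it favors this word; alpha and beta are the
                # Dirichlet smoothing parameters.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1

    # Smoothed estimates: topic mix per comment and word mix per topic.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        {
            word: (topic_word[k][word] + beta)
            / (topic_total[k] + vocab_size * beta)
            for word in vocab
        }
        for k in range(num_topics)
    ]
    return theta, phi


# Hypothetical, already-cleaned survey comments (invented for illustration).
comments = [
    "billing rates fees invoices billing",
    "fees invoices rates billing discounts",
    "invoices fees billing rates",
    "responsiveness communication updates calls",
    "communication updates responsiveness calls emails",
    "calls updates communication responsiveness",
]
docs = [comment.split() for comment in comments]

theta, phi = lda_gibbs(docs, num_topics=2)

for k, word_probs in enumerate(phi):
    top_words = sorted(word_probs, key=word_probs.get, reverse=True)[:3]
    print(f"Topic {k}: {', '.join(top_words)}")
```

Each row of `theta` gives the estimated share of each topic in one comment, and each entry of `phi` gives the probability of every word under one topic. The number of topics is a choice the analyst makes up front; the algorithm does not discover it.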