MALLET

Download Texts

The zip file contains each sermon in two formats: plain text and TEI/XML.

Topic Analysis With MALLET

MALLET is an open-source toolkit that requires minimal setup, but its output can appear rather opaque. For those interested in trying it out, the Programming Historian 2 provides a useful guide on how to get started, and a discussion of its usage is featured on the DiSC blog. In essence, MALLET breaks a group of texts into their individual words and then uses a probabilistic model to identify groups of words that frequently occur together. These groups of co-occurring words might indicate the presence of a theme or topic across the corpus. Its primary output includes a list of the top words for each topic and a table giving the percentage of each document attributed to each topic.
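To make the first step concrete, here is a minimal Python sketch of what "breaking a text into its individual words" looks like. It is not MALLET's own code: the regular expression, the tiny stopword list, and the two sample sentences are all our assumptions, chosen only for illustration. The point is that each document is reduced to a bag of word counts in which word order no longer matters.

```python
import re
from collections import Counter

# Tiny stopword list invented for this example; MALLET ships its own,
# much longer English list.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "was", "his"}


def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords,
    and count what remains. Word order is discarded."""
    # Assumed tokenizer: runs of the letters a-z. MALLET's default differs.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


# Two invented sentences standing in for sermons (not from the corpus).
docs = {
    "sermon_a": "The nation mourns, and the nation prays for peace.",
    "sermon_b": "Slavery was the sin of the nation.",
}

bags = {name: bag_of_words(text) for name, text in docs.items()}

for name, bag in bags.items():
    print(name, dict(bag))
```

A topic model then works only from counts like these, which is why it cannot tell whether a preacher used a word approvingly or critically.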

MALLET is best suited to analyzing huge sets of documents, situations in which it is impossible for an individual to read through all the texts. Working extensively with MALLET data may not be advisable for smaller collections, when time could be better spent reading the texts. As the table below demonstrates, the word groups do hint at some generic themes but tell us little about what the speakers actually thought of Lincoln.

It is important to be cautious in extracting meaning from the data. MALLET does not take into account the meanings of words, so it requires us to interpret what the groupings might indicate. Moreover, the probability-based model does not generate the exact same results from run to run.

To test how the MALLET data might relate to the themes of ‘slavery’ and ‘peace’ explored in the Voyant analysis, we looked at which documents had the highest compositional percentages of topics related to these words across three different runs of MALLET. Documents occurring more than once within the two categories are bolded:

To scrutinize the results more closely, the texts that have the keywords ‘slavery’ or ‘peace’ in their main topics in more than one execution of MALLET are listed in the following table:

How might the sermons in the ‘slavery’ keyword group substantively differ from those in the ‘peace’ keyword group? Do the MALLET topics provide a meaningful subset of sermons, or are the results just too random to decipher? In the case of small collections such as this, MALLET may provide an interesting avenue into comparing documents, but the only way to find out is to dive into the fun of reading them.
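For readers who want to repeat the cross-run comparison described above on their own output, the following Python sketch shows one way it might be done. Everything in it is an assumption for illustration: the function names are ours, the three "runs" are invented numbers rather than our actual MALLET results, and we assume the topic words and document percentages have already been read into dictionaries. Note that topic numbers are not comparable between runs, so each run's topics are matched by keyword rather than by number.

```python
from collections import Counter


def topics_with_keyword(topic_words, keyword):
    """Return the ids of topics whose top-word list contains the keyword."""
    return [tid for tid, words in topic_words.items() if keyword in words]


def top_documents(doc_topics, topic_ids, n=1):
    """Rank documents by their combined share of the given topics and
    return the n highest (ties broken alphabetically)."""
    scores = {
        doc: sum(shares.get(t, 0.0) for t in topic_ids)
        for doc, shares in doc_topics.items()
    }
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:n]


def recurring_documents(runs, keyword, n=1):
    """Return documents that rank in the top n for the keyword's topics
    in more than one run."""
    counts = Counter()
    for topic_words, doc_topics in runs:
        ids = topics_with_keyword(topic_words, keyword)
        if ids:  # skip runs where no topic features the keyword
            counts.update(top_documents(doc_topics, ids, n))
    return sorted(doc for doc, c in counts.items() if c > 1)


# Invented data: (top words per topic, topic shares per document) per run.
runs = [
    ({0: ["slavery", "nation", "war"], 1: ["peace", "god", "people"]},
     {"sermon_a": {0: 0.7, 1: 0.3}, "sermon_b": {0: 0.2, 1: 0.8},
      "sermon_c": {0: 0.5, 1: 0.5}}),
    ({0: ["peace", "lord", "death"], 1: ["slavery", "rebellion", "war"]},
     {"sermon_a": {0: 0.4, 1: 0.6}, "sermon_b": {0: 0.9, 1: 0.1},
      "sermon_c": {0: 0.7, 1: 0.3}}),
    ({0: ["god", "people", "president"], 1: ["slavery", "peace", "nation"]},
     {"sermon_a": {0: 0.3, 1: 0.7}, "sermon_b": {0: 0.6, 1: 0.4},
      "sermon_c": {0: 0.2, 1: 0.8}}),
]

for keyword in ("slavery", "peace"):
    print(keyword, recurring_documents(runs, keyword, n=1))
```

A document that surfaces for the same keyword in several runs is a more defensible candidate for close reading than one that appears only once, though the sketch says nothing about why it surfaced.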