In 2003, in an issue of the Literary and Linguistic Computing journal, humanities computing scholar Geoffrey Rockwell asked the question, “What is text analysis, really?” More than ten years later, some Digital Humanities scholars are still asking the same question, especially as technological advances lead to the creation of new text analysis tools and methods. In its most basic form, text analysis – which is also known as text data mining or, simply, text mining – is the search for and discovery of patterns and trends in a corpus of texts. The analysis of those patterns and trends can help researchers uncover previously unseen characteristics of a specific corpus, deconstruct a text, and reveal new ideas and theories about a particular genre or author. The following annotated bibliography offers an overview of text mining tools in Digital Humanities, with the intention that it may serve as a starting point for further exploration into text analysis.
Argamon, Shlomo and Mark Olsen. “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis.” Digital Humanities Quarterly. 3.2 (2009). Web. 15 November 2014.
Argamon and Olsen suggest that the rapid digitization of texts requires new kinds of text analysis tools, because current tools may not scale effectively to large corpora and do not adequately leverage the capability of machines to recognize patterns. To test this idea, Argamon and Olsen, through the ARTFL Project, developed PhiloMine, a set of text analysis tools that extend PhiloLogic, the authors’ full-text search and analysis system. Argamon and Olsen provide an overview of PhiloMine’s tasks (predictive text mining, comparative text mining and clustering analysis), and then summarize three research papers that highlight the tasks’ strengths and weaknesses.
Borovsky, Zoe. “Text and Network Analysis Tools and Visualization.” NEH Summer Institute for Advanced Topics in Digital Humanities. Los Angeles, 22 June 2012. Presentation. Web. 15 November 2014.
This presentation by Borovsky, the Librarian for Digital Research and Scholarship at UCLA, provides an overview of text mining tools, with an in-depth look at a few specific tools: Gephi, Many Eyes, Voyant and Word Smith. Borovsky highlights some of the benefits and challenges of each tool, and offers examples of sample outcomes. Though the slides are presented without a transcript of Borovsky’s talk, the slides themselves offer a high-level overview of these four text mining tools, and Borovsky’s template makes it easy for readers to find relevant information about each tool.
Green, Harriett. “Under the Workbench: An analysis of the use and preservation of MONK text mining research software.” Literary and Linguistic Computing. 29.1 (2014): 23-40. Web. 15 November 2014.
To help further humanities scholars’ understanding of how to use text mining tools, Green conducted an analysis of the web-based text mining software MONK (Metadata Opens New Knowledge). Green studied a random sample of 18 months of analytics data from the MONK website and conducted interviews with MONK users to understand the purpose of the tool, its usability and the challenges encountered. Along with other findings, Green discovered that MONK is often used as a teaching tutorial and that it often provides an entry point for students and researchers learning about text analysis.
Muralidharan, Aditi and Marti A. Hearst. “Supporting exploratory text analysis in literature study.” Literary and Linguistic Computing. 28.2 (2013): 283-295. Web. 15 November 2014.
According to Muralidharan and Hearst, the majority of text analysis tools have focused on aiding interpretation, but there haven’t been many (if any) tools devoted to finding and revealing insights not previously known to the researcher. So Muralidharan and Hearst created WordSeer, a text analysis tool designed for literary texts and literary research questions. To illustrate the functionality of WordSeer, Muralidharan and Hearst used this text analysis tool to examine the differences in language between male and female characters in Shakespeare’s plays.
Ramsay, Stephen. “In Praise of Pattern.” Faculty Publications – Department of English. Digital Commons @ University of Nebraska-Lincoln: 2005. Web. 15 November 2014.
Ramsay sets out to explore the idea of pattern as a point of intersection between computational text analysis and the “interpretive landscape of literary studies.” Ramsay wanted to prove that there could be a computational tool that offered interpretive insight rather than specific facts or results. So he set out to create StageGraph, a tool designed ostensibly to study structural properties in Shakespeare’s plays, but one also stemming from a branch of mathematics known as graph theory.
Rockwell, Geoffrey. “TAPoR: Building a Portal for Text Analysis.” Mind Technologies: Humanities Computing and the Canadian Academic Community. Ed. Ray Siemens and David Moorman. University of Calgary Press: 2005. 285-299. Print.
In this chapter, Rockwell introduces readers to TAPoR, the Text Analysis Portal for Research. The TAPoR project began as a collaboration of researchers and projects and eventually proposed a network of labs and servers that would connect and aggregate the best text analysis tools, making them available to the larger academic community. Rockwell then explores TAPoR in more detail, offers an overview of the portal’s specific functions, and discusses the types of users the project envisions will use the tools available through the portal.
—. “What is Text Analysis, Really?” Literary and Linguistic Computing. 18.2 (2003): 209-219. Web. 15 November 2014.
In this article, Rockwell argues that text analysis becomes, in effect, an interpretive aid because it creates new hybrid versions of a text by deconstructing and reconstructing some original text. As a result, Rockwell stresses the need for new kinds of text analysis tools that emphasize experimentation over hypothesis testing. He concludes the paper with a proposal for a portal model for text analysis tools, using his own TAPoR as an example.
Simpson, John, Geoffrey Rockwell, Ryan Chartier, Stéfan Sinclair, Susan Brown, Amy Dyrbye, and Kirsten Uszkalo. “Text Mining Tools in the Humanities.” Journal of Digital Humanities. 2.3 (2013). Web. 15 November 2014.
Derived from an oral presentation at a research conference, Simpson et al.’s brief article and accompanying poster present the testing framework developed for the TAPoR text mining tool. The TAPoR testing framework was then used as a proposal for the creation of a systematic approach to testing and reviewing humanities research tools, especially text mining tools.
“Text Mining.” DiRT Digital Research Tools. n.p., n.d. Web. 15 November 2014.
The DiRT directory compiles information about digital research tools for scholarly and academic use. The directory is divided into several categories, with one category devoted to text mining tools. Users can narrow the category by platform (operating system), cost, whether or not the tool is open source, and more. Each individual entry includes a description of the tool as well as a link to the tool itself or its developer’s website. While the DiRT directory is an invaluable resource for text mining tools, one drawback is that the tools themselves are not rated in any way, either by the directory’s editorial board or by other users.
van Gemert, Jan. “Text Mining Tools on the Internet.” ISIS Technical Report Series. The University of Amsterdam: 2000. Web. 15 November 2014.
Van Gemert’s report is a thorough and comprehensive overview of text mining tools available on the Internet, though as it was published in 2000, it is now out of date. Still, this report offers a great deal of information about both specific text mining tools and the companies behind their creation. Van Gemert includes website links, summaries and information about available trial versions for each tool.
[Image note: text cloud created from the content of this post using Tagul, an online word cloud creator.]