A Digital Education

Meredith Dabek, Maynooth University

Category: text analysis

Take Two: Literature and DH

Recently, two intriguing articles from well-respected Digital Humanities scholars came through in my feed reader, and as they align quite nicely with my own interests in the intersection of technology and literature, I thought I’d share them here.

What is an @uthor? by Matthew Kirschenbaum

Writing for the LA Review of Books, Kirschenbaum (perhaps best known for his article “What Is Digital Humanities and What’s It Doing in English Departments?”) explores how the evolving landscape of social media and author engagement with audiences online is changing the nature of literary criticism and the very idea of authorship itself:

Today you cannot write seriously about contemporary literature without taking into account myriad channels and venues for online exchange. That in and of itself may seem uncontroversial, but I submit we have not yet fully grasped all of the ramifications. We might start by examining the extent to which social media and writers’ online presences or platforms are reinscribing the authority of authorship. The mere profusion of images of the celebrity author visually cohabitating the same embodied space as us, the abundance of first-person audio/visual documentation, the pressure on authors to self-mediate and self-promote their work through their individual online identities, and the impact of the kind of online interactions described above (those Woody Allenesque “wobbles”) have all changed the nature of authorial presence. Authorship, in short, has become a kind of media, algorithmically tractable and traceable and disseminated and distributed across the same networks and infrastructure carrying other kinds of previously differentiated cultural production.


There are Only Six Basic Book Plots 

In an article for Motherboard, contributing editor Ben Richmond interviewed Matthew Jockers (a proponent of textual analysis and author of Macroanalysis) about his algorithmic model for identifying archetypal plot shapes. According to his research, the results pointed to six basic plots about 90% of the time (with the remaining 10% indicating seven basic plots). While some of his data has not been released, Jockers did publish his tools on GitHub to encourage others to try the same experiment for themselves:

Most books that measure the number of plots seem aimed at writers and would-be writers, but Jockers’s work has implications for readers, librarians, and even literature snobs, or anyone who wants to put snobs in their places.

As he was charting plots, Jockers noticed that some genres that are derided for being “formulaic,” like romance, aren’t just relying on boy-meets-girl.

“Romance showed some proclivity for two of the six plot shapes, but it wasn’t an overwhelming case of all the plots falling into one,” Jockers said. “It was a much more evenly distributed from these six shapes.”
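The general idea behind plot-shape research like this can be tried in miniature: score each sentence of a narrative for sentiment, then smooth the resulting sequence into an arc. The sketch below is a toy Python illustration of that idea, not Jockers’s actual GitHub tools; the tiny lexicon and the moving-average smoothing are invented here purely for demonstration.

```python
# Toy plot-arc sketch: sentence-level sentiment, smoothed into a curve.
# The lexicon is a stand-in for the much larger ones real tools rely on.
TOY_LEXICON = {"love": 1, "joy": 1, "happy": 1, "win": 1,
               "death": -1, "grief": -1, "sad": -1, "loss": -1}

def sentence_sentiment(sentence):
    """Sum the lexicon scores of the words in one sentence."""
    return sum(TOY_LEXICON.get(w.strip(".,!?").lower(), 0)
               for w in sentence.split())

def plot_arc(text, window=3):
    """Score each sentence, then smooth with a moving average."""
    scores = [sentence_sentiment(s) for s in text.split(".") if s.strip()]
    if len(scores) < window:
        return scores
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]
```

Running `plot_arc` over a story that opens happily and ends in tragedy yields an arc that starts positive and falls negative; clustering many such arcs is, roughly, how a small number of archetypal shapes can emerge from thousands of novels.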

Text Mining: An Annotated Bibliography

In 2003, in an issue of the Literary and Linguistic Computing journal, humanities computing scholar Geoffrey Rockwell asked the question, “What is text analysis, really?” More than ten years later, some Digital Humanities scholars are still asking the same question, especially as technological advances lead to the creation of new text analysis tools and methods. In its most basic form, text analysis – also known as text data mining or, simply, text mining – is the search for and discovery of patterns and trends in a corpus of texts. The analysis of those patterns and trends can help researchers uncover previously unseen characteristics of a specific corpus, deconstruct a text, and reveal new ideas and theories about a particular genre or author. The following annotated bibliography offers an overview of text mining tools in Digital Humanities, with the intention that it may serve as a starting point for further exploration into text analysis.
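At its simplest, the pattern-finding described above begins with nothing more than counting terms across a corpus. As a minimal sketch (the corpus and the tokenization rule here are invented for illustration, not drawn from any of the tools below):

```python
from collections import Counter
import re

def term_frequencies(corpus):
    """Count word occurrences across a corpus of documents."""
    counts = Counter()
    for doc in corpus:
        # Lowercase and pull out alphabetic tokens (a deliberately crude rule).
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts

corpus = ["The cat sat on the mat.", "The dog barked."]
print(term_frequencies(corpus).most_common(3))
```

Tools like Voyant essentially layer visualization, filtering, and comparison on top of counts like these, which is why a frequency table is a useful mental model when evaluating the tools in the entries that follow.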

Argamon, Shlomo and Mark Olsen. “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis.” Digital Humanities Quarterly. 3.2 (2009). Web. 15 November 2014.

Argamon and Olsen suggest that the rapid digitization of texts requires new kinds of text analysis tools, because the current tools may not scale effectively to large corpora and do not adequately leverage the capability of machines to recognize patterns. To test this idea, Argamon and Olsen, through the ARTFL Project, developed PhiloMine, a set of text analysis tools that extend PhiloLogic, the authors’ full-text search and analysis system. Argamon and Olsen provide an overview of PhiloMine’s tasks (predictive text mining, comparative text mining and clustering analysis), and then summarize three research papers that highlight the tasks’ strengths and weaknesses.

Borovsky, Zoe. “Text and Network Analysis Tools and Visualization.” NEH Summer Institute for Advanced Topics in Digital Humanities. Los Angeles, 22 June 2012. Presentation. Web. 15 November 2014.

This presentation by Borovsky, the Librarian for Digital Research and Scholarship at UCLA, provides an overview of text mining tools, with an in-depth look at a few specific tools: Gephi, Many Eyes, Voyant and WordSmith. Borovsky highlights some of the benefits and challenges of each tool, and offers examples of sample outcomes. Though the slides are presented without a transcript of Borovsky’s talk, they offer a high-level overview of these four text mining tools, and Borovsky’s template easily allows readers to discover relevant information about each one.

Green, Harriett. “Under the Workbench: An analysis of the use and preservation of MONK text mining research software.” Literary and Linguistic Computing. 29.1 (2014): 23-40. Web. 15 November 2014.

To help further humanities scholars’ understanding of how to use text mining tools, Green conducted an analysis of the web-based text mining software MONK (Metadata Offer New Knowledge). Green studied a random sample of 18 months of analytics data from the MONK website and conducted interviews with MONK users to understand the purpose of the tool, its usability and the challenges users encountered. Along with other findings, Green discovered that MONK is often used as a teaching tutorial and that it often provides an entry point for students and researchers learning about text analysis.

Muralidharan, Aditi and Marti A. Hearst. “Supporting exploratory text analysis in literature study.” Literary and Linguistic Computing. 28.2 (2013): 283-295. Web. 15 November 2014.

According to Muralidharan and Hearst, the majority of text analysis tools have focused on aiding interpretation, but there haven’t been many (if any) tools devoted to finding and revealing insights not previously known to the researcher. So Muralidharan and Hearst created WordSeer, a text analysis tool designed for literary texts and literary research questions. To illustrate the functionality of WordSeer, Muralidharan and Hearst used this text analysis tool to examine the differences in language between male and female characters in Shakespeare’s plays.

Ramsay, Stephen. “In Praise of Pattern.” Faculty Publications – Department of English. Digital Commons @ University of Nebraska-Lincoln: 2005. Web. 15 November 2014.

Ramsay sets out to explore the idea of pattern as a point of intersection between computational text analysis and the “interpretive landscape of literary studies.” Ramsay wanted to prove that there could be a computational tool that offered interpretive insight rather than specific facts or results. So he set out to create StageGraph, a tool designed ostensibly to study structural properties in Shakespeare’s plays, but one also stemming from a branch of mathematics known as graph theory.

Rockwell, Geoffrey. “TAPoR: Building a Portal for Text Analysis.” Mind Technologies: Humanities Computing and the Canadian Academic Community. Ed. Ray Siemens and David Moorman. University of Calgary Press: 2005. 285-299. Print.

In this chapter, Rockwell introduces readers to TAPoR – the Text Analysis Portal for Research. The TAPoR project began as a collaboration of researchers and projects and eventually proposed a network of labs and servers that would connect and aggregate the best text analysis tools, making them available to the larger academic community. Rockwell then explores TAPoR in more detail, offers an overview of the portal’s specific functions, and discusses the types of users the project envisions for the tools available through the portal.

—. “What is Text Analysis, Really?” Literary and Linguistic Computing. 18.2 (2003): 209-219. Web. 15 November 2014.

In this article, Rockwell argues that text analysis becomes, in effect, an interpretive aid because it creates new hybrid versions of a text by deconstructing and reconstructing some original text. As a result, Rockwell stresses the need for new kinds of text analysis tools that emphasize experimentation over hypothesis testing. He concludes the paper with a proposal for a portal model for text analysis tools, using his own TAPoR as an example.

Simpson, John, Geoffrey Rockwell, Ryan Chartier, Stéfan Sinclair, Susan Brown, Amy Dyrbye, and Kirsten Uszkalo. “Text Mining Tools in the Humanities.” Journal of Digital Humanities. 2.3 (2013). Web. 15 November 2014.

Derived from an oral presentation at a research conference, Simpson et al.’s brief article and accompanying poster present the testing framework developed for the TAPoR text mining tool. The TAPoR testing framework was then used as the basis of a proposal for a systematic approach to testing and reviewing humanities research tools, especially text mining tools.

“Text Mining.” DiRT Digital Research Tools. n.p., n.d. Web. 15 November 2014.

The DiRT directory compiles information about digital research tools for scholarly and academic use. The directory is divided into several categories, with one category devoted to text mining tools. Users can narrow the category by platform (operating system), cost, whether or not the tool is open source, and more. Each individual entry includes a description of the tool as well as a link to the tool itself or its developer’s website. While the DiRT directory is an invaluable catalog of text mining tools, one drawback is that the tools themselves are not rated in any way, either by the directory’s editorial board or by other users.

van Gemert, Jan. “Text Mining Tools on the Internet.” ISIS Technical Report Series. The University of Amsterdam: 2000. Web. 15 November 2014.

Van Gemert’s report is a thorough and comprehensive overview of text mining tools available on the Internet, though as it was published in 2000, it is now out of date. Still, the report offers a great deal of information both about specific text mining tools and about the companies behind their creation. Van Gemert includes website links, summaries and information about available trial versions for each tool.

[Image note: text cloud created from the content of this post using Tagul, an online word cloud creator.]

© 2017 A Digital Education
