A Digital Education

Meredith Dabek, Maynooth University

Category: AFF601 – Theory & Practice

Text Mining: An Annotated Bibliography

In 2003, in an issue of the Literary and Linguistic Computing journal, humanities computing scholar Geoffrey Rockwell asked the question, “What is text analysis, really?” More than ten years later, some digital humanists are still asking the same question, especially as technological advances lead to the creation of new text analysis tools and methods. In its most basic form, text analysis – which is also known as text data mining or, simply, text mining – is the search for and discovery of patterns and trends in a corpus of texts. The analysis of those patterns and trends can help researchers uncover previously unseen characteristics of a specific corpus, deconstruct a text, and reveal new ideas and theories about a particular genre or author. The following annotated bibliography offers an overview of text mining tools in Digital Humanities, with the intention that it may serve as a starting point for further exploration into text analysis.
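To make “patterns and trends in a corpus” a little more concrete, here is a minimal sketch of the simplest kind of text mining: counting which words appear most often across a handful of texts. It is not taken from any of the tools listed below; the function name, the tiny stop-word list and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real tools use much longer lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for"}


def word_frequencies(texts, top_n=5):
    """Return the top_n most frequent non-stop-words across a small corpus."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and treat runs of letters/apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Two made-up sentences standing in for a corpus.
corpus = [
    "Text mining is the search for patterns in a corpus of texts.",
    "Patterns and trends in texts can reveal new ideas.",
]
top_words = word_frequencies(corpus)
print(top_words)
```

Word clouds are typically built on counts like these, with more frequent words drawn larger. The dedicated tools in this bibliography go much further, of course, but the underlying idea of tallying features across a corpus is the same.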

Argamon, Shlomo and Mark Olsen. “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis.” Digital Humanities Quarterly. 3.2 (2009). Web. 15 November 2014.

In their article, Argamon and Olsen suggest that the rapid digitization of texts requires new kinds of text analysis tools, because the current tools may not scale effectively to large corpora and do not adequately leverage the capability of machines to recognize patterns. To test this idea, Argamon and Olsen, through the ARTFL Project, developed PhiloMine, a set of text analysis tools that extend PhiloLogic, the authors’ full-text search and analysis system. Argamon and Olsen provide an overview of PhiloMine’s tasks (predictive text mining, comparative text mining and clustering analysis), and then summarize three research papers that highlight the tasks’ strengths and weaknesses.

Borovsky, Zoe. “Text and Network Analysis Tools and Visualization.” NEH Summer Institute for Advanced Topics in Digital Humanities. Los Angeles, 22 June 2012. Presentation. Web. 15 November 2014.

This presentation by Borovsky, the Librarian for Digital Research and Scholarship at UCLA, provides an overview of text mining tools, with an in-depth look at a few specific tools: Gephi, Many Eyes, Voyant and Word Smith. Borovsky highlights some of the benefits and challenges of each tool, and offers examples of sample outcomes. Though the slides are presented without a transcript of Borovsky’s talk, the slides themselves provide a high-level overview of these four text mining tools, and Borovsky’s template makes it easy for readers to find relevant information about each tool.

Green, Harriett. “Under the Workbench: An analysis of the use and preservation of MONK text mining research software.” Literary and Linguistic Computing. 29.1 (2014): 23-40. Web. 15 November 2014.

To help further humanities scholars’ understanding of how to use text mining tools, Green conducted an analysis of the web-based text mining software MONK (Metadata Opens New Knowledge). Green studied a random sample of 18 months of analytics data from the MONK website and conducted interviews with MONK users to understand the purpose of the tool, its usability and the challenges encountered. Along with other findings, Green discovered that MONK is often used as a teaching tutorial and that it often provides an entry point for students and researchers learning about text analysis.

Muralidharan, Aditi and Marti A. Hearst. “Supporting exploratory text analysis in literature study.” Literary and Linguistic Computing. 28.2 (2013): 283-295. Web. 15 November 2014.

According to Muralidharan and Hearst, the majority of text analysis tools have focused on aiding interpretation, but there haven’t been many (if any) tools devoted to finding and revealing insights not previously known to the researcher. So Muralidharan and Hearst created WordSeer, a text analysis tool designed for literary texts and literary research questions. To illustrate the functionality of WordSeer, Muralidharan and Hearst used this text analysis tool to examine the differences in language between male and female characters in Shakespeare’s plays.

Ramsay, Stephen. “In Praise of Pattern.” Faculty Publications – Department of English. Digital Commons @ University of Nebraska-Lincoln: 2005. Web. 15 November 2014.

Ramsay sets out to explore the idea of pattern as a point of intersection between computational text analysis and the “interpretive landscape of literary studies.” Ramsay wanted to prove that there could be a computational tool that offered interpretive insight rather than specific facts or results. So he created StageGraph, a tool designed ostensibly to study structural properties in Shakespeare’s plays, but one also stemming from a branch of mathematics known as graph theory.

Rockwell, Geoffrey. “TAPoR: Building a Portal for Text Analysis.” Mind Technologies: Humanities Computing and the Canadian Academic Community. Ed. Ray Siemens and David Moorman. University of Calgary Press: 2005. 285-299. Print.

In this chapter, Rockwell introduces readers to the TAPoR – the Text Analysis Portal for Research. The TAPoR project began as a collaboration of researchers and projects and eventually proposed a network of labs and servers that would connect and aggregate the best text analysis tools, making them available to the larger academic community. Rockwell then explores TAPoR in more detail, offers an overview of the portal’s specific functions, and discusses the types of users the project envisions will use the tools available through the portal.

—. “What is Text Analysis, Really?” Literary and Linguistic Computing. 18.2 (2003): 209-219. Web. 15 November 2014.

In this article, Rockwell argues that text analysis becomes, in effect, an interpretive aid because it creates new hybrid versions of a text by deconstructing and reconstructing some original text. As a result, Rockwell stresses the need for new kinds of text analysis tools that emphasize experimentation over hypothesis testing. He concludes the paper with a proposal for a portal model for text analysis tools, using his own TAPoR as an example.

Simpson, John, Geoffrey Rockwell, Ryan Chartier, Stéfan Sinclair, Susan Brown, Amy Dyrbye, and Kirsten Uszkalo. “Text Mining Tools in the Humanities.” Journal of Digital Humanities. 2.3 (2013). Web. 15 November 2014.

Derived from an oral presentation at a research conference, Simpson et al.’s brief article and accompanying poster present the testing framework developed for the TAPoR text mining tool. The TAPoR testing framework was then used as a proposal for the creation of a systematic approach to testing and reviewing humanities research tools, especially text mining tools.

“Text Mining.” DiRT Digital Research Tools. n.p., n.d. Web. 15 November 2014.

The DiRT directory compiles information about digital research tools for scholarly and academic use. The directory is divided into several categories, with one category devoted to text mining tools. Users can narrow the category by platform (operating system), cost, whether or not the tool is open source, and more. Each individual entry includes a description of the tool as well as a link to the tool itself or its developer’s website. While the DiRT directory is an invaluable resource for finding text mining tools, one drawback is that the tools themselves are not rated in any way, either by the directory’s editorial board or by other users.

van Gemert, Jan. “Text Mining Tools on the Internet.” ISIS Technical Report Series. The University of Amsterdam: 2000. Web. 15 November 2014.

van Gemert’s report is a thorough and comprehensive overview of text mining tools available on the Internet, though as it was published in 2000, it is now out-of-date. Still, this report offers a great deal of information both about specific text mining tools and the companies behind their creation. Van Gemert includes website links, summaries and information about available trial versions for each tool.

[Image note: text cloud created from the content of this post using Tagul, an online word cloud creator.]

Access & Accessibility in Digital Humanities

This year, from October 20th to October 26th, humanities researchers will observe International Open Access Week, a global event designed to celebrate and promote the benefits of open access and to encourage open access as the standard for academic scholarship. The organizers behind International Open Access Week define open access as the “free, immediate, online access to the results of scholarly research, and the right to use and re-use those results as you need.” Many projects, journals and scholarly resources within Digital Humanities promote themselves as open access, and many digital humanists support an increased commitment to open access research.

There is, however, a key difference between providing access to Digital Humanities research, and making that research accessible to all. While access can refer to “the right or opportunity to use or benefit from something,” accessibility specifically refers to something “easily obtained or used,” particularly by individuals with a disability (emphasis mine). If Digital Humanities, as a field of study, intends to maintain and perhaps even advance its commitment to access, then digital humanists must also consider accessibility when creating their projects. Far too often, the needs of individuals with disabilities remain neglected in digital spaces. According to George H. Williams, Associate Professor of English at the University of South Carolina Upstate:

Many of the otherwise most valuable digital resources are useless for people who are – for example – deaf or hard of hearing, as well as for people who are blind, have low vision or have difficulty distinguishing particular colors.

Indeed, despite its widespread use across many demographic groups, the Internet is “inherently unfriendly to many different kinds of disabilities” (Lazar and Jaeger, 70).

Accessibility on the Web

The Web Accessibility Initiative (WAI), created by the World Wide Web Consortium (W3C), tracks how individuals with disabilities use the Internet and develops guidelines and resources to help ensure websites are accessible to everyone. In theory, the Internet is designed to improve communication by removing barriers and obstacles; in practice, however, when websites – or Digital Humanities projects – are badly designed, they can prevent a large subset of the population from accessing information. Furthermore, each individual has his or her own strengths, weaknesses, skills and abilities, all of which can affect how he or she uses the Internet. Digital projects that take a “one way fits all” approach limit their reach and impact when certain groups of people can’t use or access that project.

The WAI offers an overview of the diversity of abilities and disabilities, which can range from auditory, visual, cognitive or physical disabilities to age-related impairments, temporary or situational impairments and health conditions. Each disability may have its own set of barriers to accessibility, requiring different solutions or alternatives. An individual who is hard of hearing, for example, might find it difficult to follow audio content presented without captions, while someone with a cognitive disability might react poorly to lots of animation or moving images. Even the computer itself, with its traditional setup of a mouse and keyboard, can become an obstacle to a person with a lost limb or an injury that prevents use of his or her hands.

Why Does Accessibility Matter?

Accessibility should be an integral part of Digital Humanities projects, for a variety of reasons. Perhaps most obviously, there could be legal implications, since many countries have passed laws requiring web accessibility. Digital Humanities projects are also sometimes funded through federal grants and, as Williams points out, digital humanists may lose such funding if they cannot demonstrate accessibility and adherence to federal accessibility laws.

Additionally, despite the existence of accessibility laws, there is no central organization or group responsible for administering web and digital accessibility. In the United States, for example, there is no one government agency in charge of ensuring compliance with accessibility laws. According to Lazar and Jaeger, this haphazard approach places “the burden on people with disabilities to enforce their own rights” (76).

Of course, accessibility also helps expand the reach of a Digital Humanities project. By taking the needs of the greatest number of people into account when designing a project, digital humanists can ensure the largest audience for their work, which in turn could help further the research or provide new contexts and connections.

Ideas and Recommendations

Improving accessibility in Digital Humanities will require more than one solution, and should include collaboration between those with expertise and those ready to learn. It will also necessitate improved accessibility policies and laws, as well as the enforcement of those laws. Williams proposes a universal design approach, explaining that universal design “is design that involves conscious decisions about accessibility for all.” It’s also efficient, providing websites and digital projects with compatibility for multiple devices and platforms. This would allow a digital humanist to design and create a project just once, then easily adapt it for different audiences or devices.

The WAI also offers suggestions by highlighting some of the tools a disabled person might use to improve his or her Internet experience (for example, hardware or software meant to help bridge the gap between the individual and the website) and the strategies and techniques a person might develop to interact with non-accessible websites. These include voice recognition software to give commands, screen readers for those with poor vision, and alternatives to the keyboard and mouse (touch-screens, joysticks, etc).
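As one small illustration of what checking for accessibility can look like in practice, the sketch below scans a snippet of HTML for images that have no text alternative, which is what a screen reader relies on to describe an image. This is my own simplified example, not a WAI tool: the class name, function name and sample markup are hypothetical, and it treats an empty alt attribute as a problem even though an empty value can be appropriate for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that lacks a non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Flag images with no alt attribute or a blank one (a simplification).
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    """Return the src values of images without a text alternative."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


# Made-up markup standing in for a project page.
sample_page = """
<img src="score.png" alt="First page of a printed musical score">
<img src="menu.jpg">
<img src="cloud.png" alt="">
"""
flagged = images_missing_alt(sample_page)
print(flagged)
```

A check like this only catches one kind of barrier, and it cannot judge whether the description that is present is actually useful, so it is a starting point rather than a substitute for testing with real users and assistive tools.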

Certainly, one important step towards improved Digital Humanities accessibility is awareness within the field. A coalition of American universities and research centers is leading the charge for increased awareness with the Building an Accessible Future for the Humanities project. The Accessible Future partnership, supported in part by the US National Endowment for the Humanities, hosts a series of workshops exploring technologies, design standards and issues with digital projects, all tailored towards securing accessibility’s place in Digital Humanities.

Access has long been an integral part of Digital Humanities, grounded in the idea that digital projects should be available to as many people as possible. If Digital Humanities intends to continue its commitment to open access data and research, then accessibility – and specifically digital accessibility – must also become an integral part of the field. Designing accessible projects may require some rethinking and adjustments, but it won’t be as difficult as one might expect. Lazar and Jaeger remind us “the technical solutions for web accessibility already exist” (80). It’s simply a matter of being mindful of different abilities, considering accessibility issues and concerns from the start of each project, and ensuring that the information, in its many forms, is accessible to the widest possible audience.

Works Cited

“About.” International Open Access Week. Andrea Higginbotham, n.d. Web. 21 October 2014.

“Access.” The New Oxford American Dictionary. Version 2.2.1. 2011. Apple, Inc.

“Accessible.” The New Oxford American Dictionary. Version 2.2.1. 2011. Apple, Inc.

Accessible Future. Indiana University-Purdue University Indianapolis (IUPUI), 2014. Web. 20 October 2014.

“How People with Disabilities Use the Web.” Web Accessibility Initiative. W3C, 2013. Web. 20 October 2014.

Lazar, Jonathan and Paul Jaeger. “Reducing Barriers to Online Access for People with Disabilities.” Issues in Science and Technology. Winter 2011: 69-82. Web. 20 October 2014.

Williams, George H. “Disability, Universal Design, and the Digital Humanities.” Debates in the Digital Humanities. Ed. Matthew K. Gold. University of Minnesota Press, 2012. 202-212. Web. 20 October 2014.

Crowdsourcing in DH, Part 2

When Jeff Howe and Mark Robinson coined the term “crowdsourcing” back in 2005 in an article for Wired magazine, the term referred primarily to practices operated by for-profit businesses, particularly within the tech world, whereby a large group of contributors undertook a number of small, often routine and mundane tasks. Nearly 10 years later, crowdsourcing has changed and evolved to a point where, as with Digital Humanities, a standard, agreed-upon definition is difficult to find.

Stuart Dunn, a Digital Humanities lecturer at King’s College London, describes crowdsourcing as a “loaded term,” since the historical definition of the word connotes “the antithesis of what academia understands as public engagement and impact.” Yet, even with a variety of potential definitions and blurred boundaries for what might be considered a crowdsourced project, many Digital Humanities projects still rely on the term, if only because the larger population has developed a collective – if vague and overgeneralized – understanding of what “crowdsourcing” means.

As I mentioned earlier this week, my classmates and I recently presented on a number of crowdsourced projects. Listening to the other presentations and conducting my own research clearly revealed the depth and breadth of just what “the crowd” can accomplish. Below, I’ve shared a selection of some crowdsourced projects I found particularly interesting.

(There are, of course, many more examples than I’ve listed here. On my Links of Interest page, you can find a link to more DH crowdsourcing examples.)

  • What’s the Score at the Bodleian? – The Bodleian Library at Oxford University launched this project in collaboration with Zooniverse (a larger crowdsourcing project), to increase access to the library’s music collection and collection of printed musical scores. Volunteers transcribe the scores and add metadata tags to help categorize each score. The project initially attracted my attention as I’m a music fan and one-time musician myself, but further thought has me wondering: most online crowdsourcing projects are geared towards sighted volunteers – that is, volunteers need to be able to see something on a website. With What’s the Score?, there’s the potential for the Bodleian to add an audio component, allowing sight-impaired volunteers to offer tags or transcribe based on what they hear. Currently, the Bodleian does have some audio files uploaded, though these appear to be examples of the collection, rather than opportunities. I’d love to see the Bodleian – and other DH crowdsourcing projects – expand their accessibility so that more volunteers could contribute.
  • Reverse the Odds! – Another Zooniverse-affiliated program, Reverse the Odds! is a mobile game developed by Cancer Research UK. While the game is designed with bright colors and an easy-to-use interface, it also incorporates real cancer research data. By playing the game, participants help researchers recognize the patterns of various cancer cells, and those patterns, in turn, are used to find real solutions to cancer and cancer symptoms. There are other citizen science projects that have created games to further research; Reverse the Odds! is just one such example.
  • Tag! You’re It! and Freeze Tag! at the Brooklyn Museum – Though now retired, these two projects intertwined games with crowdsourcing in a new way. The Tag! game had volunteers providing collection tags to items in the Brooklyn Museum’s collections, with an interface that had volunteers “playing” against each other for points. The Freeze Tag! component then gave volunteers the ability to revise and correct others’ tags, ensuring a built-in verification and moderation process. The project was a success for the museum and the use of game names that referenced clear childhood memories (at least for those of us who played the school yard game Tag) no doubt helped draw more volunteers to the project.
  • What Was There – Finally, a project not associated with an academic or nonprofit institution. What Was There was created by Enlighten Ventures, LLC, a digital marketing agency. The platform invites participants to upload old photos of their local community, then tag those photos with location and year. Once uploaded, the photos can then be overlaid with Google Maps Street View, providing a real-time visual example of how cityscapes and landscapes have changed over time. According to the website, the project hopes to “weave together a photographic history of the world (or at least any place covered by Google Maps).” That’s a fine goal, but there’s also the potential for historians, architects, urban planners and conservationists to use the data gathered by the project for further research. Enlighten doesn’t (yet) mention what is done with the tags gathered, nor does it make them available to the public, but should they decide to open up the data, there are possibilities here.

Crowdsourcing in DH, Part 1

Earlier this afternoon, my classmates and I in the Digital Humanities Theory and Practice course gave brief presentations on various crowdsourced projects, most of which related to Digital Humanities and/or citizen science in some way. I’ll write more later this week on crowdsourcing in DH in general, but for now, a bit of information on my chosen project:

The What’s on the Menu? project at the New York Public Library launched in 2011 and aims to transcribe and geotag the library’s entire collection of restaurant menus (approximately 45,000 menus dating back to the 1840s, making it the largest menu collection in the world). The NYPL had some great early successes (its initial goal was reached within the first three months of the project’s launch) and while it seems to have stalled a bit since then, the data compiled by the project provides a fascinating look at America’s culinary and nutritional history.

Visit the project website for more information.


What is Digital Humanities Anyway?

To paraphrase Shakespeare, that is indeed the question.

It’s a question I heard quite often after informing family and friends I would be moving to Ireland to undertake a Digital Humanities degree. At the time, I usually described it as “the intersection of computing and technology with the humanities,” which, while technically correct, doesn’t fully capture the range and diversity in this field and its tools.

In April 2011, at the Defining Digital Humanities program at Columbia University, Dan Cohen (then Professor of History and Director of the Center for History and New Media at George Mason University, presently founding Executive Director of the Digital Public Library of America) presented his own definition of Digital Humanities:

Digital Humanities is the use of digital media and technology to advance the full range of thought and practice in the humanities, from the creation of scholarly resources to research on those resources to the communication of results to colleagues and students.

While there isn’t, as of yet, any one standard definition of Digital Humanities, I quite like Cohen’s definition for a few reasons. Coming from a communications background and having a great deal of interest in media, I appreciate his inclusion of “media and technology” (emphasis mine). Many digital humanists tend to focus on the computing technology aspects of Digital Humanities, for good reason, but I believe media (particularly digital and social media) have an equally important role to play. Cohen’s definition also emphasizes “the full range of thought and practice” in the humanities. Digital Humanities is not limited to one particular area of research; indeed, the diversity and broad reach of Digital Humanities projects are part of why it is difficult to define the field.

Most importantly, though, by specifically highlighting communication with colleagues and students, Cohen has, in my opinion, narrowed in on two of the most essential components of Digital Humanities. At its core, Digital Humanities is a collaborative process, much more so than any other humanities area. Ongoing communication and collaboration with other researchers and academics is what helps drive Digital Humanities forward, as does the continued education of the next generation of digital humanists, those who will build upon the foundation laid by present-day collaborations.

In that same speech, Cohen also refers to Digital Humanities as “a moving target.” It’s an apt description of a field in constant motion, evolving with each new project. Digital Humanities is a field that will continue to change, just as the technologies used now won’t be the same in five, 10 or 15 years. As a result, a standard definition might remain elusive.

Perhaps the best we can hope for is a definition of Digital Humanities as it stands now, with the understanding that any definition is a fluid idea bound to change. I plan to revisit my idea of a Digital Humanities definition towards the end of the semester and the end of the year. We’ll see how my ideas (and Cohen’s, too!) hold up over time.

© 2017 A Digital Education
