Creating a Digital Scholarly Edition: Lessons from The Woodman Diary Project


In a previous blog post, I wrote about the Woodman Diary project, in which a group of students (myself included) enrolled in AFF606a (Digital Scholarly Editing) are creating a digital edition of a First World War diary under the guidance of Professor Susan Schreibman. The project, which began in earnest in January 2015, is now entering its final weeks. Though the creation of each digital scholarly edition may differ depending on the project team, the timeframe, resources, or other aspects, there are some general lessons we can draw from the Woodman Diary project that may prove helpful for future work.

Teamwork, Communication and Project Management

A digital scholarly edition such as the Woodman Diary has many parts. There is the text itself, its transcription and digital images. There are the technical aspects, such as the XML/TEI encoded files and the XSLT used to transform the XML. Digital editions often include extensive contextual and historical information, and there might also be design considerations for the final website. With each part moving forward at its own rate during the project timeframe, teamwork and communication between team members have been vital to the Woodman Diary’s progress.

At the start of the project, we established clear goals and a clear division of labor by having each team member assume responsibility for one part of the digital scholarly edition. Doing so allowed us to set clearly established communication avenues; questions about the annotations, for example, are directed to Noel, whereas Josh handles any issues with the design composites. By assigning one person to take charge of a specific piece of the project, we are striving to eliminate any confusion or cross-purpose tasks.

The division of labor also contributes to effective teamwork within the project. Given the scope and timeframe of this project, it simply is not possible to complete the necessary work without each team member contributing to the whole. Moreover, allowing each team member to oversee his or her own area of responsibility helps ensure the continued progress of the project, by separating a seemingly daunting task into manageable pieces.

At the same time, however, the appointment of a project manager and the work he does are absolutely essential to ensuring the project advances as intended. Project managers offer structure and a foundational grounding to a specific project, enabling team members to work together to accomplish defined goals. As the Project Management Institute (PMI) states on its website, “hope is not a strategy.” We could not have crossed our fingers and anticipated a positive outcome. Consequently, having a project manager provides the structure needed to complete the project. When individuals come together as a team to create something, whether it is a digital scholarly edition, a new software program or the construction of a building, they need a strong, solid plan and a leader who can guide the process from start to finish.

While the process of creating a digital scholarly edition such as the Woodman Diary is the result of the collective efforts of the entire team, ceding overall management and oversight of the project to one person is important for success. Woodman Diary team member Shane McGarry serves as the project manager, and his expertise and previous experience in such a role have proven invaluable. Throughout the last several months, Shane has kept us focused on our long-term goals and deadlines, acted as the primary contact between the project team and Professor Schreibman, and shepherded the project from its beginnings to this final month. He also ensures we adhere to good project management principles by establishing clear communication processes.

Our team meets in person every week for progress meetings, uses project management tools such as Google Drive, Google Groups, and Jira, and keeps a shared Google Calendar to highlight any personal commitments that might interfere with deadlines. These practices enable us to communicate effectively amongst ourselves, whether it is simply to check in or to crowdsource ideas for a particular aspect of the digital scholarly edition.

Effective team communication, though, is more than simply staying in touch. Clear, consistent communication can help identify potential risks before they become problems, determine which areas of the project might need more attention, or reallocate resources based on progress reports. Indeed, project teams that communicate well are more likely to be successful. According to a 2013 report from the Project Management Institute, projects with highly effective communication plans were more likely to meet their original goals (80%, versus 52% of projects with minimal communication) and more likely to be completed on time (71%, versus 37%). With so many different parts to the Woodman Diary project, its ultimate success will be due, in large part, to our team’s ability to communicate well.

Know what the project is – and what it isn’t

Good communication can also mean listening, especially to those who have relevant knowledge. Last month, our team had the opportunity to speak with Gordon O’Sullivan, a former student at Trinity College Dublin who served as the project manager for another digital scholarly edition, the Mary Martin Diary project. Gordon offered a wealth of advice and feedback, but his most valuable piece of guidance was this: know what your project is – and know what your project isn’t.

Scope creep – the unplanned or continuous expansion or extension of a project’s scope – is the bane of many project managers (“Scope Creep”). Particularly in a group environment, when ideas are flowing and creativity peaks, it is easy to get carried away with grandiose visions and “wish list” items. But such ideas often don’t come with the necessary corresponding adjustments in time, resources and/or money. Moreover, many scope creep ideas are often “nice to have” elements in the project, but are not essential components for its completion.

Albert Woodman’s diary contains multiple inserted maps and newspaper clippings, referencing various campaigns and attacks during the war. Additionally, he mentions several towns and cities throughout his entries, which are encoded with a <placeName> TEI tag. In trying to determine how best to include the maps and the references to specific places in the project, we have considered using geo-referencing software to create dynamic images comparing Woodman’s geographic references with present-day Google Earth (see the sample image below).

[Figure: Example of Geo-referenced Map]

Ultimately, though, the geo-referenced maps are an example of scope creep. Their inclusion in the project would be interesting and informative, but the time involved in their creation (as well as the time needed to learn the specific geo-referencing software) shifts attention away from the project’s core components, especially at this critical time in our schedule. Gordon’s advice reminds us to focus on our original project plan. For now, geo-referenced maps do not fit within the scope of what our project is. Rather than attempting too much, we can instead concentrate on completing and refining our initial objectives and goals.


Though the Woodman Diary project may be unique with regard to its purpose, goals and final result, the lessons my teammates and I have learned throughout the process can be useful and applicable for other digital scholarly edition (DSE) projects. From the appointment of a project manager to minimizing scope creep, the example set by our project team will hopefully prove beneficial for future DSE projects.

Helping IMMA Plan for the Future

This term, as part of the MA program, I am completing a practicum with IMMA, the Irish Museum of Modern Art. The practicum provides me with hands-on, real-world experience working on a digital humanities project and helps the host institution find solutions to a unique problem or challenge.


As my Bringing Irish Artists Closer practicum with IMMA (the Irish Museum of Modern Art) enters its final month, my focus has shifted to the web-based application prototype. While the prototype for this project is initially intended to complement the Gerda Frömel exhibition by presenting information and context on Frömel and her art, it also needs to be flexible and adaptable enough to accommodate future artists featured in the series. As a result, many of my conversations with IMMA supervisor Aoife Flynn and exhibition curator Sean Kissane have carefully considered the need to plan for the future and preserve the prototype’s structure and foundation for the exhibitions still to come.

Of course, it can be challenging to envision all possible future scenarios when designing and creating a website prototype, but keeping the expected uses of the prototype in mind can be helpful in guiding the decision-making process. The future of the web application was foremost in my mind when I opted to create the prototype using the WordPress platform. In addition to being a platform with which I am already familiar and proficient, WordPress is free and open-source, with both blogging and content management tools. Most importantly, however, since the project’s timeline and scope didn’t allow for building the prototype from scratch, WordPress offers specific options that will help ensure the application’s preservation.


IMMA’s current website, in terms of both design and content management system (CMS), is more than ten years old. According to Aoife Flynn, IMMA’s Public Relations Executive, the result is a website that “is not extendable and has become costly to use” and update (“Re-Imagining IMMA Online”). Accordingly, IMMA is in the process of planning and designing a new website, with the goal of launching it within the next few years, and therefore it was not practicable or possible to use IMMA’s CMS for this practicum project. WordPress has an extensive library of site themes, which offer opportunities to create a prototype with a clean, streamlined design and structure that can easily adjust to and merge with IMMA’s new website when it does debut.

Such a merger is possible because of WordPress’ functionality and the ease with which a website or blog can migrate to another domain or server. For the purposes of this practicum, the prototype will be created on WordPress’ “.com” platform, with access shared between me and IMMA. WordPress will host this initial version of the prototype free of charge until the new main website is live. At that time, WordPress’ capabilities will provide IMMA with several options. The museum’s staff might choose to change the prototype’s URL, directing it towards IMMA’s new website, while leaving the content and structure in place, or IMMA might choose to export the entire prototype in XML format for implementation on IMMA’s new content management system. Both options preserve the original content of the application prototype while giving IMMA the greatest amount of flexibility in deciding how to incorporate the application with its new web presence.

Private Pages

Another key consideration for the future of the IMMA prototype is designing and creating an application versatile enough to accommodate multiple artists. Though the initial prototype will focus on Gerda Frömel, it is IMMA’s intention to use the digital application for the whole of the Modern Masters Series, which will feature a variety of artists. As one might expect, each artist has his or her own influences, affinities, media and practices. Throughout her career, for example, Frömel studied metalwork and sculpture, created devotional objects for Christian churches (such as stained-glass windows), exhibited both small-scale bronze castings and pencil drawings, and designed and produced a large, stainless steel public sculpture on commission. In contrast, Irish artist Patrick Hennessy focused solely on painting still life, landscapes and portraits, while Barrie Cooke was an abstract expressionist painter who also created mixed media pieces.

Given the wide diversity and variety of contemporary artists in Ireland, it would be quite difficult to create a “one-size-fits-all” application. Instead, in consultation with Aoife and Sean, I’ve structured the prototype with a few high-level categories that can then be divided further into sub-categories more specific to each artist. In order to maintain the clean, streamlined design and navigation, these sub-categories will be constructed as private pages in WordPress.

One of the benefits of a web-based application comes from (relatively) unlimited real estate on the Internet. WordPress allows users to create as many pages as needed and, most importantly for this project, WordPress offers the option of setting pages as “private.” Private pages do not show up on a website or application’s navigation menu, in RSS feeds or in search engine results. These pages are only accessible through the administrative console, by site editors and administrators. Thus, the WordPress prototype can host multiple pages representing the various sub-categories for each individual artist in the Modern Masters Series. These pages can then be turned “on” or “off” depending on IMMA’s needs for the application at any given time. The overall structure of the prototype will remain the same, but IMMA will retain maximum flexibility over its content, allowing the museum to use the application beyond its initial intended implementation.


In creating a digital resource for IMMA and its Modern Masters Series, I have given careful consideration to the future needs and uses of the application, particularly when choosing a web-publishing platform with which to build the prototype. With the planned new website and a diverse range of artists featured in the series offering unique challenges, WordPress provides appropriate options and solutions to IMMA’s needs. The result will be an application with built-in flexibility to ensure the continued use of a valuable digital art resource.

Encoding Choices in the Woodman Diary Project

TEI and Diplomatic Editions

Developed and first released in 1990, the Text Encoding Initiative (TEI) Guidelines are a method of text encoding that allows both computers and humans to read and understand texts, independent of any specific operating system. The Guidelines, which are expressed in the Extensible Markup Language (XML), provide scholars with pre-defined markup tags and elements to establish the structure of a particular text. The complete set of the Guidelines comprises nearly 500 elements, which digital humanists use to indicate what a text is, rather than how it should look or act.

TEI files have two parts: (1) a header, which includes information about the text, such as its title, author, publisher, and other bibliographic items; and (2) the body or text section, which contains the encoding of the actual text. All of the TEI tags and elements are organized into one of these two parts (“Introducing”). In addition to common structural elements such as paragraphs (<p>) and verse lines (<l>), the TEI Guidelines also include tags that allow encoders to communicate editorial choices (<choice>), record deletions and additions in the original text (<del> and <add>), and flag readings that cannot be transcribed with certainty (<unclear>). These tags are often used when scholars seek to create a diplomatic edition, a version of an original text which attempts to accurately reproduce any significant features, including spelling, abbreviations, deletions and other alterations (Pierazzo).
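To make the two-part structure concrete, here is a minimal sketch in Python that parses a tiny TEI document and reports its top-level parts. The sample document is invented for illustration and is not taken from the Woodman Diary files; the function name `describe` is likewise my own. The only outside fact it relies on is the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

# The standard TEI namespace; everything else below is a made-up example.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed TEI document (not a project file).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample diary entry</title></titleStmt>
      <publicationStmt><p>Unpublished illustration</p></publicationStmt>
      <sourceDesc><p>Invented for this example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>A first paragraph of transcribed text.</p>
    </body>
  </text>
</TEI>"""


def describe(tei_xml):
    """Summarize the header and text sections of a TEI document."""
    root = ET.fromstring(tei_xml)
    title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    paragraphs = root.findall("tei:text/tei:body/tei:p", TEI_NS)
    return {
        # Strip the "{namespace}" prefix ElementTree adds to each tag name.
        "parts": [child.tag.split("}")[1] for child in root],
        "title": title.text,
        "paragraphs": [p.text for p in paragraphs],
    }


summary = describe(SAMPLE)
print(summary)
```

The header carries the bibliographic description, while everything a reader would recognize as the diary itself sits under the text section.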

Diplomatic editions can range in their adherence to accuracy, from those considered ultra-diplomatic or strictly diplomatic “in which every feature which may reasonably be reproduced…is retained” to editions that feature normalized texts, created with readability in mind (Driscoll). Many scholarly editions fall somewhere in the middle, with an emphasis on a “semi-diplomatic” edition that retains some of the original text’s features, but not all. Such is the case here at Maynooth University, where a group of students enrolled in the Digital Scholarly Editing module are using TEI to encode and create a digital edition of the Woodman Diary.

The Woodman Diary Project

In 1918, Albert “Bert” Woodman was a soldier in the “L” Signal Company of the Royal Engineers, stationed in Dunkirk, France during World War I. After marrying his sweetheart, Nellie, Bert started to keep a diary of his experiences, intending to share it with Nellie when he returned home. Bert’s handwritten entries, starting in January 1918 and continuing until just after Armistice Day in November, fill the fronts and backs of nearly every page in the diary and span two physical journals, known by their respective brand names, Wilson and Butterfly.

Many historical documents, like Woodman’s diary, present unique challenges and opportunities for text encoders. Aside from understanding and transcribing an individual’s specific handwriting style, encoders may also encounter faint or faded writing, ink spills that obscure words, scribbles, and cross-outs. Consequently, text encoders (in this case, the students in the module) must make careful editorial choices regarding the level of accuracy encoded in TEI.

Though the Woodman Diary Project is not a strict or ultra-diplomatic edition, the project team did decide to encode a handful of features often present in diplomatic editions, such as unclear words, additions and deletions, and abbreviated words. These tags and elements not only help preserve Bert’s idiosyncrasies, but they also allow readers in the general public or academic researchers to understand more about the diary and the circumstances under which it was written.

As often happens with handwritten documents, the Woodman diary contains a number of struck out words, phrases and letters, perhaps because Bert misspelled something or incorrectly recorded a number or name. These deletions are frequently accompanied by additions, either above or next to the original text. To accurately represent these features of the text, the Woodman Diary Project team used TEI’s <del> and <add> tags. Additionally, the attribute @rend gave team members the ability to indicate further characteristics, such as the position of the addition (e.g., above, over-written, next to) and even the very nature of the deletion (e.g., scribble, strikethrough, etc.):

… when <add rend="overwritten">the<del>J</del></add> the Union Jack comes along …

(Woodman, 5 March 1918)
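As a rough illustration of why these tags matter downstream, the sketch below shows how a script might produce two views of the same encoded passage: a reading view that silently drops deletions, and a more diplomatic view that shows them. The fragment is invented (it is not a line from the diary), the bracket notation is my own shorthand, and `render` is a hypothetical helper rather than anything from the project’s XSLT.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only; not a transcription from the diary.
FRAGMENT = (
    '<p>arrived at <del rend="strikethrough">Calias</del>'
    '<add rend="above">Calais</add> this morning</p>'
)


def render(elem, show_deletions=False):
    """Return the text of elem, treating <del> and <add> children specially."""
    parts = [elem.text or ""]
    for child in elem:
        inner = render(child, show_deletions)
        if child.tag == "del":
            # Deleted text is dropped in the reading view, bracketed otherwise.
            parts.append(f"[-{inner}-]" if show_deletions else "")
        elif child.tag == "add":
            where = child.get("rend", "unspecified")
            parts.append(f"[+{inner} ({where})+]" if show_deletions else inner)
        else:
            parts.append(inner)
        # Text that follows the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


paragraph = ET.fromstring(FRAGMENT)
reading = render(paragraph)
diplomatic = render(paragraph, show_deletions=True)
print(reading)
print(diplomatic)
```

Because the deletion and addition are recorded rather than silently corrected, both views can be generated from one encoded file.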

Occasionally, there were words the project team was unable to decipher with absolute certainty. Although the team had access to high-quality, high-resolution digital images of the physical diary, some words remain illegible, even if the proposed word does make sense within the context of the entry. In these cases, TEI’s <unclear> tag is used to contain “a word, phrase or passage which cannot be transcribed with certainty because it is illegible…in the source [document]” (“Elements Available”). The tag signals to readers and researchers that there is still some doubt regarding the transcription of a word or phrase:

I’ll start another as soon as I can get the price of one <unclear>more</unclear>!!!

(Woodman, 8 July 1918)

When scholarly editions want to retain some diplomatic edition features, text encoders may offer the option of switching between the original text and an edited version. TEI’s <choice> tag allows for this, giving encoders the ability to “switch automatically between one ‘view’ of a text and another,” and therefore providing readers and researchers with insight into the encoder’s editorial choices (“Elements Available”). For the Woodman Diary Project, team members used the <choice> tag to contain abbreviations (<abbr>) and expansions (<expan>). Bert seems to have favored economy, given his use of every page available to him in his notebooks, and he also frequently abbreviated Standard English words and phrases in a likely attempt to save precious writing space:

Don’t get any <choice><abbr>ltrs</abbr><expan>letters</expan></choice> at all today

(Woodman, 4 Feb 1918)
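The “switching” that <choice> enables can be sketched in a few lines of Python. The fragment below reuses the abbreviation from the 4 February entry, wrapped in a <p> element so that it parses on its own; the `render` function and its `prefer` parameter are hypothetical names of mine, and the project itself handles this step with XSLT rather than Python.

```python
import xml.etree.ElementTree as ET

# The diary's abbreviation example, wrapped in <p> so it is well-formed XML.
FRAGMENT = (
    "<p>Don't get any "
    "<choice><abbr>ltrs</abbr><expan>letters</expan></choice>"
    " at all today</p>"
)


def render(elem, prefer="expan"):
    """Return elem's text, picking one alternative inside each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            picked = child.find(prefer)
            # Fall back to the first alternative if the preferred one is absent.
            if picked is None and len(child) > 0:
                picked = child[0]
            if picked is not None:
                parts.append(render(picked, prefer))
        else:
            parts.append(render(child, prefer))
        # Text that follows the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


paragraph = ET.fromstring(FRAGMENT)
as_written = render(paragraph, prefer="abbr")
expanded = render(paragraph, prefer="expan")
print(as_written)
print(expanded)
```

Passing `prefer="abbr"` reproduces what Bert wrote, while `prefer="expan"` gives the editorially expanded reading, so a single encoded file can serve both kinds of reader.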

As demonstrated by the examples above from the Woodman diary, the TEI tags for encoding editorial changes and choices prove particularly useful for scholarly editions. While some encoders may choose to make silent corrections or emendations to enable easier reading of a text, partial or strict adherence to a diplomatic encoding offers accuracy and authenticity when dealing with historical texts. The encoding choices made by the Woodman Diary Project team give readers and researchers further insight into Bert Woodman and provide a more complete representation of his diary.



Digital Engagement with Irish Artists

Introduction to IMMA

IMMA – the Irish Museum of Modern Art – is Ireland’s premier modern art institution, and is home to the nation’s collection of modern and contemporary art. Established by the government of Ireland in 1990, IMMA opened in 1991 and, since then, has featured a dynamic and evolving series of exhibitions, events, and programs, which are designed to engage the general public with the museum’s collections while supporting and promoting Irish artists.

IMMA’s emphasis on creating an enjoyable visitor-centric experience for museum guests has led to such initiatives as its award-winning Education and Community Program, the Artists Residency Program, and regularly scheduled talks, lectures and events. Through these offerings, IMMA strives to provide innovative and inclusive opportunities for a variety of audiences, including the more than 400,000 annual visitors from Ireland and abroad.

Practicum Goals

The practicum with IMMA is titled Bringing Irish Artists Closer at IMMA and is primarily focused on connecting and engaging museum visitors with a specific Irish artist and her body of work through a digital resource. In April, IMMA will open a new retrospective exhibition featuring the work of Gerda Frömel, an artist who was well-regarded during her lifetime and who first exhibited in Ireland in 1957. The digital resource will be designed both to complement the Frömel exhibition as a mobile-responsive website and to serve as a template for future exhibitions in IMMA’s Modern Masters series. In addition to increasing overall awareness of IMMA and of Gerda Frömel, the practicum will seek to position IMMA as the primary source of information on contemporary Irish artists.

Challenge: The User Experience

There are a number of challenges – and opportunities – associated with this practicum, but one key issue revolves around understanding the user’s experience of the exhibition, the digital resource and the combination of the two. Traditionally, a visit to an art museum might involve a visitor giving his or her near-complete attention to the art or exhibition itself. In some cases, there might be a tour, led by museum staff. In these cases, the experience is primarily analogue, with no digital component.

Tate Modern’s Art Terms App

With the rise in mobile applications designed specifically for museums, however, visitors may now divide their attention between the art and a smartphone or hand-held device. They may Google a phrase or name that might be unfamiliar, upload photos to a social media website or “check in” via a geolocation app. As a result, museums (including IMMA) must determine how to balance the benefits of a digital, mobile resource with the decidedly un-digital experience of viewing art.

During the nascent years of mobile museum applications, many institutions created multimedia guides for exhibitions and collections that were based, in part, on the traditional docent-led tours of galleries. In a 2009 paper for the Museums and the Web conference, Koven Smith of the Metropolitan Museum of Art points out that multimedia and/or digital “tours” with “stops” often do not take the specific user experience into account, thus limiting the usefulness of a mobile, digital resource. According to Smith, only a small percentage of museum visitors still want the “led-by-the-hand” approach. Rather, he says, “museums [must] now encourage users to self-curate.”

A mobile app or other digital resource for a museum exhibition or collection needs to be flexible enough to provide a user with choices that let him or her drive the experience. This may mean incorporating content that can and should be viewed (or read or seen) while the visitor is at the museum, and it may also mean specifically including content intended to be accessed via the Internet before or after visiting the museum. The digital resource for the Frömel exhibition, for example, will be built as a website, but will also be accessible on and responsive to mobile devices. This decision was made deliberately, as it offers a range of possibilities for IMMA visitors in choosing how, when and where they experience the complementary information. The website option also allows IMMA to use the Frömel exhibition and digital resource as a test for future exhibitions, helping museum staff discover the format that best suits IMMA’s visitors.

Of course, the user experience incorporates more than simply how and when a visitor will use a specific mobile app or a website. The specific nature of the museum, the widespread use (or lack thereof) of mobile devices and user demographics will all influence a visitor’s experience. In working to build a digital resource for IMMA, the Bringing Irish Artists Closer practicum will explore best practices from other museums and cultural heritage institutions, while also analyzing specific data about IMMA’s audiences and visitors to present a whole and complete understanding of how best to engage art lovers and art newcomers alike with the work of Gerda Frömel.

Take Two: Literature and DH

Recently, two intriguing articles from well-respected Digital Humanities scholars came through in my feed reader, and as they align quite nicely with my own interests in the intersection of technology and literature, I thought I’d share them here.

What is an @uthor? by Matthew Kirschenbaum

Writing for the LA Review of Books, Kirschenbaum (perhaps best known for his article “What is Digital Humanities and What’s It Doing in English Departments?”), explores how the evolving landscape of social media and author engagement with audiences online is changing the nature of literary criticism and the very idea of authorship itself:

Today you cannot write seriously about contemporary literature without taking into account myriad channels and venues for online exchange. That in and of itself may seem uncontroversial, but I submit we have not yet fully grasped all of the ramifications. We might start by examining the extent to which social media and writers’ online presences or platforms are reinscribing the authority of authorship. The mere profusion of images of the celebrity author visually cohabitating the same embodied space as us, the abundance of first-person audio/visual documentation, the pressure on authors to self-mediate and self-promote their work through their individual online identities, and the impact of the kind of online interactions described above (those Woody Allenesque “wobbles”) have all changed the nature of authorial presence. Authorship, in short, has become a kind of media, algorithmically tractable and traceable and disseminated and distributed across the same networks and infrastructure carrying other kinds of previously differentiated cultural production.

There are Only Six Basic Book Plots 

In an article for Motherboard, contributing editor Ben Richmond interviewed Matthew Jockers (textual analysis proponent and author of Macroanalysis) about his algorithmic model that identifies archetypal plot shapes. According to his research, about 90% of the time, results showed six basic plots (with the remaining 10% indicating seven basic plots). While some of his data remains unknown, Jockers did release his tools on GitHub to encourage others to try the same experiment for themselves:

Most books that measure the number of plots seem aimed at writers and would-be writers, but Jockers’s work has implications for readers, librarians, and even literature snobs, or anyone who wants to put snobs in their places.

As he was charting plots, Jockers noticed that some genres that are derided for being “formulaic,” like romance, aren’t just relying on boy-meets-girl.

“Romance showed some proclivity for two of the six plot shapes, but it wasn’t an overwhelming case of all the plots falling into one,” Jockers said. “It was a much more evenly distributed from these six shapes.”

Choose Your Own Twitter Adventure

Within the larger world of electronic (digital) literature is the genre of hypertext fiction, a non-linear approach to reading that gives readers links or modes to jump from one part of the text to another. It is, by its nature, interactive, with the reader guiding the narrative depending on the choices she makes. Hypertext fiction also isn’t necessarily limited to e-books or online stories.  The term can also apply to traditionally published books (many prior to the advent of the web) with nonlinear narratives, such as Joyce’s Ulysses.

For many of my generation, the Choose Your Own Adventure novels are the best example of a traditionally published hypertext novel. Though not explicitly referred to or marketed as such, the CYOA books were hypertextual and interactive. The reader could choose any number of paths through the story that would alter the story’s outcome, and many readers (myself included) often tried to guess or predict what would happen next, mostly to avoid the dreaded “you’ve died” message.

Now, thanks to one clever and imaginative Twitter user, the principles behind the CYOA books specifically and hypertext fiction in general have come to social media. Twitter user Terence Eden (@edent) created a “Choose Your Own Adventure” narrative for Twitter. The story takes place entirely within the Twitter platform / website, and web-savvy readers and fans can navigate through a series of choices in a mysterious story. Should you run or hide? Investigate that glowing light? Fight back or flee? Each choice brings you to another, until (of course) you die.

Eden’s CYOA Twitter story works well for a couple of reasons. Thanks to Twitter’s setup, the @ symbol will automatically link to a user name, which allowed Eden to create a variety of user names for this specific project without having to rely on outside webpages or excessively long hyperlinks (that take up valuable “real estate” on the 140-character platform). Furthermore, Twitter allows for pinned tweets, which means Eden could keep all the relevant information at the top of a user profile, negating the need for CYOA readers to scroll. Plus, Eden kept the narrative portions of the tweets short and to the point, compelling readers to keep clicking. The result is an addictive and entertaining story completely enclosed within this one social network. It will be interesting to see what happens next to push hypertext fiction forward even more.

End of Term Reflections

Well, it’s been four months, and my first semester as a Digital Humanities student is (for all intents and purposes) finished. From my perspective, the last sixteen weeks have been incredibly productive, informative and thought-provoking. I’ve not only learned a great deal, but I’ve also had the opportunity to think critically about what I’ve learned, and how I believe those lessons fit within the overall Digital Humanities field. Below are some of my reflections and thoughts about this past term, and some ideas for the future.

Though my technical and coding skills have vastly improved (especially when compared to the days and months when I was teaching myself), I still believe this is one area where I can do better. I’ve grappled with data modeling, encoding, and metadata schemas, but practice makes perfect, and there is always more to learn. I do wish there had been some follow-up to the intensive, pre-term Java course we took; I did well with the module at the time, but feel I’ve lost some of the knowledge since due to non-use.

The intersections between Digital Humanities, media and digital (electronic) literature remain a strong area of interest for me, as one might have guessed based on some of my previous posts. I’ve been attempting to expand my knowledge of this area by reading on my own, and I’m fascinated by the creativity and ingenuity found in some of these new digital literature projects. In looking forward to the future, I’ve started working on a PhD proposal for doctoral-level research specifically addressing digital (electronic) literature. It’s still very much a work in progress, but I’m passionate about this particular area of study and look forward to what comes next.

My MA program is, as the name implies, Digital Humanities, so many of the readings and lectures have had a literature and/or history focus to them. As a result, I am very curious about what doesn’t come up as often, namely the state of the digital arts, and how that intersects with Digital Humanities. Some colleagues and lecturers are working in the art history and cultural heritage sectors, but I still sense that there is a huge gap in awareness between Digital Humanities and digital arts (or music or performance). There could be many reasons for this (I have a few theories of my own), but I also believe there’s a world of untapped potential with the digital arts (the What’s the Score? project at the Bodleian Library is one project that immediately comes to mind) and I’d love to know more. I’m very interested in learning more about applying digital ideas and techniques to the art world, which is why I’m especially excited for my upcoming practicum next semester with the Irish Museum of Modern Art. More on that next term!

Similarly, I’m also curious about issues of diversity, race, gender and sex in the Digital Humanities. From my (admittedly somewhat limited) perspective, I see the field as one in which the majority of thought leaders and researchers are still male and overwhelmingly white. I’m interested in that dynamic and what it means both for the DH field and for DH projects and research. To my mind, there is a clear and identifiable need for more diversity within the field. I don’t know that I’m the best person to propose any solutions, but I would love to see a more concerted effort to think critically about expanding DH to include those voices that aren’t necessarily being heard. (Of course, if anyone has suggestions for readings that address this very topic and would like to point me in the right direction, I’d be most appreciative.)

These are just a few thoughts; like so many things in life, learning about Digital Humanities is an ongoing process (especially since it is an evolving field itself) and I know I’ll have much more to say in 2015.

Until then, Happy Holidays, and a Happy New Year!

Reimagining the Audience for Digital Scholarly Editions

According to the Modern Language Association’s Guidelines for Editors of Scholarly Editions, a scholarly edition’s most basic task is to “present a reliable text,” one that can also contribute to academic research on a particular topic. Traditionally, scholarly editions have had fairly limited audiences, the final printed version intended primarily for other scholars conducting similar research. With the dawn of the digital age, however, the creation of digital scholarly editions is changing the nature of the audience for these works. The availability of scholarly editions online and the use of crowdsourcing to help create these editions are just two ways the digital world is blurring the lines between the traditional academic audience and a much larger, more public audience.

In 2009, at the Association for Documentary Editing Annual Conference, Andrew Jewell presented a paper that explored new ideas around the reading of digital scholarly editions. According to Jewell, “the dominant model for distributing [scholarly] editions in the age of print [was] to sell large volumes at large prices” (1). But the advent of digital publication on the Internet has upended this model by amplifying the reach of a scholarly edition. Where they once would have been available only to a narrowly focused audience, many scholarly editions in digital form can now be accessed by anyone with an Internet connection.

A general audience, however, has different needs than a scholarly one, and may even approach the edition with different intentions. In fact, many casual readers of a scholarly edition may not have even specifically sought out the resource, but rather stumbled across it accidentally. Jewell offers the example of his own Willa Cather Archive, noting that a reader may find the archive “because search engines lead them to hidden bits of knowledge deep in the site” (3). A wider, more diverse audience for a scholarly edition also means the text and content will be consumed in new ways. A printed scholarly edition may follow a traditional, linear format; in a digital world, readers skim, search, scan and skip over parts that may not interest them.

Moreover, readers can access digital editions through any number of Internet browsers, mobile devices or tablets. Each option changes the experience of the edition in subtle ways, even when the content available remains the same. As Jewell correctly points out, “we cannot fully predict how readers will interact with digital publications…[and] we cannot expect every view of that website to be the same for each user” (6). The very nature of the Internet means each visit to a digital edition website will result in a different kind of engagement with the text, with the idea of “the audience” changing each time as well.

The evolving nature of a digital scholarly edition’s audience is not limited to reading and accessing information, though. Some scholarly editions are blurring the boundaries even further by actively involving the audience in the creation of the text itself. In 2010, Cathy Moran Hajo, Associate Editor of the Margaret Sanger Papers, wrote, “Web 2.0 tools are increasing in sophistication and enabling large amounts of people from all walks of life to participate in the creation of editions.” Hajo was, in effect, referring to crowdsourcing, and in the years since, an increasing number of cultural and academic institutions have turned to crowdsourcing to complement and contribute to existing projects.

Crowdsourcing in the humanities (or, indeed, in Digital Humanities) aims, in part, to “expand the scope of the community membership beyond academics, and into the interested and engaged general public” (Siemens, et al.). Crowdsourced projects specifically reach out to the audience and invite them into the scholarly editing process, by having them either enrich existing materials or help create an entirely new resource (Carletti et al.). In doing so, these projects are not simply looking for free labor, but instead, according to Carletti et al., are “collaborating with their public to augment or build digital assets through the aggregation of dispersed resources.”

Transcribe Bentham, one example of a crowdsourced scholarly edition project, has relied on volunteers to help transcribe thousands of manuscripts from philosopher Jeremy Bentham. Part of the rationale for opening up this project and scholarly edition to the larger public was the initiative’s hope to “democratize the creation of, and access to, knowledge and humanities research” (Causer and Terras). Beyond opening access to the research, however, crowdsourcing connects passionate, interested individuals with these scholarly projects. The vast majority of crowdsourcing volunteers are not rewarded monetarily, and so many participate simply because they have a deep, personal interest in the subject. And as Ricc Ferrante, Director of Digital Services & Information at the Smithsonian Institution Archives, points out, “passion breeds evangelists, breeds new volunteers, and new discoveries,” all of which can, in turn, lead to new knowledge.

There are some who may question the value of an open-access, online digital edition or the use of crowdsourcing to create such an edition. These individuals may maintain that scholarly editions should remain in the realm of the scholar. Ultimately, though, the blurred audience lines can be considered a good thing, as they expand the reach of a particular subject and open up the humanities to new understandings. For Jewell:

“The defining feature of the broader audience that encounters free, online documentary editions is diversity: it comes from around the world, from a variety of perspectives and educational levels, and with a variety of goals.”

With more diversity comes more readers, more perspectives, and more people discovering new content that they may not have before encountered. Digital tools and technologies create a larger audience for scholarly editions, providing an enriched, varied and dynamic way of accessing and experiencing humanities data. The challenge, then, for scholarly editors, is to “move beyond the ivory towers of research libraries to high schools, town libraries and even to the comfort of private homes” (Hajo). By extending the reach of a digital scholarly edition and blurring the line between a traditional audience and a more expansive one, researchers and editors can ensure that their work is truly open and accessible.


Text Mining: An Annotated Bibliography

[Image: Text Cloud of Text Mining]

In 2003, in an issue of the Literary and Linguistic Computing journal, humanities computing scholar Geoffrey Rockwell asked the question, “What is text analysis, really?” More than ten years later, some Digital Humanities scholars are still asking the same question, especially as technological advances lead to the creation of new text analysis tools and methods. In its most basic form, text analysis – which is also known as text data mining or, simply, text mining – is the search for and discovery of patterns and trends in a corpus of texts. The analysis of those patterns and trends can help researchers uncover previously unseen characteristics of a specific corpus, deconstruct a text, and reveal new ideas and theories about a particular genre or author. The following annotated bibliography offers an overview of text mining tools in Digital Humanities, with the intention that it may serve as a starting point for further exploration into text analysis.
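To make “patterns and trends in a corpus” a little more concrete, here is a minimal sketch in Python of perhaps the simplest pattern of all: counting which words appear most often across a handful of texts. It is only an illustration and is not how any of the tools listed below actually work. The sample sentences, the tiny stopword list and the `top_terms` function are all assumptions made up for this example.

```python
import re
from collections import Counter

# A deliberately tiny, made-up stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "that", "it"}


def top_terms(texts, n=5):
    """Return the n most frequent non-stopword terms across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and pull out runs of letters (and apostrophes).
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    # Two invented sentences standing in for a corpus.
    corpus = [
        "Text analysis is the search for patterns in a corpus of texts.",
        "Patterns and trends in texts can reveal new ideas about an author.",
    ]
    for term, count in top_terms(corpus, 2):
        print(term, count)
```

A frequency count like this is also roughly the kind of information a word cloud visualizes, with more frequent words drawn larger.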

Argamon, Shlomo and Mark Olsen. “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis.” Digital Humanities Quarterly. 3.2 (2009). Web. 15 November 2014.

In their article, Argamon and Olsen suggest that the rapid digitization of texts requires new kinds of text analysis tools, because the current tools may not scale effectively to large corpora and do not adequately leverage the capability of machines to recognize patterns. To test this idea, Argamon and Olsen, through the ARTFL Project, developed PhiloMine, a set of text analysis tools that extend PhiloLogic, the authors’ full-text search and analysis system. Argamon and Olsen provide an overview of PhiloMine’s tasks (predictive text mining, comparative text mining and clustering analysis), and then summarize three research papers that highlight the tasks’ strengths and weaknesses.

Borovsky, Zoe. “Text and Network Analysis Tools and Visualization.” NEH Summer Institute for Advanced Topics in Digital Humanities. Los Angeles, 22 June 2012. Presentation. Web. 15 November 2014.

This presentation by Borovsky, the Librarian for Digital Research and Scholarship at UCLA, provides an overview of text mining tools, with an in-depth look at a few specific tools: Gephi, Many Eyes, Voyant and Word Smith. Borovsky highlights some of the benefits and challenges of each tool, and offers examples of sample outcomes. Though the slides are presented without a transcript of Borovsky’s presentation speech, the slides themselves provide a high-level overview of these four specific text mining tools, and Borovsky’s template easily allows readers to discover relevant information about each tool.

Green, Harriett. “Under the Workbench: An analysis of the use and preservation of MONK text mining research software.” Literary and Linguistic Computing. 29.1 (2014): 23-40. Web. 15 November 2014.

To help further humanities scholars’ understanding of how to use text mining tools, Green conducted an analysis of the web-based text mining software MONK (Metadata Opens New Knowledge). Green studied a random sample of 18 months of analytics data from the MONK website and conducted interviews with MONK users to understand the purpose of the tool, its usability and the challenges encountered. Along with other findings, Green discovered that MONK is often used as a teaching tutorial and that it often provides an entry point for students and researchers learning about text analysis.

Muralidharan, Aditi and Marti A. Hearst. “Supporting exploratory text analysis in literature study.” Literary and Linguistic Computing. 28.2 (2013): 283-295. Web. 15 November 2014.

According to Muralidharan and Hearst, the majority of text analysis tools have focused on aiding interpretation, but there haven’t been many (if any) tools devoted to finding and revealing insights not previously known to the researcher. So Muralidharan and Hearst created WordSeer, a text analysis tool designed for literary texts and literary research questions. To illustrate the functionality of WordSeer, Muralidharan and Hearst used this text analysis tool to examine the differences in language between male and female characters in Shakespeare’s plays.

Ramsay, Stephen. “In Praise of Pattern.” Faculty Publications – Department of English. Digital Commons @ University of Nebraska-Lincoln: 2005. Web. 15 November 2014.

Ramsay sets out to explore the idea of pattern as a point of intersection between computational text analysis and the “interpretive landscape of literary studies.” Ramsay wanted to prove that there could be a computational tool that offered interpretive insight rather than specific facts or results. So he set out to create StageGraph, a tool designed ostensibly to study structural properties in Shakespeare’s plays, but one also stemming from a branch of mathematics known as graph theory.

Rockwell, Geoffrey. “TAPoR: Building a Portal for Text Analysis.” Mind Technologies: Humanities Computing and the Canadian Academic Community. Ed. Ray Siemens and David Moorman. University of Calgary Press: 2005. 285-299. Print.

In this chapter, Rockwell introduces readers to TAPoR, the Text Analysis Portal for Research. The TAPoR project began as a collaboration of researchers and projects and eventually proposed a network of labs and servers that would connect and aggregate the best text analysis tools, making them available to the larger academic community. Rockwell then explores TAPoR in more detail, offers an overview of the portal’s specific functions, and discusses the types of users the project envisions will use the tools available through the portal.

---. “What is Text Analysis, Really?” Literary and Linguistic Computing. 18.2 (2003): 209-219. Web. 15 November 2014.

In this article, Rockwell argues that text analysis becomes, in effect, an interpretive aid because it creates new hybrid versions of a text by deconstructing and reconstructing some original text. As a result, Rockwell stresses the need for new kinds of text analysis tools that emphasize experimentation over hypothesis testing. He concludes the paper with a proposal for a portal model for text analysis tools, using his own TAPoR as an example.

Simpson, John, Geoffrey Rockwell, Ryan Chartier, Stéfan Sinclair, Susan Brown, Amy Dyrbye, and Kirsten Uszkalo. “Text Mining Tools in the Humanities.” Journal of Digital Humanities. 2.3 (2013). Web. 15 November 2014.

Derived from an oral presentation at a research conference, Simpson et al.’s brief article and accompanying poster present the testing framework developed for the TAPoR text mining tool. The TAPoR testing framework was then used as a proposal for the creation of a systematic approach to testing and reviewing humanities research tools, especially text mining tools.

“Text Mining.” DiRT Digital Research Tools. n.p., n.d. Web. 15 November 2014.

The DiRT directory compiles information about digital research tools for scholarly and academic use. The directory is divided into several categories, with one category devoted to text mining tools. Users can narrow the category by platform (operating system), cost, whether or not the tool is open sourced and more. Each individual entry includes a description of the tool as well as a link to the tool itself or its developer’s website. While the DiRT directory is an invaluable resource of text mining tools, one drawback is that the tools themselves are not rated in any way, either by the directory’s editorial board or by other users.

van Gemert, Jan. “Text Mining Tools on the Internet.” ISIS Technical Report Series. The University of Amsterdam: 2000. Web. 15 November 2014.

van Gemert’s report is a thorough and comprehensive overview of text mining tools available on the Internet, though as it was published in 2000, it is now out-of-date. Still, this report offers a great deal of information both about specific text mining tools and the companies behind their creation. Van Gemert includes website links, summaries and information about available trial versions for each tool.

[Image note: text cloud created from the content of this post using Tagul, an online word cloud creator.]

