TEI and Diplomatic Editions

First released in 1990, the Text Encoding Initiative (TEI) Guidelines define a method of text encoding that allows both computers and humans to read and understand texts, independent of any specific operating system. The Guidelines, which are expressed in the Extensible Markup Language (XML), provide scholars with predefined markup elements to establish the structure of a particular text. The complete set of Guidelines comprises nearly 500 elements, which digital humanists use to indicate what a text is, rather than how it should look or act.

TEI files have two parts: (1) a header, which includes information about the text, such as its title, author, publisher, and other bibliographic items; and (2) the body or text section, which contains the encoding of the actual text. All of the TEI tags and elements are organized into one of these two parts (“Introducing”). In addition to common structural elements such as paragraphs (<p>) and lines (<l>), the TEI Guidelines also include tags that allow encoders to communicate editorial choices (<choice>), record deletions and additions found in the original text (<del> and <add>), and flag readings that cannot be transcribed with certainty (<unclear>). These tags are often used when scholars seek to create a diplomatic edition, a version of an original text that attempts to accurately reproduce its significant features, including spelling, abbreviations, deletions and other alterations (Pierazzo).
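
The two-part structure described above can be sketched as a minimal TEI file. This is an illustrative skeleton, not the Woodman Diary Project's actual header: the title and author are taken from the works cited, while the publication and source statements are placeholder wording.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Part 1: the header, holding bibliographic information about the text -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Diary</title>
        <author>Albert Woodman</author>
      </titleStmt>
      <publicationStmt>
        <p>Placeholder: publication details would go here.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Placeholder: a description of the physical source would go here.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- Part 2: the text, holding the encoded transcription itself -->
  <text>
    <body>
      <p>Placeholder paragraph standing in for a transcribed diary entry.</p>
    </body>
  </text>
</TEI>
```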

Diplomatic editions can range in their adherence to accuracy, from those considered ultra-diplomatic or strictly diplomatic “in which every feature which may reasonably be reproduced…is retained” to editions that feature normalized texts, created with readability in mind (Driscoll). Many scholarly editions fall somewhere in the middle, with an emphasis on a “semi-diplomatic” edition that retains some of the original text’s features, but not all. Such is the case here at Maynooth University, where a group of students enrolled in the Digital Scholarly Editing module are using TEI to encode and create a digital edition of the Woodman Diary.

The Woodman Diary Project

In 1918, Albert “Bert” Woodman was a soldier in the “L” Signal Company of the Royal Engineers, stationed in Dunkirk, France during World War I. After marrying his sweetheart, Nellie, Bert started to keep a diary of his experiences, intending to share it with Nellie when he returned home. Bert’s handwritten entries, starting in January 1918 and continuing until just after Armistice Day in November, fill the fronts and backs of nearly every page in the diary and span two physical journals, known by their respective brand names, Wilson and Butterfly.

Many historical documents, like Woodman’s diary, present unique challenges and opportunities for text encoders. Aside from understanding and transcribing an individual’s specific handwriting style, encoders may also encounter faint or faded writing, ink spills that obscure words, scribbles, and cross-outs. Consequently, text encoders (in this case, the students in the module) must make careful editorial choices regarding the level of accuracy encoded in TEI.

Though the Woodman Diary Project is not a strict or ultra-diplomatic edition, the project team did decide to encode a handful of features often present in diplomatic editions, such as unclear words, additions and deletions, and abbreviated words. These tags and elements not only help preserve Bert’s idiosyncrasies, but they also allow readers in the general public or academic researchers to understand more about the diary and the circumstances under which it was written.

As often happens with handwritten documents, the Woodman diary contains a number of struck-out words, phrases and letters, perhaps because Bert misspelled something or incorrectly recorded a number or name. These deletions are frequently accompanied by additions, either above or next to the original text. To accurately represent these features of the text, the Woodman Diary Project team used TEI’s <del> and <add> tags. Additionally, the attribute @rend gave team members the ability to indicate further characteristics, such as the position of the addition (e.g., above, over-written, next to) and the nature of the deletion (e.g., scribble, strikethrough, etc.):

… when <add rend="overwritten">the<del>J</del></add> the Union Jack comes along …

(Woodman, 5 March 1918)
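
A deletion followed by a replacement written above the line might be sketched as follows. The words here are invented for illustration and are not taken from the diary; the @rend values simply reuse the kinds of labels listed above.

```xml
<!-- Hypothetical entry text: a struck-through word with its replacement added above the line -->
<p>Left camp on <del rend="strikethrough">Tuesday</del> <add rend="above">Wednesday</add> morning</p>
```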

Occasionally, the project team was unable to decipher a word with absolute certainty. Despite access to high-resolution digital images of the physical diary, some words remain doubtful, even where the proposed reading makes sense within the context of the entry. In these cases, TEI’s <unclear> tag is used to contain “a word, phrase or passage which cannot be transcribed with certainty because it is illegible…in the source [document]” (“Elements Available”). The tag signals to readers and researchers that there is still some doubt regarding the transcription of a word or phrase:

I’ll start another as soon as I can get the price of one <unclear>more</unclear>!!!

(Woodman, 8 July 1918)
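
TEI also lets an encoder record why a reading is doubtful and how confident the transcription is, through the @reason and @cert attributes on <unclear>. The sketch below applies them to the line quoted above. The attribute values are assumptions chosen for illustration, and the source does not say whether the project team used these attributes.

```xml
<!-- The reason and cert values are illustrative assumptions, not the project's encoding -->
<p>I'll start another as soon as I can get the price of one
  <unclear reason="faded" cert="medium">more</unclear>!!!</p>
```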

When scholarly editions want to retain some diplomatic edition features, text encoders may offer the option of switching between the original text and an edited version. TEI’s <choice> tag allows for this, giving encoders the ability to “switch automatically between one ‘view’ of a text and another,” and therefore providing readers and researchers with insight into the encoder’s editorial choices (“Elements Available”). For the Woodman Diary Project, team members used the <choice> tag to contain abbreviations (<abbr>) and expansions (<expan>). Bert seems to have favored economy, given his use of every page available to him in his notebooks, and he frequently abbreviated Standard English words and phrases, likely in an attempt to save precious writing space:

Don’t get any <choice><abbr>ltrs</abbr><expan>letters</expan></choice> at all today

(Woodman, 4 Feb 1918)
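
One way the switch between views could be implemented is with a small XSLT stylesheet that outputs either the abbreviation or the expansion from each <choice>. This is a minimal sketch under stated assumptions, not the project's own transformation: the parameter name `view` and its two values are hypothetical, and the input is assumed to be a well-formed TEI file in the standard TEI namespace.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: 'diplomatic' keeps abbreviations, 'normalized' expands them -->
  <xsl:param name="view" select="'diplomatic'"/>

  <!-- Suppress the header so only the transcription is output -->
  <xsl:template match="tei:teiHeader"/>

  <!-- For each choice, output exactly one of its two alternatives -->
  <xsl:template match="tei:choice">
    <xsl:choose>
      <xsl:when test="$view = 'normalized'">
        <xsl:apply-templates select="tei:expan"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="tei:abbr"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

With the default setting, the entry for 4 Feb would read “Don’t get any ltrs at all today”; with `view` set to `normalized`, it would read “Don’t get any letters at all today”.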

As demonstrated by the examples above from the Woodman diary, the TEI tags for encoding editorial changes and choices prove particularly useful for scholarly editions. While some encoders may choose to make silent corrections or emendations to enable easier reading of a text, partial or strict adherence to a diplomatic encoding offers accuracy and authenticity when dealing with historical texts. The encoding choices made by the Woodman Diary Project team give readers and researchers further insight into Bert Woodman and provide a more complete representation of his diary.



Driscoll, M.J. “Electronic Textual Editing: Levels of Transcription.” TEI: Text Encoding Initiative. TEI Consortium, n.d. Web. 18 March 2015.

“Elements Available in All TEI Documents.” TEI: Text Encoding Initiative. TEI Consortium, n.d. Web. 18 March 2015.

“Introducing the Guidelines.” TEI: Text Encoding Initiative. TEI Consortium, 2013. Web. 1 January 2014.

Pierazzo, Elena. “A Rationale of Digital Documentary Editions.” Literary and Linguistic Computing. 26.4 (2011): 463-477. Web. 18 March 2015.

Woodman, Albert. “Diary.” 1918. The Woodman Diary Project. An Foras Feasa, Maynooth University.