NOTE: Code brackets which should appear as “< and >” are represented by “[ and ]” to display this post on WordPress.
At the time of writing, the Digital Scholarly Editing class has progressed impressively with the Albert Woodman Diary project: the digitisation of the diary of a Royal Engineer stationed at Dunkirk during the First World War. Albert’s diary is divided by month across the year 1918. Albert writes through numerous exciting events (at least for a Great War historian), such as the German Spring Offensives and the Hundred Days Offensive. This blog post is concerned with the technical aspect of the project and the significant hurdle now facing the team: the encoding of the diary for the forthcoming site.
It is worth mentioning that this post is written from the experience of an absolute beginner to TEI, XML and coding in general. Before this, my only experience with coding was a brief three-week crash course in Java back in September. As such, the prospect of encoding several months’ worth of Albert’s diary was compelling, albeit overwhelming. Fortunately, several members of the team are competent users of XML and were more than happy to assist the less knowledgeable members. Despite this advantage, the task that lay ahead seemed insurmountable.
Our team was given a gentle introduction to the principles of TEI encoding throughout the semester, experience that proved invaluable in dealing with some of the issues (described in further detail below) that arose from digitising Albert’s diary. The guidelines on the TEI site were also a great help to the team in their work.
The diary was divided by month, which gave each member of the team two months to encode. The team was quite fortunate to have access to a detailed transcription of the diary, granted to us by Albert’s granddaughter. From the transcription the team adapted the text for coding in an editor, either oXygen or Notepad++. Early in the project’s development it was decided to take a documentary, or diplomatic, approach to the design. According to Stephen R. Reimer, a diplomatic edition is responsible for ‘indicating as far as possible the “state” of the text in this manuscript’ (1998, online). In terms of the Woodman Diary, this made it necessary to follow Albert’s writing character for character, including spelling errors, punctuation and line breaks.
(The above images are of the Diary entry for the 8th of February and its encoded version. One can see that the structure of the physical copy has been emulated with line breaks.)
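To give a feel for what emulating the line breaks involves, here is a minimal sketch in Python rather than anything the team actually used. It assumes the transcription keeps one manuscript line per text line and marks each with the TEI [lb] element (written below with real angle brackets). The function name encode_lines and the sample text are made up for illustration and are not taken from the diary.

```python
from xml.sax.saxutils import escape


def encode_lines(transcription: str) -> str:
    """Wrap a transcribed passage in <p>, marking each manuscript line with <lb/>.

    Assumption: one manuscript line per text line in the transcription.
    Spelling and punctuation are left exactly as transcribed; only the
    characters that are special in XML (&, <, >) are escaped.
    """
    lines = [line.strip() for line in transcription.splitlines() if line.strip()]
    body = "\n".join(f"<lb/>{escape(line)}" for line in lines)
    return f"<p>\n{body}\n</p>"


# Invented sample text, not a quotation from the diary.
sample = "Rained all day.\nLetter from home & a parcel."
print(encode_lines(sample))
```

Note that this sketch trims leading and trailing spaces from each line, which a strictly diplomatic edition might prefer to record rather than discard.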
It was this design choice that led to some of the aforementioned issues, namely the division of dates and pages. The Diary is divided up by date, with each entry (Albert kept daily entries) wrapped in [div] tags. Each new entry opens with a [head] tag, inside which the entry is labelled according to its date and the journal it was taken from (Albert wrote two, the Wilson and Butterfly diaries), e.g. w_1918-03-16. However, Albert’s entries occasionally span more than one page, where a day starts on one page and concludes on the next. As the two pages are wrapped in a single [div type="day"] tag, it was impossible to repeat the date under a fresh [head] tag at the top of the second page, which undermined the design choice we had made. Needless to say, this issue led to some frantic brainstorming. Once it was established that the issue lay with overlapping hierarchies, it was possible to overcome the obstacle. The solution was to mark up the repeated date as [date type="head"] instead, allowing it to function similarly to [head] but without disrupting the hierarchy. Thus, the code could validate.
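The shape of that workaround can be sketched as follows. This is an illustration under assumptions rather than our actual file: placing the label in an n attribute, wrapping the repeated date in [p], the when attribute and the placeholder text are all guesses on my part. The Python around it only checks that the fragment is well-formed and counts the elements; it does not validate against the TEI schema, which is what oXygen did for us.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical entry that starts on one page and ends on the next.
# Attribute choices (n, when) and the <p> wrapper are assumptions.
ENTRY = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="day" n="w_1918-03-16">
  <head><date when="1918-03-16">March 16</date></head>
  <p><lb/>[first part of the entry]</p>
  <pb/>
  <p><date type="head" when="1918-03-16">March 16</date></p>
  <p><lb/>[entry continues on the next page]</p>
</div>
"""


def summarise(xml_text: str) -> tuple[int, int]:
    """Return (number of <head> children, number of <date type="head">)."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    ns = {"tei": TEI_NS}
    heads = root.findall("tei:head", ns)
    repeated = root.findall(".//tei:date[@type='head']", ns)
    return len(heads), len(repeated)


print(summarise(ENTRY))
```

The point of the design is that the [div] keeps a single [head] at the top, while the date repeated after the page break is ordinary phrase-level content that a stylesheet can still pick out by its type.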
Another issue lay with the extra content contained within the Diary, namely newspaper clippings, photographs and other inserts that Albert included with his writings. It was decided that the Diary itself took precedence over the additional content, but that we would try to include as much as possible if we had the time. The issue with the additional content is that inserts and diary pages often share the same date, which leads to naming issues. Earlier in the project, when editing the images, I was responsible for establishing a naming convention for the diary. This convention combined the page number, the date and the initial of the diary, e.g. 087-September16-B. During the process the inserts were labelled with an ‘r’ for ‘reverse of page’. This scheme proved its worth once the files were brought into the XML editor: the [div] tags reserved for dates would otherwise clash whenever an insert and a diary page shared the same date, so the ‘r’ kept the inserts clearly distinct.
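As a rough illustration of the naming convention, the following Python sketch splits a file name such as 087-September16-B into its parts. The three-digit page number, the B/W initials (for the Butterfly and Wilson diaries) and, above all, the position of the ‘r’ directly after the page number are assumptions made for the example; parse_image_name and ImageName are hypothetical names, not part of the project.

```python
import re
from typing import NamedTuple


class ImageName(NamedTuple):
    page: int
    is_insert: bool  # True when the name carries the 'r' marker
    month: str
    day: int
    diary: str       # 'B' (Butterfly) or 'W' (Wilson)


# Assumed layout: <page>[r]-<Month><day>-<diary initial>
PATTERN = re.compile(r"^(\d{3})(r?)-([A-Za-z]+)(\d{1,2})-([BW])$")


def parse_image_name(name: str) -> ImageName:
    """Split an image file name into its parts, or raise ValueError."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised image name: {name!r}")
    page, marker, month, day, diary = match.groups()
    return ImageName(int(page), marker == "r", month, int(day), diary)


page_image = parse_image_name("087-September16-B")
insert_image = parse_image_name("087r-September16-B")  # hypothetical insert name
print(page_image)
print(insert_image)
```

Under these assumptions a diary page and an insert that share a date still parse to different values, which is the property that kept the two from clashing.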
At the time of writing, the encoding of the related media and additional content remains a pressing issue. The concern lies with the [div] tags once more, and whether the additional material should be assigned [div type="related-media"], [div type="other-image"] or some other value.
Aside from these small issues, the progress of the coding is astounding. As an absolute beginner I dreaded the task, but I must state how enjoyable the process was. The TEI guidelines eased many concerns, and once some initial progress was made the work became less of a task and more of an enjoyable exercise. The oXygen editor is extremely useful for a beginner thanks to its clear, clean interface and its use of colour coding to aid with validation, which allows one to locate problems in the code quickly, saving time and effort.
oXygen XML editor. ‘Video Demonstrations.’ (2014) Web, available at http://www.oxygenxml.com/videos.html
Accessed on 20/3/15
Reimer, Stephen R. ‘Manuscript Studies: Textual Bibliography: Kinds of Edition.’ University of Alberta (1998) Web, available at http://www.ualberta.ca/~sreimer/ms-course/course/editns.htm
Accessed on 22/3/15
TEI. ‘Home.’ Text Encoding Initiative (2014) Web, available at http://www.tei-c.org/index.xml
Accessed on 22/3/15