Computational Linguistics and the Dating of Early Irish Texts

Workshop and Lecture Series hosted by the members of the Chronologicon Hibernicum Project

15 December 2016.

Source: https://www.maynoothuniversity.ie/chronologiconhibernicum

Onscreen presentation offers advantages that could not have been imagined by the editors of previous generations.  One of the major advantages hypertext editions have over print is that they are fully searchable.  This has benefits beyond mere convenience: Thorlac Turville-Petre has recently argued regarding the study of Middle English linguistics through the digital medium that searchable texts and electronic concordances serve as powerful aids to full and accurate analyses of the language – a point that has not been overlooked by scholars of Medieval Irish.

Recent projects in Medieval Irish Studies have made an increasing number of digital resources available to the researcher.  Chronologicon Hibernicum (ChronHib) – A Probabilistic Chronological Framework for Dating Early Irish Language Developments and Literature, is an ERC funded research project with the aim of creating tools to facilitate the study of language development.  The primary aim of the project is to refine the methodology for dating Early Medieval Irish linguistic development and to build a chronological framework of linguistic changes that can be used to date literary texts within the Early Irish period (ca. 6th – mid 10th century).  The project aims to achieve this through statistical methods for the seriation of linguistic data, and for estimating dates using Bayesian inference.  A further goal of the project is to harness the potential of existing digital resources and to develop new publicly available corpora to help date Old and Middle Irish texts and to gain deeper insights into the development of the phonology, morphology, syntax and lexicon of the Irish language.
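To make the idea of estimating dates with Bayesian inference more concrete, the following is a minimal sketch and not the ChronHib project's actual method.  It assumes, purely for illustration, that each linguistic change follows a logistic curve over time, that a text's innovative and conservative forms can simply be counted, and that every candidate date is equally likely beforehand.  All feature names, curve parameters and counts are hypothetical.

```python
import math


def innovation_rate(year, midpoint, slope):
    """Assumed logistic curve: expected share of innovative forms in `year`."""
    return 1.0 / (1.0 + math.exp(-slope * (year - midpoint)))


def date_posterior(counts, features, years):
    """Posterior probability of each candidate year under a uniform prior.

    counts:   {feature: (innovative, conservative)} tallies for one text
    features: {feature: (midpoint, slope)} hypothetical change curves
    """
    log_likelihoods = []
    for year in years:
        total = 0.0
        for name, (innovative, conservative) in counts.items():
            p = innovation_rate(year, *features[name])
            # Binomial log-likelihood (constant term omitted)
            total += innovative * math.log(p) + conservative * math.log(1.0 - p)
        log_likelihoods.append(total)
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(log_likelihoods)
    weights = [math.exp(value - peak) for value in log_likelihoods]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical inputs: two invented features and invented counts for one text
FEATURES = {"feature_a": (750, 0.02), "feature_b": (850, 0.03)}
COUNTS = {"feature_a": (12, 8), "feature_b": (3, 17)}
YEARS = list(range(600, 951, 10))

posterior = date_posterior(COUNTS, FEATURES, YEARS)
best_year = YEARS[posterior.index(max(posterior))]
print("Most probable decade starts in", best_year)
```

A real framework would also have to model uncertainty in the change curves themselves and the effects of scribal transmission, which this sketch ignores.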

In early December, I was invited to attend a morning workshop and an afternoon lecture series entitled “Computational Linguistics and the Dating of Early Irish Texts”, hosted by the members of the Chronologicon Hibernicum project at the National University of Ireland, Maynooth.  The aim of the workshop was to discuss how to harmonise the annotation schemes and headword choices in the existing linguistically parsed databases of Early Irish: the Milan Glosses (http://www.univie.ac.at/indogermanistik/milan_glosses.htm, Griffith 2013), the Priscian Glosses (http://www.univie.ac.at/indogermanistik/priscian/, Bauer 2015) and the Parsed Old and Middle Irish Corpus (https://www.dias.ie/celt/celt-publications-2/celt-theparsed-old-and-middle-irish-corpus-pomic/, Lash 2014), as well as the various in-progress databases, including the Annals of Ulster database, the Minor Glosses database and the Poems of Blathmac database.  Additionally, the workshop aimed to examine how existing online text repositories, such as CELT (https://www.ucc.ie/celt/) and Thesaurus Linguae Hibernicae (https://www.ucd.ie/thl/), could be utilised to advance the study of Early Medieval Irish linguistics.

Following some brief introductions, Prof. David Stifter, the principal investigator of ChronHib, discussed the progress of the project to date and presented the participants with the draft Guidelines for Analysis and Mark-up in ChronHib’s Lexical Databases.  A lively debate ensued and a number of issues were discussed, including the following:

  • Issues of importing data from existing databases such as CELT and TLH
  • Exploring the possibilities of tagging texts automatically
  • Database server and web-enabled data entry – IT
  • Prototyping and dirty data
  • Usability studies
  • Creating a live database
  • Formatting of the Masterfile

The afternoon lectures explored computational methods for linguistic research, especially as they apply to the study of Old and Middle Irish.  The three speakers all addressed different aspects of computational linguistics – from building corpora of historical languages, to the application of computational approaches to linguistic analysis and unsupervised machine learning.  First up was Dr. Marius Jøhndal, a linguist researching the syntax of Latin at the Department of Philosophy, Classics, History of Art and Ideas at the University of Oslo.  Dr. Jøhndal discussed his experiences of the PROIEL Treebank project and the degree to which computational approaches can be applied when building corpora of historical languages.  The next speaker, Dr. Aaron Griffith of Utrecht University, examined the distribution of pre-verbal ceta ‘first’ < *kintu- in the glosses in order to demonstrate the kinds of research questions that can be addressed using digital corpora of historical languages.  Lastly, Prof. Gregory Toner provided an overview of the methods and some of the results of an ongoing project at Queen’s University, Belfast, which applies unsupervised machine learning techniques from computational linguistics to the dating of medieval Irish texts.

All in all, the day was a success.  It was particularly exciting for a student of Digital Humanities with a primary research interest in Medieval Irish Studies to participate in such a vibrant discourse concerning the implementation of digital technology in the study of Medieval Irish linguistics.  It also raised questions for me regarding the understanding of Digital Humanities in other humanities disciplines.  Throughout the morning session, participants encouraged members of the project to approach IT professionals and database specialists to resolve issues of interactivity and design.  When I enquired as to why the project had not thought to approach the Digital Humanities Department, a department with which the project shares a floor at Maynooth University, I was informed that they had not considered that databases formed part of the Digital Humanities subject matter.  For me, this points to a breakdown in communication between Digital Humanities and the wider humanities community.  I cannot help but wonder how many opportunities have been missed as a consequence of similar misunderstandings.

Speakers and presentation titles:

  • Marius Jøhndal – “Building and using online corpora for (historical) linguistic research”
  • Aaron Griffith – “Pre-verbal ceta ‘first’ in the glosses (and some thoughts of the origin of the notae augentes)”
  • Gregory Toner – “Machine learning and the dating of Medieval Irish texts”

Works Cited:

Thorlac Turville-Petre, ‘Editing Electronic Texts’, in Probable Truth: Editing Texts from Britain in the Twenty-First Century, eds V. Gillespie and A. Hudson, pp. 55-70, at pp. 61-2.

Working in a Cultural Heritage Institute

Source: http://maynoothcollege.ie/national-museum-maynooth/

To follow our progress go to Twitter: @Ecclesiology3D

This post is the first in a series detailing my experience of working with the ecclesiological material in St. Patrick’s College Museum.  The primary goal of this research was to enable students participating in AFF622: Digital Heritage: Theories, Methods and Challenges, as part of the MA in Digital Humanities, An Foras Feasa, National University of Ireland, Maynooth, to develop the practical skills required for working with photogrammetry and 3D scanning to explore a cultural heritage scenario.

The research deliverables included selecting and capturing appropriate ecclesiastical artefacts.  Furthermore, students were expected to design and carry out an effective workflow for 3D recording in cultural heritage projects, including capturing, processing, online publishing, 3D printing, and writing a report.

The research was carried out in cooperation with the National Science and Ecclesiology Museum, St. Patrick’s College, Maynooth.  Working alongside the curator, Dr. Niall McKeith, and under the guidance of the course coordinator, Dr. Konstantinos Papadopoulos, the project team captured data for fourteen objects from the Museum’s collection of ecclesiastical artefacts in November 2016.

Many of the observations made here can be applied to other similar digitisation projects.  It should be noted that the hardware and the software used by the team were prespecified by the course coordinator and, therefore, discussions surrounding their use did not include resource acquisition.  Details will be provided in later posts.  A list of factors that may need to be addressed by other projects is included at the end of the section.

Much of what follows will highlight the importance of thorough project planning in advance of data capture.

Digitisation is above all an access strategy.  Like many smaller cultural heritage museums, the Museum of Ecclesiology has very restricted opening hours.  At present, the institution does not provide access to the collection in digital form.  Given these limited opening hours, digitisation will open new modes of access to the collection.

Public access to artefacts is an important factor when considering which objects to digitise and when to digitise them.  In a cultural heritage institution with regular public access, the project should aim to minimise its impact on the institution’s other activities.  Coordination between project members and museum staff is essential.

A further consideration in this regard is the amount of time assigned to data capture.  Quality control throughout the data capture process is essential.  Repeated digitisation should be avoided and is often not an option.

Most museums will not have the appropriate physical environment, hardware or software systems in place.  At St. Patrick’s College Museum, the lighting was poor and the physical environment was uncomfortable due to the temperature.  The hardware and the necessary software had to be transported to the museum for the data capture.

Source: https://maynoothcollegemuseum.wordpress.com/history-of-the-museum/

Ideally, original items should be handled as little as possible.  Where necessary, appropriate precautions were taken when handling original items: team members wore cotton gloves and the curator handled the very fragile objects such as the Ivory Ciborium.  When working with the Laser Scanner, the user manual recommends using powder for shiny or metallic surfaces.  Similar recommendations are made for these items when using photogrammetry.  It is unlikely that you will be permitted to use powder on original artefacts.  The same can be said for the use of alignment markers.

The metadata models implemented by the project were restricted by the prespecified online publishing platforms (for further detail see Online Publishing).  The historical and contextual information for many of the ecclesiological materials at St. Patrick’s Museum is extremely limited; in many cases the provenance is unknown.  Digitisation in the absence of annotation lacks context and meaning.  A project should consider the digitisation process as a unified whole: focusing on those objects which are best suited to the technology being employed may limit the overall results if there is insufficient data on the objects themselves.  The approach taken by the current project was to research the individual items.  However, this may not be within the scope of similar digitisation projects, and it should be taken into consideration when deciding which items to digitise.

Digital Archaeology vs Digital Humanities, or Why Labels are Important

Source: http://www.mccarty.org.uk/essays/McCarty,%20Humanities%20computing.pdf

Introduction

In his foreword to the Digital Archaeology section of the Frontiers in Digital Humanities online journal, Andre Costopoulos writes that, “I want to stop talking about digital archaeology.  I want to continue doing archaeology digitally.”  In a series of provocative statements, Costopoulos suggests that conversations concerning the implications of new digital tools are somewhat obsolete as digitisation continues regardless.  He finishes by saying, “Forget the label.  We are building a digital archaeology by doing archaeology digitally.  This is what we do” (Costopoulos).  In his response to Costopoulos’ piece, Jeremy Huggett writes,

There’s no doubt that every archaeologist is a digital archaeologist, in the sense that everyone uses a computer to some extent at some point in their work….  However, not everyone is a digital archaeologist.

It is Huggett’s view that to avoid having conversations concerning the implications of new digital tools would be an “abrogation of responsibility” (Huggett, Let’s Talk About Digital Archaeology).  At first glance, both contributions seem directly applicable to Digital Humanities, and that is how they were read by many participants in a recent group discussion, myself included.  However, on closer reading the relationship between Digital Archaeology and Digital Humanities becomes tenuous.  Should we read the debates within Digital Archaeology as though they are synonymous with those in Digital Humanities?

 

Discussion

The answer to this question largely depends on how we define Digital Humanities and the relationships between Digital Humanities and independent humanities disciplines, including Digital Archaeology.  As Matthew Kirschenbaum points out, this is a question to which the multiplicity of answers now constitutes a genre (Kirschenbaum, 1).  For me, Digital Humanities is a transdisciplinary field of study, not a sub-field or sub-discipline and it is equally grounded in the theoretical, the methodological and the practical.  To my mind, such a definition attests to the applicability of digital technologies across the humanities whilst giving sufficient recognition to the fact that Digital Humanities constitutes an independent field of study.  I do not consider myself a Digital Humanities scholar because I apply digital tools to Medieval Irish material (or perhaps more accurately I intend to apply digital tools to Medieval Irish material).  If such work required a label, I would say that I am a Digital Medievalist.  What sets Digital Humanists apart from those who employ digital tools within their independent disciplines is that they have considered the wider institutional, cultural and political issues.

Perhaps more telling, however, are the attitudes of Digital Archaeologists and Digital Humanists themselves.  In 2012, Stuart Dunn wrote that the relationship between archaeology and Digital Humanities has been curiously lacking (Dunn).  In a paper referencing Dunn’s work, Huggett writes that “digital humanists are not queuing up to access DA and digital archaeologists are not knocking on the door of DH” (Huggett, “Core or Periphery? Digital Humanities from an Archaeological Perspective”, 92).  In 2015, he highlighted the divergences in the developmental histories of the two fields and argued in favour of a “third wave” for Digital Archaeology which differs greatly from the “third wave” proposed for Digital Humanities (Huggett, “A Manifesto for an Introspective Digital Archaeology”, 89).  From a Digital Humanities perspective, Figure 1, originally published in 2002, maps the various methodological commons of Digital Humanities, and archaeology is notably absent from it.

 

Conclusion

I am not suggesting that discourse within Digital Archaeology cannot be beneficial to Digital Humanities scholars, or vice versa.  There are undoubtedly many strong parallels between Digital Humanities and Digital Archaeology.  However, one should exercise caution when reading Digital Archaeology material and should not assume that Digital Archaeology is synonymous with Digital Humanities.  To paraphrase Huggett in his article cited in the above introduction: there’s no doubt that every archaeologist is a digital archaeologist, in the sense that everyone uses a computer to some extent at some point in their work.  However, not every Digital Archaeologist is a Digital Humanist.

 

Further Reading:

Costopoulos, Andre. “Digital Archaeology Is Here (and Has Been For a While).” Frontiers in Digital Humanities 3 (2016): 1-4. Web.

Dunn, Stuart. CAA1 – The Digital Humanities and Archaeology Venn Diagram. N.p., 2012. Web.

Huggett, Jeremy. “Core or Periphery? Digital Humanities from an Archaeological Perspective.” Historical Social Research 37 (2012): 86-105. Web.

—. “A Manifesto for an Introspective Digital Archaeology.” Open Archaeology 1 (2015): 86–95. Web.

—. Let’s Talk About Digital Archaeology. N.p., 2016. Web.

Kirschenbaum, Matthew. “What Is Digital Humanities and What’s It Doing in English Departments.” ADE Bulletin 150 (2010): 1–7. Web.

Reflections on the Ethics of Tangible Cultural Heritage Reconstruction

Introduction

When reflecting on the recently reconstructed Triumphal Arch from Palmyra, which was destroyed by ISIS in October 2015, I have found myself considering the anxiety that surrounds the recreation or reconstruction of cultural heritage artefacts.  On the face of it, the use of digital technology adds extra dimensions to discussions about the ethics of cultural heritage conservation.  In an age of digital reproduction, one of the primary questions appears to have become not whether we can reconstruct cultural artefacts, but rather whether we should.  This question becomes particularly sensitive when it is applied to the physical restoration of tangible cultural heritage that has been destroyed as a result of armed conflict, and even more so when the culture in question is still the subject of deliberate destruction.

Discussion

However, when we drill down into the question of whether or not we should be reconstructing cultural heritage objects, we find that there is nothing particularly new about the situation in which conservation professionals now find themselves.  Firstly, the replication of cultural artefacts (both 2D and 3D) is not a modern phenomenon, nor are the anxieties surrounding their reproduction.  In 1936, Walter Benjamin wrote that ‘man-made artefacts could always be imitated by men’.  Benjamin was most concerned with the ‘aura’ of the artefact and, according to him, the most destructive social consequence of mechanical reproduction is ‘the liquidation of the traditional value of cultural heritage’ (Benjamin).

Secondly, whilst concerns for cultural heritage during times of armed conflict run the risk of appearing indifferent to the immediate need to preserve human life, the desire to conserve and protect cultural heritage objects during times of active combat is well documented.  Perhaps most famously, the Monuments, Fine Arts, and Archives (MFAA) program was established during the Second World War to protect and preserve Europe’s cultural heritage.

Similarly, the reconstruction of a people’s cultural heritage post-armed conflict is not without precedent.  Moreover, the restoration of cultural artefacts in the immediate aftermath of the object’s destruction is not unprecedented.  The destruction of the Dalada Maligawa (the Temple of the Tooth Relic) of Sri Lanka by the Liberation Tigers of Tamil Eelam (LTTE) in 1998 serves as an example of both points.  In the days immediately following the destruction of the Temple, calls were made for its restoration and Sri Lankans proceeded to rebuild this significant cultural symbol.  So why the anxiety?

The anthropologist Valene Smith has written: ‘Wars are without equal as the time-markers of society.  Lives are so irrevocably changed that culture and behaviour are marked by three phases: “before the war”, “during the war”, and “after the war”’ (Smith).  Cultures develop and transform throughout the course of, and as a direct result of, armed conflict.  In relation to the 3D-printed scale model of the Triumphal Arch from Palmyra, Mark Sinclair has suggested that what is unprecedented is the ability to reconstruct cultural artefacts “during the war” (to employ Smith’s terminology), and to my mind this is ultimately the source of the anxiety surrounding the reconstruction of the Arch of Triumph (Sinclair).

Conclusion

The question, then, is not really whether we should reconstruct cultural heritage objects, but rather when we should reconstruct them.  One advantage that digital reproduction has over mechanical reproduction is the ability to store large amounts of detailed information without having to produce replicas.  Given that physical reproduction is no longer necessary, is it appropriate to recreate the artefacts of a culture that is still under threat?  As the copyist Adam Lowe has observed, ‘the critical thing now is to document. Later we can decide what to do with the material we collect’ (Sattin).

References

Benjamin, Walter. The Work of Art in the Age of Mechanical Reproduction. N.p. Web.

Sattin, Anthony. Meet the Master of Reproduction. N.p., 2015. Web.

Sinclair, Mark. Should Museums Be Recreating the Past. N.p., 2016. Web.

Smith, Valene. “War and Tourism: An American Ethnography.” Annals of Tourism Research 25.1 (1998): 202–27. Print.

Further Reading

Stanley-Price, Nicolas. Cultural Heritage in Postwar Recovery. ICCROM, 2005. Web.

Reflections on Copyright Law in Ireland and Public Engagement 3D Scanning Projects


Introduction

The issue of copyright is a contentious one, particularly in the modern era of digital productions and reproductions. In what follows, I briefly offer a couple of thoughts which have occurred to me during my recent endeavour to get to grips with copyright law in Ireland as it applies to the digitisation of cultural heritage artefacts. Foremost in my thoughts have been the implications for copyright of public engagement projects involving photogrammetry and structure from motion software. In the absence of legislation dealing specifically with digital works, I have followed the examples of others in examining the current legislation as it applies to photographic materials (Weinberg, 3-6; Margoni, 26-50).

Legislation

The primary legislation governing copyright in Ireland is the Copyright and Related Rights Act 2000-2007 (CRRA 2000). Section 2(1) CRRA classifies a photograph as an ‘artistic work’. As such, photographs must be original in order to attract copyright protection. Traditionally, for originality to be considered present, it was required that an artistic work display a modest level of skill, labour and effort on the part of the author and that it should not be copied from another source. However, a recent report aimed at improving national copyright law recommended that the development of the statutory definition of ‘originality’ should be left to the case-law of the Court of Justice of the European Union (ECJ) (Department of Jobs, Enterprise and Innovation, 33-4). Under EU law, the required originality standard is that the work being protected is the ‘author’s own intellectual creation’. An author’s own intellectual creation is present when the author can make free and creative choices and put their own personal stamp on the work. Consequently, ‘labour, skill and effort’, no matter the amount, do not necessarily result in originality. The EU originality standard further specifies that in instances where an expression is governed by technical or functional rules or a specific goal, no originality can be present (Margoni, 14-16).

Community Engagement Projects

In recent years, a number of heritage projects within Ireland have actively sought to engage the public in the production of 3D records and visualisations (see below for a list of references). In doing so, they have highlighted the affordability, accessibility and, perhaps most importantly, the simplicity of new recording technologies. For example, the coordinator of the Roscommon3D and Galway3D citizen science projects states that ‘there is very little skill invovled [sic] as most of the work is done by a computer’ (Dempsey, 3).

Conclusions

In attempting to establish whether or not the individual digitised objects created by these community-based projects would acquire copyright protection, an application of the EU originality test would more than likely produce a negative result. Firstly, the author (defined here as the photographer) cannot exercise free and creative choice. These projects specify photographic subjects, and the success of the subsequent three-dimensional visualisation is dependent on the accuracy of the photographs; that is to say, the photographs must be taken according to a predefined set of rules. Secondly, an author might argue that processing legitimises their claim to originality, as the ECJ left a certain degree of ambiguity in this respect in the Painer case. However, by highlighting the lack of human input required by the technology, community-based 3D modelling projects have negated this ambiguity. In my view, the individual digitised objects created for these projects would fail in a copyright claim as they appear to fall short of the requisite originality standard in every respect.

References:

Cases:

Legislation:

Government Reports:

Literature:

3D Public Engagement Projects:

Further Reading: