Effective Data Modelling Techniques for the Humanities.

Modelling humanities data for analysis and querying differs from modelling data that is treated as absolute or empirical, because of the underlying differences between humanities data and other data. Humanities data, by its very nature, is conceptual, interpretive and subjective. It is therefore difficult to model and to query, since the creators and users of humanities databases bring different viewpoints to the same material. Johanna Drucker (‘Humanities Approaches to Graphical Display’, Digital Humanities Quarterly, 5(1), 2011) argues that ‘(r)endering observation (the act of creating a statistical, empirical, or subjective account or image) as if it were the same as the phenomena observed collapses the critical distance between the phenomenal world and its interpretation, undoing the basis of interpretation on which humanistic knowledge production is based.’ Drucker’s understanding of data modelling for the humanities is rooted in the interpretation of data. She goes on to point out that ‘at the very least, humanists beginning to play at the intersection of statistics and graphics ought to take a detour through the substantial discussions of the sociology of knowledge and its developed critique of realist models of data gathering. At best, we need to take on the challenge of developing graphical expressions rooted in and appropriate to interpretative activity’.

With this understanding of humanities data in mind, attention can turn to two different data modelling techniques, XML and semantic web models, and how they are used. By understanding the uses and principles of each approach, one can more effectively assess which is most appropriate for modelling humanities data.

The principles of XML allow almost any type of material to be modelled, as XML is not a language in its own right, but rather a formalism for creating markup languages suited to a particular document or set of documents. In relation to humanities data as a whole, XML is a highly customisable form of data modelling. Returning to the earlier argument that all humanities data is subjective and interpretive, the customisable nature of XML markup allows the creator and user of a dataset to conceptualise the data in their own fashion. The tags permitted in an XML document are defined in its schema, the structural reference guide for the data. There is no wrong choice of tags, but the creator must always strive to apply them consistently across each dataset, and this consistency can be planned in advance with a conceptual model such as an entity relationship model. Wendell Piez (‘Three Questions and One Experiment On Data Modeling in the Humanities’, Workshop on Knowledge Organization and Data Modeling in the Humanities, Brown University Press: Providence, RI, 2012) highlights this capacity for personalisation in XML documents: ‘every XML instance, as an encoded document, both depends on character encoding and the rules of XML syntax, and it implies a schema, more or less coherent, even if it does not call one explicitly.’ With this understanding of XML and of humanities data, it is not difficult to see the appeal of a conceptual form of data modelling that still functions in a very structured and hierarchical fashion.
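As a minimal sketch of this customisability, the Python example below encodes a single invented letter with tags and attributes chosen by a hypothetical editor, and then queries it. The tag names, the `cert` (certainty) and `resp` (responsible editor) attributes, the sample content, and the helper functions are all assumptions made for illustration; they are not taken from any of the sources discussed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of one invented letter. Every tag and attribute
# name here is an editorial choice made up for this illustration.
LETTER = """
<letter id="L1">
  <sender cert="high">A. Writer</sender>
  <recipient cert="low" resp="editorA">B. Reader</recipient>
  <date when="1819-04-06" cert="medium"/>
  <body>We arrived in <place ref="rome">Rome</place> last week.</body>
</letter>
"""


def uncertain_readings(xml_text, level="low"):
    """Return (tag, text, responsible editor) for elements marked with the given certainty."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, (el.text or "").strip(), el.get("resp"))
        for el in root.iter()
        if el.get("cert") == level
    ]


def places(xml_text):
    """Return the reference keys of every <place> element, at any depth."""
    root = ET.fromstring(xml_text)
    return [el.get("ref") for el in root.findall(".//place")]


if __name__ == "__main__":
    print(uncertain_readings(LETTER))
    print(places(LETTER))
```

The point of the sketch is that the interpretive judgement (how certain the editor is, and who made the call) is recorded in the markup itself, so a second editor could encode the same letter differently and both readings would remain queryable.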

In contrast, the semantic web works in a very contextual way, defining the meaning of data through the interrelationships between the various items of data within a database. Semantic web models describe real-world resources, ideas, and events, and the relationships between them, as statements held in a data store. The model must therefore be a representation of the real world. This can have a two-sided effect on the interpretation of the data. On the one hand, the findings may be taken as true, given that they are the product of computational querying and structuring. On the other hand, one can argue that all data is conceptual, so that, despite the computational processing, the findings always remain open to reassessment. Semantic web models are most useful in the context of large, shareable databases, where the data can be queried by any number of users, all of whom can easily contradict each other's findings. Popular uses of the semantic web include artificial intelligence modelling, because the semantic web is effective at assessing the interrelationships within large databases.
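The relationship-centred approach described above can be sketched in a few lines of Python. The example below stores statements as subject, predicate, object triples and answers a question by following links between them. It is only an illustration under stated assumptions: the `ex:` identifiers, the sample statements, and the `match` and `senders_mentioning` helpers are invented here, and a real project would more likely use a dedicated triple store and query language.

```python
# Invented statements in (subject, predicate, object) form.
# All "ex:" identifiers are hypothetical placeholders.
TRIPLES = [
    ("ex:letter1", "ex:sentBy", "ex:writerA"),
    ("ex:letter1", "ex:sentTo", "ex:readerB"),
    ("ex:letter1", "ex:mentions", "ex:rome"),
    ("ex:letter2", "ex:sentBy", "ex:readerB"),
    ("ex:letter2", "ex:mentions", "ex:rome"),
    ("ex:rome", "rdf:type", "ex:Place"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def senders_mentioning(triples, place):
    """Find everyone who sent a letter that mentions the given place."""
    letters = [s for s, _, _ in match(triples, p="ex:mentions", o=place)]
    return sorted(
        {o for letter in letters for _, _, o in match(triples, s=letter, p="ex:sentBy")}
    )


if __name__ == "__main__":
    print(senders_mentioning(TRIPLES, "ex:rome"))
```

Nothing in the data says directly that a given person wrote about Rome; the answer emerges from chaining two kinds of relationship, which is the sense in which meaning here depends on the interrelationships between data.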

When one asks which data modelling technique is most useful in the context of the humanities, it is fair to say that the answer depends on the needs of the user. XML works in a very structured and hierarchical format, but its vocabulary and structure can be tailored to the needs of the user. The semantic web, while also very customisable, offers the user a more advanced way of understanding data, because it models the interrelationships between items of data.

Works Cited
Johanna Drucker, ‘Humanities Approaches to Graphical Display’, Digital Humanities Quarterly, 5(1), 2011.
Julia Flanders and Fotis Jannidis, ‘Data Modelling’, in Susan Schreibman, Ray Siemens and John Unsworth (eds.), A New Companion to Digital Humanities, Oxford, UK: John Wiley & Sons Ltd, 2016.
Wendell Piez, ‘Three Questions and One Experiment On Data Modeling in the Humanities’, Workshop on Knowledge Organization and Data Modeling in the Humanities, Brown University Press: Providence, RI, 2012.

Please tell me what you think!