Effective Data Modelling Techniques for the Humanities.

Data modelling for the analysis and querying of humanities data differs from the modelling of absolute data because of the underlying differences between humanities data and other data. Humanities data, by its very nature, is conceptual, interpretive and subjective. It is therefore difficult to model and to query, because the creators and users of humanities-based databases bring different viewpoints to it. Johanna Drucker (‘Humanities Approaches to Graphical Display’, Digital Humanities Quarterly, 5(1), 2011) argues that ‘(r)endering observation (the act of creating a statistical, empirical, or subjective account or image) as if it were the same as the phenomena observed collapses the critical distance between the phenomenal world and its interpretation, undoing the basis of interpretation on which humanistic knowledge production is based.’ Drucker’s understanding of data modelling for the humanities has its roots in the interpretation of data. She goes on to point out that ‘at the very least, humanists beginning to play at the intersection of statistics and graphics ought to take a detour through the substantial discussions of the sociology of knowledge and its developed critique of realist models of data gathering. At best, we need to take on the challenge of developing graphical expressions rooted in and appropriate to interpretative activity’.

With this understanding of humanities data in mind, attention can turn to two different approaches to data modelling, XML and semantic models, and how they are used. By understanding the uses and principles of each approach, one can more effectively assess which is most appropriate for modelling humanities data.

The principles of XML allow it to describe almost any type of data, as XML is not a language in its own right, but rather a formalism for creating markup languages within the context of a particular database or set of databases. In relation to humanities data as a whole, XML is the most customisable form of data modelling. Returning to the earlier argument that all humanities data is subjective and interpretive, the customisable features of XML allow the creator and user of the database to conceptualise the data in their own fashion. The XML tags of a database exist in the logical schema, or the structural reference guide to a database. There is no wrong type of logical schema for XML tags, but the creator must always strive for consistent output in each database, which is detailed in the structure of an entity relationship model. Wendell Piez (‘Three Questions and One Experiment On Data Modeling in the Humanities’, Workshop on Knowledge Organization and Data Modeling in the Humanities, Brown University Press: Providence, RI, 2012) highlights the capabilities of personalisation in XML documents: ‘every XML instance, as an encoded document, both depends on character encoding and the rules of XML syntax, and it implies a schema, more or less coherent, even if it does not call one explicitly.’ With this understanding of XML and the study of humanities data, it is not difficult to understand the appeal of conceptual data modelling which still functions in a very structured and linear fashion.
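The customisable tagging described above can be illustrated with a small sketch. It assumes a hypothetical encoded letter whose element and attribute names (letter, sender, recipient, place, cert) were invented for this example and do not come from any published schema; the names and places in it are placeholders. It uses Python's standard xml.etree.ElementTree module to show how an editor's interpretive judgement, here a certainty rating, can be recorded in the markup and then queried.

```python
import xml.etree.ElementTree as ET

# Hypothetical letter encoded with project-specific tags. The "cert"
# attribute is an assumed convention recording how certain the editor is
# about a reading; none of these names belong to a real schema.
LETTER = """
<letter id="L001">
  <sender cert="high">A. Writer</sender>
  <recipient cert="low">B. Reader</recipient>
  <date when="1850-01-01">1 January 1850</date>
  <body>We arrived in <place>Dublin</place> after a long journey.</body>
</letter>
"""


def readings_with_certainty(xml_text, level):
    """Return (tag, text) pairs for elements whose cert attribute equals level."""
    root = ET.fromstring(xml_text.strip())
    return [(el.tag, el.text) for el in root.iter() if el.get("cert") == level]


def places_mentioned(xml_text):
    """Return the text of every <place> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.iter("place")]


if __name__ == "__main__":
    print(readings_with_certainty(LETTER, "low"))
    print(places_mentioned(LETTER))
```

A different editor could encode the same letter with different tags or a different certainty scale, and the queries would have to change accordingly, which is the sense in which the schema reflects its creator's interpretation.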

In contrast, the principles of the semantic web work in a very contextual way, defining the meaning of data through the interrelationships between the various data within a database. Semantic web models define the resources, ideas, and events of a real-world figure within the physical data stores of a model. Therefore, the model must be a representation of the real world. This can have a two-sided effect on the interpretation of the data. On the one hand, the findings of the data may be taken as true, given that they have gone through computational querying and structuring. On the other hand, one can argue that all data is conceptual, so despite the computational understanding, the findings can always still be assessed. Semantic web models are most useful in the context of large, sharable databases, where the data can be queried by any number of users, all of whom can easily contradict each other's findings. The most popular uses for the semantic web include artificial intelligence modelling, due to the effectiveness of the semantic web for assessing the interrelationships within large databases.
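The relationship-centred approach described above can be sketched in a few lines. Semantic web data is usually expressed as RDF triples and queried with SPARQL; the example below is only a plain-Python stand-in for that idea, and every identifier and predicate name in it (letter:L001, writtenBy, sentTo, livedIn) is a hypothetical chosen for illustration.

```python
# Each statement is a (subject, predicate, object) triple. The identifiers
# and predicates are invented for this sketch and follow no real vocabulary.
TRIPLES = {
    ("letter:L001", "writtenBy", "person:A"),
    ("letter:L001", "sentTo", "person:B"),
    ("letter:L002", "writtenBy", "person:B"),
    ("letter:L002", "sentTo", "person:A"),
    ("person:A", "livedIn", "place:Dublin"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the sorted triples that fit the pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def correspondents_of(triples, person):
    """Find everyone who was sent a letter written by the given person."""
    letters = [s for s, _, _ in match(triples, predicate="writtenBy", obj=person)]
    recipients = set()
    for letter in letters:
        for _, _, recipient in match(triples, subject=letter, predicate="sentTo"):
            recipients.add(recipient)
    return sorted(recipients)


if __name__ == "__main__":
    print(correspondents_of(TRIPLES, "person:A"))
```

The answer to the query is not stored anywhere as a field; it emerges from following the links between statements, which is why this style of model suits questions about interrelationships across large shared datasets.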

When one asks which data modelling technique is most useful in the context of the humanities, it is fair to say that the usefulness of the method depends on the needs of the user. XML works in a very structured and linear format, but the main navigation and query language of the method is subject to the needs of the user. In contrast, the semantic web, while also being very customisable, provides the user with a very advanced method of understanding data, due to the assessment of the interrelationships of data.

Works Cited
Johanna Drucker, ‘Humanities Approaches to Graphical Display’, Digital Humanities Quarterly, 5(1) (2011).
Julia Flanders and Fotis Jannidis, ‘Data Modelling’, in Susan Schreibman, Ray Siemens and John Unsworth (eds.), A New Companion to Digital Humanities, Oxford, UK: John Wiley & Sons Ltd (2016).
Wendell Piez, ‘Three Questions and One Experiment On Data Modeling in the Humanities’, Workshop on Knowledge Organization and Data Modeling in the Humanities, Brown University Press: Providence, RI (2012).

Knowledge, Data and Digital Humanities: What’s the Problem?

Knowledge and data are inherently linked, but we tend to treat them differently. In the first instance, one must consider what both of these terms mean. What defines data and what defines knowledge? In our first class for this module, the discussion was geared towards representations of knowledge and what constitutes knowledge. The definitions have changed over many years of philosophical thinking, but knowledge can now be understood as ‘[f]acts, information, and skills acquired through experience or education; the theoretical or practical understanding of a subject’, ‘[t]he sum of what is known’, or ‘[i]nformation held on a computer system’ (Oxford English Dictionary). This third definition of ‘information on a computer system’ is where our interests lie with regard to the relationship between knowledge and data and their web-based counterparts, as it goes part of the way to defining how we understand knowledge representations. Data is the term used to represent facts digitally, even though data is the fact itself. It is through this terminology that the relationship between knowledge and data becomes, somehow, problematic.

Unlike data, knowledge can be a very ambiguous and layered term, and it holds different connotations from person to person. Data is straightforward in comparison: it is a representation of information, and the construction of the data changes depending on the information. Data holds no argument and cannot be used in its basic form to make a persuasive point unless it supports a pre-existing theory relating to the data. Davis, Shrobe and Szolovits (1993) discuss the representation of knowledge under a number of conditions, specifically relating to artificial intelligence. Some of these conditions work together while others are independent of their counterparts, depending on the condition itself, but the conditions are laid out in five major representations which together provide a framework for how we understand the representation of knowledge.

Brewster and O’Hara (2007) use the five principal conditions laid out by Davis, Shrobe and Szolovits to show how knowledge representation, computer science and data all relate. Brewster and O’Hara understand knowledge representation as the managing of a collection of facts; in other words, the creation of the dataset. This seems like a very straightforward task, until one analyses how the authors understand knowledge. For Brewster and O’Hara, knowledge exists in two interrelated parts: the first is that knowledge is a single edifice to which new, conceptual building blocks are always added. The concepts are key, but we are constantly manipulating them with words.

The five principles of knowledge representation are used throughout the conceptualisation of knowledge in digital form. Surrogacy is the internal representation and external reasoning of knowledge. This type of representation is used in data visualisation technologies to make sense of the data shown and to show how ambiguities can be hidden under fact-based pretty pictures. Surrogacy works well in relation to the ontological commitments of knowledge representation and allows us to ‘make a decision about how and what we see in the world’ (Brewster and O’Hara). The final three principles allow one to understand how we transition from human reasoning to a digital representation of that reasoning. The first of these is a fragmentary theory of intelligent reasoning and an examination of how people reason. The second is the medium for efficient computation, which dictates that the knowledge must be programmable, and the final principle is the medium for human expression and how problems are inevitable when transitioning to the web scale.

Through data visualisation tools, we can represent knowledge in an efficient and ambiguous manner, which allows people to come to their own conclusions about the data. This is the most efficient form of web-based knowledge, and allows accessibility to thrive. The most challenging aspect of knowledge representation is, in fact, the human component behind it, as people must acclimatise themselves to representing data in a way other than just words.

Further reading

Armstrong, David, 1973. ‘A defence of reliabilism.’ Belief, Truth and Knowledge, Cambridge: Cambridge University Press
Bateman, J. A. (1993) ‘Ontology Construction and Natural Language.’ Proceedings of the International Workshop on Formal Ontology. 4 May 2012.
Brewster, C. and K. O’Hara (2007). ‘Knowledge representation with ontologies: Present challenges — future possibilities’, International Journal of Human-Computer Studies 65, Elsevier. 4 May 2012.
Davis, R., H. Shrobe, and P. Szolovits. ‘What is a Knowledge Representation?’ AI Magazine 14.1 (1993). 24 Jan 2015.
Gettier, Edmund, 1963. ‘Is Justified True Belief Knowledge?’ Analysis, 23, pp. 15-34.
McCandless, David, ‘The Beauty of Data Visualisations’, TED Talks. (Web) Accessed 12 Feb 2017.