The Data Type Defines the Model

A comparison of data modelling techniques essentially comes down to which technique is best suited to which type of data. Other factors also need to be considered, notably the cost of implementing a given technique and how easily the relevant people can work with data modelled in that format. Humanities data, for instance, needs to be accessible to researchers who may have little computer science experience, and that limitation should inform how the data is modelled.

Two of the most common methods of storing and modelling data are relational databases and graph databases. Each fulfils a different role, and only by understanding how each technique works is it possible to judge which is best suited to which type of data. Each type has its own query language, rules and conventions, but the main reason to choose one over the other is the type of data being worked with.

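Relational databases, which are queried with SQL, are composed of tables and keys. Each table stores records as rows, and each row is identified by a primary key; a column in one table can hold the key of a row in another table (a foreign key), which is how records in different tables are linked. (The name "relational" comes from the mathematical term "relation", meaning a table, rather than from these links.) When a query needs data from linked tables, those tables are joined on their keys at query time, an operation that can be expensive in memory and processing time on large datasets. By their tabular nature, these databases are best suited to quantitative data, whether that data comes from the humanities or elsewhere.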
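As a rough illustration of tables, keys and joins, the sketch below uses Python's built-in sqlite3 module. The schema (an authors table and a letters table) and the sample rows are hypothetical, invented for the example; the point is how a foreign key links the two tables and how a JOIN reassembles them when the query runs.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each table holds one kind of record; the primary key identifies each row.
cur.execute("CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# The foreign key author_id links each letter back to a row in authors.
cur.execute("""
    CREATE TABLE letters (
        letter_id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(author_id),
        year INTEGER NOT NULL,
        word_count INTEGER NOT NULL
    )
""")

cur.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Author A"), (2, "Author B")])
cur.executemany(
    "INSERT INTO letters VALUES (?, ?, ?, ?)",
    [(1, 1, 1840, 350), (2, 1, 1842, 410), (3, 2, 1841, 275)],
)

# The JOIN matches rows across the two tables on the shared key at query time,
# then aggregates the quantitative column (word_count) per author.
rows = cur.execute("""
    SELECT a.name, COUNT(*) AS letters, SUM(l.word_count) AS total_words
    FROM letters AS l
    JOIN authors AS a ON a.author_id = l.author_id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()

for name, n_letters, total_words in rows:
    print(name, n_letters, total_words)
```

Running it prints one line per author with the number of letters and the total word count. The stored tables stay separate; the combined view exists only in the query result, which is why joins across many large tables can become costly.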

Graph databases, however, are composed not of tables but of nodes and edges. Edges connect related nodes to each other; the name comes from the mathematical graph, a set of nodes joined by edges, and these connections are easy to plot and visualise. Resource Description Framework (RDF) databases, queried with the SPARQL query language, are one form of graph database. In RDF, every piece of information is stored as a triple of subject, predicate and object, where the object may be another node or a literal value such as a descriptive passage of text. Although such a model can also hold quantitative data, it is particularly well suited to descriptive information. Results are often presented to the end user as a table, with a predicate in one column and the object (for example, the text) in another, but the data are not stored as a table: a SPARQL query matches patterns against the stored triples and returns whichever nodes and values fit them.
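To make the triple model concrete, here is a toy, in-memory sketch in plain Python rather than a real RDF store or SPARQL engine. It is a deliberate simplification under stated assumptions: identifiers such as ex:letter1 are hypothetical, no IRIs or datatypes are modelled, and the query function only imitates the basic pattern matching that a SPARQL WHERE clause performs. Terms beginning with ? act as variables, and each pattern narrows the set of bindings produced by the ones before it.

```python
# A toy in-memory triple store; not a real RDF library or SPARQL engine.
# Identifiers such as "ex:letter1" are hypothetical stand-ins for RDF IRIs.
Triple = tuple[str, str, str]  # (subject, predicate, object)

triples: list[Triple] = [
    ("ex:letter1", "ex:author", "ex:personA"),
    ("ex:letter1", "ex:description", "A letter describing a journey by coach."),
    ("ex:letter2", "ex:author", "ex:personB"),
    ("ex:letter2", "ex:description", "A short note declining an invitation."),
    ("ex:personA", "ex:name", "Author A"),
    ("ex:personB", "ex:name", "Author B"),
]


def _unify(term: str, value: str, binding: dict[str, str]) -> bool:
    """Match one pattern term against one stored value, extending binding in place."""
    if term.startswith("?"):  # a variable
        if term in binding:  # already bound: must agree with the earlier value
            return binding[term] == value
        binding[term] = value
        return True
    return term == value  # a constant must match exactly


def query(patterns: list[Triple], store: list[Triple]) -> list[dict[str, str]]:
    """Return every set of variable bindings that satisfies all patterns."""
    solutions: list[dict[str, str]] = [{}]
    for pattern in patterns:
        next_solutions: list[dict[str, str]] = []
        for binding in solutions:
            for triple in store:  # naive full scan; real triplestores use indexes
                candidate = dict(binding)
                if all(_unify(p, t, candidate) for p, t in zip(pattern, triple)):
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions


# Roughly analogous SPARQL (a PREFIX declaration for ex: would also be needed):
#   SELECT ?name ?text WHERE {
#     ?letter ex:author ?person .
#     ?person ex:name ?name .
#     ?letter ex:description ?text .
#   }
results = query(
    [
        ("?letter", "ex:author", "?person"),
        ("?person", "ex:name", "?name"),
        ("?letter", "ex:description", "?text"),
    ],
    triples,
)
for row in results:
    print(row["?name"], "|", row["?text"])
```

Each result is a dictionary of variable bindings, much as a SPARQL SELECT returns one row per solution. The query follows the chain letter → author → name without any join table, which is the sense in which graph models make relationships cheap to traverse; a production system would use a proper triplestore that indexes triples instead of scanning the whole list for every pattern.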

In conclusion, it is impossible to say that one data modelling technique is simply better or worse than another; any such judgement is subjective and depends on the experience of the person making it. It is possible, however, to say that different techniques are better or worse at handling different forms of data. Relational databases are optimised for quantitative data, graph databases for more descriptive data, and XML databases work best for storing the contents of entire texts. Which modelling form a database designer chooses should be informed, first and foremost, by the type of data the database is intended to hold.
