Relational Databases versus Graph Databases: Gen X versus Gen Y



When developing databases, developers have choices and preferences. For some, the relational model is tried and tested; for others, it is outdated and confining. So which is the best choice? Where should you place your bets? This post will define both models and explore the connection between them in order to get at the crux of this question. While data modelling is continually evolving, and therefore continually complicated, for this discussion at least we will stick to the basics.

Relational Databases:

Relational databases are not a “new thing”. While the technology narrative instils in us that the world wide web and all its glories are a very recent invention, the relational database has been in existence since the 1970s. Conceived by Edgar Codd, this system is anything but new. The data model is built on a system of tables, and the references or relationships between these tables are defined by “keys”. To connect these tables, that is, to display the relationship, a process called “joining” is required. Joins, particularly across large tables, are usually considered quite server and memory intensive.

Therefore, to produce the most efficient model the developer must choose to what level they will “normalise” their data. Normalisation restructures the data into more narrowly defined tables so that each fact is stored only once. The normal forms range from first to fifth, and the level applied is usually at the discretion of the developer. However, additional costs, server space and query intensity all factor into this decision. While the process of normalisation can strengthen the capabilities of the database, it has been argued that the need for strict classification can cause difficulties for non-quantitative fields of study, such as the humanities. Conversely, for simple data structures which require little probing, the relational database can serve as a well-tested and reliable model.
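To make the idea of keys and joins a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (authors and letters), their columns and the rows are all hypothetical, invented purely for illustration; they are not drawn from any real dataset.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical, related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (
            author_id INTEGER PRIMARY KEY,   -- primary key
            name      TEXT NOT NULL
        );
        CREATE TABLE letters (
            letter_id    INTEGER PRIMARY KEY,
            author_id    INTEGER NOT NULL REFERENCES authors(author_id),  -- foreign key
            year_written INTEGER NOT NULL
        );
        INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
        INSERT INTO letters VALUES (10, 1, 1916), (11, 1, 1917), (12, 2, 1920);
        """
    )
    return conn


def letters_with_authors(conn):
    """Join the two tables on the shared key to recover the relationship."""
    rows = conn.execute(
        """
        SELECT authors.name, letters.year_written
        FROM letters
        JOIN authors ON authors.author_id = letters.author_id
        ORDER BY letters.letter_id
        """
    )
    return rows.fetchall()


conn = build_demo_db()
print(letters_with_authors(conn))
```

Each author's name is stored once in the authors table, and the letters table only holds the key. That is the normalisation trade-off in miniature: less duplication, but a join is needed every time the two are read together.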

Graph Databases:

As stated above, the relational database is tried and tested, and has proven both successful and useful for certain types of data structures; arguably it could be utilised for others if incorporated properly. However, it is possible that the success of relational databases rests on rocky foundations. Without real competition, the strengths of RDBs have gone largely unchallenged. Graph databases present a new challenge to this longstanding hegemony, and now that a viable alternative is on the market, a decision must be made.

It could be argued that graph databases simply build on the relational model. Both kinds of database rely on their ability to connect or “join” data in response to a query. However, in the case of graph databases, tables and keys have been replaced by a system of nodes and edges. Essentially, the nodes represent classes or entities. These entities are “joined” by relationship records, or edges, which can be defined by type, direction or other additional attributes. Therefore, when performing the graph equivalent of a “joining” operation, a graph database uses this list of edges, or “predicates”, to find the connected nodes. The benefit of this nodes and edges system is the intuitive way in which it allows you to store and manipulate data. In graph databases your data is more flexible: while it should still be kept compact and normalised to a degree, the verb-defined connections allow for greater adaptability. This is particularly relevant for humanities databases, as it allows expansion beyond the confines of the RDB when handling difficult data.
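The nodes and edges idea can be sketched in a few lines of plain Python. This is only an illustration of the data structure, assuming a hypothetical class called TinyGraph and invented node names; it is not how any particular graph database is implemented.

```python
class TinyGraph:
    """A hypothetical, minimal property graph: nodes plus typed, directed edges."""

    def __init__(self):
        self.nodes = {}      # node id -> dict of properties
        self.out_edges = {}  # source node id -> list of (edge_type, target node id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, edge_type, target):
        # An edge is the "verb" linking two existing nodes, e.g. WROTE.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.out_edges.setdefault(source, []).append((edge_type, target))

    def neighbours(self, source, edge_type):
        # The graph equivalent of a join: follow the stored edges directly.
        return [
            target
            for (kind, target) in self.out_edges.get(source, [])
            if kind == edge_type
        ]


graph = TinyGraph()
graph.add_node("author_a", kind="Person")
graph.add_node("letter_10", kind="Letter", year_written=1916)
graph.add_edge("author_a", "WROTE", "letter_10")
print(graph.neighbours("author_a", "WROTE"))
```

Because each node keeps its own list of outgoing edges, finding connected nodes means reading that list rather than matching keys across whole tables. New edge types can also be added at any time without redesigning a schema, which is the flexibility described above.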

Conclusion:

In conclusion, if your data is relatively neat and quantitative, there is no need to be pressured into incorporating it into a graph database. In these scenarios RDBs have been proven as an effective model for uncomplicated data storage. However, if your data is becoming more complex and requires more detail, it may be worth moving to the newer graph database. Not only will this database allow for greater flexibility with your data, but it will also reduce the strain of “joining” operations thanks to its subject, predicate and object basis.

Data Visualisations: Knowledge or not?



As a visual learner, I can see the immediate appeal of data visualisations. Large quantities of information can be represented in succinct forms, highlighted by colour and emphasised through shapes. This kind of display is engaging and provides a nice alternative to what can sometimes be a monotonous world of static text. However, as with any source, these visualisations must be subjected to scrutiny. Without delving too far into the deep realm of epistemology, it is important to assess what transforms data into knowledge and where these visualisations fall on the data-knowledge triangle.

I am not an epistemologist, nor am I an expert in data analysis, but in the study of history we are trained to see that, while we may never know the past with absolute certainty, we can employ strategies to obtain as much knowledge as is within our power. My aim with this post is to highlight how humanities research techniques can be applied to data visualisations, so that we can estimate the place of these visual products both on the triangle and within research.

To truly understand the risks in using data visualisations we need to look at the process behind their creation. The first step is data collation. During a recent lecture Dr Vinayak Das Gupta posited to our class that data was fact regardless of our state of knowing. Building on this, if we can accept that data is fact, and therefore non-negotiable, our emphasis must shift to the researcher who gathers this data. For any form of data collation, the researcher must set parameters by which to define their data. These parameters decide which data is included and which is left behind. From the humanities perspective, this selection process is equivalent to choosing primary sources. A good researcher will aim to gather large quantities of data using clear and unbiased parameters, in the same way that historians aim to use a wide variety of primary source material. Inherent in both disciplines is the possibility of biased selection. Therefore, when using data visualisations, as with secondary texts, it is essential to interrogate how the data was collected and which parameters were applied.

Following on from data source selection is contextual support. In the same way that primary source material must be placed within its historical background, so too must data visualisations be placed in context. In his TED talk on the subject, David McCandless shows how visualisations without context can be dangerous and misleading (McCandless). McCandless begins with a visualisation comparing American military spending to that of other countries. As expected, the visualisation returns a large red sector for America which dominates the screen. However, McCandless follows this visualisation with another which places American military spending within the context of American GDP. The new context reveals a different side to the data and alters our view of military spending. Therefore, context is integral to understanding whether these visualisations are providing us with knowledge or a skewed reflection of manipulated data sets.
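The effect of context can be sketched with a little arithmetic. The figures below are entirely hypothetical placeholders in arbitrary units, not real spending or GDP data and not the numbers from the talk; they only show how a ranking can change once each value is divided by a relevant baseline.

```python
# Hypothetical figures in arbitrary units; NOT real spending or GDP data.
countries = {
    "Country A": {"spending": 600, "gdp": 15000},
    "Country B": {"spending": 50, "gdp": 500},
    "Country C": {"spending": 60, "gdp": 2000},
}


def rank_by_absolute(data):
    """Order countries by raw spending, largest first."""
    return sorted(data, key=lambda name: data[name]["spending"], reverse=True)


def rank_by_share_of_gdp(data):
    """Order countries by spending as a fraction of their own GDP, largest first."""
    return sorted(
        data,
        key=lambda name: data[name]["spending"] / data[name]["gdp"],
        reverse=True,
    )


print(rank_by_absolute(countries))
print(rank_by_share_of_gdp(countries))
```

With these made-up numbers, Country A leads on raw spending, while Country B leads once spending is expressed as a share of GDP. Neither ordering is false; each answers a different question, which is why the chosen context needs to be stated alongside the visualisation.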

Data visualisations are therefore very similar to a historian’s traditional sources. However, they come with added benefits. Thanks to data processing, researchers can assess quantities of data which would overwhelm, if not lie beyond the reach of, the traditional researcher. In addition to bringing a visually pleasing product to the reader, visualisations can accomplish tasks that would be near impossible without the technology, and they allow us as researchers to see trends and patterns that we might otherwise never have noticed.

So where does this leave visualisations on the data-knowledge triangle? Despite their benefits, there is no doubt that, as with any other source, visualisations need to be subjected to scrutiny and placed within context to be of true value. They are not, in themselves, enough to constitute knowledge, and we cannot automatically assume that they are justified even though the data may be true. As a consequence, visualisations lie firmly in the information category. With added context and justification, they may assist us in our pursuit of knowledge.

Further Reading:

‘Recorded Crime Offences by Type of Offence and Quarter’, cso.ie. Date accessed: 15 February 2017.

McCandless, David, ‘The Beauty of Data Visualisations’, TED Talks. https://www.youtube.com/watch?v=5Zg-C8AAIGg. Date accessed: 15 February 2017.