Historical research that utilises Digital Humanities methodologies often involves constructing databases to help structure the data and information being gathered. This raises difficulties about how the data should be structured and what type of database should be constructed. In a 2013 conference paper, Luke Kirwan succinctly summed up the problems in a series of questions and observations:
“The application of digital humanities to the study of history is considered a new methodological approach, but it remains rather ad-hoc. How should a database be structured? How should ‘nouns’ be encoded? How do we tie ancient, sometimes no longer extant regions, into a modern GIS system?” (Kirwan 2013)
Kirwan’s paper on ‘Databases for quantitative history’ raises a number of questions about the feasibility of implementing databases for historical research, and in particular the use of relational databases for quantitative historical data.
The first question that needs to be addressed before any comparative analysis can be implemented is that of definitions. The terms ‘model’ and ‘data’ are utilised indiscriminately, with very little definition of what each term means in the context of different modelling techniques. Nelson Goodman, in Languages of Art (1976), wrote a prescient critique of the term ‘model’ that is still relevant to our current context. Goodman’s critique is outlined in detail in Michael Gavin’s article ‘Agent-based modeling and historical simulation’.
“Few terms are used in popular and scientific discourse more promiscuously than ‘model.’” (Gavin 2014)
Goodman’s critique concerned the use of ‘model’ in popular and scientific discourse. In recent decades, the term has become ubiquitous across the social sciences and the humanities. This has led humanities scholars to embrace terms such as ‘data’, which must be ‘modelled’ to produce results. This is part of a broader cultural shift within elements of humanities research towards borrowing the rhetorical phraseology of the scientific method. The shift is particularly evident in the digital humanities, which is in many ways the most appropriate venue for this hybrid discourse, as digital humanities is in and of itself a hybrid between the humanities and computational techniques.
In the module AFF604A, two distinct types of data modelling were outlined: relational databases and graph databases. Although the two techniques are not mutually exclusive and can be implemented in tandem on a project that requires a bifurcated approach, it is more common for them to be applied separately. The 1641 Depositions project, for example, utilised elements of both graph database design and relational databases. However, from exploring the website it would appear that relational databases underpin the significant elements of the project (Conlan and Lawless). In addition, it is unclear to what degree the project followed data modelling standards such as E.F. Codd’s concept of data normalisation.
Another example is the Irish Army Census 1922; this project is a database that is neither a relational database nor a graph database but attempts to utilise each in its design. The result is a largely unusable dataset that is held in stasis due to the ad-hoc nature of database implementation outlined by Kirwan. The main search index is designed around a simplistic relational database structure; however, it is unclear from examining the website to what degree the data has been normalised.
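The distinction between a normalised relational model and a graph model can be made concrete with a small sketch. The example below is not drawn from either project discussed above; the table names, the `LIVED_IN` relationship and the records (‘Person A’, ‘Place X’) are hypothetical and invented purely for illustration. It stores each person and place once, links them through a separate table in the spirit of Codd’s normalisation, and then re-expresses the same facts as a list of graph edges.

```python
import sqlite3

# Hypothetical schema and records, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE place  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Each fact about residence is stored once and refers to the other
-- tables by key, so no name is ever repeated (normalisation).
CREATE TABLE residence (
    person_id INTEGER NOT NULL REFERENCES person(id),
    place_id  INTEGER NOT NULL REFERENCES place(id),
    year      INTEGER NOT NULL,
    PRIMARY KEY (person_id, place_id, year)
);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Person A"), (2, "Person B")])
conn.executemany("INSERT INTO place VALUES (?, ?)",
                 [(1, "Place X"), (2, "Place Y")])
conn.executemany("INSERT INTO residence VALUES (?, ?, ?)",
                 [(1, 1, 1922), (2, 1, 1922), (2, 2, 1923)])

# Relational view: the facts are reassembled with joins.
rows = conn.execute("""
SELECT person.name, place.name, residence.year
FROM residence
JOIN person ON person.id = residence.person_id
JOIN place  ON place.id  = residence.place_id
ORDER BY person.name, residence.year
""").fetchall()

# Graph view: the same facts as (source, label, target, properties) edges.
edges = [(f"person:{person}", "LIVED_IN", f"place:{place}", {"year": year})
         for person, place, year in rows]


def neighbours(node):
    """Return the nodes that share an edge with `node`, in either direction."""
    linked = []
    for source, _label, target, _props in edges:
        if source == node:
            linked.append(target)
        elif target == node:
            linked.append(source)
    return linked


print(rows)
print(neighbours("place:Place X"))
```

In the relational form, a question such as ‘who lived in Place X?’ is answered by joining tables; in the graph form it is answered by following edges from a node. Which is preferable depends on whether a project’s research questions are mainly about tabulation or about connections.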
Although digital humanities is utilising and opening up discussion about different ways of modelling data, much more research into different computational techniques and the use of differing models is required before they can be implemented in the field of historical research, and in particular in the design of historical databases.
Gavin, M, ‘Agent-based modeling and historical simulation’ in Digital Humanities Quarterly, viii, no. 4 (2014).
Kirwan, Luke, “Databases for quantitative history” Proceedings of the Third Conference on Digital Humanities in Luxembourg with a Special Focus on Reading Historical Sources in the Digital Age, Luxembourg, December 5-6 2013.
“Knowledge is what we know. Data is fact, it exists irrespective of our state of knowing.”
(V. Das Gupta, 2017)
The above definition was posited by Dr. Das Gupta as part of a recent lecture. The quotation is evocative of the Donald Rumsfeld quote about ‘known knowns’ (The Atlantic), which is itself a useful way of encapsulating how certain government agencies gather and stratify information. One potential manner of connecting data and knowledge is through context and visualisation. If we can visualise data within a specific context that is related to our pre-existing knowledge of the data, can this lead to a further contextualisation? Or is data visualisation inherently a series of choices and decisions made by the author, resulting in differing interpretations and divergent strands of ‘knowledge’?
Michael Friendly’s chapter ‘A brief history of data visualisation’ (Friendly, 2008) provides a concise overview of data visualisation and its uses throughout history. Friendly attempts to chart the development of data visualisation through the ‘Milestone Project’ (Friendly 2). Friendly’s conclusions are relevant to this question of data visualisation as a tool to help contextualise and expand knowledge about a given set of facts.
“From this history one may also see that most of the innovations in data visualisation arose from concrete, often practical goals: the need or desire to see phenomena and relationships in new or different ways. It is also clear that the development of graphic methods depended fundamentally on parallel advances in technology, data collection and statistical theory.”
From Friendly’s conclusions, it is clear that innovation in data visualisation was driven by goals dictated in some cases by private industry and governmental agencies. In order to present a test case of data visualisation adding context to a given set of facts, some tabulated statistics from the Irish Census 2011 will be utilised in a series of data visualisations. I have taken data originally collated by the Central Statistics Office (CSO) that has been packaged so that it can be projected into a series of thematic maps in the programming language R. This Irish Census data has been provided courtesy of Prof. Chris Brunsdon and the National Centre for Geocomputation at Maynooth University (data available on request). I created a series of maps in order to illustrate different ways that data visualisation can be utilised. To start with, I visualise the population data released by the CSO and collated from the Irish Census 2011. The total population of Ireland in 2011 as recorded on the census was 4.5 million persons (CSO). Projecting these results onto a map of Ireland looks like this:
This visualisation presents the population of Ireland in 2011 per county and uses a colour coding system to indicate high and low population counties. However, the results only provide a superficial indication of the population disparity within Ireland. Over 1 in 4 citizens of Ireland live in Dublin City and the greater Dublin area (CSO). On this map, however, that fact is largely obscured because each county is drawn according to its physical size. To compensate, I utilised a cartogram or density projection map; this map sizes each area according to the quantity being measured instead of following the geographic boundaries.
In this map, the city and suburbs of Dublin are so large that the geographic boundaries of Ireland are stretched and distended. While initially complex to examine, this type of data visualisation is a more accurate way of describing population density. One critique of cartograms, in addition to the geographic distortion, is the majoritarian representation of the country. In the example of Ireland, the population of Dublin results in a Dublin-centric map. Although it is not clearly illustrated on a cartogram, the majority of Irish people do not live in Dublin City or its environs. However, the fact that over 1.2 million people do live in the greater Dublin area results in a distortion. As democratic systems attempt to accommodate the minority position within the wider society, using cartograms to make policy decisions would greatly skew the benefits of the society towards the largest areas while severely neglecting the non-Dublin regions of the country. In the case of Ireland, this phenomenon is evident in recent social surveys, economic data and a recent report from the Irish Department of Housing and the Economic and Social Research Institute (Irish Times).
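The distortion described above follows from the basic principle of a cartogram: each region is resized so that its area is proportional to the measured quantity rather than to its land area. The sketch below illustrates only that principle, not the algorithm used to produce the maps in this piece, and it does not draw anything. The function name `cartogram_scale_factors` and the two regions with their figures are hypothetical, chosen to make the arithmetic easy to follow rather than taken from the census.

```python
def cartogram_scale_factors(regions):
    """Return, for each region, the factor by which its area must change
    so that area becomes proportional to population.

    `regions` maps a region name to a (land_area, population) pair.
    The total area of the map is kept the same.
    """
    total_area = sum(area for area, _pop in regions.values())
    total_pop = sum(pop for _area, pop in regions.values())
    factors = {}
    for name, (area, pop) in regions.items():
        # Area the region would occupy if area were shared out by population.
        target_area = total_area * pop / total_pop
        factors[name] = target_area / area
    return factors


# Hypothetical figures, invented for illustration (not census data).
regions = {
    "Urban county": (1_000.0, 1_200_000),
    "Rural county": (9_000.0, 300_000),
}
factors = cartogram_scale_factors(regions)
for name, factor in factors.items():
    print(f"{name}: area multiplied by {factor:.2f}")
```

A factor above 1 means the region swells on the cartogram and a factor below 1 means it shrinks, which is why a small, densely populated area can come to dominate the map.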
From the above maps, it is clear that cartograms present the surveyed data (total population) in a more accurate manner, but the geographic boundaries of the map are affected by this. I then used a different metric from the Irish Census 2011, that of empty houses. This is a particularly relevant subject as the current homelessness crisis within Ireland has been blamed on a lack of action in dealing with empty houses during the period 2011 – 2016 (Irish Independent). The CSO created two specific types of data relating to empty houses: a percentage of vacant houses and a total of vacant houses. Starting with the percentage of vacant houses, I have created a map using the Irish National Grid, similar to the map of the total Irish population.
From the above map, the highest percentages of vacant houses are in the North West of the country, specifically Leitrim and Donegal. It is important to note that these counties have low total populations, in particular Leitrim, which has the smallest population of any county in the country. This can be visualised in the following map series:
While this comparative map illustrates the differences between the percentage of empty houses and the total population, it is a misleading comparison as one is a percentage and the other is a total of persons. A more accurate comparative analysis is that of the total number of vacant houses compared to the total population.
This presents a more balanced comparison and shows that although there are a large number of vacant houses in Donegal, Cork replaces Leitrim as the county with the highest number of vacant dwellings in absolute terms. A cartogram of vacant houses shows the following:
This series of maps shows that although Leitrim was cited repeatedly as the place with the largest number of vacant houses (Leitrim Observer), this was only true as a percentage of all the houses in the county. A more accurate measure of vacant houses shows that Dublin City contained a large number of vacant houses in 2011, which undercuts the argument that surplus houses in remote rural areas contributed to the housing shortage.
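The difference between ranking counties by vacancy percentage and ranking them by vacancy total can be shown with a short worked example. The two counties and all of the figures below are hypothetical and are not taken from the census; they are chosen only to show how a small county can lead on one measure while a large county leads on the other.

```python
# Hypothetical figures, invented for illustration (not census data).
counties = {
    "Small rural county": {"dwellings": 18_000, "vacant": 5_400},
    "Large urban county": {"dwellings": 500_000, "vacant": 40_000},
}


def vacancy_rate(record):
    """Vacant dwellings as a percentage of all dwellings in the county."""
    return 100 * record["vacant"] / record["dwellings"]


# Rank the same counties by each measure, highest first.
by_rate = sorted(counties, key=lambda name: vacancy_rate(counties[name]),
                 reverse=True)
by_total = sorted(counties, key=lambda name: counties[name]["vacant"],
                  reverse=True)

for name, record in counties.items():
    print(f"{name}: {vacancy_rate(record):.1f}% vacant, "
          f"{record['vacant']} vacant dwellings")
print("Highest by percentage:", by_rate[0])
print("Highest by total:", by_total[0])
```

Under these assumed figures the small county has the higher rate while the large county holds many times more vacant dwellings, which is the same pattern the maps reveal for Leitrim and the urban centres.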
In conclusion, the data visualisations created here show that this type of visualisation can add context and additional knowledge to a pre-existing set of facts, or perceived facts as reported by the media. However, there are marked differences between the two sets of data utilised. In the first set of examples, the total population of Ireland was mapped onto a series of geographical maps of Ireland. Although the cartogram of population is more accurate for depicting urban density, the majoritarian perspective of the country that it presents is misleading. This is due to the sheer imbalance in the distribution of the Irish population. In the second set of examples, examining vacant housing, the initial statistics on the percentage of vacant houses were utilised in media reporting to imply that the housing crisis had its origins in the number of houses built in sparsely populated rural counties such as Leitrim. However, using the more accurate total number of vacant houses reveals that although the percentage of vacant houses in small rural counties is disproportionately high, a significant number of vacant houses remained in urban centres such as Dublin and Cork. The cartogram that mapped the total number of vacant houses resulted in a more accurate depiction of the housing surplus across the Irish state. The question of whether data visualisation can add context to pre-existing knowledge can therefore be answered in the affirmative in the case of the vacant houses. The same question applied to the total population of the Irish state is more complex, as the cartogram is more accurate in projecting density but skews the overall perception of population distribution into a Dublin-centric map. Context has been added to the knowledge, but it requires further contextualisation or a different data visualisation.
‘Rumsfeld’s Knowns and Unknowns: The Intellectual History of a Quip’ The Atlantic Accessed February 17th 2017.
‘Census 2011 – Profile 1 Town and County’ cso.ie Accessed February 12th 2017.
Friendly, Michael. “A Brief History of Data Visualization.” Handbook of Data Visualization. Ed. Chun-houh Chen, Wolfgang Karl Härdle, Antony Unwin Springer Berlin Heidelberg, 2008. pp. 15–56.
‘A housing crisis while 165,000 houses lie empty’ Irish Independent Accessed February 12th 2017.
‘Dublin’s dominance of state near unique in Western world’ Irish Times Accessed February 12th 2017.
‘County has highest level of vacant houses’ Leitrim Observer Accessed February 12th 2017.