A reflection on data and knowledge

“Knowledge is what we know. Data is fact, it exists irrespective of our state of knowing.”
(V. Das Gupta, 2017)

The above definition was posited by Dr. Das Gupta as part of a recent lecture. The quotation is evocative of Donald Rumsfeld’s remark about ‘known knowns’ (The Atlantic), which is itself a useful way of encapsulating how certain government agencies gather and stratify information. One potential way of connecting data and knowledge is through context and visualisation. If we can visualise data within a specific context that is related to our pre-existing knowledge of the data, can this lead to further contextualisation? Or is data visualisation inherently a series of choices and decisions made by the author, resulting in differing interpretations and divergent strands of ‘knowledge’?

Michael Friendly’s chapter ‘A brief history of data visualisation’ (Friendly, 2008) provides a concise overview of data visualisation and its uses throughout history. Friendly attempts to chart the development of data visualisation through the ‘Milestone Project’ (Friendly 2). Friendly’s conclusions are relevant to this question of data visualisation as a tool to help contextualise and expand knowledge about a given set of facts.

“From this history one may also see that most of the innovations in data visualisation arose from concrete, often practical goals: the need or desire to see phenomena and relationships in new or different ways. It is also clear that the development of graphic methods depended fundamentally on parallel advances in technology, data collection and statistical theory.”
(Friendly 30)

From Friendly’s conclusions, it is clear that innovation in data visualisation was driven by goals dictated in some cases by private industry and governmental agencies. To present a test case of data visualisation adding context to a given set of facts, I use some tabulated statistics from the Irish Census 2011 in a series of data visualisations. The data was originally collated by the Central Statistics Office (CSO) and has been packaged so that it can be projected into a series of thematic maps in the programming language R. This Irish Census data has been provided courtesy of Prof. Chris Brunsdon and the National Centre for Geocomputation at Maynooth University (data available on request). I created a series of maps to illustrate different ways that data visualisation can be utilised. To start with, I visualise the population data released by the CSO from the Irish Census 2011. The total population of Ireland in 2011 as recorded on the census was 4.5 million persons (CSO). Projecting these results onto a map of Ireland looks like this:

 

Total Population of Ireland (2011 Census)

This visualisation presents the population of Ireland in 2011 per county, and uses a colour coding system to indicate high and low population counties. However, the results provide only a superficial indication of the population disparity within Ireland. Over 1 in 4 citizens of Ireland live in Dublin City and the greater Dublin area (CSO), but on this map that fact is largely obscured because each county is drawn at its physical size. To compensate, I utilised a cartogram, or density projection map, which sizes each area according to the quantity being measured rather than following its geographic boundaries.

Cartogram of Ireland – Total Population (2011 Census)

 

In this map, the city and suburbs of Dublin are so large that the geographic boundaries of Ireland are stretched and distended. While initially complex to examine, this type of data visualisation is a more accurate way of describing population density. One critique of cartograms, in addition to the geographic distortion, is the majoritarian representation of the country. In the example of Ireland, the population of Dublin results in a Dublin-centric map. Although it is not clearly illustrated on a cartogram, the majority of Irish people do not live in Dublin City or its environs. However, the fact that over 1.2 million people do live in the greater Dublin area results in a distortion. As democratic systems attempt to accommodate the minority position within the wider society, using cartograms to make policy decisions would greatly skew the benefits of society towards the most populous areas while severely neglecting the non-Dublin regions of the country. In the case of Ireland, this phenomenon is evident in recent social surveys, economic data and a recent report from the Irish Department of Housing and the Economic and Social Research Institute (Irish Times).
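The stretching described above follows from how a cartogram resizes regions: each region is given a share of the map equal to its share of the measured quantity, rather than its share of the land. The following is a minimal sketch of that calculation in Python, separate from the R code used to produce the maps. The region names and all figures are invented for illustration and are not census values, and the function name is hypothetical. Real cartogram algorithms must also keep neighbouring regions joined together, which this sketch ignores.

```python
import math


def cartogram_scale_factors(areas, values):
    """Return, per region, the factor by which its drawn area must change
    so that its share of the map equals its share of the measured value."""
    total_area = sum(areas.values())
    total_value = sum(values.values())
    factors = {}
    for region, area in areas.items():
        area_share = area / total_area
        value_share = values[region] / total_value
        factors[region] = value_share / area_share
    return factors


# Hypothetical figures: land area in square kilometres and population.
areas = {"Urban": 1000.0, "Rural A": 4000.0, "Rural B": 5000.0}
population = {"Urban": 500_000, "Rural A": 300_000, "Rural B": 200_000}

factors = cartogram_scale_factors(areas, population)
for region, factor in factors.items():
    # The square root converts an area factor into a linear (edge-length) factor.
    print(f"{region}: area x{factor:.2f}, linear x{math.sqrt(factor):.2f}")
```

In this invented example the small, populous region is drawn at several times its geographic area while the two larger regions shrink, which is the same effect that makes Dublin dominate the population cartogram.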

From the above maps, it is clear that cartograms present the surveyed data (total population) more accurately, at the cost of distorting the geographic boundaries of the map. I then used a different metric from the Irish Census 2011, that of empty houses. This is a particularly relevant subject, as the current homelessness crisis within Ireland has been blamed on a lack of action in dealing with empty houses during the period 2011–2016 (Irish Independent). The CSO created two specific types of data relating to empty houses: a percentage of vacant houses and a total of vacant houses. Starting with the percentage of vacant houses, I have created a map using the Irish National Grid, similar to the map of the total Irish population.

Percentage of Vacant Houses in Ireland (2011 Census)

From the above map, the highest percentages of vacant houses are in the North West of the country, specifically Leitrim and Donegal. It is important to note that these counties have a low total population, in particular Leitrim, which has the smallest population of any county in the country. This can be visualised in the following map series:

Comparative map of % Vacant Houses and Total Population (2011 Census)

While this comparative map illustrates the differences between the percentage of empty houses and the total population, it is a misleading comparison, as one is a percentage and the other is a count of persons. A more accurate comparative analysis is that of the total number of vacant houses compared to the total population.

Comparative map of total vacant houses and total population (2011 Census)

This presents a more balanced comparison and shows that although there are a large number of vacant houses in Donegal, Cork replaces Leitrim as the county with the highest number of vacant dwellings in absolute terms. A cartogram of vacant houses shows the following:

 

Cartogram of total Irish Vacant Houses (2011 Census)

The results of this series of maps show that although Leitrim was cited repeatedly as the place with the largest number of vacant houses (Leitrim Observer), this was only true as a percentage of all the houses in the county. A more accurate measure of vacant houses shows that Dublin City contained a large number of vacant houses in 2011, undercutting the argument that surplus houses in remote rural areas contributed to the housing shortage.
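The difference between the two measures can be made concrete with a small calculation. The sketch below, written in Python rather than the R used for the maps, ranks three counties first by vacancy rate and then by the absolute number of vacant houses. The county labels and every figure are hypothetical and chosen only to show how the two rankings can reverse; they are not CSO data.

```python
def rank_by(records, key):
    """Return county names ordered from highest to lowest by the given key."""
    return [r["county"] for r in sorted(records, key=key, reverse=True)]


# Hypothetical figures, not census values.
records = [
    {"county": "Small rural", "dwellings": 10_000, "vacant": 3_000},
    {"county": "Mid-sized", "dwellings": 80_000, "vacant": 16_000},
    {"county": "Large urban", "dwellings": 400_000, "vacant": 40_000},
]

# Ranking by the share of dwellings that are vacant.
by_rate = rank_by(records, key=lambda r: r["vacant"] / r["dwellings"])

# Ranking by the absolute number of vacant dwellings.
by_count = rank_by(records, key=lambda r: r["vacant"])

print("Highest vacancy rate first: ", by_rate)
print("Most vacant dwellings first:", by_count)
```

With these invented numbers the small rural county tops the ranking by rate but comes last by count, which mirrors the contrast between the percentage map and the map of totals.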

In conclusion, the data visualisations created here show that this type of visualisation can give added context and additional knowledge to a pre-existing set of facts, or perceived facts as reported by the media. However, there are marked differences between the two sets of data utilised. In the first set of examples, the total population of Ireland was mapped onto a series of geographical maps of Ireland. Although the cartogram of population is more accurate for depicting urban density, the majoritarian perspective of the country that it presents is inaccurate. This is due to the sheer imbalance in the distribution of the Irish population.

In the second set of examples, examining vacant housing, the initial statistics on the percentage of vacant houses were utilised in media reporting to imply that the housing crisis had its origins in the number of houses built in sparsely populated rural counties such as Leitrim. However, using the more accurate measure of the number of vacant houses reveals that although the percentages of vacant houses in small rural counties are disproportionately high, a significant number of vacant houses remained in urban centres such as Dublin and Cork. The cartogram that mapped the total number of vacant houses resulted in a more accurate depiction of the housing surplus across the Irish state.

The question of whether data visualisation can add context to pre-existing knowledge can be answered in the affirmative in the case of the vacant houses. The same question applied to the total population of the Irish state is more complex, as the cartogram is more accurate in projecting density but skews the overall perception of population distribution into a Dublin-centric map. Context has been added to the knowledge, but it requires further contextualisation or a different data visualisation.

 

Works Cited

‘Rumsfeld’s Knowns and Unknowns: The Intellectual History of a Quip.’ The Atlantic. Accessed February 17th 2017.

‘Census 2011 – Profile 1 Town and County.’ cso.ie. Accessed February 12th 2017.

Friendly, Michael. “A Brief History of Data Visualization.” Handbook of Data Visualization. Ed. Chun-houh Chen, Wolfgang Karl Härdle, and Antony Unwin. Springer Berlin Heidelberg, 2008. pp. 15–56.

‘A housing crisis while 165,000 houses lie empty.’ Irish Independent. Accessed February 12th 2017.

‘Dublin’s dominance of state near unique in Western world.’ Irish Times. Accessed February 12th 2017.

‘County has highest level of vacant houses.’ Leitrim Observer. Accessed February 12th 2017.