Modelling Humanities Data

The Relationship between Data and Knowledge

This paper reports on a task: to find data published by the Central Statistics Office (CSO), to examine that data, and to determine what conclusions, if any, could be drawn from it. In doing so, it looks at the relationship between data and knowledge.

The dataset

From the official CSO website (Central Statistics Office), I chose the following dataset: Recorded Crime Offences by Type of Offence and Quarter. The data refers to crime statistics for the period 2003 to 2016, by quarter.

Snapshot of a portion of the dataset on the CSO website


I selected the dataset, highlighted the data and pasted it into an Excel table. I decided to look in more detail at the crimes that related to death, and in particular at any trends discernible over the period the data covers. I therefore selected data for the following offences: homicide offences, infanticide, manslaughter and murder. I rejected all other data as irrelevant to my research. As I wasn’t interested, for this task, in variations per quarter, I created an additional column to calculate the yearly totals.
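The yearly-totals step was done in Excel, but the same calculation can be sketched in a few lines of Python. This is a minimal illustration only: the counts are invented placeholders, not CSO figures, and the quarter label format ("2003Q1") and the function name yearly_totals are assumptions made for the example.

```python
from collections import defaultdict

# Invented placeholder counts for illustration only, NOT real CSO figures.
# Each row is (offence, quarter label, recorded count); the "2003Q1"
# label format is an assumption about how the quarters are written.
rows = [
    ("Murder", "2003Q1", 10),
    ("Murder", "2003Q2", 12),
    ("Murder", "2003Q3", 8),
    ("Murder", "2003Q4", 11),
    ("Manslaughter", "2003Q1", 2),
    ("Manslaughter", "2003Q2", 1),
]


def yearly_totals(rows):
    """Sum quarterly counts into one total per (offence, year)."""
    totals = defaultdict(int)
    for offence, quarter, count in rows:
        year = int(quarter[:4])  # first four characters hold the year
        totals[(offence, year)] += count
    return dict(totals)


totals = yearly_totals(rows)
for (offence, year), total in sorted(totals.items()):
    print(offence, year, total)
```

Summing by year discards the quarterly variation, which is acceptable here only because that variation is not the subject of the task.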

Visualisation as a means of interpreting the dataset

I then used free software (“Tableau”) to create a visualisation of the data, dragging the Excel worksheet into Tableau and selecting data for the x and y axes. Because the data spans 14 years, presenting it on a graph makes it quicker and easier to interpret, but the visualisation also has limitations.






Offences causing death between 2003 and 2016


Interpretation of the data

From the visualisation I can draw certain conclusions about the crimes in these categories that were brought to justice and recorded, but not necessarily about when the crimes took place. Nor can I make a judgement about why the crimes took place, or about how variations in policing resources affected the bringing of crimes to justice.

The graph shows an upward trend in offences causing death from the beginning of the period to 2006, after which there was an overall downward trend, other than a slight rise between 2011 and 2014. There were fewer criminal deaths in 2011 than there had been for several years, or would be again until 2015. The incidence of infanticide (the killing of a child by a parent) is consistently low, with just one case in 2007.

Although the trend for 2016 is downward, we cannot conclude that there were fewer deaths in 2016, because the data relates to the first three quarters of 2016 only. While the number of murders appears to have decreased from 2015, closer examination of the data shows that it had in fact increased compared with the same period in 2015.
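The like-for-like comparison described above can be sketched in Python. The counts below are invented placeholders chosen only to reproduce the shape of the problem (a partial year that looks lower than a full year); they are not CSO figures, and the function name same_period_totals is hypothetical.

```python
def same_period_totals(quarterly, year_a, year_b):
    """Total two years using only the quarters recorded for both."""
    shared = sorted(set(quarterly[year_a]) & set(quarterly[year_b]))
    total_a = sum(quarterly[year_a][q] for q in shared)
    total_b = sum(quarterly[year_b][q] for q in shared)
    return shared, total_a, total_b


# Invented placeholder counts, NOT real CSO figures.
# Keys are years; inner keys are quarter numbers. 2016 has no fourth quarter.
murders = {
    2015: {1: 5, 2: 6, 3: 7, 4: 12},
    2016: {1: 7, 2: 6, 3: 8},
}

full_2015 = sum(murders[2015].values())
partial_2016 = sum(murders[2016].values())
shared, same_2015, same_2016 = same_period_totals(murders, 2015, 2016)

print("Naive comparison:", full_2015, "vs", partial_2016)
print("Quarters", shared, "only:", same_2015, "vs", same_2016)
```

With these placeholder numbers the naive yearly totals suggest a fall, while the comparison restricted to the shared quarters shows a rise, which is why the 2016 point on the graph cannot be read at face value.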

Without specialist knowledge or further research, I am interpreting the specialist terminology on the basis of my non-specialist or presumed knowledge, and could be reinforcing my own preconceptions rather than reaching accurate conclusions. To interpret the data fully and accurately I also need to know something of the data collection conventions and terminology. For example, I do not know if the category ‘homicide offences’ includes the other three categories or is in addition to them. While I know that murder, infanticide and manslaughter are types of homicide offence, I would need to know more about the data collection conventions used to answer this.
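One partial check that needs no specialist knowledge is to compare the ‘homicide offences’ figure with the sum of the other three categories for each year. The sketch below assumes yearly totals are already available; the numbers are invented placeholders, not CSO figures, and the function name parent_minus_children is hypothetical.

```python
def parent_minus_children(counts, parent, children):
    """For each year, return the parent count minus the summed child counts."""
    return {
        year: by_offence[parent] - sum(by_offence[c] for c in children)
        for year, by_offence in counts.items()
    }


# Invented placeholder yearly totals, NOT real CSO figures.
counts = {
    2003: {"Homicide offences": 50, "Murder": 40, "Manslaughter": 9, "Infanticide": 1},
    2004: {"Homicide offences": 45, "Murder": 30, "Manslaughter": 5, "Infanticide": 0},
}

differences = parent_minus_children(
    counts, "Homicide offences", ["Murder", "Manslaughter", "Infanticide"]
)
print(differences)
```

A difference of zero in every year would suggest that the category is simply the total of the other three. A consistently positive difference would suggest that it also covers offences not selected here. Neither result settles the question without the CSO's own definitions.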

Without doing more research I cannot say whether all or some of the crimes took place in the given year, or whether the year refers to the year the crime came before the courts. If a murder took place in 2009 but was not tried until 2013, is it recorded in the year the crime took place or the year it was tried?

The accuracy of the knowledge I can gain is therefore influenced by my own prior knowledge and critical judgement in addition to the accuracy and consistency of the data.


From Data to Knowledge

The Oxford English Dictionary defines data as ‘facts and statistics’ or, philosophically, ‘things known or assumed as facts’. While I agree that data is what is used as the basis for analysis or reasoning, this doesn’t mean that the data itself is fact. Data is itself subject to selection, interpretation, error and bias, whether intentional or unintentional, in the process of collecting, compiling and presenting it.

The act of compiling data requires knowledge about the data and about the process in order to create ‘good’ data, that is, data that is consistent and reliable. As a citizen of a democratic country, I place a large degree of trust in the data compiled and made available by the CSO because it is an official state body with a professional track record in data collection and analysis. I also place trust in the legal and policing authorities from which the data is collected, and largely assume that the data is correct.

In making these assumptions I am using critical judgement, based on my own prior knowledge, sense of trust and cultural values.  We may interpret data from a different source in a much more sceptical way.

We also use prior knowledge to understand the data. For example, I know that both manslaughter and suicide are types of homicide, and that suicide is not a crime and is therefore not listed. In some countries suicide is a crime and would be included in official figures such as these.


In this example, I have the foundations on which to construct new knowledge: I believe that the data is accurate. I justify my belief because the data is from a reliable source, and I therefore consider it to be solid evidence. I use reasoning and critical judgement to interpret the evidence and reach certain conclusions, or knowledge, based on the data. I cannot, however, account for the trend itself. My view is that justified true belief is not necessarily knowledge (Gettier) but a step toward knowledge.



Central Statistics Office. Web. 8 Feb. 2017.

Gettier, Edmund. “Is Justified True Belief Knowledge?” Analysis 23 (1963): 121–123. Print.

“Tableau.” Web. 8 Feb. 2017.