Tag Archives: Statistics

Comparing data modelling techniques

As discussed in previous posts, data refers to a collection of information. Whatever the purpose of the collection, gaining insight from it requires an adequate way of displaying and comparing the data. Some thought needs to go into how a system for modelling data is designed, since this is the first step in database design and object-oriented programming. Data modelling is generally understood as having three stages of design: Conceptual, Logical and Physical (“Data Modeling – Conceptual, Logical, And Physical Data Models”). Complexity increases with each stage. It should be highlighted that the structure containing the data is often purpose-built. “The biggest challenge is correctly capturing the requirements on the data model. Often when the project starts, there are only vague requirements (if requirements at all), and the data model must represent these requirements completely and precisely. Therefore it is a very challenging task to go from ambiguity or vagueness to precision.” (Hoberman)

Data modelling assumes the following in its design:
i) There can be numerous links between different pieces of data.
ii) Categorization, separation and encapsulation of data are necessary for searchability, and a well-built ontology allows you to get the most out of your data.
iii) Unique keys are used to identify pieces of information and act as access points linking data.

The Conceptual model highlights how the different bits of data relate to one another, specifying Entity Names and Relationships. The Logical model is more specific and detailed, adding Attributes, Foreign Keys and Primary Keys. The Physical model must be implementable and applicable to the database of choice, specifying column names and data types, table names, and Foreign and Primary Keys. (“Data Modeling – Conceptual, Logical, And Physical Data Models”)
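To make the three levels a little more concrete, here is a minimal sketch of how they might line up for two entities. The entity, table and column names (Nationality, Respondent and so on) are my own hypothetical examples, not taken from either source, and I have used Python's built-in sqlite3 module instead of MySQL only so that it runs without a database server.

```python
import sqlite3

# Hypothetical entities for illustration only: Nationality and Respondent.
# Conceptual level: a Respondent "has" one Nationality (entity names + relationship).
# Logical level: each entity gets attributes and a primary key, and Respondent
#                carries a foreign key pointing at Nationality.
# Physical level: the DDL below fixes table names, column names and data types.
SCHEMA = """
CREATE TABLE nationality (
    nationality_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE respondent (
    respondent_id  INTEGER PRIMARY KEY,
    age            INTEGER NOT NULL,
    nationality_id INTEGER NOT NULL REFERENCES nationality(nationality_id)
);
"""


def build_database():
    """Create an in-memory database with foreign key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_sample_rows(conn):
    """Insert one made-up nationality and one made-up respondent."""
    conn.execute("INSERT INTO nationality (nationality_id, name) VALUES (1, 'Irish')")
    conn.execute(
        "INSERT INTO respondent (respondent_id, age, nationality_id) VALUES (1, 30, 1)"
    )
    conn.commit()


def respondents_with_nationality(conn):
    """Follow the foreign key to join each respondent to its nationality name."""
    rows = conn.execute(
        "SELECT r.respondent_id, r.age, n.name "
        "FROM respondent AS r "
        "JOIN nationality AS n ON n.nationality_id = r.nationality_id "
        "ORDER BY r.respondent_id"
    )
    return rows.fetchall()


conn = build_database()
add_sample_rows(conn)
print(respondents_with_nationality(conn))
```

With foreign key checks on, trying to insert a respondent whose nationality_id has no matching row in nationality should be rejected, which is the "keys as access points linking data" idea in practice.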

Within the Conceptual, Logical and Physical schemas there are numerous ways of modelling data, and the design can vary depending on whether the data is being used for comparison or for tracking correlations. Hoberman reminds us that the methods of building and modelling this data can vary too: “In some efforts, the database design is completed, and then the logical and conceptual are built for documentation and support purposes.” While familiarity with one’s data set is needed for interpretation, particular techniques of displaying data suit particular purposes. Personally, I find visual data modelling techniques much easier to work with, particularly when comparing data. “The underlying benefit of creating a data model is that the data actually becomes understandable, as others can read it and learn about it.” (Hoberman)

Data modelling techniques which we should be familiar with include the following.

Spreadsheets, for example, can be used to model data. Depending on the purpose this can be adequate, as information can be grouped in rows and columns. Steve Hoberman gives the example of financial business experts using spreadsheets as data notation. However, he also highlights the importance of definitions when modelling data, as every data set needs to be treated differently. The key understanding here is the relation between different types of data.

Visual representation of data can be very useful, especially when looking for comparisons and correlations. Diagrams are very useful when designing the structure that will hold your data, setting out its links and layout. Query languages for databases are important too, and data can have ontologies assigned using W3C standards like RDF. There are numerous software programs that can be useful for data modelling, from spreadsheets to diagram-drawing software for explaining the concept, but the data itself must be held on database platforms designed to support data models, like MySQL, which we’ve used in class.

Works Cited

Hoberman, Steve. “Data modeling techniques explained: How to get the most from your data”. Date of Access: 11 May 2017.

http://searchdatamanagement.techtarget.com/feature/Data-modeling-techniques-explained-How-to-get-the-most-from-your-data

“Data Modeling – Conceptual, Logical, And Physical Data Models” from Data Warehousing. Online. Date of Access: 11 May 2017.

http://www.1keydata.com/datawarehousing/data-modeling-levels.html

The Relationship between Knowledge and Data

Knowledge is quite a difficult term to define; it relates to an accumulation of information and understanding of a topic gained over time. Data is a collection of information assumed to be true for the purposes of analysis and deduction. If knowledge is cumulative, and data is a collection of information used for reasoning, then using data is part of the process of obtaining informed knowledge. Though many associate the word “data” with computers, it can be in either analogue or digital form. Digital data has numerous advantages, including easy conversion and manipulation of elements for comparison and display, as with statistics, a popularly cited and displayed form of data.

Census data available at cso.ie can be used to explore the dynamics and diversity of the Irish population. However, it is important to understand the limitations of the conclusions that you draw. To build knowledge we need to understand the process of data gathering, social institutions, history and other factors to obtain true knowledge of the subject at hand through analysis and interpretation.

Data can be treated as a tool to develop or form an understanding, to support a theory or to explain relationships between variables. Critical analysis is needed to interpret data and to understand its scope and limits in order to obtain knowledge with a reasonable degree of accuracy. Measuring society is difficult, as not everyone fits into a checkbox on a census form, and assumptions are made in the collection of data to provide a statistic for calculations and comparison. Categories like “religion”, “ethnicity” and “nationality” are self-defined and converted to numerical values, and these are often used to support claims and hypotheses. Looking at this CSO information, can we tell much about the makeup of Irish society? What about secularism?

The Journal, referring to the CSO’s 2011 census: “THE PERCENTAGE OF Catholics in Ireland is at its lowest ever, while the actual number of Catholics is at its highest level since records began.” (http://www.thejournal.ie/regious-statistics-census-2011-640180-Oct2012/) Though the population is growing, the percentage of Roman Catholics in relation to the population is not. Referring to the data on the right, a simple calculation from the CSO information reveals that “3.86 million people classed themselves as being Catholic – 84.2 per cent of the population. Of this amount, 92 per cent were Irish nationals.” (http://www.thejournal.ie/regious-statistics-census-2011-640180-Oct2012/) According to the data set available from the CSO, 77.464% of Irish nationals are Catholic, the majority of whom are considered ethnically “white Irish” as well as being Irish nationals. Nationality is usually (but not always) determined by citizenship.
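The three numbers in the quotation from The Journal can be checked against each other with a little arithmetic. This is only a rough sketch: it uses nothing but the rounded figures quoted above, and the variable names are my own.

```python
# Figures exactly as quoted from The Journal (rounded in the article).
catholics = 3_860_000        # "3.86 million people classed themselves as being Catholic"
catholic_share = 0.842       # "84.2 per cent of the population"
irish_national_share = 0.92  # "Of this amount, 92 per cent were Irish nationals"

# If 3.86 million people are 84.2% of the population, the whole population is
# the count divided by the share.
implied_population = catholics / catholic_share

# 92% of the Catholic count gives the number of Catholics who are Irish nationals.
catholic_irish_nationals = catholics * irish_national_share

print(f"Implied total population: {implied_population:,.0f}")
print(f"Catholics who are Irish nationals: {catholic_irish_nationals:,.0f}")
```

On these rounded inputs the implied population comes out at roughly 4.58 million, and roughly 3.55 million Catholics are Irish nationals. Because the inputs are rounded, the results are estimates, not exact census counts.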

Data modelling allows us to selectively limit fields and compare data. The broad distinction between “Irish” and “Non-Irish” refers to nationality, not ethnicity, and it’s much easier to become an Irish citizen now that Ireland is in the EU. Cultural background in Ireland may be diverse, including a large student population and increasing numbers of visitors, tourists, workers and immigrants, but not all are Irish nationals by definition.

Limiting the information displayed to “Usually resident and present in the state” and “Ethnic background” in the 2011 census indicates a predominantly “white Irish” majority, followed closely by other white backgrounds. Limiting the data further, this appears to hold across all age groups.

Information needs to be looked at in proportion to understand the forces shaping the ethnic and cultural makeup of society. Furthermore, we need to know what exactly has been measured, i.e. what is represented in the data. Catholicism has had a massive influence on Ireland in the past, so there may be factors affecting this figure other than what I would take religion to be: practicing spiritual teachings, which isn’t measured by the CSO.

We can also look at relative proportions. Catholicism broken down by age indicates that most Catholics are over 16. However, the connection between the Catholic Church and the national school system would indicate a possible influence on the under-14 age category. Changing the display of the data shows that children about to attend or currently attending national school make up a large chunk of the ethnically “white Irish” population; the table here shows the proportion.

The population usually resident and present in the state doesn’t have the same ratio of age dispersal. This may indicate a social factor causing people to indicate Catholicism on the census. Comparing and displaying statistics can be used to come up with hypotheses, but it does not constitute proof. While you can make a strong case by displaying statistics, this can’t be taken as evidence. Correlating statistics do not necessarily indicate that the variables in question are related, as there can be unknown (or unmentioned) variables, and furthermore we need to understand how exactly attributes are measured.

Modelling population statistics can highlight correlations and comparisons within data: illustrating points, demonstrating knowledge, and looking for relationships between variables. But we have to understand how it relates to what we are trying to find out if we are using it for reasoning and deduction! Data isn’t always adequate for what you are trying to find out. In the process of analysing and interpreting, there may be a need to obtain external information to build knowledge on the subject: an understanding of the relationships between variables, which may exist outside of the data.

Our understanding of Catholicism as a religion seems to be changing when compared with church attendance statistics. Though it’s associated with the ethnic group “white Irish”, in 2009 weekly church attendance was at less than 50% (https://faithsurvey.co.uk/irish-census.html). Over time, the general trend seems to be towards a decline in practicing Catholics. If more people identify as Catholic but fewer practice the religion, the category may be better understood as a cultural group, perhaps indicative of a more secular attitude.

Table from faithsurvey.co.uk

Credibility of data is another issue: where is it coming from, and can we trust newspaper statistics? The CSO would generally be a reliable source of information, but official records can have biases, especially in countries that refuse to recognize ethnic groups. However, in terms of practicality you are limited to what’s available.

If we were trying to use the official CSO statistics to work out the strength of religious followings in Ireland, would it be accurate? Or how about looking at minorities? Adequacy of data should be assessed in the process of obtaining knowledge. Data is just information, and we decide how to treat it: know its limitations, the assumptions being made and the degree of accuracy that’s needed. Data modelling can be used and abused. Information is frequently misrepresented, both accidentally and on purpose. Even if data is factual, the conclusions aren’t always correct, and statistics don’t always measure what they appear to.

Statistical data is useful for understanding trends in behaviors and dynamics of social structures over time. To further interrogate the CSO 2012 data about the makeup and diversity of Irish society over time, some background information is needed. Pages 55-56 of a CSO publication from 2000, That was then, This is now: Change in Ireland, 1949-1999, a publication to mark the 50th anniversary of the Central Statistics Office, indicate a trend of increasing religious diversity in Ireland, and potentially the need to identify new categories and gather more data in order to understand the makeup of the Irish population. When we compare the data from 1991 to 2011 to see trends in population dynamics, focusing on religion and nationality, the identified trend seems quite accurate.

Church of Ireland: 93,056 + 30,464, totaling 123,520. Presbyterian: 14,348 + 8,311, totaling 22,659.

Other stated religions have nearly doubled: 34,867 + 40,227, totaling 75,094. As of 2011, CSO records indicate the Hindu population ordinarily resident in the state as 10,688, meaning the population is more than 11 times larger over the 20 years since the 1991 survey, making up over 12% of the “Other stated Religion” category.

Muslim: 18,223 + 29,143, totaling 47,366, a dozen times higher than in the 1991 survey.

The number of people who leave their religion unstated is 29,888 + 12,925, totaling 42,813. This has nearly halved since 1991. However, those who indicate that they have “No Religion” number 172,180 + 82,194, totaling 254,374, which is more than five times the figure indicated in 1991, indicating an overall increase in secularism in Irish society.
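The totals quoted above are simple sums, so they are easy to re-check. The sketch below just re-adds each pair of figures as given in this post and works out the Hindu share of the “Other stated religions” total. The dictionary layout and names are my own, and I am not assuming anything about what the two components of each pair represent.

```python
# Each entry holds the two component figures quoted in this post for 2011.
figures = {
    "Church of Ireland": (93_056, 30_464),
    "Presbyterian": (14_348, 8_311),
    "Other stated religions": (34_867, 40_227),
    "Muslim": (18_223, 29_143),
    "Not stated": (29_888, 12_925),
    "No religion": (172_180, 82_194),
}

# Add the two components of every category to get its total.
totals = {name: first + second for name, (first, second) in figures.items()}

# Hindu population ordinarily resident in the state in 2011, as quoted above,
# expressed as a fraction of the "Other stated religions" total.
hindu_2011 = 10_688
hindu_share_of_other = hindu_2011 / totals["Other stated religions"]

for name, total in totals.items():
    print(f"{name}: {total:,}")
print(f"Hindu share of other stated religions: {hindu_share_of_other:.1%}")
```

Run as written, the sums match the totals in the text, and the Hindu share works out at roughly 14%, which is consistent with the “over 12%” stated above.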

From the evidence presented, it seems that there is increasing cultural and religious diversity in modern Ireland. This is due to a variety of factors and may necessitate a widening of the parameters of gathered data. The importance of understanding the form of data cannot be overstated: its categorizations and what exactly the information is indicating, especially if it has been converted to numerical data. The true nature of individual cases is not captured in this statistical data set, as the potential answers are predetermined, which means that unidentified categories will be left out or overlooked.

Conclusion:

While data can be used to come up with or support theories, there is the capacity for data to be misleading too. In isolation, data like statistics can appear to show something, but this may be negated by other information, such as the revealing of a (previously) unknown variable. You are essentially building knowledge based on assumptions, using data as part of the process, but it allows you to build an informed opinion. With greater familiarity around the topic at hand, i.e. with more study, comparison and analysis, data can also be very revealing.

On its own, data is just a collection of information and does not directly convert to knowledge, but you can use data to further your understanding and to make estimates based on assumptions. To build knowledge we need data, which is useful for making comparisons and for inferring, illustrating, building and imparting knowledge too! While there are certain considerations worth bearing in mind, simply put: this is the best that we have for the purposes of generalizing population information in Ireland today. Data modelling can be used for displaying, illustrating and imparting knowledge. There is a process of evaluating and deciphering the collection of information that you are working with. Even if the information is true, it is based on certain assumptions that can impact how we interpret and analyze information and draw conclusions.

Works Cited

That was then, this is now. 2000. CSO. Web.<http://www.cso.ie/en/media/csoie/releasespublications/documents/otherreleases/thatwasthenthisisnow.pdf>

“Number of Catholics at record high, despite lowest percentage ever– CSO.” The Journal. Oct. 2012. Web. <http://www.thejournal.ie/regious-statistics-census-2011-640180-Oct2012/>

Faith Survey. <https://faithsurvey.co.uk/irish-census.html> Web.