Data Modeling: Comparing Techniques

In the previous post, about data and the ways they can be modelled, I attempted to visualise them with Gephi. This post discusses data modeling and the different techniques that contribute to it, and which of these techniques could work better for a humanities dataset, based on my personal experience.


Data modeling is the process of representing the structure of data and the relationships between data elements. This is why all the commonly used notation systems can be converted into one another. The differences among them are mostly aesthetic. However, some of them can express distinctions that others cannot, and not all of them have the symbols needed to represent every possible situation (Hay).

A data model has many uses and implementations, for example in business and science. There are three types of data model: the conceptual data model, the logical data model, and the physical data model. Each can be independent of the others, and they are rendered as schemas called conceptual, logical and physical schemas respectively.

Firstly, a conceptual schema is used to represent the data in a database by describing the semantics of a domain; more simply, it is the first stage of organising the data requirements. Secondly, a logical schema represents the structure of a domain of information, capturing important information about the elements of the database and the way they relate to each other. Thirdly, a physical schema describes the details of how the data is stored.
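To make the three levels more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the small Author/Book domain, all the names in it, and the choice of SQLite for the physical level are hypothetical and do not come from any of the sources cited here.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: plain statements about the domain (hypothetical example).
CONCEPTUAL = [
    ("Author", "writes", "Book"),
    ("Library", "holds", "Book"),
]


# Logical level: entities, their attributes, and how they refer to each other.
@dataclass
class Author:
    author_id: int
    name: str


@dataclass
class Book:
    book_id: int
    title: str
    author_id: int  # refers to Author.author_id


# Physical level: how one particular system (SQLite, as an assumption)
# stores the data, including column types and an index.
PHYSICAL_DDL = """
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE book (
    book_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES author(author_id)
);
CREATE INDEX idx_book_author ON book(author_id);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the physical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PHYSICAL_DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """List the tables that the physical schema created, in name order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

The same conceptual statements could be rendered in a different logical or physical form, which is one way to read the claim that the three schemas can be independent of each other.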

In addition, there is another method of data representation, called an ontology. An ontology is a model that introduces a vocabulary relevant to a domain and specifies the intended meaning of that vocabulary. An ontology also has two parts: a set of axioms and a set of facts. The former is used to describe the structure of the model, and the latter to describe some particular actual situation (Horrocks).
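The split between axioms and facts can be sketched in a few lines of plain Python. This is a toy illustration, not a real ontology language such as OWL: the museum vocabulary, the item identifiers, and the function names are all hypothetical, and the only kind of axiom modelled here is "subclass of".

```python
# Axioms: the structure of the model (hypothetical museum vocabulary).
SUBCLASS_OF = {
    "Painting": "Artwork",
    "Sculpture": "Artwork",
    "Artwork": "MuseumObject",
}

# Facts: one particular situation described with that vocabulary.
FACTS = {
    "item_001": "Painting",
    "item_002": "Sculpture",
    "item_003": "MuseumObject",
}


def superclasses(cls):
    """Return cls together with every class it is a subclass of."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def instances_of(cls):
    """Return the items that are stated or inferred to belong to cls."""
    return sorted(
        item for item, stated in FACTS.items() if cls in superclasses(stated)
    )
```

Asking for `instances_of("Artwork")` returns `item_001` and `item_002`, even though no fact says directly that either of them is an Artwork. The answer follows from combining the facts with the axioms.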

When we talk about the structure of humanities data, an ontology can be quite an efficient method for creating a humanities database. A schema that describes humanities data, such as the data of a library, a museum, or an archive, can be large and complex, and may also be used at query time. This is directly connected with the type of data, which can be quite descriptive. In such cases an element can have many, very different metadata, and the way each user searches can differ and be difficult to predict. An ontology can provide a reasoned structure, including inferred answers and intended queries (Horrocks). This can help a great deal in structuring humanities data, since an ontology can solve some of its most basic problems.

Having tried to create an ontology in Protégé, an ontology editor, I found that creating one is not a very complicated process for a person with a humanities rather than a science background. You can create and name your own elements and their relationships, which gives you the freedom to organise the data and its structure according to your own and your users’ needs, rather than following frameworks created by others that may not fit your aims. For people like me, it is very important to find a method that we can understand and work with without problems.

References:

Hay, D. C., A Comparison of Data Modeling Techniques, http://www.cs.uml.edu/~lechner/DavidHay/DHay_ComparingDModTechniques.pdf, Web, Accessed on 8/5/2017

Horrocks, Ian, Ontologies and Databases, https://www.posccaesar.org/svn/pub/…/Ian_Horrocks_Ontologies_and_databases.pdf, Web, Accessed on 8/5/2017

Seiner, R. S., Different Kinds of Data Models: History and a Suggestion, http://tdan.com/different-kinds-of-data-models-history-and-a-suggestion/14400, Web, Accessed on 8/5/2017

Data Game

According to the Oxford Dictionary, data is a noun used to describe “facts and statistics collected together for reference or analysis”. In addition, in a philosophical context data means “things known or assumed as facts, making the basis of reasoning or calculation.” However, for the purpose of this post I will focus on the meaning associated with computer science, which defines data as “the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.” To understand the following example we should keep this definition in mind. In the example, the data is visualized so that the result is conveyed more clearly, especially for visual learners.

Data visualization is a different way to present data. The main goal is to communicate information clearly and efficiently, so a visualization has to be informative and useful to be successful. Besides, a data visualization hides a question that has to be answered, like a small story that slowly unfolds with the help of the visual element. Were we to use a formula to describe this, it would be the following: Question + Visual Data + Context = Story (Shapiro).

Below is an example of data processing and visual representation, showing how data, through the right combinations, can produce a meaningful result and not just information, thus contributing to the creation of knowledge. The chart was created with Tableau (https://www.tableau.com/), data visualisation software that helps people see and understand what results could arise from their data. It is user-friendly and can connect to almost any database. The database used here is that of the Central Statistics Office (CSO) (http://www.cso.ie/webserviceclient/DatasetListing.aspx). From this database, one can select a topic and a subtopic of interest, and the available data tables will be displayed. On this occasion, the table named “Recorded Crime Offences by Garda Station, Type of Offence and Year” is used, taken from the main category “People and Society” and the secondary category “Crime and Justice.” From this table, data were exported about how many and what kinds of crimes were reported to police stations in Ireland from 2003 to 2016.
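As a rough idea of what "the right combinations" can mean in practice, the sketch below groups records by type of offence and year and adds up their values. It is written in Python rather than Tableau, and the rows are hypothetical placeholders in the shape of the table's columns, not real CSO figures; the function name is also my own.

```python
from collections import defaultdict

# Hypothetical rows: (Garda Station, Type of Offence, Year, Value).
# These are placeholders, NOT real CSO figures.
RECORDS = [
    ("Station A", "Theft", 2003, 10),
    ("Station B", "Theft", 2003, 5),
    ("Station A", "Fraud", 2003, 2),
    ("Station A", "Theft", 2004, 7),
]


def totals_by_type_and_year(records):
    """Sum the values of all stations for each (offence type, year) pair."""
    totals = defaultdict(int)
    for _station, offence, year, value in records:
        totals[(offence, year)] += value
    return dict(totals)
```

Grouping by station instead of by offence type would answer a different question from the same rows, which is why the wording of the question matters before any chart is drawn.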

As can be seen in the pictures, there is one main illustration and an appendix. The first shows the different types of offences, at which police station they took place, in what year, and how many there were.

printscreen 1

The appendix presents the categories of offences, how many there are, and the color in which each can be found in the illustration.

printscreen 3

When the cursor is over one of the circles, the following details appear: Garda Station, Type of Offence, Year and Value. Bigger circles have a bigger value and smaller circles a smaller one. In this way, simply by moving the mouse over the chart, anyone can learn the basics without getting lost in long lists of endless information such as those in the database used here.
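The idea behind the circles and the hover text can be sketched in Python. This is only an assumption about how such a chart might work, not a description of how Tableau does it internally: here the area of a circle is made proportional to its value, the rows are hypothetical placeholders rather than real CSO figures, and the function names and the maximum radius are my own choices.

```python
import math

# Hypothetical rows with the four fields shown on hover (NOT real CSO figures).
ROWS = [
    {"Garda Station": "Station A", "Type of Offence": "Theft", "Year": 2003, "Value": 400},
    {"Garda Station": "Station B", "Type of Offence": "Theft", "Year": 2003, "Value": 100},
]


def circle_radius(value, max_value, max_radius=40.0):
    """Return a radius such that the circle's area is proportional to value."""
    if max_value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)


def tooltip(row):
    """Build the text shown when the cursor is over a circle."""
    fields = ("Garda Station", "Type of Offence", "Year", "Value")
    return "\n".join(f"{name}: {row[name]}" for name in fields)
```

With these placeholder rows, the circle for the value 400 gets a radius of 40 and the circle for the value 100 gets a radius of 20, so the first has four times the area of the second. Scaling the area rather than the radius is a common choice, because readers tend to compare circles by the space they cover.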

printscreen 5

The world of data and the way it is organized and visualized may seem confusing, but after appropriate processing and construction it can reveal something that we had not imagined before. It is something like Lego: there are many pieces in different shapes, colors, and sizes, and through successive, different combinations a new result can be produced that is very different from the previous one. Thus data, like small Lego pieces, can be combined in many different ways to produce the results we want to present each time. However, when we talk about data, imagination is not enough. A key role is played by the wording of the question we want to answer through the elaboration of appropriate data. We should experiment with the available data and try to create new combinations and versions. Through all that, we can realize the potential that data hides. So, let’s start the data game!

Readings:

Shapiro, M., Once Upon a Stacked Time Series, Beautiful Visualization, edited by Steele, J. and Iliinsky, N., http://simpte.ch/ebooks/OReilly.Beautiful.Series/9781449379872%20-%20Beautiful%20Visualization.pdf, Web, Accessed on 16/2/2017

Oxford Dictionaries, https://en.oxforddictionaries.com/definition/data, Web, Accessed on 16/2/2017

Recorded Crime Offences by Type of Offence and Quarter, http://www.cso.ie/webserviceclient/DatasetListing.aspx, Web, Accessed on 16/2/2017