Comparing data modelling techniques

Introduction

What is data modelling?

Data modelling is the process of defining the structure in which data (information and knowledge) is collected, managed and represented.  A data model describes the concepts or objects of concern to an individual or organisation, together with the relationships between them.

Data modelling techniques

Two main data modelling techniques are currently used in computer systems: database systems and graph systems.  Among database systems, the relational model, proposed by Codd in 1970, has become the de facto standard for information representation (Martinez-Cruz, Blanco, and Vila).  In the last decade, however, ontologies, expressed as graphs, have grown in popularity and emerged as a viable alternative to relational databases, particularly because of their application in the Semantic Web.

Comparing techniques – similarities and differences

As the basic intention of data modelling is to describe things and their relationships to each other, it is not surprising that there is a strong correspondence between the organisation of databases and that of ontologies.  Both use a formal language and have types, properties and constraints.  A relational database ‘entity’ can correspond to an ontology ‘class’, and a relational attribute to an ontology ‘property’.  However, the focus of databases is the data, while the focus of ontologies is communicating meaning and shared understanding.
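The correspondence between entity and class, and between attribute and property, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Person` table, its columns, the sample row and the `http://example.org/` namespace are all invented for the example, and plain tuples stand in for a real RDF library.

```python
# Hypothetical relational data: one row of an invented "Person" table.
TABLE = "Person"
COLUMNS = ("id", "name", "birth_year")
ROW = (1, "Ada", 1815)

# Placeholder namespace (an assumption, not a real vocabulary).
BASE = "http://example.org/"


def row_to_triples(table, columns, row, key="id"):
    """Map one relational row to (subject, predicate, object) triples.

    The table (entity) becomes a class, the primary key becomes part of
    the subject identifier, and every other column (attribute) becomes
    a property of that subject.
    """
    key_value = row[columns.index(key)]
    subject = f"{BASE}{table}/{key_value}"
    triples = [(subject, "rdf:type", f"{BASE}{table}")]
    for column, value in zip(columns, row):
        if column == key:
            continue  # the key is already encoded in the subject
        triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


triples = row_to_triples(TABLE, COLUMNS, ROW)
for triple in triples:
    print(triple)
```

Each row yields one triple that states its class and one triple per non-key attribute, which is roughly the shape a database-to-ontology translation would take.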

Relational databases, generally considered fully normalised once they reach third normal form (3NF), are well suited to organising and structuring data.  Because normalisation reduces or eliminates redundancy, they are also very effective for data collection.  Normalisation, however, requires data to be split across multiple tables that are linked by joins.  Querying across tables can be technically complex and computationally expensive, so databases are often de-normalised to improve performance for data warehousing and extraction.  The result is a proliferation of highly specialised databases, each developed by an individual organisation to manage specific information.  These databases generally sit behind a firewall and are often available on the internet only through an application that has to be developed or customised for the individual database.  Both the normalisation process and online access through another application can reduce the specificity of the original dataset.
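To make the point about joins concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their contents are invented for illustration; the point is only that once customer names are stored once (no redundancy), reading them back alongside purchases requires a join.

```python
import sqlite3

# In-memory database, so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical normalised schema: customer details live in one table,
# and each purchase refers to its customer by key.
conn.executescript(
    """
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO purchase VALUES
        (1, 1, 'keyboard'),
        (2, 1, 'monitor'),
        (3, 2, 'mouse');
    """
)

# Answering "who bought what?" means joining the two tables.
rows = conn.execute(
    """
    SELECT customer.name, purchase.item
    FROM purchase
    JOIN customer ON customer.id = purchase.customer_id
    ORDER BY purchase.id
    """
).fetchall()

for name, item in rows:
    print(name, item)

conn.close()
```

A de-normalised version would store the customer name directly on every purchase row, trading redundancy for a query that needs no join.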

 

Ontologies are also highly structured.  They are commonly expressed in the Resource Description Framework (RDF), a standard method for defining things and the relationships between them, which was designed for the Web and can refer to any thing or any concept.  As a result, the entire ontology can be viewed globally without restrictions or layers of interfaces: viewers are given access to the data itself rather than to HTML documents.  Because HTTP URIs serve as globally unique identifiers for data items and vocabulary terms, an ontology can be amended and extended at any time, whereas databases can be technically difficult and expensive to modify.  Ontologies are also designed to scale, which can make it quicker to search very large quantities of data.
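The idea of describing things as subject, predicate, object statements identified by URIs can be sketched without any RDF tooling. The following is a toy in-memory triple store, not an RDF implementation: the `http://example.org/` namespace, the property names and the `add` and `match` helpers are all hypothetical, chosen only to show that a new kind of statement can be added at any time without altering a schema.

```python
# Placeholder namespace (an assumption; example.org is reserved for examples).
EX = "http://example.org/"

# The whole "graph" is just a set of (subject, predicate, object) tuples.
graph = set()


def add(subject, predicate, obj):
    """Add one statement to the graph."""
    graph.add((subject, predicate, obj))


def match(subject=None, predicate=None, obj=None):
    """Return all statements matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Initial statements.
add(EX + "codd", EX + "name", "Edgar F. Codd")
add(EX + "codd", EX + "proposed", EX + "relational_model")

# Later, a property nobody planned for is added with no schema change.
add(EX + "relational_model", EX + "proposedIn", "1970")

for statement in match(subject=EX + "relational_model"):
    print(statement)
```

In a relational design, recording the year would typically mean adding a column or a table; here it is just one more statement in the set.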

Conclusions

While the strength of databases is in data capture and structuring, the strength of graph models lies in their ability to represent data and the relationships between data visually, and in the ease of sharing data on the web.  To access the data, however, both models require some prior knowledge of the database schema or ontology structure.  The choice between them will ultimately depend on the particular project and end user requirements.  Graph database technology is comparatively new and much less familiar to potential end users than relational databases, which are now commonplace and have stood the test of several decades.  Further research will explore and develop methods of effectively translating data in databases to graph models and, as Martinez-Cruz et al. suggest, it is likely that databases will remain important for some time for the capture and structuring of large datasets.

 

Bibliography

Martinez-Cruz, Carmen, Ignacio J. Blanco, and M. Amparo Vila. “Ontologies versus Relational Databases: Are They so Different? A Comparison.” Artificial Intelligence Review 38.4 (2012): 271–290. CrossRef. Web.

 
