Data Standardisation (AFF-604A)

The internet has become pervasive in everyday life, and the volume of information it holds keeps growing. Retrieving data is now a trivial task, because a smartphone puts almost the entire wealth of human knowledge in your pocket. Having an immense wealth of data means nothing, however, if a computer has no way to identify, select, or relate any of it. Data standardisation can be seen as a means of making data technologically coherent across a myriad of platforms, devices, and databases. It is central to understanding how the so-called Semantic Web functions, a concept that sets out to meet the expectations and practical needs of its users. The Semantic Web can be seen as an extension of the World Wide Web in which the relationships between pieces of data on the web are standardised to increase their usability. It stands to reason that a huge dataset only becomes useful once it is easily searchable, and this flexibility is itself what makes the web such a useful tool. Berners-Lee et al. describe the Semantic Web as:

“…an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (2001).

If one takes this at face value, the point is to make individual elements of data not only machine-readable but also machine-understandable. The aim is to give data specific meanings or attributes so that a machine can pick them out of a vast repository of information based on their identifying qualities. It therefore becomes important that machines are able not only to display text, but also to pluck out individual elements of a text and act on them autonomously, with the aim of providing a better service to the user (Sure and Studer, 2005).

This can be accomplished through the use of RDF (Resource Description Framework) documents. RDF is used to describe entity relationships within data, and its basic unit is the ‘triple’: a statement made up of a subject, a predicate, and an object. For example, a sentence we might want to represent in an RDF document is ‘Liberace played the piano’. Here Liberace is the subject, the piano is the object, and ‘played’ is the predicate that describes the relationship between them.

On the Semantic Web, data that contains relationships such as these is known as ‘Linked Data’. Linked Data is standardised data in which each thing is assigned a Uniform Resource Identifier (URI) that distinguishes it from every other thing. A URI is a string in a standard format, in Linked Data usually a web address, that refers to exactly one resource, so the same person, place, or object is identified in the same way wherever it appears. By giving pieces of data their own unique identifiers and creating relationships between them, we create Linked Data. The possibilities for modelling Linked Data are therefore manifold, which allows for new interpretations of vast sets of data and metadata.
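To make the triple and the role of shared URIs more concrete, here is a minimal sketch in plain Python (no RDF library). It writes the Liberace statement out in N-Triples, a simple line-based RDF syntax of the form `<subject> <predicate> <object> .`. The subject and object follow DBpedia's resource-URI pattern; the `played` predicate and the `KeyboardInstrument` class under `example.org` are hypothetical names invented for illustration. The second, separate "dataset" also refers to the piano by the same URI, and that shared identifier is what lets the two be linked.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object, each a URI."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialise a URI-only triple as a single N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


def objects(graph: Iterable[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # standard rdf:type
LIBERACE = "http://dbpedia.org/resource/Liberace"
PIANO = "http://dbpedia.org/resource/Piano"
PLAYED = "http://example.org/vocab/played"                 # hypothetical predicate
KEYBOARD = "http://example.org/vocab/KeyboardInstrument"   # hypothetical class

# Two separately published datasets that happen to share the piano's URI.
musicians = {Triple(LIBERACE, PLAYED, PIANO)}
instruments = {Triple(PIANO, RDF_TYPE, KEYBOARD)}

# Because identical URIs mean identical things, merging is a set union.
graph = musicians | instruments

for t in sorted(graph):
    print(to_ntriples(t))

# Follow the link across datasets: what kind of instrument did Liberace play?
for instrument in objects(graph, LIBERACE, PLAYED):
    print(instrument, "->", objects(graph, instrument, RDF_TYPE))
```

In practice one would use an RDF library and an established vocabulary rather than hand-rolled tuples, but the sketch shows why a shared URI matters: the merge needs no guesswork about whether "piano" in one dataset means the same thing as "piano" in the other.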

It stands to reason that standardising data on the Semantic Web would in turn make the World Wide Web more accessible and allow far more to be done with datasets. A good example of this is DBpedia, a community effort to take the information stored on Wikipedia and publish it as a structured source of information on the web. Because the metadata in Wikipedia articles (chiefly their infoboxes) is extracted into a consistent form, one can search across the entire encyclopedia for a specific kind of entity, for example all of the kings of Spain. One could then narrow that search to the kings of Spain who ruled for more than five years and were assassinated, a question that a simple keyword search would struggle to answer. Being able to explore relationships between structured data in this way means that the web is becoming more useful to its users. The further these methods are developed, the more it appears that the usefulness of placing data on the web depends on consistently naming digital objects, which in turn allows greater interoperability between user and machine.
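The kings-of-Spain question is essentially a join over triples: find subjects that are monarchs of Spain, then keep those whose reign length and cause of death match. On DBpedia itself this would be written in SPARQL, the W3C query language for RDF. The sketch below does the same joins by hand in Python so each step is visible. The namespace, property names (`monarchOf`, `reignYears`, `causeOfDeath`), and the kings `KingA`, `KingB` and `KingC` are all hypothetical placeholders, not real monarchs or real DBpedia records.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical namespace and toy data: these "kings" are invented placeholders,
# NOT real monarchs, and the property names are NOT DBpedia's actual vocabulary.
EX = "http://example.org/"
Obj = Union[str, int]

triples: List[Tuple[str, str, Obj]] = [
    (EX + "KingA", EX + "monarchOf", EX + "Spain"),
    (EX + "KingA", EX + "reignYears", 12),
    (EX + "KingA", EX + "causeOfDeath", EX + "Assassination"),
    (EX + "KingB", EX + "monarchOf", EX + "Spain"),
    (EX + "KingB", EX + "reignYears", 3),
    (EX + "KingB", EX + "causeOfDeath", EX + "Assassination"),
    (EX + "KingC", EX + "monarchOf", EX + "Spain"),
    (EX + "KingC", EX + "reignYears", 30),
    (EX + "KingC", EX + "causeOfDeath", EX + "NaturalCauses"),
]


def value(subject: str, predicate: str) -> Optional[Obj]:
    """Return the first object recorded for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def kings_of(country: str) -> List[str]:
    """Step 1: every subject recorded as a monarch of `country`."""
    return [s for s, p, o in triples if p == EX + "monarchOf" and o == country]


def long_reign_assassinated(country: str, min_years: int) -> List[str]:
    """Step 2: narrow step 1 by reign length and cause of death."""
    results = []
    for king in kings_of(country):
        years = value(king, EX + "reignYears")
        cause = value(king, EX + "causeOfDeath")
        if isinstance(years, int) and years > min_years and cause == EX + "Assassination":
            results.append(king)
    return results


print(long_reign_assassinated(EX + "Spain", 5))  # ['http://example.org/KingA']
```

The point of the design is that neither step needs to read any free text: each filter simply follows a named relationship, which only works because every fact is attached to a consistent identifier.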

References

Berners-Lee, Tim, et al. “The Semantic Web.” Scientific American, 2001. Web. 17 Feb. 2016.

Sure, York, and Rudi Studer. “Semantic Web Technologies for Digital Libraries.” Library Management, vol. 26, no. 4/5, 2005. Web. 17 Feb. 2016.

 

Published by

rbreen

MA Digital Humanities student in An Foras Feasa, Maynooth University. BA Music from Maynooth University.
