Data Standardisation

The ubiquitous sharing and self-referential navigation that has characterised the web, and Web 2.0 in particular, is the happy (and economically significant) result of the standardisation of documents and their connections through technical standards such as HTML and HTTP. The power of such coherent structures is manifest in the effect the web has had on virtually every aspect of modern life, society and economic activity.

However, the technical incongruity which motivated and inspired Tim Berners-Lee to propose the World Wide Web is still extant online, now manifest in the more granular realm of data rather than documents. The web operates very well when transporting the user from one document to another through hypertext, and when returning documents that have been searched for. However, when asked to search raw data, or to draw correlations between sets of raw data, the web in its current form often operates ineffectively. It is little surprise, then, that much of the driving force behind the standardisation of this digital age's data is coming from the same source as its conceptual genesis.

Tim Berners-Lee and his colleagues have promoted the notion of data standardisation and posited that the net effect (excuse the pun) of such practices could be far greater than a superficial estimation would suggest. Data standardisation is the idea that data can have technical coherence and compatibility across (and in spite of) platforms and fields, allowing for greater efficacy, usability and commodification through the implementation of technical standards. In other words, applying technical standards to data, in the same spirit with which the original tenets of the World Wide Web were applied to documents, could enhance data in the same revolutionary way the Web enhanced documents. Notions of controlled specifications and persistent criteria are, naturally, implicit here.

These enhancements would be manifest in the sophistication of querying and manipulation that standardised data would allow. Further to this, and key to what proponents such as Berners-Lee imagine the potential of such standardisation to be, is the notion of standardised links between data, analogous to hypertext, leading to a web of linked data.

This concept relies on a structure which treats the relationship between data as a unit of data in its own right. It is this relational nature on which RDF (Resource Description Framework), a conceptual structure which uses XML (amongst other serialisations such as Turtle) to express data and their relationships, relies, and from which the Semantic Web takes its name:

Meaning is expressed by RDF, which encodes it in sets of triples, each triple being rather like the subject, verb and object of an elementary sentence. These triples can be written using XML tags. In RDF, a document makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of,” “is the author of”) with certain values (another person, another Web page). This structure turns out to be a natural way to describe the vast majority of the data processed by machines. (Berners-Lee et al. 2001)
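To make the triple model concrete, here is a minimal sketch using the Python library rdflib (my own illustrative addition, not something from the original article). The example.org resources are invented; FOAF is a real, widely used vocabulary for describing people. Each add() call asserts one subject-predicate-object triple, and the same graph can then be written out as RDF/XML or Turtle, exactly as the quotation describes.

```python
# A minimal sketch of RDF's subject-predicate-object triples using rdflib.
# The resources below are invented for illustration; FOAF is a real
# vocabulary for describing people and their pages.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF

EX = Namespace("http://example.org/")  # hypothetical namespace for this example

g = Graph()
author = EX["alice"]
homepage = EX["alice-homepage"]

# Each add() asserts one triple: (subject, predicate, object),
# e.g. "alice has the name 'Alice'", "alice's homepage is this page".
g.add((author, FOAF.name, Literal("Alice")))
g.add((author, FOAF.homepage, homepage))

# The same assertions can be serialised in several syntaxes,
# XML tags among them.
print(g.serialize(format="xml"))
print(g.serialize(format="turtle"))
```

The point is that the graph itself, not any particular file format, carries the meaning; XML and Turtle are merely interchangeable ways of writing the same set of triples down.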

The potential and power of such linked data lies, in essence, in the creativity with which it can be queried and utilised. As such, it can be hard to summarise or imagine without exploring it first hand. Take Wikipedia as an example: at present one searches for the article on, say, David Beckham to discover, perhaps, how many goals he has scored, meaning one must already know what one is looking for in order to find the relevant data. With linked data, one might instead stumble across such data while querying Wikipedia to return all British footballers, of Caucasian ethnicity, who were born in the 1980s and achieved more than 69 caps for England. This, with our current standards and manifestation of the Web (think the ubiquitous Google search box), seems a peculiar (and perhaps counter-intuitive) way to query such databases. However, should we substitute the footballer and his properties for the results of a pharmaceutical trial and the genetic makeup of the subjects, we can begin to imagine the power of such a knowledge structure.
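As a hedged illustration of that kind of query, the sketch below builds a tiny in-memory graph with rdflib and interrogates it with SPARQL, the query language for RDF. Every name in it (the example.org namespace, the ex:name, ex:nationality, ex:birthYear and ex:caps properties, and the players themselves) is invented for this example; a real query would run against a published dataset such as DBpedia using its own vocabulary.

```python
# A hedged sketch of the query described above, using rdflib's built-in
# SPARQL engine over a small in-memory graph. All terms are invented
# for illustration only.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

def add_player(name, nationality, birth_year, caps):
    """Assert one triple per property for a hypothetical footballer."""
    player = EX[name.replace(" ", "_")]
    g.add((player, EX.name, Literal(name)))
    g.add((player, EX.nationality, Literal(nationality)))
    g.add((player, EX.birthYear, Literal(birth_year)))
    g.add((player, EX.caps, Literal(caps)))

add_player("Player A", "British", 1985, 80)   # matches the query below
add_player("Player B", "British", 1975, 110)  # born too early
add_player("Player C", "British", 1983, 12)   # too few caps

# "All British footballers born in the 1980s with more than 69 caps."
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?name WHERE {
        ?player ex:name ?name ;
                ex:nationality "British" ;
                ex:birthYear ?year ;
                ex:caps ?caps .
        FILTER (?year >= 1980 && ?year < 1990 && ?caps > 69)
    }
""")

for (player_name,) in results:
    print(player_name)  # only "Player A" satisfies every constraint
```

Swapping the footballer's properties for trial outcomes and genetic markers changes only the vocabulary, not the shape of the query, which is precisely what makes the structure so general.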

The next blog post in this series, which will discuss the potential future of the World Wide Web and extend this conversation on the Semantic Web and Linked Data, can be found here.

Berners-Lee, Tim, James Hendler, and Ora Lassila. “The Semantic Web.” Scientific American 284.5 (2001): 28-37.

Herman, Ivan. “W3C Semantic Web Frequently Asked Questions.” W3C Semantic Web (2009).
