Monthly Archive: November 2014

Super-Aggregators as Digital History Tool

Much has been written about the usefulness of digitisation for history research and humanities research more generally.  It has been noted that the nature of history research means that historians use digital tools differently from other humanities scholars.  In their survey of 213 North American and Western European historians, Gibbs and Owens (2012) outline results that suggest that the primary use of digital tools by historians is to speed up traditional research methodologies.  They write that where digitised primary and secondary sources are concerned, historians tend to value quantity over quality: “In contrast to other disciplines like philology or textual criticism, where exact transcription is crucial, historians frequently preferred resources that offer large quantities of materials with even a crude full-text component. This sentiment likely reflects their primary use of technology, namely that finding references and information is a much higher priority than using tools to analyze primary sources” (ibid.).  In 2007, the Abraham Lincoln Historical Digitization Project at Northern Illinois University Libraries was lauded for being unlike any existing historically oriented digitisation project, in that its website also included a number of multimedia and interpretive materials (VandeCreek 2007).  Most discussion of such projects concentrates almost exclusively on the question of access and how such access has led to the democratisation of history research.  Super aggregators such as Europeana and the Digital Public Library of America (DPLA) would seem, then, to be a natural progression from digitisation projects that were organised around one research question or subject.  Such aggregators are by their nature hugely accessible and represent a further step in the democratisation of access to cultural heritage objects, but here I want to discuss their usefulness as a tool for historians.

The so-called super aggregators represent a new development in online digital resources, and in particular in open shared resources.  Europeana was the first of such projects, describing itself as a cross-border, cross-domain, single access point for digitised cultural heritage materials provided by various European libraries, museums, archives, galleries, audiovisual collections and other memory institutions.  Similarly, the DPLA serves as “the central link in an expanding network of cultural institutions that want to make their holdings more visible to the public” (Howard 2013).  According to its founding executive director, Dan Cohen, the DPLA is not concerned with the preservation of cultural heritage objects but rather with being a connector or aggregator of digital and digitised cultural heritage content.  Both Europeana and the DPLA provide access to millions of objects from thousands of content providers.  Both have standardised the metadata provided by contributing institutions and provide basic search and browse functions.  Searchers are given access to a preview of the object with accompanying metadata provided by the content provider.  This immediately raises the question of how such aggregators are more useful than search engines such as Google.  Maxwell (2010) is sceptical about the usefulness of digital archives when compared with a search engine such as Google Books.  As assessment criteria he suggests the number of hits for a given search and the ease of access, which he measures in page loads and mouse clicks.  He uses a search for “Fichte” to compare Google Books to Europeana, and finds Europeana wanting.  His conclusions are based on what he believes to be Europeana’s inefficient and inaccurate interface and, more significantly, on the unavailability of full-text search.  
To describe Europeana as an archive is to misunderstand its primary function, and I would also suggest that the comparison is not useful, as it attempts to equate what are essentially two different tools.  However, his criticism of Europeana’s interface has some merit: it is somewhat ungainly and certainly not as intuitive as the DPLA’s interface.

Both Europeana and the DPLA have built open APIs, which they hope will encourage the independent development of applications, tools, and resources that make use of the data held on both platforms.  The DPLA website lists completed and proposed projects based on its API, which is designed to be extensible in order to cater for the varying degrees of technical sophistication of the DPLA’s audience.  Stephanie Lampkin, a community rep for the DPLA, also explains that there are four interfaces (exhibitions, bookshelf, map, and timeline) which could be useful for research.  She suggests that the map can be used as a visualisation tool to pinpoint exactly where resources are available.  Gibbs and Owens (2012) found that the respondents in their survey were mostly interested in the availability of as many resources as possible.  They were concerned about gated access but had little interest in other tools that might help them make use of the objects they were accessing in novel ways.  In an interview with John Palfrey (2013), Dan Cohen suggested that one of the benefits of the DPLA for academic libraries was that it can be used to suggest research materials and collections beyond a home institution, and to create virtual exhibits from federated sites which would serve to enhance the scholarship of students and faculty.  Aggregators use their metadata to point searchers to records relevant to their searches.  This has the effect of increasing the visibility of small and potentially unknown archives and collections.  According to Gibbs and Owens, this access to a large quantity and variety of resources is typically what historians require from a digital tool.  Perhaps this is a symptom of a general reluctance among historians to embrace digital tools; however, as things stand, such super aggregators perform an important and desired function, one which could not easily be substituted with a search engine, no matter how sophisticated.  
Access to such a large amount of content from different cultural domains not only provides historians with a large quantity of both searched-for and previously unknown digital collections; by providing such access, it also has the potential to open up new research questions.
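
The way an aggregator standardises contributed metadata and then uses it to point searchers to records can be illustrated with a small sketch.  This is not how Europeana or the DPLA are actually implemented: the provider names, field names, and sample records below are all hypothetical, and the search is a plain substring match rather than anything a real platform would use.

```python
# Hypothetical per-provider field mappings onto a shared schema.
# Neither the providers nor the field names come from Europeana or the DPLA.
FIELD_MAPS = {
    "county_archive": {"name": "title", "maker": "creator", "summary": "description"},
    "city_museum": {"heading": "title", "artist": "creator", "notes": "description"},
}


def normalise(provider, raw):
    """Rename a provider's local fields to the shared schema and tag the source."""
    mapping = FIELD_MAPS[provider]
    record = {common: raw[local] for local, common in mapping.items() if local in raw}
    record["provider"] = provider
    return record


def search(records, term):
    """Return every record whose descriptive metadata contains the term."""
    needle = term.lower()
    return [
        r for r in records
        if any(needle in str(v).lower() for k, v in r.items() if k != "provider")
    ]


# Invented sample records from two small collections.
records = [
    normalise("county_archive", {"name": "Letters on Fichte", "maker": "Unknown"}),
    normalise("city_museum", {"heading": "Portrait", "notes": "Engraving of J. G. Fichte"}),
    normalise("city_museum", {"heading": "Harbour view", "artist": "Unknown"}),
]
hits = search(records, "fichte")
```

Because both collections are described in the same terms after normalisation, a single query surfaces records from each of them, which is the visibility effect described above.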

Bibliography

  1. DPLA. Digital Public Library of America. 2014. Web. 26 November 2014.
  2. DPLA. Meet our Community Reps: Using DPLA as a Research and Teaching Tool. June 17 2014.  Web. 28 November 2014.
  3. Europeana.eu. Europeana. 2014. Web.  20 November 2014.
  4. Gibbs, F. and Owens, T. “Building Better Digital Humanities Tools: Toward Broader Audiences and User-centered Designs.” Digital Humanities Quarterly. 2 (2012). Web. 29 November 2014.
  5. Howard, J. “Digital Public Library of America: Young but Well Connected.” Chronicle of Higher Education. 60.1 (2013). Web. 28 November 2014.
  6. Palfrey, J. “What is the DPLA?” Library Journal. 7 (2013). Web. 28 November 2014.
  7. Maxwell, A. “Digital Archives and History Research: Feedback from an End-user.” Library Review. 1 (2010). Web. 20 November 2014.
  8. VandeCreek, D. “‘Webs of Significance’: The Abraham Lincoln Historical Digitization Project, New Technology, and the Democratization of History.” Digital Humanities Quarterly. 1.1 (2007) Web. 28 November 2014.

Review of the Dublin Core Metadata Standard

The Dublin Core metadata standard was created following a 1995 workshop sponsored by the OCLC and the NCSA.  The original objective of Dublin Core was to define a set of elements that could be used by authors to describe networked electronic information.  The workshop was attended by people from a range of disciplines, including librarians, archivists, and computing and humanities scholars, all of whom recognised that widespread indexing and bibliographic control of internet resources depended on the existence of a simple record to describe networked resources.  Early Dublin Core workshops popularised the idea of core metadata for simple and generic resource descriptions; the original goal was to define a set of elements and some rules that could be followed by non-cataloguers, so that the creators and publishers of internet documents could create their own metadata records.  Because of its simplicity, the Dublin Core element set is used by many outside of the library community.  It was originally developed with a view to describing document-like objects, but it can also be used to describe other types of resources, for example, internet resources such as videos, images, and web pages, or physical objects like books, CDs, or artworks.  Its suitability for use with other non-document resources will depend to some extent on how closely their metadata resembles typical document metadata and also on what purpose that metadata is intended to serve.

The Dublin Core Metadata Initiative (DCMI) manages the continuing development of Dublin Core and its related specifications.  The DCMI has expanded beyond simply maintaining the Dublin Core Metadata Element Set into an organisation that promotes the widespread adoption of interoperable metadata standards, shared innovation in metadata design, and best practices in metadata implementation.  It does this in a number of ways: by managing the long-term curation and development of DCMI specifications and metadata terms namespaces; by managing the discussion of DCMI-wide work themes; by setting up and managing international and regional events; and by creating and delivering training resources in metadata best practice.

The DCMI has a formal approval process through which the semantic and technical specifications of Dublin Core are approved.  There are five categories of proposals that can be made to the DCMI.  They are: proposed changes to metadata terms; proposals for DCMI Recommendations; proposals for DCMI Recommended Resources; proposals for Application Profiles as DCMI Recommended Resources; and, finally, proposals for DCMI Process Documents.  Proposals can be submitted to the DCMI managing director by internal and external organisations, or by any individual.  During the formal approval process, proposals can be assigned one of the following statuses: Community Specification, which means that the specification is put forward for DCMI endorsement for use and publication by task groups within the DCMI; Proposed Recommendations, which are technical specifications considered close to stable and which have growing support for adoption by the Dublin Core community; Working Drafts, which are documents under development; Process Documents, which describe the process and procedures relevant for the operation of DCMI and its work structure; Recommended Resources, which are resources that the DCMI executive recommends as material for use by the DCMI community in support of their use of Dublin Core metadata; and, finally, Superseded Recommendations, which are specifications that have been replaced by newer versions.  When proposals are first submitted, the directorate acknowledges receipt and decides whether a document falls into one of the five categories.  A first decision on whether DCMI will accept a proposal for consideration is communicated to the submitter no later than two months after submission, along with a specification of the process and timeline foreseen.

The Dublin Core Metadata Element Set comprises 15 core elements, which together are referred to as Simple Dublin Core.  Qualified Dublin Core contains an additional three elements as well as a group of element refinements that are known as terms.  The 15 core elements are Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.  The three additional elements of Qualified Dublin Core are Audience, Provenance, and Rights Holder.  Dublin Core is non-hierarchical and each element is optional and repeatable.  One of the most significant goals of Dublin Core is simplicity of creation and maintenance, so that it should be easy for non-specialists to use; the effect of this is to encourage the proliferation of metadata records of resources and, by extension, to provide for effective retrieval of those resources in the networked environment.  One problem that arises between different metadata standards is that people from different fields of knowledge use different terminology to describe the same thing.  For example, the <creator> element can be used to describe an artist, an author, or the creator of an electronic resource.  It is for this reason that the Dublin Core elements are described using universally understood semantics; this serves to increase the accessibility of resources.  Further, the extensibility of Dublin Core increases its potential for interoperability, by acknowledging that other communities of metadata experts are likely to create and administer additional metadata standards to fulfil the needs of their particular community.
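
As an illustration of what "optional and repeatable" means in practice, here is a minimal Python sketch that builds a Simple Dublin Core record as XML.  The element names and the namespace URI are those of the Dublin Core element set; the `record` wrapper element, the `build_record` helper, and the sample values are assumptions made for the example and are not prescribed by the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_record(fields):
    """Build a record from a dict mapping element name -> list of values.

    Every element is optional (it may be absent from the dict) and
    repeatable (its list may hold several values).
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Simple Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        for value in fields.get(name, []):
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Invented sample values: one title, two creators, thirteen elements omitted.
example = build_record({
    "title": ["Map of Dublin"],
    "creator": ["Surveyor A", "Engraver B"],
    "type": ["Image"],
})
xml_text = ET.tostring(example, encoding="unicode")
```

Passing an empty dict still yields a valid, if uninformative, record, while a name outside the 15 elements (such as `audience`, which belongs to Qualified Dublin Core) is rejected.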

The Dublin Core Metadata Element Set also implements some principles that are critical to understanding how to think about the relationship of metadata to the resources it describes.  The one-to-one principle means that Dublin Core metadata describes one version of a resource.  In other words, metadata must be provided for both an artefact and its digital reproduction; one is not taken to represent the other.  The dumb-down principle means that the purpose of a qualifier or term is to refine the information provided by the element, and not to extend it in any way.  The information must be understandable even if the qualifier is taken away.  The third principle is that of appropriate values, which means that the person implementing the metadata must always bear in mind the requirement of usefulness for discovery.  It is also worth noting that while Dublin Core was originally developed in English, the DCMI has acknowledged the multilingual and multicultural nature of electronic resources, and so versions of Dublin Core are being developed in other languages.  In addition to its use for resource description and its interoperability with other metadata standards, Dublin Core metadata can also be used to provide interoperability for metadata vocabularies in the Linked Data cloud and Semantic Web implementations.
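
The dumb-down principle can be sketched in a few lines of Python.  The `REFINES` table lists a subset of DCMI refinements and the element each one refines; the `dumb_down` helper, the decision to drop terms that have no Simple Dublin Core counterpart, and the sample values are assumptions made for illustration.

```python
# A subset of DCMI refinements, keyed by term, giving the element each refines.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "modified": "date",
    "extent": "format",
    "medium": "format",
    "isPartOf": "relation",
    "hasVersion": "relation",
    "spatial": "coverage",
    "temporal": "coverage",
}

SIMPLE = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dumb_down(qualified):
    """Collapse (term, value) pairs into Simple Dublin Core.

    A refinement is replaced by the element it refines and its value is kept
    unchanged. Terms with no simple counterpart (for example audience) are
    dropped here, which is a choice made for this sketch.
    """
    simple = {}
    for term, value in qualified:
        element = term if term in SIMPLE else REFINES.get(term)
        if element is None:
            continue
        simple.setdefault(element, []).append(value)
    return simple


# Invented sample record in Qualified Dublin Core.
result = dumb_down([
    ("title", "Student register"),
    ("alternative", "Register of students"),
    ("created", "1916"),
    ("audience", "scholars"),
])
```

Each value remains meaningful once its qualifier is removed: a creation date is still a date and an alternative title is still a title, only less specifically labelled.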

In Ireland, the Dublin Core Metadata Element Set has been used to record metadata on a number of projects.  A Digital Edition of Táin Bó Fliodhaise; A Digital Edition of the Alcalá Account Book; Art College Student Registers; Conflict Archive on the Internet; and the Earley and Company Archives have all used Simple Dublin Core to record the metadata of their resources.  Qualified Dublin Core has been used to record the metadata of objects contained in the Irish Virtual Research Library and Archive (IVRLA), now the UCD Digital Library.  The primary objective of the IVRLA Project was to digitise a core number of archival collections held in several University College Dublin repositories.  It took key humanities resources from five repositories containing physical materials in manuscript, printed, audio, video and graphic format.  In many cases, due to the rarity or fragility of these resources, they were not accessible to scholars outside of UCD.  The UCD Digital Library uses Qualified Dublin Core to record the metadata on all of the objects in its collections.  As the format of the objects varies, Dublin Core is a suitable metadata standard to use, as its semantics can apply to objects in different formats across different communities of knowledge, thus making them searchable and accessible.  MODS and METS are two other metadata standards which could have been applied to the UCD Digital Library.  However, both of these standards are more complicated than Dublin Core and require specialist knowledge to implement.  For example, there are seven major parts of a METS document, and MODS requires knowledge of MARC21.  Another central reason why Dublin Core was the most useful metadata standard for the Digital Library is the variation in the types of objects it digitises: for example, maps of Dublin, data sets from the urban modelling group, and photographs of the Irish Civil War and the 1916 Rising.  
Also, and perhaps more significantly, many of the collections in the Digital Library are published to Europeana, which asks that metadata conform to the Europeana Data Model, which incorporates all of the earlier Dublin Core-based Europeana Semantic Elements.  The use of Dublin Core therefore allows for ease of ingestion into Europeana.

Bibliography

  1. Chan, L. M. & Zeng, M. L. “Metadata Interoperability and Standardization – A Study of Methodology Part 1: Achieving Interoperability at the Schema Level”. D-Lib Magazine. 12(6) (2006). Available at: http://www.dlib.org/dlib/june06/chan/06chan.html Accessed 30 October 2014.
  2. “Metadata Basics”. Available at: http://dublincore.org/metadata-basics/ 24 October 2014. Web. Accessed 6 November 2014.
  3. Digital Humanities Observatory. “Digital Research and Projects in Ireland.” Available at: https://web.archive.org/web/20100303205203/http://dho.ie/drapier/ Web. Accessed 6 November 2014.
  4. Heery, R. “Review of Metadata Formats”. Program. 30(4) October 1996, pp. 345-373. Web. Available at: http://www.ukoln.ac.uk/metadata/review.html Accessed 30 October 2014.
  5. Hillmann, D. “Using Dublin Core”. Available at: http://dublincore.org/documents/2001/04/12/usageguide/ Web. Accessed 5 November 2014.
  6. “A Framework of Guidance for Building Good Digital Collections”. 2007. Web. Available at: http://www.niso.org/publications/rp/framework3.pdf Accessed 6 November 2014.
  7. “Understanding Metadata”. 2004. Web. Available at: http://www.niso.org/publications/press/UnderstandingMetadata.pdf Accessed 30 October 2014.
  8. UCD Digital Library. http://digital.ucd.ie/ Web. Accessed 6 November 2014.