The Risks to Digital Data Preservation
I attended a lecture by Dr. Natalie Harrower, Director of the Digital Repository of Ireland (DRI), on digital preservation: how it began, the methods used to carry it out (namely repositories), and the importance of having these repositories in place to preserve digital data. Preserving digital material is a more challenging task than preserving a tangible item. Unlike physical items, all digital materials share one unifying characteristic: they are machine dependent. The more sophisticated technology becomes, the more we depend on a particular system or piece of hardware to read or store a given piece of data down the line. This dependency on technologies that are constantly being updated and improved, together with the policies and legal considerations around privacy that come into play, are important risks to digital preservation that need to be taken on board when considering this topic.
According to Harrower, a digital repository is ‘an infrastructure that provides long term storage, management, and preservation of digital resources as well as reliable access to these resources.’ What I like about this definition is its emphasis on long-term storage. In the digital age we are living in, data and data management are constantly changing. A digital file is stored as a series of bits (binary digits), and these bits are in turn stored on some media device. The problem is that storage media can decay, which may lead to corrupted files down the line. Files can also be attacked maliciously by a virus, or somebody could forget about a storage device or lose it, especially something as small as a memory stick. I emphasise this last point because, even though I am writing about the importance of digital preservation and the risks to it, I am the number one suspect when it comes to losing memory. However, what I believe to be the biggest threat to digital preservation is that new technology is being created at such an accelerated rate that preservationists have had to adapt very quickly. Projects that were stored on a floppy disk 10 years ago are now obsolete, and even data that was once saved to a physical hard drive has now moved to cloud storage.
Metadata is at the core of this preservation, particularly for the Digital Repository of Ireland. According to the National Information Standards Organization, metadata is ‘structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.’ Metadata not only helps with preservation but also records information such as who created the data, who has rights to that information, and whether any intellectual property rights are associated with it. However, this can give rise to problems of its own, namely the standards used for the metadata. With the rise in different digital technologies mentioned above, preservationists have had to adapt quickly. The four most common metadata standards are Dublin Core, EAD (Encoded Archival Description), MARC (Machine-Readable Cataloging) and MODS (Metadata Object Description Schema), and these are also the four supported by DRI. Standardising metadata means that a given community is essentially singing off the same hymn sheet: the metadata is consistent, which in turn leads to more accurate search and retrieval of that information.
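To make the idea of a standard concrete, here is a minimal Python sketch of a Dublin Core description that records who created an item and what rights apply to it. It only assumes the published Dublin Core element namespace (`http://purl.org/dc/elements/1.1/`); the `<record>` wrapper, the function name and the sample values are hypothetical, and this is not DRI's actual ingest format.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialise a flat mapping of Dublin Core element names to values as XML.

    The enclosing <record> element is a hypothetical wrapper for illustration;
    real repositories define their own container formats.
    """
    # Only a small subset of the 15 core elements is accepted, to keep the sketch short
    allowed = {"title", "creator", "date", "description", "rights", "type"}
    root = ET.Element("record")
    for name, value in fields.items():
        if name not in allowed:
            raise ValueError(f"not a supported Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": "Example oral history interview",  # hypothetical values
        "creator": "Example University",
        "rights": "Available for teaching and research purposes only",
    }))
```

Because every record uses the same element names (`dc:creator`, `dc:rights` and so on), a search tool can query the same field across collections, which is the consistency the paragraph above describes.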
With directories, repositories or any file-sharing software, privacy and other security risks can be a threat. Keeping the data safe and secure is of primary concern, but sharing files, or opening up access to them, may also mean giving out sensitive or personal information. This is why restrictions are introduced on who can actually access the information, even while that information remains ‘open access.’ Take, for example, the Magdalene Oral Histories Collection. This is an extremely sensitive topic, one which deals with individual stories and personal details, particularly of individuals who are still alive today, so procedures have to be put in place to protect this information. On the other hand, the collection was built by Maynooth University for research, so it needs to be accessible to a variety of users. By introducing levels of access, the information was made available for teaching and research purposes only. In addition, all information within the collection was anonymised, so although it remained valuable, there was no way to trace a story or example back to an exact individual. This collection on DRI shows how important it is to preserve data in a safe way, where individuals' personal information is not put at risk.
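The idea of "levels of access" can be sketched in a few lines of Python. This is a minimal illustration under assumed rules: the three tiers, their names and the `can_view` function are hypothetical and do not describe how DRI or Maynooth University actually implement access control for this collection.

```python
from enum import Enum


class AccessLevel(Enum):
    """Hypothetical access tiers; not DRI's actual access model."""
    OPEN = "open"          # anyone may view
    RESEARCH = "research"  # users with a teaching or research purpose only
    CLOSED = "closed"      # nobody may view


def can_view(level: AccessLevel, user_purposes: set[str]) -> bool:
    """Decide whether a user with the given declared purposes may view an item."""
    if level is AccessLevel.OPEN:
        return True
    if level is AccessLevel.RESEARCH:
        # Access is granted if the user declares at least one permitted purpose
        return bool(user_purposes & {"teaching", "research"})
    return False


if __name__ == "__main__":
    print(can_view(AccessLevel.RESEARCH, {"research"}))
    print(can_view(AccessLevel.RESEARCH, {"commercial"}))
```

The point of the design is that the item itself stays in the repository; only the rule deciding who may see it changes with the sensitivity of the material.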
The metadata used to store this information also needs to be carefully scrutinised from a security and risk perspective. To use a slightly different example, in 2011 a member of Anonymous (the vigilante hacker group) leaked a torrent file containing files from the US Chamber of Commerce and the American Legislative Exchange Council. The files released consisted of document-based metadata from different file types: Word documents, PDFs, PowerPoint presentations and so on. Although the information was ‘only’ metadata, some of what was released was sensitive and could have put individuals at risk. It included network IDs, email addresses, IP addresses and operating systems. On their own these details do not allow much, but they could open up the potential for a phishing scam or for malware being introduced into the organisations' networks. I understand that this is an extreme example, and it is very different from the metadata used in the Magdalene Oral Histories example; some may argue that I am exaggerating the risks involved. However, security breaches and data leaks can start on a small scale like the one above. It is a possibility, and an important one to consider, especially when it could put an individual, or an individual's information (personal or otherwise), at risk.
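To show how easily document metadata travels with a file, here is a minimal Python sketch (an illustration, not a reconstruction of the 2011 leak) that reads author-related fields out of a Word `.docx` file. A `.docx` is a ZIP package whose `docProps/core.xml` part stores properties such as `dc:creator` and `cp:lastModifiedBy`. The function names and the demo account names are hypothetical, and the sketch covers only these two fields, not the network IDs or IP addresses mentioned above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in the docProps/core.xml part of an Office Open XML package
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_bytes: bytes) -> dict[str, str]:
    """Return identifying fields from a .docx file's core properties, if present."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for key, path in (("creator", "dc:creator"),
                      ("last_modified_by", "cp:lastModifiedBy")):
        element = root.find(path, NS)
        if element is not None and element.text:
            found[key] = element.text
    return found


def build_demo_docx(creator: str, modifier: str) -> bytes:
    """Build a minimal in-memory ZIP holding only core.xml (not an openable document).

    Values are inserted without escaping, so they must not contain '<' or '&'.
    """
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        f"<dc:creator>{creator}</dc:creator>"
        f"<cp:lastModifiedBy>{modifier}</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    return buffer.getvalue()


if __name__ == "__main__":
    demo = build_demo_docx("jsmith", "CORP\\jsmith")  # hypothetical account names
    print(read_core_properties(demo))
```

Note that these core properties reuse Dublin Core element names, the same standard discussed earlier: the fields that help describe a file are also the ones that go along with it when it is shared.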
Digital Curation Centre. “DCC Curation Lifecycle Model.” Accessed December 2016. http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf
Digital Preservation Coalition. “Digital Preservation Briefing.” Digital Preservation Handbook, 2nd Edition, Digital Preservation Coalition, 2015. Accessed December 2016. http://handbook.dpconline.org/docman/digital-preservation-handbook2/1552-dp-handbook-digital-preservation-briefing/file
DRI. “Metadata Quality Control.” January 2015. Accessed December 2016. http://dri.ie/sites/default/files/files/metadata-quality-control.pdf
O’Carroll, A. and Webb, S. “Digital Archiving in Ireland: National Survey of the Humanities and Social Sciences.” National University of Ireland Maynooth, 2012. Accessed December 2016. http://dri.ie/digital-archiving-in-ireland-2012.pdf
Ragan, Steve. “Study examines the problem with metadata and file sharing.” CSOonline.com, July 2014. Accessed December 2016. http://www.csoonline.com/article/2456087/business-continuity/study-examines-the-problems-with-metadata-and-file-sharing.html