The Durability of Data: When Good Drives Go Bad

One of the topics that came up in the very first session of our core class, the appropriately titled “Digital Humanities: Theory and Practice”, was the issue of preserving digital data.  It is an easy assumption that once a book is scanned and backed up onto a drive, it is safe for all eternity—but that is far from the case.

Here’s a simple example: have you ever backed something up to a writable CD, only to come back a few years later and have your computer report that the disc is completely unreadable?  A compact disc, like any medium, is subject to decay, and unfortunately those early CD-Rs decay more rapidly than most.

But old CDs aren’t the only way data can be lost: it is also very easy to inadvertently destroy data. Consider the poor laptop I used throughout my four years as an undergraduate: when I departed for Japan, I took a new computer with me.  While I was gone, one of my family members found my old computer and decided to put it to use, backing up its original contents to a portable drive and reformatting the laptop itself.  When, a few years later, I was visiting home and needed to pull some files from my old computer, I found the drive reinitialized.  Worse still, the hardware that had been used to back up my drive had already become out of date, and we didn’t have any way of reading it.  When we finally dug up a machine that could pull the data, we found that the backup had become corrupted—probably demagnetized.  In the years following, I managed to piece together much of my undergraduate career from various backups I found lying around, but it has been a slow and painful process.  In the end, most of my third year of study is lost, as are all the source files of the music I wrote while a student.

As one might expect, the experience led me to become somewhat preoccupied with the preservation of digital data.  When I finally left Japan and moved back home for a spell, the first thing I did was to round up everything I could get my hands on—like old 3.5″ floppies from my childhood, Iomega Zip Disk backups I made in college, the decrepit old Compaq computer I used in high school—and move everything I possibly could to a latest-generation external hard drive, where it is safe, at least for the time being.

As it turns out, my very personal concern with preserving my own digital history is one that is shared throughout the world of technology.

One example I found particularly fascinating is Jordan Mechner’s discovery of the original Apple II source code for his groundbreaking game Prince of Persia.  To the uninitiated, this is a mere curio, but to a programmer or historian interested in electronic entertainment, it is a major event.  Mechner compares the source code of a game with sheet music: it can be studied, broken down, and analyzed from a perspective that the finished product or performance cannot.  The original Prince of Persia is also an important artifact in electronic entertainment history.  Not only did the game break new ground in cinematic platforming and rotoscoped animation, it was also popular enough to spawn a major game franchise and a movie.  The original title was, by Wikipedia’s count, ported to over 20 other platforms, and it remains a seminal title in 1990s computer gaming.

What makes the find so compelling in my eyes, however, is the medium it was on: 3.5″ Apple II floppies.  Think about it: these disks turned up only two years ago, in 2012.  If you were once the proud owner of an Apple II computer, do you still have it?  Does it still work?  Perhaps it does; perhaps you loved the old thing and cared for it well, and were lucky enough not to have it decay.  Well, do you have any way of transferring the data from an Apple II to one of your current computers in a form that is still readable?  Therein lies the rub: how do you bridge the gap between old formats and new?  Jordan Mechner was fortunate enough to have found something that people with the necessary skills thought was worth saving, and he found it at the right time.  As he puts it on his blog: “The 1980s and the Apple II are long enough ago to be of historical interest, yet recent enough that the people who put the data on the disks are still with us, and young enough to kind of remember how we did it” (Mechner).

Ultimately, fortune smiled upon Prince of Persia: the source code was still intact on the old disks, and it has now been extracted and posted online.  But the process wasn’t an easy one, and the code’s author asserts that much of the difficulty came from the fact that the preservation involved digital media:

Pretty much anything on paper or film, if you pop it in a cardboard box and forget about for a few decades, the people of the future will still be able to figure out what it is, or was. Not so with digital media. Operating systems and data formats change every few years, along with the size and shape of the thingy and the thing you need to plug it into. Skip a few updates in a row, and you’re quickly in the territory where special equipment and expertise are needed to recover your data. Add to that the fact that magnetic media degrade with time, a single hard knock or scratch can render a hard drive or floppy disk unreadable, and suddenly the analog media of the past start to look remarkably durable (Mechner).

So, if digital media are so fragile, what is the point of Digital Humanities?  Digital Humanist Abby Smith writes that the perceived impermanence of data is a significant obstacle to digitisation in academia, and then goes on to describe the very limitations that Jordan Mechner experienced above.  Why are we so concerned with cramming books into little magnetic drives if getting the information off of them ten years down the line is going to be akin to digital brain surgery?

In a 2002 report for the Council on Library and Information Resources, Daniel Greenstein and Abby Smith outline four proposed ways to combat the fragility of digital data. In short: reformatting information as technology changes; preserving data along with the platform and hardware it depends upon; developing emulation of older environments; and persistent object preservation, which involves recording all the context and properties necessary to make data persistent.  All of these approaches, however, are focused on saving one instance of a digital object, which fails to take advantage of one of the main strengths of our current digital age.

A different answer can be found in what Jordan Mechner did just after he successfully extracted that very source code that inspired his reflection: he put it online. It is far easier to make an exact copy of a megabyte of data than it is to make a perfect duplicate of a book.  And furthermore, that data can be much more easily shared than a physical book can.  Now that the Prince of Persia source code has been put online and downloaded untold times, if, ten years from now, its creator again finds himself unable to locate the original code, it is incredibly likely that someone will simply be able to e-mail it to him.  This kind of widespread duplication may terrify copyright holders, but it is a preservationist’s dream.
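The claim that an exact copy is easy to make can also be checked, since two files can be compared bit for bit by comparing their cryptographic hashes.  What follows is only a minimal sketch in Python, not anything Mechner or the CLIR report describes: the function names `sha256_of` and `copies_match` are my own, and the file it copies is a throwaway stand-in created in a temporary folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copy: Path) -> bool:
    """True when both files hash to the same digest, i.e. the copy is exact."""
    return sha256_of(original) == sha256_of(copy)


if __name__ == "__main__":
    # Hypothetical demonstration: a one-megabyte stand-in file and its copy.
    with tempfile.TemporaryDirectory() as folder:
        original = Path(folder) / "source.bin"
        duplicate = Path(folder) / "source_copy.bin"
        original.write_bytes(bytes(range(256)) * 4096)  # exactly 1 MiB
        shutil.copyfile(original, duplicate)
        print("Exact copy:", copies_match(original, duplicate))
```

Recording a digest alongside each copy means that, years later, anyone holding a downloaded copy can confirm it still matches what was originally posted, which a photocopy of a book can never quite promise.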

As digital humanists, then, our very act of making artifacts accessible digitally contributes to their preservation.  Should something happen to the archive of data we have created, chances are that the items will still exist on a drive somewhere on the Internet, thanks to their having been downloaded by an amateur historian somewhere.  Furthermore, burgeoning initiatives such as the Open Archives Initiative Protocol for Metadata Harvesting aim to facilitate data sharing among scholarly institutions, further helping to preserve data by allowing multiple copies to be hosted in different locations.

Even with data copied in multiple locations, however, we should not become complacent in thinking it safe.  Equally important is making sure that what we DO have is stored on the most current hardware possible and in contemporary formats, lest it become lost in the hieroglyphs of an archaic format, much like the Prince of Persia source code nearly was.

References:

Greenstein, Daniel and Abby Smith. “Appendix 2 Digital Preservation in the United States: Survey of Current Research, Practice, and Common Understandings.” New Model Scholarship: How Will It Survive? Council on Library and Information Resources, March 2002. Web. Accessed 22 October 2014. Weblink: http://www.clir.org/pubs/reports/pub114/appendix2.html

Mechner, Jordan. “Prince of Persia Source Code.” Jordan Mechner – Archive, 17 April 2012. Web. Accessed 22 October 2014. Weblink: http://www.jordanmechner.com/archive/#2012-04-source

“About OAI.” Open Archives Initiative. Web. Accessed 22 October 2014. Weblink: http://www.openarchives.org/OAI/OAI-organization.php

Smith, Abby. “Preservation.” A Companion to Digital Humanities. Ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004. Web. Accessed 22 October 2014. Weblink: http://www.digitalhumanities.org/companion/

Wikipedia: Prince of Persia (1989 Video Game). Last Modified 15 October 2014. Web. Accessed 22 October 2014. Weblink: http://en.wikipedia.org/wiki/Prince_of_Persia_%281989_video_game%29
