This is a talk I gave at the Geek Night in Oxford. Complete Slides of the talk.
A summary of the talk.
With photographs and the written word on paper, backup/storage is passive: barring physical damage, the content will be readable for years or even hundreds of years. With digital media, backup/storage is active: if you don't keep testing it, duplicating it and moving it forward to new media formats, your content will become unusable over time. Compare a box of photos and notes left in the loft for 50 years to a stack of CD-Rs. Will the discs be readable, and will you even have a drive to read them? Chances are the paper will be fine. This is a problem for all of us as time passes and we collect more content in digital format.
The first dark age began around 410 AD, after the collapse of the Roman Empire. The period lasted until shortly after 1000 AD, and historically very little is known about it.
I think we are creating a new dark age. Not like the first one, but a dark age for family social history, caused by a loss of data.
I've been thinking about this a lot recently, as we now have a daughter. Taking photographs is now part of recording the family history for my daughter's future generations. Currently I have around 8000 photos stored in iPhoto, all tagged with descriptions, and all of that is stored in its proprietary database, hmmm.
This is a recent phenomenon: the problem is occurring now, and its effects will not be seen until the future. Photography has been around since about 1850, yet it's only since 2000 that digital cameras have been in common use.
The two big issues are backups and where the meta-data is stored.
We have a photo of my daughter's great great great grandmother; the picture is over 123 years old. All being well, will my daughter's great great great grandchildren, in 123 years' time, be able to view the photographs we have taken now? The work to make sure this is possible will be much more than just putting photographs into a box like our ancestors did. It's this 'active' management that is going to be the cause of the 'Dark Ages 2.0'.
Since photographs were invented around 1850, 'backups' have been easy: just chuck the negatives and photos in a box. This is what most families have done and it's worked well for generations.
Now we have digital backups to carry out. There are many media options for storing backups and over time these degrade and fall out of use. We have to continually recreate backups, test them and move them forward as new media formats are developed - a lot of 'active' management.
Just imagine a hard drive got 'stuck in the loft' like a box of photos might. After 50 years the photos would be fine. The hard drive? Even if it would spin up, would the data on the disk be readable? And with a USB connection, finding a computer to plug it into might prove difficult. 50 years ago personal computers didn't exist. The progress in the next 50 years will make it very difficult to read media from this era. I found a few old copies of computer magazines from around the mid 90s in the back of a cupboard, with floppy disks on the cover. I can still read the magazines, but I don't even own a 3.5" floppy drive any more to be able to read the disks.
This is all something that we as technical people understand, but even we know we probably don't back up enough. So what chance does an 'average person on the street' stand of keeping on top of this? I personally know of non-tech friends who have lost photos, some of very important family history moments: weddings, babies, loved ones no longer with us. It's the fact that digital backups are an 'active' process that causes the problem.
Previously backing up/storage was a passive process: once things were put in a box it was done, and they would always come out in roughly the same condition they went in. This passive process has meant that family photos have survived down the generations. Now that the process is 'active', how much will survive over time?
There are two options for where to store meta-data:
- In the image file
- In a separate database
Looking at our old photographs as an example of what to do - if we're lucky, somebody wrote the date the photograph was taken and who is in it straight on the back of the photograph. This has stood the test of time well, as it's commonly the only way we know who features in the photographs.
Now imagine that instead of writing the meta-data on the back of the photograph they wrote a number. Then in a separate notebook next to that number they wrote all of the meta-data. Over time the photos and the notebook must always be kept together. What happens if the photos get shared out to different family members? Would the notebook get split up or copied (by hand, unless it's in the last 30 years)? Also, if the notebook got lost then all of the meta-data for all of the photos would be lost forever. Overall it doesn't sound like a very good solution, and thankfully most people just wrote on the back of the photographs.
Many photo software packages do store the meta-data exactly as described above, in a separate proprietary database. iPhoto is one such program. When backing up photographs the database needs to be backed up as well. They must always be kept together and in sync. iPhoto provides options for this, but in 50 years' time would iPhoto 59 be able to restore a backup from iPhoto 9 with all the meta-data intact? What other program would be able to load this proprietary database and extract the meta-data? How much work would be required to obtain this data? And if this one database file becomes corrupt, then all of the meta-data for every photograph will be lost.
The other option is to store the meta-data within the image file, jpg or RAW (though for archiving it is preferable to convert your RAW images to DNG, as this open standard stands a much greater chance of continued support in the future). As the old adage goes, in computing we like standards - that's why we have SO many of them. And so it is with image meta-data, with the following standards: EXIF, XMP, IPTC & MakerNotes. By using a combination of these standards it is possible to store all of the meta-data required about images directly within the image files.
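As a rough illustration of what 'within the image file' means for a jpg, here is a minimal sketch that walks the segments at the start of a JPEG and reports which meta-data blocks it carries. It assumes the usual conventions (EXIF and XMP in APP1 segments, IPTC inside a Photoshop resource block in APP13); the function name `embedded_metadata` is made up for this example, and it only detects the blocks rather than decoding them.

```python
import struct

# Signatures that conventionally identify meta-data payloads in JPEG APPn segments.
SIGNATURES = {
    (0xE1, b"Exif\x00\x00"): "EXIF",
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00"): "XMP",
    (0xED, b"Photoshop 3.0\x00"): "IPTC",  # IPTC usually lives in this resource block
}


def embedded_metadata(data: bytes) -> set[str]:
    """Return the names of the meta-data blocks found in the JPEG header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing start-of-image marker)")
    found = set()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # lost sync with the segment structure, stop scanning
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):
            break  # start of image data or end of image: no more headers
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        for (seg_marker, prefix), name in SIGNATURES.items():
            if marker == seg_marker and payload.startswith(prefix):
                found.add(name)
        pos += 2 + length
    return found
```

Calling `embedded_metadata(open("photo.jpg", "rb").read())` on a tagged image (the filename is a placeholder) would show at a glance whether the descriptions travel with the file or live only in some external database.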
MakerNotes throws a bit of a spanner in the works, as its internal contents are not a defined standard but a 'binary blob' of data that records all of the camera details when the photo was taken: the ISO, lens type, shutter speed etc. Unfortunately there is no standard for this data and each manufacturer has come up with their own format, and not all of them are documented. If that wasn't bad enough, the data has absolute references within it, so changes to the rest of the meta-data can corrupt the MakerNotes. Picasa is one such program that helpfully stores all tags and descriptions in the EXIF/XMP headers but unfortunately does not understand MakerNotes and can therefore corrupt this information in your image files.
Two programs that do handle all meta-data within the image files and correctly handle the MakerNotes are DigiKam & Adobe Lightroom 2. DigiKam is open-source and works on Linux, Windows and Mac OS X.
I think that for the storage of photographs, a set of plain file system folders holding images in jpg or dng format, with all meta-data contained within the images, will be the most resilient archival system and the one most likely to endure into the future.
Even keeping to this very simple storage structure, frequent testing, duplication and transfer across media formats will be required to maintain the archives for future generations.
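The 'frequent testing' part can be partly automated. The following is a minimal sketch, not a complete backup tool: it records a SHA-256 checksum for every file under an archive folder and later reports any file that has gone missing or whose contents have changed. The function names are invented for this example, and how the manifest itself is stored (for instance as a text file kept alongside each copy) is left open.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """List every file from the manifest that is missing or has changed."""
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {name}")
    return problems
```

Building the manifest once from the master archive and running `verify` against each duplicate, and again after every move to new media, gives an early warning of silent corruption while a good copy still exists. Note that a deliberate edit, such as adding a description to an image, also changes its checksum, so the manifest needs rebuilding after tagging.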
What's your strategy?
Since I gave this talk, an article along similar lines has been published in American Scientist, called Avoiding a Digital Dark Age, which is also worth a read.