Maybe some of you have migrated your collections to hard disk drives (HDDs). You may have had a collection of DVDs and CDs, copied them onto an HDD, gotten rid of all the physical discs and are quite happy about it. A 1TB hard disk costs less than a DVD box set these days and can hold hundreds of movies. Managing directories and folders is much easier and faster than going through a clutter of discs. I do have some movies on HDD and find it quite useful and easy. But when it comes to trusting HDDs, it’s a different story.

There is a metric called MTBF, which stands for Mean Time Between Failures. It has been a standard for many advanced electrical and mechanical components, and the HDD industry uses it to express the reliability of its products. MTBF is a very confusing metric. For example, a particular product boasts an MTBF of 1 million hours. Does this mean that when it recovers from one failure, the next failure will occur one million hours later, that is, 114 years later? Not quite. It only means that the manufacturer tested, say, 3,000 drives of this series for a month and observed a total of 2 failures: roughly 2 million drive-hours divided by 2 failures. After all, nobody can test a hard disk drive for 114 years.
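To make the arithmetic concrete, here is a small sketch of how such a figure could come about. It uses the illustrative numbers from the paragraph above (3,000 drives, one month, 2 failures), not real test data from any manufacturer. The function names are my own, and the yearly failure chance assumes a constant failure rate, which real drives do not strictly follow.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def mtbf_hours(drives, hours_each, failures):
    """Total drive-hours accumulated in the test, divided by failures seen."""
    return drives * hours_each / failures


def yearly_failure_chance(mtbf):
    """Chance that one drive fails within a year.

    Assumption: the failure rate is constant over time
    (an exponential lifetime model), which is a simplification.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf)


# Illustrative numbers only: 3,000 drives run for 30 days, 2 failures.
mtbf = mtbf_hours(drives=3000, hours_each=30 * 24, failures=2)

print(f"MTBF: {mtbf:,.0f} hours")
print(f"That is about {mtbf / HOURS_PER_YEAR:.0f} 'years'")
print(f"Chance a given drive fails within a year: {yearly_failure_chance(mtbf):.2%}")
```

Read this way, a million-hour MTBF says something about a large population of drives over a short test period: a bit under one drive in a hundred would be expected to fail per year. It says nothing about a single drive lasting more than a century.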

Of course, many of you know of the “HDD crash”. An HDD is not only an electrical device but also a mechanical one, and quite a sophisticated one. There is a read/write head and a disk. The head writes and reads data on the spinning disk without physically touching its surface. But to read and write precisely, it has to fly at an extremely low altitude, roughly 10nm (1 nanometer = 0.000001 mm) above the disk. If the head bumps into the disk, it causes scratches and defects, which can make the HDD unreadable.

How small is this gap of 10nm? If you scale the read/write head up to the size of a Boeing 747, it is like the plane flying 1mm above the ground. That’s 10nm. If your HDD crashes, you curse it. But nobody would expect a Boeing 747 flying 1mm above the ground for thousands of hours to stay scratch-free. It is an amazing technology, but people take it for granted.

That is why you need a backup for data on HDDs. All the data centers in the world have data backup strategies in place. They use mirroring, instant copying (snapshots), RAID, multiple copies, data distribution across continents, backup tapes and other technologies to make sure your e-mail, bank accounts, blogs, online photos and other critical information stay intact. There is a quite interesting technical article by Google engineers on HDD failures. They tracked large numbers of HDDs in their servers, monitoring how errors increased and how often drives were retired. It is quite fascinating but also scary, because so many HDDs are retired in less than a year.
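For a personal collection, the “multiple copies” idea can be as simple as copying each file to more than one drive and checking that every copy really matches the original. The following is a minimal sketch of that idea, not a description of what any data center does. The function names and the example paths are hypothetical; only standard Python library calls are used.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, backup_dirs):
    """Copy one file into each backup directory and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for backup_dir in backup_dirs:
        backup_dir = Path(backup_dir)
        backup_dir.mkdir(parents=True, exist_ok=True)
        target = backup_dir / source.name
        shutil.copy2(source, target)  # copies the data and the timestamps
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies


# Hypothetical usage, with each directory on a different physical drive:
# copy_and_verify("movies/metropolis.mkv", ["/mnt/backup_a", "/mnt/backup_b"])
```

The copies only help if they live on different physical drives, and preferably in different places. Two folders on the same HDD will be lost together.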

The problem with an HDD crash is not loss of data. It is total loss of data. Even if it’s a small crash in one isolated area, in many cases you cannot read any data from the drive, not one file. So if you store your favorite films on an HDD, a crash will destroy the whole library. Not one bit will remain. This is completely different from physical film. If you damage a part of a film, you can still cut the damaged part out, splice the rest together, and watch it with a slight jump in the frames.

This is why I was somewhat concerned when I saw the video clip above last year. The Russian film archive, Gosfilmofond, had arranged the transfer of early American films, not previously available, to digital format and sent the first ten of them to the Library of Congress… on HDDs. When we are talking about film buffs’ personal collections on HDDs, a crash would only bring personal dismay and anguish, which is not that serious. But when it is a national treasure handled by the Library of Congress, it is a different story. I strongly believe they must have copied the HDDs onto other HDDs or magnetic tapes immediately afterwards and stored the copies in several different locations. If one of the original HDDs were to crash, there should be another to replace it. For better or worse, this practice of digital film archiving will become standard for many institutions and libraries, since it saves a lot of floor space, increases accessibility and seems to lessen the management burden. It seems more “advanced” than rows of rusting film cans, but in three to five years they will have to deal with decaying HDD components and migration issues.

And yet, there are more complications in digital data archiving than the physical degradation of storage media: issues of migration, the economics of retention, digital restoration and so on. I will revisit them from time to time here on this blog.