
A crashed off-site RAID drive

Here are some more observations on using RAID with Mac OS X, particularly in terms of off-site storage, terminology, and upgrading. Here is a photo of my former off-site hard drive:

WD 7500 with the case opened

It had been sitting in an office desk drawer for a couple of months when the time came to cycle it back into the RAID set. But when I tried to spin it up, I was greeted by a disappointing rattle, and the drive didn't come on-line. The drive, a WD 7500 AAKS, was 14 months old when it died. In the photo above, I've removed the case cover in preparation for an autopsy.

To recap, I have a Mac Pro, which has 4 drive bays. I routinely use one bay for the system and two for a RAID set that holds the system's Time Machine backup. Here is my process for swapping out the off-site copy:

  • I shut down the Mac Pro.
  • I remove one of the hard drives from the RAID set. On the Mac Pro, I simply have to pull out the drive carrier, a metal flange that lets me plug a SATA drive directly into, or out of, its signal and power connectors.
  • I put a Post-it on the flange noting the date of removal.
  • I store the hard drive, flange and all, in an anti-static bag. Then I put the bag in a padded envelope. Since this failure, I've added a second padded envelope around the first. The package is marked "Personal Property" so that company auditors don't get confused by this non-company equipment.
  • Meanwhile, I take the former off-site drive and erase its drive header on an MS Windows system, using the Windows command-line utility diskpart. I do this because OS X will mistake a former RAID set member for a current member, and there seems to be no way to erase the drive from OS X without messing up the RAID set.
  • I install the erased drive into one of the empty Mac Pro drive bays.
  • I power up the Mac Pro. The RAID set comes on-line, marked as "Degraded" since I've removed one of the drives. The replacement drive comes up as erased.
  • I start Disk Utility and give the replacement drive a quick HFS+ format. Then I drag it into the "degraded" RAID set. I tell it to rebuild the set and confirm that, yes, it may erase the new drive (again). The rebuilding begins.
  • Recently, Disk Utility has been crashing shortly after it starts the rebuild, but I find the rebuild resumes just fine when I restart Disk Utility. The total rebuild takes several hours, and I can do other work while it proceeds.
I repeat this process every several weeks. If the house blows up, or if someone steals my computer (unlikely, since it's over a year old, and no longer worth much itself), I haven't lost everything.
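The rotation above can be modeled in a few lines to show which physical drive ends up where. This is a hypothetical sketch, not anything OS X provides: the `Rotation` class and the drive labels "A", "B", and "C" are made up for illustration. Each swap sends one mirror member off-site and brings the old off-site drive back to be rebuilt, so over time every drive takes a turn in each role.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Rotation:
    """Hypothetical model: a two-member RAID 1 set plus one drive stored off-site."""

    mirror: list[str]  # labels of the drives currently in the RAID set
    offsite: str       # label of the drive currently in the office drawer
    removed_on: dict[str, date] = field(default_factory=dict)  # the "Post-it" dates

    def swap(self, pull: str, today: date) -> None:
        """Send one mirror member off-site and bring the old off-site drive home."""
        if pull not in self.mirror:
            raise ValueError(f"{pull} is not a member of the RAID set")
        returning = self.offsite
        # Pulling a member leaves the set "Degraded" with a single drive.
        self.mirror.remove(pull)
        self.removed_on[pull] = today
        self.offsite = pull
        # The returning drive is erased and added back; the rebuild copies the
        # surviving member onto it, so its old contents are fully rewritten.
        self.removed_on.pop(returning, None)
        self.mirror.append(returning)
```

Starting from `Rotation(mirror=["A", "B"], offsite="C")`, swapping out "A" and then "B" leaves "C" and "A" in the mirror and "B" off-site; the `removed_on` dictionary plays the role of the Post-it dates.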

The Dead Off-Site Drive

So far, I'm not sure if the drive was mishandled, or if it died a 'natural death.' It's possible that someone was rifling through the desk, took out the package with the hard drive, bounced it on the floor, and put it back. On the other hand, that seems unlikely.

In any case, I've increased the amount of packaging on the off-site hard drive. Hopefully that will further reduce the risk of physical damage.

It's important to exchange drives every couple of months for two reasons:

  • If something dramatic ruins the system at home, I still have the off-site backup. So I haven't lost more than a couple months' work. Since the drive has my family library on it, I really don't want to risk losing the whole thing.
  • If the off-site drive fails (like this one did) then the periodic swap will detect the failure.
Now I'm tempted to keep a fourth drive in a safe deposit box. The only problem is that I will still need a way to verify that drive's condition. I don't want it to die in storage without the death being detected.
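Both reasons for swapping come down to one number, the time since the off-site drive was stored. It bounds how much work the off-site copy is missing, and it bounds how long a drive that dies in storage can go unnoticed. A minimal sketch, assuming a 60-day swap interval and made-up dates; the function names are hypothetical:

```python
from datetime import date, timedelta


def swap_due(removed_on: date, interval_days: int) -> date:
    """Date the off-site drive should be swapped back in, from its Post-it date."""
    return removed_on + timedelta(days=interval_days)


def is_overdue(removed_on: date, interval_days: int, today: date) -> bool:
    """True once the due date has passed."""
    return today > swap_due(removed_on, interval_days)


def days_at_risk(removed_on: date, today: date) -> int:
    """Days of work missing from the off-site copy if the home system were lost today.

    This is also how long a drive that failed right after being stored
    could have gone undetected.
    """
    return (today - removed_on).days


if __name__ == "__main__":
    removed = date(2009, 1, 10)  # example Post-it date (made up)
    today = date(2009, 3, 20)
    print(swap_due(removed, 60))           # 2009-03-11
    print(is_overdue(removed, 60, today))  # True
    print(days_at_risk(removed, today))    # 69
```

A drive in a safe deposit box would have the same exposure, only with a much longer interval between checks.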

I try to keep my entire family library on-line because it's the only way I can continuously assure its integrity. If I store things off-line, I can't assure that the off-line copies will remain readable. If the files stay on-line, every swap rewrites them in full when the RAID set rebuilds onto the replacement disk.

I used to burn DVDs to hold some archives off-line, but that's no longer reliable or practical. If I burn part of the library to DVDs, I have to check the discs periodically to confirm they're still readable. Thanks to modern camera technology, I have too many gigabytes of family mementos to save them to DVD; it's just not practical to burn all those DVDs and monitor their condition.

My current approach is not 100% foolproof. A tornado could in theory take out both the office building and our home computer, though they're far enough apart to make that unlikely. It's also possible that many files could get damaged and the damage go undetected until after I've swapped out the off-site copy. I've thought about detecting this by running Tripwire (or a variant) on a separate "archive" copy of critical files.
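The core of the Tripwire idea can be sketched as a checksum manifest: record a SHA-256 digest for every file in the archive copy once, then recompute and compare later. This uses only the Python standard library; it is not Tripwire itself, and the function names are hypothetical. Because archived files should never change, anything reported as "changed" or "missing" points to damage rather than an edit.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large videos don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files whose contents changed, disappeared, or newly appeared."""
    common = baseline.keys() & current.keys()
    return {
        "changed": sorted(k for k in common if baseline[k] != current[k]),
        "missing": sorted(baseline.keys() - current.keys()),
        "added": sorted(current.keys() - baseline.keys()),
    }
```

The baseline manifest would be saved outside the archive (for example with `json.dump`), and the comparison run before each swap, so damage is caught while the off-site drive still holds a good copy.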

Opening a WD 7500

Most hard drives I've opened have been ancient things holding a mere 2 GB or less. The 7500 is the largest and most modern drive I've tried to open.

Modern drives generally rely on bolts with "Torx" or other exotic heads to keep the case intact. The bolt holes should be obvious in the above photo. Most of the bolt heads were covered by round plastic stickers. One bolt was covered by part of the drive's paper label, and I had to scrape away that part of the label. You can usually find that bolt by feeling for an indentation under the label.

The drive had 8 or 9 Torx bolts. I used a T-8 Torx bit to remove them. Unlike older drives, this one also had a bolt over the drive's spindle, which required a 1/16-inch hex bit to remove.

The drive has 4 separate platters. I look forward to taking them apart.

RAID Terminology

A reader recently pointed out that I misused RAID terminology in an earlier RAID post. I've corrected the post. I've also taught myself a mnemonic to remember which RAID is which:

RAID stands for Redundant Array of Inexpensive Disks.

  • RAID 0 = RAID "striping," in which we interleave data across two (or more) disks. It's numbered 0 (zero) because it's not real RAID: it's not Redundant.
  • RAID 1 = RAID "mirroring," in which we write the same data to two different disks. This is numbered 1 (one) because this is the first real RAID configuration: it's a redundant array.
I don't generally deal with the higher RAID modes, because I get what I need from RAID mirroring (RAID 1).
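The mnemonic can be checked with a toy model that treats data as a list of uniquely labeled blocks. The helper names are hypothetical, and real striping works in fixed-size stripe units below the file system, but the failure behavior is the same: lose one striped disk and half the blocks are gone; lose one mirrored disk and everything is still there.

```python
def stripe(blocks: list[str], n_disks: int = 2) -> list[list[str]]:
    """RAID 0: block i lands on disk i % n_disks, so each disk holds only part of the data."""
    disks: list[list[str]] = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks


def mirror(blocks: list[str], n_disks: int = 2) -> list[list[str]]:
    """RAID 1: every disk holds a complete copy."""
    return [list(blocks) for _ in range(n_disks)]


def survives_one_failure(disks: list[list[str]], blocks: list[str]) -> bool:
    """True if every block is still on some disk after losing any single disk."""
    for failed in range(len(disks)):
        remaining = {b for j, disk in enumerate(disks) if j != failed for b in disk}
        if not set(blocks) <= remaining:
            return False
    return True


if __name__ == "__main__":
    data = ["b0", "b1", "b2", "b3"]
    print(stripe(data))                               # [['b0', 'b2'], ['b1', 'b3']]
    print(survives_one_failure(stripe(data), data))   # False
    print(survives_one_failure(mirror(data), data))   # True
```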

RAID Upgrading

The last time I upgraded my RAID configuration, I went from 250 GB drives to 750 GB. This time I'll probably go to 1.5 TB drives. I'll use an old 750 GB drive as my new system drive.
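The upgrade arithmetic follows from how mirroring works: every member holds a full copy, so a RAID 1 set is only as large as its smallest member, and a drive joining the set must be at least that large. A minimal sketch with hypothetical helper names and sizes in GB as in the text, assuming the rotation still needs three same-size drives (two members plus the off-site spare):

```python
def mirror_capacity_gb(members: list[int]) -> int:
    """Usable size of a RAID 1 set: the smallest member limits it."""
    return min(members)


def can_join(set_size_gb: int, drive_gb: int) -> bool:
    """A drive can mirror the set only if it is at least as large as the set."""
    return drive_gb >= set_size_gb


if __name__ == "__main__":
    print(mirror_capacity_gb([750, 1500]))   # 750
    print(mirror_capacity_gb([1500, 1500]))  # 1500
    print(can_join(1500, 750))               # False
```

Pairing one 1.5 TB drive with a 750 GB drive still yields 750 GB, and a 1.5 TB set cannot take an old 750 GB drive back from off-site, so under this assumption all three rotation drives move up together.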

I already owned an extra 750 GB drive. I had formatted it to use as my next system drive, but it was pressed into service in the RAID array when the off-site drive failed.

Originally I'd figured that "any old drive would do" for my RAID array. I don't do "real work" on the array - I only use it for the Time Machine backups, which take place in the background.

My system drive should be reasonably fast, so if I make a habit of recycling RAID drives as system drives, I have to buy "reasonable quality" drives for the array.

In the future I might use one "really fast" drive in the RAID array. It probably won't improve Time Machine performance, but I can use the faster drive when I upgrade my system. The slower drives get recycled and the faster drive serves as my new system drive. I might do that when I move up to 1.5 TB.
