Supporting a RAID 6 with 2 Failed Drives – Help and Advice

Here’s an interesting enquiry we received yesterday from a London company with a RAID 6 server. The company had noticed that 2 of the 7 drives in their RAID 6 array had failed and was now worried about another drive failing and data being lost. RAID 6 tolerates two drive failures, so the array had no redundancy left and a third failure would take it down. The files held on the RAID 6 machine were important archive data, and the company didn’t want to power the server down either, as they believed it would not come back up again.

Additionally, this information came from the customer:

“We don’t have any backups in place at the minute. Is [recovery] a service you are able to provide please? As an interim measure we were backing up onto external HD’s but it takes a very long time.

It’s a custom system: 7x 4TB drives in a RAID 6 array, with a LSI 9266-8i card. Windows Server 2008 r2

The array has used about 18.3TB, with around 3.48TB free.”

I got one of our technicians involved who, once he’d familiarised himself with the situation, replied:

Firstly, transferring 18.3TB of data from a double-degraded RAID 6 is quite possibly the worst thing you could do. The strain this would put on the system would be far worse than a rebuild. It is also impractical in terms of timescale: even at 100MB/s (which I highly doubt it will achieve) it will take approx 52 hours. There is also the issue of where they would store 18TB.
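As a rough check on that estimate, the arithmetic can be sketched in a few lines of Python. The enquiry does not say whether "TB" and "MB/s" are decimal or binary units, so both readings are shown here as assumptions; the function and constant names are purely illustrative.

```python
# Rough transfer-time estimate for copying the used space off the array.
# Assumption: a sustained, constant transfer rate, which a degraded
# array is unlikely to deliver in practice.

TB, MB = 10**12, 10**6      # decimal units
TiB, MiB = 2**40, 2**20     # binary units, as Windows typically reports sizes


def transfer_hours(data_bytes, rate_bytes_per_s):
    """Return the hours needed to move data_bytes at a constant rate."""
    return data_bytes / rate_bytes_per_s / 3600


decimal_hours = transfer_hours(18.3 * TB, 100 * MB)
binary_hours = transfer_hours(18.3 * TiB, 100 * MiB)
half_rate_hours = transfer_hours(18.3 * TB, 50 * MB)

print(f"decimal units: {decimal_hours:.1f} h")
print(f"binary units:  {binary_hours:.1f} h")
print(f"at 50 MB/s:    {half_rate_hours:.1f} h")
```

Both readings land within a couple of hours of the technician's approximate figure of 52 hours, and halving the sustained rate doubles the time.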

My personal advice for them, if the data is that important, is to power it down and clone the disks, which will act as a backup, then attempt the rebuild. Doing this will be quick and will allow any drives with bad sectors to be identified before the rebuild procedure. If they cannot take the system down, I think the rebuild will be the safest option.

As a service I would offer to take the server down, clone the drives (creating a backup) before a rebuild to make sure they do not compromise the data, but if it needs to be live, then they have very limited options.
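To illustrate why cloning first helps to identify drives with bad sectors, here is a minimal sketch of a block-by-block copy that records which blocks could not be read. It is not the tooling we use in the lab. The names `clone_image` and `read_block` are hypothetical, and the sketch assumes the source is an ordinary image file whose size can be read with `os.path.getsize`; a raw disk device would need its size obtained another way.

```python
import os


def read_block(f, offset, length):
    """Read `length` bytes at `offset`; an unreadable region raises OSError."""
    f.seek(offset)
    return f.read(length)


def clone_image(src_path, dst_path, block_size=1024 * 1024, reader=read_block):
    """Copy src to dst block by block and return the offsets that failed.

    Failed blocks are written as zeros so that every later block keeps
    its original offset in the copy.
    """
    bad_offsets = []
    size = os.path.getsize(src_path)  # assumption: a regular file, not a device
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for offset in range(0, size, block_size):
            length = min(block_size, size - offset)
            try:
                data = reader(src, offset, length)
            except OSError:
                data = b""
                bad_offsets.append(offset)
            # Pad failed or short reads with zeros to preserve alignment.
            dst.write(data.ljust(length, b"\x00"))
    return bad_offsets
```

A drive that returns a long list of failed offsets is a poor candidate to rely on during the rebuild, and that is the kind of information the cloning pass gives you before any risk is taken with the live array.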

After consulting with the customer, the following process was applied:

Stage 1 – Imaging and Duplicating Data. Creating a safe and secure backup

1. Arrive onsite
2. Safely shut the server down
3. Remove hard drives (and RAID card if necessary)
4. Return to Data Clinic to copy each drive and preserve data. (We recommend this is done at our offices as we have tools which will copy the data from the drives considerably quicker than those available to you).

At this point we will have safely created an exact duplicate of your data.

Estimated time: 24 hours

Stage 2 – Introducing two new drives, rebuilding the server and bringing the system back online

5. Arrive back on site
6. Introduce the two new drives into the server and rebuild the array
7. Bring system back online

Estimated time: 24 hours (Due to the amount of data, the rebuild will take considerable time)
