Recovering Data from a Failed Dell MD1000 RAID 5 Server
- Dell MD1000 RAID 5
- 15 x 2TB Seagate Barracuda XT drives
- 1 drive is an unused hot spare
- Windows Server
Several drives failed simultaneously, taking the array offline and crashing the server. Restarting the server made no difference: the failed drives remained offline and the data was inaccessible.
During the initial consultation we asked the client what attempts (if any) had been made to get the system working again. They replied: “In attempts to bring the data back online we attempted to import a foreign config to the server, which failed. The unit was power cycled a couple of times as well, out of desperation, which was probably a mistake as I think it’s power-cycling that knocked the drives out initially.”
Importing a foreign config into a failed server is extremely risky and is best avoided. When crucial company data goes offline, though, panic can easily set in and lead to strange decisions about how to bring a crashed server back online. It is fortunate that the import failed: had it succeeded, the server’s config data would have been overwritten by completely foreign data, which would have been disastrous.
Power cycling the hard drives is not a good idea either, especially if the server is mid-boot, because the drives are being accessed continuously at that point.
We have a useful web page describing the actions to avoid when a server crashes: http://www.dataclinic.co.uk/raid-or-server-failure-the-top-5-things-to-avoid/
Analysis showed that the server was set up in a RAID 5 configuration and that all hard drives were electronically and mechanically sound. Analysis of the individual drives revealed that some had bad media problems that couldn’t be repaired. The server’s RAID 5 configuration data had also become corrupted.
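For readers unfamiliar with RAID 5, the short Python sketch below illustrates the general parity principle that makes an unreadable block on one drive rebuildable from the other drives. It is a toy illustration only, not the tooling used in this case: the function names and the 4-byte sample blocks are invented for the example, and a real array also rotates the parity block across drives, which is ignored here.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rebuild_missing(stripe):
    """Rebuild the single missing block (marked None) in one stripe.

    `stripe` holds the data blocks plus the parity block for one stripe.
    RAID 5 can only tolerate one missing block per stripe.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("exactly one block per stripe must be missing")
    return xor_blocks([block for block in stripe if block is not None])


# Hypothetical three-data-drive stripe (sample contents are made up).
data = [b"ACCT", b"2011", b"DATA"]
parity = xor_blocks(data)           # what the controller would store
stripe = data + [parity]

# Simulate bad media on the second drive for this stripe.
damaged = list(stripe)
damaged[1] = None
recovered = rebuild_missing(damaged)
```

The same property gives a quick consistency check: XOR-ing every block of a healthy stripe, parity included, yields all zero bytes. If two or more drives have bad media at the same stripe, parity alone cannot rebuild the data, which is why reading as much as possible from every drive matters.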
To recover the RAID data it was therefore necessary to make working images of all the hard drives, using specialist recovery hardware that could read data from the bad areas of the drives with bad media. Once imaging was complete, the next step was to work out the correct RAID configuration, after which we could determine whether the data was still intact or had become corrupted. After a Data Clinic tweak or three we were able to successfully recover all of the customer’s data.
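As a rough illustration of the imaging step, the Python sketch below copies a source to an image block by block, zero-fills any block that cannot be read, and records where those blocks are so they can be revisited later. This is a software-only simplification and an assumption on our part for the sake of example: it is not the recovery hardware described above, which can read data from bad areas rather than skip them. `image_device`, `FlakySource` and the block size are hypothetical names and values.

```python
import io


def image_device(src, dst, total_size, block_size=4096):
    """Copy `src` to `dst` block by block.

    Unreadable blocks are zero-filled in the image so every offset stays
    aligned with the original drive. Returns a list of (offset, length)
    tuples describing the blocks that could not be read.
    """
    bad_regions = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            chunk = src.read(length)
            if len(chunk) != length:
                raise OSError("short read")
        except OSError:
            chunk = bytes(length)               # placeholder zeros
            bad_regions.append((offset, length))
        dst.seek(offset)
        dst.write(chunk)
        offset += length
    return bad_regions


class FlakySource(io.BytesIO):
    """Hypothetical stand-in for a drive with one unreadable region."""

    def __init__(self, data, bad_start, bad_end):
        super().__init__(data)
        self.bad_start = bad_start
        self.bad_end = bad_end

    def read(self, size=-1):
        start = self.tell()
        if size is None or size < 0:
            end = len(self.getvalue())
        else:
            end = start + size
        # Fail any read that overlaps the simulated bad region.
        if start < self.bad_end and end > self.bad_start:
            raise OSError("simulated unreadable sector")
        return super().read(size)


# Simulated 1024-byte "drive" with bad media at bytes 300-309.
original = bytes(range(256)) * 4
source = FlakySource(original, bad_start=300, bad_end=310)
image = io.BytesIO()
bad_map = image_device(source, image, total_size=len(original), block_size=256)
```

Working from images rather than the original drives means the RAID configuration can be tested repeatedly without putting further stress on failing media, and the list of bad regions shows exactly which parts of the image cannot yet be trusted.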