The drives on a PERC H700 controller running RAID 5 were experiencing errors, so I copied their contents out to images using ddrescue. All drives had some bad blocks, but most (>99.98%) of the data was read successfully, and I now have a hard drive containing an image file for each of the drives that were part of the RAID array.

Now that I have images of all the drives in the RAID array, is there some way to use software (perhaps mdadm?) to access the files stored in the array? Or can that data only be accessed by using the controller that created the array?

1 Answer

Yes, with Linux MD RAID and its mdadm userspace tool you should be able to assemble the RAID, even from a PERC, which is a rebranded LSI/Avago/Broadcom MegaRAID controller. It uses the SNIA DDF on-disk format, which mdadm refers to as a user-space managed external metadata format.

You've taken the correct path by making raw images of all the drives. As a second step I strongly suggest setting up overlays, so that no changes are ever written to the images themselves, only to overlay files; that gives you an unlimited number of attempts. The procedure is described in the Linux RAID wiki page conveniently named Recovering a damaged RAID. Alternatively, you can use QEMU's qcow2 images and qemu-nbd to achieve essentially the same outcome, as I described here.
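
For reference, the overlay setup from the wiki boils down to one device-mapper snapshot per image. A minimal sketch, assuming three rescued images with hypothetical names disk0.img through disk2.img:

    # Attach each image read-only as a loop device, then stack a
    # copy-on-write snapshot on top; all writes land in the sparse
    # cow file, never in the rescued image.
    for img in disk0.img disk1.img disk2.img; do
        base=$(basename "$img" .img)
        dev=$(losetup -f --show -r "$img")      # read-only loop device
        size=$(blockdev --getsz "$dev")         # size in 512-byte sectors
        truncate -s 4G "cow-$base.img"          # sparse file holding the writes
        cow=$(losetup -f --show "cow-$base.img")
        echo "0 $size snapshot $dev $cow P 8" | dmsetup create "$base"
    done
    ls /dev/mapper/    # one overlay block device per image

To start over, dmsetup remove the mappings, detach the loop devices, and create fresh cow files.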

After setting up overlays you'll have a set of /dev/mapper/xxxY or /dev/nbdY virtual block devices. Now attempt to assemble the RAID from these devices, starting with mdadm --examine --scan. Don't hesitate to experiment; if something goes wrong, you can always re-create the overlays (or create another set) and start over. That is exactly what they are for.
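
Concretely, the first attempts might look like the following (device names follow the sketch above; with DDF metadata, mdadm usually assembles a container first and then starts the data array from inside it, so the exact sequence on your system may differ):

    mdadm --examine /dev/mapper/disk0          # should report DDF metadata
    mdadm --examine --scan                     # what mdadm can see overall
    # Assemble the DDF container from the overlays, then start the
    # subarray inside it; --readonly avoids touching the metadata yet:
    mdadm --assemble --readonly /dev/md/ddf \
        /dev/mapper/disk0 /dev/mapper/disk1 /dev/mapper/disk2
    mdadm --incremental /dev/md/ddf
    cat /proc/mdstat                           # check the result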

Don't throw away ddrescue's map (log) files. They contain important information about which blocks of your physical drives were unreadable (and were filled with zeros by ddrescue). You will need to find out which stripes of the assembled array those blocks map into, and all of those stripes need to be checked. In most of them at least one data block will be damaged, unless the unreadable block happened to hold parity. The reason is that all blocks are now readable from the images, so md sees only a parity mismatch: on repair it will recompute the parity from the data blocks, one of which is wrong, instead of using the parity to rebuild the wrong block. Traditional RAID has no way to determine which block in a stripe is wrong when all of them are readable but the parity mismatches (except for a partially degraded RAID 6), so the parity is assumed to be the offending one.
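
Mapping a bad member block to array offsets is simple arithmetic once you know the chunk size and member count (mdadm --examine reports both, along with each member's data offset). A sketch with made-up numbers, assuming 64 KiB chunks and a 3-disk RAID 5, i.e. two data chunks per stripe; ddrescuelog ships with GNU ddrescue and can list the unread sectors from a map file:

    ddrescuelog --list-blocks=- -b 512 disk0.map   # sectors ddrescue left unread

    CHUNK=$((64 * 1024))    # chunk size in bytes, from mdadm --examine
    NDATA=2                 # data chunks per stripe (members minus one parity)
    BAD=123456789           # byte offset of a bad block on one member,
                            # minus that member's data offset
    STRIPE=$((BAD / CHUNK))
    echo "stripe $STRIPE covers array bytes" \
         "$((STRIPE * CHUNK * NDATA)) .. $(((STRIPE + 1) * CHUNK * NDATA - 1))"

This works because in the standard layouts every member contributes exactly one chunk to each stripe, regardless of which member holds that stripe's parity.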

Now, to recover these blocks you may take the long and hard road of partially assembling the array without the drive where a given block was bad (with a fresh set of overlays, of course!); md will then use the other blocks and the parity to correctly reconstruct the data of the excluded drive. Since all of your drives have bad blocks, you will probably need to do this N times, each time excluding the next drive. After extracting all the damaged stripes that way, you assemble the array the usual way and dd the recovered stripes back into the assembled array, replacing the all-zero blocks with correct data.
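
A sketch of one such round, assuming the bad block was on member 1, with START and COUNT (in 512-byte sectors) computed from the stripe arithmetic above; depending on how mdadm presents the DDF metadata, the degraded assembly may need to go through the container step shown earlier:

    # Fresh overlays, then assemble degraded, leaving member 1 out;
    # --run starts the array despite the missing device:
    mdadm --assemble --run --readonly /dev/md0 \
        /dev/mapper/disk0 /dev/mapper/disk2

    # md now reconstructs member 1's chunks from data + parity, so the
    # affected stripe can be copied out intact:
    dd if=/dev/md0 of=stripe.bin bs=512 skip="$START" count="$COUNT"
    mdadm --stop /dev/md0

    # Finally, with the full array assembled read-write, put it back:
    dd if=stripe.bin of=/dev/md0 bs=512 seek="$START" conv=notrunc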
