Re: Software RAID FAILED!

From: Matthew Schumacher <schu@schu.net>
Date: Mon Apr 24 2006 - 13:37:03 AKDT

Damien Hull wrote:
> Last week I had a software RAID 1 set fail. It's working now but I am
> wondering what went wrong. I also want to know why I was unable to reboot.

> QUESTIONS
> 1. Why did it fail ( drives seem to be fine )
> 2. Why didn't the system reboot from /dev/md0
> 3. Shouldn't one drive still work after something goes wrong?
>

1. I've seen this in both hardware and software RAIDs. If a drive
fails to respond to a command for whatever reason, it gets marked bad
even though it may be working fine. If this happens often, I would
take a look at how hot the drives get. A heat issue can cause
intermittent problems like this.

2. For the machine to boot and mount root from a RAID 1 mirror when one
disk has failed, several things need to happen:

  A. The BIOS boot order needs to include the second disk, and the first
disk has to fail badly enough that the BIOS skips it and moves on.

  B. The bootloader needs to be installed in the MBR or in the active
partition of both drives. To get there, you either install the
bootloader on each drive by hand, or use a bootloader that understands
what /dev/md0 is and works out where to put the actual boot code.

  C. The kernel needs to be able to detect the member volumes and start
the RAID 1 virtual device (/dev/md0) before it mounts root read-only.
On some systems this requires an initrd image and pivot_root, depending
on whether you're using EVMS or plain md.

3. Just because one of the drives still works doesn't mean the system
will boot from it; there are a lot of steps to getting RAID 1 on the
root filesystem working correctly.
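To see which member md has kicked out of the mirror (point 1) and
whether the array is running degraded, /proc/mdstat is the place to
look. Below is a minimal Python sketch, assuming the usual mdstat layout
where a failed member carries an "(F)" suffix and the status field looks
like [UU] (healthy) or [_U] (one slot down). The function names and the
sample text are made up for illustration, and odd states such as
"active (auto-read-only)" or inactive arrays are not handled.

```python
import re

# Sample /proc/mdstat contents (illustrative only, not from this thread):
# sda1 has been kicked out of the mirror and is flagged (F).
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      104320 blocks [2/1] [_U]

unused devices: <none>
"""

ARRAY_LINE = re.compile(r"^(md\d+)\s*:\s*(\S+)\s+(\S+)\s+(.*)$")
STATUS = re.compile(r"\[([U_]+)\]")


def parse_mdstat(text):
    """Map each md array to its state, RAID level, members, failed members and status string."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_LINE.match(line)
        if m:
            name, state, level, rest = m.groups()
            members, failed = [], []
            for tok in rest.split():
                dev = re.sub(r"\[\d+\].*$", "", tok)  # "sda1[0](F)" -> "sda1"
                members.append(dev)
                if tok.endswith("(F)"):
                    failed.append(dev)
            current = {"state": state, "level": level, "members": members,
                       "failed": failed, "status": None}
            arrays[name] = current
        elif current is not None and current["status"] is None:
            s = STATUS.search(line)
            if s:
                current["status"] = s.group(1)  # "_" marks a missing or failed slot
    return arrays


def degraded(info):
    """True if any slot in the [UU]-style status string is down."""
    return "_" in (info["status"] or "")


if __name__ == "__main__":
    # On a live system you would pass open("/proc/mdstat").read() instead of SAMPLE.
    for name, info in parse_mdstat(SAMPLE).items():
        if degraded(info):
            print(f"{name} is degraded ({info['status']}); failed: {info['failed']}")
```

A drive that is flagged (F) but tests fine afterwards fits the "didn't
answer in time" case from point 1 rather than a dead disk.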

I recommend that you do some testing on a lab machine so that you know
your config is correct. Whenever I build a system with software RAID 1,
I always shut it down, unplug the primary disk, and start it back up to
confirm that the second disk is configured to boot correctly.
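Before pulling the primary disk, a quick sanity check for point 2B is
whether each mirror member has anything in its MBR boot code area. This
is a heuristic Python sketch, assuming a classic MBR layout (446 bytes
of boot code, then the partition table, then the 0x55 0xAA signature at
offsets 510-511). It only shows that some boot code is present, not that
it is the right bootloader or that it will find root on the surviving
disk, so the unplug-and-boot test is still the real proof. Function
names are hypothetical, and reading raw devices such as /dev/sda
usually needs root.

```python
import sys

SECTOR_SIZE = 512
BOOT_CODE_LEN = 446          # classic MBR: bytes 0..445 hold boot code
SIGNATURE = b"\x55\xaa"      # boot signature at offsets 510-511


def inspect_mbr(sector):
    """Heuristic check of a disk's first sector; does not identify which bootloader it is."""
    if len(sector) < SECTOR_SIZE:
        return {"signature": False, "boot_code": False}
    return {
        "signature": sector[510:512] == SIGNATURE,
        "boot_code": any(sector[:BOOT_CODE_LEN]),  # all zero bytes => nothing installed
    }


def inspect_device(path):
    """Read the first sector of a disk or disk image and inspect it."""
    with open(path, "rb") as f:
        return inspect_mbr(f.read(SECTOR_SIZE))


if __name__ == "__main__":
    # Pass the whole-disk devices of the mirror, e.g. /dev/sda /dev/sdb (hypothetical names).
    for dev in sys.argv[1:]:
        print(dev, inspect_device(dev))
```

If one disk shows boot_code False while the other shows True, the
bootloader was only installed on one drive, which is exactly the case
where the box won't come back up after the first disk dies.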

schu
Received on Mon Apr 24 13:37:25 2006