Re: Software RAID FAILED!

From: Damien Hull <dhull@digitaloverload.net>
Date: Mon Apr 24 2006 - 22:44:43 AKDT

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm going to set up a server on my test network.

1. Super Micro 1u
2. SATA drives
3. default install of CentOS 4.3

I'll swap the drives around and see if I can boot from each one.
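
Before pulling any cables, a rough sanity check along these lines might
save a reboot or two. It is only a sketch: it reads the first sector of
each disk and checks for the 0x55AA boot signature plus a non-empty boot
code area. That is necessary for a BIOS to boot the disk but does not
prove the bootloader is set up correctly. The function names are made up
for illustration, and the device paths you pass in (for example /dev/sda
and /dev/sdb) are assumptions about how the SATA drives show up.

```python
import sys

MBR_SIZE = 512             # one sector
BOOT_CODE_SIZE = 446       # boot code area at the start of the MBR
BOOT_SIGNATURE = b"\x55\xaa"  # bytes 510-511 of a bootable MBR


def mbr_looks_bootable(sector: bytes) -> bool:
    """Return True if the sector has the boot signature and some boot code.

    This is a necessary condition only; it does not verify that the
    bootloader can actually find its later stages or the kernel.
    """
    if len(sector) < MBR_SIZE:
        return False
    has_signature = sector[510:512] == BOOT_SIGNATURE
    has_boot_code = any(sector[:BOOT_CODE_SIZE])  # not all zero bytes
    return has_signature and has_boot_code


def read_mbr(device_path: str) -> bytes:
    """Read the first sector of a block device (needs root)."""
    with open(device_path, "rb") as dev:
        return dev.read(MBR_SIZE)


if __name__ == "__main__":
    # Device paths come from the command line; nothing is assumed here.
    for path in sys.argv[1:]:
        verdict = "ok" if mbr_looks_bootable(read_mbr(path)) else "NOT bootable"
        print(f"{path}: {verdict}")
```

If the second disk fails this check, the bootloader was most likely only
installed on the first one. The real test is still to boot with the
primary drive unplugged.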

Matthew Schumacher wrote:
> Damien Hull wrote:
>
>>Last week I had a software RAID 1 set fail. It's working now but I am
>>wondering what went wrong. I also want to know why I was unable to reboot.
>
>
>>QUESTIONS
>>1. Why did it fail ( drives seem to be fine )
>>2. Why didn't the system reboot from /dev/md0
>>3. Shouldn't one drive still work after something goes wrong?
>>
>
>
> 1. I've seen this in both hardware and software raids. If the drive
> fails to respond to a command for whatever reason then it is marked bad
> even though it may be working fine. If this is a common problem I would
> take a look at how hot the drives get. If you have a heat issue then
> you might see intermittent problems like this.
>
> 2. In order for the machine to mount root on a raid 1 mirror a couple
> of things need to happen:
>
> A. The BIOS needs to be set to try the second disk, and the first disk
> needs to fail badly enough for the BIOS to skip it.
>
> B. The bootloader needs to be installed in the MBR or in the active
> partition of both drives. In order for this to work you need to
> manually install the bootloader on each drive, or use a bootloader that
> understands what /dev/md0 is and discovers where to put the actual boot
> code.
>
> C. The kernel needs to be able to detect the volumes and start the
> raid 1 virtual device (/dev/md0) before it goes to mount root in ro
> mode. This may require an initrd image and pivot_root on some systems
> depending on whether you're using evms or just md.
>
> 3. Just because one of the drives works doesn't mean it will boot.
> There are a lot of steps to making raid 1 work on the root filesystem
> correctly.
>
> I recommend that you do some testing on a lab machine so that you know
> your config is correct. Whenever I build a system with raid1 in
> software I always shut it down, then unplug the primary disk and start
> it back up to confirm that the second disk is configured to boot correctly.
>
> schu
> ---------
> To unsubscribe, send email to <aklug-request@aklug.org>
> with 'unsubscribe' in the message body.
>
>
>

- --
You can get my public PGP key at https://keyserver.pgp.com
http://www.digitaloverload.net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.7 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFETcVb+rNhalK/8UURAlyxAJ4u3lTam3tBIHlUu4cMcx+Lb+GKxACfad4o
8iwNYXb3e/ucRjfqMqRBn3M=
=tkwW
-----END PGP SIGNATURE-----
Received on Mon Apr 24 22:45:20 2006

This archive was generated by hypermail 2.1.8 : Mon Apr 24 2006 - 22:45:20 AKDT