[Fwd: Re: mdadm (Linux Raid)]

From: damien hull <dhull@digitaloverload.net>
Date: Tue Mar 08 2005 - 16:52:39 AKST


What distribution are you using?

WARNING! REBUILDING A RAID SET CAN RESULT IN THE LOSS OF DATA. MAKE SURE
YOU ARE WORKING WITH THE FAULTY DRIVE AND NOT THE GOOD ONE.

I've got RAID 1 set up on Fedora Core 3, using mdadm for RAID
management.

I have the following setup:
1. /dev/md0 = /boot
2. /dev/md1 = / (everything but /boot)

To see how /dev/md1 is doing, I run the following command:
        mdadm --misc --detail /dev/md1

This gives me the following output:

        [root@server ~]# mdadm --misc --detail /dev/md1
        /dev/md1:
                Version : 00.90.01
          Creation Time : Sun Jan 23 02:07:04 2005
             Raid Level : raid1
             Array Size : 159975168 (152.56 GiB 163.81 GB)
            Device Size : 159975168 (152.56 GiB 163.81 GB)
           Raid Devices : 2
          Total Devices : 2
        Preferred Minor : 1
            Persistence : Superblock is persistent
        
            Update Time : Tue Mar 8 15:52:02 2005
                  State : clean
         Active Devices : 2
        Working Devices : 2
         Failed Devices : 0
          Spare Devices : 0
        
        
            Number   Major   Minor   RaidDevice State
               0       8        2        0      active sync   /dev/sda2
               1       8       18        1      active sync   /dev/sdb2
                   UUID : 7000452b:ab133e1d:57d95846:e879ae3d
                 Events : 0.15981
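
You can also get a quick summary of every array from /proc/mdstat. The
output looks something like this (the numbers here are taken from the
detail output above; yours will differ):

        cat /proc/mdstat
        Personalities : [raid1]
        md1 : active raid1 sdb2[1] sda2[0]
              159975168 blocks [2/2] [UU]

The [UU] at the end means both mirrors are up; a failed member shows up
as an underscore.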
        
I'm using RAID 1, so I only have two drives to worry about. If one were
to go out of sync I could remove it from the RAID set and then bring it
back online.
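
A side note before the demonstration: if a drive dies outright and has
to be physically replaced, the new disk needs the same partition layout
as the survivor before its partitions can rejoin the arrays. One common
way is to clone the partition table from the good drive; double-check
the device order, because this overwrites the partition table on the
target:

        sfdisk -d /dev/sda | sfdisk /dev/sdb
        mdadm /dev/md1 -a /dev/sdb2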

I'm going to take one drive offline. I don't recommend anyone do this
on a production server, but I need the practice.

        mdadm --manage --set-faulty /dev/md1 /dev/sdb2

The above command marks the drive as faulty. I need to do this here
because you can't remove a drive that isn't marked faulty. If this were
an actual failure, it would already be marked "faulty".
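
As a shorthand, mdadm also accepts -f (--fail), which does the same
thing as --set-faulty:

        mdadm /dev/md1 -f /dev/sdb2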

After running the above command, I can check the RAID set with:

        mdadm --misc --detail /dev/md1

The above command gives me the following output:

        /dev/md1:
                Version : 00.90.01
          Creation Time : Sun Jan 23 02:07:04 2005
             Raid Level : raid1
             Array Size : 159975168 (152.56 GiB 163.81 GB)
            Device Size : 159975168 (152.56 GiB 163.81 GB)
           Raid Devices : 2
          Total Devices : 2
        Preferred Minor : 1
            Persistence : Superblock is persistent
        
            Update Time : Tue Mar 8 16:19:42 2005
                  State : clean, degraded
         Active Devices : 1
        Working Devices : 1
         Failed Devices : 1
          Spare Devices : 0
        
        
            Number   Major   Minor   RaidDevice State
               0       8        2        0      active sync   /dev/sda2
               1       0        0       -1      removed
               2       8       18       -1      faulty   /dev/sdb2
                   UUID : 7000452b:ab133e1d:57d95846:e879ae3d
                 Events : 0.1599251

As you can see, /dev/sdb2 is marked "faulty". Check to see if yours is
reporting a faulty drive. If so, we can try to fix it by doing the
following:

1. Remove the drive from the array:
        mdadm /dev/md1 -r /dev/sdb2
2. Add the drive back:
        mdadm /dev/md1 -a /dev/sdb2
3. Check the array:
        mdadm --misc --detail /dev/md1

You should see the following output:

        /dev/md1:
                Version : 00.90.01
          Creation Time : Sun Jan 23 02:07:04 2005
             Raid Level : raid1
             Array Size : 159975168 (152.56 GiB 163.81 GB)
            Device Size : 159975168 (152.56 GiB 163.81 GB)
           Raid Devices : 2
          Total Devices : 2
        Preferred Minor : 1
            Persistence : Superblock is persistent
        
            Update Time : Tue Mar 8 16:27:58 2005
                  State : clean, degraded, recovering
         Active Devices : 1
        Working Devices : 2
         Failed Devices : 0
          Spare Devices : 1

         Rebuild Status : 0% complete
        
            Number   Major   Minor   RaidDevice State
               0       8        2        0      active sync   /dev/sda2
               1       0        0       -1      removed
               2       8       18        1      spare   /dev/sdb2
                   UUID : 7000452b:ab133e1d:57d95846:e879ae3d
                 Events : 0.1599573

As you can see, the "Rebuild Status : 0% complete" line tells us that
the array is rebuilding onto the re-added drive. (Until the rebuild
finishes, that drive is listed as a "spare".)
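
If you want to watch the rebuild in real time, /proc/mdstat shows a
progress bar and an estimated finish time while the resync runs:

        watch cat /proc/mdstat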

The short version of all this is as follows.

1. Check the status of the array:
        mdadm --misc --detail /dev/(array you want to check)
2. If one of the drives is marked "faulty", you can remove it. You
cannot remove a drive that is not marked "faulty", because it's still
in use:
        mdadm /dev/md1 -r /dev/(faulty drive you want to remove)
3. Add the drive back into the array:
        mdadm /dev/md1 -a /dev/(drive that you want to add)
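
If you find yourself checking arrays a lot, a small script saves some
typing. This is only a sketch; the array list is an example, so put
your own devices in it:

        #!/bin/sh
        # Print the overall state of each array and flag any faulty members.
        # Edit the list below to match your own arrays.
        for md in /dev/md0 /dev/md1; do
            echo "=== $md ==="
            mdadm --misc --detail "$md" | grep -E 'State :|faulty'
        done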

Let's see how my array is doing on the rebuild:
        mdadm --misc --detail /dev/md1

        /dev/md1:
                Version : 00.90.01
          Creation Time : Sun Jan 23 02:07:04 2005
             Raid Level : raid1
             Array Size : 159975168 (152.56 GiB 163.81 GB)
            Device Size : 159975168 (152.56 GiB 163.81 GB)
           Raid Devices : 2
          Total Devices : 2
        Preferred Minor : 1
            Persistence : Superblock is persistent
        
            Update Time : Tue Mar 8 16:38:23 2005
                  State : clean, degraded, recovering
         Active Devices : 1
        Working Devices : 2
         Failed Devices : 0
          Spare Devices : 1
        
        
         Rebuild Status : 15% complete
        
            Number   Major   Minor   RaidDevice State
               0       8        2        0      active sync   /dev/sda2
               1       0        0       -1      removed
               2       8       18        1      spare   /dev/sdb2
                   UUID : 7000452b:ab133e1d:57d95846:e879ae3d
                 Events : 0.1599973

I'm using 160 GB SATA drives, so it's going to take a while to rebuild.
This is why I don't recommend anyone do this on a production server. If
the good drive were to fail at this point, I would lose everything.
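
On a real production box you'd also want to hear about failures as soon
as they happen. mdadm has a monitor mode that can mail you when a drive
drops out; something like this should work (the address and delay are
just examples):

        mdadm --monitor --scan --mail=root@localhost --delay=300 --daemonise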

Hope this helps.

On Tue, 2005-03-08 at 14:28 -0900, Scott Johnson wrote:
> Anyone on here have much experience with (rebuilding) mdadm arrays?
>
> If so, please contact me. Or, does anyone know a good mailing list
> where I could maybe get some help? One of the disks on my array got
> out of sync, and I *think* I know the command I want to use to rebuild,
> but just wanted to run it past someone before I actually proceed and
> possibly screw myself (by using the wrong command).
>
> Thanks.
