Wednesday, April 18, 2012

RAID1 [UUU] or [UU_] problem

I noticed the other day that my RAID1 array seemed to have messed itself up slightly. I still don't know the reason for this, but I have learned a small trick with mdadm.

The array should have two working devices with a hot spare and should look like this :


cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdc1[2](S) sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
     
md2 : active raid1 sdc2[2](S) sdb2[1] sda2[0]
      244091520 blocks [2/2] [UU]
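
Another quick way to see which members are active and which are hot spares is mdadm's detail view - something like this (I've left the output out here) :

mdadm --detail /dev/md1    # reports Active Devices, Spare Devices and the role of each member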


However, when I checked I had this :

Personalities : [raid1]
md1 : active raid1 sdc1[2] sdb1[1] sda1[0]
      104320 blocks [3/3] [UUU]
      
md2 : active raid1 sdc2[2](S) sdb2[1] sda2[0]
      244091520 blocks [2/2] [UU]

Somehow sdc1 had gone from a hot spare to an extra mirrored partition. I wasn't aware before that you can have as many drives mirrored as you want. What I also didn't know was how to get back to the original configuration.

I tried to fail and remove the extra drive - roughly this, assuming as here that the extra member is /dev/sdc1 in /dev/md1 :
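
mdadm /dev/md1 --fail /dev/sdc1      # mark the extra member as faulty
mdadm /dev/md1 --remove /dev/sdc1    # then remove it from the array

But afterwards the RAID just showed this :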

md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [3/2] [UU_]

Which meant the RAID just thought it had a drive missing. I checked /etc/mdadm.conf and that showed what I expected :


DEVICE partitions
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=8833ba3d:ca592541:20c7be04:42cbbdf1 spares=1
ARRAY /dev/md2 level=raid1 num-devices=2 uuid=43a5b70d:9733da5c:7dd8d970:1e476a26 spares=1
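
As an aside, mdadm can print ARRAY lines itself, which is handy for checking the file still matches reality (the exact fields vary a little between mdadm versions) :

mdadm --detail --scan    # prints one ARRAY line per array to compare against /etc/mdadm.conf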

I wondered if it had anything to do with the initrd not being sorted - my other server had a similar problem, which came down to having two IDE / PATA drives and then adding a SATA drive as a hot spare. In that case the initial ramdisk didn't know it had to load a SATA driver, so by the time it had the RAID running it was too late to add the drive to the array.

That was sorted with this :

ls /boot/initrd-*.img | sed 's,.*initrd-\(.*\)\.img,\1,' | while read initrd; do mkinitrd -f /boot/initrd-$initrd.img $initrd; done

Make sure you have a backup boot option in place in case this goes pear-shaped.
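
If you want to check the rebuilt image really did pick up the SATA driver, you can list its contents - a rough one-liner, assuming the initrd is a gzipped cpio archive as it is on the CentOS / RHEL releases that still ship mkinitrd :

zcat /boot/initrd-$(uname -r).img | cpio -it | grep -iE 'sata|ahci'    # look for the SATA modules inside the image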


I finally plucked up the courage to talk to the gurus on the linux-raid mailing list. They were very helpful and the answer was actually quite simple.

A RAID1 array can have as many active drives as you want, even though the extra mirrors are probably pointless. Having lots of active devices is logical for a RAID5 array, for instance, where each one adds capacity, but somewhat illogical for a RAID1, where each one just holds another copy of the same data.

The confusing part is that there is no 'shrink' option in mdadm - you have to 'grow' the array to shrink it back DOWN.

Neil Brown provided this explanation & solution :


You don't want that?  Change it.

   mdadm --grow /dev/md1 --raid-disks=4

now it has 4 devices - though one will be missing.

   mdadm --grow /dev/md1 --raid-disks=2

now it has 2 devices.  Actually that won't work until you mark one of the
devices as failed, so

   mdadm /dev/md1 --fail /dev/sdc1
   mdadm --grow /dev/md1 --raid-disks=2

I had already failed and removed the drive but, sure enough, growing it down to two devices did the trick. I just added the drive back to the array and all was well with the world once again.
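
For completeness, the add-back is a one-liner too - again assuming the partition in question is /dev/sdc1 :

mdadm /dev/md1 --add /dev/sdc1    # with two active devices already, it goes back in as a hot spare
cat /proc/mdstat                  # should show sdc1[2](S) again, as in the first listing

(and the same again with /dev/sdc2 for /dev/md2)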

Hope that helps someone somewhere !
