[gentoo-user] Oh no! My raid5 is not assembled, and how I fixed it.

June 09th, 2012 - 01:10 am ET by Paul Hartman
I rebooted to upgrade to kernel 3.4.1. I accidentally had the
combination of uvesafb, nouveau kms and nvidia-drivers enabled, which
caused my system to go blank after rebooting. I was not able to SSH
into the machine, so I did the magic-sysrq REISUB to reboot into my
previous kernel. When it booted into the previous kernel (3.3.5), I
saw a whole bunch of "I/O error" messages scrolling by, for every disk
in my RAID array. I have never seen these errors before. I hoped it
was just some module confusion because I was booting a different
kernel. I was able to boot into my root filesystem, but the raid did
not assemble. After blacklisting nouveau and rebooting into 3.4.1,
there were none of the I/O errors mentioned, but mdraid failed with
this message:

* Starting up RAID devices ...
* mdadm main: failed to get exclusive lock on mapfile
mdadm: /dev/md2 is already in use.
mdadm: /dev/md1 is already in use.
[ !! ]

Oh no! Heart beating quickly... terabytes of data... Google finds
nothing useful with these messages.

My mdadm.conf has not changed, no physical disks have been added or
removed in over a year. mdadm configuration has not changed at all. I
have of course updated hundreds of packages since my last reboot,
including mdadm.

/proc/mdstat shows that it is not detecting all of the member
disks/partitions:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : inactive sdb1[0](S)
1048575868 blocks super 1.1

md2 : inactive sdf2[5](S)
904938415 blocks super 1.1

unused devices: <none>


Those arrays normally include partitions 1 and 2 from every disk, sdb
through sdf.
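
As a quick way to spot this state, here is a minimal sketch that lists the arrays /proc/mdstat reports as inactive. The function name inactive_arrays is made up for illustration, and it assumes the "mdN : inactive ..." line layout shown in the output above.

```shell
# List md arrays that the given mdstat file reports as inactive.
# Takes an optional file path (defaults to /proc/mdstat) so it can
# also be run against saved output.
inactive_arrays() {
    # Array lines look like "md1 : inactive sdb1[0](S)":
    # field 1 is the array name, field 3 is its state.
    awk '$2 == ":" && $3 == "inactive" { print $1 }' "${1:-/proc/mdstat}"
}

# Usage: print inactive arrays on the running system, if any.
if [ -r /proc/mdstat ]; then
    inactive_arrays
fi
```

On a healthy system this should print nothing; with the output above it would be expected to name md1 and md2.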

My mdadm.conf has always had only two ARRAY lines (for /dev/md1 and
/dev/md2) with the UUID of the arrays. Previously the member disks
were always automatically detected and assembled when I booted and
started mdadm. Running mdadm --query --examine on the partitions
showed they did still contain the valid raid information. So I felt
confident in trying to reassemble it.
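
For anyone wanting to do the same check across all members at once, a rough sketch follows. The function name summarize_members and the MDADM override variable are hypothetical, and the field labels in the grep pattern are the ones mdadm prints for version 1.x superblocks as far as I know; adjust them if your output differs.

```shell
# Command used to read superblocks; overridable for dry runs.
# (The MDADM variable is an assumption of this sketch, not an mdadm feature.)
MDADM=${MDADM:-mdadm}

# Print a short superblock summary for each partition given as an argument.
summarize_members() {
    for part in "$@"; do
        printf '%s\n' "== $part"
        # Keep only the lines that identify the array and the member's state.
        "$MDADM" --examine "$part" 2>&1 |
            grep -E 'Array UUID|Device Role|Events|Array State'
    done
}

# Example (needs root and the real devices):
#   summarize_members /dev/sd[bcdef][12]
```

If every member of an array reports the same Array UUID and a matching (or nearly matching) Events count, that is a reasonable sign the superblocks are intact.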

To fix, I did:

/etc/init.d/mdraid stop

to stop the arrays (I could also have run "mdadm -Ss", which is what
the stop script does).

Then I edited mdadm.conf and added a DEVICE line:

DEVICE /dev/sd[bcdef][12]

So now I am telling it specifically where to look. I then restarted mdraid:

/etc/init.d/mdraid start

Et voilà! My raid was back and functioning. I don't know if this is a
result of a change in kernel or mdadm behavior, or simply a result of
my REISUB leaving the raid in a strange state.
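
To see which device names a pattern like the DEVICE line above would cover, here is a small sketch. It relies on the shell's own pattern matching, which supports the same bracket expressions; the function name matches_device_line is made up for illustration and the pattern is hard-coded.

```shell
# Return success if a device path matches the pattern from the DEVICE line.
matches_device_line() {
    case "$1" in
        /dev/sd[bcdef][12]) return 0 ;;
        *) return 1 ;;
    esac
}

# Try a few names: sda is not in [bcdef], and partition 3 is not in [12].
for dev in /dev/sda1 /dev/sdb1 /dev/sdf2 /dev/sdb3; do
    if matches_device_line "$dev"; then
        echo "$dev: scanned"
    else
        echo "$dev: ignored"
    fi
done
```

The check works on names only, so it can be run on any machine without touching the disks.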

Replies

#1 Walter Dnes
June 09th, 2012 - 10:30 am ET
Are you using sys-fs/mdadm-3.2.4 or sys-fs/mdadm-3.2.5? If so, see
http://www.gossamer-threads.com/lis...dev/255107 (Gentoo Dev
list) and bug https://bugs.gentoo.org/show_bug.cgi?id=416081
Walter Dnes
