I rebooted to upgrade to kernel 3.4.1. I accidentally had the
combination of uvesafb, nouveau KMS and nvidia-drivers enabled, which
caused my screen to go blank after rebooting. I was not able to SSH
into the machine, so I did the magic-sysrq REISUB to reboot into my
previous kernel. When it booted into the previous kernel (3.3.5), I
saw a whole bunch of "I/O error" messages scrolling by, for every disk
in my RAID array. I have never seen these errors before. I hoped it
was just some module confusion because I was booting a different
kernel. I was able to boot into my root filesystem, but the raid did
not assemble. After blacklisting nouveau and rebooting into 3.4.1,
there were none of the I/O errors mentioned, but mdraid failed with
* Starting up RAID devices ...
* mdadm main: failed to get exclusive lock on mapfile
mdadm: /dev/md2 is already in use.
mdadm: /dev/md1 is already in use.
[ !! ]
Oh no! Heart beating quickly... terabytes of data... Google finds
nothing useful with these messages.
My mdadm.conf has not changed, and no physical disks have been added
or removed in over a year. I have, of course, updated hundreds of
packages since my last reboot.
From /proc/mdstat I could see that it was not detecting all of the member disks:
md2 : inactive sdf2(S)
904938415 blocks super 1.1
unused devices: <none>
The arrays normally include partitions 1 and 2 from each of the disks
sdb through sdf.
My mdadm.conf has always had only two ARRAY lines (for /dev/md1 and
/dev/md2) with the UUID of the arrays. Previously the member disks
were always automatically detected and assembled when I booted and
started mdadm. Running mdadm --query --examine on the partitions
showed they did still contain the valid raid information. So I felt
confident in trying to reassemble it.
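The check was along these lines (a sketch; the device names are taken from the arrays described above, and the loop is my own convenience wrapper, not the exact command I typed):

```shell
# Examine the RAID superblock on each member partition
# (disks sdb..sdf, partitions 1 and 2, per the arrays above)
for part in /dev/sd[b-f][12]; do
    mdadm --query --examine "$part"
done
```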
To fix it, I first ran the stop script to stop the array (I could have
also done "mdadm -Ss", which is what the stop script does).
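On a Gentoo/OpenRC setup the stop step looks roughly like this (the init script name is an assumption; yours may differ):

```shell
# Stop the partially-assembled arrays via the init script (assumed name)
/etc/init.d/mdraid stop

# Equivalent direct invocation: stop all assembled arrays
mdadm --stop --scan   # short form: mdadm -Ss
```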
Then I edited mdadm.conf and added a device line:
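The device line would be something along these lines (the exact glob is my assumption, based on the sdb-through-sdf disks mentioned above; the ARRAY lines are the two that were already there, UUIDs elided):

```
# mdadm.conf: tell mdadm exactly which devices to scan for members
DEVICE /dev/sd[b-f]*
# existing array definitions, unchanged
ARRAY /dev/md1 UUID=...
ARRAY /dev/md2 UUID=...
```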
So now I am telling it specifically where to look. I then restarted mdraid:
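The restart amounted to something like the following (init script name assumed, as before):

```shell
# Start the mdraid service again, which reassembles the arrays
/etc/init.d/mdraid start

# Or assemble directly from mdadm.conf
mdadm --assemble --scan   # short form: mdadm -As
```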
Et voilà! My raid was back and functioning. I don't know whether this
was the result of a change in kernel or mdadm behavior, or simply of
my REISUB leaving the raid in a strange state.
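For anyone in the same spot, a quick way to confirm the arrays are healthy afterwards is something like:

```shell
# Show assembly state of both arrays, then the kernel's own view
mdadm --detail /dev/md1
mdadm --detail /dev/md2
cat /proc/mdstat
```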