Soft lockup problem

February 06th, 2012 - 10:50 am ET by Gerard Saraber
Greetings everyone,
I've been having a problem since upgrading to the Linux 3.x
series. I have a machine that we use as a NAS; it runs various
rsync processes (mostly at night). Lately, after a day or two, I
come in in the morning to a load average of 49, even though the machine
is not really doing anything. When I tried to run 'dstat', the command
just hung with no output at all. There were no errors in the logs, nor
anything else that even vaguely pointed at something I could work with.
Needing to get the machine back to work, I attempted to reboot it with
"shutdown -r now" on the console. It prints a nice message saying it's
going to reboot, but nothing ever happens. The only way to reboot it
is with ctrl + alt + sysrq + b, after which the machine reboots
and the RAID array comes back clean.
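
A high load average on an otherwise idle machine is often caused by tasks stuck in uninterruptible sleep (state "D"), because Linux counts those toward the load average alongside runnable tasks. As a rough sketch of how one might check for that on a box like this, the script below walks /proc and lists such processes. The function names parse_stat and blocked_tasks are made up for illustration, and the script only looks at processes, not at individual threads under /proc/<pid>/task.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were reading, or odd content
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

If this shows a pile of rsync or filesystem-related processes in state D, that would at least hint at where things are stuck. When magic SysRq is enabled, writing "w" to /proc/sysrq-trigger as root asks the kernel to dump the blocked tasks, with stack traces, to the kernel log.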

I'm not sure how to troubleshoot this; any pointers would be appreciated.

I'm compiling 3.2.4 at the moment and found a couple of possibly useful
options in the kernel debugging section: detect hard/soft lockups and
detect hung tasks. Maybe they'll give me something more to go on.
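
In the kernel configuration those two menu entries correspond to CONFIG_LOCKUP_DETECTOR and CONFIG_DETECT_HUNG_TASK. A minimal sketch for checking whether a given config has them switched on is shown below; option_values and read_running_config are hypothetical helper names, and reading /proc/config.gz assumes the running kernel was built with CONFIG_IKCONFIG_PROC, which may not be the case here.

```python
import gzip
import os

# The two debugging options mentioned above.
WANTED = ("CONFIG_LOCKUP_DETECTOR", "CONFIG_DETECT_HUNG_TASK")


def option_values(config_text, names=WANTED):
    """Map each option name to its value, or None if it is not set."""
    values = {name: None for name in names}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in values:
            values[key] = value
    return values


def read_running_config(path="/proc/config.gz"):
    """Return the running kernel's config as text, or None if unavailable."""
    if not os.path.exists(path):
        return None
    with gzip.open(path, "rt") as fh:
        return fh.read()


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("no /proc/config.gz; check the .config in the build tree instead")
    else:
        for name, value in option_values(text).items():
            print(name, value or "not set")
```

With the hung task detector built in, the kernel logs a warning and a stack trace for any task that stays in uninterruptible sleep longer than the timeout in /proc/sys/kernel/hung_task_timeout_secs.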

Some details about the machine:
Linux xenbox 3.2.2 #1 SMP Sun Jan 29 10:28:22 CST 2012 x86_64 Intel(R)
Xeon(R) CPU 5140 @ 2.33GHz GenuineIntel GNU/Linux
It has 3 software RAID arrays (2 x 5 drives and 1 x 4 drives) LVM'ed
together into a 23TB XFS filesystem.
It has 6GB of memory and a pair of Intel Gigabit Ethernet controllers bonded together.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Replies

#1 Gerard Saraber
February 06th, 2012 - 12:40 pm ET
I also totally forgot some possibly quite important information. Most of
the disks are connected via a pair of LSI controllers:

08:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E
PCI-Express Fusion-MPT SAS (rev 08)
0a:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
2008 [Falcon] (rev 03)

A couple of disks are connected to the onboard Intel SATA controller.

-Gerard Saraber

