[PATCH][RFC] acpi: Prevent scheduling while atomic warning in early boot

October 12th, 2011 - 02:50 pm ET by Steven Rostedt | Report spam
I hit the following bug:

[ 0.109053] ACPI: Core revision 20110623
[ 0.116130] BUG: scheduling while atomic: swapper/0/0x10000002
[ 0.120017] no locks held by swapper/0.
[ 0.124016] Modules linked in:
[ 0.128199] Pid: 0, comm: swapper Not tainted 3.1.0-rc1-test-00020-gd696b58 #5
[ 0.132016] Call Trace:
[ 0.134455] [<ffffffff83ec3be1>] __schedule_bug+0xb4/0xc0
[ 0.140038] [<ffffffff83fdc782>] schedule+0xaf/0x772
[ 0.144022] [<ffffffff810d3d63>] __cond_resched+0x2f/0x41
[ 0.148020] [<ffffffff83fdd197>] _cond_resched+0x2e/0x3e
[ 0.152022] [<ffffffff81b82bff>] acpi_ps_complete_op+0x5b5/0x5f8^M
[ 0.156021] [<ffffffff81b83a41>] acpi_ps_parse_loop+0x5e4/0x69f
[ 0.160021] [<ffffffff81b8b28c>] ? acpi_ut_exit+0x3a/0x49
[ 0.164021] [<ffffffff81b81a88>] acpi_ps_parse_aml+0x1db/0x5fb^M
[ 0.168021] [<ffffffff81b7e1a4>] acpi_ns_one_complete_parse+0x2a8/0x2fa
[ 0.172021] [<ffffffff81b7e279>] acpi_ns_parse_table+0x83/0x17a^M
[ 0.176023] [<ffffffff81b76ed8>] acpi_ns_load_table+0x104/0x234
[ 0.180022] [<ffffffff81b881b8>] acpi_tb_load_namespace+0x110/0x2a0
[ 0.184022] [<ffffffff81b88384>] acpi_load_tables+0x3c/0xa1
[ 0.188024] [<ffffffff8653c21e>] acpi_early_init+0xca/0x1a7
[ 0.192023] [<ffffffff864de1c7>] start_kernel+0x6b2/0x6ea
[ 0.196023] [<ffffffff864dd3dd>] x86_64_start_reservations+0xf5/0x100^M
[ 0.200023] [<ffffffff864dd140>] ? early_idt_handlers+0x140/0x140
[ 0.204023] [<ffffffff864dd521>] x86_64_start_kernel+0x139/0x14f


The commit 0a7992c90828a65 acpi: fix bogus preemption logic

tried again to fix the preempt logic by encapsulating the
ACPI_PREEMPTION_POINT() with a #ifndef CONFIG_PREEMPT and only testing
irqsoff. But when CONFIG_PREEMPT=n and CONFIG_DEBUG_ATOMIC_SLEEP=y, the
preempt count is still active. This code is called at boot up when
preemption is still disabled triggering the above dump.

Ideally, in_atomic() should not be used in general code, but I'm not
sure what should be used. This does silent the warning, and it should
not be an issue while it is still encapsulated in #ifndef CONFIG_PREEMPT

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/include/acpi/platform/aclinux.h b/include/acpi/platform/aclinux.h
index f72403c..1730ff8 100644
a/include/acpi/platform/aclinux.h
+++ b/include/acpi/platform/aclinux.h
@@ -59,6 +59,7 @@
#include <linux/ctype.h>
#include <linux/sched.h>
#include <asm/system.h>
+#include <linux/hardirq.h>
#include <linux/atomic.h>
#include <asm/div64.h>
#include <asm/acpi.h>
@@ -151,10 +152,14 @@ static inline void *acpi_os_acquire_object(acpi_cache_t * cache)
/*
* Used within ACPICA to show where it is safe to preempt execution
* when CONFIG_PREEMPT=n
+ *
+ * Note we still test for !in_atomic() in case CONFIG_DEBUG_ATOMIC_SLEEP
+ * is set. In that case, preempt_count is still updated and scheduling
+ * here will cause a warning in early boot.
*/
#define ACPI_PREEMPTION_POINT() \
do { \
- if (!irqs_disabled()) \
+ if (!irqs_disabled() && !in_atomic()) \
cond_resched(); \
} while (0)
#endif


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 1 replyReplies Make a reply

Replies

#1 Andrew Morton
October 12th, 2011 - 04:10 pm ET | Report spam
On Wed, 12 Oct 2011 14:49:02 -0400
Steven Rostedt wrote:

I hit the following bug:

...

The commit 0a7992c90828a65 acpi: fix bogus preemption logic

tried again to fix the preempt logic by encapsulating the
ACPI_PREEMPTION_POINT() with a #ifndef CONFIG_PREEMPT and only testing
irqsoff. But when CONFIG_PREEMPT=n and CONFIG_DEBUG_ATOMIC_SLEEP=y, the
preempt count is still active. This code is called at boot up when
preemption is still disabled triggering the above dump.

Ideally, in_atomic() should not be used in general code, but I'm not
sure what should be used. This does silent the warning, and it should
not be an issue while it is still encapsulated in #ifndef CONFIG_PREEMPT

a/include/acpi/platform/aclinux.h
+++ b/include/acpi/platform/aclinux.h
@@ -59,6 +59,7 @@
#include <linux/ctype.h>
#include <linux/sched.h>
#include <asm/system.h>
+#include <linux/hardirq.h>
#include <linux/atomic.h>
#include <asm/div64.h>
#include <asm/acpi.h>
@@ -151,10 +152,14 @@ static inline void *acpi_os_acquire_object(acpi_cache_t * cache)
/*
* Used within ACPICA to show where it is safe to preempt execution
* when CONFIG_PREEMPT=n
+ *
+ * Note we still test for !in_atomic() in case CONFIG_DEBUG_ATOMIC_SLEEP
+ * is set. In that case, preempt_count is still updated and scheduling
+ * here will cause a warning in early boot.
*/
#define ACPI_PREEMPTION_POINT() \
do { \
- if (!irqs_disabled()) \
+ if (!irqs_disabled() && !in_atomic()) \
cond_resched(); \
} while (0)
#endif



This macro *cannot be implemented correctly*.

If CONFIG_PREEMPT=n && CONFIG_PREEMPT_COUNT=n then there is no
indication available anywhere to tell you whether or not it is safe to
call schedule().

Please, delete it. Linux just doesn't work this way. The way the
kernel is designed is that the calling code must know what its own
state is. If you know you hold spinlocks then don't call
cond_resched(). If you know you're not holding spinlocks (or
irq_disable, etc) then it's safe to call cond_resched().


As it stands, ACPI at present might be calling schedule() while holding
spinlocks, which is deadlockable. It cunningly arranges to only
exhibit this buggy behaviour in situations where the kernel's
self-checking code is incapable of reporting on the bug :(

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Similar topics