[PATCH 3.1-rc8 0/2] Two false positives from debugging tools (lockdep, kmemleak)

September 29th, 2011 - 09:50 pm ET by Jonathan Nieder | Report spam
Hi,

Especially during the -rc period, it is nice to be able to take
advantage of lockdep and kmemleak to find bugs. False positives make
it harder to find real problems amid the noise.

At boot time, I get:

| allocated 14680064 bytes of page_cgroup
| please try 'cgroup_disable=memory' option if you don't want memory cgroups
| kmemleak: Trying to color unknown object at 0xffff88006f500000 as Grey
| Pid: 0, comm: swapper Not tainted 3.1.0-rc8-00031-gd5767c53535a #12
| Call Trace:
| [<ffffffff810f8637>] paint_ptr+0x56/0x8e
| [<ffffffff812e336c>] kmemleak_not_leak+0x23/0x42
| [<ffffffff816a00d5>] kmemleak_init+0x1db/0x23a
| [<ffffffff81686ad2>] start_kernel+0x31c/0x3b1
| [<ffffffff816862a8>] x86_64_start_reservations+0xaf/0xb3
| [<ffffffff81686140>] ? early_idt_handlers+0x140/0x140
| [<ffffffff816863ae>] x86_64_start_kernel+0x102/0x111
| kmemleak: Trying to color unknown object at 0xffff88006f600000 as Grey
[and so on, for 13 addresses in all]

Also:

| [ 222.425445] systemd-logind/908 is trying to acquire lock:
| [ 222.425447] (&ep->mtx){+.+.+.}, at: [<ffffffff8112da68>] ep_scan_ready_list+0x3c/0x1a7
| [ 222.425456]
| [ 222.425457] but task is already holding lock:
| [ 222.425459] (&ep->mtx){+.+.+.}, at: [<ffffffff8112dfd8>] sys_epoll_ctl+0x11d/0x520

Luckily both false positives seem to be known and well understood.
The first of the patches below (the kmemleak one) seems to have fallen
under the radar I can't even find it in linux-next. The second (the
lockdep one) is taken from akpm's -mm patchset.

Thoughts?

Nelson Elhage (1):
epoll: fix spurious lockdep warnings

Steven Rostedt (1):
cgroup/kmemleak: Annotate alloc_page() for cgroup allocations

fs/eventpoll.c | 25 ++++++++++++++++++-
mm/page_cgroup.c | 7 +++++--
2 files changed, 23 insertions(+), 9 deletions(-)

1.7.7.rc1

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 1 replyReplies Make a reply

Replies

#1 Jonathan Nieder
September 29th, 2011 - 10:00 pm ET | Report spam
From: Nelson Elhage
Date: Wed, 28 Sep 2011 10:50:51 +1000

epoll can acquire recursively acquire ep->mtx on multiple "struct
eventpoll"s at once in the case where one epoll fd is monitoring another
epoll fd. This is perfectly OK, since we're careful about the lock
ordering, but it causes spurious lockdep warnings. Annotate the recursion
using mutex_lock_nested, and add a comment explaining the nesting rules
for good measure.

Recent versions of systemd are triggering this, and it can also be
demonstrated with the following trivial test program:

int main(void) {
int e1, e2;
struct epoll_event evt = {
.events = EPOLLIN
};

e1 = epoll_create1(0);
e2 = epoll_create1(0);
epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);
return 0;
}

Reported-by: Paul Bolle
Tested-by: Paul Bolle
Signed-off-by: Nelson Elhage
Acked-by: Jason Baron
Cc: Dave Jones
Cc: Davide Libenzi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Jonathan Nieder

Taken from <git://github.com/sfrothwell/linux-next.git> (and I believe
Stephen got it from akpm's quilt queue somehow. The patch seems to
have visited the list at [1] and then gone silent. :) Anyway, it has
also been working well for me.

Thanks for reading.

[1] http://thread.gmane.org/gmane.linux.kernel/1177210

fs/eventpoll.c | 25 ++++++++++++++++++-
1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index fe047d966dc5..2d1744ab5bcb 100644
a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -70,6 +70,15 @@
* simultaneous inserts (A into B and B into A) from racing and
* constructing a cycle without either insert observing that it is
* going to.
+ * It is necessary to acquire multiple "ep->mtx"es at once in the
+ * case when one epoll fd is added to another. In this case, we
+ * always acquire the locks in the order of nesting (i.e. after
+ * epoll_ctl(e1, EPOLL_CTL_ADD, e2), e1->mtx will always be acquired
+ * before e2->mtx). Since we disallow cycles of epoll file
+ * descriptors, this ensures that the mutexes are well-ordered. In
+ * order to communicate this nesting to lockdep, when walking a tree
+ * of epoll file descriptors, we use the current recursion depth as
+ * the lockdep subkey.
* It is possible to drop the "ep->mtx" and to use the global
* mutex "epmutex" (together with "ep->lock") to have it working,
* but having "ep->mtx" will make the interface more scalable.
@@ -464,13 +473,15 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
* @ep: Pointer to the epoll private data structure.
* @sproc: Pointer to the scan callback.
* @priv: Private opaque data passed to the @sproc callback.
+ * @depth: The current depth of recursive f_op->poll calls.
*
* Returns: The same integer error code returned by the @sproc callback.
*/
static int ep_scan_ready_list(struct eventpoll *ep,
int (*sproc)(struct eventpoll *,
struct list_head *, void *),
- void *priv)
+ void *priv,
+ int depth)
{
int error, pwake = 0;
unsigned long flags;
@@ -481,7 +492,7 @@ static int ep_scan_ready_list(struct eventpoll *ep,
* We need to lock this because we could be hit by
* eventpoll_release_file() and epoll_ctl().
*/
- mutex_lock(&ep->mtx);
+ mutex_lock_nested(&ep->mtx, depth);

/*
* Steal the ready list, and re-init the original one to the
@@ -670,7 +681,7 @@ static int ep_read_events_proc(struct eventpoll *ep, struct list_head *head,

static int ep_poll_readyevents_proc(void *priv, void *cookie, int call_nests)
{
- return ep_scan_ready_list(priv, ep_read_events_proc, NULL);
+ return ep_scan_ready_list(priv, ep_read_events_proc, NULL, call_nests + 1);
}

static unsigned int ep_eventpoll_poll(struct file *file, poll_table *wait)
@@ -737,7 +748,7 @@ void eventpoll_release_file(struct file *file)

ep = epi->ep;
list_del_init(&epi->fllink);
- mutex_lock(&ep->mtx);
+ mutex_lock_nested(&ep->mtx, 0);
ep_remove(ep, epi);
mutex_unlock(&ep->mtx);
}
@@ -1134,7 +1145,7 @@ static int ep_send_events(struct eventpoll *ep,
esed.maxevents = maxevents;
esed.events = events;

- return ep_scan_ready_list(ep, ep_send_events_proc, &esed);
+ return ep_scan_ready_list(ep, ep_send_events_proc, &esed, 0);
}

static inline struct timespec ep_set_mstimeout(long ms)
@@ -1267,7 +1278,7 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
struct rb_node *rbp;
struct epitem *epi;

- mutex_lock(&ep->mtx);
+ mutex_lock_nested(&ep->mtx, call_nests + 1);
for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
epi = rb_entry(rbp, struct epitem, rbn);
if (unlikely(is_file_epoll(epi->ffd.file))) {
@@ -1409,7 +1420,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
}


- mutex_lock(&ep->mtx);
+ mutex_lock_nested(&ep->mtx, 0);

/*
* Try to lookup the file inside our RB tree, Since we grabbed "mtx"
1.7.7.rc1

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Similar topics