[PATCH 0/14] uprobes: acked patches

July 29th, 2012 - 02:30 pm ET by Oleg Nesterov | Report spam
Hello.

Re-send. No changes, just added the acks from Srikar.

"uprobes: mmap_region() corrupts mm->mm_rb if uprobe_mmap() fails"
I sent yesterday doesn't depend on this series, it can go ahead
to fix the serious problem.

Oleg.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 9 repliesReplies Make a reply

Similar topics

Replies

#1 Oleg Nesterov
July 29th, 2012 - 02:30 pm ET | Report spam
1. vma_address() returns loff_t, this looks confusing and this is
unnecessary after the previous change. Make it return "ulong",
all callers truncate the result anyway.

2. Its name conflicts with mm/rmap.c:vma_address(), rename it to
offset_to_vaddr(), this matches vaddr_to_offset().

Signed-off-by: Oleg Nesterov
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>

kernel/events/uprobes.c | 15 +++++-
1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 734e199..5c8ceaf 100644
a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -112,14 +112,9 @@ static bool valid_vma(struct vm_area_struct *vma, bool is_register)
return false;
}

-static loff_t vma_address(struct vm_area_struct *vma, loff_t offset)
+static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
{
- loff_t vaddr;
-
- vaddr = vma->vm_start + offset;
- vaddr -= (loff_t)vma->vm_pgoff << PAGE_SHIFT;
-
- return vaddr;
+ return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
}

static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
@@ -775,7 +770,7 @@ build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
curr = info;

info->mm = vma->vm_mm;
- info->vaddr = vma_address(vma, offset);
+ info->vaddr = offset_to_vaddr(vma, offset);
}
mutex_unlock(&mapping->i_mmap_mutex);

@@ -1042,7 +1037,7 @@ int uprobe_mmap(struct vm_area_struct *vma)

list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
if (!ret) {
- loff_t vaddr = vma_address(vma, uprobe->offset);
+ unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset);

ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
/*
@@ -1103,7 +1098,7 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon
build_probe_list(inode, vma, start, end, &tmp_list);

list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
- loff_t vaddr = vma_address(vma, uprobe->offset);
+ unsigned long vaddr = offset_to_vaddr(vma, uprobe->offset);
/*
* An unregister could have removed the probe before
* unmap. So check before we decrement the count.
1.5.5.1

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#2 Oleg Nesterov
July 29th, 2012 - 02:30 pm ET | Report spam
Currently build_probe_list() builds the list of all uprobes attached
to the given inode, and the caller should filter out those who don't
fall into the [start,end) range, this is sub-optimal.

This patch turns find_least_offset_node() into find_node_in_range()
which returns the first node inside the [min,max] range, and changes
build_probe_list() to use this node as a starting point for rb_prev()
and rb_next() to find all other nodes the caller needs. The resulting
list is no longer sorted but we do not care.

This can speed up both build_probe_list() and the callers, but there
is another reason to introduce find_node_in_range(). It can be used
to figure out whether the given vma has uprobes or not, this will be
needed soon.

While at it, shift INIT_LIST_HEAD(tmp_list) into build_probe_list().

Signed-off-by: Oleg Nesterov
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>

kernel/events/uprobes.c | 103 +++++++++++++++++++++++
1 files changed, 50 insertions(+), 53 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 6194edb..c825404 100644
a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -939,59 +939,66 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
put_uprobe(uprobe);
}

-/*
- * Of all the nodes that correspond to the given inode, return the node
- * with the least offset.
- */
-static struct rb_node *find_least_offset_node(struct inode *inode)
+static struct rb_node *
+find_node_in_range(struct inode *inode, loff_t min, loff_t max)
{
- struct uprobe u = { .inode = inode, .offset = 0};
struct rb_node *n = uprobes_tree.rb_node;
- struct rb_node *close_node = NULL;
- struct uprobe *uprobe;
- int match;

while (n) {
- uprobe = rb_entry(n, struct uprobe, rb_node);
- match = match_uprobe(&u, uprobe);
-
- if (uprobe->inode == inode)
- close_node = n;
+ struct uprobe *u = rb_entry(n, struct uprobe, rb_node);

- if (!match)
- return close_node;
-
- if (match < 0)
+ if (inode < u->inode) {
n = n->rb_left;
- else
+ } else if (inode > u->inode) {
n = n->rb_right;
+ } else {
+ if (max < u->offset)
+ n = n->rb_left;
+ else if (min > u->offset)
+ n = n->rb_right;
+ else
+ break;
+ }
}

- return close_node;
+ return n;
}

/*
- * For a given inode, build a list of probes that need to be inserted.
+ * For a given range in vma, build a list of probes that need to be inserted.
*/
-static void build_probe_list(struct inode *inode, struct list_head *head)
+static void build_probe_list(struct inode *inode,
+ struct vm_area_struct *vma,
+ unsigned long start, unsigned long end,
+ struct list_head *head)
{
- struct uprobe *uprobe;
+ loff_t min, max;
unsigned long flags;
- struct rb_node *n;
-
- spin_lock_irqsave(&uprobes_treelock, flags);
-
- n = find_least_offset_node(inode);
+ struct rb_node *n, *t;
+ struct uprobe *u;

- for (; n; n = rb_next(n)) {
- uprobe = rb_entry(n, struct uprobe, rb_node);
- if (uprobe->inode != inode)
- break;
+ INIT_LIST_HEAD(head);
+ min = ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + start - vma->vm_start;
+ max = min + (end - start) - 1;

- list_add(&uprobe->pending_list, head);
- atomic_inc(&uprobe->ref);
+ spin_lock_irqsave(&uprobes_treelock, flags);
+ n = find_node_in_range(inode, min, max);
+ if (n) {
+ for (t = n; t; t = rb_prev(t)) {
+ u = rb_entry(t, struct uprobe, rb_node);
+ if (u->inode != inode || u->offset < min)
+ break;
+ list_add(&u->pending_list, head);
+ atomic_inc(&u->ref);
+ }
+ for (t = n; (t = rb_next(t)); ) {
+ u = rb_entry(t, struct uprobe, rb_node);
+ if (u->inode != inode || u->offset > max)
+ break;
+ list_add(&u->pending_list, head);
+ atomic_inc(&u->ref);
+ }
}
-
spin_unlock_irqrestore(&uprobes_treelock, flags);
}

@@ -1021,9 +1028,8 @@ int uprobe_mmap(struct vm_area_struct *vma)
if (!inode)
return 0;

- INIT_LIST_HEAD(&tmp_list);
mutex_lock(uprobes_mmap_hash(inode));
- build_probe_list(inode, &tmp_list);
+ build_probe_list(inode, vma, vma->vm_start, vma->vm_end, &tmp_list);

ret = 0;
count = 0;
@@ -1032,11 +1038,6 @@ int uprobe_mmap(struct vm_area_struct *vma)
if (!ret) {
loff_t vaddr = vma_address(vma, uprobe->offset);

- if (vaddr < vma->vm_start || vaddr >= vma->vm_end) {
- put_uprobe(uprobe);
- continue;
- }
-
ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
/*
* We can race against uprobe_register(), see the
@@ -1092,21 +1093,17 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon
if (!inode)
return;

- INIT_LIST_HEAD(&tmp_list);
mutex_lock(uprobes_mmap_hash(inode));
- build_probe_list(inode, &tmp_list);
+ build_probe_list(inode, vma, start, end, &tmp_list);

list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
loff_t vaddr = vma_address(vma, uprobe->offset);
-
- if (vaddr >= start && vaddr < end) {
- /*
- * An unregister could have removed the probe before
- * unmap. So check before we decrement the count.
- */
- if (is_swbp_at_addr(vma->vm_mm, vaddr) == 1)
- atomic_dec(&vma->vm_mm->uprobes_state.count);
- }
+ /*
+ * An unregister could have removed the probe before
+ * unmap. So check before we decrement the count.
+ */
+ if (is_swbp_at_addr(vma->vm_mm, vaddr) == 1)
+ atomic_dec(&vma->vm_mm->uprobes_state.count);
put_uprobe(uprobe);
}
mutex_unlock(uprobes_mmap_hash(inode));
1.5.5.1

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#3 Oleg Nesterov
July 29th, 2012 - 02:30 pm ET | Report spam
Kill insert_vm_struct()->uprobe_mmap(). It is not needed, nobody
except arch/ia64/kernel/perfmon.c uses insert_vm_struct(vma) with
vma->vm_file != NULL.

And it is wrong. Again, get_user_pages() can not succeed before
vma_link(vma) makes is visible to find_vma(). And even if this
worked, we must not insert the new bp before this mapping is
visible to vma_prio_tree_foreach() for uprobe_unregister().

Signed-off-by: Oleg Nesterov
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>

mm/mmap.c | 3
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index e5a4614..4fe2697 100644
a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2345,9 +2345,6 @@ int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma)
security_vm_enough_memory_mm(mm, vma_pages(vma)))
return -ENOMEM;

- if (vma->vm_file && uprobe_mmap(vma))
- return -EINVAL;
-
vma_link(mm, vma, prev, rb_link, rb_parent);
return 0;
}
1.5.5.1

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#4 Oleg Nesterov
July 29th, 2012 - 02:30 pm ET | Report spam
Kill copy_vma()->uprobe_mmap(new_vma), it is absolutely wrong.

This new_vma was just initialized to represent the new unmapped area,
[vm_start, vm_end) was returned by get_unmapped_area() in the caller.

This means that uprobe_mmap()->get_user_pages() will fail for sure,
simply because find_vma() can never succeed. And I verified that
sys_mremap()->mremap_to() indeed always fails with the wrong ENOMEM
code if [addr, addr+old_len] is probed.

And why this uprobe_mmap() was added? I believe the intent was wrong.
Note that the caller is going to do move_page_tables(), all registered
uprobes are already faulted in, we only change the virtual addresses.

NOTE: However, somehow we need to close the race with uprobe_register()
which relies on map_info->vaddr. This needs another fix I'll try to do
later. Probably we need uprobe_mmap() in move_vma() but we can not do
this right now, this can confuse uprobes_state.counter (which I still
hope we are going to kill).

Signed-off-by: Oleg Nesterov
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>

mm/mmap.c | 3
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 3edfcdf..e5a4614 100644
a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2418,9 +2418,6 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
if (new_vma->vm_file) {
get_file(new_vma->vm_file);

- if (uprobe_mmap(new_vma))
- goto out_free_mempol;
-
if (vma->vm_flags & VM_EXECUTABLE)
added_exe_file_vma(mm);
}
1.5.5.1

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#5 Oleg Nesterov
July 29th, 2012 - 02:30 pm ET | Report spam
Like do_wp_page(), __replace_page() should do munlock_vma_page()
for the case when the old page still has other !VM_LOCKED mappings.
Unfortunately this needs mm/internal.h.

Also, move put_page() outside of ptl lock. This doesn't really
matter but looks a bit better.

Signed-off-by: Oleg Nesterov
Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com>

kernel/events/uprobes.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5c8ceaf..940005d 100644
a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -32,6 +32,7 @@
#include <linux/swap.h> /* try_to_free_swap */
#include <linux/ptrace.h> /* user_enable_single_step */
#include <linux/kdebug.h> /* notifier mechanism */
+#include "../../mm/internal.h" /* munlock_vma_page */

#include <linux/uprobes.h>

@@ -141,7 +142,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
pte_t *ptep;
int err;

- /* freeze PageSwapCache() for try_to_free_swap() below */
+ /* For try_to_free_swap() and munlock_vma_page() below */
lock_page(page);

err = -EAGAIN;
@@ -164,9 +165,12 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
page_remove_rmap(page);
if (!page_mapped(page))
try_to_free_swap(page);
- put_page(page);
pte_unmap_unlock(ptep, ptl);

+ if (vma->vm_flags & VM_LOCKED)
+ munlock_vma_page(page);
+ put_page(page);
+
err = 0;
unlock:
unlock_page(page);
1.5.5.1

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
Help Create a new topicNext page Replies Make a reply
Search Make your own search