[PATCH -mm v2 00/11] mm: scalable and unified arch_get_unmapped_area

June 21st, 2012 - 06:00 pm ET by Rik van Riel | Report spam
A long time ago, we decided to limit the number of VMAs per
process to 64k. As it turns out, there actually are programs
using tens of thousands of VMAs.

The linear search in arch_get_unmapped_area and
arch_get_unmapped_area_topdown can be a real issue for
those programs.

This patch series aims to fix the scalability issue by
tracking the size of each free hole in the VMA rbtree,
propagating the free hole info up the tree.

Another major goal is to put the bulk of the necessary
arch_get_unmapped_area(_topdown) functionality into one
set of functions, so we can eliminate the custom large
functions per architecture, sticking to a few much smaller
architecture specific functions instead.

In this version I have only gotten rid of the x86, ARM, SH
and MIPS arch-specific code, and am already showing a
fairly promising diffstat:

arch/arm/include/asm/pgtable.h | 6
arch/arm/mm/init.c | 4
arch/arm/mm/mmap.c | 217
arch/mips/include/asm/page.h | 2
arch/mips/include/asm/pgtable.h | 7
arch/mips/mm/mmap.c | 177 --
arch/sh/include/asm/pgtable.h | 4
arch/sh/mm/mmap.c | 219
arch/x86/include/asm/elf.h | 3
arch/x86/include/asm/pgtable_64.h | 4
arch/x86/kernel/sys_x86_64.c | 200 ++--
arch/x86/vdso/vma.c | 2
include/linux/mm_types.h | 19 +
include/linux/rbtree.h | 12 +
include/linux/sched.h | 13 +
lib/rbtree.c | 46 +++
mm/internal.h | 5
mm/mmap.c | 449 +++++++++++++++++++++++++++++
18 files changed, 478 insertions(+), 911 deletions(-)

v2: address reviewers' comments
optimize propagating info up the VMA tree (30% faster at frag test)
add SH architecture

TODO:
- eliminate arch-specific functions for more architectures
- integrate hugetlbfs alignment (with Andi Kleen's patch?)

Performance

Testing performance with a benchmark that allocates tens
of thousands of VMAs, unmaps them and mmaps them some more
in a loop, shows promising results.

Vanilla 3.4 kernel:
$ ./agua_frag_test_64
..

Min Time (ms): 6
Avg. Time (ms): 294.0000
Max Time (ms): 609
Std Dev (ms): 113.1664
Standard deviation exceeds 10

With -v2 patches:
$ ./agua_frag_test_64
..

Min Time (ms): 12
Avg. Time (ms): 31.0000
Max Time (ms): 42
Std Dev (ms): 3.3648
All checks pass

The total run time of the test goes down by about a
factor 5. More importantly, the worst case performance
of the loop (which is what really hurt some applications)
has gone down by about a factor 14.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 36 repliesReplies Make a reply

Similar topics

Replies

#1 Rik van Riel
June 21st, 2012 - 06:00 pm ET | Report spam
Fix the x86-64 cache alignment code to take pgoff into account.
Use the x86 and MIPS cache alignment code as the basis for a generic
cache alignment function.

Teach the generic arch_get_unmapped_area(_topdown) code to call the
cache alignment code.

Make sure that ALIGN_DOWN always aligns down, and ends up at the
right page colour.

The old x86 code will always align the mmap to aliasing boundaries,
even if the program mmaps the file with a non-zero pgoff.

If program A mmaps the file with pgoff 0, and program B mmaps the
file with pgoff 1. The old code would align the mmaps, resulting in
misaligned pages:

A: 0123
B: 123

After this patch, they are aligned so the pages line up:

A: 0123
B: 123

Signed-off-by: Rik van Riel

arch/mips/include/asm/page.h | 2 -
arch/mips/include/asm/pgtable.h | 1 +
arch/x86/include/asm/elf.h | 3 -
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/sys_x86_64.c | 39 +++++++++++--
arch/x86/vdso/vma.c | 2 +-
include/linux/sched.h | 8 +++-
mm/mmap.c | 91 ++++++++++++++++++++++++++++++++--
8 files changed, 115 insertions(+), 32 deletions(-)

diff --git a/arch/mips/include/asm/page.h b/arch/mips/include/asm/page.h
index da9bd7d..459cc25 100644
a/arch/mips/include/asm/page.h
+++ b/arch/mips/include/asm/page.h
@@ -63,8 +63,6 @@ extern void build_copy_page(void);
extern void clear_page(void * page);
extern void copy_page(void * to, void * from);

-extern unsigned long shm_align_mask;
-
static inline unsigned long pages_do_alias(unsigned long addr1,
unsigned long addr2)
{
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index b2202a6..f133a4c 100644
a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -415,6 +415,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
*/
#define HAVE_ARCH_UNMAPPED_AREA
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
+#define HAVE_ARCH_ALIGN_ADDR

/*
* No page table caches to initialise
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 5939f44..dc2d0bf 100644
a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -358,8 +358,6 @@ static inline int mmap_is_ia32(void)
enum align_flags {
ALIGN_VA_32 = BIT(0),
ALIGN_VA_64 = BIT(1),
- ALIGN_VDSO = BIT(2),
- ALIGN_TOPDOWN = BIT(3),
};

struct va_alignment {
@@ -368,5 +366,4 @@ struct va_alignment {
} ____cacheline_aligned;

extern struct va_alignment va_align;
-extern unsigned long align_addr(unsigned long, struct file *, enum align_flags);
#endif /* _ASM_X86_ELF_H */
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 8af36f6..8408ccd 100644
a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -170,6 +170,7 @@ extern void cleanup_highmap(void);
#define HAVE_ARCH_UNMAPPED_AREA
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
#define HAVE_ARCH_GET_ADDRESS_RANGE
+#define HAVE_ARCH_ALIGN_ADDR

#define pgtable_cache_init() do { } while (0)
#define check_pgt_cache() do { } while (0)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 2595a5e..c059c19 100644
a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -25,31 +25,44 @@
* @flags denotes the allocation direction - bottomup or topdown -
* or vDSO; see call sites below.
*/
-unsigned long align_addr(unsigned long addr, struct file *filp,
- enum align_flags flags)
+unsigned long arch_align_addr(unsigned long addr, struct file *filp,
+ unsigned long pgoff, unsigned long flags,
+ enum mmap_allocation_direction direction)
{
- unsigned long tmp_addr;
+ unsigned long tmp_addr = PAGE_ALIGN(addr);

/* handle 32- and 64-bit case with a single conditional */
if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
- return addr;
+ return tmp_addr;

- if (!(current->flags & PF_RANDOMIZE))
- return addr;
+ /* Always allow MAP_FIXED. Colouring is a performance thing only. */
+ if (flags & MAP_FIXED)
+ return tmp_addr;

- if (!((flags & ALIGN_VDSO) || filp))
- return addr;
+ if (!(current->flags & PF_RANDOMIZE))
+ return tmp_addr;

- tmp_addr = addr;
+ if (!(filp || direction == ALLOC_VDSO))
+ return tmp_addr;

/*
* We need an address which is <= than the original
* one only when in topdown direction.
*/
- if (!(flags & ALIGN_TOPDOWN))
+ if (direction == ALLOC_UP)
tmp_addr += va_align.mask;

tmp_addr &= ~va_align.mask;
+ tmp_addr += ((pgoff << PAGE_SHIFT) & va_align.mask);
+
+ /*
+ * When aligning down, make sure we did not accidentally go up.
+ * The caller will check for underflow.
+ */
+ if (direction == ALLOC_DOWN && tmp_addr > addr) {
+ tmp_addr -= va_align.mask;
+ tmp_addr &= ~va_align.mask;
+ }

return tmp_addr;
}
@@ -159,7 +172,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,

full_search:

- addr = align_addr(addr, filp, 0);
+ addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_UP);

for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
/* At this point: (!vma || addr < vma->vm_end). */
@@ -186,7 +199,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
mm->cached_hole_size = vma->vm_start - addr;

addr = vma->vm_end;
- addr = align_addr(addr, filp, 0);
+ addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_UP);
}
}

@@ -235,7 +248,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,

addr -= len;
do {
- addr = align_addr(addr, filp, ALIGN_TOPDOWN);
+ addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_DOWN);

/*
* Lookup failure means no vma is above this address,
diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 00aaf04..83e0355 100644
a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -141,7 +141,7 @@ static unsigned long vdso_addr(unsigned long start, unsigned len)
* unaligned here as a result of stack start randomization.
*/
addr = PAGE_ALIGN(addr);
- addr = align_addr(addr, NULL, ALIGN_VDSO);
+ addr = arch_align_addr(addr, NULL, 0, 0, ALLOC_VDSO);

return addr;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index fc76318..18f9326 100644
a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -390,12 +390,18 @@ extern int sysctl_max_map_count;
#ifdef CONFIG_MMU
enum mmap_allocation_direction {
ALLOC_UP,
- ALLOC_DOWN
+ ALLOC_DOWN,
+ ALLOC_VDSO,
};
extern void arch_pick_mmap_layout(struct mm_struct *mm);
extern void
arch_get_address_range(unsigned long flags, unsigned long *begin,
unsigned long *end, enum mmap_allocation_direction direction);
+extern unsigned long shm_align_mask;
+extern unsigned long
+arch_align_addr(unsigned long addr, struct file *filp,
+ unsigned long pgoff, unsigned long flags,
+ enum mmap_allocation_direction direction);
extern unsigned long
arch_get_unmapped_area(struct file *, unsigned long, unsigned long,
unsigned long, unsigned long);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2420951..3da19f8 100644
a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1482,6 +1482,51 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
return error;
}

+#ifndef HAVE_ARCH_ALIGN_ADDR
+/* Each architecture is responsible for setting this to the required value. */
+unsigned long shm_align_mask = PAGE_SIZE - 1;
+EXPORT_SYMBOL(shm_align_mask);
+
+unsigned long arch_align_addr(unsigned long addr, struct file *filp,
+ unsigned long pgoff, unsigned long flags,
+ enum mmap_allocation_direction direction)
+{
+ unsigned long tmp_addr = PAGE_ALIGN(addr);
+
+ if (shm_align_mask <= PAGE_SIZE)
+ return tmp_addr;
+
+ /* Allow MAP_FIXED without MAP_SHARED at any address. */
+ if ((flags & (MAP_FIXED|MAP_SHARED)) == MAP_FIXED)
+ return tmp_addr;
+
+ /* Enforce page colouring for any file or MAP_SHARED mapping. */
+ if (!(filp || (flags & MAP_SHARED)))
+ return tmp_addr;
+
+ /*
+ * We need an address which is <= than the original
+ * one only when in topdown direction.
+ */
+ if (direction == ALLOC_UP)
+ tmp_addr += shm_align_mask;
+
+ tmp_addr &= ~shm_align_mask;
+ tmp_addr += ((pgoff << PAGE_SHIFT) & shm_align_mask);
+
+ /*
+ * When aligning down, make sure we did not accidentally go up.
+ * The caller will check for underflow.
+ */
+ if (direction == ALLOC_DOWN && tmp_addr > addr) {
+ tmp_addr -= shm_align_mask;
+ tmp_addr &= ~shm_align_mask;
+ }
+
+ return tmp_addr;
+}
+#endif
+
#ifndef HAVE_ARCH_GET_ADDRESS_RANGE
void arch_get_address_range(unsigned long flags, unsigned long *begin,
unsigned long *end, enum mmap_allocation_direction direction)
@@ -1515,18 +1560,22 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma = NULL;
struct rb_node *rb_node;
- unsigned long lower_limit, upper_limit;
+ unsigned long lower_limit, upper_limit, tmp_addr;

arch_get_address_range(flags, &lower_limit, &upper_limit, ALLOC_UP);

if (len > TASK_SIZE)
return -ENOMEM;

- if (flags & MAP_FIXED)
+ if (flags & MAP_FIXED) {
+ tmp_addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_UP);
+ if (tmp_addr != PAGE_ALIGN(addr))
+ return -EINVAL;
return addr;
+ }

if (addr) {
- addr = PAGE_ALIGN(addr);
+ addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_UP);
vma = find_vma(mm, addr);
if (TASK_SIZE - len >= addr &&
(!vma || addr + len <= vma->vm_start))
@@ -1535,7 +1584,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,

/* Find the left-most free area of sufficient size. */
for (addr = 0, rb_node = mm->mm_rb.rb_node; rb_node; ) {
- unsigned long vma_start;
+ unsigned long vma_start, tmp_addr;
bool found_here = false;

vma = rb_to_vma(rb_node);
@@ -1543,13 +1592,17 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
if (vma->vm_start > len) {
if (!vma->vm_prev) {
/* This is the left-most VMA. */
- if (vma->vm_start - len >= lower_limit) {
- addr = lower_limit;
+ tmp_addr = arch_align_addr(lower_limit, filp,
+ pgoff, flags, ALLOC_UP);
+ if (vma->vm_start - len >= tmp_addr) {
+ addr = tmp_addr;
goto found_addr;
}
} else {
/* Is this gap large enough? Remember it. */
vma_start = max(vma->vm_prev->vm_end, lower_limit);
+ vma_start = arch_align_addr(vma_start, filp,
+ pgoff, flags, ALLOC_UP);
if (vma->vm_start - len >= vma_start) {
addr = vma_start;
found_here = true;
@@ -1601,6 +1654,8 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
if (addr < lower_limit)
addr = lower_limit;

+ addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_UP);
+
found_addr:
if (TASK_SIZE - len < addr)
return -ENOMEM;
@@ -1643,12 +1698,17 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
if (len > TASK_SIZE)
return -ENOMEM;

- if (flags & MAP_FIXED)
+ if (flags & MAP_FIXED) {
+ unsigned long tmp_addr;
+ tmp_addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_DOWN);
+ if (tmp_addr != PAGE_ALIGN(addr))
+ return -EINVAL;
return addr;
+ }

/* requesting a specific address */
if (addr) {
- addr = PAGE_ALIGN(addr);
+ addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_DOWN);
vma = find_vma(mm, addr);
if (TASK_SIZE - len >= addr &&
(!vma || addr + len <= vma->vm_start))
@@ -1665,7 +1725,9 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
*/
if (upper_limit - len > mm->highest_vm_end) {
addr = upper_limit - len;
- goto found_addr;
+ addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_DOWN);
+ if (addr >= mm->highest_vm_end);
+ goto found_addr;
}

/* Find the right-most free area of sufficient size. */
@@ -1678,9 +1740,14 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
/* Is this gap large enough? Remember it. */
vma_start = min(vma->vm_start, upper_limit);
if (vma_start > len) {
- if (!vma->vm_prev ||
- (vma_start - len >= vma->vm_prev->vm_end)) {
- addr = vma_start - len;
+ unsigned long tmp_addr = vma_start - len;
+ tmp_addr = arch_align_addr(tmp_addr, filp,
+ pgoff, flags, ALLOC_DOWN);
+ /* No underflow? Does it still fit the hole? */
+ if (tmp_addr && tmp_addr <= vma_start - len &&
+ (!vma->vm_prev ||
+ tmp_addr >= vma->vm_prev->vm_end)) {
+ addr = tmp_addr;
found_here = true;
}
}
1.7.7.6

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#2 Rik van Riel
June 21st, 2012 - 06:00 pm ET | Report spam
Remove the ARM special variants of arch_get_unmapped_area since the
generic functions should now be able to handle everything.

ARM only needs page colouring if cache_is_vipt_aliasing; leave
shm_align_mask at PAGE_SIZE-1 unless we need colouring.

Untested because I have no ARM hardware.

Cc: Russell King
Signed-off-by: Rik van Riel

arch/arm/include/asm/pgtable.h | 6 -
arch/arm/mm/init.c | 4 +
arch/arm/mm/mmap.c | 217 -
3 files changed, 4 insertions(+), 223 deletions(-)

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index f66626d..6754183 100644
a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -296,12 +296,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
#include <asm-generic/pgtable.h>

/*
- * We provide our own arch_get_unmapped_area to cope with VIPT caches.
- */
-#define HAVE_ARCH_UNMAPPED_AREA
-#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
-
-/*
* remap a physical page `pfn' of size `size' with page protection `prot'
* into virtual address `from'
*/
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index f54d592..abaf862 100644
a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -600,6 +600,10 @@ void __init mem_init(void)
extern u32 itcm_end;
#endif

+ /* Tell the page colouring code what we need. */
+ if (cache_is_vipt_aliasing())
+ shm_align_mask = SHMLBA - 1;
+
max_mapnr = pfn_to_page(max_pfn + PHYS_PFN_OFFSET) - mem_map;

/* this will put all unused low memory onto the freelists */
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index ce8cb19..d90969b 100644
a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -11,22 +11,6 @@
#include <linux/random.h>
#include <asm/cachetype.h>

-static inline unsigned long COLOUR_ALIGN_DOWN(unsigned long addr,
- unsigned long pgoff)
-{
- unsigned long base = addr & ~(SHMLBA-1);
- unsigned long off = (pgoff << PAGE_SHIFT) & (SHMLBA-1);
-
- if (base + off <= addr)
- return base + off;
-
- return base - off;
-}
-
-#define COLOUR_ALIGN(addr,pgoff) \
- ((((addr)+SHMLBA-1)&~(SHMLBA-1)) + \
- (((pgoff)<<PAGE_SHIFT) & (SHMLBA-1)))
-
/* gap between mmap and stack */
#define MIN_GAP (128*1024*1024UL)
#define MAX_GAP ((TASK_SIZE)/6*5)
@@ -54,207 +38,6 @@ static unsigned long mmap_base(unsigned long rnd)
return PAGE_ALIGN(TASK_SIZE - gap - rnd);
}

-/*
- * We need to ensure that shared mappings are correctly aligned to
- * avoid aliasing issues with VIPT caches. We need to ensure that
- * a specific page of an object is always mapped at a multiple of
- * SHMLBA bytes.
- *
- * We unconditionally provide this function for all cases, however
- * in the VIVT case, we optimise out the alignment rules.
- */
-unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
-{
- struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
- unsigned long start_addr;
- int do_align = 0;
- int aliasing = cache_is_vipt_aliasing();
-
- /*
- * We only need to do colour alignment if either the I or D
- * caches alias.
- */
- if (aliasing)
- do_align = filp || (flags & MAP_SHARED);
-
- /*
- * We enforce the MAP_FIXED case.
- */
- if (flags & MAP_FIXED) {
- if (aliasing && flags & MAP_SHARED &&
- (addr - (pgoff << PAGE_SHIFT)) & (SHMLBA - 1))
- return -EINVAL;
- return addr;
- }
-
- if (len > TASK_SIZE)
- return -ENOMEM;
-
- if (addr) {
- if (do_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- else
- addr = PAGE_ALIGN(addr);
-
- vma = find_vma(mm, addr);
- if (TASK_SIZE - len >= addr &&
- (!vma || addr + len <= vma->vm_start))
- return addr;
- }
- if (len > mm->cached_hole_size) {
- start_addr = addr = mm->free_area_cache;
- } else {
- start_addr = addr = mm->mmap_base;
- mm->cached_hole_size = 0;
- }
-
-full_search:
- if (do_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- else
- addr = PAGE_ALIGN(addr);
-
- for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
- /* At this point: (!vma || addr < vma->vm_end). */
- if (TASK_SIZE - len < addr) {
- /*
- * Start a new search - just in case we missed
- * some holes.
- */
- if (start_addr != TASK_UNMAPPED_BASE) {
- start_addr = addr = TASK_UNMAPPED_BASE;
- mm->cached_hole_size = 0;
- goto full_search;
- }
- return -ENOMEM;
- }
- if (!vma || addr + len <= vma->vm_start) {
- /*
- * Remember the place where we stopped the search:
- */
- mm->free_area_cache = addr + len;
- return addr;
- }
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
- addr = vma->vm_end;
- if (do_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- }
-}
-
-unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
- const unsigned long len, const unsigned long pgoff,
- const unsigned long flags)
-{
- struct vm_area_struct *vma;
- struct mm_struct *mm = current->mm;
- unsigned long addr = addr0;
- int do_align = 0;
- int aliasing = cache_is_vipt_aliasing();
-
- /*
- * We only need to do colour alignment if either the I or D
- * caches alias.
- */
- if (aliasing)
- do_align = filp || (flags & MAP_SHARED);
-
- /* requested length too big for entire address space */
- if (len > TASK_SIZE)
- return -ENOMEM;
-
- if (flags & MAP_FIXED) {
- if (aliasing && flags & MAP_SHARED &&
- (addr - (pgoff << PAGE_SHIFT)) & (SHMLBA - 1))
- return -EINVAL;
- return addr;
- }
-
- /* requesting a specific address */
- if (addr) {
- if (do_align)
- addr = COLOUR_ALIGN(addr, pgoff);
- else
- addr = PAGE_ALIGN(addr);
- vma = find_vma(mm, addr);
- if (TASK_SIZE - len >= addr &&
- (!vma || addr + len <= vma->vm_start))
- return addr;
- }
-
- /* check if free_area_cache is useful for us */
- if (len <= mm->cached_hole_size) {
- mm->cached_hole_size = 0;
- mm->free_area_cache = mm->mmap_base;
- }
-
- /* either no address requested or can't fit in requested address hole */
- addr = mm->free_area_cache;
- if (do_align) {
- unsigned long base = COLOUR_ALIGN_DOWN(addr - len, pgoff);
- addr = base + len;
- }
-
- /* make sure it can fit in the remaining address space */
- if (addr > len) {
- vma = find_vma(mm, addr-len);
- if (!vma || addr <= vma->vm_start)
- /* remember the address as a hint for next time */
- return (mm->free_area_cache = addr-len);
- }
-
- if (mm->mmap_base < len)
- goto bottomup;
-
- addr = mm->mmap_base - len;
- if (do_align)
- addr = COLOUR_ALIGN_DOWN(addr, pgoff);
-
- do {
- /*
- * Lookup failure means no vma is above this address,
- * else if new region fits below vma->vm_start,
- * return with success:
- */
- vma = find_vma(mm, addr);
- if (!vma || addr+len <= vma->vm_start)
- /* remember the address as a hint for next time */
- return (mm->free_area_cache = addr);
-
- /* remember the largest hole we saw so far */
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
-
- /* try just below the current vma->vm_start */
- addr = vma->vm_start - len;
- if (do_align)
- addr = COLOUR_ALIGN_DOWN(addr, pgoff);
- } while (len < vma->vm_start);
-
-bottomup:
- /*
- * A failed mmap() very likely causes application failure,
- * so fall back to the bottom-up function here. This scenario
- * can happen with large stack limits and large mmap()
- * allocations.
- */
- mm->cached_hole_size = ~0UL;
- mm->free_area_cache = TASK_UNMAPPED_BASE;
- addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
- /*
- * Restore the topdown base:
- */
- mm->free_area_cache = mm->mmap_base;
- mm->cached_hole_size = ~0UL;
-
- return addr;
-}
-
void arch_pick_mmap_layout(struct mm_struct *mm)
{
unsigned long random_factor = 0UL;
1.7.7.6

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#3 Rik van Riel
June 21st, 2012 - 06:10 pm ET | Report spam
It is useful to search an augmented rbtree based on the augmented
data, ie. not using the sort key as the primary search criterium.
However, we may still need to limit our search to a sub-part of the
whole tree, using the sort key as limiters where we can search.

In that case, we may need to stop searching in one part of the tree,
and continue the search at the nearest (great-?)uncle node in a particular
direction.

Add helper functions to find the nearest uncle node.

Signed-off-by: Rik van Riel

include/linux/rbtree.h | 4 ++++
lib/rbtree.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 661288d..a74b74b 100644
a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -169,6 +169,10 @@ extern struct rb_node *rb_prev(const struct rb_node *);
extern struct rb_node *rb_first(const struct rb_root *);
extern struct rb_node *rb_last(const struct rb_root *);

+/* Find the prev or next uncle of a node in the desired direction. */
+extern struct rb_node *rb_find_prev_uncle(struct rb_node *);
+extern struct rb_node *rb_find_next_uncle(struct rb_node *);
+
/* Fast replacement of a single node without remove/rebalance/add/rebalance */
extern void rb_replace_node(struct rb_node *victim, struct rb_node *new,
struct rb_root *root);
diff --git a/lib/rbtree.c b/lib/rbtree.c
index d417556..08c16d8 100644
a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -437,6 +437,52 @@ struct rb_node *rb_prev(const struct rb_node *node)
}
EXPORT_SYMBOL(rb_prev);

+/*
+ * rb_find_{prev,next}_uncle - Find the nearest "uncle" node in a direction
+ *
+ * An "uncle" node is a sibling node of a parent or grandparent. These
+ * functions walk up the tree to the nearest uncle of this node in the
+ * desired direction.
+ *
+ * G
+ * / \
+ * P U
+ * \
+ * N
+ * This is necessary when searching for something in a bounded subset
+ * of an augmented rbtree, when the primary search criterium is the
+ * augmented data, and not the sort key.
+ */
+struct rb_node *rb_find_prev_uncle(struct rb_node *node)
+{
+ struct rb_node *prev;
+
+ while ((prev = node) && (node = rb_parent(node))) {
+ if (prev == node->rb_left)
+ continue;
+
+ if (node->rb_left)
+ return node->rb_left;
+ }
+
+ return NULL;
+}
+
+struct rb_node *rb_find_next_uncle(struct rb_node *node)
+{
+ struct rb_node *prev;
+
+ while ((prev = node) && (node = rb_parent(node))) {
+ if (prev == node->rb_right)
+ continue;
+
+ if (node->rb_right)
+ return node->rb_right;
+ }
+
+ return NULL;
+}
+
void rb_replace_node(struct rb_node *victim, struct rb_node *new,
struct rb_root *root)
{
1.7.7.6

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#4 Rik van Riel
June 21st, 2012 - 06:10 pm ET | Report spam
The generic arch_get_unmapped_area(_topdown) should now be able
to do everything x86 needs. Remove the x86 specific functions.

TODO: make the hugetlbfs arch_get_unmapped_area call the generic
code with proper alignment info.

Cc: Andi Kleen
Signed-off-by: Rik van Riel

arch/x86/include/asm/pgtable_64.h | 2 -
arch/x86/kernel/sys_x86_64.c | 162 -
2 files changed, 0 insertions(+), 164 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 8408ccd..0ff6500 100644
a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -167,8 +167,6 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
extern int kern_addr_valid(unsigned long addr);
extern void cleanup_highmap(void);

-#define HAVE_ARCH_UNMAPPED_AREA
-#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
#define HAVE_ARCH_GET_ADDRESS_RANGE
#define HAVE_ARCH_ALIGN_ADDR

diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index c059c19..ee03b18 100644
a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -135,165 +135,3 @@ void arch_get_address_range(unsigned long flags, unsigned long *begin,
*end = current->mm->mmap_base;
}
}
-
-unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
-{
- struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
- unsigned long start_addr;
- unsigned long begin, end;
-
- if (flags & MAP_FIXED)
- return addr;
-
- arch_get_address_range(flags, &begin, &end, ALLOC_UP);
-
- if (len > end)
- return -ENOMEM;
-
- if (addr) {
- addr = PAGE_ALIGN(addr);
- vma = find_vma(mm, addr);
- if (end - len >= addr &&
- (!vma || addr + len <= vma->vm_start))
- return addr;
- }
- if (((flags & MAP_32BIT) || test_thread_flag(TIF_ADDR32))
- && len <= mm->cached_hole_size) {
- mm->cached_hole_size = 0;
- mm->free_area_cache = begin;
- }
- addr = mm->free_area_cache;
- if (addr < begin)
- addr = begin;
- start_addr = addr;
-
-full_search:
-
- addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_UP);
-
- for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
- /* At this point: (!vma || addr < vma->vm_end). */
- if (end - len < addr) {
- /*
- * Start a new search - just in case we missed
- * some holes.
- */
- if (start_addr != begin) {
- start_addr = addr = begin;
- mm->cached_hole_size = 0;
- goto full_search;
- }
- return -ENOMEM;
- }
- if (!vma || addr + len <= vma->vm_start) {
- /*
- * Remember the place where we stopped the search:
- */
- mm->free_area_cache = addr + len;
- return addr;
- }
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
-
- addr = vma->vm_end;
- addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_UP);
- }
-}
-
-
-unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
- const unsigned long len, const unsigned long pgoff,
- const unsigned long flags)
-{
- struct vm_area_struct *vma;
- struct mm_struct *mm = current->mm;
- unsigned long addr = addr0, start_addr;
-
- /* requested length too big for entire address space */
- if (len > TASK_SIZE)
- return -ENOMEM;
-
- if (flags & MAP_FIXED)
- return addr;
-
- /* for MAP_32BIT mappings we force the legact mmap base */
- if (!test_thread_flag(TIF_ADDR32) && (flags & MAP_32BIT))
- goto bottomup;
-
- /* requesting a specific address */
- if (addr) {
- addr = PAGE_ALIGN(addr);
- vma = find_vma(mm, addr);
- if (TASK_SIZE - len >= addr &&
- (!vma || addr + len <= vma->vm_start))
- return addr;
- }
-
- /* check if free_area_cache is useful for us */
- if (len <= mm->cached_hole_size) {
- mm->cached_hole_size = 0;
- mm->free_area_cache = mm->mmap_base;
- }
-
-try_again:
- /* either no address requested or can't fit in requested address hole */
- start_addr = addr = mm->free_area_cache;
-
- if (addr < len)
- goto fail;
-
- addr -= len;
- do {
- addr = arch_align_addr(addr, filp, pgoff, flags, ALLOC_DOWN);
-
- /*
- * Lookup failure means no vma is above this address,
- * else if new region fits below vma->vm_start,
- * return with success:
- */
- vma = find_vma(mm, addr);
- if (!vma || addr+len <= vma->vm_start)
- /* remember the address as a hint for next time */
- return mm->free_area_cache = addr;
-
- /* remember the largest hole we saw so far */
- if (addr + mm->cached_hole_size < vma->vm_start)
- mm->cached_hole_size = vma->vm_start - addr;
-
- /* try just below the current vma->vm_start */
- addr = vma->vm_start-len;
- } while (len < vma->vm_start);
-
-fail:
- /*
- * if hint left us with no space for the requested
- * mapping then try again:
- */
- if (start_addr != mm->mmap_base) {
- mm->free_area_cache = mm->mmap_base;
- mm->cached_hole_size = 0;
- goto try_again;
- }
-
-bottomup:
- /*
- * A failed mmap() very likely causes application failure,
- * so fall back to the bottom-up function here. This scenario
- * can happen with large stack limits and large mmap()
- * allocations.
- */
- mm->cached_hole_size = ~0UL;
- mm->free_area_cache = TASK_UNMAPPED_BASE;
- addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
- /*
- * Restore the topdown base:
- */
- mm->free_area_cache = mm->mmap_base;
- mm->cached_hole_size = ~0UL;
-
- return addr;
-}
1.7.7.6

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#5 Peter Zijlstra
June 22nd, 2012 - 06:00 am ET | Report spam
On Thu, 2012-06-21 at 17:57 -0400, Rik van Riel wrote:
+/*
+ * Use the augmented rbtree code to propagate info on the largest
+ * free gap between VMAs up the VMA rbtree.
+ */
+static void adjust_free_gap(struct vm_area_struct *vma)
+{
+ rb_augment_erase_end(&vma->vm_rb, vma_rb_augment_cb, NULL);
+}



I was more thinking along the lines of:

/*
* Abuse rb_augment_erase_end() to propagate a modification up
* the tree by pretending the modified node is the deepest node
* still in the tree.
*/


Alternatively, we could add rb_augment_mod() or somesuch.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
Help Create a new topicNext page Replies Make a reply
Search Make your own search