[PATCH 0/6 v4] pagemap handles transparent hugepage

January 27th, 2012 - 06:10 pm ET by Naoya Horiguchi
Hi,

I rebased the patchset onto 3.3-rc1, and made some fixes to the thp
optimization patch based on feedback from Andrea.

Naoya Horiguchi (6):
pagemap: avoid splitting thp when reading
thp: optimize away unnecessary page table locking
pagemap: export KPF_THP
pagemap: document KPF_THP and make page-types aware of
introduce thp_ptep_get()
pagemap: introduce data structure for pagemap entry

Documentation/vm/page-types.c | 2 +
Documentation/vm/pagemap.txt | 4 +
arch/x86/include/asm/pgtable.h | 5 ++
fs/proc/page.c | 2 +
fs/proc/task_mmu.c | 135 +++++++++++++++++++++-
include/asm-generic/pgtable.h | 4 +
include/linux/huge_mm.h | 17 +++++
include/linux/kernel-page-flags.h | 1 +
mm/huge_memory.c | 120 +++++++++++++++--
9 files changed, 169 insertions(+), 121 deletions(-)

Thanks,
Naoya
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Replies

#1 Naoya Horiguchi
January 27th, 2012 - 06:10 pm ET
With this patch, page-types, which is a common user of pagemap, becomes
aware of thp. This helps system admins and kernel hackers understand
how thp works.
Here is a sample output of page-types over a thp:

$ page-types -p <pid> --raw --list

voffset offset len flags
...
7f9d40200 3f8400 1 ___U_lA____Ma_bH______t____________
7f9d40201 3f8401 1ff ________________T_____t____________

flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000410000 511 1 ________________T_____t____________ compound_tail,thp
0x000000000040d868 1 0 ___U_lA____Ma_bH______t____________ uptodate,lru,active,mmap,anonymous,swapbacked,compound_head,thp
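For readers decoding the "flags" column by hand: the value is the /proc/kpageflags bitmask, so 0x410000 is bit 16 (compound_tail) plus bit 22 (thp). A minimal userspace sketch of that decoding, assuming the long names below match page-types' table for bits 0-22; kpf_decode() is a hypothetical helper, not part of page-types:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Long names for KPF bits 0..21 (include/linux/kernel-page-flags.h)
 * plus KPF_THP (22) from this series.  Assumed to match page-types. */
static const char *const kpf_long_names[] = {
	"locked", "error", "referenced", "uptodate", "dirty", "lru",
	"active", "slab", "writeback", "reclaim", "buddy", "mmap",
	"anonymous", "swapcache", "swapbacked", "compound_head",
	"compound_tail", "huge", "unevictable", "hwpoison", "nopage",
	"ksm", "thp",
};

#define KPF_NR_DECODED (sizeof(kpf_long_names) / sizeof(kpf_long_names[0]))

/* Writes the names of all set bits (0..22) into buf as a comma-separated
 * list, like the long-symbolic-flags column.  Returns 0 on success,
 * -1 if buf is too small. */
static int kpf_decode(uint64_t flags, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (size_t bit = 0; bit < KPF_NR_DECODED; bit++) {
		if (!(flags & (1ULL << bit)))
			continue;
		/* room for an optional comma, the name and the final NUL */
		size_t need = strlen(kpf_long_names[bit]) + (used ? 1 : 0);
		if (used + need + 1 > len)
			return -1;
		if (used)
			buf[used++] = ',';
		strcpy(buf + used, kpf_long_names[bit]);
		used += strlen(kpf_long_names[bit]);
	}
	return 0;
}
```

Decoding 0x40d868 the same way gives the uptodate,lru,active,mmap,anonymous,swapbacked,compound_head,thp list shown above.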

Signed-off-by: Naoya Horiguchi
Acked-by: Wu Fengguang
Reviewed-by: KAMEZAWA Hiroyuki
Acked-by: KOSAKI Motohiro

Changes since v1:
- fix misused word

Documentation/vm/page-types.c | 2 ++
Documentation/vm/pagemap.txt | 4 ++++
2 files changed, 6 insertions(+), 0 deletions(-)

diff --git 3.3-rc1.orig/Documentation/vm/page-types.c 3.3-rc1/Documentation/vm/page-types.c
index 7445caa..0b13f02 100644
--- 3.3-rc1.orig/Documentation/vm/page-types.c
+++ 3.3-rc1/Documentation/vm/page-types.c
@@ -98,6 +98,7 @@
#define KPF_HWPOISON 19
#define KPF_NOPAGE 20
#define KPF_KSM 21
+#define KPF_THP 22

/* [32-] kernel hacking assistances */
#define KPF_RESERVED 32
@@ -147,6 +148,7 @@ static const char *page_flag_names[] = {
[KPF_HWPOISON] = "X:hwpoison",
[KPF_NOPAGE] = "n:nopage",
[KPF_KSM] = "x:ksm",
+ [KPF_THP] = "t:thp",

[KPF_RESERVED] = "r:reserved",
[KPF_MLOCKED] = "m:mlocked",
diff --git 3.3-rc1.orig/Documentation/vm/pagemap.txt 3.3-rc1/Documentation/vm/pagemap.txt
index df09b96..4600cbe 100644
--- 3.3-rc1.orig/Documentation/vm/pagemap.txt
+++ 3.3-rc1/Documentation/vm/pagemap.txt
@@ -60,6 +60,7 @@ There are three components to pagemap:
19. HWPOISON
20. NOPAGE
21. KSM
+ 22. THP

Short descriptions to the page flags:

@@ -97,6 +98,9 @@ Short descriptions to the page flags:
21. KSM
identical memory pages dynamically shared between one or more processes

+22. THP
+ contiguous pages which construct transparent hugepages
+
[IO related page flags]
1. ERROR IO error occurred
3. UPTODATE page has up-to-date data
1.7.7.6

#2 Naoya Horiguchi
January 27th, 2012 - 06:10 pm ET
This flag shows that a given page is a subpage of a transparent hugepage.
It helps us debug and test the kernel by showing the physical address of a thp.
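/proc/kpageflags is an array of 64-bit flag words indexed by PFN, so checking whether a physical page belongs to a thp is one 8-byte read at offset pfn * 8 followed by a test of bit 22. A sketch under that assumption; kpageflags_read() and kpageflags_pfn_is_thp() are hypothetical helper names:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define KPF_THP 22  /* bit added by this series */

/* Reads the flag word for 'pfn' from an open kpageflags-style file
 * (u64 array indexed by PFN).  Returns 0 on success, -1 on error;
 * a short read (e.g. past the last PFN) is reported as EIO. */
static int kpageflags_read(int fd, uint64_t pfn, uint64_t *flags)
{
	ssize_t n = pread(fd, flags, sizeof(*flags),
			  (off_t)(pfn * sizeof(*flags)));

	if (n == (ssize_t)sizeof(*flags))
		return 0;
	if (n >= 0)
		errno = EIO;
	return -1;
}

/* Returns 1 if 'pfn' is part of a thp, 0 if not, -1 on error. */
static int kpageflags_pfn_is_thp(const char *path, uint64_t pfn)
{
	uint64_t flags;
	int fd = open(path, O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;
	ret = kpageflags_read(fd, pfn, &flags);
	close(fd);
	if (ret < 0)
		return -1;
	return (int)((flags >> KPF_THP) & 1);
}
```

On a real system the path would be "/proc/kpageflags", which is normally readable only by root; the PFN typically comes from a /proc/pid/pagemap entry.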

Signed-off-by: Naoya Horiguchi
Reviewed-by: Wu Fengguang
Reviewed-by: KAMEZAWA Hiroyuki
Acked-by: KOSAKI Motohiro

Changes since v2:
- replace if with else-if so as not to set KPF_THP for hugetlbfs pages

Changes since v1:
- remove unnecessary ifdefs
- fix confusing patch description

fs/proc/page.c | 2 ++
include/linux/kernel-page-flags.h | 1 +
2 files changed, 3 insertions(+), 0 deletions(-)

diff --git 3.3-rc1.orig/fs/proc/page.c 3.3-rc1/fs/proc/page.c
index 6d8e6a9..7fcd0d6 100644
--- 3.3-rc1.orig/fs/proc/page.c
+++ 3.3-rc1/fs/proc/page.c
@@ -115,6 +115,8 @@ u64 stable_page_flags(struct page *page)
u |= 1 << KPF_COMPOUND_TAIL;
if (PageHuge(page))
u |= 1 << KPF_HUGE;
+ else if (PageTransCompound(page))
+ u |= 1 << KPF_THP;

/*
* Caveats on high order pages: page->_count will only be set
diff --git 3.3-rc1.orig/include/linux/kernel-page-flags.h 3.3-rc1/include/linux/kernel-page-flags.h
index bd92a89..26a6571 100644
--- 3.3-rc1.orig/include/linux/kernel-page-flags.h
+++ 3.3-rc1/include/linux/kernel-page-flags.h
@@ -30,6 +30,7 @@
#define KPF_NOPAGE 20

#define KPF_KSM 21
+#define KPF_THP 22

/* kernel hacking assistances
* WARNING: subject to change, never rely on them!
1.7.7.6
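The reason for the else-if (see "Changes since v2"): with CONFIG_TRANSPARENT_HUGEPAGE, PageTransCompound() is true for any compound page, hugetlbfs pages included, so a plain if would tag hugetlbfs pages with both KPF_HUGE and KPF_THP. A small userspace model of the flag logic; struct page_model and compound_flags() are hypothetical stand-ins for struct page and the relevant part of stable_page_flags():

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as in include/linux/kernel-page-flags.h. */
#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16
#define KPF_HUGE          17
#define KPF_THP           22

/* Hypothetical stand-in for the struct page predicates; not a kernel type. */
struct page_model {
	bool head;            /* PageHead() */
	bool tail;            /* PageTail() */
	bool hugetlb;         /* PageHuge(): hugetlbfs page */
	bool trans_compound;  /* PageTransCompound(): any compound page with THP on */
};

static uint64_t compound_flags(const struct page_model *p)
{
	uint64_t u = 0;

	if (p->head)
		u |= 1ULL << KPF_COMPOUND_HEAD;
	if (p->tail)
		u |= 1ULL << KPF_COMPOUND_TAIL;
	/* hugetlbfs pages are compound too; the else-if keeps KPF_THP off. */
	if (p->hugetlb)
		u |= 1ULL << KPF_HUGE;
	else if (p->trans_compound)
		u |= 1ULL << KPF_THP;
	return u;
}
```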

#3 Naoya Horiguchi
January 27th, 2012 - 06:10 pm ET
Splitting a thp is not necessary if we explicitly check whether a pmd
maps a thp. This patch introduces that check and adds code to generate
pagemap entries directly for pmds mapping thps, which reduces the
performance impact of reading pagemap on thp.
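For reference when checking the output of this patch, a /proc/pid/pagemap entry of this kernel generation packs the PFN into bits 0-54, the page shift into bits 55-60, "swapped" into bit 62 and "present" into bit 63 (per Documentation/vm/pagemap.txt). A minimal decoder sketch; struct pme_info, pme_decode() and pagemap_offset() are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of a pagemap entry as documented for this era. */
#define PME_PFN_MASK     ((1ULL << 55) - 1)  /* bits 0-54 */
#define PME_SHIFT_OFFSET 55                  /* bits 55-60 */
#define PME_SHIFT_MASK   0x3fULL
#define PME_SWAPPED      (1ULL << 62)
#define PME_PRESENT      (1ULL << 63)

struct pme_info {
	bool present;
	bool swapped;
	unsigned page_shift;
	uint64_t pfn;  /* meaningful only when present */
};

static struct pme_info pme_decode(uint64_t pme)
{
	struct pme_info info;

	info.present = (pme & PME_PRESENT) != 0;
	info.swapped = (pme & PME_SWAPPED) != 0;
	info.page_shift = (unsigned)((pme >> PME_SHIFT_OFFSET) & PME_SHIFT_MASK);
	info.pfn = info.present ? (pme & PME_PFN_MASK) : 0;
	return info;
}

/* Byte offset of the entry for 'vaddr' inside /proc/pid/pagemap. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}
```

With this patch, thp-backed addresses still get one entry per base page with a page shift of PAGE_SHIFT, since thp_pmd_to_pagemap_entry() uses PM_PSHIFT(PAGE_SHIFT).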

Signed-off-by: Naoya Horiguchi
Reviewed-by: Andi Kleen
Reviewed-by: KAMEZAWA Hiroyuki

Changes since v3:
- Generate pagemap entry directly from pmd to avoid messy casting

Changes since v2:
- Add comment on if check in thp_pte_to_pagemap_entry()
- Convert type of offset into unsigned long

Changes since v1:
- Move pfn declaration to the beginning of pagemap_pte_range()

fs/proc/task_mmu.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 47 insertions(+), 6 deletions(-)

diff --git 3.3-rc1.orig/fs/proc/task_mmu.c 3.3-rc1/fs/proc/task_mmu.c
index e418c5a..cfbba8d 100644
--- 3.3-rc1.orig/fs/proc/task_mmu.c
+++ 3.3-rc1/fs/proc/task_mmu.c
@@ -600,6 +600,9 @@ struct pagemapread {
u64 *buffer;
};

+#define PAGEMAP_WALK_SIZE (PMD_SIZE)
+#define PAGEMAP_WALK_MASK (PMD_MASK)
+
#define PM_ENTRY_BYTES sizeof(u64)
#define PM_STATUS_BITS 3
#define PM_STATUS_OFFSET (64 - PM_STATUS_BITS)
@@ -658,6 +661,27 @@ static u64 pte_to_pagemap_entry(pte_t pte)
return pme;
}

+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static u64 thp_pmd_to_pagemap_entry(pmd_t pmd, int offset)
+{
+ u64 pme = 0;
+ /*
+ * Currently pmd for thp is always present because thp can not be
+ * swapped-out, migrated, or HWPOISONed (split in such cases instead.)
+ * This if-check is just to prepare for future implementation.
+ */
+ if (pmd_present(pmd))
+ pme = PM_PFRAME(pmd_pfn(pmd) + offset)
+ | PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
+ return pme;
+}
+#else
+static inline u64 thp_pmd_to_pagemap_entry(pmd_t pmd, int offset)
+{
+ return 0;
+}
+#endif
+
static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
struct mm_walk *walk)
{
@@ -665,14 +689,33 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
struct pagemapread *pm = walk->private;
pte_t *pte;
int err = 0;
-
- split_huge_page_pmd(walk->mm, pmd);
+ u64 pfn = PM_NOT_PRESENT;

/* find the first VMA at or above 'addr' */
vma = find_vma(walk->mm, addr);
- for (; addr != end; addr += PAGE_SIZE) {
- u64 pfn = PM_NOT_PRESENT;

+ spin_lock(&walk->mm->page_table_lock);
+ if (pmd_trans_huge(*pmd)) {
+ if (pmd_trans_splitting(*pmd)) {
+ spin_unlock(&walk->mm->page_table_lock);
+ wait_split_huge_page(vma->anon_vma, pmd);
+ } else {
+ for (; addr != end; addr += PAGE_SIZE) {
+ unsigned long offset = (addr & ~PAGEMAP_WALK_MASK)
+ >> PAGE_SHIFT;
+ pfn = thp_pmd_to_pagemap_entry(*pmd, offset);
+ err = add_to_pagemap(addr, pfn, pm);
+ if (err)
+ break;
+ }
+ spin_unlock(&walk->mm->page_table_lock);
+ return err;
+ }
+ } else {
+ spin_unlock(&walk->mm->page_table_lock);
+ }
+
+ for (; addr != end; addr += PAGE_SIZE) {
/* check to see if we've left 'vma' behind
* and need a new, higher one */
if (vma && (addr >= vma->vm_end))
@@ -754,8 +797,6 @@ static int pagemap_hugetlb_range(pte_t *pte, unsigned long hmask,
* determine which areas of memory are actually mapped and llseek to
* skip over unmapped regions.
*/
-#define PAGEMAP_WALK_SIZE (PMD_SIZE)
-#define PAGEMAP_WALK_MASK (PMD_MASK)
static ssize_t pagemap_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
1.7.7.6
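To see what thp_pmd_to_pagemap_entry() and the offset calculation in pagemap_pte_range() produce, the arithmetic can be replayed in userspace. This sketch assumes x86-64 with 4 KiB pages and 2 MiB pmds; the PM_* macros are modeled on task_mmu.c of this era rather than copied from it, and thp_entry_model() is a hypothetical name:

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages, 2 MiB pmds (x86-64). */
#define PAGE_SHIFT        12
#define PMD_SHIFT         21
#define PMD_MASK          (~((1ULL << PMD_SHIFT) - 1))
#define PAGEMAP_WALK_MASK PMD_MASK

/* Modeled on the pagemap entry macros in fs/proc/task_mmu.c. */
#define PM_STATUS_BITS    3
#define PM_STATUS_OFFSET  (64 - PM_STATUS_BITS)
#define PM_PSHIFT_BITS    6
#define PM_PSHIFT_OFFSET  (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
#define PM_PSHIFT_MASK    (((1ULL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
#define PM_PSHIFT(x)      (((uint64_t)(x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
#define PM_PFRAME_MASK    ((1ULL << PM_PSHIFT_OFFSET) - 1)
#define PM_PFRAME(x)      ((uint64_t)(x) & PM_PFRAME_MASK)
#define PM_PRESENT        (4ULL << PM_STATUS_OFFSET)  /* bit 63 */

/* Entry for the base page at 'addr' inside a present thp whose head page
 * has frame number 'head_pfn' (what pmd_pfn(*pmd) would return). */
static uint64_t thp_entry_model(uint64_t head_pfn, uint64_t addr)
{
	/* index of the base page within the 2 MiB region */
	unsigned long offset =
		(unsigned long)((addr & ~PAGEMAP_WALK_MASK) >> PAGE_SHIFT);

	return PM_PFRAME(head_pfn + offset) | PM_PSHIFT(PAGE_SHIFT) | PM_PRESENT;
}
```

Feeding it the head PFN 0x3f8400 and the pmd-aligned address 0x7f9d40200000 from the page-types sample (voffset there is a page index) gives PFNs 0x3f8400 through 0x3f85ff, i.e. the head page plus the 0x1ff tail pages listed in that output.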

#4 Hillf Danton
January 28th, 2012 - 06:30 am ET
Hi Naoya

On Sat, Jan 28, 2012 at 7:02 AM, Naoya Horiguchi wrote:
Currently, when we check whether we can handle a thp as it is or need to
split it into regular-sized pages, we take the page table lock before
checking whether a given pmd maps a thp. Because of this, when the pmd is
not a "huge pmd" we suffer unnecessary lock/unlock overhead.
To remove it, this patch introduces an optimized check function and
replaces several similar pieces of logic with it.

Signed-off-by: Naoya Horiguchi
Cc: David Rientjes

Changes since v3:
 - Fix likely/unlikely pattern in pmd_trans_huge_stable()
 - Change suffix from _stable to _lock
 - Introduce __pmd_trans_huge_lock() to avoid micro-regression
 - Return 1 when wait_split_huge_page path is taken

Changes since v2:
 - Fix missing "return 0" in "thp under splitting" path
 - Remove unneeded comment
 - Change the name of check function to describe what it does
 - Add VM_BUG_ON(mmap_sem)

 fs/proc/task_mmu.c      |   70 +++++++++
 include/linux/huge_mm.h |   17 +++++++
 mm/huge_memory.c        |  120 ++++++++++++++++++++++-
 3 files changed, 96 insertions(+), 111 deletions(-)
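The locking change boils down to: peek at the pmd without the lock, and only if it looks huge take page_table_lock and re-check, since the pmd can change in between. A userspace model of that shape, with the lock replaced by counters so the lock traffic can be checked. struct pmd_model and both functions are hypothetical, and the return codes (1 = stable thp with lock held, -1 = was splitting and we waited, 0 = not huge) are this sketch's own convention, which may differ from the posted patch in the splitting case:

```c
#include <stdbool.h>

/* Hypothetical model of one pmd slot plus its page_table_lock.  The lock
 * is a pair of counters so the sketch can check balance and traffic. */
struct pmd_model {
	bool trans_huge;        /* pmd_trans_huge(*pmd) */
	bool splitting;         /* pmd_trans_splitting(*pmd) */
	int lock_depth;         /* > 0 while the lock is held */
	int lock_acquisitions;  /* number of times the lock was taken */
};

static void pt_lock(struct pmd_model *p)
{
	p->lock_depth++;
	p->lock_acquisitions++;
}

static void pt_unlock(struct pmd_model *p)
{
	p->lock_depth--;
}

/* Slow path: re-check under the lock, because the unlocked peek may race. */
static int trans_huge_lock_slow(struct pmd_model *p)
{
	pt_lock(p);
	if (p->trans_huge) {
		if (p->splitting) {
			pt_unlock(p);
			/* stands in for wait_split_huge_page(): split finished */
			p->splitting = false;
			p->trans_huge = false;
			return -1;
		}
		return 1;  /* stable thp: caller must unlock */
	}
	pt_unlock(p);
	return 0;
}

/* Fast path: ordinary pmds never touch the lock. */
static int trans_huge_lock(struct pmd_model *p)
{
	if (!p->trans_huge)
		return 0;
	return trans_huge_lock_slow(p);
}
```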



[...]

@@ -1064,21 +1056,14 @@ int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 {
       int ret = 0;

-       spin_lock(&vma->vm_mm->page_table_lock);
-       if (likely(pmd_trans_huge(*pmd))) {
-               ret = !pmd_trans_splitting(*pmd);



Here the value of ret is either false or true,

-               spin_unlock(&vma->vm_mm->page_table_lock);
-               if (unlikely(!ret))
-                       wait_split_huge_page(vma->anon_vma, pmd);
-               else {
-                       /*
-                        * All logical pages in the range are present
-                        * if backed by a huge page.
-                        */
-                       memset(vec, 1, (end - addr) >> PAGE_SHIFT);
-               }
-       } else
+       if (__pmd_trans_huge_lock(pmd, vma) == 1) {
+               /*
+                * All logical pages in the range are present
+                * if backed by a huge page.
+                */
               spin_unlock(&vma->vm_mm->page_table_lock);
+               memset(vec, 1, (end - addr) >> PAGE_SHIFT);
+       }

       return ret;



What is the return value of this function? /Hillf

 }
@@ -1108,20 +1093,10 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
               goto out;
       }
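Hillf's point, modeled in userspace: after the conversion nothing ever sets ret, so the function returns 0 even when it has filled vec from the huge pmd. A sketch of the presumably intended behavior, assuming the caller treats a nonzero return as "range handled, skip the pte-level walk"; mincore_huge_model() is hypothetical and takes the locking helper's result as a parameter instead of calling it:

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB base pages */

/* Hypothetical model of mincore_huge_pmd() after the helper conversion.
 * 'lock_ret' is the locking helper's result; 1 means "stable thp, page
 * table lock held".  Returns 1 when vec was filled from the huge pmd. */
static int mincore_huge_model(int lock_ret, unsigned long addr,
			      unsigned long end, unsigned char *vec)
{
	int ret = 0;

	if (lock_ret == 1) {
		/* kernel: spin_unlock(&vma->vm_mm->page_table_lock) here */
		/* All logical pages in the range are present if backed by a thp. */
		memset(vec, 1, (end - addr) >> PAGE_SHIFT);
		ret = 1;  /* the assignment Hillf's question points at */
	}
	return ret;
}
```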


#5 Hillf Danton
January 29th, 2012 - 08:20 am ET
Hi Naoya

On Sat, Jan 28, 2012 at 7:02 AM, Naoya Horiguchi wrote:
Thp split is not necessary if we explicitly check whether pmds are
mapping thps or not. This patch introduces this check and adds code
to generate pagemap entries for pmds mapping thps, which results in
less performance impact of pagemap on thp.




Could the method proposed here cover the two cases of split THP in mem cgroup?

Thanks
Hillf