[RFC] [PATCH v2 0/8] Provide cgroup isolation for buffered writes.

March 22nd, 2011 - 06:20 pm ET by Justin TerAvest
This patchset adds tracking to the page_cgroup structure for which cgroup has
dirtied a page, and uses that information to provide isolation between
cgroups performing writeback.

I know that there is some discussion about removing request descriptor limits
entirely, but I included a patch that introduces per-cgroup limits to enable
this functionality. Without it, we didn't see much isolation improvement.
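A minimal userspace sketch of what per-cgroup request descriptor limits mean. It assumes each group gets its own allocation counters and its own limit. A group that has used up its descriptors is refused (the kernel would instead sleep until one is freed), so a backlog of writeback requests in one cgroup cannot starve another cgroup of descriptors. `struct rq_group`, `group_try_get_request()` and `group_put_request()` are hypothetical names for illustration, not the interface added by the "block: Per cgroup request descriptor counts" patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group accounting, loosely modeled on a request list. */
struct rq_group {
	unsigned int count[2];    /* allocated descriptors: [0] = read, [1] = write */
	unsigned int nr_requests; /* per-group limit for each direction */
};

/* Take one descriptor for the group, or fail if the group is at its limit. */
static bool group_try_get_request(struct rq_group *grp, int is_write)
{
	assert(is_write == 0 || is_write == 1);
	if (grp->count[is_write] >= grp->nr_requests)
		return false;	/* a real allocator would put the task to sleep here */
	grp->count[is_write]++;
	return true;
}

/* Give a descriptor back to its group when the request is freed. */
static void group_put_request(struct rq_group *grp, int is_write)
{
	assert(is_write == 0 || is_write == 1);
	assert(grp->count[is_write] > 0);
	grp->count[is_write]--;
}
```

With a limit of 2, a third write allocation in the "writers" group fails, while "readers" can still allocate from its own budget.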

I think most of this material has been discussed on lkml previously; this is
just another attempt to make a patchset that handles buffered writes for CFQ.

There was a lot of previous discussion at:
http://thread.gmane.org/gmane.linux.kernel/1007922

Thanks to Andrea Righi, Kamezawa Hiroyuki, Munehiro Ikeda, Nauman Rafique,
and Vivek Goyal for work on previous versions of these patches.

For version 2:
- I collected more statistics and provided data in the cover sheet
- blkio id is now stored inside "flags" in page_cgroup, with cmpxchg
- I cleaned up some patch names
- Added symmetric reference wrappers in cfq-iosched
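Storing the blkio id inside page_cgroup's "flags" word and updating it with cmpxchg can be pictured with the following userspace sketch using C11 atomics. It assumes the id occupies the bits above a hypothetical `PCG_BLKIO_SHIFT`. It also assumes the low bits are ordinary flag bits that other code may update concurrently. The compare-and-exchange loop rewrites only the id bits and retries if the word changed underneath it. The struct, macro and function names are illustrative only and do not claim to match include/linux/page_cgroup.h in this series.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical layout: flag bits below the shift, blkio id above it. */
#define PCG_BLKIO_SHIFT 16
#define PCG_FLAGS_MASK  ((1UL << PCG_BLKIO_SHIFT) - 1)

struct page_cgroup_sketch {
	_Atomic unsigned long flags;
};

/* Replace the id bits without losing concurrent updates to the flag bits. */
static void pc_set_blkio_id(struct page_cgroup_sketch *pc, unsigned long id)
{
	unsigned long old = atomic_load(&pc->flags);
	unsigned long new;

	assert(id <= (~0UL >> PCG_BLKIO_SHIFT));	/* id must fit above the flags */
	do {
		new = (old & PCG_FLAGS_MASK) | (id << PCG_BLKIO_SHIFT);
		/* On failure, 'old' is refreshed with the current value; retry. */
	} while (!atomic_compare_exchange_weak(&pc->flags, &old, new));
}

/* Read back the id by discarding the flag bits. */
static unsigned long pc_get_blkio_id(struct page_cgroup_sketch *pc)
{
	return atomic_load(&pc->flags) >> PCG_BLKIO_SHIFT;
}
```

Packing the id into an existing word avoids growing page_cgroup, at the cost of limiting how many ids can be represented.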

There are a couple of lingering issues in this patchset; it's meant to be
an RFC to discuss the overall design for tracking of buffered writes.
I still have at least a couple of patches to finish to make absolutely sure
that refcounts and locking are handled properly; I just need to do more testing.

Documentation/block/biodoc.txt | 10 +
block/blk-cgroup.c | 203 +++++++++++++++++-
block/blk-cgroup.h | 9 +-
block/blk-core.c | 218 +++++++++++++
block/blk-settings.c | 2 +-
block/blk-sysfs.c | 59 +++
block/cfq-iosched.c | 473 ++++++++++++++++++++++++++++++-
block/cfq.h | 6 +-
block/elevator.c | 7 +-
fs/buffer.c | 2 +
fs/direct-io.c | 2 +
include/linux/blk_types.h | 2 +
include/linux/blkdev.h | 81 +++++++-
include/linux/blkio-track.h | 89 ++++++++
include/linux/elevator.h | 14 +-
include/linux/iocontext.h | 1 +
include/linux/memcontrol.h | 6 +
include/linux/mmzone.h | 4 +-
include/linux/page_cgroup.h | 38 +++-
init/Kconfig | 16 ++
mm/Makefile | 3 +-
mm/bounce.c | 2 +
mm/filemap.c | 2 +
mm/memcontrol.c | 6 +
mm/memory.c | 6 +
mm/page-writeback.c | 14 +-
mm/page_cgroup.c | 29 ++-
mm/swap_state.c | 2 +
28 files changed, 1066 insertions(+), 240 deletions(-)


8f0b0f4 cfq: Don't allow preemption across cgroups
a47cdc6 block: Per cgroup request descriptor counts
8dd7adb cfq: add per cgroup writeout done by flusher stat
1fa0b6d cfq: Fix up tracked async workload length.
e9e85d3 block: Modify CFQ to use IO tracking information.
f8ffb19 cfq-iosched: Make async queues per cgroup
1d9ee09 block,fs,mm: IO cgroup tracking for buffered write
31c7321 cfq-iosched: add symmetric reference wrappers


= Isolation experiment results

For isolation testing, we run a test that's available at:
git://google3-2.osuosl.org/tests/blkcgroup.git

It creates containers, runs workloads, and checks to see how well we meet
isolation targets. For the purposes of this patchset, I only ran
tests among buffered writers.

Before patches
10:32:06 INFO experiment 0 achieved DTFs: 666, 333
10:32:06 INFO experiment 0 FAILED: max observed error is 167, allowed is 150
10:32:51 INFO experiment 1 achieved DTFs: 647, 352
10:32:51 INFO experiment 1 FAILED: max observed error is 253, allowed is 150
10:33:35 INFO experiment 2 achieved DTFs: 298, 701
10:33:35 INFO experiment 2 FAILED: max observed error is 199, allowed is 150
10:34:19 INFO experiment 3 achieved DTFs: 445, 277, 277
10:34:19 INFO experiment 3 FAILED: max observed error is 155, allowed is 150
10:35:05 INFO experiment 4 achieved DTFs: 418, 104, 261, 215
10:35:05 INFO experiment 4 FAILED: max observed error is 232, allowed is 150
10:35:53 INFO experiment 5 achieved DTFs: 213, 136, 68, 102, 170, 136, 170
10:35:53 INFO experiment 5 PASSED: max observed error is 73, allowed is 150
10:36:04 INFO --ran 6 experiments, 1 passed, 5 failed

After patches
11:05:22 INFO experiment 0 achieved DTFs: 501, 498
11:05:22 INFO experiment 0 PASSED: max observed error is 2, allowed is 150
11:06:07 INFO experiment 1 achieved DTFs: 874, 125
11:06:07 INFO experiment 1 PASSED: max observed error is 26, allowed is 150
11:06:53 INFO experiment 2 achieved DTFs: 121, 878
11:06:53 INFO experiment 2 PASSED: max observed error is 22, allowed is 150
11:07:46 INFO experiment 3 achieved DTFs: 589, 205, 204
11:07:46 INFO experiment 3 PASSED: max observed error is 11, allowed is 150
11:08:34 INFO experiment 4 achieved DTFs: 616, 109, 109, 163
11:08:34 INFO experiment 4 PASSED: max observed error is 34, allowed is 150
11:09:29 INFO experiment 5 achieved DTFs: 139, 139, 139, 139, 140, 141, 160
11:09:29 INFO experiment 5 PASSED: max observed error is 1, allowed is 150
11:09:46 INFO --ran 6 experiments, 6 passed, 0 failed
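The logs report disk time fractions (DTFs) in per-mille. The targets are not printed. However, every logged error is consistent with taking the largest absolute difference between an achieved DTF and its target, which is then compared against the allowed value of 150. For example, experiment 0 before the patches achieved 666/333 against an apparent 500/500 split, giving 167. The sketch below computes that metric. The target splits used in the checks (500/500, 900/100, 650/100/100/150) are inferred from the logged numbers, not read from the blkcgroup test source.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Largest absolute gap, in per-mille of disk time, between achieved and
 * target DTFs. This definition is inferred from the logs above.
 */
static int max_dtf_error(const int *achieved, const int *target, size_t n)
{
	int worst = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		int err = abs(achieved[i] - target[i]);
		if (err > worst)
			worst = err;
	}
	return worst;
}
```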

Summary
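Isolation between buffered writers is clearly better with this patchset: all
six experiments pass after the patches (largest observed error 34), compared
with one of six before.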


= Read latency results
To test read latency, I created two containers:
- One called "readers", with weight 900
- One called "writers", with weight 100
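Assuming CFQ group scheduling behaves like an ideal proportional-share scheduler while both groups are backlogged, "readers" should receive about 900/1000 of disk time and "writers" about 100/1000. The hypothetical helper below just does that arithmetic, in per-mille to match the DTF units of the isolation test. In practice CFQ only approximates this split; idling, sync versus async handling, and groups that are not continuously busy all move it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Ideal share of disk time, in per-mille, for each group:
 * weight[i] / sum(weight). Integer division rounds down.
 */
static void weights_to_permille(const unsigned int *weight,
				unsigned int *share, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += weight[i];
	assert(total > 0);
	for (size_t i = 0; i < n; i++)
		share[i] = (unsigned int)((1000UL * weight[i]) / total);
}
```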

I ran this fio workload in "readers":
[global]
directory=/mnt/iostestmnt/fio
runtime=30
time_based=1
group_reporting=1
exec_prerun='echo 3 > /proc/sys/vm/drop_caches'
cgroup_nodelete=1
bs=4K
size=512M

[iostest-read]
description="reader"
numjobs=16
rw=randread
new_group=1


and this fio workload in "writers":
[global]
directory=/mnt/iostestmnt/fio
runtime=30
time_based=1
group_reporting=1
exec_prerun='echo 3 > /proc/sys/vm/drop_caches'
cgroup_nodelete=1
bs=4K
size=512M

[iostest-write]
description="writer"
cgroup=writers
numjobs=3
rw=write
new_group=1



I've pasted the results from the "read" workload inline.

Before patches
Starting 16 processes

Jobs: 14 (f): [_rrrrrr_rrrrrrrr] [36.2% done] [352K/0K /s] [86 /0 iops] [eta 01m:00s]
iostest-read: (groupid=0, jobs=16): err= 0: pid=20606
Description : ["reader"]
read : io=13532KB, bw=455814 B/s, iops=111 , runt= 30400msec
clat (usec): min=2190 , max=30399K, avg=30395175.13, stdev= 0.20
lat (usec): min=2190 , max=30399K, avg=30395177.07, stdev= 0.20
bw (KB/s) : min= 0, max= 260, per=0.00%, avg= 0.00, stdev= 0.00
cpu : usr=0.00%, sys=0.03%, ctx691, majf=2, minf=468
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=3383/0/0, short=0/0/0

lat (msec): 4=0.03%, 10=2.66%, 20=74.84%, 50=21.90%, 100=0.09%
lat (msec): 250=0.06%, >=2000=0.41%

Run status group 0 (all jobs):
READ: io=13532KB, aggrb=445KB/s, minb=455KB/s, maxb=455KB/s, mint=30400msec, maxt=30400msec

Disk stats (read/write):
sdb: ios744/18, merge=0/16, ticks=542713/1675, in_queue=550714, util=99.15%



After patches
Starting 16 processes
Jobs: 16 (f): [rrrrrrrrrrrrrrrr] [100.0% done] [557K/0K /s] [136 /0 iops] [eta 00m:00s]
iostest-read: (groupid=0, jobs=16): err= 0: pid183
Description : ["reader"]
read : io=14940KB, bw=506105 B/s, iops=123 , runt= 30228msec
clat (msec): min=2 , max=29866 , avg=463.42, stdev1.84
lat (msec): min=2 , max=29866 , avg=463.42, stdev1.84
bw (KB/s) : min= 0, max= 198, per1.69%, avg6.52, stdev.83
cpu : usr=0.01%, sys=0.03%, ctx=4274, majf=2, minf=464
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=3735/0/0, short=0/0/0

lat (msec): 4=0.05%, 10=0.32%, 20=32.99%, 50=64.61%, 100=1.26%
lat (msec): 250=0.11%, 500=0.11%, 750=0.16%, 1000=0.05%, >=2000=0.35%

Run status group 0 (all jobs):
READ: io=14940KB, aggrb=494KB/s, minb=506KB/s, maxb=506KB/s, mint=30228msec, maxt=30228msec

Disk stats (read/write):
sdb: ios=4189/0, merge=0/0, ticks=96428/0, in_queue=478798, util=100.00%



Summary
Read latencies are somewhat worse, but this overhead is only imposed when users
ask for this feature by turning on CONFIG_CGROUP_BLKIOTRACK. We expect there to
be something of a latency vs. isolation tradeoff.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Replies

#1 Justin TerAvest
March 22nd, 2011 - 06:20 pm ET
IO tracking bits are used to send IOs to the right async
queue. The current task is still used to identify the cgroup of
synchronous IO, and it is also used if IO tracking is disabled.

Signed-off-by: Justin TerAvest

block/blk-cgroup.c | 6 +-
block/blk-core.c | 7 +-
block/cfq-iosched.c | 175 +++++++++++++++++++++++++++++++++++++++++--
block/elevator.c | 5 +-
include/linux/elevator.h | 6 +-
5 files changed, 172 insertions(+), 27 deletions(-)
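The routing rule in the commit message above is implemented by cfq_get_cfqg_bio() in the diff. It reduces to a small decision, modeled here in userspace C. `struct bio_sketch` and `charge_group()` are hypothetical stand-ins. The two flags take the place of cfq_bio_sync() and bio_has_data(), and `page_owner` stands for the cgroup recorded for the bio's first page. The real code also falls back to the root group when no cgroup can be found for the page; that case is left out of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a bio for this sketch. */
struct bio_sketch {
	bool is_sync;              /* stands in for cfq_bio_sync() */
	bool has_data;             /* stands in for bio_has_data() */
	unsigned long page_owner;  /* cgroup recorded when the first page was dirtied */
};

/*
 * Pick the group to charge: sync IO, data-less bios, requests without a
 * bio, and builds without tracking all use the submitting task's group;
 * only async IO with data is charged to the page's owner.
 */
static unsigned long charge_group(const struct bio_sketch *bio,
				  unsigned long current_task_group,
				  bool tracking_enabled)
{
	if (!bio || !tracking_enabled)
		return current_task_group;
	if (!bio->has_data || bio->is_sync)
		return current_task_group;
	return bio->page_owner;    /* async writeback: charge the dirtier */
}
```

This is why a flusher thread's writeback lands in the dirtying cgroup, while a reader in the same cgroup as the flusher is still charged to its own group.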

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f626c65..9732cfd 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -111,6 +111,9 @@ blkio_policy_search_node(const struct blkio_cgroup *blkcg, dev_t dev,

struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup)
{
+ if (!cgroup)
+ return &blkio_root_cgroup;
+
return container_of(cgroup_subsys_state(cgroup, blkio_subsys_id),
struct blkio_cgroup, css);
}
@@ -1550,6 +1553,7 @@ unsigned long get_blkio_cgroup_id(struct bio *bio)
id = page_cgroup_get_blkio_id(pc);
return id;
}
+EXPORT_SYMBOL(get_blkio_cgroup_id);

/**
* get_cgroup_from_page() - determine the cgroup from a page.
@@ -1576,8 +1580,6 @@ struct cgroup *get_cgroup_from_page(struct page *page)

return css->cgroup;
}
-
-EXPORT_SYMBOL(get_blkio_cgroup_id);
EXPORT_SYMBOL(get_cgroup_from_page);

#endif /* CONFIG_CGROUP_BLKIOTRACK */
diff --git a/block/blk-core.c b/block/blk-core.c
index 5256932..1b7936bf 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -583,7 +583,8 @@ static inline void blk_free_request(struct request_queue *q, struct request *rq)
}

static struct request *
-blk_alloc_request(struct request_queue *q, int flags, int priv, gfp_t gfp_mask)
+blk_alloc_request(struct request_queue *q, struct bio *bio, int flags, int priv,
+ gfp_t gfp_mask)
{
struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask);

@@ -595,7 +596,7 @@ blk_alloc_request(struct request_queue *q, int flags, int priv, gfp_t gfp_mask)
rq->cmd_flags = flags | REQ_ALLOCED;

if (priv) {
- if (unlikely(elv_set_request(q, rq, gfp_mask))) {
+ if (unlikely(elv_set_request(q, rq, bio, gfp_mask))) {
mempool_free(rq, q->rq.rq_pool);
return NULL;
}
@@ -757,7 +758,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
rw_flags |= REQ_IO_STAT;
spin_unlock_irq(q->queue_lock);

- rq = blk_alloc_request(q, rw_flags, priv, gfp_mask);
+ rq = blk_alloc_request(q, bio, rw_flags, priv, gfp_mask);
if (unlikely(!rq)) {
/*
* Allocation failed presumably due to memory. Undo anything
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 011d268..c75bbbf 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -14,6 +14,7 @@
#include <linux/rbtree.h>
#include <linux/ioprio.h>
#include <linux/blktrace_api.h>
+#include <linux/blkio-track.h>
#include "cfq.h"

/*
@@ -309,6 +310,10 @@ static void cfq_get_queue_ref(struct cfq_queue *cfqq);
static void cfq_put_queue_ref(struct cfq_queue *cfqq);

static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
+static struct cfq_group *cfq_get_cfqg_bio(struct cfq_data *cfqd,
+ struct bio *bio, int create);
+static struct cfq_queue **
+cfq_async_queue_prio(struct cfq_group *cfqg, int ioprio_class, int ioprio);

static struct cfq_rb_root *service_tree_for(struct cfq_group *cfqg,
enum wl_prio_t prio,
@@ -451,7 +456,7 @@ static inline int cfqg_busy_async_queues(struct cfq_data *cfqd,
}

static void cfq_dispatch_insert(struct request_queue *, struct request *);
-static struct cfq_queue *cfq_get_queue(struct cfq_data *, bool,
+static struct cfq_queue *cfq_get_queue(struct cfq_data *, struct bio*, bool,
struct io_context *, gfp_t);
static struct cfq_io_context *cfq_cic_lookup(struct cfq_data *,
struct io_context *);
@@ -463,9 +468,55 @@ static inline struct cfq_queue *cic_to_cfqq(struct cfq_io_context *cic,
return cic->cfqq[is_sync];
}

+/*
+ * Determine the cfq queue bio should go in. This is primarily used by
+ * front merge and allow merge functions.
+ *
+ * Currently this function takes the ioprio and ioprio_class from task
+ * submitting async bio. Later save the task information in the page_cgroup
+ * and retrieve task's ioprio and class from there.
+ */
+static struct cfq_queue *cic_bio_to_cfqq(struct cfq_data *cfqd,
+ struct cfq_io_context *cic, struct bio *bio, int is_sync)
+{
+ struct cfq_queue *cfqq = cic_to_cfqq(cic, is_sync);
+
+#ifdef CONFIG_CGROUP_BLKIOTRACK
+ if (!cfqq && !is_sync) {
+ const int ioprio = task_ioprio(cic->ioc);
+ const int ioprio_class = task_ioprio_class(cic->ioc);
+ struct cfq_group *cfqg;
+ struct cfq_queue **async_cfqq;
+ /*
+ * async bio tracking is enabled and we are not caching
+ * async queue pointer in cic.
+ */
+ cfqg = cfq_get_cfqg_bio(cfqd, bio, 0);
+ if (!cfqg) {
+ /*
+ * Maybe this is the first rq/bio and the io group has not
+ * been set up yet.
+ */
+ return NULL;
+ }
+ async_cfqq = cfq_async_queue_prio(cfqg, ioprio_class, ioprio);
+ return *async_cfqq;
+ }
+#endif
+ return cfqq;
+}
+
static inline void cic_set_cfqq(struct cfq_io_context *cic,
struct cfq_queue *cfqq, bool is_sync)
{
+#ifdef CONFIG_CGROUP_BLKIOTRACK
+ /*
+ * Don't cache async queue pointer as now one io context might
+ * be submitting async io for various different async queues
+ */
+ if (!is_sync)
+ return;
+#endif
cic->cfqq[is_sync] = cfqq;
}

@@ -1032,7 +1083,9 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int create)
unsigned int major, minor;

cfqg = cfqg_of_blkg(blkiocg_lookup_group(blkcg, key));
- if (cfqg && !cfqg->blkg.dev && bdi->dev && dev_name(bdi->dev)) {
+ if (!bdi || !bdi->dev || !dev_name(bdi->dev))
+ goto done;
+ if (cfqg && !cfqg->blkg.dev) {
sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
cfqg->blkg.dev = MKDEV(major, minor);
goto done;
@@ -1083,20 +1136,61 @@ done:
* Search for the cfq group current task belongs to. If create = 1, then also
* create the cfq group if it does not exist. request_queue lock must be held.
*/
-static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
+static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, struct page *page,
+ int create)
{
struct cgroup *cgroup;
struct cfq_group *cfqg = NULL;

rcu_read_lock();
- cgroup = task_cgroup(current, blkio_subsys_id);
+
+ if (!page)
+ cgroup = task_cgroup(current, blkio_subsys_id);
+ else
+ cgroup = get_cgroup_from_page(page);
+
+ if (!cgroup) {
+ cfqg = &cfqd->root_group;
+ goto out;
+ }
+
cfqg = cfq_find_alloc_cfqg(cfqd, cgroup, create);
if (!cfqg && create)
cfqg = &cfqd->root_group;
+out:
rcu_read_unlock();
return cfqg;
}

+struct cfq_group *cfq_get_cfqg_bio(struct cfq_data *cfqd,
+ struct bio *bio, int create)
+{
+ struct page *page = NULL;
+
+ /*
+ * Determine the group from task context. Even calls from
+ * blk_get_request() which don't have any bio info will be mapped
+ * to the task's group
+ */
+ if (!bio)
+ goto sync;
+
+#ifdef CONFIG_CGROUP_BLKIOTRACK
+ if (!bio_has_data(bio))
+ goto sync;
+
+ /* Map the sync bio to the right group using task context */
+ if (cfq_bio_sync(bio))
+ goto sync;
+
+ /* Determine the group from info stored in page */
+ page = bio_iovec_idx(bio, 0)->bv_page;
+#endif
+
+sync:
+ return cfq_get_cfqg(cfqd, page, create);
+}
+
static void cfq_get_group_ref(struct cfq_group *cfqg)
{
cfqg->ref++;
@@ -1180,6 +1274,12 @@ void cfq_unlink_blkio_group(void *key, struct blkio_group *blkg)

#else /* GROUP_IOSCHED */

+static struct cfq_group *cfq_get_cfqg_bio(struct cfq_data *cfqd,
+ struct bio *bio, int create)
+{
+ return &cfqd->root_group;
+}
+
static void cfq_get_group_ref(struct cfq_group *cfqg)
{
}
@@ -1491,7 +1590,7 @@ cfq_find_rq_fmerge(struct cfq_data *cfqd, struct bio *bio)
if (!cic)
return NULL;

- cfqq = cic_to_cfqq(cic, cfq_bio_sync(bio));
+ cfqq = cic_bio_to_cfqq(cfqd, cic, bio, cfq_bio_sync(bio));
if (cfqq) {
sector_t sector = bio->bi_sector + bio_sectors(bio);

@@ -1615,7 +1714,7 @@ static int cfq_allow_merge(struct request_queue *q, struct request *rq,
if (!cic)
return false;

- cfqq = cic_to_cfqq(cic, cfq_bio_sync(bio));
+ cfqq = cic_bio_to_cfqq(cfqd, cic, bio, cfq_bio_sync(bio));
return cfqq == RQ_CFQQ(rq);
}

@@ -2848,14 +2947,10 @@ static void changed_ioprio(struct io_context *ioc, struct cfq_io_context *cic)
spin_lock_irqsave(cfqd->queue->queue_lock, flags);

cfqq = cic->cfqq[BLK_RW_ASYNC];
+
if (cfqq) {
- struct cfq_queue *new_cfqq;
- new_cfqq = cfq_get_queue(cfqd, BLK_RW_ASYNC, cic->ioc,
- GFP_ATOMIC);
- if (new_cfqq) {
- cic->cfqq[BLK_RW_ASYNC] = new_cfqq;
- cfq_put_queue_ref(cfqq);
- }
+ cic_set_cfqq(cic, NULL, BLK_RW_ASYNC);
+ cfq_put_queue_ref(cfqq);
}

cfqq = cic->cfqq[BLK_RW_SYNC];
@@ -2895,6 +2990,7 @@ static void cfq_init_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq,
static void changed_cgroup(struct io_context *ioc, struct cfq_io_context *cic)
{
struct cfq_queue *sync_cfqq = cic_to_cfqq(cic, 1);
+ struct cfq_queue *async_cfqq = cic_to_cfqq(cic, 0);
struct cfq_data *cfqd = cic_to_cfqd(cic);
unsigned long flags;
struct request_queue *q;
@@ -2916,6 +3012,12 @@ static void changed_cgroup(struct io_context *ioc, struct cfq_io_context *cic)
cfq_put_queue_ref(sync_cfqq);
}

+ if (async_cfqq != NULL) {
+ cfq_log_cfqq(cfqd, async_cfqq, "changed cgroup");
+ cic_set_cfqq(cic, NULL, 0);
+ cfq_put_queue_ref(async_cfqq);
+ }
+
spin_unlock_irqrestore(q->queue_lock, flags);
}

@@ -2938,6 +3040,24 @@ retry:
/* cic always exists here */
cfqq = cic_to_cfqq(cic, is_sync);

+#ifdef CONFIG_CGROUP_BLKIOTRACK
+ if (!cfqq && !is_sync) {
+ const int ioprio = task_ioprio(cic->ioc);
+ const int ioprio_class = task_ioprio_class(cic->ioc);
+ struct cfq_queue **async_cfqq;
+
+ /*
+ * We have not cached async queue pointer as bio tracking
+ * is enabled. Look into group async queue array using ioc
+ * class and prio to see if somebody already allocated the
+ * queue.
+ */
+
+ async_cfqq = cfq_async_queue_prio(cfqg, ioprio_class, ioprio);
+ cfqq = *async_cfqq;
+ }
+#endif
+
/*
* Always try a new alloc if we fell back to the OOM cfqq
* originally, since it should just be a temporary situation.
@@ -2992,14 +3112,14 @@ cfq_async_queue_prio(struct cfq_group *cfqg, int ioprio_class, int ioprio)
}

static struct cfq_queue *
-cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct io_context *ioc,
- gfp_t gfp_mask)
+cfq_get_queue(struct cfq_data *cfqd, struct bio *bio, bool is_sync,
+ struct io_context *ioc, gfp_t gfp_mask)
{
const int ioprio = task_ioprio(ioc);
const int ioprio_class = task_ioprio_class(ioc);
struct cfq_queue **async_cfqq = NULL;
struct cfq_queue *cfqq = NULL;
- struct cfq_group *cfqg = cfq_get_cfqg(cfqd, 1);
+ struct cfq_group *cfqg = cfq_get_cfqg_bio(cfqd, bio, 1);

if (!is_sync) {
async_cfqq = cfq_async_queue_prio(cfqg, ioprio_class,
@@ -3018,7 +3138,25 @@ cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct io_context *ioc,
*async_cfqq = cfqq;
}

+#ifdef CONFIG_CGROUP_BLKIOTRACK
+ /*
+ * ioc reference. If async request queue/group is determined from the
+ * original task/cgroup and not from submitter task, io context can
+ * not cache the pointer to async queue and every time a request comes,
+ * it will be determined by going through the async queue array.
+ *
+ */
+ if (is_sync)
+ cfq_get_queue_ref(cfqq);
+#else
+ /*
+ * async requests are being attributed to task submitting
+ * it, hence cic can cache async cfqq pointer. Take the
+ * queue reference even for async queue.
+ */
+
cfq_get_queue_ref(cfqq);
+#endif
return cfqq;
}

@@ -3686,7 +3824,8 @@ split_cfqq(struct cfq_io_context *cic, struct cfq_queue *cfqq)
* Allocate cfq data structures associated with this request.
*/
static int
-cfq_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask)
+cfq_set_request(struct request_queue *q, struct request *rq, struct bio *bio,
+ gfp_t gfp_mask)
{
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_io_context *cic;
@@ -3707,7 +3846,7 @@ cfq_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask)
new_queue:
cfqq = cic_to_cfqq(cic, is_sync);
if (!cfqq || cfqq == &cfqd->oom_cfqq) {
- cfqq = cfq_get_queue(cfqd, is_sync, cic->ioc, gfp_mask);
+ cfqq = cfq_get_queue(cfqd, bio, is_sync, cic->ioc, gfp_mask);
cic_set_cfqq(cic, cfqq, is_sync);
} else {
/*
diff --git a/block/elevator.c b/block/elevator.c
index c387d31..9edaeb5 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -770,12 +770,13 @@ struct request *elv_former_request(struct request_queue *q, struct request *rq)
return NULL;
}

-int elv_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask)
+int elv_set_request(struct request_queue *q, struct request *rq,
+ struct bio *bio, gfp_t gfp_mask)
{
struct elevator_queue *e = q->elevator;

if (e->ops->elevator_set_req_fn)
- return e->ops->elevator_set_req_fn(q, rq, gfp_mask);
+ return e->ops->elevator_set_req_fn(q, rq, bio, gfp_mask);

rq->elevator_private[0] = NULL;
return 0;
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index d93efcc445..c3a884c 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -24,7 +24,8 @@ typedef struct request *(elevator_request_list_fn) (struct request_queue *, stru
typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *);
typedef int (elevator_may_queue_fn) (struct request_queue *, int);

-typedef int (elevator_set_req_fn) (struct request_queue *, struct request *, gfp_t);
+typedef int (elevator_set_req_fn) (struct request_queue *, struct request *,
+ struct bio *bio, gfp_t);
typedef void (elevator_put_req_fn) (struct request *);
typedef void (elevator_activate_req_fn) (struct request_queue *, struct request *);
typedef void (elevator_deactivate_req_fn) (struct request_queue *, struct request *);
@@ -117,7 +118,8 @@ extern void elv_unregister_queue(struct request_queue *q);
extern int elv_may_queue(struct request_queue *, int);
extern void elv_abort_queue(struct request_queue *);
extern void elv_completed_request(struct request_queue *, struct request *);
-extern int elv_set_request(struct request_queue *, struct request *, gfp_t);
+extern int elv_set_request(struct request_queue *, struct request *,
+ struct bio *bio, gfp_t);
extern void elv_put_request(struct request_queue *, struct request *);
extern void elv_drain_elevator(struct request_queue *);

-- 
1.7.3.1
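With tracking enabled, the patch stops caching the async queue in the io context, since cic_set_cfqq() returns early for async. Instead, it looks the queue up each time in the owning group's async queue array by I/O priority class and level, via cfq_async_queue_prio(). That is also why cfq_get_queue() only takes the extra queue reference for sync queues in that configuration. The userspace sketch below models the lookup. The struct and function names are hypothetical stand-ins. The RT/BE two-dimensional array plus a single idle slot is an assumption about the per-group layout, modeled on the one CFQ used for its per-device async queues. The point is that two groups have independent slots for the same (class, level) pair, so writeback from different cgroups does not share an async queue.

```c
#include <assert.h>
#include <stddef.h>

/* Priority classes as numbered in the ioprio interface; 8 levels per class. */
enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };
#define IOPRIO_BE_NR 8

struct cfq_queue_sketch { int id; };

/* Hypothetical per-group async queue slots. */
struct cfq_group_sketch {
	struct cfq_queue_sketch *async_cfqq[2][IOPRIO_BE_NR]; /* [0] = RT, [1] = BE */
	struct cfq_queue_sketch *async_idle_cfqq;
};

/* Return the slot for (class, level) in this group; NULL for an unknown class. */
static struct cfq_queue_sketch **
async_queue_slot(struct cfq_group_sketch *cfqg, int ioprio_class, int ioprio)
{
	assert(ioprio >= 0 && ioprio < IOPRIO_BE_NR);
	switch (ioprio_class) {
	case IOPRIO_CLASS_RT:
		return &cfqg->async_cfqq[0][ioprio];
	case IOPRIO_CLASS_BE:
		return &cfqg->async_cfqq[1][ioprio];
	case IOPRIO_CLASS_IDLE:
		return &cfqg->async_idle_cfqq;
	default:
		return NULL;	/* the kernel version treats this as a bug */
	}
}
```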

