[PATCH 0/2] msync improvements

May 31st, 2012 - 04:50 pm ET by Paolo Bonzini | Report spam
These two patches are improvements on the implementation of the msync
system call. To give some context, I found them while working on
the implementation of a persistent dirty bitmap.

Paolo Bonzini (2):
msync: support syncing a small part of the file
msync: start async writeout when MS_ASYNC

include/linux/fs.h | 3 +-
mm/fadvise.c | 2 +-
mm/filemap.c | 11 +++++
mm/msync.c | 63 +++++++++++++++++++++++++++++++++-
4 files changed, 50 insertions(+), 29 deletions(-)

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 12 repliesReplies Make a reply

Similar topics

Replies

#6 Andrew Morton
June 13th, 2012 - 06:10 pm ET | Report spam
On Wed, 13 Jun 2012 15:51:33 -0600
Zan Lynx wrote:

On Wed, 2012-06-13 at 14:26 -0700, Andrew Morton wrote:
> On Thu, 31 May 2012 22:43:54 +0200
> Paolo Bonzini wrote:
>
> > msync does not need to flush changes to the entire file, even with MS_SYNC.
> > Instead, it can use vfs_fsync_range to only synchronize a part of the file.
> >
> > In addition, not all metadata has to be synced; msync is closer to
> > fdatasync than it is to msync. So, pass 1 to vfs_fsync_range.
>
> Would be nice, but if applications were previously assuming that an
> msync() was syncing the whole file, this patch will secretly and subtly
> break them. Convince me that this change won't weaken anyone's data
> integrity ;)

As an interested observer and a programmer who uses msync()...

I never assumed msync() did the whole file.



OK, that's one user accounted for. 3 million to go.

Look, I'm not terribly averse to the change, but it does have this
risk. And it's a nasty risk because anyone who is hit by it simply
will not know that his applcation has lost some of its data integrity.

It has an address and length
argument. I always assumed it only looked for dirty pages within that
range. That is also what the msync() documentation claims.

As for weakening data integrity because of assumptions programmers *may*
have made, I think this is a bad argument which followed to its logical
conclusion can only lead to requiring an implicit sync() before and
after every system call. :-)



No, not at all. The issue is the *removal* of a side-effect upon which
some apps/designers may have been depending. Perhaps unintentionally!

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#7 Zan Lynx
June 13th, 2012 - 06:20 pm ET | Report spam

On Wed, 2012-06-13 at 14:26 -0700, Andrew Morton wrote:
On Thu, 31 May 2012 22:43:54 +0200
Paolo Bonzini wrote:

> msync does not need to flush changes to the entire file, even with MS_SYNC.
> Instead, it can use vfs_fsync_range to only synchronize a part of the file.
>
> In addition, not all metadata has to be synced; msync is closer to
> fdatasync than it is to msync. So, pass 1 to vfs_fsync_range.

Would be nice, but if applications were previously assuming that an
msync() was syncing the whole file, this patch will secretly and subtly
break them. Convince me that this change won't weaken anyone's data
integrity ;)



As an interested observer and a programmer who uses msync()...

I never assumed msync() did the whole file. It has an address and length
argument. I always assumed it only looked for dirty pages within that
range. That is also what the msync() documentation claims.

As for weakening data integrity because of assumptions programmers *may*
have made, I think this is a bad argument which followed to its logical
conclusion can only lead to requiring an implicit sync() before and
after every system call. :-)

Asking programmers to use sync_file_range() is not a replacement for
improving msync() because msync() is POSIX and can be used everywhere.
sync_file_range() requires Linux specific code.

Knowledge Is Power
Power Corrupts
Study Hard
Be Evil




To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#8 Paolo Bonzini
June 14th, 2012 - 05:00 am ET | Report spam
Il 14/06/2012 00:08, Andrew Morton ha scritto:
> As an interested observer and a programmer who uses msync()...
>
> I never assumed msync() did the whole file.


OK, that's one user accounted for. 3 million to go.

Look, I'm not terribly averse to the change, but it does have this
risk. And it's a nasty risk because anyone who is hit by it simply
will not know that his applcation has lost some of its data integrity.



Sure, it reminds me of the recent bug caused by using memcpy with
overlapping areas.

But really, in this case the possibility for confusion is really small,
and would entirely be the fault of the programmer. There are argument
for addr and length, and the man page says: "To be more precise, the
part of the file that corresponds to the memory area starting at addr
and having length length is updated". Our implementation having been
suboptimal for a long time, is not a good reason to keep it suboptimal.

Anyway yes, there is an off-by-one.

Paolo
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#9 Paolo Bonzini
June 14th, 2012 - 05:10 am ET | Report spam
Il 13/06/2012 23:29, Andrew Morton ha scritto:
> If the application does not want to start I/O, it can simply call msync
> with flags equal to MS_INVALIDATE. This one remains a no-op, as it should
> be on a reasonable implementation.

Means that people will find that their msync(MS_ASYNC) call will newly
start IO. This may well be undesirable for some.



If they knew it was a no-op, and relying on it, they might as well not
have called it at all and save a syscall.

Also, it hardwires into the kernel behaviour which userspace itself
could have initiated, with sync_file_range(). ie: reduced flexibility.



No, it hardwires into the kernel behaviour which is mentioned in the
spec. Strictly speaking it is correct that an asynchronous operation
is never done (the two are indistinguishable), but still this patch
improves the fidelity to the specification and in a portable manner.

Paolo
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#10 Andrew Morton
June 14th, 2012 - 06:10 am ET | Report spam
On Thu, 14 Jun 2012 11:02:02 +0200 Paolo Bonzini wrote:

Il 13/06/2012 23:29, Andrew Morton ha scritto:
> > If the application does not want to start I/O, it can simply call msync
> > with flags equal to MS_INVALIDATE. This one remains a no-op, as it should
> > be on a reasonable implementation.
>
> Means that people will find that their msync(MS_ASYNC) call will newly
> start IO. This may well be undesirable for some.

If they knew it was a no-op, and relying on it, they might as well not
have called it at all and save a syscall.



Nope. Back in the day (3+ years ago?), msync(MS_ASYNC) would flush
pte-dirty bits into page->flags:PG_Dirty. So it was used to make the
filesystem aware that userspace had modified some MAP_SHARED memory.
The fs would then perform writeback within (typically) 30 seconds. So
a legitimate use would be for the app to periodically run
msync(MS_ASYNC) to avoid having modified data floating about
un-written-back for arbitrarily long periods.

Nowadays we mark the pte's read-only and take a fault on first write,
and mark the page dirty at that time.

So it wasn't a no-op, and with this patch, such applications will newly
find themselves to be starting I/O within the msync() call.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
Help Create a new topicNext page Previous pageReplies Make a reply
Search Make your own search