[RFC][PATCH 0/3] fork: Add the ability to create tasks with given pids

November 17th, 2011 - 06:50 am ET by Pavel Emelyanov | Report spam
Gentlemen, please, find some time for this, your ACK/NACK on the API proposal
is required badly.


There's currently a work in progress with checkpoint-restore functionality
in the userspace. Most of the API for doing this kernel already provides, but
sometimes it's not enough. One of the required things is the ability to
create a process with its pids (in different pid namespaces) set to some
given values, rather than generated. Currently kernel doesn't allow for this,
so an API extension is required.

The proposal is to introduce the CLONE_CHILD_USEPIDS flag for clone() syscall
and pass the pids values in the child_tidptr. In order not to introduce the
hole for the pid-reuse attack, using this flag will result in EPERM in case
the pid namespace we're trying to create pid in has at least one pid (except
for the init's one) generated with regular fork()/clone().

Currently Tejun and Oleg are worrying only about the intrusiveness of this
approach, although Oleg agrees, that it solves all the problems it should. The
previous attempts to implement the similar stuff stopped, but no objections
against this were expressed. So the decision of whether it's OK to go this
way or not is required.


The API will be used like in the code below

/* restore new pid namespace with an init in it */
pid = clone(CLONE_NEWPID);
if (pid)
return 0;

/*
* init of a new pid namespace.
* recreate the process tree
*/

restore_children:
while (1) {
pid = next_pid_from_image();
if (!pid)
/* no more children */
break;

pid = clone(CLONE_CHILD_USEPIDS, &pid);
if (pid == 0)
goto restore_children;
}

/*
* the process tree is recreated, can proceed with restoring
* other stuff
*/


Thanks,
Pavel
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 36 repliesReplies Make a reply

Similar topics

Replies

#6 Oleg Nesterov
November 17th, 2011 - 12:10 pm ET | Report spam
On 11/17, Pavel Emelyanov wrote:

On 11/17/2011 07:32 PM, Oleg Nesterov wrote:
> On 11/17, Pavel Emelyanov wrote:
>>
>> +static int set_pidmap(struct pid_namespace *pid_ns, int pid)
>> +{
>> + int offset;
>> + struct pidmap *map;
>> +
>> + /*
>> + * When creating a new pid namespace we must make its init
>> + * have pid == 1 in it.
>> + */
>> + if (pid_ns->child_reaper == NULL)
>> + return 0;
>
> Do we really need this check? please see below...
>
>> + /*
>> + * Don't allow to create a task with a pid which has recently
>> + * belonged to some other (dead already) task. Only init (of
>> + * a freshly created namespace) and his clones can do this.
>> + */
>> + if (pid_ns->last_pid != 1)
>> + return -EPERM;
>
> ->last_pid == 1. This means that pid_nr == 1 was already created
> in this namespace via CLONE_NEWPID, and the child with this pid
> must be ->child_reaper, no?

If you use the CLONE_NEWPID | CLONE_CHILD_USEPIDS



Ah wait... I misread the check above as if it returns the error if
->child_reaper == NULL.

So, CLONE_NEWPID simply ignores child_tidptr[0], alloc_pid()
fallbacks to alloc_pidmap() after set_pidmap() returns 0.

> Cough. I really think 45a68628 should be reverted ;) IMHO it
> complicates the understanding of CLONE_NEWPID logic.

If we remove it, then it's OK to remove the check above, but in this case we
make it possible to have an init with pid != 1. This is flexible, but ... strange.



No, I didn't mean we should allow ->child_reaper with pid != 1, sorry
for confusion.

Oleg.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#7 Linus Torvalds
November 17th, 2011 - 12:30 pm ET | Report spam
Ok. Patches 1-2 look sane on their own. I think we can merge them
regardless in the next cycle if Oleg & co agree. Oleg?

Patch 3 obviously is generating discussion and actually introduces new
functionality, so this one is the contentious one..

Linus

On Thu, Nov 17, 2011 at 9:43 AM, Pavel Emelyanov wrote:

When restoring a task (or a set of tasks) we need to recreate them
with exactly the same pid(s) as they had before. [...]


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#8 Oleg Nesterov
November 17th, 2011 - 01:50 pm ET | Report spam
My previous objection was wrong, I should try to find something else ;)

On 11/17, Pavel Emelyanov wrote:

+static int set_pidmap(struct pid_namespace *pid_ns, int pid)
+{
+ int offset;
+ struct pidmap *map;
+
+ /*
+ * When creating a new pid namespace we must make its init
+ * have pid == 1 in it.
+ */
+ if (pid_ns->child_reaper == NULL)
+ return 0;
+
+ /*
+ * Don't allow to create a task with a pid which has recently
+ * belonged to some other (dead already) task. Only init (of
+ * a freshly created namespace) and his clones can do this.
+ */
+ if (pid_ns->last_pid != 1)
+ return -EPERM;
+
+ map = &pid_ns->pidmap[pid/BITS_PER_PAGE];



probably we should check that map < PIDMAP_ENTRIES ?

Oleg.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#9 Oleg Nesterov
November 17th, 2011 - 02:10 pm ET | Report spam
On 11/17, Linus Torvalds wrote:

Ok. Patches 1-2 look sane on their own. I think we can merge them
regardless in the next cycle if Oleg & co agree. Oleg?



I am not sure 1/3 really makes sense without 3/3... although it
cleanups the "goto out" logic in alloc_pid().

Anyway I agree, the patches look fine.

Patch 3 obviously is generating discussion and actually introduces new
functionality, so this one is the contentious one..

Linus

On Thu, Nov 17, 2011 at 9:43 AM, Pavel Emelyanov wrote:
>
> When restoring a task (or a set of tasks) we need to recreate them
> with exactly the same pid(s) as they had before. [...]



Yes.

Just in case, I believe technically 3/3 is correct too, modulo the
small problem with the unchecked access to ->pidmap[] (unless I missed
something again).

I am not sure about pid_max... probably we do not care in this case?

Oleg.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#10 Pavel Emelyanov
November 18th, 2011 - 05:10 am ET | Report spam
On 11/17/2011 10:36 PM, Oleg Nesterov wrote:
My previous objection was wrong, I should try to find something else ;)

On 11/17, Pavel Emelyanov wrote:

+static int set_pidmap(struct pid_namespace *pid_ns, int pid)
+{
+ int offset;
+ struct pidmap *map;
+
+ /*
+ * When creating a new pid namespace we must make its init
+ * have pid == 1 in it.
+ */
+ if (pid_ns->child_reaper == NULL)
+ return 0;
+
+ /*
+ * Don't allow to create a task with a pid which has recently
+ * belonged to some other (dead already) task. Only init (of
+ * a freshly created namespace) and his clones can do this.
+ */
+ if (pid_ns->last_pid != 1)
+ return -EPERM;
+
+ map = &pid_ns->pidmap[pid/BITS_PER_PAGE];



probably we should check that map < PIDMAP_ENTRIES ?



Yes, you're right here. The checks for given pid being correct are required. Will fix.

Oleg.



Thanks,
Pavel
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
Help Create a new topicNext page Previous pageReplies Make a reply
Search Make your own search