Bug#595063: read() builtin doesn't read integer value /proc files (but bash's does)

September 01st, 2010 - 04:20 am ET by Steve Schnepp
Hi, I opened bug 595063 on the Debian BTS [1] and it was suggested that I
resend the email upstream.

So I have copied the body of the bug report below:

dash's read builtin seems to read the underlying file one character at a
time. This doesn't work with some files under /proc, since procfs isn't
fully POSIX compliant.

With bash it works:

$ bash -c 'read MAX < /proc/sys/kernel/pid_max; echo $MAX'
32768

With dash it only reads the first character:

$ dash -c 'read MAX < /proc/sys/kernel/pid_max; echo $MAX'
3

If we use the external cat(1) program it works:

$ dash -c 'MAX=$(cat /proc/sys/kernel/pid_max); echo $MAX'
32768

After a little digging, it appears that this only happens with files that
contain just an integer value. When asked to read at a non-zero offset
(*ppos != 0), __do_proc_dointvec() just returns 0 (meaning EOF), as shown
in [2].
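
The effect of offset-dependent reads can be approximated from the shell
itself. The sketch below is an illustration only: read_bytewise is a
hypothetical helper (not part of dash) that uses dd(1) to fetch one byte per
call at an increasing offset, roughly mimicking a reader that issues
single-byte reads. On a regular file it reassembles the whole line. On a
kernel that behaves as described above, pointing it at
/proc/sys/kernel/pid_max would be expected to stop after the first byte.

```sh
#!/bin/sh
# Hypothetical helper (not part of dash): read the first line of a file
# one byte at a time, each byte fetched by a separate dd(1) call at an
# explicit offset.
read_bytewise() {
    file=$1
    offset=0
    value=
    newline='
'
    while :; do
        # The trailing "x" keeps a newline byte from being stripped by
        # command substitution; it is removed again on the next line.
        byte=$(dd if="$file" bs=1 skip="$offset" count=1 2>/dev/null; printf x)
        byte=${byte%x}
        # An empty result means the read returned 0 bytes (EOF).
        [ -n "$byte" ] || break
        # A newline ends the line, as it does for the read builtin.
        [ "$byte" != "$newline" ] || break
        value=$value$byte
        offset=$((offset + 1))
    done
    printf '%s\n' "$value"
}

# Demonstration on a regular file, where every offset is readable.
demo_file=$(mktemp)
printf '32768\n' > "$demo_file"
read_bytewise "$demo_file"
rm -f "$demo_file"
```

The helper reopens the file for every byte, which dash does not do, so it
only models the "read at offset N" part of the problem.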

I'm aware that the issue isn't strictly a dash one, since dash is entitled
to read one character at a time. But since fixing procfs to conform to
POSIX isn't a realistic option, would it be possible to have a workaround
that doesn't involve an external tool like cat(1)?

[1] http://bugs.debian.org/cgi-bin/bugr...i?bugY5063
[2] http://lxr.linux.no/#linux+v2.6.32/...tl.c#L2371
Steve Schnepp
http://blog.pwkf.org/



To UNSUBSCRIBE, email to debian-bugs-dist-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

Replies

#1 Jilles Tjoelker
September 02nd, 2010 - 03:20 pm ET
On Wed, Sep 01, 2010 at 10:10:11AM +0200, Steve Schnepp wrote:
> Hi, I opened bug 595063 on the Debian BTS [1] and it was suggested that I
> resend the email upstream.
>
> So I have copied the body of the bug report below:
>
> dash's read builtin seems to read the underlying file one character at a
> time. This doesn't work with some files under /proc, since procfs isn't
> fully POSIX compliant.
>
> [snip]
>
> After a little digging, it appears that this only happens with files that
> contain just an integer value. When asked to read at a non-zero offset
> (*ppos != 0), __do_proc_dointvec() just returns 0 (meaning EOF), as shown
> in [2].
>
> I'm aware that the issue isn't strictly a dash one, since dash is entitled
> to read one character at a time. But since fixing procfs to conform to
> POSIX isn't a realistic option, would it be possible to have a workaround
> that doesn't involve an external tool like cat(1)?



Given that other files in /proc do work, I don't see why the ones that
only contain an integer value cannot be fixed. All the necessary state
to produce the second and further bytes is available.

Choosing a powerful abstraction like a regular file has its
implications.

Note that a change in the file between the single-byte reads will cause
an inconsistent value to be read. This is also the case with regular
files on a filesystem, so it is acceptable.

If single-byte reads are really unacceptable, then the proper way to
read these files needs to be documented, and clear violations that will
not work properly should cause an error (in this case, this means that
reading one byte from offset 0 should fail like reading one byte from
offset 1 does).

Jilles Tjoelker



#2 Steve Schnepp
September 03rd, 2010 - 05:30 am ET
2010/9/2 Jilles Tjoelker:

Thanks for your prompt reply.

> Note that a change in the file between the single-byte reads will cause
> an inconsistent value to be read. This is also the case with regular
> files on a filesystem, so it is acceptable.



Are you implying that:
- if procfs is made to support character-by-character reads, dash reading
an inconsistent value is actually a feature?
- buffering should, therefore, always be explicit?

On a side note, the whole of procfs seems to be designed around a single
page-sized read where possible (1 x 4K).
I think it does so in order to vastly simplify its
usage/implementation by kernel modules.

> If single-byte reads are really unacceptable, then the proper way to
> read these files needs to be documented, and clear violations that will
> not work properly should cause an error (in this case, this means that
> reading one byte from offset 0 should fail like reading one byte from
> offset 1 does).



+1 for "the proper way to read these files needs to be documented". I
also think that emitting an error would be better than silently
returning erroneous data. (EOVERFLOW comes to mind.)

Steve Schnepp
http://blog.pwkf.org/



#3 Jilles Tjoelker
September 03rd, 2010 - 05:30 pm ET
On Thu, Sep 02, 2010 at 05:02:55PM +0200, Steve Schnepp wrote:
> 2010/9/1 Steve Schnepp:
> > conforming to POSIX isn't a realistic option, would it be possible to
> > have a workaround that doesn't involve an external tool like cat(1) ?
>
> Hi, I just hacked up and attached a little patch to solve this case.
> Feel free to reply with your comments.
>
> NB: I just targeted dash-0.5.5.1, but it might apply to any version.



This patch assumes that the file descriptor is discarded afterwards (its
position does not matter). Therefore the very common construct

    while read x; do
        ...
    done

stops working.

A possible fix is to check first if the input supports seeking. If it
does, use the buffering and at the end of the line seek backwards for
the number of bytes remaining in the buffer. If it does not, read one
byte at a time.
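
The concern about the file position can be reproduced without the patch. In
the sketch below, dd(1) stands in for a read that pulls in a whole 4096-byte
block and never seeks back; the temporary file and the variable names are
made up for illustration.

```sh
#!/bin/sh
demo_file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$demo_file"

# Buffered reader that does not restore the file position: dd consumes
# the whole (small) file, head keeps only the first line, and the cat
# that follows finds the shared offset already at EOF.
{
    buffered_first=$(dd bs=4096 count=1 2>/dev/null | head -n 1)
    buffered_rest=$(cat)
} < "$demo_file"

# The read builtin leaves the offset just after the first newline, so
# the rest of the file is still available to cat.
{
    read builtin_first
    builtin_rest=$(cat)
} < "$demo_file"

printf 'buffered: first=%s rest=[%s]\n' "$buffered_first" "$buffered_rest"
printf 'builtin:  first=%s rest=[%s]\n' "$builtin_first" "$builtin_rest"
rm -f "$demo_file"
```

Seeking backwards by the number of unconsumed bytes, as suggested above, is
what would make the buffered variant behave like the second block on
seekable input.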

Jilles Tjoelker



#4 Jilles Tjoelker
September 04th, 2010 - 03:40 pm ET
On Sat, Sep 04, 2010 at 08:20:33PM +0200, Steve Schnepp wrote:
> 2010/9/3 Jilles Tjoelker:
> > This patch assumes that the file descriptor is discarded afterwards (its
> > position does not matter). Therefore the very common construct
> >  while read x; do
> >    ...
> >  done
> > stops working.
>
> Ohh.. thanks for that, I didn't see it.
>
> Actually "while read x" continues to work.
> But "reopening the file" doesn't, as in:
>
>     read a b < datafile
>     echo ${a} ${b}
>     read a b < datafile
>     echo ${a} ${b}



You're right, it's even stranger than I expected.

> I attached an updated patch that corrects this problem by discarding the
> buffer when opening a new file.



This discarding is still bad as it throws away valid data if the open
file description is shared. This happens if stdin is redirected inside a
while read... loop.

Furthermore, I think constructions like

    read x; cat

and

    read x; (read y); read z

should keep working. This requires that the input's file position be
synced whenever another process may see it (fork/exit). Due to the
highly dynamic character of the shell and the common use of fd 0, this
probably means that you can't do better than syncing after each read
builtin. (For example, 'read' could be overridden with a function after
the third line.)
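
As a rough check of what "keep working" means here, the sketch below runs
both constructions against a small regular file. The file contents and the
variable names are invented for illustration, and the subshell is written as
a command substitution only so that its value can be captured.

```sh
#!/bin/sh
demo_file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$demo_file"

# read x; cat
# cat is a separate process and must see everything after line one.
{
    read x
    rest=$(cat)
} < "$demo_file"

# read x; (read y); read z
# The $(...) below runs in a subshell, like (read y) would.
{
    read x2
    y2=$(read y; printf '%s' "$y")
    read z2
} < "$demo_file"

printf 'x=%s rest=[%s]\n' "$x" "$rest"
printf 'x=%s y=%s z=%s\n' "$x2" "$y2" "$z2"
rm -f "$demo_file"
```

Each reader picks up exactly where the previous one stopped, which only
holds if the file position is correct every time control passes to another
process.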

Another thought:

    exec 3<&0; read x; read y <&3

or even

    sh -c 'read x; read y <&3' 3<&0

Different file descriptors may refer to the same open file description
and the shell may not know this.
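
Both variants can be tried against a regular file; the sketch below is one
way to do so, with a made-up two-line input. In the second case the
duplication is set up by the caller, so the inner shell has no record that
fd 0 and fd 3 share a file position.

```sh
#!/bin/sh
demo_file=$(mktemp)
printf 'one\ntwo\n' > "$demo_file"

# exec 3<&0; read x; read y <&3
# fd 3 is a duplicate of fd 0, so both share one file offset and the
# second read continues where the first one stopped.
{
    exec 3<&0
    read x
    read y <&3
    exec 3<&-
} < "$demo_file"

# sh -c 'read x; read y <&3' 3<&0
# The redirections are applied left to right: fd 0 is opened on the
# file first, then fd 3 is made a duplicate of it.
inner=$(sh -c 'read x; read y <&3; printf "%s %s" "$x" "$y"' < "$demo_file" 3<&0)

printf 'x=%s y=%s\n' "$x" "$y"
printf 'inner: %s\n' "$inner"
rm -f "$demo_file"
```

A per-descriptor buffer that is not synced after each read would hand the
second read stale or missing data in both cases.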

Jilles Tjoelker


