Directory file descriptors

Plash supports calling open() on directories, and it supports fchdir() and close() on the resulting directory file descriptor. However, it does not support dup() on directory FDs, and execve() does not preserve them.
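As a minimal illustration of the supported pattern, here is a C sketch that touches a directory FD only through open(), fchdir() and close(). The helper name chdir_via_fd is hypothetical, not part of Plash. The code runs unchanged on an ordinary POSIX system; that it behaves the same under Plash is an assumption based on the support described above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: change into `path` via a directory FD, using only
   open(), fchdir() and close(). Returns 0 on success, -1 on failure.
   It never calls dup() on the FD and closes it before returning, so the
   FD is never expected to survive an execve(). */
static int chdir_via_fd(const char *path)
{
    int dir_fd = open(path, O_RDONLY);
    if (dir_fd < 0) {
        perror("open");
        return -1;
    }
    if (fchdir(dir_fd) < 0) {
        perror("fchdir");
        close(dir_fd);
        return -1;
    }
    /* Release the directory FD as soon as it is no longer needed. */
    if (close(dir_fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

Closing the descriptor right after fchdir() keeps the program within what Plash supports: nothing later tries to duplicate it or carry it across execve().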

Directory file descriptors require special handling. Under Plash, when open() is called on a file, it returns a real, kernel-level file descriptor for that file; the file server passes this file descriptor to the client across a socket. But it is not safe to do this with kernel-level directory file descriptors: a client that obtained one could use it to break out of its chroot jail with the kernel-level fchdir system call.
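The text doesn't show how the file server passes a descriptor across the socket. The usual Unix mechanism is SCM_RIGHTS ancillary data on a Unix-domain socket, and the sketch below assumes that mechanism; send_fd and recv_fd are hypothetical names, not Plash's own functions, and Plash's actual message format is not shown. Whatever descriptor is sent, the receiver ends up with its own kernel-level reference to the same open file, which is exactly why a real directory FD must not be handed to a chrooted client.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one FD over a Unix-domain socket as SCM_RIGHTS ancillary data.
   The kernel installs a duplicate of `fd` in the receiving process.
   Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                    /* at least one byte of payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                             /* correctly aligned control buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one FD sent by send_fd(). Returns the new FD, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```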

A complete solution would be to virtualize file descriptors fully, so that every libc call involving file descriptors is intercepted and replaced. This would be a lot of work, because there are quite a few FD-related calls. It raises some tricky questions, such as which parts of the code use real kernel FDs and which use virtualized FDs. It might impact performance. And it's potentially dangerous: if the modified libc failed to replace even one FD-related call, that call would treat a virtual FD number as a real, kernel FD number, and the operation could act on the wrong file descriptor. (There is no similar danger with virtualizing the system calls that use the file namespace, because the use of chroot() means that the process's kernel file namespace is almost entirely empty.)

However, a complete solution would be overkill. There are probably no programs that pass a directory file descriptor to select(), and no programs that expect to keep a directory file descriptor across a call to execve() or in the child process after fork().

So I have adopted a partial solution to virtualizing file descriptors. When open() needs to return a virtualized file descriptor (in this case, one for a directory), the server returns two parts to the client: the real, kernel-level file descriptor that it gets from opening /dev/null (a "dummy" file descriptor), and a reference to a dir_stack object representing the directory.
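A sketch of the server side of this two-part reply, under stated assumptions: struct open_reply and make_dir_reply are hypothetical names, the struct dir_stack below is only a placeholder (the text doesn't describe its contents), and opening /dev/null with O_RDONLY is an assumption about the access mode Plash uses.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Placeholder for Plash's dir_stack object; the real contents are not
   described here, so this stand-in only records a path. */
struct dir_stack {
    const char *path;
};

/* Hypothetical shape of the server's reply to open(). For a directory,
   `fd` is a dummy FD on /dev/null and `dir` carries the directory. */
struct open_reply {
    int fd;                 /* kernel-level FD that is safe to pass on */
    struct dir_stack *dir;  /* directory object kept on the libc side */
};

/* Build the two-part reply for a directory. Returns 0 on success. */
static int make_dir_reply(struct dir_stack *dir, struct open_reply *out)
{
    /* The dummy FD: a real kernel FD that grants no directory access.
       O_RDONLY is an assumed mode. */
    int dummy = open("/dev/null", O_RDONLY);
    if (dummy < 0)
        return -1;
    out->fd = dummy;
    out->dir = dir;
    return 0;
}
```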

Plash's libc open() function returns the kernel-level /dev/null file descriptor to the client program, but it stores the dir_stack object in a table maintained by libc. Plash's fchdir() function in libc consults this table, and it can only succeed if the table contains an entry for the given file descriptor number.
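The client-side table might look like the following sketch. The fixed-size array, the names dir_fd_table, dir_table_set, dir_table_clear and fchdir_sketch, the virtual_cwd variable, and the errno values are all assumptions for illustration; the text only says that libc keeps a table and that fchdir() needs an entry for the FD number.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder for Plash's dir_stack object. */
struct dir_stack {
    const char *path;
};

#define DIR_TABLE_SIZE 1024   /* hypothetical fixed bound on FD numbers */

/* Hypothetical table: slot N holds the dir_stack for FD N, or NULL if
   FD N is not a virtualized directory FD. */
static struct dir_stack *dir_fd_table[DIR_TABLE_SIZE];

/* Hypothetical virtual current directory kept by libc. */
static struct dir_stack *virtual_cwd;

/* Called by the open() wrapper after it receives the dummy FD. */
static int dir_table_set(int fd, struct dir_stack *dir)
{
    if (fd < 0 || fd >= DIR_TABLE_SIZE) {
        errno = EMFILE;       /* assumed error code */
        return -1;
    }
    dir_fd_table[fd] = dir;
    return 0;
}

/* Called by the close() wrapper so a reused FD number is not mistaken
   for the old directory. */
static void dir_table_clear(int fd)
{
    if (fd >= 0 && fd < DIR_TABLE_SIZE)
        dir_fd_table[fd] = NULL;
}

/* fchdir() sketch: succeeds only if the table has an entry for fd. */
static int fchdir_sketch(int fd)
{
    if (fd < 0 || fd >= DIR_TABLE_SIZE || dir_fd_table[fd] == NULL) {
        errno = ENOTDIR;      /* assumed error code */
        return -1;
    }
    virtual_cwd = dir_fd_table[fd];
    return 0;
}
```

Clearing the entry on close() matters because the kernel reuses FD numbers: without it, a later open() of an ordinary file that happened to get the same number would look like a directory to fchdir_sketch().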

Creating a "dummy" kernel-level file descriptor ensures that the file descriptor number stays allocated from the kernel's point of view. It provides an FD that can be used in any context where an FD can be used without, as far as I know, any harmful effects. If the client program passes the file descriptor to functions that aren't useful for directory file descriptors, such as select() or write(), it will get a more appropriate error than EBADF.
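To see what keeping the number allocated means in practice, the sketch below runs on an ordinary POSIX system: while the dummy /dev/null FD is open, the kernel will not hand its number out again, and once it is closed, calls on that number fail with EBADF. open_dummy_fd and fd_is_allocated are hypothetical helpers, and the O_RDONLY mode is an assumption.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a dummy FD on /dev/null (mode assumed). */
static int open_dummy_fd(void)
{
    return open("/dev/null", O_RDONLY);
}

/* Returns 1 if `fd` is allocated in this process's kernel FD table,
   0 if fcntl() reports EBADF. */
static int fd_is_allocated(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

While the dummy is open, a second open_dummy_fd() call returns a different number, and a read() on the dummy simply reports end-of-file; after close(), the same number gives EBADF.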