If we pass a directory as an argument to a program, it may contain symbolic links to anywhere. Since processes may now have different namespaces, we have a choice of namespaces in which to resolve the destinations of the symbolic links. Do we resolve them in the user's namespace, or the process's namespace?
If we resolve symlinks in the user's namespace, and we allow the process to create symlinks to arbitrary destinations, it could create a symlink to `/' and thereby grant itself access to all of the user's filesystem. Instead, we could try to restrict the ability of a process to create symlinks, so that it can only create symlinks to files and directories that it already has access to. But since symlinks are interpreted relative to their position in the filesystem, which can change, it would be difficult to make this robust. Furthermore, the problem of pre-existing symlinks remains. A user should be able to tell what files and directories they're granting access to based on the command invocation. Granting access also to files and directories that are symlinked to, perhaps from deep inside a directory, violates this, because there is little constraint on the destinations of symlinks.
Resolving symlinks in the process's namespace makes more sense. It follows the normal semantics of symlinks under Unix, which is that symlinks are simply a convenience that *could* be implemented by the process itself rather than by the kernel.
Ultimately, the solution is to do away with symbolic links and replace them with object references.
If we are to implement these semantics, we must be careful not to use the kernel's ability to follow symlinks. There is no straightforward option for turning off symlink following in the underlying filesystem. When we give a pathname such as `a/b/c' to the kernel, if `a/b' is a symbolic link the kernel will always follow it, interpreting it in the namespace of the calling process (here, the file server) rather than in the client's namespace.
The approach used in the file server is to set the current working directory to each component of the pathname in turn. For each component, do:
lstat() on the leaf name. If it's a symlink, do readlink() and interpret the link.
Otherwise, if it's a directory, do open(leaf, O_NOFOLLOW | O_DIRECTORY). If O_NOFOLLOW or O_DIRECTORY is not supported, we can do fstat() on the resulting FD and check that the object opened is the same as the one we lstat()'d, by comparing device and inode numbers (it may have changed between the system calls).
Do fchdir() on the resulting FD to set the current directory to the directory just opened.
Obviously this requires more system calls than allowing the kernel to resolve symlinks.
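The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the file server's actual code: the function names step_into() and walk() are hypothetical, the handling of `.' and `..' is simply omitted (they are rejected), and interpreting a symlink's text in the client's namespace is left to the caller.

```python
import os
import stat


def step_into(leaf):
    """Enter the directory `leaf` (a single name in the current directory).

    Returns None after changing into it, or the link text if `leaf` is a
    symlink, so that the caller can interpret the link itself.
    """
    before = os.lstat(leaf)  # lstat() never follows a symlink at the leaf
    if stat.S_ISLNK(before.st_mode):
        return os.readlink(leaf)
    if not stat.S_ISDIR(before.st_mode):
        raise NotADirectoryError(leaf)

    fd = os.open(leaf, os.O_RDONLY | os.O_NOFOLLOW | os.O_DIRECTORY)
    try:
        # Belt and braces: check that we opened the object we lstat()'d,
        # since it may have been replaced between the two system calls.
        after = os.fstat(fd)
        if (after.st_dev, after.st_ino) != (before.st_dev, before.st_ino):
            raise OSError("directory changed during lookup: %r" % leaf)
        os.fchdir(fd)
    finally:
        os.close(fd)
    return None


def walk(start_fd, components):
    """Walk `components` one at a time, starting from the directory start_fd.

    Returns None if every component was a directory (the current directory
    is then the final one), or (index, link_text) for the first symlink met.
    """
    os.fchdir(start_fd)
    for index, leaf in enumerate(components):
        # Only plain leaf names are passed to the kernel, so it never
        # gets a chance to resolve a multi-component path on our behalf.
        if "/" in leaf or leaf in ("", ".", ".."):
            raise ValueError("not a plain leaf name: %r" % leaf)
        link_text = step_into(leaf)
        if link_text is not None:
            return index, link_text
    return None
```

When walk() reports a symlink, the current directory is left at the directory containing the link, which is the position relative to which the link text would need to be interpreted.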
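Note that the server must never send clients FDs for directories. A client could use a directory FD to break out of its chroot jail, by calling fchdir() on it and then resolving pathnames such as `..' relative to a directory outside the jail.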
The Unix kernel can be regarded as providing a set of capability registers (file descriptors) that can contain directory object references, along with a special capability register (the current working directory) relative to which pathnames are resolved. References can be copied from a normal register to the special register using fchdir(). References can be copied from the special register to the normal registers using open(".").
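The two copy operations can be illustrated with a short sketch. The helper names save_cwd() and load_cwd() are made up for this example; the underlying calls are just open(".") and fchdir().

```python
import os


def save_cwd():
    """Copy the special register (the cwd) into a normal register (a new FD)."""
    return os.open(".", os.O_RDONLY)


def load_cwd(fd):
    """Copy a normal register (a directory FD) into the special register."""
    os.fchdir(fd)
```

A typical use would be to call save_cwd() before walking a pathname and load_cwd() afterwards, so that the walk does not permanently disturb the current directory. The caller is responsible for closing the FD.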
Unfortunately, this model falls down in two places:
Directories with `execute' but not `read' permission cannot be opened with open(). One can chdir() into them, but not fchdir() into them.
Arguably, Unix should let you open() such directories but not read their contents using the resulting FD.
This could be worked around, but no workaround is implemented yet.
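link() is unusual in that it takes two pathname arguments. It is difficult to use safely (without the kernel following symlinks): there is only one current directory, so when the source and destination are in different directories, at least one of them must be given as a multi-component pathname, which the kernel resolves itself. We have no guarantee that the source file (or destination) is the one we intended to link. Any check will be vulnerable to race conditions.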
The same applies to rename().
Under Plash, link() and rename() are only implemented for the same-directory case.
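The same-directory case is tractable because both arguments can be plain leaf names resolved against a single directory reference. The following is only a sketch of that idea, not Plash's implementation: rename_in_directory() is a hypothetical name, and it assumes a platform where Python's os.rename() accepts src_dir_fd and dst_dir_fd (that is, one that provides renameat()), instead of using fchdir() followed by rename().

```python
import os


def rename_in_directory(dir_fd, old_leaf, new_leaf):
    """Rename old_leaf to new_leaf, both inside the directory dir_fd."""
    for leaf in (old_leaf, new_leaf):
        # Refuse anything that would make the kernel walk a path.
        if "/" in leaf or leaf in ("", ".", ".."):
            raise ValueError("not a plain leaf name: %r" % leaf)
    # Both names are resolved relative to the same directory FD. rename()
    # acts on the directory entries themselves, so a symlink at either
    # leaf is renamed or replaced rather than followed.
    os.rename(old_leaf, new_leaf, src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
```

The directory FD would be obtained by walking to the directory one component at a time, so that no symlink is followed on the way there.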