Executable objects in Plash


Introduction
============

Plash now extends the concept of executables -- which are anything that can be invoked via Unix's execve() call -- so that in addition to executable data files, you can have executable *objects*. In this case, execve() works by invoking the object via a method call.

Executable objects can be attached to the filesystem tree and unmodified Unix programs can call them. Executable objects can be constructed from Unix programs as well.

The executable objects feature allows for fine-grained control over how processes are constituted, in particular their file namespaces. This is similar to chroot() environments under Linux. chroot() also allows a process's root directory (its file namespace) to be changed. It can be used to run different Linux distributions on the same machine, change the libraries a program dynamically links with, etc.

However, Linux has only limited, heavyweight mechanisms for creating file namespaces. Plash's mechanisms are lightweight, flexible, and not restricted to the superuser, and Plash can treat the files that a program receives as arguments separately from its library files and configuration files.


Applying POLA to argument files and other files
===============================================

We can divide the files that a process uses into two sets, Arg and Env. Arg is the set of files that are passed as parameters to the program. Env consists of the remaining files: libraries, configuration files, files that would be installed by a package manager -- files that the program is in some sense "linked with".

Plash has always provided control over the Arg set, applying the Principle of Least Authority (POLA) to it. However, Plash has a default setting for the contents of the Env set. The use of executable objects lets you change that default on a per-program basis and apply POLA to all the files a program accesses.
By default, Plash maps the system's "/usr", "/lib", "/bin" and "/etc" directories (as read-only) into the file namespace of processes that it starts, along with "/dev/null" and "/dev/tty" (as read-write) -- this is the default Env set. Any other files or directories are mapped into the file namespace if and only if they are listed on the command line -- this is the Arg set. In this default mode of operation, POLA is applied to files in the user's home directory, but not to system files.

The programs you use do not have to be declared in advance. This way, Plash can be used almost as a drop-in replacement for non-POLA shells like Bash. You can run Unix programs using command lines that are not *too* different from their equivalents under Bash, because Plash's default Env set covers most of the program's actual Env set.

We can do a bit better if we are prepared to declare a program before using it, in order to provide some information about the program that is not provided in a Unix installation. Plash lets you create an executable object and bind it to a variable, specifying the Env portion of the program's file namespace. Given this control, you can include files that are not in Plash's default (such as configuration files in your home directory) or leave out files that are in Plash's default -- this helps get back the convenience of a non-POLA shell such as Bash while providing better security.

Perhaps more importantly, you can control not only *whether* a filename is mapped in the namespace, but *what* file it maps to -- this provides something you couldn't do before.

It is possible to install two Linux distributions on one computer, and run one inside the other in a chroot() environment. However, the interoperability between these two sets of programs is very limited.
Linux doesn't normally provide a fine-grained mechanism for granting a program access to files outside its chroot() environment, and the mechanisms for creating chroot() environments are limited: you can hard link files (but not directories, and not across partitions), and you can use "mount --bind" on directories (but not individual files). Furthermore, the chroot() call is only available to the superuser. It's difficult enough to use this for a couple of installed distributions; to do it on a per-program basis is totally impractical.

In contrast, Plash provides lightweight mechanisms for creating file namespaces (which are simply directories, although they do not have to be stored on a Linux filesystem). Executable objects can be self-contained and provide their own execution environment, which allows for better interoperability between programs: a process can invoke an executable object which uses a different file namespace (root directory) to the caller for files in its Env set, yet the executable object can receive its Arg set from the caller.


Invocations between programs
============================

This document mainly focuses on applying POLA when the user invokes an executable using the shell. It doesn't give much attention to the cases in which one program invokes an executable using execve(): in this case, we desire that the caller apply POLA and not pass too much authority on to the callee, and we desire that the callee not be confusable.

If the caller *doesn't* apply POLA and the callee *is* confusable -- which will be true if they are unmodified Unix programs -- and if the two have Env sets that clash -- that is, the same filename maps to different files in each -- then we have some basic workability problems, not just security problems.

I hope to discuss these problems, and some solutions, in a forthcoming document.


Examples
========

I'll look at creating an executable object for the Unix command line program `oggenc', which encodes WAV files as Ogg Vorbis files [Ogg Vorbis is like the MP3 format, but a bit smaller and free of patent problems].

To invoke `oggenc' with Plash you might do:

    oggenc input_file.wav => -o output_file.ogg      (1)

In this case, the resulting process's file namespace will contain:

* /usr/bin/oggenc (read-only)
* /usr, /lib, /bin, /etc (read-only)
* /dev/null (read-write)
* under the pathname of the current working directory: input_file.wav (read-only), output_file.ogg (read-write slot)
* /dev/tty (read-write)

As it happens, `oggenc' doesn't need to access "/etc" or all of "/usr". We could define an executable object for running `oggenc' that gives the program an execution environment containing less:

    def my_oggenc =
      capcmd exec-object '/usr/bin/oggenc'
        /x=(mkfs /usr/bin/oggenc /usr/lib /lib)

[This needs to be entered on one line when using the shell interactively. Alternatively, you can put it in a file and load it with "source " -- each command or declaration must be terminated with ';'.]

This will create an executable object and bind it to the variable "my_oggenc". To invoke the object, we use the same syntax as before:

    my_oggenc input_file.wav => -o output_file.ogg   (2)

In this case, the resulting process's file namespace will contain:

* /usr/bin/oggenc (read-only)
* /usr/lib, /lib (read-only)
* under the pathname of the current working directory: input_file.wav (read-only), output_file.ogg (read-write slot)
* /dev/tty (read-write) [actually, not included in current version]

While in (1), "oggenc" is treated as a filename and searched for in PATH, in (2), "my_oggenc" is recognised by the shell as a bound variable. The shell doesn't start a new process in this case; it just invokes the executable object that "my_oggenc" is bound to.
The shell creates a namespace from the arguments, which it passes to "my_oggenc", but it doesn't include "/usr", "/lib", "/bin" and "/etc" as before -- "my_oggenc" is expected to provide the files it needs itself.

Suppose we don't want to install "oggenc" and the libraries it uses in our system's "/usr" directory. Maybe we don't have access to that directory, because we don't have root access. Maybe we have older versions of those libraries in "/usr" which some other program uses, and we don't want to risk messing that program up by upgrading its libraries. Maybe we just want to organise our files differently from usual. Perhaps we are running RedHat, but a Debian distribution is installed under "/debian", and we want to use Debian's version of `oggenc':

    def my_oggenc =
      capcmd exec-object '/usr/bin/oggenc'
        /x=(mkfs /usr/bin/oggenc=(F /debian/usr/bin/oggenc)
                 /usr/lib=(F /debian/usr/lib)
                 /lib=(F /debian/lib))

[NB. This requires that Plash is installed in the Debian distribution as well, so that libc.so will still be taken from /usr/lib/plash/lib rather than /lib.]

These declarations still give `oggenc' a lot of files it doesn't need. We could give a tighter definition that lists exactly those files that `oggenc' needs in its execution environment.

`oggenc' is fairly simple: it doesn't use a huge number of dynamically-linked libraries, and it doesn't need any configuration files. Under Linux, we can find out the dynamic libraries that an executable file uses with the "ldd" command:

    bash$ ldd /usr/bin/oggenc
            libvorbisenc.so.0 => /usr/lib/libvorbisenc.so.0 (0x40028000)
            libvorbis.so.0 => /usr/lib/libvorbis.so.0 (0x4009c000)
            libm.so.6 => /lib/i686/libm.so.6 (0x400bb000)
            libogg.so.0 => /usr/lib/libogg.so.0 (0x400dd000)
            libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
            /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

[Run this under Plash using "!!ldd /usr/bin/oggenc".]
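
The paths in a listing like this can be collected mechanically. The following Python sketch is one possible way to do it, not a tool that ships with Plash: it parses ldd-style output and prints a candidate "def" declaration in the shell syntax used in this document. The names `library_paths' and `mkfs_definition', and the `skip' and `replace' parameters, are hypothetical. It assumes that each library line has the "NAME => PATH (ADDRESS)" shape shown above, that the dynamic linker itself does not need to be listed, and that libc should be swapped for Plash's copy under /usr/lib/plash/lib as in the note above.

```python
import re

# Matches lines such as "libm.so.6 => /lib/i686/libm.so.6 (0x400bb000)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def library_paths(ldd_output, skip=(), replace=None):
    """Return the resolved library paths found in ldd-style output.

    skip:    paths to leave out (assumption: the dynamic linker).
    replace: mapping from a resolved path to the path to list instead.
    """
    replace = replace or {}
    paths = []
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if not match:
            continue  # ignore the prompt line and anything unrecognised
        path = match.group(2)
        if path in skip:
            continue
        path = replace.get(path, path)
        if path not in paths:
            paths.append(path)
    return paths


def mkfs_definition(name, program, paths):
    """Format a one-line "def" declaration in the syntax used in this document."""
    files = " ".join([program] + list(paths))
    return "def %s = capcmd exec-object '%s' /x=(mkfs %s)" % (name, program, files)


# The ldd output quoted in the text, used here as sample input.
SAMPLE = """\
libvorbisenc.so.0 => /usr/lib/libvorbisenc.so.0 (0x40028000)
libvorbis.so.0 => /usr/lib/libvorbis.so.0 (0x4009c000)
libm.so.6 => /lib/i686/libm.so.6 (0x400bb000)
libogg.so.0 => /usr/lib/libogg.so.0 (0x400dd000)
libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
"""

paths = library_paths(
    SAMPLE,
    skip={"/lib/ld-linux.so.2"},
    replace={"/lib/i686/libc.so.6": "/usr/lib/plash/lib/libc.so.6"},
)
print(mkfs_definition("my_oggenc", "/usr/bin/oggenc", paths))
```

Lines that do not match the pattern are silently ignored, so the sketch will under-report rather than guess; the printed declaration should be checked by hand before use.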

Given this information, we can make a new definition:

    def my_oggenc =
      capcmd exec-object '/usr/bin/oggenc'
        /x=(mkfs /usr/bin/oggenc
                 /usr/lib/libvorbisenc.so.0
                 /usr/lib/libvorbis.so.0
                 /lib/i686/libm.so.6
                 /usr/lib/libogg.so.0
                 /usr/lib/plash/lib/libc.so.6)

[Future work will be to provide tools to help with constructing a definition like this.]

("/lib/ld-linux.so.2" is the dynamic linker and doesn't need to be included.)

Suppose we want another program to be able to invoke `my_oggenc'. We can attach the object into a filesystem with a syntax like this:

    bash + /my-bin/oggenc=my_oggenc

[NB. I don't use `/bin/oggenc=my_oggenc' because it's not yet possible to attach objects inside other attached directories, such as `/bin/oggenc' inside `/bin', which is attached implicitly.]

This runs Bash with the pathname `/my-bin/oggenc' mapped to `my_oggenc'. You can then run `my_oggenc' from inside Bash. This is a good way in general to test out the file namespaces that Plash creates.


New features in more detail
===========================

The addition of executable objects to Plash involves the following new features:

* a method call ("Exeo") for invoking an executable object, and another method ("Exep") for testing whether an object is an executable;
* changes to libc so that execve() will use executable objects;
* new syntax in the shell:
  * a declaration for binding references to variables;
  * an expression for starting a process and getting the reference that it returns;
  * an expression for creating a directory object;
* new semantics for the case in which a command is invoked through a variable rather than a filename;
* a program, "exec-object", for constructing executable objects.

Also, note that this is the first major use of Plash's object-capability protocol.

The "Exeo" method call has the following arguments:

* Argv: an array of strings representing argv
* Env: an array of strings representing the environment
* Fds: an array of (i, fd) pairs, giving the file descriptor for FD number i
* Root: an object representing the root directory
* Cwd: a string representing the current working directory
* Pgid: an integer representing the process group ID

Let's break down the new features used in the examples above:

* Shell syntax: declaration:

      def VAR = EXPR

  This evaluates expression EXPR and binds the resulting object reference to variable VAR, unless evaluating EXPR resulted in an error.

* Shell syntax: expression:

      capcmd COMMAND ARGS...

  This built-in expression is similar to a normal command invocation, except that it expects the resulting process to return an object reference as a result. The shell passes the process a return continuation argument ("return_cont"; see the PLASH_CAPS environment variable), which the process invokes with the result.

  This expression doesn't wait for the process to exit: the process will typically act as a server and stay running in the background to handle invocations of the object that it returned.

  If the process drops the return continuation without invoking it (which will happen if it exits without passing the reference on), the expression results in an error.

* Executable program:

      exec-object FILENAME ROOT-DIR

  This is a command that is only useful when called using a "capcmd" expression. It returns an executable object E. When E is invoked with a root directory DIR2 as an argument, it forks a new process whose root directory is set to be a union of ROOT-DIR and DIR2, and that new process execve()s FILENAME.

  If you want different behaviour, you can modify exec-object; it is not built into the shell.

* Shell syntax: expression:

      mkfs ARGS...

  This expression returns a fabricated directory object containing the files listed in ARGS. The object resides in a server process started by the shell.
  ARGS is processed in the same way as argument lists to commands, so read-only access will be given for files that are listed unless "=>" is used, and objects can be attached at points in the directory tree using "PATH=EXPR".

Putting these together:

* capcmd exec-object '/bin/blah' /arg=(mkfs /bin/blah /etc/blah.conf /usr)

  This is a shell expression which will return an executable object E. When E is invoked, it will run /bin/blah with a file namespace containing /etc/blah.conf, the /usr directory, and whatever files are in the root directory that E was passed as an argument.

  If you wanted to replace the /etc/blah.conf that the program sees with another file, say /my/blah-1.conf, you could do:

      capcmd exec-object '/bin/blah' /arg=(mkfs /bin/blah /etc/blah.conf=(F /my/blah-1.conf) /usr)


Notes
=====

The process replacement behaviour
---------------------------------

Normally, execve() replaces the current process. Method calls don't and can't have that behaviour: the callee does not even have to start a new process. The modified libc is responsible for emulating the process replacement behaviour.

execve() (and the other functions in the `exec' family which use it) will test whether the filename it is given resolves to an executable object or a regular file. This test uses the "Exep" method. Note that this is different to the shell: the shell chooses its behaviour according to whether the command name is a bound variable or not.

If execve() is given an executable object, it invokes it (passing the root directory, file descriptors, etc.). When the method call returns, the new process has exited, and the result gives its exit code. libc's execve() waits for the method call to return, and then exits, using the same exit code.

Plash does not modify libc's wait() and waitpid(). This is slightly unsatisfactory in several respects (here P is the process that called execve(), and X is the executable object it invoked):

* It doesn't let P return the correct wait() status code to its parent when the process created by X dies with an unhandled signal (such as SIGSEGV).
* It doesn't let P notify its parent when the new process is stopped (by SIGSTOP or when the user presses Ctrl-Z).
* kill() doesn't work as expected: it sends a signal to the process that is waiting, not the process it spawned.
* There is an extra process hanging around, filling up the process table and taking up memory (and holding onto open file descriptors -- though this could be fixed) but not doing much else.

The solution to this would be to modify wait() and waitpid(). This would not be too bad because they can only be used on child processes. Modifying kill() as well would be trickier and less desirable, because it involves a global namespace of process IDs, and we would like to avoid global namespaces.

Discovering file descriptors
----------------------------

libc's execve() finds out which file descriptor indexes a process has open simply by trying to dup() each index in turn, up to a high index number. If your program uses FDs with big FD numbers (e.g. >1000), this may cause problems.

Although the Linux `proc' filesystem can be used to find out what file descriptors a process has open, it is not available in the chroot() environment that Plash runs programs in, and there's no way to use it securely.

Garbage collection
------------------

exec-object will exit when the reference to the object it provides is dropped, and it has no more processes to handle.


Limitations
===========

Linux, job control, and TTY file descriptors
--------------------------------------------

File descriptors for TTYs under Unix do not behave like capabilities in the sense that the kernel takes a process's "process group" into account when the process does IO on a TTY file descriptor. This is part of the Unix job control mechanism. A process will be stopped (with SIGTTIN) if it tries to read from a TTY when it is not part of the TTY's current process group. I don't think this is a good design.
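
The process group check can be observed from user space. As a minimal sketch (the function name `in_foreground_group' is made up for illustration, and this is not how Plash itself handles job control), the following Python compares the caller's process group with the foreground process group of the terminal behind a file descriptor. A process for which it returns False would normally be stopped with SIGTTIN on reading from that terminal.

```python
import os


def in_foreground_group(fd):
    """Report whether this process is in the foreground process group of
    the terminal behind fd. Returns None when the question does not apply."""
    if not os.isatty(fd):
        return None  # not a TTY, so no process group check is made
    try:
        foreground = os.tcgetpgrp(fd)
    except OSError:
        return None  # e.g. fd is a TTY but not our controlling terminal
    return foreground == os.getpgrp()


print("stdin:", in_foreground_group(0))
```

Note that the answer depends on the calling process, not just on the descriptor: two processes holding the same TTY file descriptor can get different results.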

So far, however, the process group check has not been a problem, because the processes started by `exec-object' can simply set their process group ID to the one specified in the "exec" invocation. That lets them read input from the terminal.

However, processes also have a "session ID". Typically, the processes running under a given terminal window run in their own distinct session. A process cannot set its process group ID to a process group that belongs to a different session.

So if an exec-object instance E, started from one terminal window W1, is invoked by a process in another terminal window W2, E won't be able to start a process P that can read input from the user in W2, even if P has the appropriate TTY file descriptor.

This may be a problem in the future. I can see two ways around this:

* Just arrange for all the relevant processes to be running under the same session ID. This would only work if we're not using existing terminal emulators (xterm, gnome-terminal, etc.). It might not work at all.
* Virtualise IO on file descriptors to use method calls on objects instead. There would be a lot of libc functions to modify in order to do this properly, but this has other uses.

Job control
-----------

You can start a process via the shell using an object invocation, and you can stop the process by pressing Ctrl-Z, but the shell is not informed that the process has been stopped, so the shell will not return control to the user and display a prompt.

This needs to be fixed. It is a deficiency in the specification of the "Exeo" method call.

exec-object
-----------

exec-object does not set the current working directory of the processes it starts.

exec-object doesn't provide any control over the arguments and environment variables it passes to the processes it starts.

exec-object doesn't start its child processes with a different UID, so the child process could kill it, ptrace() it, etc. (exec-object should use "run-as-anonymous" like the shell does.)

Shell limitations
-----------------

The shell does not provide a mechanism for sharing object references with other instances of the shell, with other users, or across the network.

The shell does not allow for recursive definitions using "def".

The shell only supports "capcmd CMD ARGS..." where CMD is an executable file, but not where CMD is an executable object, and it doesn't support running CMD in the standard Unix way (as the `!!' syntax does).

A "capcmd !! CMD ARGS..." expression would allow the use of existing setuid executables from programs running under Plash.

A "capcmd VAR ARGS..." expression would make it possible to have a single process provide multiple executable objects, i.e.:

    def factory_maker = capcmd factory-maker-maker
    def echo = capcmd factory_maker '/bin/echo' ...
    def ls = capcmd factory_maker '/bin/ls' ...