Introduction

I photograph to see what the world looks like in photographs.

—Garry Winogrand (1928–1984)

Despite being a very important part of any operating system, file systems tend to get little attention. Linux has three editions of Linux Device Drivers, another three of Understanding the Linux Kernel and two of Linux Kernel Development. For the 2.4 networking stack there is Linux Networking Architecture by Klaus Wehrle et al. and for the memory subsystem there is Understanding the Linux Virtual Memory Manager by Mel Gorman. The aptly named UNIX Filesystems: Evolution, Design, and Implementation gives only a general overview of how things work. Practical File System Design with the Be File System by Dominic Giampaolo is an enjoyable read but, as the name indicates, it only deals with BeFS. The same is true for HFS+ in the very thick but also very interesting Mac OS X Internals: A Systems Approach by Amit Singh. I really hope that someday somebody will spend some time and put together a nice book or website in which file systems, new and old, are presented and analyzed in detail.

As the disclaimer on the front page says, I don't know as much as I want about file systems. I'm making progress in learning about them in the traditional way, by playing with and understanding the existing code. What I'm attempting in this project is to complement this with a visual approach whose main purpose is to graphically depict some of the ways things work. The main observable I'm using is the set of external symbols used by kernel modules. There are two main reasons for doing this. First, many operating systems build their file systems as kernel modules. This is useful because it separates the part we are interested in from the rest of the kernel. Second, because the modules need to be loaded dynamically, the functions they call and the data they access from the kernel show up as external (unresolved) symbols in their binaries. These can be easily extracted using nm. One drawback of this approach is chains of calls like the ones in the figure below. Luckily, in the Linux kernel many functions are explicitly marked as inline, so this case might not occur so frequently. I haven't checked explicitly for this yet, but it is something I would like to do.

Chain of calls problem. Module 1 and Module 2 are two kernel modules and f1 and f2 are two functions exported by the kernel. Even though f1 and f2 are two different calls, they are in fact closely related to each other. This is not captured by the relations between the modules and the kernel and can only be detected by also looking at the way things happen inside the kernel.
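To make the extraction step a bit more concrete, here is a minimal sketch in Python of how the external symbols of a module could be collected with nm. It is not the code used for this project. The function names are hypothetical, the sample nm output is made up for illustration (f1 and f2 are borrowed from the figure above, my_init is a placeholder), and it assumes the default GNU nm output format, in which undefined symbols are listed with type U and no address.

```python
import subprocess


def parse_undefined_symbols(nm_output):
    """Return the set of undefined ('U') symbol names found in nm output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols have no address column: only the type and the name.
        if len(fields) == 2 and fields[0] == "U":
            symbols.add(fields[1])
    return symbols


def external_symbols(module_path):
    """Run nm on a kernel module binary and collect its undefined symbols.

    Assumes GNU nm is available on the PATH.
    """
    result = subprocess.run(
        ["nm", "--undefined-only", module_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_undefined_symbols(result.stdout)


# Made-up nm output, for illustration only (not taken from a real module).
sample = """\
                 U f1
0000000000000000 T my_init
                 U f2
"""

print(sorted(parse_undefined_symbols(sample)))  # prints ['f1', 'f2']
```

With one such set per module, comparing modules comes down to ordinary set operations, for example an intersection to find the symbols two file systems have in common. Note that this only sees the edges between a module and the kernel, so it has exactly the chain of calls limitation described above.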

That being said, here is a quick overview of the next sections. The following two are the big ones. The first is a detailed analysis of one particular Linux Kernel tree and the second is a shorter one done over a large number of file systems from Linux Kernel 2.6.0 to 2.6.29. After that there is a small section that shows some aspects of the BSD family. After the conclusions there is an appendix consisting of three things: the first explains how the file systems for Linux were compiled; the second shows timelines for the releases of Linux Kernel, FreeBSD, NetBSD and OpenBSD; the last is a detailed map of the external symbols of the kernel modules analyzed in the second section.

One more thing. The figures are accompanied by captions which describe the plot and note some of the interesting things that are going on there. If you find any mistakes, please let me know and I'll try to fix them.

Happy reading/viewing!