Anti-Aliasing

Last week we learned algorithms for scan conversion of lines and polygons. The results of assignment 2 (and 1) show that edges do not look like smooth straight lines but have a staircase-like appearance -- more so for certain orientations of the edges. The primary reason for this is aliasing. Before we can understand aliasing, we need some signal processing theory.

Fourier transform

We define our image as a two dimensional function -- for each pair (x,y), we have an intensity value f(x,y). In fact, for interactively changing images, we can define a three dimensional function: each image is associated with a certain time, i.e. we have the intensity function f(x,y,t). We may also consider three different intensity functions for the red, green and blue channels respectively. To make understanding easier, we will consider a one dimensional function. Think of it as the gray level of one scan line of a single snap-shot image. (In practice, the dimensions are orthogonal, and we can easily extend our analysis to higher dimensions.)

We often refer to the intensity function, f(x), as a signal in the spatial domain, i.e. for each position in space (along the x axis) f has a value. These signals have a dual form -- in the so called frequency domain. Instead of representing the intensity at each position, we represent, for each rate of change of the signal (i.e. each frequency), the amplitude of that frequency component. Thus for a given frequency value (number of cycles per pixel, or more appropriately cycles per inter-pixel distance) we ask how "much" of the given signal has that frequency. For example, F(0), where F is the frequency domain representation of f, is the amplitude of the DC (meaning direct current, to borrow a term from signal theory) component of the signal -- i.e. the signal's average value.
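A quick numerical check of that last claim -- a sketch assuming numpy and a made-up scan line:

    import numpy as np

    f = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])  # gray levels of one scan line
    F = np.fft.fft(f)

    # F[0] is the sum of the samples; dividing by the number of samples
    # gives the signal's average value -- the DC component.
    assert np.isclose(F[0].real / len(f), f.mean())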

Here are the functions that transform a signal from the spatial to the frequency domain (also known as the Fourier transform).
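For a signal f(x), the transform pair is

F(u) = Integral ( f(x) e^(-i 2π u x) dx )    (forward: spatial -> frequency)
f(x) = Integral ( F(u) e^(i 2π u x) du )     (inverse: frequency -> spatial)

where each integral is taken over the entire real line.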

There exists a duality between the two domains, i.e. for each operation in one domain there exists an equivalent operation in the other domain. The pair we are most interested in is the multiplication-convolution pair. Let us denote multiplication by · and convolution by ¤, denote a spatial domain function by a lower case letter and its Fourier transform by the corresponding upper case letter, and denote the transform relationship by <->. Thus:

f(x) <-> F(u)
f(x) · g(x) <-> F(u) ¤ G(u)
Convolution is defined as follows:
a(t) ¤ b(t) = Integral (a(s) b(t-s) ds)
Intuitively, f ¤ g is a weighted average of f (weighted by g). You can visualize convolution as follows. To compute (f ¤ g)(x0), flip f about its center (say f is centered at x = 0) and translate it so that it is centered at x0; call the resulting signal f2. (f ¤ g)(x0) is then the area under the curve f2 · g. Convolution is a commutative operation (clearly, since multiplication is commutative).
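This flip-translate-and-sum view carries over directly to discrete signals. Here is a minimal sketch, assuming numpy and a made-up signal and kernel:

    import numpy as np

    f = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])  # an example signal
    g = np.array([0.25, 0.5, 0.25])                    # weights summing to 1

    # (f ¤ g)(x0): flip g, center it at x0, and sum the pointwise products.
    def convolve_at(f, g, x0):
        total = 0.0
        for s in range(len(g)):
            x = x0 - (s - len(g) // 2)   # position under the flipped, shifted g
            if 0 <= x < len(f):
                total += f[x] * g[s]
        return total

    smoothed = [convolve_at(f, g, x0) for x0 in range(len(f))]
    assert np.allclose(smoothed, np.convolve(f, g, mode='same'))  # agrees with numpy

By commutativity it does not matter whether we flip f or g; the sketch flips the (shorter) kernel, which is the usual convention.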

Sampling and reconstruction

The image generation process involves the pixelization of a continuous image. Instead of sending an analog signal to the video output, we send just the samples taken at discrete grid coordinates -- we sample the image. Sampling f is mathematically equivalent to multiplying f by the comb function: a train of unit impulses placed at the sample positions.
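In code the idea looks like this -- a sketch assuming numpy, where a fine grid stands in for the continuum and the signal is arbitrary:

    import numpy as np

    x = np.linspace(0.0, 8.0, 801)      # fine grid standing in for continuous x
    f = np.sin(2 * np.pi * 0.3 * x)     # an example continuous signal

    comb = np.zeros_like(x)
    comb[::100] = 1.0                   # unit impulses at the integer positions

    samples = f * comb                  # sampling = multiplication by the comb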

Theory tells us that if a signal has a maximum frequency of µ we must sample it at a frequency of at least 2µ, or it is impossible to reconstruct the original signal. 2µ is also called the Nyquist rate. We will see some evidence of this shortly.

If we sample at a frequency less than the Nyquist rate, the samples can be used to reconstruct another signal of lower frequency. This phenomenon of a high frequency signal behaving as a low frequency signal (see figure below) is called aliasing.
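We can demonstrate this numerically. Sampling once per unit, a sinusoid at 0.8 cycles per sample produces exactly the same samples as one at 0.2 cycles per sample (with a sign flip). A sketch assuming numpy:

    import numpy as np

    n = np.arange(16)                      # sample positions, one per unit
    high = np.sin(2 * np.pi * 0.8 * n)     # 0.8 cycles/sample: above the limit
    low = np.sin(2 * np.pi * -0.2 * n)     # its alias, since 0.8 - 1.0 = -0.2

    # At the sample points the two signals are indistinguishable.
    assert np.allclose(high, low)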

To see how we may recover f from f · comb, let us concentrate on the frequency domain:

f · comb (call it fc) <-> F ¤ COMB (call it FC).
F ¤ COMB is just F replicated at the intervals of the teeth in COMB (can you see why?). If these replicas do not overlap -- and they do not if we sample at or above the Nyquist rate -- we may be able to reconstruct F(u) by multiplying FC(u) by PULSE(u) (also called a box), as shown below.

Multiplying by the PULSE (this process is also called filtering) gives F(u), and by inverse Fourier transformation we get back the original signal. Another way of getting back f is to convolve fc with the sinc function, since

sinc(x) = sin(x)/x <-> PULSE(u)
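In discrete form this convolution becomes interpolation: each sample contributes a sinc centered on it. A sketch assuming numpy, whose np.sinc is the normalized form sin(πx)/(πx) -- the right kernel for unit-spaced samples:

    import numpy as np

    # Reconstruct f at an arbitrary (off-grid) position from its samples.
    def reconstruct(samples, x):
        k = np.arange(len(samples))
        return np.sum(samples * np.sinc(x - k))   # sum of shifted sincs

    k = np.arange(64)
    f = np.sin(2 * np.pi * 0.1 * k)    # band-limited: 0.1 cycles/sample

    x0 = 31.5                          # midway between two samples
    print(reconstruct(f, x0))          # close to sin(2 * pi * 0.1 * 31.5)

The agreement is not exact because the sum is truncated to 64 samples, while the sinc's support is infinite -- precisely the practical problem taken up under Band limiting below.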

The image we finally see is a filtered version. The CRT reconstructs a continuous image by a crude interpolation (the "set voltage and hold" electronics) followed by a Gaussian blur (the fluorescent spot). This roughly corresponds to filtering by a box filter followed by a Gaussian filter.

Band limiting

In order to eliminate aliasing, or to do anti-aliasing, we must sample at the Nyquist rate. Unfortunately, it is quite common for the frequency spectrum of a signal to be unbounded. In other words, the Fourier transform, F(u), of many common signals has a non-zero value at arbitrarily high values of u. Thus we would need to take an arbitrarily large number of samples per pixel. However, most of the information in an image often lies in the low frequency range. (This is more true of scan-converted primitives than of true images, e.g. textures.) Human acuity also deteriorates at high frequencies. Thus one solution that is often acceptable is to eliminate the high frequencies of the signal, i.e. to limit its frequency band. We thus change the original signal (and hope that the change is not too drastic).

To band limit a signal we can do something quite similar to filtering. In effect, we compute the Fourier transform of a given signal f, filter out the frequencies greater than 1/2 cycle per pixel (the Nyquist limit when sampling once per pixel), and reconstruct a new signal f'. If we now sample f' once per pixel, it can later be exactly reconstructed with no aliasing. In reality, we do not need to explicitly reconstruct f', since it will later only be sampled once per pixel. We just compute the value of f' at each pixel position.

How do we compute f'(x) for a given pixel position x? Remember, in order to filter out the replicas we multiplied the frequency domain signal by a PULSE. Similarly, in order to truncate the higher frequencies we can multiply the frequency domain signal by a PULSE: with a narrower PULSE, a larger range of frequencies is discarded; with a wider PULSE, more of the higher frequency content is retained. One way to compute f' is to find F, the Fourier transform of f, discard the frequencies outside the PULSE, and compute the inverse Fourier transform of the resulting signal. Or we could just find the value of f ¤ sinc at x. (We call this operation sinc-filtering the signal.) Unfortunately, both these methods require continuous integration of functions that may extend up to infinity. Hence we

  1. Approximate the continuous integration by using a discrete approximation
  2. Convolve with a filter that does not extend to infinity, but has only a small range of the spatial domain where it is non-zero (i.e. it has narrow support). Such filters are also called finite impulse response (as opposed to infinite impulse response) filters. A sketch combining both approximations follows.
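Here we band limit by convolving, at sub-pixel resolution, with a truncated and windowed sinc. The particulars are assumptions for illustration: numpy, 4 sub-samples per pixel, support of ±4 pixels, a Hann taper, and a made-up input signal.

    import numpy as np

    SUB = 4                                        # sub-samples per pixel

    # A discrete, finite-support stand-in for the sinc filter:
    # truncate to +/-4 pixels and taper with a Hann window to damp ringing.
    x = np.arange(-4 * SUB, 4 * SUB + 1) / SUB     # positions in pixel units
    kernel = np.sinc(x)                            # cutoff at 1/2 cycle per pixel
    kernel *= 0.5 * (1 + np.cos(np.pi * x / 4))    # Hann taper, zero at the ends
    kernel /= kernel.sum()                         # unit gain at DC

    # An input with content above the Nyquist limit of 1/2 cycle per pixel:
    t = np.arange(0, 64, 1.0 / SUB)
    f = np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.sin(2 * np.pi * 0.9 * t)

    f_band_limited = np.convolve(f, kernel, mode='same')
    f_prime = f_band_limited[::SUB]                # now safe to take once per pixel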

Let us look at some other filters and see how close their frequency domain representations are to PULSE. A commonly used filter is the box filter, or pulse in the spatial domain. We know what its Fourier transform looks like -- it is SINC. We could use a truncated sinc; however, its Fourier transform exhibits ringing (the Gibbs phenomenon), shown on the right. A Gaussian filter works well, but it attenuates many lower (read: important) frequencies as well. A triangle filter (or Bartlett filter) transforms to SINC², a SINC-like function but with smaller magnitude at high frequencies.
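That last fact follows because a triangle is a box convolved with a box, so its transform is the product SINC · SINC. A quick numerical check, assuming numpy (the lengths are arbitrary):

    import numpy as np

    box = np.ones(8)
    triangle = np.convolve(box, box)      # a triangle is box ¤ box

    # Convolution in one domain is multiplication in the other,
    # so |FT(triangle)| = |FT(box)|^2 (zero-padding keeps this exact).
    ft_box = np.abs(np.fft.rfft(box, 256))
    ft_triangle = np.abs(np.fft.rfft(triangle, 256))
    assert np.allclose(ft_triangle, ft_box ** 2)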

Practicum

In practice, we do not use continuous convolution but sample the filter. (What does that do to our reconstruction? Note that by sampling a filter, say a sinc, we are replicating its PULSE in the frequency domain. Thus we retain some of the higher frequencies as well when we multiply that with F.) This sampling is an offline process, and we precompute a table of filter values evaluated at a number of different x positions. This table is sometimes referred to as the filter kernel.
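A sketch of such a precomputed table; the choices here (numpy, a Gaussian with sigma = 0.6 pixels, support of ±2 pixels, 16 table entries per pixel) are assumptions for illustration:

    import numpy as np

    TAPS_PER_PIXEL = 16                   # table resolution
    RADIUS = 2                            # filter support, in pixels

    # Offline: tabulate the filter at discrete offsets and normalize once.
    offsets = np.arange(-RADIUS * TAPS_PER_PIXEL,
                        RADIUS * TAPS_PER_PIXEL + 1) / TAPS_PER_PIXEL
    table = np.exp(-0.5 * (offsets / 0.6) ** 2)   # Gaussian, sigma = 0.6 pixels
    table /= table.sum()

    # At drawing time, filtering reduces to table lookups and a weighted sum.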

For two dimensional signals (images), we extend the filter to two dimensions. We store the values at a number of (x,y) positions. Typically we sample the filter function uniformly on a grid more dense than the pixel grid. This is also called sub-sampling. At drawing time, we evaluate the image intensity at each sub-sample position and take a weighted average of these samples (weighted by the corresponding entry in the filter table). Some examples of 2D-filter kernels are given below:

Box Filter:

1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1

Bartlett Filter:

 1  2  3  4  3  2  1
 2  4  6  8  6  4  2
 3  6  9 12  9  6  3
 4  8 12 16 12  8  4
 3  6  9 12  9  6  3
 2  4  6  8  6  4  2
 1  2  3  4  3  2  1

Gaussian Filter:

 1  4  8 10  8  4  1
 4 12 25 29 25 12  4
 8 25 49 58 49 25  8
10 29 58 67 58 29 10
 8 25 49 58 49 25  8
 4 12 25 29 25 12  4
 1  4  8 10  8  4  1
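As a usage sketch, here is the Bartlett kernel above driving a 7x7 sub-sampled pixel evaluation. The function coverage(x, y) is hypothetical -- a stand-in for the scan converter's point-sampling test -- and numpy is assumed:

    import numpy as np

    # The 7x7 Bartlett kernel from the table above, as an outer product.
    row = np.array([1, 2, 3, 4, 3, 2, 1], dtype=float)
    bartlett = np.outer(row, row)
    bartlett /= bartlett.sum()            # weights must sum to 1

    def pixel_intensity(coverage, px, py, sub=7):
        total = 0.0
        for j in range(sub):
            for i in range(sub):
                x = px + (i - sub // 2) / sub   # sub-sample positions
                y = py + (j - sub // 2) / sub   # spread across the pixel
                total += bartlett[j, i] * coverage(x, y)
        return total

    # Example: a half-plane edge gives intermediate intensities near the edge,
    # which is exactly the smoothing that hides the staircase.
    edge = lambda x, y: 1.0 if y <= 0.3 * x else 0.0
    print(pixel_intensity(edge, px=10, py=3))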