Computer Vision
Fall 2000

Sparse Chapter Outline of Introductory Techniques for Computer Vision

Chapter 6 - Camera Calibration

Extrinsic Parameters

Definition -    the parameters that define the location and orientation of the camera reference frame with respect to a known world reference frame.

• R, the 3 x 3 rotation matrix
• T, the 3d translation vector

Intrinsic Parameters

Definition -    the parameters that are needed to link the pixel coordinates of an image point with the corresponding coordinates in the camera reference frame.

• fx = f/sx, the focal length expressed in effective horizontal pixel size units
• α = sy/sx, the aspect ratio
• (ox,oy), image center coordinates

Chapter 7 - Stereopsis

Introduction

Definition -    Stereo vision refers to the ability to infer information on the 3d structure and distance of a scene from two or more images taken from different viewpoints.

Correspondence Problem -    Which parts of the left and right images are projections of the same scene element?
Assumptions

• Most scene points are visible from both viewpoints
• Corresponding image regions are similar

Correlation-based correspondence algorithms attempt to match image windows of fixed size around each element. The matching criterion is a measure of the correlation between the two windows; one might use cross-correlation or the sum of squared differences (SSD), for example. SSD is less biased by the presence of very small or large intensity values.
Feature-based methods restrict the search for correspondences to a sparse set of features. Instead of image windows, numerical and symbolic properties of features, taken from feature descriptors, are used for matching. Most methods narrow the number of candidate features with which to match by applying constraints: geometric constraints and analytical constraints.
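The correlation approach can be sketched as a minimal SSD search, assuming a rectified pair so that the search runs along the same scanline (the window size and disparity range below are illustrative, not from the text):

```python
import numpy as np

def ssd_match(left, right, row, col, win=5, max_disp=32):
    """Disparity of left-image pixel (row, col): slide a window along the
    same scanline of the right image and keep the shift with minimal SSD."""
    h = win // 2
    patch_l = left[row - h:row + h + 1, col - h:col + h + 1].astype(float)
    best_d, best_ssd = 0, np.inf
    for d in range(max_disp):
        c = col - d                      # in a rectified pair the match lies to the left
        if c - h < 0:
            break
        patch_r = right[row - h:row + h + 1, c - h:c + h + 1].astype(float)
        ssd = np.sum((patch_l - patch_r) ** 2)
        if ssd < best_ssd:
            best_ssd, best_d = ssd, d
    return best_d
```

Cross-correlation would simply replace the SSD line with a (normalized) dot product of the two windows, keeping the maximum instead of the minimum.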

Reconstruction Problem -    Given a number of corresponding parts of the left and right image, and possibly information on the geometry of the stereo system, what can we say about the 3d location and structure of the observed objects?

Triangulation -    The way in which stereo determines the position in space of corresponding points in pairs of images.

Baseline -    The distance between the centers of projection

Disparity -    The difference in retinal position between the corresponding points in two images. Disparity is inversely proportional to the depth of the point in space.
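For the simple case of two parallel cameras with focal length f and baseline b (an assumption beyond the definition above), the inverse relationship is just Z = f b / d:

```python
def depth_from_disparity(f, b, d):
    """Depth of a scene point in a rectified stereo pair with focal
    length f and baseline b: Z = f * b / d, so depth is inversely
    proportional to the disparity d."""
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f * b / d
```

Doubling the disparity halves the recovered depth.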

Intrinsic Stereo Parameters -    Characterize the transformation mapping an image point from camera to pixel coordinates in each camera.

Extrinsic Stereo Parameters -    Describe the relative position and orientation of the two cameras.

Epipolar Geometry

Definition -    The geometry of stereo. Each point in the left image is restricted to lie on a given line in the right image, the epipolar line--and vice versa. This is called the epipolar constraint.

Epipoles -    The point at which the line through the centers of projection of each image intersects the image planes. The left epipole is the image of the center of projection of the right camera and vice versa.

Essential Matrix E -    Establishes a natural link between the epipolar constraint and the extrinsic parameters of the stereo system. The extrinsic parameters can be retrieved via E. In sum, E is the mapping between points and epipolar lines we were looking for.
Satisfies the equation: pr^T E pl = 0, where the points are expressed in camera coordinates
Properties

1. encodes information on the extrinsic parameters only
2. has rank 2
3. its two nonzero singular values are equal
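These properties can be checked numerically. One common construction of E from the extrinsics multiplies the skew-symmetric matrix of the translation by the rotation (the specific R and T below are illustrative):

```python
import numpy as np

def skew(t):
    """Skew-symmetric (cross-product) matrix of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Illustrative extrinsics: rotation about the x axis, translation T
theta = 0.3
R = np.array([[1, 0, 0],
              [0, np.cos(theta), -np.sin(theta)],
              [0, np.sin(theta), np.cos(theta)]])
T = np.array([1.0, 0.2, 0.0])

E = skew(T) @ R          # one common construction of the essential matrix

sv = np.linalg.svd(E, compute_uv=False)
# rank 2: the third singular value is zero; the two nonzero ones are equal
print(sv)
```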

Fundamental Matrix F -    Establishes a link between the epipolar constraint and the intrinsic and extrinsic parameters of the stereo system. The difference from the Essential Matrix is that F is defined in terms of pixel coordinates, while E is defined in terms of camera coordinates.
Satisfies the equation: pr^T F pl = 0, where the points are expressed in pixel coordinates
Properties

1. encodes information on both the intrinsic and extrinsic parameters
2. has rank 2
NOTE: The relationship between E and F is F = Mr^-T E Ml^-1, where Ml and Mr are the matrices of the left and right intrinsic parameters.
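This relationship can be verified numerically: build E from illustrative extrinsics, map it to F through (assumed) intrinsic matrices, and check that projected pixel coordinates satisfy the epipolar constraint:

```python
import numpy as np

def skew(t):
    """Skew-symmetric (cross-product) matrix of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Illustrative rig: right camera translated along x (convention Pr = R Pl + t)
R = np.eye(3)
t = np.array([0.5, 0.0, 0.0])
E = skew(t) @ R

# Assumed intrinsic matrices (focal length and image center in pixels)
Ml = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
Mr = np.array([[780.0, 0, 310], [0, 780, 235], [0, 0, 1]])

F = np.linalg.inv(Mr).T @ E @ np.linalg.inv(Ml)   # F = Mr^-T E Ml^-1

# Project a 3d point into both cameras and test pr^T F pl = 0
P = np.array([0.2, -0.1, 3.0])       # point in the left camera frame
Pr = R @ P + t                        # same point in the right camera frame
pl = Ml @ P / P[2]                    # homogeneous pixel coordinates
pr = Mr @ Pr / Pr[2]
residual = pr @ F @ pl
print(residual)                       # ~0 up to floating point error
```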

Rectification -    Given a stereo pair of images, rectification determines a transformation of each image such that pairs of conjugate epipolar lines become collinear and parallel to one of the image axes, usually the horizontal one. Why? Because the correspondence problem is then reduced from 2d to 1d.

3d Reconstruction

The amount of 3d Reconstruction possible depends on the amount of a priori knowledge available on the parameters of the stereo system.

1. Both Intrinsic and Extrinsic parameters are known --> you can solve the reconstruction unambiguously by triangulation.
2. If only the intrinsic parameters are known --> you can solve the problem, and estimate the extrinsic parameters up to an unknown scaling factor. Why? Because we do not know the baseline of the system and therefore cannot reconstruct its actual depth.
3. If neither the intrinsic nor the extrinsic parameters are known and only the pixel correspondences are available, you can still obtain a reconstruction of the environment, but only up to an unknown, global projective transformation.
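Case 1 can be sketched with standard linear triangulation (the calibrated projection matrices below are illustrative; with noisy correspondences the SVD returns a least-squares answer instead of the exact point):

```python
import numpy as np

def triangulate(Pl, Pr, pl, pr):
    """Linear triangulation: recover the 3d point whose projections
    through the 3x4 camera matrices Pl, Pr are the pixels pl, pr."""
    A = np.array([pl[0] * Pl[2] - Pl[0],
                  pl[1] * Pl[2] - Pl[1],
                  pr[0] * Pr[2] - Pr[0],
                  pr[1] * Pr[2] - Pr[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A, homogeneous 3d point
    return X[:3] / X[3]

# Illustrative calibrated rig: shared intrinsics, 0.5 baseline along x
M = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
Pl = M @ np.hstack([np.eye(3), np.zeros((3, 1))])
Pr = M @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

Pw = np.array([0.2, -0.1, 3.0])                  # ground-truth point
pl = Pl @ np.append(Pw, 1); pl = pl[:2] / pl[2]  # project into each image
pr = Pr @ np.append(Pw, 1); pr = pr[:2] / pr[2]
print(triangulate(Pl, Pr, pl, pr))               # recovers Pw
```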

Chapter 8 - Motion

Image Sequence -    A series of N images, or frames, acquired at discrete time instants tk = t0 + k*Δt, where Δt is a fixed time interval, and k = 0, 1, ..., N-1
We must assume that illumination conditions do not vary; in this case, then, image changes are said to be solely caused by relative motion between camera and scene.
Visual motion allows us to compute useful properties of the observed 3d world with very little knowledge about it. For example, it is possible to compute the time, τ, taken by a vertical bar perpendicular to the optical axis to reach the camera from image information alone, without knowing either the real size of the bar or its velocity.

Three Subproblems of Motion
• Correspondence - Which elements of a frame correspond to which elements in the next frame of the sequence? This differs from stereo correspondence because image sequences are sampled temporally at very high rates, so the spatial disparities between consecutive frames are, on average, much smaller in motion than in typical stereo pairs. Correspondence can also be made easier by exploiting the temporal dimension of motion sequences and employing tracking techniques.
The correspondence problem can also be cast as the problem of estimating the apparent motion of the image brightness pattern, usually called the optical flow.
• Two strategies for solving the correspondence problem:
• Differential methods lead to dense measures; that is, computed at each image pixel. They use estimates of time derivatives, and therefore require the images to be closely sampled in time.
• Matching methods lead to sparse measures; that is, computed only at a subset of image points.
• Reconstruction - Given a number of corresponding elements, and possibly knowledge of the camera's intrinsic parameters, what can we say about the 3d motion and structure of the observed world? Unlike stereo, in motion the relative 3d displacement between the viewing camera and the scene is not necessarily caused by a single 3d rigid transformation.
• Segmentation - What are the regions of the image plane which correspond to different moving objects?

The Motion Field

Definition -    The motion field is the 2d vector field of velocities of the image points, induced by the relative motion between the viewing camera and the observed scene. It can also be thought of as the projection of the 3d velocity field on the image plane.

Basic Equation of the Motion Field -    The motion field, v, is given by v = f (Z V - Vz P)/Z^2, where P is the 3d point, V is its velocity relative to the camera, and Vz is the component of V along the optical axis.
Notice that the motion field is the sum of two components, one of which depends on translation only, the other on rotation only.
The part of the motion field that depends on angular velocity does not carry information on depth.
The key difference between stereo disparity maps and motion fields is that the motion fields are a differential concept based on velocity and time derivatives, and the difference between frames must be very small; whereas in stereo, no such constraint is placed on the system.
The motion field of a pure translation is radial.
The focus of expansion is the point from which all motion vectors point away in a pure translation motion field. The focus of contraction is the opposite.

Pure Translation Motion Field Properties -

1. If Tz ≠ 0, the motion field is radial, and all vectors point towards (or away from) a single point, p0. If Tz = 0, the motion field is parallel.
2. The length of motion field vectors is inversely proportional to the depth Z; if Tz ≠ 0, it is also inversely proportional to the distance from p to p0.
3. p0 is the vanishing point in the direction of translation.
4. p0 is the intersection of the ray parallel to the translation vector with the image plane.
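Property 4 gives p0 directly: intersecting the ray through the center of projection parallel to T = (Tx, Ty, Tz) with the image plane at distance f yields p0 = (f Tx/Tz, f Ty/Tz). A small sketch:

```python
def focus_of_expansion(T, f):
    """Image point p0 toward (or away from) which all motion field
    vectors point under pure translation T = (Tx, Ty, Tz), Tz != 0."""
    Tx, Ty, Tz = T
    if Tz == 0:
        raise ValueError("Tz = 0: the motion field is parallel, p0 is at infinity")
    return (f * Tx / Tz, f * Ty / Tz)
```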

A Moving Plane -
The motion field of a moving planar surface, at any instant t, is a quadratic polynomial in the coordinates (x, y, f) of the image points.
The same motion field can be generated by two different planar surfaces undergoing two different 3d motions. Planar surfaces lack generality: for example, the eight point algorithm fails to yield a unique solution if the points are coplanar in 3d space.
Since the motion field of a planar surface is described exactly and globally by a polynomial of second degree, the motion field of any smooth surface is likely to be approximated well by a low-order polynomial even over relatively large regions of the image plane.

Motion Parallax -    The relative motion field of two instantaneously coincident points does not depend on the rotational component of motion in 3d space. The decoupling of rotational parameters and depth is responsible for this. Motion parallax is used to compute structure and motion from optical flow.

Instantaneous Epipole -    The point p0, being the intersection of the image plane with the direction of translation of the center of projection, is the instantaneous epipole between the pairs of consecutive images in the sequence. Thus, it is possible to locate p0 without any a priori knowledge of the intrinsic parameters of the system

Optical Flow

Definition -    A vector field subject to the image brightness constancy constraint, and loosely defined as the apparent motion of the image brightness pattern

Optical flow can be computed from time-varying image sequences under the following assumptions:

• Lambertian Surfaces
• pointwise light source at infinity
• no photometric distortion

the error of this approximation is
• small at points with high spatial gradient
• exactly zero only for translational motion or for any rigid motion such that the illumination direction is parallel to the angular velocity

The Image Brightness Constancy

The image irradiance is proportional to the scene radiance in the direction of the optical axis of the camera, assuming that this proportionality factor is the same across the entire image plane. The constancy of the apparent brightness E over time is dE/dt = 0. Via differentiation, we can rewrite this equation as (∇E)^T v + Et = 0, where the subscript t denotes partial differentiation with respect to time.

• The IBC constrains the motion field more strongly as the spatial gradient increases, suggesting that points with high spatial image gradient are the locations at which the motion field can be best estimated via the IBC.
• In general, the computed difference between optical flow and motion field is unlikely to be exactly zero; thus, the apparent motion of the image brightness is almost always different from the motion field. Optical flow is, then, only the apparent motion.

The Aperture Problem

Given the image brightness constancy equation, how much of the motion field can be determined? Only the component in the direction of the spatial image gradient. Thus, the aperture problem is stated as: the component of the motion field in the direction orthogonal to the spatial image gradient is not constrained by the image brightness constancy equation.
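Equivalently, the image brightness constancy equation pins down only the normal flow, the component of v along the unit spatial gradient: vn = -Et / |∇E|. A minimal sketch for a single pixel:

```python
import numpy as np

def normal_flow(Ex, Ey, Et):
    """Normal component of the motion field at one pixel from the image
    brightness constancy equation (grad E)^T v + Et = 0.  Only the
    component of v along the spatial gradient, vn = -Et / |grad E|,
    is constrained; the orthogonal component is free (aperture problem)."""
    g = np.hypot(Ex, Ey)
    if g == 0:
        raise ValueError("zero spatial gradient: the equation constrains nothing")
    return -Et / g
```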

Differential Techniques for Motion Field Estimation

A least-squares estimate has become the standard approach because

• They are not iterative; therefore, they are genuinely local, and less biased than iterative methods by possible discontinuities of the motion field.
• They do not involve derivatives of order higher than the first; therefore, they are less sensitive to noise than methods requiring higher order derivatives.

Assumptions
• The image brightness constancy equation yields a good approximation of the normal component of the motion field.
• The motion field is well approximated by a constant vector field within any small patch of the image plane
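Under these two assumptions, every pixel in a small patch contributes one brightness-constancy equation in the same unknown velocity (u, v), giving an overdetermined linear system solved by least squares. A Lucas-Kanade-style sketch, assuming the gradient arrays are precomputed:

```python
import numpy as np

def patch_flow(Ex, Ey, Et):
    """Constant velocity (u, v) for a patch: stack Ex*u + Ey*v = -Et
    for every pixel in the patch and solve the least-squares system."""
    A = np.column_stack([Ex.ravel(), Ey.ravel()])
    b = -Et.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v

# Synthetic patch whose gradients are exactly consistent with (u, v) = (1, 2)
rng = np.random.default_rng(0)
Ex = rng.random((5, 5))
Ey = rng.random((5, 5))
Et = -(1.0 * Ex + 2.0 * Ey)
print(patch_flow(Ex, Ey, Et))   # ≈ [1. 2.]
```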

Feature Tracking is the problem of matching features from frame to frame in long sequences of images.

Using the Motion Field

Given the motion field estimated from an image sequence, compute the shape, or structure, of the visible objects, and their motion with respect to the viewing camera.

Sparse Motion Fields

Factorization Method Assumptions

• The camera model is orthographic -- because image features would not remain constant under the perspective projection.
• The position of n image points, corresponding to the scene points P1 , P2 ... Pn, not all coplanar, have been tracked in N frames, with N >= 3.

The factorization method creates a registered measurement matrix W that contains the tracked image feature positions in each frame of the sequence, with the centroid of the image points in each frame subtracted. It is based on the fundamental theorem that, in the absence of noise, the registered measurement matrix has at most rank 3. The method decomposes W into the product of a 2N x 3 matrix R and a 3 x n matrix S. R describes the frame-to-frame rotation of the camera with respect to the points; S describes the points' structure as [x y z]' tuples.
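The rank theorem is easy to check numerically: project noise-free orthographic views of a non-coplanar point set, register by subtracting each frame's centroid, and the resulting matrix has rank 3 (the rotations and points below are illustrative):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

rng = np.random.default_rng(0)
S = rng.random((3, 8))               # n = 8 scene points, not all coplanar

rows = []
for k in range(4):                   # N = 4 frames, N >= 3
    R = rot_x(0.3 * k) @ rot_z(0.5 * k)
    P = R[:2] @ S + rng.random((2, 1))              # orthographic projection + shift
    rows.append(P - P.mean(axis=1, keepdims=True))  # register: subtract centroid

W = np.vstack(rows)                  # 2N x n registered measurement matrix
print(np.linalg.matrix_rank(W))      # 3 in the absence of noise
```

The actual method then factors W via its SVD, keeping the three largest singular values to obtain the motion and structure matrices.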

Dense Motion Fields

1. determine the direction of translation through approximate motion parallax
2. determine a least squares approximation of the rotational component of the optical flow, and use it in the motion field equations to compute depth

The key idea is approximate motion parallax: the differences between optical flow vectors at an image point p and at any point close to p can be regarded as noisy estimates of the motion parallax at p.

Motion-based Segmentation

Relax the assumption that the motion between the camera and the scene is described by a single 3d motion in order to deal with the problem of multiple motions. Restricting the problem to the case where the camera is still and there are multiple moving objects in the scene, the problem can be stated as: find the regions in the image, if any, corresponding to the different moving objects.
The simplest strategy is probably taking thresholded image differences at the pixel level.
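A minimal sketch of that strategy for a static camera, with frames as arrays (the threshold value is illustrative):

```python
import numpy as np

def change_mask(prev, curr, thresh=0.1):
    """Per-pixel change detection: flag pixels whose absolute brightness
    difference between consecutive frames exceeds a threshold."""
    return np.abs(curr.astype(float) - prev.astype(float)) > thresh

# A 3x3 bright patch appears in an otherwise static scene
f0 = np.zeros((10, 10))
f1 = f0.copy()
f1[2:5, 4:7] = 1.0
mask = change_mask(f0, f1)
print(mask.sum())   # 9 changed pixels
```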

http://www.cs.jhu.edu/~jcorso/class/computer_vision/trucco_verri_outline.html