Computer Vision
Fall 2000

Sparse Chapter Outline of Introductory Techniques for Computer Vision

Chapter 6 - Camera Calibration

Extrinsic Parameters

Definition -    the parameters that define the location and orientation of the camera reference frame with respect to a known world reference frame.

• R, the 3 x 3 rotation matrix
• T, the 3d translation vector

Intrinsic Parameters

Definition -    the parameters that are needed to link the pixel coordinates of an image point with the corresponding coordinates in the camera reference frame.

• fx = f/sx, the focal length expressed in effective horizontal pixel size units
• α = sy/sx, the aspect ratio
• (ox,oy), image center coordinates

Chapter 7 - Stereopsis

Introduction

Definition -    Stereo vision refers to the ability to infer information on the 3d structure and distance of a scene from two or more images taken from different viewpoints.

Correspondence Problem -    Which parts of the left and right images are projections of the same scene element?
Assumptions

• Most scene points are visible from both viewpoints
• Corresponding image regions are similar

Correlation-based correspondence algorithms attempt to match image windows of fixed size around each element. The matching criterion is a measure of the correlation between the two windows; one might use cross-correlation or the sum of squared differences (SSD), for example. SSD is less biased by the presence of very small or large intensity values.
Feature-based methods restrict the search for correspondences to a sparse set of features. Instead of image windows, numerical and symbolic properties of features, taken from feature descriptors, are used for matching. Most methods narrow the number of candidate features with which to match by applying constraints: geometric constraints and analytical constraints.
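The correlation approach can be sketched as a minimal SSD search, assuming a rectified pair so that the search runs along the same scanline (the window size and disparity range below are illustrative, not from the text):

```python
import numpy as np

def ssd_match(left, right, row, col, win=5, max_disp=32):
    """Disparity of left-image pixel (row, col): slide a window along the
    same scanline of the right image and keep the shift with minimal SSD."""
    h = win // 2
    patch_l = left[row - h:row + h + 1, col - h:col + h + 1].astype(float)
    best_d, best_ssd = 0, np.inf
    for d in range(max_disp):
        c = col - d                      # in a rectified pair the match lies to the left
        if c - h < 0:
            break
        patch_r = right[row - h:row + h + 1, c - h:c + h + 1].astype(float)
        ssd = np.sum((patch_l - patch_r) ** 2)
        if ssd < best_ssd:
            best_ssd, best_d = ssd, d
    return best_d
```

Cross-correlation would simply replace the SSD line with a (normalized) dot product of the two windows, keeping the maximum instead of the minimum.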

Reconstruction Problem -    Given a number of corresponding parts of the left and right image, and possibly information on the geometry of the stereo system, what can we say about the 3d location and structure of the observed objects?

Triangulation -    The way in which stereo determines the position in space of corresponding points in pairs of images.

Baseline -    The distance between the centers of projection

Disparity -    The difference in retinal position between the corresponding points in two images. Disparity is inversely proportional to the depth of the point in space.
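For the simple case of two parallel cameras with focal length f and baseline b (an assumption beyond the definition above), the inverse relationship is just Z = f b / d:

```python
def depth_from_disparity(f, b, d):
    """Depth of a scene point in a rectified stereo pair with focal
    length f and baseline b: Z = f * b / d, so depth is inversely
    proportional to the disparity d."""
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f * b / d
```

Doubling the disparity halves the recovered depth.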

Intrinsic Stereo Parameters -    Characterize the transformation mapping an image point from camera to pixel coordinates in each camera.

Extrinsic Stereo Parameters -    Describe the relative position and orientation of the two cameras.

Epipolar Geometry

Definition -    The geometry of stereo. Each point in the left image is restricted to lie on a given line in the right image, the epipolar line--and vice versa. This is called the epipolar constraint.

Epipoles -    The point at which the line through the centers of projection of each image intersects the image planes. The left epipole is the image of the center of projection of the right camera and vice versa.

Essential Matrix E -    Establishes a natural link between the epipolar constraint and the extrinsic parameters of the stereo system. The extrinsic parameters can be retrieved via E. In sum, E is the mapping between points and epipolar lines we were looking for.
Satisfies the equation: pr^T E pl = 0, where the points are expressed in camera coordinates
Properties

1. encodes information on the extrinsic parameters only
2. has rank 2
3. its two nonzero singular values are equal
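These properties can be checked numerically. One common construction of E from the extrinsics multiplies the skew-symmetric matrix of the translation by the rotation (the specific R and T below are illustrative):

```python
import numpy as np

def skew(t):
    """Skew-symmetric (cross-product) matrix of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Illustrative extrinsics: rotation about the x axis, translation T
theta = 0.3
R = np.array([[1, 0, 0],
              [0, np.cos(theta), -np.sin(theta)],
              [0, np.sin(theta), np.cos(theta)]])
T = np.array([1.0, 0.2, 0.0])

E = skew(T) @ R          # one common construction of the essential matrix

sv = np.linalg.svd(E, compute_uv=False)
# rank 2: the third singular value is zero; the two nonzero ones are equal
print(sv)
```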

Fundamental Matrix F -    Establishes a link between the epipolar constraint and the intrinsic and extrinsic parameters of the stereo system. The difference from the Essential Matrix is that F is defined in terms of pixel coordinates, while E is defined in terms of camera coordinates.
Satisfies the equation: pr^T F pl = 0, where the points are expressed in pixel coordinates
Properties

1. encodes information on both the intrinsic and extrinsic parameters
2. has rank 2
NOTE: The relationship between E and F is F = Mr^-T E Ml^-1, where Ml and Mr are the matrices of the left and right intrinsic parameters.
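This relationship can be verified numerically: build E from illustrative extrinsics, map it to F through (assumed) intrinsic matrices, and check that projected pixel coordinates satisfy the epipolar constraint:

```python
import numpy as np

def skew(t):
    """Skew-symmetric (cross-product) matrix of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Illustrative rig: right camera translated along x (convention Pr = R Pl + t)
R = np.eye(3)
t = np.array([0.5, 0.0, 0.0])
E = skew(t) @ R

# Assumed intrinsic matrices (focal length and image center in pixels)
Ml = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
Mr = np.array([[780.0, 0, 310], [0, 780, 235], [0, 0, 1]])

F = np.linalg.inv(Mr).T @ E @ np.linalg.inv(Ml)   # F = Mr^-T E Ml^-1

# Project a 3d point into both cameras and test pr^T F pl = 0
P = np.array([0.2, -0.1, 3.0])       # point in the left camera frame
Pr = R @ P + t                        # same point in the right camera frame
pl = Ml @ P / P[2]                    # homogeneous pixel coordinates
pr = Mr @ Pr / Pr[2]
residual = pr @ F @ pl
print(residual)                       # ~0 up to floating point error
```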

Rectification -    Given a stereo pair of images, rectification determines a transformation of each image such that pairs of conjugate epipolar lines become collinear and parallel to one of the image axes, usually the horizontal one. Why? Because the correspondence problem is then reduced from 2d to 1d.

3d Reconstruction

The amount of 3d Reconstruction possible depends on the amount of a priori knowledge available on the parameters of the stereo system.

1. Both Intrinsic and Extrinsic parameters are known --> you can solve the reconstruction unambiguously by triangulation.
2. If only the intrinsic parameters are known --> you can solve the problem, and estimate the extrinsic parameters up to an unknown scaling factor. Why? Because we do not know the baseline of the system and therefore cannot reconstruct its actual depth.
3. If neither the intrinsic nor the extrinsic parameters are known and only the pixel correspondences are available, you can still obtain a reconstruction of the environment, but only up to an unknown, global projective transformation.
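Case 1 can be sketched with standard linear triangulation (the calibrated projection matrices below are illustrative; with noisy correspondences the SVD returns a least-squares answer instead of the exact point):

```python
import numpy as np

def triangulate(Pl, Pr, pl, pr):
    """Linear triangulation: recover the 3d point whose projections
    through the 3x4 camera matrices Pl, Pr are the pixels pl, pr."""
    A = np.array([pl[0] * Pl[2] - Pl[0],
                  pl[1] * Pl[2] - Pl[1],
                  pr[0] * Pr[2] - Pr[0],
                  pr[1] * Pr[2] - Pr[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A, homogeneous 3d point
    return X[:3] / X[3]

# Illustrative calibrated rig: shared intrinsics, 0.5 baseline along x
M = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
Pl = M @ np.hstack([np.eye(3), np.zeros((3, 1))])
Pr = M @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

Pw = np.array([0.2, -0.1, 3.0])                  # ground-truth point
pl = Pl @ np.append(Pw, 1); pl = pl[:2] / pl[2]  # project into each image
pr = Pr @ np.append(Pw, 1); pr = pr[:2] / pr[2]
print(triangulate(Pl, Pr, pl, pr))               # recovers Pw
```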

Chapter 8 - Motion

Image Sequence -    A series of N images, or frames, acquired at discrete time instants tk = t0 + k*Δt, where Δt is a fixed time interval, and k = 0, 1, ..., N-1
We must assume that illumination conditions do not vary; in this case, then, image changes are said to be solely caused by relative motion between camera and scene.
Visual motion allows us to compute useful properties of the observed 3d world with very little knowledge about it. For example, it is possible to compute the time, τ, taken by a vertical bar perpendicular to the optical axis to reach the camera from image information alone, without knowing either the real size of the bar or its velocity.

Three Subproblems of Motion
• Correspondence - Which elements of a frame correspond to which elements in the next frame of the sequence? This differs from stereo correspondence because image sequences are sampled temporally at very high rates, so the spatial disparities between consecutive frames are, on average, much smaller in motion than in typical stereo pairs. Correspondence can also be made easier by exploiting the temporal dimension of motion sequences and employing tracking techniques.
The correspondence problem can also be cast as the problem of estimating the apparent motion of the image brightness pattern, usually called the optical flow.
• Two strategies for solving the correspondence problem:
• Differential methods lead to dense measures; that is, computed at each image pixel. They use estimates of time derivatives, and therefore require the images to be closely sampled in time.
• Matching methods lead to sparse measures; that is, computed only at a subset of image points.
• Reconstruction - Given a number of corresponding elements, and possibly knowledge of the camera's intrinsic parameters, what can we say about the 3d motion and structure of the observed world? Unlike stereo, in motion the relative 3d displacement between the viewing camera and the scene is not necessarily caused by a single 3d rigid transformation.
• Segmentation - What are the regions of the image plane which correspond to different moving objects?

The Motion Field

Definition -    The motion field is the 2d vector field of velocities of the image points, induced by the relative motion between the viewing camera and the observed scene. It can also be thought of as the projection of the 3d velocity field on the image plane.

Basic Equation of the Motion Field -    The motion field, v, is given by v = f (Z V - Vz P)/Z^2, where P is the 3d point, V is its velocity relative to the camera, and Vz is the component of V along the optical axis.
Notice that the motion field is the sum of two components, one of which depends on translation only, the other on rotation only.
The part of the motion field that depends on angular velocity does not carry information on depth.
The key difference between stereo disparity maps and motion fields is that the motion fields are a differential concept based on velocity and time derivatives, and the difference between frames must be very small; whereas in stereo, no such constraint is placed on the system.
The motion field of a pure translation is radial.
The focus of expansion is the point from which all motion vectors point away in a pure translation motion field. The focus of contraction is the opposite.

Pure Translation Motion Field Properties -

1. If Tz ≠ 0, the motion field is radial, and all vectors point towards (or away from) a single point, p0. If Tz = 0, the motion field is parallel.
2. The length of motion field vectors is inversely proportional to the depth Z; if Tz ≠ 0, it is also inversely proportional to the distance from p to p0.
3. p0 is the vanishing point in the direction of translation.
4. p0 is the intersection of the ray parallel to the translation vector with the image plane.
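Property 4 gives p0 directly: intersecting the ray through the center of projection parallel to T = (Tx, Ty, Tz) with the image plane at distance f yields p0 = (f Tx/Tz, f Ty/Tz). A small sketch:

```python
def focus_of_expansion(T, f):
    """Image point p0 toward (or away from) which all motion field
    vectors point under pure translation T = (Tx, Ty, Tz), Tz != 0."""
    Tx, Ty, Tz = T
    if Tz == 0:
        raise ValueError("Tz = 0: the motion field is parallel, p0 is at infinity")
    return (f * Tx / Tz, f * Ty / Tz)
```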

A Moving Plane -
The motion field of a moving planar surface, at any instant t, is a quadratic polynomial in the coordinates (x, y, f) of the image points.
The same motion field can be generated by two different planar surfaces undergoing two different 3d motions. Planar surfaces lack generality: for example, the eight point algorithm fails to yield a unique solution if the points are coplanar in 3d space.
Since the motion field of a planar surface is described exactly and globally by a polynomial of second degree, the motion field of any smooth surface is likely to be approximated well by a low-order polynomial even over relatively large regions of the image plane.

Motion Parallax -    The relative motion field of two instantaneously coincident points does not depend on the rotational component of motion in 3d space. The decoupling of rotational parameters and depth is responsible for this. Motion parallax is used to compute structure and motion from optical flow.

Instantaneous Epipole -    The point p0, being the intersection of the image plane with the direction of translation of the center of projection, is the instantaneous epipole between the pairs of consecutive images in the sequence. Thus, it is possible to locate p0 without any a priori knowledge of the intrinsic parameters of the system

Optical Flow

Definition -    A vector field subject to the image brightness constancy constraint, and loosely defined as the apparent motion of the image brightness pattern

Optical flow can be computed from time-varying image sequences under the following assumptions:

• Lambertian Surfaces
• pointwise light source at infinity
• no photometric distortion

the error of this approximation is
• small at points with high spatial gradient
• exactly zero only for translational motion or for any rigid motion such that the illumination direction is parallel to the angular velocity

The Image Brightness Constancy

The image irradiance is proportional to the scene radiance in the direction of the optical axis of the camera, assuming that this proportionality factor is the same across the entire image plane. The constancy of the apparent brightness E over time is dE/dt = 0. Via differentiation, we can rewrite this equation as (∇E)^T v + Et = 0, where the subscript t denotes partial differentiation with respect to time.

• The IBC constrains the motion field more strongly as the spatial gradient increases, suggesting that points with high spatial image gradient are the locations at which the motion field can be best estimated via the IBC.
• In general, the computed difference between optical flow and motion field is unlikely to be exactly zero; thus, the apparent motion of the image brightness is almost always different from the motion field. Optical flow is, then, only the apparent motion.

The Aperture Problem

Given the image brightness constancy equation, how much of the motion field can be determined? Only the component in the direction of the spatial image gradient. Thus, the aperture problem is stated as: the component of the motion field in the direction orthogonal to the spatial image gradient is not constrained by the image brightness constancy equation.
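Equivalently, the image brightness constancy equation pins down only the normal flow, the component of v along the unit spatial gradient: vn = -Et / |∇E|. A minimal sketch for a single pixel:

```python
import numpy as np

def normal_flow(Ex, Ey, Et):
    """Normal component of the motion field at one pixel from the image
    brightness constancy equation (grad E)^T v + Et = 0.  Only the
    component of v along the spatial gradient, vn = -Et / |grad E|,
    is constrained; the orthogonal component is free (aperture problem)."""
    g = np.hypot(Ex, Ey)
    if g == 0:
        raise ValueError("zero spatial gradient: the equation constrains nothing")
    return -Et / g
```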

Differential Techniques for Motion Field Estimation

A least-squares estimate has become the standard approach because

• They are not iterative; therefore, they are genuinely local, and less biased than iterative methods by possible discontinuities of the motion field.
• They do not involve derivatives of order higher than the first; therefore, they are less sensitive to noise than methods requiring higher order derivatives.

Assumptions
• The image brightness constancy equation yields a good approximation of the normal component of the motion field.
• The motion field is well approximated by a constant vector field within any small patch of the image plane
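Under these two assumptions, every pixel in a small patch contributes one brightness-constancy equation in the same unknown velocity (u, v), giving an overdetermined linear system solved by least squares. A Lucas-Kanade-style sketch, assuming the gradient arrays are precomputed:

```python
import numpy as np

def patch_flow(Ex, Ey, Et):
    """Constant velocity (u, v) for a patch: stack Ex*u + Ey*v = -Et
    for every pixel in the patch and solve the least-squares system."""
    A = np.column_stack([Ex.ravel(), Ey.ravel()])
    b = -Et.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v

# Synthetic patch whose gradients are exactly consistent with (u, v) = (1, 2)
rng = np.random.default_rng(0)
Ex = rng.random((5, 5))
Ey = rng.random((5, 5))
Et = -(1.0 * Ex + 2.0 * Ey)
print(patch_flow(Ex, Ey, Et))   # ≈ [1. 2.]
```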

Feature Tracking is the problem of matching features from frame to frame in long sequences of images.

Using the Motion Field

Given the motion field estimated from an image sequence, compute the shape, or structure, of the visible objects, and their motion with respect to the viewing camera.

Sparse Motion Fields

Factorization Method Assumptions

• The camera model is orthographic -- because image features would not remain constant under the perspective projection.
• The position of n image points, corresponding to the scene points P1 , P2 ... Pn, not all coplanar, have been tracked in N frames, with N >= 3.

The factorization method creates a registered measurement matrix W that contains the tracked image feature positions in each frame of the sequence, with the centroid of the image points in each frame subtracted. It is based on the fundamental theorem that, in the absence of noise, the registered measurement matrix has at most rank 3. The method decomposes W into the product of a 2N x 3 matrix R and a 3 x n matrix S. R describes the frame-to-frame rotation of the camera with respect to the points; S describes the points' structure as [x y z]' tuples.
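The rank theorem is easy to check numerically: project noise-free orthographic views of a non-coplanar point set, register by subtracting each frame's centroid, and the resulting matrix has rank 3 (the rotations and points below are illustrative):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

rng = np.random.default_rng(0)
S = rng.random((3, 8))               # n = 8 scene points, not all coplanar

rows = []
for k in range(4):                   # N = 4 frames, N >= 3
    R = rot_x(0.3 * k) @ rot_z(0.5 * k)
    P = R[:2] @ S + rng.random((2, 1))              # orthographic projection + shift
    rows.append(P - P.mean(axis=1, keepdims=True))  # register: subtract centroid

W = np.vstack(rows)                  # 2N x n registered measurement matrix
print(np.linalg.matrix_rank(W))      # 3 in the absence of noise
```

The actual method then factors W via its SVD, keeping the three largest singular values to obtain the motion and structure matrices.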

Dense Motion Fields

1. determine the direction of translation through approximate motion parallax
2. determine a least squares approximation of the rotational component of the optical flow, and use it in the motion field equations to compute depth

The key idea is approximate motion parallax: the differences between optical flow vectors at an image point p and at any point close to p can be regarded as noisy estimates of the motion parallax at p.

Motion-based Segmentation

Relax the assumption that the motion between the camera and the scene is described by a single 3d motion in order to deal with the problem of multiple motions. Restricting the problem to the case where the camera is still and there are multiple moving objects in the scene, the problem can be stated as: find the regions in the image, if any, corresponding to the different moving objects.
The simplest strategy is probably taking thresholded image differences at the pixel level.
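A minimal sketch of that strategy for a static camera, with frames as arrays (the threshold value is illustrative):

```python
import numpy as np

def change_mask(prev, curr, thresh=0.1):
    """Per-pixel change detection: flag pixels whose absolute brightness
    difference between consecutive frames exceeds a threshold."""
    return np.abs(curr.astype(float) - prev.astype(float)) > thresh

# A 3x3 bright patch appears in an otherwise static scene
f0 = np.zeros((10, 10))
f1 = f0.copy()
f1[2:5, 4:7] = 1.0
mask = change_mask(f0, f1)
print(mask.sum())   # 9 changed pixels
```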

http://www.cs.jhu.edu/~jcorso/class/computer_vision/trucco_verri_outline.html