Overview of Research

There have been two main thrusts in my graduate research. First, I am interested in developing techniques for using Computer Vision to enhance the interaction between man and machine. I am a member of the Visual Interaction Cues project. We propose a new paradigm for vision-based interaction that operates under the premise that global user tracking is unnecessary for general interaction tasks. Instead, we maintain a relationship between interface components and their projections in the images. This mapping constrains the image analysis to local image regions in which streams of interaction cues will occur.

Second, I am extremely interested in the chicken-and-egg problem of segmentation and correspondence; i.e., if we could segment the objects in images, then computing correspondences would be easier, and vice versa. Most techniques in Computer Vision attempt to solve the correspondence problem without regard to the image segmentation. Typically, point correspondences are used in the solution (e.g. stereo, structure-from-motion, local methods in object recognition). While point measurements permit localization with high accuracy, they are susceptible to noise, occlusion, lighting changes, etc., making them difficult to extract and correspond robustly. By incorporating information from image regions, however, one can approximate a segmentation and reason about correspondences at a level higher than the pixel. To that end, I have designed a set of techniques that integrate information from coherent image regions, creating a sparse image segmentation composed of regions that are extracted and matched with high robustness. In a coarse-to-fine strategy, the region matches constrain the search for point correspondences, improving matching robustness and reducing computational complexity.
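The coarse-to-fine idea can be sketched in a few lines: region-level matches restrict which candidate points are even considered during point matching, which is where the robustness and complexity gains come from. This is an illustrative sketch, not code from the actual system; all names and the nearest-neighbor matching criterion are assumptions.

```python
import numpy as np

def match_points(desc_a, desc_b, region_a, region_b, region_matches):
    """Match points from image A to image B, searching only within
    already-matched regions.

    desc_a, desc_b : (N, D) point descriptors for each image.
    region_a, region_b : (N,) region id of each point.
    region_matches : dict mapping a region id in A to its match in B.
    """
    matches = []
    for i, ra in enumerate(region_a):
        if ra not in region_matches:
            continue  # point lies outside any matched region
        # candidates are only the points inside the corresponding region of B
        cand = np.where(region_b == region_matches[ra])[0]
        if cand.size == 0:
            continue
        d = np.linalg.norm(desc_b[cand] - desc_a[i], axis=1)
        matches.append((i, int(cand[np.argmin(d)])))
    return matches
```

Note that the inner search is over one region's points rather than the whole image, so the cost per point drops roughly by the number of regions.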

In my dissertation, I solve problems in both of these areas, and I apply the region-based methods in a system allowing the creation of large-scale, dynamic, user-constructed mixed realities. Below, I list the projects in which I have been involved.

My graduate work was partially funded by the National Science Foundation and a fellowship from the Link Foundation.


Coherent Image Regions - Coupled Segmentation and Correspondence
[Click for detailed project page on coherent regions.]
[Click for detailed project page on subspace fusion for global segmentation.]

We study methods that integrate information from coherent image regions to represent the image. Our novel sparse image segmentation can be used to compute robust region correspondences and thereby constrain the search for point correspondences. The philosophy behind this work is that coherent image regions provide a concise and stable basis for image representation: concise meaning that the required space for representing the image is small, and stable meaning that the representation is robust to changes in both viewpoint and photometric imaging conditions.
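To make "concise and stable" concrete: a coherent region can be summarized by a handful of numbers, e.g. its centroid, second spatial moments, and mean color, and regions can then be matched by comparing these summaries. The sketch below is a hypothetical illustration of such a descriptor, not the representation used in the actual work.

```python
import numpy as np

def region_descriptor(pixels, colors):
    """A concise summary of a coherent region.

    pixels : (N, 2) coordinates of the region's pixels.
    colors : (N, 3) colors of those pixels.

    Returns a 9-vector: centroid (2), spatial covariance (4, capturing
    the region's extent and orientation), and mean color (3).
    """
    centroid = pixels.mean(axis=0)
    cov = np.cov(pixels.T)            # second spatial moments
    mean_color = colors.mean(axis=0)  # photometric summary
    return np.concatenate([centroid, cov.ravel(), mean_color])
```

A whole region is reduced to nine numbers (concise), and the centroid, covariance, and mean are averages over many pixels, so they change slowly under small viewpoint or lighting changes (stable).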

In addition, we have proposed a subspace labeling technique for global segmentation. Image segmentation in a particular feature subspace is a fairly well understood problem, but it is well known that operating in only a single feature subspace, e.g. color or texture, seldom yields a good segmentation for real images. Combining information from multiple subspaces in an optimal manner is, however, a difficult problem to solve algorithmically. We propose a solution that fuses contributions from multiple feature subspaces using an energy minimization approach. For each subspace, we compute a per-pixel quality measure and perform a partitioning with the standard normalized cut algorithm. To fuse the subspaces into a final segmentation, we compute a subspace label for every pixel; the labeling is computed in the graph-cut energy minimization framework proposed by Boykov et al. Finally, we combine the initial per-subspace segmentations with the subspace labels obtained from the energy minimization to yield the final segmentation.
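As a rough illustration of the fusion step, the sketch below assigns each pixel a subspace label by minimizing a simple energy with a data term (negative per-pixel subspace quality) and a Potts-style smoothness term over 4-neighbors. For simplicity it minimizes with ICM as a stand-in for the graph-cut (alpha-expansion) solver of Boykov et al. used in the actual work; all names and the exact energy terms are illustrative.

```python
import numpy as np

def fuse_subspace_labels(quality, lam=1.0, iters=5):
    """Assign each pixel to one of S feature subspaces.

    quality : (S, H, W) per-pixel quality measure of each subspace.
    Energy per pixel = -quality[label] + lam * (# 4-neighbors with a
    different label).  Minimized greedily by ICM (a simplified stand-in
    for graph cuts).
    """
    labels = quality.argmax(axis=0)  # initialize from the best subspace
    S, H, W = quality.shape
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                best, best_e = labels[y, x], np.inf
                for s in range(S):
                    e = -quality[s, y, x]  # data term
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W and labels[ny, nx] != s:
                            e += lam  # smoothness penalty
                    if e < best_e:
                        best, best_e = s, e
                labels[y, x] = best
    return labels
```

The smoothness term is what makes this more than a per-pixel argmax: an isolated pixel whose "best" subspace disagrees with all of its neighbors gets pulled into the surrounding label.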

Vision-Based Man-Machine Interaction
[Click for detailed project page] [Click for official project page]

We have developed a methodology for vision-based interaction called Visual Interaction Cues (VICs). The VICs paradigm is a methodology for vision-based interaction operating on the fundamental premise that, in general vision-based HCI settings, global user modeling and tracking are not necessary. For example, when a person presses the number keys while making a telephone call, the telephone maintains no notion of the user; it only recognizes the action of pressing a key. In contrast, typical methods for vision-based HCI attempt to perform global user tracking to model the interaction. In the telephone example, such methods would require a precise tracker for the articulated motion of the hand. However, such techniques are computationally expensive, prone to error, subject to the re-initialization problem, unable to accommodate an arbitrary number of users, and often require a complex gesture language that the user must learn. In the VICs paradigm, we observe that analyzing the local region around an interface component (the telephone key, for example) yields sufficient information to recognize user actions.
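The key observation, that a local image region around a component suffices, can be caricatured with simple change detection over that region alone; no model of the user exists anywhere. A hypothetical sketch (the actual VICs system analyzes richer streams of cues than a mean intensity difference):

```python
import numpy as np

def key_pressed(frame, background, region, thresh=25.0):
    """Decide whether an interface component was activated by analyzing
    only the local image region around it -- no global user tracking.

    frame, background : (H, W) grayscale images.
    region : (y0, y1, x0, x1) bounds of the component in the image.
    """
    y0, y1, x0, x1 = region
    patch = frame[y0:y1, x0:x1].astype(float)
    ref = background[y0:y1, x0:x1].astype(float)
    # a simple interaction cue: mean absolute change within the region
    return bool(np.abs(patch - ref).mean() > thresh)
```

Because each component watches only its own patch, the cost is independent of how many users are in the scene, and there is no tracker to re-initialize after failure.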

Direct Methods for Surface Tracking
We have developed a set of algorithms to directly track planar and parametric surfaces under a calibrated stereo rig. A movie demonstrating the planar surface tracking is here. A binary pixel mask is maintained that marks those pixels belonging to the plane (and having good texture); it is shown in red in the lower left of the video. The green vector being rendered is the plane's normal. At left is an image of a system built with our plane tracking routines to localize mobile robots. In the image, we show the real scene, the two walls being tracked (one in blue and one in red), and an overhead (orthographic) projection of the reconstructed walls.
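For planar tracking under a calibrated rig, the two views of a tracked plane are related by the standard plane-induced homography H = K2 (R - t nᵀ / d) K1⁻¹, for the plane nᵀX + d = 0 expressed in the left camera frame. A sketch of that formula (the function name and calling convention are illustrative, not from our system):

```python
import numpy as np

def plane_homography(K1, K2, R, t, n, d):
    """Homography induced by the plane n.X + d = 0 (n a unit normal in
    the left camera frame) between calibrated cameras with relative
    pose (R, t):  H = K2 (R - t n^T / d) K1^{-1}.
    """
    return K2 @ (R - np.outer(t, n) / d) @ np.linalg.inv(K1)
```

A direct method then estimates the plane parameters (n, d) by minimizing the intensity difference between one image and the other warped through H, rather than by matching discrete features.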

Interactive Haptic Rendering of Deformable Surfaces
[Click for detailed project page]

We have developed a new method for interactive deformation and haptic rendering of viscoelastic surfaces. Haptic and graphic rendering place competing demands on the object representation: an implicit representation is best for haptic interaction, while an explicit representation is best for graphic rendering. In our approach, we fuse an implicit and an explicit object representation, permitting fast haptic interaction and fast graphic rendering. Objects are defined by a discretized Medial Axis Transform (MAT), which consists of an ordered set of circles (in 2D) or spheres (in 3D) whose centers are connected by a skeleton. Our implementation, called DeforMAT, is appealing because it takes advantage of single-point haptic interaction to render efficiently while maintaining a very low memory footprint.
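The implicit side of such a representation is convenient precisely because a single-point haptic query against a set of spheres is just a minimum over per-sphere distances. A simplified sketch, using a basic penalty force in place of DeforMAT's actual rendering scheme; the names and the stiffness constant are illustrative:

```python
import numpy as np

def mat_distance(p, centers, radii):
    """Signed distance from probe point p to an object modeled (in this
    sketch) as a union of spheres; negative means the probe is inside.

    centers : (M, 3) sphere centers along the skeleton; radii : (M,).
    """
    return float(np.min(np.linalg.norm(centers - p, axis=1) - radii))

def contact_force(p, centers, radii, k=200.0):
    """Penalty force on the haptic probe, directed out of the nearest
    sphere and proportional to penetration depth."""
    d = np.linalg.norm(centers - p, axis=1) - radii
    i = int(np.argmin(d))
    if d[i] >= 0.0:
        return np.zeros(3)  # no contact
    normal = (p - centers[i]) / np.linalg.norm(p - centers[i])
    return -k * d[i] * normal  # pushes the probe outward
```

Each query touches only M sphere records rather than a dense volumetric grid, which is where the low memory footprint and fast single-point interaction come from.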

Real-Time Volume Visualization
We developed a method for the voxelization of large scalar fields with the goal of interactive volume rendering. An adaptive octree is used to optimally sample the underlying unstructured grid. The unstructured grid is embedded into a voxel space, and those regions not corresponding to input data are flagged as being outside of the embedded model. The octree nodes share borders, enabling smooth data continuity between them. Gradients are computed and stored with the textures for lighting computations.

We integrated this system as a preprocess for an interactive volume renderer that we developed. This approach leverages current 3D texture mapping PC hardware for the problem of unstructured grid rendering. We specialize the 3D texture octree to the task of rendering unstructured grids through a novel pad-and-stencil algorithm, which distinguishes between data and non-data voxels. Both the voxelization and rendering processes efficiently manage large, out-of-core datasets. The system manages cache usage in main memory and texture memory, as well as bandwidths among disk, main memory, and texture memory. It also manages the rendering load, maximizing a quality metric for a desired level of interactivity. The system has been applied to a number of large datasets and produces high-quality images at interactive, user-selectable frame rates on standard PC hardware.
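The adaptive sampling step can be caricatured as variance-driven subdivision: a cube is split while the field inside it varies too much, so flat regions are represented cheaply and detailed regions finely. A hypothetical sketch over a dense array standing in for resampling of the unstructured grid (the subdivision criterion and all names are illustrative):

```python
import numpy as np

def build_octree(field, origin, size, max_depth, var_thresh):
    """Adaptively sample a scalar field with an octree.

    Subdivides a cube while the sample variance inside it exceeds
    var_thresh, up to max_depth.  Returns leaves as (origin, size, mean).
    field : (N, N, N) array standing in for the resampled grid.
    """
    x, y, z = origin
    block = field[x:x + size, y:y + size, z:z + size]
    if size == 1 or max_depth == 0 or block.var() <= var_thresh:
        return [(origin, size, float(block.mean()))]  # homogeneous leaf
    h = size // 2
    leaves = []
    for dx in (0, h):          # recurse into the eight children
        for dy in (0, h):
            for dz in (0, h):
                leaves += build_octree(field, (x + dx, y + dy, z + dz),
                                       h, max_depth - 1, var_thresh)
    return leaves
```

In the real system each leaf would additionally carry a texture brick with padded borders (for continuity between nodes) and a stencil marking non-data voxels.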

last updated: 15 March 2005; © 2004,2005. jcorso