There have been two main thrusts in my graduate research. First, I am
interested in developing techniques for using Computer Vision to enhance
the interaction between man and machine. I am a member of the Visual Interaction Cues project. We
propose a new paradigm for vision-based interaction that operates under
the premise that global user tracking is unnecessary for general
interaction tasks. Instead, we maintain a relationship between
interface components and their projections in the images. This
mapping constrains the image analysis to local image regions in which
streams of interaction cues will occur.
Second, I am extremely interested in the chicken-and-egg
problems of segmentation and correspondence; i.e., if we could segment
the objects in images, then computing correspondences would be easier,
and vice versa. Most techniques in Computer Vision, however, attempt to
solve the correspondence problem independently of the image segmentation.
Typically, point correspondences are used in the solution (e.g., stereo,
structure-from-motion, and local methods in object recognition). While
point measurements permit localization with high accuracy, they are
susceptible to noise, occlusion, and lighting changes, making them
difficult to extract and correspond robustly. However, by incorporating
information from image regions, one can approximate a segmentation and
reason about correspondences at a level higher than individual pixels. To
that end, I have designed a set of techniques that integrate information
from coherent image regions, creating a sparse image segmentation
composed of regions that can be extracted and matched with high
robustness. In a coarse-to-fine strategy, the region matches constrain
the search for point correspondences, improving matching robustness
and reducing computational complexity.
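To make the coarse-to-fine strategy concrete, here is a minimal Python
sketch of the idea. The region representation (axis-aligned bounding
boxes with generic descriptor vectors) and the greedy matching are
placeholders, not the actual machinery from my work; the sketch only
illustrates how region matches shrink the point-matching search space.

import numpy as np

def match_regions(regions_a, regions_b):
    # Coarse stage: greedily match regions by descriptor distance.
    # Each region is a (bounding_box, descriptor) pair, with the box
    # given as (x0, y0, x1, y1) and the descriptor as a 1-D array.
    matches, used = [], set()
    if not regions_b:
        return matches
    for i, (_, desc_a) in enumerate(regions_a):
        dists = [np.linalg.norm(desc_a - desc_b) if j not in used else np.inf
                 for j, (_, desc_b) in enumerate(regions_b)]
        j = int(np.argmin(dists))
        if np.isfinite(dists[j]):
            matches.append((i, j))
            used.add(j)
    return matches

def constrained_point_matches(points_a, points_b, regions_a, regions_b,
                              region_matches):
    # Fine stage: only points falling inside a matched region pair are
    # ever compared, reducing both ambiguity and computation.
    def inside(p, box):
        x0, y0, x1, y1 = box
        return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

    pairs = []
    for i, j in region_matches:
        box_a, box_b = regions_a[i][0], regions_b[j][0]
        cand_b = [q for q in points_b if inside(q, box_b)]
        for p in (p for p in points_a if inside(p, box_a)):
            if cand_b:
                q = min(cand_b, key=lambda q: np.hypot(p[0] - q[0],
                                                       p[1] - q[1]))
                pairs.append((p, q))
    return pairs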
In my dissertation, I solve problems in both of these areas, and I apply
the region-based methods in a system that allows the creation of
large-scale, dynamic, user-constructed mixed realities. Below, I list
the projects in which I have been involved.
My graduate work was partially funded by the National
Science Foundation and a fellowship from the Link Foundation.
|
Coherent Image Regions - Coupled Segmentation and Correspondence
We study methods that integrate information from coherent image regions
to represent the image. Our novel sparse image segmentation can be used
to establish robust region correspondences and thereby constrain the
search for point correspondences. The
philosophy behind this work is that coherent image regions provide a
concise and stable basis for image representation: concise meaning that
the required space for representing the image is small, and stable
meaning that the representation is robust to changes in both viewpoint
and photometric imaging conditions.
In addition, we have proposed a subspace labeling technique for global
image segmentation. Segmenting an image in a particular feature subspace
is a fairly well understood problem, but it is well known that operating
in only a single feature subspace, e.g., color or texture, seldom yields
a good segmentation for real images. Combining information from multiple
subspaces in an optimal manner, however, is a difficult problem to solve
algorithmically. We propose a solution that fuses contributions from
multiple feature subspaces using an energy minimization approach. For
each subspace, we compute a per-pixel quality measure and perform a
partitioning through the standard normalized cut algorithm. To fuse the
subspaces into a final segmentation, we compute a subspace label for
every pixel; the labeling is computed through the graph-cut energy
minimization framework proposed by Boykov et al. Finally, we combine
the per-subspace segmentations with the subspace labels obtained from
the energy minimization to yield the final segmentation.
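The Python sketch below illustrates the form of the labeling energy
being minimized. It is not our implementation: the graph-cut solver of
Boykov et al. is replaced by a sweep of iterated conditional modes (ICM)
purely for brevity, and the per-pixel quality arrays are assumed given.

import numpy as np

def fusion_energy(labels, quality, lam=1.0):
    # labels  : (H, W) int array; labels[y, x] is the chosen subspace.
    # quality : (K, H, W) array; quality[k] is subspace k's per-pixel
    #           quality (higher is better, so the data term negates it).
    # lam     : weight of the Potts smoothness term on 4-neighbours.
    H, W = labels.shape
    rows, cols = np.arange(H)[:, None], np.arange(W)[None, :]
    data = -quality[labels, rows, cols].sum()
    smooth = (labels[1:, :] != labels[:-1, :]).sum() \
           + (labels[:, 1:] != labels[:, :-1]).sum()
    return data + lam * smooth

def icm_sweep(labels, quality, lam=1.0):
    # One sweep of iterated conditional modes: per pixel, pick the label
    # minimizing the local (data + smoothness) energy. Each neighbour
    # with a different label contributes a Potts penalty of lam.
    H, W = labels.shape
    K = quality.shape[0]
    for y in range(H):
        for x in range(W):
            costs = -quality[:, y, x]
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W:
                    costs = costs + lam * (np.arange(K) != labels[ny, nx])
            labels[y, x] = int(costs.argmin())
    return labels

A labeling can be initialized per pixel with labels = quality.argmax(axis=0)
and swept until fusion_energy stops decreasing.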
|
|
Vision-Based Man-Machine Interaction
We have developed a methodology for vision-based interaction called
Visual Interaction Cues (VICs). The VICs paradigm operates on the
fundamental premise that, in general vision-based HCI settings, global
user modeling and tracking are not necessary. For example, when a person
presses the number keys while making a telephone call, the telephone
maintains no notion of the user; it only recognizes the action of
pressing a key. In contrast, typical methods for vision-based HCI
attempt to perform global user tracking to model the interaction. In the
telephone example, such methods would require a precise tracker for the
articulated motion of the hand. However, such techniques are
computationally expensive, prone to error and the re-initialization
problem, prohibit the inclusion of an arbitrary number of users, and
often require a complex gesture language that the user must learn. In
the VICs paradigm, we instead observe that analyzing the local region
around an interface component (the telephone key, for example) yields
sufficient information to recognize user actions.
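As a toy illustration of this premise, the Python sketch below watches
only the local image region belonging to one interface component and
fires when the appearance there changes. The background-differencing cue
and its threshold are simple stand-ins for the richer cue parsers
(color, motion, shape) used in the actual VICs framework.

import numpy as np

class VICComponent:
    # An interface component, in the spirit of the VICs paradigm, that
    # monitors only its own local image region for an interaction cue.

    def __init__(self, box, threshold=12.0):
        self.box = box              # (x0, y0, x1, y1) in image coordinates
        self.threshold = threshold  # illustrative value, not a tuned one
        self.background = None      # appearance of the unoccupied component

    def _crop(self, frame):
        x0, y0, x1, y1 = self.box
        return frame[y0:y1, x0:x1].astype(np.float64)

    def observe(self, frame):
        # Return True when a cue (e.g., a fingertip) enters the region.
        patch = self._crop(frame)
        if self.background is None:
            self.background = patch     # first frame: learn the background
            return False
        score = np.abs(patch - self.background).mean()
        return score > self.threshold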
-
Jason J. Corso, Guangqi Ye, and Gregory D. Hager.
Analysis of Multi-Modal Gestures with a Coherent Probabilistic
Graphical Model.
Virtual Reality, 2005.
(to appear).
-
Jason J. Corso.
Vision-Based Techniques for Dynamic, Collaborative Mixed-Realities.
In Brian J. Thompson, editor, Research Papers of the Link
Foundation Fellows, volume 4. University of Rochester Press.
Invited Report for Link Foundation Fellowship; to be released Fall
2004.
-
Guangqi Ye, Jason J. Corso, and Gregory D. Hager.
Gesture Recognition Using 3D Appearance and Motion Features.
In B. Kisacanin, V. Pavlovic, and T. Huang, editors,
Real-Time Vision for Human-Computer Interaction. 2005.
Extended version of the paper by the same title in Proceedings
of Workshop on Real-Time Vision for Human-Computer Interaction (at CVPR
2004); to be released 2005.
-
Guangqi Ye, Jason J. Corso, Darius Burschka, and Gregory D. Hager.
VICs: A Modular HCI Framework Using Spatio-Temporal Dynamics.
Machine Vision and Applications, 2004.
(to appear).
-
Guangqi Ye, Jason J. Corso, and Gregory D. Hager.
Gesture Recognition Using 3D Appearance and Motion Features.
In Proceedings of Workshop on Real-Time Vision for
Human-Computer Interaction (at CVPR 2004), 2004.
-
Guangqi Ye, Jason J. Corso, Gregory D. Hager, and Allison M. Okamura.
VisHap: Augmented Reality Combining Haptics and Vision.
In Proceedings of IEEE International Conference on Systems, Man
and Cybernetics, 2003.
-
Jason J. Corso, Darius Burschka, and Gregory D. Hager.
The 4DT: Unencumbered HCI With VICs.
In Proceedings of CVPRHCI, 2003.
-
Guangqi Ye, Jason J. Corso, Darius Burschka, and Gregory D. Hager.
VICs: A Modular Vision-Based HCI Framework.
In Proceedings of 3rd International Conference on Computer
Vision Systems, pages 257-267, 2003.
|
|
Direct Methods for Surface Tracking
We have developed a set of algorithms to directly track planar and
parametric surfaces with a calibrated stereo rig.
A movie demonstrating the planar surface tracking is available here. A
binary pixel mask is maintained that determines the pixels belonging to
the plane (and having good texture); it is shown in red in the lower
left of the video. The green vector rendered in the video is the plane's
normal. At left is an image of a system built with our plane-tracking
routines to localize mobile robots. In the image, we show the real
scene, the two walls being tracked (one in blue and one in red), and an
overhead (orthogonal) projection of the reconstructed walls.
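For readers unfamiliar with the underlying geometry, the sketch below
shows the standard plane-induced homography between a calibrated stereo
pair and a naive residual-based update of the pixel mask. It is a
textbook illustration under the plane convention n . X = d in the left
camera frame, not our direct tracker, which estimates the plane
parameters from the image intensities themselves; the threshold tau is
an illustrative value.

import numpy as np

def plane_induced_homography(K_left, K_right, R, t, n, d):
    # For points on the plane n . X = d (left camera frame) and a rig
    # with X_right = R X_left + t, the views are related by the
    # homography H = K_right (R + t n^T / d) K_left^{-1}.
    return K_right @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_left)

def update_plane_mask(img_left, img_right, H, tau=10.0):
    # Keep a left-image pixel on the plane when its intensity agrees
    # with the homography-warped right view (nearest-neighbour lookup).
    h, w = img_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    warped = H @ pts
    u = np.rint(warped[0] / warped[2]).astype(int).reshape(h, w)
    v = np.rint(warped[1] / warped[2]).astype(int).reshape(h, w)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    mask = np.zeros((h, w), dtype=bool)
    diff = np.abs(img_left[valid].astype(float)
                  - img_right[v[valid], u[valid]].astype(float))
    mask[valid] = diff < tau
    return mask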
-
William W. Lau, Nicholas A. Ramey, Jason J. Corso, Nitish Thakor, and
Gregory D. Hager.
Stereo-Based Endoscopic Tracking of Cardiac Surface Deformation.
In Proceedings of Seventh International Conference on Medical
Image Computing and Computer-Assisted Intervention (MICCAI), 2004.
-
Nicholas A. Ramey, Jason J. Corso, William W. Lau, Darius Burschka,
and Gregory D. Hager.
Real Time 3D Surface Tracking and Its Applications.
In Proceedings of Workshop on Real-time 3D Sensors and Their
Use (at CVPR 2004), 2004.
-
Jason J. Corso, Nicholas Ramey, and Gregory D. Hager.
Stereo-Based Direct Surface Tracking with Deformable Parametric
Models.
Technical report, The Johns Hopkins University, 2003.
CIRL Lab Technical Report 2003-02.
-
Jason J. Corso, Darius Burschka, and Gregory D. Hager.
Direct Plane Tracking in Stereo Images for Mobile Navigation.
In Proceedings of International Conference on Robotics and
Automation, 2003.
-
Jason J. Corso and Gregory D. Hager.
Planar Surface Tracking Using Direct Stereo.
Technical report, The Johns Hopkins University, 2002.
CIRL Lab Technical Report.
|
|
Interactive Haptic Rendering of Deformable Surfaces
We have developed a new method for interactive deformation and haptic
rendering of viscoelastic surfaces. Haptic and graphic rendering place
competing demands on the object representation: an implicit
representation is best for haptic interaction, while an explicit
representation is best for graphic rendering. In our approach, we fuse
implicit and explicit object representations, permitting both fast
haptic interaction and fast graphic rendering. Objects are defined by a
discretized Medial Axis Transform (MAT), which consists of an ordered
set of circles (in 2D) or spheres (in 3D) whose centers are connected by
a skeleton. Our implementation, called DeforMAT, is appealing because
it takes advantage of single-point haptic interaction to render
efficiently while maintaining a very low memory footprint.
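A minimal sketch of the implicit side of such a representation is given
below: the object is a set of spheres along a skeleton, the signed
distance to their union serves the haptic query, and a simple penalty
force pushes the probe out of contact. The class name and stiffness
constant are illustrative, not taken from DeforMAT; an explicit surface
mesh for graphics would be generated separately.

import numpy as np

class MATObject:
    # An object defined by a discretized MAT: sphere centers along a
    # skeleton, one radius per sphere.

    def __init__(self, centers, radii):
        self.centers = np.asarray(centers, dtype=float)  # (N, 3)
        self.radii = np.asarray(radii, dtype=float)      # (N,)

    def signed_distance(self, p):
        # Negative inside the union of spheres, positive outside.
        d = np.linalg.norm(self.centers - p, axis=1) - self.radii
        return d.min()

    def haptic_force(self, p, k=200.0):
        # Single-point penalty force: push the probe out along the
        # gradient of the nearest sphere, proportional to penetration.
        d = np.linalg.norm(self.centers - p, axis=1) - self.radii
        i = int(d.argmin())
        if d[i] >= 0.0:
            return np.zeros(3)            # no contact
        normal = p - self.centers[i]
        normal /= np.linalg.norm(normal) + 1e-12
        return -k * d[i] * normal         # d[i] < 0, so force points outward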
|
|
Real-Time Volume Visualization
We developed a method for the voxelization of large scalar fields with
the goal of interactive volume rendering. An adaptive octree is used to
optimally sample the underlying unstructured grid. The unstructured
grid is embedded into a voxel space, and those regions not corresponding
to input data are flagged as being outside of the embedded model. The
octree nodes share borders, enabling smooth data continuity between them.
Gradients are computed and stored with the textures for lighting
computation. We integrated this voxelization as a preprocessing stage
for an interactive volume-rendering system that we developed. The
approach leverages current 3D texture-mapping PC hardware for the
problem of unstructured-grid rendering. We specialize the 3D texture
octree to the task of rendering unstructured grids through a novel
pad-and-stencil algorithm, which distinguishes between data and non-data
voxels. Both the voxelization and rendering processes efficiently manage
large, out-of-core datasets. The system manages cache usage in main
memory and texture memory, as well as bandwidths among disk, main
memory, and texture memory. It also manages the rendering load,
maximizing a quality metric for a desired level of interactivity. The
system has been applied to a number of large datasets and produces
high-quality images at interactive, user-selectable frame rates on
standard PC hardware.
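The sketch below conveys the flavor of the voxelization: each leaf
stores a scalar brick plus a boolean stencil separating data from
non-data voxels, and neighbouring bricks share their border samples.
The fixed subdivision depth keeps the sketch short (the real system
adapts the octree to the sampling error), and sample_fn / inside_fn are
hypothetical stand-ins for interpolation in, and point location within,
the unstructured grid.

from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class OctreeNode:
    # 'brick' holds the resampled scalar data; the linspace sampling in
    # voxelize() includes both faces of the node, so neighbouring bricks
    # share border samples. 'stencil' flags data vs. non-data voxels.
    origin: np.ndarray
    size: float
    brick: Optional[np.ndarray] = None
    stencil: Optional[np.ndarray] = None
    children: List["OctreeNode"] = field(default_factory=list)

def voxelize(sample_fn, inside_fn, origin, size, depth, brick_res=16):
    # Recursively subdivide to a fixed depth, then sample a cubic brick
    # of scalar values and inside/outside flags at each leaf.
    node = OctreeNode(np.asarray(origin, dtype=float), float(size))
    if depth > 0:
        half = size / 2.0
        for dz in (0, 1):
            for dy in (0, 1):
                for dx in (0, 1):
                    child_origin = node.origin + half * np.array([dx, dy, dz])
                    node.children.append(voxelize(sample_fn, inside_fn,
                                                  child_origin, half,
                                                  depth - 1, brick_res))
        return node
    axis = np.linspace(0.0, size, brick_res)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing='ij'), axis=-1)
    pts = grid.reshape(-1, 3) + node.origin
    shape = (brick_res,) * 3
    node.brick = np.array([sample_fn(p) for p in pts]).reshape(shape)
    node.stencil = np.array([inside_fn(p) for p in pts]).reshape(shape)
    return node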
|
|