Vision-Based Human Computer Interaction


VICs - Visual Interaction Cues for Local Based HCI

 

People - Dr. Gregory Hager, Dr. Darius Burschka, Dr. Jason Corso, Dr. Grant Ye

Sponsored by the National Science Foundation.

Description - The Visual Interaction Cues (VICs) project develops new techniques for vision-based human computer interaction (HCI). The VICs paradigm is a methodology for vision-based interaction that rests on one fundamental premise: in general vision-based HCI settings, global user modeling and tracking are not necessary. For example, when a person presses the number keys while making a telephone call, the telephone maintains no notion of the user. It only recognizes the action of pressing a key. In contrast, typical methods for vision-based HCI attempt to perform global user tracking to model the interaction. In the telephone example, such methods would require a precise tracker for the articulated motion of the hand. Such techniques are computationally expensive, are prone to error and to the re-initialization problem, prohibit the inclusion of an arbitrary number of users, and often require a complex gesture language that the user must learn. In the VICs paradigm, we make the observation that analyzing the local region around an interface component (the telephone key, for example) yields sufficient information to recognize user actions.
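The local-region idea can be illustrated with a minimal sketch. This is not the VICs implementation: it assumes grayscale frames stored as nested lists, uses plain frame differencing as a stand-in for the project's actual image cues, and the names `region_activity`, `pressed_keys`, and the `(top, left, height, width)` region layout are hypothetical. The point is only that each interface component inspects its own image patch and nothing in the code tracks or models the user.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]             # grayscale image: rows of 0-255 intensities
Region = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical layout


def region_activity(prev: Frame, curr: Frame, region: Region,
                    pixel_thresh: int = 30) -> float:
    """Return the fraction of pixels inside `region` whose intensity
    changed by more than `pixel_thresh` between the two frames."""
    top, left, height, width = region
    changed = 0
    for r in range(top, top + height):
        for c in range(left, left + width):
            if abs(curr[r][c] - prev[r][c]) > pixel_thresh:
                changed += 1
    return changed / (height * width)


def pressed_keys(prev: Frame, curr: Frame, keys: Dict[str, Region],
                 activity_thresh: float = 0.5) -> List[str]:
    """Return the names of interface components whose local region shows
    enough change. Only pixels inside each region are ever examined."""
    return [name for name, region in keys.items()
            if region_activity(prev, curr, region) >= activity_thresh]


# Two 4x8 frames: something bright enters the left half of the image.
prev_frame = [[0] * 8 for _ in range(4)]
curr_frame = [[200] * 4 + [0] * 4 for _ in range(4)]
keys = {"1": (0, 0, 4, 4), "2": (0, 4, 4, 4)}
print(pressed_keys(prev_frame, curr_frame, keys))  # prints ['1']
```

Because each component examines only its own patch, the per-frame cost grows with the number and size of components rather than with the complexity of a hand or body model, and several users can act on different components at once.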

The principled techniques of the VICs paradigm are applicable in general HCI settings as well as advanced simulation and virtual reality. We are actively investigating 2D, 2.5D, and 3D environments; we've developed a new HCI platform called the 4D Touchpad (figure below) where vision-based methods can complement the conventional mouse and keyboard.

In the VICs project, we study both low-level image analysis techniques and high-level gesture language modeling. In low-level image analysis, we use deterministic cues (color, shape, motion, etc.), machine learning (e.g. neural networks), and dynamic modeling (e.g. Hidden Markov Models) to model the spatio-temporal characteristics of various hand gestures. We have constructed a high-level language model that integrates a set of low-level gestures into a single, coherent probabilistic framework. In the language model, every low-level gesture is called a Gesture Word, and each complete action is a sequence of these words called a Gesture Sentence.
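As a rough illustration of how Gesture Words might combine into Gesture Sentences under a probabilistic model, the sketch below scores word sequences with a smoothed bigram model. The bigram form is an assumption made for illustration, since this page does not specify the structure of the VICs language model. The gesture word names (`approach`, `press`, `release`) and the functions `train_bigram` and `sentence_log_prob` are hypothetical.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # sentence boundary markers


def train_bigram(sentences, vocab, alpha=1.0):
    """Estimate P(next word | current word) from example Gesture Sentences,
    with add-alpha smoothing so unseen transitions keep nonzero probability."""
    counts = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        tokens = [START] + list(sentence) + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    successors = list(vocab) + [END]
    model = {}
    for a in [START] + list(vocab):
        total = sum(counts[a].values()) + alpha * len(successors)
        model[a] = {b: (counts[a][b] + alpha) / total for b in successors}
    return model


def sentence_log_prob(model, sentence):
    """Log-probability of a complete Gesture Sentence under the bigram model."""
    tokens = [START] + list(sentence) + [END]
    return sum(math.log(model[a][b]) for a, b in zip(tokens, tokens[1:]))


# Hypothetical gesture words and training sentences.
vocab = ["approach", "press", "release"]
training = [["approach", "press", "release"]] * 3 + [["approach", "release"]]
model = train_bigram(training, vocab)

likely = sentence_log_prob(model, ["approach", "press", "release"])
unlikely = sentence_log_prob(model, ["release", "press", "approach"])
print(likely > unlikely)  # prints True
```

In a full system, the per-word scores would come from the low-level recognizers (for example, an HMM likelihood for each Gesture Word), and the sentence model would weigh them against transition probabilities like these.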

Examples of the algorithms being studied follow (videos are below):


Adaptive Background Modeling


Intelligent Buttons


The 4D Touchpad - A new platform for the development of next-generation interfaces.
It facilitates unencumbered interaction with an architecture that provides core functionality for general vision-based interfaces.


The new 4DT system. Interaction components are rendered over a flat monitor.


System calibration and hand segmentation of the new 4DT system.

PUBLICATIONS

Demo Movies -

More Information - At Jason Corso's Project Page

 
         
               


 
© Copyright 2005 - CIRL
Webmaster: Henry Lin