We have developed a methodology for vision-based interaction called
Visual Interaction Cues (VICs). The VICs paradigm operates on the
fundamental premise that, in general vision-based HCI settings, global
user modeling and tracking are not necessary. In contrast, typical
methods for vision-based HCI attempt to model the interaction by
tracking the user globally. Such techniques are computationally
expensive, prone to error and to the re-initialization problem,
difficult to extend to an arbitrary number of users, and often require
a complex gesture language that the user must learn. The VICs paradigm
instead rests on the observation that analyzing the local image region
around an interface component yields sufficient information to
recognize user actions.
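To make this premise concrete, the sketch below checks simple color and
motion cues only in the patch surrounding a single interface component,
never tracking the user as a whole. The patch size, thresholds, skin
heuristic, and function names are illustrative assumptions, not details
of the deployed VICs system.

    # Minimal sketch of the VICs premise: inspect only the local image
    # patch around an interface component instead of tracking the user.
    # Component geometry, thresholds, and names are assumptions.
    import numpy as np

    def local_patch(frame, cx, cy, radius):
        """Crop the region of the frame surrounding one component."""
        h, w = frame.shape[:2]
        x0, x1 = max(cx - radius, 0), min(cx + radius, w)
        y0, y1 = max(cy - radius, 0), min(cy + radius, h)
        return frame[y0:y1, x0:x1]

    def cue_activated(prev_frame, frame, cx, cy, radius=24,
                      motion_thresh=12.0, skin_frac_thresh=0.25):
        """Return True when color and motion cues fire in the patch."""
        patch = local_patch(frame, cx, cy, radius).astype(np.float32)
        prev = local_patch(prev_frame, cx, cy, radius).astype(np.float32)

        # Motion cue: mean absolute frame difference inside the patch.
        motion = np.abs(patch - prev).mean()

        # Color cue: crude skin heuristic (red-dominant pixels) as a
        # stand-in for the color/shape cues described in the text.
        r, g, b = patch[..., 0], patch[..., 1], patch[..., 2]
        skin_frac = ((r > 95) & (r > g + 15) & (r > b + 15)).mean()

        return motion > motion_thresh and skin_frac > skin_frac_thresh

Because each component inspects only its own neighborhood, adding more
components (or more users) does not require any global model of the
scene.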
In the VICs project, we study both low-level image analysis
techniques and high-level gesture language modeling for HCI. In
low-level image analysis, we use deterministic cues (color, shape,
motion, etc.), machine-learning methods (e.g., neural networks), and
dynamic models (e.g., hidden Markov models) to capture the
spatio-temporal characteristics
of various hand gestures. We have also constructed a high-level language
model that integrates a set of low-level gestures into a single,
coherent probabilistic framework. In the language model, every low-level
gesture is called a Gesture Word, and each complete action is a sequence
of these words called a Gesture Sentence.
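As a rough illustration of how low-level Gesture Words might feed the
higher-level probabilistic model, the sketch below scores an
observation sequence with an HMM-style forward pass. The word
vocabulary, transition matrix, and function names are hypothetical and
stand in for, rather than reproduce, the published formulation.

    # Sketch: score a Gesture Sentence as a sequence of Gesture Words.
    # Per-frame word likelihoods would come from the low-level detectors
    # (color/shape cues, neural networks, HMMs); here they are an input.
    # The vocabulary and transition probabilities are hypothetical.
    import numpy as np

    WORDS = ["idle", "approach", "press", "release"]

    def sentence_log_prob(frame_likelihoods, transitions, initial):
        """HMM-style forward pass over Gesture Words.

        frame_likelihoods: (T, W) array, P(observation_t | word_w)
        transitions:       (W, W) array, P(word_j | word_i)
        initial:           (W,)   array, P(word at t = 0)
        Returns log P(observations) under this sentence model.
        """
        alpha = initial * frame_likelihoods[0]
        log_prob = 0.0
        for t in range(1, len(frame_likelihoods)):
            alpha = (alpha @ transitions) * frame_likelihoods[t]
            scale = alpha.sum()          # rescale to avoid underflow
            log_prob += np.log(scale)
            alpha /= scale
        return log_prob + np.log(alpha.sum())

A complete action could then be recognized by evaluating this score
under each candidate Gesture Sentence model and selecting the most
probable one.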
The principal techniques of the VICs paradigm are applicable in general
HCI settings as well as in advanced simulation and virtual reality.
We are actively investigating 2D, 2.5D, and 3D environments, and we
have developed a new HCI platform called the 4D Touchpad (figure at
left) in which vision-based methods complement the conventional mouse
and keyboard. We have implemented a real-time system in which users
control a typical 2D interface with intuitive gestures.