A non-intrusive real-time program detects the face and face features
of a moving workstation user at a rate of between 10 and 30
Hertz. Based on the face pose, it determines where on the display the
subject is looking. Button selection can be done by opening the
mouth. The long-term goal is to provide a system for controlling a
computer using head movements and gaze direction. A skin color model
is used along with geometric knowledge about the face and weak
assumptions about the lighting. Good results are reported with various
subjects and conditions, including facial hair, 3D motion, and use of
eyeglasses.
Neuro-psychophysical evidence from studies in human vision shows that the
brain does not keep a complete, detailed image of the world, but rather samples
the world as needed, by directing the eyes to various points of interest in the
environment, via short, very fast eye movements called saccades. In the context
of active vision, a selective visual attention mechanism that chooses the next
fixation point in some optimal way therefore appears critical. Such
a mechanism is termed "gaze control" and introduces a number of very difficult
and interesting research problems, such as the selection of the next fixation
point and integration of information across saccades. We describe one possible
implementation of a gaze control mechanism based on recent advances in human
and computer vision. The system uses a SONY EVI-D30 pan-tilt-zoom camera,
controlled through the serial port by a Pentium II 400 MHz PC. In the context
of a visual search task, we show how the back-projection histogram and a
symmetry operator can be used to select interesting regions in an image.
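As a rough illustration of how a back-projection histogram can mark interesting regions, here is a minimal sketch. It assumes the ratio-histogram form of back-projection (in the style of Swain and Ballard) on a single-channel 8-bit image; the abstract does not say which variant or color space the system uses, and the names `backproject`, `next_fixation`, and `bins` are hypothetical.

```python
import numpy as np

def backproject(image, model_hist, bins=8):
    """Ratio-histogram back-projection (assumed variant, not from the talk).

    image      : 2D uint8 array, values in [0, 255]
    model_hist : per-bin pixel counts of the target being searched for
    Returns a saliency map with one value in [0, 1] per pixel.
    """
    # Quantize every pixel into one of `bins` equally wide intensity bins.
    idx = (image.astype(np.int64) * bins) // 256
    # Histogram of the current image over the same bins.
    image_hist = np.bincount(idx.ravel(), minlength=bins).astype(float)
    # Ratio histogram: how typical each bin is of the target, capped at 1.
    model = np.asarray(model_hist, dtype=float)
    ratio = np.minimum(model / np.maximum(image_hist, 1.0), 1.0)
    # Replace each pixel by the ratio value of its bin.
    return ratio[idx]

def next_fixation(saliency):
    """Pick the most salient pixel as a candidate fixation point."""
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    return int(row), int(col)
```

In practice the saliency map would usually be smoothed before taking the maximum, so that a compact region rather than a single pixel attracts the next saccade.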
Semantic knowledge is acquired in the form of a policy produced by a Q-learning
program, in which states are defined as clusters of image histograms, and
actions are saccades in one of 8 central angles. Experimental results show an
increased number of saccades at the beginning of learning, which gradually
decreases over a number of training epochs, demonstrating the potential of
reinforcement learning techniques in unifying perception and action.
The ubiquitous heterogeneity of a structured soil results in
isolated pools of solutions, "hot spots" of soil organic matter
and microbial populations. Although these pools are interconnected
via the internal pore networks within soil aggregates, little is
known of the tortuosity or continuity of the pores interconnecting
these pools. APS synchrotron images of individual soil aggregates
provide excellent resolution of these interconnected pore
networks. The reconstructed volume of a
soil aggregate contains a wealth of information. To extract
this information, the internal aggregate structure must be
evaluated. The polyhedral histogramming technique is a tool for
identifying and segmenting structures within a volume. It can be
used either interactively, so that a user can guide the
analysis, or in an automatic mode that segments
the volume. After the structures have been identified and
segmented, their connectivity, shape, and volume
can be determined. Once the interconnectedness of the internal
porosities of soil aggregates is quantified, additional mechanisms
and mathematical descriptions can be assembled to further refine
predictive models associated with environmental sequestration and
detoxification of soil contaminants and to better predict the
availability of ions to roots in sustainable agricultural
production systems.
The increasing availability of personal photo collections makes
archiving and retrieving them more difficult. The demand for finding
conceptual objects (e.g., faces) in large photo collections
leads to the design of a system that can easily and efficiently manage
the collections with content-based (face) descriptors. In this talk, I
will address the problem of consumer photo management based on
detecting and identifying people. The talk mainly focuses on how to
extract and construct face descriptors using skin-tone color and facial
features. These face descriptors can be passed onto an
identification/matching engine. I will also mention a semi-automatic
system/interface that detects faces and facial features, allows people
to edit detected faces, and represents photos in a database by face
descriptors. Experimental results on several photo collections will be
presented.
A robot that navigates in an indoor office building needs to be able to
get successfully from any location of the building to any other
location. A successful approach to implementing robot navigation
uses the Partially Observable Markov Decision Process (POMDP)
framework, a robust framework that takes sensor and actuator
uncertainty into consideration. Unfortunately, POMDPs scale poorly
with large environments; we therefore propose a hierarchical
framework based on Hierarchical Hidden Markov Models (HHMMs). Our main
goal is to explore hierarchical modeling as a basis for designing more
efficient methods for model construction and usage. As a case study,
we focus on indoor robot navigation and show how this framework can be
used to learn a hierarchy of models of the environment at different
levels of spatial abstraction. We introduce the idea of model
reuse that can be used to combine already learned models into a
larger model. We describe an extension of the HHMM that includes
actions, which we call hierarchical POMDPs, and describe a modified
hierarchical Baum-Welch algorithm to learn these models. We train
different families of hierarchical models for a simulated and a real
world corridor environment and compare them with the standard ``flat''
representation of the same environment. We show that the hierarchical
POMDP approach, combined with model reuse, allows learning
hierarchical models that fit the data better and train faster than
flat models.
This presentation covers issues in the design of human-like robots.
The robot is called 'DEV', a name derived from "Mental Development".
The robot is of human size, with eyes, mouth, head, hands, arms, torso, etc.
At a later stage of the project, the robot will be able to develop mental skills
autonomously and learn.
A simple biometric system has a sensor module, a feature extraction module
and a matching module. The performance of a biometric system is largely
affected by the reliability of the sensor used and the degrees of freedom
offered by the features extracted. Also, if the biometric trait being
sensed or measured is noisy (a fingerprint with a scar or a voice altered
by a cold, for example), the resultant confidence score (or matching
score) computed by the matching module may not be reliable. Simply put,
the matching score generated by a noisy input has a large confidence
interval. This problem can be addressed by installing multiple sensors
that capture different biometrics. Such systems, known as multimodal
biometric systems, are expected to be more reliable due to the presence of
multiple pieces of evidence. However, an intelligent scheme is required to
fuse the decisions produced by the individual sensors.
In this work we attempt to deal with the problem of decision fusion by
first building a multimodal biometric system and then devising various
schemes to integrate these various modalities. The proposed system uses
the fingerprint, face, hand geometry and voice features of an individual
for verification purposes.
Computational markets comprise a subclass of multiagent systems where
the primary mode of interaction is via market price systems. Markets
can provide effective decentralized allocation of resources in a
variety of situations, and economic analysis provides a powerful design tool
for interaction protocols. For the past several years, my research
group has been exploring the space of market mechanisms and their
applications to various distributed environments. In this talk, I
present an overview of our approach, and highlight results on auction
protocols for decentralized scheduling. Elaborations of this problem
represent realistic commerce scenarios and present challenges for
designers of bidding strategy, as illustrated by observations on the
ICMAS-00 Trading Agent Competition.
They propose that salient sensory events trigger neuronal value systems capable
of modulating synaptic plasticity. They investigate the capacity of value
systems to modulate their own responses in the context of various conditioning
tasks and suggest that plasticity in sensory afferents to value systems may
provide a neurobiological basis for mediating the changing effects of saliency
on adaptive behavioral responses.
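One simple way to picture a value system modulating synaptic plasticity is a Hebbian weight update gated by a scalar value signal. The sketch below is only an assumed, generic three-factor rule (weight change = learning rate x value x post x pre); it is not taken from the work discussed here, and the names `value_modulated_update`, `value`, and `lr` are hypothetical.

```python
import numpy as np

def value_modulated_update(w, pre, post, value, lr=0.1):
    """Hebbian update gated by a value signal (assumed generic rule).

    w     : weight matrix, shape (n_post, n_pre)
    pre   : presynaptic activity, shape (n_pre,)
    post  : postsynaptic activity, shape (n_post,)
    value : scalar output of the value system; 0 means no salient event
    Returns a new weight matrix; the input is not modified.
    """
    # Correlated pre/post activity only changes the weights
    # when the value system is active (value != 0).
    return w + lr * value * np.outer(post, pre)
```

With `value=0.0` the weights stay unchanged regardless of activity, which is the sense in which saliency gates learning in this toy rule.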
Their work can be applied to vision-based navigation of SAIL3. I will introduce
their work on value systems and discuss how to use it in our robot.
Human discourse is an active process of converting thoughts
into speech, gesture, and gaze activity. Grounded in the psycholinguistic
foundations of the production of such multimodal 'conversational-acts' (as
opposed to the mono-dimensional speech-act), we address the interpretation
of gesture, speech, and gaze in the context of discourse management. We
investigate the cues afforded by each mode of interaction and the algorithms
necessary to detect and extract them; study the spatial and temporal
relationships among these cues and associate them with topical units in
discourse; study the interactions of gesture, speech and gaze in discourse
segmentation; and develop a multimedia database system that integrates these
elements into a coherent whole. Our approach involves experiments designed
to discover and quantify cues in the various modalities and their relation
to discourse management; the development of computational
algorithms to detect and recognize such cues; and the integration of these
cues into a cogent discourse management system.
We present psycholinguistic phenomena that are detected by our analysis.
Understanding how such phenomena can be detected from video and audio
signals, and determining the kinds of computable cues that support
such analysis, are the first steps toward bridging the signal-sense gap
in multi-modal interaction. Among these are cues for semantic segmentation
and organization, cross-modal temporal integration, and the significance of
'hold tension release'.
We have assembled a strong interdisciplinary team comprising
psycholinguistics, machine vision and signal processing researchers to
address the holistic nature of discourse and language itself. This permits
us to base our research squarely on the realities of human communication in
spontaneous discourse across a wide range of pragmatic conditions.
Technology is being developed that has significant impact on natural
language discourse analysis, human-computer interaction systems,
neuropathological studies (Parkinson's Disease and Left/Right Hemisphere
Damage) and discourse and video databases. Another significant outcome of
this research is to introduce computational and quantitative rigor to the
psycholinguistic study of discourse production. This represents a model of
collaborative research between the fields of engineering and cognitive
science.
Vera Bakic: A Human-Computer Interaction Interface based on face feature tracking in 2D
Silviu Minut: Selective Visual Attention: A Model for Gaze Control
Paul Albee: Quantifying the Volume and Internal Porosity of Synchrotron Imaged Soil Aggregates by a Polyhedral Histogramming Technique
Rein-Line (Vincent) Hsu: Managing Personal Photo Collections Based on Human Faces
Georgios N Theocharous: Learning Hierarchical Partially Observable Markov Decision Process Models for Robot Navigation
Shuqing Zeng: Humanoid Robot Design
Arun Ross: Multimodal Biometric Systems
Professor Michael Wellman: Computational Markets, Decentralized Scheduling, and Trading Agents
Xiao Huang: Value Systems and Its Role in Adaptive Behavior
Professor Brian Lovell: Computer Vision and Medical Image Processing Projects at the University of Queensland
A Methodology for Quality Control in Cell Nucleus Segmentation:
We present a very robust algorithm for segmenting the nucleus from the
cytoplasm of pap smear cells as one step along the difficult road to producing an
automated pap smear screening system to detect early signs of cancer of the cervix.
The particular focus of this talk is the means by which we improve algorithm performance
from 99.6% to 100% accuracy by automatic data validation.
Mime: A Gesture-Driven Computer Interface:
The aim of this project was to produce a simple, lightweight computer vision interface that replaces the
computer mouse with hand gestures alone. The focus is on very
efficient computation for a software-only solution. MIME is intended to
harness the intuitiveness and flexibility of gesture as a natural computer
interface.
Professor Francis Quek: Multimodal Discourse: Gesture, Speech and Gaze