APE Weekly Seminar Abstracts - Fall 2000

Vera Bakic: A Human-Computer Interaction Interface based on face feature tracking in 2D

A non-intrusive real-time program detects the face and facial features of a moving workstation user at a rate of between 10 and 30 Hertz. Based on the face pose, it determines where on the display the subject is looking. Button selection can be done by opening the mouth. The long-term goal is to provide a system for controlling a computer using head movements and gaze direction. A skin color model is used along with geometric knowledge about the face and weak assumptions about the lighting. Good results are reported with various subjects and conditions, including facial hair, 3D motion, and use of eyeglasses.

Silviu Minut: Selective Visual Attention: A Model for Gaze Control

Neuro-psychophysical evidence from studies in human vision shows that the brain does not keep a complete, detailed image of the world, but rather samples the world as needed by directing the eyes to various points of interest in the environment via short, very fast eye movements called saccades. In the context of active vision, a selective visual attention mechanism that chooses the next fixation point in some optimal way therefore appears critical. Such a mechanism is termed "gaze control" and introduces a number of very difficult and interesting research problems, such as the selection of the next fixation point and the integration of information across saccades. We describe one possible implementation of a gaze control mechanism based on recent advances in human and computer vision. The system uses a SONY EVI-D30 pan-tilt-zoom camera, controlled through the serial port by a Pentium II 400 MHz PC. In the context of a visual search task, we show how the back-projection histogram and a symmetry operator can be used to select interesting regions in an image. Semantic knowledge is acquired in the form of a policy produced by a Q-learning program, in which states are defined as clusters of image histograms, and actions are saccades in one of 8 central angles. Experimental results show an increased number of saccades at the beginning of learning, which gradually decreases over a number of training epochs, demonstrating the potential of reinforcement learning techniques in unifying perception and action.
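
The Q-learning component described above can be illustrated with a minimal tabular sketch. It assumes that states are integer indices of histogram clusters and that actions are the 8 saccade directions; the learning rate, discount factor, reward values, and all function names are illustrative assumptions, not details taken from the talk.

```python
import random

N_ACTIONS = 8  # one action per saccade direction (8 central angles, per the abstract)


def make_q_table(n_states):
    """Create Q[state][action], initialised to zero."""
    return [[0.0] * N_ACTIONS for _ in range(n_states)]


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    row = q[state]
    return row.index(max(row))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning update.

    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    alpha and gamma are assumed values chosen for illustration.
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

In a visual search setting of this kind, the reward would typically be positive when a saccade lands on the target and zero or negative otherwise, so that fewer saccades are needed as learning progresses.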

Paul Albee: Quantifying the Volume and Internal Porosity of Synchrotron Imaged Soil Aggregates by a Polyhedral Histogramming Technique

The ubiquitous heterogeneity of a structured soil results in isolated pools of solutions, "hot spots" of soil organic matter and microbial populations. Although these pools are interconnected via the internal pore networks within soil aggregates, little is known of the tortuosity or continuity of the pores interconnecting these pools. APS Synchrotron images of individual soil aggregates provide excellent resolution of these interconnected pore networks. The reconstructed volume of a soil aggregate contains a wealth of information. Extracting this information requires evaluating the internal aggregate structure. The polyhedral histogramming technique is a tool for identifying and segmenting structures within a volume. It can be used either interactively, so that a user can guide the analysis, or in an automatic mode for performing segmentation of the volume. After the structures have been identified and segmented, their connectivity, shape, and volume can be determined. Once the interconnectedness of the internal porosities of soil aggregates is quantified, additional mechanisms and mathematical descriptions can be assembled to further refine predictive models associated with environmental sequestration and detoxification of soil contaminants, and to better predict availabilities of ions to the roots of sustainable agricultural production systems.
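
Once a volume has been segmented, the connectivity and volume measurements mentioned above come down to labeling connected pore regions. The sketch below is not the polyhedral histogramming technique itself; it assumes an already segmented binary voxel grid (1 = pore, 0 = solid), 6-connectivity between voxels, and hypothetical function names.

```python
from collections import deque

# Face-adjacent neighbours of a voxel (6-connectivity, an assumed choice).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def pore_components(volume):
    """Return the voxel count of each connected pore region.

    `volume` is a nested list volume[z][y][x] holding 1 for pore, 0 for solid.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    sizes = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] != 1 or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill from an unvisited pore voxel.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                count = 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    count += 1
                    for dz, dy, dx in NEIGHBOURS:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        inside = 0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                        if inside and volume[pz][py][px] == 1 and (pz, py, px) not in seen:
                            seen.add((pz, py, px))
                            queue.append((pz, py, px))
                sizes.append(count)
    return sizes


def porosity(volume):
    """Fraction of all voxels that are pore space."""
    total = sum(len(row) for plane in volume for row in plane)
    pores = sum(v for plane in volume for row in plane for v in row)
    return pores / total
```

The number of regions returned indicates how fragmented the pore space is, and multiplying each count by the voxel volume gives a physical volume per region.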

Rein-Line (Vincent) Hsu: Managing Personal Photo Collections Based on Human Faces

The increasing availability of personal photo collections makes archiving and retrieving them more difficult. The demand for finding conceptual objects (e.g., faces) in large photo collections leads to the design of a system that can easily and efficiently manage the collections with content-based (face) descriptors. In this talk, I will address the problem of consumer photo management based on detecting and identifying people. The talk mainly focuses on how to extract and construct face descriptors using skin-tone color and facial features. These face descriptors can be passed on to an identification/matching engine. I will also mention a semi-automatic system/interface that detects faces and facial features, allows people to edit detected faces, and represents photos in a database by face descriptors. Experimental results on several photo collections will be shown.

Georgios N Theocharous: Learning Hierarchical Partially Observable Markov Decision Process Models for Robot Navigation

A robot that navigates in an indoor office building needs to be able to get successfully from any location in the building to any other location. A successful approach to robot navigation uses the Partially Observable Markov Decision Process (POMDP) framework, a robust framework that takes sensor and actuator uncertainty into consideration. Unfortunately, POMDPs scale poorly with large environments, and we therefore propose a hierarchical framework based on Hierarchical Hidden Markov Models (HHMMs). Our main goal is to explore hierarchical modeling as a basis for designing more efficient methods for model construction and usage. As a case study we focus on indoor robot navigation and show how this framework can be used to learn a hierarchy of models of the environment at different levels of spatial abstraction. We introduce the idea of model reuse, which can be used to combine already learned models into a larger model. We describe an extension of the HHMM model to include actions, which we call hierarchical POMDPs, and describe a modified hierarchical Baum-Welch algorithm to learn these models. We train different families of hierarchical models for a simulated and a real-world corridor environment and compare them with the standard "flat" representation of the same environment. We show that the hierarchical POMDP approach, combined with model reuse, allows learning hierarchical models that fit the data better and train faster than flat models.
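
How a POMDP accounts for sensor and actuator uncertainty can be illustrated with the standard belief update over a flat (non-hierarchical) model. This is the textbook Bayes filter step, not the hierarchical Baum-Welch algorithm from the talk, and the three-cell corridor, transition probabilities, and observation probabilities below are made-up values for illustration only.

```python
def belief_update(belief, transition, observation_likelihood):
    """One Bayes filter step over discrete states.

    New belief b'(s2) is proportional to P(o | s2) * sum_s P(s2 | s, a) * b(s).

    belief: list with P(s) for each state
    transition: transition[s][s2] = P(s2 | s, a) for the action taken
    observation_likelihood: list with P(o | s2) for the observation received
    """
    n = len(belief)
    # Prediction: push the belief through the (noisy) action model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight each state by how well it explains the observation.
    unnormalised = [observation_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalised]


# Hypothetical 3-cell corridor: "move forward" succeeds with probability 0.8.
T_FORWARD = [
    [0.2, 0.8, 0.0],
    [0.0, 0.2, 0.8],
    [0.0, 0.0, 1.0],
]
# Hypothetical sensor model: probability of observing "door" in each cell.
P_DOOR = [0.1, 0.7, 0.1]

b0 = [1.0, 0.0, 0.0]                         # robot starts in cell 0
b1 = belief_update(b0, T_FORWARD, P_DOOR)    # move forward, then see a door
```

Each update touches every pair of states, which gives a sense of why flat models become expensive as the environment grows.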

Shuqing Zeng: Humanoid Robot Design

This presentation covers issues in human-like robot design. The robot is called 'DEV', a name derived from "Mental Development". It is human-sized, with eyes, mouth, head, hands, arms, torso, etc. In the later stage of the project, the robot can develop mental skills autonomously and learn.

Arun Ross: Multimodal Biometric Systems

A simple biometric system has a sensor module, a feature extraction module and a matching module. The performance of a biometric system is largely affected by the reliability of the sensor used and the degrees of freedom offered by the features extracted. Also, if the biometric trait being sensed or measured is noisy (a fingerprint with a scar or a voice altered by a cold, for example), the resultant confidence score (or matching score) computed by the matching module may not be reliable. Simply put, the matching score generated by a noisy input has a large confidence interval. This problem can be addressed by installing multiple sensors that capture different biometrics. Such systems, known as multimodal biometric systems, are expected to be more reliable due to the presence of multiple pieces of evidence. However, an intelligent scheme is required to fuse the decisions produced by the individual sensors.

In this work we attempt to deal with the problem of decision fusion by first building a multimodal biometric system and then devising various schemes to integrate these various modalities. The proposed system uses the fingerprint, face, hand geometry and voice features of an individual for verification purposes.
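
One simple way to combine evidence from several matchers is a weighted sum of normalised matching scores followed by a threshold. The abstract does not say which fusion schemes were devised, so the min-max normalisation, score ranges, weights, and threshold below are purely illustrative assumptions.

```python
def min_max_normalise(score, lo, hi):
    """Map a raw matching score from [lo, hi] onto [0, 1], clamping outliers."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse_scores(scores, ranges, weights):
    """Weighted-sum fusion of per-modality matching scores.

    scores:  {modality: raw score}
    ranges:  {modality: (lowest possible score, highest possible score)}
    weights: {modality: relative trust in that matcher}
    """
    total_weight = sum(weights[m] for m in scores)
    fused = sum(weights[m] * min_max_normalise(scores[m], *ranges[m])
                for m in scores)
    return fused / total_weight


def verify(scores, ranges, weights, threshold=0.5):
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse_scores(scores, ranges, weights) >= threshold


# Hypothetical matchers with different raw score scales.
RANGES = {"fingerprint": (0, 100), "face": (0.0, 1.0), "voice": (-10, 10)}
WEIGHTS = {"fingerprint": 2.0, "face": 1.0, "voice": 1.0}
```

A noisy modality, such as a voice altered by a cold, can be given a lower weight so that it contributes less to the final decision.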

Professor Michael Wellman: Computational Markets, Decentralized Scheduling, and Trading Agents

Computational markets comprise a subclass of multiagent systems where the primary mode of interaction is via market price systems. Markets can provide effective decentralized allocation of resources in a variety of situations, and economic analysis provides a powerful design tool for interaction protocols. For the past several years, my research group has been exploring the space of market mechanisms and their applications to various distributed environments. In this talk, I present an overview of our approach and highlight results on auction protocols for decentralized scheduling. Elaborations of this problem represent realistic commerce scenarios and present challenges for designers of bidding strategy, as illustrated by observations on the ICMAS-00 Trading Agent Competition.

Xiao Huang: Value Systems and Its Role in Adaptive Behavior

They propose that salient sensory events trigger neuronal value systems capable of modulating synaptic plasticity. They investigate the capacity of value systems to modulate their own responses in the context of various conditioning tasks and suggest that plasticity in sensory afferents to value systems may provide a neurobiological basis for mediating the changing effects of saliency on adaptive behavioral responses. This work can be used in Vision-based Navigation of SAIL3. I will introduce their work on value systems and discuss how to use it in our robot.

Professor Brian Lovell: Computer Vision and Medical Image Processing Projects at the University of Queensland

Image Processing Projects:
A Methodology for Quality Control in Cell Nucleus Segmentation: We present a very robust algorithm for segmenting the nucleus from the cytoplasm of pap smear cells as one step along the difficult road to producing an automated pap smear screening system to detect early signs of cancer of the cervix. The particular focus of this talk is the means by which we improve algorithm performance from 99.6% to 100% accuracy by automatic data validation.
Mime: A Gesture-Driven Computer Interface:
The aim of this project was to produce a simple, lightweight computer vision interface to replace the computer mouse interface using hand gestures alone. The focus is on very efficient computation for a software only solution. MIME is intended to harness the intuitiveness and flexibility of gesture as a natural computer interface.

Professor Francis Quek: Multimodal Discourse: Gesture, Speech and Gaze

Human discourse is an active process of converting thoughts into speech, gesture, and gaze activity. Grounded in the psycholinguistic foundations of the production of such multimodal 'conversational-acts' (as opposed to the mono-dimensional speech-act), we address the interpretation of gesture, speech, and gaze in the context of discourse management. We investigate the cues afforded by each mode of interaction and the algorithms necessary to detect and extract them; study the spatial and temporal relationships among these cues and associate them with topical units in discourse; study the interactions of gesture, speech and gaze in discourse segmentation; and develop a multimedia database system that integrates these elements into a coherent whole. Our approach involves experiments designed to discover and quantify cues in the various modalities and their relation to discourse management; the development of computational algorithms to detect and recognize such cues; and the integration of these cues into a cogent discourse management system. We present psycholinguistic phenomena that are detected by our analysis. Understanding how such phenomena are detectable from video and audio signals, and determining the kinds of computable cues that support such analysis, are the first steps toward bridging the signal-sense gap in multi-modal interaction. Among these are cues for semantic segmentation and organization, cross-modal temporal integration, and the significance of 'hold tension release'. We have assembled a strong interdisciplinary team comprising psycholinguistic, machine vision and signal processing researchers to address the holistic nature of discourse and language itself. This permits us to base our research squarely on the realities of human communication in spontaneous discourse across a wide range of pragmatic conditions.
Technology is being developed that has significant impact on natural language discourse analysis, human-computer interaction systems, neuropathological studies (Parkinson's Disease and Left/Right Hemisphere Damage), and discourse and video databases. Another significant outcome of this research is the introduction of computational and quantitative rigor to the psycholinguistic study of discourse production. This represents a model of collaborative research between the fields of engineering and cognitive science.

Yilu Zhang: Online Development of Audio-Driven Behaviors

Motivated by the autonomous developmental process of higher animals and humans from infancy to adulthood, the goal of our SAIL developmental robot project is to investigate how to enable a machine to develop its cognitive and behavioral skills through online, real-time interactions with the environment. The work to be presented covers the audition development part of the project. Two major challenges of autonomous development are: (1) the teacher cannot provide class labels; (2) all sensory inputs are continuous in time and no hand segmentation is allowed. I will describe how these challenges are addressed by our developmental program through online reinforcement learning.

Wey-Shuan Hwang: Developmental Humanoids: Humanoids that Develop Skills Automatically

It is extremely challenging for humans to program a robot to a sufficient degree that it acts properly in typical unknown human environments. This is especially true for humanoids due to the very large number of redundant degrees of freedom and the large number of sensors that are required for humanoids to work safely and effectively in the human environment. Motivated by human mental development from infancy to adulthood, we enable robots to develop their minds automatically, through online, real-time interactions with their environment. We demonstrate our work on the tasks of robot navigation and motion tracking.