APE Weekly Seminar Abstracts - Fall 2003

Alexander Topchy: Consensus Functions and Ensembles of Weak Partitions

Clustering ensembles combine the outputs of multiple clustering algorithms. Such ensembles aim to achieve a robustness and quality of clustering solutions that is often unattainable by any single algorithm. Distributed data mining with multiple sources of data or features is an important application area for this approach. Unlike supervised classification, clustering provides no explicit feedback to guide the decision fusion process. We review two key issues in combination design from graph-based, combinatorial, and statistical perspectives: (1) the generation of elementary partitions, including their diversity and strength; and (2) consensus functions, including co-association, hypergraph, disambiguation voting, information-theoretic, and maximum-likelihood algorithms. Empirical results on several datasets are also presented.
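
The co-association consensus function can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation. It assumes each partition is given as a list of integer labels over the same points. The names `co_association` and `consensus_by_threshold` are hypothetical. The consensus step here links points that share a cluster in more than half of the partitions and then takes connected components. That amounts to single-link clustering of the co-association matrix. Other consensus functions (hypergraph, voting, maximum likelihood) would replace this step.

```python
import numpy as np

def co_association(partitions):
    """Return the n x n matrix of the fraction of partitions that put points i and j together."""
    labels = np.asarray(partitions)            # shape (n_partitions, n_points)
    n_parts, n_points = labels.shape
    C = np.zeros((n_points, n_points))
    for row in labels:
        # Pairwise "same cluster" indicator for this partition.
        # Label values never need to correspond across partitions.
        C += row[:, None] == row[None, :]
    return C / n_parts

def consensus_by_threshold(C, threshold=0.5):
    """Link points with co-association above threshold; label the connected components."""
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)            # -1 marks "not yet assigned"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                           # depth-first search over strong links
            i = stack.pop()
            for j in np.nonzero(C[i] > threshold)[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Take three partitions of five points, `[[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]`. The consensus is `[0, 0, 1, 1, 1]`, even though the partitions use inconsistent label names. Avoiding any label-matching step is one of the main attractions of co-association.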

Shuqing Zeng: An Online-Learning and Attention-Based Approach to Obstacle-Avoidance Behavior Using a Range Finder

We considered the problem of developing local reactive obstacle-avoidance behaviors in a mobile robot through online, real-time learning. The robot operated in an unknown, bounded 2-D environment populated by static or slowly moving obstacles of arbitrary shape, and it perceived its surroundings with a laser range finder. We presented a learning-based approach to the problem, in which an attention mechanism greatly reduces the number of training samples needed. An efficient, real-time implementation of the approach was tested, demonstrating smooth obstacle-avoidance behavior in a corridor with a crowd of moving students as well as static obstacles.

Shailesh Saini: Creating Effective Real-time Demos from Off-line Algorithms

The talk will cover three areas. The first is work on a face recognition algorithm developed by previous graduate students, with the goal of creating a real-time face detection application; most of this work focused on optimizing the application to achieve acceptable running time. The second is a signature verification demo for both a tablet PC and a regular PC; here the work focused on making the algorithm run in real time and then building the demo to take advantage of this. The third is the creation of a multimodal enrollment/verification system, where the emphasis is again on getting the application to run in real time.

Umut Uludag: Multimedia Content Protection Via Biometrics-based Encryption

Encryption can be used to protect the intellectual property rights of copyrighted digital material such as audio, image, and video files. However, traditional cryptosystems are vulnerable to illegal key exchange, which may render them useless for copyright protection. A new multimedia content protection framework based on the users' biometric data and a layered encryption/decryption scheme will be presented. The applicability and computational requirements of the proposed method will also be addressed.

Dirk Colbry: Demonstration of 3D Image Scanning using the Minolta VIVID910

The PRIP lab recently acquired a Minolta VIVID910 non-contact 3D digitizer. This scanner uses structured light from a laser to capture 3D depth images (with texture) quickly, in about 0.3 s. These images can be converted to many 3D file formats, including 3D Studio Max and VRML 2.0. This presentation will demonstrate the capabilities and limitations of the scanner, discuss the software and utilities available to us, present work being done at other labs using a similar camera, and open a discussion on possible research topics and ideas for fully utilizing this new tool.

Dr. Rong Jin: A New Boosting Algorithm Using Input-Dependent Regularizer

AdaBoost has proved to be an effective method for improving the performance of base classifiers, both theoretically and empirically. However, previous studies have shown that AdaBoost can overfit, especially on noisy data. In addition, most current work on boosting assumes that the combination weights are fixed constants and therefore does not take particular input patterns into account. In this paper, we present a new boosting algorithm, "WeightBoost", which addresses both problems by introducing an input-dependent regularization factor into the combination weight. As with AdaBoost, we derive a learning procedure for WeightBoost that is guaranteed to minimize the training error. Empirical studies on eight UCI data sets and one text categorization data set show that WeightBoost almost always achieves considerably better classification accuracy than AdaBoost. Furthermore, experiments on data with artificially controlled noise indicate that WeightBoost is more robust to noise than AdaBoost.
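
The sketch below makes the idea of an input-dependent combination weight concrete. It scales each fixed, AdaBoost-style weight `alpha_t` by a factor that depends on the input. The specific factor `exp(-beta * |H_{t-1}(x)|)` is an assumption made for illustration, where `H_{t-1}` is the ensemble score built so far. The exact form, and how `alpha_t` and the base learners are trained, should be taken from the WeightBoost paper. The function name `combined_score` and the parameter `beta` are hypothetical, and training is omitted.

```python
import math

def combined_score(x, learners, alphas, beta=0.5):
    """Ensemble score with an input-dependent regularizer on each weight.

    learners: callables h_t(x) returning -1 or +1.
    alphas:   fixed per-learner weights, as AdaBoost would produce.
    ASSUMED form: alpha_t is multiplied by exp(-beta * |H_{t-1}(x)|).
    """
    score = 0.0                                # H_0(x) = 0
    for h, alpha in zip(learners, alphas):
        factor = math.exp(-beta * abs(score))  # shrinks when the ensemble is already confident on x
        score += alpha * factor * h(x)
    return score                               # classify with the sign of the score
```

With `beta = 0` the factor is always 1, and the score reduces to AdaBoost's fixed-weight sum. With `beta > 0`, later learners have less influence on inputs where the earlier ensemble is already confident. This is one way to read the regularization the abstract describes.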

Dr. Pang-Ning Tan: Discovery of Association Patterns and its Applications

Association analysis is an active research topic in data mining and has been successfully applied to a wide variety of applications including business decision support, medical diagnosis, text mining, collaborative filtering, and bioinformatics. The goal of association analysis is to extract interesting relationships among attributes of a large data set. In this talk, I will present several extensions to the current association analysis formulation, including the concepts of indirect associations and hyperclique patterns. If time permits, I will also describe some of my research work in applying association analysis to Earth Science research and network intrusion detection.
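
Hyperclique patterns are commonly defined through the h-confidence measure:

hconf(X) = supp(X) / max over items i in X of supp({i})

An itemset is a hyperclique pattern if its h-confidence meets a threshold. The sketch below is brute force: it enumerates candidate itemsets of one size and filters them. It assumes transactions are Python sets, and the function names are hypothetical. A practical miner would instead exploit the anti-monotone property of h-confidence to prune the search, which this sketch omits.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions containing every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def h_confidence(itemset, transactions):
    """supp(X) divided by the largest single-item support in X (items must occur at least once)."""
    return support(itemset, transactions) / max(support([i], transactions) for i in itemset)

def hypercliques(transactions, min_hconf, size=2):
    """All itemsets of the given size whose h-confidence is at least min_hconf."""
    items = sorted(set().union(*transactions))
    return [c for c in combinations(items, size)
            if h_confidence(c, transactions) >= min_hconf]
```

Consider the transactions `[{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]`. Here `a` and `b` always occur together, so `("a", "b")` has h-confidence 1.0. By contrast, `("c", "d")` reaches only 0.5.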

Silviu Minut: Density Estimation With Mercer Kernels: Applications to Object Detection

Density estimation is fundamental to virtually any pattern recognition problem: given a set of data (training samples), one must infer the density of the data in order to classify new data (test samples). Often one assumes, e.g., that the data are Gaussian or a mixture of Gaussians, in which case PCA or EM can be used to estimate the density. In many realistic problems, however, the data do not fit such a parametric model. In this case, we can find a nonlinear embedding of the original data into another vector space (the feature space) such that the embedded data become Gaussian. Although the feature space is usually infinite-dimensional, it is possible to estimate the density of the embedded data using PCA and thereby recover the density of the original data. As an application, we use this technique to introduce shape priors into object detection using snakes and energy minimization.
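
The PCA step in the feature space can be carried out without ever forming the infinite-dimensional embedding. The trick is to work only with the kernel matrix, which is known as kernel PCA. The sketch below assumes a Gaussian (RBF) Mercer kernel with a hypothetical bandwidth `sigma`. It centers the kernel matrix in feature space and extracts the leading components. It then returns the projections of the training data together with the feature-space variance along each component. Recovering the density of the original data, and using it as a shape prior in a snake energy, are not shown.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))

def kernel_pca(X, n_components=2, sigma=1.0):
    """Project the rows of X onto the leading principal axes in the RBF feature space."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    J = np.ones((n, n)) / n
    Kc = K - J @ K - K @ J + J @ K @ J          # kernel of the mean-centered embedded data
    eigvals, eigvecs = np.linalg.eigh(Kc)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[order], eigvecs[:, order]
    coeffs = V / np.sqrt(lam)                   # unit-length principal axes in feature space
    projections = Kc @ coeffs                   # coordinates of each training point on those axes
    return projections, lam / n                 # lam / n is the variance along each axis
```

Because the kernel matrix is centered, the projections have zero mean. The variance of each projection column equals the corresponding returned value. These per-axis variances are the quantities a Gaussian model in feature space would be built from.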

Dr. Selin Aviyente: Time-Frequency Methods and Applications

Time-frequency analysis aims to represent the energy distribution of time-varying signals simultaneously in time and frequency. In this talk, two major analysis tools, wavelets and time-frequency distributions, will be introduced. Recent research results on applying information-theoretic measures to these representations for signal detection and classification will be presented. Finally, applications of these tools in areas such as fingerprint identification and image watermarking will be illustrated.
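
The sketch below illustrates an information-theoretic measure applied to a time-frequency representation. It first computes a spectrogram, the squared magnitude of a short-time Fourier transform and one member of the class of time-frequency distributions. It then normalizes the spectrogram to unit sum and computes its Rényi entropy. The window length, hop size, Hanning window, and entropy order `alpha = 3` are illustrative choices, not settings taken from the talk. Lower entropy means the energy is concentrated in fewer time-frequency cells.

```python
import numpy as np

def spectrogram(x, win_len=64, hop=16):
    """Squared STFT magnitude, shape (n_frames, win_len // 2 + 1)."""
    w = np.hanning(win_len)
    frames = np.array([x[i:i + win_len] * w
                       for i in range(0, len(x) - win_len + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def renyi_entropy(tfd, alpha=3):
    """Renyi entropy (bits) of a nonnegative time-frequency distribution; alpha must not be 1."""
    p = tfd / tfd.sum()                        # treat the distribution as a 2-D probability mass
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)
```

A pure tone keeps its energy in a narrow band, so it yields a lower entropy than white noise, which spreads its energy over the whole plane. Detection and classification schemes can exploit this kind of contrast.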

Nan Zhang: Independent Component Analysis

In this talk, I will first give a brief introduction to Independent Component Analysis (ICA) and then discuss a new incremental algorithm for ICA. Starting from an L-infinity-norm sparseness measure as the contrast function, we derive a learning algorithm based on a winner-take-all learning mechanism. It avoids the optimization of high-order nonlinear functions and the density estimation used by other ICA methods, such as negentropy approximation, infomax, and maximum-likelihood-based methods. We show that when the latent independent random variables have super-Gaussian distributions, the network efficiently extracts the independent components.

Anoop Namboodiri: Document Authentication using Online Signatures as Watermarks

Authentication of digital documents is an important concern as digital documents replace traditional paper-based documents. This is especially important when digital documents are exchanged over the Internet, where they can easily be accessed or modified by intruders. One well-known method for authenticating digital documents is public-key encryption-based authentication. However, the encryption-based method is not suitable for widespread distribution of a document, since either the document must be decrypted by each recipient before use or additional data must be attached to it. In this talk, I will present a watermarking-based solution in which an on-line signature is embedded in the document. Recipients can then verify both the integrity of the document and the claimed identity of its author.

Dr. Liza Levina: The Benefits of Assuming Independence in Classification When There Are Many More Variables than Observations

While general statistical intuition tells us that using all the available dependence information is better than not using it, we show just the opposite in the case where there are many more variables than observations (the "large p, small n" scenario). This phenomenon is well known in machine learning practice and will be demonstrated on examples from texture classification and gene expression data. Analytically, we consider the issue in the classical context of discriminating between two normal populations, and prove that the "naive Bayes" classifier based on the independence assumption greatly outperforms the discriminant rule that attempts to estimate the full covariance structure. We also show how, in practice, shrinkage can further improve on naive Bayes. For the special case of a stationary covariance structure, we introduce a class of rules spanning the range between independence and arbitrary dependence, and prove that they achieve Bayes optimality for the Gaussian colored noise model.
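
Consider two normal classes that share a covariance matrix. The independence ("naive Bayes") rule replaces that full covariance in Fisher's linear discriminant with its diagonal. The sketch below estimates the class means and the pooled per-variable variances. It then assigns a sample to a class according to which side of the midpoint it falls on along the resulting direction. The function name is hypothetical, and equal class priors are assumed. The shrinkage refinement and the stationary-covariance rules mentioned in the abstract are not included.

```python
import numpy as np

def fit_independence_rule(X1, X2):
    """Fit the independence (diagonal) discriminant for two classes.

    X1, X2: arrays of shape (n_k, p), one row per sample.
    Returns a function mapping samples of shape (m, p) to labels 1 or 2.
    Assumes equal class priors and a shared covariance whose
    off-diagonal entries are deliberately ignored.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled per-variable variance: the diagonal of the pooled sample covariance.
    pooled_var = (((X1 - mu1) ** 2).sum(axis=0)
                  + ((X2 - mu2) ** 2).sum(axis=0)) / (n1 + n2 - 2)
    w = (mu1 - mu2) / pooled_var               # diagonal analogue of Sigma^{-1} (mu1 - mu2)
    midpoint = (mu1 + mu2) / 2

    def predict(X):
        # A positive discriminant means the sample lies on class 1's side of the midpoint.
        return np.where((X - midpoint) @ w > 0, 1, 2)

    return predict
```

Suppose there are 500 variables and 10 samples per class. The pooled sample covariance then has rank at most 18, so the full-covariance rule cannot even be formed without some form of regularization. The diagonal rule above needs only the 500 per-variable variances and is always defined.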

Hong Chen: The Extraction of the Tooth Contour for Matching Dental Radiographs

The contours of the teeth are important features in dental biometrics; they can be used as cues for identifying individuals. The extraction of other features (e.g., the dental work) also requires the tooth contours as a reference. To extract the contours, we have proposed a method based on edge detection and another method that classifies the pixels in the radiograph using an intensity distribution model. However, without proper guidance from tooth shape knowledge, these methods are prone to errors when the tooth contours are fuzzy and only partially visible. We propose a new method for extracting tooth contours using a directional snake model. The external energy term is redefined so that the snake can distinguish the edges of adjacent teeth. The gradient vector flow (GVF) field will also be discussed.
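
The GVF mentioned here is presumably the gradient vector flow field of Xu and Prince. That field diffuses the gradient of an edge map outward, so a snake can be attracted to edges from farther away. The sketch below is a plain explicit iteration of the GVF equations. It is not the speaker's directional snake or redefined external energy. It assumes an edge map scaled to [0, 1] and replicates border values at the image boundary. The smoothing weight `mu`, time step `dt`, and iteration count are illustrative values, and the function names are hypothetical.

```python
import numpy as np

def laplacian(a):
    """5-point Laplacian with replicated (edge) boundary values."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a

def gradient_vector_flow(edge_map, mu=0.2, dt=0.5, iterations=200):
    """Explicit iteration of u_t = mu * lap(u) - (u - f_x)(f_x^2 + f_y^2), likewise for v.

    edge_map is assumed to be scaled to [0, 1]; mu * dt <= 0.25 keeps the diffusion step stable.
    """
    f = edge_map.astype(float)
    fy, fx = np.gradient(f)                    # derivatives along rows (y) and columns (x)
    mag2 = fx ** 2 + fy ** 2                   # |grad f|^2 weights the data term
    u, v = fx.copy(), fy.copy()                # start from the raw gradient
    for _ in range(iterations):
        u = u + dt * (mu * laplacian(u) - (u - fx) * mag2)
        v = v + dt * (mu * laplacian(v) - (v - fy) * mag2)
    return u, v
```

Consider an edge map containing a single vertical line. A few pixels away from the line the raw gradient is exactly zero, but the GVF field there is non-zero and points toward the line. This extended capture range is what makes GVF attractive as an external force for snakes.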

Xiao Huang: Office Presence Detection Using Multimodal Context Information

An office presence detection system is presented in this paper. Context information from multisensory inputs is integrated to infer a user's activities in an office. We design a layered architecture to model human activities at different granularities. An IHDR (Incremental Hierarchical Discriminant Regression) tree is used to automatically generate models for acoustic signals from unsegmented auditory streams, and it adapts readily to new settings. Hidden Markov Models (HMMs) are used to detect human motion patterns. The outputs of these two components are fed into higher-level HMMs to analyze human activities. Experimental results from the real-time prototype system are reported.
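
A common way to use HMMs for detecting motion patterns is to train one HMM per pattern. A new observation sequence is then labeled with the model that gives it the highest likelihood, computed with the forward algorithm. The sketch below assumes discrete observation symbols, for example quantized motion features. Each model is a `(pi, A, B)` tuple holding the initial state probabilities, the transition matrix, and the emission matrix. The model names and parameters are hypothetical. The IHDR tree and the layered architecture of the actual system are not represented.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    pi: (n_states,) initial probabilities; A: (n_states, n_states) transitions;
    B: (n_states, n_symbols) emission probabilities; obs: sequence of symbol indices.
    """
    alpha = pi * B[:, obs[0]]                  # joint prob. of first symbol and each state
    s = alpha.sum()
    log_p = np.log(s)
    alpha = alpha / s                          # rescale to avoid underflow on long sequences
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]          # propagate through transitions, then emit o
        s = alpha.sum()
        log_p += np.log(s)                     # the scale factors multiply to P(obs)
        alpha = alpha / s
    return log_p

def classify(obs, models):
    """Return the name of the model (dict key) with the highest log-likelihood for obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

As a usage example, take two one-state models: "still" emits symbol 0 with probability 0.9, and "moving" emits symbol 1 with probability 0.8. `classify` then labels the sequence `[1, 1, 1, 0]` as "moving" and `[0, 0, 0, 1]` as "still".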