The spatial distribution of gray level intensities in an image can be naturally
modeled using Markov Random Field (MRF) models. We develop and investigate the
performance of face detection algorithms derived from MRF considerations. For
enhanced detection, the MRF models are defined for every permutation of site
indices in the image. We find the permutation that provides maximum
discriminatory power to distinguish faces from nonfaces. The MRF models
successfully detect faces in a number of test images in real time.
Clustering and Feature Selection
Martin Law
Dr. Anil Jain
Dr. Mario Figueiredo
This work proposes an unsupervised algorithm for learning a finite
mixture model from multivariate data. The adjective "unsupervised" is
justified by two properties of the algorithm: (i) it is capable of
selecting the number of components, and (ii) unlike the standard
expectation-maximization (EM) algorithm, it does not require careful
initialization. The proposed method
also avoids another drawback of EM for mixture fitting: the possibility of
convergence towards a singular estimate at the boundary of the parameter
space. The novelty of our approach is that we do not use a model selection
criterion to choose one among a set of preestimated candidate models;
instead, we seamlessly integrate estimation and model selection in a
single
algorithm. Our technique can be applied to any type of parametric mixture
model for which it is possible to write an EM algorithm. In our first
paper, we illustrate it with experiments involving Gaussian mixtures.
These experiments testify to the good performance of our approach.
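The idea of folding model selection into estimation can be sketched for one-dimensional Gaussian mixtures. The code below is a simplified illustration, not the authors' algorithm: it assumes a plain batch EM loop in which the weight update subtracts half the number of per-component parameters from each component's expected count and clips at zero, so weakly supported components are removed during fitting. The function name fit_mixture, the variance floor, and all default values are hypothetical choices made for this example.

```python
import math
import random

def fit_mixture(data, k_init=6, n_params=2, iters=100, var_floor=1e-3, seed=0):
    """EM for a 1-D Gaussian mixture with a weight update that can drop components.

    n_params is the number of parameters per component (mean and variance here).
    Assumes len(data) >= k_init and that the data are not all identical.
    """
    rng = random.Random(seed)
    means = rng.sample(data, k_init)                    # arbitrary start: random data points
    variances = [((max(data) - min(data)) / 2) ** 2] * k_init
    weights = [1.0 / k_init] * k_init
    for _ in range(iters):
        k = len(weights)
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: shrink the expected counts; components that reach zero are removed.
        counts = [sum(r[j] for r in resp) for j in range(k)]
        shrunk = [max(0.0, c - n_params / 2) for c in counts]
        keep = [j for j in range(k) if shrunk[j] > 0]
        if not keep:
            break
        norm = sum(shrunk)
        new_means = [sum(r[j] * x for r, x in zip(resp, data)) / counts[j] for j in keep]
        # The variance floor keeps estimates away from the singular boundary.
        new_vars = [max(var_floor,
                        sum(r[j] * (x - m) ** 2 for r, x in zip(resp, data)) / counts[j])
                    for j, m in zip(keep, new_means)]
        weights = [shrunk[j] / norm for j in keep]
        means, variances = new_means, new_vars
    return weights, means, variances
```

Calling fit_mixture(data) on a list of floats returns the surviving weights, means, and variances. With only two parameters per component the penalty is mild, so this sketch prunes less aggressively than a full implementation would.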
This approach is extended to perform feature selection -- the selection
of "good" variables -- for learning a mixture model in an unsupervised
setting. Feature selection in unsupervised learning is much more difficult
than its counterpart in supervised learning because of the lack of class
labels. By treating the relevance of each feature as a Bernoulli
random variable, we obtain an EM algorithm that estimates both the
number of components and the importance of the features simultaneously.
A complementary approach based on a "wrapper" around the standard EM
mixture learning algorithm is also proposed for feature selection in
unsupervised learning. The optimal feature subset size is determined
automatically from the entropy of the assignment, instead of being
manually adjusted.
Our experimental results show that the proposed methods can be useful
for many real world data sets.
Dr. Anil Jain
Dr. Ana Fred
Alexander Topchy
We explore the idea of evidence accumulation for combining
the results of multiple clusterings. Initially, a set of n
d-dimensional patterns is decomposed into a large number
of compact clusters; the K-means algorithm performs this
decomposition, with several clusterings obtained by N
random initializations of K-means. Taking the co-occurrences
of pairs of patterns in the same cluster as votes
for their association, the data partitions are mapped into a
co-association matrix of patterns. This n x n matrix represents
a new similarity measure between patterns. The final clusters
are obtained by applying an MST-based clustering
algorithm to this matrix. Results on both synthetic and
real data show the ability of the method to identify arbitrary
shaped clusters in multidimensional data.
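The voting scheme described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation: the K-means routine is a bare-bones version, and the final step cuts weak links by thresholding the co-association matrix and taking connected components, which is equivalent to removing minimum spanning tree edges below the threshold. The threshold of 0.5 and all function names are hypothetical.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain K-means on tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def co_association(points, k, n_runs, seed=0):
    """Fraction of K-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    votes = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1
    return [[v / n_runs for v in row] for row in votes]

def single_link_clusters(coassoc, threshold=0.5):
    """Join every pair whose co-association exceeds the threshold (union-find)."""
    n = len(coassoc)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if coassoc[i][j] > threshold:
                parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

A typical call is single_link_clusters(co_association(points, k, n_runs)), where k is deliberately larger than the expected number of clusters so that each run over-segments the data.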
3D Human face models have been widely used in applications such as
face recognition, facial expression recognition, human action recognition,
head tracking, facial animation, video compression/coding, and augmented
reality. Modeling human faces provides a potential solution to the
variations encountered in human face images. We propose a method of
modeling human faces based on a generic face model (a triangular mesh
model) and individual facial measurements containing both shape and
texture information. The modeling method adapts a generic face model to
the given facial features, extracted from registered range and color
images, in a global-to-local fashion. It iteratively moves the vertices
of the mesh model to smooth the non-feature areas, and uses 2.5D
active contours to refine feature boundaries. The resultant face model has
been shown to be visually similar to the true face. Initial results show
that the constructed model is quite useful for recognizing profile views.
Face Detection in Color Images
Vincent Hsu
Dr. Anil Jain
Human face detection is often the first step in applications such as video
surveillance, human-computer interfaces, face recognition, and image
database management. We propose a face detection algorithm for color
images in the presence of varying lighting conditions as well as complex
backgrounds. Our method detects skin regions over the entire image, and
then generates face candidates based on the spatial arrangement of these
skin patches. The algorithm constructs eye, mouth, and boundary maps for
verifying each face candidate. Experimental results demonstrate successful
detection over a wide variety of facial variations in color, position,
scale, rotation, pose, and expression from several photo collections.
Fingerprint Mosaicking
Arun Ross
Dr. Anil Jain
It has been observed that the reduced contact area offered by solid-state
fingerprint sensors does not provide sufficient information (e.g., minutiae)
for high-accuracy user verification. Further, multiple impressions of the
same finger acquired by these sensors may have only a small region of
overlap, thereby affecting the matching performance of the verification
system. To deal with this problem, we suggest a fingerprint mosaicking scheme
that constructs a composite fingerprint image using multiple impressions. In
the proposed algorithm, two impressions of a finger are initially aligned
using the corresponding minutiae points. This alignment is used by the
well-known iterative closest point (ICP) algorithm to compute a
transformation matrix that defines the spatial relationship between the two
impressions. The transformation matrix is used in two ways: (a) the two
impressions are stitched together to generate a composite image, in which
minutiae points are then detected; (b) the minutia maps obtained from each of
the individual impressions are integrated to create a larger minutia map. The
availability of a composite template improves the performance of the
fingerprint matching system, as is demonstrated in our experiments.
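The alignment and map-integration steps can be sketched as follows. This is an illustrative simplification, not the ICP procedure itself: it assumes the minutiae correspondences are already known and solves for the 2-D rotation and translation in closed form (the same least-squares step that ICP repeats internally). Minutiae are reduced to (x, y) positions, and the function names and the merge tolerance are hypothetical.

```python
import math

def estimate_rigid_transform(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    num = den = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy          # centered source point
        bx, by = qx - dx, qy - dy          # centered destination point
        num += ax * by - ay * bx
        den += ax * bx + ay * by
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty

def apply_transform(points, theta, tx, ty):
    """Rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def merge_minutiae(map_a, map_b_aligned, tol=5.0):
    """Union of two minutia maps; points closer than tol are treated as duplicates."""
    merged = list(map_a)
    for qx, qy in map_b_aligned:
        if all(math.hypot(qx - px, qy - py) > tol for px, py in merged):
            merged.append((qx, qy))
    return merged
```

In use, the transform is estimated from the matched minutiae of two impressions, applied to the second impression's full minutia set, and the result is merged with the first to form the larger map.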
Hybrid Fingerprint Matcher
Arun Ross
Dr. Anil Jain
Most fingerprint matching systems rely on the distribution of minutiae on the
fingertip to represent and match fingerprints. While the ridge flow pattern is
generally used for classifying fingerprints, it is seldom used for matching.
This work describes a hybrid fingerprint matching scheme that uses both
minutiae and ridge flow information to represent and match fingerprints. A set
of 8 Gabor filters, whose spatial frequencies correspond to the average
inter-ridge spacing in fingerprints, is used to capture the ridge strength at
equally spaced orientations. A square tessellation of the filtered images is
then used to construct an eight-dimensional feature map, called the ridge
feature map. The ridge feature map, along with the minutiae set of a
fingerprint image, is used for matching purposes.
The proposed technique has the following features: (i) the entire image is
taken into account in constructing the ridge feature map, and every
tessellated cell is equally weighted; (ii) minutiae matching is used to
determine the affine transformation parameters relating the query and the
template images for ridge feature map extraction; (iii) filtering and ridge
feature map extraction are implemented in the frequency domain, thereby
speeding up the matching process; (iv) filtered query images are cached to
greatly increase the one-to-many matching speed. The hybrid matcher performs
better than a minutiae-based fingerprint matching system: its genuine accept
rate is observed to be ~10% higher than that of a minutiae-based system at
low false accept rate (FAR) values. Fingerprint verification (one-to-one
matching) using the hybrid matcher on an 800 MHz Pentium III system takes
~1.4 seconds, while fingerprint identification (one-to-many matching)
involving 1,000 templates takes ~0.2 seconds per match.
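As a rough illustration of the ridge feature map, the sketch below builds a bank of eight even-symmetric Gabor kernels at equally spaced orientations and summarizes a filtered image over a square tessellation. Several details are assumptions made for the example and are not stated in the text above: the kernel size, the ridge period of 10 pixels, the Gaussian width, and the use of mean absolute response as the per-cell feature. The filtering step itself (spatial or frequency-domain convolution) is omitted.

```python
import math

def gabor_kernel(size, frequency, theta, sigma):
    """Even-symmetric Gabor kernel: isotropic Gaussian times a cosine along theta."""
    half = size // 2
    c, s = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * c + y * s             # coordinate along the wave direction
            yr = -x * s + y * c
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * frequency * xr))
        kernel.append(row)
    return kernel

def gabor_bank(n_orientations=8, size=17, ridge_period=10.0, sigma=4.0):
    """One kernel per orientation, tuned to an assumed inter-ridge spacing."""
    return [gabor_kernel(size, 1.0 / ridge_period, k * math.pi / n_orientations, sigma)
            for k in range(n_orientations)]

def cell_features(response, cell):
    """Mean absolute filter response in each cell of a square tessellation."""
    rows, cols = len(response), len(response[0])
    features = []
    for r0 in range(0, rows - cell + 1, cell):
        row = []
        for c0 in range(0, cols - cell + 1, cell):
            block = [abs(response[r][c])
                     for r in range(r0, r0 + cell) for c in range(c0, c0 + cell)]
            row.append(sum(block) / len(block))
        features.append(row)
    return features
```

Applying cell_features to each of the eight filtered images gives eight values per cell, which together form the eight-dimensional map.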
Multimodal Biometrics
Arun Ross
Dr. Anil Jain
A simple biometric system has a sensor module, a feature extraction module and a matching module. The performance of a biometric system is largely affected by the reliability of the sensor used and the degrees of freedom offered by the features extracted. Also, if the biometric trait being sensed or measured is noisy (a fingerprint with a scar or a voice altered by a cold, for example), the resultant confidence score (or matching score) computed by the matching module may not be reliable. Simply put, the matching score generated by a noisy input has a large confidence interval. This problem can be addressed by installing multiple sensors that capture different biometrics. Such systems, known as multimodal biometric systems, are expected to be more reliable due to the presence of multiple pieces of evidence. However, an intelligent scheme is required to fuse the decisions produced by the individual sensors.
In this work we attempt to deal with the problem of decision fusion by
first building a bimodal biometric system and then devising various schemes to
integrate the outputs of the two sensors. The proposed system uses the
fingerprint and voice features of an individual for verification purposes.
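One simple way to integrate two matchers is a weighted sum of normalized scores. The sketch below illustrates that general idea only; it is not necessarily one of the schemes devised in this work. The score ranges, the weight, the threshold, and the function names are all hypothetical.

```python
def min_max_normalize(score, lo, hi):
    """Map a raw matching score to [0, 1], clipping values outside [lo, hi]."""
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))

def fuse_sum_rule(finger, voice, w_finger=0.5):
    """Weighted sum of two normalized scores; the weights add up to 1."""
    return w_finger * finger + (1.0 - w_finger) * voice

def accept(finger_raw, voice_raw, finger_range, voice_range,
           w_finger=0.5, threshold=0.5):
    """Accept the claimed identity if the fused score reaches the threshold."""
    f = min_max_normalize(finger_raw, *finger_range)
    v = min_max_normalize(voice_raw, *voice_range)
    return fuse_sum_rule(f, v, w_finger) >= threshold
```

Giving the more reliable matcher a larger weight lets a strong fingerprint match compensate for a voice sample degraded by noise, and vice versa.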
Online Script Recognition
Anoop Namboodiri
Dr. Anil Jain
Automatic identification of handwritten script facilitates many important applications such as automatic transcription of multi-lingual documents and search for documents on the web containing a particular script. The increase in usage of handheld devices which accept handwritten input is creating a huge volume of handwritten data. This work proposes a method to classify words and lines in an on-line handwritten document into one of six scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew or Roman. The classification is based on 11 different spatial and temporal features extracted from the strokes of the words. The proposed system attained an overall classification accuracy of 85% at the word level with 5-fold cross-validation. The classification accuracy improves to 96% as the number of words in the test sample is increased to four and to 96.5% for complete text lines, consisting of an average of seven words.
We present a hierarchical approach for extracting homogeneous regions in on-line documents. The problem of identifying and processing
ruled and unruled tables, text and drawings is addressed. The on-line document is first segmented into regions with only text strokes and
regions with both text and non-text strokes. The text region is further classified as unruled table or plain text. Stroke clustering is
used to segment the non-text regions. Each non-text segment is then classified as drawing, ruled table or underlined keyword using
stroke properties. The individual regions are processed and the results are assembled to identify the structure of the on-line document.
Digital Watermarking of Fingerprint Images
Umut Uludag
Dr. Anil Jain
Watermarking of digital media has gained considerable attention in recent
years as a means of copyright protection and content verification.
Watermarking of fingerprint images aims to embed watermark information in the
fingerprint image without degrading fingerprint identification and
verification performance. In this project, we are working on such
watermarking methods to increase the security of fingerprints.
Dental Biometrics
Hong Chen
Anil K. Jain
The main purpose of forensic dentistry is to identify deceased individuals for
whom other means of identification (e.g., fingerprint, face, etc.) are not
available. We try to identify people using their
post-mortem (PM) and ante-mortem (AM) radiographs. In other words, given a PM
radiograph, we search the database to locate a matching AM radiograph.
The variable quality of the radiographs requires us to perform preprocessing
procedures such as image enhancement and restoration. The matching is based
on the contours of the teeth and their image intensity. Currently, we are
attempting to use the striate and trabecular patterns present in the teeth.
A combination method is expected to further improve the matching accuracy.
Microbes stained with special fluorescein and chemicals exhibit different
colors. The color indicates the state of a microbe during an important
activity such as metabolism or fission. The morphotype, biovolume, and
aggregation state of microbes of particular colors are important for further
biological analysis. The color image itself has a noisy background and a
complex foreground. Because the number of microbes in an image is large,
extracting information from the image manually takes a lot of time. The goal
of this work is to identify the microbes present in the noisy image and to
extract useful information about them. The proposed system identifies the
microbes in a digitized color image using an interactive user interface.
There is no fixed color range for a specific color stain; microbes with the
same reaction to the chemical might show different but related colors. Color
segmentation (with local thresholding and region growing) is applied to
eliminate the background noise and select the region of interest.
The microbes present in the region of interest might be connected to or even
overlap each other. Connection and overlap distort the extracted information:
the microbe count will be incorrect and the biovolume of the microbes will be
underestimated. The morphotype and color information are used to split
connected microbes. To distinguish between overlapping microbes and to count
the number of microbes correctly, further morphotype analysis is required.
Minutiae Verification and Classification for Fingerprint
Matching
Salil Prabhakar
Dr. Anil Jain
Raw image data offer a rich source of information for matching and
classification. For simplicity of pattern recognition system design, a
sequential approach consisting of sensing, feature extraction and matching
is conventionally adopted where each stage transforms a particular
component of information relatively independently. The interaction between
these modules is limited. Some of the errors in the end-to-end sequential
processing can be easily eliminated, especially for the feature extraction
stage, by revisiting the original image data. We propose a feedback path
for the feature extraction stage, followed by a feature refinement stage
for improving the matching performance. This performance improvement is
illustrated in the context of a minutiae-based fingerprint verification
system. We show that a minutia verification stage based on reexamining
the gray-scale profile in a detected minutia's spatial neighborhood in the
sensed image can improve the matching performance by ~4% on our database.
Further, we show that a feature refinement stage which assigns a class
label to each detected minutia (ridge ending and ridge bifurcation) before
matching can also improve the matching performance by ~3%. A combination
of feedback (minutia verification) in the feature extraction phase and
feature refinement (minutia classification) improves the overall
performance of the fingerprint verification system by ~8%.
Automatic Surveillance Using Omnidirectional and
Active Cameras
Dan Gutchess
Dr. Anil Jain
We are developing a real-time automated surveillance system which uses
an omnidirectional video camera in combination with multiple active
cameras. Tracking of multiple subjects in an indoor environment is
performed using omnidirectional video as input. The world coordinates of
each subject in the room are estimated in order to direct the attention
of one or more pan-tilt-zoom cameras. The system automatically controls
these cameras for the purpose of obtaining high resolution images and
video sequences of subjects. In particular, we demonstrate that
high-quality facial images may be extracted from the images captured by
the system. The automatic acquisition of such images makes this system
useful for experiments involving face and action recognition.
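Directing a pan-tilt-zoom camera at an estimated world position reduces to computing two angles. The sketch below is a geometric illustration only and makes assumptions not stated above: world coordinates have the z axis pointing up, the camera's zero-pan direction lies along the world +x axis, and zero tilt is horizontal. The function name is hypothetical, and a real camera would also need its mounting orientation and zoom handled.

```python
import math

def pan_tilt_to_target(camera_pos, target_pos):
    """Return (pan, tilt) in degrees that point a camera at a world position."""
    dx, dy, dz = (t - c for t, c in zip(target_pos, camera_pos))
    pan = math.degrees(math.atan2(dy, dx))                   # rotation about the vertical axis
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))  # elevation; negative looks down
    return pan, tilt
```

For a ceiling-mounted camera and a subject's head below it, the tilt comes out negative, meaning the camera looks downward.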
Detecting, Tracking and Interpreting Faces
Vera Bakic
Dr. George Stockman
A non-intrusive real-time program is developed which detects the eyes and nose of a moving workstation user at a rate of between 10 and 30 Hertz. The program creates a base facility for other capabilities such as detecting gaze direction and facial gestures, creating face models, and normalizing for face recognition. A skin color model is used along with geometric knowledge about the face and weak assumptions about the lighting. Good results are reported over various conditions, including facial hair, 3D motion, clothing color, and use of eyeglasses. Good performance has been demonstrated with dozens of subjects on a low-end SGI workstation with an eye-camera to acquire images.
Our work is directed toward a general capability to detect and track a
human face as it moves in a 3D workspace. Once achieved, this
capability can be used to enable others. For example, 3D pose
can be used directly for Human-Computer Interface (HCI) or for
evaluation of how humans explore computer displays or virtual
environments. An extension of the system that determines which
region of the workstation screen the user is looking at is
under development. The new system will enable the user to issue
commands to the computer using head movements and gaze direction.
Visual Learning
Nicolae Duta
Dr. Anil Jain
We are building a trainable object detection/segmentation/matching system with
applications in automatic medical diagnosis and personal identity
verification.
Image and Video Databases
Aditya Vailaya
Dr. Anil Jain
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-second video segment). We propose two schemes for query by video clip: (i) Retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is associated with the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features of the sub-sampled frames. Initial experiments on two video databases (a basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Experiments using segments from one basketball video as the query and a different basketball video as the database show that the feature representation and matching schemes are robust. We are currently investigating methods for improving the performance of the system using semantic knowledge of the given domain, object segmentation and tracking, detection of text and faces, and combining the various matching schemes.
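The sub-sampled-frame scheme can be sketched as a sliding-window search. This is a simplified illustration under assumptions of our own: each frame is already represented by a normalized color histogram, histogram intersection serves as the similarity measure, and texture features are left out. The function names and the sampling step are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_match(query, database, step=2):
    """Slide the sub-sampled query over the sub-sampled database video.

    Returns (start_frame, score), where start_frame indexes the original
    database video and score is the mean similarity over the query frames.
    Only start positions that are multiples of step are examined.
    """
    q = query[::step]          # uniform sub-sampling of the query clip
    d = database[::step]       # uniform sub-sampling of the database video
    best_start, best_score = -1, -1.0
    for offset in range(len(d) - len(q) + 1):
        score = sum(histogram_intersection(a, b)
                    for a, b in zip(q, d[offset:offset + len(q)])) / len(q)
        if score > best_score:
            best_start, best_score = offset * step, score
    return best_start, best_score
```

A larger step reduces the number of comparisons at the cost of coarser temporal localization of the retrieved clip.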
Due to the huge amount of potentially interesting documents available over
the Internet, searching for relevant information has become very
difficult. Since images and video are a major source of these data, grouping
images into (semantically) meaningful categories using low-level
visual features is an important (and challenging) problem in content-based
image retrieval. Using Bayesian classifiers, we attempt to capture
high-level concepts from low-level image features. Specifically,
we have developed Bayesian classifiers for semantic image classification
(indoor vs. outdoor, city vs. landscape, and sunset vs. forest vs. mountain),
image orientation detection, and object detection (detecting regions of sky
and vegetation in outdoor images). We demonstrate that a small
codebook (the optimal codebook size is selected using a modified MDL
criterion) extracted from a learning vector quantizer can be used to
estimate the class-conditional densities of the observed features needed for
image classification. We have developed an incremental learning paradigm, a
feature selection scheme, a rejection scheme, and a classifier combination
strategy using bagging to improve classifier performance. Empirical results
on a large database (24,000 images) show that semantic categorization
and organization of the database using the proposed classification schemes
improve both retrieval accuracy and efficiency.
3D Object Recognition and Registration
Chitra Dorai
Dr. Anil Jain
Deformable Models For Object Matching
Yu Zhong
Dr. Anil Jain
We propose a general object localization and retrieval scheme based on object
shape using deformable templates. Prior knowledge of an object shape is
described by a prototype template which consists of the representative
contour/edges, and a set of probabilistic deformation transformations on the
template. A Bayesian scheme, which is based on this prior knowledge and the
edge information in the input image, is employed to find a match between the
deformed template and objects in the image. Computational efficiency is
achieved via a coarse-to-fine implementation of the matching algorithm. Our
method has been applied to retrieve objects with a variety of shapes from
images with complex backgrounds. The proposed scheme is invariant to location,
rotation, and moderate scale changes of the template.