Pattern Recognition and Image Processing Laboratory
Department of Computer Science
A714 Wells Hall
Michigan State University
East Lansing MI 48824-1027

George Stockman, Director
(517) 355-5240, FAX: 432-1061
stockman@cps.msu.edu

http://web.cps.msu.edu/prip

Laboratory Brochure

17 April 1995

Introduction

The Pattern Recognition and Image Processing Laboratory of the Department of Computer Science at Michigan State University is located in room 350 of the Engineering Building. The PRIP lab supports the research of faculty, visiting scholars, graduate and undergraduate students in the areas of pattern recognition, image processing, computer vision, and vision-guided robotics.

Image processing is the manipulation and transformation of sensed images without attaching domain knowledge to the image. Removing random and structured noise, converting from color to grayscale, enhancing contrast, and reducing resolution are examples of image processing. Image processing is often done before pattern recognition to simplify the extraction of features.
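
As a rough illustration of three of the operations named above (color-to-grayscale conversion, contrast enhancement, and resolution reduction), here is a minimal Python sketch. It is not code from the PRIP lab. The function names, the list-of-rows image layout, and the choice of ITU-R BT.601 luminance weights are assumptions made for the example.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to luminance using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def stretch_contrast(gray, lo=0.0, hi=255.0):
    """Linearly rescale so the darkest pixel maps to lo and the brightest to hi."""
    flat = [v for row in gray for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:  # flat image: nothing to stretch
        return [[lo for _ in row] for row in gray]
    scale = (hi - lo) / (vmax - vmin)
    return [[lo + (v - vmin) * scale for v in row] for row in gray]


def halve_resolution(gray):
    """Reduce resolution by averaging each non-overlapping 2x2 block."""
    h, w = len(gray) // 2 * 2, len(gray[0]) // 2 * 2
    return [[(gray[y][x] + gray[y][x + 1]
              + gray[y + 1][x] + gray[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

None of these steps uses knowledge of what the image depicts, which is what separates image processing from the recognition tasks described next.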

Pattern recognition is the process of grouping or categorizing data based on similar sets of features. Pattern recognition methods apply to a broad range of problems in addition to machine vision. Humans reason using general features such as edges, shapes, colors, and shadows. Pattern recognition can make this higher level of abstraction usable in computer vision tasks.

Computer vision includes all areas of machine understanding of the external environment in response to visual input. Computer vision, like human visual perception, is the primary source for knowledge of a changing external world.

Vision-guided robotics is a particular application of computer vision to the problem of guiding mobile robots and robot manipulators. This application may involve handling both uncertainty in the precise location of a robot or manipulator and computer-controlled interaction with the environment being viewed.

Since its founding in 1978, PRIP lab faculty members have received over five million dollars in research and equipment grants. Research projects have been funded by the National Science Foundation, State of Michigan, Department of Defense, Babcock & Wilcox, DuPont, General Motors, IBM, Innovision, Institute for Defense Analysis, Massey-Ferguson, NASA, Northrop, Peugeot S.A., Siemens, and Texas Instruments.

PRIP faculty have advised a large number of Ph.D. students, and have published over one hundred journal articles, papers in conference proceedings, and book chapters. Six books have been edited or written by PRIP faculty.

PRIP faculty also provide consultation and assistance to projects located in the Composite Materials and Structures Center, the Center for Microbial Ecology, and the Departments of Crop and Soil Science, Agricultural Engineering, Civil Engineering, Mechanical Engineering, and Biomechanics.

Faculty

University Distinguished Professor Anil K. Jain is Lab Director and an Associate Editor of the IEEE Transactions on Neural Networks, The Journal of Pattern Recognition, Pattern Recognition Letters, the Journal of Mathematical Imaging, and the Journal of Intelligent Systems.

Professor George Stockman is an Associate Editor of The Journal of Pattern Recognition.

Assistant Professor John Weng is an Associate Editor of the IEEE Transactions on Image Processing.

Visiting Scholars and Students

These visiting scholars and graduate students are currently working with PRIP faculty.

Equipment

The Department of Computer Science operates its own computer facilities for students, faculty, and staff. The facilities consist largely of Sun servers and workstations, with workstations from Apple, DEC, and SGI for special applications. A full-time Computing Facilities Coordinator and ten half-time graduate assistants keep the equipment running around the clock.

The PRIP lab operates a wide range of equipment to support the activities of its researchers. The lab has a full-time system administrator and a part-time graduate assistant to see that its equipment is running and to help researchers in its use. The following is a list of the major equipment available in the PRIP lab.

Software Packages

The PRIP lab has a large selection of software on its Sun Unix systems. Some of the major packages are:

In addition to these packages, the PRIP group has accumulated over the years a number of useful programs written in the course of research projects.

Datasets

The PRIP lab has online a number of useful datasets:

Current Research Projects

Abstracts of current research projects with recent papers. Theses are listed separately at the end.

3-D Surface and Motion From Images

Two types of problems are being addressed. For monocular sequences, the task is to compute 3-D motion and structure parameters of the scene using the results from stereo matching and image matching. Our contributions include: uniqueness of the solution from line features as well as its stability; closed-form solutions from planar scenes and their optimality; significant improvement in the stability of the closed-form solution using point features; introduction of optimization techniques so that the sensitivity to noise approaches a theoretical lower bound (Cramer-Rao bound); new methods for estimating errors in the closed-form and the optimized solutions; and a dynamic model for the modeling, estimation and prediction of 3-D motion using long image sequences. The algorithms developed have been tested with synthesized data and real-world images.

For stereo image sequences, the contributions include a closed-form approximate matrix-weighted solution for motion and structure from consecutive stereo image pairs, which is better than existing solutions based on feature points with typical stereo setups; and a recursive-batch technique for dealing with long stereo image sequences, which takes advantage of two very different schemes: Kalman filtering and batch processing.

We have also developed algorithms to solve the correspondence problem (binocular stereo) and the structure from motion problem. Our first method is based on multiple image attributes and is one of the first algorithms that can deal with large image disparities and compute dense 3-D depth maps. Another method that we have developed is based on the windowed Fourier phase (WFP). The WFP is quasi-linear and spatially dense, with spatial period and slope controlled by the selected frequency. The WFP includes the zero-crossings and the peaks as special cases, but it contains additional information essential for stable matching. Theoretically, the WFP is complete in representing the signals up to a multiplicative constant. The implementation of the matching algorithm resembles that of a neural network and is well suited for commercially available parallel frame-rate hardware. Experiments have achieved good results with random-dot images and natural images, used either as stereograms or as consecutive views of a moving scene.

Applications of Pattern Recognition and Image Processing

Many projects have been undertaken in cooperation with faculty outside the Department of Computer Science. In addition to providing solutions to specific problems, such projects help create a large database of images useful in verifying our pattern recognition and image processing algorithms. Current areas of research are in remote sensing for land use planning, measurement of root systems for analyzing plant growth, analysis of bacterial culture images, detection of structures in NMR brain scans, processing of technical drawings, classification of fingerprints, analysis of sequences of turbulent flow images, and precise measurement of the human body.

Camera System Calibration

Two types of calibration problems are being considered: one concerns image position and the other image intensity. For the former, new techniques have been developed to compensate for both radial and tangential distortions in a camera lens and to determine the spatial relationships between the camera and the 3-D world. Experiments have shown that the distortion compensation leads to a significant improvement in accuracy. A new normalized measure has been introduced that can be used to objectively evaluate and compare the accuracies of various calibration techniques, despite the parameter differences among the camera systems. Intensity calibration is necessary because the optical lens attenuates light toward the periphery of the image, which typically results in darker corners. We have developed a model for this peripheral attenuation, together with an intensity calibration method that compensates for it.
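
To make the idea of radial and tangential distortion concrete, the following Python sketch applies a widely used polynomial lens model (often called the Brown-Conrady model) to normalized image coordinates and inverts it by fixed-point iteration. This is an assumption chosen for illustration. The brochure does not state which distortion model the lab's calibration technique uses, and the coefficient names k1, k2, p1, p2 follow a common convention rather than the lab's notation.

```python
def distort(x, y, k1, k2, p1, p2):
    """Map ideal normalized coordinates (x, y) to distorted ones.

    k1, k2 are radial coefficients; p1, p2 are tangential coefficients
    (assumed naming, not taken from the brochure).
    """
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


def undistort(xd, yd, k1, k2, p1, p2, iterations=20):
    """Recover ideal coordinates by fixed-point iteration on the model above.

    Converges for mild distortion; it is not guaranteed for extreme lenses.
    """
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y
```

In a calibration setting the coefficients would be estimated from images of known targets; here they are simply inputs.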

Comprehensive Learning Theory and SHOSLIF

Comprehensive visual learning is the treatment of theories and techniques for computer vision systems to automatically learn to understand comprehensive visual information with minimal human-imposed rules about the visual world. The concept of comprehensive learning here implies two coverages: comprehensive coverage of the visual world and comprehensive coverage of vision algorithms. This project investigates reasons for the shortcomings of currently prevailing approaches to computer vision and introduces the promising direction of comprehensive learning towards overcoming these difficulties. The SHOSLIF (Self-Organizing Hierarchical Optimal Subspace Learning and Inference Framework) is a framework that aims to provide a unified theory and methodology for comprehensive visual learning. Its objective is not just to attack a particular vision problem, but a wide variety of vision problems. It addresses critical problems such as how to automatically select the most useful features; how to automatically organize visual information using a coarse-to-fine space partition tree that results in a very low, logarithmic time complexity for retrieving data from a large visual knowledge base; and how to achieve invariance based on learning. This framework has been used to build the following systems: SHOSLIF-O, SHOSLIF-M, SHOSLIF-N, and SHOSLIF-R. The predecessor of the SHOSLIF is the Cresceptron system.

Document Image Analysis

We have shown that the general framework of using Gabor filters to characterize image texture is applicable in several different document image analysis problems. In particular, we have considered the following three problems: text-graphics separation, address block location, and bar code localization. In each one of these problems, the text content or the bar code in the image is considered to define a unique texture that can be easily characterized by a small number of Gabor filters. Both supervised and unsupervised methods have been used to identify regions of text or bar code in the input document images. The same filter parameters have been used in all three problem domains. Experimental results demonstrate the generality and effectiveness of our approach for segmentation and classification of document images. Recent work emphasizes skew detection, separating handwritten and machine printed characters, and page layout segmentation.
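
The sketch below shows, under stated assumptions, what characterizing a texture with a Gabor filter can look like. It builds one even-symmetric Gabor kernel (a Gaussian envelope multiplied by a cosine along a chosen direction) and measures the mean absolute response over an image. The kernel form, the energy measure, and all parameter names are illustrative choices, not the filter bank or parameters used in the lab's experiments.

```python
import math


def gabor_kernel(size, frequency, theta, sigma):
    """Even-symmetric Gabor kernel of odd width `size`.

    frequency is in cycles per pixel, theta in radians, sigma in pixels.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * frequency * xr))
        kernel.append(row)
    return kernel


def texture_energy(image, kernel):
    """Mean absolute filter response over all positions where the kernel fits."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for top in range(h - k + 1):
        for left in range(w - k + 1):
            response = sum(kernel[j][i] * image[top + j][left + i]
                           for j in range(k) for i in range(k))
            total += abs(response)
            count += 1
    return total / count
```

A pattern of vertical stripes with a four-pixel period gives a much larger energy for a filter tuned to that frequency at theta = 0 than for the same filter rotated by 90 degrees. That selectivity is what lets a small set of filters separate text-like or bar-code-like regions from the rest of a page.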

Face and Object Recognition from Intensity Images

The objective of this project is to recognize and segment objects from images, using the SHOSLIF approach. Our system uses the theories of optimal projection for optimal feature selection and a hierarchical structure for low computational complexity. The system can proceed under a supervised, unsupervised, or hybrid learning mode. In the supervised mode, a hierarchy of class labels is provided with each training image. No class labels are given under the unsupervised learning mode, and some training images are labeled in the hybrid mode. In a preliminary experiment, we have trained the system on a diverse set of objects from natural scenes, ranging from human faces to street signs to aerial photographs. Eight hundred images were used for training: 712 human face images (356 classes) and 82 images of other objects (41 classes). The disjoint test set consists of 78 faces and 38 other objects. The top choice (in terms of similarity) was correct for 91% of the human faces and for 87% of the other objects.

Efforts are also underway toward communicating with computers using facial expressions, authorization using recognition of learned faces, and general modeling for presentation and animation. The ability to sense faces in both 2-D and 3-D and the capability of representing faces for computer manipulation are central to these applications.

Image Retrieval using Color, Shape and Texture

Content-based image retrieval has evolved as an interesting and challenging area of research. Large image databases are used in a number of applications, including criminal identification, multimedia encyclopedia, geographic information system, online applications for art and art history, medical image archives, and trademarks. With the increase in the amount of image data, a fast and automatic procedure is required for indexing and retrieval.

Most of the recent work has concentrated on developing a single concise feature like color, shape or texture for retrieval. Single feature-based indexing and retrieval might lack sufficient discriminatory information and might not be able to accommodate large orientation and scale changes. Different features have different invariance properties and therefore, they should be integrated for better retrieval results.

Our goal is to develop an efficient content-based image retrieval scheme for an image database. We have built an image database of trademark images. Our database currently consists of over 400 logotypes. We are able to correctly retrieve images from this database on the basis of color, shape and texture.
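
One simple way to integrate several features for retrieval is to score each feature separately and combine the scores. The Python sketch below does this with histogram intersection as the per-feature similarity and a weighted average as the combination rule. Both choices, and every name in the code, are assumptions made for illustration. The brochure does not describe how the lab's system represents or combines color, shape, and texture.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def combined_similarity(query, candidate, weights):
    """Weighted average of per-feature histogram intersections.

    query and candidate map a feature name to a normalized histogram;
    weights maps the same names to non-negative importance values.
    """
    total = sum(weights.values())
    return sum(weights[name] * histogram_intersection(query[name], candidate[name])
               for name in weights) / total


def retrieve(query, database, weights, top=3):
    """Return the names of the `top` database entries most similar to the query."""
    ranked = sorted(
        database,
        key=lambda name: combined_similarity(query, database[name], weights),
        reverse=True)
    return ranked[:top]
```

With equal weights, an image that matches the query in shape but only partly in color ranks above one that matches in neither, which is the behavior a single-feature index cannot provide.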

Industrial Inspection

Inspection is the process of determining if a product (part, object, or item) deviates from a given set of specifications. Inspection usually involves measurement of specific features such as assembly integrity, surface finish, geometric dimensions, and so on. Automatic inspection is desirable because human inspectors are not consistent, and it has been reported that human visual inspection is at best 80% effective. This level of effectiveness can be achieved only if a rigidly structured set of inspection checks is implemented. Many inspection tasks are time-consuming or boring for humans to perform. For example, human visual inspection has been estimated to account for 10% or more of the total labor cost of manufactured products. Some manufactured part defects are too subtle for detection by a human eye. Machine vision results in lower labor costs and improved quality. Finally, automatic inspection allows objects to be inspected in environments unsafe for humans.

We have been working on a variety of inspection tasks, including nondestructive testing of composite materials, automatic inspection of surface finish, and locating defects in metal castings using range images. We work closely with Innovision Corporation, which designs and builds real-time inspection systems.

  • P. Uthaisombut, D. Guyer, and G. Stockman, "Using machine vision to inspect cherries for cracks and bruises," Tech. Rep. PRIP, Michigan State University, March 1995.
  • T. Newman and A. K. Jain, "A survey of automated visual inspection," CVGIP: Image Understanding, vol. 61, pp. 231-262, March 1995.
  • T. Newman and A. K. Jain, "Bidirectional template matching for 3-D CAD-based inspection," in Proceedings of the SPIE conference on Machine Vision Applications in Industrial Vision, (San Jose, CA), February 1994.
  • T. Newman and A. Jain, "CAD-based inspection of 3-D objects using range images," in Proceedings of the 2nd CAD-Based Vision Workshop, (Champion, PA), pp. 236-243, February 1994.
  • A. K. Jain and M.-P. Dubuisson, "Segmentation of X-ray and C-scan images of fiber reinforced composite materials," Pattern Recognition, vol. 25, pp. 257-270, March 1992.
  • A. K. Jain, M.-P. Dubuisson, and M. S. Madhukar, "Multisensor fusion for nondestructive inspection of fiber reinforced composite materials," in Proceedings of the 6th Conference of the American Society of Composites, (Albany, NY), pp. 941-950, October 1991.
  • A. K. Jain, F. Farrokhnia, and D. Alman, "Texture analysis of automotive finishes," in Proceedings of Vision90, (Detroit, MI), pp. 8.1-8.16, November 1990.

    Integration of Vision Modules

    The issue of integration of vision modules in a total system context is being addressed. Individual cues from visual modules are fallible and often ambiguous. As a result, only integrated vision systems can be expected to give reliable performance in practice. The design of such systems is challenging because each vision module works under different and possibly conflicting sets of assumptions. We have proposed and implemented a multiresolution system that integrates perceptual grouping, segmentation, stereo, shape from shading, and line labeling modules. We demonstrate the efficacy of our approach using images of several different realistic scenes. The output of the integrated system is shown to be relatively insensitive to the constraints imposed by the individual modules. The numerical accuracy of the recovered depth is assessed in the case of synthetically generated data. Finally, we have quantitatively evaluated our approach by reconstructing geons from the depth data obtained from the integrated system. Presently, we are exploring the following enhancements to the existing implementation: inclusion of more feedback paths, inclusion of more vision modules, and a more objective evaluation of our results.

    Autonomous Navigation of Mobile Robots

    Our research in vision-guided mobile robots includes object recognition, path planning, pose estimation, and sensor fusion. The lab has an autonomous robot, called ROME, built upon a LABMATE mobile robot base from Transitions Research Corporation. This platform serves as a testbed for experiments utilizing infrared object detection, ultrasound range finding, and stereo vision. An onboard Sun SPARCstation with SunVideo hardware and extra serial and parallel ports for controlling the robot and receiving sensor data provides the computing power.

    Our current work uses the SHOSLIF framework for autonomous navigation. The task is to control a mobile robot to navigate autonomously in an unstructured (i.e., unknown) environment based on only visual images. No active sensors, such as sonar or infrared proximity sensors, are necessary. The navigation control signals are used to correct heading direction, speed and step distance. In the learning phase, ROME was manually controlled to take pictures at typical positions of a hallway section for training. The intended control signal associated with each scene was also recorded as a desired output. ROME went through three training drives inside a campus building, during which a total of 363 training images was taken, 280 of which were taken from two straight hallways and 83 from a turn. In the test phase, we let ROME navigate autonomously along three straight hallways and two turns, including two trained hallways and one trained turn. In more than 30 runs, ROME successfully navigated along straight sections and turns that it had not learned.

    Markov Random Field Models

    We have examined the role of Markov Random Field (MRF) models in image segmentation and image synthesis. In addition to image modeling, MRF models are useful for incorporating contextual information in decision making. We are particularly interested in using MRF models for texture classification and segmentation, and sensor fusion. Two of our main research thrusts in this area are parameter estimation and parallel implementation.

    Matching Rigid Moving Objects

    In this project we are interested in developing methods for tracking rigid moving objects having arbitrary curved surfaces. Intended applications for this project include intelligent traffic systems, security systems, and general robot vision. Motion of the moving objects in a sequence of color images is used to perform image segmentation and boundary extraction. Motion-based and color-based segmentations are integrated to obtain a reliable contour of the object. A 3-D object is modeled by a set of different 2-D silhouettes. The silhouette of the object observed from any given viewpoint is derived by the curvature method of Basri and Ullman. The derived silhouette is then fitted to the observed silhouette to determine the object pose: fitting is carried out by Newton's method for nonlinear least-squares minimization of fitting parameters. Two different approaches to matching are used. One approach derives salient local features and uses them as matching primitives. In the second approach, correspondence is guided by template matching, where the similarity measure is based on the minimization of the overall Euclidean distance between the derived silhouette and the observed silhouette. Some of these algorithms have been successfully tested with images of moving vehicles on highway ramps and city streets.
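
The pose-fitting step can be illustrated with a much simpler problem than silhouette matching: estimating a 2-D rotation and translation that maps model points onto observed points. The Python sketch below uses Gauss-Newton iteration, a standard approximation to Newton's method for nonlinear least squares. It assumes point correspondences are already known, which the silhouette fitting described above does not necessarily have. The function names and the three-parameter pose are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_pose(model, observed, iterations=20):
    """Gauss-Newton estimate of (theta, tx, ty) mapping model points onto observed ones."""
    theta, tx, ty = 0.0, 0.0, 0.0
    for _ in range(iterations):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        c, s = math.cos(theta), math.sin(theta)
        for (x, y), (ox, oy) in zip(model, observed):
            # Residual of the transformed model point against its observation.
            rx = c * x - s * y + tx - ox
            ry = s * x + c * y + ty - oy
            # Jacobian rows of (rx, ry) with respect to (theta, tx, ty).
            jx = (-s * x - c * y, 1.0, 0.0)
            jy = (c * x - s * y, 0.0, 1.0)
            for a in range(3):
                jtr[a] += jx[a] * rx + jy[a] * ry
                for b in range(3):
                    jtj[a][b] += jx[a] * jx[b] + jy[a] * jy[b]
        # Solve the normal equations (J^T J) step = -J^T r and update the pose.
        step = solve_linear(jtj, [-v for v in jtr])
        theta, tx, ty = theta + step[0], tx + step[1], ty + step[2]
    return theta, tx, ty
```

The sketch needs at least two distinct model points so that the normal equations are well conditioned. A real tracker would also need a starting pose close enough to the true one, for example the pose from the previous frame.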

    Modeling and Recognition of 3-D Objects

    We are investigating extraction and evaluation of features for recognition of three-dimensional objects and construction of object models. The major thrusts of the research are: sensing and feature extraction, modeling of 3-D objects, recognition and pose estimation using matching of object and model features, and integrating mechanical CAD techniques in object modeling. This research will have impact in manufacturing environments for automation of bin-picking, inspection, assembly, and sorting.

    We are currently investigating a new approach for the representation and recognition of 3-D objects with free-form surfaces from dense range data. Our surface representation scheme, COSMOS, describes an object concisely in terms of maximal surface patches of constant shape index. These maximal patches are mapped onto the unit sphere via their orientations, and aggregated via shape spectral functions. Surface properties such as area, curvedness, and connectivity that are required to capture local and global information are also built into the representation. The scheme not only yields a meaningful and rich description useful for recovering the object, but also provides a set of powerful indexing primitives for matching. The intended application of this research is automatic recognition of manufactured and natural objects with free-form surfaces.
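
For readers unfamiliar with the two quantities named above, the following Python sketch computes a shape index and curvedness from the principal curvatures at a surface point. It uses one common formulation, scaled to the range [0, 1], and assumes the curvatures have already been estimated from range data. Which end of the scale means "cup" and which means "cap" depends on the surface normal convention, so treat the labels in the comments as one possible convention rather than the lab's definition.

```python
import math


def shape_index(k1, k2):
    """Shape index in [0, 1] from principal curvatures; None at planar points.

    Under the convention assumed here: 0.0 spherical cup, 0.25 rut,
    0.5 saddle, 0.75 ridge, 1.0 spherical cap.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    if k_max == 0.0 and k_min == 0.0:
        return None  # the index is undefined where the surface is flat
    # atan2 handles umbilic points (k_max == k_min) without dividing by zero.
    return 0.5 - math.atan2(k_max + k_min, k_max - k_min) / math.pi


def curvedness(k1, k2):
    """Magnitude of curvature: how strongly the surface bends, regardless of shape."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)
```

The shape index captures the kind of local shape independently of scale, while curvedness captures how pronounced it is. This is why patches of constant shape index can serve as scale-independent building blocks.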

    Motion Event Recognition

    The objective of this project is to understand temporal events such as hand signs, facial expressions, lip motion during speech, and human body motions, as well as other events that involve more than one object, such as "Tom points to Jim." Clearly, understanding temporal events requires the capability of recognizing static objects in conjunction with their changes. A preliminary investigation of hand sign recognition has been conducted as phase I of the project. In this experiment, 504 training samples covering 28 hand signs such as "angry," "any," "boy," "yes," "cute," "fine," "funny," "girl," "happy," and "hi" were used, and the system reached a correct recognition rate of 98% on a set of 504 independent test sequences. The approach employed is SHOSLIF, and thus the system is called SHOSLIF-M (SHOSLIF for motion understanding).

    Parallel Algorithms for Computer Vision

    Computer vision involves many processing algorithms that demand an enormous amount of memory and computational resources. This research project studies the exploitation of parallelism in computer vision applications and the development of a parallel programming environment for these applications.

    We are studying parallel algorithms for computer vision using a coarse-grained approach on a workstation cluster with PVM, a high-level communication library. We have implemented distributed algorithms for pattern clustering, motion and structure estimation from image sequences, and edge detection and surface reconstruction based on weak membrane models.

    High-performance custom computing platforms can be built easily using field-programmable gate arrays (FPGAs) at an affordable cost, offering both high performance and fast prototyping. Splash 2 is a Xilinx 4010 FPGA-based array processor designed and developed by the Supercomputing Research Center. We are porting many vision applications to Splash 2. A fingerprint matching algorithm for rolled fingerprints has been successfully ported. For many low-level vision applications such as smoothing, edge detection, and morphological operations, we have implemented a generalized filter on Splash 2 with near-ASIC (application-specific integrated circuit) level performance. Currently, we are focusing our efforts on porting two compute-intensive applications to Splash 2, namely feature extraction from fingerprint images and page layout segmentation.

    Region Detection in Medical Images

    The objective of this project is to develop a technique that is reliable, adaptive, and versatile to solve the problem of region detection in a relatively wide class of medical images. Learning is essential in achieving this objective. Learning takes place in two stages: learning for automatic selection of threshold values and learning for automatic selection of the region of interest from candidate regions in the attention map. The result from the second stage is evaluated based on a learned cost measure and the outcome is fed back to the first stage when necessary. This feedback enhances the reliability of the entire system. Experiments have been conducted to approximately locate the endocardium boundaries of the left and right ventricles from gradient-echo MR images. Cardiac CT images have also been used for testing.

    Statistical Pattern Recognition and Artificial Neural Networks

    We have investigated a number of problems dealing with classifier design and exploratory pattern analysis. For example, we have applied bootstrapping (a resampling technique) to the problems of determining the number of clusters in a data set, calculating the width of Parzen windows in density estimation, and classifier error rate estimation. Other problems of interest include curse of dimensionality, feature selection, decision tree design, tests for randomness and multivariate normality, and cluster validity.
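
As an illustration of bootstrapping applied to classifier error-rate estimation, the Python sketch below repeatedly resamples the data with replacement, trains on each resample, and tests on the samples that were left out. This is the leave-out (sometimes called e0) form of the bootstrap, chosen here for brevity. The brochure does not say which bootstrap estimator the lab used, and the 1-nearest-neighbor classifier on 1-D features is an assumption made to keep the example short.

```python
import random


def nearest_neighbor_label(train, x):
    """Label of the training sample closest to x (1-D features for brevity)."""
    return min(train, key=lambda sample: abs(sample[0] - x))[1]


def bootstrap_error(samples, replicates=200, seed=0):
    """Average held-out error of a 1-NN classifier over bootstrap resamples.

    samples is a list of (feature, label) pairs. Assumes at least one
    replicate leaves some sample out, which is all but certain for n >= 2.
    """
    rng = random.Random(seed)
    n = len(samples)
    errors = []
    for _ in range(replicates):
        picked = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        chosen = set(picked)
        held_out = [samples[i] for i in range(n) if i not in chosen]
        if not held_out:
            continue  # every sample was drawn; nothing left to test on
        train = [samples[i] for i in picked]
        wrong = sum(1 for x, label in held_out
                    if nearest_neighbor_label(train, x) != label)
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)
```

Because each replicate tests only on samples absent from its training set, the estimate avoids the optimism of testing on the training data, at the cost of training on fewer distinct samples.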

    We are currently developing a systematic ANN design and training methodology and its applications in pattern recognition and image processing. The salient feature of our research is to draw upon statistical pattern recognition techniques to solve some key aspects of ANN design and learning procedures. We have studied the relationship between the number of training samples and the number of hidden nodes, designed an ANN to implement the k-nearest neighbor decision rule, and have mapped several multivariate data projection algorithms on appropriate networks.

    Texture Analysis

    Texture analysis is an important and useful area of study in machine vision. Even though the diversity of natural and artificial textures makes it impossible to give a universal definition of texture, most natural surfaces exhibit texture and a successful vision system must be able to deal with the textured world surrounding it. We have been involved in different areas of texture analysis: texture segmentation, texture classification, and texture synthesis using techniques such as multi-channel filtering, fractal analysis, and Markov random fields.

    Transitory Image Sequences and Their Integration

    A transitory image sequence is one in which no scene element is visible through the entire sequence. When a camera system scans a scene that cannot be covered by a single view, then the image sequence is called transitory. This project deals with some major theoretical and algorithmic issues associated with the task of estimating structure and motion from transitory image sequences. It is shown that integration with a transitory sequence has properties that are very different from those with a non-transitory one. Two representations, world-centered (WC) and camera-centered (CC), behave very differently with a transitory sequence. The asymptotic error rates indicate that one representation is significantly superior to the other, depending on whether one needs camera-centered or world-centered estimates. Using the Cramer-Rao lower bound on error, we show that these error rates are not only the rates obtained by the proposed algorithm, but also the best rates possible. Based on the error rate analysis, we introduce an efficient "cross-frame" estimation technique for the CC representation. For the WC representation, our analysis indicates that a good technique should be based on camera global pose instead of interframe motions. In addition to testing with synthetic data, rigorous experiments were conducted with real-image sequences taken by a fully calibrated camera system. A comparison of the experimental results with the ground truth has demonstrated that reliable structure information can be obtained from transitory image sequences.

    Vision-Guided Robot Manipulators

    The objective of this project is to develop a highly adaptive method for vision-based object manipulation in an unstructured environment without requiring a complete description of the world and explicit camera calibration. This project uses SHOSLIF-R, SHOSLIF for robot manipulators. Unlike conventional robot systems that depend very much on the availability of accurate global position of the manipulator and objects, the system under development learns, through interactive visual feedback, the unknown and nonlinear relationships among the sensed objects, the hand, and the visual sensors. The robot manipulator is equipped with a visual recognition system called Cresceptron that is used to recognize the objects and the robot hand from images. In the learning phase an adaptive hierarchical network is automatically generated to learn the hand-eye coordination as well as the objects.

    In the performance phase the network controls the manipulator to perform some tasks, such as reaching and picking up a learned object and moving the object to a desired position.

    Recent Theses

    Recent Ph.D. and M.S. theses completed by graduate students in the PRIP lab.

    Photo Collage

    The final sheet contains sequences of images illustrating seven different research projects.

    For More Information

    For information about current research projects in the PRIP lab, please contact:

    Chitra Dorai/Karissa Miller, PRIP Lab Managers
    Department of Computer Science
    A714 Wells Hall
    Michigan State University
    East Lansing, Michigan USA 48824-1027
    

    Email: manager@pixel.cps.msu.edu WWW: http://web.cps.msu.edu/prip
