| Author: | Aditya Vailaya |
| Advisor: | A. K. Jain |
| Email: | vailayaa@cse.msu.edu; http://www.cse.msu.edu/~vailayaa |
Because of the huge number of potentially interesting documents available over the Internet, searching for relevant information has become very difficult. Since images and video are a major source of these data, grouping images into semantically meaningful categories using low-level visual features is an important and challenging problem in content-based image retrieval. Using Bayesian classifiers, we attempt to capture high-level concepts from low-level image features. Specifically, we have developed classifiers for semantic image classification (indoor vs. outdoor, man-made vs. natural, and sunset vs. forest vs. mountain) and object detection (detecting regions of sky and vegetation in outdoor images). We demonstrate that a small codebook extracted from a learning vector quantizer, with the optimal codebook size selected using a modified MDL criterion, can be used to estimate the class-conditional densities of the observed features needed for image classification. To improve classifier performance, we have developed an incremental learning paradigm, a feature selection scheme, a rejection scheme, and a classifier combination strategy using bagging. Empirical results on a large database (~24,000 images) show that semantic categorization and organization of the database using the proposed classification schemes improve both retrieval accuracy and efficiency.
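The idea of estimating class-conditional densities from a small codebook and classifying with Bayes' rule can be illustrated with a minimal sketch. This is not the thesis implementation; it rests on several assumptions. Plain k-means stands in for the learning vector quantizer, each codebook vector carries an isotropic Gaussian kernel of fixed width `sigma`, the codebook size `k` is supplied by hand rather than chosen by the modified MDL criterion, and rejection is a simple threshold on the largest posterior. All function and parameter names (`fit_codebook`, `class_conditional_density`, `classify`, `reject_below`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x, centers):
    """Index of the codebook vector closest to x."""
    return min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))


def fit_codebook(samples, k, n_iter=20, seed=0):
    """Build a k-vector codebook with k-means (assumed stand-in for LVQ).

    Returns the codebook vectors and the fraction of training samples
    assigned to each one, used later as mixture weights.
    """
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(samples, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for s in samples:
            groups[nearest(s, centers)].append(s)
        for j, group in enumerate(groups):
            if group:  # keep the old vector if no sample was assigned
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    counts = [0] * k
    for s in samples:
        counts[nearest(s, centers)] += 1
    weights = [c / len(samples) for c in counts]
    return centers, weights


def class_conditional_density(x, centers, weights, sigma=1.0):
    """Mixture of isotropic Gaussians centred on the codebook vectors."""
    norm = (2.0 * math.pi * sigma ** 2) ** (-len(x) / 2.0)
    return sum(
        w * norm * math.exp(-sq_dist(x, c) / (2.0 * sigma ** 2))
        for c, w in zip(centers, weights)
    )


def classify(x, models, priors, reject_below=0.0):
    """Apply Bayes' rule; return (label, posteriors).

    The label is None when the largest posterior falls below
    reject_below, or when every class density underflows to zero.
    """
    joint = {c: priors[c] * class_conditional_density(x, *models[c])
             for c in models}
    total = sum(joint.values())
    if total == 0.0:
        return None, {}
    post = {c: v / total for c, v in joint.items()}
    best = max(post, key=post.get)
    return (best if post[best] >= reject_below else None), post
```

In use, one codebook would be fitted per class from that class's training features, for example `models = {"indoor": fit_codebook(indoor_features, 5), "outdoor": fit_codebook(outdoor_features, 5)}`, where `indoor_features` and `outdoor_features` are hypothetical lists of feature vectors, and `classify(x, models, priors, reject_below=0.8)` would then return `None` for images whose best posterior is too low to trust.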