Fengjie Li and Wei Tong and Anil Jain and Rong Jin and Jung-Eun Lee
April, 2009
Local image features, such as SIFT descriptors, have been shown to be more effective for content-based image retrieval (CBIR) than global image features. They represent each image by a bag of low-level image features that are often referred to as key points. The similarity between two images is decided by the similarity between the associated sets of key points. The most popular approach for key point based image similarity measure is the clustering-based bag-of-words model. It maps each key point to a visual word in a codebook that is constructed by a clustering algorithm, and represents each image by a histogram of visual words. Despite its success, there are three main shortcomings of the clustering-based bag-of-words model: (i) it is computationally expensive to cluster millions of key points into thousands of visual words; (ii) there is no theoretical analysis on the performance of the bag-of-words model; (iii) it usually does not allow for partial matching between key points. We propose a new scheme for key point quantization that addresses these shortcomings. Instead of clustering, the proposed scheme quantizes each key point into a vector by a collection of randomly generated hyper-spheres, and a bag-of-model is constructed for each image by summing the quantization vectors of all its key points. Our theoretical analysis shows that the resulting image similarity provides an upper bound for the similarity based on the optimal partial matching between two sets of key points. Empirical study on a database of $100,000$ tattoo images with ten million key points shows that (i) the proposed scheme is significantly more efficient than the clustering-based approach for key point quantization, and (ii) it achieves better retrieval accuracy than the clustering-based approach.
You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format.