Efficient Similarity Search Based on Data Distribution Properties in High Dimensions

Graduate

Author: Jinhua Li
Advisor: Dr. Sakti Pramanik
Email: lijinhua@cse.msu.edu; http://www.cse.msu.edu/~lijinhua

Nearest neighbor search is a fundamental task in many applications. Current state-of-the-art approaches to nearest neighbor search are not efficient in high dimensions. In this poster we present the clustered AB-tree for large high-dimensional databases, which combines an efficient Angle-based Balanced index structure, the AB-tree, with clustering methods. The clustered AB-tree uses heuristics to decide whether to access a node in the multiple index trees, based on the node's estimated angle and weight. Extensive experiments on synthetic and real data demonstrate a significant performance gain of the clustered AB-tree over the SS-tree, another tree-based indexing scheme. We also show that the clustered AB-tree performs better than the VA-file, the best known sequential access method.
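
As a rough illustration of the kind of node-access heuristic described above, the following Python sketch decides whether to visit an index node from an angle and a weight. The abstract does not specify the actual rule, so everything here is an assumption made for illustration: the function names (angle_between, should_visit), the use of a single representative direction vector per node, the interpretation of weight as a point count, and the two thresholds (max_angle, min_weight) are all hypothetical and are not taken from the AB-tree itself.

```python
import math


def angle_between(u, v):
    """Return the angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def should_visit(query, node_direction, node_weight, max_angle, min_weight):
    """Hypothetical pruning rule (an assumption, not the AB-tree's rule):
    visit a node only if the angle between the query and the node's
    representative direction is small enough and the node holds enough
    points (its weight) to be worth the access."""
    close_in_angle = angle_between(query, node_direction) <= max_angle
    heavy_enough = node_weight >= min_weight
    return close_in_angle and heavy_enough


if __name__ == "__main__":
    query = [1.0, 0.0]
    # A node pointing almost the same way as the query is visited.
    print(should_visit(query, [1.0, 0.1], node_weight=10,
                       max_angle=0.2, min_weight=5))
    # A node orthogonal to the query is skipped.
    print(should_visit(query, [0.0, 1.0], node_weight=10,
                       max_angle=0.2, min_weight=5))
```

A rule of this shape trades accuracy for speed: tighter thresholds skip more nodes but risk missing true nearest neighbors, which is why such decisions are described as heuristics rather than exact pruning.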

 
