Collaborative Research: Support Efficient Similarity Searches for Multidimensional Non-ordered Discrete Data Spaces

Supported by the National Science Foundation Research Grants

Principal Investigator at MSU

Dr. Sakti Pramanik
Department of Computer Science and Engineering,
Michigan State University, East Lansing, MI, USA
pramanik@msu.edu
(NSF Grant for MSU: IIS-0414576)
 

Principal Investigator at UM-D

Dr. Qiang Zhu
Department of Computer and Information Science,
The University of Michigan, Dearborn, MI, USA
qzhu@umich.edu
(NSF Grant for UM-D: IIS-0414594)
 

Graduate Students

Changqing Chen
Dashiell Matthews Kolbe
Hyun-Jeong Seok
Alok Watve
Qiang Xue

Undergraduate Students

Chad Klochko
Rachel Radziszewski

Alumni

Gang Qian

Project Overview

This collaborative research project conducted jointly by the Michigan State University and the University of Michigan at Dearborn is to investigate the issues and techniques for supporting efficient similarity searches in multid imensional Non-ordered Discrete Data Spaces (NDDS). Similarity searches in N DDSs are becoming increasingly important for application areas such as genom e sequence databases. Efficient similarity searches require robust indexing techniques. Unfortunately, existing indexing methods are either not suitable for an NDDS (e.g., the R*-tree) or too generic to provide good performance for an NDDS (e.g., metric trees). The main goal of this project is to study the fundamental properties of NDDSs and develop indexing methods exploiting these properties to support efficient similarity searches in NDDSs. We intro duce a set of essential geometric concepts for an NDDS based on some similar concepts for a traditional (ordered) Continuous Data Spaces (CDS). Using th em, we explore and compare promising data-partitioning (DP) based and space- partitioning (SP) based indexing techniques for NDDSs, including index tree structures, building strategies, search algorithms and performance models. W e also study other related issues including supporting various types of quer ies, adopting different distance measures, indexing hybrid data spaces with mixed ordered and non-ordered dimensions, developing efficient bulk loading techniques, and utilizing effective compression schemes. This research will provide contemporary database indexing techniques to solve relevant issues i n the emerging fields such as bioinformatics, biometrics and E-commerce.

Project References

Project Web Sites

  • Web site at MSU
  • Web site at UM-D

  • --=_2byo1chufy6p--