Collaborative Research: Support Efficient Similarity Searches for Multidimen=
sional Non-ordered Discrete Data Spaces
Supported by the National Science Foundation=
Research Grants
=
Principal Investigator at MSU
Dr. Sakti Pramanik
Department of Computer Science and Engineering,
Michigan State University, East Lansing, MI, USA
pramanik@msu.edu
(NSF Grant for MSU: IIS-0414576)
Principal Investigator at UM-D
Dr. Qiang Zhu
Department of Computer and Information Science,
The University of Michigan, Dearborn, MI, USA
qzhu@umich.edu
(NSF Grant for UM-D: IIS-0414594)
Graduate Students
Changqing Chen
Dashiell Matthews Kolbe
Hyun-Jeong Seok
Alok Watve
Qiang Xue
Undergraduate Students
Chad Klochko
Rachel Radziszewski
Alumni
Gang Qian
Project Overview
This collaborative research project conducted jointly by the Michigan State =
University and the University of Michigan at Dearborn is to investigate the =
issues and techniques for supporting efficient similarity searches in multid=
imensional Non-ordered Discrete Data Spaces (NDDS). Similarity searches in N=
DDSs are becoming increasingly important for application areas such as genom=
e sequence databases. Efficient similarity searches require robust indexing =
techniques. Unfortunately, existing indexing methods are either not suitable=
for an NDDS (e.g., the R*-tree) or too generic to provide good performance =
for an NDDS (e.g., metric trees). The main goal of this project is to study =
the fundamental properties of NDDSs and develop indexing methods exploiting =
these properties to support efficient similarity searches in NDDSs. We intro=
duce a set of essential geometric concepts for an NDDS based on some similar=
concepts for a traditional (ordered) Continuous Data Spaces (CDS). Using th=
em, we explore and compare promising data-partitioning (DP) based and space-=
partitioning (SP) based indexing techniques for NDDSs, including index tree =
structures, building strategies, search algorithms and performance models. W=
e also study other related issues including supporting various types of quer=
ies, adopting different distance measures, indexing hybrid data spaces with =
mixed ordered and non-ordered dimensions, developing efficient bulk loading =
techniques, and utilizing effective compression schemes. This research will =
provide contemporary database indexing techniques to solve relevant issues i=
n the emerging fields such as bioinformatics, biometrics and E-commerce.
Project References
-
Dashiell Kolbe, Qiang Zhu and Sakti Pramanik, "Efficient k-Nearest Neighbor =
Searching in Non-ordered Discrete Data Spaces", ACM Transactions on Infor=
mation Systems (TOIS), 2009 (to appear).
-
Hyun-Jeong Seok, Gang Qian, Qiang Zhu, Alexander Oswald and Sakti Pramanik,
"Bulk-Loading the ND-Tree in Non-ordered Discrete Data Spaces", Proc=
. of 13th International Conference on Database Systems for Advanced Applicat=
ions (DASFAA'08), pp 156-171, New Delhi, India, March 19 - 22, 2008.
-
Qiang Xue, James Cole and Sakti Pramanik, "Sequence Homology Search Based on=
Database Indexing Using the Profile Hidden Markov Model", Proc. of IEEE =
International Conference on Bioinformatics and Bioengineering (BIBE'06),=
pp 135 - 140, Washington, DC, Oct. 16-18, 2006.
Project Web Sites
Web site
at MSU
Web sit=
e
at UM-D