Incorporating Background Knowledge into Data Mining
Team Members:
- Pang-Ning Tan (Faculty advisor)
- Haibin Cheng (PhD student)
- Samah Fodeh (PhD student)
Overview:
Data mining is a large-scale data analysis endeavor that has found
applications in diverse areas such as business intelligence, computer
security, bioinformatics, and the geosciences.
Despite its tremendous promise, data mining is often plagued by a fundamental
problem: the models and patterns derived from data tend to be inferior to
human expertise. High false alarm rates, clusters that are hard to
interpret, and spurious patterns are among the typical
grievances voiced by users who apply data mining techniques in a practical
setting. These problems arise because many data mining techniques start from a
tabula rasa, or blank slate, in which the underlying algorithms have no innate
knowledge of the particular domain. Significant advances must therefore be made if data mining
is to be successfully deployed in critical applications such as computer
security and medical diagnostic systems. This project seeks to improve the
effectiveness of data mining by incorporating background knowledge
automatically into the mining process. Specifically, this project aims to
achieve this goal by developing innovative algorithms for combining information
from multiple sources and exploring new problem areas that may benefit from
using background knowledge.
Publications:
- Haibin Cheng and Pang-Ning Tan. Semi-supervised Learning with Data
Calibration for Long-Term Time Series Forecasting. To appear in Proc of
ACM SIGKDD Int'l Conf on Knowledge Discovery and Data Mining (KDD-2008),
Las Vegas, Nevada, August 24-27 (2008).
- Samah Fodeh and Pang-Ning Tan. Incorporating Background Knowledge for
Subjective Rule Evaluation. In Proc of IEEE Int'l Conf on Tools with
Artificial Intelligence (ICTAI-07), Patras, Greece, October 29-31 (2007).
- Jing Gao, Pang-Ning Tan, and Haibin Cheng. Semi-supervised Clustering
with Partial Background Information. In Proc of SDM'06: SIAM Int'l Conf on
Data Mining, Bethesda, MD, Apr 20-22 (2006).
- Jing Gao, Haibin Cheng, and Pang-Ning Tan. A Novel Framework for
Incorporating Labeled Examples into Anomaly Detection. In Proc of SDM'06:
SIAM Int'l Conf on Data Mining, Bethesda, MD, Apr 20-22 (2006).
- Jerry Scripps and Pang-Ning Tan. Clustering in the Presence of
Bridge-Nodes. In Proc of SDM'06: SIAM Int'l Conf on Data Mining,
Bethesda, MD, Apr 20-22 (2006).
- Pang-Ning Tan and Rong Jin. Ordering Patterns by Combining Opinions from
Multiple Sources. In Proc of the Tenth ACM SIGKDD Int'l Conf on Knowledge
Discovery and Data Mining (KDD-2004), Seattle, WA, Aug 22-25 (2004).