National Science Foundation Award Number: #0712987 (August 1, 2007 - July 31, 2010)
Contact Information:
Pang-Ning Tan, PI
Department of Computer Science and Engineering
3115 Engineering Building
Michigan State University
East Lansing, MI 48824-1226
Phone: (517) 432-9240
E-mail: ptan at cse dot msu dot edu
URL: http://www.cse.msu.edu/~ptan
List of Collaborators:
- Vipin Kumar (U of Minnesota)
- Michael Steinbach (U of Minnesota)
- Chris Potter (NASA Ames)
- Steve Klooster (NASA Ames)
- Julie Winkler (Michigan State University)
- Sharon Zhong (Michigan State University)
List of Supported Students:
- Haibin Cheng (PhD student)
- Zubin Abraham (PhD student)
- Marie Buckner (Undergraduate student)
Project Award Information:
- Title: III-CXT: Collaborative Research: Spatio-Temporal Data Mining For Global Scale Eco-Climatic Data
- Award Number: #0712987 (joint collaboration with U of Minnesota, Award Number: #0713227)
- Duration: August 1, 2007 - July 31, 2010
- NSF directorate and division: NSF ORG: IIS, Division of Information & Intelligent Systems
- Award Abstract: http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0712987
- Keywords: data mining, spatio-temporal data analysis, predictive modeling, anomaly detection
Project Summary:
The overall goal of this project, in collaboration with researchers from the University of Minnesota and NASA Ames, is to develop novel data mining techniques that enhance our understanding of the complex relationships between the global carbon cycle and the climate system. Towards this end, our research activities at Michigan State University have focused on the following two areas:
- Development of novel anomaly detection algorithms for detecting and characterizing ecosystem disturbance events (such as forest fires, droughts, floods, and deforestation) from global-scale climate and vegetation cover data. The problem is challenging for several reasons. First, it is difficult to establish a formal definition of an anomaly that encompasses all types of physical, biological, and anthropogenic disturbance events. Second, current anomaly detection algorithms are susceptible to noise in the data. Third, distinguishing man-made from climate-induced disturbance events is difficult. Finally, the large volume of data that must be processed has made real-time exploratory analysis of disturbance events computationally infeasible. The research activities of this project are directed toward (1) developing an effective anomaly detection algorithm for disturbance events that is resilient to noise and leverages information from multivariate time series (e.g., climate and vegetation cover data) to improve the detection rate, and (2) developing efficient techniques for processing the large-scale data to facilitate visual exploration of ecosystem disturbances in near real-time.
- Development of predictive modeling techniques for climate impact assessment studies. Scientists are interested in obtaining reliable long-term projections of future climate scenarios to enable accurate assessment of their potential impacts on the ecosystem and society. This work addresses two challenges. First, one way to perform long-term forecasting is to repeatedly invoke a model that makes its predictions one step at a time. However, since the model uses its own predicted values from earlier steps to infer future values, prediction errors can accumulate over the forecast horizon. Our objective is to improve future climate projections by combining historical climate data with future data obtained from computer-generated global climate models, using a semi-supervised learning approach. Second, although effective predictive modeling methods such as support vector machines (SVM) have been developed in the machine learning literature, adapting them to spatial and temporal data is not a trivial task. Because these models are constructed globally from the entire training data, they do not take into account constraints arising from the local spatial and temporal neighborhoods of the data points. Our objective is to develop local support vector machine models that take into account the spatial and temporal neighborhood of the data.
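To make the error-accumulation argument concrete, the toy sketch below compares one-step-ahead errors (each prediction uses the observed previous value) with recursive multi-step errors (each prediction is fed back in as the next input). The series, the 0.25 bias, and all function names are hypothetical illustrations, not project data, and the sketch does not implement the semi-supervised calibration approach itself.

```python
from typing import Callable, List


def true_step(x: float) -> float:
    """Hypothetical ground truth: the series rises by exactly 1.0 per step."""
    return x + 1.0


def model_step(x: float) -> float:
    """Hypothetical fitted one-step model with a small bias of +0.25 per step."""
    return x + 1.25


def one_step_errors(series: List[float], model: Callable[[float], float]) -> List[float]:
    """Each prediction uses the observed previous value, so errors do not compound."""
    return [abs(model(series[t]) - series[t + 1]) for t in range(len(series) - 1)]


def recursive_errors(series: List[float], model: Callable[[float], float]) -> List[float]:
    """Each prediction uses the model's own previous prediction as its input."""
    errors = []
    pred = series[0]
    for t in range(1, len(series)):
        pred = model(pred)  # feed the last prediction back in
        errors.append(abs(pred - series[t]))
    return errors


if __name__ == "__main__":
    series = [0.0]
    for _ in range(5):
        series.append(true_step(series[-1]))
    print("one-step: ", one_step_errors(series, model_step))
    print("recursive:", recursive_errors(series, model_step))
```

Running the script prints a constant one-step error of 0.25 but a recursive error that grows by 0.25 per step (0.25, 0.5, 0.75, 1.0, 1.25). Even a small per-step bias compounds once predictions replace observations as inputs.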
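The idea of a local model can be sketched as follows: for each query point, train a linear SVM only on the k training samples nearest in space and time, instead of on the entire training set. This is a simplified illustration under hypothetical assumptions (a 1-D spatial coordinate, a Euclidean space-time distance with a made-up `time_weight` parameter, and a linear SVM fit by full-batch subgradient descent on the hinge loss). It is not the efficient localized SVM algorithm developed in this project.

```python
import math
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    s: float  # hypothetical 1-D spatial coordinate (e.g., a grid-cell position)
    t: float  # time index
    x: float  # predictor value (e.g., a climate variable)
    y: int    # class label, +1 or -1


def train_linear_svm(xs: List[float], ys: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[float, float]:
    """Linear SVM (w, b) fit by full-batch subgradient descent on the
    L2-regularized average hinge loss: lam/2 * w^2 + mean(max(0, 1 - y(wx + b)))."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw, gb = lam * w, 0.0
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1.0:  # hinge term is active for this point
                gw -= y * x / n
                gb -= y / n
        w -= lr * gw
        b -= lr * gb
    return w, b


def neighborhood_distance(s1: float, t1: float, s2: float, t2: float,
                          time_weight: float = 1.0) -> float:
    """Hypothetical space-time distance; time_weight trades off time vs. space."""
    return math.hypot(s1 - s2, time_weight * (t1 - t2))


def local_svm_predict(train: List[Sample], s: float, t: float, x: float,
                      k: int = 4, time_weight: float = 1.0) -> int:
    """Train an SVM only on the k samples nearest to (s, t), then classify x."""
    nbrs = sorted(train, key=lambda p: neighborhood_distance(s, t, p.s, p.t, time_weight))[:k]
    w, b = train_linear_svm([p.x for p in nbrs], [p.y for p in nbrs])
    return 1 if w * x + b >= 0 else -1


if __name__ == "__main__":
    # Two regions where the same predictor has opposite effects on the label.
    pairs = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)]
    region_a = [Sample(0.0, float(t), x, y) for t, (x, y) in enumerate(pairs)]
    region_b = [Sample(10.0, float(t), x, -y) for t, (x, y) in enumerate(pairs)]
    train = region_a + region_b

    print(local_svm_predict(train, s=0.5, t=1.0, x=0.5))  # 1 (neighbors all in region A)
    print(local_svm_predict(train, s=9.5, t=1.0, x=0.5))  # -1 (neighbors all in region B)

    # A single global model sees conflicting evidence; here the gradients cancel
    # exactly, so it stays at w = b = 0 and cannot separate either region.
    print(train_linear_svm([p.x for p in train], [p.y for p in train]))
```

In this toy setup the local models recover opposite decision rules for the two regions, while the global model learns nothing useful. Retraining per query is expensive, which is why efficiency is a central concern for localized SVMs.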
Training and Development:
The grant has supported 2 PhD students and 1 female undergraduate student. The project provides research and practical
experience that trains students to conduct interdisciplinary research in Computer Science, with application to the
Earth Science domain.
Publications:
- Shyam Boriah, Vipin Kumar, Michael Steinbach, Pang-Ning Tan, Chris Potter, and Steve Klooster. Detecting Ecosystem Disturbances and Land Cover Change using Data Mining, in Next Generation of Data Mining, Chapman & Hall/CRC, 2008.
- Haibin Cheng, Pang-Ning Tan. Semi-supervised Learning with Data Calibration for Long-Term Time Series Forecasting, Proc of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2008.
- Haibin Cheng, Pang-Ning Tan, Christopher Potter, Steve Klooster. Data Mining for Visual Exploration and Detection of Ecosystem Disturbances, Proc of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), November 2008.
- Haibin Cheng, Pang-Ning Tan, Christopher Potter, Steve Klooster. A Robust Graph-Based Algorithm for Detection and Characterization of Anomalies in Noisy Multivariate Time Series, ICDM Workshop on Spatial and Spatio-temporal Data Mining (SSTDM 2008), 2008.
- Haibin Cheng, Pang-Ning Tan, Christopher Potter, Steve Klooster. Detection and Characterization of Anomalies in Multivariate Time Series, Proc of the SIAM International Conference on Data Mining, 2009.
- Haibin Cheng, Pang-Ning Tan, Rong Jin. Efficient Algorithm for Localized Support Vector Machine, IEEE Transactions on Knowledge and Data Engineering, 22(4): 537-549, 2009.
- Zubin Abraham, Pang-Ning Tan. A Semi-supervised Framework for Simultaneous Classification and
Regression of Zero-Inflated Time Series Data with Application to Precipitation Prediction,
Proc of the IEEE Workshop on Spatial and Spatiotemporal Data Mining, 2009.
- Zubin Abraham, Pang-Ning Tan. An Integrated Framework for Simultaneous Classification and Regression of Time Series Data, Proc of the SIAM International Conference on Data Mining, 2010.
Presentations:
- Pang-Ning Tan, Analysis and Modeling of Eco-climatic Data, Virginia Tech (2008).
- Pang-Ning Tan, Data Mining for Analysis and Modeling of Eco-climatic Data, George Mason University (2008).
- Pang-Ning Tan, Pattern Discovery and Predictive Modeling of Earth Science Data, Universitas Indonesia (2009).
- Pang-Ning Tan, Predictive Modeling of Earth Science Data, Michigan State University, presentation to UCAR site visitors from the National Center for Atmospheric Research (2010).
Online Software:
- Code for time series anomaly detection.
- Code for our SDM 2010 paper.
Broader Impacts:
The techniques developed in this project have been applied to real-world problems in the Earth Science domain
(disturbance event detection and statistical downscaling). In collaboration with researchers at the University of Minnesota
and NASA Ames, we developed an interactive viewer that incorporates the disturbance event detection algorithm. The
viewer serves as a tool to assist our Earth Science collaborators in exploring the events derived from eco-climatic data.