III-CXT: Collaborative Research: Spatio-Temporal Data Mining For Global Scale Eco-Climatic Data

National Science Foundation Award Number: 0712987 (August 1, 2007 to July 31, 2010)

 

Contact Information:

Pang-Ning Tan, PI
Department of Computer Science and Engineering
3115 Engineering Building
Michigan State University
East Lansing, MI 48824-1226
Phone: (517) 432-9240
E-mail: ptan at cse dot msu dot edu     URL: http://www.cse.msu.edu/~ptan

List of Supported Students:

Project Award Information:

Project Summary:

Context: Remote sensing data, consisting of satellite observations of the land surface, biosphere, solid Earth, atmosphere, and oceans, combined with historical climate records and predictions from ecosystem models, offer new opportunities to understand how the Earth is changing, to determine what causes these changes, and to predict future changes. In turn, these data could provide an unprecedented opportunity to anticipate and prevent future ecological problems by better managing the ecology and health of our planet. Data mining and knowledge discovery techniques can aid this effort by discovering patterns that capture complex interactions among ocean temperature, air pressure, surface meteorology, and terrestrial carbon flux.

 

Intellectual Merit: The goals of this work are twofold. The first is to better understand global-scale patterns in biosphere processes, particularly in the global carbon cycle and the climate system. More specifically, the proposed data mining research is driven by two challenges: (i) understanding how ocean, atmosphere, and land processes are coupled, and (ii) detecting and predicting ecosystem disturbances such as fires, floods, and hurricanes. The second goal is to support innovative Computer Science (CS) research in data mining. Because Earth Science data are spatio-temporal, standard CS data mining techniques often cannot be applied directly. For example, a key step in Earth Science research is selecting the locations and time periods used to investigate a possible relationship between two phenomena, e.g., El Niño and milder winters in the Midwestern United States. This selection is currently based on domain knowledge, and automating it would be very beneficial. The proposed work will develop new data mining techniques that address the high dimensionality, large size, and spatio-temporal nature of the data.
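As a rough illustration of what automating that selection step could look like, the sketch below uses a naive correlation screen. This is an assumption made for illustration, not the technique developed in this project. It assumes a climate index series (for instance, a monthly El Niño index) and a grid of per-location time series aligned to the same time steps. For each location it computes the Pearson correlation over a chosen time window and keeps the locations whose absolute correlation meets a threshold. The names `pearson` and `select_locations`, the dict-of-series grid layout, the default threshold of 0.5, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if ss_x == 0 or ss_y == 0:
        return 0.0  # a constant series has no linear association
    return cov / (ss_x * ss_y)


def select_locations(index_series, grid, threshold=0.5, start=0, end=None):
    """Hypothetical screen: keep grid cells correlated with a climate index.

    grid maps a (lat, lon) key to a time series aligned with index_series.
    Only the time window [start, end) is used, so different periods can be tried.
    Returns {cell: r} for every cell with |r| >= threshold.
    """
    end = len(index_series) if end is None else end
    window = index_series[start:end]
    selected = {}
    for cell, series in grid.items():
        r = pearson(window, series[start:end])
        if abs(r) >= threshold:  # keep strong positive or negative links
            selected[cell] = r
    return selected


if __name__ == "__main__":
    # Toy data, made up for illustration: one cell tracks the index, one does not.
    index = [0.5, 1.2, -0.3, -1.0, 0.8, 1.5]
    grid = {
        (45.0, -90.0): [1.0, 2.1, -0.2, -1.9, 1.7, 2.8],
        (10.0, 20.0): [0.3, 0.3, 0.2, 0.4, 0.1, 0.3],
    }
    print(select_locations(index, grid, threshold=0.7))
```

With these toy values only the (45.0, -90.0) cell passes the 0.7 threshold. A screen this simple ignores seasonality and spatial autocorrelation, which is part of why the selection problem calls for more specialized spatio-temporal techniques.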

 

Broader Impacts: New algorithms and techniques for the analysis of large spatio-temporal data sets developed in this project will be made available to other researchers in the Earth Science community. To a large extent, the project will package these algorithms in easy-to-use visual tools so that users can more readily extract useful knowledge from Earth Science data sets. Although the focus is on Earth Science data, the data mining techniques developed here will be applicable to a wide variety of other fields that collect data over time on a spatial grid. For example, spatio-temporal clustering has been used to track cyclones and animal migrations and to model mobile phone users and neuronal activity in the brain; the new spatio-temporal clustering techniques could also prove useful in these applications.


Duration: 3 years

Publications:

  1. Haibin Cheng, Pang-Ning Tan. Semi-supervised Learning with Data Calibration for Long-Term Time Series Forecasting. Proc. of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), August 2008.
  2. Haibin Cheng, Pang-Ning Tan, Christopher Potter, Steve Klooster. Data Mining for Visual Exploration and Detection of Ecosystem Disturbances. Proc. of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), November 2008.
  3. Haibin Cheng, Pang-Ning Tan, Christopher Potter, Steve Klooster. A Robust Graph-Based Algorithm for Detection and Characterization of Anomalies in Noisy Multivariate Time Series. ICDM Workshop on Spatial and Spatio-temporal Data Mining (SSTDM 2008), 2008.
  4. Haibin Cheng, Pang-Ning Tan, Christopher Potter, Steve Klooster. Detection and Characterization of Anomalies in Multivariate Time Series. Proc. of the SIAM International Conference on Data Mining (SDM 2009), 2009.
  5. Haibin Cheng, Pang-Ning Tan, Rong Jin. Efficient Algorithm for Localized Support Vector Machine. To appear in IEEE Transactions on Knowledge and Data Engineering, DOI: 10.1109/TKDE.2009.116, 2009.

Contributions to Resources for Research and Education:

Software: A MATLAB-based visualization tool for detecting and clustering ecosystem disturbance events has been developed. The software is currently being evaluated by our research collaborators Christopher Potter (NASA Ames) and Steve Klooster (California State University). A description of the disturbance event viewer is available here. The tools and algorithms developed in the project will be made publicly available on this website in the near future.

 

Education: Tan taught a graduate-level data mining course at Michigan State University in Fall 2007. The course included lectures on anomaly detection and on applying data mining to the Earth Science domain.