III-CXT: Collaborative Research: Spatio-Temporal Data
Mining For Global Scale Eco-Climatic Data
Pang-Ning Tan, PI
Department of Computer Science and
Engineering
3115 Engineering Building
Michigan State University
East Lansing, MI 48824-1226
Phone (517) 432 9240
E-mail: ptan at cse dot msu dot edu
URL: http://www.cse.msu.edu/~ptan
Context: Remote sensing data, consisting of satellite
observations of the land surface, biosphere, solid Earth, atmosphere, and
oceans, combined with historical climate records and predictions from ecosystem
models, offer new opportunities for understanding how the Earth is changing,
for determining what factors cause these changes, and for predicting future
changes. In turn, this could provide an unprecedented opportunity for
predicting and preventing future ecological problems by managing the ecology
and health of our planet. Data mining and knowledge discovery techniques can
aid this effort by discovering patterns that capture complex interactions among
ocean temperature, air pressure, surface meteorology, and terrestrial carbon
flux.
Intellectual Merit: The
goals of this work are twofold. 1) To better understand global scale patterns
in biosphere processes, particularly patterns in the global carbon cycle and
climate system. More specifically, the proposed data mining research is driven
by the need to address two challenges: (i)
understanding how ocean, atmosphere, and land processes are coupled and (ii)
detecting and predicting ecosystem disturbances such as fires, floods, and
hurricanes. 2) To support innovative Computer Science (CS) research in data
mining. In particular, the spatio-temporal nature of
Earth Science data means that standard CS data mining techniques often cannot
be directly applied. As an example, in Earth Science research, a key step is
the selection of the locations and time periods that are to be used to
investigate possible relationships between two Earth Science phenomena, e.g.,
El Niño and milder winters in the Midwestern United States. Currently, this
selection is based on domain knowledge, but automating this process would be
very beneficial. The proposed work will develop new data mining techniques that
address the high dimensionality, large size, and spatio-temporal
nature of the data.
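The location-selection step described above can be illustrated with a minimal sketch. It is not the project's method: it assumes, purely for illustration, that a climate index (such as an El Niño index) and each grid cell's measurements are available as equal-length time series, and it keeps the cells whose Pearson correlation with the index exceeds a threshold. The function names, the threshold of 0.5, and the synthetic data are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_locations(index, grid, threshold=0.5):
    """Return {location: r} for grid cells whose |r| with the index meets the threshold.

    `grid` maps a (latitude, longitude) tuple to that cell's time series.
    The threshold is an illustrative assumption, not a recommended value.
    """
    selected = {}
    for loc, series in grid.items():
        r = pearson(index, series)
        if abs(r) >= threshold:
            selected[loc] = r
    return selected


# Synthetic data only: a slow oscillation standing in for a climate index,
# one cell that tracks it inversely, and one cell of pure noise.
rng = random.Random(0)
index = [math.sin(2 * math.pi * t / 48) for t in range(120)]
grid = {
    (42.7, -84.5): [-0.8 * v + rng.gauss(0, 0.2) for v in index],
    (10.0, 20.0): [rng.gauss(0, 1) for _ in index],
}
selected = select_locations(index, grid)
```

A simple threshold on correlation ignores time lags and spatial autocorrelation, both of which matter for real Earth Science data; the sketch only shows the shape of the selection problem.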
Broader Impacts: New algorithms and techniques for the analysis of
large spatio-temporal data sets developed in this
project will be made available to other researchers within the Earth Science
community. To a large extent, this project will encapsulate these algorithms
within easy-to-use visual tools so that users can more easily
extract useful knowledge from Earth Science data sets. Although the focus will
be on Earth Science data, the data mining techniques that are developed will be
applicable to a wide variety of other fields that have data collected over time
on a spatial grid. To give a specific example, spatio-temporal
clustering has been used to track cyclones and animal migrations, and to model
mobile phone users and neuronal activity in the brain. The new spatio-temporal
clustering techniques could also prove useful for these applications.
Duration:
3 years
Software:
A MATLAB-based visualization tool for detecting and
clustering ecosystem disturbance events has been developed. The software is
currently being evaluated by our research collaborators Christopher Potter
(NASA Ames) and Steve Klooster (California State
University). A description of the disturbance event viewer is available here. The tools and algorithms developed in the project
will become publicly available in the near future from this web site.
Education: Tan taught a graduate-level data mining course
at Michigan State University in Fall 2007. The course included lectures
on anomaly detection and the application of data mining to the Earth Science
domain.