Cancer/Pollution Interaction

Welcome to the project visualization page by Anu Pakanati, Daniel Couvertier, and Elos Eden for our CSE 881 Data Mining course at Michigan State University, taught by Dr. Pang-Ning Tan. In this project, we combined data from the National Cancer Institute Database and the Environmental Protection Agency into a single database organized by county, discretized the data into equal-frequency bins using WEKA, and then mined association rules with Christian Borgelt's Apriori implementation. We kept only the rules with a minimum support of 5%, a minimum confidence of 50%, and a lift of at least 10%.
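
For readers unfamiliar with these terms, here is a minimal Python sketch of equal-frequency binning and of the three rule measures named above. It is not the WEKA or Apriori code we ran. The function names, the "ATTRIBUTE=bin" item format, and the toy counties are assumptions made for illustration, and the measures follow the common textbook definitions, which may differ in detail from the ones a particular tool reports.

```python
def equal_frequency_bins(values, n_bins):
    """Return a bin index (0..n_bins-1) per value so bins hold roughly equal counts.

    Assumption: ties are split by rank, which a real tool may handle differently.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is the set of items describing one county.
    """
    rows = [frozenset(t) for t in transactions]
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(rows)
    n_a = sum(1 for t in rows if a <= t)          # rows containing the antecedent
    n_c = sum(1 for t in rows if c <= t)          # rows containing the consequent
    n_ac = sum(1 for t in rows if (a | c) <= t)   # rows containing both
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy data: four counties, one pollutant and one cancer attribute.
counties = [
    {"CO=high", "LUNG=high"},
    {"CO=high", "LUNG=high"},
    {"CO=high", "LUNG=low"},
    {"CO=low", "LUNG=low"},
]
support, confidence, lift = rule_metrics(counties, {"CO=high"}, {"LUNG=high"})
print(support, confidence, lift)
```

A lift above 1 means the consequent appears together with the antecedent more often than it would if the two were independent.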

We are interested in finding rules of the form pollution -> cancer, since those may be (but are not necessarily) indicative of a causal link between pollution levels and cancer rates.
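
As an illustration of what "rules of the form pollution -> cancer" means, the sketch below keeps only rules whose left-hand side contains nothing but pollutant attributes and whose right-hand side contains none. The pollutant codes are the ones listed on this page. The "ATTRIBUTE=bin" item format, the cancer attribute names, and the helper names are hypothetical, and the sketch assumes every non-pollutant attribute in the database is a cancer rate.

```python
# Pollutant codes as listed on this page.
POLLUTANTS = {"CO", "NOX", "VOC", "SO2", "PM10", "PM25", "NH3", "TOTEMIS"}


def attribute(item):
    """Return the attribute name of an item such as 'NOX=high' (assumed format)."""
    return item.split("=", 1)[0]


def is_pollution_to_cancer(antecedent, consequent):
    """True if every antecedent item is a pollutant and no consequent item is."""
    if not antecedent or not consequent:
        return False
    return (all(attribute(i) in POLLUTANTS for i in antecedent)
            and all(attribute(i) not in POLLUTANTS for i in consequent))


# Hypothetical mined rules as (antecedent, consequent) pairs.
rules = [
    ({"NOX=high", "SO2=high"}, {"LUNG=high"}),
    ({"LUNG=high"}, {"NOX=high"}),            # cancer -> pollution: dropped
    ({"CO=high", "LUNG=high"}, {"SKIN=low"}), # mixed antecedent: dropped
]
kept = [r for r in rules if is_pollution_to_cancer(*r)]
print(kept)
```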

Our visualization uses the Google Maps API and a very cool tool, PolyGonzo, by Ernesto Delgado and Michael Geary. Charts were generated offline using Python and Matplotlib.

Below you can interact with the rules that we found. Choose a cancer and one or more pollutants and see what turns up!

Select a Cancer:


Select Pollutants (at least 1):

CO - Carbon Monoxide
NOX - Nitrogen Oxides
VOC - Volatile Organic Compounds
SO2 - Sulfur Dioxide
PM10 - Particulate Matter (10 micrometers in diameter or less)
PM25 - Particulate Matter (2.5 micrometers in diameter or less)
NH3 - Ammonia
TOTEMIS - Total Emissions