Mining Graphs and Network Data

Team Members:

Overview:

Graph and network mining has moved to the forefront of data mining research, spurred by an avalanche of structured data from applications such as bioinformatics, cheminformatics, online social networks, and sensor networks. Such data is often best represented either as a set of independent graphs or as a single large network of interconnected nodes. Its proliferation presents both an opportunity and a challenge.

Graph mining plays an increasingly important role in the analysis of highly structured data such as chemical compounds, proteins, VLSI designs, and program execution traces. Example applications include predicting protein function (graph classification), searching for compounds with certain substructures (graph similarity search), and detecting software bugs in programs (graph anomaly detection).

Network mining makes it possible to analyze large collections of interrelated objects such as Web graphs, social networks, and biological networks. Common applications include assessing the spread of epidemics (influence maximization), predicting future collaboration between authors (link prediction), and finding authoritative web pages (link-based node ranking).

Traditional algorithms, which typically assume that objects are independent and identically distributed (i.i.d.), are not appropriate for mining such data. Furthermore, some data mining tasks of interest, particularly in network data, have no counterparts in record-based data (e.g., influence maximization and link prediction). Thus, new techniques are needed for modeling and analyzing graph and network data.
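To make the link prediction task mentioned above concrete, the following is a minimal sketch using the common-neighbors heuristic, a simple textbook baseline. It is not the method of any publication listed on this page. The function name `common_neighbor_scores` and the small co-authorship network are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score each currently unlinked node pair by its number of shared neighbors."""
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked, nothing to predict
        shared = len(neighbors[u] & neighbors[v])
        if shared > 0:
            scores[(u, v)] = shared
    return scores


# Hypothetical co-authorship network: an edge means two authors have a joint paper.
edges = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("bob", "dan"), ("cat", "dan"), ("dan", "eve"),
]

scores = common_neighbor_scores(edges)
# Rank candidate links from most to fewest shared co-authors (ties broken by name).
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
for pair, score in ranked:
    print(pair, score)
```

The score of a pair depends entirely on the links around it, not on attributes of the two nodes taken in isolation, which is why a task like this has no direct counterpart in record-based data.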

Publications:

  1. Jerry Scripps, Pang-Ning Tan, Feilong Chen, and Abdol-Hossein Esfahanian. A Matrix Alignment Approach for Link Prediction. To appear in Proc. of the 19th Int'l Conf. on Pattern Recognition (ICPR-08), Tampa, Florida, December 8-11 (2008).
  2. Feilong Chen, Jerry Scripps, and Pang-Ning Tan. Link Mining for a Social Bookmarking Web Site. To appear in Proc. of IEEE/WIC/ACM Int'l Conf. on Web Intelligence (WI-2008), Sydney, Australia, December 9-12 (2008).
  3. Jerry Scripps, Pang-Ning Tan, and Abdol-Hossein Esfahanian. Exploring the Link Structure and Community-based Node Roles in Networked Data. In Proc. of IEEE Int'l Conf. on Data Mining (ICDM-07), Omaha, Nebraska, October 28-31 (2007).
  4. Jerry Scripps, Pang-Ning Tan, and Abdol-Hossein Esfahanian. Node Roles and Community Structure in Networks. In Proc. of WebKDD, San Jose, CA, August (2007).
  5. Kapila Moonesinghe, Hamed Valizadegan, Samah Fodeh, and Pang-Ning Tan. A Probabilistic Substructure-Based Approach for Graph Classification. In Proc. of IEEE Int'l Conf. on Tools with Artificial Intelligence (ICTAI-07), Patras, Greece, October 29-31 (2007).