
Graph and network mining has
leaped to the forefront of data mining research, spurred by an avalanche of
structured data from applications such as bioinformatics, cheminformatics,
online social networks, and sensor networks. This structured data is often best
represented either as a set of independent graphs or as a large network of
interconnected nodes. The proliferation of graph and network data is both an
opportunity and a challenge. Graph mining is playing an increasingly important
role in the analysis of highly structured data such as chemical compounds,
proteins, VLSI designs, and program execution traces. Examples of graph mining
applications include predicting protein function (graph classification),
searching for compounds with certain substructures (graph similarity search), and
detecting software bugs in programs (graph anomaly detection). Network mining
offers the opportunity for analyzing large collections of inter-related objects
such as Web graphs, social networks, and biological networks. Common
applications are assessing the spread of epidemics (influence maximization),
predicting future collaboration between authors (link prediction), and finding
authoritative web pages (link-based node ranking). Traditional algorithms,
which typically assume that objects are independent and identically distributed
(i.i.d), are not appropriate for mining such data.
Furthermore, some of the data mining tasks of interest (particularly in network
data) have no counterparts in record-based data (e.g., influence maximization
and link prediction). Thus, new techniques are needed for modeling and
analyzing the graph and network data.