Attributed Graph Mining

Illustration of the possible extensions of a pattern with automorphisms.

Active from 2012 to 2014

Research rationale

Graphs are well suited to modeling the complex structures found in the real world. Because of this ubiquity, graphs have been studied extensively in graph theory and, more recently, in data mining. Until the 2000s, most research focused either on unlabeled graphs or on graphs whose nodes carry a single label. However, in many applications, the objects represented by nodes have multiple characteristics, which can be represented as node attributes. Graphs whose nodes are annotated with sets of attributes (or itemsets) are called attributed graphs, and so far only a few studies have been devoted to their analysis.
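To make the definition concrete, here is a minimal sketch of an attributed graph in Python. The class name `AttributedGraph` and its methods are hypothetical and chosen only for illustration; they are not taken from the software listed on this page.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Directed graph whose nodes carry itemsets (hypothetical helper class)."""
    items: dict = field(default_factory=dict)  # node id -> frozenset of attributes
    edges: set = field(default_factory=set)    # set of (source, target) pairs

    def add_node(self, node, attributes):
        # Each node is annotated with a set of attributes, not a single label.
        self.items[node] = frozenset(attributes)

    def add_edge(self, source, target):
        if source not in self.items or target not in self.items:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.add((source, target))

    def successors(self, node):
        return sorted(t for s, t in self.edges if s == node)


# Illustrative data only.
g = AttributedGraph()
g.add_node(1, {"a", "b"})
g.add_node(2, {"b"})
g.add_node(3, {"a", "c"})
g.add_edge(1, 2)
g.add_edge(1, 3)
```

A labeled graph is the special case in which every itemset contains exactly one item, which gives an intuition for why the search space grows so much once itemsets are allowed.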

Mining attributed graphs is very difficult because the search space is much larger than for labeled graphs. Nevertheless, there is a need for effective methods that can both identify hidden structural patterns and highlight the relationships between node attributes.

Results

A new mining algorithm combining itemset extension and structural extension

Our strategy for finding frequent patterns starts from a set of initial patterns composed of nodes associated with a single item and builds the search tree from the spanning tree originating from these nodes, using an order based on a code that we have defined. We proposed a complete method for navigating the search space that combines two extension types: itemset extension and structure extension. We empirically defined the notion of closure on attributed graphs by considering that an attributed graph is closed if it is not included in any other attributed graph with the same support. We have also proposed two concise representations of patterns, defined according to inclusion either on itemsets (c-closed patterns) or on structures (s-closed patterns). We have shown that enumerating c-closed patterns drastically reduces both the number of returned patterns and the execution time. Tests have shown that this condensed representation offers a good compromise between execution speed and conciseness of results 1 2 3.
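The idea of closure with respect to itemset inclusion can be illustrated with a small sketch. It assumes a strong simplification that is not part of the original work: every pattern is a chain of itemsets, and two patterns have the same structure when they have the same length. The function names `itemset_included` and `c_closed` and the supports are hypothetical.

```python
def itemset_included(p, q):
    """True if p and q have the same shape and every itemset of p
    is a subset of the itemset at the same position in q."""
    return len(p) == len(q) and all(a <= b for a, b in zip(p, q))


def c_closed(patterns):
    """Keep the patterns that have no distinct itemset-wise superpattern
    with the same structure and the same support.

    patterns: dict mapping a tuple of frozensets to its support.
    """
    closed = {}
    for p, sup in patterns.items():
        absorbed = any(
            q != p and sup_q == sup and itemset_included(p, q)
            for q, sup_q in patterns.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed


fs = frozenset  # shorthand: fs("ab") is the itemset {a, b}

# Illustrative supports only, not experimental results.
patterns = {
    (fs("a"), fs("b")): 3,
    (fs("ab"), fs("b")): 3,   # same support: absorbs the pattern above
    (fs("ab"), fs("bc")): 2,  # lower support: absorbs nothing
    (fs("a"),): 5,
}
closed = c_closed(patterns)
```

In this toy collection only the first pattern is dropped, because a pattern with larger itemsets, the same structure and the same support carries the same information. The quadratic comparison is for readability only; a real miner would detect closure during the search rather than after it.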

Handling of cycles and isomorphic patterns

Taking cycles in a graph into account required special treatment of the isomorphic patterns that are inevitably generated by every exploration starting from another node of the cycle. Patterns that have many subgraph isomorphisms with the analyzed pattern are difficult for all existing algorithms because the subgraph isomorphism problem is NP-complete. We have proposed two optimizations that make it possible, on the one hand, to prune the search tree generated from an automorphic pattern and, on the other hand, to eliminate certain ways of obtaining automorphic patterns that cannot generate new canonical patterns 4 5.
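The following sketch shows why cycles produce redundant patterns and how a canonical form lets a miner recognize them. It is a brute-force illustration over all node renumberings, usable only on very small patterns, and it is not the code-based ordering or the optimizations described above. The functions `canonical_form` and `automorphism_count` are hypothetical names.

```python
from itertools import permutations


def canonical_form(labels, edges):
    """Smallest (node itemsets, edge list) code over all node renumberings.

    labels: dict node -> iterable of items; edges: iterable of (u, v) pairs.
    Two isomorphic attributed patterns get the same canonical form.
    """
    nodes = list(labels)
    best = None
    for perm in permutations(range(len(nodes))):
        rename = dict(zip(nodes, perm))
        code = (
            tuple(sorted((rename[n], tuple(sorted(labels[n]))) for n in nodes)),
            tuple(sorted((rename[u], rename[v]) for u, v in edges)),
        )
        if best is None or code < best:
            best = code
    return best


def automorphism_count(labels, edges):
    """Number of node permutations preserving both itemsets and edges."""
    nodes = list(labels)
    edge_set = set(edges)
    count = 0
    for perm in permutations(nodes):
        m = dict(zip(nodes, perm))
        same_items = all(set(labels[n]) == set(labels[m[n]]) for n in nodes)
        same_edges = {(m[u], m[v]) for u, v in edge_set} == edge_set
        if same_items and same_edges:
            count += 1
    return count


# The same directed 3-cycle, discovered from two different start nodes.
cycle_from_x = ({0: {"a"}, 1: {"a"}, 2: {"a"}}, [(0, 1), (1, 2), (2, 0)])
cycle_from_y = ({"p": {"a"}, "q": {"a"}, "r": {"a"}},
                [("q", "r"), ("r", "p"), ("p", "q")])
path = ({0: {"a"}, 1: {"a"}, 2: {"a"}}, [(0, 1), (1, 2)])

same_pattern = canonical_form(*cycle_from_x) == canonical_form(*cycle_from_y)
```

Here the two explorations of the cycle yield the same canonical form, so one of them is redundant. The directed 3-cycle with identical itemsets has three automorphisms (its rotations), which is the kind of symmetry that multiplies the ways of reaching the same pattern.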

A new condensed representation of weighted paths

We have addressed the problem of extracting frequent weighted paths in a single attributed directed acyclic graph (aDAG) where each weight expresses the frequency of a transition. Frequent paths are used to analyze the causal relationship between sequences of events and/or attributes. As the number of patterns can be very large, we have designed a condensed representation for such collections 6 7 8.
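As a rough illustration of mining weighted paths in a DAG, the sketch below enumerates every path whose weight reaches a threshold. The page does not state how the weight of a path is computed, so the sketch assumes, purely for illustration, that it is the smallest transition weight along the path. Node attributes are left out for brevity, and `frequent_paths` is a hypothetical name.

```python
def frequent_paths(edges, min_weight):
    """Enumerate paths of a DAG whose bottleneck weight is >= min_weight.

    edges: dict (u, v) -> weight. The graph must be acyclic, otherwise
    the recursion below would not terminate.
    Returns a dict mapping each path (tuple of nodes) to its weight.
    """
    successors = {}
    for (u, v), w in edges.items():
        successors.setdefault(u, []).append((v, w))
    result = {}

    def extend(path, weight):
        for v, w in successors.get(path[-1], []):
            new_weight = min(weight, w)
            # Extending a path can never raise its bottleneck weight,
            # so an infrequent path is never extended further.
            if new_weight >= min_weight:
                new_path = path + (v,)
                result[new_path] = new_weight
                extend(new_path, new_weight)

    nodes = {n for edge in edges for n in edge}
    for n in sorted(nodes):
        extend((n,), float("inf"))
    return result


# Illustrative transition frequencies only.
edges = {("A", "B"): 5, ("B", "C"): 3, ("A", "C"): 1, ("C", "D"): 4}
paths = frequent_paths(edges, min_weight=3)
```

Even on this four-node graph, six paths pass the threshold and several of them are sub-paths of a longer path with the same weight, which gives an intuition for why a condensed representation is useful.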

Integrating mathematical models defined by experts into the extraction process

Noting that in many data science contexts experts have captured part of their knowledge in mathematical models, we have proposed using these models to derive new constraints that can be applied during the data mining phase to improve both pattern relevancy and computational efficiency. We have defined a method for mining patterns under the constraints of a model. We also studied some properties of predicates and constraints in order to use them to optimize pattern computation. We have shown that taking constraints from mathematical models into account makes it possible to better target the analysis while improving performance through model properties. We have thus obtained more relevant patterns that complement or contradict the expert knowledge on the studied phenomena 9 10.
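The sketch below illustrates how a property of a model can be turned into a pruning rule. It is a generic levelwise itemset miner, not the method published in the cited papers. The model is a hypothetical additive score with non-negative weights, so the constraint "score at most `max_score`" is anti-monotone: once an itemset violates it, every superset does too. The item names and weights are invented.

```python
def mine_with_model_constraint(transactions, min_support, model, max_score):
    """Levelwise mining of itemsets that are frequent and satisfy
    model(itemset) <= max_score. Assumes both conditions are anti-monotone."""

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def acceptable(itemset):
        return support(itemset) >= min_support and model(itemset) <= max_score

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if acceptable(frozenset([i]))]
    found = {}
    while level:
        for itemset in level:
            found[itemset] = support(itemset)
        # Join acceptable k-itemsets into (k+1)-candidates; rejected itemsets
        # never reach this step, so their supersets are never generated.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in sorted(candidates, key=sorted) if acceptable(c)]
    return found


# Hypothetical expert model: additive score with non-negative weights.
weights = {"slope": 2, "rain": 1, "bare": 3}


def model(itemset):
    return sum(weights[i] for i in itemset)


transactions = [
    frozenset({"slope", "rain", "bare"}),
    frozenset({"slope", "rain"}),
    frozenset({"rain", "bare"}),
    frozenset({"slope", "rain", "bare"}),
]
found = mine_with_model_constraint(transactions, min_support=2,
                                   model=model, max_score=4)
```

In this toy run the pair {bare, slope} is frequent but is rejected by the model constraint, so neither it nor anything built on it is reported. A constraint without such a monotonicity property could only be checked after enumeration, which is why studying the properties of the predicates matters for performance.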

Funding

Program: ANR Program FOSTER
Year: 2011-2013
Funder: ANR
Grant name: Spatio-temporal data mining: application to the understanding and monitoring of soil erosion
Grant id: ANR-10-COSI-012
Project coordinator: Nazha Selmaoui-Folcher

Software

  • AADAGE: Extraction of frequent patterns in attributed graphs
  • IMIT: Mining frequent patterns in attributed trees

  1. (2013). Frequent Pattern Mining in Attributed Trees. 17th Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD'13), J. Pei et al. (Eds.): PAKDD 2013, Part I, LNAI 7818, pp. 26–37. Springer, Heidelberg (2013). ↩︎

  2. (2013). Extraction de motifs fréquents dans des arbres attribués. 13ème Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC'13), Revue des Nouvelles Technologies de l’Information, volume E-24. ↩︎

  3. ↩︎

  4. (2014). Extraction de motifs dans des graphes orientés attribués en présence d’automorphisme. 14ème Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC'14), Revue des Nouvelles Technologies de l’Information, volume E-26. ↩︎

  5. ↩︎

  6. ↩︎

  7. (2013). Extraction de motifs condensés dans un unique graphe orienté acyclique attribué. 13ème Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC'13), Revue des Nouvelles Technologies de l’Information, volume E-24. ↩︎

  8. ↩︎

  9. (2014). Improving Pattern Discovery Relevancy by Deriving Constraints from Expert Models. 21st European Conference on Artificial Intelligence (ECAI'14), proceedings published by IOS Press in the Frontiers in Artificial Intelligence and Applications series, volume 263. ↩︎

  10. ↩︎

Claude Pasquier
Researcher in Computer Science / Computational Biology

Université Côte d’Azur, CNRS, I3S Laboratory, Sophia Antipolis
