Abstract
This paper analyses the graph mining problem, and the frequent pattern mining task associated with it. In general, frequent pattern mining looks for a graph which occurs frequently within a network or, in the transactional setting, within a dataset of graphs. We discuss this task in the transactional setting, which is a problem of interest in many fields such as bioinformatics, chemoinformatics and social networks. We look at the graph mining problem from a Knowledge Representation point of view, hoping to learn something about support for higher-order logics in declarative languages and solvers. Graph mining is studied as a prototypical problem; it is easily expressible mathematically and exists in many variations. As such, it appears to be a prime candidate for a declarative approach; one would expect this allows for a clear, structured, statement of the problem combined with easy adaptation to changing requirements and variations. Current state-of-the-art KR languages such as IDP and ASP aspire to be practical solvers for such problems (Bruynooghe, Theory Practice Logic Program. (TPLP) 15(6), 783–817 2015). Nevertheless, expressing the graph mining problem in these languages requires unexpectedly complicated and unintuitive encoding techniques. These techniques are in contrast to the ease with which one can transform the mathematical definition of graph mining to a higher-order logic specification, and distract from the problem essentials, complicating possible future adaptation. In this paper, we argue that efforts should be made towards supporting higher-order logic specifications in modern specification languages, without unintuitive and complicated encoding techniques. We argue that this not only makes representation clearer and more susceptible to future adaptation, but might also allow for faster, more competitive solver techniques to be implemented.
Similar content being viewed by others
References
Abramson, H., Rogers, H.: Meta-Programming in Logic Programming. MIT Press (1989)
Abrial, J.R.: The B-Book. Cambridge University Press. https://doi.org/10.1017/CBO9780511624162 (1996)
Abrial, J.R.: Modeling in Event-B: System and Software Engineering. Cambridge University Press (2010)
Aoga, J.O.R., Guns, T., Schaus, P.: An efficient algorithm for mining frequent sequence with constraint programming. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part II, Lecture Notes in Computer Science, vol. 9852, pp 315–330. Springer (2016), https://doi.org/10.1007/978-3-319-46227-1_20
Babai, L.: Graph isomorphism in quasipolynomial time. CoRR 1512.03547 (2015)
Bogaerts, B., Janhunen, T., Tasharrofi, S.: Solving QBF instances with nested SAT solvers. In: Darwiche, A. (ed.) Beyond NP, Papers from the 2016 AAAI Workshop, Phoenix, Arizona, USA, February 12, 2016., AAAI Workshops. http://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12603, vol. WS-16-05. AAAI Press (2016)
Bowen, J.P.: Formal Specification and Documentation using Z. International Thomson Computer Press (1996)
Brewka, G., Delgrande, J.P., Romero, J., Schaub, T.: asprin: Customizing answer set preferences without a headache. In: AAAI, pp. 1467–1474. AAAI Press (2015)
Bruynooghe, M., Blockeel, H., Bogaerts, B., de Cat, B., Pooter, S.D., Jansen, J., Labarre, A., Ramon, J., Denecker, M., Verwer, S.: Predicate logic as a modeling language: modeling and solving some machine learning and data mining problems with IDP3. Theory Practice Logic Program. (TPLP) 15(6), 783–817 (2015). https://doi.org/10.1017/S147106841400009X
de Cat, B., Denecker, M., Bruynooghe, M., Stuckey, P.J.: Lazy model expansion: Interleaving grounding with search. J. Artif. Intell. Res. 52, 235–286 (2015). https://doi.org/10.1613/jair.4591
Chen, W., Kifer, M., Warren, D.S.: Hilog: A foundation for higher-order logic programming. J. Logic Program. 15(3), 187–230 (1993)
Cuteri, B., Dodaro, C., Ricca, F., Schu̇ller, P.: Constraints, lazy constraints, or propagators in ASP solving: An empirical analysis. Theory Pract. Logic Program. (TPLP) 17(5-6), 780–799 (2017)
Dasseville, I., van der Hallen, M., Janssens, G., Denecker, M.: Semantics of templates in a compositional framework for building logics. Theory Pract. Logic Program. (TPLP) 15(4-5), 681–695 (2015). https://doi.org/10.1017/S1471068415000319
De Cat, B., Bogaerts, B., Bruynooghe, M., Janssens, G., Denecker, M.: Predicate logic as a modelling language: The IDP system. CoRR 1401.6312v2 (2016)
De Raedt, L., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: ACM SIGKDD, pp. 204–212 (2008)
Eiter, T., Fink, M., Ianni, G., Krennwallner, T., Redl, C., Schu̇ller, P.: A model building framework for answer set programming with external computations. Theory Pract. Logic Program. (TPLP) 16(4), 418–464 (2016)
Eiter, T., Ianni, G., Krennwallner, T.: Answer set programming: A primer. In: Reasoning Web, Lecture Notes in Computer Science, vol. 5689, pp. 40–110. Springer (2009)
Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Answer Set Solving in Practice. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers (2012)
Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Clingo = ASP + control: Preliminary report. CoRR 1405.3694 (2014)
Gebser, M., Kaufmann, B., Schaub, T.: Solution enumeration for projected boolean search problems. In: Constraint Programming, Artificial Intelligence and Operations Research (CPAIOR), Lecture Notes in Computer Science, vol. 5547, pp. 71–86. Springer (2009)
Guyet, T., Moinard, Y., Quiniou, R., Schaub, T.: Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks, pp 41–81. Springer International Publishing, Cham (2018)
van der Hallen, M., Janssens, G.: A grounder from second-order logic to qbf. In: Quantified Boolean Formulas, Papers from the 2018 FLoC Quantified Boolean Formulas and Beyond Workshop, Oxford, England, July 8, 2018 (accepted), Federated Logic Conference (FLoC): workshop proceedings (2018)
van der Hallen, M., Paramonov, S., Leuschel, M., Janssens, G.: Knowledge representation analysis of graph mining. CoRR 1608.08956 (2016)
Immerman, N.: Descriptive complexity and model checking. In: Arvind, V., Ramanujam, R. (eds.) Foundations of Software Technology and Theoretical Computer Science, 18th Conference, Chennai, India, December 17-19, 1998, Proceedings, Lecture Notes in Computer Science, vol. 1530, pp 1–5. Springer (1998), https://doi.org/10.1007/978-3-540-49382-2_1
Järvisalo, M.: Itemset mining as a challenge application for answer set enumeration. Logic Programming and Nonmonotonic Reasoning (LPNMR), 304–310 (2011)
Kaufmann, B., Leone, N., Perri, S., Schaub, T.: Grounding and solving in answer set programming. AI Mag. 37(3), 25–32 (2016). http://www.aaai.org/ojs/index.php/aimagazine/article/view/2672
Kemmar, A., Lebbah, Y., Loudni, S., Boizumault, P., Charnois, T.: Prefix-projection global constraint and top-k approach for sequential pattern mining. Constraints 22(2), 265–306 (2017). https://doi.org/10.1007/s10601-016-9252-z
Lamport, L.: Specifying Systems, The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley (2002)
Leuschel, M., Butler, M.J.: ProB: An automated analysis toolset for the B method. STTT 10(2), 185–203 (2008)
Li, H., Yap, C.W., Ung, C.Y., Xue, Y., Cao, Z.W., Chen, Y.Z.: Effect of selection of molecular descriptors on the prediction of bloodbrain barrier penetrating and nonpenetrating agents by statistical learning methods. J. Chem. Inf. Model. 45(5), 1376–1384 (2005). https://doi.org/10.1021/ci050135u. PMID: 16180914
Lonsing, F., Biere, A.: Depqbf: A dependency-aware QBF solver. JSAT 7(2–3), 71–76 (2010). http://jsat.ewi.tudelft.nl/content/volume7/JSAT7_6_Lonsing.pdf
Lonsing, F., Egly, U., Gelder, A.V.: Efficient clause learning for quantified boolean formulas via QBF pseudo unit propagation. In: Järvisalo, M., Gelder, A.V. (eds.) Theory and Applications of Satisfiability Testing - SAT 2013 - 16th International Conference, Helsinki, Finland, July 8-12, 2013. Proceedings, Lecture Notes in Computer Science, vol. 7962, pp 100–115. Springer (2013), https://doi.org/10.1007/978-3-642-39071-5_9
McCarthy, J.: Elaboration tolerance. In: Working Papers of the Fourth International Symposium on Logical formalizations of Commonsense Reasoning, Commonsense-1998 (1998)
Muggleton, S., Raedt, L.D.: Inductive logic programming: Theory and methods. J. Log. Program. 19(/20), 629–679 (1994). https://doi.org/10.1016/0743-1066(94)90035-3
Nijssen, S., Kok, J.N.: Frequent graph mining and its application to molecular databases. In: Proceedings of the IEEE International Conference on Systems, Man & Cybernetics: The Hague, Netherlands, 10-13 October 2004, pp. 4571–4577. IEEE. https://doi.org/10.1109/ICSMC.2004.1401252 (2004)
Paramonov, S., Chen, T., Guns, T.: Generic mining of condensed pattern representations under constraints. In: CEUR: Young Scientist‘s Second International Workshop on Trends in Information Processing Proceedings (YSIP), vol. 1837, pp. 138–177 (2017)
Peitl, T., Slivovsky, F., Szeider, S.: Dependency learning for QBF. In: Gaspers, S., Walsh, T. (eds.) Theory and Applications of Satisfiability Testing - SAT 2017 - 20th International Conference, Melbourne, VIC, Australia, August 28 - September 1, 2017, Proceedings, Lecture Notes in Computer Science, vol. 10491, pp 298–313. Springer (2017), https://doi.org/10.1007/978-3-319-66263-_19
Rückert, U., Kramer, S.: Optimizing feature sets for structured data. In: Kok, J.N., Koronacki, J., de Mántaras, R.L., Matwin, S., Mladenic, D., Skowron, A. (eds.) Machine Learning: ECML 2007, 18th European Conference on Machine Learning, Warsaw, Poland, September 17-21, 2007, Proceedings, Lecture Notes in Computer Science, vol. 4701, pp 716–723. Springer (2007), https://doi.org/10.1007/978-3-540-74958-5_72
Silva, J.P.M., Sakallah, K.A.: GRASP - a new search algorithm for satisfiability. In: International Conference on Computer-Aided Design (ICCAD), San Jose, California, USA, November 10-14 1996, pp. 220–227 (1996)
Takigawa, I., Mamitsuka, H.: Graph mining: Procedure, application to drug discovery and recent advances. Drug Discov. Today 18(1), 50–57 (2013). https://doi.org/10.1016/j.drudis.2012.07.016. http://www.sciencedirect.com/science/article/pii/S1359644612002759
Weinzierl, A.: Blending lazy-grounding and CDNL search for answer-set solving. In: Logic Programming and Nonmonotonic Reasoning (LPNMR), Lecture Notes in Computer Science, vol. 10377, pp. 191–204. Springer (2017)
Yan, X., Han, J.: gspan: Graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE International Conference on Data Mining, ICDM ’02, pp 721–. IEEE Computer Society, Washington (2002)
Acknowledgements
We would like to thank the reviewers for their insightful and helpful comments, thanks to which many improvements could be made.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Matthias van der Hallen is supported by a Ph.D. fellowship from the Research Foundation - Flanders (FWO - Vlaanderen).
Rights and permissions
About this article
Cite this article
van der Hallen, M., Paramonov, S., Janssens, G. et al. Knowledge representation analysis of graph mining. Ann Math Artif Intell 86, 21–60 (2019). https://doi.org/10.1007/s10472-019-09624-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10472-019-09624-y
Keywords
- Knowledge representation
- Higher order
- Graph mining
- Answer set programming
- Imperative declarative programming