Information-theoretic graph mining

Real-world data from various application domains can be modeled as graphs, e.g. social networks and biomedical networks such as protein interaction networks or co-activation networks of brain regions. A graph is a powerful concept for modeling arbitrary (structural) relationships among objects. In recent years, the prevalence of social networks has made graph mining a central topic in the data mining field. Graph mining comprises many important tasks, such as graph clustering, outlier detection, and link prediction, and many algorithms have been proposed in the literature to solve them. However, these tasks are usually addressed separately, although they are closely related. Detecting and exploiting the relationships among them is a new challenge in graph mining. Moreover, with the explosion of data, more and more information is being integrated into graph structures. For example, bipartite graphs contain two types of nodes, and graphs with node attributes offer additional non-structural information. This growing graph complexity raises further challenges. This thesis aims to address these challenges in order to gain new knowledge from graph data.

An important data mining paradigm used in this thesis is the principle of Minimum Description Length (MDL). It rests on the assumption that the more knowledge we have learned from the data, the better we are able to compress it. The MDL principle balances the complexity of the selected model against the goodness of fit between model and data, and thus naturally avoids over-fitting. This thesis proposes several algorithms based on the MDL principle to acquire knowledge from various types of graphs. Info-spot (Automatically Spotting Information-rich Nodes in Graphs) is a parameter-free and efficient algorithm for the fully automatic detection of interesting nodes, a novel notion of outliers in graphs.
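To make the MDL trade-off concrete, the following is a minimal, illustrative sketch of a generic two-part code, total cost = L(M) + L(D|M), for an undirected graph under a simple block model. It is not the coding scheme of Info-spot or of any other algorithm in the thesis. The function names, the n·log2(k) cost for cluster labels, and the (1/2)·log2(entries) cost per block density are assumptions chosen for illustration only. The example shows that a labeling which matches the graph's real structure pays more for the model but saves far more on the data, so its total description length is shorter.

```python
import math
from itertools import combinations


def entropy_bits(ones, total):
    """Bits needed to encode `total` binary entries containing `ones` ones,
    using an optimal code based on their empirical frequency."""
    if total == 0 or ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -total * (p * math.log2(p) + (1 - p) * math.log2(1 - p))


def description_length(n, edges, labels):
    """Illustrative two-part MDL cost L(M) + L(D|M) of an undirected graph.

    Assumption: labels[v] is a (hypothetical) cluster id of node v. The model
    cost covers the labels plus one edge density per pair of clusters; the data
    cost entropy-codes every node pair with the density of its block.
    """
    k = len(set(labels))
    ones, total = {}, {}
    for u, v in combinations(range(n), 2):
        block = tuple(sorted((labels[u], labels[v])))
        total[block] = total.get(block, 0) + 1
        if (u, v) in edges or (v, u) in edges:
            ones[block] = ones.get(block, 0) + 1
    # L(M): n * log2(k) bits for the labels (0 when k == 1),
    # plus (1/2) * log2(entries) bits per block density.
    model_bits = n * math.log2(k)
    model_bits += sum(0.5 * math.log2(t) for t in total.values())
    # L(D|M): adjacency entries coded block by block.
    data_bits = sum(entropy_bits(ones.get(b, 0), t) for b, t in total.items())
    return model_bits + data_bits


# Example: two 5-node cliques joined by a single bridge edge.
edges = set(combinations(range(5), 2)) | set(combinations(range(5, 10), 2)) | {(4, 5)}
one_block = [0] * 10
two_blocks = [0] * 5 + [1] * 5
print(round(description_length(10, edges, one_block), 1))
print(round(description_length(10, edges, two_blocks), 1))
```

Here the two-cluster labeling costs about 21.7 bits against roughly 47.6 bits for the single-block model: the extra bits spent on labels and block densities are repaid because the within-clique and between-clique blocks become almost perfectly predictable. This is the sense in which, under MDL, a better model of the data is one that compresses it better.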
Then, in contrast to traditional graph mining approaches that focus on discovering dense subgraphs, a novel graph mining technique, CXprime (Compression-based eXploiting Primitives), is proposed. It models the transitivity and the hubness of a graph using structure primitives (all possible three-node substructures). Under the coding scheme of CXprime, clusters with structural information can be discovered, the dominating substructures of a graph can be distinguished, and a new link prediction score based on substructures is derived. The next algorithm, SCMiner (Summarization-Compression Miner), integrates tasks such as graph summarization, graph clustering, link prediction, and the discovery of the hidden structure of a bipartite graph on the basis of data compression. Finally, a method for non-redundant graph clustering called IROC (Information-theoretic non-Redundant Overlapping Clustering) is proposed, which combines structural and non-structural information based on MDL. IROC is able to detect overlapping communities within subspaces of the attributes. In sum, the thesis proposes algorithms that unify different learning tasks for various types of graphs. Because these algorithms are based on the MDL principle, they facilitate the unification of different graph learning tasks, the integration of different graph types, and the automatic selection of input parameters that are otherwise difficult to estimate.
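The abstract describes CXprime's structure primitives as all possible three-node substructures. For an undirected graph, any three nodes are connected by 0, 1, 2, or 3 edges, giving four possible substructures: empty, single edge, open wedge (a two-edge path), and triangle. The brute-force census below is a hypothetical illustration of those four types and of how wedges and triangles relate to transitivity (the global clustering coefficient, 3 × triangles / connected triples). It is not CXprime's coding scheme, and the function names are assumptions. The O(n³) loop is only meant for small graphs.

```python
from itertools import combinations


def three_node_census(n, edges):
    """Count every 3-node subset of an undirected graph by its number of edges:
    index 0 = empty, 1 = single edge, 2 = open wedge (path), 3 = triangle."""
    adj = {frozenset(e) for e in edges}
    census = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(p) in adj for p in ((a, b), (a, c), (b, c)))
        census[k] += 1
    return census


def transitivity(census):
    """Global clustering coefficient: closed triples / all connected triples.
    A triangle contains 3 connected triples; an open wedge contains 1."""
    triangles, wedges = census[3], census[2]
    connected = 3 * triangles + wedges
    return 3 * triangles / connected if connected else 0.0


# A hub (star) produces many open wedges and no triangles ...
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(three_node_census(5, star))                 # [4, 0, 6, 0]
print(transitivity(three_node_census(5, star)))   # 0.0

# ... while a triangle with one pendant node mixes both primitives.
tri = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(three_node_census(4, tri))                  # [0, 1, 2, 1]
print(transitivity(three_node_census(4, tri)))    # 0.6
```

Read this way, a surplus of open wedges concentrated around a few nodes points to hub-like structure, whereas many triangles relative to wedges point to high transitivity. These are the two properties the abstract says CXprime models.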

Graph Mining, Graph Compression, Link Prediction, Minimum Description Length

Feng, Jing

11. Jun. 2015

2015

English

Universitätsbibliothek der Ludwig-Maximilians-Universität München

https://nbn-resolving.org/urn:nbn:de:bvb:19-183384

Feng, Jing (2015): Information-theoretic graph mining. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik

PDF: Feng_Jing.pdf (1 MB)

DOI: 10.5282/edoc.18338

URN: urn:nbn:de:bvb:19-183384

Document type:	Dissertations (Dissertation, LMU München)
Subject areas:	000 General works, computer science, information science > 004 Computer science
Faculties:	Fakultät für Mathematik, Informatik und Statistik (Faculty of Mathematics, Computer Science and Statistics)
Language of the dissertation:	English
Date of oral examination:	11 June 2015
First referee:	Böhm, Christian
MD5 checksum of the PDF file:	4271fe0ef7bebb9c5a9d977783a186cd
Shelfmark of the printed copy:	0001/UMC 23030
ID Code:	18338
Deposited on:	29 Jun 2015 13:10
Last modified:	23 Oct 2020 22:01
