Finding Second-Order Clubs
Lu, Yajun
Citations
Abstract
Modeling data entities and their pairwise relationships as a graph is a popular technique to visualizing and mining information from datasets in a variety of fields such as social networks, biological networks, web graphs, and document networks. A powerful technique in this setting involves the detection of clusters. Clique, a subset of pairwise adjacent vertices, is often viewed as an idealized representation of a cluster. However, in the presence of errors in the data on which the graph is based, clique requirement may be too restrictive, resulting in small clusters or clusters that miss key members. Consequently, graph-theoretic clique generalizations based on the principle of relaxing elementary structural properties of a clique have been proposed in diverse fields to describe clusters of interest. For example, an s-club is a distance-based clique relaxation originally introduced in social network analysis to model cohesive social subgroups. In this dissertation, we consider low-diameter clusters that require another property like robustness, heredity, or connectedness (parameterized by r) to hold, in addition to the diameter. Specifically, we study s-clubs with side-constraints to make them less “fragile”, i.e., less susceptible to increase in the diameter if vertices (and edges) are deleted. The overall goal of this dissertation is to develop effective exact algorithms with an emphasis on s = 2, 3, 4 and low values of r to solve the maximum r-robust s-club and r-hereditary s-club problems on moderately large instances (around 10,000 vertices and less than 5% density). We analyze the complexity of the associated feasibility testing and optimization problems. Cut-like formulations are proposed for the maximum r-robust s-club problem with r ≥ 2 and s ∈ {2, 3, 4}. We explore preprocessing techniques and develop a graph decomposition approach for solving such problems. The computational benefits of each of the algorithmic ideas are empirically evaluated through our computational studies. Our approach permits us to solve problems optimally on very large and sparse real-life networks.