Cluster Detection via the Sigma Function

Perhaps the most natural notion of clustering in a graph is the clique number, namely, given a graph G what is the largest subgraph isomorphic to a complete graph? While much can be said deterministically (and even probabilistically), computing this statistic is generally very hard. However, the problem of determining large clusters remains of great interest.

One avenue to skirt the difficulty of computing clique number is to relax our problem to a similar, but more computationally tractable approach. Thus, we shall define a statistic known as the sigma-function of a graph, which we denote by σ(G). For our purposes, it suffices to note that the sigma-function gives a real value bounded between the clique and chromatic number. Further, the sigma-function arises from optimization over all positive edge-weightings of G.

In what follows we present the sigma-function of each graph and a drawing. We indicate edge weights using the color scheme below; note that edge weights are normalized so that the largest weight always has value one. The reader who is interested in the definitions and analysis of the sigma-function are encouraged to see a paper co-authored with Fan Chung, found here. The Matlab™ code used to produce these graphs is found at the bottom of the page.


Key


Each graph consists of a drawing (with weighting), a degree plot, basic statistics, and both a Matlab™ and Graphviz DOT file representation.

Largest Component of Erdős 1


edges=130, nodes=33, sigma=6.0270
Matrix File   Dot File

Graph I had lying around


edges=98, nodes=28, sigma=5.0270
Matrix File   Dot File

Dr. Seuss word graph


edges=103, nodes=76, sigma=3
Matrix File   Dot File

Largest Component of NBA Playoff Graph


edges=31, nodes=28, sigma=2.1099
Matrix File   Dot File

Grötzsch Graph


edges=20, nodes=11, sigma=2.3997
Matrix File   Dot File

Erdős-Renyi Random Graph G(100, 6/100)


edges=316, nodes=100, sigma=3.1966
Matrix File   Dot File
Note: This took about an hour to run on math107. Our solver also starts to max out in capability soon after this.

Partial Duplication Largest Component (p=.4, n=300)


edges=86, nodes=52, sigma=5.0000
Matrix File   Dot File

Partial Duplication Largest Component Screwup (p=.1,n=100)


edges=228, nodes=70, sigma=4.0000
Matrix File   Dot File

Payley Graph

This graph is a large Payley which I ran a new iteration of the sigma software on. It also demonstrates our new and improved drawing code.

Matlab Code

Here are the files used to compute the sigma-function and edge weights. A brief description of the formulæ involved can be found in the paper linked at the top of the page. Basically, this program solves an SDP (using the library SeDuMi) and converts the solution into a weight matrix.

To use the attached files, one would need to first install SeDuMi (found here) and have it accessible to Matlab™. Then, having all the below m-files in one place and available to Matlab, one need do the following to run an instance.

>> load example_graph;
>> [W, L, sigma] = sigma_calc_weights(example_graph, 0);
>> sigma_write_dot(example_graph, W, 'example.dot', 0, 0);
The result will be the weight matrix W, the corresponding Laplacian L, and the sigma value sigma. This algorithm works with sparse matrices; indeed, they are highly recommended. The last method, sigma_write_dot, is a utility routine for writing the graph with edge weights in GraphViz format (indeed, this is how the above images were produced). If you actually want to run this code, feel free to contact me, and I will give you further pointers. As you might expect, these files come with no warranty of any kind and no implied support. Use at your own risk. Ross M. Richardson, 2005