Math 176 - Data Structures
Programming Assignment #3
Connected Components (Union-Find)

Due date: Friday, November 10 at 6:00 PM.
Total points for this assignment: 75 points.

In this homework assignment, you will write routines for keeping track of the connected components of an undirected graph with online (real-time, interactive) methods. This is essentially the same as a union-find algorithm, using union by size and path compression. One new feature of the connected components routines you will write is that you have to keep track of the minimum-numbered node in each connected component.

You will write a class `ConnectedComponents` whose functionality exactly matches the routines described in the online documentation for `ConnectedComponents`. This includes the following routines:

1. A constructor that creates a graph on N nodes with no edges (N may be any positive integer). The vertices of the graph are numbered with integers from 0 to N-1.
2. `addEdge(i,j)` - adds an edge from vertex number `i` to vertex number `j`. The connected component information must be updated accordingly. If the two vertices were not already in the same connected component, then this routine must return the number, M, of the smallest-numbered vertex in the new enlarged connected component. If the two vertices were already in the same connected component before the new edge was added, it returns the negative number -(M+1), where M is the smallest-numbered vertex in that component.
3. `minConnected(i)` - returns the number of the smallest-numbered vertex that is in the same connected component as vertex number `i`.
4. `areConnected(i,j)` - returns true if vertices `i` and `j` are in the same connected component of the graph.

In addition, your routines must keep statistics on how many pointers are traversed in the routines `minConnected`, `areConnected`, and `addEdge`. For example, in `minConnected`, if vertex `i` is the root of the tree storing the elements of vertex `i`'s connected component, then zero pointers are traversed. On the other hand, if it is not the root vertex in its connected component, then at least one pointer will be traversed. The routine

• `getNumPointerTraversals()` - returns the total number of pointer traversals.
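
The routines above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the class name `ConnectedComponentsSketch`, the array names, and the `long` return type of `getNumPointerTraversals` are hypothetical, and the online documentation remains the authority on exact signatures. The sketch also assumes that only parent pointers followed while searching for the root are counted (the writes done by path compression are not), and that for an edge inside an existing component the value returned is -(M+1) with M the smallest vertex of that component.

```java
// Sketch only: names, counting convention, and return types are assumptions.
final class ConnectedComponentsSketch {
    private final int[] parent;   // parent[v] == v marks v as a root
    private final int[] size;     // size[r] is meaningful only when r is a root
    private final int[] minNode;  // minNode[r] is meaningful only when r is a root
    private long numPointerTraversals = 0;

    public ConnectedComponentsSketch(int n) {
        parent = new int[n];
        size = new int[n];
        minNode = new int[n];
        for (int v = 0; v < n; v++) {
            parent[v] = v;   // every vertex starts as its own component
            size[v] = 1;
            minNode[v] = v;
        }
    }

    // Finds the root of v, counting one traversal per parent pointer followed,
    // then points every vertex on the path directly at the root.
    private int find(int v) {
        int root = v;
        while (parent[root] != root) {
            root = parent[root];
            numPointerTraversals++;
        }
        while (parent[v] != root) {   // path compression (not counted here)
            int next = parent[v];
            parent[v] = root;
            v = next;
        }
        return root;
    }

    public int addEdge(int i, int j) {
        int ri = find(i);
        int rj = find(j);
        if (ri == rj) {
            return -(minNode[ri] + 1);   // already connected
        }
        if (size[ri] < size[rj]) {       // union by size: ri becomes the larger root
            int t = ri;
            ri = rj;
            rj = t;
        }
        parent[rj] = ri;
        size[ri] += size[rj];
        minNode[ri] = Math.min(minNode[ri], minNode[rj]);
        return minNode[ri];
    }

    public int minConnected(int i) {
        return minNode[find(i)];
    }

    public boolean areConnected(int i, int j) {
        return find(i) == find(j);
    }

    public long getNumPointerTraversals() {
        return numPointerTraversals;
    }

    public static void main(String[] args) {
        ConnectedComponentsSketch cc = new ConnectedComponentsSketch(6);
        System.out.println(cc.addEdge(4, 5));              // 4
        System.out.println(cc.addEdge(5, 2));              // 2
        System.out.println(cc.addEdge(2, 4));              // -3
        System.out.println(cc.minConnected(5));            // 2
        System.out.println(cc.areConnected(0, 5));         // false
        System.out.println(cc.getNumPointerTraversals());  // 4
    }
}
```

Storing the minimum only at the root keeps `addEdge` at constant extra work per union: the new root's minimum is the smaller of the two old roots' minimums.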

By the inverse Ackermann upper bound, you should expect that only a very small number of extra pointers need to be traversed per operation. I will provide you with a main program, called CcStatistics.java, that runs trials for you to gather statistics about numbers of pointer traversals. The main program will allow you to gather the following kinds of statistics:

• You will specify a number N of vertices in the graph.
• You will specify whether to join random vertices or whether to connect random connected components.
• The program will then add edges until either (a) the entire graph is connected and there is a single connected component, or (b) there are 10 edges added in a row that do not reduce the number of connected components.

Gather statistics for several values of N. Use N = 100, 1000, 10000, 100000, 1000000 (if your computer cannot go as high as one million, use a large value for N close to the maximum attainable). As usual, remember to increase the heap size with the -X command line options.

Please note that this main program will not test the accuracy of your code. It will only gather statistics. You are responsible for ensuring the accuracy of your code with your own test programs.
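
One possible way to build such a test program is to compare your class against a slow but obviously correct reference on random inputs. The sketch below is only an illustration: `NaiveComponents` and its relabeling approach are hypothetical and not part of the provided materials, and it assumes the same return conventions for `addEdge` as described above. It is far too slow for large N, so it is meant for small graphs only.

```java
import java.util.Random;

// Brute-force reference (hypothetical helper): label[v] always holds the
// smallest vertex number in v's component, so queries are array lookups.
final class NaiveComponents {
    private final int[] label;

    NaiveComponents(int n) {
        label = new int[n];
        for (int v = 0; v < n; v++) {
            label[v] = v;
        }
    }

    int addEdge(int i, int j) {
        int a = label[i];
        int b = label[j];
        if (a == b) {
            return -(a + 1);   // already in the same component
        }
        int lo = Math.min(a, b);
        int hi = Math.max(a, b);
        for (int v = 0; v < label.length; v++) {
            if (label[v] == hi) {
                label[v] = lo;   // relabel the merged component with the smaller minimum
            }
        }
        return lo;
    }

    int minConnected(int i) {
        return label[i];
    }

    boolean areConnected(int i, int j) {
        return label[i] == label[j];
    }

    public static void main(String[] args) {
        int n = 50;
        NaiveComponents ref = new NaiveComponents(n);
        Random rng = new Random(176);   // fixed seed so a failure can be reproduced
        for (int step = 0; step < 200; step++) {
            int i = rng.nextInt(n);
            int j = rng.nextInt(n);
            int expected = ref.addEdge(i, j);
            // Compare against your own class here, for example with a
            // hypothetical variable `mine` of type ConnectedComponents:
            //   if (mine.addEdge(i, j) != expected) throw new AssertionError("step " + step);
            if (!ref.areConnected(i, j)) {
                throw new AssertionError("endpoints not connected at step " + step);
            }
        }
        System.out.println("reference run finished");
    }
}
```

Checking `minConnected` and `areConnected` on a few random vertices after each step, in the same way, would also exercise the path compression code.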

Source Materials: You should look for the HTML documentation for CcStatistics.java and ConnectedComponents.java. You will find the source code for CcStatistics.java and for HashSetLinear.java in the directory ProgHomework3 on ieng9. You should get a copy of the source code files and compile them yourself. This version of HashSetLinear is a special one that includes a routine getRandomElement, which is used by the CcStatistics program.
There is no full tester for the ConnectedComponents class. However, some kinds of errors in ConnectedComponents will cause errors to occur in CcStatistics, so this will give you a partial test of your program.

Turnin materials: You must turn in the following:

• Your source code, in a file ConnectedComponents.java. As usual, this code must be compilable on its own without additional files.
• A README file. This should include a table showing the results of your statistics-gathering tests. Do your experiments conform with what you would expect from the theorem stated in class about the cost of the union-find with path compression algorithm? Explain.
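
As a reminder for the README discussion, the bound usually quoted for union-find with union by size and path compression can be written as below. The symbols $T$ (total pointer traversals) and $m$ (number of operations) are introduced here for illustration, and the theorem stated in class may be phrased differently:

$$
T(m, N) = O\bigl((m + N)\,\alpha(N)\bigr)
$$

Here $\alpha$ is the inverse Ackermann function, which grows so slowly that it is at most 4 under common definitions for any N that fits in memory. Under this bound, one would expect the average number of pointer traversals per operation to stay bounded by a small constant as N increases by powers of ten.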

This programming assignment is covered by the usual Academic Integrity Guidelines for programming assignments.