What one typically wants to know when looking at a ConfusionMatrix is which clusters go together. That is, can we infer from the confusion matrix what the optimal mapping between different clustering classes is? The answer is yes. The problem itself is one of graph-matching. To solve this problem, we implement a graph matcher using an algorithm described by H. Gabow in his 1973 Ph.D. Thesis from Stanford University called N-cubed weighted matching.

While the details are complex, the usage is simple - pass in a confusion matrix and get a matrix with ones where classes match, zeros elsewhere. Let's try an example using our FullEM and KMeans results.

>>> cm.getAdjacencyMatrix() [[1,0,0,0,0,] [0,0,0,0,1,] [0,0,0,1,0,] [0,1,0,0,0,] [0,0,1,0,0,]]

So, from this, we can see that the class mappings are . From this matrix we would be able to perform ``best-guess'' comparative analysis between the two clusterings.

There is also a related method called `getAdjacencyList()`

which returns the mapping which we just derived by hand from the matrix.

>>> cm.getAdjacencyList() [('4', '4'), ('2', '1'), ('3', '3'), ('0', '5'), ('1', '2')]

Another adjacency-type method is the `getAgreementList()`

which will return a list equal to the number of elements in the ConfusionMatrix where each entry is a zero or one. A one indicates that both clustering agree that the given point belongs to the same class, a zero means the disagree. For a perfect clustering, all the entries would be one. This method provides a way to easily subset out the points which are 'stable'. So see how many points are agreed upon in this example, run the following command.

>>> len(filter(None, cm.getAgreementList())) 557

So 557 out of 750, or 74.3% of the datapoints were agreed upon. Notice how similar this percentage is to the NMI scores.