A confusion matrix shows the level of agreement (or disagreement) between two classifications. Building on the previous example, let's see exactly how much the two clusterings agree.
>>> from compClust.score.ConfusionMatrix import *
>>> cm = createConfusionMatrixFromLabeling(kmeans_labeling,
...                                        fullem_labeling)
>>> cm.printCounts()
140  78   0   0   3
 12   0   0   0 140
  0   0   0 105   0
  0  85 100   0   0
  0   0  87   0   0
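As an aside, the tallying that printCounts reports is easy to reproduce in plain Python. The sketch below is illustrative only, not compClust's implementation; it assumes two equal-length sequences of cluster labels:

>>> from collections import Counter
>>> def confusion_counts(labels_a, labels_b):
...     # Count how often each (label_a, label_b) pair co-occurs,
...     # then lay the counts out with one row per label_a value.
...     pairs = Counter(zip(labels_a, labels_b))
...     rows = sorted(set(labels_a))
...     cols = sorted(set(labels_b))
...     return [[pairs[(r, c)] for c in cols] for r in rows]
...
>>> confusion_counts([0, 0, 1, 1], [1, 1, 0, 0])
[[0, 2], [2, 0]]

Each cell (i, j) counts the items placed in cluster i by the first labeling and cluster j by the second, so heavy cells mark clusters that correspond across the two results.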
Our assumption appears to be well-founded. Most of the entries in the confusion matrix are zero, indicating a large amount of agreement. To quantify the exact amount of agreement, we have several scoring metrics available. Let's look at them now.
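Before turning to those metrics, a rough sense of the agreement can be had by hand. Because cluster numbers are arbitrary, we look at how much of each row is concentrated in its largest cell rather than at a diagonal. This back-of-the-envelope score is only an illustration, not one of compClust's metrics:

>>> counts = [[140, 78, 0, 0, 3],
...           [12, 0, 0, 0, 140],
...           [0, 0, 0, 105, 0],
...           [0, 85, 100, 0, 0],
...           [0, 0, 87, 0, 0]]
>>> total = sum(sum(row) for row in counts)      # all 750 items
>>> dominant = sum(max(row) for row in counts)   # items in each row's largest cell
>>> dominant / float(total)
0.7626666666666667

Roughly three quarters of the items land in their row's dominant cell; the built-in metrics formalize this kind of comparison.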