NMI stands for Normalized Mutual Information and is a performance metric for confusion matrices published by A. D. Forbes in 1995 in the Journal of Clinical Monitoring. NMI measures how well one classification (assumed to be the columns of the confusion matrix) predicts the second classification.

One should also look at the transposed NMI score. Comparing the two scores helps determine whether one set of classes over- or under-specifies the other; NMI by itself only establishes a one-way relationship. If both the NMI and the transposed NMI scores are high, then either classification is good at predicting the other.
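To make the directional behaviour concrete, here is a minimal pure-Python sketch of an NMI-style score: the mutual information of the confusion matrix, normalized by the entropy of one of the two classifications. This is one common formulation; the exact normalization Forbes defines, and the one compClust implements, may differ in detail.

```python
import math

def nmi(counts):
    # Sketch: mutual information of the confusion matrix, normalized
    # by the entropy of the row classification. (One common convention;
    # compClust's exact normalization may differ.)
    total = float(sum(sum(row) for row in counts))
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n:
                mi += (n / total) * math.log(n * total / (row_sums[i] * col_sums[j]))
    h_rows = -sum((r / total) * math.log(r / total) for r in row_sums if r)
    return mi / h_rows if h_rows else 0.0

def transpose_nmi(counts):
    # The same score on the transposed matrix, i.e. normalized by the
    # entropy of the column classification instead.
    return nmi([list(col) for col in zip(*counts)])

# Two row classes split evenly across four column classes: knowing the
# column determines the row exactly, but not the other way around.
cm = [[50, 50,  0,  0],
      [ 0,  0, 50, 50]]
print(nmi(cm))            # 1.0 (up to rounding): columns determine rows
print(transpose_nmi(cm))  # 0.5: rows only partially determine columns
```

Swapping the normalization (the transposed score) is exactly what exposes over-specification: in this toy matrix, one direction scores 1.0 while the other scores only 0.5.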

The NMI scores for the KMeans vs. FullEM classifications are shown in Table 2. The transposed NMI scores are misleading here because the ground truth for this dataset has 45 clusters. The hope is that the 5-cluster results find clusters such that each one is composed of one or more complete clusters from the 45-cluster ground truth; a high NMI score indicates that this is the case. The ground truth labeling can be found in the file

/proj/cluster_gazing2/data/synthetic/results/reference/clust_t_15c3_p_0750_d_03_v_0d3_a_reference.txt.gz

Let's look at two examples: one where there is a two-way relationship, and another where one classification over-specifies the other. Pay close attention to how the transposed scores differ in the two examples.

>>> from compClust.mlx.datasets import *
>>> from compClust.mlx.labelings import *
>>> from compClust.score.ConfusionMatrix import ConfusionMatrix
>>> from random import random
>>> ds = PhantomDataset(1000,1)
>>> labeling1 = Labeling(ds)
>>> labeling2 = Labeling(ds)
>>> labeling3 = Labeling(ds)
>>> labeling1.labelRows([1,2,3,4,5]*200)
>>> labeling2.labelRows([1,2,3,4,5,6,7,8,9,10]*100)
>>> labeling3.labelRows(map(lambda x : x + int(random()*1.3),
...                         [1,2,3,4,5]*200))
>>> cm1 = ConfusionMatrix()
>>> cm2 = ConfusionMatrix()
>>> cm1.createConfusionMatrixFromLabeling(labeling1,
...                                       labeling3)
>>> cm2.createConfusionMatrixFromLabeling(labeling2,
...                                       labeling3)
>>> cm1.printCounts()
 55 145   0   0   0   0
  0  44 156   0   0   0
  0   0  40 160   0   0
  0   0   0  55 145   0
  0   0   0   0  48 152
>>> cm2.printCounts()
 32  68   0   0   0   0
  0  20  80   0   0   0
  0   0  17  83   0   0
  0   0   0  24  76   0
  0   0   0   0  28  72
 23  77   0   0   0   0
  0  24  76   0   0   0
  0   0  23  77   0   0
  0   0   0  31  69   0
  0   0   0   0  20  80
>>> cm1.NMI()
0.68119067506097197
>>> cm1.transposeNMI()
0.73142895910285066
>>> cm2.NMI()
0.68310769380320679
>>> cm2.transposeNMI()
0.51268566272042138

Notice that the NMI scores are almost identical between the two confusion matrices, but the transposed scores are quite different. Experiment with different confusion matrices to get a feel for how the NMI score changes with respect to the matrix geometry.

An important point to make is that the NMI score of a random confusion matrix is zero. If we replace the labelings with mostly random ones and recompute, we will see that the scores are very near zero.

>>> labeling1.removeAll()
>>> labeling2.removeAll()
>>> labeling1.labelRows(map(lambda x : int(random()*10),
...                         [None]*1000))
>>> labeling2.labelRows(map(lambda x : int(random()*10),
...                         [None]*1000))
>>> cm1.createConfusionMatrixFromLabeling(labeling1,
...                                       labeling3)
>>> cm1.printCounts()
 8 17 19 19 20 10
 2 25 16 22 12 17
 8 31 23 20 19 19
 4 17 21 25 20 15
 5 16 25 26 17 23
 5 18 24 18 28 20
 5 13 15 12 22 12
 7 18 11 18 24 10
 2 14 26 30 16 13
 9 20 16 25 15 13
>>> cm1.NMI()
0.015745466195284386
>>> cm1.transposeNMI()
0.011853264106865047
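The same near-zero behaviour can be reproduced without compClust. This self-contained sketch estimates the raw (plug-in) mutual information, in nats, for two independent random labelings of the same sizes as the session above; the function name here is illustrative, not compClust API. With finite samples the estimate is not exactly zero, only close to it.

```python
import math
import random
from collections import Counter

def mutual_info_nats(xs, ys):
    # Plug-in estimate of the mutual information I(X;Y), in nats,
    # computed from two paired label sequences.
    n = float(len(xs))
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

random.seed(0)  # fixed seed so the run is repeatable
a = [random.randrange(10) for _ in range(1000)]  # 10 random classes
b = [random.randrange(10) for _ in range(1000)]  # independent labeling

# Independent labelings share (almost) no information, so the mutual
# information -- and hence any normalized version of it -- is near zero.
print(mutual_info_nats(a, b))
# Against itself, a labeling's mutual information equals its empirical
# entropy, close to log(10) ~ 2.30 here.
print(mutual_info_nats(a, a))
```

The small positive value for independent labelings is estimation bias, which shrinks as the number of rows grows relative to the number of cells in the matrix.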