ROC Analysis

ROC (receiver operating characteristic) analysis is a tool for examining how well the members of a cluster are separated from the surrounding data. It plots the false positive rate on the X axis against the true positive rate on the Y axis.

To see how the ROC curve is computed, imagine a hypersphere centered on the cluster center that starts with zero radius and grows. Each time the sphere reaches a point that is in the cluster, the curve steps up along the Y axis; each time it reaches a point that is not in the cluster, the curve steps right along the X axis. Both axes are rates, so each step is scaled to make them run from 0 to 1. A perfect ROC curve is therefore a vertical line at x=0 up to y=1, followed by a horizontal line at y=1: every cluster member is reached before any non-member.
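The growing-hypersphere procedure can be sketched in a few lines of Python. This is a minimal illustration, not CompClustTk's code: it assumes Euclidean distance, points given as coordinate tuples, and at least one member and one non-member. The function name `roc_curve` is hypothetical.

```python
import math


def roc_curve(points, labels, center):
    """Trace an ROC curve by growing a hypersphere around `center`.

    points: list of equal-length coordinate tuples
    labels: list of bools, True if the point is a cluster member
    center: coordinate tuple for the cluster center
    Returns a list of (false_positive_rate, true_positive_rate) pairs.
    Assumption: Euclidean distance; CompClustTk's metric may differ.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sorting by distance gives the order in which the growing sphere
    # reaches each point.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    tp = fp = 0
    curve = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1  # cluster member: step up along Y
        else:
            fp += 1  # non-member: step right along X
        curve.append((fp / n_neg, tp / n_pos))
    return curve


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0)]
labels = [True, True, False, True, False]
print(roc_curve(points, labels, (0.0, 0.0)))
```

Points at exactly the same distance are visited in input order here, so a tie-aware implementation could draw a slightly different curve.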

CompClustTk shows the standard ROC curve on the left and histograms of the distances of cluster members and non-members on the right.

To display an ROC curve, open the Analysis menu and select Cluster ROC analysis.

Figure: Cluster ROC Analysis

This menu item opens a dialog box in which you select the Cluster Labeling to use. In this example, select the Early G1 cluster from the Cho Classification.

The Clustering Labeling specifies which clustering you want to explore, and the Cluster Label specifies which cluster to treat as the "inside" cluster for the analysis; all other points are treated as outside it.
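The two choices in the dialog amount to splitting the data into inside and outside points. A minimal sketch, assuming the clustering is available as a list of labels parallel to the points, and assuming the cluster center is the mean of the member points (the text does not say how CompClustTk defines the center). Both function names are hypothetical.

```python
def inside_outside(labeling, inside_label):
    """Return True for points assigned to the chosen "inside" cluster."""
    return [label == inside_label for label in labeling]


def centroid(points, is_member):
    """Mean of the member points (assumed definition of the cluster center)."""
    members = [p for p, m in zip(points, is_member) if m]
    dim = len(members[0])
    return tuple(sum(p[k] for p in members) / len(members) for k in range(dim))


# Hypothetical labeling: one label per point.
labeling = ["Early G1", "Late G1", "Early G1"]
points = [(0.0, 0.0), (5.0, 5.0), (2.0, 4.0)]
flags = inside_outside(labeling, "Early G1")
print(flags, centroid(points, flags))
```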

Figure: ROC Analysis Dialog Box

Once a cluster is selected, the comparison is displayed. On the left is the standard ROC curve; on the right is a histogram of how many points fall into each distance bin, with cluster members in red and all other points in blue.
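The histogram panel can be approximated by counting members and non-members in equal-width distance bins. This sketch assumes Euclidean distance and bins spanning zero to the largest observed distance; the bin count used by CompClustTk is not stated, so `n_bins` here is a hypothetical parameter.

```python
import math


def distance_histograms(points, labels, center, n_bins=10):
    """Count members (red) and non-members (blue) per distance bin.

    Assumption: Euclidean distance, equal-width bins from 0 to the
    largest distance in the data.
    """
    dists = [math.dist(p, center) for p in points]
    max_d = max(dists)
    width = max_d / n_bins if max_d > 0 else 1.0
    members = [0] * n_bins
    others = [0] * n_bins
    for d, is_member in zip(dists, labels):
        # Clamp so the farthest point lands in the last bin instead of overflowing.
        b = min(int(d / width), n_bins - 1)
        if is_member:
            members[b] += 1
        else:
            others[b] += 1
    return members, others


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0)]
labels = [True, True, False, True, False]
print(distance_histograms(points, labels, (0.0, 0.0), n_bins=4))
```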

The more separated these two histograms are, the better the clustering.
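One common way to reduce this separation to a single number is the area under the ROC curve, which equals the probability that a randomly chosen member is closer to the center than a randomly chosen non-member (ties count one half). The text does not say that CompClustTk reports this score; the sketch below is an illustration only, assuming Euclidean distance, and `roc_auc` is a hypothetical name.

```python
import math


def roc_auc(points, labels, center):
    """Area under the ROC curve via pairwise member/non-member comparisons.

    Assumption: Euclidean distance. 1.0 means every member is closer than
    every non-member; 0.5 means the two distance distributions overlap
    as if at random.
    """
    member_d = [math.dist(p, center) for p, m in zip(points, labels) if m]
    other_d = [math.dist(p, center) for p, m in zip(points, labels) if not m]
    score = 0.0
    for dm in member_d:
        for do in other_d:
            if dm < do:
                score += 1.0  # member reached first by the growing sphere
            elif dm == do:
                score += 0.5  # tie counts as half
    return score / (len(member_d) * len(other_d))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (0.0, 4.0)]
print(roc_auc(points, [True, True, False, True, False], (0.0, 0.0)))
```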

This is not a particularly well-separated cluster. The histogram of distances shows several data points that are assigned to this cluster but lie quite far from the cluster center.

Figure: ROC Analysis Cho Classification of Early G1

By contrast, the next example shows one of the better clusters found by our EM algorithm. The histogram shows that the number of cluster members tapers off rapidly with distance from the cluster center.

Figure: ROC Analysis DiagEM of cluster 5

To be fair, the EM algorithm we use is building clusters using a Gaussian cloud, which does a very good job of

Brandon King 2005-05-27