
XClust

Requirements:

XClust is a wrapper around the XCluster algorithm created by Prof. Gavin Sherlock of Stanford University. XCluster is a bottom-up (agglomerative) clustering algorithm and, as such, runs in $O(N^2 \log N)$ time. In our experience, however, XClust produces consistently good results over a wide variety of datasets, which makes it a reasonable first choice for clustering a dataset when there is no obvious favorite.
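To make the bottom-up idea and its $O(N^2 \log N)$ cost concrete, here is a minimal sketch in plain Python. It is not the XCluster implementation: it assumes single-linkage merging and Euclidean distance, and the names `euclidean` and `bottom_up_clusters` are hypothetical, introduced only for this illustration. Every point starts in its own cluster, all pairwise distances go into a heap, and the closest pair of distinct clusters is merged until `k` clusters remain.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bottom_up_clusters(points, k):
    """Single-linkage agglomerative clustering, stopping at k clusters.

    Illustrative sketch only; not the algorithm used by XCluster.
    Returns one integer label per point, numbered by first appearance.
    """
    n = len(points)

    # O(N^2) pairwise distances; each heap pop costs O(log N),
    # which is where the O(N^2 log N) bound comes from.
    heap = [(euclidean(points[i], points[j]), i, j)
            for i in range(n) for j in range(i + 1, n)]
    heapq.heapify(heap)

    # Union-find forest: parent[i] == i marks the root of a cluster.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    clusters = n
    while clusters > k and heap:
        _, i, j = heapq.heappop(heap)
        root_i, root_j = find(i), find(j)
        if root_i != root_j:        # skip pairs already in one cluster
            parent[root_j] = root_i
            clusters -= 1

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

For example, `bottom_up_clusters([(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)], 3)` groups the two nearby pairs and leaves the distant point on its own.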

To see just how well XClust can perform, let's cluster our favorite dataset again.

>>> from compClust.mlx.datasets import Dataset
>>> from compClust.mlx.wrapper.XClust import XClust
>>> ds = Dataset('synth_t_15c3_p_0750_d_03_v_0d3.txt')
>>> parameters = {}
>>> parameters['k'] = 15
>>> parameters['transform_method'] = 'none'
>>> parameters['cluster_on'] = 'rows'
>>> parameters['distance_metric'] = 'euclidean'
>>> parameters['agglomerate_method'] = 'clusterNumber'
>>> xclust = XClust(ds, parameters)
>>> xclust.validate()
1
>>> xclust.run()
1
>>> results = xclust.getLabeling()
>>> [len(results.getRowsByLabel(label)) for label in results.getLabels()]
[57, 65, 51, 82, 13, 86, 36, 12, 71, 22, 87, 29, 76, 13, 50]

These results look much better than what we've been able to produce before. The standard deviation of the number of points per cluster is 27.6, which is better than the DiagEM result, although FullEM still does better.
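The 27.6 figure can be checked directly from the cluster sizes printed in the session. The sketch below assumes the sample standard deviation (an n - 1 denominator), which is the convention that reproduces the quoted value; `sample_stdev` is a hypothetical helper, not part of compClust.

```python
import math

# Cluster sizes reported by the XClust run.
sizes = [57, 65, 51, 82, 13, 86, 36, 12, 71, 22, 87, 29, 76, 13, 50]


def sample_stdev(values):
    """Sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / float(n)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


print(sum(sizes))                     # 750 points in total
print(round(sample_stdev(sizes), 1))  # 27.6
```

The sizes sum to 750 across 15 clusters, so the mean cluster size is 50.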


Lucas Scharenbroich 2003-08-27