
Terminators

The Hierarchical clustering algorithm provides very fine control over the clustering process through a set of functions called Terminators. Terminators are second-order functions: each one returns a function that, when evaluated on a Hierarchical Node, determines whether that node should proceed. Terminator functions can also be chained, so that multiple independent criteria are examined before any action is taken.
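Chaining can be pictured as combining several Terminators into a single one that votes true only when every member votes true. The sketch below assumes that semantics; `chain`, `minRows`, `maxRows` and the `rows` attribute are hypothetical names used only for illustration, not part of the Terminator modules.

```python
from types import SimpleNamespace

def chain(*terminators):
    """Combine Terminators into one that votes True only if all of them do."""
    # all() stops at the first False vote, so later criteria are not evaluated.
    return lambda node: all(t(node) for t in terminators)

# Two hypothetical Terminator factories, written in the same second-order style.
def minRows(n):
    return lambda node: node.rows >= n

def maxRows(n):
    return lambda node: node.rows <= n

if __name__ == "__main__":
    sizeWindow = chain(minRows(10), maxRows(1000))
    print(sizeWindow(SimpleNamespace(rows=50)))  # True
    print(sizeWindow(SimpleNamespace(rows=5)))   # False
```

Each criterion stays independent and reusable; the chain only decides how their votes are combined.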

To complicate matters further, there are three classes of Terminator functions, summarized below.

Prologues
These functions are evaluated just before a node is sent to a clustering algorithm. If any of the Terminators in the prologue chain votes false, the run is terminated.
Resets
These are evaluated immediately after returning from clustering. Their job is to examine the clustering results and determine whether the clustering was 'good'. A Reset Terminator can change the parameters of a clustering algorithm and then vote false, forcing the dataset to be re-run with the new parameters. Because nothing prevents a Reset from voting false on every re-run, this class of Terminator can cause infinite loops; be very careful that repeated resets eventually vote true.
Epilogues
Epilogues are evaluated after the clustering has passed through the Reset Terminators, but before the subsets are recursed upon. This Terminator is useful for creating behaviors such as "If only one class was found, there is no use re-clustering it."
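The order in which the three classes are evaluated can be sketched as a small recursive driver. This is a toy model, not the Hierarchical implementation: `run`, `passes`, `halves`, `node.data` and `node.clusters` are hypothetical names, nodes are plain `SimpleNamespace` objects, and a chain is assumed to pass only when every member votes true.

```python
from types import SimpleNamespace

def passes(chain, node):
    """A chain passes only when every Terminator in it votes True."""
    return all(t(node) for t in chain)

def run(node, prologues, resets, epilogues, cluster):
    """Toy driver; returns the datasets at which recursion stopped."""
    # Prologue: decide whether this node is sent to clustering at all.
    if not passes(prologues, node):
        return [node.data]
    # Reset: a False vote (usually after a parameter change) forces a re-run.
    # This loop never ends if some Reset always votes False.
    node.clusters = cluster(node)
    while not passes(resets, node):
        node.clusters = cluster(node)
    # Epilogue: decide whether the resulting subsets are recursed upon.
    if not passes(epilogues, node):
        return [node.data]
    leaves = []
    for subset in node.clusters:
        child = SimpleNamespace(data=subset)
        leaves.extend(run(child, prologues, resets, epilogues, cluster))
    return leaves

def halves(node):
    """Hypothetical stand-in for a clustering algorithm: split sorted data in two."""
    d = sorted(node.data)
    h = len(d) // 2
    return [d[:h], d[h:]]

if __name__ == "__main__":
    leaves = run(SimpleNamespace(data=list(range(8))),
                 prologues=[lambda n: len(n.data) >= 3],    # like clusterSize(3)
                 resets=[lambda n: True],                    # accept every clustering
                 epilogues=[lambda n: len(n.clusters) > 1],  # stop if only one class
                 cluster=halves)
    print(leaves)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Here the Prologue stops recursion once a subset has fewer than three points, and the Epilogue plays the "only one class was found" role.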

The following Terminators are a standard part of the Terminator modules:

trueTerminator
Always returns true.
falseTerminator
Always returns false.
clusterNumTerminator
Returns false if the number of datapoints in a cluster is equal to one.
clusterSize(size)
Returns false when the size of a dataset (its number of datapoints) falls below the argument size.
PDRatio(ratio)
Returns false when the ratio of the number of datapoints to the number of dimensions ($\frac{rows}{columns}$) falls below the threshold ratio.
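The standard Terminators above could be written in the same lambda style as clusterSize(). This is a sketch only, and the real module may differ: the `Dataset`, `Algorithm` and `Node` classes are hypothetical mocks, `getNumColumns()` is an assumed accessor (only `getNumRows()` appears on this page), and trueTerminator and falseTerminator are written as factories purely for uniformity.

```python
class Dataset:
    """Hypothetical stand-in exposing the accessors the Terminators use."""
    def __init__(self, rows, columns):
        self._rows, self._columns = rows, columns
    def getNumRows(self):
        return self._rows
    def getNumColumns(self):  # assumed accessor, not confirmed by the source
        return self._columns

class Algorithm:
    def __init__(self, dataset):
        self._dataset = dataset
    def getDataset(self):
        return self._dataset

class Node:
    def __init__(self, rows, columns):
        self.algorithm = Algorithm(Dataset(rows, columns))

def trueTerminator():
    return lambda node: True

def falseTerminator():
    return lambda node: False

def clusterNumTerminator():
    # Votes False when the node's dataset holds exactly one datapoint.
    return lambda node: node.algorithm.getDataset().getNumRows() != 1

def clusterSize(size):
    return lambda node: node.algorithm.getDataset().getNumRows() >= size

def PDRatio(ratio):
    # Votes False when rows / columns drops below the threshold ratio.
    def check(node):
        ds = node.algorithm.getDataset()
        return ds.getNumRows() / ds.getNumColumns() >= ratio
    return check

if __name__ == "__main__":
    node = Node(rows=100, columns=10)
    print(clusterSize(50)(node), PDRatio(20)(node))  # True False
```

PDRatio uses a nested function instead of a lambda only for readability; the returned closure behaves the same way.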

You can create your own Terminator functions by writing a function that returns a lambda form. As an example, the clusterSize() Terminator is written as:

def clusterSize(size):
    # Votes false once the node's dataset has fewer than `size` rows.
    return lambda node: node.algorithm.getDataset().getNumRows() >= size

As you can see, clusterSize() returns a function that takes a single argument, node, and checks whether the dataset held by the node's clustering algorithm has at least size rows.
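The same pattern can produce a Reset Terminator that changes a parameter and votes false. The sketch below is hypothetical: it assumes the node exposes its clustering result as `node.clusters` and its parameters as a dict `node.algorithm.params` with a key `"k"`, none of which is confirmed by this page. It also caps the number of re-runs per node, which is one way to avoid the infinite loops that Resets can cause.

```python
from types import SimpleNamespace

def retryWithMoreClusters(minClusters, maxRetries):
    """Hypothetical Reset Terminator that asks for more clusters, at most maxRetries times."""
    def check(node):
        if len(node.clusters) >= minClusters:
            return True                   # acceptable clustering: let it through
        tries = getattr(node, "resetCount", 0)
        if tries >= maxRetries:
            return True                   # give up so the Reset loop terminates
        node.resetCount = tries + 1       # per-node counter, not shared across nodes
        node.algorithm.params["k"] += 1   # change an (assumed) parameter...
        return False                      # ...and vote false to force a re-run
    return check

if __name__ == "__main__":
    node = SimpleNamespace(algorithm=SimpleNamespace(params={"k": 1}),
                           clusters=[[1, 2, 3]])
    reset = retryWithMoreClusters(minClusters=2, maxRetries=3)
    print([reset(node) for _ in range(4)], node.algorithm.params["k"])
    # [False, False, False, True] 4
```

Storing the retry count on the node rather than in the closure keeps one node's retries from using up another node's budget.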


Lucas Scharenbroich 2003-08-27