Terminators


The Hierarchical clustering algorithm provides very fine control over the clustering process through a set of functions called Terminators. Terminators are second-order functions: each returns a function that, when evaluated on a Hierarchical Node, determines whether that node should progress. Terminator functions can also be chained so that multiple, independent criteria are examined before any action is taken.
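The text does not show how chaining is implemented, so the following is only a minimal sketch of the idea. The helper name chain and the dict-based nodes are assumptions made for illustration and are not part of the Terminator modules.

```python
def chain(*terminators):
    """Hypothetical helper: the combined terminator votes True only if
    every terminator in the chain votes True for the node."""
    return lambda node: all(t(node) for t in terminators)


# Two toy terminators. The node is a plain dict standing in for a real
# Hierarchical Node, purely for illustration.
big_enough = lambda node: node["rows"] >= 10
not_too_deep = lambda node: node["depth"] < 5

combined = chain(big_enough, not_too_deep)
vote_ok = combined({"rows": 50, "depth": 2})     # both criteria pass
vote_small = combined({"rows": 3, "depth": 2})   # size criterion fails
```

Because all() stops at the first false vote, later terminators in such a chain are not evaluated once one criterion fails.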

To complicate matters further, there are three classes of Terminator functions, summarized below.

Prologues: These functions are evaluated just before a node is sent to a clustering algorithm. If any of the Terminators in the prologue chain votes false, the run is terminated.
Resets: These are evaluated immediately after returning from clustering. Their job is to examine the clustering results and decide whether the clustering was 'good'. A reset Terminator can change the parameters of a clustering algorithm and then vote false, which forces the dataset to be re-run with the new parameters. A reset Terminator that never votes true re-runs the clustering indefinitely, so implement it carefully and make sure it eventually votes true.
Epilogues: These are evaluated after the clustering has passed through the reset Terminators, but before the subsets are recursed upon. An epilogue Terminator is useful for creating behaviors such as, "If only one class was found, there is no use re-clustering it."
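The order in which the three classes are consulted can be sketched as follows. This is an illustration of the description above and not the library's actual control flow: the function run_node, its return values, the dict-based node, and the max_resets cap are all assumptions introduced here.

```python
def run_node(node, prologues, resets, epilogues, cluster, max_resets=10):
    """Hypothetical driver for a single node (not the library's code).

    Returns "terminated", "leaf", or "recurse".
    """
    # Prologues: any false vote stops the run before clustering.
    if not all(t(node) for t in prologues):
        return "terminated"

    # Resets: cluster again while any reset terminator votes false.
    # The max_resets cap is an added safeguard against infinite loops.
    for _ in range(max_resets):
        cluster(node)
        if all(t(node) for t in resets):
            break
    else:
        raise RuntimeError("reset terminators never accepted the clustering")

    # Epilogues: any false vote means the subsets are not recursed upon.
    if not all(t(node) for t in epilogues):
        return "leaf"
    return "recurse"


def cluster(node):
    # Stand-in for a clustering run; it only counts how often it is called.
    node["runs"] += 1


def bump_k(node):
    # A reset that changes a parameter and votes false until k reaches 3.
    if node["k"] < 3:
        node["k"] += 1
        return False
    return True


node = {"runs": 0, "k": 1}
result = run_node(node, [lambda n: True], [bump_k], [lambda n: True], cluster)
```

In this toy run the reset votes false twice, so the stand-in clustering is executed three times before the node is passed on.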

The following Terminators are a standard part of the Terminator modules:

trueTerminator: Always returns true.
falseTerminator: Always returns false.
clusterNumTerminator: Returns false if the number of datapoints in a cluster is equal to one.
clusterSize(size): Returns false when the size of a dataset falls below its argument size.
PDRatio(ratio): Returns false when the ratio of number of datapoints to number of dimensions ( $\frac{rows}{columns}$ ) falls below a threshold ratio.
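To make the PDRatio criterion concrete, here is a sketch of how such a terminator might be written. It is not the library's source: the name pdRatioSketch, the stub classes, and in particular the getNumCols() accessor are assumptions (only getNumRows() appears in this document).

```python
def pdRatioSketch(ratio):
    """Sketch of a PDRatio-style terminator (not the library's source)."""
    def vote(node):
        dataset = node.algorithm.getDataset()
        rows = dataset.getNumRows()
        cols = dataset.getNumCols()  # ASSUMED accessor name, see lead-in
        # Vote false once rows/columns drops below the threshold.
        return rows / float(cols) >= ratio
    return vote


# Minimal stubs that mimic only the calls used above.
class StubDataset:
    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols

    def getNumRows(self):
        return self.rows

    def getNumCols(self):
        return self.cols


class StubAlgorithm:
    def __init__(self, dataset):
        self.dataset = dataset

    def getDataset(self):
        return self.dataset


class StubNode:
    def __init__(self, rows, cols):
        self.algorithm = StubAlgorithm(StubDataset(rows, cols))


vote = pdRatioSketch(10)
too_few_points = vote(StubNode(50, 10))    # ratio 5, below the threshold
enough_points = vote(StubNode(500, 10))    # ratio 50, above the threshold
```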

You can create your own Terminator functions by writing a function that returns a lambda form. As an example, the clusterSize() terminator is written as:

def clusterSize(size):
    # Vote true only while the node's dataset has at least `size` rows.
    return lambda node: node.algorithm.getDataset().getNumRows() >= size

As you can see, it returns a function that takes a single argument, node, and checks the size of the dataset held by the algorithm within that node.
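The two-step use of such a terminator (first build it with a threshold, then evaluate it on a node) can be tried without the clustering library. In the sketch below, make_node is a hypothetical stand-in that exposes only the call chain node.algorithm.getDataset().getNumRows() used by clusterSize(); real Hierarchical Nodes are not constructed this way.

```python
from types import SimpleNamespace


def clusterSize(size):
    # Same logic as the terminator in the text, written on one line.
    return lambda node: node.algorithm.getDataset().getNumRows() >= size


def make_node(rows):
    """Hypothetical stand-in exposing only the call chain used above."""
    dataset = SimpleNamespace(getNumRows=lambda: rows)
    algorithm = SimpleNamespace(getDataset=lambda: dataset)
    return SimpleNamespace(algorithm=algorithm)


at_least_20 = clusterSize(20)              # step 1: build the terminator
keep_going = at_least_20(make_node(100))   # step 2: 100 rows, votes true
stop_here = at_least_20(make_node(5))      # 5 rows, votes false
```

The threshold is captured when the terminator is built, so the same clusterSize() definition can produce several terminators with different sizes.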


Lucas Scharenbroich 2003-08-27