The Hierarchical clustering algorithm provides very fine control over the clustering process through a set of functions called Terminators. Terminators are second-order functions: each returns a function that, when evaluated on a Hierarchical Node, determines whether that node should progress. Terminator functions can also be chained so that multiple, independent criteria are examined before any action is taken.
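The sketch below shows a Terminator factory and a chain. It assumes that a chain lets a node progress only when every Terminator in it votes true; the text does not say how the votes in a chain are combined. The `Node` class with its `numRows` and `depth` fields and the `minRows()`, `maxDepth()`, and `chain()` helpers are hypothetical illustrations, not part of the Terminator module.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a Hierarchical Node.
    numRows: int
    depth: int


def minRows(size):
    # Second-order: calling minRows(size) returns the Terminator itself.
    return lambda node: node.numRows >= size


def maxDepth(limit):
    # Hypothetical Terminator: stop descending once the tree is `limit` levels deep.
    return lambda node: node.depth < limit


def chain(*terminators):
    # Assumed rule: the node progresses only if every Terminator votes True.
    return lambda node: all(t(node) for t in terminators)


check = chain(minRows(3), maxDepth(2))
print(check(Node(numRows=4, depth=1)))  # True
print(check(Node(numRows=2, depth=1)))  # False: too few rows
print(check(Node(numRows=4, depth=2)))  # False: too deep
```

Because `all()` stops at the first false vote, later Terminators in the chain are not evaluated once one of them rejects the node.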
To complicate matters further, there are three classes of Terminator functions, summarized below.
If the Terminators in the prologue chain vote false, the run is terminated.
A Terminator of this class can change the parameters of a clustering algorithm and then vote false to force the dataset to be re-run with the new parameters. This Terminator can cause infinite loops: if it keeps voting false, the dataset is re-run indefinitely. Implement it with great care and make sure it eventually votes true.
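One way to rule out the infinite loop is to make every parameter change move toward a hard floor. In the sketch below, the `params` dict with a cluster count `k`, the `clusterSizes` field, and the `shrinkK()` factory are all hypothetical names chosen for illustration. The real algorithm and node objects may expose their parameters differently.

```python
class Algorithm:
    def __init__(self, k):
        self.params = {"k": k}  # hypothetical parameter store


class Node:
    def __init__(self, algorithm, clusterSizes):
        self.algorithm = algorithm
        self.clusterSizes = clusterSizes  # hypothetical: rows per cluster in the last run


def shrinkK(minClusterSize, minK=2):
    # Returns a Terminator that lowers k and votes False (forcing a re-run)
    # while some cluster has fewer than minClusterSize rows.
    def terminator(node):
        k = node.algorithm.params["k"]
        if min(node.clusterSizes) < minClusterSize and k > minK:
            node.algorithm.params["k"] = k - 1
            return False  # re-run this dataset with the smaller k
        return True  # accept the result and let the node progress
    return terminator


node = Node(Algorithm(k=4), clusterSizes=[40, 35, 3, 2])
t = shrinkK(minClusterSize=5)
print(t(node), node.algorithm.params["k"])  # False 3
```

Each false vote lowers k by one, and k never drops below minK. A node can therefore be re-run at most k - minK times, whatever the clustering returns.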
Terminators in this class are evaluated after the node has been clustered, but before the resulting subsets are recursed upon. This class of Terminator is useful for creating behaviors such as "If only one class was found, there is no use re-clustering it."
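The single-class behavior could be sketched as a Terminator factory like the following. It assumes, hypothetically, that a clustered node exposes the label assigned to each row as `node.labels`. That attribute and the `minClasses()` name are illustrations, not the module's API.

```python
class Node:
    def __init__(self, labels):
        self.labels = labels  # hypothetical: cluster label assigned to each row


def minClasses(count):
    # Returns a Terminator that votes True only when clustering split the
    # node's data into at least `count` distinct classes.
    return lambda node: len(set(node.labels)) >= count


t = minClasses(2)
print(t(Node(["a", "a", "b"])))  # True: two classes, recurse into the subsets
print(t(Node(["a", "a", "a"])))  # False: one class, no point re-clustering it
```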
Examples of Terminators that are a standard part of the Terminator modules are:
You can create your own Terminator functions by writing a function that returns a lambda. As an example, the clusterSize() terminator is written as:
def clusterSize(size):
    return lambda node: node.algorithm.getDataset().getNumRows() >= size
As you can see, it returns a function that takes a single argument, node, and checks whether the dataset of the algorithm within that node has at least size rows.
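To see the two-stage call in action, the sketch below evaluates clusterSize() on stand-in nodes. The `Dataset`, `Algorithm`, and `Node` classes are hypothetical stubs that provide only the methods the lambda calls.

```python
class Dataset:
    def __init__(self, numRows):
        self.numRows = numRows

    def getNumRows(self):
        return self.numRows


class Algorithm:
    def __init__(self, dataset):
        self.dataset = dataset

    def getDataset(self):
        return self.dataset


class Node:
    def __init__(self, algorithm):
        self.algorithm = algorithm


def clusterSize(size):
    # Terminator from this section: progress only if the node has >= size rows.
    return lambda node: node.algorithm.getDataset().getNumRows() >= size


atLeast10 = clusterSize(10)  # first call: fix the threshold
print(atLeast10(Node(Algorithm(Dataset(25)))))  # True: node may progress
print(atLeast10(Node(Algorithm(Dataset(4)))))   # False: too small to split further
```

Calling clusterSize(10) does no checking by itself; it only fixes the threshold. The check happens each time the returned lambda is applied to a node. This is what lets one Terminator be reused across every node of the hierarchy.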