Table of Contents

Class: DistanceFromMean compClust/mlx/models/DistanceFromMean.py

Produces a model which determines fitness by summing the distances from each data point to the mean of its cluster.

This model has the shortcoming that the score keeps improving as k, the number of clusters, increases. Adding clusters continually lowers the average squared distance from each point to its closest mean, and that in turn raises the fitness score.

Base Classes   
IModel
Methods   
__computeClosestClusterDistances
__init__
__repr__
evaluateFitness
initFromLabels
  __computeClosestClusterDistances 
__computeClosestClusterDistances ( self,  data_points )

Given a data point, find its closest cluster and return the distance between the point and that cluster.

Exceptions   
ValueError(("Dimensionality of the data point [%d] must equal " + "the dimensionality of the cluster means [%d]") % (len(point), len(means[0])))
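The closest-cluster lookup described above can be sketched as follows. This is a minimal illustration, not the compClust implementation: the function name `closest_cluster_distance` is hypothetical, points and means are assumed to be plain sequences of floats, and Euclidean distance is assumed because the page does not name the metric.

```python
import math


def closest_cluster_distance(point, means):
    """Return the distance from `point` to the nearest entry in `means`.

    Hypothetical helper. Euclidean distance is an assumption; the
    documented method may use a different metric or data container.
    """
    # Mirror the documented dimensionality check and its message.
    if len(point) != len(means[0]):
        raise ValueError(
            ("Dimensionality of the data point [%d] must equal "
             "the dimensionality of the cluster means [%d]")
            % (len(point), len(means[0])))
    # Distance to every mean; the smallest one belongs to the closest cluster.
    return min(
        math.sqrt(sum((p - m) ** 2 for p, m in zip(point, mean)))
        for mean in means
    )


means = [(0.0, 0.0), (10.0, 0.0)]
print(closest_cluster_distance((3.0, 4.0), means))  # 5.0, the distance to (0.0, 0.0)
```

A point whose length differs from the means (for example a 3-tuple against the 2-dimensional means above) raises the ValueError instead of returning a distance.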
  __init__ 
__init__ (
        self,
        means=None,
        data=None,
        labels=None,
        )

  __repr__ 
__repr__ ( self )

  evaluateFitness 
evaluateFitness ( self,  data )

Return the fitness of the model given a particular set of data.

The fitness equation is:

\[
\text{fitness} = \frac{N}{\sum_{k=0}^{N-1} (\text{mean}_a - \text{point}_k)^2}
\]

where point_k is one of the data points, mean_a is the closest cluster mean, and N is the number of data points.
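The fitness equation can be worked through with a small sketch. It assumes points and means are plain sequences of floats and reads the squared term as the squared Euclidean distance; the function name `evaluate_fitness` and the sample data are hypothetical and are not taken from compClust.

```python
def evaluate_fitness(data, means):
    """Return N divided by the summed squared distance to the closest mean.

    Hypothetical sketch of the documented equation. It raises
    ZeroDivisionError when every point coincides with a mean; how the
    real class treats that case is not stated in the documentation.
    """
    total = 0.0
    for point in data:
        # Squared distance from this point to its closest cluster mean.
        total += min(
            sum((p - m) ** 2 for p, m in zip(point, mean))
            for mean in means
        )
    return len(data) / total


data = [(0.0,), (2.0,), (10.0,), (12.0,)]
print(evaluate_fitness(data, [(6.0,)]))          # one mean:  4 / 104
print(evaluate_fitness(data, [(1.0,), (11.0,)])) # two means: 4 / 4 = 1.0
```

With the same four points, moving from one mean to two raises the fitness from 4/104 to 1.0, which illustrates the shortcoming noted in the class description: the score improves as k grows.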

  initFromLabels 
initFromLabels (
        self,
        data,
        labels,
        )

Given a dataset and a labeling, compute the mean of each labeled class and use these as the class means.
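Computing class means from a labeling might look like the sketch below. It assumes `labels` is a plain sequence with one label per data point, in the same order as `data`; the labeling object that compClust actually passes may be structured differently, and the name `means_from_labels` is hypothetical.

```python
def means_from_labels(data, labels):
    """Return a dict mapping each label to the mean of its data points.

    Hypothetical helper: `labels[i]` is assumed to be the label of
    `data[i]`. This is not the compClust labeling API.
    """
    sums = {}
    counts = {}
    for point, label in zip(data, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        # Accumulate a per-dimension sum and a point count for each label.
        sums[label] = [s + x for s, x in zip(sums[label], point)]
        counts[label] += 1
    # Divide each per-dimension sum by the number of points with that label.
    return {
        label: [s / counts[label] for s in sums[label]]
        for label in sums
    }


data = [(0.0, 0.0), (2.0, 2.0), (10.0, 0.0)]
labels = ["a", "a", "b"]
print(means_from_labels(data, labels))  # {'a': [1.0, 1.0], 'b': [10.0, 0.0]}
```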



This document was automatically generated on Wed Aug 27 14:25:03 2003 by HappyDoc version 2.1