Produces a model which determines fitness by summing the distances
from the data points to their closest cluster means. This model has the
shortcoming that the score continues to improve as the number of clusters k
increases. The reason for this is that the average squared distance
continually decreases, thus raising the fitness score.
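The shortcoming can be illustrated with a small sketch. The `fitness` helper, the one-dimensional data, and the hand-picked means below are all hypothetical and are not taken from the class itself; the means are simply group averages chosen by hand, not the output of a clustering run.

```python
def fitness(means, data):
    # N divided by the summed squared distance to the closest mean
    # (one-dimensional points, so the squared distance is (m - x) ** 2).
    total = sum(min((m - x) ** 2 for m in means) for x in data)
    return len(data) / total


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# Hand-picked means for k = 1, 2, 3 (illustrative values only).
means_by_k = {
    1: [6.5],
    2: [2.0, 11.0],
    3: [1.5, 3.0, 11.0],
}

scores = {k: fitness(means, data) for k, means in means_by_k.items()}
for k in sorted(scores):
    print(k, scores[k])
```

With these values the score rises at every step, even though the data has only two obvious groups, so the score alone cannot be used to choose k.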
Methods
|
|
__computeClosestClusterDistances
__init__
__repr__
evaluateFitness
initFromLabels
|
|
__computeClosestClusterDistances
|
__computeClosestClusterDistances ( self, data_points )
For each data point, find its closest cluster and return
the distance between the point and that cluster.
Exceptions
|
|
ValueError(( "Dimensionality of the data point [%d] must equal " + "the dimensionality of the cluster means [%d]" ) %(len( point ), len(means [ 0 ] ) ) )
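A minimal sketch of the behaviour described above, assuming Euclidean distance and plain sequences of numbers for points and means. The function name `closest_cluster_distances` and its standalone form are illustrative assumptions; the actual method is private to the class and its internals are not shown here.

```python
import math


def closest_cluster_distances(means, data_points):
    """Return, for each point, the distance to its nearest cluster mean."""
    distances = []
    for point in data_points:
        # Mirror the documented check: every point must have the same
        # dimensionality as the cluster means.
        if len(point) != len(means[0]):
            raise ValueError(
                ("Dimensionality of the data point [%d] must equal "
                 + "the dimensionality of the cluster means [%d]")
                % (len(point), len(means[0]))
            )
        # Euclidean distance is an assumption of this sketch.
        nearest = min(
            math.sqrt(sum((p - m) ** 2 for p, m in zip(point, mean)))
            for mean in means
        )
        distances.append(nearest)
    return distances
```

For example, `closest_cluster_distances([(0.0, 0.0), (10.0, 0.0)], [(3.0, 4.0)])` measures the point against both means and keeps the smaller distance.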
|
|
|
__init__
|
__init__ (
self,
means=None,
data=None,
labels=None,
)
|
|
__repr__
|
__repr__ ( self )
|
|
evaluateFitness
|
evaluateFitness ( self, data )
Return the fitness of the model given a particular set of data.
The fitness equation is:
    fitness = N / sum_{k=0}^{N-1} (mean_a - point_k)^2
where point_k is one of the data points, mean_a is the closest
cluster mean, and N is the number of data points.
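The equation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the class's own code: `evaluate_fitness` is a hypothetical standalone function, points and means are plain tuples of numbers, and the squared term is taken to be the squared Euclidean distance.

```python
def evaluate_fitness(means, data):
    """N divided by the sum of squared distances to the closest mean."""
    total = 0.0
    for point in data:
        # Squared distance from this point to its closest cluster mean.
        total += min(
            sum((m - p) ** 2 for m, p in zip(mean, point))
            for mean in means
        )
    # Raises ZeroDivisionError if every point coincides with a mean;
    # how the original method treats that case is not documented here.
    return len(data) / total


means = [(0.0,), (10.0,)]
data = [(1.0,), (2.0,), (9.0,), (12.0,)]
score = evaluate_fitness(means, data)
```

Here the squared distances are 1, 4, 1 and 4, so the score is 4 / 10.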
|
|
initFromLabels
|
initFromLabels (
self,
data,
labels,
)
Given a dataset and a labeling, compute the mean of each labeled group
and use these as the class means.
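One way to compute per-label means is sketched below. The helper name `means_from_labels` and the dictionary return type are assumptions made for illustration; the method itself stores the means on the model, and its storage format is not documented here.

```python
from collections import defaultdict


def means_from_labels(data, labels):
    """Group points by label and average each group coordinate-wise."""
    groups = defaultdict(list)
    for point, label in zip(data, labels):
        groups[label].append(point)

    means = {}
    for label, points in groups.items():
        # zip(*points) yields one tuple per coordinate across the group.
        means[label] = tuple(sum(coord) / len(points) for coord in zip(*points))
    return means


data = [(0.0, 0.0), (2.0, 2.0), (10.0, 0.0)]
labels = ["a", "a", "b"]
class_means = means_from_labels(data, labels)
```

The two points labeled "a" average to (1.0, 1.0), and the single point labeled "b" is its own mean.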
|
|