Module: KMeans

compClust/mlx/wrapper/KMeans.py

Usage: KMeans.py parameterFilename datasetFilename resultsFilename

Wrapper for the kmeans algorithm.

Depends on the following environment variables: KMEANS_COMMAND (e.g., /proj/cluster_gazing2/bin/kmeans)
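The usage line and environment variable above can be combined into a single invocation. The following is a minimal sketch, not part of the module: the helper name build_invocation, the interpreter name "python", and the input and output file names are all hypothetical, and the wrapper is assumed to be reachable as KMeans.py from the working directory.

```python
import os


def build_invocation(parameter_file, dataset_file, results_file,
                     kmeans_command, interpreter="python",
                     wrapper="KMeans.py"):
    """Return (argv, env) for running the wrapper.

    Hypothetical helper: the interpreter and wrapper defaults are
    assumptions, not values taken from the module.
    """
    # Copy the current environment and add the variable the wrapper reads.
    env = dict(os.environ)
    env["KMEANS_COMMAND"] = kmeans_command
    # Argument order follows the documented usage line:
    # KMeans.py parameterFilename datasetFilename resultsFilename
    argv = [interpreter, wrapper, parameter_file, dataset_file, results_file]
    return argv, env


argv, env = build_invocation("params.txt", "data.txt", "results.txt",
                             "/proj/cluster_gazing2/bin/kmeans")
```

The returned pair can be handed to subprocess.run(argv, env=env, check=True) to launch the wrapper.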

Algorithm parameters are given as the following name-value pairs. Unless a default is indicated, the parameter is required.

distance_metric Either the word "correlation" or "euclidean" (include quotes).

init_means One of "church", "random", "random_range", or "random_sample" (include quotes).

k The number of clusters, k, to find.

k_strict If "true", kmeans will treat k as a strict parameter. That is, if k clusters cannot be found (after up to max_restarts restarts, in the case of randomly initialized means), no result will be reported. Defaults to "false".

num_iterations The number of kmeans iterations.

max_restarts The maximum number of restarts in the case of collapsed clusters (valid only for randomly initialized means). Defaults to 0.

num_mean_samples If init_means = "random_sample", this parameter indicates the number of datapoints to sample (without replacement) when estimating initial means. Defaults to 3.

seed The seed to use for the pseudo-random number generator (valid only for randomly initialized means). Defaults to 42.
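To illustrate how num_mean_samples and seed might interact under init_means = "random_sample", here is a minimal sketch. It assumes that each initial mean is the average of num_mean_samples datapoints drawn without replacement. The documentation above does not confirm that this is how the kmeans executable works, and the function name random_sample_means is hypothetical.

```python
import random


def random_sample_means(data, k, num_mean_samples=3, seed=42):
    """Estimate k initial means from a list of equal-length points.

    Assumption: each mean is the average of num_mean_samples points
    sampled without replacement. The real kmeans binary may differ.
    """
    rng = random.Random(seed)  # seeded generator, so runs are reproducible
    means = []
    for _ in range(k):
        # random.Random.sample draws without replacement.
        picked = rng.sample(data, num_mean_samples)
        dims = len(picked[0])
        mean = [sum(p[d] for p in picked) / num_mean_samples
                for d in range(dims)]
        means.append(mean)
    return means


# Ten two-dimensional points lying on the line y = 2x.
data = [[float(i), float(2 * i)] for i in range(10)]
means = random_sample_means(data, k=3)
```

Calling the function twice with the same seed yields the same initial means, which is the practical purpose of the seed parameter.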

An example parameter file:

distance_metric = "euclidean"
init_means = "random_sample"
k = 5
num_iterations = 100
max_restarts = 10
num_mean_samples = 3
seed = 1234
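A parameter file like the example above could be read and checked against the documented defaults as sketched below. This is not the wrapper's own parser: the function name parse_parameters is hypothetical, and the sketch assumes one name = value pair per line with values written as Python literals.

```python
import ast

# Parameters documented without a default are required.
REQUIRED = ("distance_metric", "init_means", "k", "num_iterations")

# Defaults as stated in the parameter descriptions.
DEFAULTS = {
    "k_strict": "false",
    "max_restarts": 0,
    "num_mean_samples": 3,
    "seed": 42,
}


def parse_parameters(text):
    """Parse 'name = value' lines into a dict, applying defaults."""
    params = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected 'name = value', got %r" % line)
        # literal_eval handles quoted strings and integers safely.
        params[name.strip()] = ast.literal_eval(value.strip())
    missing = [n for n in REQUIRED if n not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return params


example = """
distance_metric = "euclidean"
init_means = "random_sample"
k = 5
num_iterations = 100
max_restarts = 10
num_mean_samples = 3
seed = 1234
"""
params = parse_parameters(example)
```

Parameters omitted from the file, such as k_strict here, fall back to their documented defaults, while a missing required parameter raises ValueError.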

Imported modules

import Numeric
from compClust.mlx import ML_Algorithm
from compClust.mlx.labelings import Labeling
from compClust.mlx.models import DistanceFromMean
import compClust.mlx.wrapper
from compClust.util import Verify, WrapperUtil
from compClust.util.TimeStampedPrintStream import TimeStampedPrintStream
import os
import string
import sys
import tempfile
import types
import warnings

Classes

KMeans

This document was automatically generated on Wed Aug 27 14:25:04 2003 by HappyDoc version 2.1