Table of Contents

Module: DiagEM compClust/mlx/wrapper/DiagEM.py

Usage: DiagEM.py parameter_filename input_filename output_filename

Wrapper for the diagonal EM algorithm.

Depends on the following environment variable: DIAGEM_COMMAND (e.g., /proj/cluster_gazing2/bin/diagem)

Brief Algorithm Description:

Performs EM segmentation of an array of feature vectors. The algorithm is from Bishop's "Neural Networks for Pattern Recognition", page 65. This particular EM algorithm fits Gaussians to the data. Each element of the feature vector is assumed to be independent (i.e. independent channels).
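The description above can be illustrated with a minimal pure-Python sketch of EM for a mixture of Gaussians with diagonal covariance. This is not the code of the wrapped diagem binary: the function name diag_em, the init_means argument, the unit starting variances, and the variance floor are all assumptions made for illustration.

```python
import math
import random


def diag_em(data, k, num_iterations, seed=42, init_means=None):
    """Fit k diagonal-covariance Gaussians with EM (illustrative sketch)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Start from the given means, or from k distinct data points (assumption).
    means = [list(m) for m in (init_means or rng.sample(data, k))]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in range(n)]
    for _ in range(num_iterations):
        # E-step: responsibilities, treating every dimension as independent.
        resp = []
        for x in data:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for i in range(d):
                    var = variances[j][i]
                    diff = x[i] - means[j][i]
                    lp -= 0.5 * (math.log(2.0 * math.pi * var) + diff * diff / var)
                logp.append(lp)
            top = max(logp)  # subtract the maximum for numerical stability
            probs = [math.exp(lp - top) for lp in logp]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # collapsed cluster: keep its previous parameters
            weights[j] = nj / n
            for i in range(d):
                mu = sum(r[j] * x[i] for r, x in zip(resp, data)) / nj
                var = sum(r[j] * (x[i] - mu) ** 2 for r, x in zip(resp, data)) / nj
                means[j][i] = mu
                variances[j][i] = max(var, 1e-6)  # floor avoids division by zero
    # Hard labels: the most responsible component for each row.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels
```

Because each component keeps one variance per dimension rather than a full covariance matrix, the log-density in the E-step is a plain sum over dimensions, which is what the independent-channels assumption amounts to.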

Required Parameters: (note: the values listed in brackets are the possible values each parameter can take)

k = <x>

x is the number of clusters to find

num_iterations = <x>

Where x is the number of iterations to perform over the data set

distance_metric = [correlation, correlation_centered, euclidean]

The correlation metric is actually implemented as Euclidean distance on the data set after it has been mapped onto the surface of a hypersphere. This approximates the correlation metric.
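The relationship between the two metrics can be checked with a short sketch. For vectors that are mean-centered and scaled to unit length, the squared Euclidean distance equals 2 * (1 - r), where r is the Pearson correlation. The helper names below are hypothetical, and whether the diagem binary centers the data for correlation as well as for correlation_centered is not stated here, so the center flag is an assumption.

```python
import math


def to_hypersphere(vec, center=True):
    """Optionally mean-center vec, then scale it to unit length."""
    if center:
        mean = sum(vec) / len(vec)
        vec = [x - mean for x in vec]
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot project a constant or zero vector")
    return [x / norm for x in vec]


def euclidean(a, b):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient, computed directly for comparison."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


a = [1.0, 2.0, 3.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0]
dist = euclidean(to_hypersphere(a), to_hypersphere(b))
# Squared distance on the hypersphere versus 2 * (1 - r).
gap = abs(dist * dist - 2.0 * (1.0 - pearson(a, b)))
```

Since the distance is a monotone function of r, ranking points by this distance gives the same ordering as ranking them by correlation.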

init_method = [church_means, random_means, random_point, random_range, random_sample, file]
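As an illustration, the required parameters might be collected and written out as follows. The "key = value" one-per-line layout is an assumption inferred from the notation used in this page, not a confirmed file format, and render_parameters is a hypothetical helper that is not part of compClust.

```python
def render_parameters(params):
    """Render a dict as 'key = value' lines (assumed parameter-file layout)."""
    return "".join("%s = %s\n" % (key, params[key]) for key in sorted(params))


# Example values only; each must be one of the choices listed above.
required = {
    "k": 5,
    "num_iterations": 100,
    "distance_metric": "euclidean",
    "init_method": "random_means",
}
text = render_parameters(required)
```

The resulting text could then be saved to the parameter_filename given on the command line.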

Optional / Dependent Parameters:

k_strict = [true, 'false']

Turns k-strict behavior on or off. When it is on and exactly k clusters are not found (i.e., there are collapsed clusters), no results are returned. Collapsed clusters tend to occur more often with the euclidean metric than with the correlation metric, which can return singleton clusters.
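The k-strict check can be sketched as a post-processing step on the cluster labels. The function name apply_k_strict and the use of None to stand for "no results" are assumptions; how the wrapper actually signals an empty result is not described here.

```python
def apply_k_strict(labels, k, k_strict=False):
    """Return labels, or None if k_strict is on and the cluster count is not k."""
    found = len(set(labels))  # number of non-empty clusters
    if k_strict and found != k:
        return None  # collapsed clusters: report nothing
    return labels
```

For example, three clusters requested but only two distinct labels produced gives None with k_strict on, and the two-cluster labeling with it off.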

seed = <x> (optional)

Where x is the number used to seed the random number generator. This parameter allows runs of the algorithm to be deterministic. If the parameter is omitted, the seed defaults to 42.

samples = <x> (depends on init_method)

If the random_sample initialization method is chosen, this parameter defines how many points to sample for each mean. It must be greater than 0 and less than the number of rows in the data set.
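One plausible reading of random_sample initialization is sketched below: each initial mean is the average of a fixed number of randomly chosen rows. Averaging is an assumption, since the text above only says how many points are sampled per mean, and random_sample_means is a hypothetical name.

```python
import random


def random_sample_means(data, k, samples, seed=42):
    """Build k initial means, each the average of `samples` random rows."""
    if not 0 < samples < len(data):
        raise ValueError("samples must be > 0 and < number of rows")
    rng = random.Random(seed)  # seeded so that runs are repeatable
    means = []
    for _ in range(k):
        chosen = rng.sample(data, samples)
        # Average the chosen rows column by column.
        means.append([sum(col) / samples for col in zip(*chosen)])
    return means
```

Using the same seed yields the same initial means, which is what makes a run deterministic.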

means_file = "file name" (depends on init_method)

If the file initialization method is chosen, this parameter specifies the file to load the means from.

annealing = [on, 'off']

Turns on annealing. If not specified, annealing is assumed to be off.

initial_temp = <x>

Starting temperature to run the annealer at.

schedule = <x>

Temperature schedule. The temperature starts at initial_temp and is multiplied by this number at every step. It must be in the range (0.0, 1.0), and should normally be in the high 0.90s.
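The cooling rule described above is a geometric schedule, which a few lines make concrete. The function name temperature_schedule and the num_steps argument are illustrative assumptions; how the temperature enters the EM update itself is not covered here.

```python
def temperature_schedule(initial_temp, schedule, num_steps):
    """Return the temperature used at each step of a geometric schedule."""
    if not 0.0 < schedule < 1.0:
        raise ValueError("schedule must lie strictly between 0.0 and 1.0")
    temps = []
    temp = initial_temp
    for _ in range(num_steps):
        temps.append(temp)
        temp *= schedule  # cool by a constant factor every step
    return temps
```

With a schedule of 0.95, the temperature after 100 steps is about 0.6% of initial_temp, whereas a schedule of 0.5 reaches roughly 0.1% after only 10 steps. A value in the high 0.90s therefore cools much more gradually.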

em_type = [scalar, 'diagonal']

Restricts the freedom of the covariance matrix calculations. If not specified, it is assumed to be diagonal.

Deprecated Parameters:

Not needed (set to constant values)

test_fraction train_fraction

Superseded by the parameter k

min_clusters max_clusters

Only applicable to MCCV runs, which are now handled by the MCCV.py wrapper

stepsize random_seed crossvalidation_runs crossvalidation_samples

Imported modules   
import Numeric
from compClust.mlx import ML_Algorithm
from compClust.mlx.datasets import Dataset
from compClust.mlx.labelings import Labeling
from compClust.mlx.models import MixtureOfDiagonalGaussians
import compClust.mlx.wrapper
from compClust.util import Verify, Usage, WrapperUtil
from compClust.util.TimeStampedPrintStream import TimeStampedPrintStream
import os
import re
import string
import sys
import tempfile
Classes   

DiagEM



This document was automatically generated on Wed Aug 27 14:25:04 2003 by HappyDoc version 2.1