Module: TSplit compClust/mlx/wrapper/TSplit.py

Usage: TSplit.py parameter_filename input_filename output_filename

Wrapper for the tsplit algorithm.

Note: The class labels will have the extension you specify on the command line, and the tsplit intermediate file, if saved, will have a .gtr extension.

Depends on the following environment variables: TSPLIT_COMMAND (e.g., /proj/cluster_gazing2/bin/tsplit)
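The usage line and the environment variable above can be combined into a small driver. The following is only a sketch: the helper names tsplit_binary and wrapper_argv are hypothetical and are not part of compClust, and the sketch only builds the argument list without running anything.

```python
import os
import sys


def tsplit_binary(environ=None):
    """Return the tsplit executable named by TSPLIT_COMMAND (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    path = environ.get("TSPLIT_COMMAND")
    if not path:
        raise EnvironmentError("TSPLIT_COMMAND is not set")
    return path


def wrapper_argv(parameter_filename, input_filename, output_filename):
    """Build an argument list that mirrors the Usage line (hypothetical helper)."""
    return [sys.executable, "TSplit.py",
            parameter_filename, input_filename, output_filename]
```

Checking TSPLIT_COMMAND before launching the wrapper gives a clearer error than a failure deep inside the run.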

Brief Algorithm Description:

Required Parameters: (note: the lists enclosed in brackets are the possible values each parameter can take)

distance_metric = [correlation, euclidean, Bhattacharyya]

Bhattacharyya: takes into account not only the difference between the two mean vectors, but also the distributions of the two groups of data points.
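To illustrate how a distance can depend on both the means and the spread of two groups, here is the standard Bhattacharyya distance between two Gaussians, restricted to diagonal covariances so that it needs only the standard library. This is a textbook form and an assumption on our part; the source does not say which variant tsplit implements, and the function name is hypothetical.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariance.

    mean1/mean2 are per-dimension means, var1/var2 per-dimension variances (> 0).
    """
    distance = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        pooled = (v1 + v2) / 2.0
        # Term driven by the difference between the two means.
        distance += (m1 - m2) ** 2 / (8.0 * pooled)
        # Term driven by the difference between the two spreads.
        distance += 0.5 * math.log(pooled / math.sqrt(v1 * v2))
    return distance
```

Two groups with identical means but different variances get a non-zero distance here, which a plain euclidean distance between the mean vectors would not show.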

agglomerate_method = [none, native, size, clusterNumber]

none - do not agglomerate, just generate the normal tsplit output files

native - use tsplit's built-in agglomeration to produce as close to K clusters as possible

size - perform a size-threshold agglomeration. Starting at the root, recurse through the tree, attempting to agglomerate at each node and stopping only when the number of genes in the agglomerated sub-tree is less than the parameter "size_threshold"

clusterNumber - return as close to K clusters as possible using the "size" agglomeration method to partition the tree

size/clusterNumber agglomeration is identical to the agglomeration used in xclust
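One reading of the size and clusterNumber descriptions above can be sketched on a toy binary tree. Everything here is an assumption made for illustration: the nested-tuple tree, the function names, and the threshold sweep used for clusterNumber are not taken from TreeAgglomerator or xclust.

```python
def leaves(node):
    """Return the gene names under a node; a leaf is a plain string."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def size_agglomerate(node, size_threshold):
    """Descend from the root; collapse a sub-tree into one cluster once it
    holds fewer than size_threshold genes (or is a single leaf)."""
    members = leaves(node)
    if isinstance(node, str) or len(members) < size_threshold:
        return [members]
    left, right = node
    return (size_agglomerate(left, size_threshold)
            + size_agglomerate(right, size_threshold))


def closest_to_k(tree, k):
    """Assumed clusterNumber strategy: try every threshold and keep the
    partition whose cluster count is closest to k."""
    total = len(leaves(tree))
    candidates = [size_agglomerate(tree, t) for t in range(1, total + 2)]
    return min(candidates, key=lambda clusters: abs(len(clusters) - k))


tree = ((("a", "b"), "c"), (("d", "e"), ("f", "g")))
clusters = size_agglomerate(tree, 4)
```

With a threshold of 4, the seven-gene toy tree splits at the root, the three-gene left branch collapses into one cluster, and the four-gene right branch splits once more into two pairs.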

splitting_method = [PCA, Best]

Selects either PCA splitting or best splitting. The energy parameter (see below) is required when PCA is selected.

k = <x>

where x is the target number of clusters

Optional / Dependent Parameters:

min_cluster_size = <x> (required for size and native agglomeration)

where x is the minimum number of genes that will appear in any given cluster.

energy = <x> (required if splitting_method = PCA)

where x is a number in the range (0, 100] indicating the quantity of energy to preserve at each node.
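The source does not define "energy". A common usage, assumed here, is the percentage of total variance retained by the leading principal components. Under that assumption, the sketch below counts how many components are needed to reach a given energy; the function name is hypothetical and this is not tsplit's code.

```python
def components_for_energy(eigenvalues, energy):
    """Smallest number of leading components whose eigenvalues sum to at
    least `energy` percent of the total (assumed meaning of energy)."""
    if not 0 < energy <= 100:
        raise ValueError("energy must be in the range (0, 100]")
    ordered = sorted(eigenvalues, reverse=True)
    target = sum(ordered) * energy / 100.0
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        # Small tolerance guards against floating-point round-off.
        if running >= target - 1e-12:
            return count
    return len(ordered)
```

The half-open range (0, 100] fits this reading: 100 keeps every component, while 0 would keep none and is therefore excluded.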

merge = [closest, prune]

Selects the method used to merge nodes back to the target number of clusters. Only applicable if agglomerate_method is native. closest is the default and merges the nodes that are closest according to the chosen distance_metric. prune simply merges sibling nodes together.

save_intermediate_files = [yes, no] (default no)

If you choose yes, the generated tsplit files (.gtr) will be saved. Otherwise they will be deleted.
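The required and dependent parameters listed above can be checked before a run. The sketch below works on a plain dictionary because the parameter file format is not described here. The function name, the case-sensitive matching of values, and the assumption that energy is already numeric are ours and not compClust's behaviour.

```python
ALLOWED = {
    "distance_metric": {"correlation", "euclidean", "Bhattacharyya"},
    "agglomerate_method": {"none", "native", "size", "clusterNumber"},
    "splitting_method": {"PCA", "Best"},
    "merge": {"closest", "prune"},
    "save_intermediate_files": {"yes", "no"},
}
REQUIRED = ("distance_metric", "agglomerate_method", "splitting_method", "k")
DEFAULTS = {"merge": "closest", "save_intermediate_files": "no"}


def check_parameters(params):
    """Return a list of problems; an empty list means the set looks valid."""
    merged = dict(DEFAULTS)
    merged.update(params)
    problems = []
    for name in REQUIRED:
        if name not in merged:
            problems.append("missing required parameter: " + name)
    for name, allowed in ALLOWED.items():
        if name in merged and merged[name] not in allowed:
            problems.append("bad value for %s: %r" % (name, merged[name]))
    # min_cluster_size is required for size and native agglomeration.
    if (merged.get("agglomerate_method") in ("size", "native")
            and "min_cluster_size" not in merged):
        problems.append("min_cluster_size is required for this agglomerate_method")
    # energy is required for PCA splitting and must lie in (0, 100].
    if merged.get("splitting_method") == "PCA":
        energy = merged.get("energy")
        if energy is None:
            problems.append("energy is required for PCA splitting")
        elif not 0 < energy <= 100:
            problems.append("energy must be in the range (0, 100]")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a parameter set at once.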

Imported modules

from compClust.mlx.ML_Algorithm import ML_Algorithm
from compClust.mlx.XClustTree import XClustTree
from compClust.mlx.labelings import Labeling
from compClust.mlx.models import DistanceFromMean
import compClust.mlx.wrapper
from compClust.mlx.wrapper.TreeAgglomerator import TreeAgglomerator
from compClust.util import Verify, WrapperUtil
from compClust.util.TimeStampedPrintStream import TimeStampedPrintStream
import os
import string
import sys
import tempfile

Classes

TSplit

This document was automatically generated on Wed Aug 27 14:25:04 2003 by HappyDoc version 2.1