CompClust Install

This contains installation instructions for CompClust.

First you will need to install the dependencies using either Debian Build Dependencies or Manual Dependencies. Once those are installed you may want to see how to do Algorithm Installation.

After all of the various dependencies are satisfied you can finally Install CompClust.

Debian Build Dependencies

To build and run compclust you'll need the following packages.

apt-get install python python-numeric python-numeric-ext ipython \
    python-scientific python-pyrex python-stats python-tk python-dev \
    python-profiler python-imaging python-imaging-tk python-rpy quixote \
    python-simpletal gcc g++ libc6-dev tk8.4-dev

NOTE: matplotlib Debian packages (python-matplotlib) can be installed by adding the following two lines to your /etc/apt/sources.list:

deb packages/
deb-src sources/

apt-get install python-simpletal python-twisted

You will need to be running against testing/unstable (as of December 2005) in order for this to work.

Manual Dependencies

You will need Python 2.3; we have done only minor testing with 2.4.

Next you will need Python Numeric and all of its extensions.

Download numpy from (Numarray is an upcoming version of numeric routines for python which we don't currently support.)

To build the link to some of the underlying C code you will also need pyrex and a distutils compatible compiler.

You will need and to run the pca analysis package which came from However as a convenience they're now included

For the PCA Extreme Gene analysis we now require rpy which then depends on R.

As a warning, RPy uses some magic to figure out which version of R to import, and this can go horribly wrong on at least some of the binary RPy installs I've tried. In that case I'd recommend installing R and building RPy from source.

The most current plot library uses matplotlib; we currently require at least version 0.84. Also, in my experience their API has been somewhat unstable, so newer versions have a higher than typical chance of not working.
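To confirm that an installed matplotlib meets the 0.84 minimum, a version comparison along these lines may help. This is only a sketch: the helper names parse_version and meets_minimum are made up for illustration and are not part of CompClust, and the code is written for a current Python interpreter, so the syntax may need adjusting for Python 2.3.

```python
import re


def parse_version(text):
    """Turn a version string such as '0.84' or '0.87.2' into a tuple of ints.

    Only the leading dotted-number part is used, so a suffix like
    '0.91.2svn' is read as (0, 91, 2).
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("no version number found in %r" % (text,))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import matplotlib
    except ImportError:
        print("matplotlib is not installed")
    else:
        found = matplotlib.__version__
        print("matplotlib %s ok: %s" % (found, meets_minimum(found, "0.84")))
```

Versions are compared as tuples of integers rather than as strings, so that 0.9 is not mistaken for something newer than 0.84.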

Scientific is used by IPlot for its histogram functionality and by compClust.visualize.SummaryViews for a least squares fit.

Throughout CompClust web we use IPython as our enhanced interactive shell.

It provides a number of amazing features for interactive use, which makes it required for any mode that uses an interpreter to interact with CompClust.

IPython can be downloaded from

To run the web interface you'll need to install the following packages:

Package      Source URL    Version
Quixote                    >= 2.0
SimpleTAL                  >= 3.12

Quixote currently supports several different web servers; compclust web has driver scripts for apache/scgi and the simple quixote server. serves using the simple web server built into Quixote 2.

Alternatively one can use the scgi interface to apache. This requires both apache and SCGI.

Once you have installed scgi you can either install mod_scgi, following their instructions, or use the cgi2scgi gateway script.

You will need to compile the example cgi2scgi code provided with the scgi package. You may need to change the PORT setting in the file to match whatever port you plan on launching the scgi server application on.
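When the PORT compiled into cgi2scgi and the port of the running scgi server disagree, requests simply fail, so it can be worth checking that something is actually listening on the port you chose. The following is a minimal sketch for a current Python interpreter; the function name port_is_listening and the port number 4000 are placeholders, not values taken from the scgi package.

```python
import socket


def port_is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, or unreachable: nothing usable is listening.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 4000 is a placeholder; use the PORT value compiled into cgi2scgi.
    print(port_is_listening("127.0.0.1", 4000))
```

This only confirms that a TCP listener exists on the port; it does not verify that the listener speaks SCGI.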

  • PIL (Python Imaging Library)

If you want to be able to download the plots in the web version in formats other than PNG, you'll need to install PIL.

We've gone through a couple of different plotting backends. The first version of IPlot (still available as compClust.iplot) used Pmw and BLT, though we strongly deprecate this package in favor of the newer matplotlib-based version in compClust.iplot.

Python megawidgets (Pmw) wraps the core BLT graph widget for IPlot. It can be found at

For Debian: apt-get install blt python-pmw

For manual installations, Tk should have come with your installation of Python; however, you may also need to add BLT for its graph widget.

Also for completeness, some archaic pyMLX code that predates IPlot uses gracePlot if you find yourself interested in that code. gracePlot is available from
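Once the dependencies in this section are installed, a quick import check can show which ones Python still cannot find. The sketch below is not part of CompClust; the import names in CANDIDATE_MODULES are assumptions based on the package names above and may need adjusting for your installation.

```python
def find_missing(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names are assumptions; adjust them to match your installation.
CANDIDATE_MODULES = [
    "Numeric",      # Python Numeric
    "Pyrex",        # pyrex
    "rpy",          # RPy
    "matplotlib",
    "Scientific",   # Scientific
    "IPython",
    "quixote",      # Quixote
    "simpletal",    # SimpleTAL
]

if __name__ == "__main__":
    for name in find_missing(CANDIDATE_MODULES):
        print("missing: %s" % name)
```

Only ImportError is caught, so a module that is present but fails while loading (for example RPy when it picks the wrong R) will still raise its own error and point at the real problem.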

Algorithm Installation

Because of licensing issues we can't distribute the source to the command line C code that we use.

We provide binaries for our EM and KMeans clustering algorithms for platforms that we use at our web page

For some people we can distribute the underlying C code; if you have compclust/src you're in luck and have the full source tree. We hope to clean up our licensing issues and break our dependency on the non-free Numerical Recipes code so we can redistribute the C source code as well, but we can't do it yet.

Currently I don't autodetect platform type for building the C code.

You MUST use a version of GNU make; other lesser makes will not cut it.
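If you are unsure which make is on your PATH, something like the following can tell you before the build fails. It is a sketch for a current Python interpreter, and it assumes that GNU make prints a first line beginning with "GNU Make" when run with --version; the function names and the gmake alternative are illustrative, not part of the CompClust build.

```python
import subprocess


def is_gnu_make(version_text):
    """Return True if the text looks like the output of GNU make --version."""
    lines = version_text.splitlines()
    return bool(lines) and lines[0].startswith("GNU Make")


def make_is_gnu(command="make"):
    """Run '<command> --version' and report whether it is GNU make."""
    try:
        result = subprocess.run(
            [command, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        return False  # the command is not installed or not on PATH
    return is_gnu_make(result.stdout)


if __name__ == "__main__":
    for name in ("make", "gmake"):
        print("%s: GNU make = %s" % (name, make_is_gnu(name)))
```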

Some of the underlying algorithm code can use MPI; however, the python code isn't currently taking advantage of this. If you would like to build a parallel version you will need to install MPI and update the variable MPICH_INCLUDE to point to it. The current C make system will skip MPI if it's not available.

You may also need to change BINEXT, OBJEXT, LIBEXT, LIBOBJEXT, and SHLIBEXT if your platform is not a straightforward Linux-like Unix variant.

Once you've made these changes you should be able to do

$ python setup.py install

(which should also build all of the binary code). If not, you should be able to just type make in the compclust/src directory.

There are two packages that we can't redistribute.

We used the matlab dependent SOM toolbox from the Helsinki University of Technology for our SOM implementation.

We also used a modified version of XCluster, which was based on Gavin Sherlock's implementation.

For the binaries find a convenient location to store them.

For Xcluster you'll need to follow their build instructions.

This is being obsoleted. Currently the kmeans and diagem algorithms expect to find the kmeans and diagem binaries in either the source tree or a properly installed distribution of compclust.

<deprecated> The python code needs to know the location of these components, so we used environment variables to specify the paths to external programs. For instance, the variable DIAGEM_COMMAND is used by the DiagEM wrapper to find the diagem executable.

Wrapper     Required Environment variables
DiagEM      DIAGEM_COMMAND=<path to diagem>
HutSOM      SOM_TOOLBOX_HOME=<path to somtoolbox directory>
KMeans      KMEANS_COMMAND=<path to kmeans>
KMedians    KMEDIANS_COMMAND=<path to kmedians>
WMatch      WMATCH_COMMAND=<path to wmatch>
XClust      XCLUST_COMMAND=<path to xclust>
</deprecated>
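If you still rely on the deprecated environment variables listed above, a small check such as this one can report which of them are unset or point at the wrong thing. The variable names are the ones listed above; everything else (the check_environment function, the status strings) is made up for this sketch and is not how the CompClust wrappers themselves look the paths up.

```python
import os

# Variable names come from the list above; each should hold a path.
COMMAND_VARIABLES = [
    "DIAGEM_COMMAND",
    "KMEANS_COMMAND",
    "KMEDIANS_COMMAND",
    "WMATCH_COMMAND",
    "XCLUST_COMMAND",
]
DIRECTORY_VARIABLES = ["SOM_TOOLBOX_HOME"]


def check_environment(environ=None):
    """Return a dict mapping each variable name to a short status string."""
    if environ is None:
        environ = os.environ
    report = {}
    for name in COMMAND_VARIABLES:
        path = environ.get(name)
        if not path:
            report[name] = "unset"
        elif os.path.isfile(path) and os.access(path, os.X_OK):
            report[name] = "ok"
        else:
            report[name] = "not an executable file"
    for name in DIRECTORY_VARIABLES:
        path = environ.get(name)
        if not path:
            report[name] = "unset"
        elif os.path.isdir(path):
            report[name] = "ok"
        else:
            report[name] = "not a directory"
    return report


if __name__ == "__main__":
    for name, status in sorted(check_environment().items()):
        print("%-18s %s" % (name, status))
```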

Install CompClust

Once you've finished installing all of the dependencies using their fine instructions, you'll need to install compClust. Thanks to the wonders of distutils, this should be relatively easy.

If it's not easy the DistUtils user manuals at might be useful.

To install the python code it boils down to running:

unzip # extract the downloaded archive
cd compclust # change to the subdirectory

python setup.py install

or if you'd like you can avoid all the automated testing with

python setup.py install --notests

As a special feature for developers, it is possible to build the pyrex C interface so that it is linked into the standard source tree, which lets one modify the python code without having to constantly do new distutils builds.

To do this run

python setup.py build --inplace