The Wold Lab

Caltech Biology|Bioinformatics Lab

California Institute of Technology

Cistematic Download Page


Version: 20051129


This page lets you download the different components of Cistematic. Cistematic consists of Python code, gene databases, and genomic sequences. To ease the ramp-up needed to use our code, we have included everything necessary to run Cistematic on this page.

Note that, by design, we assume proficiency with the command line and with editing Python scripts; these are not unreasonable requirements in return for genome-wide searches and analyses across dozens of animal genomes. A web version of Cistematic for more casual use is under development, but is not part of this release.

Cistematic Pre-requisites

Required

Cistematic currently runs on Linux and Macintosh; it also runs under Cygwin on Microsoft Windows. In addition to Python, the current version of Cistematic depends heavily on sqlite and its Python interface, pysqlite. You will therefore need:

  • Any Unix-compatible OS, such as Linux or Mac OS X 10.3-10.4
  • Python 2.3 or later
  • sqlite 3 (already installed on Mac OS X 10.4)
  • pysqlite 1.1.X (not pysqlite 1.0.X or 2.X)
  • disk space: 1-40 GB, depending on the size and number of genomes downloaded below
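
The requirements above can be expressed as a few small checks. This is a minimal sketch and not part of Cistematic: the function names are hypothetical, the code is written for a current Python 3 interpreter rather than the Python 2.3 targeted by this release, and the pysqlite version is passed in as a string you supply, since how pysqlite reports its own version is not assumed here.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info):
    """True if the interpreter is Python 2.3 or later."""
    return tuple(version_info[:2]) >= (2, 3)


def pysqlite_ok(version):
    """True only for the pysqlite 1.1.X series (not 1.0.X or 2.X)."""
    parts = version.split(".")
    return parts[:2] == ["1", "1"]


def free_gb(path="."):
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def disk_ok(path=".", needed_gb=1):
    """True if at least `needed_gb` GB are free (the page quotes 1-40 GB)."""
    return free_gb(path) >= needed_gb
```

For example, pysqlite_ok("1.1.6") is true while pysqlite_ok("2.0.5") is false, and disk_ok("/", needed_gb=40) checks against the upper end of the disk-space range quoted above.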

Optional (Can be installed later)

In addition to the requirements listed above, three optional packages will help you get the most out of Cistematic and the example code for the upcoming paper. psyco, which only runs on Intel 32-bit CPUs, gives an approximately 9-fold speedup when running Cistematic code and is highly recommended if it is available for your platform. Matplotlib was used to generate the figures in the paper and is therefore also recommended. compClust is needed to repeat the clustering that we did of the GNF dataset. The actual version numbers are:

Cistematic Code, Programs, and Database

The gzipped tarballs are the core of Cistematic. To install, simply extract these files in your root directory; this creates a new /proj/genome directory under which all of Cistematic's code and genomes will live. To be more precise:

The program binaries included above are for NCBI Blast and paircomp. Users on systems that are not binary compatible may want to simply download both, compile them, and install them in /proj/genome/programs, along with a /proj/genome/programs/workdir directory.
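
For users who compile the programs themselves, the directory layout named above can be prepared with a short script. This is only a sketch: the function name is hypothetical, it is written for a current Python 3 interpreter, and it creates the two directories mentioned in the text without copying any binaries into them.

```python
import os


def ensure_program_dirs(root="/proj/genome"):
    """Create <root>/programs and <root>/programs/workdir if missing.

    Returns the two paths so the caller can copy binaries into place.
    """
    programs = os.path.join(root, "programs")
    workdir = os.path.join(programs, "workdir")
    # makedirs also creates the intermediate "programs" directory
    os.makedirs(workdir, exist_ok=True)
    return programs, workdir
```

Calling ensure_program_dirs() with no argument targets /proj/genome, which normally requires write access to the root directory; pass a different root to try it out somewhere else first.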

Cistematic Genomes

While Cistematic can be used as is and supplied with arbitrary sequences to analyze (using "generic" Genomes), most users will want to download one or more of the genomes below. Note that it is also possible to build the genome releases below from scratch, as documented in the code found in cistematic.genomes.* . Most of the genomes are very large: each compressed mammalian genome is 700-1000 MB and takes about 4x that space when uncompressed. The available genomes are:


Last updated: Nov. 30th, 2005 by Ali