Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq

Ali Mortazavi, Brian Williams, Kenneth McCue, Lorian Schaeffer, Barbara Wold


This page holds the underlying data and code for the analysis in the paper above, which was published in Nature Methods in 2008. While the paper focuses on mouse tissues, we have since used the same code on C. elegans and human cell lines with great success.

Beware: the behavior of Bowtie 0.9.9.2 has changed when using '--best -k' with respect to mapping unique reads and multireads, and it is now incompatible with makerdsfrombowtie.py. Please use Bowtie 0.9.9.1 or earlier until this is fixed! Sorry for the inconvenience.
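If you drive the mapping step from a script, a small guard can catch an incompatible Bowtie before makerdsfrombowtie.py is run. The following is only a minimal sketch: the helper names parse_version and bowtie_is_compatible are hypothetical and not part of ERANGE, and you have to supply the Bowtie version string yourself. It encodes nothing more than the warning above (0.9.9.1 or earlier is fine) and would need updating once the incompatibility is fixed.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.9.9.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Last Bowtie release that the warning above reports as working with
# makerdsfrombowtie.py. This constant is an assumption taken from that note.
LAST_COMPATIBLE = parse_version("0.9.9.1")


def bowtie_is_compatible(version_text):
    """Return True if the given Bowtie version is 0.9.9.1 or earlier.

    Hypothetical helper, not part of ERANGE or Bowtie.
    """
    return parse_version(version_text) <= LAST_COMPATIBLE
```

For example, bowtie_is_compatible("0.9.9.2") evaluates to False, so a pipeline script could stop at that point and ask for an older Bowtie.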

New: ERANGE 3.1

ERANGE3.1 is now released. This is the version we are using in the Wold lab for both ChIP-seq and RNA-seq analyses. Some of the new supported features are:

The current version is: ERANGE3.1.tgz, which was released on 2009/04/17.

The following READMEs constitute the bulk of the documentation for ERANGE:


You are highly encouraged to use the following pipeline scripts rather than the individual commands for RNA-seq:

Please note that ERANGE3.X is a major departure from the bed-based formats used in ERANGE2.0 and requires re-importing read mappings into SQLite-based read datasets (RDS). Even so, we suggest that you run v3.X rather than v2.X for production purposes.

Dual-use ERANGE (version 2.1)

ERANGE is our Python package for RNA-seq and ChIP-seq analysis (hence the "dual-use"), and is a descendant of the ChIPSeq mini peak finder (Johnson, 2007). In particular, the RNA-seq analysis uses some of the very same code to access Cistematic. Version 2.0 is the first version released in the wild and is "Bed"-centric. In particular, it is not optimized for speed!

Note that ERANGE has the following requirements: Python 2.5; Linux or Mac OS X (preferably with the Python Psyco compiler); and Cistematic 2.X (all scripts with a command line genome specification rely on Cistematic!), which you can get here.

If you want to rerun our entire analysis starting with either the raw data (eland files) or the bed files, you will need the following files:

The Mouse Reference data

Briefly, each tissue has two replicates, the second of which was done with spike-ins, as described in the paper. For each replicate we provide:

Brain 1 (no spike)

Brain 2 (spike)

Liver 1 (no spike)

Liver 2 (spike)

Muscle 1 (no spike)

Muscle 2 (spike)


Last Modified: 2008/12/02 by Ali Mortazavi