Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq

Ali Mortazavi, Brian Williams, Kenneth McCue, Lorian Schaeffer, Barbara Wold


This page provides the underlying data and code for the analysis in the paper above, which was published in Nature Methods in 2008. Although the paper focuses on mouse tissues, we have since used the same code successfully on C. elegans and human cell lines.

If you are using Bowtie 0.10.X, please make sure to use the new '--strata' flag so that multireads are handled correctly. Note that ERANGE is not compatible with Bowtie 0.9.9.X.
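As an illustration of the flag combination, here is a minimal Python sketch that only builds (and does not run) a Bowtie 0.10.X command line. It assumes Bowtie's documented behavior that '--strata' must be paired with '--best' and that '-k' sets how many alignments are reported per read; the value 10 and the index and file names are hypothetical, not the settings used in our pipeline.

```python
import shlex


def bowtie_command(index, reads, output, max_alignments=10):
    """Build a Bowtie 0.10.X argument list that reports multireads.

    -k N      report up to N valid alignments per read (N is illustrative)
    --best    guarantee that reported alignments are best in terms of stratum
    --strata  report only alignments in the best stratum (requires --best)
    """
    return [
        "bowtie",
        "-k", str(max_alignments),
        "--best",
        "--strata",
        index,   # Bowtie index basename (hypothetical)
        reads,   # reads file (hypothetical)
        output,  # alignment output file (hypothetical)
    ]


if __name__ == "__main__":
    cmd = bowtie_command("mm9", "reads.fq", "reads.bowtie")
    print(shlex.join(cmd))
```

The argument list can be passed to subprocess.run as-is; building it as a list rather than a string avoids shell-quoting problems with file names.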

ERANGE Development Edition

ERANGE is now available through Git. Starting with ERANGE 4.0, all future releases will be distributed through a Git repository, and development snapshots will be uploaded periodically for anyone interested. A development alpha of ERANGE 4.0 is available now from the Woldlab Gitweb portal.

Developers wishing to create a clone of the repository can do so using:

git clone git://woldlab.caltech.edu/erange.git

Experimental BAM support

A development version of ERANGE is available from the Git repository. This version includes a full rewrite of ReadDataset.py that uses BAM files instead of the earlier RDS files. This build is still in testing, but should now be stable enough for limited use.

Important: Discontinue use of ERANGE version 3.2.1

An error has recently been found in ERANGE version 3.2.1 that causes findall.py to return too many peaks and to report an FDR that is too high. In addition, gene counts are returned as zero, although the RPKM values are correct.

We recommend downloading and using ERANGE 4.0a from the Git repository above. This version has been tested on several of our datasets, and its results agree with those of the previous (v3.2) release.

ERANGE 4.0a

New Features and Functions

ERANGE supports configuration files

Cistematic integration

ERANGE supports optparse

Improved package topology
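The feature list does not say how configuration files and optparse work together. One common pattern, shown here as a minimal hypothetical sketch, is to read values from a config file and use them as the defaults of an optparse parser, so that explicit command-line flags override them. The section name "example", the options --minreads and --strict, and the file names are invented for illustration and are not ERANGE's actual settings.

```python
import configparser
from optparse import OptionParser

# Hypothetical config file contents; section and option names are illustrative only.
EXAMPLE_CONFIG = """
[example]
minreads = 4
strict = no
"""


def make_parser(config_text):
    """Build an OptionParser whose defaults are taken from a config file."""
    config = configparser.ConfigParser()
    config.read_string(config_text)

    parser = OptionParser(usage="%prog [options] infile")
    parser.add_option("--minreads", type="int", dest="minReads")
    parser.add_option("--strict", action="store_true", dest="strict")

    # Config-file values become defaults; explicit command-line flags win.
    parser.set_defaults(
        minReads=config.getint("example", "minreads", fallback=2),
        strict=config.getboolean("example", "strict", fallback=False),
    )
    return parser


if __name__ == "__main__":
    parser = make_parser(EXAMPLE_CONFIG)
    options, args = parser.parse_args(["--minreads", "10", "reads.rds"])
    print(options.minReads, options.strict, args)
```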

The following READMEs constitute the bulk of the documentation for ERANGE:

For RNA-seq, we strongly encourage you to use the following pipeline scripts rather than the individual commands:

Please note that ERANGE 3.X is a major departure from the BED-based formats used in ERANGE 2.0 and requires re-importing read mappings into SQLite-based read datasets (RDS). Even so, we recommend running v3.X rather than v2.X for production purposes.
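To illustrate what moving from flat BED files to a SQLite-backed dataset involves, here is a minimal sketch using Python's standard sqlite3 module. The table layout is hypothetical and far simpler than a real RDS file; it only shows BED-style read mappings being loaded into an indexed table and then counted over a genomic region.

```python
import sqlite3

# Hypothetical BED-style mappings: chrom, start (0-based), end, name, score, strand.
BED_LINES = [
    "chr1\t100\t125\tread1\t0\t+",
    "chr1\t180\t205\tread2\t0\t-",
    "chr2\t50\t75\tread3\t0\t+",
]


def import_bed(conn, lines):
    """Load BED lines into a simple reads table (illustrative schema only)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads "
        "(chrom TEXT, start INTEGER, stop INTEGER, name TEXT, strand TEXT)"
    )
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        rows.append((fields[0], int(fields[1]), int(fields[2]), fields[3], fields[5]))
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?)", rows)
    # An index on position makes region queries fast on large datasets.
    conn.execute("CREATE INDEX IF NOT EXISTS reads_pos ON reads (chrom, start)")
    conn.commit()


def count_reads(conn, chrom, start, stop):
    """Count reads on chrom whose start coordinate lies in [start, stop)."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM reads WHERE chrom = ? AND start >= ? AND start < ?",
        (chrom, start, stop),
    ).fetchone()
    return n


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    import_bed(conn, BED_LINES)
    print(count_reads(conn, "chr1", 0, 200))
```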

Interim ERANGE 3.3 release

An interim build of ERANGE that supports analysis with self-organizing maps is available as ERANGE3.3.tgz. This functionality will be incorporated into ERANGE 4.0 shortly, but is being released early as an interim build. To use it, you will also need an updated version of Cistematic's analyzego.py, which should replace the copy in your $CISTEMATIC_ROOT/cistematic/stat/ directory to support the Bonferroni map correction.
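The Bonferroni correction mentioned above is a standard multiple-testing adjustment. The sketch below shows its textbook form (multiply each p-value by the number of tests and cap the result at 1); it is a generic illustration, not the code in Cistematic's analyzego.py, and the p-values are made up.

```python
def bonferroni(pvalues, alpha=0.05):
    """Apply a Bonferroni correction to a list of p-values.

    Each p-value is multiplied by the number of tests m and capped at 1.0.
    A test is called significant if its adjusted p-value is <= alpha.
    """
    m = len(pvalues)
    adjusted = [min(1.0, p * m) for p in pvalues]
    significant = [q <= alpha for q in adjusted]
    return adjusted, significant


if __name__ == "__main__":
    # Hypothetical p-values from four GO-term tests.
    adjusted, significant = bonferroni([0.001, 0.02, 0.3, 0.5])
    print(adjusted, significant)
```

Bonferroni is conservative: with many tests, only very small raw p-values survive the adjustment.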

Dual-use E-RANGE

E-RANGE is our Python package for RNA-seq and ChIP-seq analysis (hence "dual-use"); it descends from the ChIPSeq mini peak finder (Johnson, 2007).

To use it for RNA-seq, first go through the RNA-seq README, then read the file analysisSteps.txt and take a look at the pipeline shell script runStandardAnalysis.sh.

Note that E-RANGE requires Python 2.5, Linux or Mac OS X (preferably with the Psyco Python compiler), and Cistematic 2.0, which must be installed separately. All scripts that take a genome specification on the command line rely on Cistematic.

To rerun our entire analysis starting from either the raw data (ELAND files) or the BED files, you will need the following files:

To use it for ChIP-seq, follow the instructions in README.chip-seq to create an RDS file, and then run the peak finder script findall.py.

The Mouse Reference data

Briefly, each tissue has two replicates, the second of which was done with spike-ins, as described in the paper. For each replicate, we provide the files listed in the table below.

Tissue Table

Spike-In   Tissue  wig         beds.tgz    rpkms.tgz   comb.eland2.gz  bigbed.tgz
No         Brain   mm9Brain    mm9Brain1   mm9Brain1   mm9Brain1       mm9Brain1
No         Liver   mm9Liver    mm9Liver1   mm9Liver1   mm9Liver1       mm9Liver1
No         Muscle  mm9Muscle   mm9Muscle1  mm9Muscle1  mm9Muscle1      mm9Muscle1
Yes        Brain   mm9Brain2   mm9Brain2   mm9Brain2   mm9Brain2       mm9Brain2
Yes        Liver   mm9Liver2   mm9Liver2   mm9Liver2   mm9Liver2       mm9Liver2
Yes        Muscle  mm9Muscle2  mm9Muscle2  mm9Muscle2  mm9Muscle2      mm9Muscle2
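The rpkms.tgz files report expression as RPKM, reads per kilobase of exon model per million mapped reads, the measure defined in the paper. The sketch below shows the basic arithmetic, RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to a gene's exons, N the total number of mapped reads, and L the summed exon length in base pairs. It is a plain count-based illustration and does not model ERANGE's allocation of multireads or its handling of splice reads; the numbers are hypothetical.

```python
def rpkm(gene_reads, total_mapped_reads, exon_length_bp):
    """Reads per kilobase of exon model per million mapped reads.

    gene_reads         -- reads assigned to the gene's exons (C)
    total_mapped_reads -- all mapped reads in the experiment (N)
    exon_length_bp     -- summed exon length of the gene model in bp (L)
    """
    if total_mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("total mapped reads and exon length must be positive")
    # 10^9 = 10^3 (per kilobase) * 10^6 (per million mapped reads)
    return 1e9 * gene_reads / (total_mapped_reads * exon_length_bp)


if __name__ == "__main__":
    # 100 reads on a 2 kb exon model out of 10 million mapped reads.
    print(rpkm(100, 10_000_000, 2000))
```

In this example, 100 reads are 10 per million mapped reads, spread over 2 kb, which gives 5 RPKM.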

Help

For assistance with ERANGE, please contact Sean Upchurch (sau AT caltech.edu).


Last Modified: 7 Jun 2011 by Sean Upchurch

WoldlabWiki: RNASeq (last edited 2012-07-02 17:21:16 by sau)