fail silently.
-3. MAKING THE NECESSARY INPUT (RDS) FILES
-
-You will want to first convert your read mappings to the
-native ERANGE read store. Please see the file
-README.build-rds for instructions on how to do this.
-
-Build an RDS file for both the ChIP, and if available and
-appropriate, the control. Note that we *HIGHLY* recommend
-the use of a matched control sample to account for some
-of the general background artifacts that can be present
-in ChIP-seq samples (e.g. DNAse hypersensitivity,
-assembly collapse of some sattelite repeats, etc....).
+3. MAKING THE NECESSARY INPUT FILES
+
+ERANGE uses BAM format files, but a couple of
+modifications need to be made to the header and to the
+individual entries. The Python script bamPreprocessing.py
+does the following:
+1. Count the reads by type and write these counts to the
+header as comments.
+2. Verify that every read has a value in the NH tag or add
+it if needed.
+3. Optionally annotate the reads with the geneID using the
+ZG tag.
+
+Note that we *HIGHLY* recommend the use of a matched
+control sample to account for some of the general
+background artifacts that can be present in ChIP-seq
+samples (e.g. DNase hypersensitivity, assembly collapse
+of some satellite repeats, etc.).
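The preprocessing steps listed above can be illustrated with a minimal pure-Python sketch that works on SAM text lines. It is not bamPreprocessing.py itself. The sketch makes three assumptions. First, the reads are single-end. Second, "type" means unique versus multi-mapped. Third, every alignment of a read is present, so the alignment count equals NH. The `@CO` comment wording (`uniqueReads:N`, `multiReads:N`) is hypothetical. The optional ZG annotation step is not shown.

```python
from collections import Counter


def preprocess_sam(lines):
    """Hypothetical sketch: add missing NH tags and record read counts as @CO lines."""
    header = [l.rstrip("\n") for l in lines if l.startswith("@")]
    records = [l.rstrip("\n").split("\t") for l in lines if not l.startswith("@")]

    # Alignments per read name. This assumes single-end reads and that every
    # alignment of a read is in the file, so the count equals the NH value.
    hits = Counter(fields[0] for fields in records)

    counts = Counter()
    seen = set()
    body = []
    for fields in records:
        name = fields[0]
        # Optional tags start at column 12 (index 11) in SAM.
        nh_tags = [t for t in fields[11:] if t.startswith("NH:i:")]
        if nh_tags:
            nh = int(nh_tags[0][len("NH:i:"):])
        else:
            nh = hits[name]
            fields = fields + [f"NH:i:{nh}"]
        if name not in seen:  # count each read once, not once per alignment
            seen.add(name)
            counts["unique" if nh == 1 else "multi"] += 1
        body.append("\t".join(fields))

    # Hypothetical comment format for the read counts.
    comments = [f"@CO\t{kind}Reads:{n}" for kind, n in sorted(counts.items())]
    return header + comments + body
```

In practice a BAM-aware library would do this on the binary file. The text version only keeps the logic of steps 1 and 2 easy to follow.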
4. WEIGHING MULTIREADS
(a) is the default in the current release of ERANGE.
Simply proceed to RUNNING THE PEAK FINDER for (a) and
-(a). You can ignore multireads (b) by using the -nomulti
+(a). You can ignore multireads (b) by using the --nomulti
flag with findall.py. For (c), use weighMultireads.py
-to weigh multireads based on a unique reads in the
+to weigh multireads based on the unique reads in the
respective radius of each potential location. Once run,
To run the peak finder without read shifting, use the
following command:
-python $ERANGEPATH/findall.py label chip.rds chip.regions.txt -control control.rds -listPeak -revbackground
+python $ERANGEPATH/findall.py label chip.rds chip.regions.txt --control control.rds --listPeak --revbackground
which will run the peak finder on chip.rds / control.rds ,
store the enriched region coordinates in chip.regions.txt,
You will *NEED* to change some of the default parameters
-if working in smaller genomes (e.g. use smaller -spacing),
+if working in smaller genomes (e.g. use smaller --spacing),
if working with certain types of IPs such as histones and
-polymerases (test with and without -notrim and
--nodirectionality), if working with rather weak IPs
-(e.g. -minimum and -ratio), or if working with larger
+polymerases (test with and without --notrim and
+--nodirectionality), if working with rather weak IPs
+(e.g. --minimum and --ratio), or if working with larger
fragment sizes (see the paragraph below discussing read
shifting).
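Testing with and without --notrim and --nodirectionality means running four combinations. The sketch below builds those command lines from the example findall.py invocation. It uses only flags named in this README. The output file naming scheme is hypothetical, and the script path falls back to the current directory when ERANGEPATH is unset.

```python
import itertools
import os


def findall_variants(label, chip, control, out_prefix):
    """Build one findall.py command per --notrim/--nodirectionality combination."""
    script = os.path.join(os.environ.get("ERANGEPATH", "."), "findall.py")
    commands = []
    for notrim, nodir in itertools.product((False, True), repeat=2):
        flags = []
        if notrim:
            flags.append("--notrim")
        if nodir:
            flags.append("--nodirectionality")
        # Hypothetical naming: e.g. chip.notrim.regions.txt for one regions file per variant.
        tag = "".join(f.lstrip("-") + "." for f in flags)
        outfile = f"{out_prefix}.{tag}regions.txt"
        commands.append(["python", script, label, chip, outfile,
                         "--control", control, "--listPeak", "--revbackground",
                         *flags])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The resulting region files can then be compared side by side.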
findall.py returns a per-peak p-value. By default, this
is calculated using a Poisson distribution of peak RPMs
-(or counts, if using -raw) for each chromosome in the IP.
+(or counts, if using --raw) for each chromosome in the IP.
P-value calculations can be turned off using
-'-pvalue none '. Alternatively, the p-value can be
+'--pvalue none '. Alternatively, the p-value can be
calculated from the background using the option
-'-pvalue back ', which must be combined with the option
--revbackground.
+'--pvalue back ', which must be combined with the option
+--revbackground.
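The default per-peak p-value is based on a Poisson distribution for each chromosome. The sketch below shows how such a tail probability is computed. It is not findall.py's exact code. It assumes raw integer counts, as with --raw. It also assumes, hypothetically, that the Poisson mean is the average peak count on the chromosome. The upper tail P(X >= k) is summed directly when k exceeds the mean, to keep precision for small p-values.

```python
import math


def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The upper tail is roughly one half or more here, so 1 - CDF(k - 1) is accurate.
        return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))
    # For k > lam the terms shrink from P(X = k) onward; sum until negligible.
    term, total, i = poisson_pmf(k, lam), 0.0, k
    while term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return total


def peak_pvalues(peak_counts):
    """Hypothetical: score the peaks of one chromosome against their mean count."""
    lam = sum(peak_counts) / len(peak_counts)
    return [poisson_upper_tail(c, lam) for c in peak_counts]
```

RPM values are not integers. They would need a different treatment, which this sketch does not attempt.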
By default, findall.py does not try to adjust the location
of the reads based on half the size of the expected fragment
length (the "shift"). If you believe that you need to shift
your peaks, findall.py can try to pick the best shift based
on the best shift for strong sites using the parameter
-'-shift learn '. You can also either manually specify a
-shift value using '-shift #bp ' or ou can calculate a
-"best shift" for each region using '-autoshift'. If you
+'--shift learn '. You can also either manually specify a
+shift value using '--shift #bp ' or you can calculate a
+"best shift" for each region using '--autoshift'. If you
-need to using the shift options, the recommended usage is:
+need to use the shift options, the recommended usage is:
-(i) first run findall.py with '-shift learn ', which will
+(i) first run findall.py with '--shift learn ', which will
-peak a shift if there are at least 30 regions that meet
+pick a shift if there are at least 30 regions that meet
its training criteria.
(ii) if (i) couldn't pick a shift, run findall.py with
--autoshift and -reportshift
+--autoshift and --reportshift
(iii) look at the mode (most common #) for the shift
-(iv) rerun findall.py with -shift #bp where #bp is the mode
+(iv) rerun findall.py with --shift #bp where #bp is the mode
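Steps (iii) and (iv) take the most common per-region shift and pass it back to findall.py. A minimal sketch follows. It assumes the shifts from --reportshift have already been extracted into a list of integers; the parsing is not shown, because the report layout is not described here. It also assumes --shift accepts a plain number of bp. The tie-breaking rule is an arbitrary choice.

```python
from collections import Counter


def modal_shift(shifts):
    """Return the most common shift value (the mode)."""
    counts = Counter(shifts)
    top = max(counts.values())
    # Break ties toward the smaller shift; this rule is arbitrary.
    return min(shift for shift, n in counts.items() if n == top)


reported = [75, 80, 75, 90, 80, 75]  # hypothetical per-region shifts
shift_args = ["--shift", str(modal_shift(reported))]
```

Here `shift_args` would be appended to the findall.py command line for the rerun in step (iv).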
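-If you are storing the RDS files on an network-mounted
+If you are storing the RDS files on a network-mounted
-directory, make sure to use '-cache XXXXX' to enable
+directory, make sure to use '--cache XXXXX' to enable
-local caching, where is as large as appropriate as
+local caching, where XXXXX is as large as appropriate as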
described in section 9 of README.build-rds .
RELEASE HISTORY
+version 3.2 November 2010 - updated command line options
version 3.1 February 2009 - support for read shifting
version 3.0 February 2009 - support for UCSC narrowPeak format in regiontobed.py
version 3.0rc1 December 2008 - added parameter to control peak-trimming