X-Git-Url: http://woldlab.caltech.edu/gitweb/?p=erange.git;a=blobdiff_plain;f=docs%2FREADME.chip-seq;fp=docs%2FREADME.chip-seq;h=ea7b0a347661d59a40e838cd666156f9c9b8bfe4;hp=846a441a434f8e3cff4506108f27126855a2fb47;hb=4ad5495359e4322da39868020a7398676261679e;hpb=cfc5602b26323ad2365295145e3f6c622d912eb4

diff --git a/docs/README.chip-seq b/docs/README.chip-seq
index 846a441..ea7b0a3 100644
--- a/docs/README.chip-seq
+++ b/docs/README.chip-seq
@@ -57,18 +57,24 @@ options are case sensitive and that they could well
 fail silently.
 
 
-3. MAKING THE NECESSARY INPUT (RDS) FILES
-
-You will want to first convert your read mappings to the 
-native ERANGE read store. Please see the file 
-README.build-rds for instructions on how to do this.
-
-Build an RDS file for both the ChIP, and if available and 
-appropriate, the control. Note that we *HIGHLY* recommend 
-the use of a matched control sample to account for some 
-of the general background artifacts that can be present 
-in ChIP-seq samples (e.g. DNAse hypersensitivity, 
-assembly collapse of some sattelite repeats, etc....). 
+3. MAKING THE NECESSARY INPUT FILES
+
+ERANGE uses BAM format files, but a few modifications
+need to be made to the header and to individual entries.
+The Python script bamPreprocessing.py will do the
+following:
+1. Count the reads by type and write these counts to the
+header as comments.
+2. Verify that every read has a value in the NH tag or add
+it if needed.
+3. Optionally annotate the reads with the geneID using the
+ZG tag.
+
+Note that we *HIGHLY* recommend the use of a matched
+control sample to account for some of the general
+background artifacts that can be present in ChIP-seq
+samples (e.g. DNase hypersensitivity, assembly collapse
+of some satellite repeats, etc.).
 
 
 4. WEIGHING MULTIREADS