convert standard analysis pipelines to use bam format natively

diff --git a/docs/README.chip-seq b/docs/README.chip-seq

index 846a441a434f8e3cff4506108f27126855a2fb47..ea7b0a347661d59a40e838cd666156f9c9b8bfe4 100644
--- a/docs/README.chip-seq
+++ b/docs/README.chip-seq
@@ -57,18 +57,24 @@ options are case sensitive and that they could well
  fail silently.
  
  
-3. MAKING THE NECESSARY INPUT (RDS) FILES
-
-You will want to first convert your read mappings to the 
-native ERANGE read store. Please see the file 
-README.build-rds for instructions on how to do this.
-
-Build an RDS file for both the ChIP, and if available and 
-appropriate, the control. Note that we *HIGHLY* recommend 
-the use of a matched control sample to account for some 
-of the general background artifacts that can be present 
-in ChIP-seq samples (e.g. DNAse hypersensitivity, 
-assembly collapse of some sattelite repeats, etc....). 
+3. MAKING THE NECESSARY INPUT FILES
+
+ERANGE uses BAM format files, but a few modifications
+must first be made to the header and to the individual
+read entries. The Python script bamPreprocessing.py
+does the following:
+1. Counts the reads by type and writes these counts to
+the header as comments.
+2. Verifies that every read has a value in the NH tag,
+adding the tag if needed.
+3. Optionally annotates the reads with the gene ID using
+the ZG tag.
+
+Prepare a BAM file for the ChIP sample and, if available,
+for a matched control. We *HIGHLY* recommend a control to
+account for general background artifacts that can be
+present in ChIP-seq samples (e.g. DNase hypersensitivity,
+assembly collapse of some satellite repeats, etc.).
  
  
  4. WEIGHING MULTIREADS
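To illustrate steps 1 and 2 of the preprocessing described in section 3, here is a minimal Python sketch. It is not bamPreprocessing.py and does not read BAM directly. Instead it works on SAM text lines (for example the output of `samtools view -h`), assumes single-end reads, and uses a hypothetical unique/multi split as the read "types" together with a hypothetical `@CO<TAB>unique=N` comment format. The real script's categories and header layout may differ.

```python
from collections import Counter


def nh_value(fields):
    """Return the integer NH tag value of a split SAM record, or None."""
    for field in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if field.startswith("NH:i:"):
            return int(field[5:])
    return None


def preprocess_sam(lines):
    """Hypothetical sketch of steps 1 and 2 on SAM text lines (not bamPreprocessing.py)."""
    header = [l.rstrip("\n") for l in lines if l.startswith("@")]
    records = [l.rstrip("\n").split("\t") for l in lines
               if l.strip() and not l.startswith("@")]

    def unmapped(r):
        return (int(r[1]) & 0x4) != 0  # FLAG bit 0x4: segment unmapped

    # Assumes single-end data: every mapped line sharing a QNAME is
    # another alignment of the same read.
    hits = Counter(r[0] for r in records if not unmapped(r))

    # Step 2: add NH:i:<hits> to mapped records that lack an NH tag.
    for r in records:
        if not unmapped(r) and nh_value(r) is None:
            r.append(f"NH:i:{hits[r[0]]}")

    # Step 1: count each mapped read once, using a hypothetical unique/multi split.
    counts = Counter()
    seen = set()
    for r in records:
        if unmapped(r) or r[0] in seen:
            continue
        seen.add(r[0])
        counts["unique" if nh_value(r) == 1 else "multi"] += 1

    # Append @CO comments after the existing header so that @HD stays first.
    comments = [f"@CO\t{kind}={counts[kind]}" for kind in ("unique", "multi")]
    return header + comments + ["\t".join(r) for r in records]
```

Deriving NH from the number of lines per read name only holds for single-end data. For paired-end BAM files the count would have to be made per mate, which a real tool would handle separately.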