X-Git-Url: http://woldlab.caltech.edu/gitweb/?p=erange.git;a=blobdiff_plain;f=docs%2FREADME.chip-seq;fp=docs%2FREADME.chip-seq;h=ea7b0a347661d59a40e838cd666156f9c9b8bfe4;hp=846a441a434f8e3cff4506108f27126855a2fb47;hb=4ad5495359e4322da39868020a7398676261679e;hpb=cfc5602b26323ad2365295145e3f6c622d912eb4

diff --git a/docs/README.chip-seq b/docs/README.chip-seq
index 846a441..ea7b0a3 100644
--- a/docs/README.chip-seq
+++ b/docs/README.chip-seq
@@ -57,18 +57,24 @@ options are case sensitive and that they could well fail
 silently.
 
 
-3. MAKING THE NECESSARY INPUT (RDS) FILES
-
-You will want to first convert your read mappings to the
-native ERANGE read store. Please see the file
-README.build-rds for instructions on how to do this.
-
-Build an RDS file for both the ChIP, and if available and
-appropriate, the control. Note that we *HIGHLY* recommend
-the use of a matched control sample to account for some
-of the general background artifacts that can be present
-in ChIP-seq samples (e.g. DNAse hypersensitivity,
-assembly collapse of some sattelite repeats, etc....).
+3. MAKING THE NECESSARY INPUT FILES
+
+Erange uses BAM format files, but there are a couple of
+modifications that need to be made to the header and
+individual entries. The python script bamPreprocessing.py
+will do the following:
+1. Count the reads by type and write these counts to the
+header as comments.
+2. Verify that every read has a value in the NH tag or add
+it if needed.
+3. Optionally annotate the reads with the geneID using the
+ZG flag
+
+Note that we *HIGHLY* recommend the use of a matched
+control sample to account for some of the general
+background artifacts that can be present in ChIP-seq
+samples (e.g. DNAse hypersensitivity, assembly collapse
+of some sattelite repeats, etc....).
 
 
 4. WEIGHING MULTIREADS
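
The first two preprocessing steps described in the new section 3 text (counting reads by type into header comments, and ensuring every read carries an NH tag) can be illustrated with a small sketch. This is not the code of bamPreprocessing.py. It works on SAM-format text lines rather than a BAM file so that it needs only the standard library, and several details are assumptions made for illustration: the function name `preprocess_sam_lines`, the three read types (unique, multi, splice), the `type:count` comment layout, and the rule that a missing NH value is set to the number of records sharing the read name. The optional ZG geneID annotation is not covered.

```python
from collections import Counter


def preprocess_sam_lines(lines):
    """Fill in missing NH tags and prepend read counts as @CO header comments.

    Hypothetical helper: `lines` is a list of SAM text lines without
    trailing newlines. Returns a new list of lines.
    """
    header = [ln for ln in lines if ln.startswith("@")]
    records = [ln.split("\t") for ln in lines if ln and not ln.startswith("@")]

    # Assumption: a read with no NH tag gets the number of alignment
    # records that share its read name.
    hits_per_name = Counter(fields[0] for fields in records)

    counts = Counter()
    body = []
    for fields in records:
        # Optional tags start after the 11 mandatory SAM columns.
        nh = next((int(t[5:]) for t in fields[11:] if t.startswith("NH:i:")), None)
        if nh is None:
            nh = hits_per_name[fields[0]]
            fields = fields + ["NH:i:%d" % nh]

        # Assumed classification: NH > 1 is a multiread; otherwise an 'N'
        # (skipped region) in the CIGAR string marks a spliced read.
        if nh > 1:
            kind = "multi"
        elif "N" in fields[5]:
            kind = "splice"
        else:
            kind = "unique"
        counts[kind] += 1
        body.append("\t".join(fields))

    # @CO is the SAM header record type for free-text comments; the
    # "type:count" layout used here is illustrative only.
    comments = ["@CO\t%s:%d" % (k, counts[k]) for k in ("unique", "multi", "splice")]
    return header + comments + body
```

Note that this sketch counts alignment records, so a multiread with two alignments contributes two to the multi total; whether the real script counts records or distinct reads is not stated in the README text above.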