Comment: added basic template for running bowtie with multiple threads
A Quick Word on Files
Condor can work with files that live on the NFS server (castor, loxcyc, rattus) as well as files local to the execute host. If you plan to work with a ton of small files or a handful of large files, feel free to use the NFS server as the source for your files. If you have a bunch of large files to process, you'll likely be better off telling Condor to transfer the files to the execute host before executing your job. Not only will you get better performance, everyone else will still be able to use the NFS server, allowing you to save face at the same time. Trust me, I speak from experience.
To transfer files to the execute host, use the following directives:
{{{
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = /full/path/to/infile1,/full/path/to/infile2,...
}}}
This can be done globally by placing these directives at the top of the recipe, or on a per-job basis by placing them before each "queue" directive.
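As a rough sketch of the two placements described above: the first two directives sit at the top and apply to every job, while transfer_input_files is set again before each "queue". The executable path and the arguments are placeholders, not part of the original recipe. The sketch also assumes the transferred inputs land in the job's scratch directory on the execute host, which is why the arguments refer to the bare file names.

{{{
universe = vanilla
# placeholder program, not from the original recipe
executable = /full/path/to/program

# global: applies to every job queued below
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

# per-job: set before each queue
transfer_input_files = /full/path/to/infile1
arguments = infile1
queue

transfer_input_files = /full/path/to/infile2
arguments = infile2
queue
}}}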
Bowtie Template
{{{
universe=vanilla
environment="BOWTIE_INDEXES=/proj/genome/programs/bowtie-0.12.1/indexes OUTDIR=/full/path/to/output"
# $(OUTDIR) below is a submit-file macro, not an environment variable, so define it here as well
OUTDIR=/full/path/to/output
executable=/proj/genome/programs/bowtie-0.12.1/bowtie
arguments=hg19sp75spike -v 2 -k 11 -m 10 --best --strata -p 4 -q $(OUTDIR)/1184_1_1.fastq --un $(OUTDIR)/1184_1_1.unmapped.fa --max $(OUTDIR)/1185_1_1.repeat.fa $(OUTDIR)/1184_1_1.bowtie.txt
log=bowtie.$(Process).log
output=bowtie.$(Process).out
error=bowtie.$(Process).err
request_cpus = 4
request_memory = 8000
request_disk = 0
queue
}}}
It's important to set the "request_cpus" variable to match the -p option to bowtie. It's also probably a good idea to set "request_memory" to a more realistic value. It is given in megabytes, so 8000 is almost 8 Gigs.
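One way to keep the two values from drifting apart is to define the thread count once as a submit-file macro and reference it in both places. This is only a sketch: the macro name "threads" and the shortened bowtie arguments are made up for illustration and are not part of the template above.

{{{
universe = vanilla
# hypothetical macro name; set the thread count in one place
threads = 4
executable = /proj/genome/programs/bowtie-0.12.1/bowtie
# shortened, illustrative arguments; the paths are placeholders
arguments = hg19sp75spike -p $(threads) -q /full/path/to/reads.fastq /full/path/to/reads.bowtie.txt
request_cpus = $(threads)
request_memory = 8000
queue
}}}

Changing "threads" then updates both the -p option and the CPU request.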
ERANGE Template
{{{
universe=vanilla
environment="PYTHONPATH=/path/to/cistematic/root CISTEMATIC_ROOT=/path/to/cistematic/root ERANGEPATH=/path/to/erange"
# $(ERANGEPATH) below is a submit-file macro, not an environment variable, so define it here as well
ERANGEPATH=/path/to/erange
executable=/bin/sh
log=rrpa.$(Process).log
output=rrpa.$(Process).out
error=rrpa.$(Process).err
getenv = true
arguments = $(ERANGEPATH)/doc/runRNAPairedAnalysis.sh hsapiens 1184_1_1 /proj/genome/gbdb/hg19/hg19.rmsk.db
queue
}}}
Tophat Template
(note: you will need to change the paths)
{{{
universe=vanilla
environment="PATH=$PATH:/woldlab/glusterfs/data/bowtie-0.12.5:/woldlab/glusterfs/data/tophat-1.0.14/bin BOWTIE_INDEXES=/woldlab/glusterfs/data/bowtie-0.12.5/indexes/"
executable=/usr/bin/python
log=tophat.$(Process).log
output=tophat.$(Process).out
error=tophat.$(Process).err
request_cpus = 4
request_memory = 8000
request_disk = 0

arguments="/woldlab/glusterfs/data/tophat-1.0.14/bin/tophat -o /woldlab/glusterfs/data/hamrhein/condor/20100916/HUVEC-WC-PolyA-010WC+-r147-std54 -p 4 -r 147 --mate-std-dev 54 hg19-male /woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_2_1.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_3_1.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_4_1.txt.75mers.fastq /woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_2_2.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_3_2.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_4_2.txt.75mers.fastq"
queue

arguments="/woldlab/glusterfs/data/tophat-1.0.14/bin/tophat -o /woldlab/glusterfs/data/hamrhein/condor/20100916/HeLaS3-WC-PolyA-011WC+-r41-std92 -p 4 -r 41 --mate-std-dev 92 hg19-female /woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_4_1.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_5_1.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_6_1.txt.75mers.fastq /woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_4_2.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_5_2.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_6_2.txt.75mers.fastq"
queue
}}}