Differences between revisions 14 and 15
Revision 14 as of 2014-07-15 21:34:23
Size: 4409
Editor: diane
Comment:
Revision 15 as of 2015-04-13 22:29:59
Size: 3471
Editor: hamrhein
Comment: Replaced Tophat section with functional example
Line 56: Line 56:
  (note: you will need to change the paths)
  (setting PATH with environment replaces your current PATH, so you can't access your current path with $PATH. instead of setting the environment you can tell condor to reuse your current environment with getenv=True )
  (also you can break long argument lines at spaces by ending the line with \ )
  '''note:s'''
 * You will need to fill-in all the variables
 * Setting PATH with environment replaces your current PATH, so you can't access your current path with $PATH.
 * Instead of setting the environment you can tell condor to reuse your current environment with getenv=True
 * Also you can break long argument lines at spaces by ending the line with \
Line 62: Line 64:
environment="PATH=/bin:/usr/bin/:/woldlab/glusterfs/data/bowtie-0.12.5:/woldlab/glusterfs/data/tophat-1.0.14/bin BOWTIE_INDEXES=/woldlab/glusterfs/data/bowtie-0.12.5/indexes/"
environment = "PATH=/bin:/usr/bin:/usr/local/bin:/woldlab/castor/proj/genome/programs/${BOWTIE_DIR}"
Line 65: Line 69:
log=tophat.$(Process).log
output=tophat.$(Process).out
error=tophat.$(Process).err
error = ${BASE_DIR}/logs/tophat.$(Process).err
output = ${BASE_DIR}/logs/tophat.$(Process).out
log = ${BASE_DIR}/logs/tophat.$(Process).log
Line 69: Line 73:
request_cpus = 4
request_memory = 8000
request_cpus = 6
request_memory = 4000
Line 73: Line 77:
arguments="/woldlab/glusterfs/data/tophat-1.0.14/bin/tophat \
          -o /woldlab/glusterfs/data/hamrhein/condor/20100916/HUVEC-WC-PolyA-010WC+-r147-std54 \
          -p 4 -r 147 --mate-std-dev 54 hg19-male /woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_2_1.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_3_1.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_4_1.txt.75mers.fastq /woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_2_2.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_3_2.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/010WC+/LID8464_FC61LTAAAXX_4_2.txt.75mers.fastq"
queue
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Line 78: Line 80:
arguments="/woldlab/glusterfs/data/tophat-1.0.14/bin/tophat -o /woldlab/glusterfs/data/hamrhein/condor/20100916/HeLaS3-WC-PolyA-011WC+-r41-std92 -p 4 -r 41 --mate-std-dev 92 hg19-female /woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_4_1.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_5_1.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_6_1.txt.75mers.fastq /woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_4_2.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_5_2.txt.75mers.fastq,/woldlab/glusterfs/data/ENCODE_CSHL/011WC+/LID16633_FC61U2UAAXX_6_2.txt.75mers.fastq"
transfer_input_files = ${ALL_FASTQ_FILES}
transfer_output_files = XFER_${LIB}/accepted_hits.bam
transfer_output_remaps = "accepted_hits.bam = ${OUTPUT_DIR}/accepted_hits.bam"

arguments = "${TOPHAT_DIR}/tophat --bowtie1 -o XFER_${LIB} -p 4 -G ${GENES_DIR}/${GTF_FILE} --transcriptome-index ${GENES_DIR}/ --no-novel-juncs --library-type fr-unstranded ${INDEX_DIR}/${GENOME_BASE} ${FASTQ_FILES_1} ${FASTQ_FILES_2}"

A Quick Word on Files

Condor can work with files that live on the NFS server (castor, loxcyc, rattus) as well as files local to the execute host. If you plan to work with a ton of small files or only a handful of large files, feel free to use the NFS server as the source for your files. If you have a bunch of large files to process, you'll likely be better off telling Condor to transfer them to the execute host before your job runs. Not only will you get better performance, but everyone else will still be able to use the NFS server, which lets you save face at the same time. Trust me, I speak from experience.

To transfer files to the execute host, use the following directives:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = /full/path/to/infile1,/full/path/to/infile2,...

This can be done globally by placing these directives at the top of the recipe, or on a per-job basis by placing them before each "queue" directive.
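
As a rough illustration of the two placements, the sketch below puts the transfer switches at the top so they apply to every job, and sets transfer_input_files separately before each "queue". The executable, file names, and log names are placeholders chosen for the example, not part of any lab recipe.

```
universe=vanilla
executable=/bin/cat

# Global: applies to every job queued below
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

log=cat.$(Process).log
output=cat.$(Process).out
error=cat.$(Process).err

# Per-job: set just before each queue directive
transfer_input_files = /full/path/to/infile1
arguments = infile1
queue

transfer_input_files = /full/path/to/infile2
arguments = infile2
queue
```

Transferred input files land in the job's working directory on the execute host, which is why the arguments refer to them by file name only.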

Bowtie Template

universe=vanilla

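# OUTDIR is defined as a submit macro so that $(OUTDIR) expands in "arguments";
# values set only in "environment" are visible to the running job, not to condor_submit.
OUTDIR=/full/path/to/output

environment="BOWTIE_INDEXES=/proj/genome/programs/bowtie-0.12.1/indexes OUTDIR=$(OUTDIR)"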

executable=/proj/genome/programs/bowtie-0.12.1/bowtie
arguments=hg19sp75spike -v 2 -k 11 -m 10 --best --strata -p 4 -q $(OUTDIR)/1184_1_1.fastq --un $(OUTDIR)/1184_1_1.unmapped.fa --max $(OUTDIR)/1184_1_1.repeat.fa $(OUTDIR)/1184_1_1.bowtie.txt

log=bowtie.$(Process).log
output=bowtie.$(Process).out
error=bowtie.$(Process).err

request_cpus = 4
request_memory = 8000
request_disk = 0

queue

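It's important to set "request_cpus" to match the -p option given to bowtie, so that Condor reserves as many cores as bowtie will actually use. It's also a good idea to set "request_memory" to what your job really needs rather than copying the template value. The value is in megabytes, so 8000 is almost 8 gigabytes.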
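
One way to keep the two values from drifting apart is to write the thread count once as a submit macro and reuse it. The following is a minimal sketch under that assumption: NCPUS and OUTDIR are illustrative macro names, the argument list is shortened, and the memory figure is a placeholder rather than a measured requirement.

```
universe=vanilla

# NCPUS and OUTDIR are illustrative submit macros, not part of the original template
NCPUS=4
OUTDIR=/full/path/to/output

environment="BOWTIE_INDEXES=/proj/genome/programs/bowtie-0.12.1/indexes"

executable=/proj/genome/programs/bowtie-0.12.1/bowtie
arguments=hg19sp75spike -p $(NCPUS) -q $(OUTDIR)/1184_1_1.fastq $(OUTDIR)/1184_1_1.bowtie.txt

log=bowtie.$(Process).log
output=bowtie.$(Process).out
error=bowtie.$(Process).err

# The same macro feeds both bowtie's -p option and the slot request
request_cpus = $(NCPUS)
# request_memory is in megabytes; 2000 is a placeholder, not a measured value
request_memory = 2000

queue
```

Changing NCPUS in one place then updates both the bowtie thread count and the number of cores requested from Condor.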

ERANGE Template

universe=vanilla

environment="PYTHONPATH=/path/to/cistematic/root CISTEMATIC_ROOT=/path/to/cistematic/root ERANGEPATH=/path/to/erange"

executable=/bin/sh

log=rrpa.$(Process).log
output=rrpa.$(Process).out
error=rrpa.$(Process).err

getenv = true

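# ERANGEPATH must also be defined as a submit macro for $(ERANGEPATH) to expand in "arguments".
ERANGEPATH=/path/to/erange
arguments = $(ERANGEPATH)/doc/runRNAPairedAnalysis.sh hsapiens 1184_1_1 /proj/genome/gbdb/hg19/hg19.rmsk.db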
queue

Tophat Template

  • Notes:

  • You will need to fill in all of the ${...} variables.
  • Setting PATH with environment replaces your current PATH, so you can't refer to your current path with $PATH.
  • Instead of setting the environment, you can tell Condor to reuse your current environment with getenv=True.
  • You can also break long argument lines at spaces by ending the line with \

universe=vanilla

environment = "PATH=/bin:/usr/bin:/usr/local/bin:/woldlab/castor/proj/genome/programs/${BOWTIE_DIR}"

executable=/usr/bin/python

error = ${BASE_DIR}/logs/tophat.$(Process).err
output = ${BASE_DIR}/logs/tophat.$(Process).out
log = ${BASE_DIR}/logs/tophat.$(Process).log

request_cpus = 6
request_memory = 4000
request_disk = 0

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

transfer_input_files = ${ALL_FASTQ_FILES}
transfer_output_files = XFER_${LIB}/accepted_hits.bam
transfer_output_remaps = "accepted_hits.bam = ${OUTPUT_DIR}/accepted_hits.bam"

arguments = "${TOPHAT_DIR}/tophat --bowtie1 -o XFER_${LIB} -p 4 -G ${GENES_DIR}/${GTF_FILE} --transcriptome-index ${GENES_DIR}/ --no-novel-juncs --library-type fr-unstranded ${INDEX_DIR}/${GENOME_BASE} ${FASTQ_FILES_1} ${FASTQ_FILES_2}"
queue
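
The notes on getenv and line continuation can be tried out with a small throwaway job before editing the full template. This is only a sketch: it assumes /usr/bin/which exists on the execute host, and the program names and log file names are placeholders.

```
universe=vanilla

# Reuse the submitting shell's environment (including PATH) instead of setting it by hand
getenv = True

executable=/usr/bin/which

# A long argument line broken at a space by ending the line with a backslash
arguments = "bowtie tophat \
             python"

log=which.$(Process).log
output=which.$(Process).out
error=which.$(Process).err

queue
```

The output file should then list where each program was found on the PATH inherited from your shell, which is a quick way to see what the job's environment looks like.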

WoldlabWiki: Condor/Templates (last edited 2015-04-13 22:38:12 by hamrhein)