Differences between revisions 7 and 8
Revision 7 as of 2012-03-20 21:44:50
Size: 2287
Editor: diane
Comment:
Revision 8 as of 2014-01-10 21:47:20
Size: 2443
Editor: diane
Comment:

Condor

There are two fundamental steps to using condor:

  1. Writing a condor submit script
  2. Submitting it with condor_submit

You should run condor_submit my_script.condor on pongo.cacr.caltech.edu.

The other most useful Condor commands are:

  • condor_status - see what machines are running jobs
  • condor_q - see the current list of jobs in the queue (waiting or running)
  • condor_rm - remove a job from the queue
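
If you would rather drive these commands from a script than type them by hand, the following is a minimal sketch of one way to do it. The helper names condor_command and run_condor are made up for this page and are not part of Condor; the sketch only assumes the command-line tools listed above are on your PATH on the submit host.

```python
import shutil
import subprocess


def condor_command(tool, *args):
    """Build the argument list for a Condor command-line tool.

    condor_command is a hypothetical helper for this sketch, not a Condor API.
    """
    return [tool, *[str(a) for a in args]]


def run_condor(tool, *args):
    """Run a Condor tool and return its standard output as text.

    Raises FileNotFoundError if the tool is not on PATH, which is what
    you would see on a machine that does not have Condor installed.
    """
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    result = subprocess.run(
        condor_command(tool, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example use on the submit host (not executed here):
#   print(run_condor("condor_submit", "my_script.condor"))
#   print(run_condor("condor_q"))
```

Keeping the command construction separate from the call that runs it makes it easy to check what would be executed before anything is sent to the queue.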

Writing a script

There's a fair amount of boilerplate text that goes into a condor script. Here's close to the simplest example:

universe=vanilla
executable=/usr/bin/python
output=script.output
error=script.output
log=script.log

arguments=script.py --do_that_thing
queue

The important parts are:

  • executable - what program you want to run.
  • arguments - the options to pass to the executable
  • queue - tell condor to combine the program in executable with the arguments to create something runnable.

The log file records where the program is running and whether it aborted for some reason. The output file contains the standard output from the program, and the error file contains the standard error. If you list the same file for both, the result looks as if you had run the program on a terminal, with normal output and error output mixed together.

You can list arguments / queue pairs multiple times; this tells condor that there are multiple "processes" that you want to have run.
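
When there are many arguments / queue pairs it can be easier to generate the submit script than to write it by hand. Here is a rough sketch, assuming the same layout as the simple example above. The function name make_submit_text and the --do_other_thing option are invented for illustration. One deliberate change from the simple example: the default output name uses Condor's $(Process) macro so each process writes to its own file rather than sharing one.

```python
def make_submit_text(executable, argument_lists,
                     output="script.$(Process).output",
                     log="script.log"):
    """Return submit-script text with one arguments/queue pair per job.

    make_submit_text is a hypothetical helper, not part of Condor.
    $(Process) is expanded by Condor to the process number (0, 1, ...),
    so each queued process gets a separate output file.
    """
    lines = [
        "universe=vanilla",
        f"executable={executable}",
        f"output={output}",
        f"error={output}",  # same file for both, as in the simple example
        f"log={log}",
        "",
    ]
    for arguments in argument_lists:
        lines.append(f"arguments={arguments}")
        lines.append("queue")
        lines.append("")
    return "\n".join(lines)


# Two processes in one submission; --do_other_thing is a made-up option.
submit_text = make_submit_text(
    "/usr/bin/python",
    ["script.py --do_that_thing", "script.py --do_other_thing"],
)
```

Writing submit_text to a file such as my_script.condor gives you something you can hand to condor_submit.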

More Information

The condor user documentation is at http://research.cs.wisc.edu/htcondor/manual/v7.8/index.html

There is also a tutorial presentation (.ppt) and videos from the 2008 Condor Week presentations.

If you have a multi-threaded application (like bowtie) or an application that starts subprocesses (like tophat), you'll need to tell condor how many cpus you expect to use by adding request_cpus=N to the submit script.

There is an example condor submit script, with a simple python process that shows how to use multiple cpus, in multicpu (http://mus.caltech.edu/~diane/condor/multicpu/).
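
This is not the multicpu example itself, just a small sketch of the idea: the number of cpus you request from condor should match the number of threads or subprocesses your program actually starts. The function name multicpu_submit_text, the script name multicpu.py and the --threads option are all assumptions made for illustration; use whatever option your own program takes. request_cpus is the submit-script setting that tells condor how many cpus the job needs.

```python
def multicpu_submit_text(executable, arguments, cpus, thread_flag="--threads"):
    """Return submit-script text for one job that uses several cpus.

    Hypothetical helper: the same cpu count is written to request_cpus
    and passed to the program, so the two cannot drift apart.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return "\n".join([
        "universe=vanilla",
        f"executable={executable}",
        # thread_flag is an assumed option name for the program being run
        f"arguments={arguments} {thread_flag} {cpus}",
        f"request_cpus={cpus}",
        "output=script.output",
        "error=script.output",
        "log=script.log",
        "queue",
        "",
    ])


# A job that asks condor for 4 cpus and tells the script to use 4 threads.
multicpu_text = multicpu_submit_text("/usr/bin/python", "multicpu.py", 4)
```

If the program starts more workers than request_cpus allows for, it will compete with other jobs on the same machine, so it is worth generating both numbers from one value.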


"Distributed computing in practice: the Condor experience" is a paper describing the history and goals of the Condor project.

WoldlabWiki: Condor (last edited 2014-07-15 21:33:54 by diane)