htsworkflow.git
Ref: merge-proposal
Diane Trout [Tue, 17 Feb 2009 22:45:51 +0000 (22:45 +0000)]
remove cruft from an unimplemented feature

Diane Trout [Sat, 14 Feb 2009 00:33:23 +0000 (00:33 +0000)]
remove all caltech pipeline specific code

Diane Trout [Sat, 14 Feb 2009 00:23:45 +0000 (00:23 +0000)]
make a branch to discuss what our merged front end should look like

Diane Trout [Fri, 13 Feb 2009 23:48:18 +0000 (23:48 +0000)]
Reverse a mistake. DataRuns is from Rami and should use Rami's field names

Diane Trout [Fri, 13 Feb 2009 23:35:08 +0000 (23:35 +0000)]
enable sort of library view by descending library_id

Diane Trout [Fri, 13 Feb 2009 21:41:43 +0000 (21:41 +0000)]
save longer comment describing change

Diane Trout [Fri, 13 Feb 2009 01:59:42 +0000 (01:59 +0000)]
Apparently some more recent version of multi-eland stopped writing the -
when there were too many hits, and just uses fewer fields instead.

Diane Trout [Fri, 13 Feb 2009 01:51:58 +0000 (01:51 +0000)]
Merge in the library list, detail, and results downloading feature from
the Caltech live site.

There are several components in the frontend tree to render the pages.
In addition, this adds some helper functions in pipelines.eland
to simplify computing summary statistics for an eland lane.

I also needed to merge in a generator based makebed code for
returning the files to the django database.

To use this, the settings file in this branch will need a variable
RESULT_HOME_DIR to be set.

Diane Trout [Fri, 13 Feb 2009 01:42:06 +0000 (01:42 +0000)]
use the compression handling auto-opener for our eland files

Diane Trout [Thu, 12 Feb 2009 22:38:09 +0000 (22:38 +0000)]
make our API docstrings more epydoc friendly

Diane Trout [Thu, 12 Feb 2009 22:37:08 +0000 (22:37 +0000)]
Add load_pipeline_run_xml, a little function that feeds the xml file into
ElementTree and grabs the useful root
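A minimal sketch of such a helper, assuming it accepts a path or an open file object and returns the root element. The real function's signature is not shown in this log, and the `PipelineRun` tag used in the usage below is a hypothetical example.

```python
from xml.etree import ElementTree


def load_pipeline_run_xml(source):
    """Parse a run XML file (path or file object) and return its root element."""
    tree = ElementTree.parse(source)
    return tree.getroot()
```

For example, `load_pipeline_run_xml("run_example.xml").tag` would give the name of the document's top-level element.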

Diane Trout [Thu, 5 Feb 2009 00:06:39 +0000 (00:06 +0000)]
Drop 'using %s as cwd' down to just debug level.
It was getting too annoying watching it scroll by constantly.

Diane Trout [Fri, 30 Jan 2009 20:47:35 +0000 (20:47 +0000)]
Extend command line configuration parsing and add config file parsing
for finding the location of our database and sequence archive directories.

Diane Trout [Fri, 30 Jan 2009 02:15:57 +0000 (02:15 +0000)]
Try to make runfolder results extraction more robust.
If an IPAR or firecrest directory is missing some of the important
matrix files, which implies there isn't actually a valid run present,
this patch will (hopefully) issue a warning and skip that analysis
run.

I also added an option to scripts/runfolder to allow a user to specify
where the extracted results should go.

One questionable thing is that for one analysis some of the lanes
were run as sequence and not as an eland analysis, so where I expected
all the lanes to have an eland genome, these don't.
I hope that the code doesn't lose the index after serializing and
deserializing that chunk example.

Diane Trout [Fri, 30 Jan 2009 01:51:50 +0000 (01:51 +0000)]
Update to not hard code the config file name and the error message
for when we don't find it

Diane Trout [Sat, 24 Jan 2009 00:24:18 +0000 (00:24 +0000)]
insert code to do ~ home directory expansion
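In Python, ~ expansion is usually done with `os.path.expanduser`. A minimal sketch (the helper name `expand_path` is hypothetical):

```python
import os


def expand_path(pathname):
    """Expand a leading ~ or ~user and return an absolute, normalized path."""
    return os.path.abspath(os.path.expanduser(pathname))
```

Applying `abspath` afterwards means relative paths from a config file also come out absolute.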

Diane Trout [Fri, 23 Jan 2009 02:23:21 +0000 (02:23 +0000)]
Add in Rami's report template, and adjust the paths to use "reports" instead
of "htsw_reports"

Diane Trout [Fri, 23 Jan 2009 02:21:09 +0000 (02:21 +0000)]
Add id as an AutoField(primary_key=True) field and remove the pk from
library_id.

Stanford decided to use library_id as a text field so they could use
library IDs like "SL100". Caltech just used the raw sql id, so the
foreign key reference in experiments_flowcells was expecting a numeric
id, but since the model had the text field as the primary key things
didn't work.

Diane Trout [Wed, 21 Jan 2009 02:50:22 +0000 (02:50 +0000)]
Merge in Rami's changes from last Friday.

Most of the admin pages work, though there's a wsgi error with the reports.
I'll try to figure it out tomorrow.

The biggest difference between the Caltech trunk and Stanford schemas right now
is that Caltech is using made_for as a foreign key, and Stanford is using it
as a text field.

Diane Trout [Wed, 14 Jan 2009 01:18:42 +0000 (01:18 +0000)]
add some testing code for the runner daemon

Diane Trout [Wed, 14 Jan 2009 01:17:16 +0000 (01:17 +0000)]
add empty admin.py for eland_config app

Diane Trout [Wed, 14 Jan 2009 01:12:47 +0000 (01:12 +0000)]
Merged much of the stanford htsworkflow frontend into trunk.
Updated to be compatible with Django 1.0

A big change for 1.0 compatibility is that the Admin class that was
attached to models was moved into a separate file, admin.py.

I probably munged some of the fieldset formatting in the conversion process.

Diane Trout [Thu, 8 Jan 2009 20:12:03 +0000 (20:12 +0000)]
This is a partial merge of the stanford branch with the caltech branch of
the web application. It doesn't work correctly yet: the libraries admin page
is broken, and lacks the ability to browse the 'made_for' column.

This is based on a merge that started a few months ago, but I hadn't finished
the merge. I'll need to check for more updates from their branch soon.

During the process I decided it would be a good idea to update to django 1.0,
which is going to make things even more unstable, so I thought I should
check in this work in progress before continuing.

Diane Trout [Tue, 6 Jan 2009 02:05:10 +0000 (02:05 +0000)]
Look in Temp directories for some of the files we have historically
used for our summary reports.

Version 1.1rc1 of the gapipeline started moving some of the files
into /Temp subdirectories of bustard and gerald.

Diane Trout [Wed, 24 Dec 2008 23:39:31 +0000 (23:39 +0000)]
Handle paired-end eland files.

This required changing the ELAND class to hold a list of dictionaries,
instead of its previous implementation, where it was exporting an internal
dictionary of the lanes.

I decided to directly show the internal list and to remove the previous
dictionary methods to make it more obvious when code was expecting
the previous behavior.

Also a saved runfolder will now have eland files of the form
s_<lane id>_<end id>.

Internally the end is 0 or 1, though I tried to make the display show 1 or 2 for
the user's benefit.
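The 0-based/1-based end convention described above can be sketched as two small helpers. Both names are hypothetical. The sketch also assumes the saved filename uses the 1-based end number shown to users, because the log does not say which form the file names use.

```python
def display_end(end_index):
    """Map the internal 0/1 end index to the 1/2 shown to users."""
    if end_index not in (0, 1):
        raise ValueError("paired-end index must be 0 or 1, got %r" % (end_index,))
    return end_index + 1


def eland_filename(lane_id, end_index):
    """Build an s_<lane id>_<end id> name (assumes the 1-based end is used)."""
    return "s_%d_%d" % (lane_id, display_end(end_index))
```

Keeping the conversion in one function means code that indexes the internal list of per-end dictionaries never has to subtract 1 itself.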

Diane Trout [Wed, 24 Dec 2008 23:34:23 +0000 (23:34 +0000)]
remove a debug print statement

Diane Trout [Wed, 24 Dec 2008 23:33:51 +0000 (23:33 +0000)]
Add test cases for alphanum sort

Diane Trout [Wed, 24 Dec 2008 23:33:14 +0000 (23:33 +0000)]
Support sorting numbers along with the alphanumeric strings

Also, I cleaned up the indentation a bit.
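A common way to sort strings that mix text and numbers ("natural" or alphanumeric sort) is to split each value into alternating text and integer chunks and compare those lists. This is one such key function, not necessarily the project's implementation, and `alphanum_key` is a hypothetical name. Plain numbers are converted with `str()` first, so they can be sorted alongside strings.

```python
import re

_DIGITS = re.compile(r"(\d+)")


def alphanum_key(value):
    """Sort key that orders 'lane2' before 'lane10'; plain numbers are str()-ed first."""
    # re.split with a capturing group alternates text (even index) and digits (odd index),
    # so every position holds the same type across keys and comparisons never mix str/int
    parts = _DIGITS.split(str(value))
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]
```

For example, `sorted(["lane10", "lane2", 1, "lane1"], key=alphanum_key)` gives `[1, "lane1", "lane2", "lane10"]`.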

Diane Trout [Tue, 23 Dec 2008 02:06:27 +0000 (02:06 +0000)]
change from hand coded formatting functions to the built in python
C-style printf formatting

Diane Trout [Tue, 23 Dec 2008 02:05:35 +0000 (02:05 +0000)]
Use the right URLError attribute names for error messages

Diane Trout [Mon, 22 Dec 2008 22:50:46 +0000 (22:50 +0000)]
update make-tree-library script with new default location

Diane Trout [Mon, 22 Dec 2008 20:44:15 +0000 (20:44 +0000)]
fix the multi-eland parser to strip off extensions and not the last 3
characters of the filename.
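Stripping extensions by name rather than slicing off a fixed number of characters can be sketched with `os.path.splitext` in a loop, so compound suffixes such as `.txt.bz2` are also handled. The helper name and the default extension list are assumptions for illustration.

```python
import os


def strip_extensions(filename, extensions=(".gz", ".bz2", ".txt")):
    """Remove known trailing extensions, possibly several (e.g. .txt.bz2)."""
    base = os.path.basename(filename)
    root, ext = os.path.splitext(base)
    # Keep peeling suffixes off while they are ones we recognize
    while ext in extensions:
        base = root
        root, ext = os.path.splitext(base)
    return base
```

Names without a recognized suffix, such as `s_2_eland_multi`, come back unchanged instead of losing their last three characters.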

Diane Trout [Mon, 22 Dec 2008 20:43:32 +0000 (20:43 +0000)]
clean up the logic for deciding the output filename when using stdin
as the input

Diane Trout [Fri, 19 Dec 2008 00:54:06 +0000 (00:54 +0000)]
Add a command to report the path, to make it easier to figure out which goat_pipeline is running

Diane Trout [Thu, 18 Dec 2008 23:43:38 +0000 (23:43 +0000)]
rename config file to something that doesn't include the read length
since that has been changing.

also a minor code clean up.

Diane Trout [Wed, 10 Dec 2008 01:00:25 +0000 (01:00 +0000)]
The summary parsing code now seems to handle paired end runs.
This required changing how the lane_results were being stored:
previously it was a dictionary indexed by lane, now it is a list
of dictionaries, where the list index indicates which "end" of
a paired end run it is. (0 is the first, 1 is the second)

Also I got tired of being forced to use strings for the lane index
by element tree and modified the code so it converts the strings
required by element tree to integers for our internal dictionaries.
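A sketch of the storage scheme described above: a list indexed by end, holding dictionaries keyed by integer lane id, with the string/int conversion done only at the XML boundary. The `LaneResults`/`End`/`Lane` element names and the `index`/`id` attributes are a hypothetical layout, not the project's actual run XML schema.

```python
from xml.etree import ElementTree


def lane_results_to_xml(results):
    """Serialize [end][lane_id] -> attribute dict into an XML element tree."""
    root = ElementTree.Element("LaneResults")
    for end_index, lanes in enumerate(results):
        end = ElementTree.SubElement(root, "End", index=str(end_index))
        for lane_id in sorted(lanes):
            attrs = dict(lanes[lane_id])
            # ElementTree attributes must be strings, so the int key is converted here
            attrs["id"] = str(lane_id)
            ElementTree.SubElement(end, "Lane", attrs)
    return root


def lane_results_from_xml(root):
    """Rebuild the list of per-end dictionaries, keyed by integer lane id."""
    ends = sorted(root.findall("End"), key=lambda e: int(e.get("index")))
    results = []
    for end in ends:
        lanes = {}
        for lane in end.findall("Lane"):
            attrs = dict(lane.attrib)
            # Convert the string id back to an int for internal use
            lanes[int(attrs.pop("id"))] = attrs
        results.append(lanes)
    return results
```

Inside the program, `results[0][1]` is then the first end of lane 1, with no string keys leaking out of the XML layer.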

Diane Trout [Tue, 9 Dec 2008 01:19:23 +0000 (01:19 +0000)]
Test 1.1rc1 style runs, which unfortunately require a hack for parsing
the summary.htm files since Illumina's html is invalid.
They forgot to use &lt; when writing <=. Most web browsers will ignore
it, but ElementTree is pickier.
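The workaround amounts to escaping the bare `<=` before handing the text to ElementTree. This minimal sketch assumes the page is otherwise well-formed XML; any other HTML quirk, such as an undeclared entity, would still make parsing fail. `parse_summary_html` is a hypothetical name.

```python
from xml.etree import ElementTree


def parse_summary_html(text):
    """Parse Summary.htm text after escaping bare '<=' that ElementTree rejects."""
    # A literal '<=' can never start a valid tag, so escaping it does not damage markup
    fixed = text.replace("<=", "&lt;=")
    return ElementTree.fromstring(fixed)
```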

Also as of this commit the summary parsing code still doesn't understand
paired end runs so the paired end summary file parsing tests still fail.

Diane Trout [Wed, 3 Dec 2008 22:25:26 +0000 (22:25 +0000)]
make-library-tree is a tool to maintain caltech's version of our solexa
results archive.

Diane Trout [Wed, 3 Dec 2008 22:24:29 +0000 (22:24 +0000)]
Add test code to see if runfolder can handle something that looks like a
paired end run.

Diane Trout [Wed, 3 Dec 2008 22:22:31 +0000 (22:22 +0000)]
Add code to create a paired end Summary.htm file

Diane Trout [Wed, 3 Dec 2008 22:21:16 +0000 (22:21 +0000)]
Store the bustard pathname when searching for run folders
This was needed so the srf file can use the same runfolder scanning
code as the --extract-results feature.

Diane Trout [Fri, 21 Nov 2008 01:15:27 +0000 (01:15 +0000)]
Use the get_runs from htsworkflow.pipelines.runfolder
On the plus side this means it'll handle IPAR files, on the downside
it means that the srf program will crash if there's something wrong with
the summary.htm file or if there's an ipar directory that doesn't have
a run in it.
(I really need to add some code to get_runs to skip over IPAR directories that
are being ignored.)

Diane Trout [Fri, 14 Nov 2008 19:04:59 +0000 (19:04 +0000)]
Forgot to change an import of htsworkflow.pipeline to htsworkflow.pipelines.

Diane Trout [Thu, 6 Nov 2008 22:49:40 +0000 (22:49 +0000)]
Updated ipar_100 test case to deal with using U0/1/2 vs R0/1/2
(my first implementation was to just dump all of the multi reads into
U0/1/2)

Diane Trout [Thu, 6 Nov 2008 22:39:24 +0000 (22:39 +0000)]
Process eland extended (or multi) read files.

This also updates the report tools to be compatible with 1.0.
For multi reads I mapped 0/1/2 mismatch reads to U0/U1/U2 if the number of
reads equaled 1 (for each category separately), and I mapped reads >1 and < 255
to R0/R1/R2.
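The U/R mapping rule above can be sketched as a tally over per-read match counts. This assumes each read's counts arrive as a colon-separated `n0:n1:n2` field (hits with 0, 1 and 2 mismatches). The function names are hypothetical, and counts of 0 or of 255 and above are simply not tallied.

```python
from collections import Counter


def parse_match_counts(field):
    """Turn a count field such as '1:0:0' into (1, 0, 0) (assumed format)."""
    return tuple(int(n) for n in field.split(":"))


def summarize_multi_reads(count_fields):
    """Tally U0-U2 and R0-R2 over reads, one colon-separated count field per read."""
    totals = Counter()
    for field in count_fields:
        for mismatches, count in enumerate(parse_match_counts(field)):
            if count == 1:
                totals["U%d" % mismatches] += 1  # exactly one hit: unique
            elif 1 < count < 255:
                totals["R%d" % mismatches] += 1  # several hits: repeat
    return totals
```

Each mismatch category is judged separately, so a read like `1:1:0` adds to both U0 and U1.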

Unfortunately 1.1rc1 changed the summary file, so this patch is not
compatible with it yet.

Diane Trout [Thu, 30 Oct 2008 22:28:01 +0000 (22:28 +0000)]
The htsworkflow.pipelines.gerald module was getting too large,
so I broke the portion that analyzed the Summary.htm file and
the eland_result files into separate modules in anticipation
of extending the eland code to handle some of the newer eland
result file types.

Diane Trout [Thu, 30 Oct 2008 22:03:12 +0000 (22:03 +0000)]
Add support for scanning for results in the IPAR directory.

The field that was the firecrest class in PipelineRun is now the
"image_analysis" field and can be either firecrest or ipar.

I also extracted some of the common functions out of the runfolder test
modules and added them to a separate "simulate_runfolder" module.

Diane Trout [Thu, 30 Oct 2008 21:59:56 +0000 (21:59 +0000)]
Add "_slow" to the end of the queuecommand test functions
this allows "nosetests --exclude=slow" to skip them.

Diane Trout [Tue, 28 Oct 2008 21:25:00 +0000 (21:25 +0000)]
update setup.py for some package renames and some missing scripts

Diane Trout [Tue, 21 Oct 2008 19:44:25 +0000 (19:44 +0000)]
Merge in new modules from htsworkflow branch.

However I renamed things to simpler names.

analys_track -> analysis
exp_track -> experiments
fctracker -> samples
htsw_reports -> reports

As a result this check-in probably won't work, as I haven't finished
updating all the imports.

Diane Trout [Tue, 21 Oct 2008 19:39:50 +0000 (19:39 +0000)]
Merge in model changes to fctracker from htsworkflow branch

Diane Trout [Tue, 21 Oct 2008 19:02:49 +0000 (19:02 +0000)]
update scripts for the pipeline to pipelines module rename

Diane Trout [Wed, 15 Oct 2008 19:49:34 +0000 (19:49 +0000)]
rename pipeline to pipelines to imply that we can process more than just illumina.

Diane Trout [Wed, 15 Oct 2008 18:59:34 +0000 (18:59 +0000)]
Rename trunk from gaworkflow to htsworkflow (and update all of the imports)
Fix the queuecommands test script to deal with the 1 sec delay hack

Diane Trout [Thu, 25 Sep 2008 00:04:19 +0000 (00:04 +0000)]
solexa2srf likes to produce output, so my trick of watching the
sockets to block until the process ends didn't work.

This patch inserts a simple sleep(1) (second) into the code that
waits for the jobs to finish to prevent the queue manager from rapidly
spinning.

It should probably be fixed with a better way of monitoring for when
a process finishes

Diane Trout [Thu, 25 Sep 2008 00:02:25 +0000 (00:02 +0000)]
use _ for field separator in srf file names. (Using a uniform separator
makes it easier to process the files later. Also, avoiding
"special" characters like : is a good idea for multi-platform
compatibility.)

Diane Trout [Thu, 18 Sep 2008 23:28:58 +0000 (23:28 +0000)]
Use queuecommands.run not queuecommands.start_job to actually
wait to launch additional processes

Diane Trout [Thu, 18 Sep 2008 22:53:26 +0000 (22:53 +0000)]
Be a little more informative about how many processes are left to run
and what the exit code was in queuecommands.py

Diane Trout [Fri, 5 Sep 2008 21:56:38 +0000 (21:56 +0000)]
extract status field out of flowcell name.

For gaworkflow we abused the schema and stored the flow cell status
in the flow cell name field, this patch updates my sqlite interface
to the fctracker db to split that field.

Diane Trout [Fri, 29 Aug 2008 16:51:24 +0000 (16:51 +0000)]
Add support for converting multi-eland files from pipeline 0.3 to
bedfiles

Diane Trout [Fri, 29 Aug 2008 16:51:23 +0000 (16:51 +0000)]
insert stub clean_runs function to list roughly what I think I can
delete before compressing the runfolder

Diane Trout [Fri, 15 Aug 2008 22:46:46 +0000 (22:46 +0000)]
Improve code to extract runfolder name from the path to the runfolder.

This version will actually convert relative paths into an absolute path
before extracting the runfolder name, as well as grabbing the right name
if there's a trailing /
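A minimal sketch of the behavior described: `os.path.abspath` both resolves a relative path and drops a trailing separator, so `os.path.basename` then returns the runfolder's own name. The helper name and the example paths are hypothetical.

```python
import os


def runfolder_name(pathname):
    """Return the runfolder's directory name from an absolute or relative path."""
    # abspath normalizes the path, which also removes any trailing '/'
    return os.path.basename(os.path.abspath(pathname))
```

Without the `abspath` step, `os.path.basename("example_runfolder/")` would return an empty string.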

Diane Trout [Thu, 14 Aug 2008 20:58:15 +0000 (20:58 +0000)]
In trying to get scripts/srf to work I needed to set subprocess.Popen to
shell=True. The end result of that is that, at least on linux hosts,
passing a list of arguments to Popen doesn't work very well; Popen
needs a string.

Perhaps a better solution would be for queuecommand to take a
shell parameter and if that's true do the joining into a string.

but for the moment I just converted my test case to pass a string
instead of a list.
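For context, Python's subprocess documentation notes that on POSIX, when `args` is a sequence and `shell=True`, only the first item is the command string and the remaining items become arguments to the shell itself. This is why a list misbehaves there. The suggested `shell` parameter could be sketched as follows; `to_shell_command` and `run_job` are hypothetical names, not the queuecommands API. Quoting with `shlex.quote` also suppresses glob expansion, so `quote=False` is needed when the shell should expand patterns like `*.txt`.

```python
import shlex
import subprocess


def to_shell_command(args, quote=True):
    """Join an argument list into one string suitable for Popen(..., shell=True).

    quote=True protects spaces and shell metacharacters; pass quote=False when
    the shell is meant to expand globs such as *.txt.
    """
    if isinstance(args, str):
        return args
    if quote:
        return " ".join(shlex.quote(arg) for arg in args)
    return " ".join(args)


def run_job(args, shell=False, quote=True):
    """Start a process; with shell=True a list is joined into a string first."""
    if shell:
        args = to_shell_command(args, quote=quote)
    return subprocess.Popen(args, shell=shell)
```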

Diane Trout [Thu, 14 Aug 2008 20:58:15 +0000 (20:58 +0000)]
refactor code to make a runfolder out of the UnitTest class.
I did it so I could more easily make a mini-runfolder for developing
code that needed to scan the runfolder.

Diane Trout [Thu, 14 Aug 2008 00:09:39 +0000 (00:09 +0000)]
we might as well automatically save the Summary.htm file as well

Diane Trout [Thu, 14 Aug 2008 00:09:09 +0000 (00:09 +0000)]
Utility to create srf files from a bustard directory

this version works, as long as you launch it in the bustard directory
in question. There seems to be some messiness in how file globs get
expanded when a list of arguments is passed to Popen with shell=True.

I had to switch from passing a list of arguments to Popen to passing a string,
and I'm still not sure if any of the code to try and change the directory
to the bustard directory actually worked correctly.

(which is why it only works when launching from the bustard directory)

Diane Trout [Mon, 11 Aug 2008 23:22:15 +0000 (23:22 +0000)]
A bit of refactoring toward making the run progress report code work
by walking the directory instead of just watching via pyinotify.

Mostly this moves the report formatting code to someplace a little
more shared, and moves the thread that watches the directory tree.

Diane Trout [Wed, 16 Jul 2008 00:46:45 +0000 (00:46 +0000)]
The older pipeline runs had a Phi-X control lane which we didn't
run eland against, so the total number of eland entries in
the GERALD config.xml file was less than 8. So relax testing
that constraint.

Diane Trout [Tue, 15 Jul 2008 00:26:48 +0000 (00:26 +0000)]
Provide cross referencing information to the libraries to help find
which lanes provide supporting information

Diane Trout [Mon, 7 Jul 2008 22:19:51 +0000 (22:19 +0000)]
Finish updating the Summary parsing file to handle the new 0.3 format.
In addition, I split test_runfolder into one that tests 0.2.6 files and
one that tests 0.3 files.

Diane Trout [Thu, 3 Jul 2008 00:16:50 +0000 (00:16 +0000)]
Partially handle the changed Summary.htm file from the 0.3 version of the
GAPipeline.

This update is incomplete as I'm pretty sure the xml serialization code
for the run xml file will break. However it does generate the summary
report for both the old summary file and the new post 0.3 file.

I also need to add unit tests for parsing and serializing the 0.3
file format.

Diane Trout [Tue, 24 Jun 2008 00:36:17 +0000 (00:36 +0000)]
Detect if our watch is on a mount point.

If we're on something that is unmounted, keep watching until there's a
new mount. Once something has been remounted, restart the watch.

Diane Trout [Tue, 17 Jun 2008 00:25:03 +0000 (00:25 +0000)]
Add QueueCommands, a class that allows controlling how many
processes to run simultaneously.

I still need to add a driver script to handle getting jobs from the
user.

It's mostly in so I can control launching the solexa2srf commands for
submitting stuff to the SRA.
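A minimal sketch of the idea, not the actual QueueCommands class: start processes from a list while fewer than `max_jobs` are alive, reap finished ones with `Popen.poll()`, and sleep between checks so the loop doesn't spin (similar in spirit to the sleep(1) workaround described in an earlier entry of this log). The function name and parameters are hypothetical.

```python
import subprocess
import time


def run_queue(commands, max_jobs=2, poll_interval=1.0):
    """Run argument lists with at most max_jobs processes alive at once.

    Returns the exit codes in the order the commands were given.
    """
    pending = list(enumerate(commands))
    running = {}  # command index -> Popen object
    results = [None] * len(commands)
    while pending or running:
        # Start new jobs while there is spare capacity
        while pending and len(running) < max_jobs:
            index, args = pending.pop(0)
            running[index] = subprocess.Popen(args)
        # Reap finished jobs; poll() returns None while a process is still running
        for index, proc in list(running.items()):
            code = proc.poll()
            if code is not None:
                results[index] = code
                del running[index]
        if running:
            # Avoid spinning the CPU while waiting for jobs to finish
            time.sleep(poll_interval)
    return results
```

Polling trades up to `poll_interval` seconds of latency per job for simplicity. A more precise design would block on process exit instead.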

Diane Trout [Fri, 6 Jun 2008 21:02:36 +0000 (21:02 +0000)]
don't use os.path.normpath when pathname is null in PipelineRun

Diane Trout [Thu, 5 Jun 2008 22:24:22 +0000 (22:24 +0000)]
add --extract-results to scripts/runfolder
this will build a directory tree with <flowcellID>/<cycle count>/
with the various eland result files, run_*.xml files, etc.

Diane Trout [Thu, 29 May 2008 00:01:50 +0000 (00:01 +0000)]
Some of the older flow cells used a default genome for eland instead
of specifying the genome path for each lane.

This patch will look up in the chipwidedefaults for the eland_genome if
it isn't found in the lane specific parameters
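The fallback can be sketched as a two-step lookup. This assumes the lane-specific and chip-wide parameters have already been parsed into dictionaries. The function name, dictionary shape and example genome paths are hypothetical.

```python
def lookup_eland_genome(lane_params, chip_defaults):
    """Return a lane's eland genome, falling back to the chip-wide default."""
    genome = lane_params.get("eland_genome")
    if genome is None:
        # Older flow cells only set the genome once for the whole chip
        genome = chip_defaults.get("eland_genome")
    return genome
```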

Diane Trout [Wed, 28 May 2008 00:42:19 +0000 (00:42 +0000)]
Compute all the details needed to create our 25bp rerun given just
a runfolder.
(This assumes more than the --gerald/-o version that I first
implemented, which is still available).

Now you can give rerun_eland a runfolder name, and it will (if there's
only 1 run found by pipeline.runfolder) extract the bases from that
into a new Data/C1-<length+1> directory and should launch eland.

Diane Trout [Tue, 27 May 2008 22:52:13 +0000 (22:52 +0000)]
ignore more *.py[co~] files in some of our test directories

Diane Trout [Fri, 23 May 2008 21:37:05 +0000 (21:37 +0000)]
add --run-xml to runfolder so you can generate summary reports from a
previously analyzed runfolder

Diane Trout [Fri, 23 May 2008 21:33:07 +0000 (21:33 +0000)]
Update pipeline.gerald to handle eland_result files that have been bzipped.
Also I added my opener module which will try to guess the right
compression utility for a file.
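A minimal sketch of an extension-based opener using the standard `bz2` and `gzip` modules. The name `autoopen` and the extension check are assumptions; the actual opener module may detect compression differently, for example by inspecting the file's contents.

```python
import bz2
import gzip


def autoopen(pathname, mode="rt"):
    """Open a file, picking a decompressor from its extension.

    Handles .bz2 and .gz; anything else is opened as a plain file.
    """
    if pathname.endswith(".bz2"):
        return bz2.open(pathname, mode)
    if pathname.endswith(".gz"):
        return gzip.open(pathname, mode)
    return open(pathname, mode)
```

Callers can then read `s_1_eland_result.txt` and `s_1_eland_result.txt.bz2` with the same loop.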

Brandon King [Mon, 19 May 2008 22:49:09 +0000 (22:49 +0000)]
Beginning of consolidation with trunk/stanford variants of the database.

Diane Trout [Thu, 15 May 2008 00:32:47 +0000 (00:32 +0000)]
add rerun_eland.py which extracts sub-sequences from eland files and runs
eland on them with a new sequence length.

The script also helpfully uses the gerald config file to figure out the
correct genome path.

Diane Trout [Wed, 14 May 2008 23:00:47 +0000 (23:00 +0000)]
separate computing the sample/lane_id names from calculating read counts

the read count computation takes a long time, and if we just want to
quickly access some information from the gerald directory it was really
annoying to wait for it to finish.

Brandon King [Wed, 14 May 2008 00:01:27 +0000 (00:01 +0000)]
v0.2.0 progress
 * Commented out eland_result table as it is not being used by either site and Stanford has implemented something that is probably more useful, so we will likely import that.
 * Person has been renamed to UserProfile and has been integrated with the user profiles feature of Django (http://www.djangobook.com/en/1.0/chapter12/#cn222), which allows you to get access to the "profile" information by using user.get_profile().
 * Added Lab which just contains a name... This will be used to implement user/lab level access to Flowcell/Library information.

Diane Trout [Tue, 13 May 2008 16:36:55 +0000 (16:36 +0000)]
add additional debugging logging to retrieve_config and configure_pipeline
to help figure out why it was failing. (which turned out to originally be
because of user error)

Diane Trout [Tue, 13 May 2008 16:17:11 +0000 (16:17 +0000)]
logging.basicConfig should only be in top level scripts.
Using basicConfig in a module causes problems because it's likely
to override the user's logging.basicConfig (from some other
top level script that's using logging correctly).
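The convention can be sketched as follows: a library module only calls `logging.getLogger`, and `logging.basicConfig` appears solely in the script's entry point. Because `basicConfig` does nothing once the root logger already has handlers, a module that calls it at import time silently wins over the script's own configuration. The logger and function names here are hypothetical.

```python
import logging

# Library module: only obtain a named logger; configuration is left to the caller
LOGGER = logging.getLogger("htsworkflow.example")


def scan(pathname):
    # Routine chatter goes at DEBUG, so it is hidden at the default WARNING level
    LOGGER.debug("using %s as cwd", pathname)
    return pathname


def main():
    # Top-level script: the one place that configures logging
    logging.basicConfig(level=logging.INFO)
    scan(".")


if __name__ == "__main__":
    main()
```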

Diane Trout [Sat, 10 May 2008 04:32:25 +0000 (04:32 +0000)]
make it possible to include all alignments, not just the ones that match
chromosomes.

Diane Trout [Sat, 10 May 2008 00:18:41 +0000 (00:18 +0000)]
makebed is a script too

Diane Trout [Sat, 10 May 2008 00:18:24 +0000 (00:18 +0000)]
Keep track of sample_name and lane_id computed from the eland
filename.

Perhaps I should have more code checking to make sure it's of the form
s_?_eland_result.txt
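A check like the one suggested could use a regular expression that also splits out the sample name and lane id. This sketch assumes the name always looks like `s_<lane>_eland_result.txt`; `split_eland_filename` is a hypothetical helper.

```python
import os
import re

_ELAND_NAME = re.compile(r"^(?P<sample>[^_]+)_(?P<lane>\d+)_eland_result\.txt$")


def split_eland_filename(pathname):
    """Return (sample_name, lane_id) for a name of the form s_?_eland_result.txt."""
    match = _ELAND_NAME.match(os.path.basename(pathname))
    if match is None:
        raise ValueError("not an s_?_eland_result.txt file: %r" % (pathname,))
    return match.group("sample"), int(match.group("lane"))
```

Returning the lane as an int lets it be used directly as a key in the per-lane dictionaries.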

Diane Trout [Fri, 9 May 2008 03:51:30 +0000 (03:51 +0000)]
Make the runfolder splitting patch a bit more python 2.4 compatible
Python 2.4 doesn't have datetime.strptime, nor does it have
a built in copy of ElementTree in the xml.etree namespace.

Diane Trout [Fri, 9 May 2008 03:22:39 +0000 (03:22 +0000)]
update the setup.py file to the new name for the runfolder script

Diane Trout [Fri, 9 May 2008 03:21:16 +0000 (03:21 +0000)]
Move runfolder analysis classes out of scripts/runfolder.py into separate files.
Also rename runfolder.py to runfolder

This was a really annoying patch to make, I wanted to do two major things,
be able to construct the runfolder configuration extracting classes
from the xml file I was creating, and to make unit tests to make sure
all the code was at least somewhat correct.

Writing all of the xml serialization code was really annoying and dull,
there was probably some nifty metaprogramming way of solving it, but
I didn't feel like figuring it out, as I really need to move on to
more important parts of the project.

I wanted to rename runfolder.py to runfolder as the solexa pipeline code
has a runfolder.py (and if anyone has a better name for the script that's
supposed to dump the runfolder xml file, let me know).

Also in working on the xml serialization code, I extended the serialization
for the eland files, this version now dumps the genome_map and the
eland statistics, like reads, match counts and the like. It does
mean that the --archive mode will take longer, but it also means
I'll have enough information to generate the run statistics later.

Now I might have to redo this if we decide we should be handling
the realign files instead.

Diane Trout [Thu, 24 Apr 2008 00:25:42 +0000 (00:25 +0000)]
Add a script that takes a set of eland_result files and makes bedfiles
it'll also look up the lane descriptions in the flowcell database

Diane Trout [Thu, 24 Apr 2008 00:24:26 +0000 (00:24 +0000)]
Report cluster results with the rest of the lane summary information.
this involved breaking names like "s_1" into their sample and lane identifiers
and then exclusively using the lane identifiers.

One complexity is that I still had to treat the lane IDs as keys into
a dictionary instead of offsets into a list, because the lanes
were labeled in the range 1..8, but python's list indexes would have
been 0..7.

I also changed the report code to return a string instead of printing
stuff to stdout, to make it easier for me to integrate it into code
to email the summary report.

Diane Trout [Tue, 22 Apr 2008 21:58:04 +0000 (21:58 +0000)]
also since nothing is currently using the pipelineFinished message
from runner remove it

Diane Trout [Tue, 22 Apr 2008 00:26:58 +0000 (00:26 +0000)]
oops forgot to remove some debugging statements from the previous patch

Diane Trout [Tue, 22 Apr 2008 00:02:16 +0000 (00:02 +0000)]
Extend makebed to lookup metadata out of a copy of the fctracker database

Diane Trout [Mon, 21 Apr 2008 23:34:51 +0000 (23:34 +0000)]
split the library script into a reusable database/reporting layer
and command line script.

Diane Trout [Mon, 21 Apr 2008 22:21:32 +0000 (22:21 +0000)]
Ignore binary files generated by python (*.py[co])

Lorian Schaeffer [Sat, 19 Apr 2008 00:13:11 +0000 (00:13 +0000)]
Added the same changes I made to the 1.1 branch, all display-related