htsworkflow.git
15 years agosave a copy of the merged tree before we start modifying it too much stanford.caltech-merged-database-2009-jan-15
Diane Trout [Thu, 15 Jan 2009 23:50:26 +0000 (23:50 +0000)]
save a copy of the merged tree before we start modifying it too much

15 years agoMerged much of the stanford htsworkflow frontend into trunk.
Diane Trout [Wed, 14 Jan 2009 01:12:47 +0000 (01:12 +0000)]
Merged much of the stanford htsworkflow frontend into trunk.
Updated to be compatible with Django 1.0

A big change for the 1.0 compatibility is that the Admin class that was
attached to models was moved into a separate file, admin.py.

I probably munged some of the fieldset formatting in the conversion process.

15 years agoThis is a partial merge of the stanford branch with the caltech branch of
Diane Trout [Thu, 8 Jan 2009 20:12:03 +0000 (20:12 +0000)]
This is a partial merge of the stanford branch with the caltech branch of
the web application. It doesn't work correctly yet: the libraries admin page
is broken and lacks the ability to browse the 'made_for' column.

This is based on a merge that started a few months ago, but I hadn't finished
the merge. I'll need to check for more updates from their branch soon.

During the process I decided it would be a good idea to update to django 1.0,
which is going to make things even more unstable, so I thought I should
check this work in progress in before continuing.

15 years agoLook in Temp directories for some of the files we have historically
Diane Trout [Tue, 6 Jan 2009 02:05:10 +0000 (02:05 +0000)]
Look in Temp directories for some of the files we have historically
used for our summary reports.

Version 1.1rc1 of the gapipeline started moving some of the files
into /Temp subdirectories of bustard and gerald.

15 years agoHandle paired-end eland files.
Diane Trout [Wed, 24 Dec 2008 23:39:31 +0000 (23:39 +0000)]
Handle paired-end eland files.

This required changing the ELAND class to hold a list of dictionaries
from its previous implementation, where it was exporting an internal dictionary
of the lanes.

I decided to directly show the internal list and to remove the previous
dictionary methods to make it more obvious when code was expecting
the previous behavior.

Also a saved runfolder will now have eland files of the form
s_<lane id>_<end id>.

Internally the end is 0 or 1; I tried to make the display show 1 or 2 for
the user's benefit, though.

15 years agoremove a debug print statement
Diane Trout [Wed, 24 Dec 2008 23:34:23 +0000 (23:34 +0000)]
remove a debug print statement

15 years agoAdd test cases for alphanum sort
Diane Trout [Wed, 24 Dec 2008 23:33:51 +0000 (23:33 +0000)]
Add test cases for alphanum sort

15 years agoSupport sorting numbers along with the alphanumeric strings
Diane Trout [Wed, 24 Dec 2008 23:33:14 +0000 (23:33 +0000)]
Support sorting numbers along with the alphanumeric strings

also I cleaned up the indent a bit
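
A minimal sketch of a sort key that handles plain numbers alongside alphanumeric strings. This is an illustration only, not the repository's implementation; the function name natural_sort_key and the sample values are assumptions.

```python
import re


def natural_sort_key(value):
    """Split a value into text and integer chunks so 's_10' sorts after 's_2'."""
    parts = re.split(r"(\d+)", str(value))
    # Tag each chunk so integers and strings are never compared directly:
    # numeric chunks get tag 0, text chunks get tag 1.
    return [(0, int(p)) if p.isdecimal() else (1, p) for p in parts if p != ""]


names = ["s_10", "s_2", 7, "s_1", 12]
print(sorted(names, key=natural_sort_key))
```

Tagging the chunks means a bare number such as 7 can be compared with a string such as "s_1" without raising a TypeError.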

15 years agochange from hand coded formatting functions to the built in python
Diane Trout [Tue, 23 Dec 2008 02:06:27 +0000 (02:06 +0000)]
change from hand coded formatting functions to the built in python
C-style printf formatting

15 years agoUse the right URLError attribute names for error messages
Diane Trout [Tue, 23 Dec 2008 02:05:35 +0000 (02:05 +0000)]
Use the right URLError attribute names for error messages

15 years agoupdate make-tree-library script with new default location
Diane Trout [Mon, 22 Dec 2008 22:50:46 +0000 (22:50 +0000)]
update make-tree-library script with new default location

15 years agofix the multi-eland parser to strip off extensions and not the last 3
Diane Trout [Mon, 22 Dec 2008 20:44:15 +0000 (20:44 +0000)]
fix the multi-eland parser to strip off extensions and not the last 3
characters of the filename.
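
A short sketch of the difference between the two approaches, using os.path.splitext from the standard library. The helper name and the file names are hypothetical.

```python
import os


def strip_extension(filename):
    """Return the filename without its final extension, whatever its length."""
    root, _extension = os.path.splitext(filename)
    return root


# Slicing off a fixed number of characters only works for some extensions.
print(strip_extension("s_1_eland_multi.txt"))
print(strip_extension("s_1_sequence.fasta"))
print("s_1_sequence.fasta"[:-3])  # the fixed-width slice leaves ".fa" behind
```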

15 years agoclean up the logic for deciding the output filename when using stdin
Diane Trout [Mon, 22 Dec 2008 20:43:32 +0000 (20:43 +0000)]
clean up the logic for deciding the output filename when using stdin
as the input

15 years agoAdd command to report path to make figuring out which goat_pipeline is running easier
Diane Trout [Fri, 19 Dec 2008 00:54:06 +0000 (00:54 +0000)]
Add command to report path to make figuring out which goat_pipeline is running easier

15 years agorename config file to something that doesn't include the read length
Diane Trout [Thu, 18 Dec 2008 23:43:38 +0000 (23:43 +0000)]
rename config file to something that doesn't include the read length
since that has been changing.

also a minor code clean up.

15 years agoThe summary parsing code now seems to handle paired end runs
Diane Trout [Wed, 10 Dec 2008 01:00:25 +0000 (01:00 +0000)]
The summary parsing code now seems to handle paired end runs
this required changing how the lane_results were being stored,
previously it was a dictionary indexed by lane, now it is a list
of dictionaries, where the list index indicates which "end" of
a paired end run it is. (0 is the first, 1 is the second)

Also I got tired of being forced to use strings for the lane index
by element tree and modified the code so it converts the strings
required by element tree to integers for our internal dictionaries.
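
The storage layout described above could look roughly like this. It is a sketch under assumptions: the helper name store_lane_result and the stored values are made up for illustration; only the list-of-dictionaries shape and the integer lane keys come from the commit message.

```python
def store_lane_result(lane_results, end, lane, value):
    """Store a value for a lane in the dictionary for the given end (0 or 1)."""
    # Grow the list until there is a dictionary for this end.
    while len(lane_results) <= end:
        lane_results.append({})
    # ElementTree hands back strings, so convert the lane to an integer key.
    lane_results[end][int(lane)] = value


lane_results = []
store_lane_result(lane_results, 0, "1", "first end of lane 1")
store_lane_result(lane_results, 1, "1", "second end of lane 1")
print(lane_results[1][1])
```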

15 years agoTest 1.1rc1 style runs, which unfortunately require a hack for parsing
Diane Trout [Tue, 9 Dec 2008 01:19:23 +0000 (01:19 +0000)]
Test 1.1rc1 style runs, which unfortunately require a hack for parsing
the summary.htm files since Illumina's HTML is invalid.
They forgot to use &lt; when writing <=. Most web browsers will ignore
it, but ElementTree is pickier.

Also as of this commit the summary parsing code still doesn't understand
paired end runs so the paired end summary file parsing tests still fail.
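
One way such a workaround can be written is to escape the bare "<=" before handing the text to ElementTree. This is a sketch of the general idea, not the hack used in the repository; the function name and the sample markup are assumptions.

```python
import xml.etree.ElementTree as ElementTree


def parse_summary(text):
    """Escape bare '<=' so ElementTree accepts the otherwise invalid markup."""
    # No tag can start with '<=', so this replacement never touches real tags.
    repaired = text.replace("<=", "&lt;=")
    return ElementTree.fromstring(repaired)


broken = "<html><body><td>Errors <= 2</td></body></html>"
tree = parse_summary(broken)
print(tree.find("body/td").text)
```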

15 years agomake-library-tree is a tool to maintain caltech's version of our solexa
Diane Trout [Wed, 3 Dec 2008 22:25:26 +0000 (22:25 +0000)]
make-library-tree is a tool to maintain caltech's version of our solexa
results archive.

15 years agoAdd test code to see if runfolder can handle something that looks like a
Diane Trout [Wed, 3 Dec 2008 22:24:29 +0000 (22:24 +0000)]
Add test code to see if runfolder can handle something that looks like a
paired end run.

15 years agoAdd code to create a paired end Summary.htm file
Diane Trout [Wed, 3 Dec 2008 22:22:31 +0000 (22:22 +0000)]
Add code to create a paired end Summary.htm file

15 years agoStore the bustard pathname when searching for run folders
Diane Trout [Wed, 3 Dec 2008 22:21:16 +0000 (22:21 +0000)]
Store the bustard pathname when searching for run folders
This was needed so the srf file can use the same runfolder scanning
code as the --extract-results feature.

15 years agoUse the get_runs from htsworkflow.pipelines.runfolder
Diane Trout [Fri, 21 Nov 2008 01:15:27 +0000 (01:15 +0000)]
Use the get_runs from htsworkflow.pipelines.runfolder
On the plus side this means it'll handle IPAR files, on the downside
it means that the srf program will crash if there's something wrong with
the summary.htm file or if there's an ipar directory that doesn't have
a run in it.
(I really need to add some code to get_runs to skip over IPAR directories that
are being ignored.)

15 years agoForgot to change an import of htsworkflow.pipeline to htsworkflow.pipelines.
Diane Trout [Fri, 14 Nov 2008 19:04:59 +0000 (19:04 +0000)]
Forgot to change an import of htsworkflow.pipeline to htsworkflow.pipelines.

15 years agoUpdated ipar_100 test case to deal with using U0/1/2 vs R0/1/2
Diane Trout [Thu, 6 Nov 2008 22:49:40 +0000 (22:49 +0000)]
Updated ipar_100 test case to deal with using U0/1/2 vs R0/1/2
(my first implementation was to just dump all of the multi reads into
U0/1/2)

15 years agoProcess eland extended (or multi) read files.
Diane Trout [Thu, 6 Nov 2008 22:39:24 +0000 (22:39 +0000)]
Process eland extended (or multi) read files.

This also updates the report tools to be compatible with 1.0.
For multi reads I mapped 0/1/2 mismatch reads to U0/U1/U2 if the number of
reads equaled 1 (for each category separately) and I mapped reads >1 and < 255
to R0/R1/R2.

Unfortunately 1.1rc1 changed the summary file, so this patch is not
compatible with it yet.
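
One possible reading of the mapping rule above, as a sketch. It assumes each multi read comes with three counts (matches with 0, 1, and 2 mismatches) and that each category adds one to the tally; the commit message does not say whether the original code tallied reads or matches, and the function name is hypothetical.

```python
from collections import Counter


def tally_multiread(match_codes, counts):
    """Update match_codes from the (0, 1, 2 mismatch) counts of one read."""
    for mismatches, n in enumerate(counts):
        if n == 1:
            # Exactly one match in this category: unique.
            match_codes["U%d" % mismatches] += 1
        elif 1 < n < 255:
            # More than one match but below the 255 cutoff: repeat.
            match_codes["R%d" % mismatches] += 1
    return match_codes


codes = Counter()
tally_multiread(codes, (1, 0, 3))
tally_multiread(codes, (0, 1, 255))
print(dict(codes))
```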

15 years agoThe htsworkflow.pipelines.gerald module was getting too large
Diane Trout [Thu, 30 Oct 2008 22:28:01 +0000 (22:28 +0000)]
The htsworkflow.pipelines.gerald module was getting too large
so I broke the portion that analyzed the Summary.htm file and
the eland_result files into separate modules in anticipation
of extending the eland code to handle some of the newer eland
result file types.

15 years agoAdd support for scanning for results in the IPAR directory.
Diane Trout [Thu, 30 Oct 2008 22:03:12 +0000 (22:03 +0000)]
Add support for scanning for results in the IPAR directory.

The field that was the firecrest class in PipelineRun is now the
"image_analysis" field and can be either firecrest or ipar.

I also extracted some of the common functions out of the runfolder test
modules and added them to a separate "simulate_runfolder" module.

15 years agoAdd "_slow" to the end of the queuecommand test functions
Diane Trout [Thu, 30 Oct 2008 21:59:56 +0000 (21:59 +0000)]
Add "_slow" to the end of the queuecommand test functions
this allows "nosetests --exclude=slow" to skip them.

15 years agoupdate setup.py for some package renames and some missing scripts
Diane Trout [Tue, 28 Oct 2008 21:25:00 +0000 (21:25 +0000)]
update setup.py for some package renames and some missing scripts

15 years agoMerge in new modules from htsworkflow branch.
Diane Trout [Tue, 21 Oct 2008 19:44:25 +0000 (19:44 +0000)]
Merge in new modules from htsworkflow branch.

However I renamed things to simpler names.

analys_track -> analysis
exp_track -> experiments
fctracker -> samples
htsw_reports -> reports

As a result this check-in probably won't work, as I haven't finished
updating all the imports.

15 years agoMerge in model changes to fctracker from htsworkflow branch
Diane Trout [Tue, 21 Oct 2008 19:39:50 +0000 (19:39 +0000)]
Merge in model changes to fctracker from htsworkflow branch

15 years agoupdate scripts for the pipeline to pipelines module rename
Diane Trout [Tue, 21 Oct 2008 19:02:49 +0000 (19:02 +0000)]
update scripts for the pipeline to pipelines module rename

15 years agorename pipeline to pipelines to imply that we can process more than just illumina.
Diane Trout [Wed, 15 Oct 2008 19:49:34 +0000 (19:49 +0000)]
rename pipeline to pipelines to imply that we can process more than just illumina.

15 years agoRename trunk from gaworkflow to htsworkflow (and update all of the imports)
Diane Trout [Wed, 15 Oct 2008 18:59:34 +0000 (18:59 +0000)]
Rename trunk from gaworkflow to htsworkflow (and update all of the imports)
Fix the queuecommands test script to deal with the 1 sec delay hack

15 years agosolexa2srf likes to produce output, so my trick of watching the
Diane Trout [Thu, 25 Sep 2008 00:04:19 +0000 (00:04 +0000)]
solexa2srf likes to produce output, so my trick of watching the
sockets to block until the process ends didn't work.

This patch inserts a simple sleep(1) (second) into the code that
waits for the jobs to finish to prevent the queue manager from rapidly
spinning.

It should probably be fixed with a better way of monitoring for when
a process finishes
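
A rough sketch of the polling approach described above: launch up to a fixed number of processes, then sleep between checks instead of spinning. It is not the QueueCommands code; the function name, parameters, and the trivial demo jobs are assumptions.

```python
import subprocess
import sys
import time


def run_queue(commands, max_jobs=2, poll_interval=1.0):
    """Run commands with at most max_jobs at once; return exit codes as jobs finish."""
    pending = list(commands)
    running = []
    exit_codes = []
    while pending or running:
        # Start new jobs while there is spare capacity.
        while pending and len(running) < max_jobs:
            running.append(subprocess.Popen(pending.pop(0)))
        # Collect the jobs that have finished since the last check.
        still_running = []
        for process in running:
            code = process.poll()
            if code is None:
                still_running.append(process)
            else:
                exit_codes.append(code)
        running = still_running
        # Sleep so the loop does not spin while jobs are still running.
        if running:
            time.sleep(poll_interval)
    return exit_codes


jobs = [[sys.executable, "-c", "pass"] for _ in range(3)]
print(run_queue(jobs, poll_interval=0.1))
```

Polling with Popen.poll() does not depend on the child's output, which sidesteps the problem of a chatty child process such as solexa2srf.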

15 years agouse _ for field separator in srf file names. (Using a uniform separator
Diane Trout [Thu, 25 Sep 2008 00:02:25 +0000 (00:02 +0000)]
use _ for field separator in srf file names. (Using a uniform separator
makes it easier to process the files later. Not to mention that avoiding
characters that are "special", like :, is a good idea for multi-platform
compatibility)

15 years agoUse queuecommands.run not queuecommands.start_job to actually
Diane Trout [Thu, 18 Sep 2008 23:28:58 +0000 (23:28 +0000)]
Use queuecommands.run not queuecommands.start_job to actually
wait to launch additional processes

15 years agoBe a little more informative about how many processes are left to run
Diane Trout [Thu, 18 Sep 2008 22:53:26 +0000 (22:53 +0000)]
Be a little more informative about how many processes are left to run
and what the exit code was in queuecommands.py

15 years agoextract status field out of flowcell name.
Diane Trout [Fri, 5 Sep 2008 21:56:38 +0000 (21:56 +0000)]
extract status field out of flowcell name.

For gaworkflow we abused the schema and stored the flow cell status
in the flow cell name field, this patch updates my sqlite interface
to the fctracker db to split that field.

15 years agoAdd support for converting multi-eland files from pipeline 0.3 to
Diane Trout [Fri, 29 Aug 2008 16:51:24 +0000 (16:51 +0000)]
Add support for converting multi-eland files from pipeline 0.3 to
bedfiles

15 years agoinsert stub clean_runs function to list roughly what I think I can
Diane Trout [Fri, 29 Aug 2008 16:51:23 +0000 (16:51 +0000)]
insert stub clean_runs function to list roughly what I think I can
delete before compressing the runfolder

15 years agoImprove code to extract runfolder name from the path to the runfolder.
Diane Trout [Fri, 15 Aug 2008 22:46:46 +0000 (22:46 +0000)]
Improve code to extract runfolder name from the path to the runfolder.

This version will actually convert relative paths into an absolute path
before extracting the runfolder name, as well as grabbing the right name
if there's a trailing /
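
A minimal sketch of the idea using only os.path; the function name and the example paths are assumptions. os.path.abspath both resolves a relative path and drops a trailing slash, so basename then returns the last real component.

```python
import os


def runfolder_name(path):
    """Return the runfolder name for a relative or absolute path."""
    return os.path.basename(os.path.abspath(path))


print(runfolder_name("080815_example_runfolder/"))
print(runfolder_name("."))
```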

15 years agoIn trying to get scripts/srf to work I needed to set subprocess.Popen to
Diane Trout [Thu, 14 Aug 2008 20:58:15 +0000 (20:58 +0000)]
In trying to get scripts/srf to work I needed to set subprocess.Popen to
shell=True, the end result of that is that at least on linux hosts
passing in a list of arguments to Popen doesn't work very well, Popen
needs a string.

Perhaps a better solution would be for queuecommand to take a
shell parameter and if that's true do the joining into a string.

but for the moment I just converted my test case to pass a string
instead of a list.
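
The "shell parameter" idea mentioned above might be sketched as follows. The helper names are hypothetical and this is not the queuecommand API. Note that a plain join does no quoting, so arguments containing spaces would need to be quoted by the caller.

```python
import subprocess


def popen_args(argv, shell=False):
    """Return the form of argv that subprocess.Popen expects for this shell setting."""
    if shell and not isinstance(argv, str):
        # With shell=True, pass one command string so globs are expanded by the shell.
        return " ".join(argv)
    return argv


def launch(argv, shell=False):
    """Start the command, joining list arguments when a shell is requested."""
    return subprocess.Popen(popen_args(argv, shell), shell=shell)


print(popen_args(["ls", "*.txt"], shell=True))
print(popen_args(["ls", "-l"]))
```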

15 years agorefactor code to make a runfolder out of the UnitTest class.
Diane Trout [Thu, 14 Aug 2008 20:58:15 +0000 (20:58 +0000)]
refactor code to make a runfolder out of the UnitTest class.
I did it so I could more easily make a mini-runfolder for developing
code that needed to scan the runfolder.

15 years agowe might as well automatically save the Summary.htm file as well
Diane Trout [Thu, 14 Aug 2008 00:09:39 +0000 (00:09 +0000)]
we might as well automatically save the Summary.htm file as well

15 years agoUtility to create srf files from a bustard directory
Diane Trout [Thu, 14 Aug 2008 00:09:09 +0000 (00:09 +0000)]
Utility to create srf files from a bustard directory

this version works, as long as you launch it in the bustard directory
in question. There seems to be some messiness in the interaction between
how the list of arguments passed to Popen with shell=True has any file globs
expanded.

I had to switch from passing a list of arguments to Popen to string,
and I'm still not sure if any of the code to try and change the directory
to the bustard directory actually worked correctly.

(which is why it only works when launching from the bustard directory)

15 years agoA bit of refactoring toward making the run progress report code work
Diane Trout [Mon, 11 Aug 2008 23:22:15 +0000 (23:22 +0000)]
A bit of refactoring toward making the run progress report code work
by walking the directory instead of just watching via pyinotify.

Mostly this meant moving the report formatting code to
someplace a little more shared, and moving the thread that watches
the directory tree.

15 years agoThe older pipeline runs had a Phi-X control lane which we didn't
Diane Trout [Wed, 16 Jul 2008 00:46:45 +0000 (00:46 +0000)]
The older pipeline runs had a Phi-X control lane which we didn't
run eland against, so the total number of eland entries in
the GERALD config.xml file was less than 8. So relax testing
that constraint.

15 years agoProvide cross referencing information to the libraries to help find
Diane Trout [Tue, 15 Jul 2008 00:26:48 +0000 (00:26 +0000)]
Provide cross referencing information to the libraries to help find
which lanes provide supporting information

15 years agoFinish updating the Summary parsing file to handle the new 0.3 format
Diane Trout [Mon, 7 Jul 2008 22:19:51 +0000 (22:19 +0000)]
Finish updating the Summary parsing file to handle the new 0.3 format
in addition I split test_runfolder into one that tests 0.2.6 files and
one that tests 0.3 files.

15 years agoPartially handle the changed Summary.htm file from the 0.3 version of the
Diane Trout [Thu, 3 Jul 2008 00:16:50 +0000 (00:16 +0000)]
Partially handle the changed Summary.htm file from the 0.3 version of the
GAPipeline.

This update is incomplete as I'm pretty sure the xml serialization code
for the run xml file will break. However it does generate the summary
report for both the old summary file and the new post 0.3 file.

I also need to add unit tests for parsing and serializing the 0.3
file format.

15 years agoDetect if our watch is on a mount point.
Diane Trout [Tue, 24 Jun 2008 00:36:17 +0000 (00:36 +0000)]
Detect if our watch is on a mount point.

If we're on something that is unmounted, keep watching until there's a
new mount. Once something has been remounted, restart the watch.
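
A sketch of the wait-then-restart behaviour using os.path.ismount. The function name, the start_watch callback, and the injectable is_mounted parameter are assumptions made for illustration; the actual watch code is not shown in this log.

```python
import os
import time


def watch_when_mounted(path, start_watch, poll_interval=5.0, is_mounted=os.path.ismount):
    """Block until path is a mount point, then (re)start the watch on it."""
    while not is_mounted(path):
        # Nothing is mounted yet; check again after a short pause.
        time.sleep(poll_interval)
    return start_watch(path)
```

Passing is_mounted as a parameter keeps the loop testable without a real mount.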

15 years agoAdd QueueCommands, a class that allows controlling how many
Diane Trout [Tue, 17 Jun 2008 00:25:03 +0000 (00:25 +0000)]
Add QueueCommands, a class that allows controlling how many
processes to run simultaneously.

I still need to add a driver script to handle getting jobs from the
user.

It's mostly in so I can control launching the solexa2srf commands for
submitting stuff to the SRA.

15 years agodon't use os.path.normpath when pathname is null in PipelineRun
Diane Trout [Fri, 6 Jun 2008 21:02:36 +0000 (21:02 +0000)]
don't use os.path.normpath when pathname is null in PipelineRun

15 years agoadd --extract-results to scripts/runfolder
Diane Trout [Thu, 5 Jun 2008 22:24:22 +0000 (22:24 +0000)]
add --extract-results to scripts/runfolder
this will build a directory tree with <flowcellID>/<cycle count>/
with the various eland result files, run_*.xml files, etc.

15 years agoSome of the older flow cells used a default genome for eland instead
Diane Trout [Thu, 29 May 2008 00:01:50 +0000 (00:01 +0000)]
Some of the older flow cells used a default genome for eland instead
of specifying the genome path for each lane.

This patch will look up in the chipwidedefaults for the eland_genome if
it isn't found in the lane specific parameters

15 years agoCompute all the details needed to create our 25bp rerun given just
Diane Trout [Wed, 28 May 2008 00:42:19 +0000 (00:42 +0000)]
Compute all the details needed to create our 25bp rerun given just
a runfolder.
(This assumes more than the --gerald/-o version that I first
implemented, which is still available).

Now you can give rerun_eland a runfolder name, and it will (if there's
only 1 run found by pipeline.runfolder) extract the bases from that
into a new Data/C1-<length+1> directory and should launch eland.

15 years agoignore more *.py[co~] files in some of our test directories
Diane Trout [Tue, 27 May 2008 22:52:13 +0000 (22:52 +0000)]
ignore more *.py[co~] files in some of our test directories

15 years agoadd --run-xml to runfolder so you can generate summary reports from a
Diane Trout [Fri, 23 May 2008 21:37:05 +0000 (21:37 +0000)]
add --run-xml to runfolder so you can generate summary reports from a
previously analyzed runfolder

15 years agoUpdate pipeline.gerald to handle eland_result files that have been bzipped.
Diane Trout [Fri, 23 May 2008 21:33:07 +0000 (21:33 +0000)]
Update pipeline.gerald to handle eland_result files that have been bzipped.
Also I added my opener module which will try to guess the right
compression utility for a file.
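
A small sketch of how guessing the compression from a file's leading bytes can work. It is not the opener module from the repository; the function name guess_open and the demo file name are assumptions.

```python
import bz2
import gzip
import os
import tempfile


def guess_open(path):
    """Open path for binary reading, choosing gzip, bzip2, or plain by magic bytes."""
    with open(path, "rb") as stream:
        magic = stream.read(3)
    if magic[:2] == b"\x1f\x8b":
        return gzip.open(path, "rb")
    if magic == b"BZh":
        return bz2.open(path, "rb")
    return open(path, "rb")


# Demo: write a bzipped file and read it back without naming the format.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "s_1_eland_result.txt.bz2")
with bz2.open(demo_path, "wb") as out:
    out.write(b"example line\n")
with guess_open(demo_path) as stream:
    demo_text = stream.read()
print(demo_text)
```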

15 years agoBeginning of consolidation with trunk/stanford variants of the database.
Brandon King [Mon, 19 May 2008 22:49:09 +0000 (22:49 +0000)]
Beginning of consolidation with trunk/stanford variants of the database.

15 years agoadd rerun_eland.py which extracts sub-sequences from eland files and runs
Diane Trout [Thu, 15 May 2008 00:32:47 +0000 (00:32 +0000)]
add rerun_eland.py which extracts sub-sequences from eland files and runs
eland on them with a new sequence length.

The script also helpfully uses the gerald config file to figure out the
correct genome path.

15 years agoseparate computing the sample/lane_id names from calculating read counts
Diane Trout [Wed, 14 May 2008 23:00:47 +0000 (23:00 +0000)]
separate computing the sample/lane_id names from calculating read counts

the read count computation takes a long time, and if we just want to
quickly access some information from the gerald directory it was really
annoying to wait for it to finish.

15 years agov0.2.0 progress
Brandon King [Wed, 14 May 2008 00:01:27 +0000 (00:01 +0000)]
v0.2.0 progress
 * Commented out eland_result table as it is not being used by either site and Stanford has implemented something that is probably more useful, so we will likely import that.
 * Person has been renamed to UserProfile and has been integrated with the user profiles feature of Django (http://www.djangobook.com/en/1.0/chapter12/#cn222), which allows you to get access to the "profile" information by using user.get_profile().
 * Added Lab which just contains a name... This will be used to implement user/lab level access to Flowcell/Library information.

15 years agoadd additional debugging logging to retrieve_config and configure_pipeline
Diane Trout [Tue, 13 May 2008 16:36:55 +0000 (16:36 +0000)]
add additional debugging logging to retrieve_config and configure_pipeline
to help figure out why it was failing. (which turned out to originally be
because of user error)

15 years agologging.basicConfig should only be in top level scripts.
Diane Trout [Tue, 13 May 2008 16:17:11 +0000 (16:17 +0000)]
logging.basicConfig should only be in top level scripts.
Using basicConfig in a module causes problems because it's likely
to override the user's logging.basicConfig (from some other
top level script that's using logging correctly).
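
The convention described above, sketched in one file. The logger name and the body of retrieve_config are placeholders; only the split between module-level getLogger and script-level basicConfig is the point.

```python
import logging

# Module code: ask for a named logger, never configure logging here.
LOGGER = logging.getLogger("example.pipeline")


def retrieve_config(source):
    """Placeholder for module code that logs but leaves configuration alone."""
    LOGGER.debug("retrieving config from %s", source)
    return {"source": source}


def main():
    # Top level script: the one place that calls basicConfig.
    logging.basicConfig(level=logging.DEBUG)
    retrieve_config("example-source")


if __name__ == "__main__":
    main()
```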

15 years agomake it possible to include all alignments, not just the ones that match
Diane Trout [Sat, 10 May 2008 04:32:25 +0000 (04:32 +0000)]
make it possible to include all alignments, not just the ones that match
chromosomes.

15 years agomakebed is a script too
Diane Trout [Sat, 10 May 2008 00:18:41 +0000 (00:18 +0000)]
makebed is a script too

15 years agoKeep track of sample_name and lane_id computed from the eland
Diane Trout [Sat, 10 May 2008 00:18:24 +0000 (00:18 +0000)]
Keep track of sample_name and lane_id computed from the eland
filename.

Perhaps I should have more code checking to make sure it's of the form
s_?_eland_result.txt

15 years agoMake the runfolder splitting patch a bit more python 2.4 compatible
Diane Trout [Fri, 9 May 2008 03:51:30 +0000 (03:51 +0000)]
Make the runfolder splitting patch a bit more python 2.4 compatible
Python 2.4 doesn't have datetime.strptime, nor does it have
a built in copy of ElementTree in the xml.etree namespace.
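
A sketch of the usual workarounds for both gaps. The fallback import of the standalone elementtree package and the helper name parse_datetime are assumptions, not necessarily what this patch did.

```python
import time
from datetime import datetime

try:
    # Available in the standard library from Python 2.5 onward.
    from xml.etree import ElementTree
except ImportError:
    # Assumption: the standalone elementtree package is installed instead.
    from elementtree import ElementTree


def parse_datetime(text, fmt):
    """Build a datetime via time.strptime, which older Pythons do provide."""
    return datetime(*time.strptime(text, fmt)[:6])


print(parse_datetime("2008-05-09 03:51:30", "%Y-%m-%d %H:%M:%S"))
```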

15 years agoupdate the setup.py file to the new name for the runfolder script
Diane Trout [Fri, 9 May 2008 03:22:39 +0000 (03:22 +0000)]
update the setup.py file to the new name for the runfolder script

15 years agoMove runfolder analysis classes out of scripts/runfolder.py into separate files
Diane Trout [Fri, 9 May 2008 03:21:16 +0000 (03:21 +0000)]
Move runfolder analysis classes out of scripts/runfolder.py into separate files
Also rename runfolder.py to runfolder

This was a really annoying patch to make, I wanted to do two major things,
be able to construct the runfolder configuration extracting classes
from the xml file I was creating, and to make unit tests to make sure
all the code was at least somewhat correct.

Writing all of the xml serialization code was really annoying and dull,
there was probably some nifty metaprogramming way of solving it, but
I didn't feel like figuring it out, as I really need to move on to
more important parts of the project.

I wanted to rename runfolder.py to runfolder as the solexa pipeline code
has a runfolder.py (and if anyone has a better name for the script that's
supposed to dump the runfolder xml file, let me know).

Also in working on the xml serialization code, I extended the serialization
for the eland files, this version now dumps the genome_map and the
eland statistics, like reads, match counts and the like. It does
mean that the --archive mode will take longer, but it also means
I'll have enough information to generate the run statistics later.

Now I might have to redo this if we figure out if we should be handling
the realign files instead.

16 years agoAdd a script that takes a set of eland_result files and makes bedfiles
Diane Trout [Thu, 24 Apr 2008 00:25:42 +0000 (00:25 +0000)]
Add a script that takes a set of eland_result files and makes bedfiles
it'll also look up the lane descriptions in the flowcell database

16 years agoReport cluster results with the rest of the lane summary information.
Diane Trout [Thu, 24 Apr 2008 00:24:26 +0000 (00:24 +0000)]
Report cluster results with the rest of the lane summary information.
this involved breaking names like "s_1" into their sample and lane identifiers
and then exclusively using the lane identifiers.

One complexity is that I still had to treat the lane IDs as keys into
a dictionary instead of offsets into a list, because the lanes
were labeled in the range 1..8, but python's list indexes would have
been 0..7.

I also changed the report code to return a string instead of printing
stuff to stdout, to make it easier for me to integrate it into code
to email the summary report.

16 years agoalso since nothing is currently using the pipelineFinished message
Diane Trout [Tue, 22 Apr 2008 21:58:04 +0000 (21:58 +0000)]
also since nothing is currently using the pipelineFinished message
from runner remove it

16 years agooops forgot to remove some debugging statements from the previous patch
Diane Trout [Tue, 22 Apr 2008 00:26:58 +0000 (00:26 +0000)]
oops forgot to remove some debugging statements from the previous patch

16 years agoExtend makebed to lookup metadata out of a copy of the fctracker database
Diane Trout [Tue, 22 Apr 2008 00:02:16 +0000 (00:02 +0000)]
Extend makebed to lookup metadata out of a copy of the fctracker database

16 years agosplit the library script into a reusable database/reporting layer
Diane Trout [Mon, 21 Apr 2008 23:34:51 +0000 (23:34 +0000)]
split the library script into a reusable database/reporting layer
and command line script.

16 years agoIgnore binary files generated by python (*.py[co])
Diane Trout [Mon, 21 Apr 2008 22:21:32 +0000 (22:21 +0000)]
Ignore binary files generated by python (*.py[co])

16 years agoAdded the same changes I made to the 1.1 branch, all display-related
Lorian Schaeffer [Sat, 19 Apr 2008 00:13:11 +0000 (00:13 +0000)]
Added the same changes I made to the 1.1 branch, all display-related

16 years agoAdd library.py, a script to extract a useful description of a flowcell
Diane Trout [Wed, 9 Apr 2008 20:53:35 +0000 (20:53 +0000)]
Add library.py, a script to extract a useful description of a flowcell
from the gaworkflow database.
Basically it nicely aggregates the library description with the flow
cell lanes.

16 years agoDon't die if Config/FlowcellId.xml is missing; warn the user and continue
Diane Trout [Tue, 1 Apr 2008 21:30:40 +0000 (21:30 +0000)]
Don't die if Config/FlowcellId.xml is missing; warn the user and continue

16 years agoreport statistics on the various eland_result sequence status codes
Diane Trout [Tue, 1 Apr 2008 00:18:07 +0000 (00:18 +0000)]
report statistics on the various eland_result sequence status codes
(E.g. how many times does QC/NM/U[012]/R[012] show up)
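
A sketch of how such a tally can be computed with collections.Counter. It assumes the status code is the third tab-separated column of each eland_result line; that column position, the function name, and the sample lines are assumptions.

```python
from collections import Counter


def count_status_codes(lines):
    """Count how often each status code (QC, NM, U0, R1, ...) appears."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: the third column holds the status code.
        if len(fields) > 2:
            counts[fields[2]] += 1
    return counts


sample = [
    ">read_1\tACGT\tU0\n",
    ">read_2\tACGT\tNM\n",
    ">read_3\tACGT\tU0\n",
]
print(count_status_codes(sample))
```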

16 years agoDon't die if Summary.htm isn't present
Diane Trout [Fri, 28 Mar 2008 23:32:12 +0000 (23:32 +0000)]
Don't die if Summary.htm isn't present

16 years agorename the summary report to summary_report to distinguish it from Summary
Diane Trout [Fri, 28 Mar 2008 22:37:13 +0000 (22:37 +0000)]
rename the summary report to summary_report to distinguish it from Summary
also moved the summarize_mapped_reads from the summary_report function to
the top level of the script

16 years agomake the mapped reads summary report more robust
Diane Trout [Fri, 28 Mar 2008 22:12:19 +0000 (22:12 +0000)]
make the mapped reads summary report more robust
It's really useful for the summary report to report everything that
mapped to the genome as a single entry, and everything that mapped
to the contamination or spike-ins as separate entries.

This version tags files that aren't symlinked as 'last_dir_element/filename'
and files that are symlinked into the genome directory as 'filename'.
Since I have the spike-ins symlinked in, and the last element of the path
name is the genome name it makes it easy to use the 'last_dir_element'
as the name to group all the per chromosome reads to.

16 years agoAdd runfolder.py to the list of scripts
Diane Trout [Thu, 27 Mar 2008 01:13:15 +0000 (01:13 +0000)]
Add runfolder.py to the list of scripts

16 years agosort the sample mapped reads output
Diane Trout [Wed, 26 Mar 2008 23:11:50 +0000 (23:11 +0000)]
sort the sample mapped reads output
also use the fasta name from the eland_result file even if there isn't a
corresponding file on disk any more.

16 years agoAdd documentation about what runfolder.py does
Diane Trout [Tue, 25 Mar 2008 21:54:03 +0000 (21:54 +0000)]
Add documentation about what runfolder.py does

16 years agoPut ten_nM_dilution field back in library table, and added the cluster_estimate field...
Lorian Schaeffer [Tue, 25 Mar 2008 18:31:09 +0000 (18:31 +0000)]
Put ten_nM_dilution field back in library table, and added the cluster_estimate fields back to the flowcell view.

16 years agoStore the firecrest/bustard/gerald tuple in a new class PipelineRun
Diane Trout [Sat, 22 Mar 2008 00:58:36 +0000 (00:58 +0000)]
Store the firecrest/bustard/gerald tuple in a new class PipelineRun
This was because I found the <runfolder>/Config directory which contained
a useful file containing the real flowcell id.

I still need to get runfolder to copy the "important" files and to write
the output xml file somewhere other than the current directory.

16 years agoSummarize information from the runfolder.
Diane Trout [Fri, 21 Mar 2008 23:09:09 +0000 (23:09 +0000)]
Summarize information from the runfolder.

This is the start of a tool for archiving the "important" parts of the run
folder, or providing some summary information.

16 years agoHopefully undid accidental changes to views.py and urls.py
Lorian Schaeffer [Fri, 21 Mar 2008 21:04:41 +0000 (21:04 +0000)]
Hopefully undid accidental changes to views.py and urls.py

16 years agoAdded a dropdown status field to flowcell table
Lorian Schaeffer [Fri, 21 Mar 2008 20:54:19 +0000 (20:54 +0000)]
Added a dropdown status field to flowcell table

16 years agoCreated table Person, with required fields "name" and "lab", and optional field ...
Lorian Schaeffer [Thu, 20 Mar 2008 19:17:21 +0000 (19:17 +0000)]
Created table Person, with required fields "name" and "lab", and optional field "email"
Changed "made_for" in table Library to choose from contents of table Person

16 years ago Flowcell patch:
Lorian Schaeffer [Thu, 20 Mar 2008 18:46:22 +0000 (18:46 +0000)]
Flowcell patch:
Table:
Added four "kit_#" fields to hold the lot #s for each kit
Added "cluster_station_id" field
Added "sequencer_id" field
Changed "lane_x_pM" to decimal instead of int
Changed "lane_x_cluster_estimate" to char instead of int

Display:
Search now works! Correct format for searching ForeignKeys is (name of ForeignKey in current model)__(name of linked field you want to search in ForeignKey's home model)
Set "save_as" option to be true
Reversed sort order for admin grid
Added flowcell_id to search function
Changed display of fields in detailed admin view
Removed "lane_x_cluster_estimate" from all views. Didn't remove the fields themselves in case the other group was planning to use them.
Changed order of fields in admin grid view
Made all fields in admin grid view link to correct page

16 years agoRemoved fields kit_#_lot from library table (they were supposed to go in flowcell...
Lorian Schaeffer [Thu, 20 Mar 2008 17:38:12 +0000 (17:38 +0000)]
Removed fields kit_#_lot from library table (they were supposed to go in flowcell table), changed display of undiluted_concentration to include units.

16 years ago Library patch:
Lorian Schaeffer [Wed, 19 Mar 2008 23:57:35 +0000 (23:57 +0000)]
Library patch:
Library table:
removed "ten_nM_dilution" field from table
added integer field "library_size" to table
added four integer fields "kit_[#]_lot" to table
changed library_id to be text instead of int
changed successful_pM to decimal instead of int
added option "Completed" to the PROTOCOL_END_POINTS set of options

Library display:
changed which fields are displayed on library admin grid
added "library_id" field to search
changed the display of fields in the detailed admin page

16 years agoconvert a single-match eland result file into a bed file readable by UCSC
Diane Trout [Thu, 6 Mar 2008 22:58:34 +0000 (22:58 +0000)]
convert a single-match eland result file into a bed file readable by UCSC

16 years agoAdd script to extract some subset of sequence from an eland result file.
Diane Trout [Thu, 6 Mar 2008 22:57:39 +0000 (22:57 +0000)]
Add script to extract some subset of sequence from an eland result file.