Diane Trout [Tue, 31 Mar 2009 01:29:12 +0000 (01:29 +0000)]
remove some useless debugging print statements
Diane Trout [Tue, 31 Mar 2009 01:28:14 +0000 (01:28 +0000)]
provide an error message if no run is found with the --use-run option
Diane Trout [Mon, 30 Mar 2009 23:14:26 +0000 (23:14 +0000)]
Add unit test to check that the crosstalk matrix was being saved and reloaded
Apparently if your unit test fails to check something there's no guarantee
that it happens.
Also actually add an implementation for saving the matrix.
Diane Trout [Mon, 30 Mar 2009 22:57:41 +0000 (22:57 +0000)]
IPAR detection is more reliable than firecrest so do it first, and then
check for firecrest second
Diane Trout [Mon, 30 Mar 2009 18:53:49 +0000 (18:53 +0000)]
Allow specifying a run instead of just scanning the runfolder for it.
Also rework some of the command line options to group the commands
together, and to print helpful error messages when runfolder is being used
incorrectly.
Diane Trout [Mon, 30 Mar 2009 18:52:26 +0000 (18:52 +0000)]
Add some logging code to report what the program is doing
Diane Trout [Sat, 28 Mar 2009 02:17:22 +0000 (02:17 +0000)]
For pipeline 1.1rc1 or 1.3.2, look for the matrix files in the bustard dir
also if the bustard config.xml file is present, check to see if the
matrix file was forced in there.
Diane Trout [Sat, 28 Mar 2009 00:59:59 +0000 (00:59 +0000)]
Move sample data from simulate_runfolder into testdata
Also I changed make_matrix to take a filename instead of a matrix directory
as in pipeline > 1.1rc1 they started writing the matrix into the bustard
directory instead of as a subdirectory of the firecrest directory.
Diane Trout [Wed, 25 Mar 2009 19:12:43 +0000 (19:12 +0000)]
Make the visible names match how the "group" and "contact" names were being used.
So "Name" is the person submitting the sample
and "contact" is now called "lab name" and mostly contains the PI or Lab manager name.
Diane Trout [Fri, 20 Mar 2009 00:21:27 +0000 (00:21 +0000)]
Consume output from the subprocesses
One of the times I was building the srf files the illumina2srf programs
stopped while the files were still incomplete, and were using no CPU time.
It appears that the output from the child processes reached a point
where linux decided to block the process. This patch reads the output
from illumina2srf and if you run it with debug mode on it'll log it
otherwise it just ignores it.
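The stall described above matches the usual pipe-buffer problem: a child that writes to a pipe nobody reads eventually blocks once the buffer is full. A minimal sketch of draining the output, assuming a hypothetical helper `run_and_drain` and using the Python interpreter as a stand-in child (not illumina2srf); it is written for current Python 3, not the code in this patch:

```python
import logging
import subprocess
import sys

LOGGER = logging.getLogger("srf_sketch")  # hypothetical logger name


def run_and_drain(cmd):
    """Run cmd and read all of its output so it never blocks on a full pipe.

    Returns (exit_code, number_of_lines_read).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so a single pipe carries everything
        text=True,
    )
    count = 0
    for line in proc.stdout:  # reading keeps the OS pipe buffer from filling up
        count += 1
        LOGGER.debug("child: %s", line.rstrip())  # visible only at debug level
    proc.stdout.close()
    return proc.wait(), count


# Stand-in child that writes far more than a typical pipe buffer holds.
CHILD = [sys.executable, "-c", "for i in range(20000): print('x' * 100)"]
```

Calling `run_and_drain(CHILD)` should return once the child exits; with logging at its default level the lines are read and discarded.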
Diane Trout [Wed, 18 Mar 2009 19:24:41 +0000 (19:24 +0000)]
Actually we want fastq files, not scarf files
Diane Trout [Wed, 18 Mar 2009 19:20:49 +0000 (19:20 +0000)]
Generate a gerald config file, even if the genome is unknown
as it makes it easier to edit the wrong config file and manually launch
an analysis.
Diane Trout [Wed, 18 Mar 2009 18:35:49 +0000 (18:35 +0000)]
provide more options for picking how much logging info to spew
Diane Trout [Wed, 18 Mar 2009 18:35:15 +0000 (18:35 +0000)]
only report cwd when starting the queue monitor, not on every start job loop
Diane Trout [Wed, 18 Mar 2009 18:34:09 +0000 (18:34 +0000)]
ignore *.py[co] files
Diane Trout [Mon, 16 Mar 2009 22:49:11 +0000 (22:49 +0000)]
Parse runfolders generated with IPAR 1.3 and pipeline 1.3.2
I'm still parsing the Summary.htm file, though it appears they added
an xml file with the same information in it.
Also the s_matrix.txt file seems to have gone away.
This also adds a full Summary.htm into pipelines/test/testdata
Diane Trout [Thu, 12 Mar 2009 23:56:57 +0000 (23:56 +0000)]
Use django's FilteredSelectMultiple javascript widget for picking affiliations
Diane Trout [Thu, 12 Mar 2009 00:54:07 +0000 (00:54 +0000)]
allow searching by cluster station or sequencer name on the flowcell admin page
Diane Trout [Tue, 10 Mar 2009 19:20:11 +0000 (19:20 +0000)]
We can't search linked tables. This is a problem that needs fixing.
Diane Trout [Tue, 10 Mar 2009 01:17:13 +0000 (01:17 +0000)]
The public library page needed to pass eland.genome_map to summarize_mapped_reads
Diane Trout [Tue, 10 Mar 2009 01:11:22 +0000 (01:11 +0000)]
Improve support for eland searching a single fasta containing multiple records.
the problem was that I was assuming / was a path separator between genome
directory name and chromosome, but eland was also reporting it as
fasta file name / fasta record.
By happy accident in genome map, the fasta file with multiple records would
be stored in the GenomeMap dictionary as having the same name, value pair
while things that had the genome encoded would be fasta filename mapping to
genome/fasta filename.
as a result it appears that splitting a mapped item on the path separator /
and then looking the "base path" up in the genome map will allow me to
determine if an element is a genome directory "path" or a multi record
fasta file by its absence (for genome dirs) or presence (for multi
fasta records)
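The lookup rule described above can be sketched as follows. This is an illustration only: `classify_mapped_item`, the sample map, and the file names are hypothetical, and the real GenomeMap class may behave differently.

```python
def classify_mapped_item(item, genome_map):
    """Guess what an eland target name refers to.

    Returns (kind, base, rest) where kind is one of
    "plain", "multi_fasta" or "genome_dir".
    """
    base, sep, rest = item.partition("/")
    if not sep:
        # No separator at all: a bare fasta file name.
        return ("plain", item, None)
    if base in genome_map:
        # The part before / is itself a known fasta file,
        # so the remainder names a record inside that file.
        return ("multi_fasta", base, rest)
    # Otherwise the part before / is a genome directory name.
    return ("genome_dir", base, rest)


# Hypothetical map: a multi-record fasta maps to itself, while a
# per-chromosome file maps to "genome/fasta filename".
SAMPLE_MAP = {
    "contigs.fa": "contigs.fa",
    "chr1.fa": "mm9/chr1.fa",
}
```

With this map, `"contigs.fa/record7"` is classified as a multi-record fasta entry and `"mm9/chr1.fa"` as a genome directory path.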
Diane Trout [Tue, 10 Mar 2009 00:29:04 +0000 (00:29 +0000)]
Use gigio's full name instead of nickname
Diane Trout [Tue, 10 Mar 2009 00:08:33 +0000 (00:08 +0000)]
Let cluster_station and sequencer default to the first entry in linked table
Diane Trout [Mon, 9 Mar 2009 23:36:55 +0000 (23:36 +0000)]
replace % with %% in the library names, so retrieve config's % expansion works.
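A small sketch of why the doubling helps, assuming the expansion is Python's %-style string formatting (the message does not say which mechanism is involved). The helper name, library name, and template line are made up for illustration:

```python
def escape_percent(library_name):
    """Double every % so a later %-expansion leaves a literal % behind."""
    return library_name.replace("%", "%%")


# Hypothetical config template; only %(genome)s is meant to be expanded.
library_name = "50% input control"
template = "# Lane 1: " + escape_percent(library_name) + "\nELAND_GENOME %(genome)s\n"
rendered = template % {"genome": "mm9"}
```

After expansion, `rendered` contains the library name with its single `%` restored.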
Diane Trout [Mon, 9 Mar 2009 22:05:54 +0000 (22:05 +0000)]
Change flowcell admin index page to include a formatted cluster estimate
Diane Trout [Mon, 9 Mar 2009 19:28:10 +0000 (19:28 +0000)]
Change default picomolarity to 5, and allow overriding the default
via the htsworkflow ini file.
closes [ticket:120] and [ticket:83]
Diane Trout [Mon, 9 Mar 2009 19:25:21 +0000 (19:25 +0000)]
Add commented htsworkflow.ini.example as documentation for new settings
like default_pm.
Diane Trout [Mon, 9 Mar 2009 18:39:02 +0000 (18:39 +0000)]
Sort by library id, not creation date closes [ticket:116]
I left the sort by creation date in there commented out, in case that's how
hudson/alpha wants to do it.
Diane Trout [Mon, 9 Mar 2009 17:19:40 +0000 (17:19 +0000)]
Add flowcell notes field back to the admin page
Diane Trout [Sat, 7 Mar 2009 06:47:18 +0000 (06:47 +0000)]
flowcell model moved from samples to experiments, eland_config needed to know
(and I forgot to tell it)
Diane Trout [Sat, 7 Mar 2009 01:47:38 +0000 (01:47 +0000)]
The default django css pages wanted a few gifs, which this patch adds
Diane Trout [Sat, 7 Mar 2009 01:43:04 +0000 (01:43 +0000)]
Grab more of the admin css pages that were referring to each other
and rename the location for the images they were looking for from
img/admin to img/ (since I expect we'll modify these templates some
from django's defaults)
Diane Trout [Sat, 7 Mar 2009 00:58:48 +0000 (00:58 +0000)]
Display the pM on the public library detail page.
Also I was getting tired of our ugly pages so I ripped large chunks of the
django css and more of their templates to make our library index page look
nicer.
I started working on the public library detail page but that started
involving too many alterations to the code for friday.
Diane Trout [Sat, 7 Mar 2009 00:55:28 +0000 (00:55 +0000)]
Allow filtering by the new hidden field on the library admin page
Diane Trout [Sat, 7 Mar 2009 00:54:50 +0000 (00:54 +0000)]
Lorian asked that the library size default to 225 when porting from our old database
Diane Trout [Fri, 6 Mar 2009 02:09:26 +0000 (02:09 +0000)]
Make the public library browsing page support several features from
django admin pages.
* Search bar
* Pagination
* Filters
This took grabbing the "ChangeSet" class from django.contrib.admin and
slightly modifying, in addition to the css files.
To provide the css files I had to add the ability to serve static
files from the app level. I followed the django pattern that the
internal static pages would only be served by django when debug is true.
And because it's hard to split it out, I also added a new field
'hidden' to the library table. This allows hiding libraries from
the public library page (which is useful since a number of libraries
have a gel isolate pair, which many end users find rather confusing).
Diane Trout [Fri, 6 Mar 2009 02:04:08 +0000 (02:04 +0000)]
Add another username to manage for our made_for to affiliation script
Diane Trout [Wed, 4 Mar 2009 20:10:35 +0000 (20:10 +0000)]
Add get_absolute_url to Library model and use it for linking from both the
index and detail view to the public detail page.
Diane Trout [Wed, 4 Mar 2009 02:26:05 +0000 (02:26 +0000)]
Implement experiment type as a separate table.
closes [ticket:107]
also initialize the experiment type table with the conversion script
Diane Trout [Wed, 4 Mar 2009 01:17:20 +0000 (01:17 +0000)]
Change cluster_mac_id and seq_mac_id from text fields holding
the contents of a drop down box to be separate tables whose table names
are a bit closer to the official illumina names.
clust_mac_id -> cluster_station
seq_mac_id -> sequencer
Diane Trout [Wed, 4 Mar 2009 01:13:55 +0000 (01:13 +0000)]
For one of our more recent libraries, the successful_pM had more digits
than was allowed, so I bumped up the setting.
Also I changed the u in ul to \00b5l (so in most browsers it will
render as a micro)
Diane Trout [Sat, 28 Feb 2009 01:05:28 +0000 (01:05 +0000)]
Add a link to the public library summary page off from the library admin list
Diane Trout [Fri, 27 Feb 2009 22:15:09 +0000 (22:15 +0000)]
Add my script to convert Caltech's made_for field to htsworkflow 0.2's affiliation field.
This could be used as an example for other sites.
Diane Trout [Fri, 27 Feb 2009 22:04:32 +0000 (22:04 +0000)]
Update some of Rami's GERALD config file generator to Django 1.0 API
and get the config file to show up in a browser by setting the mime/type
Diane Trout [Fri, 27 Feb 2009 01:01:02 +0000 (01:01 +0000)]
Only configure logging if we're being run as a script
Diane Trout [Thu, 26 Feb 2009 05:21:52 +0000 (05:21 +0000)]
better handle the case when Affilations.contact is None
Diane Trout [Thu, 26 Feb 2009 03:04:14 +0000 (03:04 +0000)]
Remove a few more fields that we don't use, and change the Lanes display
on the flowcell index page.
I switched from the <div><span> that Rami was using to a <ol><li>
hierarchy, I also made the items link to the corresponding library
page.
Diane Trout [Tue, 24 Feb 2009 18:42:17 +0000 (18:42 +0000)]
simplify library admin index view
Diane Trout [Tue, 17 Feb 2009 23:45:59 +0000 (23:45 +0000)]
Add script to convert current caltech v0.1.x database to the trunk schema
Diane Trout [Tue, 17 Feb 2009 23:45:24 +0000 (23:45 +0000)]
Merge flowcell 'paired_end' flag from v0.1 branch
Then use this flag in the eland_config module to specify ANALYSIS eland_pair
Also change the default analysis to eland_extended
use the following to add the new field:
alter table fctracker_flowcell add column paired_end bool not null default false;
Diane Trout [Tue, 17 Feb 2009 22:39:59 +0000 (22:39 +0000)]
remove cfg_defaults. It was part of an idea I was starting to implement
but I didn't actually do it, so now it's just causing problems
Diane Trout [Fri, 13 Feb 2009 23:48:18 +0000 (23:48 +0000)]
Reverse a mistake. DataRuns is from Rami and should use Rami's field names
Diane Trout [Fri, 13 Feb 2009 23:35:08 +0000 (23:35 +0000)]
enable sort of library view by descending library_id
Diane Trout [Fri, 13 Feb 2009 21:41:43 +0000 (21:41 +0000)]
save longer comment describing change
Diane Trout [Fri, 13 Feb 2009 01:59:42 +0000 (01:59 +0000)]
Apparently some more recent version of multi eland stopped
writing the - and just uses fewer fields when there were too
many hits.
Diane Trout [Fri, 13 Feb 2009 01:51:58 +0000 (01:51 +0000)]
Merge in the library list, detail, and results downloading feature from
the Caltech live site.
There's several components in the frontend tree to render the pages,
in addition this adds in some helper functions in pipelines.eland
to simplify computing summary statistics for an eland lane.
I also needed to merge in a generator based makebed code for
returning the files to the django database.
To use this, the settings file in this branch will need a variable
RESULT_HOME_DIR to be set.
Diane Trout [Fri, 13 Feb 2009 01:42:06 +0000 (01:42 +0000)]
use the compression handling auto-opener for our eland files
Diane Trout [Thu, 12 Feb 2009 22:38:09 +0000 (22:38 +0000)]
make our API docstrings more epydoc friendly
Diane Trout [Thu, 12 Feb 2009 22:37:08 +0000 (22:37 +0000)]
Add load_pipeline_run_xml, a little function that feeds the xml file into
ElementTree and grabs the useful root
Diane Trout [Thu, 5 Feb 2009 00:06:39 +0000 (00:06 +0000)]
Drop 'using %s as cwd' down to just debug level.
It was getting too annoying watching it scroll by constantly
Diane Trout [Fri, 30 Jan 2009 20:47:35 +0000 (20:47 +0000)]
extended command line configuration parsing and add config file parsing
for finding the location of our database and sequence archive directories.
Diane Trout [Fri, 30 Jan 2009 02:15:57 +0000 (02:15 +0000)]
Try to make runfolder results extraction more robust
If an IPAR or firecrest directory is missing some of the important
matrix files that implies there isn't actually a valid run present,
this patch will then (hopefully) issue a warning and skip that analysis
run.
I also added an option to scripts/runfolder to allow a user to specify
where the extracted results should go.
One questionable thing is that for one analysis some of the lanes
were run as sequence and not an eland analysis so where I expected
all the lanes to have an eland genome, it doesn't for these.
I hope that the code doesn't lose the index after serializing and
deserializing that chunk example.
Diane Trout [Fri, 30 Jan 2009 01:51:50 +0000 (01:51 +0000)]
Update to not hard code the config file name and the error message
for when we don't find it
Diane Trout [Sat, 24 Jan 2009 00:24:18 +0000 (00:24 +0000)]
insert code to do ~ home directory expansion
Diane Trout [Fri, 23 Jan 2009 02:23:21 +0000 (02:23 +0000)]
Add in Rami's report template, and adjust the paths to use "reports" instead
of "htsw_reports"
Diane Trout [Fri, 23 Jan 2009 02:21:09 +0000 (02:21 +0000)]
Add id as an AutoNumber(primary_key=True) field and remove the pk from
library_id.
Stanford decided to use library_id as a text field so they could use
library IDs like "SL100". Caltech just used the raw sql id, so the
foreign key reference in experiments_flowcells was expecting a numeric
id, but since the model had the text field as the primary key things
didn't work.
Diane Trout [Wed, 21 Jan 2009 02:50:22 +0000 (02:50 +0000)]
Merge in Rami's changes from last friday.
Most of the admin pages work. Though there's a wsgi error with the reports.
I'll try to figure out tomorrow.
the biggest difference between caltech trunk and stanford schemas right now
is caltech is using made_for as a foreign key, and stanford is using it
as a text field.
Diane Trout [Wed, 14 Jan 2009 01:18:42 +0000 (01:18 +0000)]
add some testing code for the runner daemon
Diane Trout [Wed, 14 Jan 2009 01:17:16 +0000 (01:17 +0000)]
add empty admin.py for eland_config app
Diane Trout [Wed, 14 Jan 2009 01:12:47 +0000 (01:12 +0000)]
Merged much of the stanford htsworkflow frontend into trunk.
Updated to be compatible with Django 1.0
A big change for the 1.0 compatibility is the Admin class that was
attached to models was moved into a separate file admin.py
I probably munged some of the fieldset formatting in the conversion process.
Diane Trout [Thu, 8 Jan 2009 20:12:03 +0000 (20:12 +0000)]
This is a partial merge of the stanford branch with the caltech branch of
the web application, it doesn't work correctly yet, the libraries admin page
is broken, and lacks the ability to browse the 'made_for' column.
This is based on a merge that started a few months ago, but I hadn't finished
the merge, I'll need to check for more updates from their branch soon.
During the process I decided it would be a good idea to update to django 1.0
which is going to make things even more unstable, so I thought I should
check this work in progress in before continuing.
Diane Trout [Tue, 6 Jan 2009 02:05:10 +0000 (02:05 +0000)]
Look in Temp directories for some of the files we have historically
used for our summary reports.
Version 1.1rc1 of the gapipeline started moving some of the files
into /Temp subdirectories of bustard and gerald.
Diane Trout [Wed, 24 Dec 2008 23:39:31 +0000 (23:39 +0000)]
Handle paired-end eland files.
This required changing the ELAND class to hold a list of dictionaries
from its previous implementation where it was exporting an internal dictionary
of the lanes.
I decided to directly show the internal list and to remove the previous
dictionary methods to make it more obvious when code was expecting
the previous behavior.
Also a saved runfolder will now have eland files of the form
s_<lane id>_<end id>.
Internally the end is 0 or 1, I tried to make the display show 1 or 2 for
the user's benefit though.
Diane Trout [Wed, 24 Dec 2008 23:34:23 +0000 (23:34 +0000)]
remove a debug print statement
Diane Trout [Wed, 24 Dec 2008 23:33:51 +0000 (23:33 +0000)]
Add test cases for alphanum sort
Diane Trout [Wed, 24 Dec 2008 23:33:14 +0000 (23:33 +0000)]
Support sorting numbers along with the alphanumeric strings
also I cleaned up the indent a bit
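One common way to sort plain numbers together with alphanumeric strings is a key function that splits digit runs out of the text. The sketch below is an assumption about the general technique, not the code in this commit; `alphanum_key` is a hypothetical name.

```python
import re


def alphanum_key(value):
    """Build a sort key that orders digit runs numerically.

    Each chunk is tagged (0, int) for digits or (1, str) for text, so
    integers and strings are never compared with each other directly.
    """
    parts = re.split(r"(\d+)", str(value))
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts if p != ""]


# Mixed input: bare integers alongside lane-style names.
items = ["s_10", "s_2", 3, "s_1", 10]
ordered = sorted(items, key=alphanum_key)
```

In this sketch numbers sort before text, and "s_2" sorts before "s_10" because the digit runs are compared as integers.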
Diane Trout [Tue, 23 Dec 2008 02:06:27 +0000 (02:06 +0000)]
change from hand coded formatting functions to the built in python
C-style printf formatting
Diane Trout [Tue, 23 Dec 2008 02:05:35 +0000 (02:05 +0000)]
Use the right URLError attribute names for error messages
Diane Trout [Mon, 22 Dec 2008 22:50:46 +0000 (22:50 +0000)]
update make-tree-library script with new default location
Diane Trout [Mon, 22 Dec 2008 20:44:15 +0000 (20:44 +0000)]
fix the multi-eland parser to strip off extensions and not the last 3
characters of the filename.
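The difference between slicing off three characters and removing the extension can be shown with the standard library. This is a sketch only; `strip_extension` and the sample file name are hypothetical.

```python
import os


def strip_extension(filename):
    """Return filename without its final extension, whatever its length."""
    root, _ext = os.path.splitext(filename)
    return root


name = "s_1_eland_multi.txt"
sliced = name[:-3]               # drops "txt" but leaves the trailing dot
stripped = strip_extension(name)  # drops ".txt" as a unit
```

A name with no extension is returned unchanged by `strip_extension`, while slicing would silently eat three characters of the name itself.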
Diane Trout [Mon, 22 Dec 2008 20:43:32 +0000 (20:43 +0000)]
clean up the logic for deciding the output filename when using stdin
as the input
Diane Trout [Fri, 19 Dec 2008 00:54:06 +0000 (00:54 +0000)]
Add command to report path to make figuring out which goat_pipeline is running easier
Diane Trout [Thu, 18 Dec 2008 23:43:38 +0000 (23:43 +0000)]
rename config file to something that doesn't include the read length
since that has been changing.
also a minor code clean up.
Diane Trout [Wed, 10 Dec 2008 01:00:25 +0000 (01:00 +0000)]
The summary parsing code now seems to handle paired end runs
this required changing how the lane_results were being stored,
previously it was a dictionary indexed by lane, now it is a list
of dictionaries, where the list index indicates which "end" of
a paired end run it is. (0 is the first, 1 is the second)
Also I got tired of being forced to use strings for the lane index
by element tree and modified the code so it converts the strings
required by element tree to integers for our internal dictionaries.
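The storage layout described above (a list indexed by end, holding dictionaries keyed by integer lane) might look like the sketch below. The function names and the record format are assumptions for illustration, not the project's API.

```python
def build_lane_results(records):
    """Group (end, lane_text, value) records into a list of dictionaries.

    The list index is the end (0 is the first, 1 is the second); each
    dictionary is keyed by the lane number converted to an int.
    """
    lane_results = []
    for end, lane_text, value in records:
        while len(lane_results) <= end:
            lane_results.append({})  # grow the list to cover this end
        lane_results[end][int(lane_text)] = value
    return lane_results


def display_end(end):
    """Convert the internal 0-based end to the 1-based label shown to users."""
    return end + 1


# Hypothetical records as they might come out of parsed XML, with string lanes.
records = [(0, "1", "lane 1 read 1"), (1, "1", "lane 1 read 2"), (0, "2", "lane 2 read 1")]
results = build_lane_results(records)
```

Here `results[1][1]` is the second end of lane 1, and a single-end run simply yields a list of length one.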
Diane Trout [Tue, 9 Dec 2008 01:19:23 +0000 (01:19 +0000)]
Test 1.1rc1 style runs, which unfortunately require a hack for parsing
the summary.htm files since illumina's html is invalid.
They forgot to use &lt; when writing <=. Most web browsers will ignore
it, but ElementTree is pickier.
Also as of this commit the summary parsing code still doesn't understand
paired end runs so the paired end summary file parsing tests still fail.
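One possible shape for such a workaround is to escape the bare `<` before handing the text to ElementTree. This is a sketch under that assumption, not the hack actually used in this commit; `escape_bare_less_than` and the sample cell are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def escape_bare_less_than(text):
    """Replace the < in a literal <= with &lt; so the markup parses.

    A tag name can never start with "=", so "<=" cannot be the start
    of real markup in the documents this sketch is aimed at.
    """
    return re.sub(r"<(?==)", "&lt;", text)


broken = "<td>Errors <= 2</td>"  # made-up table cell in the invalid style

try:
    ET.fromstring(broken)
    parsed_without_fix = True
except ET.ParseError:
    parsed_without_fix = False

cell = ET.fromstring(escape_bare_less_than(broken))
```

The strict parser rejects the original string, while the escaped version parses and `cell.text` contains the literal `<=` again.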
Diane Trout [Wed, 3 Dec 2008 22:25:26 +0000 (22:25 +0000)]
make-library-tree is a tool to maintain caltech's version of our solexa
results archive.
Diane Trout [Wed, 3 Dec 2008 22:24:29 +0000 (22:24 +0000)]
Add test code to see if runfolder can handle something that looks like a
paired end run.
Diane Trout [Wed, 3 Dec 2008 22:22:31 +0000 (22:22 +0000)]
Add code to create a paired end Summary.htm file
Diane Trout [Wed, 3 Dec 2008 22:21:16 +0000 (22:21 +0000)]
Store the bustard pathname when searching for run folders
This was needed so the srf file can use the same runfolder scanning
code as the --extract-results feature.
Diane Trout [Fri, 21 Nov 2008 01:15:27 +0000 (01:15 +0000)]
Use the get_runs from htsworkflow.pipelines.runfolder
On the plus side this means it'll handle IPAR files, on the downside
it means that the srf program will crash if there's something wrong with
the summary.htm file or if there's an ipar directory that doesn't have
a run in it.
(I really need to add some code to get_runs to skip over IPAR directories that
are being ignored.)
Diane Trout [Fri, 14 Nov 2008 19:04:59 +0000 (19:04 +0000)]
Forgot to change an import htsworkflow.pipeline to htsworkflow.pipelines.
Diane Trout [Thu, 6 Nov 2008 22:49:40 +0000 (22:49 +0000)]
Updated ipar_100 test case to deal with the using U0/1/2 vs R0/1/2
(my first implementation was to just dump all of the multi reads into
U0/1/2)
Diane Trout [Thu, 6 Nov 2008 22:39:24 +0000 (22:39 +0000)]
Process eland extended (or multi) read files.
This also updates the report tools to be compatible with 1.0.
For multi reads I mapped 0/1/2 mismatch reads to U0/U1/U2 if the number of
reads equaled 1 (for each category separately) and I mapped reads >1 and < 255
to R0/R1/R2.
Unfortunately 1.1rc1 changed the summary file, so this patch is not
compatible with it yet.
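One reading of the mapping rule above, written as a sketch. The function name and the input format (a tuple of hit counts at 0, 1 and 2 mismatches) are assumptions, and skipping counts of 0 or of 255 and above is a guess at what the message leaves unstated.

```python
def classify_multi_counts(counts):
    """Map per-mismatch hit counts to U0/U1/U2 and R0/R1/R2 codes.

    counts is (hits_with_0_mismatches, hits_with_1, hits_with_2).
    Each category is judged separately: exactly one hit is unique (U),
    more than one but fewer than 255 is repeat (R).
    """
    codes = []
    for mismatches, count in enumerate(counts):
        if count == 1:
            codes.append("U%d" % mismatches)
        elif 1 < count < 255:
            codes.append("R%d" % mismatches)
        # Counts of 0 or >= 255 are left unclassified in this sketch.
    return codes
```

For example, counts of `(1, 1, 2)` give `["U0", "U1", "R2"]` under this reading.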
Diane Trout [Thu, 30 Oct 2008 22:28:01 +0000 (22:28 +0000)]
The htsworkflow.pipelines.gerald module was getting too large
so I broke the portion that analyzed the Summary.htm file and
the eland_result files into separate modules in anticipation
of extending the eland code to handle some of the newer eland
result file types.
Diane Trout [Thu, 30 Oct 2008 22:03:12 +0000 (22:03 +0000)]
Add support for scanning for results in the IPAR directory.
The field that was the firecrest class in PipelineRun is now the
"image_analysis" field and can be either firecrest or ipar.
I also extracted some of the common functions out of the runfolder test
modules and added them to a separate "simulate_runfolder" module.
Diane Trout [Thu, 30 Oct 2008 21:59:56 +0000 (21:59 +0000)]
Add "_slow" to the end of the queuecommand test functions
this allows "nosetests --exclude=slow" to skip them.
Diane Trout [Tue, 28 Oct 2008 21:25:00 +0000 (21:25 +0000)]
update setup.py for some package renames and some missing scripts
Diane Trout [Tue, 21 Oct 2008 19:44:25 +0000 (19:44 +0000)]
Merge in new modules from htsworkflow branch.
However I renamed things to simpler names.
analys_track -> analysis
exp_track -> experiments
fctracker -> samples
htsw_reports -> reports
As a result this check in probably won't work as I haven't finished
updating all the imports
Diane Trout [Tue, 21 Oct 2008 19:39:50 +0000 (19:39 +0000)]
Merge in model changes to fctracker from htsworkflow branch
Diane Trout [Tue, 21 Oct 2008 19:02:49 +0000 (19:02 +0000)]
update scripts for the pipeline to pipelines module rename