Diane Trout [Wed, 7 Jul 2010 00:19:37 +0000 (00:19 +0000)]
If a quality score started with an @ sign it was treated as a header
which created an invalid fastq file.
This patch fixes that, and introduces some test cases for srf2named_fastq.py
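A minimal sketch of the idea behind this fix, not the code from srf2named_fastq.py: if records are read four lines at a time, a quality line that begins with @ cannot be mistaken for a header. The function name read_fastq_records is hypothetical, and the sketch assumes unwrapped four-line fastq records.

```python
from itertools import islice


def read_fastq_records(lines):
    """Yield (header, sequence, quality) tuples from an iterable of fastq lines.

    Records are consumed four lines at a time, so a quality string that
    happens to begin with '@' is never treated as the next header.
    """
    iterator = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(iterator, 4)]
        if not record:
            return  # clean end of input
        if len(record) != 4:
            raise ValueError("truncated fastq record: %r" % (record,))
        header, sequence, separator, quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed fastq record: %r" % (record,))
        yield header, sequence, quality
```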
Diane Trout [Wed, 7 Jul 2010 00:19:36 +0000 (00:19 +0000)]
This still isn't ideal as the RESULTS_HOME_DIR is still out of my
home directory, but at least the use of expanduser means it'll work on
both linux and os x.
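A small sketch of the expanduser approach. The helper name results_home and the "results" subdirectory are hypothetical; the actual RESULTS_HOME_DIR value is not shown in this log.

```python
import os


def results_home(subdir):
    """Return subdir under the current user's home directory.

    os.path.expanduser resolves '~' on the running platform, so the same
    setting works whether home directories live under /home or /Users.
    """
    return os.path.expanduser(os.path.join("~", subdir))


# Hypothetical setting; the real directory name is an assumption.
RESULTS_HOME_DIR = results_home("results")
```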
Diane Trout [Tue, 22 Jun 2010 19:07:49 +0000 (19:07 +0000)]
Save changes needed to submit to UCSC in Jun 2010.
Some of those changes include:
* modifying the list of variables to include in the ddf
* making it easier to set the MapAlgorithm.
* Return information about the condor scripts so I can make a
condor dagman script to run all the compression jobs
Perhaps some parts of this should be moved into the main
htsworkflow. I can see wanting the code to batch convert
srf/qseqs to fastq being more generally useful.
Diane Trout [Tue, 22 Jun 2010 19:07:48 +0000 (19:07 +0000)]
Fix srf2named_fastq to output the proper /2 in paired end reads
Diane Trout [Tue, 15 Jun 2010 18:47:09 +0000 (18:47 +0000)]
Update the 'flowcell started' email message to include the sequencer
in the url.
Diane Trout [Tue, 15 Jun 2010 18:47:08 +0000 (18:47 +0000)]
Make it easier to run srf2named_fastq.py
Diane Trout [Tue, 15 Jun 2010 00:18:33 +0000 (00:18 +0000)]
Wrapper script that helps convert srf files to fastq files.
It can do the following:
split the fastq into two files (for paired end reads)
add in a flowcell id to the header (for either type of read).
Diane Trout [Mon, 14 Jun 2010 21:11:32 +0000 (21:11 +0000)]
Rename avg_lib_size to gel_cut_size, and add insert_size
to clear up the confusion about what was supposed to be
being recorded in avg_lib_size.
In addition, this patch adds in a per lane status field.
Diane Trout [Mon, 14 Jun 2010 21:11:31 +0000 (21:11 +0000)]
Update internal copy of the django admin/templates/admin/index.html
They changed how the css was being imported in the django css
so when running with django 1.1 the admin index page was messed up.
This gets all the borders to show up correctly.
Diane Trout [Fri, 11 Jun 2010 00:16:19 +0000 (00:16 +0000)]
Collect fastqs by read and add them to the configuration ini file as a
single line. (As desired by UCSC).
Also the library to result map file supports a basic comment character.
If # is the _first_ character of a line, that line is skipped.
Next I should fix the avg library size / insert length variables.
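A sketch of the comment handling described above. The map file layout (a library id followed by a result directory, separated by whitespace) and the function name are assumptions; only the rule that # must be the first character comes from the commit message.

```python
def read_library_result_map(lines):
    """Return (library_id, result_dir) pairs from map file lines.

    A line is treated as a comment only when '#' is its very first
    character; a '#' that follows leading whitespace is not a comment.
    """
    pairs = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # assumption: blank or incomplete lines are ignored
        library_id, result_dir = fields
        pairs.append((library_id, result_dir.strip()))
    return pairs
```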
Diane Trout [Fri, 11 Jun 2010 00:16:18 +0000 (00:16 +0000)]
Put partial support back in for srf files.
Since I don't know if the srf file is supposed to be single or
paired end, this version assumes paired end unless you provide the
--single option.
Currently it'll give up if you try to convert a paired srf file
to a fastq file.
Also I made the code formatting in the make_parser function
look cleaner, and changed it to allow setting the logging verbosity
via command line options --verbose/--debug.
Diane Trout [Thu, 10 Jun 2010 00:55:11 +0000 (00:55 +0000)]
Add script to try and build submissions to the UCSC encode project.
This version supports generating qseq2fastq entries using the
htsworkflow api and scanning the flowcell repository directory.
There was code to generate the ddf files (from ini files).
I need to update the ini generation code to scan the submission directory
for fastq files and group them by read.
Diane Trout [Thu, 10 Jun 2010 00:55:10 +0000 (00:55 +0000)]
Extend htsworkflow.pipelines.sequences to also try to figure out the cycle count.
In addition there is experimental code to shove the found sequences into a
sql database.
I also needed to bug fix the sequence patterns to catch the fake flowcell
ilmn200901 which wasn't matching my regexp for detecting flowcell ids.
Diane Trout [Thu, 10 Jun 2010 00:55:09 +0000 (00:55 +0000)]
Move the code to scan the sequence file archive to its own module so
I can use it in scripts other than make-library-tree
Diane Trout [Thu, 10 Jun 2010 00:55:08 +0000 (00:55 +0000)]
new api module actually needed logging.
Diane Trout [Thu, 10 Jun 2010 00:55:06 +0000 (00:55 +0000)]
Move the knowledge of the urls for the REST API to one new file
htsworkflow.util.api and then update some of the scripts that were
using the api to import from the new module.
Yes this increases the dependencies, but it does mean it'll be
easier to update the urls if we need to change them
Diane Trout [Tue, 1 Jun 2010 19:36:31 +0000 (19:36 +0000)]
apparently commas are important
Diane Trout [Tue, 1 Jun 2010 19:35:59 +0000 (19:35 +0000)]
make-library-tree should be an installed script too
Diane Trout [Sat, 29 May 2010 01:06:19 +0000 (01:06 +0000)]
Add in extra fields lorian asked for to library detail page
Diane Trout [Sat, 29 May 2010 00:08:02 +0000 (00:08 +0000)]
Do not insert anything into the header if there is no flowcell info.
(Previously there was a spurious _)
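One way to avoid the spurious underscore, as a sketch only. Placing the flowcell id in front of the read name with an underscore separator is an assumption, and make_header is a hypothetical name.

```python
def make_header(read_name, flowcell=None):
    """Build a fastq header, adding the flowcell id only when one is known."""
    if flowcell:
        # Assumed layout: flowcell id, underscore, then the read name.
        return "@%s_%s" % (flowcell, read_name)
    # No flowcell info: emit the read name alone, with no stray separator.
    return "@%s" % read_name
```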
Diane Trout [Mon, 17 May 2010 22:37:30 +0000 (22:37 +0000)]
qseq2fastq should also be installed as a script
Brandon King [Fri, 14 May 2010 22:43:22 +0000 (22:43 +0000)]
A patch that allows printing more than 11 labels at a time.
Brandon King [Thu, 13 May 2010 21:32:54 +0000 (21:32 +0000)]
Default to pointing to the Ubuntu python location for django admin templates.
Diane Trout [Sat, 8 May 2010 00:33:58 +0000 (00:33 +0000)]
Matches can have trailing AGCT in addition to a number
Diane Trout [Sat, 8 May 2010 00:32:32 +0000 (00:32 +0000)]
Always return a count from carefully_make_hardlinks
Be more flexible about which json parser to use
Brandon King [Fri, 7 May 2010 22:45:35 +0000 (22:45 +0000)]
Disabling 'delete selected'.
Brandon King [Thu, 29 Apr 2010 00:25:58 +0000 (00:25 +0000)]
Added a 'Print Labels' action to the Library Admin Page.
* Django 1.1 feature.
* FIXME: Requires a Printer Template (Inventory) with type Library to work...
* returns a useful error if template does not exist.
Diane Trout [Fri, 23 Apr 2010 22:21:46 +0000 (22:21 +0000)]
Update summary script to read from the GERALD Summary.xml file
instead of depending on randomly changing html code.
Diane Trout [Fri, 23 Apr 2010 22:21:45 +0000 (22:21 +0000)]
Add support for generating fasta files in addition to fastq files
Add an option to add a flowcell ID to the header
Brandon King [Sat, 10 Apr 2010 00:33:30 +0000 (00:33 +0000)]
WARNING: Django 1.0.2 to Django 1.1.1 compatibility patch... There's no going back now!
Diane Trout [Mon, 22 Mar 2010 22:43:58 +0000 (22:43 +0000)]
Extend qseq2fastq to write to two fastq files,
one for reads that pass filter and one for reads that don't.
Diane Trout [Fri, 5 Mar 2010 22:53:07 +0000 (22:53 +0000)]
Report hidden field in the library API
Diane Trout [Fri, 5 Mar 2010 22:41:13 +0000 (22:41 +0000)]
Use the HTS workflow API to figure out the library tree.
This also needed to search for the flowcell id by the starting name
because we still have the status of a flowcell being part of the
name in a few places.
Diane Trout [Mon, 22 Feb 2010 20:07:21 +0000 (20:07 +0000)]
there is no such thing as sequence_extended. I was using the wrong
suffix generator for paired end sequencing
Diane Trout [Thu, 4 Feb 2010 22:40:40 +0000 (22:40 +0000)]
Return affiliation, library name, and comment in the lanes_for json
api call
Diane Trout [Thu, 4 Feb 2010 20:30:09 +0000 (20:30 +0000)]
Actually implement the code to loop over a list of runfolders
on the command line.
Diane Trout [Sat, 30 Jan 2010 01:28:52 +0000 (01:28 +0000)]
Update the inventory tracker code for the split from lanes being
in the flowcell table to their own stand-alone model.
Also I made the mark_archived_data script take a list of
runfolder archives so I can archive a whole hard disk in one go.
Diane Trout [Thu, 28 Jan 2010 23:59:57 +0000 (23:59 +0000)]
Adds a json api 'lanes_for' feature
Diane Trout [Thu, 28 Jan 2010 19:49:13 +0000 (19:49 +0000)]
Force auth_backend error messages to sys.stderr, as
wsgi hates stdout
Diane Trout [Wed, 27 Jan 2010 18:03:11 +0000 (18:03 +0000)]
Remove debugging code that breaks mod_wsgi
Diane Trout [Tue, 26 Jan 2010 01:40:00 +0000 (01:40 +0000)]
Added 'lanes_for' which will show recent flowcell lanes ordered by date,
and allows filtering by username.
In addition I modified the library index to bin runs into
small (<40), medium (<100), and large (>=100) runs separated by single
and paired end reads.
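The binning rule can be sketched as below. The thresholds come from the message above; that the number being binned is a cycle count (read length) is an assumption, as is the function name.

```python
def run_size_bin(cycles):
    """Classify a run as small (<40), medium (<100), or large (>=100)."""
    if cycles < 40:
        return "small"
    if cycles < 100:
        return "medium"
    return "large"
```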
Diane Trout [Fri, 22 Jan 2010 19:30:48 +0000 (19:30 +0000)]
Refine user handling.
The sysadmins need username to match up with the unix accounts,
The site manager needs a meaningful name to attach users to samples.
So the HTSUser string representation is first/last name and then username
in the corner.
In addition I modified the add user popup form to allow setting the
first/last name during the user creation.
Diane Trout [Thu, 21 Jan 2010 23:07:03 +0000 (23:07 +0000)]
Update test code to deal with the switch to storing archive
qseq files instead of srf files
Diane Trout [Thu, 21 Jan 2010 22:25:09 +0000 (22:25 +0000)]
Don't throw an error if library.cell_line is None.
The API was having problems where if the cell_line wasn't set it was
trying to do None.cellline_name, which didn't work so well.
In addition there were a few other type conversion issues, such as
unicode(None) != None.
So I added unicode_or_none
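A sketch of what a helper like unicode_or_none might look like; the actual implementation is not shown in this log. The 2010 code base would have called unicode(); str() is used here so the sketch runs on current Python.

```python
def unicode_or_none(value):
    """Return None unchanged; otherwise convert value to text.

    Converting None directly would produce the string 'None', which is
    not the same as a missing value when serialized to json.
    """
    if value is None:
        return None
    return str(value)  # Python 2 code would use unicode(value)
```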
Diane Trout [Wed, 13 Jan 2010 00:11:50 +0000 (00:11 +0000)]
Modify qseq2fastq to also read from compressed tar files containing qseq files
Diane Trout [Thu, 7 Jan 2010 20:52:11 +0000 (20:52 +0000)]
Update the usage string for qseq2fastq
Diane Trout [Tue, 15 Dec 2009 23:42:25 +0000 (23:42 +0000)]
Add flowcell/lane information for a library to the rest hts api.
(Also catch a couple of bugs converting some fields to json.)
Diane Trout [Tue, 15 Dec 2009 23:42:22 +0000 (23:42 +0000)]
Define the unicode() function for HTSUser to also report the user's full name.
ticket:149
Make adding a user to an affiliation optional
ticket:150
Diane Trout [Tue, 15 Dec 2009 23:42:19 +0000 (23:42 +0000)]
Include information about the flowcell run date on the library detail page
Diane Trout [Tue, 15 Dec 2009 23:42:17 +0000 (23:42 +0000)]
Make the basic library page show up when javascript is disabled
by removing the x-hidden from the content div.
Diane Trout [Tue, 15 Dec 2009 23:42:12 +0000 (23:42 +0000)]
Tell people to use the https address so it'll work off campus.
Diane Trout [Fri, 11 Dec 2009 23:56:10 +0000 (23:56 +0000)]
Modify the srf utility to tar.bz2 the qseq files instead of using
the srf utility.
Additionally I updated the runfolder script to capture a few more
pieces of information (in addition to the switch to qseq files).
I'm now capturing the IVC plot and pngs, and the
flow cell reports generated by the 1.4 and later version of the
pipeline.
Diane Trout [Tue, 1 Dec 2009 22:19:10 +0000 (22:19 +0000)]
Replace '.' with 'N' in the sequence from qseq files.
Also add an option to include the pass filter state in the header
(and a small code reorganization)
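A sketch of the conversion for a single qseq line, not the qseq2fastq code itself. It assumes the usual 11 tab-separated qseq fields; the header layout and the pf= tag are hypothetical.

```python
def qseq_to_fastq(line, pass_filter_in_header=False):
    """Convert one tab-separated qseq line into a four-line fastq record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError("expected 11 qseq fields, got %d" % len(fields))
    (machine, run, lane, tile, x, y,
     index, read, sequence, quality, pass_filter) = fields
    # qseq marks uncalled bases with '.', fastq consumers expect 'N'.
    sequence = sequence.replace(".", "N")
    # Hypothetical header layout, chosen only for illustration.
    header = "%s_%s:%s:%s:%s:%s/%s" % (machine, run, lane, tile, x, y, read)
    if pass_filter_in_header:
        header += " pf=%s" % pass_filter
    return "@%s\n%s\n+\n%s\n" % (header, sequence, quality)
```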
Diane Trout [Tue, 1 Dec 2009 22:19:09 +0000 (22:19 +0000)]
Fix typo in srf command
Diane Trout [Tue, 1 Dec 2009 02:03:04 +0000 (02:03 +0000)]
Also include the read ID in the fastq header
Diane Trout [Tue, 1 Dec 2009 01:41:40 +0000 (01:41 +0000)]
Add a simple utility to convert qseq to fastq files.
It'll probably morph into a more complex utility in the near future.
Diane Trout [Wed, 25 Nov 2009 21:13:54 +0000 (21:13 +0000)]
Watch for a list of files to indicate that the flowcell is done.
I changed completion_file to completion_files and used shlex.split
to split the options in the ini file into multiple elements.
Thus if you want a name with a space in it you'll need to use a
backslash before the space
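How shlex.split treats such an option value, as a short sketch. The helper name and the file names in the usage note are hypothetical.

```python
import shlex


def parse_completion_files(option_value):
    """Split an ini option value into a list of file names.

    shlex.split follows shell-like rules, so a backslash before a space
    keeps that space inside a single name.
    """
    return shlex.split(option_value)
```

For example, the value `Run.completed Basecalling\ Complete.txt` yields two names, the second of which contains a space.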
Diane Trout [Fri, 13 Nov 2009 01:26:58 +0000 (01:26 +0000)]
Override extjs's rather harsh stylesheet so the library detail page is legible
Diane Trout [Tue, 10 Nov 2009 02:06:30 +0000 (02:06 +0000)]
Don't add the post_run_command unless the option and a runfolder are specified
Diane Trout [Tue, 10 Nov 2009 02:06:26 +0000 (02:06 +0000)]
Report that the pipeline will be done in X to Y days from now
instead of from when the pipeline was started.
Diane Trout [Tue, 27 Oct 2009 22:27:10 +0000 (22:27 +0000)]
update inventory status template to use new field name for samples_library library id
Diane Trout [Tue, 27 Oct 2009 22:27:08 +0000 (22:27 +0000)]
Allow specifying which database to convert
Diane Trout [Tue, 27 Oct 2009 22:07:44 +0000 (22:07 +0000)]
Turn the library_id back into the primary key for samples_library (SCHEMA CHANGE!)
Trying to make it possible to enter the 'library_id' instead of the
arbitrary auto-incrementing key when creating a flowcell was turning out
to be far too time consuming.
It was vastly easier to decide that the 'library id' was a sufficiently
unique short value that it could be used directly as the primary key.
It's now a char 10 field rather than an integer primary key, to support
stanford style library IDs like SL123.
It's possible to convert the previous database version to one compatible with
this code by running docs/conv_library_id_to_pk_v0.3.1.py
Diane Trout [Sat, 17 Oct 2009 00:27:36 +0000 (00:27 +0000)]
Figure out the absolute path to the runfolder for passing to the post_run
command
Diane Trout [Thu, 8 Oct 2009 19:39:33 +0000 (19:39 +0000)]
Also use the javascript dual panel choice box for tags.
Diane Trout [Thu, 8 Oct 2009 00:39:32 +0000 (00:39 +0000)]
In some cases test_retrive_config will attempt to normalize None as a url
which doesn't work so well.
So this patch just returns the None and lets the problem
get sorted out elsewhere.
Diane Trout [Thu, 8 Oct 2009 00:13:17 +0000 (00:13 +0000)]
Use raw_id_fields for the library ID in the Flowcell Lane Inline form
this provides a vastly superior choice for searching a large number of
database entries.
I also grouped the various Lane form customization elements together
as I was tired of scrolling up and down in the file.
Diane Trout [Thu, 8 Oct 2009 00:13:08 +0000 (00:13 +0000)]
Add the ability not to build srf files.
This requires that you specify the 'site name' as a flag to turn
on creating the srf files.
Diane Trout [Tue, 6 Oct 2009 23:26:17 +0000 (23:26 +0000)]
Add an account number field to a library. (Note SCHEMA CHANGE)
We need some way to charge groups for our services, so this keeps
track of what account to use.
This does require a minor schema change.
alter table samples_library add column "account_number" varchar(100) NULL;
Diane Trout [Tue, 6 Oct 2009 19:45:33 +0000 (19:45 +0000)]
Add a bread crumb to the send started email page.
I modified the base template to re-include the default place for
a bread crumb, and added a bit of test code to make sure the bread crumb
is actually present.
Diane Trout [Tue, 6 Oct 2009 18:42:16 +0000 (18:42 +0000)]
Remove spurious print
Diane Trout [Tue, 6 Oct 2009 18:42:11 +0000 (18:42 +0000)]
Test that the email notification actually generates some emails.
I discovered that django when run in a test environment will store
email messages sent in django.core.mail.outbox, so I could make sure
that when the "run started" send email is clicked, it actually creates
some messages. (Though my current test is just that the body is non-zero).
Diane Trout [Tue, 6 Oct 2009 18:34:36 +0000 (18:34 +0000)]
Use eland_extended or eland_pair instead of eland for ANALYSIS type
this required splitting up my analysis suffix for sequencing and
aligning code.
Also forcing lanes that have no available genome to be sequencing
changed part of a retrive_config test case.
Diane Trout [Tue, 6 Oct 2009 00:04:50 +0000 (00:04 +0000)]
Default to sequencing if we don't have a genome for the provided
species.
Also I added some additional logging messages to make it easier
to see what's going on with retrieve config.
Diane Trout [Fri, 2 Oct 2009 00:15:14 +0000 (00:15 +0000)]
Test the updated version of extract results that builds srf files.
Of course this means you need illumina2srf for the code to work.
Perhaps I should add something to skip that test if it's missing.
Diane Trout [Wed, 30 Sep 2009 22:16:41 +0000 (22:16 +0000)]
Add building srf files to runfolder as part of --extract-results
This required splitting the srf code out from the srf script
into a new module in htsworkflow/pipelines/srf.py
Diane Trout [Mon, 28 Sep 2009 19:18:08 +0000 (19:18 +0000)]
Modify hdquery to not die when being imported on non-linux systems.
(Makes nosetests --with-doctests work better)
Diane Trout [Sat, 26 Sep 2009 01:35:25 +0000 (01:35 +0000)]
The config file should also set the SEQUENCE_FORMAT
(also I forgot to change a .getcode() to .code)
Diane Trout [Sat, 26 Sep 2009 01:26:40 +0000 (01:26 +0000)]
HttpRequest.getcode doesn't exist in python 2.4, use .code instead
Diane Trout [Sat, 26 Sep 2009 00:02:17 +0000 (00:02 +0000)]
Implement a client side config file generator.
This downloads the flowcell information json block and then
creates a gerald config file with it.
This version will also look for a "post_run" entry in the
htsworkflow.ini config file for a script that should be
inserted into the config file to be run when make ends.
Diane Trout [Sat, 26 Sep 2009 00:02:12 +0000 (00:02 +0000)]
Print library.library_id instead of the flowcell.library_id.
The first is the library_id string we assign to libraries,
the second is the foreign key linking to the library primary key.
(yes they're both named library_id... it makes it confusing.)
Diane Trout [Wed, 23 Sep 2009 19:09:58 +0000 (19:09 +0000)]
Return species information as part of the flowcell json information.
Additionally instead of using django authentication use an apikey
for authenticating access to the json data.
Currently the apikey is just a value stored in the settings.py file
(DEFAULT_API_KEY), but in the future could be linked to users.
Diane Trout [Sat, 19 Sep 2009 01:28:24 +0000 (01:28 +0000)]
Put a stub species_json in, as I'd listed it in the samples/urls.py
this implementation just always returns 404
Diane Trout [Sat, 19 Sep 2009 01:15:48 +0000 (01:15 +0000)]
Allow grabbing library information via json.
Also make sure that we refer to libraries by our official "library_id"
instead of django's primary key library.id. I needed to alter what was being
returned by the flowcell json code in order to support this.
Diane Trout [Sat, 19 Sep 2009 01:15:42 +0000 (01:15 +0000)]
Add a /config/<fcid>/json url that returns information about a flowcell
this includes test code that verifies the underlying dictionary
representing the flowcell is correct, as well as testing that we
can retrieve data from the url only if we're logged in.
Now I need to implement something similar for sharing information
about libraries.
Diane Trout [Sat, 19 Sep 2009 01:15:40 +0000 (01:15 +0000)]
Django really wanted a 404 template.
So here is an insanely basic 404 template.
Someone should fix it, but at least now I'm getting 404s and not
tracebacks
Diane Trout [Wed, 16 Sep 2009 21:36:44 +0000 (21:36 +0000)]
Force addition of HTSUser object if someone is creating an auth_users object
Diane Trout [Wed, 16 Sep 2009 21:36:36 +0000 (21:36 +0000)]
Replace some prints with logging.info messages and
replace the call to HTTPError.reason with HTTPError.code and HTTPError.msg
as those seem to be available in python 2.6
Diane Trout [Mon, 14 Sep 2009 18:51:30 +0000 (18:51 +0000)]
Fix ticket:145 this patch includes the tar.bz2 extension in the scores pattern.
Also for good measure I check for tar.gz and .tgz. This'll
help avoid the problem of something else showing up in the directory
that matches the pattern scores*, like scores.tar.bz2.md5
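A sketch of a pattern with the behaviour described above. Whether htsworkflow uses a regular expression or a glob here is not stated, so SCORES_ARCHIVE and is_scores_archive are hypothetical names.

```python
import re

# Accept scores archives with a known extension and nothing after it,
# so companion files such as scores.tar.bz2.md5 are rejected.
SCORES_ARCHIVE = re.compile(r"^scores.*\.(tar\.bz2|tar\.gz|tgz)$")


def is_scores_archive(filename):
    """Return True if filename looks like a compressed scores archive."""
    return SCORES_ARCHIVE.match(filename) is not None
```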
Diane Trout [Fri, 11 Sep 2009 01:41:12 +0000 (01:41 +0000)]
Apparently I should've rendered the emails in plain text.
Also lorian wanted to include the cluster estimate in the email.
Diane Trout [Fri, 11 Sep 2009 00:44:01 +0000 (00:44 +0000)]
None is better than a badly munged string for when we don't know the cluster/tile value
Diane Trout [Thu, 10 Sep 2009 23:54:33 +0000 (23:54 +0000)]
Don't crash if there are no lane result summary entries when rendering
output by not looking inside the empty dictionary.
Diane Trout [Thu, 10 Sep 2009 23:52:33 +0000 (23:52 +0000)]
Report if the Summary.htm file is missing the Lane Results Summary block.
Diane Trout [Thu, 10 Sep 2009 19:45:34 +0000 (19:45 +0000)]
Remove spurious print debugging statements as they make mod_wsgi very sad
Brandon King [Thu, 3 Sep 2009 22:10:47 +0000 (22:10 +0000)]
wsgi print error fixed.
Diane Trout [Wed, 2 Sep 2009 23:03:04 +0000 (23:03 +0000)]
remove spurious debug print statement
Diane Trout [Tue, 1 Sep 2009 23:03:01 +0000 (23:03 +0000)]
Report who the site managers are for the BCC
also include the affiliation's email as one of the entities being notified
about a library.
Brandon King [Tue, 25 Aug 2009 18:27:41 +0000 (18:27 +0000)]
Added control_lane column to Flowcell.
* updated upgrade_v0.2.6_to_v0.3.py script to update flowcell table.
Brandon King [Sat, 22 Aug 2009 00:28:26 +0000 (00:28 +0000)]
No printing allowed; fixed.
Diane Trout [Sat, 22 Aug 2009 00:09:37 +0000 (00:09 +0000)]
Now adding a user adds the key linking the htsuser object to the auth_user object.
(I overrode the user admin class and two supporting forms.)