Diane Trout [Tue, 21 Dec 2010 23:53:59 +0000 (15:53 -0800)]
Add utility to fix the off-by-3 error in the amplified from field.
The bug appeared because the library name and library primary key
drifted apart because of some accidental deletions.
I eventually moved us back to using raw primary keys as that
made constructing django urls simpler & easier. However
I'd apparently forgotten to adjust the amplified from field.
Diane Trout [Thu, 2 Dec 2010 01:19:07 +0000 (17:19 -0800)]
Make the inventory login page look a bit nicer.
This involved adjusting some of the base django templates.
Whenever given a choice I try to make our pages look more like the
admin site.
Diane Trout [Thu, 2 Dec 2010 01:18:05 +0000 (17:18 -0800)]
Adjust some whitespace
Diane Trout [Wed, 1 Dec 2010 00:26:51 +0000 (16:26 -0800)]
This mildly adjusts the css to be a bit prettier
Diane Trout [Wed, 1 Dec 2010 00:15:43 +0000 (16:15 -0800)]
Add a new ini-file option to point to the flowcell repository.
It's in the [frontend] section called "results_dir".
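For illustration, a minimal sketch of reading that option with the standard library configparser. Only the [frontend] section and the results_dir option name come from this commit; the ini contents, the path, and the read_results_dir helper are hypothetical.

```python
import configparser

# Hypothetical ini contents; only the [frontend] section name and the
# results_dir option name are taken from the commit message.
EXAMPLE_INI = """
[frontend]
results_dir = /path/to/flowcells
"""


def read_results_dir(ini_text, default=None):
    """Return [frontend] results_dir, or `default` when it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.get("frontend", "results_dir", fallback=default)


print(read_results_dir(EXAMPLE_INI))
```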
Diane Trout [Tue, 30 Nov 2010 19:45:47 +0000 (11:45 -0800)]
Add config option notification_bcc for always attaching to notify emails.
It's called notification_bcc in the htsworkflow.ini file and
NOTIFICATION_BCC in the django settings.py file.
It ignores the "send bcc" option on the email_started form.
Diane Trout [Wed, 17 Nov 2010 05:39:25 +0000 (21:39 -0800)]
Cache the attributes for each type of fastq file.
Previously it was just using whatever was last as the
set of attributes. Which led to everything being flagged as
FastqRd2
Diane Trout [Tue, 16 Nov 2010 01:41:04 +0000 (17:41 -0800)]
Map junctions.bed to the Junctions view.
Hopefully it'll pass validation and not be treated as a bed file,
even though it ends in the .bed extension.
(There were lots of problems with a previous submission because the
cufflinks bed file was almost but not quite a match to ucsc's bed
file validator. Or maybe they caved on the definition of a score.)
Diane Trout [Tue, 16 Nov 2010 01:40:11 +0000 (17:40 -0800)]
Autodetect location of *2fastq scripts
Uses the .__file__ variable of the python modules.
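A minimal sketch of the __file__ approach, assuming the scripts sit in the same directory as the module. The json module is only a stand-in here, and module_directory and script_path are hypothetical helper names, not the ones used in htsworkflow.

```python
import json  # stand-in for the module that ships next to the scripts
import os


def module_directory(module):
    """Return the absolute directory that holds the module's source file."""
    return os.path.dirname(os.path.abspath(module.__file__))


def script_path(module, script_name):
    """Build the expected path of a script that sits next to `module`."""
    return os.path.join(module_directory(module), script_name)


print(script_path(json, "qseq2fastq.py"))
```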
Diane Trout [Mon, 15 Nov 2010 22:59:57 +0000 (14:59 -0800)]
Don't count failed flowcells when guessing if a library is paired or single ended
In addition the NameToView map was extended to cache the is_paired flag
so I don't have to keep searching through the dictionary of library information.
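A rough sketch of the guess, assuming each lane is a dict with hypothetical "paired_end" and "failed" keys (the real library structure may differ). Lanes from failed flowcells are dropped before counting.

```python
def guess_is_paired(lanes):
    """Guess paired end when good paired lanes outnumber good single lanes."""
    # Ignore lanes that came from failed flowcells.
    good = [lane for lane in lanes if not lane["failed"]]
    paired = sum(1 for lane in good if lane["paired_end"])
    single = len(good) - paired
    return paired > single


lanes = [
    {"paired_end": True, "failed": True},
    {"paired_end": True, "failed": True},
    {"paired_end": False, "failed": False},
]
print(guess_is_paired(lanes))
```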
Diane Trout [Sun, 7 Nov 2010 23:58:07 +0000 (15:58 -0800)]
Merge branch 'master' of mus.cacr.caltech.edu:htsworkflow
Diane Trout [Sun, 7 Nov 2010 23:54:22 +0000 (15:54 -0800)]
Very basic non-ExtJS version of the inventory page
It needs to group by object type. Possibly with an index of types
followed by index of objects before getting to the detail page.
The component where it forces you to login before viewing the
inventory page is broken. It shows a super-plain form, but
fails on submit with some kind of cross site request forgery
prevention cookie failure.
Diane Trout [Sun, 7 Nov 2010 23:46:49 +0000 (15:46 -0800)]
Remove another ExtJS panel reference.
This version has a working two column independently scrollable
library browse with no ExtJS dependency.
Diane Trout [Fri, 5 Nov 2010 23:27:02 +0000 (16:27 -0700)]
Update to the new UCSC DAF file.
We are submitting 3 different types of expression data,
one for Gencode v3c, one for Gencode v4, and one for the
de novo cufflinks assembly.
As a result I needed to update the file name to ucsc view map
to describe these new, more specific datasets.
Also I modified the name of the archival and upload condor scripts
to include a username.
Diane Trout [Wed, 3 Nov 2010 23:59:07 +0000 (16:59 -0700)]
Create a class to handle mapping extension to ucsc view attributes.
This will attempt to guess if a library is paired end by seeing
if there are more paired end lanes than single read lanes.
The file extension matching was changed to use fnmatch instead of
ends with.
I still haven't added the ability to define extensions to ucsc DAF
view maps in a config file, as I don't know how to handle the case
of the .bam file that goes to a different view depending on whether it's
a paired end vs single end.
Also the ucsc_gather script is too long and parts of it need
to migrate into the rest of the htsworkflow tree.
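To show why fnmatch is more flexible than a plain suffix test, here is a small sketch. The pattern table and the find_view name are hypothetical, and this is not the class added in this commit; the first matching pattern wins.

```python
import fnmatch

# Hypothetical pattern-to-view table, checked in order.
PATTERN_TO_VIEW = [
    ("*junctions.bed", "Junctions"),
    ("*.fastq", "Fastq"),
]


def find_view(filename):
    """Return the view name for the first pattern that matches, else None."""
    for pattern, view in PATTERN_TO_VIEW:
        if fnmatch.fnmatchcase(filename, pattern):
            return view
    return None


print(find_view("sample_junctions.bed"))
```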
Diane Trout [Tue, 2 Nov 2010 00:16:03 +0000 (17:16 -0700)]
Two lines between functions (trivial change)
Diane Trout [Mon, 1 Nov 2010 21:55:42 +0000 (14:55 -0700)]
Only complain about missing genomes if we _wanted_ to align the lane.
If it was flagged for sequencing, don't complain.
Diane Trout [Mon, 1 Nov 2010 21:52:45 +0000 (14:52 -0700)]
Check for s_${lane}_02_matrix.txt as well as s_${lane}_1_matrix.txt
It still could use more test coverage
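A sketch of checking both file name forms. The order of preference and the find_matrix_file name are assumptions; only the two file name patterns come from this commit.

```python
import os


def find_matrix_file(directory, lane):
    """Return the first matrix file found for `lane`, or None if neither exists."""
    # Assumed preference order: the _02_ name first, then the _1_ name.
    candidates = ["s_%d_02_matrix.txt" % lane, "s_%d_1_matrix.txt" % lane]
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            return path
    return None
```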
Diane Trout [Thu, 28 Oct 2010 00:25:02 +0000 (17:25 -0700)]
Remove dependency on ExtJS for library & lanes_for pages.
The inventory page still needs extjs.
In the process I cleaned up some of the CSS as well.
The downside is you'll need to link
htsworkflow/static/js/jquery.min.js
to a valid jquery sourcefile.
Diane Trout [Wed, 27 Oct 2010 22:51:20 +0000 (15:51 -0700)]
The WITH_SEQUENCE flag needs to be lower-case "true".
Diane Trout [Fri, 22 Oct 2010 23:40:13 +0000 (16:40 -0700)]
Report version number derived from git tag.
This patch includes the necessary infrastructure to support that
feature and it's been added to qseq2fastq and srf2fastq.
Additionally to improve testability of qseq2fastq and srf2fastq, the
original standalone module was moved into htsworkflow.pipelines and a
small stub module was placed in scripts.
Diane Trout [Tue, 19 Oct 2010 19:35:57 +0000 (19:35 +0000)]
Set WITH_SEQUENCE as both a per-lane AND global parameter
the per-lane version set $(lane)_WITH_SEQUENCE := TRUE in the makefiles
the gerald WITH_SEQUENCE TRUE changed the setting in the config.txt file
So let's just set both versions and hopefully we'll get sequence files
out the other end.
Let me show you my hammer...
Diane Trout [Fri, 8 Oct 2010 17:32:46 +0000 (17:32 +0000)]
WITH_SEQUENCE is actually a global GERALD option, remove the lane specifier.
Diane Trout [Thu, 7 Oct 2010 22:32:45 +0000 (22:32 +0000)]
Update gerald config file generation.
Include ELAND_SET_SIZE, WITH_SEQUENCE options
Stop commenting SEQUENCE_FORMAT out
Lorian Schaeffer [Tue, 5 Oct 2010 18:12:42 +0000 (18:12 +0000)]
Minor changes to display
Lorian Schaeffer [Fri, 1 Oct 2010 19:01:58 +0000 (19:01 +0000)]
Removed some models from the admin index view
Lorian Schaeffer [Fri, 1 Oct 2010 18:57:01 +0000 (18:57 +0000)]
Changed Frequently Used links to be properly relative
Lorian Schaeffer [Fri, 1 Oct 2010 01:39:37 +0000 (01:39 +0000)]
Removed from admin index page
Lorian Schaeffer [Fri, 1 Oct 2010 01:25:11 +0000 (01:25 +0000)]
Removed unused DataRun model from admin index page
Lorian Schaeffer [Fri, 1 Oct 2010 01:21:39 +0000 (01:21 +0000)]
Removed all analysis models from main admin index
Lorian Schaeffer [Fri, 1 Oct 2010 01:19:44 +0000 (01:19 +0000)]
Limited which models show up on admin index page
Removed some fields from view in the Library Change/Add page
Rearranged the remaining Library fields
Diane Trout [Thu, 30 Sep 2010 21:58:28 +0000 (21:58 +0000)]
Ignore some files generated by python
.coverage
*.egg-info
Diane Trout [Fri, 24 Sep 2010 22:34:17 +0000 (22:34 +0000)]
Adjust spoolwatchers use of inotify to correspond to pyinotify 0.8.9
Diane Trout [Fri, 24 Sep 2010 22:34:16 +0000 (22:34 +0000)]
Handle the case when a sequencing lane lacks any yield information.
(For instance if we only run one lane of analysis on a flowcell)
Diane Trout [Fri, 24 Sep 2010 22:34:15 +0000 (22:34 +0000)]
Remind myself that dry-run doesn't work with the extract results code
Diane Trout [Fri, 10 Sep 2010 22:35:20 +0000 (22:35 +0000)]
Add support for CASAVA 1.7
They stopped generating eland_extended files so I needed to parse
the export files instead.
Also more carefully test how I'm computing the U0-2 and R0-2
scores.
Diane Trout [Tue, 24 Aug 2010 21:32:06 +0000 (21:32 +0000)]
Better detect which program ids generated CNF4 srf files.
Apparently illumina2srf v1.11.5Illumina.1.3 also generated CNF4s.
It's only illumina2srf v1.11.6... that generated CNF1s.
Diane Trout [Tue, 24 Aug 2010 00:31:55 +0000 (00:31 +0000)]
Split parts of build_fastqs into a separate function.
Use the database to flag if a lane is bad instead of hard coding it.
Support comments and spaces in the library_id to target directory file.
Diane Trout [Mon, 23 Aug 2010 22:54:20 +0000 (22:54 +0000)]
Add an "All Lanes" option to the control-lane field for the cases
where we didn't set a control lane, but instead let the pipeline
estimate from the entire flowcell.
Diane Trout [Sat, 14 Aug 2010 00:23:06 +0000 (00:23 +0000)]
Shrink flowcell notes field.
See [ticket:167]
Diane Trout [Sat, 14 Aug 2010 00:07:16 +0000 (00:07 +0000)]
Don't offer to track flowcell config files in svn
Diane Trout [Fri, 13 Aug 2010 23:57:06 +0000 (23:57 +0000)]
Remove debug print statement
Diane Trout [Fri, 13 Aug 2010 23:51:32 +0000 (23:51 +0000)]
Reduce the size of the bioanalyzer summary text box
to 3 lines.
Diane Trout [Fri, 13 Aug 2010 22:49:10 +0000 (22:49 +0000)]
Tell git to ignore *.py[co]~ files.
(Useful if you're using git-svn)
Diane Trout [Fri, 13 Aug 2010 22:49:09 +0000 (22:49 +0000)]
This patch adds in fields to support storing bioanalyzer results. [ticket:166]
Diane Trout [Wed, 14 Jul 2010 22:32:16 +0000 (22:32 +0000)]
Georgi used slightly different extensions for the second batch
of data
Diane Trout [Wed, 14 Jul 2010 22:31:57 +0000 (22:31 +0000)]
Add search field to the lanes admin page
Diane Trout [Mon, 12 Jul 2010 19:02:22 +0000 (19:02 +0000)]
Use the htsworkflow API to determine if a flowcell is paired end or not.
Diane Trout [Mon, 12 Jul 2010 19:01:31 +0000 (19:01 +0000)]
Under some conditions urlerror doesn't have a code,
so just report the raw error message in that case
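A sketch of the fallback, written against Python 3's urllib.error rather than the urllib2 of the time. The describe_url_error name and the message format are hypothetical.

```python
import urllib.error


def describe_url_error(error):
    """Report the HTTP code when there is one, else the raw error message."""
    code = getattr(error, "code", None)
    if code is not None:
        return "HTTP error %s" % code
    # Plain URLError (e.g. connection refused) carries no status code.
    return str(error)


print(describe_url_error(urllib.error.URLError("connection refused")))
```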
Diane Trout [Mon, 12 Jul 2010 18:23:39 +0000 (18:23 +0000)]
Report if a lane was on a paired end flowcell from the library
json structure.
Diane Trout [Fri, 9 Jul 2010 00:08:12 +0000 (00:08 +0000)]
Since srf2named_fastq detects what type of srf file it has, I can remove
the -c option that was forcing it to read CNF1 formatted srf files
Diane Trout [Thu, 8 Jul 2010 22:15:31 +0000 (22:15 +0000)]
Update srf2named_fastq to try to detect if the srf file is CNF1 or CNF4
and figure out the correct option to pass to srf2fastq.
Diane Trout [Thu, 8 Jul 2010 18:54:15 +0000 (18:54 +0000)]
Add option to force overwriting old fastqs.
It will ignore the current existence of a fastq file when generating
the condor submit script.
In addition it will tell srf2named_fastq to --force as well.
Diane Trout [Wed, 7 Jul 2010 00:19:37 +0000 (00:19 +0000)]
If a quality score started with an @ sign it was treated as a header
which created an invalid fastq file.
This patch fixes that, and introduces some test cases for srf2named_fastq.py
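To illustrate the failure mode: '@' is a legal quality character, so a line starting with '@' is not necessarily a header. One way around it, sketched below, is to read records by position. This assumes unwrapped four-line records and is not necessarily how srf2named_fastq.py was fixed; iter_fastq_records is a hypothetical name.

```python
def iter_fastq_records(lines):
    """Yield (header, sequence, quality) tuples by position, not by prefix."""
    stripped = [line.rstrip("\n") for line in lines]
    # Step through complete four-line records; trailing partial records are ignored.
    for start in range(0, len(stripped) - 3, 4):
        header, sequence, plus, quality = stripped[start:start + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed fastq record at line %d" % (start + 1))
        yield header, sequence, quality


example = [
    "@read1\n", "ACGTA\n", "+\n", "@IIII\n",  # quality line begins with '@'
    "@read2\n", "TTTTT\n", "+\n", "IIIII\n",
]
for record in iter_fastq_records(example):
    print(record)
```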
Diane Trout [Wed, 7 Jul 2010 00:19:36 +0000 (00:19 +0000)]
This still isn't ideal as the RESULTS_HOME_DIR is still out of my
home directory, but at least the use of expanduser means it'll work on
both linux and os x.
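A minimal sketch of the expanduser idea. The relative path and the results_home_dir helper are hypothetical; the point is only that "~" is expanded per platform instead of hard coding /home/... or /Users/....

```python
import os


def results_home_dir(relative="proj/results"):
    """Expand a path under the current user's home directory."""
    # `relative` is a made-up default, not the real RESULTS_HOME_DIR value.
    return os.path.expanduser(os.path.join("~", relative))


print(results_home_dir())
```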
Diane Trout [Tue, 22 Jun 2010 19:07:49 +0000 (19:07 +0000)]
Save changes needed to submit to UCSC in Jun 2010.
Some of those changes include:
* modifying the list of variables to include in the ddf
* making it easier to set the MapAlgorithm.
* Return information about the condor scripts so I can make a
condor dagman script to run all the compression jobs
Perhaps some parts of this should be moved into the main
htsworkflow. I can see wanting the code to batch convert
srf/qseqs to fastq being more generally useful.
Diane Trout [Tue, 22 Jun 2010 19:07:48 +0000 (19:07 +0000)]
Fix srf2named_fastq to output the proper /2 in paired end reads
Diane Trout [Tue, 15 Jun 2010 18:47:09 +0000 (18:47 +0000)]
Update the 'flowcell started' email message to include the sequencer
in the url.
Diane Trout [Tue, 15 Jun 2010 18:47:08 +0000 (18:47 +0000)]
Make it easier to run srf2named_fastq.py
Diane Trout [Tue, 15 Jun 2010 00:18:33 +0000 (00:18 +0000)]
Wrapper script that helps convert srf files to fastq files.
It can do the following:
split the fastq into two files (for paired end reads)
add in a flowcell id to the header (for either type of read).
Diane Trout [Mon, 14 Jun 2010 21:11:32 +0000 (21:11 +0000)]
Rename avg_lib_size to gel_cut_size, and add insert_size
to clear up the confusion about what was supposed to be
being recorded in avg_lib_size.
In addition, this patch adds in a per lane status field.
Diane Trout [Mon, 14 Jun 2010 21:11:31 +0000 (21:11 +0000)]
Update internal copy of the django admin/templates/admin/index.html
They changed how the css was being imported in the django css
so when running with django 1.1 the admin index page was messed up.
This gets all the borders to show up correctly.
Diane Trout [Fri, 11 Jun 2010 00:16:19 +0000 (00:16 +0000)]
Collect fastqs by read and add them to the configuration ini file as a
single line. (As desired by UCSC).
Also the library to result map file supports a basic comment character.
if # is the _first_ character it will skip that line.
Next I should fix the avg library size / insert length variables.
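A sketch of the comment handling, assuming the map file holds whitespace separated "library_id result_dir" pairs (the real column layout may differ). Only lines whose very first character is # are treated as comments; parse_library_map is a hypothetical name.

```python
def parse_library_map(lines):
    """Return (library_id, result_dir) pairs, skipping comments and blank lines."""
    results = []
    for line in lines:
        # Only a '#' in the first column marks a comment.
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or unrecognized line
        results.append((fields[0], fields[1]))
    return results


example_map = [
    "# library_id  directory\n",
    "11111 first_sample\n",
    "\n",
    "11112 second_sample\n",
]
print(parse_library_map(example_map))
```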
Diane Trout [Fri, 11 Jun 2010 00:16:18 +0000 (00:16 +0000)]
Put partial support back in for srf files.
Since I don't know if the srf file is supposed to be single or
paired end, this version assumes paired end unless you provide the
--single option.
Currently it'll give up if you try to convert a paired srf file
to a fastq file.
Also I made the code formatting in the make_parser function
look cleaner, and changed it to allow setting the logging verbosity
via command line options --verbose/--debug.
Diane Trout [Thu, 10 Jun 2010 00:55:11 +0000 (00:55 +0000)]
Add script to try and build submissions to the UCSC encode project.
This version supports generating qseq2fastq entries using the
htsworkflow api and scanning the flowcell repository directory.
There was code to generate the ddf files (from ini files).
I need to update the ini generation code to scan the submission directory
for fastq files and group them by read.
Diane Trout [Thu, 10 Jun 2010 00:55:10 +0000 (00:55 +0000)]
Extend htsworkflow.pipelines.sequences to also try to figure out the cycle count.
In addition there is experimental code to shove the found sequences into a
sql database.
I also needed to bug fix the sequence patterns to catch the fake flowcell
ilmn200901 which wasn't matching my regexp for detecting flowcell ids.
Diane Trout [Thu, 10 Jun 2010 00:55:09 +0000 (00:55 +0000)]
Move the code to scan the sequence file archive to its own module so
I can use it in scripts other than make-library-tree
Diane Trout [Thu, 10 Jun 2010 00:55:08 +0000 (00:55 +0000)]
new api module actually needed logging.
Diane Trout [Thu, 10 Jun 2010 00:55:06 +0000 (00:55 +0000)]
Move the knowledge of the urls for the REST API to one new file
htsworkflow.util.api and then update some of the scripts that were
using the api to import from the new module.
Yes this increases the dependencies, but it does mean it'll be
easier to update the urls if we need to change them
Diane Trout [Tue, 1 Jun 2010 19:36:31 +0000 (19:36 +0000)]
apparently commas are important
Diane Trout [Tue, 1 Jun 2010 19:35:59 +0000 (19:35 +0000)]
make-library-tree should be an installed script too
Diane Trout [Sat, 29 May 2010 01:06:19 +0000 (01:06 +0000)]
Add in extra fields lorian asked for to library detail page
Diane Trout [Sat, 29 May 2010 00:08:02 +0000 (00:08 +0000)]
Do not insert anything into the header if there is no flowcell info.
(Previously there was a spurious _)
Diane Trout [Mon, 17 May 2010 22:37:30 +0000 (22:37 +0000)]
qseq2fastq should also be installed as a script
Brandon King [Fri, 14 May 2010 22:43:22 +0000 (22:43 +0000)]
A patch that allows printing more than 11 labels at a time.
Brandon King [Thu, 13 May 2010 21:32:54 +0000 (21:32 +0000)]
Default to pointing to the Ubuntu python location for django admin templates.
Diane Trout [Sat, 8 May 2010 00:33:58 +0000 (00:33 +0000)]
Matches can have trailing AGCT in addition to a number
Diane Trout [Sat, 8 May 2010 00:32:32 +0000 (00:32 +0000)]
Always return a count from carefully_make_hardlinks
Be more flexible about which json parser to use
Brandon King [Fri, 7 May 2010 22:45:35 +0000 (22:45 +0000)]
Disabling 'delete selected'.
Brandon King [Thu, 29 Apr 2010 00:25:58 +0000 (00:25 +0000)]
Added a 'Print Labels' action to the Library Admin Page.
* Django 1.1 feature.
* FIXME: Requires a Printer Template (Inventory) with type Library to work...
* returns a useful error if template does not exist.
Diane Trout [Fri, 23 Apr 2010 22:21:46 +0000 (22:21 +0000)]
Update summary script to read from the GERALD Summary.xml file
instead of depending on randomly changing html code.
Diane Trout [Fri, 23 Apr 2010 22:21:45 +0000 (22:21 +0000)]
Add support for generating fasta files in addition to fastq files
Add an option to add a flowcell ID to the header
Brandon King [Sat, 10 Apr 2010 00:33:30 +0000 (00:33 +0000)]
WARNING: Django 1.0.2 to Django 1.1.1 compatibility patch... There's no going back now!
Diane Trout [Mon, 22 Mar 2010 22:43:58 +0000 (22:43 +0000)]
Extend qseq2fastq to write to two fastq files,
one for reads that pass filter and one for reads that don't.
Diane Trout [Fri, 5 Mar 2010 22:53:07 +0000 (22:53 +0000)]
Report hidden field in the library API
Diane Trout [Fri, 5 Mar 2010 22:41:13 +0000 (22:41 +0000)]
Use the HTS workflow API to figure out the library tree.
This also needed to search for flowcell id by the starting name
because we still have the status of a flowcell being part of the
name in a few places.
Diane Trout [Mon, 22 Feb 2010 20:07:21 +0000 (20:07 +0000)]
there is no such thing as sequence_extended. I was using the wrong
suffix generator for paired end sequencing
Diane Trout [Thu, 4 Feb 2010 22:40:40 +0000 (22:40 +0000)]
Return affiliation, library name, and comment in the lanes_for json
api call
Diane Trout [Thu, 4 Feb 2010 20:30:09 +0000 (20:30 +0000)]
Actually implement the code to loop over a list of runfolders
on the command line.
Diane Trout [Sat, 30 Jan 2010 01:28:52 +0000 (01:28 +0000)]
Update the inventory tracker code for the split from lanes being
in the flowcell table to their own stand-alone model.
Also I made the mark_archived_data script take a list of
runfolder archives so I can archive a whole hard disk in one go.
Diane Trout [Thu, 28 Jan 2010 23:59:57 +0000 (23:59 +0000)]
Adds a json api 'lanes_for' feature
Diane Trout [Thu, 28 Jan 2010 19:49:13 +0000 (19:49 +0000)]
Force auth_backend error messages to sys.stderr, as
wsgi hates stdout
Diane Trout [Wed, 27 Jan 2010 18:03:11 +0000 (18:03 +0000)]
Remove debugging code that breaks mod_wsgi
Diane Trout [Tue, 26 Jan 2010 01:40:00 +0000 (01:40 +0000)]
Added 'lanes_for' which will show recent flowcell lanes ordered by date,
and allows filtering by username.
In addition I modified the library index to bin runs into
small (<40), medium (<100), and large (>=100) runs separated by single
and paired end reads.
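The thresholds above as a small sketch. The size_bin name is hypothetical, and `size` stands for whatever run size measure the library index uses.

```python
def size_bin(size):
    """Bin a run size: small (<40), medium (<100), large (>=100)."""
    if size < 40:
        return "small"
    if size < 100:
        return "medium"
    return "large"


print([size_bin(n) for n in (36, 40, 99, 100)])
```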
Diane Trout [Fri, 22 Jan 2010 19:30:48 +0000 (19:30 +0000)]
Refine user handling.
The sysadmins need username to match up with the unix accounts,
The site manager needs a meaningful name to attach users to samples.
So the HTSUser string representation is first/last name and then username
in the corner.
In addition I modified the add user popup form to allow setting the
first/last name during the user creation.
Diane Trout [Thu, 21 Jan 2010 23:07:03 +0000 (23:07 +0000)]
Update test code to deal with the switch to storing archive
qseq files instead of srf files
Diane Trout [Thu, 21 Jan 2010 22:25:09 +0000 (22:25 +0000)]
Don't throw an error if library.cell_line is None.
The API was having problems where if the cell_line wasn't set it was
trying to do None.cellline_name, which didn't work so well.
In addition there were a few other type conversion issues, such as
unicode(None) != None.
So I added unicode_or_none
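A sketch of what such a helper could look like, written for Python 3 where str plays the role of unicode. The actual implementation in the tree may differ.

```python
def unicode_or_none(value):
    """Convert to text, but pass None through instead of returning 'None'."""
    if value is None:
        return None
    return str(value)


print(repr(unicode_or_none(None)), repr(unicode_or_none(42)))
```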
Diane Trout [Wed, 13 Jan 2010 00:11:50 +0000 (00:11 +0000)]
Modify qseq2fastq to also read from compressed tar files containing qseq files
Diane Trout [Thu, 7 Jan 2010 20:52:11 +0000 (20:52 +0000)]
Update the usage string for qseq2fastq
Diane Trout [Tue, 15 Dec 2009 23:42:25 +0000 (23:42 +0000)]
Add flowcell/lane information for a library to the rest hts api.
(Also catch a couple of bugs converting some fields to json.)