htsworkflow.git
11 years agoTweaks to make the sample gather query faster.
Diane Trout [Wed, 12 Sep 2012 18:32:55 +0000 (11:32 -0700)]
Tweaks to make the sample gather query faster.
Instead of creating a new library term for submissions, I
changed it back to using the standard library ontology term.
(making it easier to link things together).

Also apparently you can have nested optional statements.

11 years agoAdd a helpful error message when we need the submission name but it wasn't provided
Diane Trout [Wed, 12 Sep 2012 18:31:19 +0000 (11:31 -0700)]
Add a helpful error message when we need the submission name but it wasn't provided

11 years agoAnother tweak to allow parser detection to work on content-type text/plain urls
Diane Trout [Wed, 12 Sep 2012 18:30:26 +0000 (11:30 -0700)]
Another tweak to allow parser detection to work on content-type text/plain urls

11 years agoBetter parser guessing when grabbing plain text from a webserver.
Diane Trout [Tue, 11 Sep 2012 23:01:46 +0000 (16:01 -0700)]
Better parser guessing when grabbing plain text from a webserver.
Also fix a typo in an error message

11 years agoUse the same model for building fastqs as for building soft file
Diane Trout [Mon, 10 Sep 2012 21:16:08 +0000 (14:16 -0700)]
Use the same model for building fastqs as for building soft file
(And let user specify a base filename on command line)

11 years agoAllow retrying retrieving rdf from website a few times.
Diane Trout [Mon, 10 Sep 2012 21:15:32 +0000 (14:15 -0700)]
Allow retrying retrieving rdf from website a few times.
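
A minimal sketch of the retry idea, assuming a caller-supplied `fetch` callable. The name `retry` and its `attempts` and `delay` parameters are hypothetical and are not taken from the htsworkflow code:

```python
import time


def retry(fetch, attempts=3, delay=0.0):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper: fetch, attempts and delay are illustrative names.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except IOError as error:
            # Remember the failure and wait before trying again.
            last_error = error
            time.sleep(delay)
    # Every attempt failed, so surface the most recent error.
    raise last_error
```

Only `IOError` is retried here, so programming errors still fail immediately.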

11 years agoImport flowcell information into our model; this allows
Diane Trout [Sat, 1 Sep 2012 02:59:52 +0000 (19:59 -0700)]
Import flowcell information into our model; this allows
our super library id & sequence & flowcell info query to work.

11 years agoHave librdf open the urls in load_into_model instead of using python
Diane Trout [Sat, 1 Sep 2012 02:56:55 +0000 (19:56 -0700)]
Have librdf open the urls in load_into_model instead of using python
As a side effect I ended up discovering that my parser type detection
code for file types was wrong, and now it has test code

I was trying to fix a bug where librdf 1.0.10.1-1.1ubuntu1 was
segfaulting when loading into the model but I couldn't figure
out what was doing it.

11 years agoAutomatically add the flowcell subdirectory to the sequence archive path
Diane Trout [Sat, 1 Sep 2012 02:53:03 +0000 (19:53 -0700)]
Automatically add the flowcell subdirectory to the sequence archive path

11 years agoOlder rdf library doesn't like unicode strings as nodes
Diane Trout [Fri, 31 Aug 2012 21:39:45 +0000 (14:39 -0700)]
Older rdf library doesn't like unicode strings as nodes
so manually encode as utf8

11 years agoFix matching scanned sequence files to library IDs so it works for hiseq runs.
Diane Trout [Fri, 31 Aug 2012 19:29:59 +0000 (12:29 -0700)]
Fix matching scanned sequence files to library IDs so it works for hiseq runs.
The previous version was keying off of flowcell/lane so if you
had multiple libraries from the same flowcell/lane all the sequences
would end up in one of the libraries.

Hopefully this fixes that. Though to do this I ended up changing
the whole structure of condorfastq to be based on updating an RDF model.
This depends on the sequence.py module changes of saving things to
rdf models -- and the new code to infer library ids at that layer.

11 years agoInherit from MutableMapping for ResultMap.
Diane Trout [Fri, 31 Aug 2012 19:19:42 +0000 (12:19 -0700)]
Inherit from MutableMapping for ResultMap.
This gets me automatic methods.
I did it because my for x in resultmap wasn't working
correctly. It was using a list key (0) instead of a map key
(library id) and tossing a key error.

Also instead of using ResultMap.add_result(key, destination) I can do
ResultMap[key] = destination
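
A minimal sketch of the pattern, not the actual ResultMap code; the `_results` attribute is an assumption. A class that defines `__getitem__` but not `__iter__` is iterated by Python with the integer indexes 0, 1, 2, ..., which would explain the key error on 0 described above. Inheriting from MutableMapping forces `__iter__` to be defined and supplies `keys`, `items`, `get`, `update` and friends for free:

```python
from collections.abc import MutableMapping


class ResultMap(MutableMapping):
    """Sketch of a mapping of library id to result destination."""

    def __init__(self):
        self._results = {}  # assumed internal storage

    def __getitem__(self, key):
        return self._results[key]

    def __setitem__(self, key, value):
        self._results[key] = value

    def __delitem__(self, key):
        del self._results[key]

    def __iter__(self):
        # Iterate over the map keys (library ids), not integer indexes.
        return iter(self._results)

    def __len__(self):
        return len(self._results)
```

With this in place `for x in resultmap` yields library ids, and `resultmap[key] = destination` replaces the explicit `add_result` call. (On the Python 2.6 of the time the base class lived in `collections` rather than `collections.abc`.)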

11 years agoAllow option to save/restore a sequence class to a RDF model.
Diane Trout [Fri, 31 Aug 2012 19:08:08 +0000 (12:08 -0700)]
Allow option to save/restore a sequence class to a RDF model.
(After doing this I started having dreams of some set of mixins
designed to persist data into different types of storage).

I also renamed the sql save to indicate that it's going to a SQL
database.

Also I renamed one of my simplify Uris to stripNamespace
to make it clearer what it was actually doing.

simplify_uri just returns the end of a uri -- regardless of type.
stripNamespace removes a specific namespace from a uri.
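
To illustrate the difference, here is a rough sketch working on plain strings. The bodies and signatures are assumptions for illustration only (the real functions deal with RDF nodes), and the example.org URIs are made up:

```python
import re


def simplify_uri(uri):
    """Return the last path or fragment component of a URI string."""
    # Drop any trailing separators, then keep whatever follows the last one.
    text = str(uri).rstrip('/#')
    return re.split(r'[/#]', text)[-1]


def stripNamespace(namespace, uri):
    """Remove one specific namespace prefix from a URI string.

    Hypothetical signature; returns None when the URI is not in the namespace.
    """
    text = str(uri)
    if not text.startswith(namespace):
        return None
    return text[len(namespace):]
```

The first function needs no knowledge of the namespace, while the second only answers for URIs that start with the namespace it was given.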

11 years agoTest RDFa encoded information for flowcell & libraries.
Diane Trout [Tue, 28 Aug 2012 23:03:45 +0000 (16:03 -0700)]
Test RDFa encoded information for flowcell & libraries.
This should make sure that I have the right class type on flowcells
so I can reliably fish it out of a triple store.

11 years agoMerge branch 'master' of mus.cacr.caltech.edu:htsworkflow
Diane Trout [Tue, 28 Aug 2012 22:57:14 +0000 (15:57 -0700)]
Merge branch 'master' of mus.cacr.caltech.edu:htsworkflow

11 years agosanitize_literal had trouble with empty strings.
Diane Trout [Tue, 28 Aug 2012 22:56:22 +0000 (15:56 -0700)]
sanitize_literal had trouble with empty strings.
This fixes that and tests that edge case.

11 years agoAttempt to come up with regexp to detect hiseq formatted files.
Diane Trout [Tue, 28 Aug 2012 22:48:50 +0000 (15:48 -0700)]
Attempt to come up with regexp to detect hiseq formatted files.

11 years agoFix some mismatches between variable names in python code and in templates
Diane Trout [Thu, 9 Aug 2012 19:53:32 +0000 (12:53 -0700)]
Fix some mismatches between variable names in python code and in templates

11 years agoCreate a lane to file name turtle data file
Diane Trout [Tue, 7 Aug 2012 02:13:49 +0000 (19:13 -0700)]
Create a lane to file name turtle data file
this required passing the lane ID information back through
the json api.

11 years agoSimplify code to see if we already have a DataRun created.
Diane Trout [Tue, 7 Aug 2012 02:01:51 +0000 (19:01 -0700)]
Simplify code to see if we already have a DataRun created.
Make sure we update the data runs when getting flowcell lane details.
Test to make sure we can't accidentally add more than one data run.
In the test code also change to assert from failUnless.

11 years agoMerge branch 'master' of mus.cacr.caltech.edu:htsworkflow
Diane Trout [Mon, 6 Aug 2012 21:26:26 +0000 (14:26 -0700)]
Merge branch 'master' of mus.cacr.caltech.edu:htsworkflow

11 years agoDon't link to lane for the raw result files section.
Diane Trout [Mon, 6 Aug 2012 21:25:11 +0000 (14:25 -0700)]
Don't link to lane for the raw result files section.
The raw result file code for finding the lane ID was returning
a wrong value (first lane) which was causing trouble on
multi sample hiseq runs.

11 years agoMerge branch 'master' of mus.cacr.caltech.edu:htsworkflow
Diane Trout [Mon, 6 Aug 2012 21:17:01 +0000 (14:17 -0700)]
Merge branch 'master' of mus.cacr.caltech.edu:htsworkflow

11 years agoUse default printer settings again.
Diane Trout [Mon, 6 Aug 2012 21:16:34 +0000 (14:16 -0700)]
Use default printer settings again.

(Removing it broke things)

11 years agoTest proper setting of sequence project variable
Diane Trout [Mon, 6 Aug 2012 18:25:24 +0000 (11:25 -0700)]
Test proper setting of sequence project variable

11 years agoTest more of the sequences class.
Diane Trout [Thu, 2 Aug 2012 21:52:10 +0000 (14:52 -0700)]
Test more of the sequences class.

(And fix the bugs discovered with better test coverage)

11 years agoFix a few more formatting issues with the geo template
Diane Trout [Thu, 2 Aug 2012 21:51:41 +0000 (14:51 -0700)]
Fix a few more formatting issues with the geo template

11 years agoMerge branch 'master' of mus.cacr.caltech.edu:htsworkflow
Diane Trout [Wed, 1 Aug 2012 23:19:50 +0000 (16:19 -0700)]
Merge branch 'master' of mus.cacr.caltech.edu:htsworkflow

11 years agoGEO wanted both files for a paired lane to be on a single line.
Diane Trout [Wed, 1 Aug 2012 23:18:13 +0000 (16:18 -0700)]
GEO wanted both files for a paired lane to be on a single line.
This required a bit of formatting code to massage the
results of the sparql query into an acceptable form.

11 years agoFix issues introduced when switching to the django templating system for fastq genera...
Diane Trout [Fri, 27 Jul 2012 22:35:00 +0000 (15:35 -0700)]
Fix issues introduced when switching to the django templating system for fastq generation.
It needs to know where django is, and I didn't include a short form for
identifying flowcells.

11 years agoActually we want to capture the unanalyzed lanes if we can...
Diane Trout [Wed, 25 Jul 2012 18:53:11 +0000 (11:53 -0700)]
Actually we want to capture the unanalyzed lanes if we can...

11 years agoGet actual list of sequencers used for a library.
Diane Trout [Tue, 24 Jul 2012 19:39:34 +0000 (12:39 -0700)]
Get actual list of sequencers used for a library.
I do have pipeline version numbers available, but with the current
version of the query it would end up duplicating the sequencer
model number. (There are a lot more version combinations than
there are sequencers.)

11 years agoUpdate to our current IPs. I'm tired of syncdb overwriting them.
Diane Trout [Mon, 23 Jul 2012 18:34:36 +0000 (11:34 -0700)]
Update to our current IPs. I'm tired of syncdb overwriting them.

11 years agoAdapt building qseq or srf archives to use SampleKey identified lanes
Diane Trout [Thu, 19 Jul 2012 18:23:40 +0000 (11:23 -0700)]
Adapt building qseq or srf archives to use SampleKey identified lanes

11 years agoUpdate save_raw_data to SampleKey api
Diane Trout [Thu, 19 Jul 2012 18:14:29 +0000 (11:14 -0700)]
Update save_raw_data to SampleKey api

11 years agoAdd in a few more genomes to detection code
Diane Trout [Thu, 19 Jul 2012 00:01:03 +0000 (17:01 -0700)]
Add in a few more genomes to detection code

11 years agoFix library viewing code to read HiSeq run xmls.
Diane Trout [Tue, 17 Jul 2012 22:56:41 +0000 (15:56 -0700)]
Fix library viewing code to read HiSeq run xmls.

Also still need to fix the samples.view code

11 years agoRemove default sequence-format so the new auto-detector code is actually called
Diane Trout [Mon, 16 Jul 2012 22:09:00 +0000 (15:09 -0700)]
Remove default sequence-format so the new auto-detector code is actually called

11 years agoAttempt to guess raw sequence type for a run.
Diane Trout [Sat, 14 Jul 2012 00:31:41 +0000 (17:31 -0700)]
Attempt to guess raw sequence type for a run.
Also will check for the Unaligned directory first
before looking for the BaseCalls directory, as there's still
a BaseCalls dir.
(It does this by checking for the aligned directory first, and
if it exists using the name in the aligned config file to find
the corresponding base call directory)

11 years agoFix a refactoring error
Diane Trout [Fri, 13 Jul 2012 23:31:48 +0000 (16:31 -0700)]
Fix a refactoring error

11 years agoLook for RTAConfig.xml file generated by pipeline 1.12 era.
Diane Trout [Thu, 12 Jul 2012 18:16:17 +0000 (11:16 -0700)]
Look for RTAConfig.xml file generated by pipeline 1.12 era.

11 years agoUse lane 1's matrix when pipeline was set to build matrix from all.
Diane Trout [Thu, 12 Jul 2012 17:59:19 +0000 (10:59 -0700)]
Use lane 1's matrix when pipeline was set to build matrix from all.
(HiSeq's pipeline doesn't make the lane-less matrix file)

11 years agoClean up flowcell ID detection and add support for reading HiSeq IDs
Diane Trout [Thu, 12 Jul 2012 17:58:15 +0000 (10:58 -0700)]
Clean up flowcell ID detection and add support for reading HiSeq IDs
Update the quick and dirty report generator for SampleKey API

11 years agofix a variable name typo
Diane Trout [Thu, 12 Jul 2012 17:54:37 +0000 (10:54 -0700)]
fix a variable name typo

11 years agoRe-enable some suppressed tests and update them for the SampleKey API.
Diane Trout [Thu, 12 Jul 2012 17:52:53 +0000 (10:52 -0700)]
Re-enable some suppressed tests and update them for the SampleKey API.
Also try to test getting the HiSeq flowcell ID out of RunInfo.xml

11 years agowhitespace fix.
Diane Trout [Thu, 12 Jul 2012 17:49:44 +0000 (10:49 -0700)]
whitespace fix.

11 years agoTest reading the xml out of a file too.
Diane Trout [Wed, 11 Jul 2012 00:29:00 +0000 (17:29 -0700)]
Test reading the xml out of a file too.
You write some code and think "oh this is simple,
it should be ok without tests" and then you watch the stack traces.
Now it's tested -- at least a bit.

11 years agoFix import statement and typos for genomemap
Diane Trout [Tue, 10 Jul 2012 23:55:19 +0000 (16:55 -0700)]
Fix import statement and typos for genomemap

11 years agoCreate a class to convert contig names into genome/contig names
Diane Trout [Tue, 10 Jul 2012 23:38:20 +0000 (16:38 -0700)]
Create a class to convert contig names into genome/contig names
And now being split out I can reasonably test it.
This is needed so when we're reporting where the genome
locations mapped to we can summarize them as a genome
instead of chr1 chr2 chr3 etc.
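
A small sketch of what such a class might look like. The class name `ContigNames`, its methods, and the genome and contig names in the usage are all hypothetical, chosen only to show the idea of folding per-contig results into per-genome ones:

```python
class ContigNames(object):
    """Hypothetical helper mapping contig names to genome/contig names."""

    def __init__(self, genomes):
        # genomes: dict of genome name -> iterable of contig names
        self._contig_to_genome = {}
        for genome, contigs in genomes.items():
            for contig in contigs:
                self._contig_to_genome[contig] = genome

    def qualify(self, contig):
        """Return 'genome/contig', or the bare contig if it is unknown."""
        genome = self._contig_to_genome.get(contig)
        if genome is None:
            return contig
        return genome + '/' + contig

    def summarize(self, counts):
        """Collapse per-contig counts into per-genome counts."""
        summary = {}
        for contig, count in counts.items():
            genome = self._contig_to_genome.get(contig, contig)
            summary[genome] = summary.get(genome, 0) + count
        return summary
```

Unknown contigs are passed through unchanged so that nothing is silently dropped from a report.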

11 years agoGuess genome name for building compressed mapping counts from genomesize.xml
Diane Trout [Tue, 10 Jul 2012 21:13:46 +0000 (14:13 -0700)]
Guess genome name for building compressed mapping counts from genomesize.xml
The HiSeq pipeline creates a file that has the sizes of the genomes
that it mapped the particular project to. Figuring out the genome
version from that is a lot easier than dealing with all the random
possible file names for the config files.

This patch is lacking in that I just hard coded a few genomes;
I probably need some mechanism for pulling it from a database.

11 years agoGive some failover defaults to printer settings.
Diane Trout [Tue, 10 Jul 2012 21:10:45 +0000 (14:10 -0700)]
Give some failover defaults to printer settings.
It's also not certain if we're still using them anywhere.

11 years agoUse sample keys when looking up lane parameters.
Diane Trout [Sat, 7 Jul 2012 00:34:27 +0000 (17:34 -0700)]
Use sample keys when looking up lane parameters.
And as a bonus feature I decided to test for a SampleKey incoming
and if it's not, convert the older lane number to a sample key internally.

One downside is since I'm storing not fully specified SampleKeys
I have to do a brute force lookup of the key.

11 years agoA few python 2.6 test case incompatibilities.
Diane Trout [Fri, 6 Jul 2012 23:12:25 +0000 (16:12 -0700)]
A few python 2.6 test case incompatibilities.

11 years agoSwitch to regular dictionary instead of ordered dictionary.
Diane Trout [Fri, 6 Jul 2012 23:03:51 +0000 (16:03 -0700)]
Switch to regular dictionary instead of ordered dictionary.
python 2.6 doesn't have ordered dictionary. So I switched to using
a regular dictionary and just sorting the returned keys.
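
The workaround amounts to something like the following sketch; `iter_sorted` is an illustrative name, not a function from the code base:

```python
def iter_sorted(mapping):
    """Yield (key, value) pairs from a plain dict in sorted key order."""
    # Sorting the keys at read time gives a stable order without
    # needing collections.OrderedDict (absent from Python 2.6).
    for key in sorted(mapping):
        yield key, mapping[key]
```

Note that this gives sorted order rather than insertion order, which is only a substitute when the two happen to serve the same purpose.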

11 years agoMassively rework eland file finding and indexing.
Diane Trout [Thu, 5 Jul 2012 17:33:46 +0000 (10:33 -0700)]
Massively rework eland file finding and indexing.
Instead of looking in known locations for specific file names
in priority order, this now scans the gerald tree looking for
any potential eland files.

The eland container class has been changed to be a MutableMapping and
the found files are added. The class internally tracks the priority of
the different file types and will drop any lower files when a higher
priority file shows up.

In addition the key for finding files is now a "SampleKey" class
which supports tracking sample name, lane and read. The SampleKey
also has a fuzzy matching feature that uses "None" as a wildcard.

So you can search for all the samples for a particular end with
something like "SampleKey(read=1)". Needless to say this change
required updating a lot of code that was assuming the nested
list/dictionary structure from before that was tracking read/lane.
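
A rough sketch of the wildcard matching idea. Only the `SampleKey(read=1)` form comes from the message above; the field names `sample` and `lane`, the `matches` method and the `find_keys` helper are assumptions:

```python
from collections import namedtuple


class SampleKey(namedtuple('SampleKey', ['sample', 'lane', 'read'])):
    """Sketch of a key tracking sample name, lane and read."""
    __slots__ = ()

    def __new__(cls, sample=None, lane=None, read=None):
        # Every field defaults to None, which acts as a wildcard.
        return super().__new__(cls, sample, lane, read)

    def matches(self, other):
        """True if each field is equal or is None on either side."""
        return all(a is None or b is None or a == b
                   for a, b in zip(self, other))


def find_keys(keys, pattern):
    """Return every key that matches the (possibly partial) pattern."""
    return [key for key in keys if pattern.matches(key)]
```

With this sketch, `find_keys(keys, SampleKey(read=1))` returns all keys for read 1 regardless of sample or lane. The lookup is a linear scan over the stored keys.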

11 years agoRename chromosome in hiseq test set to match our fake human build
Diane Trout [Thu, 5 Jul 2012 17:30:19 +0000 (10:30 -0700)]
Rename chromosome in hiseq test set to match our fake human build

11 years agoDisable debug messages for each scanned eland file pattern
Diane Trout [Sat, 30 Jun 2012 03:23:43 +0000 (20:23 -0700)]
Disable debug messages for each scanned eland file pattern
Given how many places I'm looking for the eland files the debug
messages were getting way too noisy.

11 years agoMerge branch 'master' of mus.cacr.caltech.edu:htsworkflow
Diane Trout [Sat, 30 Jun 2012 03:13:57 +0000 (20:13 -0700)]
Merge branch 'master' of mus.cacr.caltech.edu:htsworkflow

11 years agoSupport having an eland file split into multiple fragments.
Diane Trout [Sat, 30 Jun 2012 03:06:33 +0000 (20:06 -0700)]
Support having an eland file split into multiple fragments.
The interesting thing was changing the match_codes and mapped_reads
dictionaries into their own classes so I can add them together, and
it'll go through and add matching dictionary entries together.

e.g. MatchCodes({'U0':5})+MatchCodes({'U0':5}) == MatchCodes({'U0': 10})

Unfortunately because there are multiple samples per lane on the
hiseq this still isn't enough to support counting aligned results yet.
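
The addition behaviour shown in the example can be sketched as a dict subclass. This is one possible way to get that behaviour, not necessarily how the real MatchCodes class is written:

```python
class MatchCodes(dict):
    """Sketch of a match-code counter that supports addition."""

    def __add__(self, other):
        if not isinstance(other, MatchCodes):
            return NotImplemented
        # Start from a copy so neither operand is modified.
        total = MatchCodes(self)
        for code, count in other.items():
            total[code] = total.get(code, 0) + count
        return total
```

Codes present in only one operand are carried over unchanged, so per-fragment counts can be summed one fragment at a time.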

11 years agoSome minor tweaks to rta 1.12 testing.
Diane Trout [Sat, 30 Jun 2012 03:04:34 +0000 (20:04 -0700)]
Some minor tweaks to rta 1.12 testing.
Turn more testing on (remove some suppressing returns),
use the right gerald class, make a variable for indicating
how many ends there should be.

11 years agoProperly code in tabs for simulated hiseq export files
Diane Trout [Sat, 30 Jun 2012 03:03:18 +0000 (20:03 -0700)]
Properly code in tabs for simulated hiseq export files

11 years agoUse different tags indicating gerald or casava style base alignment
Diane Trout [Sat, 30 Jun 2012 02:43:06 +0000 (19:43 -0700)]
Use different tags indicating gerald or casava style base alignment
I also decided to insert the timestamp from the config file into the
xml for casava runs.

11 years agoAdd script for updating database to data run versions and cluster/sequencer defaults v0.5.5
Diane Trout [Wed, 27 Jun 2012 23:41:03 +0000 (16:41 -0700)]
Add script for updating database to data run versions and cluster/sequencer defaults

11 years agoIf I'm going to make a call to logger, I really should create it first
Diane Trout [Wed, 27 Jun 2012 23:40:22 +0000 (16:40 -0700)]
If I'm going to make a call to logger, I really should create it first

11 years agoOnly allow one default cluster station or sequencer to be set
Diane Trout [Wed, 27 Jun 2012 22:56:37 +0000 (15:56 -0700)]
Only allow one default cluster station or sequencer to be set

Works by watching for the pre_save signal for those classes
and if it sees the isdefault flag being set to true it
goes through all the other records and sets isdefault to null.

11 years agoOnly allow one default cluster station or sequencer to be set
Diane Trout [Wed, 27 Jun 2012 22:53:19 +0000 (15:53 -0700)]
Only allow one default cluster station or sequencer to be set

Works by watching for the pre_save signal for those classes
and if it sees the isdefault flag being set to true it
goes through all the other records and sets isdefault to null.

11 years agoAuto-generate a secret key for django and store in the config file.
Diane Trout [Wed, 27 Jun 2012 21:44:18 +0000 (14:44 -0700)]
Auto-generate a secret key for django and store in the config file.
This way I can keep checking settings.py in.

Also since I implemented this by rewriting the settings file
I thought I'd clean out some of how we were storing some of our
previous default values. (Removed the hard coded [defaults] section
so they don't end up cluttering the user's config file.)

Since I removed the default for the linking hard disk tool
I changed the script to complain more helpfully if it can't find
the setting.
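
A minimal sketch of generating a key once and keeping it in a config file, using Python 3's standard library. The function name, the section name `django` and the option name `secret_key` are assumptions; the actual config layout is not shown in this log:

```python
import configparser
import secrets


def ensure_secret_key(config_path, section='django', option='secret_key'):
    """Return the stored secret key, creating and saving one if missing."""
    # RawConfigParser avoids '%' interpolation of the stored value.
    config = configparser.RawConfigParser()
    config.read(config_path)  # a missing file is silently skipped
    if not config.has_section(section):
        config.add_section(section)
    if not config.has_option(section, option):
        config.set(section, option, secrets.token_urlsafe(50))
        with open(config_path, 'w') as stream:
            config.write(stream)
    return config.get(section, option)
```

Because the key lives in the per-installation config file, a settings module that calls something like this contains no secret and can stay under version control.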

11 years agoAdd the ability to specify default cluster station & sequencer in the database
Diane Trout [Tue, 26 Jun 2012 22:45:27 +0000 (15:45 -0700)]
Add the ability to specify default cluster station & sequencer in the database

11 years agoreformatting to be more pep8 like
Diane Trout [Tue, 26 Jun 2012 21:27:17 +0000 (14:27 -0700)]
reformatting to be more pep8 like

11 years agoConvert dictionary comprehension to dict(generator) so it'll work with 2.6
Diane Trout [Mon, 25 Jun 2012 19:33:37 +0000 (12:33 -0700)]
Convert dictionary comprehension to dict(generator) so it'll work with 2.6
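
For reference, the two spellings build the same dictionary. The function and variable names below are illustrative only:

```python
def lane_totals(pairs):
    # dict(generator) form: works on Python 2.6 as well as later versions.
    return dict((lane, count) for lane, count in pairs)


def lane_totals_comprehension(pairs):
    # Dictionary comprehension form: needs Python 2.7 or later.
    return {lane: count for lane, count in pairs}
```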

11 years agoAdapt select strike-out widget to django 1.2
Diane Trout [Mon, 25 Jun 2012 19:32:23 +0000 (12:32 -0700)]
Adapt select strike-out widget to django 1.2
The function I was overriding was hidden, so I grabbed a bit more of
django 1.3 into my fake widget.

11 years agoForgot to add the css file to add 'active' state to sequencers
Diane Trout [Sat, 23 Jun 2012 00:41:30 +0000 (17:41 -0700)]
Forgot to add the css file to add 'active' state to sequencers
commit d188105c2937068e4bbcf6dbf5229a85725a2e7d

11 years agoAdd an 'active' state to the sequencers.
Diane Trout [Sat, 23 Jun 2012 00:38:17 +0000 (17:38 -0700)]
Add an 'active' state to the sequencers.
On the flowcell admin page hack the sequencer select combo box to
strike out the disabled ones.
(Yes it involves hacking an override into the FlowCell admin form by
replacing the default widget it creates. Maybe there's a cleaner way,
but I didn't figure it out)

11 years agoShow software version information on datarun page
Diane Trout [Fri, 22 Jun 2012 22:13:27 +0000 (15:13 -0700)]
Show software version information on datarun page

11 years agoSave pipeline version information from run_xml into DataRun table.
Diane Trout [Fri, 22 Jun 2012 00:24:22 +0000 (17:24 -0700)]
Save pipeline version information from run_xml into DataRun table.
Also show said information on the flowcell page.

11 years agoAdd links to libraries and submission to data loaded from dcc index files
Diane Trout [Fri, 22 Jun 2012 00:23:19 +0000 (17:23 -0700)]
Add links to libraries and submission to data loaded from dcc index files

11 years agoMerge ssh://jumpgate.caltech.edu/var/htsworkflow/htsworkflow
Diane Trout [Thu, 21 Jun 2012 18:13:04 +0000 (11:13 -0700)]
Merge ssh://jumpgate.caltech.edu/var/htsworkflow/htsworkflow

11 years agoMove typeof. I think it's now getting parsed by rapper correctly
Diane Trout [Fri, 15 Jun 2012 00:00:30 +0000 (17:00 -0700)]
Move typeof. I think it's now getting parsed by rapper correctly

11 years agoImport information from NCBI SRA into a RDF model.
Diane Trout [Thu, 14 Jun 2012 21:57:18 +0000 (14:57 -0700)]
Import information from NCBI SRA into a RDF model.
(Mostly so I can find our libraries on their site)

11 years agoSlightly different sort order for finding libraries
Diane Trout [Thu, 14 Jun 2012 21:54:50 +0000 (14:54 -0700)]
Slightly different sort order for finding libraries

11 years agoAdd a software (name) property to firecrest, ipar, bustard, gerald
Diane Trout [Tue, 12 Jun 2012 00:03:49 +0000 (17:03 -0700)]
Add a software (name) property to firecrest, ipar, bustard, gerald
This property should return the proper illumina name for any
of their various software components I've been trying to capture.

NOTE: Now requires lxml.

I prefer the xpath searching api, and that's not in element tree.
so this now requires lxml.

11 years agoReturn a gerald version number as a number and not a cvs string.
Diane Trout [Mon, 11 Jun 2012 21:34:04 +0000 (14:34 -0700)]
Return a gerald version number as a number and not a cvs string.
(Also my emacs configuration cleared excess trailing spaces on several lines)

11 years agoAdd instrument ID and model to sequencer table
Diane Trout [Thu, 7 Jun 2012 00:10:53 +0000 (17:10 -0700)]
Add instrument ID and model to sequencer table
Additionally added displaying that information on the flowcell page.

11 years agoSplit lane parameters into separate classes for GA & HiSeq config files.
Diane Trout [Sat, 2 Jun 2012 00:20:24 +0000 (17:20 -0700)]
Split lane parameters into separate classes for GA & HiSeq config files.
Also tweak the tests for a different example flowcell

11 years agoFixup docstring to be clearer to others
Diane Trout [Thu, 31 May 2012 22:37:37 +0000 (15:37 -0700)]
Fixup docstring to be clearer to others

11 years agoMerge ssh://jumpgate.caltech.edu/var/htsworkflow/htsworkflow
Diane Trout [Fri, 25 May 2012 19:18:47 +0000 (12:18 -0700)]
Merge ssh://jumpgate.caltech.edu/var/htsworkflow/htsworkflow

11 years agoreplace tab character with spaces in the multiplex index pluralization live
Diane Trout [Fri, 25 May 2012 19:17:59 +0000 (12:17 -0700)]
replace tab character with spaces in the multiplex index pluralization

11 years agoMerge ssh://jumpgate.caltech.edu/var/htsworkflow/htsworkflow
Diane Trout [Fri, 25 May 2012 19:17:14 +0000 (12:17 -0700)]
Merge ssh://jumpgate.caltech.edu/var/htsworkflow/htsworkflow

11 years agoFix Multiplex index pluralization
Diane Trout [Fri, 25 May 2012 19:16:33 +0000 (12:16 -0700)]
Fix Multiplex index pluralization

11 years agoGenerate html reports when doing sparql queries with encode_find.
Diane Trout [Wed, 23 May 2012 00:36:07 +0000 (17:36 -0700)]
Generate html reports when doing sparql queries with encode_find.

I have a further improved simplify_uri function which extracts
a meaningful name from some pretty arbitrary rdf nodes.

(This does mean that an earlier attempt which is still in the code
probably can be removed -- that one depended on knowing the namespace)

11 years agoMerge branch 'master' of mus.cacr.caltech.edu:htsworkflow
Diane Trout [Wed, 16 May 2012 16:49:53 +0000 (09:49 -0700)]
Merge branch 'master' of mus.cacr.caltech.edu:htsworkflow

11 years agoTweak some test code to work with librdf 1.0.10
Diane Trout [Wed, 16 May 2012 16:48:29 +0000 (09:48 -0700)]
Tweak some test code to work with librdf 1.0.10

11 years agoupdate summary.py to extract data from HiSeq runs.
Diane Trout [Wed, 16 May 2012 00:36:48 +0000 (17:36 -0700)]
update summary.py to extract data from HiSeq runs.
I also committed a much larger swath of hiseq summary files.

One change is Summary and LaneResultSummary now have subclasses to
handle the differences between the GA series and the HiSeq series
runfolders.

At this point the test code is dying trying to read the LaneParameters
out of the gerald config file.

11 years agoFix bugs introduced by the improved HiSeq runfolder scanning.
Diane Trout [Tue, 15 May 2012 22:10:08 +0000 (15:10 -0700)]
Fix bugs introduced by the improved HiSeq runfolder scanning.
This makes more progress toward analyzing a HiSeq runfolder, but
is currently lacking the ability to process the Aligned reads.

This does seem to now process the base call processing information
out of the Unaligned tree of hiseq runs.

Also all the previous runfolder versions tests pass again. (Some of my
introduced logic was a bit off for guessing what type of runfolder it
is.)

11 years agoMerge branch 'master' of mus.cacr.caltech.edu:htsworkflow
Diane Trout [Sat, 12 May 2012 00:49:59 +0000 (17:49 -0700)]
Merge branch 'master' of mus.cacr.caltech.edu:htsworkflow

11 years agoI found this version more useful for trying to
Diane Trout [Sat, 12 May 2012 00:49:36 +0000 (17:49 -0700)]
I found this version more useful for trying to
answer Flo's questions about what we had submitted.

11 years agoProperly constructing the geo soft file really needed multiple sparql queries.
Diane Trout [Sat, 12 May 2012 00:43:01 +0000 (17:43 -0700)]
Properly constructing the geo soft file really needed multiple sparql queries.
Most of this is glueing the various queries together into the soft file
template.

One significant url change that should make it easier to
write turtle documents describing the library was to end the
submission set URI with # instead of /

this way .../SubmissionLog/SubName# is more clearly the same base as
.../SubmissionLog/SubName#AttributeName.

However this probably will break my older rule files.

And I'm not checking for that... *sigh*

11 years agoPrepare for submitting data to geo by:
Diane Trout [Sat, 12 May 2012 00:38:27 +0000 (17:38 -0700)]
Prepare for submitting data to geo by:
Adding a new namespace for them.
Creating something to simplify namespace qualified names to short normal names.

Also I implemented an option to import other RDF files.
I made a small attempt to at least partially sanitize strings being loaded
to only include "safe" html.

Most of the changes to test_rdfhelp came because I reindented it to 4 spaces.

Though I did add in tests for the new functions.

11 years agoMinimal changes needed to get raw data archived for loxcyc.
Diane Trout [Thu, 10 May 2012 01:45:05 +0000 (18:45 -0700)]
Minimal changes needed to get raw data archived for loxcyc.
It's probably not properly counting how many reads there are.

12 years ago100% test coverage for alphanum.py
Diane Trout [Fri, 4 May 2012 19:05:29 +0000 (12:05 -0700)]
100% test coverage for alphanum.py