From 317f5e8dd22cc9e1e5e05fbcaeb3b9aca7447351 Mon Sep 17 00:00:00 2001 From: Charles Plessy Date: Mon, 16 Nov 2009 14:23:42 +0900 Subject: [PATCH] Imported Upstream version 0.1.7~dfsg --- ChangeLog | 902 +++++++++++++++++++++++++++++++++------------- Makefile | 7 +- Makefile.mingw | 2 +- NEWS | 40 ++ bam.c | 45 +-- bam.h | 17 +- bam_aux.c | 67 ---- bam_import.c | 106 +----- bam_index.c | 24 +- bam_maqcns.c | 276 ++++++++------ bam_maqcns.h | 2 +- bam_md.c | 5 +- bam_plcmd.c | 14 +- bam_rmdup.c | 145 +++++--- bam_rmdupse.c | 281 +++++++-------- bam_sort.c | 103 ++++-- bamtk.c | 8 +- bgzf.c | 2 +- bgzip.c | 4 +- examples/Makefile | 16 +- faidx.c | 104 +++++- faidx.h | 21 ++ kaln.c | 370 +++++++++++++++++++ kaln.h | 55 +++ klist.h | 96 +++++ knetfile.c | 104 +++++- knetfile.h | 7 +- misc/novo2sam.pl | 10 +- misc/sam2vcf.pl | 217 +++++++++++ misc/samtools.pl | 65 +++- razf.c | 167 ++++++++- razf.h | 11 + razip.c | 4 +- sam.c | 4 +- sam_header.c | 701 +++++++++++++++++++++++++++++++++++ sam_header.h | 24 ++ sam_view.c | 49 ++- samtools.1 | 31 +- samtools.txt | 170 ++++----- 39 files changed, 3315 insertions(+), 961 deletions(-) create mode 100644 kaln.c create mode 100644 kaln.h create mode 100644 klist.h create mode 100755 misc/sam2vcf.pl create mode 100644 sam_header.c create mode 100644 sam_header.h diff --git a/ChangeLog b/ChangeLog index c0afc45..6b1a695 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,389 @@ ------------------------------------------------------------------------ -r451 | lh3lh3 | 2009-09-02 10:44:48 +0100 (Wed, 02 Sep 2009) | 4 lines +r509 | lh3lh3 | 2009-11-06 09:17:09 -0500 (Fri, 06 Nov 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_plcmd.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-22 (r509) + * forget to fix a similar problem in glfgen + +------------------------------------------------------------------------ +r508 | lh3lh3 | 2009-11-06 09:06:40 -0500 (Fri, 06 Nov 2009) | 3 lines +Changed paths: + M 
/trunk/samtools/bam_maqcns.c + M /trunk/samtools/bam_plcmd.c + M /trunk/samtools/bamtk.c + M /trunk/samtools/sam_view.c + + * samtools-0.1.6-21 (r508) + * fixed a potential bug in the indel caller towards the end of a chromosome + +------------------------------------------------------------------------ +r494 | lh3lh3 | 2009-10-26 11:38:00 -0400 (Mon, 26 Oct 2009) | 3 lines +Changed paths: + M /trunk/samtools/bamtk.c + M /trunk/samtools/sam_view.c + + * samtools-0.1.6-19 (r494) + * allow to convert Illumina quality (64 based) to the BAM quality + +------------------------------------------------------------------------ +r493 | lh3lh3 | 2009-10-26 10:24:39 -0400 (Mon, 26 Oct 2009) | 4 lines +Changed paths: + M /trunk/samtools/Makefile + M /trunk/samtools/bam.c + M /trunk/samtools/bam_import.c + M /trunk/samtools/bamtk.c + M /trunk/samtools/sam.c + M /trunk/samtools/sam_header.c + + * samtools-0.1.6-18 (r493) + * fixed the bugs due to improperly incorporating Petr's header parser + * a little code clean up in sam_header.c + +------------------------------------------------------------------------ +r492 | petulda | 2009-10-24 09:43:25 -0400 (Sat, 24 Oct 2009) | 1 line +Changed paths: + M /trunk/samtools/sam_header.c + +Added sam_header_line_free call for sam_header_parse2 +------------------------------------------------------------------------ +r491 | lh3lh3 | 2009-10-24 00:50:16 -0400 (Sat, 24 Oct 2009) | 3 lines +Changed paths: + M /trunk/samtools/sam_view.c + + * BUGGY VERSION + * fixed a minor bug + +------------------------------------------------------------------------ +r490 | lh3lh3 | 2009-10-24 00:45:12 -0400 (Sat, 24 Oct 2009) | 4 lines +Changed paths: + M /trunk/samtools/bam.c + M /trunk/samtools/bam.h + M /trunk/samtools/bam_import.c + M /trunk/samtools/sam.c + + * BUGGY VERSION + * improved the interface a bit + * bug unfixed + +------------------------------------------------------------------------ +r489 | lh3lh3 | 2009-10-24 00:41:50 -0400 (Sat, 24 
Oct 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_import.c + M /trunk/samtools/sam_header.c + M /trunk/samtools/sam_header.h + + * BUGGY VERSION. Please NOT use it. + * Fixed a minor bug, but the major bug is still there. + +------------------------------------------------------------------------ +r488 | lh3lh3 | 2009-10-24 00:17:10 -0400 (Sat, 24 Oct 2009) | 3 lines +Changed paths: + M /trunk/samtools/Makefile + M /trunk/samtools/bam.c + M /trunk/samtools/bam.h + M /trunk/samtools/bam_aux.c + M /trunk/samtools/bam_import.c + M /trunk/samtools/bam_rmdup.c + M /trunk/samtools/bam_rmdupse.c + M /trunk/samtools/kaln.c + M /trunk/samtools/sam.c + M /trunk/samtools/sam_header.c + M /trunk/samtools/sam_header.h + M /trunk/samtools/sam_view.c + + * This revision is SERIOUSLY BUGGY. Please NOT use it. + * Start to incorporate header parsing from Petr Danecek + +------------------------------------------------------------------------ +r487 | petulda | 2009-10-23 11:44:32 -0400 (Fri, 23 Oct 2009) | 1 line +Changed paths: + M /trunk/samtools/sam_header.c + M /trunk/samtools/sam_header.h + +Now possible to merge multiple HeaderDict dictionaries +------------------------------------------------------------------------ +r486 | petulda | 2009-10-22 11:46:58 -0400 (Thu, 22 Oct 2009) | 1 line +Changed paths: + M /trunk/samtools/sam_header.c + + +------------------------------------------------------------------------ +r485 | petulda | 2009-10-22 11:41:56 -0400 (Thu, 22 Oct 2009) | 1 line +Changed paths: + A /trunk/samtools/sam_header.c + A /trunk/samtools/sam_header.h + + +------------------------------------------------------------------------ +r484 | lh3lh3 | 2009-10-19 14:31:32 -0400 (Mon, 19 Oct 2009) | 5 lines +Changed paths: + M /trunk/samtools/bam_import.c + M /trunk/samtools/bam_rmdupse.c + M /trunk/samtools/bamtk.c + M /trunk/samtools/examples/Makefile + + * samtools-0.1.6-17 (r484) + * fixed a memory leak in rmdupse + * fixed a bug in parsing @RG header lines + 
* test rmdup in examples/ + +------------------------------------------------------------------------ +r483 | lh3lh3 | 2009-10-19 13:22:48 -0400 (Mon, 19 Oct 2009) | 4 lines +Changed paths: + M /trunk/samtools/bam_rmdup.c + M /trunk/samtools/bam_rmdupse.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-16 (r483) + * unify the interface of rmdup and rmdupse + * a new bug found in rg2lib(). Have not been fixed yet. + +------------------------------------------------------------------------ +r482 | lh3lh3 | 2009-10-19 13:03:34 -0400 (Mon, 19 Oct 2009) | 4 lines +Changed paths: + M /trunk/samtools/bam.h + M /trunk/samtools/bam_rmdup.c + M /trunk/samtools/bam_rmdupse.c + M /trunk/samtools/bamtk.c + A /trunk/samtools/klist.h + + * samtools-0.1.6-15 (r482) + * rewrite rmdupse + * rmdupse is now library aware + +------------------------------------------------------------------------ +r481 | lh3lh3 | 2009-10-18 00:07:21 -0400 (Sun, 18 Oct 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_rmdup.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-14 (r480) + * rmdup is now RG aware + +------------------------------------------------------------------------ +r480 | lh3lh3 | 2009-10-17 22:05:20 -0400 (Sat, 17 Oct 2009) | 2 lines +Changed paths: + M /trunk/samtools/misc/samtools.pl + +added a small unitity to parse SRA XML files + +------------------------------------------------------------------------ +r479 | lh3lh3 | 2009-10-17 20:57:26 -0400 (Sat, 17 Oct 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_maqcns.c + M /trunk/samtools/bam_maqcns.h + M /trunk/samtools/bam_plcmd.c + M /trunk/samtools/bam_sort.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-13 (r479) + * merge: optionally use file names as RG tags + +------------------------------------------------------------------------ +r478 | lh3lh3 | 2009-10-14 14:18:12 -0400 (Wed, 14 Oct 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam.c + M /trunk/samtools/bam.h + M /trunk/samtools/bam_maqcns.c + M 
/trunk/samtools/bamtk.c + M /trunk/samtools/kaln.c + + * samtools-0.1.6-12 (r478) + * fixed a bug in the indel caller + +------------------------------------------------------------------------ +r477 | lh3lh3 | 2009-10-10 06:12:26 -0400 (Sat, 10 Oct 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_index.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-11 (r477) + * fixed a bug due to recent change in bam_index.c (thank Nicole Washington for the patch) + +------------------------------------------------------------------------ +r476 | petulda | 2009-10-09 11:45:36 -0400 (Fri, 09 Oct 2009) | 1 line +Changed paths: + A /trunk/samtools/misc/sam2vcf.pl + +Added the sam2vcf.pl script. +------------------------------------------------------------------------ +r475 | lh3lh3 | 2009-10-08 10:19:16 -0400 (Thu, 08 Oct 2009) | 2 lines +Changed paths: + M /trunk/samtools/Makefile + M /trunk/samtools/bam.c + M /trunk/samtools/bam.h + M /trunk/samtools/bam_maqcns.c + M /trunk/samtools/bamtk.c + A /trunk/samtools/kaln.c + A /trunk/samtools/kaln.h + +Unfinished modification. Please do not use this revision... + +------------------------------------------------------------------------ +r474 | petulda | 2009-10-08 06:39:54 -0400 (Thu, 08 Oct 2009) | 1 line +Changed paths: + M /trunk/samtools/knetfile.c + +Removed the offending knet_seek message. +------------------------------------------------------------------------ +r473 | petulda | 2009-10-06 09:26:35 -0400 (Tue, 06 Oct 2009) | 1 line +Changed paths: + M /trunk/samtools/knetfile.c + M /trunk/samtools/razf.c + +Bug fix - faidx on RAZF compressed files now working. +------------------------------------------------------------------------ +r472 | lh3lh3 | 2009-10-02 08:42:57 -0400 (Fri, 02 Oct 2009) | 2 lines +Changed paths: + M /trunk/samtools/samtools.1 + +Clarify the meaning of a region like "chr2:1,000,000". 
+ +------------------------------------------------------------------------ +r471 | lh3lh3 | 2009-10-02 05:42:19 -0400 (Fri, 02 Oct 2009) | 2 lines +Changed paths: + M /trunk/samtools/misc/novo2sam.pl + +Fixed minor bugs in novo2sam.pl (on behalf of Ken Chen and Colin Hercus) + +------------------------------------------------------------------------ +r470 | lh3lh3 | 2009-09-29 15:01:27 -0400 (Tue, 29 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/Makefile.mingw + M /trunk/samtools/bamtk.c + M /trunk/samtools/knetfile.c + M /trunk/samtools/knetfile.h + + * samtools-0.1.6-9 (r470) + * make knetfile.c compatible with MinGW (thank Martin Morgan for the patch) + +------------------------------------------------------------------------ +r469 | lh3lh3 | 2009-09-29 08:07:44 -0400 (Tue, 29 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_index.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-9 (r469) + * refactor bam_fetch() for Python binding. On behalf of Leo Goodstadt. + +------------------------------------------------------------------------ +r468 | lh3lh3 | 2009-09-28 05:18:29 -0400 (Mon, 28 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_sort.c + M /trunk/samtools/bamtk.c + M /trunk/samtools/misc/samtools.pl + + * samtools-0.1.6-7 (r468) + * make merge stable + +------------------------------------------------------------------------ +r467 | petulda | 2009-09-28 04:51:29 -0400 (Mon, 28 Sep 2009) | 1 line +Changed paths: + M /trunk/samtools/bgzf.c + M /trunk/samtools/bgzip.c + M /trunk/samtools/razf.c + M /trunk/samtools/razip.c + +Changed the mode for newly created files to 0666. This allows less strict permissions with umask properly set (e.g. 0002 vs. 0022). 
+------------------------------------------------------------------------ +r466 | lh3lh3 | 2009-09-24 06:29:19 -0400 (Thu, 24 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/ChangeLog + M /trunk/samtools/bam_md.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-6 (r466) + * do not crash calmd when some sequences are absent from the reference. + +------------------------------------------------------------------------ +r464 | jmarshall | 2009-09-23 06:14:32 -0400 (Wed, 23 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam.c + M /trunk/samtools/knetfile.c + +Suppress bgzf_check_EOF() messages when reading from a pipe, as there is +no way to seek on a pipe and the messages always appear. + +------------------------------------------------------------------------ +r463 | petulda | 2009-09-16 07:05:41 -0400 (Wed, 16 Sep 2009) | 1 line +Changed paths: + M /trunk/samtools/knetfile.c + M /trunk/samtools/razf.c + +A bug fix, "samtools view" is now working again. +------------------------------------------------------------------------ +r462 | lh3lh3 | 2009-09-16 04:51:07 -0400 (Wed, 16 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/bamtk.c + M /trunk/samtools/faidx.c + M /trunk/samtools/knetfile.c + M /trunk/samtools/knetfile.h + M /trunk/samtools/razf.c + M /trunk/samtools/razf.h + + * samtools-0.1.6-5 (r462) + * Added knetfile support in razf and faidx (on behalf of Petr Danecek) + +------------------------------------------------------------------------ +r460 | lh3lh3 | 2009-09-09 07:06:22 -0400 (Wed, 09 Sep 2009) | 2 lines +Changed paths: + M /trunk/samtools/samtools.1 + +fixed a formatting issue + +------------------------------------------------------------------------ +r459 | lh3lh3 | 2009-09-08 18:14:08 -0400 (Tue, 08 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_sort.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-4 (r459) + * make sort output the result to stdout when -o is in use + 
+------------------------------------------------------------------------ +r458 | lh3lh3 | 2009-09-07 05:10:28 -0400 (Mon, 07 Sep 2009) | 4 lines +Changed paths: + M /trunk/samtools/bamtk.c + M /trunk/samtools/faidx.c + M /trunk/samtools/faidx.h + M /trunk/samtools/samtools.1 + + * samtools-0.1.6-2 (r458) + * added more interface to faidx (by Nils) + * updated documentation + +------------------------------------------------------------------------ +r457 | lh3lh3 | 2009-09-05 16:12:04 -0400 (Sat, 05 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/bam_sort.c + M /trunk/samtools/bamtk.c + + * samtools-0.1.6-2 (r457) + * get rid of three assert() in bam_sort.c + +------------------------------------------------------------------------ +r456 | jmarshall | 2009-09-04 12:46:25 -0400 (Fri, 04 Sep 2009) | 3 lines +Changed paths: + M /trunk/samtools/razf.c + +Return NULL from _razf_open() (and hence razf_open()/razf_open2()) +when opening the file fails. + +------------------------------------------------------------------------ +r453 | lh3lh3 | 2009-09-02 08:56:33 -0400 (Wed, 02 Sep 2009) | 2 lines +Changed paths: + M /trunk/samtools/ChangeLog + M /trunk/samtools/NEWS + M /trunk/samtools/bamtk.c + M /trunk/samtools/samtools.1 + D /trunk/samtools/source.dot + +Release samtools-0.1.6 + +------------------------------------------------------------------------ +r451 | lh3lh3 | 2009-09-02 05:44:48 -0400 (Wed, 02 Sep 2009) | 4 lines Changed paths: M /trunk/samtools/bam_md.c M /trunk/samtools/bam_rmdup.c @@ -13,7 +397,7 @@ Changed paths: * improved the help message a little bit ------------------------------------------------------------------------ -r450 | lh3lh3 | 2009-09-02 09:55:55 +0100 (Wed, 02 Sep 2009) | 3 lines +r450 | lh3lh3 | 2009-09-02 04:55:55 -0400 (Wed, 02 Sep 2009) | 3 lines Changed paths: M /trunk/samtools/bam_color.c M /trunk/samtools/bamtk.c @@ -22,7 +406,7 @@ Changed paths: * fixed a bug in bam_color.c (on behalf of Nils Homer) 
------------------------------------------------------------------------ -r449 | lh3lh3 | 2009-08-29 20:36:41 +0100 (Sat, 29 Aug 2009) | 4 lines +r449 | lh3lh3 | 2009-08-29 15:36:41 -0400 (Sat, 29 Aug 2009) | 4 lines Changed paths: M /trunk/samtools/bam_import.c M /trunk/samtools/bam_md.c @@ -34,7 +418,7 @@ Changed paths: * in import, give a warning if the read is aligned but there is no CIGAR. ------------------------------------------------------------------------ -r448 | lh3lh3 | 2009-08-19 09:44:28 +0100 (Wed, 19 Aug 2009) | 3 lines +r448 | lh3lh3 | 2009-08-19 04:44:28 -0400 (Wed, 19 Aug 2009) | 3 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/bam_pileup.c @@ -45,7 +429,7 @@ Changed paths: * fixed an issue when the last CIGAR is I or D ------------------------------------------------------------------------ -r447 | lh3lh3 | 2009-08-17 09:34:57 +0100 (Mon, 17 Aug 2009) | 3 lines +r447 | lh3lh3 | 2009-08-17 04:34:57 -0400 (Mon, 17 Aug 2009) | 3 lines Changed paths: M /trunk/samtools/bam_aux.c M /trunk/samtools/bamtk.c @@ -54,7 +438,7 @@ Changed paths: * fixed a bug in bam_aux_get(): 'A' is not checked ------------------------------------------------------------------------ -r446 | lh3lh3 | 2009-08-17 09:33:17 +0100 (Mon, 17 Aug 2009) | 2 lines +r446 | lh3lh3 | 2009-08-17 04:33:17 -0400 (Mon, 17 Aug 2009) | 2 lines Changed paths: M /trunk/samtools/bam_aux.c M /trunk/samtools/bamtk.c @@ -62,7 +446,7 @@ Changed paths: * ------------------------------------------------------------------------ -r444 | lh3lh3 | 2009-08-11 10:02:36 +0100 (Tue, 11 Aug 2009) | 3 lines +r444 | lh3lh3 | 2009-08-11 05:02:36 -0400 (Tue, 11 Aug 2009) | 3 lines Changed paths: M /trunk/samtools/bam_sort.c M /trunk/samtools/bamtk.c @@ -71,7 +455,7 @@ Changed paths: * bug in "merge -n" ------------------------------------------------------------------------ -r443 | lh3lh3 | 2009-08-11 09:29:11 +0100 (Tue, 11 Aug 2009) | 4 lines +r443 | lh3lh3 | 2009-08-11 04:29:11 -0400 (Tue, 
11 Aug 2009) | 4 lines Changed paths: M /trunk/samtools/bam.c M /trunk/samtools/bam_import.c @@ -82,7 +466,7 @@ Changed paths: * parse CIGAR "=" and "X" as "M" ------------------------------------------------------------------------ -r442 | lh3lh3 | 2009-08-07 21:56:38 +0100 (Fri, 07 Aug 2009) | 4 lines +r442 | lh3lh3 | 2009-08-07 16:56:38 -0400 (Fri, 07 Aug 2009) | 4 lines Changed paths: M /trunk/samtools/bam_pileup.c M /trunk/samtools/bamtk.c @@ -95,7 +479,7 @@ Changed paths: * fixed a bug in detecting unsorted bam in pileup ------------------------------------------------------------------------ -r441 | bhandsaker | 2009-08-05 14:41:28 +0100 (Wed, 05 Aug 2009) | 2 lines +r441 | bhandsaker | 2009-08-05 09:41:28 -0400 (Wed, 05 Aug 2009) | 2 lines Changed paths: M /trunk/samtools/bgzf.c M /trunk/samtools/bgzf.h @@ -104,7 +488,7 @@ Changed paths: Change copyright notices now that MIT has approved open source distribution. ------------------------------------------------------------------------ -r440 | lh3lh3 | 2009-08-05 10:44:24 +0100 (Wed, 05 Aug 2009) | 3 lines +r440 | lh3lh3 | 2009-08-05 05:44:24 -0400 (Wed, 05 Aug 2009) | 3 lines Changed paths: M /trunk/samtools/bam_stat.c M /trunk/samtools/bamtk.c @@ -113,21 +497,21 @@ Changed paths: * in flagstats, do not report singletons if both ends are unmapped ------------------------------------------------------------------------ -r439 | lh3lh3 | 2009-08-04 22:16:51 +0100 (Tue, 04 Aug 2009) | 2 lines +r439 | lh3lh3 | 2009-08-04 17:16:51 -0400 (Tue, 04 Aug 2009) | 2 lines Changed paths: M /trunk/samtools/misc/maq2sam.c fixed a SERIOUS bug in setting 0x20 flag ------------------------------------------------------------------------ -r438 | lh3lh3 | 2009-08-04 21:50:43 +0100 (Tue, 04 Aug 2009) | 2 lines +r438 | lh3lh3 | 2009-08-04 16:50:43 -0400 (Tue, 04 Aug 2009) | 2 lines Changed paths: M /trunk/samtools/misc/samtools.pl fixed two minor bugs (suggested by Tim M Storm) 
------------------------------------------------------------------------ -r437 | lh3lh3 | 2009-08-04 09:13:24 +0100 (Tue, 04 Aug 2009) | 3 lines +r437 | lh3lh3 | 2009-08-04 04:13:24 -0400 (Tue, 04 Aug 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/misc/samtools.pl @@ -137,7 +521,7 @@ Changed paths: * fixed a typo ------------------------------------------------------------------------ -r434 | lh3lh3 | 2009-08-03 10:40:42 +0100 (Mon, 03 Aug 2009) | 3 lines +r434 | lh3lh3 | 2009-08-03 05:40:42 -0400 (Mon, 03 Aug 2009) | 3 lines Changed paths: M /trunk/samtools/bam_tview.c M /trunk/samtools/bamtk.c @@ -146,7 +530,7 @@ Changed paths: * in tview, press 'r' to show read names rather than sequences ------------------------------------------------------------------------ -r433 | lh3lh3 | 2009-08-02 19:13:35 +0100 (Sun, 02 Aug 2009) | 3 lines +r433 | lh3lh3 | 2009-08-02 14:13:35 -0400 (Sun, 02 Aug 2009) | 3 lines Changed paths: M /trunk/samtools/knetfile.c @@ -154,7 +538,7 @@ Changed paths: * anyway, MinGW seems to have problem with "%lld". ------------------------------------------------------------------------ -r432 | lh3lh3 | 2009-08-02 00:32:07 +0100 (Sun, 02 Aug 2009) | 5 lines +r432 | lh3lh3 | 2009-08-01 19:32:07 -0400 (Sat, 01 Aug 2009) | 5 lines Changed paths: M /trunk/samtools/Makefile.mingw M /trunk/samtools/bamtk.c @@ -169,21 +553,21 @@ Changed paths: * PDCurses support in Windows ------------------------------------------------------------------------ -r431 | lh3lh3 | 2009-08-01 23:34:54 +0100 (Sat, 01 Aug 2009) | 2 lines +r431 | lh3lh3 | 2009-08-01 18:34:54 -0400 (Sat, 01 Aug 2009) | 2 lines Changed paths: M /trunk/samtools/win32/libz.a replace the GnuWin32 version of libz.a with my own build with MinGW. 
------------------------------------------------------------------------ -r430 | lh3lh3 | 2009-08-01 23:21:07 +0100 (Sat, 01 Aug 2009) | 2 lines +r430 | lh3lh3 | 2009-08-01 18:21:07 -0400 (Sat, 01 Aug 2009) | 2 lines Changed paths: M /trunk/samtools/knetfile.c add comments ------------------------------------------------------------------------ -r429 | lh3lh3 | 2009-08-01 22:41:19 +0100 (Sat, 01 Aug 2009) | 3 lines +r429 | lh3lh3 | 2009-08-01 17:41:19 -0400 (Sat, 01 Aug 2009) | 3 lines Changed paths: M /trunk/samtools/Makefile.mingw M /trunk/samtools/bamtk.c @@ -194,14 +578,14 @@ Changed paths: * knetfile.c is now compatible with mingw-winsock ------------------------------------------------------------------------ -r428 | lh3lh3 | 2009-08-01 00:39:07 +0100 (Sat, 01 Aug 2009) | 2 lines +r428 | lh3lh3 | 2009-07-31 19:39:07 -0400 (Fri, 31 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile.mingw simplify MinGW Makefile ------------------------------------------------------------------------ -r427 | lh3lh3 | 2009-08-01 00:30:54 +0100 (Sat, 01 Aug 2009) | 5 lines +r427 | lh3lh3 | 2009-07-31 19:30:54 -0400 (Fri, 31 Jul 2009) | 5 lines Changed paths: A /trunk/samtools/Makefile.mingw M /trunk/samtools/bam_import.c @@ -217,7 +601,7 @@ Changed paths: * zlib headers and Windows version of libz.a are included in win32/ ------------------------------------------------------------------------ -r426 | lh3lh3 | 2009-07-31 23:32:09 +0100 (Fri, 31 Jul 2009) | 3 lines +r426 | lh3lh3 | 2009-07-31 18:32:09 -0400 (Fri, 31 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/sam_view.c @@ -226,14 +610,14 @@ Changed paths: * fixed a bug caused by recent modifications. Sorry. 
------------------------------------------------------------------------ -r425 | lh3lh3 | 2009-07-31 23:23:51 +0100 (Fri, 31 Jul 2009) | 2 lines +r425 | lh3lh3 | 2009-07-31 18:23:51 -0400 (Fri, 31 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/bgzf.c compatible with Windows binary files ------------------------------------------------------------------------ -r424 | lh3lh3 | 2009-07-31 10:19:59 +0100 (Fri, 31 Jul 2009) | 5 lines +r424 | lh3lh3 | 2009-07-31 05:19:59 -0400 (Fri, 31 Jul 2009) | 5 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bam_maqcns.h @@ -248,14 +632,14 @@ Changed paths: * in tview, optionally allow to treat reference skip as deletion ------------------------------------------------------------------------ -r423 | lh3lh3 | 2009-07-30 22:00:36 +0100 (Thu, 30 Jul 2009) | 2 lines +r423 | lh3lh3 | 2009-07-30 17:00:36 -0400 (Thu, 30 Jul 2009) | 2 lines Changed paths: A /trunk/samtools/misc/psl2sam.pl convert BLAT psl to SAM. ------------------------------------------------------------------------ -r422 | lh3lh3 | 2009-07-30 11:24:39 +0100 (Thu, 30 Jul 2009) | 6 lines +r422 | lh3lh3 | 2009-07-30 06:24:39 -0400 (Thu, 30 Jul 2009) | 6 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/bam.c @@ -273,7 +657,7 @@ Changed paths: * update ChangeLog ------------------------------------------------------------------------ -r421 | lh3lh3 | 2009-07-30 10:03:39 +0100 (Thu, 30 Jul 2009) | 3 lines +r421 | lh3lh3 | 2009-07-30 05:03:39 -0400 (Thu, 30 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bam_import.c M /trunk/samtools/bam_plcmd.c @@ -287,14 +671,14 @@ Changed paths: * in view and pileup, load header from FASTA index if the input is SAM. 
------------------------------------------------------------------------ -r420 | lh3lh3 | 2009-07-29 09:18:55 +0100 (Wed, 29 Jul 2009) | 2 lines +r420 | lh3lh3 | 2009-07-29 04:18:55 -0400 (Wed, 29 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/misc/maq2sam.c do not set "read 1" if reads are not mapped in the PE mode of maq ------------------------------------------------------------------------ -r419 | lh3lh3 | 2009-07-28 09:52:33 +0100 (Tue, 28 Jul 2009) | 5 lines +r419 | lh3lh3 | 2009-07-28 04:52:33 -0400 (Tue, 28 Jul 2009) | 5 lines Changed paths: M /trunk/samtools/bam_import.c M /trunk/samtools/bamtk.c @@ -307,14 +691,14 @@ Changed paths: * add "unique" command to samtools.pl ------------------------------------------------------------------------ -r418 | lh3lh3 | 2009-07-24 14:04:19 +0100 (Fri, 24 Jul 2009) | 2 lines +r418 | lh3lh3 | 2009-07-24 09:04:19 -0400 (Fri, 24 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/misc/wgsim_eval.pl skip @header lines in SAM ------------------------------------------------------------------------ -r417 | lh3lh3 | 2009-07-24 12:42:38 +0100 (Fri, 24 Jul 2009) | 3 lines +r417 | lh3lh3 | 2009-07-24 07:42:38 -0400 (Fri, 24 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/sam_view.c @@ -323,7 +707,7 @@ Changed paths: * more help in "samtools view" due to the recent changes. 
------------------------------------------------------------------------ -r416 | lh3lh3 | 2009-07-24 12:34:30 +0100 (Fri, 24 Jul 2009) | 3 lines +r416 | lh3lh3 | 2009-07-24 07:34:30 -0400 (Fri, 24 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bam.c M /trunk/samtools/bam.h @@ -337,7 +721,7 @@ Changed paths: * support import/export SAM with string tags ------------------------------------------------------------------------ -r415 | lh3lh3 | 2009-07-24 11:39:26 +0100 (Fri, 24 Jul 2009) | 3 lines +r415 | lh3lh3 | 2009-07-24 06:39:26 -0400 (Fri, 24 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bam.c M /trunk/samtools/bam.h @@ -351,14 +735,14 @@ Changed paths: * FLAG now can be in HEX ------------------------------------------------------------------------ -r414 | lh3lh3 | 2009-07-22 22:03:49 +0100 (Wed, 22 Jul 2009) | 2 lines +r414 | lh3lh3 | 2009-07-22 17:03:49 -0400 (Wed, 22 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/kstring.h fixed a compiling error (thank Ken for fixing it) ------------------------------------------------------------------------ -r412 | lh3lh3 | 2009-07-21 22:19:40 +0100 (Tue, 21 Jul 2009) | 2 lines +r412 | lh3lh3 | 2009-07-21 17:19:40 -0400 (Tue, 21 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/kstring.c M /trunk/samtools/kstring.h @@ -366,14 +750,14 @@ Changed paths: Implemented Boyer-Moore search in the kstring library. 
------------------------------------------------------------------------ -r409 | lh3lh3 | 2009-07-17 17:10:20 +0100 (Fri, 17 Jul 2009) | 2 lines +r409 | lh3lh3 | 2009-07-17 12:10:20 -0400 (Fri, 17 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/bam_index.c do not include knetfile.h when _USE_KNETFILE is not defined ------------------------------------------------------------------------ -r408 | lh3lh3 | 2009-07-17 15:29:21 +0100 (Fri, 17 Jul 2009) | 5 lines +r408 | lh3lh3 | 2009-07-17 10:29:21 -0400 (Fri, 17 Jul 2009) | 5 lines Changed paths: M /trunk/samtools/bam_md.c M /trunk/samtools/bam_tview.c @@ -386,7 +770,7 @@ Changed paths: * bgzf.c: improved the compatibility with Windows headers ------------------------------------------------------------------------ -r407 | lh3lh3 | 2009-07-17 14:46:56 +0100 (Fri, 17 Jul 2009) | 5 lines +r407 | lh3lh3 | 2009-07-17 09:46:56 -0400 (Fri, 17 Jul 2009) | 5 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_aux.c @@ -400,14 +784,14 @@ Changed paths: * fillmd: cmd interface improvement ------------------------------------------------------------------------ -r406 | lh3lh3 | 2009-07-16 23:30:40 +0100 (Thu, 16 Jul 2009) | 2 lines +r406 | lh3lh3 | 2009-07-16 18:30:40 -0400 (Thu, 16 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile Sorry. The old Makefile is for PDCurses... 
------------------------------------------------------------------------ -r405 | lh3lh3 | 2009-07-16 23:30:11 +0100 (Thu, 16 Jul 2009) | 3 lines +r405 | lh3lh3 | 2009-07-16 18:30:11 -0400 (Thu, 16 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/bam_tview.c @@ -417,7 +801,7 @@ Changed paths: * improved the compatibility with PDCurses a little bit ------------------------------------------------------------------------ -r404 | lh3lh3 | 2009-07-16 23:23:52 +0100 (Thu, 16 Jul 2009) | 3 lines +r404 | lh3lh3 | 2009-07-16 18:23:52 -0400 (Thu, 16 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/bam_tview.c @@ -427,7 +811,7 @@ Changed paths: * compatible with PDCurses ------------------------------------------------------------------------ -r403 | lh3lh3 | 2009-07-16 22:39:39 +0100 (Thu, 16 Jul 2009) | 3 lines +r403 | lh3lh3 | 2009-07-16 17:39:39 -0400 (Thu, 16 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/kseq.h @@ -436,7 +820,7 @@ Changed paths: * fixed a bug in kseq.h for binary files (text files are fine) ------------------------------------------------------------------------ -r402 | lh3lh3 | 2009-07-16 11:49:53 +0100 (Thu, 16 Jul 2009) | 4 lines +r402 | lh3lh3 | 2009-07-16 06:49:53 -0400 (Thu, 16 Jul 2009) | 4 lines Changed paths: M /trunk/samtools/bam_import.c M /trunk/samtools/bam_index.c @@ -448,7 +832,7 @@ Changed paths: * improve portability to MinGW ------------------------------------------------------------------------ -r398 | lh3lh3 | 2009-07-13 10:21:36 +0100 (Mon, 13 Jul 2009) | 3 lines +r398 | lh3lh3 | 2009-07-13 05:21:36 -0400 (Mon, 13 Jul 2009) | 3 lines Changed paths: A /trunk/bam-lite/bam.h (from /trunk/samtools/bam.h:395) A /trunk/bam-lite/bam_lite.c (from /trunk/samtools/bam_lite.c:395) @@ -458,7 +842,7 @@ Changed paths: * copy bam.h to bam-lite ------------------------------------------------------------------------ -r395 | lh3lh3 | 2009-07-13 10:12:57 
+0100 (Mon, 13 Jul 2009) | 3 lines +r395 | lh3lh3 | 2009-07-13 05:12:57 -0400 (Mon, 13 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_lite.c @@ -470,7 +854,7 @@ Changed paths: * added bam_pileup_file() and removed bam_lpileup_file() ------------------------------------------------------------------------ -r394 | lh3lh3 | 2009-07-13 00:35:10 +0100 (Mon, 13 Jul 2009) | 3 lines +r394 | lh3lh3 | 2009-07-12 19:35:10 -0400 (Sun, 12 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/knetfile.c @@ -480,7 +864,7 @@ Changed paths: * http_proxy support in knetfile library (check http_proxy ENV) ------------------------------------------------------------------------ -r393 | lh3lh3 | 2009-07-12 23:57:07 +0100 (Sun, 12 Jul 2009) | 5 lines +r393 | lh3lh3 | 2009-07-12 18:57:07 -0400 (Sun, 12 Jul 2009) | 5 lines Changed paths: M /trunk/samtools/bam_index.c M /trunk/samtools/bam_tview.c @@ -494,14 +878,14 @@ Changed paths: not seen the sideeffect so far. 
------------------------------------------------------------------------ -r392 | lh3lh3 | 2009-07-12 18:50:55 +0100 (Sun, 12 Jul 2009) | 2 lines +r392 | lh3lh3 | 2009-07-12 13:50:55 -0400 (Sun, 12 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/samtools.1 Remove the warning in tview ------------------------------------------------------------------------ -r391 | lh3lh3 | 2009-07-12 18:42:43 +0100 (Sun, 12 Jul 2009) | 3 lines +r391 | lh3lh3 | 2009-07-12 13:42:43 -0400 (Sun, 12 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bam_tview.c M /trunk/samtools/bamtk.c @@ -510,7 +894,7 @@ Changed paths: * do not show a blank screen when no reads mapped ------------------------------------------------------------------------ -r390 | lh3lh3 | 2009-07-09 14:01:42 +0100 (Thu, 09 Jul 2009) | 4 lines +r390 | lh3lh3 | 2009-07-09 09:01:42 -0400 (Thu, 09 Jul 2009) | 4 lines Changed paths: M /trunk/samtools/bam.h A /trunk/samtools/bam_lite.c @@ -521,7 +905,7 @@ Changed paths: * added bam_lite.c for light-weight BAM reading ------------------------------------------------------------------------ -r385 | lh3lh3 | 2009-07-07 16:53:29 +0100 (Tue, 07 Jul 2009) | 2 lines +r385 | lh3lh3 | 2009-07-07 11:53:29 -0400 (Tue, 07 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/knetfile.c @@ -529,7 +913,7 @@ Changed paths: Release samtools-0.1.5c (fixed a bug in piping) ------------------------------------------------------------------------ -r383 | lh3lh3 | 2009-07-07 11:39:55 +0100 (Tue, 07 Jul 2009) | 2 lines +r383 | lh3lh3 | 2009-07-07 06:39:55 -0400 (Tue, 07 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/sam.c @@ -537,7 +921,7 @@ Changed paths: Release samtools-0.1.5b (BUG! so embarrassing!) 
------------------------------------------------------------------------ -r381 | lh3lh3 | 2009-07-07 11:20:06 +0100 (Tue, 07 Jul 2009) | 2 lines +r381 | lh3lh3 | 2009-07-07 06:20:06 -0400 (Tue, 07 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/bam.h @@ -547,7 +931,7 @@ Changed paths: Release samtools-0.1.5a (for compatibility with Bio::DB::Sam) ------------------------------------------------------------------------ -r373 | lh3lh3 | 2009-07-07 10:26:57 +0100 (Tue, 07 Jul 2009) | 2 lines +r373 | lh3lh3 | 2009-07-07 05:26:57 -0400 (Tue, 07 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/NEWS @@ -557,7 +941,7 @@ Changed paths: Release samtools-0.1.5 ------------------------------------------------------------------------ -r372 | lh3lh3 | 2009-07-07 09:49:27 +0100 (Tue, 07 Jul 2009) | 3 lines +r372 | lh3lh3 | 2009-07-07 04:49:27 -0400 (Tue, 07 Jul 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/sam.c @@ -566,21 +950,21 @@ Changed paths: * keep header text if "view -t" is used (by Gerton) ------------------------------------------------------------------------ -r371 | lh3lh3 | 2009-07-07 01:13:32 +0100 (Tue, 07 Jul 2009) | 2 lines +r371 | lh3lh3 | 2009-07-06 20:13:32 -0400 (Mon, 06 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/samtools.1 update documentation ------------------------------------------------------------------------ -r370 | bhandsaker | 2009-07-02 22:24:34 +0100 (Thu, 02 Jul 2009) | 2 lines +r370 | bhandsaker | 2009-07-02 17:24:34 -0400 (Thu, 02 Jul 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile Introduced LIBPATH variable so this could be overridden to allow samtools to build correct at the Broad. 
------------------------------------------------------------------------ -r369 | lh3lh3 | 2009-07-02 13:36:53 +0100 (Thu, 02 Jul 2009) | 4 lines +r369 | lh3lh3 | 2009-07-02 08:36:53 -0400 (Thu, 02 Jul 2009) | 4 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/bam_aux.c @@ -592,7 +976,7 @@ Changed paths: * remove the debugging code in bam_aux_get() (Drat!) ------------------------------------------------------------------------ -r368 | lh3lh3 | 2009-07-02 11:32:26 +0100 (Thu, 02 Jul 2009) | 6 lines +r368 | lh3lh3 | 2009-07-02 06:32:26 -0400 (Thu, 02 Jul 2009) | 6 lines Changed paths: M /trunk/samtools/bam.c M /trunk/samtools/bam.h @@ -616,21 +1000,21 @@ Changed paths: * small memory leak may be present on failure, though ------------------------------------------------------------------------ -r367 | lh3lh3 | 2009-06-30 16:18:42 +0100 (Tue, 30 Jun 2009) | 2 lines +r367 | lh3lh3 | 2009-06-30 11:18:42 -0400 (Tue, 30 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/knetfile.c reduce the chance of blocking in FTP connection ------------------------------------------------------------------------ -r366 | lh3lh3 | 2009-06-30 15:35:21 +0100 (Tue, 30 Jun 2009) | 2 lines +r366 | lh3lh3 | 2009-06-30 10:35:21 -0400 (Tue, 30 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/knetfile.c minor changes to knetfile: invalid fd equals -1 rather than 0 ------------------------------------------------------------------------ -r365 | lh3lh3 | 2009-06-30 14:04:30 +0100 (Tue, 30 Jun 2009) | 3 lines +r365 | lh3lh3 | 2009-06-30 09:04:30 -0400 (Tue, 30 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/bam_index.c M /trunk/samtools/bamtk.c @@ -641,7 +1025,7 @@ Changed paths: * download the BAM index file if it is not found in the current working directory. 
------------------------------------------------------------------------ -r364 | lh3lh3 | 2009-06-30 12:39:07 +0100 (Tue, 30 Jun 2009) | 3 lines +r364 | lh3lh3 | 2009-06-30 07:39:07 -0400 (Tue, 30 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/knetfile.c @@ -650,7 +1034,7 @@ Changed paths: * knetfile: report error when the file is not present on FTP ------------------------------------------------------------------------ -r363 | lh3lh3 | 2009-06-29 23:23:32 +0100 (Mon, 29 Jun 2009) | 4 lines +r363 | lh3lh3 | 2009-06-29 18:23:32 -0400 (Mon, 29 Jun 2009) | 4 lines Changed paths: M /trunk/samtools/bam_tview.c M /trunk/samtools/bamtk.c @@ -664,14 +1048,14 @@ Changed paths: * bgzf: cache recent blocks (disabled by default) ------------------------------------------------------------------------ -r362 | lh3lh3 | 2009-06-25 21:04:34 +0100 (Thu, 25 Jun 2009) | 2 lines +r362 | lh3lh3 | 2009-06-25 16:04:34 -0400 (Thu, 25 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/bgzf.c write changelog ------------------------------------------------------------------------ -r361 | lh3lh3 | 2009-06-25 21:03:10 +0100 (Thu, 25 Jun 2009) | 3 lines +r361 | lh3lh3 | 2009-06-25 16:03:10 -0400 (Thu, 25 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/bam_index.c M /trunk/samtools/bamtk.c @@ -680,7 +1064,7 @@ Changed paths: * if a file is given on FTP, search locally for the BAM index ------------------------------------------------------------------------ -r360 | lh3lh3 | 2009-06-25 20:44:52 +0100 (Thu, 25 Jun 2009) | 5 lines +r360 | lh3lh3 | 2009-06-25 15:44:52 -0400 (Thu, 25 Jun 2009) | 5 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/bam_import.c @@ -697,7 +1081,7 @@ Changed paths: * support knetfile library in BGZF ------------------------------------------------------------------------ -r359 | lh3lh3 | 2009-06-25 17:10:55 +0100 (Thu, 25 Jun 2009) | 2 lines +r359 | lh3lh3 | 2009-06-25 12:10:55 -0400 (Thu, 25 Jun 2009) | 2 lines 
Changed paths: M /trunk/samtools/knetfile.c M /trunk/samtools/knetfile.h @@ -705,14 +1089,14 @@ Changed paths: fixed bugs in knetfile.* ------------------------------------------------------------------------ -r358 | lh3lh3 | 2009-06-25 13:53:19 +0100 (Thu, 25 Jun 2009) | 2 lines +r358 | lh3lh3 | 2009-06-25 08:53:19 -0400 (Thu, 25 Jun 2009) | 2 lines Changed paths: A /trunk/samtools/knetfile.h this is the header file ------------------------------------------------------------------------ -r357 | lh3lh3 | 2009-06-25 13:52:03 +0100 (Thu, 25 Jun 2009) | 3 lines +r357 | lh3lh3 | 2009-06-25 08:52:03 -0400 (Thu, 25 Jun 2009) | 3 lines Changed paths: A /trunk/samtools/knetfile.c @@ -720,7 +1104,7 @@ Changed paths: * preliminary version ------------------------------------------------------------------------ -r354 | lh3lh3 | 2009-06-24 14:02:25 +0100 (Wed, 24 Jun 2009) | 3 lines +r354 | lh3lh3 | 2009-06-24 09:02:25 -0400 (Wed, 24 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/bam.c M /trunk/samtools/bamtk.c @@ -729,7 +1113,7 @@ Changed paths: * fixed a memory leak in bam_view1(), although samtools is not using this routine. ------------------------------------------------------------------------ -r351 | lh3lh3 | 2009-06-18 00:16:26 +0100 (Thu, 18 Jun 2009) | 4 lines +r351 | lh3lh3 | 2009-06-17 19:16:26 -0400 (Wed, 17 Jun 2009) | 4 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/faidx.c @@ -739,7 +1123,7 @@ Changed paths: * hope this does not introduce new bugs... 
------------------------------------------------------------------------ -r350 | lh3lh3 | 2009-06-16 14:37:01 +0100 (Tue, 16 Jun 2009) | 3 lines +r350 | lh3lh3 | 2009-06-16 09:37:01 -0400 (Tue, 16 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/bam_plcmd.c M /trunk/samtools/bamtk.c @@ -748,7 +1132,7 @@ Changed paths: * fixed a small memory leak in pileup, caused by recent modifications ------------------------------------------------------------------------ -r347 | lh3lh3 | 2009-06-13 21:20:49 +0100 (Sat, 13 Jun 2009) | 3 lines +r347 | lh3lh3 | 2009-06-13 16:20:49 -0400 (Sat, 13 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/bam_plcmd.c M /trunk/samtools/bamtk.c @@ -758,7 +1142,7 @@ Changed paths: * added `-S' to pileup, similar to `view -S' ------------------------------------------------------------------------ -r346 | lh3lh3 | 2009-06-13 17:52:31 +0100 (Sat, 13 Jun 2009) | 3 lines +r346 | lh3lh3 | 2009-06-13 12:52:31 -0400 (Sat, 13 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/bamtk.c @@ -769,21 +1153,21 @@ Changed paths: * allow to select a read group at view command-line ------------------------------------------------------------------------ -r344 | lh3lh3 | 2009-06-13 14:06:24 +0100 (Sat, 13 Jun 2009) | 2 lines +r344 | lh3lh3 | 2009-06-13 09:06:24 -0400 (Sat, 13 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/examples/calDepth.c added more comments ------------------------------------------------------------------------ -r343 | lh3lh3 | 2009-06-13 14:01:22 +0100 (Sat, 13 Jun 2009) | 2 lines +r343 | lh3lh3 | 2009-06-13 09:01:22 -0400 (Sat, 13 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/examples/calDepth.c nothing really ------------------------------------------------------------------------ -r342 | lh3lh3 | 2009-06-13 13:58:48 +0100 (Sat, 13 Jun 2009) | 2 lines +r342 | lh3lh3 | 2009-06-13 08:58:48 -0400 (Sat, 13 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/examples/Makefile A 
/trunk/samtools/examples/calDepth.c @@ -791,7 +1175,7 @@ Changed paths: added an example of calculating read depth ------------------------------------------------------------------------ -r341 | lh3lh3 | 2009-06-13 13:00:08 +0100 (Sat, 13 Jun 2009) | 6 lines +r341 | lh3lh3 | 2009-06-13 08:00:08 -0400 (Sat, 13 Jun 2009) | 6 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/bam.h @@ -811,7 +1195,7 @@ Changed paths: * remove the support of -q in pileup ------------------------------------------------------------------------ -r340 | lh3lh3 | 2009-06-13 11:17:14 +0100 (Sat, 13 Jun 2009) | 6 lines +r340 | lh3lh3 | 2009-06-13 06:17:14 -0400 (Sat, 13 Jun 2009) | 6 lines Changed paths: M /trunk/samtools/INSTALL M /trunk/samtools/Makefile @@ -829,14 +1213,14 @@ Changed paths: * detect NCURSES in bam_tview.c ------------------------------------------------------------------------ -r339 | lh3lh3 | 2009-06-13 10:35:19 +0100 (Sat, 13 Jun 2009) | 2 lines +r339 | lh3lh3 | 2009-06-13 05:35:19 -0400 (Sat, 13 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/INSTALL update INSTALL ------------------------------------------------------------------------ -r338 | lh3lh3 | 2009-06-13 00:15:24 +0100 (Sat, 13 Jun 2009) | 4 lines +r338 | lh3lh3 | 2009-06-12 19:15:24 -0400 (Fri, 12 Jun 2009) | 4 lines Changed paths: M /trunk/samtools/bam.c M /trunk/samtools/bam.h @@ -852,7 +1236,7 @@ Changed paths: command line ------------------------------------------------------------------------ -r337 | lh3lh3 | 2009-06-12 21:25:50 +0100 (Fri, 12 Jun 2009) | 4 lines +r337 | lh3lh3 | 2009-06-12 16:25:50 -0400 (Fri, 12 Jun 2009) | 4 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/bgzf.c @@ -865,7 +1249,7 @@ Changed paths: * "samtools view" support "-u" command-line option ------------------------------------------------------------------------ -r336 | lh3lh3 | 2009-06-12 17:20:12 +0100 (Fri, 12 Jun 2009) | 5 lines +r336 | lh3lh3 | 2009-06-12 12:20:12 -0400 (Fri, 12 Jun 
2009) | 5 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/misc/Makefile @@ -879,14 +1263,14 @@ Changed paths: * on old version of zlib, writing is not available ------------------------------------------------------------------------ -r335 | lh3lh3 | 2009-06-12 16:47:33 +0100 (Fri, 12 Jun 2009) | 2 lines +r335 | lh3lh3 | 2009-06-12 11:47:33 -0400 (Fri, 12 Jun 2009) | 2 lines Changed paths: D /trunk/samtools/zlib remove zlib for simplification... ------------------------------------------------------------------------ -r334 | lh3lh3 | 2009-06-12 15:43:36 +0100 (Fri, 12 Jun 2009) | 5 lines +r334 | lh3lh3 | 2009-06-12 10:43:36 -0400 (Fri, 12 Jun 2009) | 5 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_aux.c @@ -898,14 +1282,14 @@ Changed paths: * this version works with the latest Bio::DB::Sam (20090612) ------------------------------------------------------------------------ -r333 | lh3lh3 | 2009-06-12 15:33:42 +0100 (Fri, 12 Jun 2009) | 2 lines +r333 | lh3lh3 | 2009-06-12 10:33:42 -0400 (Fri, 12 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/ChangeLog update ChangeLog ------------------------------------------------------------------------ -r332 | lh3lh3 | 2009-06-12 15:21:21 +0100 (Fri, 12 Jun 2009) | 2 lines +r332 | lh3lh3 | 2009-06-12 10:21:21 -0400 (Fri, 12 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/AUTHORS M /trunk/samtools/Makefile @@ -914,7 +1298,7 @@ Changed paths: fixed minor things in Makefile ------------------------------------------------------------------------ -r331 | lh3lh3 | 2009-06-12 15:07:05 +0100 (Fri, 12 Jun 2009) | 4 lines +r331 | lh3lh3 | 2009-06-12 10:07:05 -0400 (Fri, 12 Jun 2009) | 4 lines Changed paths: M /trunk/samtools/bamtk.c @@ -923,7 +1307,7 @@ Changed paths: changes in the Makefile building system. 
------------------------------------------------------------------------ -r330 | lh3lh3 | 2009-06-12 15:03:38 +0100 (Fri, 12 Jun 2009) | 2 lines +r330 | lh3lh3 | 2009-06-12 10:03:38 -0400 (Fri, 12 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/AUTHORS D /trunk/samtools/README @@ -931,7 +1315,7 @@ Changed paths: update information... ------------------------------------------------------------------------ -r329 | lh3lh3 | 2009-06-12 14:52:21 +0100 (Fri, 12 Jun 2009) | 3 lines +r329 | lh3lh3 | 2009-06-12 09:52:21 -0400 (Fri, 12 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/misc/novo2sam.pl @@ -939,7 +1323,7 @@ Changed paths: * this version works with indels ------------------------------------------------------------------------ -r328 | lh3lh3 | 2009-06-12 14:50:53 +0100 (Fri, 12 Jun 2009) | 3 lines +r328 | lh3lh3 | 2009-06-12 09:50:53 -0400 (Fri, 12 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/INSTALL M /trunk/samtools/Makefile @@ -950,7 +1334,7 @@ Changed paths: * update INSTALL instruction ------------------------------------------------------------------------ -r327 | lh3lh3 | 2009-06-12 14:18:29 +0100 (Fri, 12 Jun 2009) | 4 lines +r327 | lh3lh3 | 2009-06-12 09:18:29 -0400 (Fri, 12 Jun 2009) | 4 lines Changed paths: A /trunk/samtools/Makefile (from /trunk/samtools/Makefile.generic:325) D /trunk/samtools/Makefile.am @@ -994,14 +1378,14 @@ Changed paths: * unfinished! 
(will be soon) ------------------------------------------------------------------------ -r326 | lh3lh3 | 2009-06-12 14:12:03 +0100 (Fri, 12 Jun 2009) | 2 lines +r326 | lh3lh3 | 2009-06-12 09:12:03 -0400 (Fri, 12 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/misc/samtools.pl Unfinished ------------------------------------------------------------------------ -r325 | lh3lh3 | 2009-06-10 16:27:59 +0100 (Wed, 10 Jun 2009) | 3 lines +r325 | lh3lh3 | 2009-06-10 11:27:59 -0400 (Wed, 10 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bamtk.c @@ -1010,7 +1394,7 @@ Changed paths: * further avoid wrong consensus calls in repetitive regions. ------------------------------------------------------------------------ -r324 | lh3lh3 | 2009-06-10 15:56:17 +0100 (Wed, 10 Jun 2009) | 4 lines +r324 | lh3lh3 | 2009-06-10 10:56:17 -0400 (Wed, 10 Jun 2009) | 4 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bam_plcmd.c @@ -1023,7 +1407,7 @@ Changed paths: * allow filtering on mapQ at the pileup command line ------------------------------------------------------------------------ -r323 | lh3lh3 | 2009-06-10 10:04:21 +0100 (Wed, 10 Jun 2009) | 3 lines +r323 | lh3lh3 | 2009-06-10 05:04:21 -0400 (Wed, 10 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/misc/samtools.pl @@ -1031,28 +1415,28 @@ Changed paths: * indels and SNPs use different mapping quality threshold ------------------------------------------------------------------------ -r322 | lh3lh3 | 2009-06-10 10:03:22 +0100 (Wed, 10 Jun 2009) | 2 lines +r322 | lh3lh3 | 2009-06-10 05:03:22 -0400 (Wed, 10 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/misc/export2sam.pl fixed a typo ------------------------------------------------------------------------ -r321 | lh3lh3 | 2009-06-09 09:21:48 +0100 (Tue, 09 Jun 2009) | 2 lines +r321 | lh3lh3 | 2009-06-09 04:21:48 -0400 (Tue, 09 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/misc/samtools.pl just typo. 
no real change ------------------------------------------------------------------------ -r320 | lh3lh3 | 2009-06-08 14:32:51 +0100 (Mon, 08 Jun 2009) | 2 lines +r320 | lh3lh3 | 2009-06-08 09:32:51 -0400 (Mon, 08 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/misc/samtools.pl a little bit code cleanup ------------------------------------------------------------------------ -r319 | lh3lh3 | 2009-06-08 14:22:33 +0100 (Mon, 08 Jun 2009) | 4 lines +r319 | lh3lh3 | 2009-06-08 09:22:33 -0400 (Mon, 08 Jun 2009) | 4 lines Changed paths: M /trunk/samtools/misc/samtools.pl @@ -1061,7 +1445,7 @@ Changed paths: * optionally print filtered variants ------------------------------------------------------------------------ -r318 | lh3lh3 | 2009-06-08 14:14:26 +0100 (Mon, 08 Jun 2009) | 3 lines +r318 | lh3lh3 | 2009-06-08 09:14:26 -0400 (Mon, 08 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/misc/samtools.pl @@ -1069,7 +1453,7 @@ Changed paths: * combine snpFilter and indelFilter ------------------------------------------------------------------------ -r317 | lh3lh3 | 2009-06-08 11:31:42 +0100 (Mon, 08 Jun 2009) | 3 lines +r317 | lh3lh3 | 2009-06-08 06:31:42 -0400 (Mon, 08 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/misc/samtools.pl @@ -1077,7 +1461,7 @@ Changed paths: * change a default parameter ------------------------------------------------------------------------ -r316 | lh3lh3 | 2009-06-08 11:11:06 +0100 (Mon, 08 Jun 2009) | 5 lines +r316 | lh3lh3 | 2009-06-08 06:11:06 -0400 (Mon, 08 Jun 2009) | 5 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bam_maqcns.h @@ -1091,7 +1475,7 @@ Changed paths: * pileup: allow to output variant sites only ------------------------------------------------------------------------ -r312 | lh3lh3 | 2009-06-04 13:01:10 +0100 (Thu, 04 Jun 2009) | 3 lines +r312 | lh3lh3 | 2009-06-04 08:01:10 -0400 (Thu, 04 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/misc/samtools.pl @@ -1099,14 +1483,14 @@ Changed 
paths: * added pileup2fq ------------------------------------------------------------------------ -r311 | lh3lh3 | 2009-06-03 09:40:40 +0100 (Wed, 03 Jun 2009) | 2 lines +r311 | lh3lh3 | 2009-06-03 04:40:40 -0400 (Wed, 03 Jun 2009) | 2 lines Changed paths: M /trunk/samtools/misc/samtools.pl * in snpFilter, suppress non-SNP sites ------------------------------------------------------------------------ -r310 | lh3lh3 | 2009-06-01 14:35:13 +0100 (Mon, 01 Jun 2009) | 3 lines +r310 | lh3lh3 | 2009-06-01 09:35:13 -0400 (Mon, 01 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/misc/samtools.pl @@ -1114,7 +1498,7 @@ Changed paths: * fixed a typo ------------------------------------------------------------------------ -r309 | lh3lh3 | 2009-06-01 14:04:39 +0100 (Mon, 01 Jun 2009) | 3 lines +r309 | lh3lh3 | 2009-06-01 09:04:39 -0400 (Mon, 01 Jun 2009) | 3 lines Changed paths: M /trunk/samtools/misc/samtools.pl @@ -1122,7 +1506,7 @@ Changed paths: * snpFilter ------------------------------------------------------------------------ -r306 | lh3lh3 | 2009-05-28 11:49:35 +0100 (Thu, 28 May 2009) | 3 lines +r306 | lh3lh3 | 2009-05-28 06:49:35 -0400 (Thu, 28 May 2009) | 3 lines Changed paths: M /trunk/samtools/bgzf.c @@ -1130,14 +1514,14 @@ Changed paths: * suggested by {kdj,jm18}@sanger.ac.uk ------------------------------------------------------------------------ -r305 | lh3lh3 | 2009-05-28 11:16:08 +0100 (Thu, 28 May 2009) | 2 lines +r305 | lh3lh3 | 2009-05-28 06:16:08 -0400 (Thu, 28 May 2009) | 2 lines Changed paths: A /trunk/samtools/misc/interpolate_sam.pl Script for paired-end pileup, contributed by Stephen Montgomery. 
------------------------------------------------------------------------ -r304 | lh3lh3 | 2009-05-28 11:08:49 +0100 (Thu, 28 May 2009) | 3 lines +r304 | lh3lh3 | 2009-05-28 06:08:49 -0400 (Thu, 28 May 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/sam.c @@ -1146,7 +1530,7 @@ Changed paths: * fixed a minor bug in printing headers ------------------------------------------------------------------------ -r297 | lh3lh3 | 2009-05-21 16:06:16 +0100 (Thu, 21 May 2009) | 2 lines +r297 | lh3lh3 | 2009-05-21 11:06:16 -0400 (Thu, 21 May 2009) | 2 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/NEWS @@ -1158,7 +1542,7 @@ Changed paths: Release samtools-0.1.4 ------------------------------------------------------------------------ -r296 | lh3lh3 | 2009-05-21 12:53:14 +0100 (Thu, 21 May 2009) | 3 lines +r296 | lh3lh3 | 2009-05-21 07:53:14 -0400 (Thu, 21 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bamtk.c @@ -1167,7 +1551,7 @@ Changed paths: * another similar bug in the indel caller ------------------------------------------------------------------------ -r295 | lh3lh3 | 2009-05-21 12:50:28 +0100 (Thu, 21 May 2009) | 3 lines +r295 | lh3lh3 | 2009-05-21 07:50:28 -0400 (Thu, 21 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bamtk.c @@ -1176,14 +1560,14 @@ Changed paths: * fixed a critical bug in the indel caller ------------------------------------------------------------------------ -r294 | lh3lh3 | 2009-05-20 13:00:20 +0100 (Wed, 20 May 2009) | 2 lines +r294 | lh3lh3 | 2009-05-20 08:00:20 -0400 (Wed, 20 May 2009) | 2 lines Changed paths: M /trunk/samtools/bam_stat.c added a missing header file ------------------------------------------------------------------------ -r293 | lh3lh3 | 2009-05-19 23:44:25 +0100 (Tue, 19 May 2009) | 3 lines +r293 | lh3lh3 | 2009-05-19 18:44:25 -0400 (Tue, 19 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam_tview.c M 
/trunk/samtools/bamtk.c @@ -1192,7 +1576,7 @@ Changed paths: * open tview in the dot-view mode by default ------------------------------------------------------------------------ -r292 | lh3lh3 | 2009-05-18 21:01:23 +0100 (Mon, 18 May 2009) | 6 lines +r292 | lh3lh3 | 2009-05-18 16:01:23 -0400 (Mon, 18 May 2009) | 6 lines Changed paths: M /trunk/samtools/samtools.1 @@ -1203,7 +1587,7 @@ information. Also thank James Bonfields for pointing this out. ------------------------------------------------------------------------ -r286 | lh3lh3 | 2009-05-14 15:23:13 +0100 (Thu, 14 May 2009) | 3 lines +r286 | lh3lh3 | 2009-05-14 10:23:13 -0400 (Thu, 14 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_aux.c @@ -1213,7 +1597,7 @@ Changed paths: * declare bam_aux_get_core() in bam.h ------------------------------------------------------------------------ -r276 | lh3lh3 | 2009-05-13 10:07:55 +0100 (Wed, 13 May 2009) | 5 lines +r276 | lh3lh3 | 2009-05-13 05:07:55 -0400 (Wed, 13 May 2009) | 5 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_index.c @@ -1225,7 +1609,7 @@ Changed paths: * As is suggested by Tim, scan "{base}.bai" and "{base}.bam.bai" for index ------------------------------------------------------------------------ -r275 | lh3lh3 | 2009-05-12 21:14:10 +0100 (Tue, 12 May 2009) | 4 lines +r275 | lh3lh3 | 2009-05-12 16:14:10 -0400 (Tue, 12 May 2009) | 4 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/bam.h @@ -1236,7 +1620,7 @@ Changed paths: backward compatibility with Bio::DB::Sam ------------------------------------------------------------------------ -r273 | lh3lh3 | 2009-05-12 14:28:39 +0100 (Tue, 12 May 2009) | 3 lines +r273 | lh3lh3 | 2009-05-12 09:28:39 -0400 (Tue, 12 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam_rmdupse.c M /trunk/samtools/bamtk.c @@ -1245,14 +1629,14 @@ Changed paths: * rmdupse: do not remove unmapped reads 
------------------------------------------------------------------------ -r272 | lh3lh3 | 2009-05-12 14:20:00 +0100 (Tue, 12 May 2009) | 2 lines +r272 | lh3lh3 | 2009-05-12 09:20:00 -0400 (Tue, 12 May 2009) | 2 lines Changed paths: M /trunk/samtools/bam_rmdupse.c change a parameter. It does nothing ------------------------------------------------------------------------ -r271 | lh3lh3 | 2009-05-12 14:17:58 +0100 (Tue, 12 May 2009) | 3 lines +r271 | lh3lh3 | 2009-05-12 09:17:58 -0400 (Tue, 12 May 2009) | 3 lines Changed paths: M /trunk/samtools/Makefile.am M /trunk/samtools/Makefile.generic @@ -1265,7 +1649,7 @@ Changed paths: * added 'rmdupse' command ------------------------------------------------------------------------ -r267 | lh3lh3 | 2009-05-05 22:31:41 +0100 (Tue, 05 May 2009) | 3 lines +r267 | lh3lh3 | 2009-05-05 17:31:41 -0400 (Tue, 05 May 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/sam_view.c @@ -1274,7 +1658,7 @@ Changed paths: * in sam_view.c, changed g_flag_on based on the suggestion by Angie Hinrichs ------------------------------------------------------------------------ -r266 | lh3lh3 | 2009-05-05 22:23:27 +0100 (Tue, 05 May 2009) | 3 lines +r266 | lh3lh3 | 2009-05-05 17:23:27 -0400 (Tue, 05 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam_import.c M /trunk/samtools/bamtk.c @@ -1283,7 +1667,7 @@ Changed paths: * report an error if a non-* reference is present while @SQ is absent ------------------------------------------------------------------------ -r265 | lh3lh3 | 2009-05-05 22:09:00 +0100 (Tue, 05 May 2009) | 3 lines +r265 | lh3lh3 | 2009-05-05 17:09:00 -0400 (Tue, 05 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_import.c @@ -1295,7 +1679,7 @@ Changed paths: * make samopen() recognize @SQ header lines ------------------------------------------------------------------------ -r261 | lh3lh3 | 2009-05-05 15:10:30 +0100 (Tue, 05 May 2009) | 3 lines +r261 | lh3lh3 | 2009-05-05 
10:10:30 -0400 (Tue, 05 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam_plcmd.c M /trunk/samtools/bamtk.c @@ -1307,14 +1691,14 @@ Changed paths: * report error for file I/O error ------------------------------------------------------------------------ -r260 | lh3lh3 | 2009-05-05 15:01:16 +0100 (Tue, 05 May 2009) | 2 lines +r260 | lh3lh3 | 2009-05-05 10:01:16 -0400 (Tue, 05 May 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile.am update Makefile.am ------------------------------------------------------------------------ -r259 | lh3lh3 | 2009-05-05 14:52:25 +0100 (Tue, 05 May 2009) | 3 lines +r259 | lh3lh3 | 2009-05-05 09:52:25 -0400 (Tue, 05 May 2009) | 3 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_pileup.c @@ -1327,7 +1711,7 @@ Changed paths: * use the new I/O interface in pileup ------------------------------------------------------------------------ -r258 | lh3lh3 | 2009-05-05 14:33:22 +0100 (Tue, 05 May 2009) | 3 lines +r258 | lh3lh3 | 2009-05-05 09:33:22 -0400 (Tue, 05 May 2009) | 3 lines Changed paths: M /trunk/samtools/Makefile.generic M /trunk/samtools/Makefile.lite @@ -1343,7 +1727,7 @@ Changed paths: * unify the interface to BAM and SAM I/O ------------------------------------------------------------------------ -r257 | lh3lh3 | 2009-05-05 09:53:35 +0100 (Tue, 05 May 2009) | 3 lines +r257 | lh3lh3 | 2009-05-05 04:53:35 -0400 (Tue, 05 May 2009) | 3 lines Changed paths: M /trunk/samtools/Makefile.lite M /trunk/samtools/bam_plcmd.c @@ -1353,7 +1737,7 @@ Changed paths: * allow hex with "pileup -m" ------------------------------------------------------------------------ -r256 | lh3lh3 | 2009-05-04 19:16:50 +0100 (Mon, 04 May 2009) | 4 lines +r256 | lh3lh3 | 2009-05-04 14:16:50 -0400 (Mon, 04 May 2009) | 4 lines Changed paths: M /trunk/samtools/bam_lpileup.c M /trunk/samtools/bamtk.c @@ -1363,7 +1747,7 @@ Changed paths: * I do not know if this also fixes the bug causing assertion failure in the tview 
------------------------------------------------------------------------ -r251 | lh3lh3 | 2009-04-28 13:53:23 +0100 (Tue, 28 Apr 2009) | 3 lines +r251 | lh3lh3 | 2009-04-28 08:53:23 -0400 (Tue, 28 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_pileup.c M /trunk/samtools/bamtk.c @@ -1372,7 +1756,7 @@ Changed paths: * fixed a bug when there are reads without coordinates ------------------------------------------------------------------------ -r250 | lh3lh3 | 2009-04-28 13:43:33 +0100 (Tue, 28 Apr 2009) | 2 lines +r250 | lh3lh3 | 2009-04-28 08:43:33 -0400 (Tue, 28 Apr 2009) | 2 lines Changed paths: A /trunk/samtools/AUTHORS A /trunk/samtools/README @@ -1381,7 +1765,7 @@ Changed paths: added missing files ------------------------------------------------------------------------ -r249 | lh3lh3 | 2009-04-28 13:37:16 +0100 (Tue, 28 Apr 2009) | 2 lines +r249 | lh3lh3 | 2009-04-28 08:37:16 -0400 (Tue, 28 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile.generic M /trunk/samtools/Makefile.lite @@ -1391,14 +1775,14 @@ Changed paths: improve large file support in compilation ------------------------------------------------------------------------ -r248 | lh3lh3 | 2009-04-28 13:33:24 +0100 (Tue, 28 Apr 2009) | 2 lines +r248 | lh3lh3 | 2009-04-28 08:33:24 -0400 (Tue, 28 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/INSTALL update INSTALL ------------------------------------------------------------------------ -r247 | lh3lh3 | 2009-04-28 13:28:50 +0100 (Tue, 28 Apr 2009) | 2 lines +r247 | lh3lh3 | 2009-04-28 08:28:50 -0400 (Tue, 28 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile.am M /trunk/samtools/autogen.sh @@ -1409,7 +1793,7 @@ Changed paths: fixed various issues about the GNU building scripts ------------------------------------------------------------------------ -r246 | lh3lh3 | 2009-04-28 13:10:23 +0100 (Tue, 28 Apr 2009) | 4 lines +r246 | lh3lh3 | 2009-04-28 08:10:23 -0400 (Tue, 28 Apr 2009) | 4 lines Changed paths: M 
/trunk/samtools/ChangeLog D /trunk/samtools/Makefile @@ -1430,7 +1814,7 @@ Changed paths: * enhanced support of displaying color-space reads ------------------------------------------------------------------------ -r244 | lh3lh3 | 2009-04-25 11:49:40 +0100 (Sat, 25 Apr 2009) | 3 lines +r244 | lh3lh3 | 2009-04-25 06:49:40 -0400 (Sat, 25 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_md.c M /trunk/samtools/bamtk.c @@ -1439,7 +1823,7 @@ Changed paths: * fixed segfault for unmapped reads ------------------------------------------------------------------------ -r243 | lh3lh3 | 2009-04-24 21:27:26 +0100 (Fri, 24 Apr 2009) | 5 lines +r243 | lh3lh3 | 2009-04-24 16:27:26 -0400 (Fri, 24 Apr 2009) | 5 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_maqcns.c @@ -1452,7 +1836,7 @@ Changed paths: * consensus calling now works with "=", but indel calling not ------------------------------------------------------------------------ -r242 | lh3lh3 | 2009-04-24 20:44:46 +0100 (Fri, 24 Apr 2009) | 3 lines +r242 | lh3lh3 | 2009-04-24 15:44:46 -0400 (Fri, 24 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_md.c M /trunk/samtools/bamtk.c @@ -1461,7 +1845,7 @@ Changed paths: * fixed a memory leak ------------------------------------------------------------------------ -r240 | lh3lh3 | 2009-04-24 16:40:18 +0100 (Fri, 24 Apr 2009) | 5 lines +r240 | lh3lh3 | 2009-04-24 11:40:18 -0400 (Fri, 24 Apr 2009) | 5 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/Makefile.lite @@ -1477,7 +1861,7 @@ Changed paths: * the plain pileup now support "=" bases, but consensus calling and glfgen may fail ------------------------------------------------------------------------ -r239 | lh3lh3 | 2009-04-24 12:08:20 +0100 (Fri, 24 Apr 2009) | 5 lines +r239 | lh3lh3 | 2009-04-24 07:08:20 -0400 (Fri, 24 Apr 2009) | 5 lines Changed paths: M /trunk/samtools/bam.h M /trunk/samtools/bam_aux.c @@ -1489,7 +1873,7 @@ Changed paths: * added tagview for testing bam_aux 
------------------------------------------------------------------------ -r235 | lh3lh3 | 2009-04-21 23:17:39 +0100 (Tue, 21 Apr 2009) | 3 lines +r235 | lh3lh3 | 2009-04-21 18:17:39 -0400 (Tue, 21 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_pileup.c M /trunk/samtools/bamtk.c @@ -1498,14 +1882,14 @@ Changed paths: * fixed a bug in pileup: the first read in a chromosome may not be printed ------------------------------------------------------------------------ -r232 | lh3lh3 | 2009-04-16 15:25:43 +0100 (Thu, 16 Apr 2009) | 2 lines +r232 | lh3lh3 | 2009-04-16 10:25:43 -0400 (Thu, 16 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile.lite a missing file in Makefile.lite ------------------------------------------------------------------------ -r227 | lh3lh3 | 2009-04-15 22:02:53 +0100 (Wed, 15 Apr 2009) | 2 lines +r227 | lh3lh3 | 2009-04-15 17:02:53 -0400 (Wed, 15 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/NEWS M /trunk/samtools/bamtk.c @@ -1513,7 +1897,7 @@ Changed paths: Release samtools-0.1.3 ------------------------------------------------------------------------ -r223 | lh3lh3 | 2009-04-15 14:31:32 +0100 (Wed, 15 Apr 2009) | 3 lines +r223 | lh3lh3 | 2009-04-15 09:31:32 -0400 (Wed, 15 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_plcmd.c M /trunk/samtools/bamtk.c @@ -1522,7 +1906,7 @@ Changed paths: * make samtools more robust to weird input such as empty file ------------------------------------------------------------------------ -r222 | lh3lh3 | 2009-04-15 14:05:33 +0100 (Wed, 15 Apr 2009) | 2 lines +r222 | lh3lh3 | 2009-04-15 09:05:33 -0400 (Wed, 15 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/NEWS @@ -1531,14 +1915,14 @@ Changed paths: prepare for release 0.1.3 ------------------------------------------------------------------------ -r221 | lh3lh3 | 2009-04-15 13:32:14 +0100 (Wed, 15 Apr 2009) | 2 lines +r221 | lh3lh3 | 2009-04-15 08:32:14 -0400 (Wed, 15 Apr 2009) | 2 lines Changed 
paths: A /trunk/samtools/misc/blast2sam.pl convert NCBI-BLASTN to SAM ------------------------------------------------------------------------ -r220 | lh3lh3 | 2009-04-15 13:18:19 +0100 (Wed, 15 Apr 2009) | 3 lines +r220 | lh3lh3 | 2009-04-15 08:18:19 -0400 (Wed, 15 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_lpileup.c M /trunk/samtools/bamtk.c @@ -1547,7 +1931,7 @@ Changed paths: * fixed a small memory leak in tview ------------------------------------------------------------------------ -r219 | lh3lh3 | 2009-04-15 13:00:08 +0100 (Wed, 15 Apr 2009) | 3 lines +r219 | lh3lh3 | 2009-04-15 08:00:08 -0400 (Wed, 15 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_rmdup.c M /trunk/samtools/bamtk.c @@ -1556,7 +1940,7 @@ Changed paths: * fixed a bug in rmdup when there are unmapped reads ------------------------------------------------------------------------ -r218 | lh3lh3 | 2009-04-14 22:28:58 +0100 (Tue, 14 Apr 2009) | 2 lines +r218 | lh3lh3 | 2009-04-14 17:28:58 -0400 (Tue, 14 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/NEWS @@ -1564,7 +1948,7 @@ Changed paths: proposed NEWS for the new release (have not yet) ------------------------------------------------------------------------ -r216 | lh3lh3 | 2009-04-14 22:10:46 +0100 (Tue, 14 Apr 2009) | 4 lines +r216 | lh3lh3 | 2009-04-14 17:10:46 -0400 (Tue, 14 Apr 2009) | 4 lines Changed paths: M /trunk/samtools/misc/samtools.pl @@ -1573,7 +1957,7 @@ Changed paths: on the new pileup indel line implemented in samtools-0.1.2-25 ------------------------------------------------------------------------ -r215 | lh3lh3 | 2009-04-14 22:04:19 +0100 (Tue, 14 Apr 2009) | 4 lines +r215 | lh3lh3 | 2009-04-14 17:04:19 -0400 (Tue, 14 Apr 2009) | 4 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bam_plcmd.c @@ -1585,14 +1969,14 @@ Changed paths: containing indels ------------------------------------------------------------------------ -r211 | lh3lh3 | 2009-04-13 
12:07:13 +0100 (Mon, 13 Apr 2009) | 2 lines +r211 | lh3lh3 | 2009-04-13 07:07:13 -0400 (Mon, 13 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/ChangeLog update ChangeLog from "svn log" ------------------------------------------------------------------------ -r210 | lh3lh3 | 2009-04-12 20:57:05 +0100 (Sun, 12 Apr 2009) | 4 lines +r210 | lh3lh3 | 2009-04-12 15:57:05 -0400 (Sun, 12 Apr 2009) | 4 lines Changed paths: M /trunk/samtools/bam.c M /trunk/samtools/bam_import.c @@ -1605,7 +1989,7 @@ Changed paths: * allow empty header ------------------------------------------------------------------------ -r209 | lh3lh3 | 2009-04-12 20:32:44 +0100 (Sun, 12 Apr 2009) | 3 lines +r209 | lh3lh3 | 2009-04-12 15:32:44 -0400 (Sun, 12 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam.c M /trunk/samtools/bam_import.c @@ -1615,7 +1999,7 @@ Changed paths: * recognize '*' at the QUAL field ------------------------------------------------------------------------ -r208 | lh3lh3 | 2009-04-12 20:08:02 +0100 (Sun, 12 Apr 2009) | 3 lines +r208 | lh3lh3 | 2009-04-12 15:08:02 -0400 (Sun, 12 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_import.c M /trunk/samtools/bamtk.c @@ -1625,14 +2009,14 @@ Changed paths: * the field separater is TAB only, now ------------------------------------------------------------------------ -r207 | lh3lh3 | 2009-04-08 15:18:03 +0100 (Wed, 08 Apr 2009) | 2 lines +r207 | lh3lh3 | 2009-04-08 10:18:03 -0400 (Wed, 08 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/examples/ex1.sam.gz * fixed the problem in the example alignment due to the bug in fixmate ------------------------------------------------------------------------ -r206 | lh3lh3 | 2009-04-08 15:15:05 +0100 (Wed, 08 Apr 2009) | 3 lines +r206 | lh3lh3 | 2009-04-08 10:15:05 -0400 (Wed, 08 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_mate.c M /trunk/samtools/bamtk.c @@ -1642,7 +2026,7 @@ Changed paths: * fixed a nasty bug in `fixmate' 
------------------------------------------------------------------------ -r205 | lh3lh3 | 2009-04-08 10:57:08 +0100 (Wed, 08 Apr 2009) | 2 lines +r205 | lh3lh3 | 2009-04-08 05:57:08 -0400 (Wed, 08 Apr 2009) | 2 lines Changed paths: M /trunk/samtools/misc/bowtie2sam.pl M /trunk/samtools/misc/soap2sam.pl @@ -1651,7 +2035,7 @@ Changed paths: make the script robust to the bugs in SOAP-2.1.7 ------------------------------------------------------------------------ -r200 | lh3lh3 | 2009-04-02 15:14:56 +0100 (Thu, 02 Apr 2009) | 3 lines +r200 | lh3lh3 | 2009-04-02 10:14:56 -0400 (Thu, 02 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bam_stat.c M /trunk/samtools/bamtk.c @@ -1660,7 +2044,7 @@ Changed paths: * check if file is truncated in flagstat ------------------------------------------------------------------------ -r199 | lh3lh3 | 2009-04-02 15:09:10 +0100 (Thu, 02 Apr 2009) | 3 lines +r199 | lh3lh3 | 2009-04-02 10:09:10 -0400 (Thu, 02 Apr 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c @@ -1668,7 +2052,7 @@ Changed paths: * print the header if requested ------------------------------------------------------------------------ -r193 | lh3lh3 | 2009-03-27 15:09:50 +0000 (Fri, 27 Mar 2009) | 3 lines +r193 | lh3lh3 | 2009-03-27 11:09:50 -0400 (Fri, 27 Mar 2009) | 3 lines Changed paths: M /trunk/samtools/bam_plcmd.c M /trunk/samtools/bamtk.c @@ -1677,7 +2061,7 @@ Changed paths: * fixed a minor bug reported by Nils Homer ------------------------------------------------------------------------ -r185 | lh3lh3 | 2009-03-24 11:50:32 +0000 (Tue, 24 Mar 2009) | 2 lines +r185 | lh3lh3 | 2009-03-24 07:50:32 -0400 (Tue, 24 Mar 2009) | 2 lines Changed paths: A /trunk/samtools/Makefile (from /trunk/samtools/Makefile.std:184) D /trunk/samtools/Makefile.std @@ -1687,7 +2071,7 @@ Changed paths: rename Makefile.std as Makefile. GNU building systerm is not ready and may take some time... 
------------------------------------------------------------------------ -r184 | lh3lh3 | 2009-03-24 10:36:38 +0000 (Tue, 24 Mar 2009) | 4 lines +r184 | lh3lh3 | 2009-03-24 06:36:38 -0400 (Tue, 24 Mar 2009) | 4 lines Changed paths: D /trunk/samtools/Makefile A /trunk/samtools/Makefile.std (from /trunk/samtools/Makefile:183) @@ -1703,7 +2087,7 @@ Changed paths: * rename Makefile to Makefile.std and prepare to add the GNU building systerms (also by Nils) ------------------------------------------------------------------------ -r183 | lh3lh3 | 2009-03-24 10:30:23 +0000 (Tue, 24 Mar 2009) | 4 lines +r183 | lh3lh3 | 2009-03-24 06:30:23 -0400 (Tue, 24 Mar 2009) | 4 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/bam_import.c @@ -1720,7 +2104,7 @@ Changed paths: * added my kstring library for a bit complex parsing of the position list. ------------------------------------------------------------------------ -r169 | lh3lh3 | 2009-03-12 13:40:14 +0000 (Thu, 12 Mar 2009) | 3 lines +r169 | lh3lh3 | 2009-03-12 09:40:14 -0400 (Thu, 12 Mar 2009) | 3 lines Changed paths: M /trunk/samtools/misc/soap2sam.pl @@ -1728,14 +2112,14 @@ Changed paths: * more robust to truncated soap output ------------------------------------------------------------------------ -r168 | lh3lh3 | 2009-03-11 10:49:00 +0000 (Wed, 11 Mar 2009) | 2 lines +r168 | lh3lh3 | 2009-03-11 06:49:00 -0400 (Wed, 11 Mar 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile.lite added bam_stat.o to Makefile.lite ------------------------------------------------------------------------ -r167 | lh3lh3 | 2009-03-10 22:11:31 +0000 (Tue, 10 Mar 2009) | 3 lines +r167 | lh3lh3 | 2009-03-10 18:11:31 -0400 (Tue, 10 Mar 2009) | 3 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bamtk.c @@ -1744,7 +2128,7 @@ Changed paths: * generate RMS of mapQ instead of max mapQ ------------------------------------------------------------------------ -r166 | lh3lh3 | 2009-03-10 22:06:45 +0000 (Tue, 10 Mar 
2009) | 3 lines +r166 | lh3lh3 | 2009-03-10 18:06:45 -0400 (Tue, 10 Mar 2009) | 3 lines Changed paths: M /trunk/samtools/bam_plcmd.c M /trunk/samtools/bamtk.c @@ -1756,7 +2140,7 @@ Changed paths: * implemented GLFv3 ------------------------------------------------------------------------ -r159 | lh3lh3 | 2009-03-03 11:26:08 +0000 (Tue, 03 Mar 2009) | 3 lines +r159 | lh3lh3 | 2009-03-03 06:26:08 -0500 (Tue, 03 Mar 2009) | 3 lines Changed paths: M /trunk/samtools/bam_plcmd.c M /trunk/samtools/bamtk.c @@ -1765,7 +2149,7 @@ Changed paths: * fixed a minor bug in displaying pileup ------------------------------------------------------------------------ -r158 | lh3lh3 | 2009-03-03 11:24:16 +0000 (Tue, 03 Mar 2009) | 3 lines +r158 | lh3lh3 | 2009-03-03 06:24:16 -0500 (Tue, 03 Mar 2009) | 3 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/bamtk.c @@ -1774,7 +2158,7 @@ Changed paths: * optionally print SAM header ------------------------------------------------------------------------ -r153 | lh3lh3 | 2009-03-02 10:45:28 +0000 (Mon, 02 Mar 2009) | 3 lines +r153 | lh3lh3 | 2009-03-02 05:45:28 -0500 (Mon, 02 Mar 2009) | 3 lines Changed paths: M /trunk/samtools/bamtk.c M /trunk/samtools/glf.c @@ -1783,7 +2167,7 @@ Changed paths: * use "GLF\3" as the magic for GLFv3 files ------------------------------------------------------------------------ -r152 | lh3lh3 | 2009-03-02 10:39:09 +0000 (Mon, 02 Mar 2009) | 5 lines +r152 | lh3lh3 | 2009-03-02 05:39:09 -0500 (Mon, 02 Mar 2009) | 5 lines Changed paths: M /trunk/samtools/Makefile M /trunk/samtools/bam_import.c @@ -1799,7 +2183,7 @@ Changed paths: * update to GLFv3: pos is changed to offset for better compression ------------------------------------------------------------------------ -r151 | lh3lh3 | 2009-03-01 15:18:43 +0000 (Sun, 01 Mar 2009) | 3 lines +r151 | lh3lh3 | 2009-03-01 10:18:43 -0500 (Sun, 01 Mar 2009) | 3 lines Changed paths: M /trunk/samtools/misc/wgsim.c @@ -1807,7 +2191,7 @@ Changed paths: * fixed 
a bug in simulating indels ------------------------------------------------------------------------ -r145 | lh3lh3 | 2009-02-26 19:43:57 +0000 (Thu, 26 Feb 2009) | 4 lines +r145 | lh3lh3 | 2009-02-26 14:43:57 -0500 (Thu, 26 Feb 2009) | 4 lines Changed paths: M /trunk/samtools/misc/wgsim.c @@ -1816,7 +2200,7 @@ Changed paths: not like long read names. ------------------------------------------------------------------------ -r141 | lh3lh3 | 2009-02-26 14:53:03 +0000 (Thu, 26 Feb 2009) | 6 lines +r141 | lh3lh3 | 2009-02-26 09:53:03 -0500 (Thu, 26 Feb 2009) | 6 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/misc/wgsim.c @@ -1829,7 +2213,7 @@ Changed paths: * make the script work with color reads ------------------------------------------------------------------------ -r140 | lh3lh3 | 2009-02-26 14:02:57 +0000 (Thu, 26 Feb 2009) | 2 lines +r140 | lh3lh3 | 2009-02-26 09:02:57 -0500 (Thu, 26 Feb 2009) | 2 lines Changed paths: M /trunk/samtools/misc/Makefile M /trunk/samtools/misc/wgsim.c @@ -1837,7 +2221,7 @@ Changed paths: * wgsim: added a note ------------------------------------------------------------------------ -r139 | lh3lh3 | 2009-02-26 11:39:08 +0000 (Thu, 26 Feb 2009) | 7 lines +r139 | lh3lh3 | 2009-02-26 06:39:08 -0500 (Thu, 26 Feb 2009) | 7 lines Changed paths: M /trunk/samtools/misc/wgsim.c M /trunk/samtools/misc/wgsim_eval.pl @@ -1850,7 +2234,7 @@ Changed paths: * change in accordant with wgsim ------------------------------------------------------------------------ -r129 | lh3lh3 | 2009-02-18 22:23:27 +0000 (Wed, 18 Feb 2009) | 3 lines +r129 | lh3lh3 | 2009-02-18 17:23:27 -0500 (Wed, 18 Feb 2009) | 3 lines Changed paths: M /trunk/samtools/bam_index.c M /trunk/samtools/bamtk.c @@ -1859,14 +2243,14 @@ Changed paths: * fixed a bug in bam_fetch, caused by completely contained adjacent chunks ------------------------------------------------------------------------ -r128 | bhandsaker | 2009-02-18 19:06:57 +0000 (Wed, 18 Feb 2009) | 2 lines 
+r128 | bhandsaker | 2009-02-18 14:06:57 -0500 (Wed, 18 Feb 2009) | 2 lines Changed paths: M /trunk/samtools/bamtk.c Fix annoying segv when invalid region specified. ------------------------------------------------------------------------ -r127 | lh3lh3 | 2009-02-17 10:49:55 +0000 (Tue, 17 Feb 2009) | 2 lines +r127 | lh3lh3 | 2009-02-17 05:49:55 -0500 (Tue, 17 Feb 2009) | 2 lines Changed paths: D /trunk/samtools/misc/indel_filter.pl A /trunk/samtools/misc/samtools.pl @@ -1874,7 +2258,7 @@ Changed paths: * move indel_filter.pl to samtools.pl ------------------------------------------------------------------------ -r126 | lh3lh3 | 2009-02-14 21:22:30 +0000 (Sat, 14 Feb 2009) | 3 lines +r126 | lh3lh3 | 2009-02-14 16:22:30 -0500 (Sat, 14 Feb 2009) | 3 lines Changed paths: M /trunk/samtools/bam_mate.c M /trunk/samtools/bamtk.c @@ -1883,7 +2267,7 @@ Changed paths: * fixed a bug in fixmate: SE reads are flagged as BAM_FMUNMAP ------------------------------------------------------------------------ -r125 | lh3lh3 | 2009-02-13 09:54:45 +0000 (Fri, 13 Feb 2009) | 3 lines +r125 | lh3lh3 | 2009-02-13 04:54:45 -0500 (Fri, 13 Feb 2009) | 3 lines Changed paths: M /trunk/samtools/bam_stat.c M /trunk/samtools/bamtk.c @@ -1892,7 +2276,7 @@ Changed paths: * fixed a minor bug in flagstat ------------------------------------------------------------------------ -r124 | lh3lh3 | 2009-02-12 11:15:32 +0000 (Thu, 12 Feb 2009) | 3 lines +r124 | lh3lh3 | 2009-02-12 06:15:32 -0500 (Thu, 12 Feb 2009) | 3 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bamtk.c @@ -1902,7 +2286,7 @@ Changed paths: * improve indel caller by setting maximum window size ------------------------------------------------------------------------ -r123 | lh3lh3 | 2009-02-12 10:30:29 +0000 (Thu, 12 Feb 2009) | 2 lines +r123 | lh3lh3 | 2009-02-12 05:30:29 -0500 (Thu, 12 Feb 2009) | 2 lines Changed paths: M /trunk/samtools/bam_plcmd.c M /trunk/samtools/bamtk.c @@ -1910,14 +2294,14 @@ Changed paths: * 
output max mapping quality in indel line ------------------------------------------------------------------------ -r122 | lh3lh3 | 2009-02-11 10:59:10 +0000 (Wed, 11 Feb 2009) | 2 lines +r122 | lh3lh3 | 2009-02-11 05:59:10 -0500 (Wed, 11 Feb 2009) | 2 lines Changed paths: M /trunk/samtools/misc/maq2sam.c fixed a bug in generating tag AM ------------------------------------------------------------------------ -r121 | lh3lh3 | 2009-02-03 10:43:11 +0000 (Tue, 03 Feb 2009) | 2 lines +r121 | lh3lh3 | 2009-02-03 05:43:11 -0500 (Tue, 03 Feb 2009) | 2 lines Changed paths: M /trunk/samtools/bam_index.c M /trunk/samtools/bamtk.c @@ -1925,14 +2309,14 @@ Changed paths: fixed a potential memory problem in indexing ------------------------------------------------------------------------ -r120 | bhandsaker | 2009-02-02 15:52:52 +0000 (Mon, 02 Feb 2009) | 2 lines +r120 | bhandsaker | 2009-02-02 10:52:52 -0500 (Mon, 02 Feb 2009) | 2 lines Changed paths: M /trunk/samtools/Makefile Pass LIBS to recursive targets to facilitate building at Broad. 
------------------------------------------------------------------------ -r119 | lh3lh3 | 2009-02-02 10:12:15 +0000 (Mon, 02 Feb 2009) | 4 lines +r119 | lh3lh3 | 2009-02-02 05:12:15 -0500 (Mon, 02 Feb 2009) | 4 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/bam_plcmd.c @@ -1944,7 +2328,7 @@ Changed paths: * improve flagstat report a little bit ------------------------------------------------------------------------ -r118 | lh3lh3 | 2009-01-29 12:33:23 +0000 (Thu, 29 Jan 2009) | 3 lines +r118 | lh3lh3 | 2009-01-29 07:33:23 -0500 (Thu, 29 Jan 2009) | 3 lines Changed paths: M /trunk/samtools/Makefile A /trunk/samtools/bam_stat.c @@ -1954,7 +2338,7 @@ Changed paths: * added flagstat command ------------------------------------------------------------------------ -r116 | lh3lh3 | 2009-01-28 13:31:12 +0000 (Wed, 28 Jan 2009) | 2 lines +r116 | lh3lh3 | 2009-01-28 08:31:12 -0500 (Wed, 28 Jan 2009) | 2 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/NEWS @@ -1964,28 +2348,28 @@ Changed paths: Release SAMtools-0.1.2 ------------------------------------------------------------------------ -r115 | lh3lh3 | 2009-01-28 12:54:08 +0000 (Wed, 28 Jan 2009) | 2 lines +r115 | lh3lh3 | 2009-01-28 07:54:08 -0500 (Wed, 28 Jan 2009) | 2 lines Changed paths: A /trunk/samtools/misc/indel_filter.pl Script for filtering indel results ------------------------------------------------------------------------ -r114 | lh3lh3 | 2009-01-25 11:45:37 +0000 (Sun, 25 Jan 2009) | 2 lines +r114 | lh3lh3 | 2009-01-25 06:45:37 -0500 (Sun, 25 Jan 2009) | 2 lines Changed paths: A /trunk/samtools/misc/zoom2sam.pl convert ZOOM to SAM ------------------------------------------------------------------------ -r113 | lh3lh3 | 2009-01-24 14:25:07 +0000 (Sat, 24 Jan 2009) | 2 lines +r113 | lh3lh3 | 2009-01-24 09:25:07 -0500 (Sat, 24 Jan 2009) | 2 lines Changed paths: A /trunk/samtools/misc/novo2sam.pl add a script to convert novo alignment to SAM 
------------------------------------------------------------------------ -r112 | lh3lh3 | 2009-01-23 20:57:39 +0000 (Fri, 23 Jan 2009) | 2 lines +r112 | lh3lh3 | 2009-01-23 15:57:39 -0500 (Fri, 23 Jan 2009) | 2 lines Changed paths: M /trunk/samtools/ChangeLog M /trunk/samtools/ChangeLog.old @@ -1994,7 +2378,7 @@ Changed paths: update documentation and ChangeLog ------------------------------------------------------------------------ -r111 | lh3lh3 | 2009-01-23 19:22:59 +0000 (Fri, 23 Jan 2009) | 3 lines +r111 | lh3lh3 | 2009-01-23 14:22:59 -0500 (Fri, 23 Jan 2009) | 3 lines Changed paths: M /trunk/samtools/bam_sort.c M /trunk/samtools/bamtk.c @@ -2003,7 +2387,7 @@ Changed paths: * fixed a bug in "merge" command line ------------------------------------------------------------------------ -r110 | lh3lh3 | 2009-01-22 15:36:48 +0000 (Thu, 22 Jan 2009) | 3 lines +r110 | lh3lh3 | 2009-01-22 10:36:48 -0500 (Thu, 22 Jan 2009) | 3 lines Changed paths: M /trunk/samtools/misc/Makefile A /trunk/samtools/misc/bowtie2sam.pl (from /branches/dev/samtools/misc/bowtie2sam.pl:108) @@ -2016,7 +2400,7 @@ Changed paths: * all future development will happen here ------------------------------------------------------------------------ -r109 | lh3lh3 | 2009-01-22 15:14:27 +0000 (Thu, 22 Jan 2009) | 3 lines +r109 | lh3lh3 | 2009-01-22 10:14:27 -0500 (Thu, 22 Jan 2009) | 3 lines Changed paths: M /trunk/samtools/COPYING M /trunk/samtools/ChangeLog @@ -2053,7 +2437,7 @@ Changed paths: * all future development will happen here at trunk/ ------------------------------------------------------------------------ -r79 | bhandsaker | 2009-01-07 21:42:15 +0000 (Wed, 07 Jan 2009) | 2 lines +r79 | bhandsaker | 2009-01-07 16:42:15 -0500 (Wed, 07 Jan 2009) | 2 lines Changed paths: M /trunk/samtools/bam_maqcns.c M /trunk/samtools/bam_tview.c @@ -2061,14 +2445,14 @@ Changed paths: Fix problem with compiling without curses. 
------------------------------------------------------------------------ -r63 | lh3lh3 | 2008-12-22 15:58:02 +0000 (Mon, 22 Dec 2008) | 2 lines +r63 | lh3lh3 | 2008-12-22 10:58:02 -0500 (Mon, 22 Dec 2008) | 2 lines Changed paths: A /trunk/samtools (from /branches/dev/samtools:62) Create trunk copy ------------------------------------------------------------------------ -r62 | lh3lh3 | 2008-12-22 15:55:13 +0000 (Mon, 22 Dec 2008) | 2 lines +r62 | lh3lh3 | 2008-12-22 10:55:13 -0500 (Mon, 22 Dec 2008) | 2 lines Changed paths: A /branches/dev/samtools/NEWS M /branches/dev/samtools/bamtk.c @@ -2077,7 +2461,7 @@ Changed paths: Release samtools-0.1.1 ------------------------------------------------------------------------ -r61 | lh3lh3 | 2008-12-22 15:46:08 +0000 (Mon, 22 Dec 2008) | 10 lines +r61 | lh3lh3 | 2008-12-22 10:46:08 -0500 (Mon, 22 Dec 2008) | 10 lines Changed paths: M /branches/dev/samtools/bam_aux.c M /branches/dev/samtools/bam_index.c @@ -2098,7 +2482,7 @@ Changed paths: * prepare to release 0.1.1 ------------------------------------------------------------------------ -r60 | lh3lh3 | 2008-12-22 15:10:16 +0000 (Mon, 22 Dec 2008) | 2 lines +r60 | lh3lh3 | 2008-12-22 10:10:16 -0500 (Mon, 22 Dec 2008) | 2 lines Changed paths: A /branches/dev/samtools/examples A /branches/dev/samtools/examples/00README.txt @@ -2109,14 +2493,14 @@ Changed paths: example ------------------------------------------------------------------------ -r59 | lh3lh3 | 2008-12-22 09:38:15 +0000 (Mon, 22 Dec 2008) | 2 lines +r59 | lh3lh3 | 2008-12-22 04:38:15 -0500 (Mon, 22 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/ChangeLog update ChangeLog ------------------------------------------------------------------------ -r58 | lh3lh3 | 2008-12-20 23:06:00 +0000 (Sat, 20 Dec 2008) | 3 lines +r58 | lh3lh3 | 2008-12-20 18:06:00 -0500 (Sat, 20 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/misc/export2sam.pl @@ -2124,14 +2508,14 @@ Changed paths: * fixed several bugs 
------------------------------------------------------------------------ -r57 | lh3lh3 | 2008-12-20 15:44:20 +0000 (Sat, 20 Dec 2008) | 2 lines +r57 | lh3lh3 | 2008-12-20 10:44:20 -0500 (Sat, 20 Dec 2008) | 2 lines Changed paths: A /branches/dev/samtools/misc/export2sam.pl convert Export format to SAM; not thoroughly tested ------------------------------------------------------------------------ -r56 | lh3lh3 | 2008-12-19 22:13:28 +0000 (Fri, 19 Dec 2008) | 6 lines +r56 | lh3lh3 | 2008-12-19 17:13:28 -0500 (Fri, 19 Dec 2008) | 6 lines Changed paths: M /branches/dev/samtools/bam_import.c M /branches/dev/samtools/bam_plcmd.c @@ -2146,14 +2530,14 @@ Changed paths: * tview: fixed a minor bug ------------------------------------------------------------------------ -r55 | lh3lh3 | 2008-12-19 20:10:26 +0000 (Fri, 19 Dec 2008) | 2 lines +r55 | lh3lh3 | 2008-12-19 15:10:26 -0500 (Fri, 19 Dec 2008) | 2 lines Changed paths: D /branches/dev/samtools/misc/all2sam.pl remove all2sam.pl ------------------------------------------------------------------------ -r54 | lh3lh3 | 2008-12-16 22:34:25 +0000 (Tue, 16 Dec 2008) | 2 lines +r54 | lh3lh3 | 2008-12-16 17:34:25 -0500 (Tue, 16 Dec 2008) | 2 lines Changed paths: A /branches/dev/samtools/COPYING M /branches/dev/samtools/bam.h @@ -2166,7 +2550,7 @@ Changed paths: Added copyright information and a bit more documentation. No code change. 
------------------------------------------------------------------------ -r53 | lh3lh3 | 2008-12-16 13:40:18 +0000 (Tue, 16 Dec 2008) | 3 lines +r53 | lh3lh3 | 2008-12-16 08:40:18 -0500 (Tue, 16 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/bam.c M /branches/dev/samtools/bam.h @@ -2178,7 +2562,7 @@ Changed paths: * improved efficiency of the indel caller for spliced alignments ------------------------------------------------------------------------ -r52 | lh3lh3 | 2008-12-16 10:28:20 +0000 (Tue, 16 Dec 2008) | 3 lines +r52 | lh3lh3 | 2008-12-16 05:28:20 -0500 (Tue, 16 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/bam.c M /branches/dev/samtools/bam.h @@ -2190,7 +2574,7 @@ Changed paths: * a bit code cleanup: reduce the dependency between source files ------------------------------------------------------------------------ -r51 | lh3lh3 | 2008-12-15 14:29:32 +0000 (Mon, 15 Dec 2008) | 3 lines +r51 | lh3lh3 | 2008-12-15 09:29:32 -0500 (Mon, 15 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/bam_maqcns.c M /branches/dev/samtools/bam_plcmd.c @@ -2200,7 +2584,7 @@ Changed paths: * fixed a memory leak ------------------------------------------------------------------------ -r50 | lh3lh3 | 2008-12-15 14:00:13 +0000 (Mon, 15 Dec 2008) | 2 lines +r50 | lh3lh3 | 2008-12-15 09:00:13 -0500 (Mon, 15 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/ChangeLog M /branches/dev/samtools/bam.h @@ -2209,7 +2593,7 @@ Changed paths: update documentation, ChangeLog and a comment ------------------------------------------------------------------------ -r49 | lh3lh3 | 2008-12-15 13:36:43 +0000 (Mon, 15 Dec 2008) | 6 lines +r49 | lh3lh3 | 2008-12-15 08:36:43 -0500 (Mon, 15 Dec 2008) | 6 lines Changed paths: M /branches/dev/samtools/Makefile M /branches/dev/samtools/bam.h @@ -2227,7 +2611,7 @@ Changed paths: * updated documentation ------------------------------------------------------------------------ -r48 | lh3lh3 | 2008-12-12 13:55:36 
+0000 (Fri, 12 Dec 2008) | 3 lines +r48 | lh3lh3 | 2008-12-12 08:55:36 -0500 (Fri, 12 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/bam_maqcns.c M /branches/dev/samtools/bamtk.c @@ -2236,7 +2620,7 @@ Changed paths: * fixed another bug in maqcns when there is a nearby deletion ------------------------------------------------------------------------ -r47 | lh3lh3 | 2008-12-12 13:42:16 +0000 (Fri, 12 Dec 2008) | 5 lines +r47 | lh3lh3 | 2008-12-12 08:42:16 -0500 (Fri, 12 Dec 2008) | 5 lines Changed paths: M /branches/dev/samtools/bam_maqcns.c M /branches/dev/samtools/bam_pileup.c @@ -2248,7 +2632,7 @@ Changed paths: I am not quite sure why the previous version may have problem. ------------------------------------------------------------------------ -r46 | lh3lh3 | 2008-12-12 11:44:56 +0000 (Fri, 12 Dec 2008) | 6 lines +r46 | lh3lh3 | 2008-12-12 06:44:56 -0500 (Fri, 12 Dec 2008) | 6 lines Changed paths: M /branches/dev/samtools/bam_pileup.c M /branches/dev/samtools/bamtk.c @@ -2260,21 +2644,21 @@ Changed paths: bindings. ------------------------------------------------------------------------ -r45 | bhandsaker | 2008-12-11 20:43:56 +0000 (Thu, 11 Dec 2008) | 2 lines +r45 | bhandsaker | 2008-12-11 15:43:56 -0500 (Thu, 11 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/bgzf.c Fix bug in tell() after reads that consume to the exact end of a block. 
------------------------------------------------------------------------ -r44 | lh3lh3 | 2008-12-11 09:36:53 +0000 (Thu, 11 Dec 2008) | 2 lines +r44 | lh3lh3 | 2008-12-11 04:36:53 -0500 (Thu, 11 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/samtools.1 update manual ------------------------------------------------------------------------ -r43 | lh3lh3 | 2008-12-11 09:25:36 +0000 (Thu, 11 Dec 2008) | 4 lines +r43 | lh3lh3 | 2008-12-11 04:25:36 -0500 (Thu, 11 Dec 2008) | 4 lines Changed paths: M /branches/dev/samtools/bam_import.c M /branches/dev/samtools/bamtk.c @@ -2284,7 +2668,7 @@ Changed paths: * made the parser a bit more robust ------------------------------------------------------------------------ -r42 | lh3lh3 | 2008-12-10 14:57:29 +0000 (Wed, 10 Dec 2008) | 5 lines +r42 | lh3lh3 | 2008-12-10 09:57:29 -0500 (Wed, 10 Dec 2008) | 5 lines Changed paths: M /branches/dev/samtools/bam_index.c M /branches/dev/samtools/bamtk.c @@ -2296,14 +2680,14 @@ Changed paths: * in bam_index.c, check potential bugs in the underlying I/O library ------------------------------------------------------------------------ -r41 | lh3lh3 | 2008-12-10 12:53:08 +0000 (Wed, 10 Dec 2008) | 2 lines +r41 | lh3lh3 | 2008-12-10 07:53:08 -0500 (Wed, 10 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/samtools.1 update manual ------------------------------------------------------------------------ -r40 | lh3lh3 | 2008-12-10 11:52:10 +0000 (Wed, 10 Dec 2008) | 5 lines +r40 | lh3lh3 | 2008-12-10 06:52:10 -0500 (Wed, 10 Dec 2008) | 5 lines Changed paths: M /branches/dev/samtools/bam.h M /branches/dev/samtools/bam_pileup.c @@ -2315,7 +2699,7 @@ Changed paths: * made pileup take the reference sequence ------------------------------------------------------------------------ -r39 | lh3lh3 | 2008-12-09 11:59:28 +0000 (Tue, 09 Dec 2008) | 4 lines +r39 | lh3lh3 | 2008-12-09 06:59:28 -0500 (Tue, 09 Dec 2008) | 4 lines Changed paths: M /branches/dev/samtools/bam_import.c M 
/branches/dev/samtools/bamtk.c @@ -2326,35 +2710,35 @@ Changed paths: * in parser, correctl parse "=" at the MRNM field. ------------------------------------------------------------------------ -r38 | lh3lh3 | 2008-12-09 11:39:07 +0000 (Tue, 09 Dec 2008) | 2 lines +r38 | lh3lh3 | 2008-12-09 06:39:07 -0500 (Tue, 09 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/misc/maq2sam.c fixed a bug in handling maq flag 64 and 192 ------------------------------------------------------------------------ -r37 | lh3lh3 | 2008-12-09 09:53:46 +0000 (Tue, 09 Dec 2008) | 2 lines +r37 | lh3lh3 | 2008-12-09 04:53:46 -0500 (Tue, 09 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/misc/md5fa.c also calculate unordered md5sum check ------------------------------------------------------------------------ -r36 | lh3lh3 | 2008-12-09 09:46:21 +0000 (Tue, 09 Dec 2008) | 2 lines +r36 | lh3lh3 | 2008-12-09 04:46:21 -0500 (Tue, 09 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/misc/md5fa.c fixed a minor bug when there are space in the sequence ------------------------------------------------------------------------ -r35 | lh3lh3 | 2008-12-09 09:40:45 +0000 (Tue, 09 Dec 2008) | 2 lines +r35 | lh3lh3 | 2008-12-09 04:40:45 -0500 (Tue, 09 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/misc/md5fa.c fixed a potential memory leak ------------------------------------------------------------------------ -r34 | lh3lh3 | 2008-12-08 14:52:17 +0000 (Mon, 08 Dec 2008) | 2 lines +r34 | lh3lh3 | 2008-12-08 09:52:17 -0500 (Mon, 08 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/bam_import.c M /branches/dev/samtools/bam_index.c @@ -2363,14 +2747,14 @@ Changed paths: * fixed a bug in import: bin is wrongly calculated ------------------------------------------------------------------------ -r33 | lh3lh3 | 2008-12-08 14:08:01 +0000 (Mon, 08 Dec 2008) | 2 lines +r33 | lh3lh3 | 2008-12-08 09:08:01 -0500 (Mon, 08 Dec 2008) | 2 lines Changed paths: M 
/branches/dev/samtools/misc/all2sam.pl nothing, really ------------------------------------------------------------------------ -r32 | lh3lh3 | 2008-12-08 12:56:02 +0000 (Mon, 08 Dec 2008) | 3 lines +r32 | lh3lh3 | 2008-12-08 07:56:02 -0500 (Mon, 08 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/Makefile M /branches/dev/samtools/kseq.h @@ -2383,7 +2767,7 @@ Changed paths: * added md5sum utilities ------------------------------------------------------------------------ -r31 | lh3lh3 | 2008-12-08 11:35:29 +0000 (Mon, 08 Dec 2008) | 5 lines +r31 | lh3lh3 | 2008-12-08 06:35:29 -0500 (Mon, 08 Dec 2008) | 5 lines Changed paths: M /branches/dev/samtools/Makefile M /branches/dev/samtools/bam_import.c @@ -2397,7 +2781,7 @@ Changed paths: * also compile stand-alone faidx ------------------------------------------------------------------------ -r30 | lh3lh3 | 2008-12-08 11:17:04 +0000 (Mon, 08 Dec 2008) | 3 lines +r30 | lh3lh3 | 2008-12-08 06:17:04 -0500 (Mon, 08 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/bam.h M /branches/dev/samtools/bam_sort.c @@ -2407,7 +2791,7 @@ Changed paths: * sorting by read names is available ------------------------------------------------------------------------ -r29 | lh3lh3 | 2008-12-08 10:29:02 +0000 (Mon, 08 Dec 2008) | 3 lines +r29 | lh3lh3 | 2008-12-08 05:29:02 -0500 (Mon, 08 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/bam.c M /branches/dev/samtools/bam.h @@ -2423,7 +2807,7 @@ Changed paths: * format change to meet the latest specification ------------------------------------------------------------------------ -r28 | lh3lh3 | 2008-12-04 16:09:21 +0000 (Thu, 04 Dec 2008) | 3 lines +r28 | lh3lh3 | 2008-12-04 11:09:21 -0500 (Thu, 04 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/bam_maqcns.c M /branches/dev/samtools/misc/maq2sam.c @@ -2432,7 +2816,7 @@ Changed paths: * change maq2sam to meet the latest specification 
------------------------------------------------------------------------ -r27 | lh3lh3 | 2008-12-04 15:55:44 +0000 (Thu, 04 Dec 2008) | 2 lines +r27 | lh3lh3 | 2008-12-04 10:55:44 -0500 (Thu, 04 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/razf.c M /branches/dev/samtools/razf.h @@ -2440,7 +2824,7 @@ Changed paths: considerable code clean up in razf ------------------------------------------------------------------------ -r26 | lh3lh3 | 2008-12-04 15:08:18 +0000 (Thu, 04 Dec 2008) | 2 lines +r26 | lh3lh3 | 2008-12-04 10:08:18 -0500 (Thu, 04 Dec 2008) | 2 lines Changed paths: M /branches/dev/samtools/ChangeLog M /branches/dev/samtools/Makefile @@ -2449,7 +2833,7 @@ Changed paths: make RAZF optional in faidx.c ------------------------------------------------------------------------ -r25 | lh3lh3 | 2008-12-01 15:27:22 +0000 (Mon, 01 Dec 2008) | 3 lines +r25 | lh3lh3 | 2008-12-01 10:27:22 -0500 (Mon, 01 Dec 2008) | 3 lines Changed paths: M /branches/dev/samtools/Makefile M /branches/dev/samtools/bam.h @@ -2461,7 +2845,7 @@ Changed paths: * added routines for retrieving aux data, NOT TESTED YET! 
------------------------------------------------------------------------ -r24 | lh3lh3 | 2008-12-01 14:29:43 +0000 (Mon, 01 Dec 2008) | 5 lines +r24 | lh3lh3 | 2008-12-01 09:29:43 -0500 (Mon, 01 Dec 2008) | 5 lines Changed paths: M /branches/dev/samtools/bam.c M /branches/dev/samtools/bam_import.c @@ -2476,7 +2860,7 @@ Changed paths: * supporting hex strings ------------------------------------------------------------------------ -r23 | lh3lh3 | 2008-11-27 17:14:37 +0000 (Thu, 27 Nov 2008) | 3 lines +r23 | lh3lh3 | 2008-11-27 12:14:37 -0500 (Thu, 27 Nov 2008) | 3 lines Changed paths: M /branches/dev/samtools/bam_maqcns.c M /branches/dev/samtools/bamtk.c @@ -2485,7 +2869,7 @@ Changed paths: * fixed the bug in maqcns ------------------------------------------------------------------------ -r22 | lh3lh3 | 2008-11-27 17:08:11 +0000 (Thu, 27 Nov 2008) | 3 lines +r22 | lh3lh3 | 2008-11-27 12:08:11 -0500 (Thu, 27 Nov 2008) | 3 lines Changed paths: M /branches/dev/samtools/Makefile M /branches/dev/samtools/bam.h @@ -2499,7 +2883,7 @@ Changed paths: * add MAQ consensus caller, currently BUGGY! 
------------------------------------------------------------------------ -r21 | lh3lh3 | 2008-11-27 13:51:28 +0000 (Thu, 27 Nov 2008) | 4 lines +r21 | lh3lh3 | 2008-11-27 08:51:28 -0500 (Thu, 27 Nov 2008) | 4 lines Changed paths: M /branches/dev/samtools/bam_pileup.c M /branches/dev/samtools/bam_tview.c @@ -2510,14 +2894,14 @@ Changed paths: * better coordinates and reference sequence ------------------------------------------------------------------------ -r19 | lh3lh3 | 2008-11-27 09:26:05 +0000 (Thu, 27 Nov 2008) | 2 lines +r19 | lh3lh3 | 2008-11-27 04:26:05 -0500 (Thu, 27 Nov 2008) | 2 lines Changed paths: A /branches/dev/samtools/ChangeLog new ChangeLog ------------------------------------------------------------------------ -r18 | lh3lh3 | 2008-11-27 09:24:45 +0000 (Thu, 27 Nov 2008) | 3 lines +r18 | lh3lh3 | 2008-11-27 04:24:45 -0500 (Thu, 27 Nov 2008) | 3 lines Changed paths: D /branches/dev/samtools/ChangeLog A /branches/dev/samtools/ChangeLog.old (from /branches/dev/samtools/ChangeLog:6) @@ -2526,7 +2910,7 @@ Rename ChangeLog to ChangeLog.old. This old ChangeLog is generated from the log of my personal SVN repository. 
------------------------------------------------------------------------ -r17 | lh3lh3 | 2008-11-27 09:22:55 +0000 (Thu, 27 Nov 2008) | 6 lines +r17 | lh3lh3 | 2008-11-27 04:22:55 -0500 (Thu, 27 Nov 2008) | 6 lines Changed paths: M /branches/dev/samtools/Makefile M /branches/dev/samtools/bamtk.c @@ -2539,7 +2923,7 @@ Changed paths: * use BGZF by default, now ------------------------------------------------------------------------ -r16 | lh3lh3 | 2008-11-26 21:19:11 +0000 (Wed, 26 Nov 2008) | 4 lines +r16 | lh3lh3 | 2008-11-26 16:19:11 -0500 (Wed, 26 Nov 2008) | 4 lines Changed paths: M /branches/dev/samtools/bam_index.c M /branches/dev/samtools/bamtk.c @@ -2550,14 +2934,14 @@ Changed paths: * give more warnings when the file is truncated (or due to bugs in I/O library) ------------------------------------------------------------------------ -r15 | lh3lh3 | 2008-11-26 20:41:39 +0000 (Wed, 26 Nov 2008) | 2 lines +r15 | lh3lh3 | 2008-11-26 15:41:39 -0500 (Wed, 26 Nov 2008) | 2 lines Changed paths: M /branches/dev/samtools/bgzf.c fixed a bug in bgzf.c at the end of the file ------------------------------------------------------------------------ -r14 | lh3lh3 | 2008-11-26 17:05:18 +0000 (Wed, 26 Nov 2008) | 4 lines +r14 | lh3lh3 | 2008-11-26 12:05:18 -0500 (Wed, 26 Nov 2008) | 4 lines Changed paths: M /branches/dev/samtools/bamtk.c @@ -2566,14 +2950,14 @@ Changed paths: also update the version number anyway to avoid confusion ------------------------------------------------------------------------ -r13 | lh3lh3 | 2008-11-26 17:03:48 +0000 (Wed, 26 Nov 2008) | 2 lines +r13 | lh3lh3 | 2008-11-26 12:03:48 -0500 (Wed, 26 Nov 2008) | 2 lines Changed paths: M /branches/dev/samtools/razf.c a change from Jue, but I think it should not matter ------------------------------------------------------------------------ -r12 | lh3lh3 | 2008-11-26 16:48:14 +0000 (Wed, 26 Nov 2008) | 3 lines +r12 | lh3lh3 | 2008-11-26 11:48:14 -0500 (Wed, 26 Nov 2008) | 3 lines Changed paths: M 
/branches/dev/samtools/razf.c @@ -2581,21 +2965,21 @@ fixed a potential bug in razf. However, it seems still buggy, just rarely happens, very rarely. ------------------------------------------------------------------------ -r11 | lh3lh3 | 2008-11-26 14:02:56 +0000 (Wed, 26 Nov 2008) | 2 lines +r11 | lh3lh3 | 2008-11-26 09:02:56 -0500 (Wed, 26 Nov 2008) | 2 lines Changed paths: M /branches/dev/samtools/razf.c fixed a bug in razf, with the help of Jue ------------------------------------------------------------------------ -r10 | lh3lh3 | 2008-11-26 11:55:32 +0000 (Wed, 26 Nov 2008) | 2 lines +r10 | lh3lh3 | 2008-11-26 06:55:32 -0500 (Wed, 26 Nov 2008) | 2 lines Changed paths: M /branches/dev/samtools/bam_index.c remove a comment ------------------------------------------------------------------------ -r9 | lh3lh3 | 2008-11-26 11:37:05 +0000 (Wed, 26 Nov 2008) | 2 lines +r9 | lh3lh3 | 2008-11-26 06:37:05 -0500 (Wed, 26 Nov 2008) | 2 lines Changed paths: M /branches/dev/samtools/Makefile M /branches/dev/samtools/bam.h @@ -2605,14 +2989,14 @@ Changed paths: * Jue has updated razf to realize Bob's scheme ------------------------------------------------------------------------ -r7 | lh3lh3 | 2008-11-25 20:37:37 +0000 (Tue, 25 Nov 2008) | 2 lines +r7 | lh3lh3 | 2008-11-25 15:37:37 -0500 (Tue, 25 Nov 2008) | 2 lines Changed paths: A /branches/dev/samtools/samtools.1 the manual page ------------------------------------------------------------------------ -r6 | lh3lh3 | 2008-11-25 20:37:16 +0000 (Tue, 25 Nov 2008) | 3 lines +r6 | lh3lh3 | 2008-11-25 15:37:16 -0500 (Tue, 25 Nov 2008) | 3 lines Changed paths: A /branches/dev/samtools/ChangeLog A /branches/dev/samtools/Makefile @@ -2648,7 +3032,7 @@ The initial version of samtools, replicated from my local SVN repository. The current version is: 0.1.0-42. All future development will happen here. 
------------------------------------------------------------------------ -r5 | lh3lh3 | 2008-11-25 20:30:49 +0000 (Tue, 25 Nov 2008) | 2 lines +r5 | lh3lh3 | 2008-11-25 15:30:49 -0500 (Tue, 25 Nov 2008) | 2 lines Changed paths: A /branches/dev/samtools diff --git a/Makefile b/Makefile index 450b3ab..246ac5c 100644 --- a/Makefile +++ b/Makefile @@ -3,10 +3,10 @@ CFLAGS= -g -Wall -O2 #-m64 #-arch ppc DFLAGS= -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE -D_CURSES_LIB=1 LOBJS= bgzf.o kstring.o bam_aux.o bam.o bam_import.o sam.o bam_index.o \ bam_pileup.o bam_lpileup.o bam_md.o glf.o razf.o faidx.o knetfile.o \ - bam_sort.o + bam_sort.o sam_header.o AOBJS= bam_tview.o bam_maqcns.o bam_plcmd.o sam_view.o \ bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o \ - bamtk.o + bamtk.o kaln.o PROG= samtools INCLUDES= SUBDIRS= . misc @@ -45,7 +45,7 @@ bgzip:bgzip.o bgzf.o $(CC) $(CFLAGS) -o $@ bgzf.o bgzip.o -lz razip.o:razf.h -bam.o:bam.h razf.h bam_endian.h kstring.h +bam.o:bam.h razf.h bam_endian.h kstring.h sam_header.h sam.o:sam.h bam.h bam_import.o:bam.h kseq.h khash.h razf.h bam_pileup.o:bam.h razf.h ksort.h @@ -57,6 +57,7 @@ bam_maqcns.o:bam.h ksort.h bam_maqcns.h bam_sort.o:bam.h ksort.h razf.h bam_md.o:bam.h faidx.h glf.o:glf.h +sam_header.o:sam_header.h khash.h faidx.o:faidx.h razf.h khash.h faidx_main.o:faidx.h razf.h diff --git a/Makefile.mingw b/Makefile.mingw index 80e8009..eb6ed47 100644 --- a/Makefile.mingw +++ b/Makefile.mingw @@ -1,7 +1,7 @@ CC= gcc.exe AR= ar.exe CFLAGS= -g -Wall -O2 -DFLAGS= -D_FILE_OFFSET_BITS=64 -D_CURSES_LIB=2 -D_USE_KNETFILE +DFLAGS= -D_CURSES_LIB=2 -D_USE_KNETFILE LOBJS= bgzf.o kstring.o bam_aux.o bam.o bam_import.o sam.o bam_index.o \ bam_pileup.o bam_lpileup.o bam_md.o glf.o razf.o faidx.o bam_sort.o \ knetfile.o diff --git a/NEWS b/NEWS index 8e0ba35..8db0996 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,43 @@ +Beta Release 0.1.7 (10 November, 2009) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Notable changes: + + * Improved the indel 
caller in complex scenarios, in particular for + long reads. The indel caller is now able to make reasonable indel + calls from Craig Venter capillary reads. + + * Rewrote single-end duplicate removal with improved + performance. Paired-end reads are not touched. + + * Duplicate removal is now library aware. Samtools removes potential + PCR/optical duplicates inside a library rather than across libraries. + + * SAM header is now fully parsed, although this functionality is not + used in merging and so on. + + * In samtools merge, optionally take the input file name as RG-ID and + attach the RG tag to each alignment. + + * Added FTP support in the RAZF library. RAZF-compressed reference + sequence can be retrieved remotely. + + * Improved network support for Win32. + + * Samtools sort and merge are now stable. + +Changes in other utilities: + + * Implemented sam2vcf.pl that converts the pileup format to the VCF + format. + + * This release of samtools is known to work with the latest + Bio-Samtools Perl module. 
+ +(0.1.7: 10 November 2009, r510) + + + Beta Release 0.1.6 (2 September, 2009) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/bam.c b/bam.c index 619b46a..ee7642b 100644 --- a/bam.c +++ b/bam.c @@ -1,9 +1,11 @@ #include #include +#include #include #include "bam.h" #include "bam_endian.h" #include "kstring.h" +#include "sam_header.h" int bam_is_be = 0; char *bam_flag2char_table = "pPuUrR12sfd\0\0\0\0\0"; @@ -12,28 +14,6 @@ char *bam_flag2char_table = "pPuUrR12sfd\0\0\0\0\0"; * CIGAR related routines * **************************/ -int bam_segreg(int32_t pos, const bam1_core_t *c, const uint32_t *cigar, bam_segreg_t *reg) -{ - unsigned k; - int32_t x = c->pos, y = 0; - int state = 0; - for (k = 0; k < c->n_cigar; ++k) { - int op = cigar[k] & BAM_CIGAR_MASK; // operation - int l = cigar[k] >> BAM_CIGAR_SHIFT; // length - if (state == 0 && (op == BAM_CMATCH || op == BAM_CDEL || op == BAM_CINS) && x + l > pos) { - reg->tbeg = x; reg->qbeg = y; reg->cbeg = k; - state = 1; - } - if (op == BAM_CMATCH) { x += l; y += l; } - else if (op == BAM_CDEL || op == BAM_CREF_SKIP) x += l; - else if (op == BAM_CINS || op == BAM_CSOFT_CLIP) y += l; - if (state == 1 && (op == BAM_CSOFT_CLIP || op == BAM_CHARD_CLIP || op == BAM_CREF_SKIP || k == c->n_cigar - 1)) { - reg->tend = x; reg->qend = y; reg->cend = k; - } - } - return state? 
0 : -1; -} - uint32_t bam_calend(const bam1_core_t *c, const uint32_t *cigar) { uint32_t k, end; @@ -80,10 +60,9 @@ void bam_header_destroy(bam_header_t *header) free(header->target_len); } free(header->text); -#ifndef BAM_NO_HASH - if (header->rg2lib) bam_strmap_destroy(header->rg2lib); + if (header->dict) sam_header_free(header->dict); + if (header->rg2lib) sam_tbl_destroy(header->rg2lib); bam_destroy_header_hash(header); -#endif free(header); } @@ -94,7 +73,11 @@ bam_header_t *bam_header_read(bamFile fp) int32_t i = 1, name_len; // check EOF i = bgzf_check_EOF(fp); - if (i < 0) fprintf(stderr, "[bam_header_read] read from pipe; skip EOF checking.\n"); + if (i < 0) { + // If the file is a pipe, checking the EOF marker will *always* fail + // with ESPIPE. Suppress the error message in this case. + if (errno != ESPIPE) perror("[bam_header_read] bgzf_check_EOF"); + } else if (i == 0) fprintf(stderr, "[bam_header_read] EOF marker is absent.\n"); // read "BAM1" if (bam_read(fp, buf, 4) != 4) return 0; @@ -308,3 +291,13 @@ void bam_view1(const bam_header_t *header, const bam1_t *b) printf("%s\n", s); free(s); } + +// FIXME: we should also check the LB tag associated with each alignment +const char *bam_get_library(bam_header_t *h, const bam1_t *b) +{ + const uint8_t *rg; + if (h->dict == 0) h->dict = sam_header_parse2(h->text); + if (h->rg2lib == 0) h->rg2lib = sam_header2tbl(h->dict, "RG", "ID", "LB"); + rg = bam_aux_get(b, "RG"); + return (rg == 0)? 
0 : sam_tbl_get(h->rg2lib, (const char*)(rg + 1)); +} diff --git a/bam.h b/bam.h index ec983df..291b303 100644 --- a/bam.h +++ b/bam.h @@ -73,6 +73,7 @@ typedef gzFile bamFile; @field n_targets number of reference sequences @field target_name names of the reference sequences @field target_len lengths of the referene sequences + @field dict header dictionary @field hash hash table for fast name lookup @field rg2lib hash table for @RG-ID -> LB lookup @field l_text length of the plain text in the header @@ -85,7 +86,7 @@ typedef struct { int32_t n_targets; char **target_name; uint32_t *target_len; - void *hash, *rg2lib; + void *dict, *hash, *rg2lib; int l_text; char *text; } bam_header_t; @@ -423,8 +424,8 @@ extern "C" { @abstract Free the memory allocated for an alignment. @param b pointer to an alignment */ -#define bam_destroy1(b) do { \ - free((b)->data); free(b); \ +#define bam_destroy1(b) do { \ + if (b) { free((b)->data); free(b); } \ } while (0) /*! @@ -437,6 +438,8 @@ extern "C" { char *bam_format1_core(const bam_header_t *header, const bam1_t *b, int of); + const char *bam_get_library(bam_header_t *header, const bam1_t *b); + /*! @typedef @abstract Structure for one alignment covering the pileup position. 
@field b pointer to the alignment @@ -632,14 +635,6 @@ extern "C" { */ int32_t bam_cigar2qlen(const bam1_core_t *c, const uint32_t *cigar); - typedef struct { - int32_t qbeg, qend; - int32_t tbeg, tend; - int32_t cbeg, cend; - } bam_segreg_t; - - int bam_segreg(int32_t pos, const bam1_core_t *c, const uint32_t *cigar, bam_segreg_t *reg); - #ifdef __cplusplus } #endif diff --git a/bam_aux.c b/bam_aux.c index d0d733f..89e99f2 100644 --- a/bam_aux.c +++ b/bam_aux.c @@ -180,70 +180,3 @@ char *bam_aux2Z(const uint8_t *s) if (type == 'Z' || type == 'H') return (char*)s; else return 0; } - -/****************** - * rg2lib related * - ******************/ - -int bam_strmap_put(void *rg2lib, const char *rg, const char *lib) -{ - int ret; - khint_t k; - khash_t(r2l) *h = (khash_t(r2l)*)rg2lib; - char *key; - if (h == 0) return 1; - key = strdup(rg); - k = kh_put(r2l, h, key, &ret); - if (ret) kh_val(h, k) = strdup(lib); - else { - fprintf(stderr, "[bam_rg2lib_put] duplicated @RG ID: %s\n", rg); - free(key); - } - return 0; -} - -const char *bam_strmap_get(const void *rg2lib, const char *rg) -{ - const khash_t(r2l) *h = (const khash_t(r2l)*)rg2lib; - khint_t k; - if (h == 0) return 0; - k = kh_get(r2l, h, rg); - if (k != kh_end(h)) return (const char*)kh_val(h, k); - else return 0; -} - -void *bam_strmap_dup(const void *rg2lib) -{ - const khash_t(r2l) *h = (const khash_t(r2l)*)rg2lib; - khash_t(r2l) *g; - khint_t k, l; - int ret; - if (h == 0) return 0; - g = kh_init(r2l); - for (k = kh_begin(h); k < kh_end(h); ++k) { - if (kh_exist(h, k)) { - char *key = strdup(kh_key(h, k)); - l = kh_put(r2l, g, key, &ret); - kh_val(g, l) = strdup(kh_val(h, k)); - } - } - return g; -} - -void *bam_strmap_init() -{ - return (void*)kh_init(r2l); -} - -void bam_strmap_destroy(void *rg2lib) -{ - khash_t(r2l) *h = (khash_t(r2l)*)rg2lib; - khint_t k; - if (h == 0) return; - for (k = kh_begin(h); k < kh_end(h); ++k) { - if (kh_exist(h, k)) { - free((char*)kh_key(h, k)); free(kh_val(h, k)); - } - } - 
kh_destroy(r2l, h); -} diff --git a/bam_import.c b/bam_import.c index 1dc906e..9d463d1 100644 --- a/bam_import.c +++ b/bam_import.c @@ -10,6 +10,7 @@ #endif #include "kstring.h" #include "bam.h" +#include "sam_header.h" #include "kseq.h" #include "khash.h" @@ -170,107 +171,26 @@ static inline void append_text(bam_header_t *header, kstring_t *str) header->text[header->l_text] = 0; } -int sam_header_parse_rg(bam_header_t *h) -{ - kstring_t *rgid, *rglib; - char *p, *q, *s, *r; - int n = 0; - - // free - if (h == 0) return 0; - bam_strmap_destroy(h->rg2lib); h->rg2lib = 0; - if (h->l_text < 3) return 0; - // parse @RG lines - h->rg2lib = bam_strmap_init(); - rgid = calloc(1, sizeof(kstring_t)); - rglib = calloc(1, sizeof(kstring_t)); - s = h->text; - while ((s = strstr(s, "@RG")) != 0) { - if (rgid->l && rglib->l) { - bam_strmap_put(h->rg2lib, rgid->s, rglib->s); - ++n; - } - rgid->l = rglib->l = 0; - s += 3; - r = s; - if ((p = strstr(s, "ID:")) != 0) { - q = p + 3; - for (p = q; *p && *p != '\t' && *p != '\r' && *p != '\n'; ++p); - kputsn(q, p - q, rgid); - } else { - fprintf(stderr, "[bam_header_parse] missing ID tag in @RG lines.\n"); - break; - } - if (r < p) r = p; - if ((p = strstr(s, "LB:")) != 0) { - q = p + 3; - for (p = q; *p && *p != '\t' && *p != '\r' && *p != '\n'; ++p); - kputsn(q, p - q, rglib); - } else { - fprintf(stderr, "[bam_header_parse] missing LB tag in @RG lines.\n"); - break; - } - if (r < p) r = p; - s = r + 3; - } - if (rgid->l && rglib->l) { - bam_strmap_put(h->rg2lib, rgid->s, rglib->s); - ++n; - } - free(rgid->s); free(rgid); - free(rglib->s); free(rglib); - if (n == 0) { - bam_strmap_destroy(h->rg2lib); - h->rg2lib = 0; - } - return n; -} - int sam_header_parse(bam_header_t *h) { + char **tmp; int i; - char *s, *p, *q, *r; - - // free free(h->target_len); free(h->target_name); h->n_targets = 0; h->target_len = 0; h->target_name = 0; if (h->l_text < 3) return 0; - // count number of @SQ - s = h->text; - while ((s = strstr(s, "@SQ")) != 
0) { - ++h->n_targets; - s += 3; - } + if (h->dict == 0) h->dict = sam_header_parse2(h->text); + tmp = sam_header2list(h->dict, "SQ", "SN", &h->n_targets); if (h->n_targets == 0) return 0; - h->target_len = (uint32_t*)calloc(h->n_targets, 4); - h->target_name = (char**)calloc(h->n_targets, sizeof(void*)); - // parse @SQ lines - i = 0; - s = h->text; - while ((s = strstr(s, "@SQ")) != 0) { - s += 3; - r = s; - if ((p = strstr(s, "SN:")) != 0) { - q = p + 3; - for (p = q; *p && *p != '\t' && *p != '\r' && *p != '\n'; ++p); - h->target_name[i] = (char*)calloc(p - q + 1, 1); - strncpy(h->target_name[i], q, p - q); - } else goto header_err_ret; - if (r < p) r = p; - if ((p = strstr(s, "LN:")) != 0) h->target_len[i] = strtol(p + 3, 0, 10); - else goto header_err_ret; - if (r < p) r = p; - s = r + 3; - ++i; - } - sam_header_parse_rg(h); + h->target_name = calloc(h->n_targets, sizeof(void*)); + for (i = 0; i < h->n_targets; ++i) + h->target_name[i] = strdup(tmp[i]); + free(tmp); + tmp = sam_header2list(h->dict, "SQ", "LN", &h->n_targets); + h->target_len = calloc(h->n_targets, 4); + for (i = 0; i < h->n_targets; ++i) + h->target_len[i] = atoi(tmp[i]); + free(tmp); return h->n_targets; - -header_err_ret: - fprintf(stderr, "[bam_header_parse] missing SN or LN tag in @SQ lines.\n"); - free(h->target_len); free(h->target_name); - h->n_targets = 0; h->target_len = 0; h->target_name = 0; - return 0; } bam_header_t *sam_header_read(tamFile fp) diff --git a/bam_index.c b/bam_index.c index 4ff6bd4..a627884 100644 --- a/bam_index.c +++ b/bam_index.c @@ -476,7 +476,8 @@ static inline int is_overlap(uint32_t beg, uint32_t end, const bam1_t *b) return (rend > beg && rbeg < end); } -int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func) +// bam_fetch helper function retrieves +pair64_t * get_chunk_coordinates(const bam_index_t *idx, int tid, int beg, int end, int* cnt_off) { uint16_t *bins; int i, n_bins, n_off; @@ -507,10 +508,8 @@ 
int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, voi } free(bins); { - bam1_t *b; - int l, ret, n_seeks; - uint64_t curr_off; - b = (bam1_t*)calloc(1, sizeof(bam1_t)); + bam1_t *b = (bam1_t*)calloc(1, sizeof(bam1_t)); + int l; ks_introsort(off, n_off, off); // resolve completely contained adjacent blocks for (i = 1, l = 0; i < n_off; ++i) @@ -533,8 +532,23 @@ int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, voi n_off = l + 1; #endif } + bam_destroy1(b); + } + *cnt_off = n_off; + return off; +} + +int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func) +{ + int n_off; + pair64_t *off = get_chunk_coordinates(idx, tid, beg, end, &n_off); + if (off == 0) return 0; + { // retrive alignments + uint64_t curr_off; + int i, ret, n_seeks; n_seeks = 0; i = -1; curr_off = 0; + bam1_t *b = (bam1_t*)calloc(1, sizeof(bam1_t)); for (;;) { if (curr_off == 0 || curr_off >= off[i].v) { // then jump to the next chunk if (i == n_off - 1) break; // no more chunks diff --git a/bam_maqcns.c b/bam_maqcns.c index f36b0ee..71c2185 100644 --- a/bam_maqcns.c +++ b/bam_maqcns.c @@ -1,10 +1,13 @@ #include +#include #include "bam.h" #include "bam_maqcns.h" #include "ksort.h" +#include "kaln.h" KSORT_INIT_GENERIC(uint32_t) -#define MAX_WINDOW 33 +#define INDEL_WINDOW_SIZE 50 +#define INDEL_EXT_DEP 0.9 typedef struct __bmc_aux_t { int max; @@ -22,14 +25,13 @@ char bam_nt16_nt4_table[] = { 4, 0, 1, 4, 2, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4 }; /* P() = \theta \sum_{i=1}^{N-1} 1/i P(D|) = \sum_{k=1}^{N-1} p_k 1/2 [(k/N)^n_2(1-k/N)^n_1 + (k/N)^n1(1-k/N)^n_2] - p_k = i/k / \sum_{i=1}^{N-1} 1/i + p_k = 1/k / \sum_{i=1}^{N-1} 1/i */ static void cal_het(bam_maqcns_t *aa) { int k, n1, n2; double sum_harmo; // harmonic sum double poly_rate; - double p1 = 0.0, p3 = 0.0; // just for testing free(aa->lhet); aa->lhet = (double*)calloc(256 * 256, sizeof(double)); @@ -39,7 +41,7 @@ static void 
cal_het(bam_maqcns_t *aa) for (n1 = 0; n1 < 256; ++n1) { for (n2 = 0; n2 < 256; ++n2) { long double sum = 0.0; - double lC = lgamma(n1+n2+1) - lgamma(n1+1) - lgamma(n2+1); // \binom{n1+n2}{n1} + double lC = aa->is_soap? 0 : lgamma(n1+n2+1) - lgamma(n1+1) - lgamma(n2+1); // \binom{n1+n2}{n1} for (k = 1; k <= aa->n_hap - 1; ++k) { double pk = 1.0 / k / sum_harmo; double log1 = log((double)k/aa->n_hap); @@ -47,8 +49,6 @@ static void cal_het(bam_maqcns_t *aa) sum += pk * 0.5 * (expl(log1*n2) * expl(log2*n1) + expl(log1*n1) * expl(log2*n2)); } aa->lhet[n1<<8|n2] = lC + logl(sum); - if (n1 == 17 && n2 == 3) p3 = lC + logl(expl(logl(0.5) * 20)); - if (n1 == 19 && n2 == 1) p1 = lC + logl(expl(logl(0.5) * 20)); } } poly_rate = aa->het_rate * sum_harmo; @@ -62,16 +62,18 @@ static void cal_coef(bam_maqcns_t *aa) long double sum_a[257], b[256], q_c[256], tmp[256], fk2[256]; double *lC; - lC = (double*)calloc(256 * 256, sizeof(double)); // aa->lhet will be allocated and initialized free(aa->fk); free(aa->coef); + aa->coef = 0; aa->fk = (double*)calloc(256, sizeof(double)); - aa->coef = (double*)calloc(256*256*64, sizeof(double)); aa->fk[0] = fk2[0] = 1.0; for (n = 1; n != 256; ++n) { aa->fk[n] = pow(aa->theta, n) * (1.0 - aa->eta) + aa->eta; fk2[n] = aa->fk[n>>1]; // this is an approximation, assuming reads equally likely come from both strands } + if (aa->is_soap) return; + aa->coef = (double*)calloc(256*256*64, sizeof(double)); + lC = (double*)calloc(256 * 256, sizeof(double)); for (n = 1; n != 256; ++n) for (k = 1; k <= n; ++k) lC[n<<8|k] = lgamma(n+1) - lgamma(k+1) - lgamma(n-k+1); @@ -170,7 +172,7 @@ glf1_t *bam_maqcns_glfgen(int _n, const bam_pileup1_t *pl, uint8_t ref_base, bam if (w[k] < 0xff) ++w[k]; ++b->c[k&3]; } - tmp = (int)(info&0x7f) < bm->cap_mapQ? (int)(info&0x7f) : bm->cap_mapQ; + tmp = (int)(info&0xff) < bm->cap_mapQ? 
(int)(info&0xff) : bm->cap_mapQ; rms += tmp * tmp; } b->rms_mapQ = (uint8_t)(sqrt((double)rms / n) + .499); @@ -180,56 +182,73 @@ glf1_t *bam_maqcns_glfgen(int _n, const bam_pileup1_t *pl, uint8_t ref_base, bam for (j = 0; j != 4; ++j) b->c[j] = (int)(254.0 * b->c[j] / c + 0.5); for (j = c = 0; j != 4; ++j) c += b->c[j]; } - // generate likelihood - for (j = 0; j != 4; ++j) { - // homozygous - float tmp1, tmp3; - int tmp2, bar_e; - for (k = 0, tmp1 = tmp3 = 0.0, tmp2 = 0; k != 4; ++k) { - if (j == k) continue; - tmp1 += b->esum[k]; tmp2 += b->c[k]; tmp3 += b->fsum[k]; - } - if (tmp2) { - bar_e = (int)(tmp1 / tmp3 + 0.5); - if (bar_e < 4) bar_e = 4; // should not happen - if (bar_e > 63) bar_e = 63; - p[j<<2|j] = tmp1 + bm->coef[bar_e<<16|c<<8|tmp2]; - } else p[j<<2|j] = 0.0; // all the bases are j - // heterozygous - for (k = j + 1; k < 4; ++k) { - for (i = 0, tmp2 = 0, tmp1 = tmp3 = 0.0; i != 4; ++i) { - if (i == j || i == k) continue; - tmp1 += b->esum[i]; tmp2 += b->c[i]; tmp3 += b->fsum[i]; + if (!bm->is_soap) { + // generate likelihood + for (j = 0; j != 4; ++j) { + // homozygous + float tmp1, tmp3; + int tmp2, bar_e; + for (k = 0, tmp1 = tmp3 = 0.0, tmp2 = 0; k != 4; ++k) { + if (j == k) continue; + tmp1 += b->esum[k]; tmp2 += b->c[k]; tmp3 += b->fsum[k]; } if (tmp2) { bar_e = (int)(tmp1 / tmp3 + 0.5); - if (bar_e < 4) bar_e = 4; + if (bar_e < 4) bar_e = 4; // should not happen if (bar_e > 63) bar_e = 63; - p[j<<2|k] = p[k<<2|j] = -4.343 * bm->lhet[b->c[j]<<8|b->c[k]] + tmp1 + bm->coef[bar_e<<16|c<<8|tmp2]; - } else p[j<<2|k] = p[k<<2|j] = -4.343 * bm->lhet[b->c[j]<<8|b->c[k]]; // all the bases are either j or k + p[j<<2|j] = tmp1 + bm->coef[bar_e<<16|c<<8|tmp2]; + } else p[j<<2|j] = 0.0; // all the bases are j + // heterozygous + for (k = j + 1; k < 4; ++k) { + for (i = 0, tmp2 = 0, tmp1 = tmp3 = 0.0; i != 4; ++i) { + if (i == j || i == k) continue; + tmp1 += b->esum[i]; tmp2 += b->c[i]; tmp3 += b->fsum[i]; + } + if (tmp2) { + bar_e = (int)(tmp1 / tmp3 + 
0.5); + if (bar_e < 4) bar_e = 4; + if (bar_e > 63) bar_e = 63; + p[j<<2|k] = p[k<<2|j] = -4.343 * bm->lhet[b->c[j]<<8|b->c[k]] + tmp1 + bm->coef[bar_e<<16|c<<8|tmp2]; + } else p[j<<2|k] = p[k<<2|j] = -4.343 * bm->lhet[b->c[j]<<8|b->c[k]]; // all the bases are either j or k + } + // + for (k = 0; k != 4; ++k) + if (p[j<<2|k] < 0.0) p[j<<2|k] = 0.0; } - // - for (k = 0; k != 4; ++k) - if (p[j<<2|k] < 0.0) p[j<<2|k] = 0.0; - } - { // fix p[k<<2|k] - float max1, max2, min1, min2; - int max_k, min_k; - max_k = min_k = -1; - max1 = max2 = -1.0; min1 = min2 = 1e30; - for (k = 0; k < 4; ++k) { - if (b->esum[k] > max1) { - max2 = max1; max1 = b->esum[k]; max_k = k; - } else if (b->esum[k] > max2) max2 = b->esum[k]; + { // fix p[k<<2|k] + float max1, max2, min1, min2; + int max_k, min_k; + max_k = min_k = -1; + max1 = max2 = -1.0; min1 = min2 = 1e30; + for (k = 0; k < 4; ++k) { + if (b->esum[k] > max1) { + max2 = max1; max1 = b->esum[k]; max_k = k; + } else if (b->esum[k] > max2) max2 = b->esum[k]; + } + for (k = 0; k < 4; ++k) { + if (p[k<<2|k] < min1) { + min2 = min1; min1 = p[k<<2|k]; min_k = k; + } else if (p[k<<2|k] < min2) min2 = p[k<<2|k]; + } + if (max1 > max2 && (min_k != max_k || min1 + 1.0 > min2)) + p[max_k<<2|max_k] = min1 > 1.0? min1 - 1.0 : 0.0; } - for (k = 0; k < 4; ++k) { - if (p[k<<2|k] < min1) { - min2 = min1; min1 = p[k<<2|k]; min_k = k; - } else if (p[k<<2|k] < min2) min2 = p[k<<2|k]; + } else { // apply the SOAP model + // generate likelihood + for (j = 0; j != 4; ++j) { + float tmp; + // homozygous + for (k = 0, tmp = 0.0; k != 4; ++k) + if (j != k) tmp += b->esum[k]; + p[j<<2|j] = tmp; + // heterozygous + for (k = j + 1; k < 4; ++k) { + for (i = 0, tmp = 0.0; i != 4; ++i) + if (i != j && i != k) tmp += b->esum[i]; + p[j<<2|k] = p[k<<2|j] = -4.343 * bm->lhet[b->c[j]<<8|b->c[k]] + tmp; + } } - if (max1 > max2 && (min_k != max_k || min1 + 1.0 > min2)) - p[max_k<<2|max_k] = min1 > 1.0? 
min1 - 1.0 : 0.0; } // convert necessary information to glf1_t @@ -304,12 +323,40 @@ void bam_maqindel_ret_destroy(bam_maqindel_ret_t *mir) free(mir->s[0]); free(mir->s[1]); free(mir); } +int bam_tpos2qpos(const bam1_core_t *c, const uint32_t *cigar, int32_t tpos, int is_left, int32_t *_tpos) +{ + int k, x = c->pos, y = 0, last_y = 0; + *_tpos = c->pos; + for (k = 0; k < c->n_cigar; ++k) { + int op = cigar[k] & BAM_CIGAR_MASK; + int l = cigar[k] >> BAM_CIGAR_SHIFT; + if (op == BAM_CMATCH) { + if (c->pos > tpos) return y; + if (x + l > tpos) { + *_tpos = tpos; + return y + (tpos - x); + } + x += l; y += l; + last_y = y; + } else if (op == BAM_CINS || op == BAM_CSOFT_CLIP) y += l; + else if (op == BAM_CDEL || op == BAM_CREF_SKIP) { + if (x + l > tpos) { + *_tpos = is_left? x : x + l; + return y; + } + x += l; + } + } + *_tpos = x; + return last_y; +} + #define MINUS_CONST 0x10000000 bam_maqindel_ret_t *bam_maqindel(int n, int pos, const bam_maqindel_opt_t *mi, const bam_pileup1_t *pl, const char *ref, int _n_types, int *_types) { - int i, j, n_types, *types, left, right; + int i, j, n_types, *types, left, right, max_rd_len = 0; bam_maqindel_ret_t *ret = 0; // if there is no proposed indel, check if there is an indel from the alignment if (_n_types == 0) { @@ -329,6 +376,8 @@ bam_maqindel_ret_t *bam_maqindel(int n, int pos, const bam_maqindel_opt_t *mi, c const bam_pileup1_t *p = pl + i; if (!(p->b->core.flag&BAM_FUNMAP) && p->indel != 0) aux[m++] = MINUS_CONST + p->indel; + j = bam_cigar2qlen(&p->b->core, bam1_cigar(p->b)); + if (j > max_rd_len) max_rd_len = j; } if (_n_types) // then also add this to aux[] for (i = 0; i < _n_types; ++i) @@ -347,23 +396,17 @@ bam_maqindel_ret_t *bam_maqindel(int n, int pos, const bam_maqindel_opt_t *mi, c free(aux); } { // calculate left and right boundary - bam_segreg_t seg; - left = 0x7fffffff; right = 0; - for (i = 0; i < n; ++i) { - const bam_pileup1_t *p = pl + i; - if (!(p->b->core.flag&BAM_FUNMAP)) { - bam_segreg(pos, 
&p->b->core, bam1_cigar(p->b), &seg); - if (seg.tbeg < left) left = seg.tbeg; - if (seg.tend > right) right = seg.tend; - } - } - if (pos - left > MAX_WINDOW) left = pos - MAX_WINDOW; - if (right - pos> MAX_WINDOW) right = pos + MAX_WINDOW; + left = pos > INDEL_WINDOW_SIZE? pos - INDEL_WINDOW_SIZE : 0; + right = pos + INDEL_WINDOW_SIZE; + if (types[0] < 0) right -= types[0]; + // in case the alignments stand out the reference + for (i = pos; i < right; ++i) + if (ref[i] == 0) break; + right = i; } { // the core part - char *ref2, *inscns = 0; + char *ref2, *rs, *inscns = 0; int k, l, *score, *pscore, max_ins = types[n_types-1]; - ref2 = (char*)calloc(right - left + types[n_types-1] + 2, 1); if (max_ins > 0) { // get the consensus of inserted sequences int *inscns_aux = (int*)calloc(4 * n_types * max_ins, sizeof(int)); // count occurrences @@ -396,52 +439,68 @@ bam_maqindel_ret_t *bam_maqindel(int n, int pos, const bam_maqindel_opt_t *mi, c free(inscns_aux); } // calculate score + ref2 = (char*)calloc(right - left + types[n_types-1] + 2, 1); + rs = (char*)calloc(right - left + max_rd_len + types[n_types-1] + 2, 1); score = (int*)calloc(n_types * n, sizeof(int)); pscore = (int*)calloc(n_types * n, sizeof(int)); for (i = 0; i < n_types; ++i) { + ka_param_t ap = ka_param_blast; + ap.band_width = 2 * types[n_types - 1] + 2; // write ref2 for (k = 0, j = left; j <= pos; ++j) - ref2[k++] = bam_nt16_table[(int)ref[j]]; + ref2[k++] = bam_nt16_nt4_table[bam_nt16_table[(int)ref[j]]]; if (types[i] <= 0) j += -types[i]; else for (l = 0; l < types[i]; ++l) - ref2[k++] = inscns[i*max_ins + l]; + ref2[k++] = bam_nt16_nt4_table[(int)inscns[i*max_ins + l]]; for (; j < right && ref[j]; ++j) - ref2[k++] = bam_nt16_table[(int)ref[j]]; + ref2[k++] = bam_nt16_nt4_table[bam_nt16_table[(int)ref[j]]]; + if (j < right) right = j; // calculate score for each read for (j = 0; j < n; ++j) { const bam_pileup1_t *p = pl + j; - uint32_t *cigar; - bam1_core_t *c = &p->b->core; - int s, ps; - 
bam_segreg_t seg; - if (c->flag&BAM_FUNMAP) continue; - cigar = bam1_cigar(p->b); - bam_segreg(pos, c, cigar, &seg); - for (ps = s = 0, l = seg.qbeg; c->pos + l < right && l < seg.qend; ++l) { - int cq = bam1_seqi(bam1_seq(p->b), l), ct; - // in the following line, "<" will happen if reads are too long - ct = c->pos + l - seg.qbeg >= left? ref2[c->pos + l - seg.qbeg - left] : 15; - if (cq < 15 && ct < 15) { - s += cq == ct? 1 : -mi->mm_penalty; - if (cq != ct) ps += bam1_qual(p->b)[l]; + int qbeg, qend, tbeg, tend; + if (p->b->core.flag & BAM_FUNMAP) continue; + qbeg = bam_tpos2qpos(&p->b->core, bam1_cigar(p->b), left, 0, &tbeg); + qend = bam_tpos2qpos(&p->b->core, bam1_cigar(p->b), right, 1, &tend); + assert(tbeg >= left); + for (l = qbeg; l < qend; ++l) + rs[l - qbeg] = bam_nt16_nt4_table[bam1_seqi(bam1_seq(p->b), l)]; + { + int x, y, n_acigar, ps; + uint32_t *acigar; + ps = 0; + if (tend - tbeg + types[i] <= 0) { + score[i*n+j] = -(1<<20); + pscore[i*n+j] = 1<<20; + continue; } - } - score[i*n + j] = s; pscore[i*n + j] = ps; - if (types[i] != 0) { // then try the other way to calculate the score - for (ps = s = 0, l = seg.qbeg; c->pos + l + types[i] < right && l < seg.qend; ++l) { - int cq = bam1_seqi(bam1_seq(p->b), l), ct; - ct = c->pos + l - seg.qbeg + types[i] >= left? ref2[c->pos + l - seg.qbeg + types[i] - left] : 15; - if (cq < 15 && ct < 15) { - s += cq == ct? 
1 : -mi->mm_penalty; - if (cq != ct) ps += bam1_qual(p->b)[l]; + acigar = ka_global_core((uint8_t*)ref2 + tbeg - left, tend - tbeg + types[i], (uint8_t*)rs, qend - qbeg, &ap, &score[i*n+j], &n_acigar); + x = tbeg - left; y = 0; + for (l = 0; l < n_acigar; ++l) { + int op = acigar[l]&0xf; + int len = acigar[l]>>4; + if (op == BAM_CMATCH) { + int k; + for (k = 0; k < len; ++k) + if (ref2[x+k] != rs[y+k]) ps += bam1_qual(p->b)[y+k]; + x += len; y += len; + } else if (op == BAM_CINS || op == BAM_CSOFT_CLIP) { + if (op == BAM_CINS) ps += mi->q_indel * len; + y += len; + } else if (op == BAM_CDEL) { + ps += mi->q_indel * len; + x += len; } } + pscore[i*n+j] = ps; + /*if (pos == 2618517) { // for debugging only + fprintf(stderr, "pos=%d, type=%d, j=%d, score=%d, psore=%d, %d, %d, %d, %d, ", pos+1, types[i], j, score[i*n+j], pscore[i*n+j], tbeg, tend, qbeg, qend); + for (l = 0; l < n_acigar; ++l) fprintf(stderr, "%d%c", acigar[l]>>4, "MIDS"[acigar[l]&0xf]); fprintf(stderr, "\n"); + for (l = 0; l < tend - tbeg + types[i]; ++l) fputc("ACGTN"[ref2[l]], stderr); fputc('\n', stderr); + for (l = 0; l < qend - qbeg; ++l) fputc("ACGTN"[rs[l]], stderr); fputc('\n', stderr); + }*/ + free(acigar); } - if (score[i*n+j] < s) score[i*n+j] = s; // choose the higher of the two scores - if (pscore[i*n+j] > ps) pscore[i*n+j] = ps; - //if (types[i] != 0) score[i*n+j] -= mi->indel_err; - //printf("%d, %d, %d, %d, %d, %d, %d\n", p->b->core.pos + 1, seg.qbeg, i, types[i], j, - // score[i*n+j], pscore[i*n+j]); } } { // get final result @@ -491,13 +550,20 @@ bam_maqindel_ret_t *bam_maqindel(int n, int pos, const bam_maqindel_opt_t *mi, c else if (p->indel == ret->indel2) ++ret->cnt2; else ++ret->cnt_anti; } - // write gl[] - ret->gl[0] = ret->gl[1] = 0; - for (j = 0; j < n; ++j) { - int s1 = pscore[max1_i*n + j], s2 = pscore[max2_i*n + j]; - //printf("%d, %d, %d, %d, %d\n", pl[j].b->core.pos+1, max1_i, max2_i, s1, s2); - if (s1 > s2) ret->gl[0] += s1 - s2 < mi->q_indel? 
s1 - s2 : mi->q_indel; - else ret->gl[1] += s2 - s1 < mi->q_indel? s2 - s1 : mi->q_indel; + { // write gl[] + int tmp, seq_err = 0; + double x = 1.0; + tmp = max1_i - max2_i; + if (tmp < 0) tmp = -tmp; + for (j = 0; j < tmp + 1; ++j) x *= INDEL_EXT_DEP; + seq_err = mi->q_indel * (1.0 - x) / (1.0 - INDEL_EXT_DEP); + ret->gl[0] = ret->gl[1] = 0; + for (j = 0; j < n; ++j) { + int s1 = pscore[max1_i*n + j], s2 = pscore[max2_i*n + j]; + //printf("%d, %d, %d, %d, %d\n", pl[j].b->core.pos+1, max1_i, max2_i, s1, s2); + if (s1 > s2) ret->gl[0] += s1 - s2 < seq_err? s1 - s2 : seq_err; + else ret->gl[1] += s2 - s1 < seq_err? s2 - s1 : seq_err; + } } // write cnt_ref and cnt_ambi if (max1_i != 0 && max2_i != 0) { @@ -509,7 +575,7 @@ bam_maqindel_ret_t *bam_maqindel(int n, int pos, const bam_maqindel_opt_t *mi, c } } } - free(score); free(pscore); free(ref2); free(inscns); + free(score); free(pscore); free(ref2); free(rs); free(inscns); } { // call genotype int q[3], qr_indel = (int)(-4.343 * log(mi->r_indel) + 0.5); diff --git a/bam_maqcns.h b/bam_maqcns.h index 2a82aee..fa5489d 100644 --- a/bam_maqcns.h +++ b/bam_maqcns.h @@ -7,7 +7,7 @@ struct __bmc_aux_t; typedef struct { float het_rate, theta; - int n_hap, cap_mapQ; + int n_hap, cap_mapQ, is_soap; float eta, q_r; double *fk, *coef; diff --git a/bam_md.c b/bam_md.c index 8d07487..3ca7309 100644 --- a/bam_md.c +++ b/bam_md.c @@ -132,8 +132,11 @@ int bam_fillmd(int argc, char *argv[]) free(ref); ref = fai_fetch(fai, fp->header->target_name[b->core.tid], &len); tid = b->core.tid; + if (ref == 0) + fprintf(stderr, "[bam_fillmd] fail to find sequence '%s' in the reference.\n", + fp->header->target_name[tid]); } - bam_fillmd1(b, ref, is_equal); + if (ref) bam_fillmd1(b, ref, is_equal); } samwrite(fpout, b); } diff --git a/bam_plcmd.c b/bam_plcmd.c index 5bf1ed0..ba787a9 100644 --- a/bam_plcmd.c +++ b/bam_plcmd.c @@ -121,9 +121,11 @@ static int glt3_func(uint32_t tid, uint32_t pos, int n, const bam_pileup1_t *pu, g3->offset = pos 
- d->last_pos; d->last_pos = pos; glf3_write1(d->fp_glf, g3); - if (proposed_indels) - r = bam_maqindel(n, pos, d->ido, pu, d->ref, proposed_indels[0], proposed_indels+1); - else r = bam_maqindel(n, pos, d->ido, pu, d->ref, 0, 0); + if (pos < d->len) { + if (proposed_indels) + r = bam_maqindel(n, pos, d->ido, pu, d->ref, proposed_indels[0], proposed_indels+1); + else r = bam_maqindel(n, pos, d->ido, pu, d->ref, 0, 0); + } if (r) { // then write indel line int het = 3 * n, min; min = het; @@ -182,7 +184,7 @@ static int pileup_func(uint32_t tid, uint32_t pos, int n, const bam_pileup1_t *p // call the consensus and indel if (d->format & BAM_PLF_CNS) // call consensus cns = bam_maqcns_call(n, pu, d->c); - if ((d->format & (BAM_PLF_CNS|BAM_PLF_INDEL_ONLY)) && d->ref) { // call indels + if ((d->format & (BAM_PLF_CNS|BAM_PLF_INDEL_ONLY)) && d->ref && pos < d->len) { // call indels if (proposed_indels) // the first element gives the size of the array r = bam_maqindel(n, pos, d->ido, pu, d->ref, proposed_indels[0], proposed_indels+1); else r = bam_maqindel(n, pos, d->ido, pu, d->ref, 0, 0); @@ -299,8 +301,9 @@ int bam_pileup(int argc, char *argv[]) d->tid = -1; d->mask = BAM_DEF_MASK; d->c = bam_maqcns_init(); d->ido = bam_maqindel_opt_init(); - while ((c = getopt(argc, argv, "st:f:cT:N:r:l:im:gI:G:vM:S2")) >= 0) { + while ((c = getopt(argc, argv, "st:f:cT:N:r:l:im:gI:G:vM:S2a")) >= 0) { switch (c) { + case 'a': d->c->is_soap = 1; break; case 's': d->format |= BAM_PLF_SIMPLE; break; case 't': fn_list = strdup(optarg); break; case 'l': fn_pos = strdup(optarg); break; @@ -327,6 +330,7 @@ int bam_pileup(int argc, char *argv[]) fprintf(stderr, "Usage: samtools pileup [options] |\n\n"); fprintf(stderr, "Option: -s simple (yet incomplete) pileup format\n"); fprintf(stderr, " -S the input is in SAM\n"); + fprintf(stderr, " -a use the SOAPsnp model for SNP calling\n"); fprintf(stderr, " -2 output the 2nd best call and quality\n"); fprintf(stderr, " -i only show lines/consensus with 
indels\n"); fprintf(stderr, " -m INT filtering reads with bits in INT [%d]\n", d->mask); diff --git a/bam_rmdup.c b/bam_rmdup.c index 5da9460..f0d2b5d 100644 --- a/bam_rmdup.c +++ b/bam_rmdup.c @@ -2,15 +2,23 @@ #include #include #include -#include "bam.h" +#include +#include "sam.h" typedef bam1_t *bam1_p; + #include "khash.h" KHASH_SET_INIT_STR(name) KHASH_MAP_INIT_INT64(pos, bam1_p) #define BUFFER_SIZE 0x40000 +typedef struct { + uint64_t n_checked, n_removed; + khash_t(pos) *best_hash; +} lib_aux_t; +KHASH_MAP_INIT_STR(lib, lib_aux_t) + typedef struct { int n, max; bam1_t **a; @@ -25,15 +33,14 @@ static inline void stack_insert(tmp_stack_t *stack, bam1_t *b) stack->a[stack->n++] = b; } -static inline void dump_best(tmp_stack_t *stack, khash_t(pos) *best_hash, bamFile out) +static inline void dump_best(tmp_stack_t *stack, samfile_t *out) { int i; for (i = 0; i != stack->n; ++i) { - bam_write1(out, stack->a[i]); + samwrite(out, stack->a[i]); bam_destroy1(stack->a[i]); } stack->n = 0; - if (kh_size(best_hash) > BUFFER_SIZE) kh_clear(pos, best_hash); } static void clear_del_set(khash_t(name) *del_set) @@ -45,101 +52,155 @@ static void clear_del_set(khash_t(name) *del_set) kh_clear(name, del_set); } -void bam_rmdup_core(bamFile in, bamFile out) +static lib_aux_t *get_aux(khash_t(lib) *aux, const char *lib) +{ + khint_t k = kh_get(lib, aux, lib); + if (k == kh_end(aux)) { + int ret; + char *p = strdup(lib); + lib_aux_t *q; + k = kh_put(lib, aux, p, &ret); + q = &kh_val(aux, k); + q->n_checked = q->n_removed = 0; + q->best_hash = kh_init(pos); + return q; + } else return &kh_val(aux, k); +} + +static void clear_best(khash_t(lib) *aux, int max) +{ + khint_t k; + for (k = kh_begin(aux); k != kh_end(aux); ++k) { + if (kh_exist(aux, k)) { + lib_aux_t *q = &kh_val(aux, k); + if (kh_size(q->best_hash) >= max) + kh_clear(pos, q->best_hash); + } + } +} + +static inline int sum_qual(const bam1_t *b) +{ + int i, q; + uint8_t *qual = bam1_qual(b); + for (i = q = 0; i < 
b->core.l_qseq; ++i) q += qual[i]; + return q; +} + +void bam_rmdup_core(samfile_t *in, samfile_t *out) { - bam_header_t *header; bam1_t *b; int last_tid = -1, last_pos = -1; - uint64_t n_checked = 0, n_removed = 0; tmp_stack_t stack; khint_t k; - khash_t(pos) *best_hash; + khash_t(lib) *aux; khash_t(name) *del_set; - - best_hash = kh_init(pos); + + aux = kh_init(lib); del_set = kh_init(name); b = bam_init1(); memset(&stack, 0, sizeof(tmp_stack_t)); - header = bam_header_read(in); - bam_header_write(out, header); kh_resize(name, del_set, 4 * BUFFER_SIZE); - kh_resize(pos, best_hash, 3 * BUFFER_SIZE); - while (bam_read1(in, b) >= 0) { + while (samread(in, b) >= 0) { bam1_core_t *c = &b->core; if (c->tid != last_tid || last_pos != c->pos) { - dump_best(&stack, best_hash, out); // write the result + dump_best(&stack, out); // write the result + clear_best(aux, BUFFER_SIZE); if (c->tid != last_tid) { - kh_clear(pos, best_hash); + clear_best(aux, 0); if (kh_size(del_set)) { // check fprintf(stderr, "[bam_rmdup_core] %llu unmatched pairs\n", (long long)kh_size(del_set)); clear_del_set(del_set); } if ((int)c->tid == -1) { // append unmapped reads - bam_write1(out, b); - while (bam_read1(in, b) >= 0) bam_write1(out, b); + samwrite(out, b); + while (samread(in, b) >= 0) samwrite(out, b); break; } last_tid = c->tid; - fprintf(stderr, "[bam_rmdup_core] processing reference %s...\n", header->target_name[c->tid]); + fprintf(stderr, "[bam_rmdup_core] processing reference %s...\n", in->header->target_name[c->tid]); } } if (!(c->flag&BAM_FPAIRED) || (c->flag&(BAM_FUNMAP|BAM_FMUNMAP)) || (c->mtid >= 0 && c->tid != c->mtid)) { - bam_write1(out, b); + samwrite(out, b); } else if (c->isize > 0) { // paired, head uint64_t key = (uint64_t)c->pos<<32 | c->isize; + const char *lib; + lib_aux_t *q; int ret; - ++n_checked; - k = kh_put(pos, best_hash, key, &ret); + lib = bam_get_library(in->header, b); + q = lib? 
get_aux(aux, lib) : get_aux(aux, "\t"); + ++q->n_checked; + k = kh_put(pos, q->best_hash, key, &ret); if (ret == 0) { // found in best_hash - bam1_t *p = kh_val(best_hash, k); - ++n_removed; - if (p->core.qual < c->qual) { // the current alignment is better + bam1_t *p = kh_val(q->best_hash, k); + ++q->n_removed; + if (sum_qual(p) < sum_qual(b)) { // the current alignment is better; this can be accelerated in principle kh_put(name, del_set, strdup(bam1_qname(p)), &ret); // p will be removed bam_copy1(p, b); // replaced as b } else kh_put(name, del_set, strdup(bam1_qname(b)), &ret); // b will be removed if (ret == 0) fprintf(stderr, "[bam_rmdup_core] inconsistent BAM file for pair '%s'. Continue anyway.\n", bam1_qname(b)); } else { // not found in best_hash - kh_val(best_hash, k) = bam_dup1(b); - stack_insert(&stack, kh_val(best_hash, k)); + kh_val(q->best_hash, k) = bam_dup1(b); + stack_insert(&stack, kh_val(q->best_hash, k)); } } else { // paired, tail k = kh_get(name, del_set, bam1_qname(b)); if (k != kh_end(del_set)) { free((char*)kh_key(del_set, k)); kh_del(name, del_set, k); - } else bam_write1(out, b); + } else samwrite(out, b); } last_pos = c->pos; } - dump_best(&stack, best_hash, out); - bam_header_destroy(header); + for (k = kh_begin(aux); k != kh_end(aux); ++k) { + if (kh_exist(aux, k)) { + lib_aux_t *q = &kh_val(aux, k); + dump_best(&stack, out); + fprintf(stderr, "[bam_rmdup_core] %lld / %lld = %.4lf in library '%s'\n", (long long)q->n_removed, + (long long)q->n_checked, (double)q->n_removed/q->n_checked, kh_key(aux, k)); + kh_destroy(pos, q->best_hash); + free((char*)kh_key(aux, k)); + } + } + kh_destroy(lib, aux); + clear_del_set(del_set); kh_destroy(name, del_set); - kh_destroy(pos, best_hash); free(stack.a); bam_destroy1(b); - fprintf(stderr, "[bam_rmdup_core] %lld / %lld = %.4lf\n", (long long)n_removed, (long long)n_checked, - (double)n_removed/n_checked); } + +void bam_rmdupse_core(samfile_t *in, samfile_t *out, int force_se); + int bam_rmdup(int 
argc, char *argv[]) { - bamFile in, out; - if (argc < 3) { - fprintf(stderr, "Usage: samtools rmdup \n\n"); - fprintf(stderr, "Note: Picard is recommended for this task.\n"); + int c, is_se = 0, force_se = 0; + samfile_t *in, *out; + while ((c = getopt(argc, argv, "sS")) >= 0) { + switch (c) { + case 's': is_se = 1; break; + case 'S': force_se = is_se = 1; break; + } + } + if (optind + 2 > argc) { + fprintf(stderr, "\n"); + fprintf(stderr, "Usage: samtools rmdup [-sS] \n\n"); + fprintf(stderr, "Option: -s rmdup for SE reads\n"); + fprintf(stderr, " -S treat PE reads as SE in rmdup (force -s)\n\n"); return 1; } - in = (strcmp(argv[1], "-") == 0)? bam_dopen(fileno(stdin), "r") : bam_open(argv[1], "r"); - out = (strcmp(argv[2], "-") == 0)? bam_dopen(fileno(stdout), "w") : bam_open(argv[2], "w"); + in = samopen(argv[optind], "rb", 0); + out = samopen(argv[optind+1], "wb", in->header); if (in == 0 || out == 0) { fprintf(stderr, "[bam_rmdup] fail to read/write input files\n"); return 1; } - bam_rmdup_core(in, out); - bam_close(in); - bam_close(out); + if (is_se) bam_rmdupse_core(in, out, force_se); + else bam_rmdup_core(in, out); + samclose(in); samclose(out); return 0; } diff --git a/bam_rmdupse.c b/bam_rmdupse.c index cf1b7bd..e7dbdc7 100644 --- a/bam_rmdupse.c +++ b/bam_rmdupse.c @@ -1,178 +1,159 @@ #include #include "sam.h" #include "khash.h" +#include "klist.h" -typedef struct { - int n, m; - int *a; -} listelem_t; - -KHASH_MAP_INIT_INT(32, listelem_t) - -#define BLOCK_SIZE 65536 +#define QUEUE_CLEAR_SIZE 0x100000 +#define MAX_POS 0x7fffffff typedef struct { + int endpos; + uint32_t score:31, discarded:1; bam1_t *b; - int rpos, score; -} elem_t; +} elem_t, *elem_p; +#define __free_elem(p) bam_destroy1((p)->data.b) +KLIST_INIT(q, elem_t, __free_elem) +typedef klist_t(q) queue_t; + +KHASH_MAP_INIT_INT(best, elem_p) +typedef khash_t(best) besthash_t; typedef struct { - int n, max, x; - elem_t *buf; -} buffer_t; + uint64_t n_checked, n_removed; + besthash_t *left, 
*rght; +} lib_aux_t; +KHASH_MAP_INIT_STR(lib, lib_aux_t) -static int fill_buf(samfile_t *in, buffer_t *buf) +static lib_aux_t *get_aux(khash_t(lib) *aux, const char *lib) { - int i, ret, last_tid, min_rpos = 0x7fffffff, capacity; - bam1_t *b = bam_init1(); - bam1_core_t *c = &b->core; - // squeeze out the empty cells at the beginning - for (i = 0; i < buf->n; ++i) - if (buf->buf[i].b) break; - if (i < buf->n) { // squeeze - if (i > 0) { - memmove(buf->buf, buf->buf + i, sizeof(elem_t) * (buf->n - i)); - buf->n = buf->n - i; - } - } else buf->n = 0; - // calculate min_rpos - for (i = 0; i < buf->n; ++i) { - elem_t *e = buf->buf + i; - if (e->b && e->rpos >= 0 && e->rpos < min_rpos) - min_rpos = buf->buf[i].rpos; - } - // fill the buffer - buf->x = -1; - last_tid = buf->n? buf->buf[0].b->core.tid : -1; - capacity = buf->n + BLOCK_SIZE; - while ((ret = samread(in, b)) >= 0) { - elem_t *e; - uint8_t *qual = bam1_qual(b); - int is_mapped; - if (last_tid < 0) last_tid = c->tid; - if (c->tid != last_tid) { - if (buf->x < 0) buf->x = buf->n; - } - if (buf->n >= buf->max) { // enlarge - buf->max = buf->max? buf->max<<1 : 8; - buf->buf = (elem_t*)realloc(buf->buf, sizeof(elem_t) * buf->max); - } - e = &buf->buf[buf->n++]; - e->b = bam_dup1(b); - e->rpos = -1; e->score = 0; - for (i = 0; i < c->l_qseq; ++i) e->score += qual[i] + 1; - e->score = (double)e->score / sqrt(c->l_qseq + 1); - is_mapped = (c->tid < 0 || c->tid >= in->header->n_targets || (c->flag&BAM_FUNMAP))? 
0 : 1; - if (!is_mapped) e->score = -1; - if (is_mapped && (c->flag & BAM_FREVERSE)) { - e->rpos = b->core.pos + bam_calend(&b->core, bam1_cigar(b)); - if (min_rpos > e->rpos) min_rpos = e->rpos; - } - if (buf->n >= capacity) { - if (is_mapped && c->pos <= min_rpos) capacity += BLOCK_SIZE; - else break; - } - } - if (ret >= 0 && buf->x < 0) buf->x = buf->n; - bam_destroy1(b); - return buf->n; + khint_t k = kh_get(lib, aux, lib); + if (k == kh_end(aux)) { + int ret; + char *p = strdup(lib); + lib_aux_t *q; + k = kh_put(lib, aux, p, &ret); + q = &kh_val(aux, k); + q->left = kh_init(best); + q->rght = kh_init(best); + q->n_checked = q->n_removed = 0; + return q; + } else return &kh_val(aux, k); +} + +static inline int sum_qual(const bam1_t *b) +{ + int i, q; + uint8_t *qual = bam1_qual(b); + for (i = q = 0; i < b->core.l_qseq; ++i) q += qual[i]; + return q; +} + +static inline elem_t *push_queue(queue_t *queue, const bam1_t *b, int endpos, int score) +{ + elem_t *p = kl_pushp(q, queue); + p->discarded = 0; + p->endpos = endpos; p->score = score; + if (p->b == 0) p->b = bam_init1(); + bam_copy1(p->b, b); + return p; } -static void rmdupse_buf(buffer_t *buf) +static void clear_besthash(besthash_t *h, int32_t pos) { - khash_t(32) *h; - uint32_t key; khint_t k; - int mpos, i, upper; - listelem_t *p; - mpos = 0x7fffffff; - mpos = (buf->x == buf->n)? buf->buf[buf->x-1].b->core.pos : 0x7fffffff; - upper = (buf->x < 0)? 
buf->n : buf->x; - // fill the hash table - h = kh_init(32); - for (i = 0; i < upper; ++i) { - elem_t *e = buf->buf + i; - int ret; - if (e->score < 0) continue; - if (e->rpos >= 0) { - if (e->rpos <= mpos) key = (uint32_t)e->rpos<<1 | 1; - else continue; - } else { - if (e->b->core.pos < mpos) key = (uint32_t)e->b->core.pos<<1; - else continue; - } - k = kh_put(32, h, key, &ret); - p = &kh_val(h, k); - if (ret == 0) { // present in the hash table - if (p->n == p->m) { - p->m <<= 1; - p->a = (int*)realloc(p->a, p->m * sizeof(int)); + for (k = kh_begin(h); k != kh_end(h); ++k) + if (kh_exist(h, k) && kh_val(h, k)->endpos <= pos) + kh_del(best, h, k); +} + +static void dump_alignment(samfile_t *out, queue_t *queue, int32_t pos, khash_t(lib) *h) +{ + if (queue->size > QUEUE_CLEAR_SIZE || pos == MAX_POS) { + khint_t k; + while (1) { + elem_t *q; + if (queue->head == queue->tail) break; + q = &kl_val(queue->head); + if (q->discarded) { + q->b->data_len = 0; + kl_shift(q, queue, 0); + continue; } - p->a[p->n++] = i; - } else { - p->m = p->n = 1; - p->a = (int*)calloc(p->m, sizeof(int)); - p->a[0] = i; + if ((q->b->core.flag&BAM_FREVERSE) && q->endpos > pos) break; + samwrite(out, q->b); + q->b->data_len = 0; + kl_shift(q, queue, 0); } - } - // rmdup - for (k = kh_begin(h); k < kh_end(h); ++k) { - if (kh_exist(h, k)) { - int max, maxi; - p = &kh_val(h, k); - // get the max - for (i = max = 0, maxi = -1; i < p->n; ++i) { - if (buf->buf[p->a[i]].score > max) { - max = buf->buf[p->a[i]].score; - maxi = i; - } - } - // mark the elements - for (i = 0; i < p->n; ++i) { - buf->buf[p->a[i]].score = -1; - if (i != maxi) { - bam_destroy1(buf->buf[p->a[i]].b); - buf->buf[p->a[i]].b = 0; - } + for (k = kh_begin(h); k != kh_end(h); ++k) { + if (kh_exist(h, k)) { + clear_besthash(kh_val(h, k).left, pos); + clear_besthash(kh_val(h, k).rght, pos); } - // free - free(p->a); } } - kh_destroy(32, h); } -static void dump_buf(buffer_t *buf, samfile_t *out) +void bam_rmdupse_core(samfile_t 
*in, samfile_t *out, int force_se) { - int i; - for (i = 0; i < buf->n; ++i) { - elem_t *e = buf->buf + i; - if (e->score != -1) break; - if (e->b) { - samwrite(out, e->b); - bam_destroy1(e->b); - e->b = 0; + bam1_t *b; + queue_t *queue; + khint_t k; + int last_tid = -2; + khash_t(lib) *aux; + + aux = kh_init(lib); + b = bam_init1(); + queue = kl_init(q); + while (samread(in, b) >= 0) { + bam1_core_t *c = &b->core; + int endpos = bam_calend(c, bam1_cigar(b)); + int score = sum_qual(b); + + if (last_tid != c->tid) { + if (last_tid >= 0) dump_alignment(out, queue, MAX_POS, aux); + last_tid = c->tid; + } else dump_alignment(out, queue, c->pos, aux); + if ((c->flag&BAM_FUNMAP) || ((c->flag&BAM_FPAIRED) && !force_se)) { + push_queue(queue, b, endpos, score); + } else { + const char *lib; + lib_aux_t *q; + besthash_t *h; + uint32_t key; + int ret; + lib = bam_get_library(in->header, b); + q = lib? get_aux(aux, lib) : get_aux(aux, "\t"); + ++q->n_checked; + h = (c->flag&BAM_FREVERSE)? q->rght : q->left; + key = (c->flag&BAM_FREVERSE)? 
endpos : c->pos; + k = kh_put(best, h, key, &ret); + if (ret == 0) { // in the hash table + elem_t *p = kh_val(h, k); + ++q->n_removed; + if (p->score < score) { + if (c->flag&BAM_FREVERSE) { // mark "discarded" and push the queue + p->discarded = 1; + kh_val(h, k) = push_queue(queue, b, endpos, score); + } else { // replace + p->score = score; p->endpos = endpos; + bam_copy1(p->b, b); + } + } // otherwise, discard the alignment + } else kh_val(h, k) = push_queue(queue, b, endpos, score); } } -} + dump_alignment(out, queue, MAX_POS, aux); -int bam_rmdupse(int argc, char *argv[]) -{ - samfile_t *in, *out; - buffer_t *buf; - if (argc < 3) { - fprintf(stderr, "Usage: samtools rmdupse \n\n"); - fprintf(stderr, "Note: Picard is recommended for this task.\n"); - return 1; - } - buf = calloc(1, sizeof(buffer_t)); - in = samopen(argv[1], "rb", 0); - out = samopen(argv[2], "wb", in->header); - while (fill_buf(in, buf)) { - rmdupse_buf(buf); - dump_buf(buf, out); + for (k = kh_begin(aux); k != kh_end(aux); ++k) { + if (kh_exist(aux, k)) { + lib_aux_t *q = &kh_val(aux, k); + fprintf(stderr, "[bam_rmdupse_core] %lld / %lld = %.4lf in library '%s'\n", (long long)q->n_removed, + (long long)q->n_checked, (double)q->n_removed/q->n_checked, kh_key(aux, k)); + kh_destroy(best, q->left); kh_destroy(best, q->rght); + free((char*)kh_key(aux, k)); + } } - samclose(in); samclose(out); - free(buf->buf); free(buf); - return 0; + kh_destroy(lib, aux); + bam_destroy1(b); + kl_destroy(q, queue); } diff --git a/bam_sort.c b/bam_sort.c index a2d3d09..9884f3d 100644 --- a/bam_sort.c +++ b/bam_sort.c @@ -33,18 +33,20 @@ static inline int strnum_cmp(const char *a, const char *b) typedef struct { int i; - uint64_t pos; + uint64_t pos, idx; bam1_t *b; } heap1_t; +#define __pos_cmp(a, b) ((a).pos > (b).pos || ((a).pos == (b).pos && ((a).i > (b).i || ((a).i == (b).i && (a).idx > (b).idx)))) + static inline int heap_lt(const heap1_t a, const heap1_t b) { if (g_is_by_qname) { int t; if (a.b == 0 || b.b 
== 0) return a.b == 0? 1 : 0; t = strnum_cmp(bam1_qname(a.b), bam1_qname(b.b)); - return (t > 0 || (t == 0 && a.pos > b.pos)); - } else return (a.pos > b.pos); + return (t > 0 || (t == 0 && __pos_cmp(a, b))); + } else return __pos_cmp(a, b); } KSORT_INIT(heap, heap1_t, heap_lt) @@ -69,13 +71,15 @@ static void swap_header_text(bam_header_t *h1, bam_header_t *h2) @discussion Padding information may NOT correctly maintained. This function is NOT thread safe. */ -void bam_merge_core(int by_qname, const char *out, const char *headers, int n, char * const *fn) +void bam_merge_core(int by_qname, const char *out, const char *headers, int n, char * const *fn, int add_RG) { bamFile fpout, *fp; heap1_t *heap; bam_header_t *hout = 0; bam_header_t *hheaders = NULL; - int i, j; + int i, j, *RG_len = 0; + uint64_t idx = 0; + char **RG = 0; if (headers) { tamFile fpheaders = sam_open(headers); @@ -90,10 +94,34 @@ void bam_merge_core(int by_qname, const char *out, const char *headers, int n, c g_is_by_qname = by_qname; fp = (bamFile*)calloc(n, sizeof(bamFile)); heap = (heap1_t*)calloc(n, sizeof(heap1_t)); + // prepare RG tag + if (add_RG) { + RG = (char**)calloc(n, sizeof(void*)); + RG_len = (int*)calloc(n, sizeof(int)); + for (i = 0; i != n; ++i) { + int l = strlen(fn[i]); + const char *s = fn[i]; + if (l > 4 && strcmp(s + l - 4, ".bam") == 0) l -= 4; + for (j = l - 1; j >= 0; --j) if (s[j] == '/') break; + ++j; l -= j; + RG[i] = calloc(l + 1, 1); + RG_len[i] = l; + strncpy(RG[i], s + j, l); + } + } + // read the first for (i = 0; i != n; ++i) { heap1_t *h; bam_header_t *hin; - assert(fp[i] = bam_open(fn[i], "r")); + fp[i] = bam_open(fn[i], "r"); + if (fp[i] == 0) { + int j; + fprintf(stderr, "[bam_merge_core] fail to open file %s\n", fn[i]); + for (j = 0; j < i; ++j) bam_close(fp[j]); + free(fp); free(heap); + // FIXME: possible memory leak + return; + } hin = bam_header_read(fp[i]); if (i == 0) { // the first SAM hout = hin; @@ -129,8 +157,10 @@ void bam_merge_core(int by_qname, 
const char *out, const char *headers, int n, c h = heap + i; h->i = i; h->b = (bam1_t*)calloc(1, sizeof(bam1_t)); - if (bam_read1(fp[i], h->b) >= 0) + if (bam_read1(fp[i], h->b) >= 0) { h->pos = ((uint64_t)h->b->core.tid<<32) | (uint32_t)h->b->core.pos<<1 | bam1_strand(h->b); + h->idx = idx++; + } else h->pos = HEAP_EMPTY; } fpout = strcmp(out, "-")? bam_open(out, "w") : bam_dopen(fileno(stdout), "w"); @@ -141,9 +171,12 @@ void bam_merge_core(int by_qname, const char *out, const char *headers, int n, c ks_heapmake(heap, n, heap); while (heap->pos != HEAP_EMPTY) { bam1_t *b = heap->b; + if (add_RG && bam_aux_get(b, "RG") == 0) + bam_aux_append(b, "RG", 'Z', RG_len[heap->i] + 1, (uint8_t*)RG[heap->i]); bam_write1_core(fpout, &b->core, b->data_len, b->data); if ((j = bam_read1(fp[heap->i], b)) >= 0) { heap->pos = ((uint64_t)b->core.tid<<32) | (uint32_t)b->core.pos<<1 | bam1_strand(b); + heap->idx = idx++; } else if (j == -1) { heap->pos = HEAP_EMPTY; free(heap->b->data); free(heap->b); @@ -152,32 +185,38 @@ void bam_merge_core(int by_qname, const char *out, const char *headers, int n, c ks_heapadjust(heap, 0, n, heap); } + if (add_RG) { + for (i = 0; i != n; ++i) free(RG[i]); + free(RG); free(RG_len); + } for (i = 0; i != n; ++i) bam_close(fp[i]); bam_close(fpout); free(fp); free(heap); } int bam_merge(int argc, char *argv[]) { - int c, is_by_qname = 0; + int c, is_by_qname = 0, add_RG = 0; char *fn_headers = NULL; - while ((c = getopt(argc, argv, "h:n")) >= 0) { + while ((c = getopt(argc, argv, "h:nr")) >= 0) { switch (c) { + case 'r': add_RG = 1; break; case 'h': fn_headers = strdup(optarg); break; case 'n': is_by_qname = 1; break; } } if (optind + 2 >= argc) { fprintf(stderr, "\n"); - fprintf(stderr, "Usage: samtools merge [-n] [-h inh.sam] <out.bam> <in1.bam> <in2.bam> [...]\n\n"); + fprintf(stderr, "Usage: samtools merge [-nr] [-h inh.sam] <out.bam> <in1.bam> <in2.bam> [...]\n\n"); fprintf(stderr, "Options: -n sort by read names\n"); + fprintf(stderr, " -r attach RG tag (inferred from file names)\n"); fprintf(stderr, " -h 
FILE copy the header in FILE to <out.bam> [in1.bam]\n\n"); fprintf(stderr, "Note: Samtools' merge does not reconstruct the @RG dictionary in the header. Users\n"); fprintf(stderr, " must provide the correct header with -h, or uses Picard which properly maintains\n"); fprintf(stderr, " the header dictionary in merging.\n\n"); return 1; } - bam_merge_core(is_by_qname, argv[optind], fn_headers, argc - optind - 1, argv + optind + 1); + bam_merge_core(is_by_qname, argv[optind], fn_headers, argc - optind - 1, argv + optind + 1, add_RG); free(fn_headers); return 0; } @@ -193,7 +232,7 @@ static inline int bam1_lt(const bam1_p a, const bam1_p b) } KSORT_INIT(sort, bam1_p, bam1_lt) -static void sort_blocks(int n, int k, bam1_p *buf, const char *prefix, const bam_header_t *h) +static void sort_blocks(int n, int k, bam1_p *buf, const char *prefix, const bam_header_t *h, int is_stdout) { char *name; int i; @@ -202,7 +241,13 @@ static void sort_blocks(int n, int k, bam1_p *buf, const char *prefix, const bam name = (char*)calloc(strlen(prefix) + 20, 1); if (n >= 0) sprintf(name, "%s.%.4d.bam", prefix, n); else sprintf(name, "%s.bam", prefix); - assert(fp = bam_open(name, "w")); + fp = is_stdout? bam_dopen(fileno(stdout), "w") : bam_open(name, "w"); + if (fp == 0) { + fprintf(stderr, "[sort_blocks] fail to create file %s.\n", name); + free(name); + // FIXME: possible memory leak + return; + } free(name); bam_header_write(fp, h); for (i = 0; i < k; ++i) @@ -224,7 +269,7 @@ static void sort_blocks(int n, int k, bam1_p *buf, const char *prefix, const bam and then merge them by calling bam_merge_core(). This function is NOT thread safe. 
*/ -void bam_sort_core(int is_by_qname, const char *fn, const char *prefix, size_t max_mem) +void bam_sort_core_ext(int is_by_qname, const char *fn, const char *prefix, size_t max_mem, int is_stdout) { int n, ret, k, i; size_t mem; @@ -235,7 +280,10 @@ void bam_sort_core(int is_by_qname, const char *fn, const char *prefix, size_t m g_is_by_qname = is_by_qname; n = k = 0; mem = 0; fp = strcmp(fn, "-")? bam_open(fn, "r") : bam_dopen(fileno(stdin), "r"); - assert(fp); + if (fp == 0) { + fprintf(stderr, "[bam_sort_core] fail to open file %s\n", fn); + return; + } header = bam_header_read(fp); buf = (bam1_t**)calloc(max_mem / BAM_CORE_SIZE, sizeof(bam1_t*)); // write sub files @@ -246,25 +294,26 @@ void bam_sort_core(int is_by_qname, const char *fn, const char *prefix, size_t m mem += ret; ++k; if (mem >= max_mem) { - sort_blocks(n++, k, buf, prefix, header); + sort_blocks(n++, k, buf, prefix, header, is_stdout); mem = 0; k = 0; } } if (ret != -1) fprintf(stderr, "[bam_sort_core] truncated file. 
Continue anyway.\n"); - if (n == 0) sort_blocks(-1, k, buf, prefix, header); + if (n == 0) sort_blocks(-1, k, buf, prefix, header, is_stdout); else { // then merge char **fns, *fnout; fprintf(stderr, "[bam_sort_core] merging from %d files...\n", n+1); - sort_blocks(n++, k, buf, prefix, header); + sort_blocks(n++, k, buf, prefix, header, is_stdout); fnout = (char*)calloc(strlen(prefix) + 20, 1); - sprintf(fnout, "%s.bam", prefix); + if (is_stdout) sprintf(fnout, "-"); + else sprintf(fnout, "%s.bam", prefix); fns = (char**)calloc(n, sizeof(char*)); for (i = 0; i < n; ++i) { fns[i] = (char*)calloc(strlen(prefix) + 20, 1); sprintf(fns[i], "%s.%.4d.bam", prefix, i); } - bam_merge_core(is_by_qname, fnout, 0, n, fns); + bam_merge_core(is_by_qname, fnout, 0, n, fns, 0); free(fnout); for (i = 0; i < n; ++i) { unlink(fns[i]); @@ -283,20 +332,26 @@ void bam_sort_core(int is_by_qname, const char *fn, const char *prefix, size_t m bam_close(fp); } +void bam_sort_core(int is_by_qname, const char *fn, const char *prefix, size_t max_mem) +{ + bam_sort_core_ext(is_by_qname, fn, prefix, max_mem, 0); +} + int bam_sort(int argc, char *argv[]) { size_t max_mem = 500000000; - int c, is_by_qname = 0; - while ((c = getopt(argc, argv, "nm:")) >= 0) { + int c, is_by_qname = 0, is_stdout = 0; + while ((c = getopt(argc, argv, "nom:")) >= 0) { switch (c) { + case 'o': is_stdout = 1; break; case 'n': is_by_qname = 1; break; case 'm': max_mem = atol(optarg); break; } } if (optind + 2 > argc) { - fprintf(stderr, "Usage: samtools sort [-n] [-m <maxMem>] <in.bam> <out.prefix>\n"); + fprintf(stderr, "Usage: samtools sort [-on] [-m <maxMem>] <in.bam> <out.prefix>\n"); return 1; } - bam_sort_core(is_by_qname, argv[optind], argv[optind+1], max_mem); + bam_sort_core_ext(is_by_qname, argv[optind], argv[optind+1], max_mem, is_stdout); return 0; } diff --git a/bamtk.c b/bamtk.c index ea66672..48ac76b 100644 --- a/bamtk.c +++ b/bamtk.c @@ -9,7 +9,7 @@ #endif #ifndef PACKAGE_VERSION -#define PACKAGE_VERSION "0.1.6 (r453)" +#define PACKAGE_VERSION "0.1.7 (r510)" 
#endif int bam_taf2baf(int argc, char *argv[]); @@ -20,7 +20,6 @@ int bam_sort(int argc, char *argv[]); int bam_tview_main(int argc, char *argv[]); int bam_mating(int argc, char *argv[]); int bam_rmdup(int argc, char *argv[]); -int bam_rmdupse(int argc, char *argv[]); int bam_flagstat(int argc, char *argv[]); int bam_fillmd(int argc, char *argv[]); @@ -88,8 +87,8 @@ static int usage() fprintf(stderr, " glfview print GLFv3 file\n"); fprintf(stderr, " flagstat simple stats\n"); fprintf(stderr, " calmd recalculate MD/NM tags and '=' bases\n"); - fprintf(stderr, " merge merge sorted alignments (Picard recommended)\n"); - fprintf(stderr, " rmdup remove PCR duplicates (Picard recommended)\n"); + fprintf(stderr, " merge merge sorted alignments\n"); + fprintf(stderr, " rmdup remove PCR duplicates\n"); fprintf(stderr, "\n"); return 1; } @@ -113,7 +112,6 @@ int main(int argc, char *argv[]) else if (strcmp(argv[1], "faidx") == 0) return faidx_main(argc-1, argv+1); else if (strcmp(argv[1], "fixmate") == 0) return bam_mating(argc-1, argv+1); else if (strcmp(argv[1], "rmdup") == 0) return bam_rmdup(argc-1, argv+1); - else if (strcmp(argv[1], "rmdupse") == 0) return bam_rmdupse(argc-1, argv+1); else if (strcmp(argv[1], "glfview") == 0) return glf3_view_main(argc-1, argv+1); else if (strcmp(argv[1], "flagstat") == 0) return bam_flagstat(argc-1, argv+1); else if (strcmp(argv[1], "tagview") == 0) return bam_tagview(argc-1, argv+1); diff --git a/bgzf.c b/bgzf.c index 646b2b4..59f902f 100644 --- a/bgzf.c +++ b/bgzf.c @@ -199,7 +199,7 @@ bgzf_open(const char* __restrict path, const char* __restrict mode) #ifdef _WIN32 oflag |= O_BINARY; #endif - fd = open(path, oflag, 0644); + fd = open(path, oflag, 0666); if (fd == -1) return 0; fp = open_write(fd, strstr(mode, "u")? 
1 : 0); } diff --git a/bgzip.c b/bgzip.c index eb88195..ac2a98e 100644 --- a/bgzip.c +++ b/bgzip.c @@ -50,7 +50,7 @@ static int write_open(const char *fn, int is_forced) int fd = -1; char c; if (!is_forced) { - if ((fd = open(fn, O_WRONLY | O_CREAT | O_TRUNC | O_EXCL, 0644)) < 0 && errno == EEXIST) { + if ((fd = open(fn, O_WRONLY | O_CREAT | O_TRUNC | O_EXCL, 0666)) < 0 && errno == EEXIST) { printf("bgzip: %s already exists; do you wish to overwrite (y or n)? ", fn); scanf("%c", &c); if (c != 'Y' && c != 'y') { @@ -60,7 +60,7 @@ static int write_open(const char *fn, int is_forced) } } if (fd < 0) { - if ((fd = open(fn, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) { + if ((fd = open(fn, O_WRONLY | O_CREAT | O_TRUNC, 0666)) < 0) { fprintf(stderr, "bgzip: %s: Fail to write\n", fn); exit(1); } diff --git a/examples/Makefile b/examples/Makefile index 3fe3e5a..8f0386f 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -1,4 +1,4 @@ -all:../libbam.a ../samtools ex1.glf ex1.pileup.gz ex1.bam.bai ex1.glfview.gz calDepth +all:../libbam.a ../samtools ex1.glf ex1.pileup.gz ex1.bam.bai ex1f-rmduppe.bam ex1f-rmdupse.bam ex1.glfview.gz calDepth @echo; echo \# You can now launch the viewer with: \'samtools tview ex1.bam ex1.fa\'; echo; ex1.fa.fai:ex1.fa @@ -13,6 +13,18 @@ ex1.glf:ex1.bam ex1.fa ../samtools pileup -gf ex1.fa ex1.bam > ex1.glf ex1.glfview.gz:ex1.glf ../samtools glfview ex1.glf | gzip > ex1.glfview.gz +ex1a.bam:ex1.bam + ../samtools view -h ex1.bam | awk 'BEGIN{FS=OFS="\t"}{if(/^@/)print;else{$$1=$$1"a";print}}' | ../samtools view -bS - > $@ +ex1b.bam:ex1.bam + ../samtools view -h ex1.bam | awk 'BEGIN{FS=OFS="\t"}{if(/^@/)print;else{$$1=$$1"b";print}}' | ../samtools view -bS - > $@ +ex1f.rg: + (echo "@RG ID:ex1 LB:ex1"; echo "@RG ID:ex1a LB:ex1"; echo "@RG ID:ex1b LB:ex1b") > $@ +ex1f.bam:ex1.bam ex1a.bam ex1b.bam ex1f.rg + ../samtools merge -rh ex1f.rg $@ ex1.bam ex1a.bam ex1b.bam +ex1f-rmduppe.bam:ex1f.bam + ../samtools rmdup ex1f.bam $@ 
+ex1f-rmdupse.bam:ex1f.bam + ../samtools rmdup -S ex1f.bam $@ ../samtools: (cd ..; make samtools) @@ -24,4 +36,4 @@ calDepth:../libbam.a calDepth.c gcc -g -Wall -O2 -I.. calDepth.c -o $@ -lm -lz -L.. -lbam clean: - rm -fr *.bam *.bai *.glf* *.fai *.pileup* *~ calDepth *.dSYM \ No newline at end of file + rm -fr *.bam *.bai *.glf* *.fai *.pileup* *~ calDepth *.dSYM ex1*.rg \ No newline at end of file diff --git a/faidx.c b/faidx.c index 055445f..811bdf8 100644 --- a/faidx.c +++ b/faidx.c @@ -28,6 +28,9 @@ extern int fseeko(FILE *stream, off_t offset, int whence); #define razf_seek(fp, offset, whence) fseeko(fp, offset, whence) #define razf_tell(fp) ftello(fp) #endif +#ifdef _USE_KNETFILE +#include "knetfile.h" +#endif struct __faidx_t { RAZF *rz; @@ -194,7 +197,7 @@ int fai_build(const char *fn) sprintf(str, "%s.fai", fn); rz = razf_open(fn, "r"); if (rz == 0) { - fprintf(stderr, "[fai_build] fail to open the FASTA file.\n"); + fprintf(stderr, "[fai_build] fail to open the FASTA file %s\n",str); free(str); return -1; } @@ -202,7 +205,7 @@ int fai_build(const char *fn) razf_close(rz); fp = fopen(str, "wb"); if (fp == 0) { - fprintf(stderr, "[fai_build] fail to write FASTA index.\n"); + fprintf(stderr, "[fai_build] fail to write FASTA index %s\n",str); fai_destroy(fai); free(str); return -1; } @@ -213,6 +216,47 @@ int fai_build(const char *fn) return 0; } +#ifdef _USE_KNETFILE +FILE *download_and_open(const char *fn) +{ + const int buf_size = 1 * 1024 * 1024; + uint8_t *buf; + FILE *fp; + knetFile *fp_remote; + const char *url = fn; + const char *p; + int l = strlen(fn); + for (p = fn + l - 1; p >= fn; --p) + if (*p == '/') break; + fn = p + 1; + + // First try to open a local copy + fp = fopen(fn, "r"); + if (fp) + return fp; + + // If failed, download from remote and open + fp_remote = knet_open(url, "rb"); + if (fp_remote == 0) { + fprintf(stderr, "[download_from_remote] fail to open remote file %s\n",url); + return NULL; + } + if ((fp = fopen(fn, "wb")) == 0) { + 
fprintf(stderr, "[download_from_remote] fail to create file in the working directory %s\n",fn); + knet_close(fp_remote); + return NULL; + } + buf = (uint8_t*)calloc(buf_size, 1); + while ((l = knet_read(fp_remote, buf, buf_size)) != 0) + fwrite(buf, 1, l, fp); + free(buf); + fclose(fp); + knet_close(fp_remote); + + return fopen(fn, "r"); +} +#endif + faidx_t *fai_load(const char *fn) { char *str; @@ -220,19 +264,35 @@ faidx_t *fai_load(const char *fn) faidx_t *fai; str = (char*)calloc(strlen(fn) + 5, 1); sprintf(str, "%s.fai", fn); - fp = fopen(str, "rb"); + +#ifdef _USE_KNETFILE + if (strstr(fn, "ftp://") == fn || strstr(fn, "http://") == fn) + { + fp = download_and_open(str); + if ( !fp ) + { + fprintf(stderr, "[fai_load] failed to open remote FASTA index %s\n", str); + free(str); + return 0; + } + } + else +#endif + fp = fopen(str, "rb"); if (fp == 0) { fprintf(stderr, "[fai_load] build FASTA index.\n"); fai_build(fn); - fp = fopen(str, "r"); + fp = fopen(str, "rb"); if (fp == 0) { fprintf(stderr, "[fai_load] fail to open FASTA index.\n"); free(str); return 0; } } + fai = fai_read(fp); fclose(fp); + fai->rz = razf_open(fn, "rb"); free(str); if (fai->rz == 0) { @@ -287,7 +347,7 @@ char *fai_fetch(const faidx_t *fai, const char *str, int *len) l = 0; s = (char*)malloc(end - beg + 2); razf_seek(fai->rz, val.offset + beg / val.line_blen * val.line_len + beg % val.line_blen, SEEK_SET); - while (razf_read(fai->rz, &c, 1) == 1 && l < end - beg) + while (razf_read(fai->rz, &c, 1) == 1 && l < end - beg && !fai->rz->z_err) if (isgraph(c)) s[l++] = c; s[l] = '\0'; *len = l; @@ -323,6 +383,40 @@ int faidx_main(int argc, char *argv[]) return 0; } +int faidx_fetch_nseq(const faidx_t *fai) +{ + return fai->n; +} + +char *faidx_fetch_seq(const faidx_t *fai, char *c_name, int p_beg_i, int p_end_i, int *len) +{ + int l; + char c; + khiter_t iter; + faidx1_t val; + char *seq=NULL; + + // Adjust position + iter = kh_get(s, fai->hash, c_name); + if(iter == kh_end(fai->hash)) return 
0; + val = kh_value(fai->hash, iter); + if(p_end_i < p_beg_i) p_beg_i = p_end_i; + if(p_beg_i < 0) p_beg_i = 0; + else if(val.len <= p_beg_i) p_beg_i = val.len - 1; + if(p_end_i < 0) p_end_i = 0; + else if(val.len <= p_end_i) p_end_i = val.len - 1; + + // Now retrieve the sequence + l = 0; + seq = (char*)malloc(p_end_i - p_beg_i + 2); + razf_seek(fai->rz, val.offset + p_beg_i / val.line_blen * val.line_len + p_beg_i % val.line_blen, SEEK_SET); + while (razf_read(fai->rz, &c, 1) == 1 && l < p_end_i - p_beg_i + 1) + if (isgraph(c)) seq[l++] = c; + seq[l] = '\0'; + *len = l; + return seq; +} + #ifdef FAIDX_MAIN int main(int argc, char *argv[]) { return faidx_main(argc, argv); } #endif diff --git a/faidx.h b/faidx.h index 1a52fb7..1fb1b1f 100644 --- a/faidx.h +++ b/faidx.h @@ -75,6 +75,27 @@ extern "C" { */ char *fai_fetch(const faidx_t *fai, const char *reg, int *len); + /*! + @abstract Fetch the number of sequences. + @param fai Pointer to the faidx_t struct + @return The number of sequences + */ + int faidx_fetch_nseq(const faidx_t *fai); + + /*! + @abstract Fetch the sequence in a region. + @param fai Pointer to the faidx_t struct + @param c_name Region name + @param p_beg_i Beginning position number (zero-based) + @param p_end_i End position number (zero-based) + @param len Length of the region + @return Pointer to the sequence; null on failure + + @discussion The returned sequence is allocated by malloc family + and should be destroyed by end users by calling free() on it. 
+ */ + char *faidx_fetch_seq(const faidx_t *fai, char *c_name, int p_beg_i, int p_end_i, int *len); + #ifdef __cplusplus } #endif diff --git a/kaln.c b/kaln.c new file mode 100644 index 0000000..9fa40d0 --- /dev/null +++ b/kaln.c @@ -0,0 +1,370 @@ +/* The MIT License + + Copyright (c) 2003-2006, 2008, 2009, by Heng Li + + Permission is hereby granted, free of charge, to any person obtaining + a copy of this software and associated documentation files (the + "Software"), to deal in the Software without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of the Software, and to + permit persons to whom the Software is furnished to do so, subject to + the following conditions: + + The above copyright notice and this permission notice shall be + included in all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE. 
+*/ + +#include +#include +#include +#include +#include "kaln.h" + +#define FROM_M 0 +#define FROM_I 1 +#define FROM_D 2 + +typedef struct { + int i, j; + unsigned char ctype; +} path_t; + +int aln_sm_blosum62[] = { +/* A R N D C Q E G H I L K M F P S T W Y V * X */ + 4,-1,-2,-2, 0,-1,-1, 0,-2,-1,-1,-1,-1,-2,-1, 1, 0,-3,-2, 0,-4, 0, + -1, 5, 0,-2,-3, 1, 0,-2, 0,-3,-2, 2,-1,-3,-2,-1,-1,-3,-2,-3,-4,-1, + -2, 0, 6, 1,-3, 0, 0, 0, 1,-3,-3, 0,-2,-3,-2, 1, 0,-4,-2,-3,-4,-1, + -2,-2, 1, 6,-3, 0, 2,-1,-1,-3,-4,-1,-3,-3,-1, 0,-1,-4,-3,-3,-4,-1, + 0,-3,-3,-3, 9,-3,-4,-3,-3,-1,-1,-3,-1,-2,-3,-1,-1,-2,-2,-1,-4,-2, + -1, 1, 0, 0,-3, 5, 2,-2, 0,-3,-2, 1, 0,-3,-1, 0,-1,-2,-1,-2,-4,-1, + -1, 0, 0, 2,-4, 2, 5,-2, 0,-3,-3, 1,-2,-3,-1, 0,-1,-3,-2,-2,-4,-1, + 0,-2, 0,-1,-3,-2,-2, 6,-2,-4,-4,-2,-3,-3,-2, 0,-2,-2,-3,-3,-4,-1, + -2, 0, 1,-1,-3, 0, 0,-2, 8,-3,-3,-1,-2,-1,-2,-1,-2,-2, 2,-3,-4,-1, + -1,-3,-3,-3,-1,-3,-3,-4,-3, 4, 2,-3, 1, 0,-3,-2,-1,-3,-1, 3,-4,-1, + -1,-2,-3,-4,-1,-2,-3,-4,-3, 2, 4,-2, 2, 0,-3,-2,-1,-2,-1, 1,-4,-1, + -1, 2, 0,-1,-3, 1, 1,-2,-1,-3,-2, 5,-1,-3,-1, 0,-1,-3,-2,-2,-4,-1, + -1,-1,-2,-3,-1, 0,-2,-3,-2, 1, 2,-1, 5, 0,-2,-1,-1,-1,-1, 1,-4,-1, + -2,-3,-3,-3,-2,-3,-3,-3,-1, 0, 0,-3, 0, 6,-4,-2,-2, 1, 3,-1,-4,-1, + -1,-2,-2,-1,-3,-1,-1,-2,-2,-3,-3,-1,-2,-4, 7,-1,-1,-4,-3,-2,-4,-2, + 1,-1, 1, 0,-1, 0, 0, 0,-1,-2,-2, 0,-1,-2,-1, 4, 1,-3,-2,-2,-4, 0, + 0,-1, 0,-1,-1,-1,-1,-2,-2,-1,-1,-1,-1,-2,-1, 1, 5,-2,-2, 0,-4, 0, + -3,-3,-4,-4,-2,-2,-3,-2,-2,-3,-2,-3,-1, 1,-4,-3,-2,11, 2,-3,-4,-2, + -2,-2,-2,-3,-2,-1,-2,-3, 2,-1,-1,-2,-1, 3,-3,-2,-2, 2, 7,-1,-4,-1, + 0,-3,-3,-3,-1,-2,-2,-3,-3, 3, 1,-2, 1,-1,-2,-2, 0,-3,-1, 4,-4,-1, + -4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4, 1,-4, + 0,-1,-1,-1,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-2, 0, 0,-2,-1,-1,-4,-1 +}; + +int aln_sm_blast[] = { + 1, -3, -3, -3, -2, + -3, 1, -3, -3, -2, + -3, -3, 1, -3, -2, + -3, -3, -3, 1, -2, + -2, -2, -2, -2, -2 +}; + +ka_param_t ka_param_blast = { 5, 2, 2, aln_sm_blast, 5, 50 }; +ka_param_t 
ka_param_aa2aa = { 10, 2, 2, aln_sm_blosum62, 22, 50 }; + +static uint32_t *ka_path2cigar32(const path_t *path, int path_len, int *n_cigar) +{ + int i, n; + uint32_t *cigar; + unsigned char last_type; + + if (path_len == 0 || path == 0) { + *n_cigar = 0; + return 0; + } + + last_type = path->ctype; + for (i = n = 1; i < path_len; ++i) { + if (last_type != path[i].ctype) ++n; + last_type = path[i].ctype; + } + *n_cigar = n; + cigar = (uint32_t*)calloc(*n_cigar, 4); + + cigar[0] = 1u << 4 | path[path_len-1].ctype; + last_type = path[path_len-1].ctype; + for (i = path_len - 2, n = 0; i >= 0; --i) { + if (path[i].ctype == last_type) cigar[n] += 1u << 4; + else { + cigar[++n] = 1u << 4 | path[i].ctype; + last_type = path[i].ctype; + } + } + + return cigar; +} + +/***************************/ +/* START OF common_align.c */ +/***************************/ + +#define SET_INF(s) (s).M = (s).I = (s).D = MINOR_INF; + +#define set_M(MM, cur, p, sc) \ +{ \ + if ((p)->M >= (p)->I) { \ + if ((p)->M >= (p)->D) { \ + (MM) = (p)->M + (sc); (cur)->Mt = FROM_M; \ + } else { \ + (MM) = (p)->D + (sc); (cur)->Mt = FROM_D; \ + } \ + } else { \ + if ((p)->I > (p)->D) { \ + (MM) = (p)->I + (sc); (cur)->Mt = FROM_I; \ + } else { \ + (MM) = (p)->D + (sc); (cur)->Mt = FROM_D; \ + } \ + } \ +} +#define set_I(II, cur, p) \ +{ \ + if ((p)->M - gap_open > (p)->I) { \ + (cur)->It = FROM_M; \ + (II) = (p)->M - gap_open - gap_ext; \ + } else { \ + (cur)->It = FROM_I; \ + (II) = (p)->I - gap_ext; \ + } \ +} +#define set_end_I(II, cur, p) \ +{ \ + if (gap_end >= 0) { \ + if ((p)->M - gap_open > (p)->I) { \ + (cur)->It = FROM_M; \ + (II) = (p)->M - gap_open - gap_end; \ + } else { \ + (cur)->It = FROM_I; \ + (II) = (p)->I - gap_end; \ + } \ + } else set_I(II, cur, p); \ +} +#define set_D(DD, cur, p) \ +{ \ + if ((p)->M - gap_open > (p)->D) { \ + (cur)->Dt = FROM_M; \ + (DD) = (p)->M - gap_open - gap_ext; \ + } else { \ + (cur)->Dt = FROM_D; \ + (DD) = (p)->D - gap_ext; \ + } \ +} +#define set_end_D(DD, 
cur, p) \ +{ \ + if (gap_end >= 0) { \ + if ((p)->M - gap_open > (p)->D) { \ + (cur)->Dt = FROM_M; \ + (DD) = (p)->M - gap_open - gap_end; \ + } else { \ + (cur)->Dt = FROM_D; \ + (DD) = (p)->D - gap_end; \ + } \ + } else set_D(DD, cur, p); \ +} + +typedef struct { + uint8_t Mt:3, It:2, Dt:2; +} dpcell_t; + +typedef struct { + int M, I, D; +} dpscore_t; + +/*************************** + * banded global alignment * + ***************************/ +uint32_t *ka_global_core(uint8_t *seq1, int len1, uint8_t *seq2, int len2, const ka_param_t *ap, int *_score, int *n_cigar) +{ + int i, j; + dpcell_t **dpcell, *q; + dpscore_t *curr, *last, *s; + int b1, b2, tmp_end; + int *mat, end, max = 0; + uint8_t type, ctype; + uint32_t *cigar = 0; + + int gap_open, gap_ext, gap_end, b; + int *score_matrix, N_MATRIX_ROW; + + /* initialize some align-related parameters. just for compatibility */ + gap_open = ap->gap_open; + gap_ext = ap->gap_ext; + gap_end = ap->gap_end; + b = ap->band_width; + score_matrix = ap->matrix; + N_MATRIX_ROW = ap->row; + + *n_cigar = 0; + if (len1 == 0 || len2 == 0) return 0; + + /* calculate b1 and b2 */ + if (len1 > len2) { + b1 = len1 - len2 + b; + b2 = b; + } else { + b1 = b; + b2 = len2 - len1 + b; + } + if (b1 > len1) b1 = len1; + if (b2 > len2) b2 = len2; + --seq1; --seq2; + + /* allocate memory */ + end = (b1 + b2 <= len1)? (b1 + b2 + 1) : (len1 + 1); + dpcell = (dpcell_t**)malloc(sizeof(dpcell_t*) * (len2 + 1)); + for (j = 0; j <= len2; ++j) + dpcell[j] = (dpcell_t*)malloc(sizeof(dpcell_t) * end); + for (j = b2 + 1; j <= len2; ++j) + dpcell[j] -= j - b2; + curr = (dpscore_t*)malloc(sizeof(dpscore_t) * (len1 + 1)); + last = (dpscore_t*)malloc(sizeof(dpscore_t) * (len1 + 1)); + + /* set first row */ + SET_INF(*curr); curr->M = 0; + for (i = 1, s = curr + 1; i < b1; ++i, ++s) { + SET_INF(*s); + set_end_D(s->D, dpcell[0] + i, s - 1); + } + s = curr; curr = last; last = s; + + /* core dynamic programming, part 1 */ + tmp_end = (b2 < len2)? 
b2 : len2 - 1; + for (j = 1; j <= tmp_end; ++j) { + q = dpcell[j]; s = curr; SET_INF(*s); + set_end_I(s->I, q, last); + end = (j + b1 <= len1 + 1)? (j + b1 - 1) : len1; + mat = score_matrix + seq2[j] * N_MATRIX_ROW; + ++s; ++q; + for (i = 1; i != end; ++i, ++s, ++q) { + set_M(s->M, q, last + i - 1, mat[seq1[i]]); /* this will change s->M ! */ + set_I(s->I, q, last + i); + set_D(s->D, q, s - 1); + } + set_M(s->M, q, last + i - 1, mat[seq1[i]]); + set_D(s->D, q, s - 1); + if (j + b1 - 1 > len1) { /* bug fixed, 040227 */ + set_end_I(s->I, q, last + i); + } else s->I = MINOR_INF; + s = curr; curr = last; last = s; + } + /* last row for part 1, use set_end_D() instead of set_D() */ + if (j == len2 && b2 != len2 - 1) { + q = dpcell[j]; s = curr; SET_INF(*s); + set_end_I(s->I, q, last); + end = (j + b1 <= len1 + 1)? (j + b1 - 1) : len1; + mat = score_matrix + seq2[j] * N_MATRIX_ROW; + ++s; ++q; + for (i = 1; i != end; ++i, ++s, ++q) { + set_M(s->M, q, last + i - 1, mat[seq1[i]]); /* this will change s->M ! 
*/ + set_I(s->I, q, last + i); + set_end_D(s->D, q, s - 1); + } + set_M(s->M, q, last + i - 1, mat[seq1[i]]); + set_end_D(s->D, q, s - 1); + if (j + b1 - 1 > len1) { /* bug fixed, 040227 */ + set_end_I(s->I, q, last + i); + } else s->I = MINOR_INF; + s = curr; curr = last; last = s; + ++j; + } + + /* core dynamic programming, part 2 */ + for (; j <= len2 - b2 + 1; ++j) { + SET_INF(curr[j - b2]); + mat = score_matrix + seq2[j] * N_MATRIX_ROW; + end = j + b1 - 1; + for (i = j - b2 + 1, q = dpcell[j] + i, s = curr + i; i != end; ++i, ++s, ++q) { + set_M(s->M, q, last + i - 1, mat[seq1[i]]); + set_I(s->I, q, last + i); + set_D(s->D, q, s - 1); + } + set_M(s->M, q, last + i - 1, mat[seq1[i]]); + set_D(s->D, q, s - 1); + s->I = MINOR_INF; + s = curr; curr = last; last = s; + } + + /* core dynamic programming, part 3 */ + for (; j < len2; ++j) { + SET_INF(curr[j - b2]); + mat = score_matrix + seq2[j] * N_MATRIX_ROW; + for (i = j - b2 + 1, q = dpcell[j] + i, s = curr + i; i < len1; ++i, ++s, ++q) { + set_M(s->M, q, last + i - 1, mat[seq1[i]]); + set_I(s->I, q, last + i); + set_D(s->D, q, s - 1); + } + set_M(s->M, q, last + len1 - 1, mat[seq1[i]]); + set_end_I(s->I, q, last + i); + set_D(s->D, q, s - 1); + s = curr; curr = last; last = s; + } + /* last row */ + if (j == len2) { + SET_INF(curr[j - b2]); + mat = score_matrix + seq2[j] * N_MATRIX_ROW; + for (i = j - b2 + 1, q = dpcell[j] + i, s = curr + i; i < len1; ++i, ++s, ++q) { + set_M(s->M, q, last + i - 1, mat[seq1[i]]); + set_I(s->I, q, last + i); + set_end_D(s->D, q, s - 1); + } + set_M(s->M, q, last + len1 - 1, mat[seq1[i]]); + set_end_I(s->I, q, last + i); + set_end_D(s->D, q, s - 1); + s = curr; curr = last; last = s; + } + + *_score = last[len1].M; + if (n_cigar) { /* backtrace */ + path_t *p, *path = (path_t*)malloc(sizeof(path_t) * (len1 + len2 + 2)); + i = len1; j = len2; + q = dpcell[j] + i; + s = last + len1; + max = s->M; type = q->Mt; ctype = FROM_M; + if (s->I > max) { max = s->I; type = q->It; ctype = 
FROM_I; } + if (s->D > max) { max = s->D; type = q->Dt; ctype = FROM_D; } + + p = path; + p->ctype = ctype; p->i = i; p->j = j; /* bug fixed 040408 */ + ++p; + do { + switch (ctype) { + case FROM_M: --i; --j; break; + case FROM_I: --j; break; + case FROM_D: --i; break; + } + q = dpcell[j] + i; + ctype = type; + switch (type) { + case FROM_M: type = q->Mt; break; + case FROM_I: type = q->It; break; + case FROM_D: type = q->Dt; break; + } + p->ctype = ctype; p->i = i; p->j = j; + ++p; + } while (i || j); + cigar = ka_path2cigar32(path, p - path - 1, n_cigar); + free(path); + } + + /* free memory */ + for (j = b2 + 1; j <= len2; ++j) + dpcell[j] += j - b2; + for (j = 0; j <= len2; ++j) + free(dpcell[j]); + free(dpcell); + free(curr); free(last); + + return cigar; +} diff --git a/kaln.h b/kaln.h new file mode 100644 index 0000000..b04d8cc --- /dev/null +++ b/kaln.h @@ -0,0 +1,55 @@ +/* The MIT License + + Copyright (c) 2003-2006, 2008, 2009 by Heng Li + + Permission is hereby granted, free of charge, to any person obtaining + a copy of this software and associated documentation files (the + "Software"), to deal in the Software without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of the Software, and to + permit persons to whom the Software is furnished to do so, subject to + the following conditions: + + The above copyright notice and this permission notice shall be + included in all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + SOFTWARE. +*/ + +#ifndef LH3_KALN_H_ +#define LH3_KALN_H_ + +#include + +#define MINOR_INF -1073741823 + +typedef struct { + int gap_open; + int gap_ext; + int gap_end; + + int *matrix; + int row; + int band_width; +} ka_param_t; + +#ifdef __cplusplus +extern "C" { +#endif + + uint32_t *ka_global_core(uint8_t *seq1, int len1, uint8_t *seq2, int len2, const ka_param_t *ap, int *_score, int *n_cigar); + +#ifdef __cplusplus +} +#endif + +extern ka_param_t ka_param_blast; /* = { 5, 2, 2, aln_sm_blast, 5, 50 }; */ + +#endif diff --git a/klist.h b/klist.h new file mode 100644 index 0000000..2f17016 --- /dev/null +++ b/klist.h @@ -0,0 +1,96 @@ +#ifndef _LH3_KLIST_H +#define _LH3_KLIST_H + +#include + +#define KMEMPOOL_INIT(name, kmptype_t, kmpfree_f) \ + typedef struct { \ + size_t cnt, n, max; \ + kmptype_t **buf; \ + } kmp_##name##_t; \ + static inline kmp_##name##_t *kmp_init_##name() { \ + return calloc(1, sizeof(kmp_##name##_t)); \ + } \ + static inline void kmp_destroy_##name(kmp_##name##_t *mp) { \ + size_t k; \ + for (k = 0; k < mp->n; ++k) { \ + kmpfree_f(mp->buf[k]); free(mp->buf[k]); \ + } \ + free(mp->buf); free(mp); \ + } \ + static inline kmptype_t *kmp_alloc_##name(kmp_##name##_t *mp) { \ + ++mp->cnt; \ + if (mp->n == 0) return calloc(1, sizeof(kmptype_t)); \ + return mp->buf[--mp->n]; \ + } \ + static inline void kmp_free_##name(kmp_##name##_t *mp, kmptype_t *p) { \ + --mp->cnt; \ + if (mp->n == mp->max) { \ + mp->max = mp->max? 
mp->max<<1 : 16; \ + mp->buf = realloc(mp->buf, sizeof(void*) * mp->max); \ + } \ + mp->buf[mp->n++] = p; \ + } + +#define kmempool_t(name) kmp_##name##_t +#define kmp_init(name) kmp_init_##name() +#define kmp_destroy(name, mp) kmp_destroy_##name(mp) +#define kmp_alloc(name, mp) kmp_alloc_##name(mp) +#define kmp_free(name, mp, p) kmp_free_##name(mp, p) + +#define KLIST_INIT(name, kltype_t, kmpfree_t) \ + struct __kl1_##name { \ + kltype_t data; \ + struct __kl1_##name *next; \ + }; \ + typedef struct __kl1_##name kl1_##name; \ + KMEMPOOL_INIT(name, kl1_##name, kmpfree_t) \ + typedef struct { \ + kl1_##name *head, *tail; \ + kmp_##name##_t *mp; \ + size_t size; \ + } kl_##name##_t; \ + static inline kl_##name##_t *kl_init_##name() { \ + kl_##name##_t *kl = calloc(1, sizeof(kl_##name##_t)); \ + kl->mp = kmp_init(name); \ + kl->head = kl->tail = kmp_alloc(name, kl->mp); \ + kl->head->next = 0; \ + return kl; \ + } \ + static inline void kl_destroy_##name(kl_##name##_t *kl) { \ + kl1_##name *p; \ + for (p = kl->head; p != kl->tail; p = p->next) \ + kmp_free(name, kl->mp, p); \ + kmp_free(name, kl->mp, p); \ + kmp_destroy(name, kl->mp); \ + free(kl); \ + } \ + static inline kltype_t *kl_pushp_##name(kl_##name##_t *kl) { \ + kl1_##name *q, *p = kmp_alloc(name, kl->mp); \ + q = kl->tail; p->next = 0; kl->tail->next = p; kl->tail = p; \ + ++kl->size; \ + return &q->data; \ + } \ + static inline int kl_shift_##name(kl_##name##_t *kl, kltype_t *d) { \ + kl1_##name *p; \ + if (kl->head->next == 0) return -1; \ + --kl->size; \ + p = kl->head; kl->head = kl->head->next; \ + if (d) *d = p->data; \ + kmp_free(name, kl->mp, p); \ + return 0; \ + } + +#define kliter_t(name) kl1_##name +#define klist_t(name) kl_##name##_t +#define kl_val(iter) ((iter)->data) +#define kl_next(iter) ((iter)->next) +#define kl_begin(kl) ((kl)->head) +#define kl_end(kl) ((kl)->tail) + +#define kl_init(name) kl_init_##name() +#define kl_destroy(name, kl) kl_destroy_##name(kl) +#define kl_pushp(name, kl) 
kl_pushp_##name(kl) +#define kl_shift(name, kl, d) kl_shift_##name(kl, d) + +#endif diff --git a/knetfile.c b/knetfile.c index e110aa7..994babb 100644 --- a/knetfile.c +++ b/knetfile.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include @@ -70,7 +71,14 @@ static int socket_wait(int fd, int is_read) if (is_read) fdr = &fds; else fdw = &fds; ret = select(fd+1, fdr, fdw, 0, &tv); +#ifndef _WIN32 if (ret == -1) perror("select"); +#else + if (ret == 0) + fprintf(stderr, "select time-out\n"); + else if (ret == SOCKET_ERROR) + fprintf(stderr, "select: %d\n", WSAGetLastError()); +#endif return ret; } @@ -103,16 +111,28 @@ static int socket_connect(const char *host, const char *port) } #else /* MinGW's printf has problem with "%lld" */ -char *uint64tostr(char *buf, uint64_t x) +char *int64tostr(char *buf, int64_t x) { - int i, cnt; - for (i = 0; x; x /= 10) buf[i++] = '0' + x%10; + int cnt; + int i = 0; + do { + buf[i++] = '0' + x % 10; + x /= 10; + } while (x); buf[i] = 0; for (cnt = i, i = 0; i < cnt/2; ++i) { int c = buf[i]; buf[i] = buf[cnt-i-1]; buf[cnt-i-1] = c; } return buf; } + +int64_t strtoint64(const char *buf) +{ + int64_t x; + for (x = 0; *buf != '\0'; ++buf) + x = x * 10 + ((int64_t) *buf - 48); + return x; +} /* In windows, the first thing is to establish the TCP connection. */ int knet_win32_init() { @@ -129,7 +149,11 @@ void knet_win32_destroy() * non-Windows OS, I do not use this one. 
*/ static SOCKET socket_connect(const char *host, const char *port) { -#define __err_connect(func) do { perror(func); return -1; } while (0) +#define __err_connect(func) \ + do { \ + fprintf(stderr, "%s: %d\n", func, WSAGetLastError()); \ + return -1; \ + } while (0) int on = 1; SOCKET fd; @@ -182,7 +206,11 @@ static off_t my_netread(int fd, void *buf, off_t len) static int kftp_get_response(knetFile *ftp) { +#ifndef _WIN32 unsigned char c; +#else + char c; +#endif int n = 0; char *p; if (socket_wait(ftp->ctrl_fd, 1) <= 0) return 0; @@ -259,10 +287,11 @@ int kftp_reconnect(knetFile *ftp) ftp->ctrl_fd = -1; } netclose(ftp->fd); + ftp->fd = -1; return kftp_connect(ftp); } -// initialize ->type, ->host and ->retr +// initialize ->type, ->host, ->retr and ->size knetFile *kftp_parse_url(const char *fn, const char *mode) { knetFile *fp; @@ -283,25 +312,42 @@ knetFile *kftp_parse_url(const char *fn, const char *mode) strncpy(fp->host, fn + 6, l); fp->retr = calloc(strlen(p) + 8, 1); sprintf(fp->retr, "RETR %s\r\n", p); - fp->seek_offset = -1; + fp->size_cmd = calloc(strlen(p) + 8, 1); + sprintf(fp->size_cmd, "SIZE %s\r\n", p); + fp->seek_offset = 0; return fp; } // place ->fd at offset off int kftp_connect_file(knetFile *fp) { int ret; + long long file_size; if (fp->fd != -1) { netclose(fp->fd); if (fp->no_reconnect) kftp_get_response(fp); } kftp_pasv_prep(fp); - if (fp->offset) { + kftp_send_cmd(fp, fp->size_cmd, 1); +#ifndef _WIN32 + if ( sscanf(fp->response,"%*d %lld", &file_size) != 1 ) + { + fprintf(stderr,"[kftp_connect_file] %s\n", fp->response); + return -1; + } +#else + const char *p = fp->response; + while (*p != ' ') ++p; + while (*p < '0' || *p > '9') ++p; + file_size = strtoint64(p); +#endif + fp->file_size = file_size; + if (fp->offset>=0) { char tmp[32]; #ifndef _WIN32 sprintf(tmp, "REST %lld\r\n", (long long)fp->offset); #else strcpy(tmp, "REST "); - uint64tostr(tmp + 5, fp->offset); + int64tostr(tmp + 5, fp->offset); strcat(tmp, "\r\n"); #endif 
kftp_send_cmd(fp, tmp, 1); @@ -319,6 +365,7 @@ int kftp_connect_file(knetFile *fp) return 0; } + /************************** * HTTP specific routines * **************************/ @@ -354,7 +401,7 @@ knetFile *khttp_parse_url(const char *fn, const char *mode) } fp->type = KNF_TYPE_HTTP; fp->ctrl_fd = fp->fd = -1; - fp->seek_offset = -1; + fp->seek_offset = 0; return fp; } @@ -366,8 +413,7 @@ int khttp_connect_file(knetFile *fp) fp->fd = socket_connect(fp->host, fp->port); buf = calloc(0x10000, 1); // FIXME: I am lazy... But in principle, 64KB should be large enough. l += sprintf(buf + l, "GET %s HTTP/1.0\r\nHost: %s\r\n", fp->path, fp->http_host); - if (fp->offset) - l += sprintf(buf + l, "Range: bytes=%lld-\r\n", (long long)fp->offset); + l += sprintf(buf + l, "Range: bytes=%lld-\r\n", (long long)fp->offset); l += sprintf(buf + l, "\r\n"); netwrite(fp->fd, buf, l); l = 0; @@ -383,7 +429,7 @@ int khttp_connect_file(knetFile *fp) return -1; } ret = strtol(buf + 8, &p, 0); // HTTP return code - if (ret == 200 && fp->offset) { // 200 (complete result); then skip beginning of the file + if (ret == 200 && fp->offset>0) { // 200 (complete result); then skip beginning of the file off_t rest = fp->offset; while (rest) { off_t l = rest < 0x10000? rest : 0x10000; @@ -482,7 +528,7 @@ off_t knet_read(knetFile *fp, void *buf, off_t len) return l; } -int knet_seek(knetFile *fp, off_t off, int whence) +off_t knet_seek(knetFile *fp, int64_t off, int whence) { if (whence == SEEK_SET && off == fp->offset) return 0; if (fp->type == KNF_TYPE_LOCAL) { @@ -490,20 +536,40 @@ int knet_seek(knetFile *fp, off_t off, int whence) * while fseek() returns zero on success. 
*/ off_t offset = lseek(fp->fd, off, whence); if (offset == -1) { - perror("lseek"); + // Be silent, it is OK for knet_seek to fail when the file is streamed + // fprintf(stderr,"[knet_seek] %s\n", strerror(errno)); return -1; } fp->offset = offset; return 0; - } else if (fp->type == KNF_TYPE_FTP || fp->type == KNF_TYPE_HTTP) { - if (whence != SEEK_SET) { // FIXME: we can surely allow SEEK_CUR and SEEK_END in future - fprintf(stderr, "[knet_seek] only SEEK_SET is supported for FTP/HTTP. Offset is unchanged.\n"); + } + else if (fp->type == KNF_TYPE_FTP) + { + if (whence==SEEK_CUR) + fp->offset += off; + else if (whence==SEEK_SET) + fp->offset = off; + else if ( whence==SEEK_END) + fp->offset = fp->file_size+off; + fp->is_ready = 0; + return 0; + } + else if (fp->type == KNF_TYPE_HTTP) + { + if (whence == SEEK_END) { // FIXME: can we allow SEEK_END in future? + fprintf(stderr, "[knet_seek] SEEK_END is not supported for HTTP. Offset is unchanged.\n"); + errno = ESPIPE; return -1; } - fp->offset = off; + if (whence==SEEK_CUR) + fp->offset += off; + else if (whence==SEEK_SET) + fp->offset = off; fp->is_ready = 0; - return 0; + return fp->offset; } + errno = EINVAL; + fprintf(stderr,"[knet_seek] %s\n", strerror(errno)); return -1; } diff --git a/knetfile.h b/knetfile.h index 9021b93..0a0e66f 100644 --- a/knetfile.h +++ b/knetfile.h @@ -9,7 +9,7 @@ #define netwrite(fd, ptr, len) write(fd, ptr, len) #define netclose(fd) close(fd) #else -#include +#include #define netread(fd, ptr, len) recv(fd, ptr, len, 0) #define netwrite(fd, ptr, len) send(fd, ptr, len, 0) #define netclose(fd) closesocket(fd) @@ -28,8 +28,9 @@ typedef struct knetFile_s { // the following are for FTP only int ctrl_fd, pasv_ip[4], pasv_port, max_response, no_reconnect, is_ready; - char *response, *retr; + char *response, *retr, *size_cmd; int64_t seek_offset; // for lazy seek + int64_t file_size; // the following are for HTTP only char *path, *http_host; @@ -64,7 +65,7 @@ extern "C" { This routine only 
sets ->offset and ->is_ready=0. It does not communicate with the FTP server. */ - int knet_seek(knetFile *fp, off_t off, int whence); + off_t knet_seek(knetFile *fp, int64_t off, int whence); int knet_close(knetFile *fp); #ifdef __cplusplus diff --git a/misc/novo2sam.pl b/misc/novo2sam.pl index 3d3436c..8b53c9e 100755 --- a/misc/novo2sam.pl +++ b/misc/novo2sam.pl @@ -149,7 +149,7 @@ sub mdtag { $q+=1; } if ($indeltype eq "-") { - my $deletedbase = $2 if $string =~ /(\d+)\-([A-Z]+)/; + my $deletedbase = $2 if $string =~ /(\d+)\-([A-Za-z]+)/; if ($deleteflag == 0 ) { $mdtag.="^"; } @@ -172,12 +172,12 @@ sub indeltype { my $string = shift; my $insert=""; my $indeltype; - if ($string =~ /([A-Z]+)\>/) { + if ($string =~ /([A-Za-z]+)\>/) { $indeltype=">"; $insert=$1; } elsif ($string =~ /\-/) { $indeltype="-"; - } elsif ($string =~ /\+([A-Z]+)/) { + } elsif ($string =~ /\+([A-Za-z]+)/) { $indeltype="+"; $insert=$1; } @@ -204,10 +204,10 @@ sub cigar_method { next if $string =~ />/; my $pos = $1 if $string =~ /^(\d+)/; - if ($string =~ /\+([A-Z]+)/) { + if ($string =~ /\+([A-Za-z]+)/) { $indeltype="+"; $insert = $1; - }elsif ($string =~ /\-([A-Z]+)/) { + }elsif ($string =~ /\-([A-Za-z]+)/) { $indeltype="-"; $insert = $1; } diff --git a/misc/sam2vcf.pl b/misc/sam2vcf.pl new file mode 100755 index 0000000..ede7bd8 --- /dev/null +++ b/misc/sam2vcf.pl @@ -0,0 +1,217 @@ +#!/usr/bin/perl -w +# +# VCF specs: http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcfv3.2 + +# Contact: pd3@sanger +# Version: 2009-10-08 + +use strict; +use warnings; +use Carp; + +my $opts = parse_params(); +do_pileup_to_vcf($opts); + +exit; + +#--------------- + +sub error +{ + my (@msg) = @_; + if ( scalar @msg ) { croak(@msg); } + die + "Usage: sam2vcf.pl [OPTIONS] < in.pileup > out.vcf\n", + "Options:\n", + " -r, -refseq The reference sequence, required when indels are present.\n", + " -h, -?, --help This help message.\n", + "\n"; +} + + +sub parse_params +{ + my %opts = (); + + 
$opts{fh_in} = *STDIN; + $opts{fh_out} = *STDOUT; + + while (my $arg=shift(@ARGV)) + { + if ( $arg eq '-r' || $arg eq '--refseq' ) { $opts{refseq}=shift(@ARGV); next; } + if ( $arg eq '-?' || $arg eq '-h' || $arg eq '--help' ) { error(); } + + error("Unknown parameter \"$arg\". Run -h for help.\n"); + } + return \%opts; +} + +sub iupac_to_gtype +{ + my ($ref,$base) = @_; + my %iupac = ( + 'K' => ['G','T'], + 'M' => ['A','C'], + 'S' => ['C','G'], + 'R' => ['A','G'], + 'W' => ['A','T'], + 'Y' => ['C','T'], + ); + if ( !exists($iupac{$base}) ) + { + if ( $ref eq $base ) { return ('.','0|0'); } + return ($base,'1|1'); + } + my $gt = $iupac{$base}; + if ( $$gt[0] eq $ref ) { return ($$gt[1],'0|1'); } + elsif ( $$gt[1] eq $ref ) { return ($$gt[0],'0|1'); } + return ("$$gt[0],$$gt[1]",'1|2'); +} + + +sub parse_indel +{ + my ($cons) = @_; + if ( $cons=~/^-/ ) + { + my $len = length($'); + return "D$len"; + } + elsif ( $cons=~/^\+/ ) { return "I$'"; } + elsif ( $cons eq '*' ) { return undef; } + error("FIXME: could not parse [$cons]\n"); +} + + +# An example of the pileup format: +# 1 3000011 C C 32 0 98 1 ^~, A +# 1 3002155 * +T/+T 53 119 52 5 +T * 4 1 0 +# 1 3003094 * -TT/-TT 31 164 60 11 -TT * 5 6 0 +# 1 3073986 * */-AAAAAAAAAAAAAA 3 3 45 9 * -AAAAAAAAAAAAAA 7 2 0 +# +sub do_pileup_to_vcf +{ + my ($opts) = @_; + + my $fh_in = $$opts{fh_in}; + my $fh_out = $$opts{fh_out}; + my ($prev_chr,$prev_pos,$prev_ref); + my $refseq; + + while (my $line=<$fh_in>) + { + chomp($line); + my ($chr,$pos,$ref,$cons,$cons_qual,$snp_qual,$rms_qual,$depth,@items) = split(/\t/,$line); + + my ($alt,$gt); + if ( $ref eq '*' ) + { + # An indel is involved. 
+ if ($chr ne $prev_chr || $pos ne $prev_pos) + { + if ( !$$opts{refseq} ) { error("Cannot do indels without the reference.\n"); } + if ( !$refseq ) { $refseq = Fasta->new(file=>$$opts{refseq}); } + $ref = $refseq->get_base($chr,$pos); + } + else { $ref = $prev_ref; } + + # One of the alleles can be a reference and it can come in arbitrary order + my ($al1,$al2) = split(m{/},$cons); + my $alt1 = parse_indel($al1); + my $alt2 = parse_indel($al2); + if ( !$alt1 && !$alt2 ) { error("FIXME: could not parse indel:\n", $line); } + if ( $alt1 && $alt2 && $alt1 eq $alt2 ) { $alt2=''; } + if ( !$alt1 ) + { + $alt=$alt2; + $gt='0|1'; + } + elsif ( !$alt2 ) + { + $alt=$alt1; + $gt='0|1'; + } + else + { + $alt="$alt1,$alt2"; + $gt='1|2'; + } + } + else + { + # SNP + ($alt,$gt) = iupac_to_gtype($ref,$cons); + } + + print $fh_out "$chr\t$pos\t.\t$ref\t$alt\t$snp_qual\t0\t\tGT:GQ:DP\t$gt:$cons_qual:$depth\n"; + + $prev_ref = $ref; + $prev_pos = $pos; + $prev_chr = $chr; + } +} + + +#------------- Fasta -------------------- +# +# Uses samtools to get a requested base from a fasta file. For efficiency, preloads +# a chunk to memory. The size of the cached sequence can be controlled by the 'size' +# parameter. +# +package Fasta; + +use strict; +use warnings; +use Carp; + +sub Fasta::new +{ + my ($class,@args) = @_; + my $self = @args ? {@args} : {}; + if ( !$$self{file} ) { $self->throw(qq[Missing the parameter "file"\n]); } + $$self{chr} = undef; + $$self{from} = undef; + $$self{to} = undef; + if ( !$$self{size} ) { $$self{size}=10_000_000; } + bless $self, ref($class) || $class; + return $self; +} + +sub read_chunk +{ + my ($self,$chr,$pos) = @_; + my $to = $pos + $$self{size}; + my $cmd = "samtools faidx $$self{file} $chr:$pos-$to"; + my @out = `$cmd`; + if ( $? 
) { $self->throw("$cmd: $!"); } + my $line = shift(@out); + if ( !($line=~/^>$chr:(\d+)-(\d+)/) ) { $self->throw("Could not parse: $line"); } + $$self{chr} = $chr; + $$self{from} = $1; + $$self{to} = $2; + my $chunk = ''; + while ($line=shift(@out)) + { + chomp($line); + $chunk .= $line; + } + $$self{chunk} = $chunk; + return; +} + +sub get_base +{ + my ($self,$chr,$pos) = @_; + if ( !$$self{chr} || $chr ne $$self{chr} || $pos<$$self{from} || $pos>$$self{to} ) + { + $self->read_chunk($chr,$pos); + } + my $idx = $pos - $$self{from}; + return substr($$self{chunk},$idx,1); +} + +sub throw +{ + my ($self,@msg) = @_; + croak(@msg); +} diff --git a/misc/samtools.pl b/misc/samtools.pl index 86b285c..320e8aa 100755 --- a/misc/samtools.pl +++ b/misc/samtools.pl @@ -11,7 +11,7 @@ my $version = '0.3.3'; my $command = shift(@ARGV); my %func = (showALEN=>\&showALEN, pileup2fq=>\&pileup2fq, varFilter=>\&varFilter, - unique=>\&unique, uniqcmp=>\&uniqcmp); + unique=>\&unique, uniqcmp=>\&uniqcmp, sra2hdr=>\&sra2hdr); die("Unknown command \"$command\".\n") if (!defined($func{$command})); &{$func{$command}}; @@ -37,6 +37,16 @@ sub showALEN { # varFilter # +# +# Filtration code: +# +# d low depth +# D high depth +# W too many SNPs in a window (SNP only) +# G close to a high-quality indel (SNP only) +# Q low RMS mapping quality (SNP only) +# g close to another indel with higher quality (indel only) + sub varFilter { my %opts = (d=>3, D=>100, l=>30, Q=>25, q=>10, G=>25, s=>100, w=>10, W=>10, N=>2, p=>undef); getopts('pq:d:D:l:Q:w:W:N:G:', \%opts); @@ -216,6 +226,59 @@ sub p2q_print_str { } } +# +# sra2hdr +# + +# This subroutine does not use an XML parser. It requires that the SRA +# XML files are properly formated. 
+sub sra2hdr { + my %opts = (); + getopts('', \%opts); + die("Usage: samtools.pl sra2hdr \n") if (@ARGV == 0); + my $pre = $ARGV[0]; + my $fh; + # read sample + my $sample = 'UNKNOWN'; + open($fh, "$pre.sample.xml") || die; + while (<$fh>) { + $sample = $1 if (/) { + if (/\s*(\S+)\s*<\/LIBRARY_NAME>/i) { + $exp2lib{$exp} = $1; + } + } + close($fh); + # read run + my ($run, @fn); + open($fh, "$pre.run.xml") || die; + while (<$fh>) { + if (//i) { + if (@fn == 1) { + print STDERR "$fn[0]\t$run\n"; + } else { + for (0 .. $#fn) { + print STDERR "$fn[$_]\t$run", "_", $_+1, "\n"; + } + } + } + } + close($fh); +} + # # unique # diff --git a/razf.c b/razf.c index a5e8f51..e7499f9 100644 --- a/razf.c +++ b/razf.c @@ -38,6 +38,7 @@ #include #include "razf.h" + #if ZLIB_VERNUM < 0x1221 struct _gz_header_s { int text; @@ -107,20 +108,36 @@ static void save_zindex(RAZF *rz, int fd){ } #endif +#ifdef _USE_KNETFILE +static void load_zindex(RAZF *rz, knetFile *fp){ +#else static void load_zindex(RAZF *rz, int fd){ +#endif int32_t i, v32; int is_be; if(!rz->load_index) return; if(rz->index == NULL) rz->index = malloc(sizeof(ZBlockIndex)); is_be = is_big_endian(); +#ifdef _USE_KNETFILE + knet_read(fp, &rz->index->size, sizeof(int)); +#else read(fd, &rz->index->size, sizeof(int)); +#endif if(!is_be) rz->index->size = byte_swap_4((uint32_t)rz->index->size); rz->index->cap = rz->index->size; v32 = rz->index->size / RZ_BIN_SIZE + 1; rz->index->bin_offsets = malloc(sizeof(int64_t) * v32); +#ifdef _USE_KNETFILE + knet_read(fp, rz->index->bin_offsets, sizeof(int64_t) * v32); +#else read(fd, rz->index->bin_offsets, sizeof(int64_t) * v32); +#endif rz->index->cell_offsets = malloc(sizeof(int) * rz->index->size); +#ifdef _USE_KNETFILE + knet_read(fp, rz->index->cell_offsets, sizeof(int) * rz->index->size); +#else read(fd, rz->index->cell_offsets, sizeof(int) * rz->index->size); +#endif if(!is_be){ for(i=0;iindex->bin_offsets[i] = byte_swap_8((uint64_t)rz->index->bin_offsets[i]); 
for(i=0;iindex->size;i++) rz->index->cell_offsets[i] = byte_swap_4((uint32_t)rz->index->cell_offsets[i]); @@ -141,7 +158,11 @@ static RAZF* razf_open_w(int fd){ #endif rz = calloc(1, sizeof(RAZF)); rz->mode = 'w'; +#ifdef _USE_KNETFILE + rz->x.fpw = fd; +#else rz->filedes = fd; +#endif rz->stream = calloc(sizeof(z_stream), 1); rz->inbuf = malloc(RZ_BUFFER_SIZE); rz->outbuf = malloc(RZ_BUFFER_SIZE); @@ -176,7 +197,11 @@ static void _razf_write(RAZF* rz, const void *data, int size){ deflate(rz->stream, Z_NO_FLUSH); rz->out += tout - rz->stream->avail_out; if(rz->stream->avail_out) break; +#ifdef _USE_KNETFILE + write(rz->x.fpw, rz->outbuf, RZ_BUFFER_SIZE - rz->stream->avail_out); +#else write(rz->filedes, rz->outbuf, RZ_BUFFER_SIZE - rz->stream->avail_out); +#endif rz->stream->avail_out = RZ_BUFFER_SIZE; rz->stream->next_out = rz->outbuf; if(rz->stream->avail_in == 0) break; @@ -192,7 +217,11 @@ static void razf_flush(RAZF *rz){ rz->buf_off = rz->buf_len = 0; } if(rz->stream->avail_out){ +#ifdef _USE_KNETFILE + write(rz->x.fpw, rz->outbuf, RZ_BUFFER_SIZE - rz->stream->avail_out); +#else write(rz->filedes, rz->outbuf, RZ_BUFFER_SIZE - rz->stream->avail_out); +#endif rz->stream->avail_out = RZ_BUFFER_SIZE; rz->stream->next_out = rz->outbuf; } @@ -201,7 +230,11 @@ static void razf_flush(RAZF *rz){ deflate(rz->stream, Z_FULL_FLUSH); rz->out += tout - rz->stream->avail_out; if(rz->stream->avail_out == 0){ +#ifdef _USE_KNETFILE + write(rz->x.fpw, rz->outbuf, RZ_BUFFER_SIZE - rz->stream->avail_out); +#else write(rz->filedes, rz->outbuf, RZ_BUFFER_SIZE - rz->stream->avail_out); +#endif rz->stream->avail_out = RZ_BUFFER_SIZE; rz->stream->next_out = rz->outbuf; } else break; @@ -221,7 +254,11 @@ static void razf_end_flush(RAZF *rz){ deflate(rz->stream, Z_FINISH); rz->out += tout - rz->stream->avail_out; if(rz->stream->avail_out < RZ_BUFFER_SIZE){ +#ifdef _USE_KNETFILE + write(rz->x.fpw, rz->outbuf, RZ_BUFFER_SIZE - rz->stream->avail_out); +#else write(rz->filedes, rz->outbuf, 
RZ_BUFFER_SIZE - rz->stream->avail_out); +#endif rz->stream->avail_out = RZ_BUFFER_SIZE; rz->stream->next_out = rz->outbuf; } else break; @@ -308,23 +345,35 @@ static int _read_gz_header(unsigned char *data, int size, int *extra_off, int *e return n; } +#ifdef _USE_KNETFILE +static RAZF* razf_open_r(knetFile *fp, int _load_index){ +#else static RAZF* razf_open_r(int fd, int _load_index){ +#endif RAZF *rz; int ext_off, ext_len; int n, is_be, ret; int64_t end; unsigned char c[] = "RAZF"; + rz = calloc(1, sizeof(RAZF)); + rz->mode = 'r'; +#ifdef _USE_KNETFILE + rz->x.fpr = fp; +#else #ifdef _WIN32 setmode(fd, O_BINARY); #endif - rz = calloc(1, sizeof(RAZF)); - rz->mode = 'r'; rz->filedes = fd; +#endif rz->stream = calloc(sizeof(z_stream), 1); rz->inbuf = malloc(RZ_BUFFER_SIZE); rz->outbuf = malloc(RZ_BUFFER_SIZE); rz->end = rz->src_end = 0x7FFFFFFFFFFFFFFFLL; +#ifdef _USE_KNETFILE + n = knet_read(rz->x.fpr, rz->inbuf, RZ_BUFFER_SIZE); +#else n = read(rz->filedes, rz->inbuf, RZ_BUFFER_SIZE); +#endif ret = _read_gz_header(rz->inbuf, n, &ext_off, &ext_len); if(ret == 0){ PLAIN_FILE: @@ -355,7 +404,11 @@ static RAZF* razf_open_r(int fd, int _load_index){ } rz->load_index = _load_index; rz->file_type = FILE_TYPE_RZ; +#ifdef _USE_KNETFILE + if(knet_seek(fp, -16, SEEK_END) == -1){ +#else if(lseek(fd, -16, SEEK_END) == -1){ +#endif UNSEEKABLE: rz->seekable = 0; rz->index = NULL; @@ -363,10 +416,19 @@ static RAZF* razf_open_r(int fd, int _load_index){ } else { is_be = is_big_endian(); rz->seekable = 1; +#ifdef _USE_KNETFILE + knet_read(fp, &end, sizeof(int64_t)); +#else read(fd, &end, sizeof(int64_t)); +#endif if(!is_be) rz->src_end = (int64_t)byte_swap_8((uint64_t)end); else rz->src_end = end; + +#ifdef _USE_KNETFILE + knet_read(fp, &end, sizeof(int64_t)); +#else read(fd, &end, sizeof(int64_t)); +#endif if(!is_be) rz->end = (int64_t)byte_swap_8((uint64_t)end); else rz->end = end; if(n > rz->end){ @@ -374,19 +436,47 @@ static RAZF* razf_open_r(int fd, int _load_index){ n = 
rz->end; } if(rz->end > rz->src_end){ +#ifdef _USE_KNETFILE + knet_seek(fp, rz->in, SEEK_SET); +#else lseek(fd, rz->in, SEEK_SET); +#endif goto UNSEEKABLE; } +#ifdef _USE_KNETFILE + knet_seek(fp, rz->end, SEEK_SET); + if(knet_tell(fp) != rz->end){ + knet_seek(fp, rz->in, SEEK_SET); +#else if(lseek(fd, rz->end, SEEK_SET) != rz->end){ lseek(fd, rz->in, SEEK_SET); +#endif goto UNSEEKABLE; } +#ifdef _USE_KNETFILE + load_zindex(rz, fp); + knet_seek(fp, n, SEEK_SET); +#else load_zindex(rz, fd); lseek(fd, n, SEEK_SET); +#endif } return rz; } +#ifdef _USE_KNETFILE +RAZF* razf_dopen(int fd, const char *mode){ + if (strstr(mode, "r")) fprintf(stderr,"[razf_dopen] implement me\n"); + else if(strstr(mode, "w")) return razf_open_w(fd); + return NULL; +} + +RAZF* razf_dopen2(int fd, const char *mode) +{ + fprintf(stderr,"[razf_dopen2] implement me\n"); + return NULL; +} +#else RAZF* razf_dopen(int fd, const char *mode){ if(strstr(mode, "r")) return razf_open_r(fd, 1); else if(strstr(mode, "w")) return razf_open_w(fd); @@ -399,23 +489,34 @@ RAZF* razf_dopen2(int fd, const char *mode) else if(strstr(mode, "w")) return razf_open_w(fd); else return NULL; } +#endif static inline RAZF* _razf_open(const char *filename, const char *mode, int _load_index){ int fd; RAZF *rz; if(strstr(mode, "r")){ +#ifdef _USE_KNETFILE + knetFile *fd = knet_open(filename, "r"); + if (fd == 0) { + fprintf(stderr, "[_razf_open] fail to open %s\n", filename); + return NULL; + } +#else #ifdef _WIN32 fd = open(filename, O_RDONLY | O_BINARY); #else fd = open(filename, O_RDONLY); #endif +#endif + if(fd < 0) return NULL; rz = razf_open_r(fd, _load_index); } else if(strstr(mode, "w")){ #ifdef _WIN32 - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644); + fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0666); #else - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644); + fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666); #endif + if(fd < 0) return NULL; rz = razf_open_w(fd); } 
else return NULL; return rz; @@ -435,9 +536,17 @@ int razf_get_data_size(RAZF *rz, int64_t *u_size, int64_t *c_size){ switch(rz->file_type){ case FILE_TYPE_PLAIN: if(rz->end == 0x7fffffffffffffffLL){ +#ifdef _USE_KNETFILE + if(knet_seek(rz->x.fpr, 0, SEEK_CUR) == -1) return 0; + n = knet_tell(rz->x.fpr); + knet_seek(rz->x.fpr, 0, SEEK_END); + rz->end = knet_tell(rz->x.fpr); + knet_seek(rz->x.fpr, n, SEEK_SET); +#else if((n = lseek(rz->filedes, 0, SEEK_CUR)) == -1) return 0; rz->end = lseek(rz->filedes, 0, SEEK_END); lseek(rz->filedes, n, SEEK_SET); +#endif } *u_size = *c_size = rz->end; return 1; @@ -457,7 +566,11 @@ static int _razf_read(RAZF* rz, void *data, int size){ int ret, tin; if(rz->z_eof || rz->z_err) return 0; if (rz->file_type == FILE_TYPE_PLAIN) { +#ifdef _USE_KNETFILE + ret = knet_read(rz->x.fpr, data, size); +#else ret = read(rz->filedes, data, size); +#endif if (ret == 0) rz->z_eof = 1; return ret; } @@ -467,9 +580,17 @@ static int _razf_read(RAZF* rz, void *data, int size){ if(rz->stream->avail_in == 0){ if(rz->in >= rz->end){ rz->z_eof = 1; break; } if(rz->end - rz->in < RZ_BUFFER_SIZE){ +#ifdef _USE_KNETFILE + rz->stream->avail_in = knet_read(rz->x.fpr, rz->inbuf, rz->end -rz->in); +#else rz->stream->avail_in = read(rz->filedes, rz->inbuf, rz->end -rz->in); +#endif } else { +#ifdef _USE_KNETFILE + rz->stream->avail_in = knet_read(rz->x.fpr, rz->inbuf, RZ_BUFFER_SIZE); +#else rz->stream->avail_in = read(rz->filedes, rz->inbuf, RZ_BUFFER_SIZE); +#endif } if(rz->stream->avail_in == 0){ rz->z_eof = 1; @@ -481,7 +602,7 @@ static int _razf_read(RAZF* rz, void *data, int size){ ret = inflate(rz->stream, Z_BLOCK); rz->in += tin - rz->stream->avail_in; if(ret == Z_NEED_DICT || ret == Z_MEM_ERROR || ret == Z_DATA_ERROR){ - fprintf(stderr, "[_razf_read] inflate error: %d (at %s:%d)\n", ret, __FILE__, __LINE__); + fprintf(stderr, "[_razf_read] inflate error: %d %s (at %s:%d)\n", ret, rz->stream->msg ? 
rz->stream->msg : "", __FILE__, __LINE__); rz->z_err = 1; break; } @@ -566,14 +687,18 @@ int razf_skip(RAZF* rz, int size){ } if(rz->buf_flush) continue; rz->buf_len = _razf_read(rz, rz->outbuf, RZ_BUFFER_SIZE); - if(rz->z_eof) break; + if(rz->z_eof || rz->z_err) break; } rz->out += ori_size - size; return ori_size - size; } static void _razf_reset_read(RAZF *rz, int64_t in, int64_t out){ +#ifdef _USE_KNETFILE + knet_seek(rz->x.fpr, in, SEEK_SET); +#else lseek(rz->filedes, in, SEEK_SET); +#endif rz->in = in; rz->out = out; rz->block_pos = in; @@ -592,7 +717,12 @@ int64_t razf_jump(RAZF *rz, int64_t block_start, int block_offset){ if(rz->file_type == FILE_TYPE_PLAIN){ rz->buf_off = rz->buf_len = 0; pos = block_start + block_offset; +#ifdef _USE_KNETFILE + knet_seek(rz->x.fpr, pos, SEEK_SET); + pos = knet_tell(rz->x.fpr); +#else pos = lseek(rz->filedes, pos, SEEK_SET); +#endif rz->out = rz->in = pos; return pos; } @@ -614,7 +744,12 @@ int64_t razf_seek(RAZF* rz, int64_t pos, int where){ if (where == SEEK_CUR) pos += rz->out; else if (where == SEEK_END) pos += rz->src_end; if(rz->file_type == FILE_TYPE_PLAIN){ +#ifdef _USE_KNETFILE + knet_seek(rz->x.fpr, pos, SEEK_SET); + seek_pos = knet_tell(rz->x.fpr); +#else seek_pos = lseek(rz->filedes, pos, SEEK_SET); +#endif rz->buf_off = rz->buf_len = 0; rz->out = rz->in = seek_pos; return seek_pos; @@ -663,6 +798,18 @@ void razf_close(RAZF *rz){ #ifndef _RZ_READONLY razf_end_flush(rz); deflateEnd(rz->stream); +#ifdef _USE_KNETFILE + save_zindex(rz, rz->x.fpw); + if(is_big_endian()){ + write(rz->x.fpw, &rz->in, sizeof(int64_t)); + write(rz->x.fpw, &rz->out, sizeof(int64_t)); + } else { + uint64_t v64 = byte_swap_8((uint64_t)rz->in); + write(rz->x.fpw, &v64, sizeof(int64_t)); + v64 = byte_swap_8((uint64_t)rz->out); + write(rz->x.fpw, &v64, sizeof(int64_t)); + } +#else save_zindex(rz, rz->filedes); if(is_big_endian()){ write(rz->filedes, &rz->in, sizeof(int64_t)); @@ -673,6 +820,7 @@ void razf_close(RAZF *rz){ v64 = 
byte_swap_8((uint64_t)rz->out); write(rz->filedes, &v64, sizeof(int64_t)); } +#endif #endif } else if(rz->mode == 'r'){ if(rz->stream) inflateEnd(rz->stream); @@ -691,7 +839,14 @@ void razf_close(RAZF *rz){ free(rz->index); } free(rz->stream); +#ifdef _USE_KNETFILE + if (rz->mode == 'r') + knet_close(rz->x.fpr); + if (rz->mode == 'w') + close(rz->x.fpw); +#else close(rz->filedes); +#endif free(rz); } diff --git a/razf.h b/razf.h index f7e5097..60a0c96 100644 --- a/razf.h +++ b/razf.h @@ -37,6 +37,10 @@ #include #include "zlib.h" +#ifdef _USE_KNETFILE +#include "knetfile.h" +#endif + #if ZLIB_VERNUM < 0x1221 #define _RZ_READONLY struct _gz_header_s; @@ -76,7 +80,14 @@ typedef struct RandomAccessZFile { char mode; /* 'w' : write mode; 'r' : read mode */ int file_type; /* plain file or rz file, razf_read support plain file as input too, in this case, razf_read work as buffered fread */ +#ifdef _USE_KNETFILE + union { + knetFile *fpr; + int fpw; + } x; +#else int filedes; /* the file descriptor */ +#endif z_stream *stream; ZBlockIndex *index; int64_t in, out, end, src_end; diff --git a/razip.c b/razip.c index 2b49883..dff9347 100644 --- a/razip.c +++ b/razip.c @@ -27,7 +27,7 @@ static int write_open(const char *fn, int is_forced) int fd = -1; char c; if (!is_forced) { - if ((fd = open(fn, O_WRONLY | O_CREAT | O_TRUNC | O_EXCL, 0644)) < 0 && errno == EEXIST) { + if ((fd = open(fn, O_WRONLY | O_CREAT | O_TRUNC | O_EXCL, 0666)) < 0 && errno == EEXIST) { printf("razip: %s already exists; do you wish to overwrite (y or n)? 
", fn); scanf("%c", &c); if (c != 'Y' && c != 'y') { @@ -37,7 +37,7 @@ static int write_open(const char *fn, int is_forced) } } if (fd < 0) { - if ((fd = open(fn, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) { + if ((fd = open(fn, O_WRONLY | O_CREAT | O_TRUNC, 0666)) < 0) { fprintf(stderr, "razip: %s: Fail to write\n", fn); exit(1); } diff --git a/sam.c b/sam.c index a74c557..ad4325b 100644 --- a/sam.c +++ b/sam.c @@ -12,7 +12,7 @@ bam_header_t *bam_header_dup(const bam_header_t *h0) int i; h = bam_header_init(); *h = *h0; - h->hash = 0; + h->hash = h->dict = h->rg2lib = 0; h->text = (char*)calloc(h->l_text + 1, 1); memcpy(h->text, h0->text, h->l_text); h->target_len = (uint32_t*)calloc(h->n_targets, 4); @@ -21,7 +21,6 @@ bam_header_t *bam_header_dup(const bam_header_t *h0) h->target_len[i] = h0->target_len[i]; h->target_name[i] = strdup(h0->target_name[i]); } - if (h0->rg2lib) h->rg2lib = bam_strmap_dup(h0->rg2lib); return h; } static void append_header_text(bam_header_t *header, char* text, int len) @@ -63,7 +62,6 @@ samfile_t *samopen(const char *fn, const char *mode, const void *aux) fprintf(stderr, "[samopen] no @SQ lines in the header.\n"); } else fprintf(stderr, "[samopen] SAM header is present: %d sequences.\n", fp->header->n_targets); } - sam_header_parse_rg(fp->header); } else if (mode[0] == 'w') { // write fp->header = bam_header_dup((const bam_header_t*)aux); if (mode[1] == 'b') { // binary diff --git a/sam_header.c b/sam_header.c new file mode 100644 index 0000000..a119c02 --- /dev/null +++ b/sam_header.c @@ -0,0 +1,701 @@ +#include "sam_header.h" +#include +#include +#include +#include +#include + +#include "khash.h" +KHASH_MAP_INIT_STR(str, const char *) + +struct _HeaderList +{ + struct _HeaderList *next; + void *data; +}; +typedef struct _HeaderList list_t; +typedef list_t HeaderDict; + +typedef struct +{ + char key[2]; + char *value; +} +HeaderTag; + +typedef struct +{ + char type[2]; + list_t *tags; +} +HeaderLine; + +const char *o_hd_tags[] = 
{"SO","GO",NULL}; +const char *r_hd_tags[] = {"VN",NULL}; + +const char *o_sq_tags[] = {"AS","M5","UR","SP",NULL}; +const char *r_sq_tags[] = {"SN","LN",NULL}; +const char *u_sq_tags[] = {"SN",NULL}; + +const char *o_rg_tags[] = {"LB","DS","PU","PI","CN","DT","PL",NULL}; +const char *r_rg_tags[] = {"ID",NULL}; +const char *u_rg_tags[] = {"ID",NULL}; + +const char *o_pg_tags[] = {"VN","CL",NULL}; +const char *r_pg_tags[] = {"ID",NULL}; + +const char *types[] = {"HD","SQ","RG","PG","CO",NULL}; +const char **optional_tags[] = {o_hd_tags,o_sq_tags,o_rg_tags,o_pg_tags,NULL,NULL}; +const char **required_tags[] = {r_hd_tags,r_sq_tags,r_rg_tags,r_pg_tags,NULL,NULL}; +const char **unique_tags[] = {NULL, u_sq_tags,u_rg_tags,NULL,NULL,NULL}; + + +static void debug(const char *format, ...) +{ + va_list ap; + va_start(ap, format); + vfprintf(stderr, format, ap); + va_end(ap); +} + +static list_t *list_append(list_t *root, void *data) +{ + list_t *l = root; + while (l && l->next) + l = l->next; + if ( l ) + { + l->next = malloc(sizeof(list_t)); + l = l->next; + } + else + { + l = malloc(sizeof(list_t)); + root = l; + } + l->data = data; + l->next = NULL; + return root; +} + +static void list_free(list_t *root) +{ + list_t *l = root; + while (root) + { + l = root; + root = root->next; + free(l); + } +} + + + +// Look for a tag "XY" in a predefined const char *[] array. +static int tag_exists(const char *tag, const char **tags) +{ + int itag=0; + if ( !tags ) return -1; + while ( tags[itag] ) + { + if ( tags[itag][0]==tag[0] && tags[itag][1]==tag[1] ) return itag; + itag++; + } + return -1; +} + + + +// Mimics the behaviour of getline, except it returns pointer to the next chunk of the text +// or NULL if everything has been read. The lineptr should be freed by the caller. The +// newline character is stripped. 
+static const char *nextline(char **lineptr, size_t *n, const char *text) +{ + int len; + const char *to = text; + + if ( !*to ) return NULL; + + while ( *to && *to!='\n' && *to!='\r' ) to++; + len = to - text + 1; + + if ( *to ) + { + // Advance the pointer for the next call + if ( *to=='\n' ) to++; + else if ( *to=='\r' && *(to+1)=='\n' ) to+=2; + } + if ( !len ) + return to; + + if ( !*lineptr ) + { + *lineptr = malloc(len); + *n = len; + } + else if ( *nkey[0] = name[0]; + tag->key[1] = name[1]; + tag->value = malloc(len+1); + memcpy(tag->value,value_from,len+1); + tag->value[len] = 0; + return tag; +} + +static HeaderTag *header_line_has_tag(HeaderLine *hline, const char *key) +{ + list_t *tags = hline->tags; + while (tags) + { + HeaderTag *tag = tags->data; + if ( tag->key[0]==key[0] && tag->key[1]==key[1] ) return tag; + tags = tags->next; + } + return NULL; +} + + +// Return codes: +// 0 .. different types or unique tags differ or conflicting tags, cannot be merged +// 1 .. all tags identical -> no need to merge, drop one +// 2 .. the unique tags match and there are some conflicting tags (same tag, different value) -> error, cannot be merged nor duplicated +// 3 .. 
there are some missing complementary tags and no unique conflict -> can be merged into a single line +static int sam_header_compare_lines(HeaderLine *hline1, HeaderLine *hline2) +{ + HeaderTag *t1, *t2; + + if ( hline1->type[0]!=hline2->type[0] || hline1->type[1]!=hline2->type[1] ) + return 0; + + int itype = tag_exists(hline1->type,types); + if ( itype==-1 ) { + debug("[sam_header_compare_lines] Unknown type [%c%c]\n", hline1->type[0],hline1->type[1]); + return -1; // FIXME (lh3): error; I do not know how this will be handled in Petr's code + } + + if ( unique_tags[itype] ) + { + t1 = header_line_has_tag(hline1,unique_tags[itype][0]); + t2 = header_line_has_tag(hline2,unique_tags[itype][0]); + if ( !t1 || !t2 ) // this should never happen, the unique tags are required + return 2; + + if ( strcmp(t1->value,t2->value) ) + return 0; // the unique tags differ, cannot be merged + } + if ( !required_tags[itype] && !optional_tags[itype] ) + { + t1 = hline1->tags->data; + t2 = hline2->tags->data; + if ( !strcmp(t1->value,t2->value) ) return 1; // identical comments + return 0; + } + + int missing=0, itag=0; + while ( required_tags[itype] && required_tags[itype][itag] ) + { + t1 = header_line_has_tag(hline1,required_tags[itype][itag]); + t2 = header_line_has_tag(hline2,required_tags[itype][itag]); + if ( !t1 && !t2 ) + return 2; // this should never happen + else if ( !t1 || !t2 ) + missing = 1; // there is some tag missing in one of the hlines + else if ( strcmp(t1->value,t2->value) ) + { + if ( unique_tags[itype] ) + return 2; // the lines have a matching unique tag but have a conflicting tag + + return 0; // the lines contain conflicting tags, cannot be merged + } + itag++; + } + itag = 0; + while ( optional_tags[itype] && optional_tags[itype][itag] ) + { + t1 = header_line_has_tag(hline1,optional_tags[itype][itag]); + t2 = header_line_has_tag(hline2,optional_tags[itype][itag]); + if ( !t1 && !t2 ) + { + itag++; + continue; + } + if ( !t1 || !t2 ) + missing = 1; // 
there is some tag missing in one of the hlines + else if ( strcmp(t1->value,t2->value) ) + { + if ( unique_tags[itype] ) + return 2; // the lines have a matching unique tag but have a conflicting tag + + return 0; // the lines contain conflicting tags, cannot be merged + } + itag++; + } + if ( missing ) return 3; // there are some missing complementary tags with no conflicts, can be merged + return 1; +} + + +static HeaderLine *sam_header_line_clone(const HeaderLine *hline) +{ + list_t *tags; + HeaderLine *out = malloc(sizeof(HeaderLine)); + out->type[0] = hline->type[0]; + out->type[1] = hline->type[1]; + out->tags = NULL; + + tags = hline->tags; + while (tags) + { + HeaderTag *old = tags->data; + + HeaderTag *new = malloc(sizeof(HeaderTag)); + new->key[0] = old->key[0]; + new->key[1] = old->key[1]; + new->value = strdup(old->value); + out->tags = list_append(out->tags, new); + + tags = tags->next; + } + return out; +} + +static int sam_header_line_merge_with(HeaderLine *out_hline, const HeaderLine *tmpl_hline) +{ + list_t *tmpl_tags; + + if ( out_hline->type[0]!=tmpl_hline->type[0] || out_hline->type[1]!=tmpl_hline->type[1] ) + return 0; + + tmpl_tags = tmpl_hline->tags; + while (tmpl_tags) + { + HeaderTag *tmpl_tag = tmpl_tags->data; + HeaderTag *out_tag = header_line_has_tag(out_hline, tmpl_tag->key); + if ( !out_tag ) + { + HeaderTag *tag = malloc(sizeof(HeaderTag)); + tag->key[0] = tmpl_tag->key[0]; + tag->key[1] = tmpl_tag->key[1]; + tag->value = strdup(tmpl_tag->value); + out_hline->tags = list_append(out_hline->tags,tag); + } + tmpl_tags = tmpl_tags->next; + } + return 1; +} + + +static HeaderLine *sam_header_line_parse(const char *headerLine) +{ + HeaderLine *hline; + HeaderTag *tag; + const char *from, *to; + from = headerLine; + + if ( *from != '@' ) { + debug("[sam_header_line_parse] expected '@', got [%s]\n", headerLine); + return 0; + } + to = ++from; + + while (*to && *to!='\t') to++; + if ( to-from != 2 ) { + debug("[sam_header_line_parse] expected 
'@XY', got [%s]\n", headerLine); + return 0; + } + + hline = malloc(sizeof(HeaderLine)); + hline->type[0] = from[0]; + hline->type[1] = from[1]; + hline->tags = NULL; + + int itype = tag_exists(hline->type, types); + + from = to; + while (*to && *to=='\t') to++; + if ( to-from != 1 ) { + debug("[sam_header_line_parse] multiple tabs on line [%s] (%d)\n", headerLine,(int)(to-from)); + return 0; + } + from = to; + while (*from) + { + while (*to && *to!='\t') to++; + + if ( !required_tags[itype] && !optional_tags[itype] ) + tag = new_tag(" ",from,to-1); + else + tag = new_tag(from,from+3,to-1); + + if ( header_line_has_tag(hline,tag->key) ) + debug("The tag '%c%c' present (at least) twice on line [%s]\n", tag->key[0],tag->key[1], headerLine); + hline->tags = list_append(hline->tags, tag); + + from = to; + while (*to && *to=='\t') to++; + if ( *to && to-from != 1 ) { + debug("[sam_header_line_parse] multiple tabs on line [%s] (%d)\n", headerLine,(int)(to-from)); + return 0; + } + + from = to; + } + return hline; +} + + +// Must be of an existing type, all tags must be recognised and all required tags must be present +static int sam_header_line_validate(HeaderLine *hline) +{ + list_t *tags; + HeaderTag *tag; + int itype, itag; + + // Is the type correct? + itype = tag_exists(hline->type, types); + if ( itype==-1 ) + { + debug("The type [%c%c] not recognised.\n", hline->type[0],hline->type[1]); + return 0; + } + + // Has all required tags? + itag = 0; + while ( required_tags[itype] && required_tags[itype][itag] ) + { + if ( !header_line_has_tag(hline,required_tags[itype][itag]) ) + { + debug("The tag [%c%c] required for [%c%c] not present.\n", required_tags[itype][itag][0],required_tags[itype][itag][1], + hline->type[0],hline->type[1]); + return 0; + } + itag++; + } + + // Are all tags recognised? 
+ tags = hline->tags; + while ( tags ) + { + tag = tags->data; + if ( !tag_exists(tag->key,required_tags[itype]) && !tag_exists(tag->key,optional_tags[itype]) ) + { + debug("Unknown tag [%c%c] for [%c%c].\n", tag->key[0],tag->key[1], hline->type[0],hline->type[1]); + return 0; + } + tags = tags->next; + } + + return 1; +} + + +static void print_header_line(FILE *fp, HeaderLine *hline) +{ + list_t *tags = hline->tags; + HeaderTag *tag; + + fprintf(fp, "@%c%c", hline->type[0],hline->type[1]); + while (tags) + { + tag = tags->data; + + fprintf(fp, "\t"); + if ( tag->key[0]!=' ' || tag->key[1]!=' ' ) + fprintf(fp, "%c%c:", tag->key[0],tag->key[1]); + fprintf(fp, "%s", tag->value); + + tags = tags->next; + } + fprintf(fp,"\n"); +} + + +static void sam_header_line_free(HeaderLine *hline) +{ + list_t *tags = hline->tags; + while (tags) + { + HeaderTag *tag = tags->data; + free(tag->value); + free(tag); + tags = tags->next; + } + list_free(hline->tags); + free(hline); +} + +void sam_header_free(void *_header) +{ + HeaderDict *header = (HeaderDict*)_header; + list_t *hlines = header; + while (hlines) + { + sam_header_line_free(hlines->data); + hlines = hlines->next; + } + list_free(header); +} + +HeaderDict *sam_header_clone(const HeaderDict *dict) +{ + HeaderDict *out = NULL; + while (dict) + { + HeaderLine *hline = dict->data; + out = list_append(out, sam_header_line_clone(hline)); + dict = dict->next; + } + return out; +} + +// Returns a newly allocated string +char *sam_header_write(const void *_header) +{ + const HeaderDict *header = (const HeaderDict*)_header; + char *out = NULL; + int len=0, nout=0; + const list_t *hlines; + + // Calculate the length of the string to allocate + hlines = header; + while (hlines) + { + len += 4; // @XY and \n + + HeaderLine *hline = hlines->data; + list_t *tags = hline->tags; + while (tags) + { + HeaderTag *tag = tags->data; + len += strlen(tag->value) + 1; // \t + if ( tag->key[0]!=' ' || tag->key[1]!=' ' ) + len += strlen(tag->value) 
+ 3; // XY: + tags = tags->next; + } + hlines = hlines->next; + } + + nout = 0; + out = malloc(len+1); + hlines = header; + while (hlines) + { + HeaderLine *hline = hlines->data; + + nout += sprintf(out+nout,"@%c%c",hline->type[0],hline->type[1]); + + list_t *tags = hline->tags; + while (tags) + { + HeaderTag *tag = tags->data; + nout += sprintf(out+nout,"\t"); + if ( tag->key[0]!=' ' || tag->key[1]!=' ' ) + nout += sprintf(out+nout,"%c%c:", tag->key[0],tag->key[1]); + nout += sprintf(out+nout,"%s", tag->value); + tags = tags->next; + } + hlines = hlines->next; + nout += sprintf(out+nout,"\n"); + } + out[len] = 0; + return out; +} + +void *sam_header_parse2(const char *headerText) +{ + list_t *hlines = NULL; + HeaderLine *hline; + const char *text; + char *buf=NULL; + size_t nbuf = 0; + + if ( !headerText ) + return 0; + + text = headerText; + while ( (text=nextline(&buf, &nbuf, text)) ) + { + hline = sam_header_line_parse(buf); + if ( hline && sam_header_line_validate(hline) ) + hlines = list_append(hlines, hline); + else + { + if (hline) sam_header_line_free(hline); + sam_header_free(hlines); + if ( buf ) free(buf); + return NULL; + } + } + if ( buf ) free(buf); + + return hlines; +} + +void *sam_header2tbl(const void *_dict, char type[2], char key_tag[2], char value_tag[2]) +{ + const HeaderDict *dict = (const HeaderDict*)_dict; + const list_t *l = dict; + khash_t(str) *tbl = kh_init(str); + khiter_t k; + int ret; + + if (_dict == 0) return tbl; // return an empty (not null) hash table + while (l) + { + HeaderLine *hline = l->data; + if ( hline->type[0]!=type[0] || hline->type[1]!=type[1] ) + { + l = l->next; + continue; + } + + HeaderTag *key, *value; + key = header_line_has_tag(hline,key_tag); + value = header_line_has_tag(hline,value_tag); + if ( !key || !value ) + { + l = l->next; + continue; + } + + k = kh_get(str, tbl, key->value); + if ( k != kh_end(tbl) ) + debug("[sam_header_lookup_table] They key %s not unique.\n", key->value); + k = kh_put(str, tbl, 
key->value, &ret); + kh_value(tbl, k) = value->value; + + l = l->next; + } + return tbl; +} + +char **sam_header2list(const void *_dict, char type[2], char key_tag[2], int *_n) +{ + const HeaderDict *dict = (const HeaderDict*)_dict; + const list_t *l = dict; + int max, n; + char **ret; + + ret = 0; *_n = max = n = 0; + while (l) + { + HeaderLine *hline = l->data; + if ( hline->type[0]!=type[0] || hline->type[1]!=type[1] ) + { + l = l->next; + continue; + } + + HeaderTag *key; + key = header_line_has_tag(hline,key_tag); + if ( !key ) + { + l = l->next; + continue; + } + + if (n == max) { + max = max? max<<1 : 4; + ret = realloc(ret, max * sizeof(void*)); + } + ret[n++] = key->value; + + l = l->next; + } + *_n = n; + return ret; +} + +const char *sam_tbl_get(void *h, const char *key) +{ + khash_t(str) *tbl = (khash_t(str)*)h; + khint_t k; + k = kh_get(str, tbl, key); + return k == kh_end(tbl)? 0 : kh_val(tbl, k); +} + +int sam_tbl_size(void *h) +{ + khash_t(str) *tbl = (khash_t(str)*)h; + return h? 
kh_size(tbl) : 0; +} + +void sam_tbl_destroy(void *h) +{ + khash_t(str) *tbl = (khash_t(str)*)h; + kh_destroy(str, tbl); +} + +void *sam_header_merge(int n, const void **_dicts) +{ + const HeaderDict **dicts = (const HeaderDict**)_dicts; + HeaderDict *out_dict; + int idict, status; + + if ( n<2 ) return NULL; + + out_dict = sam_header_clone(dicts[0]); + + for (idict=1; idict<n; idict++) + { + const list_t *tmpl_hlines = dicts[idict]; + + while ( tmpl_hlines ) + { + list_t *out_hlines = out_dict; + int inserted = 0; + while ( out_hlines ) + { + status = sam_header_compare_lines(tmpl_hlines->data, out_hlines->data); + if ( status==0 ) + { + out_hlines = out_hlines->next; + continue; + } + + if ( status==2 ) + { + print_header_line(stderr,tmpl_hlines->data); + print_header_line(stderr,out_hlines->data); + debug("Conflicting lines, cannot merge the headers.\n"); + return 0; + } + if ( status==3 ) + sam_header_line_merge_with(out_hlines->data, tmpl_hlines->data); + + inserted = 1; + break; + } + if ( !inserted ) + out_dict = list_append(out_dict, sam_header_line_clone(tmpl_hlines->data)); + + tmpl_hlines = tmpl_hlines->next; + } + } + + return out_dict; +} + + diff --git a/sam_header.h b/sam_header.h new file mode 100644 index 0000000..e5c754f --- /dev/null +++ b/sam_header.h @@ -0,0 +1,24 @@ +#ifndef __SAM_HEADER_H__ +#define __SAM_HEADER_H__ + +#ifdef __cplusplus +extern "C" { +#endif + + void *sam_header_parse2(const char *headerText); + void *sam_header_merge(int n, const void **dicts); + void sam_header_free(void *header); + char *sam_header_write(const void *headerDict); // returns a newly allocated string + + char **sam_header2list(const void *_dict, char type[2], char key_tag[2], int *_n); + + void *sam_header2tbl(const void *dict, char type[2], char key_tag[2], char value_tag[2]); + const char *sam_tbl_get(void *h, const char *key); + int sam_tbl_size(void *h); + void sam_tbl_destroy(void *h); + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/sam_view.c b/sam_view.c index 113c6c4..06dd01a 100644 --- a/sam_view.c +++ b/sam_view.c @@ -2,26 +2,45 @@ #include #include #include +#include <math.h> +#include "sam_header.h" #include "sam.h" #include "faidx.h" static int g_min_mapQ = 0,
g_flag_on = 0, g_flag_off = 0; static char *g_library, *g_rg; +static int g_sol2sanger_tbl[128]; + +static void sol2sanger(bam1_t *b) +{ + int l; + uint8_t *qual = bam1_qual(b); + if (g_sol2sanger_tbl[30] == 0) { + for (l = 0; l != 128; ++l) { + g_sol2sanger_tbl[l] = (int)(10.0 * log(1.0 + pow(10.0, (l - 64 + 33) / 10.0)) / log(10.0) + .499); + if (g_sol2sanger_tbl[l] >= 93) g_sol2sanger_tbl[l] = 93; + } + } + for (l = 0; l < b->core.l_qseq; ++l) { + int q = qual[l]; + if (q > 127) q = 127; + qual[l] = g_sol2sanger_tbl[q]; + } +} static inline int __g_skip_aln(const bam_header_t *h, const bam1_t *b) { if (b->core.qual < g_min_mapQ || ((b->core.flag & g_flag_on) != g_flag_on) || (b->core.flag & g_flag_off)) return 1; - if (g_library || g_rg) { + if (g_rg) { uint8_t *s = bam_aux_get(b, "RG"); - if (s) { - if (g_rg && strcmp(g_rg, (char*)(s + 1)) == 0) return 0; - if (g_library) { - const char *p = bam_strmap_get(h->rg2lib, (char*)(s + 1)); - return (p && strcmp(p, g_library) == 0)? 0 : 1; - } return 1; - } else return 1; - } else return 0; + if (s && strcmp(g_rg, (char*)(s + 1)) == 0) return 0; + } + if (g_library) { + const char *p = bam_get_library((bam_header_t*)h, b); + return (p && strcmp(p, g_library) == 0)? 
0 : 1; + } + return 0; } // callback function for bam_fetch() @@ -36,15 +55,16 @@ static int usage(int is_long_help); int main_samview(int argc, char *argv[]) { - int c, is_header = 0, is_header_only = 0, is_bamin = 1, ret = 0, is_uncompressed = 0, is_bamout = 0; + int c, is_header = 0, is_header_only = 0, is_bamin = 1, ret = 0, is_uncompressed = 0, is_bamout = 0, slx2sngr = 0; int of_type = BAM_OFDEC, is_long_help = 0; samfile_t *in = 0, *out = 0; char in_mode[5], out_mode[5], *fn_out = 0, *fn_list = 0, *fn_ref = 0; /* parse command-line options */ strcpy(in_mode, "r"); strcpy(out_mode, "w"); - while ((c = getopt(argc, argv, "Sbt:hHo:q:f:F:ul:r:xX?T:")) >= 0) { + while ((c = getopt(argc, argv, "Sbt:hHo:q:f:F:ul:r:xX?T:C")) >= 0) { switch (c) { + case 'C': slx2sngr = 1; break; case 'S': is_bamin = 0; break; case 'b': is_bamout = 1; break; case 't': fn_list = strdup(optarg); is_bamin = 0; break; @@ -96,9 +116,12 @@ int main_samview(int argc, char *argv[]) if (argc == optind + 1) { // convert/print the entire file bam1_t *b = bam_init1(); int r; - while ((r = samread(in, b)) >= 0) // read one alignment from `in' - if (!__g_skip_aln(in->header, b)) + while ((r = samread(in, b)) >= 0) { // read one alignment from `in' + if (!__g_skip_aln(in->header, b)) { + if (slx2sngr) sol2sanger(b); samwrite(out, b); // write the alignment to `out' + } + } if (r < -1) fprintf(stderr, "[main_samview] truncated file.\n"); bam_destroy1(b); } else { // retrieve alignments in specified regions diff --git a/samtools.1 b/samtools.1 index d2c78f1..31375f3 100644 --- a/samtools.1 +++ b/samtools.1 @@ -1,4 +1,4 @@ -.TH samtools 1 "2 September 2009" "samtools-0.1.6" "Bioinformatics tools" +.TH samtools 1 "10 November 2009" "samtools-0.1.7" "Bioinformatics tools" .SH NAME .PP samtools - Utilities for the Sequence Alignment/Map (SAM) format @@ -123,8 +123,9 @@ is specified, all the alignments will be printed; otherwise only alignments overlapping the specified regions will be output. 
An alignment may be given multiple times if it is overlapping several regions. A region can be presented, for example, in the following -format: `chr2', `chr2:1000000' or `chr2:1,000,000-2,000,000'. The -coordinate is 1-based. +format: `chr2' (the whole chr2), `chr2:1000000' (region starting from +1,000,000bp) or `chr2:1,000,000-2,000,000' (region between 1,000,000 and +2,000,000bp including the end points). The coordinate is 1-based. .B OPTIONS: .RS @@ -220,14 +221,16 @@ mapping quality. A symbol `$' marks the end of a read segment. If option .B -c -is applied, the consensus base, consensus quality, SNP quality and RMS -mapping quality of the reads covering the site will be inserted between -the `reference base' and the `read bases' columns. An indel occupies an -additional line. Each indel line consists of chromosome name, -coordinate, a star, the genotype, consensus quality, SNP quality, RMS -mapping quality, # covering reads, the first alllele, the second allele, -# reads supporting the first allele, # reads supporting the second -allele and # reads containing indels different from the top two alleles. +is applied, the consensus base, Phred-scaled consensus quality, SNP +quality (i.e. the Phred-scaled probability of the consensus being +identical to the reference) and root mean square (RMS) mapping quality +of the reads covering the site will be inserted between the `reference +base' and the `read bases' columns. An indel occupies an additional +line. Each indel line consists of chromosome name, coordinate, a star, +the genotype, consensus quality, SNP quality, RMS mapping quality, # +covering reads, the first alllele, the second allele, # reads supporting +the first allele, # reads supporting the second allele and # reads +containing indels different from the top two alleles. .B OPTIONS: .RS @@ -322,8 +325,6 @@ Text alignment viewer (based on the ncurses library). In the viewer, press `?' 
for help and press `g' to check the alignment start from a region in the format like `chr10:10,000,000'. -.RE - .TP .B fixmate samtools fixmate @@ -341,8 +342,6 @@ This command .B ONLY works with FR orientation and requires ISIZE is correctly set. -.RE - .TP .B rmdupse samtools rmdupse @@ -350,8 +349,6 @@ samtools rmdupse Remove potential duplicates for single-ended reads. This command will treat all reads as single-ended even if they are paired in fact. -.RE - .TP .B fillmd samtools fillmd [-e] diff --git a/samtools.txt b/samtools.txt index 63e6a25..feec238 100644 --- a/samtools.txt +++ b/samtools.txt @@ -103,35 +103,38 @@ COMMANDS AND OPTIONS otherwise only alignments overlapping the specified regions will be output. An alignment may be given multiple times if it is overlapping several regions. A region can be presented, - for example, in the following format: `chr2', `chr2:1000000' - or `chr2:1,000,000-2,000,000'. The coordinate is 1-based. + for example, in the following format: `chr2' (the whole + chr2), `chr2:1000000' (region starting from 1,000,000bp) or + `chr2:1,000,000-2,000,000' (region between 1,000,000 and + 2,000,000bp including the end points). The coordinate is + 1-based. OPTIONS: -b Output in the BAM format. -u Output uncompressed BAM. This option saves time spent - on compression/decomprssion and is thus preferred + on compression/decomprssion and is thus preferred when the output is piped to another samtools command. -h Include the header in the output. -H Output the header only. - -S Input is in SAM. If @SQ header lines are absent, the + -S Input is in SAM. If @SQ header lines are absent, the `-t' option is required. - -t FILE This file is TAB-delimited. Each line must contain - the reference name and the length of the reference, - one line for each distinct reference; additional - fields are ignored. This file also defines the order - of the reference sequences in sorting. 
If you run - `samtools faidx ', the resultant index file - .fai can be used as this file. + -t FILE This file is TAB-delimited. Each line must contain + the reference name and the length of the reference, + one line for each distinct reference; additional + fields are ignored. This file also defines the order + of the reference sequences in sorting. If you run + `samtools faidx ', the resultant index file + .fai can be used as this file. -o FILE Output file [stdout] - -f INT Only output alignments with all bits in INT present + -f INT Only output alignments with all bits in INT present in the FLAG field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0] @@ -146,58 +149,60 @@ COMMANDS AND OPTIONS faidx samtools faidx [region1 [...]] - Index reference sequence in the FASTA format or extract sub- - sequence from indexed reference sequence. If no region is + Index reference sequence in the FASTA format or extract sub- + sequence from indexed reference sequence. If no region is specified, faidx will index the file and create - .fai on the disk. If regions are speficified, the - subsequences will be retrieved and printed to stdout in the - FASTA format. The input file can be compressed in the RAZF + .fai on the disk. If regions are speficified, the + subsequences will be retrieved and printed to stdout in the + FASTA format. The input file can be compressed in the RAZF format. - pileup samtools pileup [-f in.ref.fasta] [-t in.ref_list] [-l - in.site_list] [-iscgS2] [-T theta] [-N nHap] [-r + pileup samtools pileup [-f in.ref.fasta] [-t in.ref_list] [-l + in.site_list] [-iscgS2] [-T theta] [-N nHap] [-r pairDiffRate] | - Print the alignment in the pileup format. In the pileup for- - mat, each line represents a genomic position, consisting of + Print the alignment in the pileup format. 
In the pileup for- + mat, each line represents a genomic position, consisting of chromosome name, coordinate, reference base, read bases, read - qualities and alignment mapping qualities. Information on + qualities and alignment mapping qualities. Information on match, mismatch, indel, strand, mapping quality and start and - end of a read are all encoded at the read base column. At - this column, a dot stands for a match to the reference base - on the forward strand, a comma for a match on the reverse - strand, `ACGTN' for a mismatch on the forward strand and - `acgtn' for a mismatch on the reverse strand. A pattern - `\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion - between this reference position and the next reference posi- - tion. The length of the insertion is given by the integer in - the pattern, followed by the inserted sequence. Similarly, a + end of a read are all encoded at the read base column. At + this column, a dot stands for a match to the reference base + on the forward strand, a comma for a match on the reverse + strand, `ACGTN' for a mismatch on the forward strand and + `acgtn' for a mismatch on the reverse strand. A pattern + `\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion + between this reference position and the next reference posi- + tion. The length of the insertion is given by the integer in + the pattern, followed by the inserted sequence. Similarly, a pattern `-[0-9]+[ACGTNacgtn]+' represents a deletion from the - reference. The deleted bases will be presented as `*' in the - following lines. Also at the read base column, a symbol `^' - marks the start of a read segment which is a contiguous sub- - sequence on the read separated by `N/S/H' CIGAR operations. - The ASCII of the character following `^' minus 33 gives the - mapping quality. A symbol `$' marks the end of a read seg- + reference. The deleted bases will be presented as `*' in the + following lines. 
Also at the read base column, a symbol `^' + marks the start of a read segment which is a contiguous sub- + sequence on the read separated by `N/S/H' CIGAR operations. + The ASCII of the character following `^' minus 33 gives the + mapping quality. A symbol `$' marks the end of a read seg- ment. - If option -c is applied, the consensus base, consensus qual- - ity, SNP quality and RMS mapping quality of the reads cover- - ing the site will be inserted between the `reference base' - and the `read bases' columns. An indel occupies an additional - line. Each indel line consists of chromosome name, coordi- - nate, a star, the genotype, consensus quality, SNP quality, + If option -c is applied, the consensus base, Phred-scaled + consensus quality, SNP quality (i.e. the Phred-scaled proba- + bility of the consensus being identical to the reference) and + root mean square (RMS) mapping quality of the reads covering + the site will be inserted between the `reference base' and + the `read bases' columns. An indel occupies an additional + line. Each indel line consists of chromosome name, coordi- + nate, a star, the genotype, consensus quality, SNP quality, RMS mapping quality, # covering reads, the first alllele, the - second allele, # reads supporting the first allele, # reads - supporting the second allele and # reads containing indels + second allele, # reads supporting the first allele, # reads + supporting the second allele and # reads containing indels different from the top two alleles. OPTIONS: - -s Print the mapping quality as the last column. This - option makes the output easier to parse, although + -s Print the mapping quality as the last column. This + option makes the output easier to parse, although this format is not space efficient. @@ -207,62 +212,61 @@ COMMANDS AND OPTIONS -i Only output pileup lines containing indels. - -f FILE The reference sequence in the FASTA format. Index + -f FILE The reference sequence in the FASTA format. 
Index file FILE.fai will be created if absent. -M INT Cap mapping quality at INT [60] - -t FILE List of reference names ane sequence lengths, in - the format described for the import command. If - this option is present, samtools assumes the input + -t FILE List of reference names ane sequence lengths, in + the format described for the import command. If + this option is present, samtools assumes the input is in SAM format; otherwise it assumes in BAM format. - -l FILE List of sites at which pileup is output. This file - is space delimited. The first two columns are - required to be chromosome and 1-based coordinate. - Additional columns are ignored. It is recommended + -l FILE List of sites at which pileup is output. This file + is space delimited. The first two columns are + required to be chromosome and 1-based coordinate. + Additional columns are ignored. It is recommended to use option -s together with -l as in the default format we may not know the mapping quality. - -c Call the consensus sequence using MAQ consensus + -c Call the consensus sequence using MAQ consensus model. Options -T, -N, -I and -r are only effective when -c or -g is in use. - -g Generate genotype likelihood in the binary GLFv3 + -g Generate genotype likelihood in the binary GLFv3 format. This option suppresses -c, -i and -s. - -T FLOAT The theta parameter (error dependency coefficient) + -T FLOAT The theta parameter (error dependency coefficient) in the maq consensus calling model [0.85] -N INT Number of haplotypes in the sample (>=2) [2] - -r FLOAT Expected fraction of differences between a pair of + -r FLOAT Expected fraction of differences between a pair of haplotypes [0.001] - -I INT Phred probability of an indel in sequencing/prep. + -I INT Phred probability of an indel in sequencing/prep. [40] tview samtools tview [ref.fasta] - Text alignment viewer (based on the ncurses library). In the - viewer, press `?' 
for help and press `g' to check the align- - ment start from a region in the format like + Text alignment viewer (based on the ncurses library). In the + viewer, press `?' for help and press `g' to check the align- + ment start from a region in the format like `chr10:10,000,000'. - fixmate samtools fixmate Fill in mate coordinates, ISIZE and mate related flags from a @@ -271,37 +275,35 @@ COMMANDS AND OPTIONS rmdup samtools rmdup - Remove potential PCR duplicates: if multiple read pairs have - identical external coordinates, only retain the pair with - highest mapping quality. This command ONLY works with FR + Remove potential PCR duplicates: if multiple read pairs have + identical external coordinates, only retain the pair with + highest mapping quality. This command ONLY works with FR orientation and requires ISIZE is correctly set. - rmdupse samtools rmdupse Remove potential duplicates for single-ended reads. This com- - mand will treat all reads as single-ended even if they are + mand will treat all reads as single-ended even if they are paired in fact. - fillmd samtools fillmd [-e] - Generate the MD tag. If the MD tag is already present, this - command will give a warning if the MD tag generated is dif- + Generate the MD tag. If the MD tag is already present, this + command will give a warning if the MD tag generated is dif- ferent from the existing tag. OPTIONS: - -e Convert a the read base to = if it is identical to - the aligned reference base. Indel caller does not + -e Convert a the read base to = if it is identical to + the aligned reference base. Indel caller does not support the = bases at the moment. SAM FORMAT - SAM is TAB-delimited. Apart from the header lines, which are started + SAM is TAB-delimited. 
Apart from the header lines, which are started with the `@' symbol, each alignment line consists of: @@ -342,15 +344,15 @@ SAM FORMAT +-------+--------------------------------------------------+ LIMITATIONS - o Unaligned words used in bam_import.c, bam_endian.h, bam.c and + o Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c. o CIGAR operation P is not properly handled at the moment. - o In merging, the input files are required to have the same number of - reference sequences. The requirement can be relaxed. In addition, - merging does not reconstruct the header dictionaries automatically. - Endusers have to provide the correct header. Picard is better at + o In merging, the input files are required to have the same number of + reference sequences. The requirement can be relaxed. In addition, + merging does not reconstruct the header dictionaries automatically. + Endusers have to provide the correct header. Picard is better at merging. o Samtools' rmdup does not work for single-end data and does not remove @@ -358,10 +360,10 @@ LIMITATIONS AUTHOR - Heng Li from the Sanger Institute wrote the C version of samtools. Bob + Heng Li from the Sanger Institute wrote the C version of samtools. Bob Handsaker from the Broad Institute implemented the BGZF library and Jue - Ruan from Beijing Genomics Institute wrote the RAZF library. Various - people in the 1000Genomes Project contributed to the SAM format speci- + Ruan from Beijing Genomics Institute wrote the RAZF library. Various + people in the 1000Genomes Project contributed to the SAM format speci- fication. @@ -370,4 +372,4 @@ SEE ALSO -samtools-0.1.6 2 September 2009 samtools(1) +samtools-0.1.7 10 November 2009 samtools(1) -- 2.30.2