X-Git-Url: http://woldlab.caltech.edu/gitweb/?p=samtools.git;a=blobdiff_plain;f=samtools.1;h=31375f323f14f1df17a1f2b3b6185974d94c7b47;hp=45e16123948c1e72ba7193a68500e4430cc0d1c4;hb=8af6e7eac6605aab3bd8b1a1f874309b59789b08;hpb=b27e00385f41769d03a8cca4dbd71275fc9fa906

diff --git a/samtools.1 b/samtools.1
index 45e1612..31375f3 100644
--- a/samtools.1
+++ b/samtools.1
@@ -1,4 +1,4 @@
-.TH samtools 1 "6 July 2009" "samtools-0.1.5" "Bioinformatics tools"
+.TH samtools 1 "10 November 2009" "samtools-0.1.7" "Bioinformatics tools"
 .SH NAME
 .PP
 samtools - Utilities for the Sequence Alignment/Map (SAM) format
@@ -33,12 +33,11 @@ output (stdout). Several commands can thus be combined with Unix pipes.
 Samtools always output warning and error messages to the standard error
 output (stderr).
 
-Samtools is also able to open a BAM (not SAM) file on a remote FTP
-server if the BAM file name starts with `ftp://'. Samtools checks the
-current working directory for the index file and will download the index
-upon absence. Samtools achieves random FTP file access with the `REST'
-ftp command. It does not retrieve the entire alignment file unless it is
-asked to do so.
+Samtools is also able to open a BAM (not SAM) file on a remote FTP or
+HTTP server if the BAM file name starts with `ftp://' or `http://'.
+Samtools checks the current working directory for the index file and
+will download the index upon absence. Samtools does not retrieve the
+entire alignment file unless it is asked to do so.
 
 .SH COMMANDS AND OPTIONS
 
@@ -73,17 +72,34 @@ Approximately the maximum required memory. [500000000]
 
 .TP
 .B merge
-samtools merge [-n] [...]
-
-Merge multiple sorted alignments. The header of
-.I
+samtools merge [-h inh.sam] [-n] [...]
+
+Merge multiple sorted alignments.
+The header reference lists of all the input BAM files, and the @SQ headers of
+.IR inh.sam ,
+if any, must all refer to the same set of reference sequences.
+The header reference list and (unless overridden by
+.BR -h )
+`@' headers of
+.I in1.bam
 will be copied to
-.I
+.IR out.bam ,
 and the headers of other files will be ignored.
 
 .B OPTIONS:
 .RS
 .TP 8
+.B -h FILE
+Use the lines of
+.I FILE
+as `@' headers to be copied to
+.IR out.bam ,
+replacing any header lines that would otherwise be copied from
+.IR in1.bam .
+.RI ( FILE
+is actually in SAM format, though any alignment records it may contain
+are ignored.)
+.TP
 .B -n
 The input alignments are sorted by read names rather than by chromosomal
 coordinates
@@ -107,8 +123,9 @@ is specified, all the alignments will be printed; otherwise only
 alignments overlapping the specified regions will be output. An
 alignment may be given multiple times if it is overlapping several
 regions. A region can be presented, for example, in the following
-format: `chr2', `chr2:1000000' or `chr2:1,000,000-2,000,000'. The
-coordinate is 1-based.
+format: `chr2' (the whole chr2), `chr2:1000000' (region starting from
+1,000,000bp) or `chr2:1,000,000-2,000,000' (region between 1,000,000 and
+2,000,000bp including the end points). The coordinate is 1-based.
 
 .B OPTIONS:
 .RS
@@ -204,14 +221,16 @@ mapping quality. A symbol `$' marks the end of a read segment.
 
 If option
 .B -c
-is applied, the consensus base, consensus quality, SNP quality and RMS
-mapping quality of the reads covering the site will be inserted between
-the `reference base' and the `read bases' columns. An indel occupies an
-additional line. Each indel line consists of chromosome name,
-coordinate, a star, the genotype, consensus quality, SNP quality, RMS
-mapping quality, # covering reads, the first alllele, the second allele,
-# reads supporting the first allele, # reads supporting the second
-allele and # reads containing indels different from the top two alleles.
+is applied, the consensus base, Phred-scaled consensus quality, SNP
+quality (i.e. the Phred-scaled probability of the consensus being
+identical to the reference) and root mean square (RMS) mapping quality
+of the reads covering the site will be inserted between the `reference
+base' and the `read bases' columns. An indel occupies an additional
+line. Each indel line consists of chromosome name, coordinate, a star,
+the genotype, consensus quality, SNP quality, RMS mapping quality, #
+covering reads, the first alllele, the second allele, # reads supporting
+the first allele, # reads supporting the second allele and # reads
+containing indels different from the top two alleles.
 
 .B OPTIONS:
 .RS
@@ -304,11 +323,7 @@ samtools tview [ref.fasta]
 
 Text alignment viewer (based on the ncurses library). In the viewer,
 press `?' for help and press `g' to check the alignment start from a
-region in the format like `chr10:10,000,000'. Note that if the region
-showed on the screen contains no mapped reads, a blank screen will be
-seen. This is a known issue and will be improved later.
-
-.RE
+region in the format like `chr10:10,000,000'.
 
 .TP
 .B fixmate
@@ -327,8 +342,6 @@ This command
 .B ONLY
 works with FR orientation and requires ISIZE is correctly set.
 
-.RE
-
 .TP
 .B rmdupse
 samtools rmdupse
@@ -336,8 +349,6 @@ samtools rmdupse
 Remove potential duplicates for single-ended reads. This command will
 treat all reads as single-ended even if they are paired in fact.
 
-.RE
-
 .TP
 .B fillmd
 samtools fillmd [-e]
@@ -408,6 +419,15 @@ _
 Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.
 .IP o 2
 CIGAR operation P is not properly handled at the moment.
+.IP o 2
+In merging, the input files are required to have the same number of
+reference sequences. The requirement can be relaxed. In addition,
+merging does not reconstruct the header dictionaries
+automatically. Endusers have to provide the correct header. Picard is
+better at merging.
+.IP o 2
+Samtools' rmdup does not work for single-end data and does not remove
+duplicates across chromosomes. Picard is better.
 
 .SH AUTHOR
 .PP
@@ -419,4 +439,4 @@ specification.
 
 .SH SEE ALSO
 .PP
-Samtools website: http://samtools.sourceforge.net
+Samtools website: