[Bowtie] is an ultrafast, memory-efficient short read aligner geared
toward quickly aligning large sets of short DNA sequences (reads) to
large genomes. It aligns 35-base-pair reads to the human genome at a
rate of 25 million reads per hour on a typical workstation. Bowtie
indexes the genome with a [Burrows-Wheeler] index to keep its memory
footprint small: for the human genome, the index is typically about
2.2 GB (for unpaired alignment) or 2.9 GB (for paired-end or colorspace
alignment). Multiple processors can be used simultaneously to achieve
greater alignment speed. Bowtie can also output alignments in the
standard [SAM] format, allowing Bowtie to interoperate with other tools
supporting SAM, including the [SAMtools] consensus, SNP, and indel
callers. Bowtie runs on the command line under Windows, Mac OS X,
and Linux.

[Bowtie] also forms the basis for other tools, including [TopHat], a
fast splice junction mapper for RNA-seq reads; [Cufflinks], a tool for
transcriptome assembly and isoform quantitation from RNA-seq reads;
[Crossbow], a cloud-computing software tool for large-scale
resequencing data; and [Myrna], a cloud-computing tool for calculating
differential gene expression in large RNA-seq datasets.

If you use [Bowtie] for your published research, please cite the
[Bowtie paper].

[Bowtie]: http://bowtie-bio.sf.net
[Burrows-Wheeler]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
[SAM]: http://samtools.sourceforge.net/SAM1.pdf
[SAMtools]: http://samtools.sourceforge.net/
[TopHat]: http://tophat.cbcb.umd.edu/
[Cufflinks]: http://cufflinks.cbcb.umd.edu/
[Crossbow]: http://bowtie-bio.sf.net/crossbow
[Myrna]: http://bowtie-bio.sf.net/myrna
[Bowtie paper]: http://genomebiology.com/2009/10/3/R25

Bowtie is not a general-purpose alignment tool like [MUMmer], [BLAST],
or [Vmatch]. Bowtie works best when aligning short reads to large
genomes, though it supports arbitrarily small reference sequences (e.g.
amplicons) and reads as long as 1024 bases. Bowtie is designed to be
extremely fast for sets of short reads where (a) many of the reads have
at least one good, valid alignment, (b) many of the reads are
relatively high-quality, and (c) the number of alignments reported per
read is small (close to 1).

Bowtie does not yet report gapped alignments; this is future work.

[MUMmer]: http://mummer.sourceforge.net/
[BLAST]: http://blast.ncbi.nlm.nih.gov/Blast.cgi
[Vmatch]: http://www.vmatch.de/

You may download either Bowtie sources or binaries for your platform
from the [Download] section of the SourceForge project site. Binaries
are currently available for Intel architectures (`i386` and `x86_64`)
running Linux, Windows, and Mac OS X.

Building Bowtie from source requires a GNU-like environment that
includes GCC, GNU Make and other basics. It should be possible to
build Bowtie on a vanilla Linux or Mac installation. Bowtie can also
be built on Windows using [Cygwin] or [MinGW]. We recommend
[TDM's MinGW Build]. If using [MinGW], you must also have [MSYS]
installed.

To build Bowtie, extract the sources, change to the extracted
directory, and run GNU `make` (usually with the command `make`, but
sometimes with `gmake`) with no arguments. If building with [MinGW],
run `make` from the [MSYS] command line.

To support the `-p` (multithreading) option, Bowtie needs the
`pthreads` library. To compile Bowtie without `pthreads` (which
disables `-p`), use `make BOWTIE_PTHREADS=0`.

[Cygwin]: http://www.cygwin.com/
[MinGW]: http://www.mingw.org/
[TDM's MinGW Build]: http://www.tdragon.net/recentgcc/
[MSYS]: http://www.mingw.org/wiki/msys
[Download]: https://sourceforge.net/projects/bowtie-bio/files/bowtie/

`bowtie` takes an index and a set of reads as input and outputs a list
of alignments. Alignments are selected according to a combination of
the `-v`/`-n`/`-e`/`-l` options (plus the `-I`/`-X`/`--fr`/`--rf`/`--ff`
options for paired-end alignment), which define which alignments are
legal, and the `-k`/`-a`/`-m`/`-M`/`--best`/`--strata` options, which
define which and how many legal alignments should be reported.

By default, Bowtie enforces an alignment policy similar to [Maq]'s
default quality-aware policy (`-n` 2 `-l` 28 `-e` 70). See [the -n
alignment mode] section of the manual for details about this mode.
Bowtie can also enforce a simpler end-to-end k-difference policy (e.g.
with `-v` 2). See [the -v alignment mode] section of the manual for
details about that mode. [The -n alignment mode] and [the -v alignment
mode] are mutually exclusive.

Bowtie works best when aligning short reads to large genomes (e.g.
human or mouse), though it supports arbitrarily small reference
sequences and reads as long as 1024 bases. Bowtie is designed to be
very fast for sets of short reads where (a) many reads have at least
one good, valid alignment, (b) many reads are relatively high-quality,
and (c) the number of alignments reported per read is small (close to
1). These criteria are generally satisfied in the context of modern
short-read analyses such as RNA-seq, ChIP-seq, other types of -seq, and
mammalian resequencing. You may observe longer running times in other
contexts.

If `bowtie` is too slow for your application, try some of the
performance-tuning hints described in the [Performance Tuning] section
of the manual.

Alignments involving one or more ambiguous reference characters (`N`,
`-`, `R`, `Y`, etc.) are considered invalid by Bowtie. This is true
only for ambiguous characters in the reference; alignments involving
ambiguous characters in the read are legal, subject to the alignment
policy. Ambiguous characters in the read mismatch all other
characters. Alignments that "fall off" the reference sequence are not
considered valid.

The process by which `bowtie` chooses an alignment to report is
randomized in order to avoid "mapping bias", the phenomenon whereby
an aligner systematically fails to report a particular class of good
alignments, causing spurious "holes" in the comparative assembly.
Whenever `bowtie` reports a subset of the valid alignments that exist,
it makes an effort to sample them randomly. This randomness flows
from a simple seeded pseudo-random number generator and is
deterministic in the sense that Bowtie will always produce the same
results for the same read when run with the same initial "seed" value
(see the `--seed` option).

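The determinism described above can be illustrated with a small Python sketch. It is not Bowtie's code. This manual does not specify Bowtie's generator or how it derives per-read randomness from `--seed`, so seeding by the user seed plus the read name is a hypothetical choice here. The sketch only shows that a fixed seed yields repeatable picks.

```python
import random


def sample_alignments(valid, k, seed, read_name):
    """Return up to k of the valid alignments, chosen reproducibly at random.

    Hypothetical model: a private RNG is seeded from the user-supplied seed
    and the read name, so the choice for a read depends only on those two
    values and not on which reads were processed before it.
    """
    if len(valid) <= k:
        return list(valid)          # nothing to choose: report them all
    rng = random.Random(f"{seed}:{read_name}")
    return rng.sample(valid, k)     # uniform sample without replacement


hits = [148810, 2852852, 4930433, 905664, 1093035]
print(sample_alignments(hits, 3, seed=0, read_name="r1"))
```

Running the script twice prints the same three offsets. Changing `seed` may change them.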
In the default mode, `bowtie` can exhibit strand bias. Strand bias
occurs when the input reference and reads are such that (a) some reads
align equally well to sites on the forward and reverse strands of the
reference, and (b) the number of such sites on one strand is different
from the number on the other strand. When this happens for a given
read, `bowtie` effectively chooses one strand or the other with 50%
probability, then reports a randomly-selected alignment for that read
from among the sites on the selected strand. This tends to overassign
alignments to the sites on the strand with fewer sites and underassign
them to sites on the strand with more sites. The effect is mitigated,
though it may not be eliminated, when reads are longer or when
paired-end reads are used. Running Bowtie in `--best` mode
eliminates strand bias by forcing Bowtie to select one strand or the
other with a probability that is proportional to the number of best
sites on the strand.

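To make the bias concrete, the sketch below computes the probability that one particular equally-good site is reported under each of the two strand-selection rules just described. It is a simplified model, not Bowtie's implementation. It assumes exact ties, a single reported alignment, and at least one site on each strand.

```python
from fractions import Fraction


def per_site_probability(n_fwd, n_rev, best_mode):
    """Probability that one specific tied site is reported.

    Returns (p_for_a_forward_site, p_for_a_reverse_site).
    Assumes n_fwd >= 1 and n_rev >= 1.
    """
    total = n_fwd + n_rev
    if best_mode:
        # --best: choose a strand in proportion to its number of best sites.
        p_fwd_strand = Fraction(n_fwd, total)
        p_rev_strand = Fraction(n_rev, total)
    else:
        # Default: each strand is chosen with probability 1/2.
        p_fwd_strand = p_rev_strand = Fraction(1, 2)
    # Within the chosen strand, a site is picked uniformly at random.
    return p_fwd_strand / n_fwd, p_rev_strand / n_rev


print(per_site_probability(1, 3, best_mode=False))
print(per_site_probability(1, 3, best_mode=True))
```

Consider one forward site and three reverse sites. The default rule reports the lone forward site half the time and each reverse site one sixth of the time. The `--best` rule gives every site one quarter.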
Gapped alignments are not currently supported, but support is planned
for a future release.

[Maq]: http://maq.sf.net

The `-n` alignment mode
-----------------------

When the `-n` option is specified (which is the default), `bowtie`
determines which alignments are valid according to the following
policy, which is similar to [Maq]'s default policy.

1. Alignments may have no more than `N` mismatches (where `N` is a
   number 0-3, set with `-n`) in the first `L` bases (where `L` is a
   number 5 or greater, set with `-l`) on the high-quality (left) end
   of the read. The first `L` bases are called the "seed".

2. The sum of the [Phred quality] values at *all* mismatched positions
   (not just in the seed) may not exceed `E` (set with `-e`). Where
   qualities are unavailable (e.g. if the reads are from a FASTA
   file), the [Phred quality] defaults to 40.

The `-n` option is mutually exclusive with the `-v` option.

If there are many possible alignments satisfying these criteria, Bowtie
gives preference to alignments with fewer mismatches and where the sum
from criterion 2 is smaller. When the `--best` option is specified,
Bowtie guarantees the reported alignment(s) are "best" in terms of
these criteria (criterion 1 has priority), and that the alignments are
reported in best-to-worst order. Bowtie is somewhat slower when
`--best` is specified.

Note that [Maq] internally rounds base qualities to the nearest 10 and
rounds qualities greater than 30 to 30. To maintain compatibility,
Bowtie does the same. Rounding can be suppressed with the
`--nomaqround` option.

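The two `-n` criteria and the Maq-style rounding can be combined into a single validity check. This Python sketch is illustrative only. The function names are hypothetical, and mismatch positions are 0-based offsets from the left (high-quality) end of the read. How ties such as 25 are rounded is also an assumption.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10 and cap it at 30.

    Rounding 25 up to 30 is an assumption of this sketch.
    """
    return min(30, (q + 5) // 10 * 10)


def valid_n_mode(mismatch_positions, quals, n=2, l=28, e=70, maq_round_quals=True):
    """Check criteria 1 and 2 of the -n policy for one candidate alignment.

    mismatch_positions: 0-based read offsets of the mismatches.
    quals: per-base Phred qualities (all 40 for FASTA input).
    """
    # Criterion 1: at most n mismatches inside the seed (first l bases).
    seed_mismatches = sum(1 for p in mismatch_positions if p < l)
    if seed_mismatches > n:
        return False
    # Criterion 2: quality sum over *all* mismatched positions must not exceed e.
    total = 0
    for p in mismatch_positions:
        total += maq_round(quals[p]) if maq_round_quals else quals[p]
    return total <= e
```

Suppose all qualities are 40, as for FASTA input, and rounding is on. Each mismatch then costs 30, so two mismatches (60) fit under the default `-e` 70, but three (90) do not. With `--nomaqround`, even two mismatches (80) exceed the limit.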
Bowtie is not fully sensitive in `-n` 2 and `-n` 3 modes by default.
In these modes, Bowtie imposes a "backtracking limit" to limit effort
spent trying to find valid alignments for low-quality reads unlikely to
have any. This may cause `bowtie` to miss some legal 2- and 3-mismatch
alignments. The limit is set to a reasonable default (125 without
`--best`, 800 with `--best`), but the user may decrease or increase the
limit using the `--maxbts` and/or `-y` options. `-y` mode is
relatively slow but guarantees full sensitivity.

[Maq]: http://maq.sf.net
[Phred quality]: http://en.wikipedia.org/wiki/FASTQ_format#Variations

The `-v` alignment mode
-----------------------

In `-v` mode, alignments may have no more than `V` mismatches, where
`V` may be a number from 0 through 3 set using the `-v` option.
Quality values are ignored. The `-v` option is mutually exclusive with
the `-n` option.

If there are many legal alignments, Bowtie gives preference to
alignments with fewer mismatches. When the `--best` option is
specified, Bowtie guarantees the reported alignment(s) are "best" in
terms of the number of mismatches, and that the alignments are reported
in best-to-worst order. Bowtie is somewhat slower when `--best` is
specified.

In [the -n alignment mode], an alignment's "stratum" is defined as the
number of mismatches in the "seed" region, i.e. the leftmost `L` bases,
where `L` is set with the `-l` option. In [the -v alignment mode], an
alignment's stratum is defined as the total number of mismatches in the
entire alignment. Some of Bowtie's options (e.g. `--strata` and `-m`)
use the notion of "stratum" to limit or expand the scope of reportable
alignments.

With the `-k`, `-a`, `-m`, `-M`, `--best` and `--strata` options, the
user can flexibly select which alignments are reported. Below we
demonstrate a few ways in which these options can be combined. All
examples use the `e_coli` index packaged with Bowtie. The
`--suppress` option is used to keep the output concise, and some
output is elided for clarity.

Example 1: `-a`

```
$ ./bowtie -a -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
- gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
- gi|110640213|ref|NC_008253.1| 2852852 8:T>A
- gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
- gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
+ gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T
```

Specifying `-a` instructs bowtie to report *all* valid alignments,
subject to the alignment policy: `-v` 2. In this case, bowtie finds
5 inexact hits in the E. coli genome; 1 hit (the 2nd one listed)
has 1 mismatch, and the other 4 hits have 2 mismatches. Four are on
the reverse reference strand and one is on the forward strand. Note
that they are not listed in best-to-worst order.

Example 2: `-k 3`

```
$ ./bowtie -k 3 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
- gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
- gi|110640213|ref|NC_008253.1| 2852852 8:T>A
- gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
```

Specifying `-k` 3 instructs bowtie to report up to 3 valid
alignments. In this case, a total of 5 valid alignments exist (see
[Example 1]); `bowtie` reports 3 out of those 5. `-k` can be set to
any integer greater than 0.

Example 3: `-k 6`

```
$ ./bowtie -k 6 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
- gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
- gi|110640213|ref|NC_008253.1| 2852852 8:T>A
- gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
- gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
+ gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T
```

Specifying `-k` 6 instructs bowtie to report up to 6 valid
alignments. In this case, a total of 5 valid alignments exist, so
`bowtie` reports all 5.

Example 4: default (`-k 1`)

```
$ ./bowtie -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
- gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
```

Leaving the reporting options at their defaults causes `bowtie` to
report the first valid alignment it encounters. Because `--best` was
not specified, we are not guaranteed that bowtie will report the best
alignment, and in this case it does not (the 1-mismatch alignment from
the previous example would have been better). The default reporting
mode is equivalent to `-k` 1.

Example 5: `-a --best`

```
$ ./bowtie -a --best -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
- gi|110640213|ref|NC_008253.1| 2852852 8:T>A
+ gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T
- gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
- gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
- gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
```

Specifying `-a` `--best` results in the same alignments being printed
as if just `-a` had been specified, but they are guaranteed to be
reported in best-to-worst order.

Example 6: `-a --best --strata`

```
$ ./bowtie -a --best --strata -v 2 --suppress 1,5,6,7 e_coli -c ATGCATCATGCGCCAT
- gi|110640213|ref|NC_008253.1| 2852852 8:T>A
```

Specifying `--strata` in addition to `-a` and `--best` causes
`bowtie` to report only those alignments in the best alignment
"stratum". The alignments in the best stratum are those having the
fewest mismatches (or, in `-n` mode, the fewest mismatches in the
"seed" portion of the alignment). Note that if `--strata` is
specified, `--best` must also be specified.

Example 7: `-a -m 3`

```
$ ./bowtie -a -m 3 -v 2 e_coli -c ATGCATCATGCGCCAT
```

Specifying `-m` 3 instructs bowtie to refrain from reporting any
alignments for reads having more than 3 reportable alignments. The
`-m` option is useful when the user would like to guarantee that
reported alignments are "unique", for some definition of unique.

Example 1 showed that the read has 5 reportable alignments when `-a`
and `-v` 2 are specified, so the `-m` 3 limit causes bowtie to
output no alignments.

Example 8: `-a -m 5`

```
$ ./bowtie -a -m 5 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
- gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
- gi|110640213|ref|NC_008253.1| 2852852 8:T>A
- gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
- gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
+ gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T
```

Specifying `-m` 5 instructs bowtie to refrain from reporting any
alignments for reads having more than 5 reportable alignments. Since
the read has exactly 5 reportable alignments, the `-m` 5 limit allows
`bowtie` to print them as usual.

Example 9: `-a -m 3 --best --strata`

```
$ ./bowtie -a -m 3 --best --strata -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
- gi|110640213|ref|NC_008253.1| 2852852 8:T>A
```

Specifying `-m` 3 instructs bowtie to refrain from reporting any
alignments for reads having more than 3 reportable alignments. As we
saw in Example 6, the read has only 1 reportable alignment when `-a`,
`--best` and `--strata` are specified, so the `-m` 3 limit allows
`bowtie` to print that alignment as usual.

Intuitively, the `-m` option, when combined with the `--best` and
`--strata` options, guarantees a principled, though weaker, form of
"uniqueness". A stronger form of uniqueness is enforced when `-m` is
specified but `--best` and `--strata` are not.

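The reporting options in the examples above can be summarised as a small filter over the list of valid hits. The Python sketch below is a toy model, not Bowtie's search. It assumes all valid hits are already known and uses the total mismatch count as the stratum, as in `-v` mode. When sorting for `--best`, it keeps tied hits in search order, whereas Example 5 shows that Bowtie may order equally good hits differently.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    strand: str
    offset: int
    mismatches: int  # stratum in -v mode: total number of mismatches


def report(hits, k=1, all_=False, m=None, best=False, strata=False):
    """Toy model of -k/-a/-m/--best/--strata over hits in search order."""
    if strata and not best:
        raise ValueError("--strata requires --best")
    candidates = list(hits)
    if best:
        # Best-to-worst; Python's sort is stable, so ties keep search order.
        candidates.sort(key=lambda h: h.mismatches)
    if strata and candidates:
        # Keep only hits in the best (lowest) stratum.
        top = candidates[0].mismatches
        candidates = [h for h in candidates if h.mismatches == top]
    if m is not None and len(candidates) > m:
        return []  # -m: report nothing if there are more than m reportable hits
    return candidates if all_ else candidates[:k]


example1 = [
    Hit("-", 148810, 2), Hit("-", 2852852, 1), Hit("-", 4930433, 2),
    Hit("-", 905664, 2), Hit("+", 1093035, 2),
]
print([h.offset for h in report(example1, all_=True, m=3, best=True, strata=True)])
```

Feeding it the five hits from Example 1 reproduces the number of alignments reported in Examples 2 through 9.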
`bowtie` can align paired-end reads when properly paired read files are
specified using the `-1` and `-2` options (for pairs of raw, FASTA, or
FASTQ read files), or using the `--12` option (for Tab-delimited read
files). A valid paired-end alignment satisfies these criteria:

1. Both mates have a valid alignment according to the alignment policy
   defined by the `-v`/`-n`/`-e`/`-l` options.
2. The relative orientation and position of the mates satisfy the
   constraints defined by the `-I`/`-X`/`--fr`/`--rf`/`--ff`
   options.

Policies governing which paired-end alignments are reported for a
given read are specified using the `-k`, `-a` and `-m` options as
usual. The `--strata` and `--best` options do not apply in
paired-end mode.

A paired-end alignment is reported as a pair of mate alignments, each
on a separate line, where the alignment for each mate is formatted the
same as an unpaired (singleton) alignment. The alignment for the mate
that occurs closest to the beginning of the reference sequence (the
"upstream" mate) is always printed before the alignment for the
downstream mate. Read files containing paired-end reads will
sometimes name the reads according to whether they are the #1 or #2
mates by appending a `/1` or `/2` suffix to the read name. If no such
suffix is present in Bowtie's input, the suffix will be added when
Bowtie prints read names in alignments (except in `-S` "SAM" mode,
where mate information is encoded in the `FLAGS` field instead).

Finding a valid paired-end alignment where both mates align to
repetitive regions of the reference can be very time-consuming. By
default, Bowtie avoids much of this cost by imposing a limit on the
number of "tries" it makes to match an alignment for one mate with a
nearby alignment for the other. The default limit is 100. This causes
`bowtie` to miss some valid paired-end alignments where both mates lie
in repetitive regions, but the user may use the `--pairtries` or
`-y` options to increase Bowtie's sensitivity as desired.

Paired-end alignments where one mate's alignment is entirely contained
within the other's are considered invalid.

When colorspace alignment is enabled via `-C`, the default setting for
paired-end orientation is `--ff`. This is because most SOLiD datasets
have that orientation. When colorspace alignment is not enabled
(default), the default setting for orientation is `--fr`, since most
Illumina datasets have this orientation. The default can be overridden
with the `--fr`, `--rf`, or `--ff` options.

Because Bowtie uses an in-memory representation of the original
reference string when finding paired-end alignments, its memory
footprint is larger when aligning paired-end reads. For example, the
human index has a memory footprint of about 2.2 GB in single-end mode
and 2.9 GB in paired-end mode. Note that paired-end and unpaired
alignment incur the same memory footprint in colorspace (e.g. about
2.9 GB for the human index).

Colorspace alignment
--------------------

As of version 0.12.0, `bowtie` can align colorspace reads against a
colorspace index when `-C` is specified. Colorspace is the
characteristic output format of Applied Biosystems' SOLiD system. In a
colorspace read, each character is a color rather than a nucleotide,
where a color encodes a class of dinucleotides. E.g. the color blue
encodes any of the dinucleotides AA, CC, GG, and TT. Colorspace has the
advantage of (often) being able to distinguish sequencing errors from
SNPs once the read has been aligned. See ABI's [Principles of Di-Base
Sequencing] document for details.

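The dinucleotide-to-color mapping can be written compactly. The sketch below uses the standard SOLiD 2-base encoding, which is background knowledge rather than something taken from this manual. With A, C, G, and T coded as 0 to 3, the color of a dinucleotide is the XOR of the two codes, which sends AA, CC, GG, and TT to color 0 (blue). Decoding requires a known first base, such as the primer.

```python
# 2-bit nucleotide codes; color(a, b) = code(a) XOR code(b).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_colors(seq):
    """Nucleotide string -> string of color digits '0'..'3'."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def decode_colors(first_base, colors):
    """Walk the colors forward from a known first base to recover nucleotides."""
    bases = [first_base]
    for c in colors:
        bases.append(BASE[CODE[bases[-1]] ^ int(c)])
    return "".join(bases)


print(encode_colors("TACGGT"), encode_colors("TAAGGT"))
```

Each color depends on two adjacent bases. A true SNP therefore changes two neighbouring colors (`TACGGT` encodes to `31301`, `TAAGGT` to `30201`), whereas a single color-call error changes only one. This is the property that lets colorspace tell the two apart.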
All input formats (FASTA `-f`, FASTQ `-q`, raw `-r`, tab-delimited
`--12`, command-line `-c`) are compatible with colorspace (`-C`).
When `-C` is specified, read sequences are treated as colors. Colors
may be encoded either as numbers (`0`=blue, `1`=green, `2`=orange,
`3`=red) or as characters `A/C/G/T` (`A`=blue, `C`=green, `G`=orange,
`T`=red).

Some reads include a primer base as the first character; e.g.:

```
T2213120002010301233221223311331
T2302111203131231130300111123220
```

Here, `T` is the primer base. `bowtie` detects and handles primer
bases properly (i.e., the primer base and the adjacent color are both
trimmed away prior to alignment) as long as the rest of the read is
in colorspace.

`bowtie` also handles input in the form of parallel `.csfasta` and
`_QV.qual` files. Use `-f` to specify the `.csfasta` files and `-Q`
(for unpaired reads) or `--Q1`/`--Q2` (for paired-end reads) to
specify the corresponding `_QV.qual` files. It is not necessary to
first convert to FASTQ, though `bowtie` also handles FASTQ-formatted
colorspace reads (with `-q`, the default).

Building a colorspace index

A colorspace index is built in the same way as a normal index except
that `-C` must be specified when running `bowtie-build`. If the user
attempts to use `bowtie` without `-C` to align against an index that
was built with `-C` (or vice versa), `bowtie` prints an error message
and quits.

Decoding colorspace alignments

Once a colorspace read is aligned, Bowtie decodes the alignment into
nucleotides and reports the decoded nucleotide sequence. A principled
decoding scheme is necessary because many different decodings are
usually possible. Finding the true decoding with 100% certainty
requires knowing all variants (e.g. SNPs) in the subject's genome
beforehand, which is usually not possible. Instead, `bowtie` employs
the approximate decoding scheme described in the [BWA paper]. This
scheme attempts to distinguish variants from sequencing errors
according to their relative likelihood under a model that considers the
quality values of the colors and the (configurable) global likelihood
of SNPs.

Quality values are also "decoded" so that each reported quality value
is a function of the two color qualities overlapping it. Bowtie again
adopts the scheme described in the [BWA paper], i.e., the decoded
nucleotide quality is either the sum of the overlapping color qualities
(when both overlapping colors correspond to bases that match in the
alignment), the quality of the matching color minus the quality of the
mismatching color (when exactly one of them matches), or 0 (when both
overlapping colors correspond to mismatches).

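The three cases for a decoded base quality translate directly into code. This is a sketch of the rule as stated above, with a hypothetical function name. Clamping a negative difference to 0 is an assumption; consult the [BWA paper] for the exact scheme.

```python
def decoded_base_quality(q_left, left_matches, q_right, right_matches):
    """Decoded nucleotide quality from the two colors overlapping a base.

    q_left, q_right: Phred qualities of the two overlapping colors.
    left_matches, right_matches: whether each color agrees with the alignment.
    """
    if left_matches and right_matches:
        return q_left + q_right                 # both colors support the base
    if left_matches != right_matches:
        q_match, q_mismatch = (q_left, q_right) if left_matches else (q_right, q_left)
        return max(0, q_match - q_mismatch)     # clamping at 0 is an assumption
    return 0                                    # both colors disagree
```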
For accurate decoding, `--snpphred`/`--snpfrac` should be set according
to the user's best guess of the SNP frequency in the subject. The
`--snpphred` parameter sets the SNP penalty directly (on the [Phred
quality] scale), whereas `--snpfrac` allows the user to specify the
fraction of sites expected to be SNPs; the fraction is then converted
to a [Phred quality] internally. For the purpose of decoding, the SNP
fraction is defined in terms of SNPs per *haplotype* base. Thus, if
the genome is diploid, heterozygous SNPs have half the weight of
homozygous SNPs.

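The per-haplotype SNP fraction and its Phred equivalent can be computed as follows. The sketch assumes the standard Phred relation Q = -10 log10(p). This manual does not say whether Bowtie converts `--snpfrac` with exactly this formula, and the helper names are hypothetical.

```python
import math


def snp_fraction_per_haplotype_base(hom_per_site, het_per_site):
    """Diploid SNP rates -> SNPs per haplotype base (heterozygous SNPs count half)."""
    return hom_per_site + 0.5 * het_per_site


def frac_to_phred(frac):
    """Standard Phred conversion of a probability: Q = -10 * log10(p)."""
    return -10.0 * math.log10(frac)


frac = snp_fraction_per_haplotype_base(0.0005, 0.001)
print(frac, frac_to_phred(frac))
```

For example, a diploid genome with 0.05% homozygous and 0.1% heterozygous SNPs gives 0.001 SNPs per haplotype base. That corresponds to a penalty of 30.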
Note that in `-S`/`--sam` mode, the decoded nucleotide sequence is
printed for alignments, but the original color sequence (with `A`=blue,
`C`=green, `G`=orange, `T`=red) is printed for reads without any
reported alignments. As always, the `--un`, `--max` and `--al`
parameters print reads exactly as they appeared in the input file.

Paired-end colorspace alignment

Like other platforms, SOLiD supports generation of paired-end reads.
When colorspace alignment is enabled, the default paired-end
orientation setting is `--ff`. This is because most SOLiD datasets
have that orientation.

Note that SOLiD-generated read files can have "orphaned" mates, i.e.
mates without a correspondingly-named mate in the other file. To avoid
problems due to orphaned mates, SOLiD paired-end output should first be
converted to `.csfastq` files with unpaired mates omitted. This can be
accomplished using, for example, [Galaxy]'s conversion tool (click
"NGS: QC and manipulation", then "SOLiD-to-FASTQ" in the left-hand
sidebar).

[Principles of Di-Base Sequencing]: http://tinyurl.com/ygnb2gn
[BWA paper]: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/25/14/1754

Performance Tuning
------------------

1. Use 64-bit bowtie if possible

   The 64-bit version of Bowtie is substantially (usually more than
   50%) faster than the 32-bit version, owing to its use of 64-bit
   arithmetic. If possible, download the 64-bit binaries for Bowtie
   and run on a 64-bit computer. If you are building Bowtie from
   sources, you may need to pass the `-m64` option to `g++` to compile
   the 64-bit version; you can do this by including `BITS=64` in the
   arguments to the `make` command, e.g. `make BITS=64 bowtie`. To
   determine whether your version of bowtie is 64-bit or 32-bit, run
   `bowtie --version`.

2. If your computer has multiple processors/cores, use `-p`

   The `-p` option causes Bowtie to launch a specified number of
   parallel search threads. Each thread runs on a different
   processor/core and all threads find alignments in parallel,
   increasing alignment throughput by approximately a multiple of the
   number of threads (though in practice, speedup is somewhat worse
   than linear).

3. If reporting many alignments per read, try tweaking
   `bowtie-build --offrate`

   If you are using the `-k`, `-a` or `-m` options and Bowtie is
   reporting many alignments per read (an average of more than about
   10 per read) and you have some memory to spare, using an index with
   a denser SA sample can speed things up considerably.

   To do this, specify a smaller-than-default `-o`/`--offrate` value
   when running `bowtie-build`. A denser SA sample yields a larger
   index, but is also particularly effective at speeding up alignment
   when many alignments are reported per read. For example,
   decreasing the index's `-o`/`--offrate` by 1 could as much as
   double alignment performance, and decreasing it by 2 could quadruple
   alignment performance, etc.

   On the other hand, decreasing `-o`/`--offrate` increases the size
   of the Bowtie index, both on disk and in memory when aligning
   reads. At the default `-o`/`--offrate` of 5, the SA sample for the
   human genome occupies about 375 MB of memory when aligning reads.
   Decreasing the `-o`/`--offrate` by 1 doubles the memory taken by
   the SA sample, decreasing it by 2 quadruples the memory taken,
   etc.

4. If bowtie "thrashes", try increasing `bowtie --offrate`

   If `bowtie` runs very slowly on a relatively low-memory machine
   (having less than about 4 GB of memory), then try setting `bowtie`
   `-o`/`--offrate` to a *larger* value than the value used to build
   the index. For example, `bowtie-build`'s default `-o`/`--offrate`
   is 5 and all pre-built indexes available from the Bowtie website
   are built with `-o`/`--offrate` 5; so if `bowtie` thrashes when
   querying such an index, try using `bowtie` `--offrate` 6. If
   `bowtie` still thrashes, try `bowtie` `--offrate` 7, etc. A higher
   `-o`/`--offrate` causes `bowtie` to use a sparser sample of the
   suffix array than is stored in the index; this saves memory but
   makes alignment reporting slower (which is especially noticeable
   when using `-a` or large `-k` or `-m`).

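The memory and speed trade-off in tips 3 and 4 comes down to powers of two. The sketch below uses the human-genome figure quoted above, about 375 MB at offrate 5. Two parts of it are assumptions that follow from each offrate step doubling or halving the sample density: memory halving for each step up, and the relative lookup-cost model. The helper names are hypothetical.

```python
def sa_sample_mb(offrate, base_offrate=5, base_mb=375.0):
    """Approximate SA-sample memory (MB) for the human genome at a given offrate.

    Each step below base_offrate doubles memory (as stated in the text);
    each step above it is assumed to halve memory.
    """
    return base_mb * 2.0 ** (base_offrate - offrate)


def relative_lookup_cost(offrate, base_offrate=5):
    """Rough per-alignment SA lookup cost relative to base_offrate (assumed model)."""
    return 2.0 ** (offrate - base_offrate)


for o in (3, 4, 5, 6, 7):
    print(o, sa_sample_mb(o), relative_lookup_cost(o))
```

By this estimate, an index built with `-o` 3 needs roughly 1.5 GB for the SA sample. Querying a default index with `bowtie --offrate 7` needs roughly 94 MB.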
```
bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
```

`<ebwt>`

The basename of the index to be searched. The basename is the name of
any of the index files up to but not including the final `.1.ebwt` /
`.rev.1.ebwt` / etc. `bowtie` looks for the specified index first in
the current directory, then in the `indexes` subdirectory under the
directory where the `bowtie` executable is located, then in the
directory specified in the `BOWTIE_INDEXES` environment variable.

`-1 <m1>`

Comma-separated list of files containing the #1 mates (filename usually
includes `_1`), or, if `-c` is specified, the mate sequences
themselves. E.g., this might be `flyA_1.fq,flyB_1.fq`, or, if `-c`
is specified, this might be `GGTCATCCT,ACGGGTCGT`. Sequences specified
with this option must correspond file-for-file and read-for-read with
those specified in `<m2>`. Reads may be a mix of different lengths.
If `-` is specified, `bowtie` will read the #1 mates from the "standard
in" filehandle.

`-2 <m2>`

Comma-separated list of files containing the #2 mates (filename usually
includes `_2`), or, if `-c` is specified, the mate sequences
themselves. E.g., this might be `flyA_2.fq,flyB_2.fq`, or, if `-c`
is specified, this might be `GGTCATCCT,ACGGGTCGT`. Sequences specified
with this option must correspond file-for-file and read-for-read with
those specified in `<m1>`. Reads may be a mix of different lengths.
If `-` is specified, `bowtie` will read the #2 mates from the "standard
in" filehandle.

`--12 <r>`

Comma-separated list of files containing a mix of unpaired and
paired-end reads in Tab-delimited format. Tab-delimited format is a
1-read-per-line format where an unpaired read consists of a read name,
sequence and quality string, each separated by tabs. A paired-end read
consists of a read name, sequence of the #1 mate, quality values of the
#1 mate, sequence of the #2 mate, and quality values of the #2 mate,
separated by tabs. Quality values can be expressed using any of the
scales supported in FASTQ files. Reads may be a mix of different
lengths, and paired-end and unpaired reads may be intermingled in the
same file. If `-` is specified, `bowtie` will read the Tab-delimited
reads from the "standard in" filehandle.

`<s>`

A comma-separated list of files containing unpaired reads to be
aligned, or, if `-c` is specified, the unpaired read sequences
themselves. E.g., this might be
`lane1.fq,lane2.fq,lane3.fq,lane4.fq`, or, if `-c` is specified, this
might be `GGTCATCCT,ACGGGTCGT`. Reads may be a mix of different
lengths. If `-` is specified, Bowtie gets the reads from the "standard
in" filehandle.

`<hit>`

File to write alignments to. By default, alignments are written to the
"standard out" filehandle (i.e. the console).

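Two of the main arguments above describe concrete behaviour that is easy to sketch: where `<ebwt>` is looked up, and how a `--12` line splits into fields. The Python below is illustrative and uses hypothetical helper names. Checking only for the `.1.ebwt` file is a simplification, and the parser does not decode qualities, since any FASTQ scale is allowed.

```python
import os


def find_index(basename, exe_path, environ=None):
    """<ebwt>: return the directory holding <basename>.1.ebwt, or None.

    Search order as described above: current directory, then an `indexes`
    directory next to the executable, then $BOWTIE_INDEXES.
    """
    environ = os.environ if environ is None else environ
    candidates = [
        os.getcwd(),
        os.path.join(os.path.dirname(os.path.abspath(exe_path)), "indexes"),
    ]
    if environ.get("BOWTIE_INDEXES"):
        candidates.append(environ["BOWTIE_INDEXES"])
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, basename + ".1.ebwt")):
            return directory
    return None


def parse_tab_line(line):
    """--12: parse one Tab-delimited line into (name, [(seq, qual), ...]).

    Three fields mean an unpaired read; five fields mean a pair.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        name, seq, qual = fields
        return name, [(seq, qual)]
    if len(fields) == 5:
        name, seq1, qual1, seq2, qual2 = fields
        return name, [(seq1, qual1), (seq2, qual2)]
    raise ValueError(f"expected 3 or 5 tab-separated fields, got {len(fields)}")
```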
668 The query input files (specified either as `<m1>` and `<m2>`, or as
669 `<s>`) are FASTQ files (usually having extension `.fq` or `.fastq`).
670 This is the default. See also: `--solexa-quals` and
675 The query input files (specified either as `<m1>` and `<m2>`, or as
676 `<s>`) are FASTA files (usually having extension `.fa`, `.mfa`, `.fna`
677 or similar). All quality values are assumed to be 40 on the [Phred
682 The query input files (specified either as `<m1>` and `<m2>`, or as
683 `<s>`) are Raw files: one sequence per line, without quality values or
684 names. All quality values are assumed to be 40 on the [Phred quality]
689 The query sequences are given on command line. I.e. `<m1>`, `<m2>` and
690 `<singles>` are comma-separated lists of reads rather than lists of
695 Align in colorspace. Read characters are interpreted as colors. The
696 index specified must be a colorspace index (i.e. built with
697 `bowtie-build` `-C`, or `bowtie` will print an error message and quit.
698 See [Colorspace alignment] for more details.
702 Comma-separated list of files containing quality values for
703 corresponding unpaired CSFASTA reads. Use in combination with `-C`
704 and `-f`. `--integer-quals` is set automatically when `-Q`/`--quals`
709 Comma-separated list of files containing quality values for
710 corresponding CSFASTA #1 mates. Use in combination with `-C`, `-f`,
711 and `-1`. `--integer-quals` is set automatically when `--Q1`
716 Comma-separated list of files containing quality values for
717 corresponding CSFASTA #2 mates. Use in combination with `-C`, `-f`,
718 and `-2`. `--integer-quals` is set automatically when `--Q2`
723 Skip (i.e. do not align) the first `<int>` reads or pairs in the input.
727 Only align the first `<int>` reads or read pairs from the input (after
728 the `-s`/`--skip` reads or pairs have been skipped). Default: no
Trim `<int>` bases from the high-quality (left) end of each read before
734 alignment (default: 0).
Trim `<int>` bases from the low-quality (right) end of each read before
739 alignment (default: 0).
743 Input qualities are ASCII chars equal to the [Phred quality] plus 33.
748 Input qualities are ASCII chars equal to the [Phred quality] plus 64.
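To illustrate the two encodings, here is a minimal Python sketch that turns a quality string into numeric Phred values by subtracting the ASCII offset. It is not part of Bowtie, and `decode_quals`/`encode_quals` are hypothetical helper names.

```python
def decode_quals(qual_string: str, offset: int = 33) -> list[int]:
    """Convert an ASCII quality string to integer Phred values.

    offset=33 corresponds to --phred33-quals, offset=64 to --phred64-quals.
    """
    return [ord(ch) - offset for ch in qual_string]


def encode_quals(quals: list[int], offset: int = 33) -> str:
    """Inverse of decode_quals: integer Phred values back to ASCII characters."""
    return "".join(chr(q + offset) for q in quals)


print(decode_quals("II?I"))             # [40, 40, 30, 40]  (Phred+33)
print(decode_quals("hhTh", offset=64))  # [40, 40, 20, 40]  (Phred+64)
```

Because the offsets differ by 31, reading a file with the wrong option shifts every quality value by 31.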
753 Convert input qualities from [Solexa][Phred quality] (which can be
754 negative) to [Phred][Phred quality] (which can't). This is usually the
755 right option for use with (unconverted) reads emitted by GA Pipeline
756 versions prior to 1.3. Default: off.
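The conversion commonly used for Solexa-scaled qualities is `Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)`. The sketch below assumes that formula with ordinary rounding to the nearest integer. Bowtie's internal conversion may round differently, and `solexa_to_phred` is a hypothetical name.

```python
import math


def solexa_to_phred(q_solexa: int) -> int:
    """Map a Solexa-scaled quality (which may be negative) to the Phred scale.

    Uses Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1), rounded to an integer.
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


# Solexa values -5, 0, 10 and 40 map to Phred 1, 3, 10 and 40.
print([solexa_to_phred(q) for q in (-5, 0, 10, 40)])
```

The two scales agree closely at high qualities and diverge only near zero, which is where negative Solexa values occur.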
760 Same as `--phred64-quals`. This is usually the right option for use
761 with (unconverted) reads emitted by GA Pipeline version 1.3 or later.
766 Quality values are represented in the read input file as
767 space-separated ASCII integers, e.g., `40 40 30 40`..., rather than
768 ASCII characters, e.g., `II?I`.... Integers are treated as being on
769 the [Phred quality] scale unless `--solexa-quals` is also specified.
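For example, the integer line `40 40 30 40` and the character string `II?I` carry the same values. Here is a minimal sketch of converting the integer form into a Phred+33 string. It assumes the integers are already Phred-scaled, and `integer_quals_to_ascii` is a hypothetical helper.

```python
def integer_quals_to_ascii(line: str, offset: int = 33) -> str:
    """Turn space-separated integer qualities into an ASCII-encoded quality string."""
    return "".join(chr(int(token) + offset) for token in line.split())


print(integer_quals_to_ascii("40 40 30 40"))  # II?I
```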
776 Report alignments with at most `<int>` mismatches. `-e` and `-l`
777 options are ignored and quality values have no effect on what
778 alignments are valid. `-v` is mutually exclusive with `-n`.
782 Maximum number of mismatches permitted in the "seed", i.e. the first
783 `L` base pairs of the read (where `L` is set with `-l`/`--seedlen`).
784 This may be 0, 1, 2 or 3 and the default is 2. This option is mutually
785 exclusive with the `-v` option.
789 Maximum permitted total of quality values at *all* mismatched read
790 positions throughout the entire alignment, not just in the "seed". The
791 default is 70. Like [Maq], `bowtie` rounds quality values to the
792 nearest 10 and saturates at 30; rounding can be disabled with
797 The "seed length"; i.e., the number of bases on the high-quality end of
798 the read to which the `-n` ceiling applies. The lowest permitted
799 setting is 5 and the default is 28. `bowtie` is faster for larger
804 [Maq] accepts quality values in the [Phred quality] scale, but
805 internally rounds values to the nearest 10, with a maximum of 30. By
806 default, `bowtie` also rounds this way. `--nomaqround` prevents this
807 rounding in `bowtie`.
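To make the interaction between `-e` and Maq-style rounding concrete, here is a small Python sketch. It makes two assumptions: ties such as 25 round up, and a total exactly equal to the `-e` ceiling is still permitted. `maq_round` and `passes_e` are illustrative names, not Bowtie functions.

```python
def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 (ties round up, an assumption) and cap it at 30."""
    return min(30, (q + 5) // 10 * 10)


def passes_e(mismatch_quals: list[int], e: int = 70, maq_rounding: bool = True) -> bool:
    """True if the summed qualities at mismatched positions stay within the -e ceiling."""
    quals = [maq_round(q) for q in mismatch_quals] if maq_rounding else mismatch_quals
    return sum(quals) <= e


# Mismatches at qualities 40, 25 and 12 round to 30 + 30 + 10 = 70, which is allowed;
# with rounding disabled (as with --nomaqround) the raw total of 77 exceeds 70.
print(passes_e([40, 25, 12]))
print(passes_e([40, 25, 12], maq_rounding=False))
```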
811 The minimum insert size for valid paired-end alignments. E.g. if `-I
812 60` is specified and a paired-end alignment consists of two 20-bp
813 alignments in the appropriate orientation with a 20-bp gap between
814 them, that alignment is considered valid (as long as `-X` is also
815 satisfied). A 19-bp gap would not be valid in that case. If trimming
816 options `-3` or `-5` are also used, the `-I` constraint is
817 applied with respect to the untrimmed mates. Default: 0.
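The insert-size arithmetic in the `-I` and `-X` examples can be written out as follows. This sketch assumes insert size is measured from the leftmost base of the upstream mate to the rightmost base of the downstream mate, using untrimmed mate lengths. The function names are hypothetical.

```python
from typing import Optional


def insert_size(up_start: int, up_len: int, down_start: int, down_len: int) -> int:
    """Span from the upstream mate's leftmost base to the downstream mate's rightmost base.

    Starts are 0-based reference offsets; lengths are untrimmed mate lengths.
    """
    return (down_start + down_len) - up_start


def insert_ok(size: int, minins: int = 0, maxins: Optional[int] = None) -> bool:
    """Apply the -I (minins) and -X (maxins) constraints."""
    if size < minins:
        return False
    return maxins is None or size <= maxins


# Two 20-bp mates: a 20-bp gap gives 60, a 19-bp gap gives 59 (checked against -I 60);
# a 60-bp gap gives 100, a 61-bp gap gives 101 (checked against -X 100).
print(insert_ok(insert_size(0, 20, 40, 20), minins=60))   # True
print(insert_ok(insert_size(0, 20, 39, 20), minins=60))   # False
print(insert_ok(insert_size(0, 20, 80, 20), maxins=100))  # True
print(insert_ok(insert_size(0, 20, 81, 20), maxins=100))  # False
```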
821 The maximum insert size for valid paired-end alignments. E.g. if `-X
822 100` is specified and a paired-end alignment consists of two 20-bp
823 alignments in the proper orientation with a 60-bp gap between them,
824 that alignment is considered valid (as long as `-I` is also
825 satisfied). A 61-bp gap would not be valid in that case. If trimming
826 options `-3` or `-5` are also used, the `-X` constraint is applied
827 with respect to the untrimmed mates, not the trimmed mates. Default:
832 The upstream/downstream mate orientations for a valid paired-end
833 alignment against the forward reference strand. E.g., if `--fr` is
834 specified and there is a candidate paired-end alignment where mate1
835 appears upstream of the reverse complement of mate2 and the insert
836 length constraints are met, that alignment is valid. Also, if mate2
837 appears upstream of the reverse complement of mate1 and all other
838 constraints are met, that too is valid. `--rf` likewise requires that
839 an upstream mate1 be reverse-complemented and a downstream mate2 be
forward-oriented. `--ff` requires both an upstream mate1 and a
841 downstream mate2 to be forward-oriented. Default: `--fr` when `-C`
842 (colorspace alignment) is not specified, `--ff` when `-C` is specified.
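The orientation rules can be summarized as a predicate over the upstream and downstream mates. The hypothetical `orientation_ok` helper below encodes only the cases described above. For `--rf`, it assumes "likewise" means either mate may be upstream, as with `--fr`. It may not cover every configuration Bowtie accepts.

```python
def orientation_ok(policy: str, upstream: tuple[int, bool], downstream: tuple[int, bool]) -> bool:
    """Check mate orientations against --fr, --rf or --ff.

    upstream/downstream are (mate_number, is_forward) for the mate whose
    alignment lies further left / right on the forward reference strand.
    Only the cases spelled out in the manual text are modeled.
    """
    (up_mate, up_fw), (down_mate, down_fw) = upstream, downstream
    if {up_mate, down_mate} != {1, 2}:
        raise ValueError("need one mate 1 and one mate 2")
    if policy == "fr":
        return up_fw and not down_fw
    if policy == "rf":
        return (not up_fw) and down_fw
    if policy == "ff":
        return up_mate == 1 and up_fw and down_fw
    raise ValueError(f"unknown policy: {policy}")


print(orientation_ok("fr", (1, True), (2, False)))  # True
print(orientation_ok("fr", (2, True), (1, False)))  # True
print(orientation_ok("ff", (2, True), (1, True)))   # False
```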
846 If `--nofw` is specified, `bowtie` will not attempt to align against
847 the forward reference strand. If `--norc` is specified, `bowtie` will
848 not attempt to align against the reverse-complement reference strand.
849 For paired-end reads using `--fr` or `--rf` modes, `--nofw` and
850 `--norc` apply to the forward and reverse-complement pair orientations.
851 I.e. specifying `--nofw` and `--fr` will only find reads in the R/F
852 orientation where mate 2 occurs upstream of mate 1 with respect to the
853 forward reference strand.
857 The maximum number of backtracks permitted when aligning a read in
858 `-n` 2 or `-n` 3 mode (default: 125 without `--best`, 800 with
859 `--best`). A "backtrack" is the introduction of a speculative
860 substitution into the alignment. Without this limit, the default
parameters will sometimes require that `bowtie` try hundreds or
thousands of backtracks to align a read, especially if the read has many
low-quality bases and/or has no valid alignments, slowing `bowtie` down
significantly. However, this limit may cause some valid alignments to
be missed. Higher limits yield greater sensitivity at the expense of
866 longer running times. See also: `-y`/`--tryhard`.
870 For paired-end alignment, this is the maximum number of attempts
871 `bowtie` will make to match an alignment for one mate up with an
872 alignment for the opposite mate. Most paired-end alignments require
873 only a few such attempts, but pairs where both mates occur in highly
874 repetitive regions of the reference can require significantly more.
Setting this to a higher number allows `bowtie` to find more
paired-end alignments for repetitive pairs at the expense of speed. The
877 default is 100. See also: `-y`/`--tryhard`.
881 Try as hard as possible to find valid alignments when they exist,
882 including paired-end alignments. This is equivalent to specifying very
883 high values for the `--maxbts` and `--pairtries` options. This
mode is generally much slower than the default settings, but can be
useful when finding every valid alignment matters more than speed. It
is particularly slow when (a) the reference is very repetitive, (b) the
reads are low quality, or (c) not many reads have valid alignments.
The number of megabytes of memory each thread is given to store path
892 descriptors in `--best` mode. Best-first search must keep track of
893 many paths at once to ensure it is always extending the path with the
894 lowest cumulative cost. Bowtie tries to minimize the memory impact of
895 the descriptors, but they can still grow very large in some cases. If
896 you receive an error message saying that chunk memory has been
897 exhausted in `--best` mode, try adjusting this parameter up to
898 dedicate more memory to the descriptors. Default: 64.
904 Report up to `<int>` valid alignments per read or pair (default: 1).
905 Validity of alignments is determined by the alignment policy (combined
906 effects of `-n`, `-v`, `-l`, and `-e`). If more than one valid
907 alignment exists and the `--best` and `--strata` options are
908 specified, then only those alignments belonging to the best alignment
909 "stratum" will be reported. Bowtie is designed to be very fast for
910 small `-k` but bowtie can become significantly slower as `-k`
911 increases. If you would like to use Bowtie for larger values of
912 `-k`, consider building an index with a denser suffix-array sample,
913 i.e. specify a smaller `-o`/`--offrate` when invoking `bowtie-build`
914 for the relevant index (see the [Performance tuning] section for
919 Report all valid alignments per read or pair (default: off). Validity
920 of alignments is determined by the alignment policy (combined effects
921 of `-n`, `-v`, `-l`, and `-e`). If more than one valid alignment
922 exists and the `--best` and `--strata` options are specified, then only
923 those alignments belonging to the best alignment "stratum" will be
924 reported. Bowtie is designed to be very fast for small `-k` but bowtie
925 can become significantly slower if `-a`/`--all` is specified. If you
926 would like to use Bowtie with `-a`, consider building an index with a
927 denser suffix-array sample, i.e. specify a smaller `-o`/`--offrate`
928 when invoking `bowtie-build` for the relevant index (see the
929 [Performance tuning] section for details).
933 Suppress all alignments for a particular read or pair if more than
934 `<int>` reportable alignments exist for it. Reportable alignments are
935 those that would be reported given the `-n`, `-v`, `-l`, `-e`, `-k`,
936 `-a`, `--best`, and `--strata` options. Default: no limit. Bowtie is
designed to be very fast for small `-m`, but `bowtie` can become
significantly slower for larger values of `-m`. If you would like to
use Bowtie for larger values of `-m`, consider building an index with a
940 denser suffix-array sample, i.e. specify a smaller `-o`/`--offrate` when
941 invoking `bowtie-build` for the relevant index (see the [Performance
942 tuning] section for details).
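The `-m` rule amounts to an all-or-nothing filter applied per read. Here is a minimal sketch in which `apply_m` is a hypothetical name and `reportable` is assumed to already reflect `-k`/`-a`, `--best` and `--strata`. The `m + 1` value mirrors the `XM:i` value described for suppressed reads in the SAM output section.

```python
from typing import Optional


def apply_m(reportable: list, m: Optional[int]) -> tuple[list, Optional[int]]:
    """Apply an -m ceiling to one read's reportable alignments.

    Returns the alignments to print and, if they were suppressed, the value
    (ceiling + 1) that the SAM XM:i field would carry.
    """
    if m is not None and len(reportable) > m:
        return [], m + 1
    return reportable, None


print(apply_m(["chr1:100", "chr7:5000"], m=1))  # ([], 2): suppressed
print(apply_m(["chr1:100"], m=1))               # (['chr1:100'], None): reported
```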
946 Behaves like `-m` except that if a read has more than `<int>`
947 reportable alignments, one is reported at random. In [default
948 output mode], the selected alignment's 7th column is set to `<int>`+1 to
949 indicate the read has at least `<int>`+1 valid alignments. In
950 `-S`/`--sam` mode, the selected alignment is given a `MAPQ` (mapping
quality) of 0 and the `XM:i` field is set to `<int>`+1. This option
952 requires `--best`; if specified without `--best`, `--best` is enabled
957 Make Bowtie guarantee that reported singleton alignments are "best" in
958 terms of stratum (i.e. number of mismatches, or mismatches in the seed
959 in the case of `-n` mode) and in terms of the quality values at the
960 mismatched position(s). Stratum always trumps quality; e.g. a
961 1-mismatch alignment where the mismatched position has [Phred quality]
962 40 is preferred over a 2-mismatch alignment where the mismatched
963 positions both have [Phred quality] 10. When `--best` is not
964 specified, Bowtie may report alignments that are sub-optimal in terms
965 of stratum and/or quality (though an effort is made to report the best
966 alignment). `--best` mode also removes all strand bias. Note that
967 `--best` does not affect which alignments are considered "valid" by
968 `bowtie`, only which valid alignments are reported by `bowtie`. When
969 `--best` is specified and multiple hits are allowed (via `-k` or
970 `-a`), the alignments for a given read are guaranteed to appear in
971 best-to-worst order in `bowtie`'s output. `bowtie` is somewhat slower
972 when `--best` is specified.
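The ordering that `--best` guarantees can be modeled as a sort on (stratum, quality), with stratum compared first. This sketch makes several assumptions. The stratum is taken to be the total mismatch count, as in `-v` mode; in `-n` mode it would be the seed mismatch count. Quality is compared by summing the qualities at mismatched positions, with lower being better. `Hit` and `best_to_worst` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    ref_offset: int
    mismatch_quals: list[int] = field(default_factory=list)  # Phred quality at each mismatch

    @property
    def stratum(self) -> int:
        # Simplification: in -v mode the stratum is the number of mismatches.
        return len(self.mismatch_quals)


def best_to_worst(hits: list[Hit]) -> list[Hit]:
    # Stratum first; within a stratum, a lower summed mismatch quality is better.
    return sorted(hits, key=lambda h: (h.stratum, sum(h.mismatch_quals)))


# The example from the text: one Q40 mismatch beats two Q10 mismatches.
hits = [Hit(100, [10, 10]), Hit(200, [40])]
print([h.ref_offset for h in best_to_worst(hits)])  # [200, 100]
```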
976 If many valid alignments exist and are reportable (e.g. are not
977 disallowed via the `-k` option) and they fall into more than one
978 alignment "stratum", report only those alignments that fall into the
979 best stratum. By default, Bowtie reports all reportable alignments
980 regardless of whether they fall into multiple strata. When
981 `--strata` is specified, `--best` must also be specified.
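Combined with `-k` or `-a`, `--best --strata` behaves like a filter on the sorted list of alignments. Here is a minimal sketch with hypothetical names and locations. It makes the simplifying assumption that each alignment is already labeled with its stratum.

```python
from typing import Optional


def select_reported(alignments: list[tuple[str, int]], k: Optional[int] = 1,
                    strata: bool = False) -> list[tuple[str, int]]:
    """Pick alignments to report for one read, assuming --best ordering.

    alignments: (label, stratum) pairs, lower stratum = better.
    k: the -k limit, or None to mimic -a.
    strata: mimic --strata by keeping only the best stratum.
    """
    ordered = sorted(alignments, key=lambda a: a[1])  # stable: ties keep input order
    if strata and ordered:
        best = ordered[0][1]
        ordered = [a for a in ordered if a[1] == best]
    return ordered[:k]


alns = [("chr1:500", 1), ("chr2:10", 0), ("chr5:77", 0), ("chr9:3", 2)]
print(select_reported(alns, k=3))               # best three, spanning two strata
print(select_reported(alns, k=3, strata=True))  # only the stratum-0 hits
```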
987 Print the amount of wall-clock time taken by each phase.
991 When outputting alignments, number the first base of a reference
992 sequence as `<int>`. Default: 0.
996 Print nothing besides alignments.
1000 Write alignments to a set of files named `refXXXXX.map`, where `XXXXX`
1001 is the 0-padded index of the reference sequence aligned to. This can
1002 be a useful way to break up work for downstream analyses when dealing
1003 with, for example, large numbers of reads aligned to the assembled
1004 human genome. If `<hits>` is also specified, it will be ignored.
1008 When a reference sequence is referred to in a reported alignment, refer
1009 to it by 0-based index (its offset into the list of references that
1010 were indexed) rather than by name.
1014 Write all reads for which at least one alignment was reported to a file
1015 with name `<filename>`. Written reads will appear as they did in the
1016 input, without any of the trimming or translation of quality values
1017 that may have taken place within `bowtie`. Paired-end reads will be
1018 written to two parallel files with `_1` and `_2` inserted in the
1019 filename, e.g., if `<filename>` is `aligned.fq`, the #1 and #2 mates
that align will be written to `aligned_1.fq` and `aligned_2.fq`
1025 Write all reads that could not be aligned to a file with name
1026 `<filename>`. Written reads will appear as they did in the input,
1027 without any of the trimming or translation of quality values that may
1028 have taken place within Bowtie. Paired-end reads will be written to
1029 two parallel files with `_1` and `_2` inserted in the filename, e.g.,
1030 if `<filename>` is `unaligned.fq`, the #1 and #2 mates that fail to
1031 align will be written to `unaligned_1.fq` and `unaligned_2.fq`
1032 respectively. Unless `--max` is also specified, reads with a number
1033 of valid alignments exceeding the limit set with the `-m` option are
1034 also written to `<filename>`.
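When post-processing paired-end output from `--al`, `--un` or `--max`, it helps to be able to predict the mate file names. The hypothetical helper below reproduces the examples given in these descriptions. It handles only a single trailing extension; how `bowtie` names files that have no extension or several dots is not covered here.

```python
import os


def paired_output_names(filename: str) -> tuple[str, str]:
    """Insert _1 and _2 before the final extension, e.g. unaligned.fq -> unaligned_1.fq."""
    root, ext = os.path.splitext(filename)
    return f"{root}_1{ext}", f"{root}_2{ext}"


print(paired_output_names("unaligned.fq"))  # ('unaligned_1.fq', 'unaligned_2.fq')
```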
1038 Write all reads with a number of valid alignments exceeding the limit
1039 set with the `-m` option to a file with name `<filename>`. Written
1040 reads will appear as they did in the input, without any of the trimming
1041 or translation of quality values that may have taken place within
1042 `bowtie`. Paired-end reads will be written to two parallel files with
1043 `_1` and `_2` inserted in the filename, e.g., if `<filename>` is
1044 `max.fq`, the #1 and #2 mates that exceed the `-m` limit will be
1045 written to `max_1.fq` and `max_2.fq` respectively. These reads are not
1046 written to the file specified with `--un`.
1050 Suppress columns of output in the [default output mode]. E.g. if
1051 `--suppress 1,5,6` is specified, the read name, read sequence, and read
1052 quality fields will be omitted. See [Default Bowtie output] for field
1053 descriptions. This option is ignored if the output mode is
Print the full reference sequence name, including whitespace, in
1059 alignment output. By default `bowtie` prints everything up to but not
1060 including the first whitespace.
1066 When decoding colorspace alignments, use `<int>` as the SNP penalty.
1067 This should be set to the user's best guess of the true ratio of SNPs
1068 per base in the subject genome, converted to the [Phred quality] scale.
1069 E.g., if the user expects about 1 SNP every 1,000 positions,
1070 `--snpphred` should be set to 30 (which is also the default). To
1071 specify the fraction directly, use `--snpfrac`.
1075 When decoding colorspace alignments, use `<dec>` as the estimated ratio
1076 of SNPs per base. For best decoding results, this should be set to the
1077 user's best guess of the true ratio. `bowtie` internally converts the
1078 ratio to a [Phred quality], and behaves as if that quality had been set
1079 via the `--snpphred` option. Default: 0.001.
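The relationship between `--snpfrac` and `--snpphred` is the usual Phred transform, `Q = -10 * log10(p)`. The sketch below uses hypothetical function names, and how `bowtie` rounds the result internally is not specified here.

```python
import math


def snpfrac_to_phred(frac: float) -> float:
    """Phred-scale a per-base SNP rate: Q = -10 * log10(frac)."""
    return -10.0 * math.log10(frac)


def snpphred_to_frac(q: float) -> float:
    """Inverse conversion: frac = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


print(round(snpfrac_to_phred(0.001)))  # 30, the default --snpphred
print(round(snpfrac_to_phred(0.01)))   # 20
```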
1083 If reads are in colorspace and the [default output mode] is active,
1084 `--col-cseq` causes the reads' color sequence to appear in the
1085 read-sequence column (column 5) instead of the decoded nucleotide
1086 sequence. See the [Decoding colorspace alignments] section for details
1087 about decoding. This option is ignored in `-S`/`--sam` mode.
1091 If reads are in colorspace and the [default output mode] is active,
1092 `--col-cqual` causes the reads' original (color) quality sequence to
1093 appear in the quality column (column 6) instead of the decoded
1094 qualities. See the [Colorspace alignment] section for details about
1095 decoding. This option is ignored in `-S`/`--sam` mode.
When decoding colorspace alignments, `bowtie` trims off a nucleotide
1100 and quality from the left and right edges of the alignment. This is
1101 because those nucleotides are supported by only one color, in contrast
1102 to the middle nucleotides which are supported by two. Specify
1103 `--col-keepends` to keep the extreme-end nucleotides and qualities.
1109 Print alignments in [SAM] format. See the [SAM output] section of the
1110 manual for details. To suppress all SAM headers, use `--sam-nohead`
in addition to `-S`/`--sam`. To suppress just the `@SQ` headers (e.g. if
the alignment is against a very large number of reference sequences),
use `--sam-nosq` in addition to `-S`/`--sam`. `bowtie` does not write
1114 BAM files directly, but SAM output can be converted to BAM on the fly
1115 by piping `bowtie`'s output to `samtools view`. `-S`/`--sam` is not
1116 compatible with `--refout`.
1120 If an alignment is non-repetitive (according to `-m`, `--strata` and
1121 other options) set the `MAPQ` (mapping quality) field to this value.
See the [SAM Spec][SAM] for details about the `MAPQ` field. Default: 255.
1126 Suppress header lines (starting with `@`) when output is `-S`/`--sam`.
1127 This must be specified *in addition to* `-S`/`--sam`. `--sam-nohead`
1128 is ignored unless `-S`/`--sam` is also specified.
1132 Suppress `@SQ` header lines when output is `-S`/`--sam`. This must be
1133 specified *in addition to* `-S`/`--sam`. `--sam-nosq` is ignored
1134 unless `-S`/`--sam` is also specified.
1138 Add `<text>` (usually of the form `TAG:VAL`, e.g. `ID:IL7LANE2`) as a
1139 field on the `@RG` header line. Specify `--sam-RG` multiple times to
1140 set multiple fields. See the [SAM Spec][SAM] for details about what fields
1141 are legal. Note that, if any `@RG` fields are set using this option,
1142 the `ID` and `SM` fields must both be among them to make the `@RG` line
1143 legal according to the [SAM Spec][SAM]. `--sam-RG` is ignored unless
1144 `-S`/`--sam` is also specified.
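To preview the header line produced by repeated `--sam-RG` options, here is a minimal sketch that joins the tokens with tabs and checks for the `ID` and `SM` fields the text above requires. `rg_header_line` is a hypothetical helper, and `SM:sample1` is a made-up example value.

```python
def rg_header_line(tokens: list[str]) -> str:
    """Build an @RG header line from --sam-RG style TAG:VAL tokens.

    Raises ValueError if ID or SM is missing, since the text above requires
    both for a legal @RG line.
    """
    tags = {token.split(":", 1)[0] for token in tokens}
    missing = {"ID", "SM"} - tags
    if missing:
        raise ValueError(f"@RG line is missing: {', '.join(sorted(missing))}")
    return "@RG\t" + "\t".join(tokens)


print(rg_header_line(["ID:IL7LANE2", "SM:sample1"]))
```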
1150 Override the offrate of the index with `<int>`. If `<int>` is greater
1151 than the offrate used to build the index, then some row markings are
1152 discarded when the index is read into memory. This reduces the memory
1153 footprint of the aligner but requires more time to calculate text
1154 offsets. `<int>` must be greater than the value used to build the
1159 Launch `<int>` parallel search threads (default: 1). Threads will run
1160 on separate processors/cores and synchronize when parsing reads and
1161 outputting alignments. Searching for alignments is highly parallel,
1162 and speedup is fairly close to linear. This option is only available
1163 if `bowtie` is linked with the `pthreads` library (i.e. if
1164 `BOWTIE_PTHREADS=0` is not specified at build time).
1168 Use memory-mapped I/O to load the index, rather than normal C file I/O.
1169 Memory-mapping the index allows many concurrent `bowtie` processes on
1170 the same computer to share the same memory image of the index (i.e. you
1171 pay the memory overhead just once). This facilitates memory-efficient
1172 parallelization of `bowtie` in situations where using `-p` is not
1177 Use shared memory to load the index, rather than normal C file I/O.
Using shared memory allows many concurrent `bowtie` processes on the same
1179 computer to share the same memory image of the index (i.e. you pay the
1180 memory overhead just once). This facilitates memory-efficient
1181 parallelization of `bowtie` in situations where using `-p` is not
1182 desirable. Unlike `--mm`, `--shmem` installs the index into shared
1183 memory permanently, or until the user deletes the shared memory chunks
1184 manually. See your operating system documentation for details on how
1185 to manually list and remove shared memory chunks (on Linux and Mac OS
1186 X, these commands are `ipcs` and `ipcrm`). You may also need to
increase your OS's maximum shared-memory chunk size to accommodate
1188 larger indexes; see your OS documentation.
1194 Use `<int>` as the seed for pseudo-random number generator.
1198 Print verbose output (for debugging).
1202 Print version information and quit.
1206 Print usage information and quit.
1208 Default `bowtie` output
1209 -----------------------
1211 `bowtie` outputs one alignment per line. Each line is a collection of
1212 8 fields separated by tabs; from left to right, the fields are:
1214 1. Name of read that aligned
1216 2. Reference strand aligned to, `+` for forward strand, `-` for
1219 3. Name of reference sequence where alignment occurs, or numeric ID if
1220 no name was provided
1222 4. 0-based offset into the forward reference strand where leftmost
1223 character of the alignment occurs
1225 5. Read sequence (reverse-complemented if orientation is `-`).
1227 If the read was in colorspace, then the sequence shown in this
1228 column is the sequence of *decoded nucleotides*, not the original
1229 colors. See the [Colorspace alignment] section for details about
1230 decoding. To display colors instead, use the `--col-cseq` option.
1232 6. ASCII-encoded read qualities (reversed if orientation is `-`). The
1233 encoded quality values are on the Phred scale and the encoding is
1234 ASCII-offset by 33 (ASCII char `!`).
1236 If the read was in colorspace, then the qualities shown in this
1237 column are the *decoded qualities*, not the original qualities.
1238 See the [Colorspace alignment] section for details about decoding.
1239 To display colors instead, use the `--col-cqual` option.
1241 7. If `-M` was specified and the prescribed ceiling was exceeded for
1242 this read, this column contains the value of the ceiling,
1243 indicating that at least that many valid alignments were found in
1244 addition to the one reported.
1246 Otherwise, this column contains the number of other instances where
1247 the same sequence aligned against the same reference characters as
1248 were aligned against in the reported alignment. This is *not* the
1249 number of other places the read aligns with the same number of
1250 mismatches. The number in this column is generally not a good
1251 proxy for that number (e.g., the number in this column may be '0'
1252 while the number of other alignments with the same number of
1253 mismatches might be large).
1255 8. Comma-separated list of mismatch descriptors. If there are no
1256 mismatches in the alignment, this field is empty. A single
descriptor has the format `offset:reference-base>read-base`. The
1258 offset is expressed as a 0-based offset from the high-quality (5')
1264 Following is a brief description of the [SAM] format as output by
1265 `bowtie` when the `-S`/`--sam` option is specified. For more
1266 details, see the [SAM format specification][SAM].
1268 When `-S`/`--sam` is specified, `bowtie` prints a SAM header with
1269 `@HD`, `@SQ` and `@PG` lines. When one or more `--sam-RG` arguments
1270 are specified, `bowtie` will also print an `@RG` line that includes all
1271 user-specified `--sam-RG` tokens separated by tabs.
Each subsequent line corresponds to a read or an alignment. Each line
1274 is a collection of at least 12 fields separated by tabs; from left to
1275 right, the fields are:
1277 1. Name of read that aligned
1279 2. Sum of all applicable flags. Flags relevant to Bowtie are:
1283 The read is one of a pair
1287 The alignment is one end of a proper paired-end alignment
1291 The read has no reported alignments
1295 The read is one of a pair and has no reported alignments
1299 The alignment is to the reverse reference strand
1303 The other mate in the paired-end alignment is aligned to the
1304 reverse reference strand
1308 The read is the first mate in a pair
1312 The read is the second mate in a pair
1314 Thus, an unpaired read that aligns to the reverse reference strand
will have flag 16. A paired-end read that aligns to the reverse
reference strand and is the first mate in a proper pair will have flag
83 (= 64 + 16 + 2 + 1).
1318 3. Name of reference sequence where alignment occurs, or ordinal ID
1319 if no name was provided
1321 4. 1-based offset into the forward reference strand where leftmost
1322 character of the alignment occurs
1326 6. CIGAR string representation of alignment
1328 7. Name of reference sequence where mate's alignment occurs. Set to
1329 `=` if the mate's reference sequence is the same as this
1330 alignment's, or `*` if there is no mate.
1332 8. 1-based offset into the forward reference strand where leftmost
1333 character of the mate's alignment occurs. Offset is 0 if there is
1336 9. Inferred insert size. Size is negative if the mate's alignment
1337 occurs upstream of this alignment. Size is 0 if there is no mate.
1339 10. Read sequence (reverse-complemented if aligned to the reverse
11. ASCII-encoded read qualities (reversed if the read
1343 aligned to the reverse strand). The encoded quality values are on
1344 the [Phred quality] scale and the encoding is ASCII-offset by 33
1345 (ASCII char `!`), similarly to a [FASTQ] file.
1347 12. Optional fields. Fields are tab-separated. For descriptions of
all possible optional fields, see the [SAM format specification].
1349 `bowtie` outputs some of these optional fields for each alignment,
1350 depending on the type of the alignment:
1354 Aligned read has an edit distance of `<N>`.
1358 Aligned read has an edit distance of `<N>` in colorspace. This
1359 field is present in addition to the `NM` field in `-C`/`--color`
1360 mode, but is omitted otherwise.
1364 For aligned reads, `<S>` is a string representation of the
1365 mismatched reference bases in the alignment. See [SAM] format
1366 specification for details. For colorspace alignments, `<S>`
1367 describes the decoded *nucleotide* alignment, not the colorspace
1372 Aligned read belongs to stratum `<N>`. See [Strata] for definition.
1376 For a read with no reported alignments, `<N>` is 0 if the read had
1377 no alignments. If `-m` was specified and the read's alignments
were suppressed because the `-m` ceiling was exceeded, `<N>` equals
1379 the `-m` ceiling + 1, to indicate that there were at least that
1380 many valid alignments (but all were suppressed). In `-M` mode, if
1381 the alignment was randomly selected because the `-M` ceiling was
1382 exceeded, `<N>` equals the `-M` ceiling + 1, to indicate that there
1383 were at least that many valid alignments (of which one was reported
1386 [SAM format specification]: http://samtools.sf.net/SAM1.pdf
1387 [FASTQ]: http://en.wikipedia.org/wiki/FASTQ_format
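You can check the flag arithmetic in the SAM output description with a small decoder. The bit values below are the standard ones from the SAM specification (1, 2, 4, 8, 16, 32, 64, 128). The label strings are informal names chosen for this sketch.

```python
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_mate",
    0x80: "second_mate",
}


def decode_flag(flag: int) -> list[str]:
    """List the names of the SAM flag bits set in `flag`, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(decode_flag(83))  # ['paired', 'proper_pair', 'reverse_strand', 'first_mate']
print(decode_flag(16))  # ['reverse_strand']
```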
1389 The `bowtie-build` indexer
1390 ==========================
1392 `bowtie-build` builds a Bowtie index from a set of DNA sequences.
1393 `bowtie-build` outputs a set of 6 files with suffixes
1394 `.1.ebwt`, `.2.ebwt`, `.3.ebwt`, `.4.ebwt`, `.rev.1.ebwt`, and
1395 `.rev.2.ebwt`. These files together constitute the index: they are all
1396 that is needed to align reads to that reference. The original sequence
1397 files are no longer used by Bowtie once the index is built.
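Because those six files are all `bowtie` needs, a quick completeness check before a long alignment run can save time. Here is a minimal sketch in which `index_files` and `missing_index_files` are hypothetical helpers and `hg19` is only an example basename. An index built with the option that skips the `NAME.3.ebwt` and `NAME.4.ebwt` portions will deliberately lack those two files.

```python
from pathlib import Path

EBWT_SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt")


def index_files(basename: str) -> list[str]:
    """The six files bowtie-build writes for a given index basename."""
    return [basename + suffix for suffix in EBWT_SUFFIXES]


def missing_index_files(basename: str) -> list[str]:
    """Return any expected index files that do not exist on disk."""
    return [name for name in index_files(basename) if not Path(name).is_file()]


print(index_files("hg19"))
```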
1399 Use of Karkkainen's [blockwise algorithm] allows `bowtie-build` to
1400 trade off between running time and memory usage. `bowtie-build` has
1401 three options governing how it makes this trade: `-p`/`--packed`,
1402 `--bmax`/`--bmaxdivn`, and `--dcv`. By default, `bowtie-build` will
1403 automatically search for the settings that yield the best
1404 running time without exhausting memory. This behavior can be disabled
1405 using the `-a`/`--noauto` option.
1407 The indexer provides options pertaining to the "shape" of the index,
1408 e.g. `--offrate` governs the fraction of [Burrows-Wheeler] rows that
1409 are "marked" (i.e., the density of the suffix-array sample; see the
1410 original [FM Index] paper for details). All of these options are
1411 potentially profitable trade-offs depending on the application. They
1412 have been set to defaults that are reasonable for most cases according
1413 to our experiments. See [Performance Tuning] for details.
1415 Because `bowtie-build` uses 32-bit pointers internally, it can handle
1416 up to a theoretical maximum of 2^32-1 (somewhat more than 4 billion)
1417 characters in an index, though, with other constraints, the actual
1418 ceiling is somewhat less than that. If your reference exceeds 2^32-1
1419 characters, `bowtie-build` will print an error message and abort. To
1420 resolve this, divide your reference sequences into smaller batches
1421 and/or chunks and build a separate index for each.
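Before running `bowtie-build` on a very large reference, you can estimate how close it is to the limit by counting sequence characters. This is a rough sketch with hypothetical names and made-up example sequences. It counts every residue, including Ns (which are excluded from the index by default), so it slightly overestimates. Passing the check also does not guarantee the build succeeds, because the practical ceiling is somewhat lower than 2^32-1.

```python
MAX_INDEX_CHARS = 2**32 - 1  # theoretical ceiling for 32-bit offsets


def count_reference_chars(fasta_lines) -> int:
    """Count sequence characters (everything except header lines) in FASTA text."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def may_fit(total_chars: int) -> bool:
    """Necessary but not sufficient: the practical ceiling is somewhat lower."""
    return total_chars <= MAX_INDEX_CHARS


example = [">chr1", "ACGTNNACGT", ">chr2", "GGCC"]
print(count_reference_chars(example), may_fit(count_reference_chars(example)))  # 14 True
```

To check a real file, pass an open file object in place of `example`.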
1423 If your computer has more than 3-4 GB of memory and you would like to
1424 exploit that fact to make index building faster, use a 64-bit version
1425 of the `bowtie-build` binary. The 32-bit version of the binary is
1426 restricted to using less than 4 GB of memory. If a 64-bit pre-built
binary does not yet exist for your platform on the SourceForge download
1428 site, you will need to build one from source.
1430 The Bowtie index is based on the [FM Index] of Ferragina and Manzini,
1431 which in turn is based on the [Burrows-Wheeler] transform. The
1432 algorithm used to build the index is based on the [blockwise algorithm]
1435 [Blockwise algorithm]: http://portal.acm.org/citation.cfm?id=1314852
1436 [FM Index]: http://portal.acm.org/citation.cfm?id=796543
1437 [Burrows-Wheeler]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
1444 bowtie-build [options]* <reference_in> <ebwt_base>
1450 A comma-separated list of FASTA files containing the reference
1451 sequences to be aligned to, or, if `-c` is specified, the sequences
1452 themselves. E.g., `<reference_in>` might be
1453 `chr1.fa,chr2.fa,chrX.fa,chrY.fa`, or, if `-c` is specified, this might
1454 be `GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA`.
1458 The basename of the index files to write. By default, `bowtie-build`
1459 writes files named `NAME.1.ebwt`, `NAME.2.ebwt`, `NAME.3.ebwt`,
1460 `NAME.4.ebwt`, `NAME.rev.1.ebwt`, and `NAME.rev.2.ebwt`, where `NAME`
1467 The reference input files (specified as `<reference_in>`) are FASTA
1468 files (usually having extension `.fa`, `.mfa`, `.fna` or similar).
1472 The reference sequences are given on the command line. I.e.
1473 `<reference_in>` is a comma-separated list of sequences rather than a
1474 list of FASTA files.
1478 Build a colorspace index, to be queried using `bowtie` `-C`.
1482 Disable the default behavior whereby `bowtie-build` automatically
1483 selects values for the `--bmax`, `--dcv` and `--packed` parameters
1484 according to available memory. Instead, user may specify values for
1485 those parameters. If memory is exhausted during indexing, an error
1486 message will be printed; it is up to the user to try new parameters.
1490 Use a packed (2-bits-per-nucleotide) representation for DNA strings.
1491 This saves memory but makes indexing 2-3 times slower. Default: off.
1492 This is configured automatically by default; use `-a`/`--noauto` to
1497 The maximum number of suffixes allowed in a block. Allowing more
1498 suffixes per block makes indexing faster, but increases peak memory
1499 usage. Setting this option overrides any previous setting for
`--bmax` or `--bmaxdivn`. Default (in terms of the `--bmaxdivn`
1501 parameter) is `--bmaxdivn` 4. This is configured automatically by
1502 default; use `-a`/`--noauto` to configure manually.
1506 The maximum number of suffixes allowed in a block, expressed as a
1507 fraction of the length of the reference. Setting this option overrides
any previous setting for `--bmax` or `--bmaxdivn`. Default:
1509 `--bmaxdivn` 4. This is configured automatically by default; use
1510 `-a`/`--noauto` to configure manually.
1514 Use `<int>` as the period for the difference-cover sample. A larger
1515 period yields less memory overhead, but may make suffix sorting slower,
1516 especially if repeats are present. Must be a power of 2 no greater
1517 than 4096. Default: 1024. This is configured automatically by
1518 default; use `-a`/`--noauto` to configure manually.

`--nodc`

Disable use of the difference-cover sample. Suffix sorting becomes
quadratic-time in the worst case (where the worst case is an extremely
repetitive reference). Default: off.
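
Repeats are the worst case because a plain comparison sort of suffixes must scan shared prefixes character by character, and in a highly repetitive text those prefixes are long. The Python sketch below counts character comparisons for a naive suffix sort. It illustrates the problem the difference-cover sample addresses and is not Bowtie's blockwise sorting algorithm.

```python
import functools
import random


def suffix_sort_char_comparisons(text: str) -> int:
    """Naively sort all suffixes of `text`; return the character comparisons made."""
    count = 0

    def compare(i: int, j: int) -> int:
        nonlocal count
        n = len(text)
        # Walk both suffixes until they differ or one of them ends.
        while i < n and j < n:
            count += 1
            if text[i] != text[j]:
                return -1 if text[i] < text[j] else 1
            i += 1
            j += 1
        # One suffix is a prefix of the other; the shorter one sorts first.
        if i == n and j == n:
            return 0
        return -1 if i == n else 1

    sorted(range(len(text)), key=functools.cmp_to_key(compare))
    return count


rng = random.Random(0)
repetitive = "A" * 500
varied = "".join(rng.choice("ACGT") for _ in range(500))
print(suffix_sort_char_comparisons(repetitive))
print(suffix_sort_char_comparisons(varied))
```

In the all-`A` text, each pair of suffixes that are adjacent in sorted order shares a prefix as long as the shorter suffix. Any correct sort must compare every such adjacent pair, so the count is at least 500 * 499 / 2 = 124,750 and grows quadratically with length. The random text needs far fewer comparisons.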

`-r/--noref`

Do not build the `NAME.3.ebwt` and `NAME.4.ebwt` portions of the index,
which contain a bitpacked version of the reference sequences and are
used for paired-end alignment.

`-3/--justref`

Build *only* the `NAME.3.ebwt` and `NAME.4.ebwt` portions of the index,
which contain a bitpacked version of the reference sequences and are
used for paired-end alignment.

`-o/--offrate <int>`

To map alignments back to positions on the reference sequences, it is
necessary to annotate ("mark") some or all of the [Burrows-Wheeler]
rows with their corresponding location on the genome. `-o`/`--offrate`
governs how many rows get marked: the indexer will mark every 2^`<int>`
rows. Marking more rows makes reference-position lookups faster, but
requires more memory to hold the annotations at runtime. The default
is 5 (every 32nd row is marked; for the human genome, annotations
occupy about 340 megabytes).
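
The memory cost of `-o`/`--offrate` follows from simple arithmetic: marking every 2^`<int>` rows of an `n`-row matrix keeps about n / 2^`<int>` offsets. The Python sketch below assumes 4 bytes per stored offset. That width is an assumption for illustration, not a documented figure. `offrate_cost` is a hypothetical helper.

```python
def offrate_cost(num_rows: int, offrate: int, bytes_per_offset: int = 4):
    """Return (marked_rows, bytes) when every 2**offrate-th row is marked."""
    step = 2 ** offrate
    # Rows 0, step, 2*step, ... are marked, i.e. ceil(num_rows / step) rows.
    marked = (num_rows + step - 1) // step
    return marked, marked * bytes_per_offset


# A hypothetical 3-billion-row index at the default offrate of 5:
print(offrate_cost(3_000_000_000, 5))  # (93750000, 375000000)
# Lowering the offrate by one doubles the annotation memory:
print(offrate_cost(3_000_000_000, 4))  # (187500000, 750000000)
```

Under these assumptions, the default setting lands in the same order of magnitude as the figure quoted above for the human genome.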

`-t/--ftabchars <int>`

The ftab is the lookup table used to calculate an initial
[Burrows-Wheeler] range with respect to the first `<int>` characters
of the query. A larger `<int>` yields a larger lookup table but faster
query times. The ftab has size 4^(`<int>`+1) bytes. The default
setting is 10 (the ftab is 4 MB).
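
The ftab size formula is sketched below in Python, along with one way a `<int>`-character prefix can be turned into a table slot. Reading a k-mer as a base-4 number (with A, C, G, T as 0 to 3) is an assumption for illustration. It gives 4^`<int>` possible slots, which matches the documented 4^(`<int>`+1)-byte size if each slot holds 4 bytes. The manual does not state the entry width.

```python
def ftab_bytes(chars: int) -> int:
    """Documented ftab size: 4^(chars + 1) bytes."""
    return 4 ** (chars + 1)


def ftab_slot(prefix: str) -> int:
    """Read a k-mer as a base-4 number (assumed A=0, C=1, G=2, T=3)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    slot = 0
    for base in prefix:
        slot = slot * 4 + code[base]
    return slot


print(ftab_bytes(10), "bytes")  # 4194304 bytes, i.e. 4 MiB
print(ftab_slot("ACGT"))        # 27
```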

`--ntoa`

Convert Ns in the reference sequence to As before building the index.
By default, Ns are simply excluded from the index and `bowtie` will not
report alignments that overlap them.
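
The two treatments of `N` can be contrasted with a short Python sketch. Modelling the default as "split the reference into maximal N-free stretches and remember their offsets" is a simplification for illustration, not a description of Bowtie's internals. The sketch also assumes upper-case input.

```python
import re


def ntoa(seq: str) -> str:
    """--ntoa: every N becomes an A before indexing."""
    return seq.replace("N", "A")


def n_free_stretches(seq: str):
    """Default (simplified): keep only N-free stretches, with their start offsets."""
    return [(m.start(), m.group()) for m in re.finditer(r"[^N]+", seq)]


print(ntoa("ACNNGT"))              # ACAAGT
print(n_free_stretches("ACNNGT"))  # [(0, 'AC'), (4, 'GT')]
```

With `--ntoa`, a read can align across the former `NN`. With the default, no alignment can span the gap between offsets 2 and 4.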

`--big --little`

Endianness to use when serializing integers to the index file.
Default: little-endian (recommended for Intel- and AMD-based
architectures).
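
The difference between the two byte orders is easy to see with Python's standard `struct` module. A 32-bit unsigned integer is used here purely for illustration. The manual does not say which integer widths the index files use.

```python
import struct

value = 1_000_000  # 0x000F4240
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first
print(little.hex())  # 40420f00
print(big.hex())     # 000f4240

# Reading the bytes back with the wrong byte order yields a different number,
# so a reader must know which order the writer used.
assert struct.unpack("<I", little)[0] == value
assert struct.unpack(">I", little)[0] != value
```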

`--seed <int>`

Use `<int>` as the seed for the pseudo-random number generator.

`--cutoff <int>`

Index only the first `<int>` bases of the reference sequences
(cumulative across sequences) and ignore the rest.
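
"Cumulative across sequences" means the limit is shared by all reference sequences in input order, so later sequences may be truncated or dropped entirely. Below is a minimal Python sketch of that rule. `apply_cutoff` is a hypothetical helper, not part of `bowtie-build`.

```python
from typing import List


def apply_cutoff(seqs: List[str], cutoff: int) -> List[str]:
    """Keep only the first `cutoff` bases, counted across all sequences in order."""
    kept: List[str] = []
    remaining = cutoff
    for seq in seqs:
        if remaining <= 0:
            break  # budget used up: this and all later sequences are ignored
        kept.append(seq[:remaining])
        remaining -= len(kept[-1])
    return kept


print(apply_cutoff(["ACGT", "GGCC", "TT"], 6))  # ['ACGT', 'GG']
```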

`-q/--quiet`

`bowtie-build` is verbose by default. With this option, `bowtie-build`
will print only error messages.

`-h/--help`

Print usage information and quit.

`--version`

Print version information and quit.

The `bowtie-inspect` index inspector
====================================

`bowtie-inspect` extracts information from a Bowtie index about what
kind of index it is and what reference sequences were used to build it.
When run without any options, the tool outputs a FASTA file containing
the sequences of the original references (with all
non-`A`/`C`/`G`/`T` characters converted to `N`s). It can also be used
to extract just the reference sequence names using the `-n`/`--names`
option, or a more verbose summary using the `-s`/`--summary` option.

    bowtie-inspect [options]* <ebwt_base>

`<ebwt_base>`

The basename of the index to be inspected. The basename is the name of
any of the index files but with the `.X.ebwt` or `.rev.X.ebwt` suffix
omitted. `bowtie-inspect` first looks in the current directory for the
index files, then looks in the `indexes` subdirectory under the
directory where the currently-running `bowtie` executable is located,
then looks in the directory specified in the `BOWTIE_INDEXES`
environment variable.
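
The basename rule and the search order can be sketched in Python. The helpers `ebwt_basename` and `find_index_dir` are hypothetical. Treating the presence of `<basename>.1.ebwt` as "index found" is an assumption made for illustration.

```python
import os
import re
import tempfile
from typing import Mapping, Optional

# Matches a trailing ".X.ebwt" or ".rev.X.ebwt", where X is a number.
_EBWT_SUFFIX = re.compile(r"(\.rev)?\.\d+\.ebwt$")


def ebwt_basename(filename: str) -> str:
    """Strip the .X.ebwt / .rev.X.ebwt suffix from an index file name."""
    return _EBWT_SUFFIX.sub("", filename)


def find_index_dir(basename: str, cwd: str, exe_dir: str,
                   env: Mapping[str, str]) -> Optional[str]:
    """Return the first directory, in the documented order, holding the index."""
    candidates = [cwd, os.path.join(exe_dir, "indexes")]
    if env.get("BOWTIE_INDEXES"):
        candidates.append(env["BOWTIE_INDEXES"])
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, basename + ".1.ebwt")):
            return directory
    return None


print(ebwt_basename("e_coli.rev.1.ebwt"))  # e_coli

# Demo: only the BOWTIE_INDEXES directory contains the index.
with tempfile.TemporaryDirectory() as cwd, \
        tempfile.TemporaryDirectory() as exe_dir, \
        tempfile.TemporaryDirectory() as env_dir:
    open(os.path.join(env_dir, "e_coli.1.ebwt"), "w").close()
    found = find_index_dir("e_coli", cwd, exe_dir, {"BOWTIE_INDEXES": env_dir})
    print(found == env_dir)  # True
```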

`-a/--across <int>`

When printing FASTA output, output a newline character every `<int>`
bases (default: 60).
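
Wrapping the FASTA output amounts to slicing the sequence into fixed-width chunks. This Python sketch mirrors the documented `-a`/`--across` behaviour. The exact header line `bowtie-inspect` prints is not specified here, so the hypothetical `wrap_fasta` just uses `>name`.

```python
def wrap_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record with a newline every `width` bases."""
    lines = [">" + name]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


record = wrap_fasta("chr_demo", "ACGT" * 40)  # 160 bases
print([len(line) for line in record.splitlines()])  # [9, 60, 60, 40]
```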

`-n/--names`

Print reference sequence names, one per line, and quit.

`-s/--summary`

Print a summary that includes information about index settings, as well
as the names and lengths of the input sequences. The summary has this
format:

    SA-Sample	1 in <sample>
    Sequence-1	<name>	<len>
    Sequence-2	<name>	<len>
    ...
    Sequence-N	<name>	<len>

Fields are separated by tabs.
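
Because the summary is tab-separated, it is easy to post-process. Below is a minimal Python sketch that splits it into index settings and (name, length) pairs. The sample text is made up for illustration, and `parse_summary` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def parse_summary(text: str) -> Tuple[Dict[str, str], List[Tuple[str, int]]]:
    """Split tab-separated summary lines into settings and (name, length) pairs."""
    settings: Dict[str, str] = {}
    sequences: List[Tuple[str, int]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("Sequence-"):
            sequences.append((fields[1], int(fields[2])))
        else:
            settings[fields[0]] = "\t".join(fields[1:])
    return settings, sequences


summary = "SA-Sample\t1 in 32\nSequence-1\tchr1\t1000\nSequence-2\tchr2\t500\n"
settings, seqs = parse_summary(summary)
print(settings)  # {'SA-Sample': '1 in 32'}
print(seqs)      # [('chr1', 1000), ('chr2', 500)]
```

In practice, the text would come from running `bowtie-inspect -s <ebwt_base>`.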

`-e/--ebwt-ref`

By default, when `bowtie-inspect` is run without `-s` or `-n`, it
recreates the reference nucleotide sequences using the bit-encoded
reference nucleotides kept in the `.3.ebwt` and `.4.ebwt` index files.
When `-e`/`--ebwt-ref` is specified, `bowtie-inspect` instead recreates
the reference sequences from the Burrows-Wheeler-transformed reference
sequence in the `.1.ebwt` file. This recreation process is much slower.
Also, when `-e`/`--ebwt-ref` is specified and the index is in
colorspace, the reference is printed in colors (A=blue, C=green,
G=orange, T=red).
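
One way to see why `-e`/`--ebwt-ref` is slower: rebuilding text from a Burrows-Wheeler transform means stepping backwards through it one character at a time using the LF (last-to-first) mapping, rather than reading bit-packed bases directly. The textbook Python sketch below uses a naive `$`-terminated BWT. It is not Bowtie's implementation.

```python
def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform with '$' as a unique, smallest end marker."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last: str) -> str:
    """Rebuild the text by walking the LF mapping backwards from the '$' row."""
    first = sorted(last)
    # Row where each character's block begins in the first column.
    starts = {}
    for i, ch in enumerate(first):
        starts.setdefault(ch, i)
    # rank[i] = occurrences of last[i] in last[:i].
    seen = {}
    rank = []
    for ch in last:
        rank.append(seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    row = 0  # row 0 is the rotation beginning with '$'
    out = []
    for _ in range(len(last) - 1):
        ch = last[row]  # character preceding this rotation in the text
        out.append(ch)
        row = starts[ch] + rank[row]  # LF mapping: step one position left
    return "".join(reversed(out))


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```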

`-v/--verbose`

Print verbose output (for debugging).

`--version`

Print version information and quit.

`-h/--help`

Print usage information and quit.