<!--
 ! This manual is written in "markdown" format and thus contains some
 ! distracting clutter encoding information about how to convert to
 ! HTML. See 'MANUAL' for a clearer version of this document.
  -->

[Bowtie] is an ultrafast, memory-efficient short read aligner geared
toward quickly aligning large sets of short DNA sequences (reads) to
large genomes. It aligns 35-base-pair reads to the human genome at a
rate of 25 million reads per hour on a typical workstation. Bowtie
indexes the genome with a [Burrows-Wheeler] index to keep its memory
footprint small: for the human genome, the index is typically about
2.2 GB (for unpaired alignment) or 2.9 GB (for paired-end or colorspace
alignment). Multiple processors can be used simultaneously to achieve
greater alignment speed. Bowtie can also output alignments in the
standard [SAM] format, allowing Bowtie to interoperate with other tools
supporting SAM, including the [SAMtools] consensus, SNP, and indel
callers. Bowtie runs on the command line under Windows, Mac OS X,
and Linux.

[Bowtie] also forms the basis for other tools, including [TopHat], a
fast splice junction mapper for RNA-seq reads; [Cufflinks], a tool for
transcriptome assembly and isoform quantitation from RNA-seq reads;
[Crossbow], a cloud-computing software tool for large-scale
resequencing data; and [Myrna], a cloud-computing tool for calculating
differential gene expression in large RNA-seq datasets.

If you use [Bowtie] for your published research, please cite the
[Bowtie paper].

[Bowtie]: http://bowtie-bio.sf.net
[Burrows-Wheeler]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
[SAM]: http://samtools.sourceforge.net/SAM1.pdf
[SAMtools]: http://samtools.sourceforge.net/
[TopHat]: http://tophat.cbcb.umd.edu/
[Cufflinks]: http://cufflinks.cbcb.umd.edu/
[Crossbow]: http://bowtie-bio.sf.net/crossbow
[Myrna]: http://bowtie-bio.sf.net/myrna
[Bowtie paper]: http://genomebiology.com/2009/10/3/R25

Bowtie is not a general-purpose alignment tool like [MUMmer], [BLAST]
or [Vmatch]. Bowtie works best when aligning short reads to large
genomes, though it supports arbitrarily small reference sequences (e.g.
amplicons) and reads as long as 1024 bases. Bowtie is designed to be
extremely fast for sets of short reads where (a) many of the reads have
at least one good, valid alignment, (b) many of the reads are
relatively high-quality, and (c) the number of alignments reported per
read is small (close to 1).

Bowtie does not yet report gapped alignments; this is future work.

[MUMmer]: http://mummer.sourceforge.net/
[BLAST]: http://blast.ncbi.nlm.nih.gov/Blast.cgi
[Vmatch]: http://www.vmatch.de/

You may download either Bowtie sources or binaries for your platform
from the [Download] section of the SourceForge project site. Binaries
are currently available for Intel architectures (`i386` and `x86_64`)
running Linux, Windows, and Mac OS X.

Building Bowtie from source requires a GNU-like environment that
includes GCC, GNU Make and other basics. It should be possible to
build Bowtie on a vanilla Linux or Mac installation. Bowtie can also
be built on Windows using [Cygwin] or [MinGW]. We recommend
[TDM's MinGW Build]. If using [MinGW], you must also have [MSYS]
installed.

To build Bowtie, extract the sources, change to the extracted
directory, and run GNU `make` (usually with the command `make`, but
sometimes with `gmake`) with no arguments. If building with [MinGW],
run `make` from the [MSYS] command line.

To support the [`-p`] (multithreading) option, Bowtie needs the
`pthreads` library. To compile Bowtie without `pthreads` (which
disables [`-p`]), use `make BOWTIE_PTHREADS=0`.

[Cygwin]: http://www.cygwin.com/
[MinGW]: http://www.mingw.org/
[TDM's MinGW Build]: http://www.tdragon.net/recentgcc/
[MSYS]: http://www.mingw.org/wiki/msys
[Download]: https://sourceforge.net/projects/bowtie-bio/files/bowtie/

`bowtie` takes an index and a set of reads as input and outputs a list
of alignments. Alignments are selected according to a combination of
the [`-v`]/[`-n`]/[`-e`]/[`-l`] options (plus the [`-I`]/[`-X`]/[`--fr`]/[`--rf`]/
[`--ff`] options for paired-end alignment), which define which alignments
are legal, and the [`-k`]/[`-a`]/[`-m`]/[`-M`]/[`--best`]/[`--strata`] options,
which define which and how many legal alignments should be reported.

By default, Bowtie enforces an alignment policy similar to [Maq]'s
default quality-aware policy ([`-n`] 2 [`-l`] 28 [`-e`] 70). See [the -n
alignment mode] section of the manual for details about this mode.
Bowtie can also enforce a simpler end-to-end k-difference policy (e.g.
with [`-v`] 2). See [the -v alignment mode] section of the manual for
details about that mode. [The -n alignment mode] and [the -v alignment
mode] are mutually exclusive.

Bowtie works best when aligning short reads to large genomes (e.g.
human or mouse), though it supports arbitrarily small reference
sequences and reads as long as 1024 bases. Bowtie is designed to be
very fast for sets of short reads where (a) many reads have at least one
good, valid alignment, (b) many reads are relatively high-quality, and
(c) the number of alignments reported per read is small (close to 1).
These criteria are generally satisfied in the context of modern
short-read analyses such as RNA-seq, ChIP-seq, other types of -seq, and
mammalian resequencing. You may observe longer running times in other
contexts.

If `bowtie` is too slow for your application, try some of the
performance-tuning hints described in the [Performance Tuning] section
of this manual.

Alignments involving one or more ambiguous reference characters (`N`,
`-`, `R`, `Y`, etc.) are considered invalid by Bowtie. This is true
only for ambiguous characters in the reference; alignments involving
ambiguous characters in the read are legal, subject to the alignment
policy. Ambiguous characters in the read mismatch all other
characters. Alignments that "fall off" the reference sequence are not
considered valid.

The process by which `bowtie` chooses an alignment to report is
randomized in order to avoid "mapping bias", the phenomenon whereby
an aligner systematically fails to report a particular class of good
alignments, causing spurious "holes" in the comparative assembly.
Whenever `bowtie` reports a subset of the valid alignments that exist,
it makes an effort to sample them randomly. This randomness flows
from a simple seeded pseudo-random number generator and is
deterministic in the sense that Bowtie will always produce the same
results for the same read when run with the same initial "seed" value
(see the [`--seed`] option).

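A minimal Python sketch of seeded sampling follows. It assumes Python's `random.Random` as a stand-in for Bowtie's internal generator, and the helper `sample_alignments` is hypothetical, not part of Bowtie. It demonstrates only the property described above: the same seed reproduces the same subset.

```python
import random

def sample_alignments(alignments, k, seed):
    """Pick up to k alignments uniformly at random (hypothetical helper).

    A dedicated random.Random instance seeded with `seed` makes the choice
    deterministic: the same inputs and seed always yield the same subset.
    """
    if k >= len(alignments):
        return list(alignments)
    rng = random.Random(seed)
    return rng.sample(alignments, k)

# Reference offsets taken from the E. coli examples in this manual.
hits = [148810, 2852852, 4930433, 905664, 1093035]
first = sample_alignments(hits, 3, seed=0)
again = sample_alignments(hits, 3, seed=0)
print(first == again)  # True
```

Changing the seed may change which alignments are chosen, but never makes a given run irreproducible.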
In the default mode, `bowtie` can exhibit strand bias. Strand bias
occurs when input reference and reads are such that (a) some reads
align equally well to sites on the forward and reverse strands of the
reference, and (b) the number of such sites on one strand is different
from the number on the other strand. When this happens for a given
read, `bowtie` effectively chooses one strand or the other with 50%
probability, then reports a randomly-selected alignment for that read
from among the sites on the selected strand. This tends to overassign
alignments to the sites on the strand with fewer sites and underassign
them to sites on the strand with more sites. The effect is mitigated,
though it may not be eliminated, when reads are longer or when
paired-end reads are used. Running Bowtie in [`--best`] mode
eliminates strand bias by forcing Bowtie to select one strand or the
other with a probability that is proportional to the number of best
sites on the strand.

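To make the strand-bias arithmetic concrete, here is a small Python sketch (the function `per_site_probability` is hypothetical) for a read with one best site on the forward strand and three on the reverse strand, comparing the default 50/50 strand choice with the proportional strand choice of [`--best`]:

```python
from fractions import Fraction

def per_site_probability(n_fwd, n_rev, best_mode=False):
    """Chance that one particular site on each strand is reported for a read
    with n_fwd equally good forward-strand sites and n_rev reverse-strand sites.

    Assumes n_fwd and n_rev are both positive (otherwise there is no bias).
    Returns (probability per forward site, probability per reverse site).
    """
    if best_mode:
        # Strand chosen in proportion to its number of best sites,
        # so every site ends up equally likely.
        p = Fraction(1, n_fwd + n_rev)
        return p, p
    # Default mode: pick a strand with probability 1/2, then a site uniformly.
    return Fraction(1, 2 * n_fwd), Fraction(1, 2 * n_rev)

# One forward site and three reverse sites:
print(per_site_probability(1, 3))                  # (Fraction(1, 2), Fraction(1, 6))
print(per_site_probability(1, 3, best_mode=True))  # (Fraction(1, 4), Fraction(1, 4))
```

In the default mode the lone forward site is reported for half of such reads rather than a quarter, which is the overassignment described above.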
Gapped alignments are not currently supported, but support is planned
for a future release.

[the -n alignment mode]: #the--n-alignment-mode
[the -v alignment mode]: #the--v-alignment-mode
[High Performance Tips]: #high-performance-tips
[Maq]: http://maq.sf.net

The `-n` alignment mode
-----------------------

When the [`-n`] option is specified (which is the default), `bowtie`
determines which alignments are valid according to the following
policy, which is similar to [Maq]'s default policy.

1. Alignments may have no more than `N` mismatches (where `N` is a
    number 0-3, set with [`-n`]) in the first `L` bases (where `L` is a
    number 5 or greater, set with [`-l`]) on the high-quality (left) end
    of the read. The first `L` bases are called the "seed".

2. The sum of the [Phred quality] values at *all* mismatched positions
    (not just in the seed) may not exceed `E` (set with [`-e`]). Where
    qualities are unavailable (e.g. if the reads are from a FASTA
    file), the [Phred quality] defaults to 40.

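The two criteria can be expressed as a short validity check. This Python sketch is illustrative only: `valid_in_n_mode` is a hypothetical name, qualities are plain integer Phred values, and it ignores Maq-style quality rounding and the backtracking limit.

```python
def valid_in_n_mode(quals, mismatches, n=2, l=28, e=70):
    """Check a candidate alignment against the -n policy.

    quals      -- Phred qualities of the read, left (high-quality) end first
    mismatches -- 0-based read positions that mismatch the reference
    n, l, e    -- values of -n, -l and -e (defaults as stated in this manual)
    """
    seed_mismatches = sum(1 for pos in mismatches if pos < l)
    if seed_mismatches > n:          # criterion 1: at most N mismatches in the seed
        return False
    quality_sum = sum(quals[pos] for pos in mismatches)
    return quality_sum <= e          # criterion 2: sum over *all* mismatches <= E

quals = [30] * 36
print(valid_in_n_mode(quals, [3, 30]))      # True: 1 seed mismatch, sum 60
print(valid_in_n_mode(quals, [3, 10, 30]))  # False: sum 90 exceeds 70
print(valid_in_n_mode(quals, [1, 2, 3]))    # False: 3 seed mismatches
```

With FASTA input every quality is 40, so under the default `-e 70` any two mismatches (sum 80) already make an alignment invalid.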
The [`-n`] option is mutually exclusive with the [`-v`] option.

If there are many possible alignments satisfying these criteria, Bowtie
gives preference to alignments with fewer mismatches and where the sum
from criterion 2 is smaller. When the [`--best`] option is specified,
Bowtie guarantees the reported alignment(s) are "best" in terms of
these criteria (criterion 1 has priority), and that the alignments are
reported in best-to-worst order. Bowtie is somewhat slower when
[`--best`] is specified.

Note that [Maq] internally rounds base qualities to the nearest 10 and
rounds qualities greater than 30 to 30. To maintain compatibility,
Bowtie does the same. Rounding can be suppressed with the
[`--nomaqround`] option.

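A sketch of the rounding rule in Python follows. It assumes round-half-up for ties (the manual does not say how a value such as 15 is rounded), and `maq_round` is a hypothetical name.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10, then cap it at 30.

    Ties (e.g. 15) are rounded up here; that tie-breaking rule is an
    assumption of this sketch.
    """
    rounded = (q + 5) // 10 * 10
    return min(rounded, 30)

print([maq_round(q) for q in (2, 14, 15, 27, 33, 40)])  # [0, 10, 20, 30, 30, 30]
```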
Bowtie is not fully sensitive in [`-n`] 2 and [`-n`] 3 modes by default.
In these modes Bowtie imposes a "backtracking limit" to limit effort
spent trying to find valid alignments for low-quality reads unlikely to
have any. This may cause `bowtie` to miss some legal 2- and 3-mismatch
alignments. The limit is set to a reasonable default (125 without
[`--best`], 800 with [`--best`]), but the user may decrease or increase the
limit using the [`--maxbts`] and/or [`-y`] options. [`-y`] mode is
relatively slow but guarantees full sensitivity.

[Phred quality]: http://en.wikipedia.org/wiki/FASTQ_format#Variations

The `-v` alignment mode
-----------------------

In [`-v`] mode, alignments may have no more than `V` mismatches, where
`V` may be a number from 0 through 3 set using the [`-v`] option.
Quality values are ignored. The [`-v`] option is mutually exclusive with
the [`-n`] option.

If there are many legal alignments, Bowtie gives preference to
alignments with fewer mismatches. When the [`--best`] option is
specified, Bowtie guarantees the reported alignment(s) are "best" in
terms of the number of mismatches, and that the alignments are reported
in best-to-worst order. Bowtie is somewhat slower when [`--best`] is
specified.

In [the -n alignment mode], an alignment's "stratum" is defined as the
number of mismatches in the "seed" region, i.e. the leftmost `L` bases,
where `L` is set with the [`-l`] option. In [the -v alignment mode], an
alignment's stratum is defined as the total number of mismatches in the
entire alignment. Some of Bowtie's options (e.g. [`--strata`] and [`-m`])
use the notion of "stratum" to limit or expand the scope of reportable
alignments.

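Both stratum definitions fit in one helper. In this sketch `stratum` is a hypothetical name, mismatch positions are 0-based offsets from the left end of the read, and `seed_len` plays the role of [`-l`] (28 by default).

```python
def stratum(mismatch_positions, mode, seed_len=28):
    """Stratum of an alignment (hypothetical helper).

    mode == "n": number of mismatches within the leftmost seed_len bases (-l)
    mode == "v": total number of mismatches in the alignment
    """
    if mode == "n":
        return sum(1 for pos in mismatch_positions if pos < seed_len)
    if mode == "v":
        return len(mismatch_positions)
    raise ValueError("mode must be 'n' or 'v'")

print(stratum([3, 30], "n"))  # 1: position 30 lies outside the 28-base seed
print(stratum([3, 30], "v"))  # 2
```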
With the [`-k`], [`-a`], [`-m`], [`-M`], [`--best`] and [`--strata`] options, the
user can flexibly select which alignments are reported. Below we
demonstrate a few ways in which these options can be combined. All
examples use the `e_coli` index packaged with Bowtie. The
[`--suppress`] option is used to keep the output concise, and some
output is elided for clarity.

### Example 1: `-a`

    $ ./bowtie -a -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
    - gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
    - gi|110640213|ref|NC_008253.1| 2852852 8:T>A
    - gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
    - gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
    + gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T

Specifying [`-a`] instructs `bowtie` to report *all* valid alignments,
subject to the alignment policy: [`-v`] 2. In this case, `bowtie` finds
5 inexact hits in the E. coli genome; 1 hit (the 2nd one listed)
has 1 mismatch, and the other 4 hits have 2 mismatches. Four are on
the reverse reference strand and one is on the forward strand. Note
that they are not listed in best-to-worst order.

### Example 2: `-k 3`

    $ ./bowtie -k 3 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
    - gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
    - gi|110640213|ref|NC_008253.1| 2852852 8:T>A
    - gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G

Specifying [`-k`] 3 instructs `bowtie` to report up to 3 valid
alignments. In this case, a total of 5 valid alignments exist (see
[Example 1]); `bowtie` reports 3 out of those 5. [`-k`] can be set to
any integer greater than 0.

[Example 1]: #example-1

### Example 3: `-k 6`

    $ ./bowtie -k 6 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
    - gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
    - gi|110640213|ref|NC_008253.1| 2852852 8:T>A
    - gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
    - gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
    + gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T

Specifying [`-k`] 6 instructs `bowtie` to report up to 6 valid
alignments. In this case, a total of 5 valid alignments exist, so
`bowtie` reports all 5.

### Example 4: default (`-k 1`)

    $ ./bowtie -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
    - gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G

Leaving the reporting options at their defaults causes `bowtie` to
report the first valid alignment it encounters. Because [`--best`] was
not specified, we are not guaranteed that `bowtie` will report the best
alignment, and in this case it does not (the 1-mismatch alignment from
the previous example would have been better). The default reporting
mode is equivalent to [`-k`] 1.

### Example 5: `-a --best`

    $ ./bowtie -a --best -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
    - gi|110640213|ref|NC_008253.1| 2852852 8:T>A
    + gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T
    - gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
    - gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
    - gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G

Specifying [`-a`] [`--best`] results in the same alignments being printed
as if just [`-a`] had been specified, but they are guaranteed to be
reported in best-to-worst order.

### Example 6: `-a --best --strata`

    $ ./bowtie -a --best --strata -v 2 --suppress 1,5,6,7 e_coli -c ATGCATCATGCGCCAT
    - gi|110640213|ref|NC_008253.1| 2852852 8:T>A

Specifying [`--strata`] in addition to [`-a`] and [`--best`] causes
`bowtie` to report only those alignments in the best alignment
"stratum". The alignments in the best stratum are those having the
fewest mismatches (or the fewest mismatches in the "seed" portion of
the alignment, in the case of [`-n`] mode). Note that if [`--strata`]
is specified, [`--best`] must also be specified.

### Example 7: `-a -m 3`

    $ ./bowtie -a -m 3 -v 2 e_coli -c ATGCATCATGCGCCAT

Specifying [`-m`] 3 instructs `bowtie` to refrain from reporting any
alignments for reads having more than 3 reportable alignments. The
[`-m`] option is useful when the user would like to guarantee that
reported alignments are "unique", for some definition of unique.

[Example 1] showed that the read has 5 reportable alignments when [`-a`]
and [`-v`] 2 are specified, so the [`-m`] 3 limit causes `bowtie` to
output no alignments.

### Example 8: `-a -m 5`

    $ ./bowtie -a -m 5 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
    - gi|110640213|ref|NC_008253.1| 148810 10:A>G,13:C>G
    - gi|110640213|ref|NC_008253.1| 2852852 8:T>A
    - gi|110640213|ref|NC_008253.1| 4930433 4:G>T,6:C>G
    - gi|110640213|ref|NC_008253.1| 905664 6:A>G,7:G>T
    + gi|110640213|ref|NC_008253.1| 1093035 2:T>G,15:A>T

Specifying [`-m`] 5 instructs `bowtie` to refrain from reporting any
alignments for reads having more than 5 reportable alignments. Since
the read has exactly 5 reportable alignments, the [`-m`] 5 limit allows
`bowtie` to print them as usual.

### Example 9: `-a -m 3 --best --strata`

    $ ./bowtie -a -m 3 --best --strata -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
    - gi|110640213|ref|NC_008253.1| 2852852 8:T>A

Specifying [`-m`] 3 instructs `bowtie` to refrain from reporting any
alignments for reads having more than 3 reportable alignments. As we
saw in Example 6, the read has only 1 reportable alignment when [`-a`],
[`--best`] and [`--strata`] are specified, so the [`-m`] 3 limit allows
`bowtie` to print that alignment as usual.

Intuitively, the [`-m`] option, when combined with the [`--best`] and
[`--strata`] options, guarantees a principled, though weaker, form of
"uniqueness", because only alignments in the best stratum count toward
the [`-m`] limit. A stronger form of uniqueness is enforced when [`-m`] is
specified but [`--best`] and [`--strata`] are not.

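The following Python sketch is a simplified model of how the reporting options interact in the examples above; it is not Bowtie's implementation and does not model [`-M`]. It assumes each alignment is an `(offset, stratum)` pair (in [`-v`] mode the stratum is the mismatch count), uses `k=None` to stand for [`-a`], and keeps ties in discovery order, so its [`--best`] ordering among equal strata can differ from Example 5. The function `report` is hypothetical.

```python
def report(alignments, k=1, m=None, best=False, strata=False):
    """Simplified model of the -k/-a/-m/--best/--strata interplay.

    alignments -- (offset, stratum) pairs in the order they were found
    k          -- report at most k alignments; None models -a
    m          -- report nothing if more than m alignments are reportable
    """
    if strata and not best:
        raise ValueError("--strata requires --best")
    hits = list(alignments)
    if best:
        hits.sort(key=lambda hit: hit[1])  # stable sort: best stratum first
    if strata and hits:
        top = hits[0][1]
        hits = [hit for hit in hits if hit[1] == top]  # keep the best stratum only
    if m is not None and len(hits) > m:
        return []  # too many reportable alignments: suppress the read
    return hits if k is None else hits[:k]

# (offset, stratum) pairs from Example 1, in the order they were printed.
EXAMPLE_1 = [(148810, 2), (2852852, 1), (4930433, 2), (905664, 2), (1093035, 2)]

print(report(EXAMPLE_1, k=None, m=3))                          # [] (Example 7)
print(report(EXAMPLE_1, k=None, m=3, best=True, strata=True))  # [(2852852, 1)] (Example 9)
```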
`bowtie` can align paired-end reads when properly paired read files are
specified using the [`-1`](#command-line) and [`-2`](#command-line) options (for pairs of raw, FASTA, or
FASTQ read files), or using the [`--12`](#command-line) option (for Tab-delimited read
files). A valid paired-end alignment satisfies these criteria:

1. Both mates have a valid alignment according to the alignment policy
    defined by the [`-v`]/[`-n`]/[`-e`]/[`-l`] options.
2. The relative orientation and position of the mates satisfy the
    constraints defined by the [`-I`]/[`-X`]/[`--fr`]/[`--rf`]/[`--ff`]
    options.

Policies governing which paired-end alignments are reported for a
given read are specified using the [`-k`], [`-a`] and [`-m`] options as
usual. The [`--strata`] and [`--best`] options do not apply in
paired-end mode.

A paired-end alignment is reported as a pair of mate alignments, each
on a separate line, where the alignment for each mate is formatted the
same as an unpaired (singleton) alignment. The alignment for the mate
that occurs closest to the beginning of the reference sequence (the
"upstream" mate) is always printed before the alignment for the
downstream mate. Read files containing paired-end reads will
sometimes name the reads according to whether they are the #1 or #2
mates by appending a `/1` or `/2` suffix to the read name. If no such
suffix is present in Bowtie's input, the suffix will be added when
Bowtie prints read names in alignments (except in [`-S`] "SAM" mode,
where mate information is encoded in the `FLAGS` field instead).

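The naming rule for mates can be sketched as follows. `mate_name` is a hypothetical helper, and SAM output, which uses the `FLAGS` field instead, is not modeled.

```python
def mate_name(read_name, mate):
    """Name printed for mate 1 or 2 in non-SAM output (hypothetical helper).

    A "/1" or "/2" suffix is appended only if the input name lacks it.
    """
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    suffix = "/%d" % mate
    return read_name if read_name.endswith(suffix) else read_name + suffix

print(mate_name("read7", 1))    # read7/1
print(mate_name("read7/2", 2))  # read7/2
```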
Finding a valid paired-end alignment where both mates align to
repetitive regions of the reference can be very time-consuming. By
default, Bowtie avoids much of this cost by imposing a limit on the
number of "tries" it makes to match an alignment for one mate with a
nearby alignment for the other. The default limit is 100. This causes
`bowtie` to miss some valid paired-end alignments where both mates lie
in repetitive regions, but the user may use the [`--pairtries`] or
[`-y`] options to increase Bowtie's sensitivity as desired.

Paired-end alignments where one mate's alignment is entirely contained
within the other's are considered invalid.

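Putting the positional criteria together, here is a sketch of the checks for the [`--fr`] orientation only. It assumes that [`-I`]/[`-X`] bound the fragment length measured from the outer end of one mate to the outer end of the other, that [`--fr`] means the upstream mate aligns to the forward strand and the downstream mate to the reverse strand, and it uses 0 and 250 as assumed default bounds. `MateHit` and `valid_fr_pair` are hypothetical names.

```python
from typing import NamedTuple

class MateHit(NamedTuple):
    start: int   # 0-based leftmost reference offset of the mate alignment
    length: int  # number of reference bases covered
    strand: str  # "+" (forward) or "-" (reverse complement)

    @property
    def end(self):
        return self.start + self.length  # exclusive end offset

def valid_fr_pair(a, b, min_ins=0, max_ins=250):
    """Check orientation, containment and fragment length for --fr pairs."""
    up, down = sorted((a, b), key=lambda hit: hit.start)
    if up.strand != "+" or down.strand != "-":
        return False
    # One mate lying entirely within the other is invalid.
    if up.start <= down.start and down.end <= up.end:
        return False
    if down.start <= up.start and up.end <= down.end:
        return False
    fragment = max(up.end, down.end) - up.start  # outer-to-outer length
    return min_ins <= fragment <= max_ins

print(valid_fr_pair(MateHit(1000, 36, "+"), MateHit(1150, 36, "-")))  # True
print(valid_fr_pair(MateHit(1000, 36, "+"), MateHit(1400, 36, "-")))  # False
```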
When colorspace alignment is enabled via [`-C`], the default setting for
paired-end orientation is [`--ff`]. This is because most SOLiD datasets
have that orientation. When colorspace alignment is not enabled
(default), the default setting for orientation is [`--fr`], since most
Illumina datasets have this orientation. The default can be overridden
with [`--fr`], [`--rf`] or [`--ff`].

Because Bowtie uses an in-memory representation of the original
reference string when finding paired-end alignments, its memory
footprint is larger when aligning paired-end reads. For example, the
human index has a memory footprint of about 2.2 GB in single-end mode
and 2.9 GB in paired-end mode. Note that paired-end and unpaired
alignment incur the same memory footprint in colorspace (e.g. about
2.9 GB for the human index).

Colorspace alignment
--------------------

[Colorspace alignment]: #colorspace-alignment

As of version 0.12.0, `bowtie` can align colorspace reads against a
colorspace index when [`-C`] is specified. Colorspace is the
characteristic output format of Applied Biosystems' SOLiD system. In a
colorspace read, each character is a color rather than a nucleotide,
where a color encodes a class of dinucleotides. E.g. the color blue
encodes any of the dinucleotides AA, CC, GG and TT. Colorspace has the
advantage of (often) being able to distinguish sequencing errors from
SNPs once the read has been aligned. See ABI's [Principles of Di-Base
Sequencing] document for details.

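As an illustration of the dinucleotide-to-color mapping, this Python sketch assumes the standard SOLiD two-base code, in which a color is the XOR of the 2-bit codes A=0, C=1, G=2, T=3 of the two bases; that scheme reproduces the example above (AA, CC, GG and TT all give color 0, blue). The helper names are hypothetical, and the color names follow the numbering 0=blue, 1=green, 2=orange, 3=red.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"
COLOR_NAMES = ["blue", "green", "orange", "red"]  # colors 0-3

def encode_colors(seq):
    """Colors (0-3) for each adjacent dinucleotide of a nucleotide string."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]

def decode_colors(first_base, colors):
    """Rebuild nucleotides from a known first base and a list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)

print([COLOR_NAMES[c] for c in encode_colors("AACGT")])  # ['blue', 'green', 'red', 'green']
print(decode_colors("A", encode_colors("AACGT")))         # AACGT
```

Note that `decode_colors` needs a known first base; the last line checks that the two functions invert each other.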
All input formats (FASTA [`-f`], FASTQ [`-q`], raw [`-r`], tab-delimited
[`--12`](#command-line), command-line [`-c`]) are compatible with colorspace ([`-C`]).
When [`-C`] is specified, read sequences are treated as colors. Colors
may be encoded either as numbers (`0`=blue, `1`=green, `2`=orange,
`3`=red) or as characters `A/C/G/T` (`A`=blue, `C`=green, `G`=orange,
`T`=red).

Some reads include a primer base as the first character; e.g.:

    T2213120002010301233221223311331
    T2302111203131231130300111123220

Here, `T` is the primer base. `bowtie` detects and handles primer
bases properly (i.e., the primer base and the adjacent color are both
trimmed away prior to alignment) as long as the rest of the read is
encoded as colors.

`bowtie` also handles input in the form of parallel `.csfasta` and
`_QV.qual` files. Use [`-f`] to specify the `.csfasta` files and [`-Q`]
(for unpaired reads) or [`--Q1`]/[`--Q2`] (for paired-end reads) to
specify the corresponding `_QV.qual` files. It is not necessary to
first convert to FASTQ, though `bowtie` also handles FASTQ-formatted
colorspace reads (with [`-q`], the default).

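The primer-base handling described in this section can be sketched as a trimming step. This assumes colors are written as digits, so a leading `A`/`C`/`G`/`T` followed only by digits is treated as a primer; reads whose colors are written as letters are left untouched here. `strip_primer` is a hypothetical helper, not Bowtie code.

```python
def strip_primer(read):
    """Drop a leading primer base and the color adjacent to it.

    Sketch only: a primer is assumed to be present exactly when the first
    character is a nucleotide and all remaining characters are digit colors.
    """
    if read and read[0] in "ACGT" and read[1:].isdigit():
        return read[2:]
    return read

print(strip_primer("T2213120002010301233221223311331"))  # 213120002010301233221223311331
```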
### Building a colorspace index

A colorspace index is built in the same way as a normal index except
that [`-C`](#bowtie-build-options-C) must be specified when running `bowtie-build`. If the user
attempts to use `bowtie` without [`-C`] to align against an index that
was built with [`-C`] (or vice versa), `bowtie` prints an error message
and quits.

### Decoding colorspace alignments

Once a colorspace read is aligned, Bowtie decodes the alignment into
nucleotides and reports the decoded nucleotide sequence. A principled
decoding scheme is necessary because many different decodings are
usually possible. Finding the true decoding with 100% certainty
requires knowing all variants (e.g. SNPs) in the subject's genome
beforehand, which is usually not possible. Instead, `bowtie` employs
the approximate decoding scheme described in the [BWA paper]. This
scheme attempts to distinguish variants from sequencing errors
according to their relative likelihood under a model that considers the
quality values of the colors and the (configurable) global likelihood
of a SNP.

Quality values are also "decoded" so that each reported quality value
is a function of the two color qualities overlapping it. Bowtie again
adopts the scheme described in the [BWA paper], i.e., the decoded
nucleotide quality is either the sum of the overlapping color qualities
(when both overlapping colors correspond to bases that match in the
alignment), the quality of the matching color minus the quality of the
mismatching color (when only one of them matches), or 0 (when both
overlapping colors correspond to mismatches).

For accurate decoding, [`--snpphred`]/[`--snpfrac`] should be set according
to the user's best guess of the SNP frequency in the subject. The
[`--snpphred`] parameter sets the SNP penalty directly (on the [Phred
quality] scale), whereas [`--snpfrac`] allows the user to specify the
fraction of sites expected to be SNPs; the fraction is then converted
to a [Phred quality] internally. For the purpose of decoding, the SNP
fraction is defined in terms of SNPs per *haplotype* base. Thus, if
the genome is diploid, heterozygous SNPs have half the weight of
homozygous SNPs.

Note that in [`-S`/`--sam`] mode, the decoded nucleotide sequence is
printed for alignments, but the original color sequence (with `A`=blue,
`C`=green, `G`=orange, `T`=red) is printed for reads without any
reported alignments. As always, the [`--un`], [`--max`] and [`--al`]
parameters print reads exactly as they appeared in the input file.

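The quality-decoding cases and the SNP-fraction settings described in this section can be sketched in Python. The conversion assumes the standard Phred relation Q = -10 * log10(p); the manual does not spell out Bowtie's exact internal formula, all function names are hypothetical, and negative quality differences are not clamped here.

```python
import math

def decoded_base_quality(q_left, q_right, left_matches, right_matches):
    """Quality of a decoded base from the two color qualities overlapping it.

    left_matches/right_matches say whether each overlapping color agrees
    with the alignment. Negative differences are not clamped in this sketch.
    """
    if left_matches and right_matches:
        return q_left + q_right               # both colors match
    if left_matches or right_matches:
        good, bad = (q_left, q_right) if left_matches else (q_right, q_left)
        return good - bad                     # matching minus mismatching
    return 0                                  # both colors mismatch

def snpfrac_to_phred(fraction):
    """Phred-scaled SNP penalty for an expected SNP fraction (assumed formula)."""
    return -10 * math.log10(fraction)

def per_haplotype_snp_fraction(hom_per_base, het_per_base):
    """SNP fraction per haplotype base for a diploid genome.

    A heterozygous SNP is present on only one of the two haplotypes,
    so it counts half as much as a homozygous SNP.
    """
    return hom_per_base + het_per_base / 2

print(decoded_base_quality(20, 25, True, True))   # 45
print(decoded_base_quality(25, 20, True, False))  # 5
print(round(snpfrac_to_phred(0.001)))             # 30
```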
### Paired-end colorspace alignment

Like other platforms, SOLiD supports generation of paired-end reads.
When colorspace alignment is enabled, the default paired-end
orientation setting is [`--ff`]. This is because most SOLiD datasets
have that orientation.

Note that SOLiD-generated read files can have "orphaned" mates, i.e.
mates without a correspondingly-named mate in the other file. To avoid
problems due to orphaned mates, SOLiD paired-end output should first be
converted to `.csfastq` files with unpaired mates omitted. This can be
accomplished using, for example, [Galaxy]'s conversion tool (click
"NGS: QC and manipulation", then "SOLiD-to-FASTQ" in the left-hand
sidebar).

[Principles of Di-Base Sequencing]: http://tinyurl.com/ygnb2gn
[Decoding colorspace alignments]: #decoding-colorspace-alignments
[BWA paper]: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/25/14/1754

Performance tuning
------------------

[Performance tuning]: #performance-tuning

1. Use 64-bit bowtie if possible

    The 64-bit version of Bowtie is substantially (usually more than
    50%) faster than the 32-bit version, owing to its use of 64-bit
    arithmetic. If possible, download the 64-bit binaries for Bowtie
    and run on a 64-bit computer. If you are building Bowtie from
    sources, you may need to pass the `-m64` option to `g++` to compile
    the 64-bit version; you can do this by including `BITS=64` in the
    arguments to the `make` command, e.g. `make BITS=64 bowtie`. To
    determine whether your version of bowtie is 64-bit or 32-bit, run
    `bowtie --version`.

2. If your computer has multiple processors/cores, use `-p`

    The [`-p`] option causes Bowtie to launch a specified number of
    parallel search threads. Each thread runs on a different
    processor/core and all threads find alignments in parallel,
    increasing alignment throughput by approximately a multiple of the
    number of threads (though in practice, speedup is somewhat worse
    than linear).

3. If reporting many alignments per read, try tweaking
    `bowtie-build --offrate`

    If you are using the [`-k`], [`-a`] or [`-m`] options and Bowtie is
    reporting many alignments per read (an average of more than about
    10 per read) and you have some memory to spare, using an index with
    a denser SA (suffix array) sample can speed things up considerably.

    To do this, specify a smaller-than-default [`-o`/`--offrate`](#bowtie-build-options-o) value
    when running `bowtie-build`. A denser SA sample yields a larger
    index, but is also particularly effective at speeding up alignment
    when many alignments are reported per read. For example,
    decreasing the index's [`-o`/`--offrate`](#bowtie-build-options-o) by 1 could make alignment
    up to twice as fast, and decreasing it by 2 up to four times as
    fast, etc.

    On the other hand, decreasing [`-o`/`--offrate`](#bowtie-build-options-o) increases the size
    of the Bowtie index, both on disk and in memory when aligning
    reads. At the default [`-o`/`--offrate`](#bowtie-build-options-o) of 5, the SA sample for the
    human genome occupies about 375 MB of memory when aligning reads.
    Decreasing the [`-o`/`--offrate`](#bowtie-build-options-o) by 1 doubles the memory taken by
    the SA sample, and decreasing by 2 quadruples the memory taken,
    etc.

4. If bowtie "thrashes", try increasing `bowtie --offrate`

    If `bowtie` runs very slowly on a relatively low-memory machine
    (having less than about 4 GB of memory), then try setting `bowtie`
    [`-o`/`--offrate`] to a *larger* value than the value used to build
    the index. For example, `bowtie-build`'s default [`-o`/`--offrate`](#bowtie-build-options-o)
    is 5 and all pre-built indexes available from the Bowtie website
    are built with [`-o`/`--offrate`](#bowtie-build-options-o) 5; so if `bowtie` thrashes when
    querying such an index, try using `bowtie` [`--offrate`] 6. If
    `bowtie` still thrashes, try `bowtie` [`--offrate`] 7, etc. A higher
    [`-o`/`--offrate`] causes `bowtie` to use a sparser sample of the
    suffix array than is stored in the index; this saves memory but
    makes reporting alignments slower, especially when using [`-a`] or
    a large [`-k`] or [`-m`].

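The memory figures above follow a simple doubling rule. This sketch assumes the SA sample keeps every 2^offrate-th suffix-array entry, so each step of the offrate halves or doubles the sample; the 375 MB at offrate 5 baseline is the human-genome figure quoted above, and `sa_sample_mb` is a hypothetical name.

```python
def sa_sample_mb(offrate, base_mb=375.0, base_offrate=5):
    """Approximate SA-sample memory (MB) for the human genome at `offrate`.

    Each decrease of the offrate by 1 doubles the sample; each increase
    halves it (assuming every 2**offrate-th suffix-array entry is kept).
    """
    return base_mb * 2 ** (base_offrate - offrate)

for offrate in (3, 4, 5, 6, 7):
    print(offrate, sa_sample_mb(offrate))
# 3 1500.0
# 4 750.0
# 5 375.0
# 6 187.5
# 7 93.75
```

Under this assumption, running `bowtie` [`--offrate`] 6 against an offrate-5 index roughly halves this part of the memory footprint, at the cost of slower alignment reporting.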
    bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

The basename of the index to be searched. The basename is the name of
any of the index files up to but not including the final `.1.ebwt` /
`.rev.1.ebwt` / etc. `bowtie` looks for the specified index first in
the current directory, then in the `indexes` subdirectory under the
directory where the `bowtie` executable is located, then looks in the
directory specified in the `BOWTIE_INDEXES` environment variable.

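The index lookup order can be sketched as below. The `.1.ebwt` existence test and the use of the running script's directory in place of the `bowtie` executable's directory are assumptions of this sketch; `find_index` is a hypothetical name.

```python
import os
import sys
import tempfile

def find_index(basename, exe_dir=None, env=None):
    """Return the directory that holds index `basename`, or None.

    Search order: current directory, then the `indexes` subdirectory next
    to the executable, then $BOWTIE_INDEXES. Presence is tested via
    "<basename>.1.ebwt" (an assumption of this sketch).
    """
    env = os.environ if env is None else env
    if exe_dir is None:
        # Stand-in for the directory of the bowtie executable.
        exe_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    candidates = [os.getcwd(), os.path.join(exe_dir, "indexes")]
    if env.get("BOWTIE_INDEXES"):
        candidates.append(env["BOWTIE_INDEXES"])
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, basename + ".1.ebwt")):
            return directory
    return None

# Demo: an index that is found only through BOWTIE_INDEXES.
with tempfile.TemporaryDirectory() as idx_dir, tempfile.TemporaryDirectory() as exe:
    open(os.path.join(idx_dir, "demo_idx.1.ebwt"), "w").close()
    print(find_index("demo_idx", exe_dir=exe, env={"BOWTIE_INDEXES": idx_dir}) == idx_dir)  # True
```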
Comma-separated list of files containing the #1 mates (filename usually
includes `_1`), or, if [`-c`] is specified, the mate sequences
themselves. E.g., this might be `flyA_1.fq,flyB_1.fq`, or, if [`-c`]
is specified, this might be `GGTCATCCT,ACGGGTCGT`. Sequences specified
with this option must correspond file-for-file and read-for-read with
those specified in `<m2>`. Reads may be a mix of different lengths.
If `-` is specified, `bowtie` will read the #1 mates from the "standard
in" filehandle.

Comma-separated list of files containing the #2 mates (filename usually
includes `_2`), or, if [`-c`] is specified, the mate sequences
themselves. E.g., this might be `flyA_2.fq,flyB_2.fq`, or, if [`-c`]
is specified, this might be `GGTCATCCT,ACGGGTCGT`. Sequences specified
with this option must correspond file-for-file and read-for-read with
those specified in `<m1>`. Reads may be a mix of different lengths.
If `-` is specified, `bowtie` will read the #2 mates from the "standard
in" filehandle.

Comma-separated list of files containing a mix of unpaired and
paired-end reads in Tab-delimited format. Tab-delimited format is a
1-read-per-line format where unpaired reads consist of a read name,
sequence and quality string each separated by tabs. A paired-end read
consists of a read name, sequence of the #1 mate, quality values of the
#1 mate, sequence of the #2 mate, and quality values of the #2 mate
separated by tabs. Quality values can be expressed using any of the
scales supported in FASTQ files. Reads may be a mix of different
lengths and paired-end and unpaired reads may be intermingled in the
same file. If `-` is specified, `bowtie` will read the Tab-delimited
reads from the "standard in" filehandle.

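A parser for one line of the Tab-delimited format described above might look like this. `TabRead` and `parse_tab_line` are hypothetical names, and quality strings are kept as raw text rather than decoded to a particular scale.

```python
from typing import NamedTuple, Optional

class TabRead(NamedTuple):
    name: str
    seq1: str
    qual1: str
    seq2: Optional[str] = None   # None for unpaired reads
    qual2: Optional[str] = None

    @property
    def paired(self):
        return self.seq2 is not None

def parse_tab_line(line):
    """Parse one --12 line: 3 fields (unpaired) or 5 fields (paired-end)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (3, 5):
        raise ValueError("expected 3 or 5 tab-separated fields, got %d" % len(fields))
    return TabRead(*fields)

print(parse_tab_line("r1\tACGTTG\tIIIIII\n").paired)                  # False
print(parse_tab_line("r2\tACGTTG\tIIIIII\tTTGACC\tHHHHHH\n").paired)  # True
```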
684 A comma-separated list of files containing unpaired reads to be
685 aligned, or, if [`-c`] is specified, the unpaired read sequences
686 themselves. E.g., this might be
687 `lane1.fq,lane2.fq,lane3.fq,lane4.fq`, or, if [`-c`] is specified, this
688 might be `GGTCATCCT,ACGGGTCGT`. Reads may be a mix of different
689 lengths. If `-` is specified, Bowtie gets the reads from the "standard
698 File to write alignments to. By default, alignments are written to the
699 "standard out" filehandle (i.e. the console).
708 <tr><td id="bowtie-options-q">
710 [`-q`]: #bowtie-options-q
716 The query input files (specified either as `<m1>` and `<m2>`, or as
717 `<s>`) are FASTQ files (usually having extension `.fq` or `.fastq`).
718 This is the default. See also: [`--solexa-quals`] and
721 </td></tr><tr><td id="bowtie-options-f">
723 [`-f`]: #bowtie-options-f
729 The query input files (specified either as `<m1>` and `<m2>`, or as
730 `<s>`) are FASTA files (usually having extension `.fa`, `.mfa`, `.fna`
731 or similar). All quality values are assumed to be 40 on the [Phred
734 </td></tr><tr><td id="bowtie-options-r">
736 [`-r`]: #bowtie-options-r
742 The query input files (specified either as `<m1>` and `<m2>`, or as
743 `<s>`) are Raw files: one sequence per line, without quality values or
744 names. All quality values are assumed to be 40 on the [Phred quality]
747 </td></tr><tr><td id="bowtie-options-c">
749 [`-c`]: #bowtie-options-c
755 The query sequences are given on the command line. That is, `<m1>`, `<m2>`
756 and `<s>` are comma-separated lists of reads rather than lists of
759 </td></tr><tr><td id="bowtie-options-C">
761 [`-C`]: #bowtie-options-C
762 [`-C`/`--color`]: #bowtie-options-C
768 Align in colorspace. Read characters are interpreted as colors. The
769 index specified must be a colorspace index (i.e. built with
770 `bowtie-build` [`-C`](#bowtie-build-options-C), or `bowtie` will print an error message and quit.
771 See [Colorspace alignment] for more details.
773 </td></tr><tr><td id="bowtie-options-Q">
775 [`-Q`]: #bowtie-options-Q
776 [`-Q`/`--quals`]: #bowtie-options-Q
782 Comma-separated list of files containing quality values for
783 corresponding unpaired CSFASTA reads. Use in combination with [`-C`]
784 and [`-f`]. [`--integer-quals`] is set automatically when `-Q`/`--quals`
787 </td></tr><tr><td id="bowtie-options-Q1">
789 [`--Q1`]: #bowtie-options-Q1
795 Comma-separated list of files containing quality values for
796 corresponding CSFASTA #1 mates. Use in combination with [`-C`], [`-f`],
797 and [`-1`](#command-line). [`--integer-quals`] is set automatically when `--Q1`
800 </td></tr><tr><td id="bowtie-options-Q2">
802 [`--Q2`]: #bowtie-options-Q2
808 Comma-separated list of files containing quality values for
809 corresponding CSFASTA #2 mates. Use in combination with [`-C`], [`-f`],
810 and [`-2`](#command-line). [`--integer-quals`] is set automatically when `--Q2`
813 </td></tr><tr><td id="bowtie-options-s">
815 [`-s`/`--skip`]: #bowtie-options-s
816 [`-s`]: #bowtie-options-s
822 Skip (i.e. do not align) the first `<int>` reads or pairs in the input.
824 </td></tr><tr><td id="bowtie-options-u">
826 [`-u`/`--qupto`]: #bowtie-options-u
827 [`-u`]: #bowtie-options-u
833 Only align the first `<int>` reads or read pairs from the input (after
834 the [`-s`/`--skip`] reads or pairs have been skipped). Default: no
837 </td></tr><tr><td id="bowtie-options-5">
839 [`-5`/`--trim5`]: #bowtie-options-5
840 [`-5`]: #bowtie-options-5
846 Trim `<int>` bases from the high-quality (left) end of each read before
847 alignment (default: 0).
849 </td></tr><tr><td id="bowtie-options-3">
851 [`-3`/`--trim3`]: #bowtie-options-3
852 [`-3`]: #bowtie-options-3
858 Trim `<int>` bases from the low-quality (right) end of each read before
859 alignment (default: 0).
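
The combined effect of [`-5`] and [`-3`] on a read and its qualities can be sketched in Python as follows. `trim_read` is a hypothetical helper. Returning an empty read when the trim amounts exceed the read length is an assumption of the sketch, not documented `bowtie` behavior.

```python
from typing import Tuple


def trim_read(seq: str, qual: str, trim5: int = 0, trim3: int = 0) -> Tuple[str, str]:
    """Drop trim5 bases from the left end and trim3 bases from the right end."""
    if trim5 < 0 or trim3 < 0:
        raise ValueError("trim amounts must be non-negative")
    # Clamp so the slice never ends before it starts (assumption: over-trimming
    # leaves an empty read).
    end = max(trim5, len(seq) - trim3)
    return seq[trim5:end], qual[trim5:end]


# Similar in spirit to "-5 2 -3 3"
print(trim_read("ACGTACGTAC", "IIIIIHHHHH", trim5=2, trim3=3))  # ('GTACG', 'IIIHH')
```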
861 </td></tr><tr><td id="bowtie-options-phred33-quals">
863 [`--phred33-quals`]: #bowtie-options-phred33-quals
869 Input qualities are ASCII chars equal to the [Phred quality] plus 33.
872 </td></tr><tr><td id="bowtie-options-phred64-quals">
874 [`--phred64-quals`]: #bowtie-options-phred64-quals
880 Input qualities are ASCII chars equal to the [Phred quality] plus 64.
883 </td></tr><tr><td id="bowtie-options-solexa-quals">
885 [`--solexa-quals`]: #bowtie-options-solexa-quals
891 Convert input qualities from [Solexa][Phred quality] (which can be
892 negative) to [Phred][Phred quality] (which can't). This is usually the
893 right option for use with (unconverted) reads emitted by GA Pipeline
894 versions prior to 1.3. Default: off.
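
The usual relation between the two scales is `Q_phred = 10 * log10(10^(Q_solexa/10) + 1)`. The result is never negative, which is why converted values can't go below 0. The Python sketch below assumes rounding to the nearest integer; how `bowtie` rounds internally is not specified here. `solexa_to_phred` is a hypothetical name.

```python
import math


def solexa_to_phred(q_solexa: float) -> int:
    """Map a Solexa-scaled quality (which can be negative) to the Phred scale.

    Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1), rounded to the nearest
    integer (the rounding step is an assumption of this sketch).
    """
    return int(round(10 * math.log10(10 ** (q_solexa / 10) + 1)))


# Low Solexa values differ noticeably from Phred; high values nearly coincide.
for q in (-5, 0, 10, 40):
    print(q, solexa_to_phred(q))
```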
896 </td></tr><tr><td id="bowtie-options-solexa1.3-quals">
898 [`--solexa1.3-quals`]: #bowtie-options-solexa1.3-quals
904 Same as [`--phred64-quals`]. This is usually the right option for use
905 with (unconverted) reads emitted by GA Pipeline version 1.3 or later.
908 </td></tr><tr><td id="bowtie-options-integer-quals">
910 [`--integer-quals`]: #bowtie-options-integer-quals
916 Quality values are represented in the read input file as
917 space-separated ASCII integers, e.g., `40 40 30 40`..., rather than
918 ASCII characters, e.g., `II?I`.... Integers are treated as being on
919 the [Phred quality] scale unless [`--solexa-quals`] is also specified.
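
The Python function below shows how the input encodings relate by turning a quality field into Phred integers. It assumes the Phred (not Solexa) scale throughout. `to_phred` and its `encoding` names are hypothetical, and its default of Phred+33 is a choice of the sketch. The two examples above, `II?I` as Phred+33 characters and `40 40 30 40` as integers, decode to the same values.

```python
from typing import List


def to_phred(qual: str, encoding: str = "phred33") -> List[int]:
    """Turn a read's quality field into a list of Phred integers.

    encoding: "phred33" (ASCII char = Phred + 33), "phred64" (Phred + 64),
    or "integer" (space-separated numbers, as with --integer-quals).
    """
    if encoding == "integer":
        values = [int(tok) for tok in qual.split()]
    elif encoding in ("phred33", "phred64"):
        offset = 33 if encoding == "phred33" else 64
        values = [ord(ch) - offset for ch in qual]
    else:
        raise ValueError(f"unknown encoding: {encoding}")
    if any(v < 0 for v in values):
        raise ValueError("Phred qualities cannot be negative; wrong encoding?")
    return values


print(to_phred("II?I") == to_phred("40 40 30 40", "integer"))  # True
```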
928 <tr><td id="bowtie-options-v">
930 [`-v`]: #bowtie-options-v
936 Report alignments with at most `<int>` mismatches. [`-e`] and [`-l`]
937 options are ignored and quality values have no effect on what
938 alignments are valid. [`-v`] is mutually exclusive with [`-n`].
940 </td></tr><tr><td id="bowtie-options-n">
942 [`-n`/`--seedmms`]: #bowtie-options-n
943 [`-n`]: #bowtie-options-n
949 Maximum number of mismatches permitted in the "seed", i.e. the first
950 `L` base pairs of the read (where `L` is set with [`-l`/`--seedlen`]).
951 This may be 0, 1, 2 or 3 and the default is 2. This option is mutually
952 exclusive with the [`-v`] option.
954 </td></tr><tr><td id="bowtie-options-e">
956 [`-e`/`--maqerr`]: #bowtie-options-e
957 [`-e`]: #bowtie-options-e
963 Maximum permitted total of quality values at *all* mismatched read
964 positions throughout the entire alignment, not just in the "seed". The
965 default is 70. Like [Maq], `bowtie` rounds quality values to the
966 nearest 10 and saturates at 30; rounding can be disabled with
969 </td></tr><tr><td id="bowtie-options-l">
971 [`-l`/`--seedlen`]: #bowtie-options-l
972 [`-l`]: #bowtie-options-l
978 The "seed length"; i.e., the number of bases on the high-quality end of
979 the read to which the [`-n`] ceiling applies. The lowest permitted
980 setting is 5 and the default is 28. `bowtie` is faster for larger
983 </td></tr><tr><td id="bowtie-options-nomaqround">
985 [`--nomaqround`]: #bowtie-options-nomaqround
991 [Maq] accepts quality values in the [Phred quality] scale, but
992 internally rounds values to the nearest 10, with a maximum of 30. By
993 default, `bowtie` also rounds this way. [`--nomaqround`] prevents this
994 rounding in `bowtie`.
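
The Python sketch below checks a single candidate alignment against [`-v`], [`-n`], [`-l`], [`-e`] and [`--nomaqround`] as they are described in this section. It is a simplified model, not `bowtie`'s search. Mismatch positions are 0-based offsets from the read's high-quality (5') end, and the helper names are hypothetical. Rounding ties such as 15 upward is an assumption.

```python
from typing import List


def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 and saturate at 30, Maq-style.

    Ties such as 15 are rounded up here; that tie-breaking is an assumption.
    """
    return min(30, (q + 5) // 10 * 10)


def valid_v_mode(mismatch_pos: List[int], v: int) -> bool:
    """-v: only the total number of mismatches matters; qualities are ignored."""
    return len(mismatch_pos) <= v


def valid_n_mode(mismatch_pos: List[int], quals: List[int],
                 n: int = 2, l: int = 28, e: int = 70,
                 maqround: bool = True) -> bool:
    """-n/-l/-e: cap seed mismatches and the quality sum at all mismatches.

    mismatch_pos holds 0-based offsets counted from the read's high-quality
    (5') end; quals holds the read's Phred qualities in the same orientation.
    """
    seed_mms = sum(1 for p in mismatch_pos if p < l)  # mismatches in the seed
    if seed_mms > n:
        return False
    # The quality ceiling covers the whole alignment, not just the seed.
    penalty = sum((maq_round(quals[p]) if maqround else quals[p])
                  for p in mismatch_pos)
    return penalty <= e


quals = [40] * 36
print(valid_n_mode([3, 10], quals))                  # True: 30 + 30 = 60 <= 70
print(valid_n_mode([3, 10], quals, maqround=False))  # False: 40 + 40 = 80 > 70
```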
996 </td></tr><tr><td id="bowtie-options-I">
998 [`-I`/`--minins`]: #bowtie-options-I
999 [`-I`]: #bowtie-options-I
1005 The minimum insert size for valid paired-end alignments. E.g. if `-I
1006 60` is specified and a paired-end alignment consists of two 20-bp
1007 alignments in the appropriate orientation with a 20-bp gap between
1008 them, that alignment is considered valid (as long as [`-X`] is also
1009 satisfied). A 19-bp gap would not be valid in that case. If trimming
1010 options [`-3`] or [`-5`] are also used, the [`-I`] constraint is
1011 applied with respect to the untrimmed mates. Default: 0.
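
Here is a minimal Python sketch of the insert-size arithmetic used in the example above. The same arithmetic applies to the matching [`-X`] example, and both bounds are inclusive, as those examples imply. Coordinates are 0-based, lengths are those of the untrimmed mates, and the helper names are hypothetical.

```python
def insert_size(up_start: int, down_start: int, down_len: int) -> int:
    """Span from the leftmost base of the upstream mate to the rightmost base
    of the downstream mate (0-based starts, untrimmed mate lengths)."""
    return down_start + down_len - up_start


def insert_ok(size: int, min_ins: int, max_ins: int) -> bool:
    """Both the -I (minimum) and -X (maximum) bounds are inclusive."""
    return min_ins <= size <= max_ins


# Two 20-bp mates with a 20-bp gap: 20 + 20 + 20 = 60.
print(insert_size(0, 40, 20))  # 60
```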
1013 </td></tr><tr><td id="bowtie-options-X">
1015 [`-X`/`--maxins`]: #bowtie-options-X
1016 [`-X`]: #bowtie-options-X
1022 The maximum insert size for valid paired-end alignments. E.g. if `-X
1023 100` is specified and a paired-end alignment consists of two 20-bp
1024 alignments in the proper orientation with a 60-bp gap between them,
1025 that alignment is considered valid (as long as [`-I`] is also
1026 satisfied). A 61-bp gap would not be valid in that case. If trimming
1027 options [`-3`] or [`-5`] are also used, the `-X` constraint is applied
1028 with respect to the untrimmed mates, not the trimmed mates. Default:
1031 </td></tr><tr><td id="bowtie-options-fr">
1033 [`--fr`/`--rf`/`--ff`]: #bowtie-options-fr
1034 [`--fr`]: #bowtie-options-fr
1035 [`--rf`]: #bowtie-options-fr
1036 [`--ff`]: #bowtie-options-fr
1042 The upstream/downstream mate orientations for a valid paired-end
1043 alignment against the forward reference strand. E.g., if `--fr` is
1044 specified and there is a candidate paired-end alignment where mate1
1045 appears upstream of the reverse complement of mate2 and the insert
1046 length constraints are met, that alignment is valid. Also, if mate2
1047 appears upstream of the reverse complement of mate1 and all other
1048 constraints are met, that too is valid. `--rf` likewise requires that
1049 an upstream mate1 be reverse-complemented and a downstream mate2 be
1050 forward-oriented. `--ff` requires both an upstream mate1 and a
1051 downstream mate2 to be forward-oriented. Default: `--fr` when [`-C`]
1052 (colorspace alignment) is not specified, `--ff` when [`-C`] is specified.
1054 </td></tr><tr><td id="bowtie-options-nofw">
1056 [`--nofw`]: #bowtie-options-nofw
1062 If `--nofw` is specified, `bowtie` will not attempt to align against
1063 the forward reference strand. If `--norc` is specified, `bowtie` will
1064 not attempt to align against the reverse-complement reference strand.
1065 For paired-end reads using [`--fr`] or [`--rf`] modes, `--nofw` and
1066 `--norc` apply to the forward and reverse-complement pair orientations.
1067 I.e. specifying `--nofw` and [`--fr`] will only find reads in the R/F
1068 orientation where mate 2 occurs upstream of mate 1 with respect to the
1069 forward reference strand.
1071 </td></tr><tr><td id="bowtie-options-maxbts">
1073 [`--maxbts`]: #bowtie-options-maxbts
1079 The maximum number of backtracks permitted when aligning a read in
1080 [`-n`] 2 or [`-n`] 3 mode (default: 125 without [`--best`], 800 with
1081 [`--best`]). A "backtrack" is the introduction of a speculative
1082 substitution into the alignment. Without this limit, the default
1083 parameters will sometimes require that `bowtie` try 100s or 1,000s of
1084 backtracks to align a read, especially if the read has many low-quality
1085 bases and/or has no valid alignments, slowing bowtie down
1086 significantly. However, this limit may cause some valid alignments to
1087 be missed. Higher limits yield greater sensitivity at the expense of
1088 longer running times. See also: [`-y`/`--tryhard`].
1090 </td></tr><tr><td id="bowtie-options-pairtries">
1092 [`--pairtries`]: #bowtie-options-pairtries
1098 For paired-end alignment, this is the maximum number of attempts
1099 `bowtie` will make to match an alignment for one mate up with an
1100 alignment for the opposite mate. Most paired-end alignments require
1101 only a few such attempts, but pairs where both mates occur in highly
1102 repetitive regions of the reference can require significantly more.
1103 Setting this to a higher number allows `bowtie` to find more paired-end
1104 alignments for repetitive pairs at the expense of speed. The
1105 default is 100. See also: [`-y`/`--tryhard`].
1107 </td></tr><tr><td id="bowtie-options-y">
1109 [`-y`/`--tryhard`]: #bowtie-options-y
1110 [`-y`]: #bowtie-options-y
1116 Try as hard as possible to find valid alignments when they exist,
1117 including paired-end alignments. This is equivalent to specifying very
1118 high values for the [`--maxbts`] and [`--pairtries`] options. This
1119 mode is generally much slower than the default settings, but can be
1120 useful for certain problems. This mode is slower when (a) the
1121 reference is very repetitive, (b) the reads are low quality, or (c) not
1122 many reads have valid alignments.
1124 </td></tr><tr><td id="bowtie-options-chunkmbs">
1126 [`--chunkmbs`]: #bowtie-options-chunkmbs
1132 The number of megabytes of memory each thread is allotted for storing path
1133 descriptors in [`--best`] mode. Best-first search must keep track of
1134 many paths at once to ensure it is always extending the path with the
1135 lowest cumulative cost. Bowtie tries to minimize the memory impact of
1136 the descriptors, but they can still grow very large in some cases. If
1137 you receive an error message saying that chunk memory has been
1138 exhausted in [`--best`] mode, try adjusting this parameter up to
1139 dedicate more memory to the descriptors. Default: 64.
1145 <table><tr><td id="bowtie-options-k">
1147 [`-k`]: #bowtie-options-k
1153 Report up to `<int>` valid alignments per read or pair (default: 1).
1154 Validity of alignments is determined by the alignment policy (combined
1155 effects of [`-n`], [`-v`], [`-l`], and [`-e`]). If more than one valid
1156 alignment exists and the [`--best`] and [`--strata`] options are
1157 specified, then only those alignments belonging to the best alignment
1158 "stratum" will be reported. Bowtie is designed to be very fast for
1159 small [`-k`] but bowtie can become significantly slower as [`-k`]
1160 increases. If you would like to use Bowtie for larger values of
1161 [`-k`], consider building an index with a denser suffix-array sample,
1162 i.e. specify a smaller [`-o`/`--offrate`](#bowtie-build-options-o) when invoking `bowtie-build`
1163 for the relevant index (see the [Performance tuning] section for
1166 </td></tr><tr><td id="bowtie-options-a">
1168 [`-a`/`--all`]: #bowtie-options-a
1169 [`-a`]: #bowtie-options-a
1175 Report all valid alignments per read or pair (default: off). Validity
1176 of alignments is determined by the alignment policy (combined effects
1177 of [`-n`], [`-v`], [`-l`], and [`-e`]). If more than one valid alignment
1178 exists and the [`--best`] and [`--strata`] options are specified, then only
1179 those alignments belonging to the best alignment "stratum" will be
1180 reported. Bowtie is designed to be very fast for small [`-k`] but bowtie
1181 can become significantly slower if [`-a`/`--all`] is specified. If you
1182 would like to use Bowtie with [`-a`], consider building an index with a
1183 denser suffix-array sample, i.e. specify a smaller [`-o`/`--offrate`](#bowtie-build-options-o)
1184 when invoking `bowtie-build` for the relevant index (see the
1185 [Performance tuning] section for details).
1187 </td></tr><tr><td id="bowtie-options-m">
1189 [`-m`]: #bowtie-options-m
1195 Suppress all alignments for a particular read or pair if more than
1196 `<int>` reportable alignments exist for it. Reportable alignments are
1197 those that would be reported given the [`-n`], [`-v`], [`-l`], [`-e`], [`-k`],
1198 [`-a`], [`--best`], and [`--strata`] options. Default: no limit. Bowtie is
1199 designed to be very fast for small [`-m`] but bowtie can become
1200 significantly slower for larger values of [`-m`]. If you would like to
1201 use Bowtie for larger values of [`-m`], consider building an index with a
1202 denser suffix-array sample, i.e. specify a smaller [`-o`/`--offrate`](#bowtie-build-options-o) when
1203 invoking `bowtie-build` for the relevant index (see the [Performance
1204 tuning] section for details).
1206 </td></tr><tr><td id="bowtie-options-M">
1208 [`-M`]: #bowtie-options-M
1214 Behaves like [`-m`] except that if a read has more than `<int>`
1215 reportable alignments, one is reported at random. In [default
1216 output mode], the selected alignment's 7th column is set to `<int>`+1 to
1217 indicate the read has at least `<int>`+1 valid alignments. In
1218 [`-S`/`--sam`] mode, the selected alignment is given a `MAPQ` (mapping
1219 quality) of 0 and the `XM:I` field is set to `<int>`+1. This option
1220 requires [`--best`]; if specified without [`--best`], [`--best`] is enabled
1223 [default output mode]: #default-bowtie-output
1225 </td></tr><tr><td id="bowtie-options-best">
1227 [`--best`]: #bowtie-options-best
1233 Make Bowtie guarantee that reported singleton alignments are "best" in
1234 terms of stratum (i.e. number of mismatches, or mismatches in the seed
1235 in the case of [`-n`] mode) and in terms of the quality values at the
1236 mismatched position(s). Stratum always trumps quality; e.g. a
1237 1-mismatch alignment where the mismatched position has [Phred quality]
1238 40 is preferred over a 2-mismatch alignment where the mismatched
1239 positions both have [Phred quality] 10. When [`--best`] is not
1240 specified, Bowtie may report alignments that are sub-optimal in terms
1241 of stratum and/or quality (though an effort is made to report the best
1242 alignment). [`--best`] mode also removes all strand bias. Note that
1243 [`--best`] does not affect which alignments are considered "valid" by
1244 `bowtie`, only which valid alignments are reported by `bowtie`. When
1245 [`--best`] is specified and multiple hits are allowed (via [`-k`] or
1246 [`-a`]), the alignments for a given read are guaranteed to appear in
1247 best-to-worst order in `bowtie`'s output. `bowtie` is somewhat slower
1248 when [`--best`] is specified.
1250 </td></tr><tr><td id="bowtie-options-strata">
1252 [`--strata`]: #bowtie-options-strata
1258 If many valid alignments exist and are reportable (e.g. are not
1259 disallowed via the [`-k`] option) and they fall into more than one
1260 alignment "stratum", report only those alignments that fall into the
1261 best stratum. By default, Bowtie reports all reportable alignments
1262 regardless of whether they fall into multiple strata. When
1263 [`--strata`] is specified, [`--best`] must also be specified.
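
The Python sketch below models how [`-k`], [`--best`] and [`--strata`] narrow down the set of valid alignments. It is a conceptual sketch, not Bowtie's implementation. It simplifies a stratum to the mismatch count, ignores the quality tie-breaking that [`--best`] also performs, and treats `k=None` as [`-a`]. `Hit` and `report` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    ref: str
    offset: int
    mismatches: int  # stands in for the stratum in this simplified model


def report(hits: List[Hit], k: Optional[int] = 1,
           best: bool = False, strata: bool = False) -> List[Hit]:
    """Pick the valid hits to report; k=None plays the role of -a."""
    if strata and not best:
        raise ValueError("--strata requires --best")
    if best:
        # Best-to-worst order; sorted() is stable, so ties keep input order.
        hits = sorted(hits, key=lambda h: h.mismatches)
    if strata and hits:
        top = hits[0].mismatches
        hits = [h for h in hits if h.mismatches == top]
    return hits[:k]  # hits[:None] keeps everything


hits = [Hit("chr1", 100, 2), Hit("chr2", 5, 1), Hit("chr1", 900, 1)]
print(report(hits, k=None, best=True, strata=True))  # only the 1-mismatch hits
```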
1272 <tr><td id="bowtie-options-t">
1274 [`-t`/`--time`]: #bowtie-options-t
1275 [`-t`]: #bowtie-options-t
1281 Print the amount of wall-clock time taken by each phase.
1283 </td></tr><tr><td id="bowtie-options-B">
1285 [`-B`/`--offbase`]: #bowtie-options-B
1286 [`-B`]: #bowtie-options-B
1292 When outputting alignments, number the first base of a reference
1293 sequence as `<int>`. Default: 0.
1295 </td></tr><tr><td id="bowtie-options-quiet">
1297 [`--quiet`]: #bowtie-options-quiet
1303 Print nothing besides alignments.
1305 </td></tr><tr><td id="bowtie-options-refout">
1307 [`--refout`]: #bowtie-options-refout
1313 Write alignments to a set of files named `refXXXXX.map`, where `XXXXX`
1314 is the 0-padded index of the reference sequence aligned to. This can
1315 be a useful way to break up work for downstream analyses when dealing
1316 with, for example, large numbers of reads aligned to the assembled
1317 human genome. If `<hits>` is also specified, it will be ignored.
1319 </td></tr><tr><td id="bowtie-options-refidx">
1321 [`--refidx`]: #bowtie-options-refidx
1327 When a reference sequence is referred to in a reported alignment, refer
1328 to it by 0-based index (its offset into the list of references that
1329 were indexed) rather than by name.
1331 </td></tr><tr><td id="bowtie-options-al">
1333 [`--al`]: #bowtie-options-al
1339 Write all reads for which at least one alignment was reported to a file
1340 with name `<filename>`. Written reads will appear as they did in the
1341 input, without any of the trimming or translation of quality values
1342 that may have taken place within `bowtie`. Paired-end reads will be
1343 written to two parallel files with `_1` and `_2` inserted in the
1344 filename, e.g., if `<filename>` is `aligned.fq`, the #1 and #2 mates
1345 that align will be written to `aligned_1.fq` and `aligned_2.fq`
1348 </td></tr><tr><td id="bowtie-options-un">
1350 [`--un`]: #bowtie-options-un
1356 Write all reads that could not be aligned to a file with name
1357 `<filename>`. Written reads will appear as they did in the input,
1358 without any of the trimming or translation of quality values that may
1359 have taken place within Bowtie. Paired-end reads will be written to
1360 two parallel files with `_1` and `_2` inserted in the filename, e.g.,
1361 if `<filename>` is `unaligned.fq`, the #1 and #2 mates that fail to
1362 align will be written to `unaligned_1.fq` and `unaligned_2.fq`
1363 respectively. Unless [`--max`] is also specified, reads with a number
1364 of valid alignments exceeding the limit set with the [`-m`] option are
1365 also written to `<filename>`.
1367 </td></tr><tr><td id="bowtie-options-max">
1369 [`--max`]: #bowtie-options-max
1375 Write all reads with a number of valid alignments exceeding the limit
1376 set with the [`-m`] option to a file with name `<filename>`. Written
1377 reads will appear as they did in the input, without any of the trimming
1378 or translation of quality values that may have taken place within
1379 `bowtie`. Paired-end reads will be written to two parallel files with
1380 `_1` and `_2` inserted in the filename, e.g., if `<filename>` is
1381 `max.fq`, the #1 and #2 mates that exceed the [`-m`] limit will be
1382 written to `max_1.fq` and `max_2.fq` respectively. These reads are not
1383 written to the file specified with [`--un`].
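
The Python sketch below models the routing rules for [`--al`], [`--un`] and [`--max`], and the `_1`/`_2` naming for paired-end output. The helpers are hypothetical, and the sketch covers only reads whose alignments were suppressed by [`-m`]. It also assumes the suffix goes immediately before the last extension, which may not match `bowtie` for names with several dots or no extension.

```python
import os
from typing import Optional, Tuple


def mate_filenames(filename: str) -> Tuple[str, str]:
    """Insert _1 and _2 before the extension: unaligned.fq -> unaligned_1.fq."""
    root, ext = os.path.splitext(filename)
    return f"{root}_1{ext}", f"{root}_2{ext}"


def destination(n_reported: int, exceeded_m: bool,
                al: Optional[str] = None, un: Optional[str] = None,
                max_: Optional[str] = None) -> Optional[str]:
    """Return the --al/--un/--max file a read goes to, or None if none applies."""
    if exceeded_m:
        # Suppressed by -m: goes to --max if given, otherwise falls back to --un.
        return max_ if max_ is not None else un
    if n_reported > 0:
        return al
    return un


print(mate_filenames("aligned.fq"))              # ('aligned_1.fq', 'aligned_2.fq')
print(destination(0, True, un="unaligned.fq"))   # unaligned.fq
```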
1385 </td></tr><tr><td id="bowtie-options-suppress">
1387 [`--suppress`]: #bowtie-options-suppress
1393 Suppress columns of output in the [default output mode]. E.g. if
1394 `--suppress 1,5,6` is specified, the read name, read sequence, and read
1395 quality fields will be omitted. See [Default Bowtie output] for field
1396 descriptions. This option is ignored if the output mode is
1400 <tr><td id="bowtie-options-fullref">
1402 [`--fullref`]: #bowtie-options-fullref
1408 Print the full reference sequence name, including whitespace, in
1409 alignment output. By default `bowtie` prints everything up to but not
1410 including the first whitespace.
1417 <tr><td id="bowtie-options-snpphred">
1419 [`--snpphred`]: #bowtie-options-snpphred
1425 When decoding colorspace alignments, use `<int>` as the SNP penalty.
1426 This should be set to the user's best guess of the true ratio of SNPs
1427 per base in the subject genome, converted to the [Phred quality] scale.
1428 E.g., if the user expects about 1 SNP every 1,000 positions,
1429 `--snpphred` should be set to 30 (which is also the default). To
1430 specify the fraction directly, use [`--snpfrac`].
1433 <tr><td id="bowtie-options-snpfrac">
1435 [`--snpfrac`]: #bowtie-options-snpfrac
1441 When decoding colorspace alignments, use `<dec>` as the estimated ratio
1442 of SNPs per base. For best decoding results, this should be set to the
1443 user's best guess of the true ratio. `bowtie` internally converts the
1444 ratio to a [Phred quality], and behaves as if that quality had been set
1445 via the [`--snpphred`] option. Default: 0.001.
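
The two options are related by the [Phred quality] transform `Q = -10 * log10(p)`, which is why a ratio of 0.001 corresponds to 30. The helper names in this Python sketch are hypothetical. Whether `bowtie` rounds the converted value to an integer is not covered here.

```python
import math


def frac_to_phred(frac: float) -> float:
    """--snpfrac value -> equivalent --snpphred value: Q = -10 * log10(p)."""
    if not 0 < frac <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return -10 * math.log10(frac)


def phred_to_frac(q: float) -> float:
    """Inverse transform: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


print(round(frac_to_phred(0.001), 6))  # 30.0 (about 1 SNP per 1,000 bases)
```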
1448 <tr><td id="bowtie-options-col-cseq">
1450 [`--col-cseq`]: #bowtie-options-col-cseq
1456 If reads are in colorspace and the [default output mode] is active,
1457 `--col-cseq` causes the reads' color sequence to appear in the
1458 read-sequence column (column 5) instead of the decoded nucleotide
1459 sequence. See the [Decoding colorspace alignments] section for details
1460 about decoding. This option is ignored in [`-S`/`--sam`] mode.
1463 <tr><td id="bowtie-options-col-cqual">
1465 [`--col-cqual`]: #bowtie-options-col-cqual
1471 If reads are in colorspace and the [default output mode] is active,
1472 `--col-cqual` causes the reads' original (color) quality sequence to
1473 appear in the quality column (column 6) instead of the decoded
1474 qualities. See the [Colorspace alignment] section for details about
1475 decoding. This option is ignored in [`-S`/`--sam`] mode.
1478 <tr><td id="bowtie-options-col-keepends">
1480 [`--col-keepends`]: #bowtie-options-col-keepends
1486 When decoding colorspace alignments, `bowtie` trims off a nucleotide
1487 and quality from the left and right edges of the alignment. This is
1488 because those nucleotides are supported by only one color, in contrast
1489 to the middle nucleotides which are supported by two. Specify
1490 `--col-keepends` to keep the extreme-end nucleotides and qualities.
1499 <tr><td id="bowtie-options-S">
1501 [`-S`/`--sam`]: #bowtie-options-S
1502 [`-S`]: #bowtie-options-S
1508 Print alignments in [SAM] format. See the [SAM output] section of the
1509 manual for details. To suppress all SAM headers, use [`--sam-nohead`]
1510 in addition to `-S/--sam`. To suppress just the `@SQ` headers (e.g. if
1511 the alignment is against a very large number of reference sequences),
1512 use [`--sam-nosq`] in addition to `-S/--sam`. `bowtie` does not write
1513 BAM files directly, but SAM output can be converted to BAM on the fly
1514 by piping `bowtie`'s output to `samtools view`. [`-S`/`--sam`] is not
1515 compatible with [`--refout`].
1517 [SAM output]: #sam-bowtie-output
1519 </td></tr><tr><td id="bowtie-options-mapq">
1521 [`--mapq`]: #bowtie-options-mapq
1527 If an alignment is non-repetitive (according to [`-m`], [`--strata`] and
1528 other options) set the `MAPQ` (mapping quality) field to this value.
1529 See the [SAM Spec][SAM] for details about the `MAPQ` field. Default: 255.
1531 </td></tr><tr><td id="bowtie-options-sam-nohead">
1533 [`--sam-nohead`]: #bowtie-options-sam-nohead
1539 Suppress header lines (starting with `@`) when output is [`-S`/`--sam`].
1540 This must be specified *in addition to* [`-S`/`--sam`]. `--sam-nohead`
1541 is ignored unless [`-S`/`--sam`] is also specified.
1543 </td></tr><tr><td id="bowtie-options-sam-nosq">
1545 [`--sam-nosq`]: #bowtie-options-sam-nosq
1551 Suppress `@SQ` header lines when output is [`-S`/`--sam`]. This must be
1552 specified *in addition to* [`-S`/`--sam`]. `--sam-nosq` is ignored
1553 unless [`-S`/`--sam`] is also specified.
1555 </td></tr><tr><td id="bowtie-options-sam-RG">
1557 [`--sam-RG`]: #bowtie-options-sam-RG
1563 Add `<text>` (usually of the form `TAG:VAL`, e.g. `ID:IL7LANE2`) as a
1564 field on the `@RG` header line. Specify `--sam-RG` multiple times to
1565 set multiple fields. See the [SAM Spec][SAM] for details about what fields
1566 are legal. Note that, if any `@RG` fields are set using this option,
1567 the `ID` and `SM` fields must both be among them to make the `@RG` line
1568 legal according to the [SAM Spec][SAM]. `--sam-RG` is ignored unless
1569 [`-S`/`--sam`] is also specified.
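
This Python sketch builds the `@RG` line that results from a set of [`--sam-RG`] values and applies the `ID`/`SM` requirement described above. `rg_header` is a hypothetical helper, and `SM:fly` is a made-up sample name. Raising an error on a missing field is this sketch's choice; the text above says only that such a line would not be legal SAM.

```python
from typing import List


def rg_header(fields: List[str]) -> str:
    """Join --sam-RG style TAG:VAL tokens into a tab-separated @RG line."""
    tags = {f.split(":", 1)[0] for f in fields}
    missing = sorted({"ID", "SM"} - tags)
    if missing:
        # This check is the sketch's own, not necessarily bowtie's behavior.
        raise ValueError(f"@RG needs both ID and SM; missing: {', '.join(missing)}")
    return "\t".join(["@RG"] + fields)


print(rg_header(["ID:IL7LANE2", "SM:fly"]))
```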
1577 <td id="bowtie-options-o">
1579 [`-o`/`--offrate`]: #bowtie-options-o
1580 [`-o`]: #bowtie-options-o
1581 [`--offrate`]: #bowtie-options-o
1587 Override the offrate of the index with `<int>`. If `<int>` is greater
1588 than the offrate used to build the index, then some row markings are
1589 discarded when the index is read into memory. This reduces the memory
1590 footprint of the aligner but requires more time to calculate text
1591 offsets. `<int>` must be greater than the value used to build the
1594 </td></tr><tr><td id="bowtie-options-p">
1596 [`-p`/`--threads`]: #bowtie-options-p
1597 [`-p`]: #bowtie-options-p
1603 Launch `<int>` parallel search threads (default: 1). Threads will run
1604 on separate processors/cores and synchronize when parsing reads and
1605 outputting alignments. Searching for alignments is highly parallel,
1606 and speedup is fairly close to linear. This option is only available
1607 if `bowtie` is linked with the `pthreads` library (i.e. if
1608 `BOWTIE_PTHREADS=0` is not specified at build time).
1610 </td></tr><tr><td id="bowtie-options-mm">
1612 [`--mm`]: #bowtie-options-mm
1618 Use memory-mapped I/O to load the index, rather than normal C file I/O.
1619 Memory-mapping the index allows many concurrent `bowtie` processes on
1620 the same computer to share the same memory image of the index (i.e. you
1621 pay the memory overhead just once). This facilitates memory-efficient
1622 parallelization of `bowtie` in situations where using [`-p`] is not
1625 </td></tr><tr><td id="bowtie-options-shmem">
1627 [`--shmem`]: #bowtie-options-shmem
1633 Use shared memory to load the index, rather than normal C file I/O.
1634 Using shared memory allows many concurrent bowtie processes on the same
1635 computer to share the same memory image of the index (i.e. you pay the
1636 memory overhead just once). This facilitates memory-efficient
1637 parallelization of `bowtie` in situations where using [`-p`] is not
1638 desirable. Unlike [`--mm`], `--shmem` installs the index into shared
1639 memory permanently, or until the user deletes the shared memory chunks
1640 manually. See your operating system documentation for details on how
1641 to manually list and remove shared memory chunks (on Linux and Mac OS
1642 X, these commands are `ipcs` and `ipcrm`). You may also need to
1643 increase your OS's maximum shared-memory chunk size to accommodate
1644 larger indexes; see your OS documentation.
1650 <table><tr><td id="bowtie-options-seed">
1652 [`--seed`]: #bowtie-options-seed
1658 Use `<int>` as the seed for the pseudo-random number generator.
1660 </td></tr><tr><td id="bowtie-options-verbose">
1662 [`--verbose`]: #bowtie-options-verbose
1668 Print verbose output (for debugging).
1670 </td></tr><tr><td id="bowtie-options-version">
1672 [`--version`]: #bowtie-options-version
1678 Print version information and quit.
1680 </td></tr><tr><td id="bowtie-options-h">
1686 Print usage information and quit.
1690 Default `bowtie` output
1691 -----------------------
1693 [Default Bowtie output]: #default-bowtie-output
1695 `bowtie` outputs one alignment per line. Each line is a collection of
1696 8 fields separated by tabs; from left to right, the fields are:
1698 1. Name of read that aligned
1700 2. Reference strand aligned to, `+` for forward strand, `-` for
1703 3. Name of reference sequence where alignment occurs, or numeric ID if
1704 no name was provided
1706 4. 0-based offset into the forward reference strand where leftmost
1707 character of the alignment occurs
1709 5. Read sequence (reverse-complemented if orientation is `-`).
1711 If the read was in colorspace, then the sequence shown in this
1712 column is the sequence of *decoded nucleotides*, not the original
1713 colors. See the [Colorspace alignment] section for details about
1714 decoding. To display colors instead, use the [`--col-cseq`] option.
1716 6. ASCII-encoded read qualities (reversed if orientation is `-`). The
1717 encoded quality values are on the Phred scale and the encoding is
1718 ASCII-offset by 33 (ASCII char `!`).
1720 If the read was in colorspace, then the qualities shown in this
1721 column are the *decoded qualities*, not the original qualities.
1722 See the [Colorspace alignment] section for details about decoding.
1723 To display colors instead, use the [`--col-cqual`] option.
1725 7. If [`-M`] was specified and the prescribed ceiling was exceeded for
1726 this read, this column contains the value of the ceiling,
1727 indicating that at least that many valid alignments were found in
1728 addition to the one reported.
1730 Otherwise, this column contains the number of other instances where
1731 the same sequence aligned against the same reference characters as
1732 were aligned against in the reported alignment. This is *not* the
1733 number of other places the read aligns with the same number of
1734 mismatches. The number in this column is generally not a good
1735 proxy for that number (e.g., the number in this column may be '0'
1736 while the number of other alignments with the same number of
1737 mismatches might be large).
1739 8. Comma-separated list of mismatch descriptors. If there are no
1740 mismatches in the alignment, this field is empty. A single
1741 descriptor has the format offset:reference-base>read-base. The
1742 offset is expressed as a 0-based offset from the high-quality (5')
1748 Following is a brief description of the [SAM] format as output by
1749 `bowtie` when the [`-S`/`--sam`] option is specified. For more
1750 details, see the [SAM format specification][SAM].
1752 When [`-S`/`--sam`] is specified, `bowtie` prints a SAM header with
1753 `@HD`, `@SQ` and `@PG` lines. When one or more [`--sam-RG`] arguments
1754 are specified, `bowtie` will also print an `@RG` line that includes all
1755 user-specified [`--sam-RG`] tokens separated by tabs.
1757 Each subsequent line corresponds to a read or an alignment. Each line
1758 is a collection of at least 12 fields separated by tabs; from left to
1759 right, the fields are:
1761 1. Name of read that aligned
1763 2. Sum of all applicable flags. Flags relevant to Bowtie are:
1771 The read is one of a pair
1779 The alignment is one end of a proper paired-end alignment
1787 The read has no reported alignments
1795 The read is one of a pair and has no reported alignments
1803 The alignment is to the reverse reference strand
1811 The other mate in the paired-end alignment is aligned to the
1812 reverse reference strand
1820 The read is the first (#1) mate in a pair
1828 The read is the second (#2) mate in a pair
Thus, an unpaired read that aligns to the reverse reference strand
will have flag 16. A paired-end read that is the first mate in the
pair and aligns to the reverse reference strand as one end of a proper
paired-end alignment will have flag 83 (= 64 + 16 + 2 + 1).
1836 3. Name of reference sequence where alignment occurs, or ordinal ID
1837 if no name was provided
4. 1-based offset into the forward reference strand where the leftmost
character of the alignment occurs
1844 6. CIGAR string representation of alignment
1846 7. Name of reference sequence where mate's alignment occurs. Set to
1847 `=` if the mate's reference sequence is the same as this
1848 alignment's, or `*` if there is no mate.
8. 1-based offset into the forward reference strand where the leftmost
character of the mate's alignment occurs. Offset is 0 if there is
1854 9. Inferred insert size. Size is negative if the mate's alignment
1855 occurs upstream of this alignment. Size is 0 if there is no mate.
1857 10. Read sequence (reverse-complemented if aligned to the reverse
1860 11. ASCII-encoded read qualities (reverse-complemented if the read
1861 aligned to the reverse strand). The encoded quality values are on
1862 the [Phred quality] scale and the encoding is ASCII-offset by 33
1863 (ASCII char `!`), similarly to a [FASTQ] file.
1865 12. Optional fields. Fields are tab-separated. For descriptions of
1866 all possible optional fields, see the SAM format specification.
1867 `bowtie` outputs some of these optional fields for each alignment,
1868 depending on the type of the alignment:
1876 Aligned read has an edit distance of `<N>`.
1884 Aligned read has an edit distance of `<N>` in colorspace. This
1885 field is present in addition to the `NM` field in [`-C`/`--color`]
1886 mode, but is omitted otherwise.
1894 For aligned reads, `<S>` is a string representation of the
1895 mismatched reference bases in the alignment. See [SAM] format
1896 specification for details. For colorspace alignments, `<S>`
1897 describes the decoded *nucleotide* alignment, not the colorspace
1906 Aligned read belongs to stratum `<N>`. See [Strata] for definition.
1916 For a read with no reported alignments, `<N>` is 0 if the read had
1917 no alignments. If [`-m`] was specified and the read's alignments
were suppressed because the [`-m`] ceiling was exceeded, `<N>` equals
1919 the [`-m`] ceiling + 1, to indicate that there were at least that
1920 many valid alignments (but all were suppressed). In [`-M`] mode, if
1921 the alignment was randomly selected because the [`-M`] ceiling was
1922 exceeded, `<N>` equals the [`-M`] ceiling + 1, to indicate that there
1923 were at least that many valid alignments (of which one was reported
1928 [SAM format specification]: http://samtools.sf.net/SAM1.pdf
1929 [FASTQ]: http://en.wikipedia.org/wiki/FASTQ_format
1930 [`-S`/`--sam`]: #bowtie-options-S
1931 [`-m`]: #bowtie-options-m
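
As an illustration of how the flag, position, CIGAR and quality fields fit together, the following Python sketch pulls them out of a single SAM line. It is not part of Bowtie. The helper names (`decode_flag`, `reference_span`, `phred33`, `describe_sam_record`) and the example record are made up for this example; the flag bit values and the CIGAR operation letters follow the SAM specification, and bit 8 is described using that specification's wording (mate unmapped).

```python
import re

# Flag bits as defined by the SAM specification.
SAM_FLAGS = {
    1: "read is one of a pair",
    2: "one end of a proper paired-end alignment",
    4: "read has no reported alignments",
    8: "mate is unmapped",
    16: "aligned to the reverse reference strand",
    32: "mate aligned to the reverse reference strand",
    64: "first (#1) mate in a pair",
    128: "second (#2) mate in a pair",
}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_flag(flag):
    """Return the flag bits that are set, in ascending order."""
    return [bit for bit in SAM_FLAGS if flag & bit]


def reference_span(cigar):
    """Number of reference characters covered by a CIGAR string."""
    # M, D, N, = and X consume reference characters; I, S, H and P do not.
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def phred33(quals):
    """Decode ASCII-encoded qualities (offset 33, so '!' means 0)."""
    return [ord(c) - 33 for c in quals]


def describe_sam_record(line):
    """Hypothetical helper: pull a few fields out of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    pos = int(fields[3])                 # field 4: 1-based leftmost position
    span = reference_span(fields[5])     # field 6: CIGAR string
    return {
        "name": fields[0],
        "flags": [SAM_FLAGS[b] for b in decode_flag(flag)],
        "ref": fields[2],
        "start_0based": pos - 1,
        "end_0based_exclusive": pos - 1 + span,
        "quals": phred33(fields[10]),    # field 11: qualities
        "optional": fields[11:],         # field 12 onward: optional fields
    }


# A made-up record for illustration only.
record = "r1\t16\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
print(describe_sam_record(record))
print(decode_flag(83))  # [1, 2, 16, 64], i.e. 64 + 16 + 2 + 1
```

For the record above, flag 16 decodes to the single reverse-strand bit and the `4M` CIGAR string covers 4 reference characters, so the alignment occupies 0-based positions 99 through 102.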
1933 The `bowtie-build` indexer
1934 ==========================
1936 `bowtie-build` builds a Bowtie index from a set of DNA sequences.
1937 `bowtie-build` outputs a set of 6 files with suffixes
1938 `.1.ebwt`, `.2.ebwt`, `.3.ebwt`, `.4.ebwt`, `.rev.1.ebwt`, and
1939 `.rev.2.ebwt`. These files together constitute the index: they are all
1940 that is needed to align reads to that reference. The original sequence
1941 files are no longer used by Bowtie once the index is built.
1943 Use of Karkkainen's [blockwise algorithm] allows `bowtie-build` to
1944 trade off between running time and memory usage. `bowtie-build` has
1945 three options governing how it makes this trade: [`-p`/`--packed`],
1946 [`--bmax`]/[`--bmaxdivn`], and [`--dcv`]. By default, `bowtie-build` will
1947 automatically search for the settings that yield the best
1948 running time without exhausting memory. This behavior can be disabled
1949 using the [`-a`/`--noauto`] option.
1951 The indexer provides options pertaining to the "shape" of the index,
1952 e.g. [`--offrate`](#bowtie-build-options-o) governs the fraction of [Burrows-Wheeler] rows that
1953 are "marked" (i.e., the density of the suffix-array sample; see the
1954 original [FM Index] paper for details). All of these options are
1955 potentially profitable trade-offs depending on the application. They
1956 have been set to defaults that are reasonable for most cases according
1957 to our experiments. See [Performance Tuning] for details.
1959 Because `bowtie-build` uses 32-bit pointers internally, it can handle
1960 up to a theoretical maximum of 2^32-1 (somewhat more than 4 billion)
characters in an index; in practice, other constraints make the actual
ceiling somewhat lower. If your reference exceeds 2^32-1
1963 characters, `bowtie-build` will print an error message and abort. To
1964 resolve this, divide your reference sequences into smaller batches
1965 and/or chunks and build a separate index for each.
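
The size check described above can be scripted before running `bowtie-build`. This is a rough sketch under stated assumptions: it counts every non-header character in the FASTA input against the theoretical 2^32-1 ceiling (the real ceiling is somewhat lower), and `fasta_lengths` and `split_into_batches` are hypothetical helpers that group whole sequences greedily. Splitting a single over-long sequence into chunks is left out.

```python
MAX_INDEX_CHARS = 2**32 - 1  # theoretical ceiling; the practical one is lower


def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        else:
            length += len(line)   # counts every character, including Ns
    if name is not None:
        yield name, length


def split_into_batches(records, limit=MAX_INDEX_CHARS):
    """Greedily group whole sequences so no batch exceeds `limit` characters."""
    batches, current, total = [], [], 0
    for name, length in records:
        if length > limit:
            raise ValueError(f"{name} alone exceeds the limit; split it into chunks")
        if current and total + length > limit:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += length
    if current:
        batches.append(current)
    return batches


# Tiny in-memory example with an artificially small limit.
fasta = [">chr1", "ACGTACGT", "ACGT", ">chr2", "GGGG", ">chr3", "TTTTTT"]
records = list(fasta_lengths(fasta))
print(records)                                # [('chr1', 12), ('chr2', 4), ('chr3', 6)]
print(split_into_batches(records, limit=16))  # [['chr1', 'chr2'], ['chr3']]
```

Each resulting batch could then be written to its own FASTA file and indexed separately.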
1967 If your computer has more than 3-4 GB of memory and you would like to
1968 exploit that fact to make index building faster, use a 64-bit version
1969 of the `bowtie-build` binary. The 32-bit version of the binary is
1970 restricted to using less than 4 GB of memory. If a 64-bit pre-built
binary does not yet exist for your platform on the SourceForge download
1972 site, you will need to build one from source.
1974 The Bowtie index is based on the [FM Index] of Ferragina and Manzini,
1975 which in turn is based on the [Burrows-Wheeler] transform. The
1976 algorithm used to build the index is based on the [blockwise algorithm]
1979 [Blockwise algorithm]: http://portal.acm.org/citation.cfm?id=1314852
1980 [FM Index]: http://portal.acm.org/citation.cfm?id=796543
1981 [Burrows-Wheeler]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
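
To make the Burrows-Wheeler and FM Index connection concrete, here is a toy Python sketch of the transform and of FM Index backward search. It illustrates the general technique only: it builds the transform by sorting all rotations (far too slow and memory-hungry for a genome, which is why `bowtie-build` uses the blockwise algorithm instead), it counts character occurrences by scanning rather than with the precomputed tables a real index stores, and the function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform by sorting all rotations of text + '$'."""
    text += "$"                      # terminator; must not occur in text
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_matches(last, pattern):
    """FM Index backward search: how many times pattern occurs in the text."""
    # C[ch]: number of characters in the text that sort before ch.
    c_table, running = {}, 0
    for ch in sorted(set(last)):
        c_table[ch] = running
        running += last.count(ch)
    top, bottom = 0, len(last)       # current range of BWT rows [top, bottom)
    for ch in reversed(pattern):     # extend the match one character leftward
        if ch not in c_table:
            return 0
        # last[:i].count(ch) plays the role of the occurrence table Occ(ch, i).
        top = c_table[ch] + last[:top].count(ch)
        bottom = c_table[ch] + last[:bottom].count(ch)
        if top >= bottom:
            return 0
    return bottom - top


last = bwt("BANANA")
print(last)                         # ANNB$AA
print(count_matches(last, "ANA"))   # 2
```

Because `ANA` overlaps itself in `BANANA`, the count of 2 includes both occurrences. Reporting *where* they occur additionally needs the marked rows controlled by [`--offrate`](#bowtie-build-options-o).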
1988 bowtie-build [options]* <reference_in> <ebwt_base>
1998 A comma-separated list of FASTA files containing the reference
1999 sequences to be aligned to, or, if [`-c`](#bowtie-build-options-c) is specified, the sequences
2000 themselves. E.g., `<reference_in>` might be
2001 `chr1.fa,chr2.fa,chrX.fa,chrY.fa`, or, if [`-c`](#bowtie-build-options-c) is specified, this might
2002 be `GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA`.
2010 The basename of the index files to write. By default, `bowtie-build`
2011 writes files named `NAME.1.ebwt`, `NAME.2.ebwt`, `NAME.3.ebwt`,
2012 `NAME.4.ebwt`, `NAME.rev.1.ebwt`, and `NAME.rev.2.ebwt`, where `NAME`
2025 The reference input files (specified as `<reference_in>`) are FASTA
2026 files (usually having extension `.fa`, `.mfa`, `.fna` or similar).
2028 </td></tr><tr><td id="bowtie-build-options-c">
The reference sequences are given on the command line. That is,
`<reference_in>` is a comma-separated list of sequences rather than a
2036 list of FASTA files.
2038 </td></tr><tr><td id="bowtie-build-options-C">
2044 Build a colorspace index, to be queried using `bowtie` [`-C`].
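
For readers unfamiliar with colorspace, the sketch below shows the two-base (SOLiD) encoding that colorspace reads use: each adjacent pair of nucleotides becomes one of four colors, computed here as the XOR of the 2-bit codes A=0, C=1, G=2, T=3. It illustrates the encoding only. It does not show how `bowtie-build` lays out a colorspace index, and the function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colors(seq):
    """Encode a nucleotide string as colors 0-3, one per adjacent base pair."""
    seq = seq.upper()
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def from_colors(first_base, colors):
    """Decode colors back to nucleotides, given the first base."""
    bases = [first_base.upper()]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


print(to_colors("ACGT"))          # 131
print(from_colors("A", "131"))    # ACGT
```

Because each color depends on two bases, a single-base change in the sequence alters two adjacent colors, and decoding requires knowing the first base.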
2046 </td></tr><tr><td id="bowtie-build-options-a">
2048 [`-a`/`--noauto`]: #bowtie-build-options-a
2054 Disable the default behavior whereby `bowtie-build` automatically
2055 selects values for the [`--bmax`], [`--dcv`] and [`--packed`] parameters
according to available memory. Instead, the user may specify values for
2057 those parameters. If memory is exhausted during indexing, an error
2058 message will be printed; it is up to the user to try new parameters.
2060 </td></tr><tr><td id="bowtie-build-options-p">
2062 [`--packed`]: #bowtie-build-options-p
2063 [`-p`/`--packed`]: #bowtie-build-options-p
2069 Use a packed (2-bits-per-nucleotide) representation for DNA strings.
2070 This saves memory but makes indexing 2-3 times slower. Default: off.
2071 This is configured automatically by default; use [`-a`/`--noauto`] to
2074 </td></tr><tr><td id="bowtie-build-options-bmax">
2076 [`--bmax`]: #bowtie-build-options-bmax
2082 The maximum number of suffixes allowed in a block. Allowing more
2083 suffixes per block makes indexing faster, but increases peak memory
2084 usage. Setting this option overrides any previous setting for
[`--bmax`] or [`--bmaxdivn`]. Default (in terms of the [`--bmaxdivn`]
2086 parameter) is [`--bmaxdivn`] 4. This is configured automatically by
2087 default; use [`-a`/`--noauto`] to configure manually.
2089 </td></tr><tr><td id="bowtie-build-options-bmaxdivn">
2091 [`--bmaxdivn`]: #bowtie-build-options-bmaxdivn
2097 The maximum number of suffixes allowed in a block, expressed as a
2098 fraction of the length of the reference. Setting this option overrides
any previous setting for [`--bmax`] or [`--bmaxdivn`]. Default:
2100 [`--bmaxdivn`] 4. This is configured automatically by default; use
2101 [`-a`/`--noauto`] to configure manually.
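
The override rule can be modeled as "the last setting on the command line wins". The sketch below assumes that [`--bmaxdivn`] `<d>` means a block size of the reference length divided by `<d>` (integer division), and it ignores the automatic tuning that applies unless [`-a`/`--noauto`] is given. `effective_bmax` is a hypothetical helper, not `bowtie-build`'s argument parser.

```python
def effective_bmax(ref_len, args):
    """Block-size ceiling after applying --bmax/--bmaxdivn in command-line order."""
    bmax = ref_len // 4                           # documented default: --bmaxdivn 4
    tokens = iter(args)
    for opt in tokens:
        if opt == "--bmax":
            bmax = int(next(tokens))              # absolute number of suffixes
        elif opt == "--bmaxdivn":
            bmax = ref_len // int(next(tokens))   # fraction of reference length
    return bmax


print(effective_bmax(1000, []))                                     # 250
print(effective_bmax(1000, ["--bmax", "100", "--bmaxdivn", "2"]))   # 500
print(effective_bmax(1000, ["--bmaxdivn", "2", "--bmax", "100"]))   # 100
```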
2103 </td></tr><tr><td id="bowtie-build-options-dcv">
2105 [`--dcv`]: #bowtie-build-options-dcv
2111 Use `<int>` as the period for the difference-cover sample. A larger
2112 period yields less memory overhead, but may make suffix sorting slower,
2113 especially if repeats are present. Must be a power of 2 no greater
2114 than 4096. Default: 1024. This is configured automatically by
2115 default; use [`-a`/`--noauto`] to configure manually.
2117 </td></tr><tr><td id="bowtie-build-options-nodc">
2119 [`--nodc`]: #bowtie-build-options-nodc
2125 Disable use of the difference-cover sample. Suffix sorting becomes
2126 quadratic-time in the worst case (where the worst case is an extremely
2127 repetitive reference). Default: off.
2135 Do not build the `NAME.3.ebwt` and `NAME.4.ebwt` portions of the index,
2136 which contain a bitpacked version of the reference sequences and are
2137 used for paired-end alignment.
2145 Build *only* the `NAME.3.ebwt` and `NAME.4.ebwt` portions of the index,
2146 which contain a bitpacked version of the reference sequences and are
2147 used for paired-end alignment.
2149 </td></tr><tr><td id="bowtie-build-options-o">
2155 To map alignments back to positions on the reference sequences, it's
2156 necessary to annotate ("mark") some or all of the [Burrows-Wheeler]
2157 rows with their corresponding location on the genome. [`-o`/`--offrate`](#bowtie-build-options-o)
2158 governs how many rows get marked: the indexer will mark every 2^`<int>`
2159 rows. Marking more rows makes reference-position lookups faster, but
2160 requires more memory to hold the annotations at runtime. The default
is 5 (every 32nd row is marked; for the human genome, annotations occupy
2162 about 340 megabytes).
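
The trade-off can be seen in a toy model: each unmarked row is resolved by repeatedly applying the LF mapping (which steps one character to the left in the text) until a marked row is reached, so marking fewer rows saves memory but lengthens those walks. This sketch marks rows 0, 2^offrate, 2 × 2^offrate, and so on, as the paragraph above describes; it builds the suffix array by plain sorting and uses hypothetical names. It is not Bowtie's on-disk layout or lookup code.

```python
def build_sampled_index(text, offrate):
    """Toy index: last BWT column, C table, and marked rows -> text offsets."""
    text += "$"                                   # unique, smallest terminator
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    last = "".join(text[i - 1] for i in sa)       # text[-1] is '$' when i == 0
    c_table, running = {}, 0
    for ch in sorted(set(text)):
        c_table[ch] = running                     # count of smaller characters
        running += text.count(ch)
    step = 1 << offrate                           # mark every 2**offrate-th row
    marked = {row: sa[row] for row in range(0, len(sa), step)}
    return last, c_table, marked


def locate(row, last, c_table, marked):
    """Text offset of a BWT row, found by walking LF to the nearest marked row."""
    steps = 0
    while row not in marked:
        ch = last[row]
        row = c_table[ch] + last[:row].count(ch)  # LF: step one character left
        steps += 1
    return (marked[row] + steps) % len(last), steps


last, c_table, marked = build_sampled_index("BANANA", offrate=1)
print(len(marked), "of", len(last), "rows marked")     # 4 of 7 rows marked
print([locate(r, last, c_table, marked)[0] for r in range(len(last))])
# [6, 5, 3, 1, 0, 4, 2]
```

Assuming 4-byte offsets, the marked rows take roughly 4 × n / 2^`<int>` bytes for a reference of n characters, so each increment of `<int>` halves that memory while roughly doubling the expected walk length.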
2166 -t/--ftabchars <int>
2170 The ftab is the lookup table used to calculate an initial
2171 [Burrows-Wheeler] range with respect to the first `<int>` characters
2172 of the query. A larger `<int>` yields a larger lookup table but faster
2173 query times. The ftab has size 4^(`<int>`+1) bytes. The default
2174 setting is 10 (ftab is 4MB).
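
The size formula is consistent with one table entry per possible `<int>`-character DNA prefix (4^`<int>` entries) at 4 bytes each, since 4^`<int>` × 4 = 4^(`<int>`+1). The sketch below checks that arithmetic for a few settings and shows one way a prefix could be mapped to a table slot. The 4-byte entries and the base-4 slot numbering are assumptions for illustration, and the names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def ftab_bytes(ftabchars=10):
    """Table size quoted above: 4^(ftabchars + 1) bytes."""
    return 4 ** (ftabchars + 1)


def ftab_slot(prefix):
    """Hypothetical slot for a query prefix, read as a base-4 number."""
    slot = 0
    for base in prefix.upper():
        slot = slot * 4 + CODE[base]
    return slot


for chars in (8, 10, 12):
    print(chars, ftab_bytes(chars) / 2**20, "MiB")   # 0.25, 4.0 and 64.0 MiB
print(ftab_slot("ACGT"))   # 27
```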
2176 </td></tr><tr><td id="bowtie-build-options-ntoa">
2182 Convert Ns in the reference sequence to As before building the index.
2183 By default, Ns are simply excluded from the index and `bowtie` will not
2184 report alignments that overlap them.
2186 </td></tr><tr><td id="bowtie-build-options-big-little">
2192 Endianness to use when serializing integers to the index file.
2193 Default: little-endian (recommended for Intel- and AMD-based
2196 </td></tr><tr><td id="bowtie-build-options-seed">
Use `<int>` as the seed for the pseudo-random number generator.
2210 Index only the first `<int>` bases of the reference sequences
2211 (cumulative across sequences) and ignore the rest.
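
A small sketch of what "cumulative across sequences" means: one budget of bases is shared by all sequences in input order, so a later sequence may be truncated or left out entirely. The helper name is hypothetical, and the sketch assumes that every character counts toward the budget and that sequences past the cutoff are omitted rather than kept as empty records.

```python
def apply_cutoff(sequences, cutoff):
    """Keep only the first `cutoff` bases, counted cumulatively across sequences."""
    kept, remaining = [], cutoff
    for seq in sequences:
        if remaining <= 0:
            break                        # budget used up; ignore the rest
        kept.append(seq[:remaining])
        remaining -= len(kept[-1])
    return kept


print(apply_cutoff(["ACGT", "GGCC", "TT"], 6))   # ['ACGT', 'GG']
```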
`bowtie-build` is verbose by default. With this option, `bowtie-build`
2220 will print only error messages.
2228 Print usage information and quit.
2236 Print version information and quit.
2240 The `bowtie-inspect` index inspector
2241 ====================================
2243 `bowtie-inspect` extracts information from a Bowtie index about what
2244 kind of index it is and what reference sequences were used to build it.
2245 When run without any options, the tool will output a FASTA file
2246 containing the sequences of the original references (with all
2247 non-`A`/`C`/`G`/`T` characters converted to `N`s). It can also be used
2248 to extract just the reference sequence names using the [`-n`/`--names`]
2249 option or a more verbose summary using the [`-s`/`--summary`] option.
2256 bowtie-inspect [options]* <ebwt_base>
The basename of the index to be inspected. The basename is the name of
any of the index files but with the `.X.ebwt` or `.rev.X.ebwt` suffix
2268 omitted. `bowtie-inspect` first looks in the current directory for the
2269 index files, then looks in the `indexes` subdirectory under the
2270 directory where the currently-running `bowtie` executable is located,
2271 then looks in the directory specified in the `BOWTIE_INDEXES`
2272 environment variable.
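
The search order above can be sketched in Python as follows. The helpers are hypothetical: they approximate "the directory where the currently-running executable is located" with a `PATH` lookup via `shutil.which`, and they treat an index as found when `<basename>.1.ebwt` exists in a directory. The basename `e_coli` in the usage line is only an example.

```python
import os
import shutil

# The six files that make up a complete index.
SUFFIXES = (".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
            ".rev.1.ebwt", ".rev.2.ebwt")


def candidate_dirs(executable="bowtie"):
    """Directories to search, in the documented order."""
    dirs = [os.getcwd()]                                   # 1. current directory
    exe = shutil.which(executable)
    if exe:                                                # 2. <exe dir>/indexes
        dirs.append(os.path.join(os.path.dirname(os.path.realpath(exe)), "indexes"))
    env_dir = os.environ.get("BOWTIE_INDEXES")
    if env_dir:                                            # 3. $BOWTIE_INDEXES
        dirs.append(env_dir)
    return dirs


def find_index(basename, dirs):
    """Return the first path prefix whose .1.ebwt file exists, or None."""
    for directory in dirs:
        prefix = os.path.join(directory, basename)
        if os.path.exists(prefix + ".1.ebwt"):
            return prefix
    return None


def missing_files(prefix):
    """List the index files that are absent for a given prefix."""
    return [prefix + s for s in SUFFIXES if not os.path.exists(prefix + s)]


prefix = find_index("e_coli", candidate_dirs())
print(prefix, missing_files(prefix) if prefix else "not found")
```

An index built without its `.3.ebwt` and `.4.ebwt` portions would legitimately report those two files as missing.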
2284 When printing FASTA output, output a newline character every `<int>`
2285 bases (default: 60).
2287 </td></tr><tr><td id="bowtie-build-options-n">
2289 [`-n`/`--names`]: #bowtie-build-options-n
2295 Print reference sequence names, one per line, and quit.
2297 </td></tr><tr><td id="bowtie-inspect-options-s">
2299 [`-s`/`--summary`]: #bowtie-inspect-options-s
2305 Print a summary that includes information about index settings, as well
2306 as the names and lengths of the input sequences. The summary has this
2310 SA-Sample 1 in <sample>
2312 Sequence-1 <name> <len>
2313 Sequence-2 <name> <len>
2315 Sequence-N <name> <len>
2317 Fields are separated by tabs.
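
A summary in this format is easy to consume from a script, for example using the text printed by `bowtie-inspect -s <ebwt_base>`. The sketch below uses a hypothetical `parse_summary` helper and a made-up input. It reads only the `SA-Sample` and `Sequence-N` lines, assumes the sample rate follows "1 in" in a tab-separated field as in the layout above, and ignores all other lines.

```python
def parse_summary(text):
    """Extract the SA sample rate and (name, length) pairs from a summary."""
    result = {"sa_sample": None, "sequences": []}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "SA-Sample" and len(fields) >= 2:
            # Value looks like "1 in <sample>"; keep the number after "in".
            result["sa_sample"] = int(fields[1].split()[-1])
        elif fields[0].startswith("Sequence-") and len(fields) >= 3:
            result["sequences"].append((fields[1], int(fields[2])))
    return result


example = "SA-Sample\t1 in 32\nSequence-1\tchr1\t1000\nSequence-2\tchr2\t500\n"
summary = parse_summary(example)
print(summary["sa_sample"], summary["sequences"])          # 32 [('chr1', 1000), ('chr2', 500)]
print(sum(length for _, length in summary["sequences"]))   # 1500
```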
2319 </td></tr><tr><td id="bowtie-inspect-options-e">
2321 [`-e`/`--ebwt-ref`]: #bowtie-inspect-options-e
By default, when `bowtie-inspect` is run without [`-s`/`--summary`] or [`-n`/`--names`], it
2328 recreates the reference nucleotide sequences using the bit-encoded
2329 reference nucleotides kept in the `.3.ebwt` and `.4.ebwt` index files.
2330 When `-e/--ebwt-ref` is specified, `bowtie-inspect` recreates the
2331 reference sequences from the Burrows-Wheeler-transformed reference
2332 sequence in the `.1.ebwt` file instead. The reference recreation
2333 process is much slower when `-e/--ebwt-ref` is specified. Also, when
2334 `-e/--ebwt-ref` is specified and the index is in colorspace, the
2335 reference is printed in colors (A=blue, C=green, G=orange, T=red).
2343 Print verbose output (for debugging).
2351 Print version information and quit.
2359 Print usage information and quit.