MANUAL

   1
   2 What is Bowtie?
   3 ===============
   4
   5 [Bowtie] is an ultrafast, memory-efficient short read aligner geared
   6 toward quickly aligning large sets of short DNA sequences (reads) to
   7 large genomes. It aligns 35-base-pair reads to the human genome at a
   8 rate of 25 million reads per hour on a typical workstation. Bowtie
   9 indexes the genome with a [Burrows-Wheeler] index to keep its memory
  10 footprint small: for the human genome, the index is typically about
  11 2.2 GB (for unpaired alignment) or 2.9 GB (for paired-end or colorspace
  12 alignment).  Multiple processors can be used simultaneously to achieve
  13 greater alignment speed.  Bowtie can also output alignments in the
  14 standard [SAM] format, allowing Bowtie to interoperate with other tools
  15 supporting SAM, including the [SAMtools] consensus, SNP, and indel
  16 callers.  Bowtie runs on the command line under Windows, Mac OS X,
  17 Linux, and Solaris.
  18
  19 [Bowtie] also forms the basis for other tools, including [TopHat]: a
  20 fast splice junction mapper for RNA-seq reads, [Cufflinks]: a tool for
  21 transcriptome assembly and isoform quantitation from RNA-seq reads,
  22 [Crossbow]: a cloud-computing software tool for large-scale
  23 resequencing data, and [Myrna]: a cloud computing tool for calculating
  24 differential gene expression in large RNA-seq datasets.
  25
  26 If you use [Bowtie] for your published research, please cite the
  27 [Bowtie paper].
  28
  29 [Bowtie]:          http://bowtie-bio.sf.net
  30 [Burrows-Wheeler]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
  31 [SAM]:             http://samtools.sourceforge.net/SAM1.pdf
  32 [SAMtools]:        http://samtools.sourceforge.net/
  33 [TopHat]:          http://tophat.cbcb.umd.edu/
  34 [Cufflinks]:       http://cufflinks.cbcb.umd.edu/
  35 [Crossbow]:        http://bowtie-bio.sf.net/crossbow
  36 [Myrna]:           http://bowtie-bio.sf.net/myrna
  37 [Bowtie paper]:    http://genomebiology.com/2009/10/3/R25
  38
  39 What isn't Bowtie?
  40 ==================
  41
  42 Bowtie is not a general-purpose alignment tool like [MUMmer], [BLAST]
  43 or [Vmatch].  Bowtie works best when aligning short reads to large
  44 genomes, though it supports arbitrarily small reference sequences (e.g.
  45 amplicons) and reads as long as 1024 bases.  Bowtie is designed to be
  46 extremely fast for sets of short reads where (a) many of the reads have
  47 at least one good, valid alignment, (b) many of the reads are
  48 relatively high-quality, and (c) the number of alignments reported per
  49 read is small (close to 1).
  50
  51 Bowtie does not yet report gapped alignments; this is future work.
  52
  53 [MUMmer]: http://mummer.sourceforge.net/
  54 [BLAST]:  http://blast.ncbi.nlm.nih.gov/Blast.cgi
  55 [Vmatch]: http://www.vmatch.de/
  56
  57 Obtaining Bowtie
  58 ================
  59
  60 You may download either Bowtie sources or binaries for your platform
  61 from the [Download] section of the Sourceforge project site.  Binaries
  62 are currently available for Intel architectures (`i386` and `x86_64`)
  63 running Linux, Windows, and Mac OS X.
  64
  65 Building from source
  66 --------------------
  67
  68 Building Bowtie from source requires a GNU-like environment that
  69 includes GCC, GNU Make and other basics.  It should be possible to
  70 build Bowtie on a vanilla Linux or Mac installation.  Bowtie can also
  71 be built on Windows using [Cygwin] or [MinGW].  We recommend
  72 [TDM's MinGW Build].  If using [MinGW], you must also have [MSYS]
  73 installed.
  74
  75 To build Bowtie, extract the sources, change to the extracted
  76 directory, and run GNU `make` (usually with the command `make`, but
  77 sometimes with `gmake`) with no arguments.  If building with [MinGW],
  78 run `make` from the [MSYS] command line.
  79
  80 To support the `-p` (multithreading) option, Bowtie needs the
  81 `pthreads` library.  To compile Bowtie without `pthreads` (which
  82 disables `-p`), use `make BOWTIE_PTHREADS=0`.
  83
  84 [Cygwin]:   http://www.cygwin.com/
  85 [MinGW]:    http://www.mingw.org/
  86 [TDM's MinGW Build]: http://www.tdragon.net/recentgcc/
  87 [MSYS]:     http://www.mingw.org/wiki/msys
  88 [Download]: https://sourceforge.net/projects/bowtie-bio/files/bowtie/
  89
  90 The `bowtie` aligner
  91 ====================
  92
  93 `bowtie` takes an index and a set of reads as input and outputs a list
  94 of alignments.  Alignments are selected according to a combination of
  95 the `-v`/`-n`/`-e`/`-l` options (plus the `-I`/`-X`/`--fr`/`--rf`/
  96 `--ff` options for paired-end alignment), which define which alignments
  97 are legal, and the `-k`/`-a`/`-m`/`-M`/`--best`/`--strata` options
  98 which define which and how many legal alignments should be reported.
  99
 100 By default, Bowtie enforces an alignment policy similar to [Maq]'s
 101 default quality-aware policy (`-n` 2 `-l` 28 `-e` 70).  See [the -n
 102 alignment mode] section of the manual for details about this mode.  But
 103 Bowtie can also enforce a simpler end-to-end k-difference policy (e.g.
 104 with `-v` 2).  See [the -v alignment mode] section of the manual for
 105 details about that mode.  [The -n alignment mode] and [the -v alignment
 106 mode] are mutually exclusive.
 107
 108 Bowtie works best when aligning short reads to large genomes (e.g.
 109 human or mouse), though it supports arbitrarily small reference
 110 sequences and reads as long as 1024 bases.  Bowtie is designed to be
 111 very fast for sets of short reads where a) many reads have at least one
 112 good, valid alignment, b) many reads are relatively high-quality, and c)
 113 the number of alignments reported per read is small (close to 1).
 114 These criteria are generally satisfied in the context of modern
 115 short-read analyses such as RNA-seq, ChIP-seq, other types of -seq, and
 116 mammalian resequencing.  You may observe longer running times in other
 117 research contexts.
 118
 119 If `bowtie` is too slow for your application, try some of the
 120 performance-tuning hints described in the [Performance Tuning] section
 121 below.
 122
 123 Alignments involving one or more ambiguous reference characters (`N`,
 124 `-`, `R`, `Y`, etc.) are considered invalid by Bowtie.  This is true
 125 only for ambiguous characters in the reference; alignments involving
 126 ambiguous characters in the read are legal, subject to the alignment
 127 policy.  Ambiguous characters in the read mismatch all other
 128 characters.  Alignments that "fall off" the reference sequence are not
 129 considered valid.
 130
 131 The process by which `bowtie` chooses an alignment to report is
 132 randomized in order to avoid "mapping bias" - the phenomenon whereby
 133 an aligner systematically fails to report a particular class of good
 134 alignments, causing spurious "holes" in the comparative assembly.
 135 Whenever `bowtie` reports a subset of the valid alignments that exist,
 136 it makes an effort to sample them randomly.  This randomness flows
 137 from a simple seeded pseudo-random number generator and is
 138 deterministic in the sense that Bowtie will always produce the same
 139 results for the same read when run with the same initial "seed" value
 140 (see `--seed` option).
 141
 142 In the default mode, `bowtie` can exhibit strand bias.  Strand bias
 143 occurs when input reference and reads are such that (a) some reads
 144 align equally well to sites on the forward and reverse strands of the
 145 reference, and (b) the number of such sites on one strand is different
 146 from the number on the other strand.  When this happens for a given
 147 read, `bowtie` effectively chooses one strand or the other with 50%
 148 probability, then reports a randomly-selected alignment for that read
 149 from among the sites on the selected strand.  This tends to overassign
 150 alignments to the sites on the strand with fewer sites and underassign
 151 to sites on the strand with more sites.  The effect is mitigated,
 152 though it may not be eliminated, when reads are longer or when
 153 paired-end reads are used.  Running Bowtie in `--best` mode
 154 eliminates strand bias by forcing Bowtie to select one strand or the
 155 other with a probability that is proportional to the number of best
 156 sites on the strand.
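
As a rough illustration of the strand-bias paragraph above, the sketch below computes the per-site reporting probability under the two selection rules the text describes. The function names and the example site counts are hypothetical, and the code models only the stated rules, not Bowtie's actual search.

```python
from fractions import Fraction


def default_mode_site_probs(n_fwd, n_rev):
    """Default mode: pick a strand with probability 1/2, then pick a site
    uniformly on that strand.  Both counts must be at least 1."""
    half = Fraction(1, 2)
    return {"fwd": half / n_fwd, "rev": half / n_rev}


def best_mode_site_probs(n_fwd, n_rev):
    """--best mode: pick a strand with probability proportional to its
    number of best sites, then pick a site uniformly on that strand."""
    total = n_fwd + n_rev
    return {
        "fwd": Fraction(n_fwd, total) / n_fwd,
        "rev": Fraction(n_rev, total) / n_rev,
    }


if __name__ == "__main__":
    # Hypothetical read with 1 forward-strand site and 3 reverse-strand sites.
    print(default_mode_site_probs(1, 3))
    print(best_mode_site_probs(1, 3))
```

With 1 forward site and 3 reverse sites, the default rule gives the lone forward site probability 1/2 and each reverse site 1/6, whereas the proportional rule gives every site 1/4.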
 157
 158 Gapped alignments are not currently supported, but support is planned
 159 for a future release.
 160
 161 [Maq]: http://maq.sf.net
 162
 163 The `-n` alignment mode
 164 -----------------------
 165
 166 When the `-n` option is specified (which is the default), `bowtie`
 167 determines which alignments are valid according to the following
 168 policy, which is similar to [Maq]'s default policy.
 169
 170   1. Alignments may have no more than `N` mismatches (where `N` is a
 171      number 0-3, set with `-n`) in the first `L` bases (where `L` is a
 172      number 5 or greater, set with `-l`) on the high-quality (left) end
 173      of the read.  The first `L` bases are called the "seed".
 174
 175   2. The sum of the [Phred quality] values at *all* mismatched positions
 176      (not just in the seed) may not exceed `E` (set with `-e`).  Where
 177      qualities are unavailable (e.g. if the reads are from a FASTA
 178      file), the [Phred quality] defaults to 40.
 179
 180 The `-n` option is mutually exclusive with the `-v` option.
 181
 182 If there are many possible alignments satisfying these criteria, Bowtie
 183 gives preference to alignments with fewer mismatches and where the sum
 184 from criterion 2 is smaller.  When the `--best` option is specified,
 185 Bowtie guarantees the reported alignment(s) are "best" in terms of
 186 these criteria (criterion 1 has priority), and that the alignments are
 187 reported in best-to-worst order.  Bowtie is somewhat slower when
 188 `--best` is specified.
 189
 190 Note that [Maq] internally rounds base qualities to the nearest 10 and
 191 rounds qualities greater than 30 to 30.  To maintain compatibility,
 192 Bowtie does the same.  Rounding can be suppressed with the
 193 `--nomaqround` option.
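
The two criteria and the Maq-style rounding described above can be sketched roughly as follows. This is an illustration only, not Bowtie's implementation: the function names are hypothetical, mismatch positions are assumed to be 0-based offsets from the left end of the read, and the handling of ties when rounding (e.g. a quality of 25) is an assumption the text does not settle.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10 and cap it at 30.

    Assumption: ties such as 25 round up.
    """
    return min(30, ((q + 5) // 10) * 10)


def is_valid_n_mode(mismatch_positions, quals, n=2, l=28, e=70,
                    maq_rounding=True):
    """Return True if an alignment satisfies criteria 1 and 2.

    mismatch_positions: 0-based offsets from the high-quality (left) end.
    quals: one Phred quality per read position (40 everywhere for FASTA).
    n, l, e: the values of -n, -l and -e (defaults taken from the text).
    """
    # Criterion 1: at most n mismatches within the first l bases (the seed).
    seed_mismatches = sum(1 for p in mismatch_positions if p < l)
    if seed_mismatches > n:
        return False
    # Criterion 2: summed quality at *all* mismatched positions is at most e.
    penalty = 0
    for p in mismatch_positions:
        penalty += maq_round(quals[p]) if maq_rounding else quals[p]
    return penalty <= e
```

Under these assumptions, a FASTA read (quality 40 everywhere) with two mismatches costs 30 + 30 = 60 after rounding and passes `-e` 70, while a third mismatch raises the sum to 90 and fails. With rounding suppressed, two mismatches already cost 80.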
 194
 195 Bowtie is not fully sensitive in `-n` 2 and `-n` 3 modes by default.
 196 In these modes Bowtie imposes a "backtracking limit" to limit effort
 197 spent trying to find valid alignments for low-quality reads unlikely to
 198 have any.  This may cause bowtie to miss some legal 2- and 3-mismatch
 199 alignments.  The limit is set to a reasonable default (125 without
 200 `--best`, 800 with `--best`), but the user may decrease or increase the
 201 limit using the `--maxbts` and/or `-y` options.  `-y` mode is
 202 relatively slow but guarantees full sensitivity.
 203
 204 [Maq]: http://maq.sf.net
 205 [Phred quality]: http://en.wikipedia.org/wiki/FASTQ_format#Variations
 206
 207 The `-v` alignment mode
 208 -----------------------
 209
 210 In `-v` mode, alignments may have no more than `V` mismatches, where
 211 `V` may be a number from 0 through 3 set using the `-v` option.
 212 Quality values are ignored.  The `-v` option is mutually exclusive with
 213 the `-n` option.
 214
 215 If there are many legal alignments, Bowtie gives preference to
 216 alignments with fewer mismatches.  When the `--best` option is
 217 specified, Bowtie guarantees the reported alignment(s) are "best" in
 218 terms of the number of mismatches, and that the alignments are reported
 219 in best-to-worst order.  Bowtie is somewhat slower when `--best` is
 220 specified.
 221
 222 Strata
 223 ------
 224
 225 In [the -n alignment mode], an alignment's "stratum" is defined as the
 226 number of mismatches in the "seed" region, i.e. the leftmost `L` bases,
 227 where `L` is set with the `-l` option.  In [the -v alignment mode], an
 228 alignment's stratum is defined as the total number of mismatches in the
 229 entire alignment.  Some of Bowtie's options (e.g. `--strata` and `-m`)
 230 use the notion of "stratum" to limit or expand the scope of reportable
 231 alignments.
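
A minimal sketch of the stratum definition above, assuming mismatches are given as 0-based offsets from the left end of the read. The function name and the `mode` argument are hypothetical.

```python
def stratum(mismatch_positions, mode, seed_len=28):
    """Return an alignment's stratum.

    mode "n": number of mismatches within the leftmost seed_len bases (-l).
    mode "v": total number of mismatches in the alignment.
    """
    if mode == "n":
        return sum(1 for p in mismatch_positions if p < seed_len)
    if mode == "v":
        return len(mismatch_positions)
    raise ValueError("mode must be 'n' or 'v'")
```

For example, an alignment with mismatches at offsets 3 and 30 falls in stratum 1 in `-n` mode with `-l` 28, but in stratum 2 in `-v` mode.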
 232
 233 Reporting Modes
 234 ---------------
 235
 236 With the `-k`, `-a`, `-m`, `-M`, `--best` and `--strata` options, the
 237 user can flexibly select which alignments are reported.  Below we
 238 demonstrate a few ways in which these options can be combined.  All
 239 examples are using the `e_coli` index packaged with Bowtie.  The
 240 `--suppress` option is used to keep the output concise and some
 241 output is elided for clarity.
 242
 243   Example 1: `-a`
 244
 245     $ ./bowtie -a -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
 246     -   gi|110640213|ref|NC_008253.1|   148810  10:A>G,13:C>G
 247     -   gi|110640213|ref|NC_008253.1|   2852852 8:T>A
 248     -   gi|110640213|ref|NC_008253.1|   4930433 4:G>T,6:C>G
 249     -   gi|110640213|ref|NC_008253.1|   905664  6:A>G,7:G>T
 250     +   gi|110640213|ref|NC_008253.1|   1093035 2:T>G,15:A>T
 251
 252 Specifying `-a` instructs bowtie to report *all* valid alignments,
 253 subject to the alignment policy: `-v` 2.  In this case, bowtie finds
 254 5 inexact hits in the E. coli genome; 1 hit (the 2nd one listed)
 255 has 1 mismatch, and the other 4 hits have 2 mismatches.  Four are on
 256 the reverse reference strand and one is on the forward strand.  Note
 257 that they are not listed in best-to-worst order.
 258
 259   Example 2: `-k 3`
 260
 261     $ ./bowtie -k 3 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
 262     -   gi|110640213|ref|NC_008253.1|   148810  10:A>G,13:C>G
 263     -   gi|110640213|ref|NC_008253.1|   2852852 8:T>A
 264     -   gi|110640213|ref|NC_008253.1|   4930433 4:G>T,6:C>G
 265
 266 Specifying `-k` 3 instructs bowtie to report up to 3 valid
 267 alignments.  In this case, a total of 5 valid alignments exist (see
 268 [Example 1]); `bowtie` reports 3 out of those 5.  `-k` can be set to
 269 any integer greater than 0.
 270
 271   Example 3: `-k 6`
 272
 273     $ ./bowtie -k 6 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
 274     -   gi|110640213|ref|NC_008253.1|   148810  10:A>G,13:C>G
 275     -   gi|110640213|ref|NC_008253.1|   2852852 8:T>A
 276     -   gi|110640213|ref|NC_008253.1|   4930433 4:G>T,6:C>G
 277     -   gi|110640213|ref|NC_008253.1|   905664  6:A>G,7:G>T
 278     +   gi|110640213|ref|NC_008253.1|   1093035 2:T>G,15:A>T
 279
 280 Specifying `-k` 6 instructs bowtie to report up to 6 valid
 281 alignments.  In this case, a total of 5 valid alignments exist, so
 282 `bowtie` reports all 5.
 283
 284   Example 4: default (`-k 1`)
 285
 286     $ ./bowtie -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
 287     -   gi|110640213|ref|NC_008253.1|   148810  10:A>G,13:C>G
 288
 289 Leaving the reporting options at their defaults causes `bowtie` to
 290 report the first valid alignment it encounters.  Because `--best` was
 291 not specified, we are not guaranteed that bowtie will report the best
 292 alignment, and in this case it does not (the 1-mismatch alignment from
 293 the previous example would have been better).  The default reporting
 294 mode is equivalent to `-k` 1.
 295
 296   Example 5: `-a --best`
 297
 298     $ ./bowtie -a --best -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
 299     -   gi|110640213|ref|NC_008253.1|   2852852 8:T>A
 300     +   gi|110640213|ref|NC_008253.1|   1093035 2:T>G,15:A>T
 301     -   gi|110640213|ref|NC_008253.1|   905664  6:A>G,7:G>T
 302     -   gi|110640213|ref|NC_008253.1|   148810  10:A>G,13:C>G
 303     -   gi|110640213|ref|NC_008253.1|   4930433 4:G>T,6:C>G
 304
 305 Specifying `-a` `--best` results in the same alignments being printed
 306 as if just `-a` had been specified, but they are guaranteed to be
 307 reported in best-to-worst order.
 308
 309   Example 6: `-a --best --strata`
 310
 311     $ ./bowtie -a --best --strata -v 2 --suppress 1,5,6,7 e_coli -c ATGCATCATGCGCCAT
 312     -   gi|110640213|ref|NC_008253.1|   2852852 8:T>A
 313
 314 Specifying `--strata` in addition to `-a` and `--best` causes
 315 `bowtie` to report only those alignments in the best alignment
 316 "stratum".  The alignments in the best stratum are those having the
 317 least number of mismatches (or mismatches just in the "seed" portion of
 318 the alignment in the case of `-n` mode).  Note that if `--strata`
 319 is specified, `--best` must also be specified.
 320
 321   Example 7: `-a -m 3`
 322
 323     $ ./bowtie -a -m 3 -v 2 e_coli -c ATGCATCATGCGCCAT
 324     No alignments
 325
 326 Specifying `-m` 3 instructs bowtie to refrain from reporting any
 327 alignments for reads having more than 3 reportable alignments.  The
 328 `-m` option is useful when the user would like to guarantee that
 329 reported alignments are "unique", for some definition of unique.
 330
 331 Example 1 showed that the read has 5 reportable alignments when `-a`
 332 and `-v` 2 are specified, so the `-m` 3 limit causes bowtie to
 333 output no alignments.
 334
 335   Example 8: `-a -m 5`
 336
 337     $ ./bowtie -a -m 5 -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
 338     -   gi|110640213|ref|NC_008253.1|   148810  10:A>G,13:C>G
 339     -   gi|110640213|ref|NC_008253.1|   2852852 8:T>A
 340     -   gi|110640213|ref|NC_008253.1|   4930433 4:G>T,6:C>G
 341     -   gi|110640213|ref|NC_008253.1|   905664  6:A>G,7:G>T
 342     +   gi|110640213|ref|NC_008253.1|   1093035 2:T>G,15:A>T
 343
 344 Specifying `-m` 5 instructs bowtie to refrain from reporting any
 345 alignments for reads having more than 5 reportable alignments.  Since
 346 the read has exactly 5 reportable alignments, the `-m` 5 limit allows
 347 `bowtie` to print them as usual.
 348
 349   Example 9: `-a -m 3 --best --strata`
 350
 351     $ ./bowtie -a -m 3 --best --strata -v 2 e_coli --suppress 1,5,6,7 -c ATGCATCATGCGCCAT
 352     -   gi|110640213|ref|NC_008253.1|   2852852 8:T>A
 353
 354 Specifying `-m` 3 instructs bowtie to refrain from reporting any
 355 alignments for reads having more than 3 reportable alignments.  As we
 356 saw in Example 6, the read has only 1 reportable alignment when `-a`,
 357 `--best` and `--strata` are specified, so the `-m` 3 limit allows
 358 `bowtie` to print that alignment as usual.
 359
 360 Intuitively, the `-m` option, when combined with the `--best` and
 361 `--strata` options, guarantees a principled, though weaker, form of
 362 "uniqueness."  A stronger form of uniqueness is enforced when `-m` is
 363 specified but `--best` and `--strata` are not.
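
The interaction of the reporting options in Examples 1 through 9 can be summarized with a small model. This is a sketch under stated assumptions, not Bowtie's code: `report` and its parameters are hypothetical names, the input list is taken to be in the order in which alignments are encountered, and a stable sort stands in for `--best` (so the order within a stratum may differ from what `bowtie` prints in Example 5).

```python
# (strand, offset, mismatches) in the order printed in Example 1.
EXAMPLE_1 = [
    ("-", 148810, 2),
    ("-", 2852852, 1),
    ("-", 4930433, 2),
    ("-", 905664, 2),
    ("+", 1093035, 2),
]


def report(valid, k=1, all_=False, m=None, best=False, strata=False):
    """Select alignments to report from the list of valid alignments.

    In -v mode an alignment's stratum equals its mismatch count (field 2).
    """
    if strata and not best:
        raise ValueError("--strata requires --best")
    reportable = list(valid)
    if best:
        # Best-to-worst by stratum; ties keep their encounter order.
        reportable.sort(key=lambda a: a[2])
    if strata and reportable:
        top = reportable[0][2]
        reportable = [a for a in reportable if a[2] == top]
    # -m: suppress everything if there are more than m reportable alignments.
    if m is not None and len(reportable) > m:
        return []
    return reportable if all_ else reportable[:k]
```

Calling `report(EXAMPLE_1, all_=True, m=3)` returns an empty list, as in Example 7, while adding `best=True, strata=True` leaves a single reportable alignment at offset 2852852, as in Example 9.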
 364
 365 Paired-end Alignment
 366 --------------------
 367
 368 `bowtie` can align paired-end reads when properly paired read files are
 369 specified using the `-1` and `-2` options (for pairs of raw, FASTA, or
 370 FASTQ read files), or using the `--12` option (for Tab-delimited read
 371 files).  A valid paired-end alignment satisfies these criteria:
 372
 373 1. Both mates have a valid alignment according to the alignment policy
 374    defined by the `-v`/`-n`/`-e`/`-l` options.
 375 2. The relative orientation and position of the mates satisfy the
 376    constraints defined by the `-I`/`-X`/`--fr`/`--rf`/`--ff`
 377    options.
 378
 379 Policies governing which paired-end alignments are reported for a
 380 given read are specified using the `-k`, `-a` and `-m` options as
 381 usual.  The `--strata` and `--best` options do not apply in
 382 paired-end mode.
 383
 384 A paired-end alignment is reported as a pair of mate alignments, each
 385 on a separate line, where the alignment for each mate is formatted the
 386 same as an unpaired (singleton) alignment.  The alignment for the mate
 387 that occurs closest to the beginning of the reference sequence (the
 388 "upstream" mate) is always printed before the alignment for the
 389 downstream mate.  Read files containing paired-end reads will
 390 sometimes name the reads according to whether they are the #1 or #2
 391 mates by appending a `/1` or `/2` suffix to the read name.  If no such
 392 suffix is present in Bowtie's input, the suffix will be added when
 393 Bowtie prints read names in alignments (except in `-S` "SAM" mode,
 394 where mate information is encoded in the `FLAGS` field instead).
 395
 396 Finding a valid paired-end alignment where both mates align to
 397 repetitive regions of the reference can be very time-consuming.  By
 398 default, Bowtie avoids much of this cost by imposing a limit on the
 399 number of "tries" it makes to match an alignment for one mate with a
 400 nearby alignment for the other.  The default limit is 100.  This causes
 401 `bowtie` to miss some valid paired-end alignments where both mates lie
 402 in repetitive regions, but the user may use the `--pairtries` or
 403 `-y` options to increase Bowtie's sensitivity as desired.
 404
 405 Paired-end alignments where one mate's alignment is entirely contained
 406 within the other's are considered invalid.
 407
 408 When colorspace alignment is enabled via `-C`, the default setting for
 409 paired-end orientation is `--ff`.  This is because most SOLiD datasets
 410 have that orientation.  When colorspace alignment is not enabled
 411 (default), the default setting for orientation is `--fr`, since most
 412 Illumina datasets have this orientation.  The default can be overridden
 413 in either case.
 414
 415 Because Bowtie uses an in-memory representation of the original
 416 reference string when finding paired-end alignments, its memory
 417 footprint is larger when aligning paired-end reads.  For example, the
 418 human index has a memory footprint of about 2.2 GB in single-end mode
 419 and 2.9 GB in paired-end mode.  Note that paired-end and unpaired
 420 alignment incur the same memory footprint in colorspace (e.g. human
 421 incurs about 2.9 GB).
 422
 423 Colorspace Alignment
 424 --------------------
 425
 426 As of version 0.12.0, `bowtie` can align colorspace reads against a
 427 colorspace index when `-C` is specified.  Colorspace is the
 428 characteristic output format of Applied Biosystems' SOLiD system.  In a
 429 colorspace read, each character is a color rather than a nucleotide,
 430 where a color encodes a class of dinucleotides.  E.g. the color blue
 431 encodes any of the dinucleotides: AA, CC, GG, TT.  Colorspace has the
 432 advantage of (often) being able to distinguish sequencing errors from
 433 SNPs once the read has been aligned.  See ABI's [Principles of Di-Base
 434 Sequencing] document for details.
 435
 436   Colorspace reads
 437
 438 All input formats (FASTA `-f`, FASTQ `-q`, raw `-r`, tab-delimited
 439 `--12`, command-line `-c`) are compatible with colorspace (`-C`).
 440 When `-C` is specified, read sequences are treated as colors.  Colors
 441 may be encoded either as numbers (`0`=blue, `1`=green, `2`=orange,
 442 `3`=red) or as characters `A/C/G/T` (`A`=blue, `C`=green, `G`=orange,
 443 `T`=red).
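
To make the color encoding concrete, here is a minimal sketch of di-base encoding and decoding. It represents each base as a 2-bit code and each color as the XOR of two adjacent codes, which is one convenient way to express the standard SOLiD color table (blue = `0` for AA, CC, GG, TT). The function names are hypothetical.

```python
_BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
_BITS_TO_BASE = "ACGT"


def encode_colors(nucleotides):
    """Return the color string (digits 0-3), one color per adjacent base pair."""
    codes = [_BASE_TO_BITS[b] for b in nucleotides]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def decode_colors(first_base, colors):
    """Rebuild the nucleotide string from a known first base and its colors."""
    code = _BASE_TO_BITS[first_base]
    out = [first_base]
    for c in colors:
        code ^= int(c)
        out.append(_BITS_TO_BASE[code])
    return "".join(out)


def colors_as_letters(colors):
    """Rewrite numeric colors using the A/C/G/T spelling (A=blue ... T=red)."""
    return colors.translate(str.maketrans("0123", "ACGT"))
```

For instance, `encode_colors("ATGGC")` yields `3103`, and decoding `3103` from a known first base `A` recovers `ATGGC`. Note that a sequence of N bases produces N-1 colors.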
 444
 445 Some reads include a primer base as the first character; e.g.:
 446
 447     >1_53_33_F3
 448     T2213120002010301233221223311331
 449     >1_53_70_F3
 450     T2302111203131231130300111123220
 451     ...
 452
 453 Here, `T` is the primer base.  `bowtie` detects and handles primer
 454 bases properly (i.e., the primer base and the adjacent color are both
 455 trimmed away prior to alignment) as long as the rest of the read is
 456 encoded as numbers.
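
The trimming rule just described can be sketched as below. The function name is hypothetical, and the detection test (a leading nucleotide letter followed only by digits) is an assumption about what "the rest of the read is encoded as numbers" means in practice.

```python
def trim_primer(read):
    """Drop a leading primer base and the color adjacent to it, if present.

    Reads that do not start with a primer base are returned unchanged.
    """
    if read and read[0] in "ACGT" and read[1:].isdigit():
        return read[2:]
    return read
```

Applied to the first read shown above, this removes the leading `T2` and leaves 30 colors for alignment.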
 457
 458 `bowtie` also handles input in the form of parallel `.csfasta` and
 459 `_QV.qual` files.  Use `-f` to specify the `.csfasta` files and `-Q`
 460 (for unpaired reads) or `--Q1`/`--Q2` (for paired-end reads) to
 461 specify the corresponding `_QV.qual` files.  It is not necessary to
 462 first convert to FASTQ, though `bowtie` also handles FASTQ-formatted
 463 colorspace reads (with `-q`, the default).
 464
 465   Building a colorspace index
 466
 467 A colorspace index is built in the same way as a normal index except
 468 that `-C` must be specified when running `bowtie-build`.  If the user
 469 attempts to use `bowtie` without `-C` to align against an index that
 470 was built with `-C` (or vice versa), `bowtie` prints an error message
 471 and quits.
 472
 473   Decoding colorspace alignments
 474
 475 Once a colorspace read is aligned, Bowtie decodes the alignment into
 476 nucleotides and reports the decoded nucleotide sequence.  A principled
 477 decoding scheme is necessary because many different possible decodings
 478 are usually possible.  Finding the true decoding with 100% certainty
 479 requires knowing all variants (e.g. SNPs) in the subject's genome
 480 beforehand, which is usually not possible.  Instead, `bowtie` employs
 481 the approximate decoding scheme described in the [BWA paper].  This
 482 scheme attempts to distinguish variants from sequencing errors
 483 according to their relative likelihood under a model that considers the
 484 quality values of the colors and the (configurable) global likelihood
 485 of a SNP.
 486
 487 Quality values are also "decoded" so that each reported quality value
 488 is a function of the two color qualities overlapping it.  Bowtie again
 489 adopts the scheme described in the [BWA paper], i.e., the decoded
 490 nucleotide quality is either the sum of the overlapping color qualities
 491 (when both overlapping colors correspond to bases that match in the
 492 alignment), the quality of the matching color minus the quality of the
 493 mismatching color, or 0 (when both overlapping colors correspond to
 494 mismatches).
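
The three cases of the quality-decoding rule above can be written out as a small function. This follows the description in the text literally and uses hypothetical names. The text does not say how a negative difference is handled, so the sketch returns it unchanged.

```python
def decoded_base_quality(q_left, q_right, left_matches, right_matches):
    """Combine the qualities of the two colors overlapping a nucleotide.

    q_left, q_right: Phred qualities of the two overlapping colors.
    left_matches, right_matches: whether each color matches in the alignment.
    """
    if left_matches and right_matches:
        return q_left + q_right
    if left_matches:
        # Matching color's quality minus mismatching color's quality.
        return q_left - q_right
    if right_matches:
        return q_right - q_left
    return 0
```

For two colors with qualities 20 and 30, the decoded quality is 50 when both match, 10 when only the second matches, and 0 when neither matches.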
 495
 496 For accurate decoding, `--snpphred`/`--snpfrac` should be set according
 497 to the user's best guess of the SNP frequency in the subject.  The
 498 `--snpphred` parameter sets the SNP penalty directly (on the [Phred
 499 quality] scale), whereas `--snpfrac` allows the user to specify the
 500 fraction of sites expected to be SNPs; the fraction is then converted
 501 to a [Phred quality] internally.  For the purpose of decoding, the SNP
 502 fraction is defined in terms of SNPs per *haplotype* base.  Thus, if
 503 the genome is diploid, heterozygous SNPs have half the weight of
 504 homozygous SNPs.
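
For orientation, the relationship between a SNP fraction and a Phred-scaled penalty can be sketched with the usual Phred definition, Q = -10 log10(p). The function name is hypothetical, and rounding to the nearest integer is an assumption; the text only says that the fraction is converted internally.

```python
import math


def snpfrac_to_phred(snp_fraction):
    """Convert an expected SNP fraction into a Phred-scaled penalty."""
    if not 0 < snp_fraction <= 1:
        raise ValueError("snp_fraction must be in (0, 1]")
    return round(-10 * math.log10(snp_fraction))
```

Under this definition, expecting one SNP per 1,000 haplotype bases (`--snpfrac` 0.001) corresponds to a penalty of 30, and one per 100 bases corresponds to 20.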
 505
 506 Note that in `-S`/`--sam` mode, the decoded nucleotide sequence is
 507 printed for alignments, but the original color sequence (with `A`=blue,
 508 `C`=green, `G`=orange, `T`=red) is printed for reads without
 509 any reported alignments.  As always, the `--un`, `--max` and `--al`
 510 parameters print reads exactly as they appeared in the input file.
 511
 512   Paired-end colorspace alignment
 513
 514 Like other platforms, SOLiD supports generation of paired-end reads.
 515 When colorspace alignment is enabled, the default paired-end
 516 orientation setting is `--ff`.  This is because most SOLiD datasets
 517 have that orientation.
 518
 519 Note that SOLiD-generated read files can have "orphaned" mates; i.e.
 520 mates without a correspondingly-named mate in the other file.  To avoid
 521 problems due to orphaned mates, SOLiD paired-end output should first be
 522 converted to `.csfastq` files with unpaired mates omitted.  This can be
 523 accomplished using, for example, [Galaxy]'s conversion tool (click
 524 "NGS: QC and manipulation", then "SOLiD-to-FASTQ" in the left-hand
 525 sidebar).
 526
 527 [Principles of Di-Base Sequencing]: http://tinyurl.com/ygnb2gn
 528 [BWA paper]: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/25/14/1754
 529
 530 Performance Tuning
 531 ------------------
 532
 533 1.  Use 64-bit bowtie if possible
 534
 535     The 64-bit version of Bowtie is substantially (usually more than
 536     50%) faster than the 32-bit version, owing to its use of 64-bit
 537     arithmetic.  If possible, download the 64-bit binaries for Bowtie
 538     and run on a 64-bit computer.  If you are building Bowtie from
 539     sources, you may need to pass the `-m64` option to `g++` to compile
 540     the 64-bit version; you can do this by including `BITS=64` in the
 541     arguments to the `make` command; e.g.: `make BITS=64 bowtie`.  To
 542     determine whether your version of bowtie is 64-bit or 32-bit, run
 543     `bowtie --version`.
 544
 545 2.  If your computer has multiple processors/cores, use `-p`
 546
 547     The `-p` option causes Bowtie to launch a specified number of
 548     parallel search threads.  Each thread runs on a different
 549     processor/core and all threads find alignments in parallel,
 550     increasing alignment throughput by approximately a multiple of the
 551     number of threads (though in practice, speedup is somewhat worse
 552     than linear).
 553
 554 3.  If reporting many alignments per read, try tweaking
 555     `bowtie-build --offrate`
 556
 557     If you are using the `-k`, `-a` or `-m` options and Bowtie is
 558     reporting many alignments per read (an average of more than about
 559     10 per read) and you have some memory to spare, using an index with
 560     a denser SA sample can speed things up considerably.
 561
 562     To do this, specify a smaller-than-default `-o`/`--offrate` value
 563     when running `bowtie-build`.  A denser SA sample yields a larger
 564     index, but is also particularly effective at speeding up alignment
 565     when many alignments are reported per read.  For example,
 566     decreasing the index's `-o`/`--offrate` by 1 could as much as
 567     double alignment performance, and decreasing by 2 could quadruple
 568     alignment performance, etc.
 569
 570     On the other hand, decreasing `-o`/`--offrate` increases the size
 571     of the Bowtie index, both on disk and in memory when aligning
 572     reads.  At the default `-o`/`--offrate` of 5, the SA sample for the
 573     human genome occupies about 375 MB of memory when aligning reads.
 574     Decreasing the `-o`/`--offrate` by 1 doubles the memory taken by
 575     the SA sample, and decreasing by 2 quadruples the memory taken,
 576     etc.
 577
 578 4.  If bowtie "thrashes", try increasing `bowtie --offrate`
 579
 580     If `bowtie` runs very slowly on a relatively low-memory machine
 581     (having less than about 4 GB of memory), then try setting `bowtie`
 582     `-o`/`--offrate` to a *larger* value than the value used to build
 583     the index.  For example, `bowtie-build`'s default `-o`/`--offrate`
 584     is 5 and all pre-built indexes available from the Bowtie website
 585     are built with `-o`/`--offrate` 5; so if `bowtie` thrashes when
 586     querying such an index, try using `bowtie` `--offrate` 6.  If
 587     `bowtie` still thrashes, try `bowtie` `--offrate` 7, etc.  A higher
 588     `-o`/`--offrate` causes `bowtie` to use a sparser sample of the
 589     suffix array than is stored in the index; this saves memory but
 590     makes alignment reporting slower (especially when
 591     using `-a` or large `-k` or `-m`).
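
The memory side of the `--offrate` trade-off in hints 3 and 4 can be estimated with simple arithmetic. The sketch below scales from the figure quoted above (about 375 MB for the human genome at `--offrate` 5). The function name is hypothetical, and the assumption that each increment halves the sample, mirroring the doubling per decrement stated in hint 3, is an inference rather than something the text states.

```python
def sa_sample_mb(offrate, base_offrate=5, base_mb=375):
    """Estimate the SA sample's memory footprint (MB) at a given offrate.

    Each step down doubles the footprint; each step up is assumed to halve it.
    """
    return base_mb * 2 ** (base_offrate - offrate)


if __name__ == "__main__":
    for o in (3, 4, 5, 6, 7):
        print(o, sa_sample_mb(o))
```

By this estimate, an index built with `--offrate` 3 would need roughly 1500 MB for the SA sample, whereas querying with `--offrate` 6 would use about 187.5 MB.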
 592
 593 Command Line
 594 ------------
 595
 596 Usage:
 597
 598     bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
 599
 600   Main arguments
 601
 602     <ebwt>
 603
 604 The basename of the index to be searched.  The basename is the name of
 605 any of the index files up to but not including the final `.1.ebwt` /
 606 `.rev.1.ebwt` / etc.  `bowtie` looks for the specified index first in
 607 the current directory, then in the `indexes` subdirectory under the
 608 directory where the `bowtie` executable is located, then looks in the
 609 directory specified in the `BOWTIE_INDEXES` environment variable.
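
The lookup order described above can be sketched as follows. The function names are hypothetical, and testing for a `.1.ebwt` file as evidence that an index exists is an assumption made for illustration; `bowtie` itself may check differently.

```python
import os


def candidate_index_dirs(bowtie_exe_path, environ=None):
    """List the directories to search for an index, in the documented order."""
    environ = os.environ if environ is None else environ
    dirs = [
        os.curdir,                                               # 1. current directory
        os.path.join(os.path.dirname(bowtie_exe_path), "indexes"),  # 2. next to bowtie
    ]
    env_dir = environ.get("BOWTIE_INDEXES")                      # 3. environment variable
    if env_dir:
        dirs.append(env_dir)
    return dirs


def find_index(basename, bowtie_exe_path, environ=None):
    """Return the first matching index path prefix, or None if not found."""
    for d in candidate_index_dirs(bowtie_exe_path, environ):
        if os.path.exists(os.path.join(d, basename + ".1.ebwt")):
            return os.path.join(d, basename)
    return None
```

Passing `environ` explicitly makes the search easy to exercise without modifying the real environment.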
 610
 611     <m1>
 612
 613 Comma-separated list of files containing the #1 mates (filename usually
 614 includes `_1`), or, if `-c` is specified, the mate sequences
 615 themselves.  E.g., this might be `flyA_1.fq,flyB_1.fq`, or, if `-c`
 616 is specified, this might be `GGTCATCCT,ACGGGTCGT`.  Sequences specified
 617 with this option must correspond file-for-file and read-for-read with
 618 those specified in `<m2>`.  Reads may be a mix of different lengths.
 619 If `-` is specified, `bowtie` will read the #1 mates from the "standard
 620 in" filehandle.
 621
 622     <m2>
 623
 624 Comma-separated list of files containing the #2 mates (filename usually
 625 includes `_2`), or, if `-c` is specified, the mate sequences
 626 themselves.  E.g., this might be `flyA_2.fq,flyB_2.fq`, or, if `-c`
 627 is specified, this might be `GGTCATCCT,ACGGGTCGT`.  Sequences specified
 628 with this option must correspond file-for-file and read-for-read with
 629 those specified in `<m1>`.  Reads may be a mix of different lengths.
 630 If `-` is specified, `bowtie` will read the #2 mates from the "standard
 631 in" filehandle.
 632
 633     <r>
 634
 635 Comma-separated list of files containing a mix of unpaired and
 636 paired-end reads in Tab-delimited format.  Tab-delimited format is a
 637 1-read-per-line format where unpaired reads consist of a read name,
 638 sequence and quality string each separated by tabs.  A paired-end read
 639 consists of a read name, sequence of the #1 mate, quality values of the
 640 #1 mate, sequence of the #2 mate, and quality values of the #2 mate
 641 separated by tabs.  Quality values can be expressed using any of the
 642 scales supported in FASTQ files.  Reads may be a mix of different
 643 lengths and paired-end and unpaired reads may be intermingled in the
 644 same file.  If `-` is specified, `bowtie` will read the Tab-delimited
 645 reads from the "standard in" filehandle.
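
As a rough illustration of the Tab-delimited layout described above, the
following Python sketch builds one unpaired and one paired-end record.  The
read names, sequences and qualities are made up, and the helper names
`unpaired_record` and `paired_record` are hypothetical; they are not part of
Bowtie.

```python
def unpaired_record(name, seq, qual):
    """One unpaired read: name, sequence, qualities (3 tab-separated fields)."""
    return "\t".join([name, seq, qual])


def paired_record(name, seq1, qual1, seq2, qual2):
    """One paired-end read: name, then sequence and qualities of mate #1,
    then sequence and qualities of mate #2 (5 tab-separated fields)."""
    return "\t".join([name, seq1, qual1, seq2, qual2])


# Made-up reads; unpaired and paired records may be mixed in one file.
records = [
    unpaired_record("read_a", "GGTCATCCT", "IIIIIIIII"),
    paired_record("pair_b", "ACGGGTCGT", "IIIIIIIII", "TTGACCA", "IIIIII?"),
]

for line in records:
    print(line)
```

Written to a file (say `reads.tab`, a hypothetical name), such lines could
then be supplied as the `<r>` argument via `--12 reads.tab`.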
 646
 647     <s>
 648
 649 A comma-separated list of files containing unpaired reads to be
 650 aligned, or, if `-c` is specified, the unpaired read sequences
 651 themselves.  E.g., this might be
 652 `lane1.fq,lane2.fq,lane3.fq,lane4.fq`, or, if `-c` is specified, this
 653 might be `GGTCATCCT,ACGGGTCGT`.  Reads may be a mix of different
 654 lengths.  If `-` is specified, Bowtie gets the reads from the "standard
 655 in" filehandle.
 656
 657     <hit>
 658
 659 File to write alignments to.  By default, alignments are written to the
 660 "standard out" filehandle (i.e. the console).
 661
 662   Options
 663
 664     Input
 665
 666     -q
 667
 668 The query input files (specified either as `<m1>` and `<m2>`, or as
 669 `<s>`) are FASTQ files (usually having extension `.fq` or `.fastq`).
 670 This is the default.  See also: `--solexa-quals` and
 671 `--integer-quals`.
 672
 673     -f
 674
 675 The query input files (specified either as `<m1>` and `<m2>`, or as
 676 `<s>`) are FASTA files (usually having extension `.fa`, `.mfa`, `.fna`
 677 or similar).  All quality values are assumed to be 40 on the [Phred
 678 quality] scale.
 679
 680     -r
 681
 682 The query input files (specified either as `<m1>` and `<m2>`, or as
 683 `<s>`) are Raw files: one sequence per line, without quality values or
 684 names.  All quality values are assumed to be 40 on the [Phred quality]
 685 scale.
 686
 687     -c
 688
 689 The query sequences are given on the command line.  I.e. `<m1>`, `<m2>`
 690 and `<s>` are comma-separated lists of reads rather than lists of
 691 read files.
 692
 693     -C/--color
 694
 695 Align in colorspace.  Read characters are interpreted as colors.  The
 696 index specified must be a colorspace index (i.e. built with
 697 `bowtie-build` `-C`), or `bowtie` will print an error message and quit.
 698 See [Colorspace alignment] for more details.
 699
 700     -Q/--quals <files>
 701
 702 Comma-separated list of files containing quality values for
 703 corresponding unpaired CSFASTA reads.  Use in combination with `-C`
 704 and `-f`.  `--integer-quals` is set automatically when `-Q`/`--quals`
 705 is specified.
 706
 707     --Q1 <files>
 708
 709 Comma-separated list of files containing quality values for
 710 corresponding CSFASTA #1 mates.  Use in combination with `-C`, `-f`,
 711 and `-1`.  `--integer-quals` is set automatically when `--Q1`
 712 is specified.
 713
 714     --Q2 <files>
 715
 716 Comma-separated list of files containing quality values for
 717 corresponding CSFASTA #2 mates.  Use in combination with `-C`, `-f`,
 718 and `-2`.  `--integer-quals` is set automatically when `--Q2`
 719 is specified.
 720
 721     -s/--skip <int>
 722
 723 Skip (i.e. do not align) the first `<int>` reads or pairs in the input.
 724
 725     -u/--qupto <int>
 726
 727 Only align the first `<int>` reads or read pairs from the input (after
 728 the `-s`/`--skip` reads or pairs have been skipped).  Default: no
 729 limit.
 730
 731     -5/--trim5 <int>
 732
 733 Trim `<int>` bases from the high-quality (left) end of each read before
 734 alignment (default: 0).
 735
 736     -3/--trim3 <int>
 737
 738 Trim `<int>` bases from the low-quality (right) end of each read before
 739 alignment (default: 0).
 740
 741     --phred33-quals
 742
 743 Input qualities are ASCII chars equal to the [Phred quality] plus 33.
 744 Default: on.
 745
 746     --phred64-quals
 747
 748 Input qualities are ASCII chars equal to the [Phred quality] plus 64.
 749 Default: off.
 750
 751     --solexa-quals
 752
 753 Convert input qualities from [Solexa][Phred quality] (which can be
 754 negative) to [Phred][Phred quality] (which can't).  This is usually the
 755 right option for use with (unconverted) reads emitted by GA Pipeline
 756 versions prior to 1.3.  Default: off.
 757
 758     --solexa1.3-quals
 759
 760 Same as `--phred64-quals`.  This is usually the right option for use
 761 with (unconverted) reads emitted by GA Pipeline version 1.3 or later.
 762 Default: off.
 763
 764     --integer-quals
 765
 766 Quality values are represented in the read input file as
 767 space-separated ASCII integers, e.g., `40 40 30 40`..., rather than
 768 ASCII characters, e.g., `II?I`....  Integers are treated as being on
 769 the [Phred quality] scale unless `--solexa-quals` is also specified.
 770 Default: off.
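
The quality encodings selected by `--phred33-quals`, `--phred64-quals`,
`--solexa-quals` and `--integer-quals` can be sketched in Python as follows.
This is only an illustration, not Bowtie's own code: the function names are
hypothetical, and rounding the Solexa conversion to the nearest integer is an
assumption of the sketch.

```python
import math


def phred_from_ascii(qual, offset=33):
    """Decode an ASCII quality string; offset is 33 or 64."""
    return [ord(ch) - offset for ch in qual]


def phred_from_integers(qual):
    """Decode space-separated integer qualities, e.g. '40 40 30 40'."""
    return [int(tok) for tok in qual.split()]


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled quality to the Phred scale.

    Uses the standard relation Q_phred = 10 * log10(10^(Q_solexa / 10) + 1).
    Rounding to the nearest integer is a choice made for this sketch.
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


print(phred_from_ascii("II?I"))             # Phred+33 string
print(phred_from_ascii("hh^h", offset=64))  # the same values as Phred+64
print(phred_from_integers("40 40 30 40"))   # integer form
print([solexa_to_phred(q) for q in (-5, 0, 40)])
```

Note that a negative Solexa value still maps to a non-negative Phred value,
which is why the conversion is needed for older reads.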
 771
 772     Alignment
 773
 774     -v <int>
 775
 776 Report alignments with at most `<int>` mismatches.  `-e` and `-l`
 777 options are ignored and quality values have no effect on what
 778 alignments are valid.  `-v` is mutually exclusive with `-n`.
 779
 780     -n/--seedmms <int>
 781
 782 Maximum number of mismatches permitted in the "seed", i.e. the first
 783 `L` base pairs of the read (where `L` is set with `-l`/`--seedlen`).
 784 This may be 0, 1, 2 or 3 and the default is 2.  This option is mutually
 785 exclusive with the `-v` option.
 786
 787     -e/--maqerr <int>
 788
 789 Maximum permitted total of quality values at *all* mismatched read
 790 positions throughout the entire alignment, not just in the "seed".  The
 791 default is 70.  Like [Maq], `bowtie` rounds quality values to the
 792 nearest 10 and saturates at 30; rounding can be disabled with
 793 `--nomaqround`.
 794
 795     -l/--seedlen <int>
 796
 797 The "seed length"; i.e., the number of bases on the high-quality end of
 798 the read to which the `-n` ceiling applies.  The lowest permitted
 799 setting is 5 and the default is 28.  `bowtie` is faster for larger
 800 values of `-l`.
 801
 802     --nomaqround
 803
 804 [Maq] accepts quality values in the [Phred quality] scale, but
 805 internally rounds values to the nearest 10, with a maximum of 30.  By
 806 default, `bowtie` also rounds this way.  `--nomaqround` prevents this
 807 rounding in `bowtie`.
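
To make the interaction between `-e`/`--maqerr` and `--nomaqround` concrete,
here is a minimal Python sketch.  The function names are hypothetical, and the
tie-breaking rule (a quality ending in 5 rounds up) is an assumption; Bowtie's
internal rounding may differ in that detail.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10 and cap it at 30.

    Rounding halves upward (e.g. 25 -> 30) is an assumption of this sketch.
    """
    return min(30, ((q + 5) // 10) * 10)


def within_maqerr(mismatch_quals, maqerr=70, maqround=True):
    """True if the summed qualities at mismatched positions stay within maqerr."""
    if maqround:
        mismatch_quals = [maq_round(q) for q in mismatch_quals]
    return sum(mismatch_quals) <= maqerr


# Two mismatches at Phred-40 positions: 30 + 30 = 60 after rounding.
print(within_maqerr([40, 40]))
# Without rounding the same mismatches total 80, which exceeds 70.
print(within_maqerr([40, 40], maqround=False))
```

Under these assumptions, the default ceiling of 70 admits two mismatches at
high-quality positions with rounding enabled, but not three.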
 808
 809     -I/--minins <int>
 810
 811 The minimum insert size for valid paired-end alignments.  E.g. if `-I
 812 60` is specified and a paired-end alignment consists of two 20-bp
 813 alignments in the appropriate orientation with a 20-bp gap between
 814 them, that alignment is considered valid (as long as `-X` is also
 815 satisfied).  A 19-bp gap would not be valid in that case.  If trimming
 816 options `-3` or `-5` are also used, the `-I` constraint is
 817 applied with respect to the untrimmed mates.  Default: 0.
 818
 819     -X/--maxins <int>
 820
 821 The maximum insert size for valid paired-end alignments.  E.g. if `-X
 822 100` is specified and a paired-end alignment consists of two 20-bp
 823 alignments in the proper orientation with a 60-bp gap between them,
 824 that alignment is considered valid (as long as `-I` is also
 825 satisfied).  A 61-bp gap would not be valid in that case.  If trimming
 826 options `-3` or `-5` are also used, the `-X` constraint is applied
 827 with respect to the untrimmed mates, not the trimmed mates.  Default:
 828 250.
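
The arithmetic behind the `-I` and `-X` examples above can be checked with a
short Python sketch.  The function names are hypothetical; the insert size is
taken to be the two mate lengths plus the gap between them, as in the examples.

```python
def insert_size(mate1_len, gap, mate2_len):
    """Total span of a pair: both mates plus the gap between them."""
    return mate1_len + gap + mate2_len


def insert_ok(mate1_len, gap, mate2_len, minins=0, maxins=250):
    """True if the pair satisfies both the -I (minins) and -X (maxins) limits."""
    return minins <= insert_size(mate1_len, gap, mate2_len) <= maxins


# -I 60: two 20-bp mates need a gap of at least 20 bp.
print(insert_ok(20, 20, 20, minins=60), insert_ok(20, 19, 20, minins=60))
# -X 100: two 20-bp mates allow a gap of at most 60 bp.
print(insert_ok(20, 60, 20, maxins=100), insert_ok(20, 61, 20, maxins=100))
```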
 829
 830     --fr/--rf/--ff
 831
 832 The upstream/downstream mate orientations for a valid paired-end
 833 alignment against the forward reference strand.  E.g., if `--fr` is
 834 specified and there is a candidate paired-end alignment where mate1
 835 appears upstream of the reverse complement of mate2 and the insert
 836 length constraints are met, that alignment is valid.  Also, if mate2
 837 appears upstream of the reverse complement of mate1 and all other
 838 constraints are met, that too is valid.  `--rf` likewise requires that
 839 an upstream mate1 be reverse-complemented and a downstream mate2 be
 840 forward-oriented.  `--ff` requires both an upstream mate1 and a
 841 downstream mate2 to be forward-oriented.  Default: `--fr` when `-C`
 842 (colorspace alignment) is not specified, `--ff` when `-C` is specified.
 843
 844     --nofw/--norc
 845
 846 If `--nofw` is specified, `bowtie` will not attempt to align against
 847 the forward reference strand.  If `--norc` is specified, `bowtie` will
 848 not attempt to align against the reverse-complement reference strand.
 849 For paired-end reads using `--fr` or `--rf` modes, `--nofw` and
 850 `--norc` apply to the forward and reverse-complement pair orientations.
 851 I.e. specifying `--nofw` and `--fr` will only find reads in the R/F
 852 orientation where mate 2 occurs upstream of mate 1 with respect to the
 853 forward reference strand.
 854
 855     --maxbts <int>
 856
 857 The maximum number of backtracks permitted when aligning a read in
 858 `-n` 2 or `-n` 3 mode (default: 125 without `--best`, 800 with
 859 `--best`).  A "backtrack" is the introduction of a speculative
 860 substitution into the alignment.  Without this limit, the default
 861 parameters will sometimes require that `bowtie` try 100s or 1,000s of
 862 backtracks to align a read, especially if the read has many low-quality
 863 bases and/or has no valid alignments, slowing bowtie down
 864 significantly.  However, this limit may cause some valid alignments to
 865 be missed.  Higher limits yield greater sensitivity at the expense of
 866 longer running times.  See also: `-y`/`--tryhard`.
 867
 868     --pairtries <int>
 869
 870 For paired-end alignment, this is the maximum number of attempts
 871 `bowtie` will make to match an alignment for one mate up with an
 872 alignment for the opposite mate.  Most paired-end alignments require
 873 only a few such attempts, but pairs where both mates occur in highly
 874 repetitive regions of the reference can require significantly more.
 875 Setting this to a higher number allows `bowtie` to find more
 876 paired-end alignments for repetitive pairs at the expense of speed.
 877 The default is 100.  See also: `-y`/`--tryhard`.
 878
 879     -y/--tryhard
 880
 881 Try as hard as possible to find valid alignments when they exist,
 882 including paired-end alignments.  This is equivalent to specifying very
 883 high values for the `--maxbts` and `--pairtries` options.  This
 884 mode is generally much slower than the default settings, but can be
 885 useful for certain problems.  This mode is slower when (a) the
 886 reference is very repetitive, (b) the reads are low quality, or (c) not
 887 many reads have valid alignments.
 888
 889     --chunkmbs <int>
 890
 891 The number of megabytes of memory a given thread is given to store path
 892 descriptors in `--best` mode.  Best-first search must keep track of
 893 many paths at once to ensure it is always extending the path with the
 894 lowest cumulative cost.  Bowtie tries to minimize the memory impact of
 895 the descriptors, but they can still grow very large in some cases.  If
 896 you receive an error message saying that chunk memory has been
 897 exhausted in `--best` mode, try adjusting this parameter up to
 898 dedicate more memory to the descriptors.  Default: 64.
 899
 900     Reporting
 901
 902     -k <int>
 903
 904 Report up to `<int>` valid alignments per read or pair (default: 1).
 905 Validity of alignments is determined by the alignment policy (combined
 906 effects of `-n`, `-v`, `-l`, and `-e`).  If more than one valid
 907 alignment exists and the `--best` and `--strata` options are
 908 specified, then only those alignments belonging to the best alignment
 909 "stratum" will be reported.  Bowtie is designed to be very fast for
 910 small `-k` but bowtie can become significantly slower as `-k`
 911 increases.  If you would like to use Bowtie for larger values of
 912 `-k`, consider building an index with a denser suffix-array sample,
 913 i.e. specify a smaller `-o`/`--offrate` when invoking `bowtie-build`
 914 for the relevant index (see the [Performance tuning] section for
 915 details).
 916
 917     -a/--all
 918
 919 Report all valid alignments per read or pair (default: off).  Validity
 920 of alignments is determined by the alignment policy (combined effects
 921 of `-n`, `-v`, `-l`, and `-e`).  If more than one valid alignment
 922 exists and the `--best` and `--strata` options are specified, then only
 923 those alignments belonging to the best alignment "stratum" will be
 924 reported.  Bowtie is designed to be very fast for small `-k` but bowtie
 925 can become significantly slower if `-a`/`--all` is specified.  If you
 926 would like to use Bowtie with `-a`, consider building an index with a
 927 denser suffix-array sample, i.e. specify a smaller `-o`/`--offrate`
 928 when invoking `bowtie-build` for the relevant index (see the
 929 [Performance tuning] section for details).
 930
 931     -m <int>
 932
 933 Suppress all alignments for a particular read or pair if more than
 934 `<int>` reportable alignments exist for it.  Reportable alignments are
 935 those that would be reported given the `-n`, `-v`, `-l`, `-e`, `-k`,
 936 `-a`, `--best`, and `--strata` options.  Default: no limit.  Bowtie is
 937 designed to be very fast for small `-m` but bowtie can become
 938 significantly slower for larger values of `-m`.  If you would like to
 939 use Bowtie for larger values of `-m`, consider building an index with a
 940 denser suffix-array sample, i.e. specify a smaller `-o`/`--offrate` when
 941 invoking `bowtie-build` for the relevant index (see the [Performance
 942 tuning] section for details).
 943
 944     -M <int>
 945
 946 Behaves like `-m` except that if a read has more than `<int>`
 947 reportable alignments, one is reported at random.  In [default
 948 output mode], the selected alignment's 7th column is set to `<int>`+1 to
 949 indicate the read has at least `<int>`+1 valid alignments.  In
 950 `-S`/`--sam` mode, the selected alignment is given a `MAPQ` (mapping
 951 quality) of 0 and the `XM:i` field is set to `<int>`+1.  This option
 952 requires `--best`; if specified without `--best`, `--best` is enabled
 953 automatically.
 954
 955     --best
 956
 957 Make Bowtie guarantee that reported singleton alignments are "best" in
 958 terms of stratum (i.e. number of mismatches, or mismatches in the seed
 959 in the case of `-n` mode) and in terms of the quality values at the
 960 mismatched position(s).  Stratum always trumps quality; e.g. a
 961 1-mismatch alignment where the mismatched position has [Phred quality]
 962 40 is preferred over a 2-mismatch alignment where the mismatched
 963 positions both have [Phred quality] 10.  When `--best` is not
 964 specified, Bowtie may report alignments that are sub-optimal in terms
 965 of stratum and/or quality (though an effort is made to report the best
 966 alignment).  `--best` mode also removes all strand bias.  Note that
 967 `--best` does not affect which alignments are considered "valid" by
 968 `bowtie`, only which valid alignments are reported by `bowtie`.  When
 969 `--best` is specified and multiple hits are allowed (via `-k` or
 970 `-a`), the alignments for a given read are guaranteed to appear in
 971 best-to-worst order in `bowtie`'s output.  `bowtie` is somewhat slower
 972 when `--best` is specified.
 973
 974     --strata
 975
 976 If many valid alignments exist and are reportable (e.g. are not
 977 disallowed via the `-k` option) and they fall into more than one
 978 alignment "stratum", report only those alignments that fall into the
 979 best stratum.  By default, Bowtie reports all reportable alignments
 980 regardless of whether they fall into multiple strata.  When
 981 `--strata` is specified, `--best` must also be specified.
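
One way to picture how `-k`, `-m`, `--best` and `--strata` combine is the
following conceptual Python sketch.  It is a simplified model, not Bowtie's
search algorithm: the function name and the tuple layout are hypothetical, and
counting alignments against the `-m` ceiling before the `-k` limit is applied
is an interpretation made for this sketch.

```python
def report(alignments, k=1, m=None, best=False, strata=False):
    """Pick the alignments to report for one read.

    `alignments` is a list of (stratum, quality_sum, label) tuples that are
    already valid under the alignment policy.  Returns the reported subset,
    or [] when the -m ceiling suppresses the read.
    """
    if strata and not best:
        raise ValueError("--strata requires --best")
    candidates = list(alignments)
    if best:
        # Stratum takes priority over quality: sort by stratum, then quality.
        candidates.sort(key=lambda a: (a[0], a[1]))
    if strata and candidates:
        top = candidates[0][0]
        candidates = [a for a in candidates if a[0] == top]
    if m is not None and len(candidates) > m:
        return []
    return candidates[:k]


# Made-up alignments: two in stratum 1, one in stratum 2.
hits = [(2, 20, "c"), (1, 40, "a"), (1, 60, "b")]
print(report(hits, k=3, best=True))
print(report(hits, k=3, best=True, strata=True))
print(report(hits, k=3, m=1, best=True, strata=True))
```

In this model, `--best` alone only reorders the output, `--strata` drops the
stratum-2 alignment, and `-m 1` then suppresses the read because two
alignments remain in the best stratum.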
 982
 983     Output
 984
 985     -t/--time
 986
 987 Print the amount of wall-clock time taken by each phase.
 988
 989     -B/--offbase <int>
 990
 991 When outputting alignments, number the first base of a reference
 992 sequence as `<int>`.  Default: 0.
 993
 994     --quiet
 995
 996 Print nothing besides alignments.
 997
 998     --refout
 999
1000 Write alignments to a set of files named `refXXXXX.map`, where `XXXXX`
1001 is the 0-padded index of the reference sequence aligned to.  This can
1002 be a useful way to break up work for downstream analyses when dealing
1003 with, for example, large numbers of reads aligned to the assembled
1004 human genome.  If `<hit>` is also specified, it will be ignored.
1005
1006     --refidx
1007
1008 When a reference sequence is referred to in a reported alignment, refer
1009 to it by 0-based index (its offset into the list of references that
1010 were indexed) rather than by name.
1011
1012     --al <filename>
1013
1014 Write all reads for which at least one alignment was reported to a file
1015 with name `<filename>`.  Written reads will appear as they did in the
1016 input, without any of the trimming or translation of quality values
1017 that may have taken place within `bowtie`.  Paired-end reads will be
1018 written to two parallel files with `_1` and `_2` inserted in the
1019 filename, e.g., if `<filename>` is `aligned.fq`, the #1 and #2 mates
1020 that align will be written to `aligned_1.fq` and `aligned_2.fq`
1021 respectively.
1022
1023     --un <filename>
1024
1025 Write all reads that could not be aligned to a file with name
1026 `<filename>`.  Written reads will appear as they did in the input,
1027 without any of the trimming or translation of quality values that may
1028 have taken place within Bowtie.  Paired-end reads will be written to
1029 two parallel files with `_1` and `_2` inserted in the filename, e.g.,
1030 if `<filename>` is `unaligned.fq`, the #1 and #2 mates that fail to
1031 align will be written to `unaligned_1.fq` and `unaligned_2.fq`
1032 respectively.  Unless `--max` is also specified, reads with a number
1033 of valid alignments exceeding the limit set with the `-m` option are
1034 also written to `<filename>`.
1035
1036     --max <filename>
1037
1038 Write all reads with a number of valid alignments exceeding the limit
1039 set with the `-m` option to a file with name `<filename>`.  Written
1040 reads will appear as they did in the input, without any of the trimming
1041 or translation of quality values that may have taken place within
1042 `bowtie`.  Paired-end reads will be written to two parallel files with
1043 `_1` and `_2` inserted in the filename, e.g., if `<filename>` is
1044 `max.fq`, the #1 and #2 mates that exceed the `-m` limit will be
1045 written to `max_1.fq` and `max_2.fq` respectively.  These reads are not
1046 written to the file specified with `--un`.
1047
1048     --suppress <cols>
1049
1050 Suppress columns of output in the [default output mode].  E.g. if
1051 `--suppress 1,5,6` is specified, the read name, read sequence, and read
1052 quality fields will be omitted.  See [Default Bowtie output] for field
1053 descriptions.  This option is ignored if the output mode is
1054 `-S`/`--sam`.
1055
1056     --fullref
1057
1058 Print the full reference sequence name, including whitespace, in
1059 alignment output.  By default `bowtie` prints everything up to but not
1060 including the first whitespace.
1061
1062     Colorspace
1063
1064     --snpphred <int>
1065
1066 When decoding colorspace alignments, use `<int>` as the SNP penalty.
1067 This should be set to the user's best guess of the true ratio of SNPs
1068 per base in the subject genome, converted to the [Phred quality] scale.
1069 E.g., if the user expects about 1 SNP every 1,000 positions,
1070 `--snpphred` should be set to 30 (which is also the default).  To
1071 specify the fraction directly, use `--snpfrac`.
1072
1073     --snpfrac <dec>
1074
1075 When decoding colorspace alignments, use `<dec>` as the estimated ratio
1076 of SNPs per base.  For best decoding results, this should be set to the
1077 user's best guess of the true ratio.  `bowtie` internally converts the
1078 ratio to a [Phred quality], and behaves as if that quality had been set
1079 via the `--snpphred` option.  Default: 0.001.
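
The relationship between `--snpfrac` and `--snpphred` is the usual Phred
conversion, sketched below in Python.  The function names are hypothetical,
and rounding to an integer (because `--snpphred` takes an `<int>`) is an
assumption of the sketch.

```python
import math


def snpfrac_to_phred(frac):
    """Convert a SNPs-per-base ratio to the Phred scale: -10 * log10(frac)."""
    return round(-10 * math.log10(frac))


def phred_to_snpfrac(phred):
    """Inverse conversion: 10 ** (-phred / 10)."""
    return 10 ** (-phred / 10)


# 1 SNP per 1,000 positions corresponds to the default of 30.
print(snpfrac_to_phred(0.001))
print(phred_to_snpfrac(30))
```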
1080
1081     --col-cseq
1082
1083 If reads are in colorspace and the [default output mode] is active,
1084 `--col-cseq` causes the reads' color sequence to appear in the
1085 read-sequence column (column 5) instead of the decoded nucleotide
1086 sequence.  See the [Decoding colorspace alignments] section for details
1087 about decoding.  This option is ignored in `-S`/`--sam` mode.
1088
1089     --col-cqual
1090
1091 If reads are in colorspace and the [default output mode] is active,
1092 `--col-cqual` causes the reads' original (color) quality sequence to
1093 appear in the quality column (column 6) instead of the decoded
1094 qualities.  See the [Colorspace alignment] section for details about
1095 decoding.  This option is ignored in `-S`/`--sam` mode.
1096
1097     --col-keepends
1098
1099 When decoding colorspace alignments, `bowtie` trims off a nucleotide
1100 and quality from the left and right edges of the alignment.  This is
1101 because those nucleotides are supported by only one color, in contrast
1102 to the middle nucleotides which are supported by two.  Specify
1103 `--col-keepends` to keep the extreme-end nucleotides and qualities.
1104
1105     SAM
1106
1107     -S/--sam
1108
1109 Print alignments in [SAM] format.  See the [SAM output] section of the
1110 manual for details.  To suppress all SAM headers, use `--sam-nohead`
1111 in addition to `-S/--sam`.  To suppress just the `@SQ` headers (e.g. if
1112 the alignment is against a very large number of reference sequences),
1113 use `--sam-nosq` in addition to `-S/--sam`.  `bowtie` does not write
1114 BAM files directly, but SAM output can be converted to BAM on the fly
1115 by piping `bowtie`'s output to `samtools view`.  `-S`/`--sam` is not
1116 compatible with `--refout`.
1117
1118     --mapq <int>
1119
1120 If an alignment is non-repetitive (according to `-m`, `--strata` and
1121 other options) set the `MAPQ` (mapping quality) field to this value.
1122 See the [SAM Spec][SAM] for details about the `MAPQ` field.  Default: 255.
1123
1124     --sam-nohead
1125
1126 Suppress header lines (starting with `@`) when output is `-S`/`--sam`.
1127 This must be specified *in addition to* `-S`/`--sam`.  `--sam-nohead`
1128 is ignored unless `-S`/`--sam` is also specified.
1129
1130     --sam-nosq
1131
1132 Suppress `@SQ` header lines when output is `-S`/`--sam`.  This must be
1133 specified *in addition to* `-S`/`--sam`.  `--sam-nosq` is ignored
1134 unless `-S`/`--sam` is also specified.
1135
1136     --sam-RG <text>
1137
1138 Add `<text>` (usually of the form `TAG:VAL`, e.g. `ID:IL7LANE2`) as a
1139 field on the `@RG` header line.  Specify `--sam-RG` multiple times to
1140 set multiple fields.  See the [SAM Spec][SAM] for details about what fields
1141 are legal.  Note that, if any `@RG` fields are set using this option,
1142 the `ID` and `SM` fields must both be among them to make the `@RG` line
1143 legal according to the [SAM Spec][SAM].  `--sam-RG` is ignored unless
1144 `-S`/`--sam` is also specified.
1145
1146     Performance
1147
1148     -o/--offrate <int>
1149
1150 Override the offrate of the index with `<int>`.  If `<int>` is greater
1151 than the offrate used to build the index, then some row markings are
1152 discarded when the index is read into memory.  This reduces the memory
1153 footprint of the aligner but requires more time to calculate text
1154 offsets.  `<int>` must be greater than the value used to build the
1155 index.
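
The memory effect of raising `-o`/`--offrate` can be estimated with a
back-of-the-envelope Python sketch.  It assumes that one row in every
2^offrate is marked and that each stored offset takes 4 bytes; both figures,
the 3-billion-base reference length, and the function names are assumptions
for illustration, not values stated in this section.

```python
def marked_rows(reference_len, offrate):
    """Approximate number of marked rows: one in every 2**offrate."""
    return reference_len // (2 ** offrate)


def sample_bytes(reference_len, offrate, bytes_per_offset=4):
    """Approximate memory used by the suffix-array sample."""
    return marked_rows(reference_len, offrate) * bytes_per_offset


genome_len = 3_000_000_000  # hypothetical reference length
for offrate in (5, 6, 7):
    mb = sample_bytes(genome_len, offrate) / 1e6
    print(f"--offrate {offrate}: about {mb:.0f} MB for the sample")
```

Under these assumptions each increment of the offrate halves the memory spent
on the sample, at the cost of more work per reported alignment.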
1156
1157     -p/--threads <int>
1158
1159 Launch `<int>` parallel search threads (default: 1).  Threads will run
1160 on separate processors/cores and synchronize when parsing reads and
1161 outputting alignments.  Searching for alignments is highly parallel,
1162 and speedup is fairly close to linear.  This option is only available
1163 if `bowtie` is linked with the `pthreads` library (i.e. if
1164 `BOWTIE_PTHREADS=0` is not specified at build time).
1165
1166     --mm
1167
1168 Use memory-mapped I/O to load the index, rather than normal C file I/O.
1169 Memory-mapping the index allows many concurrent `bowtie` processes on
1170 the same computer to share the same memory image of the index (i.e. you
1171 pay the memory overhead just once).  This facilitates memory-efficient
1172 parallelization of `bowtie` in situations where using `-p` is not
1173 possible.
1174
1175     --shmem
1176
1177 Use shared memory to load the index, rather than normal C file I/O.
1178 Using shared memory allows many concurrent bowtie processes on the same
1179 computer to share the same memory image of the index (i.e. you pay the
1180 memory overhead just once).  This facilitates memory-efficient
1181 parallelization of `bowtie` in situations where using `-p` is not
1182 desirable.  Unlike `--mm`, `--shmem` installs the index into shared
1183 memory permanently, or until the user deletes the shared memory chunks
1184 manually.  See your operating system documentation for details on how
1185 to manually list and remove shared memory chunks (on Linux and Mac OS
1186 X, these commands are `ipcs` and `ipcrm`).  You may also need to
1187 increase your OS's maximum shared-memory chunk size to accommodate
1188 larger indexes; see your OS documentation.
1189
1190     Other
1191
1192     --seed <int>
1193
1194 Use `<int>` as the seed for the pseudo-random number generator.
1195
1196     --verbose
1197
1198 Print verbose output (for debugging).
1199
1200     --version
1201
1202 Print version information and quit.
1203
1204     -h/--help
1205
1206 Print usage information and quit.
1207
1208 Default `bowtie` output
1209 -----------------------
1210
1211 `bowtie` outputs one alignment per line.  Each line is a collection of
1212 8 fields separated by tabs; from left to right, the fields are:
1213
1214 1.  Name of read that aligned
1215
1216 2.  Reference strand aligned to, `+` for forward strand, `-` for
1217     reverse
1218
1219 3.  Name of reference sequence where alignment occurs, or numeric ID if
1220     no name was provided
1221
1222 4.  0-based offset into the forward reference strand where leftmost
1223     character of the alignment occurs
1224
1225 5.  Read sequence (reverse-complemented if orientation is `-`).
1226
1227     If the read was in colorspace, then the sequence shown in this
1228     column is the sequence of *decoded nucleotides*, not the original
1229     colors.  See the [Colorspace alignment] section for details about
1230     decoding.  To display colors instead, use the `--col-cseq` option.
1231
1232 6.  ASCII-encoded read qualities (reversed if orientation is `-`).  The
1233     encoded quality values are on the Phred scale and the encoding is
1234     ASCII-offset by 33 (ASCII char `!`).
1235
1236     If the read was in colorspace, then the qualities shown in this
1237     column are the *decoded qualities*, not the original qualities.
1238     See the [Colorspace alignment] section for details about decoding.
1239     To display the original qualities instead, use the `--col-cqual` option.
1240
1241 7.  If `-M` was specified and the prescribed ceiling was exceeded for
1242     this read, this column contains the value of the ceiling,
1243     indicating that at least that many valid alignments were found in
1244     addition to the one reported.
1245
1246     Otherwise, this column contains the number of other instances where
1247     the same sequence aligned against the same reference characters as
1248     were aligned against in the reported alignment.  This is *not* the
1249     number of other places the read aligns with the same number of
1250     mismatches.  The number in this column is generally not a good
1251     proxy for that number (e.g., the number in this column may be '0'
1252     while the number of other alignments with the same number of
1253     mismatches might be large).
1254
1255 8.  Comma-separated list of mismatch descriptors.  If there are no
1256     mismatches in the alignment, this field is empty.  A single
1257     descriptor has the format offset:reference-base>read-base.  The
1258     offset is expressed as a 0-based offset from the high-quality (5')
1259     end of the read.
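
The eight fields listed above can be read with a few lines of Python.  This is
a minimal sketch that assumes no columns were removed with `--suppress`; the
names `Alignment`, `parse_mismatches` and `parse_default_line` are
hypothetical, and the sample line is made up rather than real `bowtie` output.

```python
from collections import namedtuple

Alignment = namedtuple(
    "Alignment",
    "read_name strand ref_name offset sequence qualities other_count mismatches",
)


def parse_mismatches(field):
    """Parse 'offset:ref>read' descriptors, e.g. '5:A>G,12:C>T'."""
    result = []
    if not field:
        return result
    for desc in field.split(","):
        offset, bases = desc.split(":")
        ref_base, read_base = bases.split(">")
        result.append((int(offset), ref_base, read_base))
    return result


def parse_default_line(line):
    """Split one line of default output into its 8 tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    # The mismatch field is empty for exact matches; pad in case it was dropped.
    fields += [""] * (8 - len(fields))
    name, strand, ref, offset, seq, qual, other, mms = fields[:8]
    return Alignment(name, strand, ref, int(offset), seq, qual,
                     int(other), parse_mismatches(mms))


# Made-up example line with one mismatch at read offset 2.
example = "read_1\t-\tchr_demo\t1042\tGATTACA\tIIIIII?\t0\t2:C>T\n"
aln = parse_default_line(example)
print(aln.ref_name, aln.offset, aln.mismatches)
```

Only the trailing newline is stripped before splitting, so an empty mismatch
field at the end of the line is preserved rather than lost.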
1260
1261 SAM `bowtie` output
1262 -------------------
1263
1264 Following is a brief description of the [SAM] format as output by
1265 `bowtie` when the `-S`/`--sam` option is specified.  For more
1266 details, see the [SAM format specification][SAM].
1267
1268 When `-S`/`--sam` is specified, `bowtie` prints a SAM header with
1269 `@HD`, `@SQ` and `@PG` lines.  When one or more `--sam-RG` arguments
1270 are specified, `bowtie` will also print an `@RG` line that includes all
1271 user-specified `--sam-RG` tokens separated by tabs.
1272
1273 Each subsequent line corresponds to a read or an alignment.  Each line
1274 is a collection of at least 12 fields separated by tabs; from left to
1275 right, the fields are:
1276
1277 1.  Name of read that aligned
1278
1279 2.  Sum of all applicable flags.  Flags relevant to Bowtie are:
1280
1281         1
1282
1283     The read is one of a pair
1284
1285         2
1286
1287     The alignment is one end of a proper paired-end alignment
1288
1289         4
1290
1291     The read has no reported alignments
1292
1293         8
1294
1295     The read is one of a pair and has no reported alignments
1296
1297         16
1298
1299     The alignment is to the reverse reference strand
1300
1301         32
1302
1303     The other mate in the paired-end alignment is aligned to the
1304     reverse reference strand
1305
1306         64
1307
1308     The read is the first mate in a pair
1309
1310         128
1311
1312     The read is the second mate in a pair
1313
1314     Thus, an unpaired read that aligns to the reverse reference strand
1315     will have flag 16.  A first mate that aligns to the reverse strand
1316     in a proper paired-end alignment will have flag 83 (= 64 + 16 + 2 + 1).
1317
1318 3.  Name of reference sequence where alignment occurs, or ordinal ID
1319     if no name was provided
1320
1321 4.  1-based offset into the forward reference strand where leftmost
1322     character of the alignment occurs
1323
1324 5.  Mapping quality
1325
1326 6.  CIGAR string representation of alignment
1327
1328 7.  Name of reference sequence where mate's alignment occurs.  Set to
1329     `=` if the mate's reference sequence is the same as this
1330     alignment's, or `*` if there is no mate.
1331
1332 8.  1-based offset into the forward reference strand where leftmost
1333     character of the mate's alignment occurs.  Offset is 0 if there is
1334     no mate.
1335
1336 9.  Inferred insert size.  Size is negative if the mate's alignment
1337     occurs upstream of this alignment.  Size is 0 if there is no mate.
1338
1339 10. Read sequence (reverse-complemented if aligned to the reverse
1340     strand)
1341
1342 11. ASCII-encoded read qualities (reversed if the read
1343     aligned to the reverse strand).  The encoded quality values are on
1344     the [Phred quality] scale and the encoding is ASCII-offset by 33
1345     (ASCII char `!`), similarly to a [FASTQ] file.
1346
12. Optional fields.  Fields are tab-separated.  For descriptions of
    all possible optional fields, see the [SAM format specification].
    `bowtie` outputs some of these optional fields for each alignment,
    depending on the type of the alignment:
1351
1352         NM:i:<N>
1353
1354     Aligned read has an edit distance of `<N>`.
1355
1356         CM:i:<N>
1357
1358     Aligned read has an edit distance of `<N>` in colorspace.  This
1359     field is present in addition to the `NM` field in `-C`/`--color`
1360     mode, but is omitted otherwise.
1361
1362         MD:Z:<S>
1363
1364     For aligned reads, `<S>` is a string representation of the
1365     mismatched reference bases in the alignment.  See [SAM] format
1366     specification for details.  For colorspace alignments, `<S>`
1367     describes the decoded *nucleotide* alignment, not the colorspace
1368     alignment.
1369
1370         XA:i:<N>
1371
1372     Aligned read belongs to stratum `<N>`.  See [Strata] for definition.
1373
1374         XM:i:<N>
1375
1376     For a read with no reported alignments, `<N>` is 0 if the read had
1377     no alignments.  If `-m` was specified and the read's alignments
    were suppressed because the `-m` ceiling was exceeded, `<N>` equals
1379     the `-m` ceiling + 1, to indicate that there were at least that
1380     many valid alignments (but all were suppressed).  In `-M` mode, if
1381     the alignment was randomly selected because the `-M` ceiling was
1382     exceeded, `<N>` equals the `-M` ceiling + 1, to indicate that there
1383     were at least that many valid alignments (of which one was reported
1384     at random).
1385
1386 [SAM format specification]: http://samtools.sf.net/SAM1.pdf
1387 [FASTQ]: http://en.wikipedia.org/wiki/FASTQ_format
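
The FLAGS values and optional fields described above can be decoded with
a few lines of Python.  This is a minimal sketch and not part of Bowtie:
the function names are hypothetical, and the meanings of bits 1 and 2
are taken from the SAM specification.

```python
# Bit meanings as listed above; bits 1 and 2 follow the SAM specification.
FLAG_BITS = {
    1: "read is one of a pair",
    2: "alignment is one end of a proper paired-end alignment",
    4: "read has no reported alignments",
    8: "read is one of a pair and has no reported alignments",
    16: "alignment is to the reverse reference strand",
    32: "other mate is aligned to the reverse reference strand",
    64: "read is the first mate in a pair",
    128: "read is the second mate in a pair",
}


def decode_flags(flags):
    """Return the description of every bit set in an integer FLAGS value."""
    return [desc for bit, desc in FLAG_BITS.items() if flags & bit]


def parse_optional_fields(fields):
    """Turn ['NM:i:1', 'MD:Z:10A5'] into {'NM': 1, 'MD': '10A5'}."""
    parsed = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        # Type 'i' is an integer; everything else is kept as a string.
        parsed[tag] = int(value) if typ == "i" else value
    return parsed


def parse_sam_line(line):
    """Split one tab-separated alignment line into a few named parts."""
    cols = line.rstrip("\n").split("\t")
    return {
        "flags": decode_flags(int(cols[1])),
        "reference": cols[2],
        "offset": int(cols[3]),  # 1-based
        "optional": parse_optional_fields(cols[11:]),
    }
```

For example, `decode_flags(16)` returns the single description for the
reverse-strand bit, and `decode_flags(83)` returns the four descriptions
for bits 1, 2, 16 and 64.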
1388
1389 The `bowtie-build` indexer
1390 ==========================
1391
1392 `bowtie-build` builds a Bowtie index from a set of DNA sequences.
1393 `bowtie-build` outputs a set of 6 files with suffixes
1394 `.1.ebwt`, `.2.ebwt`, `.3.ebwt`, `.4.ebwt`, `.rev.1.ebwt`, and
1395 `.rev.2.ebwt`.  These files together constitute the index: they are all
1396 that is needed to align reads to that reference.  The original sequence
1397 files are no longer used by Bowtie once the index is built.
1398
Use of Karkkainen's [blockwise algorithm] allows `bowtie-build` to
trade off between running time and memory usage. `bowtie-build` has
three options governing how it makes this trade: `-p`/`--packed`,
`--bmax`/`--bmaxdivn`, and `--dcv`.  By default, `bowtie-build` will
automatically search for the settings that yield the best running time
without exhausting memory.  This behavior can be disabled using the
`-a`/`--noauto` option.
1406
1407 The indexer provides options pertaining to the "shape" of the index,
1408 e.g. `--offrate` governs the fraction of [Burrows-Wheeler] rows that
1409 are "marked" (i.e., the density of the suffix-array sample; see the
1410 original [FM Index] paper for details).  All of these options are
1411 potentially profitable trade-offs depending on the application.  They
1412 have been set to defaults that are reasonable for most cases according
1413 to our experiments.  See [Performance Tuning] for details.
1414
1415 Because `bowtie-build` uses 32-bit pointers internally, it can handle
1416 up to a theoretical maximum of 2^32-1 (somewhat more than 4 billion)
1417 characters in an index, though, with other constraints, the actual
1418 ceiling is somewhat less than that.  If your reference exceeds 2^32-1
1419 characters, `bowtie-build` will print an error message and abort.  To
1420 resolve this, divide your reference sequences into smaller batches
1421 and/or chunks and build a separate index for each.
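
To check ahead of time whether a reference is likely to fit, one can
total the sequence characters in the FASTA input and compare against the
theoretical ceiling.  The sketch below is illustrative only: the helper
names are hypothetical, and because the practical ceiling is somewhat
lower than 2^32-1, passing this check does not guarantee that
`bowtie-build` will succeed.

```python
MAX_INDEX_CHARS = 2**32 - 1  # theoretical ceiling stated above


def fasta_total_length(lines):
    """Count sequence characters in an iterable of FASTA lines."""
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and '>' header lines; count everything else.
        if line and not line.startswith(">"):
            total += len(line)
    return total


def fits_in_one_index(total_chars, limit=MAX_INDEX_CHARS):
    """True if the character count does not exceed the theoretical limit."""
    return total_chars <= limit
```

A file can be checked with
`fits_in_one_index(fasta_total_length(open("chr1.fa")))`, where
`chr1.fa` stands in for your own reference file.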
1422
1423 If your computer has more than 3-4 GB of memory and you would like to
1424 exploit that fact to make index building faster, use a 64-bit version
1425 of the `bowtie-build` binary.  The 32-bit version of the binary is
restricted to using less than 4 GB of memory.  If a 64-bit pre-built
binary does not yet exist for your platform on the SourceForge download
site, you will need to build one from source.
1429
1430 The Bowtie index is based on the [FM Index] of Ferragina and Manzini,
1431 which in turn is based on the [Burrows-Wheeler] transform.  The
1432 algorithm used to build the index is based on the [blockwise algorithm]
1433 of Karkkainen.
1434
1435 [Blockwise algorithm]: http://portal.acm.org/citation.cfm?id=1314852
1436 [FM Index]: http://portal.acm.org/citation.cfm?id=796543
1437 [Burrows-Wheeler]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
1438
1439 Command Line
1440 ------------
1441
1442 Usage:
1443
1444     bowtie-build [options]* <reference_in> <ebwt_base>
1445
1446   Main arguments
1447
1448     <reference_in>
1449
1450 A comma-separated list of FASTA files containing the reference
1451 sequences to be aligned to, or, if `-c` is specified, the sequences
1452 themselves. E.g., `<reference_in>` might be
1453 `chr1.fa,chr2.fa,chrX.fa,chrY.fa`, or, if `-c` is specified, this might
1454 be `GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA`.
1455
1456     <ebwt_base>
1457
1458 The basename of the index files to write.  By default, `bowtie-build`
1459 writes files named `NAME.1.ebwt`, `NAME.2.ebwt`, `NAME.3.ebwt`,
1460 `NAME.4.ebwt`, `NAME.rev.1.ebwt`, and `NAME.rev.2.ebwt`, where `NAME`
1461 is `<ebwt_base>`.
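
When driving `bowtie-build` from a script, the two main arguments map
directly onto an argument list.  The following sketch (helper names are
hypothetical) assembles that list and names the six files the indexer
is expected to write by default, according to the description above.

```python
INDEX_SUFFIXES = [
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
]


def build_command(fasta_files, ebwt_base):
    """Argument list for: bowtie-build <reference_in> <ebwt_base>."""
    # <reference_in> is a comma-separated list of FASTA files.
    return ["bowtie-build", ",".join(fasta_files), ebwt_base]


def expected_index_files(ebwt_base):
    """Names of the index files written for the given basename."""
    return [ebwt_base + suffix for suffix in INDEX_SUFFIXES]
```

The list returned by `build_command(["chr1.fa", "chr2.fa"], "hg")` can
be passed to `subprocess.run(..., check=True)`; afterwards the files in
`expected_index_files("hg")` should exist.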
1462
1463   Options
1464
1465     -f
1466
1467 The reference input files (specified as `<reference_in>`) are FASTA
1468 files (usually having extension `.fa`, `.mfa`, `.fna` or similar).
1469
1470     -c
1471
The reference sequences are given on the command line.  I.e.,
`<reference_in>` is a comma-separated list of sequences rather than a
list of FASTA files.
1475
1476     -C/--color
1477
1478 Build a colorspace index, to be queried using `bowtie` `-C`.
1479
1480     -a/--noauto
1481
Disable the default behavior whereby `bowtie-build` automatically
selects values for the `--bmax`, `--dcv` and `--packed` parameters
according to available memory.  Instead, the user may specify values
for those parameters.  If memory is exhausted during indexing, an error
message will be printed; it is up to the user to try new parameters.
1487
1488     -p/--packed
1489
1490 Use a packed (2-bits-per-nucleotide) representation for DNA strings.
1491 This saves memory but makes indexing 2-3 times slower.  Default: off.
1492 This is configured automatically by default; use `-a`/`--noauto` to
1493 configure manually.
1494
1495     --bmax <int>
1496
1497 The maximum number of suffixes allowed in a block.  Allowing more
1498 suffixes per block makes indexing faster, but increases peak memory
1499 usage.  Setting this option overrides any previous setting for
`--bmax` or `--bmaxdivn`.  Default (in terms of the `--bmaxdivn`
1501 parameter) is `--bmaxdivn` 4.  This is configured automatically by
1502 default; use `-a`/`--noauto` to configure manually.
1503
1504     --bmaxdivn <int>
1505
1506 The maximum number of suffixes allowed in a block, expressed as a
1507 fraction of the length of the reference.  Setting this option overrides
any previous setting for `--bmax` or `--bmaxdivn`.  Default:
1509 `--bmaxdivn` 4.  This is configured automatically by default; use
1510 `-a`/`--noauto` to configure manually.
1511
1512     --dcv <int>
1513
1514 Use `<int>` as the period for the difference-cover sample.  A larger
1515 period yields less memory overhead, but may make suffix sorting slower,
1516 especially if repeats are present.  Must be a power of 2 no greater
1517 than 4096.  Default: 1024.  This is configured automatically by
1518 default; use `-a`/`--noauto` to configure manually.
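
When setting `--bmaxdivn` and `--dcv` by hand with `-a`/`--noauto`, the
constraints above are easy to check before starting a long indexing
run.  The sketch below uses hypothetical helper names and assumes that
the block ceiling implied by `--bmaxdivn` is the reference length
divided by `<int>` and rounded down; the exact rounding used by
`bowtie-build` is not stated in this manual.

```python
def is_valid_dcv(period):
    """True if period is a power of 2 no greater than 4096."""
    # A positive integer is a power of 2 when exactly one bit is set.
    return 0 < period <= 4096 and (period & (period - 1)) == 0


def bmax_from_divn(reference_length, bmaxdivn=4):
    """Block ceiling implied by --bmaxdivn (assumes floor division)."""
    return max(1, reference_length // bmaxdivn)
```

With the default `--bmaxdivn` 4, a 1000-character reference would allow
up to `bmax_from_divn(1000)` = 250 suffixes per block under this
assumption.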
1519
1520     --nodc
1521
1522 Disable use of the difference-cover sample.  Suffix sorting becomes
1523 quadratic-time in the worst case (where the worst case is an extremely
1524 repetitive reference).  Default: off.
1525
1526     -r/--noref
1527
1528 Do not build the `NAME.3.ebwt` and `NAME.4.ebwt` portions of the index,
1529 which contain a bitpacked version of the reference sequences and are
1530 used for paired-end alignment.
1531
1532     -3/--justref
1533
1534 Build *only* the `NAME.3.ebwt` and `NAME.4.ebwt` portions of the index,
1535 which contain a bitpacked version of the reference sequences and are
1536 used for paired-end alignment.
1537
1538     -o/--offrate <int>
1539
1540 To map alignments back to positions on the reference sequences, it's
1541 necessary to annotate ("mark") some or all of the [Burrows-Wheeler]
rows with their corresponding location on the genome.  `-o`/`--offrate`
governs how many rows get marked: the indexer will mark one row in
every 2^`<int>`.  Marking more rows makes reference-position lookups
faster, but requires more memory to hold the annotations at runtime.
The default is 5 (every 32nd row is marked; for the human genome,
annotations occupy about 340 megabytes).
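
The memory cost of a given `--offrate` can be estimated with simple
arithmetic.  The sketch below is an approximation under stated
assumptions, not Bowtie's own accounting: it assumes each marked row
stores one 4-byte (32-bit) offset, and the function names are
hypothetical.

```python
def marked_rows(index_rows, offrate=5):
    """Rows marked when one row in every 2**offrate is marked."""
    step = 2 ** offrate
    return -(-index_rows // step)  # ceiling division


def annotation_bytes(index_rows, offrate=5, bytes_per_mark=4):
    """Approximate bytes needed to hold the annotations in memory."""
    return marked_rows(index_rows, offrate) * bytes_per_mark
```

For an index of roughly 2.9 billion rows at the default offrate,
`annotation_bytes(2_900_000_000)` gives 362,500,000 bytes, which is in
the same range as the human-genome figure quoted above.  Each increment
of `<int>` halves the estimate.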
1548
1549     -t/--ftabchars <int>
1550
1551 The ftab is the lookup table used to calculate an initial
1552 [Burrows-Wheeler] range with respect to the first `<int>` characters
1553 of the query.  A larger `<int>` yields a larger lookup table but faster
1554 query times.  The ftab has size 4^(`<int>`+1) bytes.  The default
1555 setting is 10 (ftab is 4MB).
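
The size formula above grows by a factor of 4 for each additional
character, as this short sketch (with a hypothetical function name)
shows:

```python
def ftab_bytes(ftabchars=10):
    """Size of the ftab in bytes: 4^(ftabchars + 1)."""
    return 4 ** (ftabchars + 1)


for chars in (8, 10, 12):
    print(chars, ftab_bytes(chars) / 2**20, "MiB")
```

At the default of 10 the table is 4,194,304 bytes (4 MiB); at 12 it is
64 MiB.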
1556
1557     --ntoa
1558
1559 Convert Ns in the reference sequence to As before building the index.
1560 By default, Ns are simply excluded from the index and `bowtie` will not
1561 report alignments that overlap them.
1562
1563     --big --little
1564
1565 Endianness to use when serializing integers to the index file.
1566 Default: little-endian (recommended for Intel- and AMD-based
1567 architectures).
1568
1569     --seed <int>
1570
Use `<int>` as the seed for the pseudo-random number generator.
1572
1573     --cutoff <int>
1574
1575 Index only the first `<int>` bases of the reference sequences
1576 (cumulative across sequences) and ignore the rest.
1577
1578     -q/--quiet
1579
1580 `bowtie-build` is verbose by default.  With this option `bowtie-build`
1581 will print only error messages.
1582
1583     -h/--help
1584
1585 Print usage information and quit.
1586
1587     --version
1588
1589 Print version information and quit.
1590
1591 The `bowtie-inspect` index inspector
1592 ====================================
1593
1594 `bowtie-inspect` extracts information from a Bowtie index about what
1595 kind of index it is and what reference sequences were used to build it.
1596 When run without any options, the tool will output a FASTA file
1597 containing the sequences of the original references (with all
1598 non-`A`/`C`/`G`/`T` characters converted to `N`s).  It can also be used
1599 to extract just the reference sequence names using the `-n`/`--names`
1600 option or a more verbose summary using the `-s`/`--summary` option.
1601
1602 Command Line
1603 ------------
1604
1605 Usage:
1606
1607     bowtie-inspect [options]* <ebwt_base>
1608
1609   Main arguments
1610
1611     <ebwt_base>
1612
The basename of the index to be inspected.  The basename is the name of any
1614 of the index files but with the `.X.ebwt` or `.rev.X.ebwt` suffix
1615 omitted.  `bowtie-inspect` first looks in the current directory for the
1616 index files, then looks in the `indexes` subdirectory under the
1617 directory where the currently-running `bowtie` executable is located,
1618 then looks in the directory specified in the `BOWTIE_INDEXES`
1619 environment variable.
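
The basename rule and search order described above can be sketched as
follows.  The helper names are hypothetical and the code only mirrors
the description in this manual; it does not reproduce the tool's actual
lookup logic.

```python
import os
import re

# Matches a trailing .X.ebwt or .rev.X.ebwt, where X is a number.
_INDEX_SUFFIX = re.compile(r"(\.rev)?\.\d+\.ebwt$")


def index_basename(filename):
    """Strip the .X.ebwt or .rev.X.ebwt suffix from an index file name."""
    return _INDEX_SUFFIX.sub("", filename)


def candidate_dirs(bowtie_dir, environ):
    """Directories to search, in the order given above."""
    dirs = [".", os.path.join(bowtie_dir, "indexes")]
    if environ.get("BOWTIE_INDEXES"):
        dirs.append(environ["BOWTIE_INDEXES"])
    return dirs
```

For example, both `index_basename("hg18.1.ebwt")` and
`index_basename("hg18.rev.2.ebwt")` return `hg18`, which is the value
to pass as `<ebwt_base>`.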
1620
1621   Options
1622
1623     -a/--across <int>
1624
1625 When printing FASTA output, output a newline character every `<int>`
1626 bases (default: 60).
1627
1628     -n/--names
1629
1630 Print reference sequence names, one per line, and quit.
1631
1632     -s/--summary
1633
1634 Print a summary that includes information about index settings, as well
1635 as the names and lengths of the input sequences.  The summary has this
1636 format:
1637
1638     Colorspace  <0 or 1>
1639     SA-Sample   1 in <sample>
1640     FTab-Chars  <chars>
1641     Sequence-1  <name>  <len>
1642     Sequence-2  <name>  <len>
1643     ...
1644     Sequence-N  <name>  <len>
1645
1646 Fields are separated by tabs.
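
Because the summary is tab-separated, it is straightforward to parse.
The following is a minimal sketch with a hypothetical function name; it
assumes that the `1 in <sample>` text occupies a single tab-delimited
field.

```python
def parse_summary(text):
    """Parse the output of `bowtie-inspect -s` into a dictionary."""
    summary = {"sequences": []}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        key = fields[0].strip()
        if key.startswith("Sequence-"):
            # Sequence-N  <name>  <len>
            summary["sequences"].append((fields[1], int(fields[2])))
        elif key == "Colorspace":
            summary["colorspace"] = fields[1].strip() == "1"
        elif key == "SA-Sample":
            # "1 in 32" -> 32
            summary["sa_sample"] = int(fields[1].split()[-1])
        elif key == "FTab-Chars":
            summary["ftab_chars"] = int(fields[1])
    return summary
```

The text to parse could be captured with, for instance,
`subprocess.run(["bowtie-inspect", "-s", base], capture_output=True,
text=True).stdout`, where `base` is your index basename.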
1647
1648     -e/--ebwt-ref
1649
1650 By default, when `bowtie-inspect` is run without `-s` or `-n`, it
1651 recreates the reference nucleotide sequences using the bit-encoded
1652 reference nucleotides kept in the `.3.ebwt` and `.4.ebwt` index files.
1653 When `-e/--ebwt-ref` is specified, `bowtie-inspect` recreates the
1654 reference sequences from the Burrows-Wheeler-transformed reference
1655 sequence in the `.1.ebwt` file instead.  The reference recreation
1656 process is much slower when `-e/--ebwt-ref` is specified.  Also, when
1657 `-e/--ebwt-ref` is specified and the index is in colorspace, the
1658 reference is printed in colors (A=blue, C=green, G=orange, T=red).
1659
1660     -v/--verbose
1661
1662 Print verbose output (for debugging).
1663
1664     --version
1665
1666 Print version information and quit.
1667
1668     -h/--help
1669
1670 Print usage information and quit.
1671