tabix(1)                     Bioinformatics tools                     tabix(1)



NAME
       bgzip - Block compression/decompression utility

       tabix - Generic indexer for TAB-delimited genome position files

SYNOPSIS
       bgzip [-cdh] [-b virtualOffset] [-s size] [file]

       tabix [-0] [-p gff|bed|sam|vcf|psltab] [-s seqCol] [-b begCol]
       [-e endCol] [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2
       [...]]]


DESCRIPTION
       Tabix indexes a TAB-delimited genome position file in.tab.bgz and
       creates an index file in.tab.bgz.tbi when no region is given on the
       command line. The input data file must be sorted by sequence name and
       then by position, and compressed by bgzip, which has a gzip(1)-like
       interface. After indexing, tabix can quickly retrieve data lines
       overlapping regions specified in the format "chr:beginPos-endPos".
       Fast data retrieval also works over a network if a URI is given as
       the file name; in this case the index file is downloaded if it is not
       present locally.
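
       As a rough illustration of the region syntax and of what "overlapping"
       means, the sketch below parses a region string and selects matching
       lines with a linear scan. It is not how tabix retrieves data (no index
       is involved), and it assumes GFF-style columns: sequence name in
       column 1, 1-based inclusive start and end in columns 4 and 5. The
       function name scan_region and the demo records are hypothetical.

```shell
# scan_region REGION FILE  (hypothetical helper, not part of tabix)
# Prints lines of a TAB-delimited FILE that overlap REGION, where REGION
# has the form chr:beginPos-endPos. Assumes GFF-style columns.
scan_region() {
    region=$1
    file=$2
    chr=${region%%:*}                                # text before the first ':'
    range=$(printf '%s' "${region#*:}" | tr -d ',')  # drop thousands separators
    beg=${range%-*}                                  # text before the '-'
    end=${range#*-}                                  # text after the '-'
    # An interval [$4,$5] overlaps [beg,end] when it starts no later than
    # end and finishes no earlier than beg.
    awk -F '\t' -v c="$chr" -v b="$beg" -v e="$end" \
        '$1 == c && $4 <= e && $5 >= b' "$file"
}

# Small demo file with four made-up records.
demo=$(mktemp)
printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tid=a\n' >  "$demo"
printf 'chr1\tsrc\tgene\t300\t400\t.\t+\t.\tid=b\n' >> "$demo"
printf 'chr1\tsrc\tgene\t500\t600\t.\t+\t.\tid=c\n' >> "$demo"
printf 'chr2\tsrc\tgene\t100\t200\t.\t+\t.\tid=d\n' >> "$demo"

scan_region chr1:150-350 "$demo"    # selects the records id=a and id=b
```

       A record is selected as soon as any part of it falls inside the
       region, so id=a (100-200) is reported for chr1:150-350 even though it
       starts before position 150.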


OPTIONS OF TABIX
       -p STR    Input format for indexing. Valid values are: gff, bed, sam,
                 vcf and psltab. This option should not be combined with any
                 of -s, -b, -e, -c and -0; it is not used for data retrieval
                 because this setting is stored in the index file. [gff]

       -s INT    Column of sequence name. Options -s, -b, -e, -S, -c and -0
                 are all stored in the index file and thus not used in data
                 retrieval. [1]

       -b INT    Column of start chromosomal position. [4]

       -e INT    Column of end chromosomal position. [5]

       -S INT    Skip the first INT lines in the data file. [0]

       -c CHAR   Skip lines starting with character CHAR. [#]

       -0        Specify that positions in the data file are 0-based (e.g.
                 UCSC files) rather than 1-based.
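
       The coordinate convention behind -0 can be pictured with a BED
       record. BED is a UCSC format that stores the start as a 0-based
       offset and the end as an exclusive bound, so the interval 0-100
       covers bases 1 to 100 in 1-based inclusive terms. The sketch below
       makes that conversion explicit with awk. It only illustrates the
       convention and says nothing about tabix internals; the function name
       bed_to_one_based and the sample records are hypothetical.

```shell
# bed_to_one_based FILE  (hypothetical helper)
# Rewrites BED-style intervals (0-based start, exclusive end) as 1-based
# inclusive intervals: only the start shifts by one, the end is unchanged.
bed_to_one_based() {
    awk -F '\t' -v OFS='\t' '{ print $1, $2 + 1, $3 }' "$1"
}

# Two made-up records: a 100-base interval and a single base.
bed=$(mktemp)
printf 'chr1\t0\t100\nchr1\t149\t150\n' > "$bed"

bed_to_one_based "$bed"

# For a bgzip-compressed file with this layout, an indexing call that names
# the columns explicitly instead of using a -p preset would presumably be:
#   tabix -s 1 -b 2 -e 3 -0 in.bed.gz
```

       The second record shows why the distinction matters: 149-150 in BED
       is the single base at position 150, not a two-base interval.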


EXAMPLE
       grep -v '^#' unsorted.gff | sort -k1,1 -k4,4n | bgzip -c > sorted.gff.gz;

       tabix -p gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;
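
       The pipeline above can be tried end to end on a toy file. The sketch
       below builds a small unsorted GFF from made-up records, applies the
       same filter and sort keys, and runs bgzip and tabix only if both are
       found on PATH. Setting LC_ALL=C is an addition made here so that
       sequence names sort bytewise regardless of locale; the directory and
       file names are hypothetical.

```shell
work=$(mktemp -d)

# Toy input: one header line and three records out of order.
{
    printf '#toy annotation\n'
    printf 'chr2\tsrc\tgene\t100\t200\t.\t+\t.\tid=d\n'
    printf 'chr1\tsrc\tgene\t300\t400\t.\t+\t.\tid=b\n'
    printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tid=a\n'
} > "$work/unsorted.gff"

# Drop header lines, then sort by sequence name (column 1) and numerically
# by start position (column 4).
grep -v '^#' "$work/unsorted.gff" \
    | LC_ALL=C sort -k1,1 -k4,4n > "$work/sorted.gff"

# Show sequence name, start and attribute of the sorted records.
cut -f1,4,9 "$work/sorted.gff"

# Compress, index and query only when the tools are installed.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -c "$work/sorted.gff" > "$work/sorted.gff.gz"
    tabix -p gff "$work/sorted.gff.gz"
    tabix "$work/sorted.gff.gz" chr1:150-350
fi
```

       Sorting before compression is the step that matters: the records for
       each sequence must be contiguous and ordered by start position before
       the file is indexed.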


NOTES
       Overlap queries are straightforward to implement with the standard
       B-tree index (with or without binning) implemented in all SQL
       databases, or with the R-tree index in PostgreSQL and Oracle. There
       are still many reasons to use tabix. First, tabix works directly with
       many widely used TAB-delimited formats such as GFF/GTF and BED; there
       is no need to design a database schema or a specialized binary
       format, or to duplicate data in different formats. Second, tabix
       works on compressed data files while most SQL databases do not. The
       GenCode annotation GTF can be compressed down to 4% of its original
       size. Third, tabix is fast. The same indexing algorithm is known to
       work efficiently for an alignment with a few billion short reads; SQL
       databases probably cannot easily handle data at this scale. Last but
       not least, tabix supports remote data retrieval. One can put the data
       file and the index on an FTP or HTTP server, and other users or even
       web services will be able to get a slice without downloading the
       entire file.


AUTHOR
       Tabix was written by Heng Li. The BGZF library was originally
       implemented by Bob Handsaker and modified by Heng Li for remote file
       access and in-memory caching.


SEE ALSO
       samtools(1)



tabix-0.1.0                     2 November 2009                       tabix(1)