tabix(1)                     Bioinformatics tools                     tabix(1)


NAME
       bgzip - Block compression/decompression utility

       tabix - Generic indexer for TAB-delimited genome position files

SYNOPSIS
       bgzip [-cdh] [-b virtualOffset] [-s size] [file]

       tabix [-0] [-p gff|bed|sam|vcf|psltab] [-s seqCol] [-b begCol] [-e endCol]
       [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]


DESCRIPTION
       Tabix indexes a TAB-delimited genome position file in.tab.bgz and
       creates an index file in.tab.bgz.tbi when no region is given on the
       command line. The input data file must be position sorted and
       compressed with bgzip, which has a gzip(1)-like interface. After
       indexing, tabix can quickly retrieve data lines overlapping regions
       specified in the format "chr:beginPos-endPos". Fast data retrieval
       also works over a network if a URI is given as the file name; in this
       case the index file is downloaded if it is not present locally.
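
       As a rough illustration of the "chr:beginPos-endPos" region syntax,
       the shell sketch below splits a region string into its three parts.
       The function name parse_region is hypothetical, and stripping commas
       from the coordinates is an assumption based on the region shown in
       the EXAMPLE section; this is not tabix's own parser.

```shell
# Hypothetical helper: split "chr:begin-end" into tab-separated fields.
# Thousands separators (commas) in the coordinates are removed first.
parse_region() {
    region=$1
    chr=${region%%:*}          # text before the first ':'
    range=${region#*:}         # text after the first ':'
    range=$(printf '%s' "$range" | tr -d ',')
    beg=${range%%-*}           # coordinate before the '-'
    end=${range#*-}            # coordinate after the '-'
    printf '%s\t%s\t%s\n' "$chr" "$beg" "$end"
}

parse_region "chr1:10,000,000-20,000,000"
```

       The sketch assumes that the sequence name itself contains no ':'.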


OPTIONS OF TABIX
       -p STR    Input format for indexing. Valid values are: gff, bed, sam,
                 vcf and psltab. This option should not be combined with any
                 of -s, -b, -e, -c and -0. It is not needed for data
                 retrieval because the setting is stored in the index file.
                 [gff]

       -s INT    Column of sequence name. The settings of -s, -b, -e, -S, -c
                 and -0 are all stored in the index file and are thus not
                 needed for data retrieval. [1]

       -b INT    Column of start chromosomal position. [4]

       -e INT    Column of end chromosomal position. [5]

       -S INT    Skip first INT lines in the data file. [0]

       -c CHAR   Skip lines starting with the character CHAR. [#]

       -0        Specify  that  the position in the data file is 0-based (e.g.
                 UCSC files) rather than 1-based.
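
       To make the effect of -0 concrete, the sketch below converts an
       interval from the 0-based, half-open convention used by UCSC formats
       such as BED to the 1-based, closed convention used by formats such as
       GFF. That -0 corresponds to exactly this half-open convention is an
       assumption here, and the function name zero_to_one_based is
       hypothetical.

```shell
# Hypothetical helper: convert a 0-based, half-open interval [beg, end)
# into the equivalent 1-based, closed interval [beg+1, end].
zero_to_one_based() {
    beg0=$1
    end0=$2
    # Only the start shifts; the exclusive 0-based end equals the
    # inclusive 1-based end.
    printf '%s\t%s\n' "$((beg0 + 1))" "$end0"
}

zero_to_one_based 0 100
```

       A file that keeps such coordinates in custom columns could then be
       indexed with, for example, tabix -s 1 -b 2 -e 3 -0 in.tab.bgz.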


EXAMPLE
       grep -v ^"#" unsorted.gff | sort -k1,1 -k4,4n | bgzip -c > sorted.gff.gz;

       tabix -p gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;
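
       Because indexing requires position-sorted input, it can be worth
       checking the order before compressing. The following is a minimal
       sketch, assuming the GFF layout (sequence name in column 1, start in
       column 4) and a sort(1) that supports -c and -s, as the GNU and BSD
       versions do. The function name is_position_sorted is hypothetical.

```shell
# Hypothetical helper: succeed if the non-comment lines of a GFF file
# are already ordered by sequence name, then numerically by start.
# LC_ALL=C forces byte-wise comparison of sequence names, so a file
# sorted under a different locale may be reported as unsorted.
is_position_sorted() {
    [ -r "$1" ] || return 2    # unreadable or missing file
    grep -v '^#' "$1" |
        LC_ALL=C sort -c -s -t "$(printf '\t')" -k1,1 -k4,4n 2>/dev/null
}

# Usage (not run here):
#   is_position_sorted unsorted.gff && echo "already sorted"
```

       The -s flag restricts the check to the two keys, so lines with the
       same sequence name and start position may appear in any order.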


NOTES
       Overlap queries are straightforward to implement with the standard
       B-tree index (with or without binning) available in all SQL
       databases, or with the R-tree index in PostgreSQL and Oracle. There
       are still several reasons to use tabix. First, tabix works directly
       with many widely used TAB-delimited formats such as GFF/GTF and BED,
       so there is no need to design a database schema or a specialized
       binary format, and data do not need to be duplicated in different
       formats. Second, tabix works on compressed data files while most SQL
       databases do not; the GenCode annotation GTF, for example, can be
       compressed down to 4% of its original size. Third, tabix is fast.
       The same indexing algorithm is known to work efficiently for an
       alignment with a few billion short reads, a scale that SQL databases
       probably cannot handle easily. Last but not least, tabix supports
       remote data retrieval. One can put the data file and the index on an
       FTP or HTTP server, and other users or even web services will be
       able to get a slice without downloading the entire file.


AUTHOR
       Tabix  was  written  by Heng Li. The BGZF library was originally imple-
       mented by Bob Handsaker and modified by Heng Li for remote file  access
       and in-memory caching.


SEE ALSO
       samtools(1)


tabix-0.1.0                     2 November 2009                       tabix(1)