+++ /dev/null
-tabix(1) Bioinformatics tools tabix(1)
-
-
-
-NAME
- bgzip - Block compression/decompression utility
-
- tabix - Generic indexer for TAB-delimited genome position files
-
-SYNOPSIS
- bgzip [-cdh] [-b virtualOffset] [-s size] [file]
-
- tabix [-0] [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S
- lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]
-
-
-DESCRIPTION
- Tabix indexes a TAB-delimited genome position file in.tab.bgz and
- creates an index file in.tab.bgz.tbi when no region is given on the
- command line. The input data file must be position sorted and
- compressed by bgzip, which has a gzip(1)-like interface. After
- indexing, tabix can quickly retrieve data lines overlapping regions
- specified in the format "chr:beginPos-endPos". Fast data retrieval
- also works over a network if a URI is given as the file name; in this
- case the index file is downloaded if it is not present locally.
-
-
-OPTIONS OF TABIX
- -p STR Input format for indexing. Valid values are: gff, bed, sam,
- vcf and psltab. This option should not be applied together
- with any of -s, -b, -e, -c and -0; it is not used for data
- retrieval because this setting is stored in the index file.
- [gff]
-
- -s INT Column of sequence name. Options -s, -b, -e, -S, -c and -0
- are all stored in the index file and thus not used in data
- retrieval. [1]
-
- -b INT Column of start chromosomal position. [4]
-
- -e INT Column of end chromosomal position. [5]
-
- -S INT Skip first INT lines in the data file. [0]
-
- -c CHAR Skip lines starting with character CHAR. [#]
-
- -0 Specify that the position in the data file is 0-based (e.g.
- UCSC files) rather than 1-based.
-
-
-EXAMPLE
- grep -v ^"#" unsorted.gff | sort -k1,1 -k4,4n | bgzip -c >
- sorted.gff.gz;
-
- tabix -p gff sorted.gff.gz;
-
- tabix sorted.gff.gz chr1:10,000,000-20,000,000;
-
-
-NOTES
- It is straightforward to achieve overlap queries using the standard B-
- tree index (with or without binning) implemented in all SQL databases,
- or the R-tree index in PostgreSQL and Oracle. But there are still many
- reasons to use tabix. Firstly, tabix directly works with a lot of
- widely used TAB-delimited formats such as GFF/GTF and BED. We do not
- need to design database schema or specialized binary formats. Data do
- not need to be duplicated in different formats, either. Secondly, tabix
- works on compressed data files while most SQL databases do not. The
- GENCODE annotation GTF can be compressed down to 4% of its original
- size. Thirdly, tabix is fast. The same indexing algorithm is known to
- work efficiently for an alignment with a few billion short reads. SQL
- databases probably cannot easily handle data at this scale. Last but
- not least, tabix supports remote data retrieval. One can put the data
- file and the index on an FTP or HTTP server, and other users or even
- web services will be able to get a slice without downloading the
- entire file.
-
-
-AUTHOR
- Tabix was written by Heng Li. The BGZF library was originally imple-
- mented by Bob Handsaker and modified by Heng Li for remote file access
- and in-memory caching.
-
-
-SEE ALSO
- samtools(1)
-
-
-
-tabix-0.1.0 2 November 2009 tabix(1)