Caltech BioHub

The Caltech BioHub: Unified access to diverse bioinformatics datasets

The number and diversity of bioinformatics data sources, together with their ever-increasing sizes, pose numerous challenges to investigators who want to perform integrative data analyses in their research. Such efforts are often impeded by the need to combine many data sources and formats whose quality ranges from well-vetted annotations to mere hypotheses.

The BioHub is a relational database and Python API developed at Caltech that manages associations between numerous genomic-sequence-based and transcript-based datasets in order to provide centralized query services and uniform data access. The BioHub was designed to let biologists draw on and combine many disparate data sources for integrative analyses such as gene network modeling. The central feature of the BioHub design is the Sequence Registry, which relates diverse data and annotations to individual genomic sequence features, usually genes. Key BioHub design features include:

The poster presented at PSB 2004 described the status of the current BioHub prototype, demonstrated the ways it has been used to date (e.g. to assess the quality of oligonucleotide probe libraries and to identify common regulatory elements), and described design plans for future BioHub development.
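
To make the Sequence Registry idea concrete, the following is a minimal sketch of how a registry of sequence features might tie annotations from different datasets to one gene. It is not the BioHub schema or API: the table names, column names, function names, and sample records are all hypothetical, and Python's standard sqlite3 module stands in for whatever relational database the BioHub actually uses.

```python
import sqlite3

# Hypothetical schema: none of these table or column names come from the BioHub.
SCHEMA = """
CREATE TABLE sequence_feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,   -- e.g. a gene symbol
    kind       TEXT NOT NULL           -- e.g. 'gene'
);
CREATE TABLE annotation (
    annotation_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES sequence_feature(feature_id),
    source        TEXT NOT NULL,       -- dataset that supplied the record
    label         TEXT NOT NULL,
    value         TEXT NOT NULL
);
"""


def build_registry():
    """Create an empty in-memory registry (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register_feature(conn, name, kind="gene"):
    """Add a sequence feature and return its registry id."""
    cur = conn.execute(
        "INSERT INTO sequence_feature (name, kind) VALUES (?, ?)", (name, kind)
    )
    return cur.lastrowid


def annotate(conn, feature_id, source, label, value):
    """Attach one annotation from a named data source to a feature."""
    conn.execute(
        "INSERT INTO annotation (feature_id, source, label, value) "
        "VALUES (?, ?, ?, ?)",
        (feature_id, source, label, value),
    )


def annotations_for(conn, name):
    """Return every (source, label, value) record linked to a feature name."""
    return conn.execute(
        "SELECT a.source, a.label, a.value "
        "FROM annotation AS a "
        "JOIN sequence_feature AS f ON f.feature_id = a.feature_id "
        "WHERE f.name = ? "
        "ORDER BY a.source, a.label",
        (name,),
    ).fetchall()


# Placeholder records, invented purely for illustration.
registry = build_registry()
gene_id = register_feature(registry, "geneA")
annotate(registry, gene_id, "probe_library", "probe_count", "3")
annotate(registry, gene_id, "expression_study", "tissue", "embryo")
result = annotations_for(registry, "geneA")
```

Because every annotation row points at a single registry entry, one query on the feature name gathers records from all sources at once, which is the kind of centralized, uniform access the abstract describes.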