The Wold Lab: Bioinformatics Tools

Bioinformatics Tools

ChIPSeq Peak Finder - Description coming soon.

BioHub - The BioHub is a relational database and Python API developed at Caltech that manages associations between numerous genomic-sequence-based and transcript-based datasets in order to provide centralized query services and uniform data access. BioHub was designed to let biologists draw on and combine many disparate data sources for integrative analyses such as gene network modeling. The central feature of the BioHub design is the Sequence Registry, which relates diverse data and annotations to individual genomic sequence features, usually genes.

Cistematic - The core of Cistematic is a Python package with a rich set of APIs that simplify the collection and analysis of candidate cis-elements from a number of motif-finding and phylogenetic footprinting programs, such as MEME, AlignACE, Co-Bind, and FootPrinter. Cistematic assesses the significance of each motif by comparing it against its genome-wide prevalence. A web front-end, written with Python Server Pages on top of the Webware application server, allows interactive setup and exploration of the results.
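
As a rough illustration of comparing a motif against its genome-wide prevalence, the sketch below computes a hypergeometric tail probability. This is an assumption chosen for illustration, not necessarily the statistic Cistematic uses, and the function name and parameters are hypothetical.

```python
from math import comb

def enrichment_p_value(genome_total, genome_hits, sample_total, sample_hits):
    """Hypergeometric tail P(X >= sample_hits); hypothetical helper.

    genome_total: number of sequences considered genome-wide
    genome_hits:  how many of those contain the motif
    sample_total: number of candidate sequences examined
    sample_hits:  how many candidate sequences contain the motif
    """
    denominator = comb(genome_total, sample_total)
    upper = min(sample_total, genome_hits)
    tail = sum(
        comb(genome_hits, i) * comb(genome_total - genome_hits, sample_total - i)
        for i in range(sample_hits, upper + 1)
    )
    return tail / denominator
```

A small p-value would suggest the motif appears in the candidate set more often than its genome-wide frequency predicts.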

CompClust - CompClust is a Python package written using the pyMLX and IPlot APIs. It provides software tools to explore and quantify relationships between clustering results. Its development has been driven largely by the needs of microarray data analysis, but it could easily be used in other domains. Briefly, pyMLX provides for efficient and convenient execution of many clustering algorithms using an extensible library of algorithms. It also provides many-to-many linkages between data features and annotations (such as cluster labels, gene names, gene ontology information, etc.). These linkages persist through data manipulations. IPlot provides an abstraction of the plotting process in which any arbitrary feature or derived feature of the data can be projected onto any feature of the plot, including the X,Y coordinates of points, marker symbol, marker size, marker/line color, etc. These plots are intrinsically linked to the dataset and to the View and Labeling classes found within pyMLX.
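
To make "quantifying relationships between clustering results" concrete, here is a minimal sketch in plain Python. It does not use the CompClust, pyMLX, or IPlot APIs; the function names are hypothetical, and the Rand index is just one common agreement measure chosen for illustration.

```python
from collections import Counter
from itertools import combinations

def contingency_table(labels_a, labels_b):
    """Count how many items fall into each (cluster in A, cluster in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    return Counter(zip(labels_a, labels_b))

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two clusterings agree.

    A pair counts as agreement when both clusterings put the two items
    together, or both put them apart. Requires at least two items.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

For example, `rand_index(["x", "x", "y", "y"], [1, 1, 2, 2])` is 1.0 because the two labelings describe the same partition under different cluster names.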

MAD: Motif Analysis and Detection - MAD is a Python package that provides tools for locating and analyzing candidate regulatory motifs (factor binding sites) using position weight matrices (PWMs). MAD provides tools for visualizing motifs in genomic sequences (intergenic or otherwise) along with their significance. It also provides algorithms for optimizing (refining) motif PWMs based on clustering results (coexpression), phylogenetic information (orthology across genomes), and cooperativity with other motifs.
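
For readers unfamiliar with PWM-based motif location, the following is a minimal sketch of log-odds scoring over a sliding window. It is not MAD's API: the function names, the uniform 0.25 background, and the pseudocount of 1.0 are all assumptions made for illustration, and the input is assumed to contain only uppercase A, C, G, and T.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores.

    counts: list of dicts, one per motif position, mapping base -> count.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) tuples."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

def best_hit(sequence, pwm):
    """Return the highest-scoring (start, score) window."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])
```

In this sketch, windows with positive scores resemble the motif more than the background; assessing significance would still require a score threshold or null model, which is not shown here.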

Mussa - Mussa is an N-way version of the FamilyRelations/secomp 2-way comparative sequence analysis programs. Given DNA sequence from N species, Mussa uses all possible pairwise comparisons to derive an N-wise comparison. For example, given sequences 1, 2, 3, and 4, Mussa makes six 2-way comparisons: 1vs2, 1vs3, 1vs4, 2vs3, 2vs4, and 3vs4. It then compares all the links between these comparisons, saving those that satisfy a transitivity requirement. The saved paths are then displayed in an interactive viewer.
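
The pairwise-then-transitive idea can be sketched as follows. This is not Mussa's implementation: the `links` data structure (a mapping from a species pair to a set of matched positions) and the exact form of the transitivity check are assumptions made for illustration.

```python
from itertools import combinations

def pairwise_comparisons(n_species):
    """Return every unordered pair of 1-based sequence indices."""
    return list(combinations(range(1, n_species + 1), 2))

def transitive_paths(links, n_species):
    """Keep only paths supported by every pairwise comparison.

    links: dict mapping (i, j) with i < j to a set of (pos_i, pos_j)
           matches between sequence i and sequence j (hypothetical format).
    A path holds one position per sequence and is kept only if each pair
    of its positions appears as a link in the corresponding comparison.
    """
    paths = [tuple(pair) for pair in sorted(links[(1, 2)])]
    for k in range(3, n_species + 1):
        extended = []
        for path in paths:
            # Positions in sequence k linked to the path's position in sequence 1.
            candidates = {q for (p, q) in links[(1, k)] if p == path[0]}
            for q in sorted(candidates):
                # The candidate must also be linked from sequences 2..k-1.
                if all((path[i - 1], q) in links[(i, k)] for i in range(2, k)):
                    extended.append(path + (q,))
        paths = extended
    return paths
```

With four sequences, `pairwise_comparisons(4)` yields the six pairs listed above, and `transitive_paths` discards any match that is not confirmed by all of the comparisons it touches.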

Pymerase - Pymerase is a tool intended to generate a Python object model, a relational database, and an object-relational model connecting the two. However, it has also been extended to output web pages, GUI widgets, tab-delimited text parsers, etc., and it can easily be extended to produce other kinds of output. We are currently using Pymerase for BioHub development and other projects.

Sigmoid - The Sigmoid project is intended to produce a database of cellular signaling pathways and models thereof, to marshal the major forms of data and knowledge required as input to cellular modeling software, and to organize the outputs. Such cellular signaling and regulatory pathways are commonly hand-drawn in the biological literature as an aid to intuitive understanding. Pathway databases can provide the same assistance in attempts to achieve a quantitative understanding of cellular processes by numerical simulation. They can also help capture and query both expert knowledge and heterogeneous data sets pertaining to pathways. Cell model databases are a subject of current research. Sigmoid works at the interface of these two areas.