Mining gene expression data by interpreting principal components

Joseph C. Roden1*, Brandon King2, Diane Trout2, Ali Mortazavi2, Barbara J. Wold2, Christopher E. Hart2

1 Jet Propulsion Laboratory, California Institute of Technology
2 Division of Biology, California Institute of Technology
* Corresponding author

Email addresses:


Our paper was submitted to BMC Bioinformatics on 2005-07-03 and accepted for publication on 2006-02-16.

View publication at BMC


Figure 1: (.png 80K)
Figure 2: (.png 348K)
Figure 3: (.png 104K)
Figure 4: (.png 360K)
Figure 5: (.png 124K)


The datasets we analyzed were originally obtained from the following sources:

Software for PCA Interpretation

There are three ways to learn more about our software, ordered by increasing effort and commitment:
  1. Interested researchers can use our web-based application, CompClustWeb, to interactively review the results of our analysis of the GNF and diabetes datasets we generated for this publication.

  2. We provide a convenient Windows package, CompClustShell v1.2 (Updated 12/14/2005), which contains all of the software packages needed to begin exploring CompClust, including the PCA interpretation software. It is a Windows executable that provides a Python shell with CompClust and its dependent packages pre-loaded. A tutorial [ html | pdf ] walks users through using the CompClust programming API to generate PCA interpretation results.

  3. Software developers can download and install the complete source code for the CompClust v1.2 Python package and use the programming interface to generate PCA interpretations for additional datasets. Note that the CompClust source installation requires additional Python packages, so the installation can be involved. We are working to simplify installation and plan to provide user-friendly installers for other operating systems in the near future.
NOTICE: We released an updated version of CompClustShell and the CompClust source code, including revised tutorials, on 13 December 2005. We have since revised the corresponding CompClust web pages and installation instructions. - JR, 2005-12-14

Supplemental Files

GNF pc07-outliers.txt
GNF pc07-condition-groups.txt
GNF pc07-eigenvector.png
GNF pc07-outliers.png
GNF pc07-outlier-trajectories-order-original.png
GNF pc07-outlier-trajectories-order-meandiff.png

Supplemental PCA Interpretation Results

PCA interpretation results for GNF for smaller extreme gene sets.
PCA interpretation results for GNF for larger extreme gene sets.
PCA interpretation results for diabetes for smaller extreme gene sets.
PCA interpretation results for diabetes for larger extreme gene sets.
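
For readers who want a feel for what an "extreme gene set" is before opening the results, the idea can be sketched with plain NumPy. This is an illustrative sketch only, not the CompClust API: the function names `extreme_genes` and `condition_mean_diff`, the `n_extreme` parameter, and the synthetic data are all hypothetical, and the mean-difference ordering of conditions is an assumption suggested by the supplemental file names above rather than a statement of the exact published procedure.

```python
import numpy as np


def extreme_genes(data, pc_index=0, n_extreme=10):
    """Return per-gene scores on one principal component and the indices
    of the genes at its low and high extremes.

    data: 2-D array, rows are genes and columns are conditions.
    Note: the sign of a principal component is arbitrary, so which group
    is called "high" and which "low" may swap between runs or libraries.
    """
    centered = data - data.mean(axis=0)          # center each condition
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, pc_index] * s[pc_index]        # projection of each gene
    order = np.argsort(scores)
    low = order[:n_extreme]                      # most negative scores
    high = order[-n_extreme:][::-1]              # most positive scores
    return scores, low, high


def condition_mean_diff(data, low, high):
    """Mean expression of the high group minus the low group, per condition.

    Conditions with a large absolute difference are the ones that separate
    the two extreme gene sets (an assumed, simplified criterion).
    """
    return data[high].mean(axis=0) - data[low].mean(axis=0)


if __name__ == "__main__":
    # Synthetic example: 50 genes x 6 conditions of low-level noise, with
    # two planted gene groups that differ only in the first 3 conditions.
    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 0.1, size=(50, 6))
    data[:5, :3] += 5.0
    data[5:10, :3] -= 5.0

    scores, low, high = extreme_genes(data, pc_index=0, n_extreme=5)
    diff = condition_mean_diff(data, low, high)
    print("low-extreme genes: ", sorted(low.tolist()))
    print("high-extreme genes:", sorted(high.tolist()))
    print("conditions ranked by |mean difference|:",
          np.argsort(-np.abs(diff)).tolist())
```

Choosing a smaller or larger `n_extreme` corresponds loosely to the "smaller" and "larger" extreme gene sets listed above; the actual set sizes used for the published results are not stated on this page.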

Supplemental PCA Stability Analysis

Results of an analysis of the stability of PCA on the diabetes dataset.