Mining gene expression data by interpreting principal components
Joseph C. Roden1*, Brandon King2, Diane Trout2, Ali Mortazavi2, Barbara J. Wold2, Christopher E. Hart2
1 Jet Propulsion Laboratory, California Institute of Technology
2 Division of Biology, California Institute of Technology
* Corresponding author
Email addresses:
JR: joe.roden@jpl.nasa.gov
BK: kingb@caltech.edu
DT: diane@caltech.edu
AM: alim@caltech.edu
BW: woldb@caltech.edu
CH: hart@caltech.edu
Submission
Our paper was submitted to BMC Bioinformatics on 2005-07-03. It was
accepted for publication on 2006-02-16.
View publication at BMC
Figures
Figure 1: (.png 80K)
Figure 2: (.png 348K)
Figure 3: (.png 104K)
Figure 4: (.png 360K)
Figure 5: (.png 124K)
Datasets
The datasets we analyzed were originally obtained from the following sources:
Software for PCA Interpretation
There are three ways to learn more about our software, listed in order of increasing effort and commitment:
-
Interested researchers can use our web-based application,
CompClustWeb, to interactively review the results we generated
for this publication from our analysis of the GNF and diabetes datasets.
-
We have a convenient Windows package, CompClustShell v1.2 (updated 12/14/2005), which contains all of the software packages needed to begin exploring CompClust, including the PCA interpretation software. It is a Windows executable that provides a Python shell with CompClust and its dependent packages pre-loaded. A tutorial [ html | pdf ] walks users through using the CompClust programming API to generate PCA interpretation results.
-
Software developers can download and install the complete source code for the
CompClust v1.2 Python package and use the programming interface to generate PCA interpretations for additional datasets. Note that the CompClust source installation requires additional Python packages, so installation can be involved. We are working to simplify installation and plan to provide user-friendly installers for other operating systems in the near future.
NOTICE: We released an updated version of CompClustShell and the CompClust source code, including revised tutorials, on 13 December 2005. We have since revised the corresponding CompClust web pages and installation instructions. - JR, 2005-12-14
Supplemental Files
GNF pc07-outliers.txt
GNF pc07-condition-groups.txt
GNF pc07-eigenvector.png
GNF pc07-outliers.png
GNF pc07-outlier-trajectories-order-original.png
GNF pc07-outlier-trajectories-order-meandiff.png
Supplemental PCA Interpretation Results
PCA interpretation results for the GNF dataset, using smaller extreme gene sets.
PCA interpretation results for the GNF dataset, using larger extreme gene sets.
PCA interpretation results for the diabetes dataset, using smaller extreme gene sets.
PCA interpretation results for the diabetes dataset, using larger extreme gene sets.
Supplemental PCA Stability Analysis
Results of an analysis of the stability of PCA on the diabetes dataset.