Performing batch PCA interpretation

One advantage of the matplotlib-based pcaGinzu object is that it provides a function, generateResults(), which implements a batch analysis mode. This function generates all of the graphical and textual results for every principal component.

Because a call to generateResults() creates a large number of plots (one PCA projection plot and two trajectory plots per principal component), we recommend running it in a fresh Python shell session in which matplotlib figures do not appear automatically. If you are using CompClustShell or a plain Python shell, start a new session and do not call show() before calling generateResults(). If you are using an IPython shell, start a new session without the optional "-pylab" argument. It is not a serious problem if the plots do appear, since you can always use close('all') to dismiss them, but displaying them slows plot generation noticeably.
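As a rough illustration of keeping figures off screen and then clearing them, the sketch below uses plain matplotlib only. It assumes a reasonably recent matplotlib in which the non-interactive "Agg" backend is available; whether pcaGinzu itself behaves well under that backend is not something this tutorial establishes, so treat the backend choice as an assumption rather than a recommendation. The three dummy figures simply stand in for the plots a batch run would create.

```python
import matplotlib

# Assumption: selecting the non-interactive Agg backend before importing
# pyplot keeps figure windows from appearing on screen.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Create a few dummy figures, standing in for plots made by a batch routine.
for offset in range(3):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot([0, 1, 2], [offset, offset + 1, offset + 2])

open_before = len(plt.get_fignums())  # figures currently held in memory

# Dismiss every open figure in one call, as suggested in the text.
plt.close("all")
open_after = len(plt.get_fignums())

print(open_before, open_after)
```

Closing figures this way also releases the memory they hold, which can matter when many plots are produced in one session.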

In a new shell, import the necessary packages (see 2.1 for all of our import statements, or just execute the minimum set below), then load the dataset and create the pcaGinzu object:

from compClust.mlx import pcaGinzu
from compClust.util import LoadExample

cho = LoadExample.LoadCho()
pcaginzu = pcaGinzu.pcaGinzuVisualizeMatplotlib(cho, nOutliers=10, sigCutoff=0.01)

The following call creates a complete set of plots (PCA projections with extreme points, trajectory plots, etc.) and tab-delimited text tables (extreme point and significant condition lists) in the current directory for all of the principal components analyzed. The two arguments passed to the function are a list of row labeling names and a list of column labeling names to include in the resulting extreme point and significant condition text output files.

pcaginzu.generateResults(['cho clusters', 'diagem clusters', 'common name', 'orfs'], ['time points'])

A set of extreme point scatter and trajectory plots, as well as extreme point lists and ordered condition lists with significant condition groups, will be generated in the current directory.
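Since the output lands in whatever directory is current, it can help to switch into a dedicated directory first and then check which files were created. The following is a minimal sketch using only the Python 3 standard library. The helpers working_directory() and new_files() and the directory name "pca_results" are hypothetical names introduced here for illustration; they are not part of CompClust. A placeholder file write stands in for the generateResults() call so that the example runs without CompClust installed.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Hypothetical helper: temporarily make `path` the current directory."""
    os.makedirs(path, exist_ok=True)
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield os.getcwd()
    finally:
        os.chdir(previous)  # always restore the original directory


def new_files(directory, before):
    """Hypothetical helper: names in `directory` absent from the `before` snapshot."""
    return sorted(set(os.listdir(directory)) - set(before))


# "pca_results" is an arbitrary, illustrative directory name.
out_dir = os.path.join(tempfile.mkdtemp(), "pca_results")

with working_directory(out_dir) as cwd:
    before = os.listdir(cwd)
    # In a real session, the generateResults() call shown above would go here.
    # A placeholder file stands in for its output in this sketch.
    with open("placeholder.txt", "w") as handle:
        handle.write("stand-in output\n")
    created = new_files(cwd, before)

print(created)
```

Taking the directory snapshot before the call avoids having to know the output file names in advance, which this section does not list.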

While a directory full of files can be helpful in certain circumstances, we also find the CompClustWeb interface a useful means to browse through the results of PCA interpretation.

Joe Roden 2005-12-13