View PCA projection scatter plots

Given this new IPlot-based PCAGinzu object, we will next create and display a scatter plot of the data points projected onto the principal components. This lets us see where all of the points lie in a subspace of the principal component dimensions (e.g. in the principal component 1 vs. principal component 2 subspace), and visualize the points (e.g. genes) that are extreme for one principal component.

The following call to plotPCvsPCWithOutliersInY will create a scatter plot displaying all of the data points (rows, e.g. genes) in a 2D subspace of (a projection within) the principal component dimensions. For example, we will plot principal component 1 along the X axis and principal component 2 along the Y axis. The high and low extreme points for the second principal component that have likelihoods less than or equal to 0.05 are highlighted in red and blue, respectively. That extreme-point likelihood cutoff was set by the PCAGinzu call above.

ipcaginzu.plotPCvsPCWithOutliersInY(0,1)

The above command produces the plot that was previously shown in Figure 1.
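To make the highlighting rule concrete, here is a minimal sketch of one way high and low extreme points could be selected with a 0.05 likelihood cutoff. It is not PCAGinzu's actual code: the function name flag_extremes_in_y is hypothetical, and the Gaussian tail probability used as the "likelihood" is an assumption, since this section does not say how PCAGinzu computes it.

```python
import math
from statistics import mean, stdev


def flag_extremes_in_y(y_values, cutoff=0.05):
    """Return (high, low) lists of indices of extreme points.

    Assumption: the projections along the Y-axis component are treated as
    roughly normal, and a point's "likelihood" is its one-sided tail
    probability under that normal model.
    """
    mu = mean(y_values)
    sigma = stdev(y_values)
    if sigma == 0:
        # All projections are identical, so nothing is extreme.
        return [], []

    high, low = [], []
    for i, y in enumerate(y_values):
        z = (y - mu) / sigma
        # One-sided tail probability of a standard normal: P(Z >= |z|).
        tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
        if tail <= cutoff:
            if z > 0:
                high.append(i)  # would be drawn in red
            else:
                low.append(i)   # would be drawn in blue
    return high, low
```

Under these assumptions, the indices in the first list correspond to the points drawn in red and those in the second list to the points drawn in blue.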

Users can plot any pair of principal components against each other. Our standard is to plot $PC_{n-1}$ vs. , or in some cases $PC_{n}$ vs. because often corresponds to the magnitude of expression. Whichever principal component is specified by the second argument to the function will appear on the Y axis, and its extreme points will be highlighted.
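As an illustration of stepping through pairs of adjacent components, the sketch below generates zero-based (X, Y) index pairs that could be passed to plotPCvsPCWithOutliersInY. The helper name consecutive_pc_pairs is hypothetical and not part of PCAGinzu, and the commented usage assumes an ipcaginzu object like the one created earlier.

```python
def consecutive_pc_pairs(n_components):
    """Yield zero-based (x, y) index pairs: (0, 1), (1, 2), ...

    The second element of each pair is the component that would appear
    on the Y axis and have its extreme points highlighted.
    """
    for y in range(1, n_components):
        yield (y - 1, y)


# Hypothetical usage with the IPlot-based object from this section:
# for x, y in consecutive_pc_pairs(4):
#     ipcaginzu.plotPCvsPCWithOutliersInY(x, y)
```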

These plots are interactive, so clicking on a single data point allows you to see the identity of that specific gene.

Note: One crucial detail is that the IPlot-based PCAGinzu object uses zero-based indexing, i.e. it numbers the principal components 0, 1, 2, etc. The web interface is built using this version, and because Python arrays and lists are 0-based, it was simpler to make these functions compatible. The MLX-based pcaGinzu and pcaGinzuVisualizeMatlab objects are 1-based, i.e. their arguments number the principal components 1, 2, 3, etc.
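Because it is easy to be off by one when moving between the two versions, a pair of small conversion helpers may be useful. The sketch below is an illustration only; to_zero_based and to_one_based are hypothetical names, not functions provided by PCAGinzu.

```python
def to_zero_based(pc_number):
    """Convert a 1-based component number (1, 2, 3, ...) to a 0-based index."""
    if pc_number < 1:
        raise ValueError("1-based component numbers start at 1")
    return pc_number - 1


def to_one_based(pc_index):
    """Convert a 0-based component index (0, 1, 2, ...) to a 1-based number."""
    if pc_index < 0:
        raise ValueError("0-based component indices start at 0")
    return pc_index + 1


# Hypothetical usage: plot principal component 1 vs. principal component 2
# with the IPlot-based (zero-based) object.
# ipcaginzu.plotPCvsPCWithOutliersInY(to_zero_based(1), to_zero_based(2))
```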

Joe Roden 2005-12-13