View outlier trajectory plots

We will now display the extreme point trajectories, first in original order, i.e. the unsorted native column ordering of the dataset. This is the same ordering that any other unsorted plot of the data would use.

pcaginzu.plotPCNOutlierRowsInOriginalColumnOrder(2)
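To make the idea of "extreme points" concrete, here is a minimal sketch in plain Python. It is not pcaginzu's implementation: it assumes each row already has a score (its projection onto the chosen principal component), and the helper name `extreme_rows` and the sample scores are hypothetical.

```python
def extreme_rows(scores, n):
    """Return (high, low): indices of the n largest and n smallest scores.

    `scores` holds one value per row, assumed to be that row's
    projection onto the principal component of interest.
    """
    # Row indices sorted from the most negative to the most positive score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    low = order[:n]            # most negative first
    high = order[-n:][::-1]    # most positive first
    return high, low


# Hypothetical scores for six rows (genes) on one principal component.
scores = [0.2, -1.5, 3.1, 0.0, -2.2, 1.7]
high, low = extreme_rows(scores, 2)
print(high, low)
```

The trajectories of the `high` and `low` rows across all columns of the dataset are what the plot draws.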

In the next plot we will see the same data as above, but with the conditions sorted to emphasize those with the greatest difference between mean "high" and mean "low" expression. The "high" vectors, drawn in red, are those with the maximum values on the given principal component axis. Likewise, the "low" vectors, drawn in blue, are those with the most negative values on the selected principal component.

This graphic frequently has an "X" shape to it. The left side of the plot can emphasize any conditions in which the high extreme points (in red) have significantly higher values than the low extreme points. Likewise, the right side of the plot emphasizes any conditions in which the low extreme points (in blue) have significantly higher values than the high extreme points.

pcaginzu.plotPCNOutlierRowsInSigGroupOrder(2)
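The reordering described above can be sketched as follows. This is an illustration under stated assumptions, not the routine's actual code: it assumes the conditions are sorted by the difference between the mean of the high rows and the mean of the low rows, largest first. The function name `sig_group_order` and the toy data are hypothetical.

```python
def sig_group_order(data, high, low):
    """Order columns by mean(high rows) - mean(low rows), largest first.

    `data` is a list of rows (one list of column values per row);
    `high` and `low` are lists of row indices.
    """
    n_cols = len(data[0])

    def col_mean(rows, c):
        return sum(data[r][c] for r in rows) / len(rows)

    # Per-column difference between the mean high and mean low trajectories.
    diff = [col_mean(high, c) - col_mean(low, c) for c in range(n_cols)]
    # Columns where high exceeds low come first, columns where low exceeds high last.
    order = sorted(range(n_cols), key=lambda c: diff[c], reverse=True)
    return order, diff


# Toy data: 4 rows (genes) by 3 columns (conditions).
data = [
    [5.0, 1.0, 3.0],
    [4.0, 2.0, 3.0],
    [1.0, 6.0, 3.0],
    [2.0, 5.0, 2.0],
]
order, diff = sig_group_order(data, high=[0, 1], low=[2, 3])
print(order)  # [0, 2, 1]
print(diff)   # [3.0, -4.0, 0.5]
```

With this ordering the red trajectories start above the blue ones on the left and end below them on the right, which is what produces the "X" shape.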

As with the IPLot-based equivalent graphic, this reordering of the conditions is not very meaningful without knowing the ordering. An optional second argument allows us to attach names to the X-axis, which is useful if the number of dimensions to plot is not very large. (In the GNF dataset, for example, we can't attach labels to all 158 dimensions and still expect them to be legible.) Here, we retrieve the time points in hours for the cell cycle data and pass those labels to the routine.

l = cho.getLabeling('time points')
times = [x[0] for x in l.getLabelsByCols(range(17))]
pcaginzu.plotPCNOutlierRowsInSigGroupOrder(2,times)
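For the labels to be useful, each one has to travel with its column when the columns are reordered. A minimal sketch of that bookkeeping, assuming a column permutation is already known (the helper name `reorder_labels` and the sample values are hypothetical, and whether the plotting routine does this internally is not shown here):

```python
def reorder_labels(labels, order):
    """Return the labels rearranged to match a column permutation.

    `order[k]` is the original column index shown at X-axis position k.
    """
    if sorted(order) != list(range(len(labels))):
        raise ValueError("order must be a permutation of the column indices")
    return [labels[c] for c in order]


# Hypothetical time-point labels and a hypothetical column ordering.
labels = ["1h", "2h", "3h", "4h"]
order = [2, 0, 3, 1]
sorted_labels = reorder_labels(labels, order)
print(sorted_labels)  # ['3h', '1h', '4h', '2h']
```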

Now it is clearer that the high extreme genes are much higher than the low genes at around 9-10 hours and 16 hours. Likewise, the low genes are much higher than the high genes at approximately 3-6 hours. There is essentially no significant difference between the high and low genes' expression at hours 12-15 and hours 1-2.

Joe Roden 2005-12-13