Covariate analysis of Cho dataset

The scoreColumnLabelingsForPCN() function attempts to identify any column labelings that correlate well with a particular principal component's condition partitioning.

Unfortunately, the Cho yeast cell cycle dataset has only one covariate, time (the "time points" column labeling), so it is not a good example of this feature. Furthermore, time was loaded as a discrete covariate, so we cannot properly analyze what is naturally a continuous covariate (unless we explicitly define a new column labeling containing time as a numeric covariate). Nonetheless, we can try it as a discrete variable to see what happens.

The function that computes covariate scores for a specific principal component is called as follows:

pcaginzu.scoreColumnLabelingsForPCN(1)

This returns a ColumnScore object that holds one discrete covariate correlation score and three continuous covariate correlation scores.

We can determine which principal component correlates best with time by computing the score for every principal component. For example:

for i in range(1, pcaginzu.rowPCAView.numCols + 1):
    print i, pcaginzu.scoreColumnLabelingsForPCN(i)[0].scores

The above commands result in the following output:

1 0.682976972108
2 0.686202158131
3 0.625067714467
4 0.657751569535
5 0.582238520955
6 0.539481092126
7 0.602061519391
8 0.563922061919
9 0.0
10 0.539481092126
11 0.539481092126
12 0.539481092126
13 0.0
14 0.0
15 0.539481092126
16 0.0
17 0.539481092126

Principal component 2 has the highest score (0.686), so it generates the condition partitioning that best explains time in this dataset, although principal component 1 (0.683) is a close second.
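
The comparison can also be done programmatically rather than by eye. The following is a minimal sketch, assuming the scores printed above have been copied into a plain Python dictionary. The names scores, ranked, best_pc and runner_up_pc are illustrative only and are not part of the pcaginzu object.

```python
# Discrete covariate scores for the "time points" labeling, copied from the
# output listed above and keyed by principal component number.
# (Hypothetical variable names; not part of the pcaginzu API.)
scores = {
    1: 0.682976972108, 2: 0.686202158131, 3: 0.625067714467,
    4: 0.657751569535, 5: 0.582238520955, 6: 0.539481092126,
    7: 0.602061519391, 8: 0.563922061919, 9: 0.0,
    10: 0.539481092126, 11: 0.539481092126, 12: 0.539481092126,
    13: 0.0, 14: 0.0, 15: 0.539481092126, 16: 0.0,
    17: 0.539481092126,
}

# Rank the components from highest to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_pc, best_score = ranked[0]
runner_up_pc, runner_up_score = ranked[1]
margin = best_score - runner_up_score

print("best principal component:", best_pc)
print("runner-up:", runner_up_pc)
print("margin: %.6f" % margin)
```

Sorting the full list, rather than taking only the maximum, makes it easy to see how far the best component is ahead of the next one.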

Joe Roden 2005-12-13