Attach labelings to a dataset

(The following examples assume Option 1 above, namely that we changed directories before loading files. If you used Option 2, adapt the statements as needed to use your complete data path.)

Once we have a dataset loaded, we can attach one or more "Labelings" to it to provide additional information about its rows, its columns, or even individual data values. For instance, one typically wants a unique name (e.g. a gene ID) attached to each row so that the summary plots can report the name of an interesting-looking vector. With the Cho dataset, this would be the ORF (gene) identifiers. The file "ORFs.rlab" contains one identifier per line and can be associated with the dataset as follows:

orfs = labelings.Labeling(cho, 'orfs')
orfs.labelRows('ORFs.rlab')
cho.setPrimaryRowLabeling(orfs)

A "primary" row labeling needs to have a unique value for each row. For the Cho dataset the ORF identifiers serve that purpose.
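Because a primary labeling must be unique per row, it can be worth checking a label file before attaching it. The following is a minimal sketch using only the Python standard library, not part of CompClust; the helper name find_duplicate_labels is hypothetical, and it assumes the file is plain text with exactly one label per line, as described for "ORFs.rlab" above.

```python
from collections import Counter


def find_duplicate_labels(path):
    """Return the labels that occur more than once in a one-label-per-line file.

    Hypothetical helper (not a CompClust function). Every line counts as one
    row label, so repeated blank lines are reported as a duplicate '' label.
    """
    with open(path) as handle:
        # Drop only the line terminator; the rest of the line is the label.
        labels = [line.rstrip("\r\n") for line in handle]
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n > 1)
```

Calling find_duplicate_labels('ORFs.rlab') and getting back an empty list suggests the file is suitable as a primary row labeling; any labels it returns would need to be made unique first.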

Once we have our primary labeling, we might as well attach some other useful pieces of information. Included with the Cho dataset are additional row annotations such as gene common names and cluster membership (the IDs of the clusters to which a gene belongs). We have cluster labelings for both manually assigned clusters (assigned by Cho in [Cho et al., 1998]) and computer-generated clusters (derived by the CompClust DiagEM algorithm). We can load these additional labelings as follows:

names = labelings.Labeling(cho, 'common name')
names.labelRows('CommonNames.rlab')

em = labelings.Labeling(cho, 'diagem clusters')
em.labelRows('EM.rlab')

cho_clustering = labelings.Labeling(cho, 'cho clusters')
cho_clustering.labelRows('ChoClassification.rlab')
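The three labelings above follow the same two-step pattern: construct a Labeling with a name, then call labelRows with a file. As a hedged sketch, that pattern could be wrapped in a small loop. The helper attach_row_labelings is hypothetical and not part of CompClust; it relies only on the constructor and labelRows calls already shown in this section, and takes the Labeling class as an argument so that it makes no further assumptions about the library.

```python
def attach_row_labelings(dataset, labeling_class, specs):
    """Create one row labeling per (name, filename) pair; return them by name.

    Hypothetical convenience wrapper. labeling_class is expected to behave
    like labelings.Labeling as used above: labeling_class(dataset, name)
    followed by .labelRows(filename).
    """
    attached = {}
    for name, filename in specs:
        labeling = labeling_class(dataset, name)
        labeling.labelRows(filename)
        attached[name] = labeling
    return attached


# Possible usage in the tutorial session (assumes cho and labelings are loaded):
# row_labelings = attach_row_labelings(cho, labelings.Labeling, [
#     ('common name', 'CommonNames.rlab'),
#     ('diagem clusters', 'EM.rlab'),
#     ('cho clusters', 'ChoClassification.rlab'),
# ])
```

Returning the labelings in a dictionary keeps each one reachable by the name it was given, which serves the same purpose as the separate names, em, and cho_clustering variables above.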

We can also attach information to the columns of the dataset. The primary column labeling is used for many of the condition (x-axis) labels on the plots. For the Cho dataset, this would be the time (in hours) at which each sample was taken during the time course experiment.

times = labelings.Labeling(cho, 'time points')
times.labelCols('times.clab')
cho.setPrimaryColumnLabeling(times)

Joe Roden 2005-12-13