
labelUsing

labelUsing() is a powerful method of the Dataset class. It lets you label any Dataset or View using a labeling from any other Dataset or View to which it is directly or indirectly attached. An example may help.

Let's assume we have a dataset with two labelings attached to it. One defines the result of partitioning the data into three classes: it contains the labels '0', '1', and '2' and labels every row. The second labeling contains various annotations about the data. Now, if we create a subset from the first labeling (say we want to look at partition '2') via the subset() method, how can we find the annotation information for the data in this subset? We could look at the values in each datapoint, find them in the original dataset, and then look up the label, but that is tedious and error-prone if there is duplicate data. Instead we use labelUsing(), which returns a new labeling for the subset that contains the labels of the corresponding datapoints.

>>> ds = Dataset(MLab.rand(9,2))
>>> classes = Labeling(ds)
>>> classes.addLabelToRows('0',[0,1,2])
>>> classes.addLabelToRows('1',[3,4,5])
>>> classes.addLabelToRows('2',[6,7,8])
>>> annot = Labeling(ds)
>>> annot.labelRows(['a1','b2','c4','d','e','f',
... 'a2','g','e3'])
>>> sub = ds.subset(classes, '2')
>>> annot.getRowLabels()
['a1', 'b2', 'c4', 'd', 'e', 'f', 'a2', 'g', 'e3']
>>> ds.getData()
[[ 0.93310058, 0.59187013,]
 [ 0.62153167, 0.28014728,]
 [ 0.66348588, 0.06767605,]
 [ 0.19404136, 0.3603994 ,]
 [ 0.39297128, 0.4130477 ,]
 [ 0.3661778 , 0.89579421,]
 [ 0.32203168, 0.02883008,]
 [ 0.53443021, 0.62014562,]
 [ 0.79724813, 0.12202285,]]
>>> sub.getData()
[[ 0.32203168, 0.02883008,]
 [ 0.53443021, 0.62014562,]
 [ 0.79724813, 0.12202285,]]
>>> class2lab = sub.labelUsing(annot)
>>> class2lab.getRowLabels()
['a2', 'g', 'e3']

As you can see, the Labeling class2lab contains the row labels of the ds object for the rows from which sub was derived. This functionality is not limited to parent-child relationships. Let's create another subset containing the odd-numbered rows and label it using the class2lab labeling.

>>> odds = ds.subsetRows([1,3,5,7])
>>> odds.getData()
[[ 0.62153167, 0.28014728,]
 [ 0.19404136, 0.3603994 ,]
 [ 0.3661778 , 0.89579421,]
 [ 0.53443021, 0.62014562,]]
>>> odd_and_class2 = odds.labelUsing(class2lab)
>>> odd_and_class2.getLabels()
['g']

Thus, only the row annotated with 'g' is both an odd row and in the class '2' subset. Through careful use of labels, the labelUsing() method can effectively provide the intersection of two datasets. The tutorial notes whenever a method has an analogous set operation such as union, difference, etc.
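
The intersection analogy can be made concrete with a small plain-Python sketch. This is not the library's implementation: it assumes, purely for illustration, that each view can be modeled as a list of row indices into the root dataset, and the helper label_using() and the dictionaries it returns are hypothetical names introduced here. It reproduces the two steps of the example above using ordinary dictionaries and sets.

```python
# Illustrative model only; NOT how the Dataset class is implemented.
# Assumption: a view is represented by the root-dataset row indices it holds.

def label_using(view_rows, labeled_rows, labels):
    """Hypothetical helper: carry labels over to the rows a view shares
    with the labeled rows. Returns {root_row_index: label}."""
    by_row = dict(zip(labeled_rows, labels))
    return {row: by_row[row] for row in view_rows if row in by_row}

# The annotation labeling covers all nine rows of the root dataset.
root_rows = list(range(9))
annot = ['a1', 'b2', 'c4', 'd', 'e', 'f', 'a2', 'g', 'e3']

# Subset of rows labeled '2' by the classes labeling.
class2_rows = [6, 7, 8]
class2lab = label_using(class2_rows, root_rows, annot)
print(class2lab)        # {6: 'a2', 7: 'g', 8: 'e3'}

# Subset of odd rows, labeled using class2lab (a sibling, not a parent).
odd_rows = [1, 3, 5, 7]
odd_and_class2 = label_using(odd_rows, list(class2lab), list(class2lab.values()))
print(odd_and_class2)   # {7: 'g'}

# The labeled rows are exactly the set intersection of the two subsets.
assert set(odd_and_class2) == set(odd_rows) & set(class2_rows)
```

In this model, only rows present in both views end up with a label, which is why labeling one subset with a labeling derived from another behaves like a set intersection.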


Lucas Scharenbroich 2003-08-27