RowSubsetView

A SubsetView is a view type that lets you pull selected rows or columns out of a dataset and examine them independently of the rest. This section considers only the RowSubsetView; the ColumnSubsetView is just as useful and operates identically on columns.

For our example of using the RowSubsetView, we will construct a random matrix and create a subset containing every row whose magnitude (the square root of the sum of its squared entries) is above a certain threshold.

An important aspect of using a SubsetView is the key system. Every row and column of a dataset has a unique key associated with it. When constructing a RowSubsetView, a list of row keys is passed in to indicate which rows are to be taken from the parent dataset; in the example below, the row indices serve as the keys. There is no limit on the length of the key list, so it is perfectly acceptable to have a subset which is larger than the parent dataset. In that case some of the keys must repeat, and the corresponding rows appear more than once.
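
The key-list idea can be illustrated without the library. The following is a minimal pure-Python sketch, not the library's implementation: `SketchRowSubset` and its `get_data` method are hypothetical names, and the sketch assumes that row keys are plain integer row indices.

```python
class SketchRowSubset:
    """Hypothetical stand-in for a row subset view (not the library's class)."""

    def __init__(self, parent_rows, keys):
        # Keep a reference to the parent rows instead of copying them,
        # plus the ordered list of keys that selects rows from the parent.
        self.parent_rows = parent_rows
        self.keys = list(keys)

    def get_data(self):
        # Rows are looked up on demand, so a key may appear any number of times.
        return [self.parent_rows[k] for k in self.keys]


parent = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# A subset smaller than the parent, in the order given by the key list.
small = SketchRowSubset(parent, [2, 0])
print(small.get_data())   # [[5.0, 6.0], [1.0, 2.0]]

# A "subset" larger than the parent: key 1 repeats, so row 1 repeats.
big = SketchRowSubset(parent, [0, 1, 2, 1])
print(big.get_data())     # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [3.0, 4.0]]
```

Because the sketch stores only keys and a reference to the parent, repeating a key costs nothing extra in storage; whether the real RowSubsetView works the same way internally is not stated here.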

For this example we will need one more module: MLab, which provides Matlab®-style functions. Execute the following to load it.

>>> import MLab

Now we'll build our random dataset and look for rows with a magnitude larger than 1.0.

>>> ds = Dataset(MLab.rand(10,2))
>>> data = ds.getData()
>>> sum_of_squares = Numeric.sum(data*data, 1)
>>> mag = Numeric.sqrt(sum_of_squares)
>>> # keep the indices of the rows whose magnitude exceeds 1.0
>>> large_rows = filter(lambda i : mag[i] > 1.0,
... range(len(mag)))
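
For readers without Numeric or MLab installed, the same selection can be checked with the standard library alone. This is a sketch under assumptions: the ten rows are copied from the sample output printed in this section rather than generated randomly, and plain lists stand in for the Dataset.

```python
import math

# The ten rows from this section's sample output (copied, not regenerated).
rows = [
    [0.04412353, 0.97403276],
    [0.55236453, 0.0979613],
    [0.19320844, 0.57923186],
    [0.77776194, 0.4972173],
    [0.76949269, 0.32971311],
    [0.44192556, 0.12342816],
    [0.35265034, 0.60821456],
    [0.85184562, 0.27371284],
    [0.54120302, 0.94661057],
    [0.87526697, 0.54308629],
]

# Magnitude of each row: square root of the sum of squared entries.
mag = [math.sqrt(sum(x * x for x in row)) for row in rows]

# Collect the indices of the rows whose magnitude exceeds the threshold.
large_rows = [i for i, m in enumerate(mag) if m > 1.0]
print(large_rows)   # [8, 9]
```

Collecting indices with `enumerate` ties each index to its own row, so two rows that happen to share a magnitude would still be reported separately.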

Now that we have the list of rows which have magnitudes greater than 1.0, we can pass this list into the constructor of the RowSubsetView.

>>> large_subset = RowSubsetView(ds, large_rows)
>>> large_subset
RowSubsetView: None, 2 by 2
>>> ds.getData()
[[ 0.04412353, 0.97403276,]
 [ 0.55236453, 0.0979613 ,]
 [ 0.19320844, 0.57923186,]
 [ 0.77776194, 0.4972173 ,]
 [ 0.76949269, 0.32971311,]
 [ 0.44192556, 0.12342816,]
 [ 0.35265034, 0.60821456,]
 [ 0.85184562, 0.27371284,]
 [ 0.54120302, 0.94661057,]
 [ 0.87526697, 0.54308629,]]
>>> large_subset.getData()
[[ 0.54120302, 0.94661057,]
 [ 0.87526697, 0.54308629,]]

So, for this particular random dataset, only two rows have a magnitude larger than 1.0, and they happen to be the last two. For an example of replicating data in a SubsetView, let's create a view which contains each of those rows three times.

>>> larger_subset = RowSubsetView(ds, large_rows * 3)
>>> larger_subset.getData()
[[ 0.54120302, 0.94661057,]
 [ 0.87526697, 0.54308629,]
 [ 0.54120302, 0.94661057,]
 [ 0.87526697, 0.54308629,]
 [ 0.54120302, 0.94661057,]
 [ 0.87526697, 0.54308629,]]
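
The alternating order of the rows in this output follows from how Python repeats a list. A small sketch, assuming zero-based row indices so that the last two of ten rows are 8 and 9:

```python
# Assumed key list for the last two rows of a ten-row dataset.
large_rows = [8, 9]

# Multiplying a list repeats the whole sequence; it does not
# repeat each element in place.
tripled_keys = large_rows * 3
print(tripled_keys)   # [8, 9, 8, 9, 8, 9]
```

To group the copies of each row together instead, the key list could be sorted before it is passed to the view.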

Quick and easy. Before we move on, be sure to remove these two views (large_subset and larger_subset) from the original dataset. Refer to the previous sections if you are unsure how to proceed.

Lucas Scharenbroich 2003-08-27