Retrieving data

As you have seen, you can retrieve the full dataset via the getData() method, but what if your dataset contains many thousands of rows and you are only interested in one of them? It would be a waste to sift through the full dataset just to get a single entry. Two methods provide row and column access to the dataset: getRowData() and getColData(). Each takes an integer argument and returns the corresponding row or column as a 1-D vector. Note that the indices start at zero. If the row or column index is too large, a ValueError is raised. Let's try an example:

>>> ds.getRowData(0)
[ 1., 1., 1.,]
>>> ds.getRowData(1)
[ 2., 2., 2.,]
>>> ds.getRowData(2)
[ 3., 3., 3.,]
>>> ds.getRowData(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/lscharen/checkout/code/python/compClust/mlx/Dataset.py", line 219, in getRowData
    raise ValueError()
ValueError
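
The session above only exercises getRowData(). To show how getColData() relates to it, here is a minimal sketch built on a stand-in class. TinyDataset and ds_demo are hypothetical names invented for this illustration; this is not the compClust implementation, and rejecting negative indices is an assumption on our part.

```python
class TinyDataset:
    """Hypothetical stand-in for a Dataset; NOT the compClust implementation."""

    def __init__(self, rows):
        # Store a private copy of the data as a list of row lists.
        self._rows = [list(r) for r in rows]

    def getRowData(self, i):
        # Indices are zero-based; an out-of-range index raises ValueError.
        # (Treating negative indices as out of range is an assumption.)
        if i < 0 or i >= len(self._rows):
            raise ValueError()
        return list(self._rows[i])

    def getColData(self, j):
        # A column is the j-th entry of every row.
        ncols = len(self._rows[0]) if self._rows else 0
        if j < 0 or j >= ncols:
            raise ValueError()
        return [row[j] for row in self._rows]


# Same values as the example dataset used in this section.
ds_demo = TinyDataset([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])
first_row = ds_demo.getRowData(0)  # [1.0, 1.0, 1.0]
first_col = ds_demo.getColData(0)  # [1.0, 2.0, 3.0]
```

With this data every row is constant, so the first column runs 1, 2, 3 while the first row is all ones.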

An interesting consequence of getRowData() raising an exception is that you can write a function which prints out each row of data without ever having to know how many rows there are! Consider the following function and its output:

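>>> def foo(dataset):
        i = 0
        while True:
            try:
                # getRowData() raises ValueError once i runs past the last row
                print(str(i+1) + " " + str(dataset.getRowData(i)))
                i = i + 1
            except ValueError:
                # Catch only ValueError so unrelated errors are not hidden
                break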
 
>>> foo(ds)
1 [ 1., 1., 1.,]
2 [ 2., 2., 2.,]
3 [ 3., 3., 3.,]
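
The same idea can be packaged as a generator, so the caller decides what to do with each row instead of always printing it. This is only a sketch: iter_rows and the _ThreeRows stub are hypothetical names made up for the example, and the stub merely mimics the ValueError behaviour described above.

```python
def iter_rows(dataset):
    """Yield rows one at a time until getRowData() raises ValueError."""
    i = 0
    while True:
        try:
            row = dataset.getRowData(i)
        except ValueError:
            # Ran past the last row: stop iterating.
            return
        yield row
        i += 1


class _ThreeRows:
    """Hypothetical stub exposing only getRowData(), for demonstration."""

    _rows = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]

    def getRowData(self, i):
        if i < 0 or i >= len(self._rows):
            raise ValueError()
        return self._rows[i]


rows_seen = list(iter_rows(_ThreeRows()))
```

Keeping only the getRowData() call inside the try block means a ValueError raised by the caller's own row processing is not mistaken for the end of the data.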

However, there is an easier way to determine how many rows and columns are in a Dataset object: the getNumRows() and getNumCols() methods. Our example object has three rows and three columns.

>>> ds.getNumRows()
3
>>> ds.getNumCols()
3
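
Once the row count is available, the row-printing loop no longer needs to rely on an exception to stop. The sketch below assumes only the method names described in this section; CountedDataset and numbered_rows are hypothetical names for illustration and do not come from compClust.

```python
class CountedDataset:
    """Hypothetical stand-in exposing the size and row accessors."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def getNumRows(self):
        return len(self._rows)

    def getNumCols(self):
        # An empty dataset is treated as having zero columns (an assumption).
        return len(self._rows[0]) if self._rows else 0

    def getRowData(self, i):
        if i < 0 or i >= len(self._rows):
            raise ValueError()
        return list(self._rows[i])


def numbered_rows(dataset):
    """Return one 'row-number row-data' string per row, numbered from 1."""
    lines = []
    for i in range(dataset.getNumRows()):
        lines.append(str(i + 1) + " " + str(dataset.getRowData(i)))
    return lines


demo = CountedDataset([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])
lines = numbered_rows(demo)
```

A bounded for loop makes the stopping condition explicit, which is usually easier to read than an open-ended loop that ends on an exception.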

Lucas Scharenbroich 2003-08-27