Load a dataset

The Dataset class can cast several kinds of input into the proper format. The most common are strings (which are interpreted as file names), lists of lists, and numeric arrays.

To avoid having to construct the full path name for every file that we'll be loading, it is convenient to change to the directory that contains them all.

For this tutorial we will start with the Cho yeast cell cycling data set ([Cho et al., 1998]), as it is fairly small and easy to work with.

If you used the default options for the Windows installer, the data should be located in C:\Program Files\CompClustShell\Examples\

# This cd command is IPython-specific. If you're using plain Python,
# use os.chdir(os.path.join('Examples', 'ChoCellCycling')) instead.
cd Examples/ChoCellCycling

# at last some data
cho = datasets.Dataset("ChoCycling.dat", 'cho')

As a final check, if you type "cho.numRows", Python should return 384.
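If you are working in plain Python rather than IPython, the directory change mentioned in the comment above can be sketched roughly as follows. It uses only the standard os module. The relative path comes from the comment in this tutorial, and the sketch assumes your session starts in the CompClustShell install directory. The existence check is an addition for illustration, not something the tutorial requires.

```python
import os

# Relative path from the tutorial comment; it is resolved against the
# current working directory (assumed here to be the install directory).
data_dir = os.path.join('Examples', 'ChoCellCycling')

if os.path.isdir(data_dir):
    # Equivalent of IPython's "cd Examples/ChoCellCycling"
    os.chdir(data_dir)
    print("Now in %s" % os.getcwd())
else:
    # Report the full path that was tried, to make a wrong starting
    # directory easy to spot.
    print("Directory not found: %s" % os.path.abspath(data_dir))
```

Once the directory change succeeds, the file name "ChoCycling.dat" can be passed to Dataset without a path prefix, as in the example above.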



Brandon King 2005-07-29