
Outputting the data

At this point, only one more method is worth mentioning: writeDataset(), which pretty-prints the dataset to a given output stream. By default, writeDataset() writes to stdout using a tab delimiter, but both the stream and the delimiter can be changed. The example code below shows several variations, including dumping a dataset to a file that any program able to read tab-delimited text files (including Matlab and Excel) can load.

>>> import os    # needed for os.system()
>>> import sys   # needed for sys.stderr
>>> ds.writeDataset()
1.0     1.0     1.0
2.0     2.0     2.0
3.0     3.0     3.0
>>> ds.writeDataset(sys.stderr)
1.0     1.0     1.0
2.0     2.0     2.0
3.0     3.0     3.0
>>> ds.writeDataset(delimiter='*')
1.0*1.0*1.0
2.0*2.0*2.0
3.0*3.0*3.0
>>> ds.writeDataset(sys.stderr, '*')
1.0*1.0*1.0
2.0*2.0*2.0
3.0*3.0*3.0
>>> stream = open('ds-dump.txt','w')
>>> ds.writeDataset(stream)
>>> stream.close()
>>> result = os.system('cat ds-dump.txt')
1.0     1.0     1.0
2.0     2.0     2.0
3.0     3.0     3.0
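For readers curious about what a writer like this involves, here is a minimal sketch in modern Python. `write_dataset` is a hypothetical function, not PyMLX's implementation; it assumes the dataset can be iterated as rows of numbers, and it mirrors the behavior described above: tab delimiter and stdout by default, with both overridable.

```python
import io
import sys

def write_dataset(rows, stream=None, delimiter="\t"):
    """Hypothetical writeDataset()-style printer: one row per line."""
    if stream is None:
        stream = sys.stdout  # looked up at call time, so redirection is honored
    for row in rows:
        stream.write(delimiter.join(str(float(x)) for x in row) + "\n")

rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
write_dataset(rows)                    # tab-delimited, to stdout
write_dataset(rows, sys.stderr, "*")   # '*'-delimited, to stderr

# Any object with a write() method works as the stream, e.g. an in-memory buffer.
buf = io.StringIO()
write_dataset(rows, buf)
print(repr(buf.getvalue()))
```

Defaulting `stream` to None rather than to `sys.stdout` avoids binding the stream once, at function definition time.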

The last topic to cover before moving on to Views is how the Dataset constructor handles different types of initializer objects.

The Dataset constructor accepts the following object types:

StringType
The string is assumed to name a tab-delimited file, and the constructor attempts to read the data from that file.
FileType
An already-open file; the constructor attempts to read the data from it.
ListType
The list is converted to a Numeric array.
TupleType
The tuple is converted to a Numeric array.
ArrayType
The array is used as is.
Dataset instance
A new object is created that references the existing dataset. No copy of the data is made.
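The type dispatch described above can be sketched as follows. `MiniDataset` is a hypothetical stand-in, not PyMLX's Dataset class: `isinstance` checks replace the old types-module names (StringType, FileType, and so on), plain nested lists of floats stand in for Numeric arrays, and any object with a `shape` attribute is assumed to be an array and kept as is. File parsing here deliberately ignores the label, URL, and gzip handling discussed next.

```python
class MiniDataset:
    """Hypothetical stand-in for PyMLX's Dataset, showing constructor dispatch only."""

    def __init__(self, source, delimiter="\t"):
        if isinstance(source, MiniDataset):
            # Reference the existing data; no copy is made.
            self.data = source.data
        elif isinstance(source, str):
            # A string is treated as the name of a delimited text file.
            with open(source) as f:
                self.data = self._parse(f, delimiter)
        elif hasattr(source, "read"):
            # An already-open file object: read from it directly.
            self.data = self._parse(source, delimiter)
        elif isinstance(source, (list, tuple)):
            # Lists and tuples are converted (here to nested lists of floats).
            self.data = [[float(x) for x in row] for row in source]
        elif hasattr(source, "shape"):
            # Assumed to be an array: used as is.
            self.data = source
        else:
            raise TypeError("unsupported initializer: %r" % type(source).__name__)

    @staticmethod
    def _parse(lines, delimiter):
        # One row per non-blank line; float() tolerates the trailing newline.
        return [[float(x) for x in line.split(delimiter)]
                for line in lines if line.strip()]

a = MiniDataset([[1, 2], [3, 4]])
b = MiniDataset(a)            # shares a's storage
t = MiniDataset(((5, 6),))    # tuples are converted too
print(b.data is a.data, t.data)
```

Because `b.data is a.data`, changes made through one object are visible through the other, which is the behavior the "Dataset instance" entry describes.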

It is also useful to know that a StringType argument may be a URL as well as a filename, which allows PyMLX to load datasets directly from the web. The file loader also transparently handles gzip-compressed files if the file name ends in .gz.
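The URL and gzip handling can be approximated with the Python standard library. `read_text` is a hypothetical helper, not PyMLX's loader; it assumes URLs are recognized by their scheme prefix and that file contents are UTF-8 text. It relies on `urllib.request.urlopen` and `gzip.decompress`, which are modern standard-library calls rather than anything PyMLX itself is documented to use.

```python
import gzip
import os
import tempfile
import urllib.request

def read_text(name, encoding="utf-8"):
    """Hypothetical loader: read a local file or URL, gunzipping names ending in .gz."""
    if name.startswith(("http://", "https://", "ftp://")):
        with urllib.request.urlopen(name) as resp:
            data = resp.read()
    else:
        with open(name, "rb") as f:
            data = f.read()
    if name.endswith(".gz"):
        data = gzip.decompress(data)  # transparent decompression by file suffix
    return data.decode(encoding)

# Usage: round-trip a small gzip-compressed, tab-delimited file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ds-dump.txt.gz")
    with gzip.open(path, "wt") as f:
        f.write("1.0\t1.0\t1.0\n2.0\t2.0\t2.0\n")
    text = read_text(path)
print(text, end="")
```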

The constructor tries to be smart about loading files. A text input file may carry optional string labels in its first column. If the first column is not a valid number, the loader assumes that it contains textual labels and builds the dataset from the second column onward.

As a consequence, numerical labels in the first column are indistinguishable from data: they are loaded as an ordinary column, so your own code must handle them as a special case.
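The first-column heuristic can be sketched like this. `parse_table` is hypothetical, and it assumes the decision is made from the first non-blank row only; the description above does not say which rows PyMLX actually inspects. The second call shows the pitfall just described: numeric labels are silently treated as data.

```python
def parse_table(lines, delimiter="\t"):
    """Split delimited text into (labels, rows); labels is None when absent."""
    rows = [line.rstrip("\n").split(delimiter) for line in lines if line.strip()]
    if not rows:
        return None, []
    try:
        float(rows[0][0])      # numeric first field: no label column
        has_labels = False
    except ValueError:         # non-numeric first field: treat column 0 as labels
        has_labels = True
    if has_labels:
        labels = [row[0] for row in rows]
        data = [[float(x) for x in row[1:]] for row in rows]
    else:
        labels = None
        data = [[float(x) for x in row] for row in rows]
    return labels, data

print(parse_table(["a\t1\t2\n", "b\t3\t4\n"]))
# (['a', 'b'], [[1.0, 2.0], [3.0, 4.0]])
print(parse_table(["7\t1\t2\n", "8\t3\t4\n"]))
# (None, [[7.0, 1.0, 2.0], [8.0, 3.0, 4.0]])
```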

And now, on to Views.


Lucas Scharenbroich 2003-08-27