Table of Contents

Class: Dataset compClust/mlx/datasets/Dataset.py

Implementation of the IDataset interface.

The Dataset class provides the basic mechanisms for wrapping a set of data vectors in an object framework. Only basic operations are exposed at the user level, though many helper functions must be implemented to support Labeling/View integration.

Base Classes   
IDataset
Methods   
__castDataset
__getCol
__getRow
__init__
__repr__
__str__
addLabeling
addView
getColData
getColKey
getColKeys
getData
getKeyMax
getKeys
getLabeling
getLabelings
getLineage
getName
getNumAxis
getNumCols
getNumRows
getRowData
getRowKey
getRowKeys
getView
getViews
isDirty
removeLabeling
removeView
resetVars
setName
writeDataset
  __castDataset 
__castDataset ( self,  obj )

Initializes the dataset with the object. Casting (conversion) rules are as follows:

If obj is a              it is cast (converted) using
-----------------------  -----------------------------------------------------
String                   open a stream and use read_dataset_delimited_stream()
FileType (stream)        read_dataset_delimited_stream()
ListType                 Numeric.array()
TupleType                Numeric.array()
Numeric.ArrayType        None
MA.MaskedArray           None
Instance(Dataset)        None

Otherwise, None is returned.

Note: read_dataset_delimited_stream() must be used instead of Scientific.IO.ArrayIO.readArray(), since the stream cannot be guaranteed to be based on a file and thus to have a filename.
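
The casting rules above amount to a dispatch on the type of obj. The following is a minimal sketch of that idea using only the Python standard library; it is not the compClust implementation. The names cast_dataset and read_delimited are hypothetical, nested lists stand in for Numeric arrays, a string is assumed to be a filename, and the parsing is much simpler than whatever read_dataset_delimited_stream() actually does.

```python
import io


def read_delimited(stream, delimiter="\t"):
    """Hypothetical stand-in for read_dataset_delimited_stream()."""
    rows = []
    for line in stream:
        line = line.strip()
        if line:
            rows.append([float(field) for field in line.split(delimiter)])
    return rows


def cast_dataset(obj):
    """Return obj converted to a list of rows, or None if unsupported."""
    if isinstance(obj, str):
        # Assumption: a string names a file, so open a stream and parse it.
        with open(obj) as stream:
            return read_delimited(stream)
    if hasattr(obj, "read"):
        # Any file-like stream is parsed directly; it need not have a filename.
        return read_delimited(obj)
    if isinstance(obj, (list, tuple)):
        # Sequences are copied row by row (Numeric.array() in the original).
        return [list(row) for row in obj]
    # Every other type falls through to None.
    return None


from_stream = cast_dataset(io.StringIO("1\t2\n3\t4\n"))
from_tuple = cast_dataset(((1, 2), (3, 4)))
```

Parsing from a stream instead of a filename is what lets an in-memory io.StringIO work here, which is the situation the note above describes.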

  __getCol 
__getCol ( self,  key )

Returns a Numeric vector corresponding to the specified column of the dataset. If the column is out of range, None is returned.

  __getRow 
__getRow ( self,  key )

Returns a Numeric vector corresponding to the specified row of the dataset. If the row is out of range, None is returned.

  __init__ 
__init__ ( self,  data )

Exceptions   
ValueError()
  __repr__ 
__repr__ ( self )

  __str__ 
__str__ ( self )

  addLabeling 
addLabeling ( self,  labeling )

Adds a labeling to the list of labelings in the dataset.

  addView 
addView ( self,  view )

Adds a view to the view list of the dataset.

  getColData 
getColData ( self,  col )

Returns the data vector of a column of the dataset. If the column is out of range, a ValueError() is raised. The vector itself is a Masked Array.

  getColKey 
getColKey ( self,  col )

Returns the key for a given column of the dataset.

If the column is out of range a ValueError() is raised.

Exceptions   
ValueError()
  getColKeys 
getColKeys ( self )

Returns the full set of column keys for the dataset.

They will be returned in a list of length equal to the total number of columns and arranged such that the nth key in the list corresponds to the nth column of the dataset. getColKeys() is a wrapper around getKeys().

  getData 
getData ( self,  key=None )

Returns the full dataset as a Numeric array.

getData() will, by default, return the full dataset as a Numeric array. If key is not None, the vector with the specified key will be returned. If such a vector does not exist, a ValueError() is raised. The data is returned in a Masked Array.

Exceptions   
ValueError()
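
As a rough illustration of the getData() contract described above (everything by default, one vector when a key is given, ValueError() otherwise), here is a small standalone sketch. The function get_data is hypothetical, plain lists replace Masked Arrays, and keys are assumed to be row ordinals, which the original does not state.

```python
def get_data(rows, key=None):
    """Return all rows, or the single row stored under key."""
    if key is None:
        # Default: hand back a copy of the full table.
        return [list(row) for row in rows]
    # Assumption: valid keys are the integers 0 .. len(rows) - 1.
    if not isinstance(key, int) or not 0 <= key < len(rows):
        raise ValueError("no vector with key %r" % (key,))
    return list(rows[key])


table = [[1, 2], [3, 4]]
everything = get_data(table)
second_row = get_data(table, key=1)
```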
  getKeyMax 
getKeyMax ( self )

Returns the largest valid key for the dataset.

  getKeys 
getKeys ( self,  axis=0 )

Returns the valid keys for an axis in the order in which they appear in the dataset.

An axis equal to zero will return the row keys and any other value will return the column keys.
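
The axis convention can be sketched as follows. KeyedTable is a hypothetical stand-in, not the Dataset class, and it assumes keys are simple ordinals. It shows only that axis 0 selects rows, that every other value selects columns, and that the row and column helpers can be thin wrappers.

```python
class KeyedTable:
    """Hypothetical stand-in for a dataset whose keys are ordinals."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def get_keys(self, axis=0):
        if axis == 0:
            # Axis 0 means rows.
            return list(range(len(self.rows)))
        # Any other axis value is treated as "columns".
        n_cols = len(self.rows[0]) if self.rows else 0
        return list(range(n_cols))

    def get_row_keys(self):
        return self.get_keys(0)

    def get_col_keys(self):
        return self.get_keys(1)


table = KeyedTable([[1, 2, 3], [4, 5, 6]])
```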

  getLabeling 
getLabeling ( self,  name )

Returns the labeling with the given name, or None if it does not exist. If multiple labelings exist with the same name, the first one encountered is returned.
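
The first-match rule for duplicate names can be illustrated with a short standalone sketch. Named and find_by_name are hypothetical names used only for this example; the actual storage and search order inside Dataset are not documented here.

```python
class Named:
    """Hypothetical stand-in for a Labeling; only the name matters here."""

    def __init__(self, name, payload):
        self.name = name
        self.payload = payload


def find_by_name(items, name):
    """Return the first item with the given name, or None."""
    for item in items:
        if item.name == name:
            return item
    return None


labelings = [
    Named("cluster", "first"),
    Named("cluster", "second"),
    Named("batch", "third"),
]
match = find_by_name(labelings, "cluster")
```

Because later duplicates are unreachable by name, callers that need them would have to go through the full list returned by getLabelings().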

  getLabelings 
getLabelings ( self )

Returns a list of all the Labelings tied to this dataset.

  getLineage 
getLineage ( self )

Returns a list of lists of all paths from this dataset to its root dataset(s).

Because of supersets, there may be multiple base Dataset objects.
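
To make the "list of lists" shape concrete, here is a minimal sketch that enumerates every path from a node up to its roots. Node and lineage are hypothetical; whether the real getLineage() includes the starting dataset in each path, and in what order paths are returned, are assumptions of this example.

```python
class Node:
    """Hypothetical dataset or view with zero or more parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)


def lineage(node):
    """Return every path from node to a root, each as a list of names."""
    if not node.parents:
        # A node without parents is a root: one path containing only itself.
        return [[node.name]]
    paths = []
    for parent in node.parents:
        for tail in lineage(parent):
            paths.append([node.name] + tail)
    return paths


base_a = Node("A")
base_b = Node("B")
view = Node("V", [base_a])
superset = Node("S", [view, base_b])  # a superset has several parents
paths = lineage(superset)
```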

  getName 
getName ( self )

Returns the name of the Dataset or View.

  getNumAxis 
getNumAxis ( self,  axis=0 )

Returns the number of elements along a particular axis.

  getNumCols 
getNumCols ( self )

Returns the number of columns (dimensions or features) in the dataset.

  getNumRows 
getNumRows ( self )

Returns the number of rows (samples) in the dataset.

  getRowData 
getRowData ( self,  row )

Returns the data vector of a row of the dataset. If the row is out of range, a ValueError() is raised. The vector itself is a Masked Array.

  getRowKey 
getRowKey ( self,  row )

Returns the key for a given row of the dataset.

If the row is out of range a ValueError() is raised.

Exceptions   
ValueError()
  getRowKeys 
getRowKeys ( self )

Returns the full set of row keys for the dataset.

The keys will be returned in a list of length equal to the total number of rows and arranged such that the nth key in the list corresponds to the nth row of the dataset. getRowKeys() is a wrapper around getKeys().

  getView 
getView ( self,  name )

Returns the view with the given name, or None if it does not exist.

  getViews 
getViews ( self )

Returns a list of all the views attached to this dataset.

This may be conceived as a list of the child views from this node where the view structure is a tree with the base dataset at the root. In the case of superset views this structure breaks down, but is still valid for finding all the views associated with a dataset.

  isDirty 
isDirty ( self )

Determine if the data in the Dataset or View is dirty.

isDirty() returns 1 if a parent of the View has changed (or potentially changed) its data. A dirty view should recompute any data which depends on the data of its parent.
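
The dirty-flag pattern described above can be sketched as a parent that marks its children dirty on change and a child that recomputes lazily. Source and SumView are hypothetical classes written for this illustration and do not reflect how compClust propagates the flag internally.

```python
class Source:
    """Hypothetical parent that owns the data."""

    def __init__(self, data):
        self.data = list(data)
        self.children = []

    def set_data(self, data):
        self.data = list(data)
        # Any change in the parent marks every child as dirty.
        for child in self.children:
            child.dirty = 1


class SumView:
    """Hypothetical view that caches a value derived from its parent."""

    def __init__(self, parent):
        self.parent = parent
        parent.children.append(self)
        self.dirty = 1  # nothing has been computed yet
        self._total = None

    def is_dirty(self):
        return self.dirty

    def total(self):
        if self.dirty:
            # Recompute only when the parent may have changed.
            self._total = sum(self.parent.data)
            self.dirty = 0
        return self._total


source = Source([1, 2, 3])
view = SumView(source)
```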

  removeLabeling 
removeLabeling ( self,  labeling )

Removes a labeling and all of its associated labels from the dataset.

  removeView 
removeView ( self,  view )

Removes a view from a dataset if it exists.

  resetVars 
resetVars ( self )

Clears internal variables.

Removes all references to labeling and view objects attached to this dataset. The objects are detatch()ed to avoid dangling references.

  setName 
setName ( self,  name )

Sets the name of the Dataset or View.

  writeDataset 
writeDataset (
        self,
        stream=sys.stdout,
        delimiter="\t",
        rowLabeling=None,
        )

Writes a dataset out to a stream.

The optional argument rowLabeling lets you specify a row labeling to be prepended to each line in the data. The row labeling must contain at least one label per row, and only the first label is used. If you simply set rowLabeling to 1, then an ordinal count will be prepended to each line in the data.
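
The output format can be sketched with a small standalone function. write_rows is hypothetical and only mirrors the documented parameters: here a row labeling is modelled as a list holding one list of labels per row, and the ordinal count is assumed to start at 0. Neither detail is confirmed by the original.

```python
import io
import sys


def write_rows(rows, stream=sys.stdout, delimiter="\t", row_labeling=None):
    """Write rows to stream, optionally prefixing each line with a label."""
    for index, row in enumerate(rows):
        fields = [str(value) for value in row]
        if row_labeling is None:
            prefix = []
        elif row_labeling == 1:
            # Assumption: the ordinal count starts at 0.
            prefix = [str(index)]
        else:
            # Only the first label of each row is used.
            prefix = [str(row_labeling[index][0])]
        stream.write(delimiter.join(prefix + fields) + "\n")


data = [[1, 2], [3, 4]]

plain = io.StringIO()
write_rows(data, stream=plain)

counted = io.StringIO()
write_rows(data, stream=counted, row_labeling=1)

labeled = io.StringIO()
write_rows(data, stream=labeled, row_labeling=[["a", "x"], ["b"]])
```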



This document was automatically generated on Wed Aug 27 14:24:55 2003 by HappyDoc version 2.1