The Dataset class provides the basic mechanisms for wrapping a set
of data vectors in an object framework. Only basic operations are
exposed at the user level, though many helper functions must be
implemented to support Labeling/View integration.
Methods
|
|
|
|
__castDataset
|
__castDataset ( self, obj )
Initializes the dataset with the object. Casting (conversion)
rules are as follows:
If obj is a          it is cast (converted) using
-------------------  ------------------------------------------------------
String               open a stream and use read_dataset_delimited_stream()
FileType (stream)    read_dataset_delimited_stream()
ListType             Numeric.array()
TupleType            Numeric.array()
Numeric.ArrayType    None
MA.MaskedArray       None
Instance(Dataset)    None
Otherwise, None is returned.
Note: read_dataset_delimited_stream() must be used instead of
Scientific.IO.ArrayIO.readArray(), since the stream cannot be
guaranteed to be based on a file and thus have a filename.
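
The casting rules can be sketched as a type dispatch. This is a minimal illustration only, not the class's actual code: cast_dataset() and read_delimited_stream() are hypothetical names, plain nested lists stand in for Numeric arrays, and a string is assumed to name a file.

```python
import io


def read_delimited_stream(stream, delimiter="\t"):
    """Hypothetical stand-in for read_dataset_delimited_stream()."""
    rows = []
    for line in stream:
        line = line.strip()
        if line:
            rows.append([float(field) for field in line.split(delimiter)])
    return rows


def cast_dataset(obj):
    """Convert obj to a list of rows, or return None if it cannot be cast."""
    if isinstance(obj, str):
        # Assumption: a string is a filename, so open a stream and parse it.
        with open(obj) as stream:
            return read_delimited_stream(stream)
    if isinstance(obj, io.IOBase):
        # Any stream is parsed directly; it need not be backed by a file.
        return read_delimited_stream(obj)
    if isinstance(obj, (list, tuple)):
        # Lists and tuples are copied into the internal representation.
        return [list(row) for row in obj]
    return None
```

Because the stream branch never asks for a filename, an in-memory stream such as io.StringIO works the same way as an open file.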
|
|
__getCol
|
__getCol ( self, key )
Returns a Numeric vector corresponding to the specified column of the dataset.
If the column is out of range, None is returned.
|
|
__getRow
|
__getRow ( self, key )
Returns a Numeric vector corresponding to the specified row of the dataset.
If the row is out of range, None is returned.
|
|
__init__
|
__init__ ( self, data )
|
|
__repr__
|
__repr__ ( self )
|
|
__str__
|
__str__ ( self )
|
|
addLabeling
|
addLabeling ( self, labeling )
Adds a labeling to the list of labelings in the dataset.
|
|
addView
|
addView ( self, view )
Adds a view to the view list of the dataset.
|
|
getColData
|
getColData ( self, col )
Returns the data vector of a column of the dataset. If the column is
out of range, a ValueError() is raised. The vector itself is a Masked
Array.
|
|
getColKey
|
getColKey ( self, col )
Returns the key for a given column of the dataset.
If the column is out of range a ValueError() is raised.
|
|
getColKeys
|
getColKeys ( self )
Returns the full set of column keys for the dataset.
They will be returned in a list of length equal to the total number of
columns and arranged such that the nth key in the list corresponds to the
nth column of the dataset. getColKeys() is a wrapper around getKeys().
|
|
getData
|
getData ( self, key=None )
Returns the full dataset as a Numeric array.
If key is not None, the vector with the specified key is returned instead.
If no such vector exists, a ValueError() is raised. The data
is returned in a Masked Array.
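
The key handling described here might look like the following sketch. KeyedData and get_data() are hypothetical stand-ins, keys are assumed to be 0-based integer row indices, and plain lists replace Masked Arrays.

```python
class KeyedData:
    """Hypothetical stand-in for a dataset; not the real Dataset class."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def get_data(self, key=None):
        if key is None:
            # No key: return a copy of the full dataset.
            return [list(row) for row in self._rows]
        if not isinstance(key, int) or not 0 <= key < len(self._rows):
            # Unknown key: signal the error instead of returning None.
            raise ValueError("no vector with key %r" % (key,))
        return list(self._rows[key])
```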
|
|
getKeyMax
|
getKeyMax ( self )
Returns the largest valid key for the dataset.
|
|
getKeys
|
getKeys ( self, axis=0 )
Returns the valid keys for an axis in the order in which they appear in the
dataset. An axis equal to zero returns the row keys; any other value
returns the column keys.
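
A minimal sketch of the axis convention, with the row and column wrappers layered on top. KeyedTable and its method names are hypothetical, and keys are assumed to be 0-based integers.

```python
class KeyedTable:
    """Hypothetical stand-in for a dataset with integer keys."""

    def __init__(self, num_rows, num_cols):
        self.shape = (num_rows, num_cols)

    def get_keys(self, axis=0):
        # Axis 0 selects rows; any other value selects columns.
        count = self.shape[0] if axis == 0 else self.shape[1]
        return list(range(count))

    def get_row_keys(self):
        # Thin wrapper, mirroring getRowKeys().
        return self.get_keys(0)

    def get_col_keys(self):
        # Thin wrapper, mirroring getColKeys().
        return self.get_keys(1)
```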
|
|
getLabeling
|
getLabeling ( self, name )
Returns the labeling with the given name, or None if it does not exist.
If multiple labelings exist with the same name, the first one encountered
is returned.
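
The first-match rule can be illustrated with a linear search. NamedLabeling and find_labeling() are hypothetical names used only for this sketch.

```python
class NamedLabeling:
    """Hypothetical stand-in for a Labeling that carries a name."""

    def __init__(self, name):
        self.name = name


def find_labeling(labelings, name):
    # Scan in list order so that duplicates resolve to the first match.
    for labeling in labelings:
        if labeling.name == name:
            return labeling
    return None
```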
|
|
getLabelings
|
getLabelings ( self )
Returns a list of all the Labelings tied to this dataset.
|
|
getLineage
|
getLineage ( self )
Returns a list of lists of all paths from this dataset to its root
dataset(s). Because of supersets, there may be multiple base Dataset objects.
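
One way to picture the list-of-lists result is a recursive walk over parent links. The sketch below is an assumption about the shape of the result, not the actual implementation: lineage() and the parents mapping are hypothetical, and each path is assumed to start at the node itself.

```python
def lineage(node, parents):
    """Return every path from node to a root as a list of lists.

    parents maps a node name to the names of its parents; a root maps
    to an empty list or is absent from the mapping.
    """
    ups = parents.get(node, [])
    if not ups:
        # A root has exactly one path: itself.
        return [[node]]
    paths = []
    for parent in ups:
        for tail in lineage(parent, parents):
            paths.append([node] + tail)
    return paths
```

A superset view built from two views with different base datasets yields two paths, one per root.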
|
|
getName
|
getName ( self )
Returns the name of the Dataset or View.
|
|
getNumAxis
|
getNumAxis ( self, axis=0 )
Returns the number of elements along a particular axis.
|
|
getNumCols
|
getNumCols ( self )
Returns the number of columns (dimensions or features) in the dataset.
|
|
getNumRows
|
getNumRows ( self )
Returns the number of rows (samples) in the dataset.
|
|
getRowData
|
getRowData ( self, row )
Returns the data vector of a row of the dataset. If the row is out of
range, a ValueError() is raised. The vector itself is a Masked Array.
|
|
getRowKey
|
getRowKey ( self, row )
Returns the key for a given row of the dataset.
If the row is out of range a ValueError() is raised.
|
|
getRowKeys
|
getRowKeys ( self )
Returns the full set of row keys for the dataset.
The keys will be returned in a list of length equal to the total number
of rows and arranged such that the nth key in the list corresponds to the
nth row of the dataset. getRowKeys() is a wrapper around getKeys().
|
|
getView
|
getView ( self, name )
Returns the view with the given name, or None if it does not exist.
|
|
getViews
|
getViews ( self )
Returns a list of all the views attached to this dataset.
This may be conceived as a list of the child views from this node where
the view structure is a tree with the base dataset at the root. In the
case of superset views this structure breaks down, but is still valid for
finding all the views associated with a dataset.
|
|
isDirty
|
isDirty ( self )
Determines whether the data in the Dataset or View is dirty.
isDirty() returns 1 if a parent of the View has changed (or potentially
changed) its data. A dirty view should recompute any data which depends
on the data of its parent.
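
Dirty-flag propagation of this kind can be sketched as follows. The Node class and the touch() and refresh() methods are hypothetical. The sketch assumes that a change in a parent marks every descendant dirty and that a view clears its flag after recomputing.

```python
class Node:
    """Hypothetical stand-in for a Dataset or View in the view tree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self._dirty = 0
        if parent is not None:
            parent.children.append(self)

    def touch(self):
        """Record a data change by marking every descendant dirty."""
        for child in self.children:
            child._dirty = 1
            child.touch()

    def is_dirty(self):
        return self._dirty

    def refresh(self):
        """Recompute derived data (omitted here) and clear the flag."""
        self._dirty = 0
```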
|
|
removeLabeling
|
removeLabeling ( self, labeling )
Removes a labeling and all of its associated labels from the dataset.
|
|
removeView
|
removeView ( self, view )
Removes a view from a dataset if it exists.
|
|
resetVars
|
resetVars ( self )
Clears internal variables.
Removes all references to labeling and view objects attached
to this dataset. The objects are detatch()ed to avoid dangling
references.
|
|
setName
|
setName ( self, name )
Sets the name of the Dataset or View.
|
|
writeDataset
|
writeDataset (
self,
stream=sys.stdout,
delimiter="\t",
rowLabeling=None,
)
Writes a dataset out to a stream.
The optional argument rowLabeling lets you specify a
row labeling to be prepended to each line in the data. The row
labeling must contain at least one label per row and only the first
label is used. If you simply set rowLabeling to 1, then an ordinal count
will be prepended to each line in the data.
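
The output format can be illustrated with a small stand-alone writer. write_rows() is a hypothetical function, not the method itself. It assumes the row labeling is a list holding one list of labels per row and that the ordinal count starts at 1, neither of which is stated above.

```python
import io
import sys


def write_rows(rows, stream=sys.stdout, delimiter="\t", row_labels=None):
    """Write rows as delimited text, optionally prefixing each line."""
    for index, row in enumerate(rows):
        fields = [str(value) for value in row]
        if row_labels == 1:
            # Assumption: the ordinal count starts at 1.
            fields.insert(0, str(index + 1))
        elif row_labels is not None:
            # Only the first label of each row is used.
            fields.insert(0, str(row_labels[index][0]))
        stream.write(delimiter.join(fields) + "\n")
```

Passing an io.StringIO object as the stream captures the output in memory, which is convenient for checking the format before writing to a file.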
|