ColumnPCAView

The ColumnPCAView operates on the column space (rows) of a dataset and transforms the columns via a Principal Component Analysis (PCA) matrix. In PCA space, each axis corresponds to one of the eigenvectors of the covariance matrix of the dataset. These eigenvectors are sorted by eigenvalue (largest first), so the first dimension captures the most variance, the second dimension the second-most, and so on. The ColumnPCAView is most useful for visualization and pre-processing.
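The eigenvector ordering described above can be sketched with plain NumPy. This is only an illustration of the general technique, not the ColumnPCAView implementation: the helper name pca_project and the synthetic data are hypothetical, and centering the data before projecting is the textbook convention, assumed here rather than taken from the library.

```python
import numpy as np

def pca_project(data):
    """Hypothetical helper: project rows of `data` onto covariance eigenvectors."""
    data = np.asarray(data, dtype=float)
    # Assumption: center each column (textbook PCA); the library may differ.
    centered = data - data.mean(axis=0)
    # Covariance between columns; rows are treated as observations.
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Re-sort so the largest eigenvalue (most variance) comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return centered @ eigvecs, eigvals

# Synthetic data: column 1 is nearly a multiple of column 0, column 2 is small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t,
                        2.0 * t + 0.05 * rng.normal(size=200),
                        0.1 * rng.normal(size=200)])

projected, eigvals = pca_project(data)
# Sample variance of each projected column; it should match the eigenvalues.
variances = projected.var(axis=0, ddof=1)
print(eigvals)
print(variances)
```

In this sketch the variance of each projected column equals the corresponding eigenvalue, which is why sorting by eigenvalue puts the most informative axis first.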

The best way to illustrate the usefulness of the ColumnPCAView is to create a dataset with high linear dependence. In this case, the PCA projection will concentrate most of the variance of the data in a single dimension. To try this, let's create a very simple 2D dataset.

>>> ds = Dataset([[1,1],[2,1.9],[2.9,3],[4,4],[4.95,5.02]])
>>> ds.getData()
[[ 1.  , 1.  ,]
 [ 2.  , 1.9 ,]
 [ 2.9 , 3.  ,]
 [ 4.  , 4.  ,]
 [ 4.95, 5.02,]]
>>> pv = ColumnPCAView(ds)
>>> pv.getData()
[[-1.41411205,-0.01694448,]
 [-2.75667127,-0.10374734,]
 [-4.17247777, 0.02071938,]
 [-5.65644819,-0.06777792,]
 [-7.04994162,-0.03497432,]]

Notice that the magnitudes of the values in the second column are much smaller than those of the values in the first column. Also, the absolute values in the first column are almost equal to the magnitude of each point, which is its distance from the origin along the line y = x.
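These observations can be checked numerically. The sketch below copies the two arrays from the session above into NumPy (it does not call Dataset or ColumnPCAView; the variable names are illustrative) and compares the length of each original point with the first PCA column.

```python
import numpy as np

# Values copied from the interactive session above.
original = np.array([[1, 1], [2, 1.9], [2.9, 3], [4, 4], [4.95, 5.02]])
pca_view = np.array([[-1.41411205, -0.01694448],
                     [-2.75667127, -0.10374734],
                     [-4.17247777,  0.02071938],
                     [-5.65644819, -0.06777792],
                     [-7.04994162, -0.03497432]])

# Euclidean length of each original point (distance from the origin).
lengths = np.linalg.norm(original, axis=1)
# Gap between each length and the absolute value of the first PCA column.
gap = np.abs(lengths - np.abs(pca_view[:, 0]))
# Length of each transformed point, for comparison with `lengths`.
pca_lengths = np.linalg.norm(pca_view, axis=1)

print(lengths)
print(gap)
print(pca_lengths)
```

For these numbers the transformed rows have the same lengths as the original rows, so the printed output is consistent with a pure rotation of the points. Because almost all of each length falls on the first axis, the second column stays close to zero.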

Lucas Scharenbroich 2003-08-27