
MinLDView

The name MinLDView stands for Minimal Linear Dependence View, and the view does what its name suggests: it minimizes the linear dependence of a dataset. In this context, the linear dependence of a dataset is measured by fitting a power function to the eigenvalues of the dataset's covariance matrix. The function $x^{-\gamma }$ is fit, and the parameter $\gamma $ is interpreted as the degree of linear dependence in the dataset: the larger $\gamma $ is, the higher the linear dependence.
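The fitting routine itself is not shown in this manual, but the sketch below illustrates one plausible way to estimate $\gamma $: fit $\lambda_i \approx c\, i^{-\gamma }$ to the rank-ordered covariance eigenvalues with a least-squares fit in log-log space. The function name estimate_gamma and the fitting details are assumptions for illustration, not part of the package.

import numpy as np

def estimate_gamma(data):
    # Eigenvalues of the covariance matrix, largest first
    eigvals = np.linalg.eigvalsh(np.cov(np.asarray(data, dtype=float), rowvar=False))
    eigvals = np.sort(eigvals)[::-1]
    eigvals = eigvals[eigvals > 0]             # keep only positive eigenvalues
    ranks = np.arange(1, len(eigvals) + 1)
    # In log-log space the model is log(lambda) = log(c) - gamma * log(rank),
    # so gamma is the negated slope of a straight-line fit.
    slope, _intercept = np.polyfit(np.log(ranks), np.log(eigvals), 1)
    return -slope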

Since high linear dependence greatly affects the ability to segregate the data, we would like a way to reduce this value. The solution is to scale and rotate the dataset in such a way as to compensate for the intrinsic linear dependence. The formula used is


\begin{displaymath}
D' = DV^{\top}S^{-1}V
\end{displaymath} (1)

where $D$ is the original dataset, $V$ is the matrix of right singular vectors from the SVD of $D$, and $S^{-1}$ is a diagonal matrix of eigenvalue reciprocals. This operation amounts to first rotating the data into PCA space, normalizing the variance in each dimension, and then rotating the data back to the original feature space.
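For concreteness, here is a minimal sketch of this operation using numpy, whose SVD routine returns the right singular vectors as the rows of Vt. The actual MinLDView implementation may differ, for example in whether the data are centered first and in whether the diagonal entries are reciprocals of singular values or of covariance eigenvalues, so the numbers it produces need not match the library output shown below.

import numpy as np

def min_ld_transform(D):
    # Rotate into PCA space, normalize each dimension's scale, rotate back.
    D = np.asarray(D, dtype=float)
    _u, s, Vt = np.linalg.svd(D, full_matrices=False)
    S_inv = np.diag(1.0 / s)          # reciprocals of the singular values
    return D @ Vt.T @ S_inv @ Vt      # D' = D V^T S^{-1} V as in equation (1)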

For comparison and contrast, we use the same dataset as in the ColumnPCAView example:

>>> ds = Dataset([[1,1],[2,1.9],[2.9,3],[4,4],[4.95,5.02]])
>>> ds.getData()
[[ 1.  , 1.  ],
 [ 2.  , 1.9 ],
 [ 2.9 , 3.  ],
 [ 4.  , 4.  ],
 [ 4.95, 5.02]]
>>> ldv = MinLDView(ds)
>>> ldv.getData()
[[ 1.42990478, 0.59329554],
 [ 7.92372264,-3.93665154],
 [-0.91718921, 6.8437997 ],
 [ 5.71961912, 2.37318217],
 [ 3.53328951, 6.52308277]]

It is interesting to compare these values with those in the ColumnPCAView example. Notice that the values in the first column are highly correlated with those of the ColumnPCAView, but the values in the second column have been scaled by a factor of 5 to 300.
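To see the intended effect, the two hypothetical helpers sketched above can be combined; on this nearly collinear dataset the estimated $\gamma $ should drop noticeably after the transform, though the exact numbers depend on the fitting and scaling conventions.

ds_values = [[1, 1], [2, 1.9], [2.9, 3], [4, 4], [4.95, 5.02]]
print(estimate_gamma(ds_values))                    # large: the points are nearly collinear
print(estimate_gamma(min_ld_transform(ds_values)))  # should be noticeably smaller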


Lucas Scharenbroich 2003-08-27