Principal Components Analysis (PCA) Background

Principal Components Analysis (PCA) is a numerical procedure for analyzing the sources of variation present in a multidimensional dataset. We employ it to analyze gene expression microarray datasets, but the method applies to any multidimensional dataset.

We carry out PCA by applying singular value decomposition (SVD) to the covariance matrix $cov(D)$ of the dataset $D$. The resulting decomposition contains the eigenvectors of $cov(D)$ in the columns of $U$ and the corresponding eigenvalues on the diagonal of $S$, sorted in descending order.
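
The step above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used for the microarray analysis: the toy matrix `D` is random data, and the layout (samples in rows, variables in columns) is an assumption the text does not specify.

```python
import numpy as np

# Hypothetical toy dataset D: 6 samples (rows) by 3 variables (columns).
# The row/column orientation is an assumption made for this sketch.
rng = np.random.default_rng(0)
D = rng.normal(size=(6, 3))

# Covariance matrix between the variables (columns) of D.
C = np.cov(D, rowvar=False)

# SVD of the covariance matrix: C = U @ diag(s) @ Vt.
# np.linalg.svd returns the singular values s in descending order.
U, s, Vt = np.linalg.svd(C)

# Because C is symmetric and positive semi-definite, the singular values
# are its eigenvalues and the columns of U are its eigenvectors.
S = np.diag(s)
```

Here `U` and `S` play the roles of $U$ and $S$ in the text; multiplying `C @ U` should give the same result as scaling each column of `U` by the matching entry of `s`.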

Each covariance eigenvector, or principal component, explains a fraction of the total variance contained in the dataset, and each principal component $P_{n+1}$ is orthogonal to all preceding principal components $P_1, \ldots, P_n$, so that together they define an orthogonal basis of a new vector space $P$.
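
Both properties can be checked numerically. The following is a minimal sketch, again on hypothetical random data with samples in rows (an assumption): it computes the fraction of variance explained by each component, verifies mutual orthogonality, and expresses the centered data in the new basis. The names `explained`, `gram`, and `scores` are illustrative only.

```python
import numpy as np

# Hypothetical data: 8 samples (rows) by 4 variables (columns).
rng = np.random.default_rng(1)
D = rng.normal(size=(8, 4))

# Eigenvectors (columns of U) and eigenvalues (s) of the covariance matrix.
C = np.cov(D, rowvar=False)
U, s, _ = np.linalg.svd(C)

# Fraction of the total variance explained by each principal component.
explained = s / s.sum()

# Mutual orthogonality: U.T @ U should be (numerically) the identity matrix.
gram = U.T @ U

# Coordinates of the mean-centered data in the basis of principal components.
scores = (D - D.mean(axis=0)) @ U
```

The entries of `explained` sum to one, and the sample variance of each column of `scores` equals the corresponding eigenvalue in `s`, which is the sense in which each component "explains" part of the total variance.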

Joe Roden 2005-12-13