Setting the significant condition group threshold

Another important optional parameter is the sigCutoff threshold. For each principal component, we partition the original dataset columns (also called conditions, e.g. samples or tissues) into three condition groups, Up, Flat, and Down, depending on the values of that principal component's high and low extreme points in the original dataset column space.

sigCutoff is the significance level at or below which you reject the hypothesis that the high extreme points and low extreme points are drawn from the same distribution. A Wilcoxon rank sum test is used to compute a p-value for this hypothesis for each original dimension (condition), and the conditions that meet this threshold are labeled as "Up" or "Down" conditions, depending on the direction of the difference. Conditions that do not meet the threshold fall into the "Flat" group. The default value of this threshold is sigCutoff = 0.05.

For an original column dimension (condition) of the dataset, if the mean of one principal component's high extreme points is much higher than the mean of that principal component's low extreme points, this condition is likely to be placed in the "Up" condition set for that principal component. Likewise, if the mean of the low extreme points is much higher than the mean of the high extreme points, this condition is likely to be placed in the "Down" condition set. The exact determination is based on whether the p-value of the Wilcoxon rank sum test is equal to or lower than sigCutoff.
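
The rule described above can be sketched in Python as follows. This is only an illustration, not the package's actual implementation: the function names (rank_sum_pvalue, classify_condition, partition_conditions), the sig_cutoff spelling, and the sample data are all hypothetical. The p-value uses the normal approximation to the Wilcoxon rank sum statistic without a tie correction, and the direction is assumed to be decided by comparing the two means.

```python
import math
from statistics import mean


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering each value's original position.
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    w = sum(ranks[:n1])  # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - expected) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def classify_condition(high_values, low_values, sig_cutoff=0.05):
    """Label one condition "Up", "Down", or "Flat" for one principal component.

    high_values / low_values: this condition's values for the component's
    high and low extreme points (the two sets may differ in size).
    """
    p = rank_sum_pvalue(high_values, low_values)
    if p > sig_cutoff:
        return "Flat"
    # Assumption: direction is taken from the difference of the means.
    return "Up" if mean(high_values) > mean(low_values) else "Down"


def partition_conditions(high_by_condition, low_by_condition, sig_cutoff=0.05):
    """Partition condition names into the three groups."""
    groups = {"Up": [], "Flat": [], "Down": []}
    for name in high_by_condition:
        label = classify_condition(
            high_by_condition[name], low_by_condition[name], sig_cutoff
        )
        groups[label].append(name)
    return groups


# Made-up example data for three hypothetical conditions.
high = {
    "liver": [5.1, 6.2, 5.8, 6.5, 5.9, 6.1, 5.5, 6.0],
    "brain": [1.0, 3.0, 5.0, 7.0],
    "heart": [1.0, 1.4, 0.9, 1.2, 1.1, 1.3, 0.8, 1.5],
}
low = {
    "liver": [1.0, 1.4, 0.9, 1.2, 1.1, 1.3, 0.8, 1.5],
    "brain": [2.0, 4.0, 6.0, 8.0],
    "heart": [5.1, 6.2, 5.8, 6.5, 5.9, 6.1, 5.5, 6.0],
}
groups = partition_conditions(high, low, sig_cutoff=0.05)
print(groups)
```

Lowering sig_cutoff in this sketch makes the test stricter, so more conditions end up in the "Flat" group; raising it moves borderline conditions into "Up" or "Down".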

A p-value from Student's t-test could be used as an alternative. We chose the Wilcoxon rank sum test in part because it has an adjustment for small sample sizes, and in part because it can compare two sample sets of different sizes.

Joe Roden 2005-12-13