Principal Component Analysis for clustering gene expression data

Ka Yee Yeung, Walter L. Ruzzo

Supplementary Web Site

Technical Report UW-CSE-00-11-03 (November 2000): postscript, pdf.
June 2001 version (Bioinformatics 2001, volume 17, number 9, pages 763-774): postscript, pdf.
The most recent written description of this work (Dec 2001) is in Chapter 4 of Ka Yee's thesis.
Enlarged and colored figures in June 2001 version
Additional figures not shown in June 2001 version
Additional results of double-sized synthetic data sets
Detailed description of the adjusted Rand index and the clustering algorithms used, including illustrations of the adjusted Rand index on real clustering results from the paper: postscript, pdf.
Gene Expression data sets used:
- Unfortunately, the original web site hosting the yeast cell cycle data of Cho et al. (1998) no longer seems to work. If you are interested in the full data, you can get the processed data from Spellman et al. or the raw data from SMD. In either case, you will need to select the relevant experimental conditions yourself.
  - Due to popular demand, we are making the subset of 384 genes used in Ka Yee's dissertation available (as text-delimited files):
    - raw subset data without any normalization
    - log-transformed subset data
    - standardized subset data
    - legend.
  - We are also making the subset of 237 genes (corresponding to 4 MIPS categories) used in Ka Yee's dissertation available (as text-delimited files):
    - log-transformed
    - standardized
- The ovary data set is not publicly available yet. Unfortunately, we do not have permission to distribute this data.
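For readers who want to score their own clusterings of the subsets above against external class labels, here is a minimal sketch of the adjusted Rand index in its standard Hubert and Arabie form. It is an illustration only, not the code used for the paper; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items.

    labels_a[i] and labels_b[i] are the cluster/class labels of item i.
    Hypothetical helper written for illustration.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Contingency table cells and its row/column sums.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of item pairs placed together in both, in A, and in B.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / total_pairs   # expected agreement by chance
    max_index = 0.5 * (sum_a + sum_b)        # largest attainable agreement
    if max_index == expected:
        return 1.0  # both partitions are all-singletons or a single cluster

    return (sum_cells - expected) / (max_index - expected)


# Toy usage with made-up labels: a clustering compared to known classes.
known_classes = [0, 0, 1, 1]
clustering = [0, 0, 1, 2]
score = adjusted_rand_index(known_classes, clustering)
```

The index equals 1 when the two partitions agree perfectly (regardless of how the labels are named) and has an expected value of 0 for random partitions, so it can be negative when agreement is worse than chance.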
Due to popular demand, this web site was updated on 3/24/2006.
If you have any questions or comments on this paper, feel free to email Ka Yee.
Back to Ka Yee's research page.