Principal Component Analysis for clustering gene expression data
Ka Yee Yeung, Walter L. Ruzzo
Supplementary Web Site
- Technical Report UW-CSE-00-11-03 (November 2000)
- June 2001 version (Bioinformatics, 2001, volume 17, number 9, pages 763-774)
- Most recent written description of this work (Dec 2001)
in Chapter 4 of Ka Yee's thesis.
- Enlarged and colored figures in June 2001 version
- Additional figures not shown in June 2001 version
- Additional results of double-sized synthetic data sets
- Detailed description of the adjusted Rand index
and the clustering algorithms used
(including illustrations of the adjusted Rand indices with real clustering
results from the paper)
- Gene expression data sets used:
The original web site containing the yeast cell cycle data
of Cho et al. (1998) no longer seems to work.
If you are interested in the full data, you can get the processed data
of Spellman et al. or the raw data from
SMD (Stanford Microarray Database). In both cases, you will need to select
the experimental conditions you need.
- Due to popular demand, we are making the subset of 384 genes we used in
Ka Yee's dissertation available (as a text-delimited file).
We are also making the subset of 237 genes (corresponding to 4 MIPS
categories) used in Ka Yee's dissertation available (as a text-delimited file).
- The ovary data set is not publicly available yet. Unfortunately, we do
not have permission to distribute this data.
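The adjusted Rand index listed above measures agreement between a clustering result and a reference partition of the same genes (for example, known functional classes), correcting the raw Rand index for chance agreement. The following is a minimal sketch of the standard Hubert and Arabie formula, assuming each partition is given as a list of cluster labels in the same gene order. It is an illustration only, not the code used in the paper; the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie) between two partitions.

    labels_a[k] and labels_b[k] are the cluster labels of object k
    (e.g. a gene) under the two partitions being compared.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two objects: there are no pairs to disagree on.
        return 1.0

    # Contingency table: objects in cluster i of A and cluster j of B.
    contingency = Counter(zip(labels_a, labels_b))
    # Number of object pairs placed together in both partitions.
    index = sum(comb(c, 2) for c in contingency.values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both partitions are the same trivial split
        # (a single cluster, or all singletons): a perfect match.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # about 0.571 (4/7)
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabeled
```

The index is 1 for identical partitions (regardless of label names), has an expected value of 0 under random labeling, and can be negative when agreement is worse than chance.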
Due to popular demand, this web site was updated on 3/24/2006.
If you have any questions or comments on this paper, feel free to email Ka Yee.