/* ** princomp.src - computes principal components of a data matrx ** ** developed by Mico Loretan (loretanm@frb.gov), February 1996 ** ** Format: { p,v,a } = princomp(x,j) ** ** Inputs: x NxK data matrix, N > K, full rank ** ** J scalar, number of principal components to ** be computed (J <= K) ** ** Outputs: p NxJ matrix of the first J principal components of X ** in descending order of amount of variance "explained". ** ** v Jx1 vector of fractions of variance explained ** ** a JxK matrix of factor loadings, such that x = p * a + error ** ** Notes: ** ** (i) This proc makes use of Gauss's proc "eigrs2()," which ** computes the eigenvalues _and_ the normalized eigenvectors of a ** real symmetric matrix, in ascending order of the eigenvalues. ** ** (ii) The principal components are computed indirectly, by ** first computing the eigenvectors of the x'x matrix because ** computing the xx' matrix (and its eigenvectors) directly ** could exceed available memory for moderate and large values of n. ** ** (iii) GAUSS's eighv() function has been substituted for ** eigrs2(). - Ron Schoenberg ** ** Note: This algorithm is based on: Henri Theil, Priciples of ** Econometrics, 1971, New York: Wiley, pp. 46-56 (principal comps.) */ proc(3) = princomp(x,j); local k, first_j, eigvals, eigvecs, evals, evecs, p, v, a; /* three output variables */ k = cols(x); /* quick input sanity check */ if j > k; errorlog "Can't compute more than cols(x) principal components."; end; endif; /* indices of first j principal components */ first_j = seqa(k,-1,j); /* First, compute eigenvectors of x'x */ {eigvals,eigvecs} = eighv(moment(x,0)); /* sorted in ascending order */ evals = submat(eigvals,first_j,1); /* pick off j largest values */ evecs = submat(eigvecs,0,first_j); /* pick off corr. eigenvectors */ /* ** We need the eigenvectors of x'x to have non-unit length; in ** fact, the length of each eigenvector must equal the square root ** of the corresponding eigenvalue. ** ** Note: (a') is also equal to the matrix of "factor loadings" ** for the j principal components (Theil, p. 54). ** (I.e., we have x=p*a'+v, where (a') is the matrix of ** least squares coefficients of the regression of x on p. ** If j=k, then x=p*a', and all variation in x is "explained.") */ a = evecs .* sqrt(evals'); /* ** Second, get eigenvectors of xx' (= principal components of x!) */ p = (x*a)./(evals'); /* these eigenvectors have unit length */ /* ** Third, compute fraction of total variation explained by ** each of the j principal components. */ v = evals/sumc(eigvals); retp(p,v,a'); endp; /* princomp() */