The QR (eigenvalue) algorithm

We've discussed how to compute a factorization

$$A = QR,$$

where $Q$ is orthogonal and $R$ is upper-triangular. For our presentation of the QR algorithm, we assume that $A= A^T$, i.e. $A$ is symmetric. The QR algorithm applies more generally, but we focus only on this case.

Because $A$ is symmetric, we know that, by the Spectral Theorem, $A = U D U^T$ where $D$ is a diagonal matrix (containing the eigenvalues) and $U$ is an orthogonal matrix (containing the eigenvectors). We continue with the philosophy of taking powers of the matrix, to assist us in finding the eigenvalues:

$$A^k = U D^k U^T.$$
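As a quick numerical check of this identity (a minimal sketch; for a symmetric input, MATLAB's eig returns the orthogonal factor of the Spectral Theorem):

% Check A^k = U*D^k*U' for a small symmetric matrix
A = randn(4); A = A + A';      % symmetrize
[U,D] = eig(A);                % Spectral Theorem: A = U*D*U' with U orthogonal
k = 5;
norm(A^k - U*D^k*U')           % of the order of round-off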

The $2 \times 2$ case

We will consider the case where $A$ is $2 \times 2$ first. We assume $|\lambda_1| > |\lambda_2|$. If we look at the first column of $A^k$, we are computing the vector

$$r_1^{(k)}=A^k e_1, \quad e_1 = \begin{bmatrix} 1 & 0 \end{bmatrix}^T.$$

This is very similar to the Power method with starting vector $x^{(0)} = e_1$. Indeed, let $u_1$ and $u_2$ be the two columns of $U$, so that

$$ r_1^{(k)}=A^k e_1 = U D^k U^T e_1 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \lambda^k_1 & 0 \\ 0 & \lambda^k_2 \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} e_1, $$$$ = \lambda_1^k (u_1^T e_1) u_1 + \lambda_2^k (u_2^Te_1) u_2 = \lambda_1^k (u_1^T e_1) u_1 \left(1 + O \left( \left| \frac{\lambda_2}{\lambda_1}\right|^k \right) \right), \quad k \to \infty.$$

So, it follows that $A^ke_1$ aligns with the dominant eigenvector $u_1$ as $k \to \infty$. A nice way to mathematically capture this fact is:

$$\lim_{k\to \infty} \frac{ |u_1^Tr_1^{(k)}|}{|u_2^Tr_1^{(k)}|} = \infty.$$

i.e., the projection onto the first eigenvector is asymptotically larger than the projection onto the second. Typically, the same is true of $A^k e_2$, the second column of $A^k$:

$$ A^k e_2 = U D^k U^T e_2 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \lambda^k_1 & 0 \\ 0 & \lambda^k_2 \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} e_2, $$$$ = \lambda_1^k (u_1^T e_2) u_1 + \lambda_2^k (u_2^Te_2) u_2.$$

For notational convenience, define $\beta_{ij} = u_i^T e_j$, all of which we assume are non-zero. To isolate the second eigenvector, we apply one step of Gram-Schmidt:

$$r_2^{(k)} = A^ke_2 - \left( \frac{(A^ke_1)^T A^k e_2}{(A^k e_1)^T A^k e_1} \right) A^k e_1.$$

This creates a vector $r_2^{(k)}$ that is orthogonal to the first column of $A^k$. So, we can hope that $r_2^{(k)}$ will begin to look like the second eigenvector of $A$:

$$r_2^{(k)} = \lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2 - \frac{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2)}{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)}(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)$$

We compute the products that occur in the fraction using the rule: If $u$ and $v$ are orthonormal, then $(\alpha u + \beta v)^T(\gamma u + \delta v) = \alpha \gamma + \beta \delta$ for scalars $\alpha,\beta,\gamma,\delta$. We find

$$ (\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2) = \lambda_1^{2k} \beta_{11}\beta_{12} + \lambda_2^{2k} \beta_{22} \beta_{21},$$$$ (\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2) = \lambda_1^{2k} \beta_{11}^2 + \lambda_2^{2k}\beta_{21}^2.$$

The ratio is then

$$\frac{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2)}{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)} = \frac{ \lambda_1^{2k} \beta_{11}\beta_{12} + \lambda_2^{2k} \beta_{22} \beta_{21}} {\lambda_1^{2k} \beta_{11}^2 + \lambda_2^{2k} \beta_{21}^2} = \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2}, \quad \delta = \lambda_2/\lambda_1.$$

To see if this vector $r_2^{(k)}$ points in the direction of $u_1$ or $u_2$ we take the inner products, using orthogonality:

$$ {u_1^Tr_2^{(k)}} = {\lambda_1^k} \beta_{12} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} {\lambda_1^k} \beta_{11} = \lambda_2^k \delta^{-k} \left( \beta_{12} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2}\beta_{11}\right).$$

It can be shown that the quantity in parentheses is $O(|\delta|^{2k})$ as $k \to \infty$. In other words:

$$ {u_1^Tr_2^{(k)}} = \lambda_2^k O(|\delta|^k).$$

We now compare this to

$${u_2^Tr_2^{(k)}} = \lambda_2^k \left( \beta_{22} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} \beta_{21} \right).$$

This time the quantity in parentheses has no reason to vanish as $k \to \infty$. We see that (typically)

$$\lim_{k \to \infty} \frac{|u_2^Tr_2^{(k)}|}{|u_1^Tr_2^{(k)}|} = \infty.$$

This is a long-winded way of saying that if you apply Gram-Schmidt to the columns of $A^k$, for a symmetric matrix $A$, then the orthogonal matrix that results will converge to the matrix $U$ of eigenvectors of $A$. This means that:

Theorem 1

Suppose $A$ is a symmetric, non-singular matrix with eigenvalues that have distinct magnitudes

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|.$$

If $A^k = QR = Q^{(k)} R^{(k)}$ is the QR factorization of $A^k$, then as $k \to \infty$,

$$(Q^{(k)})^T A Q^{(k)} = (Q^{(k)})^T U D U^TQ^{(k)} \to D.$$
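Theorem 1 is easy to test numerically (a minimal sketch; $k$ is kept moderate so that $A^k$ does not overflow, and MATLAB's built-in qr may flip the signs of some columns of $Q^{(k)}$, which does not affect the product below):

A = randn(4); A = A + A';   % small symmetric test matrix
k = 20;
[Qk,Rk] = qr(A^k);          % QR factorization of A^k
Qk'*A*Qk                    % nearly diagonal; how nearly depends on the eigenvalue gaps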

Computational issue: We should NOT compute $A^k$ for $k$ large! This will typically result in overflow and underflow. We need another way to compute $Q^{(k)}$. Consider the sequence:

$A = Q_1 R_1, \quad \text{ the QR factorization of $A$},$
$A_1 = R_1 Q_1, \quad \text{reverse the order of multiplication},$
$A_1 = Q_2 R_2, \quad \text{ the QR factorization of $A_1$},$
$A_2 = R_2 Q_2, \quad \text{reverse the order of multiplication},$
$A_2 = Q_3 R_3, \quad \text{ the QR factorization of $A_2$},$
$A_3 = R_3 Q_3, \quad \text{reverse the order of multiplication},$
$\vdots$
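A single sweep of this iteration, written out in MATLAB, already shows the key property established below: reversing the factors is an orthogonal similarity transformation (a minimal sketch on a random symmetric matrix):

A = randn(4); A = A + A';        % symmetric test matrix
[Q1,R1] = qr(A);                 % A = Q1*R1
A1 = R1*Q1;                      % reverse the order of multiplication
norm(A1 - Q1'*A*Q1)              % zero up to round-off: A1 = Q1'*A*Q1
sort(eig(A1)) - sort(eig(A))     % the eigenvalues are unchanged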

Theorem 2

For any $k \geq 1$, $A_k$ has the same eigenvalues as $A$; moreover, $Q_1 Q_{2} \cdots Q_k = Q^{(k)} \to U$ and $A_k \to D$ as $k \to \infty$.

Proof

We first show that the eigenvalues never change. $A = Q_1 R_1 = Q_1 R_1 Q_1 Q_1^T = Q_1 A_1 Q_1^T$. So, $A$ is similar to $A_1$. In general, it follows that $A_{k-1} = Q_k A_k Q_{k}^T$ and therefore $A_k$ is similar to $A$ for every $k$, by induction:

$$A_k = Q^T_k A_{k-1} Q_{k} = (Q_1 Q_2 \cdots Q_k)^T A Q_1 Q_{2} \cdots Q_k. $$

Next, we show that $Q_1 Q_{2} \cdots Q_k = Q^{(k)}$ for every $k$. For $k = 1$, $A = Q_1R_1$ and $A^1= Q^{(1)} R^{(1)}$, and by the uniqueness of the QR factorization $Q_1 = Q^{(1)}$. Now, assume $Q_{1}Q_{2} \cdots Q_{k-1} = Q^{(k-1)}$ and show that the same statement holds with $k-1$ replaced by $k$ (induction). So, we have

$$A^{k-1} = Q_{1} Q_{2} \cdots Q_{k-1} R^{(k-1)} = Q^{(k-1)}R^{(k-1)},$$$$ A^k = A Q_{1} Q_{2} \cdots Q_{k-1} R^{(k-1)},$$$$ A Q_{1} Q_{2} \cdots Q_{k-1} = Q_1 Q_2 \cdots Q_{k-1} A_{k-1}, $$$$ A^k = Q_1 Q_2 \cdots Q_{k-1} A_{k-1} R^{(k-1)},$$$$A_{k-1} = Q_k R_k,$$$$ A^k = Q_1 Q_2 \cdots Q_{k-1} Q_k R_k R^{(k-1)}.$$

By the uniqueness of the QR factorization ($R_k R^{(k-1)}$ is upper-triangular with positive diagonal entries), we find $Q_1 Q_2 \cdots Q_{k-1} Q_k = Q^{(k)}$. The convergence then follows from Theorem 1.

We need a convergence criterion for the QR algorithm. From the Gershgorin Circle Theorem, it follows that if the sum of the absolute values of the off-diagonal entries in each row of $A_k$ is less than $\epsilon > 0$, then every eigenvalue is known to an accuracy of $\epsilon$.

QR eigenvalue algorithm

for $k = 1,2,\ldots,N_{\max}$
   Set $[Q,R] = \text{QR}(A)$ % Do this with MGS
   Set $A = RQ$
   If $\|A - \mathrm{diag}(A)\|_\infty < \mathrm{TOL}$, output $\mathrm{diag}(A)$ and stop
endfor
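The function QRalg used in the cells below is not listed in this section; a minimal sketch of what such a function might look like, using the built-in qr in place of the MGS routine mentioned above, is:

function D = QRalg(A, Nmax, TOL)
% QRALG  Sketch of the QR eigenvalue iteration described above.
%   D = QRALG(A, NMAX, TOL) repeats A <- R*Q until the off-diagonal part of A
%   is smaller than TOL in the infinity norm, or NMAX sweeps have been done.
for k = 1:Nmax
    [Q,R] = qr(A);                           % QR factorization of the current iterate
    A = R*Q;                                 % reverse the order of multiplication
    if norm(A - diag(diag(A)), inf) < TOL    % Gershgorin-based stopping test
        fprintf('QR completed in %d iterations\n', k)
        break
    end
end
D = diag(A);                                 % approximate eigenvalues
end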

In [2]:
A = randn(5);
A = A + A'; % Convergence is slow on some matrices
D = QRalg(A,400,10e-10)'
flipud(eigs(A))'

QR completed in 197 iterations


D =

    5.1809   -4.6179    3.3188    1.8265   -0.4698


ans =

    5.1809   -4.6179    3.3188    1.8265   -0.4698
In [1]:
A = randn(5);
A = A*A'; % Convergence is much faster on others
D = QRalg(A,400,10e-10)'
flipud(eigs(A))'

QR completed in 39 iterations


D =

   29.9783   16.1580    8.2269    1.2845    0.1065


ans =

   29.9783   16.1580    8.2269    1.2845    0.1065

Some final remarks on eigenvalue algorithms:

  1. This is not the QR algorithm that is used in practice. Typically, a symmetric matrix is first reduced to tridiagonal form using Householder transformations. Then a QR algorithm that is mathematically equivalent to the one we discussed, but exploits the tridiagonal structure, is applied. This gives a much more efficient algorithm.

  2. To see how the Power method can fail, consider the matrix $$A = \begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix}.$$ We want to compute the eigenvalues with the symmetric Power method. Choose $x = x^{(0)} = [x_1, x_2]^T$. It follows that
    $$\lambda^{(k)} = \frac{x^T A^{2k-1} x}{x^T A^{2k-2} x}.$$ And then
    $$x^T A^{2k-1} x = (-2)^{2k-1} x_1^2 + 2^{2k-1} x_2^2 = 2^{2k-1}(x_2^2 - x_1^2),$$ $$x^T A^{2k-2} x = (-2)^{2k-2} x_1^2 + 2^{2k-2} x_2^2 = 2^{2k-2}(x_2^2 + x_1^2).$$ So, we find $$\lambda^{(k)} = 2\frac{x_2^2 - x_1^2}{x_2^2 + x_1^2} \neq \pm 2,$$ for most choices of $x_1$ and $x_2$. This is the worst case for a numerical algorithm: it converges to the wrong answer! This can be fixed. If one considers $A + I = \mathrm{diag}(-1,3)$ instead, the power method converges, $\lambda^{(k)} \to 3$ as $k \to \infty$. And because we added the identity matrix, we subtract one: $\lambda^{(k)} -1 \to 2$, a true eigenvalue of $A$ (see the sketch after these remarks).

  3. This failure of the Power method, and its remedy, illustrate the idea of "shifting" to knock the matrix into a more convenient form where the method converges faster (or converges at all). To get the full power of the QR algorithm, one should use the so-called Wilkinson shift, which greatly accelerates convergence.
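Here is a small illustration of the shifting remedy from remark 2 (a minimal sketch; the power iteration is run on $A + I = \mathrm{diag}(-1,3)$ and the shift is subtracted at the end):

% Shifted power method on A = diag(-2,2): the unshifted iteration stalls,
% but iterating with A + I = diag(-1,3) converges, and subtracting the shift
% recovers a true eigenvalue of A.
A = diag([-2 2]);
x = randn(2,1); x = x/norm(x);          % random starting vector
for k = 1:50
    x = (A + eye(2))*x;                 % power step with the shifted matrix
    x = x/norm(x);
end
lambda = x'*(A + eye(2))*x - 1          % Rayleigh quotient minus the shift: approximately 2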