Iterative methods in linear algebra

We now discuss methods in linear algebra that are iterative. These methods arise from two problems:

  1. Solving linear systems $Ax = b$
  2. Computing eigenvalues $(A-\lambda I)v = 0$, $v \neq 0$.

To discuss what it means to perform linear algebraic operations in an iterative manner we need a mechanism to measure the difference between two vectors and the difference between two matrices.

Norms of vectors and matrices

A norm $\|x\|$ for a vector $x \in \mathbb R^{n}$ must satisfy the following properties:

  • $\|x\| \geq 0$ for all $x \in \mathbb R^n$
  • $\|x\| = 0$ if and only if $x = 0$ ($x$ is the zero vector)
  • $\|\alpha x\| = |\alpha| \|x\|$ for any $x \in \mathbb R^n, ~ \alpha \in \mathbb R$
  • $\|x + y\| \leq \|x\| + \|y\|$ for any $x,y \in \mathbb R^n$ (triangle inequality)

There are many different norms. One important class is the $\ell_p$ norms: For $1 \leq p < \infty$ and $x = (x_1,x_2,\ldots,x_n)^T$ define

\begin{align} \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}. \end{align}

We will use this mainly with $p = 1,2$, and if one sends $p \to \infty$ we have

\begin{align} \|x\|_\infty = \max_{1 \leq i \leq n} |x_i|. \end{align}

The $\ell_2$ norm is commonly referred to as the Euclidean norm because the norm of $x -y$ for two vectors $x,y \in \mathbb R^3$ gives the straight-line distance between the two points $x$ and $y$ in three-dimensional space.
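
As a concrete check, here is a minimal sketch, assuming Python with NumPy (the tooling and the sample vector are illustrative, not part of the notes), that evaluates the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms directly from the definitions and compares them with numpy.linalg.norm.

```python
import numpy as np

# A sample vector; any x in R^n will do.
x = np.array([3.0, -4.0, 1.0])

# l_p norms computed directly from the definitions above.
l1   = np.sum(np.abs(x))          # sum of absolute values
l2   = np.sqrt(np.sum(x**2))      # Euclidean norm
linf = np.max(np.abs(x))          # largest absolute entry

# Compare with NumPy's built-in implementations.
print(l1,   np.linalg.norm(x, 1))
print(l2,   np.linalg.norm(x, 2))
print(linf, np.linalg.norm(x, np.inf))
```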

Let us now check the four properties of a norm for the $\ell_2$ norm

\begin{align} \|x\|_2 = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}. \end{align}
  • $\|x\| \geq 0$ for all $x \in \mathbb R^n$

This is clear from the definition.

  • $\|x\| = 0$ if and only if $x = 0$ ($x$ is the zero vector)

If $x = 0$ then $\|x\|_2 = 0$. If $\|x\|_2 = 0$ then $\sum_{i} |x_i|^2 = 0$. This implies $|x_i| = 0$ for each $i$ and therefore $x =0$.

  • $\|\alpha x\| = |\alpha| \|x\|$ for any $x \in \mathbb R^n, ~ \alpha \in \mathbb R$

The $i$th entry of $\alpha x$ is $\alpha x_i$ and so

$$ \|\alpha x\|_2 = \left( \sum_{i=1}^n |\alpha x_i|^2 \right)^{1/2} = \left( |\alpha|^2 \sum_{i=1}^n |x_i|^2 \right)^{1/2} = |\alpha| \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}= |\alpha| \|x\|_2. $$

Showing

  • $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality)

is much more involved. We need an intermediate result.

The Cauchy-Schwarz inequality

For $x,y \in \mathbb R^n$

$$ \left| x^T y \right| = \left| \sum_{i=1}^n x_i y_i\right| \leq \|x\|_2 \|y\|_2.$$

Before we prove this in general, let's verify it for $\mathbb R^2$

$$|x_1y_1 + x_2 y_2| \overset{\mathrm{?}}{\leq} \sqrt{x_1^2 + x_2^2}\, \sqrt{y_1^2 + y_2^2}$$

$$(x_1y_1 + x_2 y_2)^2 \overset{\mathrm{?}}{\leq} (x_1^2 + x_2^2) (y_1^2 + y_2^2)$$

$$x^2_1y^2_1 + x^2_2 y^2_2 + 2 x_1x_2y_1y_2 \overset{\mathrm{?}}{\leq} x_1^2y_1^2 + x_2^2 y_2^2 + x_1^2y_2^2 + x_2^2 y_1^2$$

$$ 2 x_1x_2y_1y_2 \overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2$$

$$ 0 \overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2 - 2 x_1x_2y_1y_2$$

$$ 0 \overset{\mathrm{?}}{\leq} (x_1y_2 - x_2 y_1)^2 $$

This last inequality is true, and the Cauchy-Schwarz inequality follows in $\mathbb R^2$.

From this last calculation it is clear that repeating this type of computation for general $n$ would be unwieldy, so we need a new strategy. For $x,y \in \mathbb R^n$ and $\lambda \in \mathbb R$ consider the norm

$$\|x-\lambda y\|_2^2 = \sum_{i=1}^n (x_i-\lambda y_i)^2 = \sum_{i=1}^n x_i^2 - 2 \lambda \sum_{i=1}^n x_i y_i + \lambda^2 \sum_{i=1}^n y_i^2 = \|x\|_2^2 - 2 \lambda x^T y + \lambda^2 \|y\|_2^2.$$

Notice that the right-hand side has all the terms we encounter in the Cauchy-Schwarz inequality.

If we think of this just as a function of $\lambda$, keeping $x,y \in \mathbb R^n$ fixed, we have a parabola. We look at the minimum of the parabola: if $f(\lambda) = a + b \lambda + c \lambda^2$ with $c > 0$, then $f$ attains its minimum at $\lambda = -b/(2c)$ and $f(\lambda) \geq a - b^2/(4c)$.

Applying this with $a = \|x\|_2^2$, $b = -2 x^T y$ and $c = \|y\|_2^2$ (if $y = 0$ the Cauchy-Schwarz inequality is trivial, so we may assume $y \neq 0$ and hence $c > 0$) we have

$$ \|x-\lambda y\|_2^2 \geq \|x\|_2^2 - \frac{(x^T y)^2}{\|y\|_2^2} \geq 0. $$

Note that this has to be non-negative because the minimum of a non-negative function is also non-negative. Rearranging this last inequality, we have

$$(x^Ty)^2 \leq \|x\|_2^2 \|y\|_2^2$$

which is just the square of the Cauchy-Schwarz inequality.
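
A quick numerical sanity check, given as a sketch assuming NumPy (the random pairs and tolerance are illustrative): draw random vectors and confirm that $(x^T y)^2 \leq \|x\|_2^2 \|y\|_2^2$ never fails beyond floating-point roundoff.

```python
import numpy as np

rng = np.random.default_rng(0)

# Check (x^T y)^2 <= ||x||_2^2 ||y||_2^2 on many random pairs.
for _ in range(1000):
    n = int(rng.integers(1, 20))
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    lhs = (x @ y) ** 2
    rhs = np.sum(x**2) * np.sum(y**2)
    assert lhs <= rhs * (1 + 1e-12)   # small slack for roundoff
```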

To return to the triangle inequality, we compute (set $\lambda = -1$ in the previous calculation)

$$\|x + y\|_2^2 = \|x\|_2^2 + 2 x^T y + \|y\|_2^2 \leq \|x\|_2^2 + 2 |x^T y| + \|y\|_2^2$$

and

$$\|x\|_2^2 + 2 |x^T y| + \|y\|_2^2 \leq \|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2$$

by the Cauchy-Schwarz inequality. But

$$\|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2$$

and summarizing we have

$$\|x + y\|_2^2 \leq (\|x\|_2 + \|y\|_2)^2.$$

Upon taking a square-root we see that $\|x + y\|_2 \leq \|x\|_2 + \|y\|_2$ as desired.
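
The same kind of spot check works for the triangle inequality itself; the following sketch (again assuming NumPy, with an arbitrary dimension and seed) also illustrates that equality holds when $y$ is a non-negative multiple of $x$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Check ||x + y||_2 <= ||x||_2 + ||y||_2 on many random pairs.
for _ in range(1000):
    x = rng.standard_normal(10)
    y = rng.standard_normal(10)
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12

# Equality when y is a non-negative multiple of x.
x = rng.standard_normal(10)
y = 2.5 * x
print(np.linalg.norm(x + y), np.linalg.norm(x) + np.linalg.norm(y))
```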

The triangle inequality actually holds for every $\ell_p$ norm, but we will not prove that here.

Distance

The distance between two vectors $x,y \in \mathbb R^n$, given a norm $\|\cdot \|$ is defined to be

$$\|x - y\|.$$

Convergence

A sequence of vectors $\{x^{(k)}\}_{k=1}^\infty$, $x^{(k)} \in \mathbb R^n$ is said to converge to $x$ with respect to the norm $\|\cdot \|$ if given any $\epsilon > 0$ there exists $N(\epsilon) > 0$ such that

$$ \|x^{(k)} - x\| < \epsilon, \quad \text{ for all } \quad k \geq N(\epsilon). $$

Equivalently, $\lim_{k \to \infty} \|x^{(k)} - x\| = 0$.

Theorem

A sequence of vectors $\{x^{(k)}\}_{k=1}^\infty$, $x^{(k)} \in \mathbb R^n$, converges to $x$ with respect to the norm $\|\cdot \|_p$, $1 \leq p \leq \infty$, if and only if the components converge

$$ x_i^{(k)} \to x_i, \quad \text{as} \quad k \to \infty$$

for all $1 \leq i \leq n$.

Proof

We first prove it for $p = \infty$. We have

$$ |x_i^{(k)} -x_i| \leq \max_{1 \leq j \leq n} |x_j^{(k)}-x_j| = \|x^{(k)} - x\|_\infty,$$

and so convergence with respect to $\|\cdot \|_\infty$ (the right-hand side tends to zero as $k \to \infty$) implies that each of the individual components converges.

Now assume that each of the individual components converges. Then for each $i$ and every $\epsilon > 0$ there exists $N_i(\epsilon)$ such that $k \geq N_{i}(\epsilon)$ implies that

$$ |x_i^{(k)} -x_i| < \epsilon.$$

Given $\epsilon > 0$, let $k \geq \max_{1\leq i \leq n} N_{i} (\epsilon)$. Then

$$ |x_i^{(k)} -x_i| < \epsilon, \quad \text{for every } 1 \leq i \leq n$$

and hence $\|x^{(k)} - x\|_\infty < \epsilon$.

To prove the theorem for general $1 \leq p < \infty$ we show $x^{(k)}$ converges to $x$ with respect to $\|\cdot\|_\infty$ if and only if it converges to $x$ with respect to $\|\cdot \|_p$. First, for any $x \in \mathbb R^n$ we have

$$ \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} \leq \left( \sum_{i=1}^n \max_{1 \leq j \leq n} |x_j|^p \right)^{1/p} = \left( n \max_{1 \leq j \leq n} |x_j|^p \right)^{1/p} = n^{1/p} \|x\|_\infty. $$

Replacing $x$ with $x^{(k)} - x$ we have that $\|x^{(k)} - x\|_p \leq n^{1/p} \|x^{(k)} - x\|_\infty$. Thus convergence with respect to $\|\cdot \|_\infty$ implies convergence with respect to $\|\cdot\|_p$.

For the reverse inequality, note that for any $1 \leq j \leq n$

$$|x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p}$$

Indeed, if $x_j = 0$, this follows immediately. If $x_j \neq 0$ then

$$\left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = |x_j| \underbrace{\left( 1 + \sum_{i \neq j} \frac{|x_i|^p}{|x_j|^p} \right)^{1/p}}_{\geq 1} \geq |x_j|.$$

Therefore

$$ \|x\|_\infty = \max_{1 \leq j \leq n} |x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = \|x\|_p.$$

Replacing $x$ with $x^{(k)} - x$ we have that $\|x^{(k)} - x\|_\infty \leq \|x^{(k)} - x\|_p$. Thus convergence with respect to $\|\cdot \|_p$ implies convergence with respect to $\|\cdot\|_\infty$.

The final logic is the following:

  • If a sequence converges with respect to $\|\cdot\|_p$, it converges with respect to $\|\cdot\|_\infty$ and therefore the individual components converge.

  • If the individual components converge, the sequence converges with respect to $\|\cdot\|_\infty$ and it then converges with respect to $\|\cdot\|_p$.
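
The following sketch (assuming NumPy; the dimension, seed, and the particular sequence $x^{(k)} = x + d/k$ are illustrative) checks the two inequalities used in the proof, $\|x\|_\infty \leq \|x\|_p \leq n^{1/p} \|x\|_\infty$, and shows a sequence whose $\ell_1$, $\ell_2$, and $\ell_\infty$ distances to its limit all tend to zero together.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
x = rng.standard_normal(n)

# The sandwich ||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf for a few values of p.
for p in (1, 2, 3):
    lo = np.linalg.norm(x, np.inf)
    hi = n ** (1 / p) * np.linalg.norm(x, np.inf)
    assert lo <= np.linalg.norm(x, p) <= hi + 1e-12

# A sequence x^(k) = x + d/k converges to x in every l_p norm simultaneously.
d = rng.standard_normal(n)
for k in (1, 10, 100, 1000):
    xk = x + d / k
    print(k, np.linalg.norm(xk - x, 1), np.linalg.norm(xk - x, 2), np.linalg.norm(xk - x, np.inf))
```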

Matrix norms

We can also define norms on the set of all $n \times m$ matrices. We will just concentrate on norms for square $n \times n$ matrices. A matrix norm $\|\cdot \|$ should satisfy the following for all $n \times n$ matrices $A$ and $B$ and real numbers $\alpha$

  • $\|A\| \geq 0$
  • $\|A\| = 0$ if and only if $A = 0$ is the zero matrix
  • $\|\alpha A\| = |\alpha| \|A\|$
  • $\|A + B \| \leq \|A\| + \|B\|$
  • $\|AB\| \leq \|A\|\|B\|$

Note the last condition (submultiplicativity); it has no counterpart among the properties required of a vector norm.

The distance between two matrices is then defined as $\|A - B\|$, as in the case of vectors.
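
Before constructing specific matrix norms, here is a quick numerical spot check, a sketch assuming NumPy, of the triangle inequality and the extra submultiplicativity condition $\|AB\| \leq \|A\| \|B\|$. It uses the induced $\ell_\infty$ norm (the maximum absolute row sum, derived later in this section), which NumPy computes via np.linalg.norm(A, np.inf).

```python
import numpy as np

rng = np.random.default_rng(3)

# Spot-check the matrix norm properties, in particular ||AB|| <= ||A|| ||B||,
# using the induced infinity norm (maximum absolute row sum).
for _ in range(1000):
    A = rng.standard_normal((5, 5))
    B = rng.standard_normal((5, 5))
    nA = np.linalg.norm(A, np.inf)
    nB = np.linalg.norm(B, np.inf)
    assert np.linalg.norm(A + B, np.inf) <= nA + nB + 1e-12   # triangle inequality
    assert np.linalg.norm(A @ B, np.inf) <= nA * nB + 1e-12   # submultiplicativity
```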

We can construct a matrix norm from a vector norm.

Theorem

Let $\|\cdot\|$ be a norm on vectors in $\mathbb R^n$. Then

$$ \|A\| = \max_{\|x\| = 1} \|Ax\|$$

gives a matrix norm.

This is called the induced matrix norm.

Proof

  • Because $\|Ax\| \geq 0$ it follows that $\|A\| \geq 0$.
  • If $A = 0$ then $\|Ax\| = 0$ and $\|A\| = 0$. If $\|A\| = 0$ then $Ae_j = 0$. Since $Ae_j$ gives the $j$th column of $A$ where $e_j$ is the $j$th standard basis vector we know that every column of $A$ is zero and thus $A = 0$.
  • $\|\alpha A\| = \max_{\|x\| = 1} \|\alpha A x\| = |\alpha| \max_{\|x\| = 1} \| A x\|$ from the analogous property for the vector norm.
  • For the triangle inequality
$$ \|A + B\| = \max_{\|x\| = 1} \|(A + B)x\| \leq \max_{\|x\| = 1} \left( \|Ax\| + \|Bx\| \right) \leq \max_{\|x\| = 1,\, \|y\| = 1} \left( \|Ax\| + \|By\| \right)$$

and

$$ \max_{\|x\| = 1,\, \|y\| = 1} \left( \|Ax\| + \|By\| \right) = \max_{\|x\| = 1} \|Ax\| + \max_{\|y\| = 1} \|By\| = \|A\| + \|B\|.$$

Therefore $\|A + B\| \leq \|A\| + \|B\|$.

  • Write $b(x) = \|Bx\|$ and suppose $b(x) \neq 0$ for at least one $x$ with $\|x\| = 1$. For such an $x$ we have $\|Bx/b(x)\| = 1$, and so

    $$\|AB\| = \max_{\|x\| = 1} \|ABx\| = \max_{\|x\| = 1,~~ b(x) \neq 0} b(x) \|A (B x/b(x)) \| \leq \left[ \max_{\|x\| = 1} b(x) \right] \max_{\|y\| = 1} \|A y\| = \|B\| \|A\|.$$

    If $b(x) = 0$ for all $x$, then $B = 0$ and $AB = 0$, and we find $\|AB\| = 0$ from the definition.
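
The definition $\|A\| = \max_{\|x\| = 1} \|Ax\|$ is not convenient to evaluate directly. A minimal sketch, assuming NumPy (the matrix, sample count, and helper name induced_norm_estimate are all illustrative): sampling random unit vectors only ever produces a lower bound on the induced norm, which creeps up toward the exact value given by the closed-form formulas derived next.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

def induced_norm_estimate(A, p, samples=50_000):
    """Crude lower bound on max_{||x||_p = 1} ||A x||_p by random sampling."""
    n = A.shape[1]
    best = 0.0
    for _ in range(samples):
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x, p)        # normalize so ||x||_p = 1
        best = max(best, np.linalg.norm(A @ x, p))
    return best

# The estimate is always <= the exact induced infinity norm.
print(induced_norm_estimate(A, np.inf), np.linalg.norm(A, np.inf))
```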

Given a vector norm, it is important to find a formula (if possible) for the induced matrix norm.

Theorem

$$\|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$

The induced $\ell^\infty$ matrix norm is the "maximum absolute row sum."

Proof

First, recall the definition

$$\|A\|_\infty = \max_{\|x\|_\infty = 1} \|A x\|_\infty.$$

If $A = 0$ the formula is correct: both sides are zero. So assume $A \neq 0$. Given a vector $x \in \mathbb R^n$, the formula for the matrix-vector product gives

$$ \|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right|. $$

Now choose a row index $i$ at which the maximum absolute row sum is attained, that is,

$$\sum_{j=1}^n |a_{ij}| = \max_{1 \leq k \leq n} \sum_{j=1}^n |a_{kj}|.$$

Define the $\mathrm{sign}$ function by $\mathrm{sign}(t) = 1$ if $t > 0$, $\mathrm{sign}(t) = -1$ if $t < 0$ and $\mathrm{sign}(0) = 0$. Choose the vector $x$ by the rule $x_j = \mathrm{sign}(a_{ij})$, $1 \leq j\leq n$. Because $A \neq 0$, at least one $a_{ij}$ in this row is nonzero, so $\|x\|_\infty = 1$. For this choice of $x$ it follows that

$$ \|Ax\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|, $$

and therefore

$$\|A\|_\infty \geq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$

Now we must show the reverse inequality, which is much easier: For any $x \in \mathbb R^n$, $\|x\|_\infty = 1$, by the triangle inequality

$$ \|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}| |x_j| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$

The last inequality follows because $|x_j| \leq 1$ for each $j$. Taking the maximum of this expression over all $\|x\|_\infty = 1$ we find

$$ \|A\|_\infty \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|$$

which shows

$$ \|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$

It is important to note that the matrix norm $\|A\|_\infty$ is NOT the largest entry, in absolute value, of the matrix.
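
A short check of the row-sum formula, assuming NumPy and an arbitrary example matrix; it also illustrates the warning above that $\|A\|_\infty$ is not the largest entry in absolute value.

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [ 4.0,  0.0, -1.0],
              [-2.0,  2.0,  2.0]])

# Maximum absolute row sum, computed directly from the formula.
row_sums = np.sum(np.abs(A), axis=1)   # [6, 5, 6]
print(np.max(row_sums))                # 6.0
print(np.linalg.norm(A, np.inf))       # 6.0, NumPy's induced infinity norm

# Not the same as the largest entry in absolute value.
print(np.max(np.abs(A)))               # 4.0
```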

Theorem

$$\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$

The induced $\ell^1$ matrix norm is the "maximum absolute column sum."

Proof

First, recall the definition

$$\|A\|_1 = \max_{\|x\|_1 = 1} \|A x\|_1.$$

Given a vector $x \in \mathbb R^n$, the formula for the matrix-vector product gives

$$ \|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right|. $$

Now choose a column index $j$ at which the maximum absolute column sum is attained, that is,

$$\sum_{i=1}^n |a_{ij}| = \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}|.$$

Choose the vector $x$ by the rule $x_j = 1$, $x_i = 0$ for $i \neq j$, so that $\|x\|_1 = 1$. For this choice of $x$ it follows that

$$ \|Ax\|_1 = \sum_{i=1}^n |a_{ij}| = \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}|$$

and therefore

$$\|A\|_1 \geq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$

Now we must show the reverse inequality, which is again much easier: For any $x \in \mathbb R^n$, $\|x\|_1 = 1$, by the triangle inequality

$$ \|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \sum_{j=1}^n \sum_{i=1}^n |a_{ij} x_j| = \sum_{j=1}^n |x_j| \sum_{i=1}^n |a_{ij}| \leq \left(\sum_{j=1}^n |x_j|\right) \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$

The last inequality follows because $\|x\|_1 = 1$. Taking the maximum of this expression over all $\|x\|_1 = 1$ we find

$$ \|A\|_1 \leq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|$$

which shows

$$ \|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$
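
The analogous check for the column-sum formula, again a sketch assuming NumPy and the same illustrative matrix as above.

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [ 4.0,  0.0, -1.0],
              [-2.0,  2.0,  2.0]])

# Maximum absolute column sum, computed directly from the formula.
col_sums = np.sum(np.abs(A), axis=0)   # [7, 4, 6]
print(np.max(col_sums))                # 7.0
print(np.linalg.norm(A, 1))            # 7.0, NumPy's induced 1-norm
```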