We now discuss iterative methods in linear algebra. The methods we will discuss come from two problems:
To discuss what it means to perform linear algebraic operations in an iterative manner, we need a mechanism to measure the difference between two vectors and the difference between two matrices.
A norm $\|x\|$ for a vector $x \in \mathbb R^{n}$ must satisfy the following properties:

1. $\|x\| \geq 0$ for all $x \in \mathbb R^n$,
2. $\|x\| = 0$ if and only if $x = 0$,
3. $\|\alpha x\| = |\alpha| \|x\|$ for all $\alpha \in \mathbb R$ and $x \in \mathbb R^n$,
4. $\|x + y\| \leq \|x\| + \|y\|$ for all $x,y \in \mathbb R^n$ (the triangle inequality).
There are many different norms. One important class is the $\ell_p$ norms: For $1 \leq p < \infty$ and $x = (x_1,x_2,\ldots,x_n)^T$ define
\begin{align} \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}. \end{align}We will use this with $p =1,2$ and if one sends $p \to \infty$ we have
\begin{align} \|x\|_\infty = \max_{1 \leq i \leq n} |x_i|. \end{align}The $\ell_2$ norm is commonly referred to as the Euclidean norm because the norm of $x -y$ for two vectors $x,y \in \mathbb R^3$ gives the straight-line distance between the two points $x$ and $y$ in three-dimensional space.
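As a quick illustration (a minimal NumPy sketch, not part of the original notes), the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms can be evaluated directly from the definitions above and checked against NumPy's built-in `np.linalg.norm`.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# Evaluate the definitions directly
l1 = np.sum(np.abs(x))        # sum of absolute values
l2 = np.sqrt(np.sum(x**2))    # Euclidean length
linf = np.max(np.abs(x))      # largest absolute entry

# They agree with NumPy's built-in norms
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(linf, np.linalg.norm(x, np.inf))

print(l1, l2, linf)   # 8.0  5.0990...  4.0
```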
Let us now check the four properties of a norm for the $\ell_2$ norm
\begin{align} \|x\|_2 = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}. \end{align}Nonnegativity, $\|x\|_2 \geq 0$, is clear from the definition.
If $x = 0$ then $\|x\|_2 = 0$. If $\|x\|_2 = 0$ then $\sum_{i} |x_i|^2 = 0$. This implies $|x_i| = 0$ for each $i$ and therefore $x =0$.
The $i$th entry of $\alpha x$ is $\alpha x_i$ and so
$$ \|\alpha x\|_2 = \left( \sum_{i=1}^n |\alpha x_i|^2 \right)^{1/2} = \left( |\alpha|^2 \sum_{i=1}^n |x_i|^2 \right)^{1/2} = |\alpha| \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}= |\alpha| \|x\|_2. $$Showing
the triangle inequality, $\|x + y\|_2 \leq \|x\|_2 + \|y\|_2$, is much more involved. We need an intermediate result, the Cauchy-Schwarz inequality.
For $x,y \in \mathbb R^n$
$$ \left| x^T y \right| = \left| \sum_{i=1}^n x_i y_i\right| \leq \|x\|_2 \|y\|_2.$$Before we prove this in general, let's verify it for $\mathbb R^2$
$$|x_1y_1 + x_2 y_2| \overset{\mathrm{?}}{\leq} \sqrt{x_1^2 + x_2^2} \sqrt{y_1^2 + y_2^2}$$$$(x_1y_1 + x_2 y_2)^2 \overset{\mathrm{?}}{\leq} (x_1^2 + x_2^2) (y_1^2 + y_2^2)$$$$x^2_1y^2_1 + x^2_2 y^2_2 + 2 x_1x_2y_1y_2\overset{\mathrm{?}}{\leq} x_1^2y_1^2 + x_2^2 y_2^2 + x_1^2y_2^2 + x_2^2 y_1^2$$$$ 2 x_1x_2y_1y_2\overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2$$$$ 0\overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2 - 2 x_1x_2y_1y_2$$$$ 0\overset{\mathrm{?}}{\leq} (x_1y_2 - x_2 y_1)^2 $$This last inequality is true, and the Cauchy-Schwarz inequality follows in $\mathbb R^2$.
From this last calculation, it is clear that performing this type of calculation for general $n$ is going to be difficult, and so we need a new strategy. For $x,y \in \mathbb R^n$ and $\lambda \in \mathbb R$ consider the squared norm
$$\|x-\lambda y\|_2^2 = \sum_{i=1}^n (x_i-\lambda y_i)^2 = \sum_{i=1}^n x_i^2 - 2 \lambda \sum_{i=1}^n x_i y_i + \lambda^2 \sum_{i=1}^n y_i^2 = \|x\|_2^2 - 2 \lambda x^T y + \lambda^2 \|y\|_2^2.$$Notice that the right-hand side has all the terms we encounter in the Cauchy--Schwarz inequality.
If we think about this just as a function of $\lambda$, keeping $x,y \in \mathbb R^n$ fixed, we have a parabola. We look at the minimum of the parabola: If $f(\lambda) = a + b \lambda + c \lambda^2, c > 0$ then $f$ attains its minimum when $\lambda = -b/(2c)$ and $f(\lambda) \geq a - b^2/(4c)$.
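To verify the claim about the minimum, complete the square (using $c > 0$):
$$ f(\lambda) = a + b \lambda + c \lambda^2 = c\left(\lambda + \frac{b}{2c}\right)^2 + a - \frac{b^2}{4c} \geq a - \frac{b^2}{4c}, $$with equality when $\lambda = -b/(2c)$.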
From this with $a = \|x\|_2^2$, $b = -2 x^T y$ and $c = \|y\|_2^2$ (we may assume $y \neq 0$, since the Cauchy-Schwarz inequality is trivial when $y = 0$) we have
$$ \|x-\lambda y\|_2^2 \geq \|x\|_2^2 - \frac{(x^T y)^2}{\|y\|_2^2} \geq 0. $$Note that this has to be non-negative because the minimum of a non-negative function is also non-negative. Rearranging this last inequality, we have
$$(x^Ty)^2 \leq \|x\|_2^2 \|y\|_2^2,$$which is the square of the Cauchy-Schwarz inequality; taking square roots gives $|x^T y| \leq \|x\|_2 \|y\|_2$.
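Here is a quick numerical sanity check of the Cauchy-Schwarz inequality (a NumPy sketch, not part of the original notes); equality should occur only when one vector is a scalar multiple of the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10

# |x^T y| <= ||x||_2 ||y||_2 for randomly generated vectors
for _ in range(1000):
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    assert abs(x @ y) <= np.linalg.norm(x, 2) * np.linalg.norm(y, 2) + 1e-12

# Equality case: y is a scalar multiple of x
x = rng.standard_normal(n)
y = 3.0 * x
print(abs(x @ y), np.linalg.norm(x, 2) * np.linalg.norm(y, 2))  # equal up to rounding
```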
To return to the triangle inequality, we compute (set $\lambda = -1$ in the previous calculation)
$$\|x + y\|_2^2 = \|x\|_2^2 + 2 x^T y + \|y\|_2^2 \leq \|x\|_2^2 + 2 |x^T y| + \|y\|_2^2$$$$\|x\|_2^2 + 2 |x^T y| + \|y\|_2^2 \leq \|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2$$by the Cauchy-Schwarz inequality. But
$$\|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2$$and summarizing we have
$$\|x + y\|_2^2 \leq (\|x\|_2 + \|y\|_2)^2.$$Upon taking a square root we see that $\|x + y\|_2 \leq \|x\|_2 + \|y\|_2$ as desired.
The triangle inequality actually holds for any $\ell_p$ norm (this is Minkowski's inequality), but we will not prove it here.
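The triangle inequality for $p = 1, 2, \infty$ can at least be checked numerically (again a NumPy sketch, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10

# ||x + y||_p <= ||x||_p + ||y||_p for p = 1, 2, infinity
for p in (1, 2, np.inf):
    for _ in range(1000):
        x = rng.standard_normal(n)
        y = rng.standard_normal(n)
        assert np.linalg.norm(x + y, p) <= np.linalg.norm(x, p) + np.linalg.norm(y, p) + 1e-12
```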
The distance between two vectors $x,y \in \mathbb R^n$, given a norm $\|\cdot \|$ is defined to be
$$\|x - y\|.$$A sequence of vectors $\{x^{(k)}\}_{k=1}^\infty$, $x^{(k)} \in \mathbb R^n$ is said to converge to $x$ with respect to the norm $\|\cdot \|$ if given any $\epsilon > 0$ there exists $N(\epsilon) > 0$ such that
$$ \|x^{(k)} - x\| < \epsilon, \quad \text{ for all } \quad k \geq N(\epsilon). $$Equivalently, $\lim_{k \to \infty} \|x^{(k)} - x\| = 0$.
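For a concrete (hypothetical) example of convergence: if $x^{(k)} = x + \frac{1}{k} v$ for a fixed vector $v$, then $\|x^{(k)} - x\| = \frac{1}{k}\|v\| \to 0$ in any norm. A short NumPy sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, -1.0, 2.0])

# x^(k) = x + v/k converges to x: the distance ||x^(k) - x||_inf = ||v||_inf / k -> 0
for k in (1, 10, 100, 1000):
    xk = x + v / k
    print(k, np.linalg.norm(xk - x, np.inf))   # decreases like 1/k
```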
A sequence of vectors $\{x^{(k)}\}_{k=1}^\infty$, $x^{(k)} \in \mathbb R^n$, converges to $x$ with respect to the norm $\|\cdot \|_p$, $1 \leq p \leq \infty$, if and only if the components converge:
$$ x_i^{(k)} \to x_i, \quad \text{as} \quad k \to \infty$$for all $1 \leq i \leq n$.
We first prove it for $p = \infty$. We have
$$ |x_i^{(k)} -x_i| \leq \max_{1 \leq j \leq n} |x_j^{(k)}-x_j| = \|x^{(k)} - x\|_\infty,$$and so convergence with respect to $\|\cdot \|_\infty$ (the right-hand side tends to zero as $k \to \infty$) implies that each of the individual components converges.
Now, assume that each of the individual components converge. For every $\epsilon > 0$ there exists $N_i(\epsilon)$ such that $k \geq N_{i}(\epsilon)$ implies that
$$ |x_i^{(k)} -x_i| < \epsilon.$$Given $\epsilon > 0$, let $k \geq \max_{1\leq i \leq n} N_{i} (\epsilon)$. Then
$$ |x_i^{(k)} -x_i| < \epsilon, \quad \text{for every } 1 \leq i \leq n$$and hence $\|x^{(k)} - x\|_\infty < \epsilon$.
To prove the theorem for general $1 \leq p < \infty$ we show $x^{(k)}$ converges to $x$ with respect to $\|\cdot\|_\infty$ if and only if it converges to $x$ with respect to $\|\cdot \|_p$. First, for any $x \in \mathbb R^n$ we have
$$ \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} \leq \left( \sum_{i=1}^n \max_{1 \leq i \leq n} |x_i|^p \right)^{1/p} = \left( n \max_{1 \leq i \leq n} |x_i|^p \right)^{1/p} = n^{1/p} \|x\|_\infty. $$Replacing $x$ with $x^{(k)} - x$ we have that $\|x^{(k)} - x\|_p \leq n^{1/p} \|x^{(k)} - x\|_\infty$. Thus convergence with respect to $\|\cdot \|_\infty$ implies convergence with respect to $\|\cdot\|_p$.
For the other direction, note that for any $1 \leq j \leq n$
$$|x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p}$$Indeed, if $x_j = 0$, this follows immediately. If $x_j \neq 0$ then
$$\left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = |x_j| \underbrace{\left( 1 + \sum_{i \neq j} \frac{|x_i|^p}{|x_j|^p} \right)^{1/p}}_{\geq 1} \geq |x_j|. $$Therefore$$ \|x\|_\infty = \max_{1 \leq j \leq n} |x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = \|x\|_p.$$Replacing $x$ with $x^{(k)} - x$ we have that $\|x^{(k)} - x\|_\infty \leq \|x^{(k)} - x\|_p$. Thus convergence with respect to $\|\cdot \|_p$ implies convergence with respect to $\|\cdot\|_\infty$.
The final logic is the following:
If a sequence converges with respect to $\|\cdot\|_p$, it converges with respect to $\|\cdot\|_\infty$ and therefore the individual components converge.
If the individual components converge, the sequence converges with respect to $\|\cdot\|_\infty$ and it then converges with respect to $\|\cdot\|_p$.
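The two inequalities used in this proof, $\|x\|_\infty \leq \|x\|_p \leq n^{1/p} \|x\|_\infty$, are easy to observe numerically (a NumPy sketch, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20

# Check ||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf for a few values of p
for p in (1, 2, 4):
    for _ in range(1000):
        x = rng.standard_normal(n)
        xp = np.linalg.norm(x, p)
        xinf = np.linalg.norm(x, np.inf)
        assert xinf <= xp + 1e-12
        assert xp <= n**(1.0 / p) * xinf + 1e-12
```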
We can also define norms on the set of all $n \times m$ matrices. We will just concentrate on norms for square $n \times n$ matrices. A matrix norm $\|\cdot \|$ should satisfy the following for all $n \times n$ matrices $A$ and $B$ and real numbers $\alpha$:

1. $\|A\| \geq 0$,
2. $\|A\| = 0$ if and only if $A = 0$,
3. $\|\alpha A\| = |\alpha| \|A\|$,
4. $\|A + B\| \leq \|A\| + \|B\|$,
5. $\|A B\| \leq \|A\| \|B\|$.

Note the last condition, $\|AB\| \leq \|A\| \|B\|$. It requires more than what we required of norms for vectors.
The distance between two matrices is then defined as $\|A - B\|$, as in the case of vectors.
We can construct a matrix norm from a vector norm.
Let $\|\cdot\|$ be a norm on vectors in $\mathbb R^n$. Then
$$ \|A\| = \max_{\|x\| = 1} \|Ax\|$$gives a matrix norm.
This is called the induced matrix norm.
For any $x$ with $\|x\| = 1$, the triangle inequality for the vector norm gives $\|(A+B)x\| = \|Ax + Bx\| \leq \|Ax\| + \|Bx\| \leq \|A\| + \|B\|$. Taking the maximum over all $\|x\| = 1$, we find $\|A + B\| \leq \|A\| + \|B\|$.
Let $b(x) = \|Bx\|$ and suppose $b(x) \neq 0$ for at least one $x$ with $\|x\| = 1$. For such an $x$ we have $\|Bx/b(x)\| = 1$, and so, since the vectors with $b(x) = 0$ contribute $\|ABx\| = 0$,
$$\|AB\| = \max_{\|x\| = 1} \|ABx\| = \max_{\|x\| = 1,~~ b(x) \neq 0} b(x) \|A (B x/b(x)) \| \leq \left[ \max_{\|x\| = 1} b(x) \right] \max_{\|y\| = 1} \|A y\| = \|B\| \|A\|.$$
If $b(x) = 0$ for all $x$, then $B = 0$ and $AB = 0$, and we find $\|AB\| = 0$ from the definition.
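Before turning to explicit formulas, the definition itself can be explored numerically: maximizing $\|Ax\|$ over sampled unit vectors gives a lower bound on the induced norm. The sketch below (not part of the original notes; the helper name is hypothetical) compares this lower bound with NumPy's built-in matrix norms for $p = 1, 2, \infty$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))

def induced_norm_lower_bound(A, p, samples=20000):
    """Approximate ||A||_p = max over ||x||_p = 1 of ||Ax||_p by random sampling.

    Sampling only gives a lower bound; the true maximum may be larger.
    """
    n = A.shape[1]
    best = 0.0
    for _ in range(samples):
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x, p)              # rescale so that ||x||_p = 1
        best = max(best, np.linalg.norm(A @ x, p))
    return best

# Compare the sampled lower bound with NumPy's induced matrix norms
for p in (1, 2, np.inf):
    print(p, induced_norm_lower_bound(A, p), np.linalg.norm(A, p))
```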
Given a vector norm, it is important to find a formula (if possible) for the induced matrix norm.
The induced $\ell_\infty$ matrix norm is the "maximum absolute row sum."
First, recall the definition
$$\|A\|_\infty = \max_{\|x\|_\infty = 1} \|A x\|_\infty.$$If $A = 0$, the formula is correct: both sides are zero. Assume $A \neq 0$. Given a vector $x \in \mathbb R^n$, the formula for the matrix-vector product gives
$$ \|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right|. $$Now find the first row, call it $i_0$, such that
$$\sum_{j=1}^n |a_{i_0 j}| = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$Define the $\mathrm{sign}(x)$ function by $\mathrm{sign}(x) = 1$ if $x > 0$, $\mathrm{sign}(x) = -1$ if $x < 0$ and $\mathrm{sign}(0) = 0$. Choose the vector $x$ by the rule $x_j = \mathrm{sign}(a_{i_0 j})$, $1 \leq j\leq n$. Because $A \neq 0$, at least one $a_{i_0 j}$ is nonzero and therefore $\|x\|_\infty = 1$. For this choice of $x$ it follows that
$$ \|Ax\|_\infty \geq \left| \sum_{j=1}^n a_{i_0 j} x_j \right| = \sum_{j=1}^n |a_{i_0 j}| = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|, $$and therefore
$$\|A\|_\infty \geq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$
Now, we must show the reverse inequality which is much easier: For any $x \in \mathbb R^n$, $\|x\|_\infty = 1$, by the triangle inequality
$$ \|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}| |x_j| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$The last inequality follows because $|x_j| \leq 1$ for each $j$. Taking the maximum of this expression over all $\|x\|_\infty = 1$ we find
$$ \|A\|_\infty \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|$$which shows
$$ \|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$It is important to note that the matrix norm $\|A\|_\infty$ is NOT the largest entry, in absolute value, of the matrix.
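As a check (a NumPy sketch, not part of the original notes), the maximum absolute row sum is exactly what NumPy returns for the induced $\ell_\infty$ matrix norm, and it differs from the largest entry in absolute value:

```python
import numpy as np

A = np.array([[ 1.0, -7.0,  2.0],
              [ 3.0,  4.0, -1.0],
              [-2.0,  0.5,  6.0]])

row_sums = np.sum(np.abs(A), axis=1)   # absolute row sums: [10. , 8. , 8.5]
print(np.max(row_sums))                # 10.0, the maximum absolute row sum
print(np.linalg.norm(A, np.inf))       # 10.0, agrees with the formula
print(np.max(np.abs(A)))               # 7.0, the largest entry, which is NOT ||A||_inf
```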
The induced $\ell_1$ matrix norm is the "maximum absolute column sum."
First, recall the definition
$$\|A\|_1 = \max_{\|x\|_1 = 1} \|A x\|_1.$$Given a vector $x \in \mathbb R^n$, the formula for the matrix-vector product gives
$$ \|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right|. $$Now find the first column, call it $j_0$, such that
$$\sum_{i=1}^n |a_{i j_0}| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$Choose the vector $x$ by the rule $x_{j_0} = 1$ and $x_j = 0$ for $j \neq j_0$, so that $\|x\|_1 = 1$. For this choice of $x$ it follows that
$$ \|Ax\|_1 = \sum_{i=1}^n |a_{i j_0}| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|$$and therefore
$$\|A\|_1 \geq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$Now, we must show the reverse inequality which is, again, much easier: For any $x \in \mathbb R^n$ with $\|x\|_1 = 1$, by the triangle inequality
$$ \|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \sum_{j=1}^n \sum_{i=1}^n |a_{ij} x_j| = \sum_{j=1}^n |x_j| \sum_{i=1}^n |a_{ij}| \leq \left( \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}| \right) \sum_{j=1}^n |x_j| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$The last two steps use the fact that each column sum is at most the maximum column sum and that $\|x\|_1 = \sum_{j=1}^n |x_j| = 1$. Taking the maximum of this expression over all $\|x\|_1 = 1$ we find
$$ \|A\|_1 \leq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|$$which shows
$$ \|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$
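Similarly (a NumPy sketch, not part of the original notes), the maximum absolute column sum agrees with NumPy's induced $\ell_1$ matrix norm:

```python
import numpy as np

A = np.array([[ 1.0, -7.0,  2.0],
              [ 3.0,  4.0, -1.0],
              [-2.0,  0.5,  6.0]])

col_sums = np.sum(np.abs(A), axis=0)   # absolute column sums: [6. , 11.5, 9. ]
print(np.max(col_sums))                # 11.5, the maximum absolute column sum
print(np.linalg.norm(A, 1))            # 11.5, agrees with the formula
```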