For the first time, we turn our attention to the factorization of a non-square matrix. The technique we describe is widely used in regression and data analysis.
We assume $A$ is an $m \times n$ matrix and $m \geq n$. Everything we describe can be applied to $A^T$ in the $n \geq m$ case, to give full generality.
Let $A$ be an $m \times n$ matrix.
The rank of $A$, denoted $\mathrm{rank}(A)$, is the number of linearly independent columns of $A$ (equivalently, the number of linearly independent rows).
The nullity of $A$, denoted $\mathrm{nullity}(A)$, is given by $n - \mathrm{rank}(A)$. It gives the dimension of the null space of $A$, i.e., the size of the largest linearly independent set of vectors $v \in \mathbb R^n$ such that $Av = 0$.
Note that $\mathrm{rank}(A) \leq n$.
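As a quick numerical illustration (a minimal NumPy sketch; the matrix below is a made-up example, not one from the text), we can compute the rank and nullity of a matrix with dependent columns:

```python
import numpy as np

# A hypothetical 4x3 example: the third column is the sum of the first two,
# so the columns are linearly dependent and rank(A) < n.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 1., 3.],
              [1., 2., 3.]])

m, n = A.shape
rank = np.linalg.matrix_rank(A)   # number of linearly independent columns
nullity = n - rank                # dimension of the null space of A
print(rank, nullity)              # 2 1
```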
The SVD will be based, first, on a factorization of the $n\times n$ matrix $A^T A$. So the following is of use:
The matrix $A^TA$ is symmetric: $(A^TA)^T = A^T (A^T)^T = A^T A$. A similar calculation shows that $AA^T$ is symmetric.
Let $v\in \mathbb R^n$ be such that $Av = 0$. Then it is clear that $A^TAv = 0$. Now, if $v \in \mathbb R^n$ is such that $A^TA v = 0$, we have $$ 0 = v^T A^T A v = \|Av\|_2^2.$$ This implies that $Av = 0$. Since the two matrices share the same null vectors (i.e., the same null space), their nullities agree.
This follows from the rank–nullity theorem, because the two matrices have the same number of columns: $\mathrm{rank}(A) = n - \mathrm{nullity}(A) = n - \mathrm{nullity}(A^TA) = \mathrm{rank}(A^TA)$.
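Continuing the sketch above (same hypothetical matrix), this equality of ranks, and hence of nullities, is easy to verify numerically:

```python
import numpy as np

# Same hypothetical rank-deficient 4x3 matrix as above.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 1., 3.],
              [1., 2., 3.]])

AtA = A.T @ A                        # the n x n matrix A^T A
print(np.linalg.matrix_rank(A))      # 2
print(np.linalg.matrix_rank(AtA))    # 2: same rank, hence same nullity
```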
See the proof that $\displaystyle \|A\|_2 = [\rho(A^TA)]^{1/2}$ in Lecture 17.
Assume for $v \neq 0$, $\lambda \neq 0$, that $A^TAv = \lambda v$. Then $A A^T Av = \lambda A v$. Let $w = Av$, so that $AA^T w = \lambda w$. If $w\neq 0$ then $\lambda$ is an eigenvalue of $AA^T$, and indeed $w$ cannot be zero since $A^TAv = \lambda v \neq 0$ implies $Av \neq 0$. Now assume $AA^Tw = \lambda w$ for $\lambda\neq0$ and $w \neq 0$. Then $A^T AA^T w = \lambda A^T w$. Let $v = A^Tw$, so that $A^TA v = \lambda v$, and we can conclude that $\lambda$ is an eigenvalue of $A^TA$ if $v \neq 0$. Again, if $v = 0$ then $AA^Tw = 0$, which contradicts $\lambda,w\neq 0$. So, $\lambda$ is an eigenvalue of $A^TA$.
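A quick numerical check of this correspondence (a sketch on a random matrix, chosen here purely for illustration): the nonzero eigenvalues of $A^TA$ and $AA^T$ agree, with $AA^T$ carrying $m - n$ extra zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))           # a random 5x3 example, m > n

eig_small = np.linalg.eigvalsh(A.T @ A)   # 3 eigenvalues, ascending
eig_big   = np.linalg.eigvalsh(A @ A.T)   # 5 eigenvalues, ascending

print(eig_small)
print(eig_big)   # two entries ~ 0, then the same three nonzero values
```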
The singular values $\sigma_1,\sigma_2,\ldots,\sigma_n$ of an $m \times n$ matrix $A$ are the square roots of the eigenvalues of $A^TA$. These eigenvalues are non-negative, since $v^T A^TA v = \|Av\|_2^2 \geq 0$, so the square roots are real.
Note that this means that the rank of $A$ is equal to the number of non-zero singular values.
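To connect this definition to a standard library routine (a sketch; `np.linalg.svd` is used here only as a reference point), the square roots of the eigenvalues of $A^TA$ match the singular values a library computes:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

evals = np.linalg.eigvalsh(A.T @ A)         # ascending, nonnegative
sigma_from_eigs = np.sqrt(evals)[::-1]      # descending, to match svd's order

sigma = np.linalg.svd(A, compute_uv=False)  # library singular values, descending
print(np.allclose(sigma, sigma_from_eigs))  # True
```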
A singular value decomposition of an $m \times n$ real matrix $A$ is a factorization of the form
$$\underbrace{A}_{m \times n} = \underbrace{U}_{m\times m, ~\text{orthogonal}}~~ \underbrace{\Sigma}_{m \times n, ~\text{diagonal}} ~~\underbrace{V^T}_{n \times n, ~\text{orthogonal}},$$where the diagonal matrix $\Sigma$ contains the singular values
$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0\\ 0 & \sigma_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & \sigma_n \\ 0 & \cdots & \cdots & 0\\ \vdots & && \vdots\\ 0 & \cdots & \cdots & 0 \end{bmatrix}.$$Now, we must discuss the construction of the orthogonal matrices $U$ and $V$. Consider
$$A = U \Sigma V^T,$$$$A^T A = V \Sigma^T U^T U \Sigma V^T = V \Sigma^T \Sigma V^T.$$
To compute the product $\Sigma^T \Sigma$, let $D$ be the (diagonal) matrix of eigenvalues of $A^TA$ and we write
$$\Sigma = \begin{bmatrix} D^{1/2} \\ 0 \end{bmatrix},$$where the $0$ stands for an $(m-n) \times n$ block of zeros. Then
$$\Sigma^T \Sigma = \begin{bmatrix} D^{1/2} & 0 \end{bmatrix} \begin{bmatrix} D^{1/2} \\ 0 \end{bmatrix} = D.$$Therefore $A^TA = V D V^T$, and we conclude that $V$ is a matrix whose columns are orthonormal eigenvectors of the symmetric matrix $A^TA$; such a matrix exists by the spectral theorem.
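In code, this step amounts to a symmetric eigendecomposition of $A^TA$ (a sketch using NumPy's `eigh`, with the eigenvalues reordered to be descending):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

evals, V = np.linalg.eigh(A.T @ A)   # orthonormal eigenvectors in columns of V
idx = np.argsort(evals)[::-1]        # reorder so eigenvalues are descending
D, V = np.diag(evals[idx]), V[:, idx]

print(np.allclose(A.T @ A, V @ D @ V.T))   # True: A^T A = V D V^T
```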
Perhaps the most difficult step in computing the SVD is finding the matrix $U$. We write the equation for the SVD, assuming we know $V$ and $\Sigma$:
$$AV = U \Sigma.$$$$A\begin{bmatrix} v_1 & v_2& \ldots& v_n\end{bmatrix} = \begin{bmatrix} u_1 & u_2 & \ldots &u_n &u_{n+1} & \ldots& u_m\end{bmatrix} \begin{bmatrix} D^{1/2} \\ 0 \end{bmatrix}.$$The first $n$ columns give the equations:
$$Av_j = \sigma_j u_j, \quad j =1,2,\ldots, n.$$Ordering the singular values so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$, we assume the first $k$ singular values are the non-zero ones:
$$u_j = \sigma_j^{-1} Av_j , \quad j =1,2,\ldots, k.$$The vectors $u_{k+1}, \ldots, u_m$ are arbitrary, subject only to the requirement that $U$ be an orthogonal matrix. So, we can find $u_{k+1}, \ldots, u_m$ via the Gram-Schmidt process.
For $1 \leq i \neq j \leq k$ we check
$$u_i^Tu_j = \frac{1}{\sigma_i\sigma_j} v_i^T A^TA v_j = \frac{\sigma_j^2}{\sigma_i\sigma_j} v_i^T v_j = 0,$$because $v_i$ and $v_j$ are orthonormal eigenvectors ($v_j$ has eigenvalue $\sigma_j^2$). Normality, $\|u_i\|_2 =1$, also follows if one takes $i = j$ in the above calculation, assuming, of course, that $v_i$ is normalized.
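Putting the whole construction together (a minimal sketch, not a production implementation: `svd_via_eig` is a name introduced here, the tolerance is an arbitrary choice, and the Gram-Schmidt completion is delegated to a QR factorization):

```python
import numpy as np

def svd_via_eig(A, tol=1e-12):
    """SVD by the construction in the text: V from eigenvectors of A^T A,
    u_j = A v_j / sigma_j for the k nonzero singular values, and the
    remaining columns of U completed to an orthonormal basis of R^m."""
    m, n = A.shape
    evals, V = np.linalg.eigh(A.T @ A)
    idx = np.argsort(evals)[::-1]                # descending eigenvalues
    evals, V = evals[idx], V[:, idx]
    sigma = np.sqrt(np.clip(evals, 0.0, None))   # clip tiny negative round-off

    k = int(np.sum(sigma > tol))                 # number of nonzero singular values
    U = np.zeros((m, m))
    U[:, :k] = A @ V[:, :k] / sigma[:k]          # u_j = sigma_j^{-1} A v_j

    # Complete u_{k+1},...,u_m: QR on [U_k | I] performs the Gram-Schmidt step,
    # and its trailing columns are orthonormal and orthogonal to u_1,...,u_k.
    Q, _ = np.linalg.qr(np.hstack([U[:, :k], np.eye(m)]))
    U[:, k:] = Q[:, k:m]

    Sigma = np.zeros((m, n))
    Sigma[:n, :n] = np.diag(sigma)
    return U, Sigma, V

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
U, S, V = svd_via_eig(A)
print(np.allclose(A, U @ S @ V.T))        # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(5)))    # True: U is orthogonal
```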
Find a singular value decomposition of
$$A = \begin{bmatrix} 1 & -1 \\ 2 & 0 \\ 1 & 1 \end{bmatrix}.$$
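A numerical check of this example, following the recipe above (a sketch; the cross product is a convenient shortcut for completing an orthonormal basis in $\mathbb R^3$): here $A^TA = \mathrm{diag}(6,2)$ is already diagonal, so $V = I$, $\sigma_1 = \sqrt 6$, and $\sigma_2 = \sqrt 2$.

```python
import numpy as np

A = np.array([[1., -1.],
              [2., 0.],
              [1., 1.]])

print(A.T @ A)                      # [[6. 0.] [0. 2.]] -- already diagonal

V = np.eye(2)                       # eigenvectors of a diagonal matrix
sigma = np.sqrt(np.array([6., 2.]))
u1 = A @ V[:, 0] / sigma[0]         # (1, 2, 1)/sqrt(6)
u2 = A @ V[:, 1] / sigma[1]         # (-1, 0, 1)/sqrt(2)
u3 = np.cross(u1, u2)               # unit vector completing the basis of R^3

U = np.column_stack([u1, u2, u3])
Sigma = np.vstack([np.diag(sigma), np.zeros((1, 2))])
print(np.allclose(A, U @ Sigma @ V.T))   # True
```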