Chapter 9 Linear Algebra

Python (like other numerical computation languages) maps the mathematical concepts of matrix and vector to corresponding data structures and functions. This is very fortunate, because apart from a limited number of simple tasks, matrix manipulation is pretty much impossible without computers. Here we discuss numpy only, but Python offers other, more or less compatible frameworks, including TensorFlow and PyTorch, that also provide vectors and matrices.

9.1 Numpy Arrays as Vectors and Matrices

The basic data structure that corresponds to matrices and vectors is the numpy array. One-dimensional arrays are just vectors, two-dimensional arrays are matrices (and higher-dimensional arrays are tensors).
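For instance, the ndim attribute shows the dimensionality of an array; a quick check (importing numpy first, as it is used throughout this chapter):

import numpy as np

v = np.array([1, 2, 3])         # 1-D array: a vector
M = np.array([[1, 2], [3, 4]])  # 2-D array: a matrix
v.ndim, M.ndim
## (1, 2)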

9.1.1 1-D Arrays as Vectors

Numpy 1-D arrays can be used as vectors in \(\mathbb{R}^n\) and they implement all basic operations: addition, scalar multiplication, and the inner (dot) product.

9.1.1.1 Creating vectors

We create three example vectors:

a = np.array([1, 2, 3, 4])
a
## array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([9, 10, 11, 12])

9.1.1.2 Vector operations

Vector addition and scalar multiplication can be done directly with the vectorized operators +, - and *:

a + b
## array([ 6,  8, 10, 12])
2*a
## array([2, 4, 6, 8])

We can also demonstrate that \(\boldsymbol{a}\), \(\boldsymbol{b}\) and \(\boldsymbol{c}\) are not linearly independent by showing that \(2\boldsymbol{b} - \boldsymbol{c} - \boldsymbol{a} = \boldsymbol{0}\):

2*b - c - a
## array([0, 0, 0, 0])

Note that the answer is not a number but a length-4 vector of zeros.
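If the components were floating-point numbers, such an exact comparison with zero might fail because of rounding; a more robust check (a small sketch) is np.allclose:

np.allclose(2*b - c - a, 0)  # True if all components are (numerically) zero
## True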

Exercise 9.1 Word embeddings are low-dimensional vectors that describe human words. They are computed from words’ co-occurrences in texts. As similar words tend to occur in similar contexts, they tend to have similar embedding vectors. Interestingly, one can also do certain mathematical operations with embedding vectors.

Below are the first five components (out of 100) for Berlin, Germany, France, and Paris (based on Stanford’s glove.twitter.27B.100d.txt):

word          1       2       3       4       5
Berlin   -0.562   0.630  -0.453  -0.299  -0.006
Germany   0.194   0.507   0.287   0.132  -0.281
France    0.605  -0.678  -0.436  -0.019  -0.291
Paris    -0.074  -0.855  -0.689  -0.057  -0.139

Compute Berlin - Germany + France. How close do you get to Paris? (Compute the difference between the expression and Paris).
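A possible way to set this up (a sketch only, not the book’s solution): enter the components from the table as arrays and measure the length of the deviation from Paris with the Euclidean norm.

berlin  = np.array([-0.562,  0.630, -0.453, -0.299, -0.006])
germany = np.array([ 0.194,  0.507,  0.287,  0.132, -0.281])
france  = np.array([ 0.605, -0.678, -0.436, -0.019, -0.291])
paris   = np.array([-0.074, -0.855, -0.689, -0.057, -0.139])
diff = berlin - germany + france - paris  # deviation from Paris
np.sqrt(diff @ diff)                      # Euclidean length of the deviation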

See the solution

9.1.1.3 Vector (matrix) product

Matrix multiplication (the dot product) can be done with the @ symbol as the multiplication sign; there is also the np.dot function and the .dot method. For instance, we create vectors \(\boldsymbol{a} = (1, 2)\) and \(\boldsymbol{b} = (11, 12)^T\):

a = np.array([1, 2])  # row vector
b = np.array([[11], [12]])  # column vector

and compute

a @ b  # inner (matrix) product
## array([35])
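The function and method forms give the same result:

np.dot(a, b)  # function form
## array([35])
a.dot(b)  # method form
## array([35])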

It is important to keep in mind that the ordinary multiplication sign * is not the matrix product but elementwise multiplication, with certain rules (broadcasting) to handle mismatching dimensions. For instance

a * b  # elementwise product (broadcast as dimensions do not match)
## array([[11, 22],
##        [12, 24]])

This is not the matrix product! Note also that we did not get an error here (although we sometimes may), as the computation is still valid under the broadcasting rules, just not what we want here.
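When the shapes cannot be broadcast at all, numpy does raise an error. A sketch (the exact message may differ between numpy versions):

d = np.array([1, 2, 3])
a * d  # shapes (2,) and (3,) are incompatible
## ValueError: operands could not be broadcast together with shapes (2,) (3,)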

Exercise 9.2

Compute the following vector norms using basic numpy, without the dedicated np.linalg.norm() function (a generic pattern is sketched after this exercise):
  • Euclidean norm of vector \((1,1)\)
  • Euclidean norm of \((1, 2, 2)\)
  • Euclidean norm of \((3, 2, 0, 2, 0, 2, 0, 2)\) using matrix product (inner product), not element-wise product.
  • Manhattan norm of \((1,1)\)
  • Chessboard norm of \((2,1)\)

Now repeat the same with np.linalg.norm().
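As a hint of the general pattern (using an unrelated example vector, so the exercise itself is left to the reader): the Euclidean norm is the square root of the sum of squared components, the Manhattan norm is the sum of absolute values, and the chessboard (Chebyshev) norm is the largest absolute value; np.linalg.norm() selects between these through its ord argument.

v = np.array([3, -4])  # example vector, not from the exercise
np.sqrt(np.sum(v**2))  # Euclidean norm
## 5.0
np.sqrt(v @ v)  # the same, through the inner product
## 5.0
np.sum(np.abs(v))  # Manhattan norm
## 7
np.max(np.abs(v))  # chessboard (Chebyshev) norm
## 4
np.linalg.norm(v)  # Euclidean norm is the default
## 5.0
np.linalg.norm(v, ord=1)  # Manhattan norm
## 7.0
np.linalg.norm(v, ord=np.inf)  # chessboard norm
## 4.0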

Exercise 9.3 Compute the normalized versions of vectors in the previous exercise, using the given norms.
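As a hint, normalizing means dividing a vector by its norm; with the example vector above:

v / np.sqrt(v @ v)  # unit vector in the direction of v
## array([ 0.6, -0.8])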

9.1.2 2-D Arrays as Matrices

In a similar fashion as we used 1-D arrays as vectors, we can use 2-D arrays as matrices. The operators +, - and * work as expected for two types of operations: elementwise addition, subtraction, and multiplication of matrices; and adding/subtracting a constant or multiplying by a scalar. Here are a few examples:

A = np.array([[1, 2], [3, 4]])
A + 10  # add a constant (broadcast as a constant matrix of the right size)
## array([[11, 12],
##        [13, 14]])
A*10  # scalar multiplication
## array([[10, 20],
##        [30, 40]])

These are intuitive results and do not usually create problems.

Exactly as in the case of vectors, the matrix product is computed with @, not with *; the latter performs elementwise multiplication.
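To illustrate the difference (a small sketch, re-using A from above):

B = np.array([[1, 0], [0, 2]])
A @ B  # matrix product
## array([[1, 4],
##        [3, 8]])
A * B  # elementwise product: a different result
## array([[1, 0],
##        [0, 8]])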

There are a few other handy ways to create matrices: np.eye creates a unit (identity) matrix of a given size:

np.eye(4)  # unit matrix of size 4
## array([[1., 0., 0., 0.],
##        [0., 1., 0., 0.],
##        [0., 0., 1., 0.],
##        [0., 0., 0., 1.]])

np.diag either creates a diagonal matrix out of a given vector, or, if given a matrix, returns its diagonal:

np.diag([1, 2, 3])  # create a diagonal matrix
## array([[1, 0, 0],
##        [0, 2, 0],
##        [0, 0, 3]])
A = np.arange(12).reshape((3,4))
A  # note: non-square matrix
## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])
np.diag(A)  # return diagonal of the matrix
## array([ 0,  5, 10])

Obviously, one can use all the other functions that create arrays, e.g. np.ones and np.zeros.
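For instance:

np.ones((2, 3))  # 2 x 3 matrix of ones
## array([[1., 1., 1.],
##        [1., 1., 1.]])
np.zeros((2, 2))  # 2 x 2 matrix of zeros
## array([[0., 0.],
##        [0., 0.]])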

Finally, one can transpose matrices using the .T attribute. Note that .T has no effect on 1-D arrays, so to manually create a column vector one should start from a 2-D row vector (a \(1\times n\) matrix) and transpose that:

np.array([[1, 2, 3, 4]]).T  # column vector 1, 2, 3, 4
## array([[1],
##        [2],
##        [3],
##        [4]])
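Equivalently, one can reshape a 1-D array into a single column (a small sketch):

(1 + np.arange(4)).reshape(-1, 1)  # the same column vector via reshape
## array([[1],
##        [2],
##        [3],
##        [4]])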

Exercise 9.4 Create matrices \(\mathsf{A} = \begin{pmatrix} 1 & 1\\ 1 & 11 \end{pmatrix}\) and \(\mathsf{I}\), \(2\times2\) unit matrix. Compute matrix product \(\mathsf{A}\cdot\mathsf{I}\) and elementwise product \(\mathsf{A}\odot\mathsf{I}\).

See the solution

Exercise 9.5 Create row vectors \[\begin{equation*} \boldsymbol{x}_{1}^{T} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \end{pmatrix} \qquad \boldsymbol{x}_{2}^{T} = \begin{pmatrix} -1 & 2 & -3 & 4 & -5 \end{pmatrix} \qquad \boldsymbol{x}_{3}^{T} = \begin{pmatrix} 5 & 4 & 3 & 2 & 1 \end{pmatrix} \end{equation*}\]

Note: if \(\boldsymbol{x}^{T}\) is a row vector, then \(\boldsymbol{x}\) is a column vector, i.e. \[\begin{equation*} \boldsymbol{x}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix} \end{equation*}\]

Stack all three row vectors into a matrix \[\begin{equation*} \mathsf{X} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ -1 & 2 & -3 & 4 & -5\\ 5 & 4 & 3 & 2 & 1\\ \end{pmatrix} \end{equation*}\]

Create column vector \[\begin{equation*} \boldsymbol{\beta} = \begin{pmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{pmatrix}^{T} \end{equation*}\]

Remember: \[\begin{equation*} \begin{pmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{pmatrix}^{T} = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \end{pmatrix} \end{equation*}\] Now compute:

  • \(\boldsymbol{x}_1^T \cdot \boldsymbol{\beta}\)
  • \(\boldsymbol{x}_2^T \cdot \boldsymbol{\beta}\)
  • \(\boldsymbol{x}_3^T \cdot \boldsymbol{\beta}\)
  • \(\mathsf{X} \cdot \boldsymbol{\beta}\)

See the solution

9.2 Matrix product

TBD