Chapter 9 Linear Algebra

Python (and other numerical computation languages) maps the mathematical concepts of matrix and vector to corresponding data structures and functions. This is fortunate, because beyond a limited number of simple tasks, matrix manipulation is practically impossible without computers. Here we discuss numpy only, but python includes other, more-or-less compatible frameworks, including tensorflow and pytorch, that also provide vectors and matrices.

9.1 Numpy Arrays as Vectors and Matrices

The basic data structures that correspond to matrices and vectors are numpy arrays. One-dimensional arrays are vectors, two-dimensional arrays are matrices (and higher-dimensional arrays are tensors).
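The dimensionality of an array can be inspected with its ndim and shape attributes; a small sketch:

```python
import numpy as np

v = np.array([1, 2, 3])         # 1-D array: a vector
M = np.array([[1, 2], [3, 4]])  # 2-D array: a matrix
T = np.zeros((2, 3, 4))         # 3-D array: a tensor

print(v.ndim, v.shape)  # 1 (3,)
print(M.ndim, M.shape)  # 2 (2, 2)
print(T.ndim, T.shape)  # 3 (2, 3, 4)
```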

9.1.1 1-D Arrays as Vectors

Numpy 1-D arrays can be used as vectors in \(\mathbb{R}^n\) and they implement all basic operations: addition, scalar multiplication, and the inner (dot) product.

9.1.1.1 Creating vectors

We create three example vectors:

import numpy as np

a = np.array([1, 2, 3, 4])
a
## array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([9, 10, 11, 12])

9.1.1.2 Vector operations

Vector addition, subtraction, and scalar multiplication can be done with the vectorized operators +, - and *:

a + b
## array([ 6,  8, 10, 12])
2*a
## array([2, 4, 6, 8])

We can also demonstrate that \(\boldsymbol{a}\), \(\boldsymbol{b}\) and \(\boldsymbol{c}\) are not linearly independent by showing that \(2\boldsymbol{b} - \boldsymbol{c} - \boldsymbol{a} = \boldsymbol{0}\):

2*b - c - a
## array([0, 0, 0, 0])

Note that the answer is not a number but a length-4 vector of zeros.

Exercise 9.1 Word embeddings are low-dimensional vectors that describe human words. They are computed based on words’ co-occurrence in texts. As similar words tend to occur in similar contexts, they tend to have similar embedding vectors. Interestingly, one can also do certain mathematical operations with embedding vectors.

Below are the first five components (out of 100) for Berlin, Germany, France, and Paris (based on Stanford’s glove.twitter.27B.100d.txt):

word         1       2       3       4       5
Berlin   -0.562   0.630  -0.453  -0.299  -0.006
Germany   0.194   0.507   0.287   0.132  -0.281
France    0.605  -0.678  -0.436  -0.019  -0.291
Paris    -0.074  -0.855  -0.689  -0.057  -0.139

Compute Berlin - Germany + France. How close do you get to Paris? (Compute the difference between the expression and Paris).

See the solution

9.1.1.3 Vector (matrix) product

Matrix multiplication (the dot product) can be done with the @ operator; there are also the np.dot function and the .dot method. For instance, we create vectors \(\boldsymbol{a} = (1, 2)\) and \(\boldsymbol{b} = (11, 12)^T\):

a = np.array([1, 2])  # row vector
b = np.array([[11], [12]])  # column vector

and compute

a @ b  # inner (matrix) product
## array([35])

It is important to keep in mind that the ordinary multiplication sign * is not the matrix product but elementwise multiplication, with certain rules for handling dimension mismatches (broadcasting). For instance:

a * b  # elementwise product (broadcast as dimensions do not match)
## array([[11, 22],
##        [12, 24]])

This is not the matrix product! Note also that we did not get an error here (although we sometimes may), as the computation is still valid; it is just not what we want here.
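As noted above, the @ operator, the np.dot function, and the .dot method all compute the same product; a quick sketch confirming they agree:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([[11], [12]])

r1 = a @ b         # operator form
r2 = np.dot(a, b)  # function form
r3 = a.dot(b)      # method form
print(r1, r2, r3)  # each is array([35])
```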

Exercise 9.2

Compute the following vector norms using basic numpy, without using the dedicated np.linalg.norm() function:
  • Euclidean norm of vector \((1,1)\)
  • Euclidean norm of \((1, 2, 2)\)
  • Euclidean norm of \((3, 2, 0, 2, 0, 2, 0, 2)\) using matrix product (inner product), not element-wise product.
  • Manhattan norm of \((1,1)\)
  • Chessboard norm of \((2,1)\)

Now repeat the same with np.linalg.norm().

Exercise 9.3 Compute the normalized versions of vectors in the previous exercise, using the given norms.

9.1.2 2-D Arrays as Matrices

In a similar fashion as we used 1-D arrays as vectors, we can use 2-D arrays as matrices.¹ The operations +, -, and * work as expected for elementwise addition, subtraction, and multiplication. They can also be used for adding and subtracting constants, and for multiplying by a scalar. Here are a few examples:

A = np.array([[1, 2], [3, 4]])
A + 10  # add a constant to every element (broadcast)
## array([[11, 12],
##        [13, 14]])
A*10  # scalar multiplication
## array([[10, 20],
##        [30, 40]])

These are intuitive results and do not usually create problems.

There are a few other handy ways to create matrices: np.eye creates a unit matrix of a given size:

np.eye(4)  # unit matrix of size 4
## array([[1., 0., 0., 0.],
##        [0., 1., 0., 0.],
##        [0., 0., 1., 0.],
##        [0., 0., 0., 1.]])

np.diag either creates a diagonal matrix of a given vector, or if given a matrix, returns its diagonal:

np.diag([1, 2, 3])  # create a diagonal matrix
## array([[1, 0, 0],
##        [0, 2, 0],
##        [0, 0, 3]])
A = np.arange(12).reshape((3,4))
A  # note: non-square matrix
## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])
np.diag(A)  # return diagonal of the matrix
## array([ 0,  5, 10])

Obviously, for making matrices you can use all other functions that create arrays, such as np.ones and np.zeros.
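For instance, np.zeros and np.ones create matrices of zeros and ones when given a 2-D shape, and np.full (a related creation function) fills a matrix with a given value:

```python
import numpy as np

Z = np.zeros((2, 3))    # 2x3 matrix of zeros
O = np.ones((2, 2))     # 2x2 matrix of ones
F = np.full((2, 2), 7)  # 2x2 matrix where every element is 7
print(Z.shape, O.sum(), F[0, 0])  # (2, 3) 4.0 7
```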

Finally, one can transpose matrices using the .T attribute. For instance:

A = np.arange(4).reshape((2,2))
A
## array([[0, 1],
##        [2, 3]])
A.T
## array([[0, 2],
##        [1, 3]])
B = A.reshape((1,4))  # row vector (1x4 matrix)
B
## array([[0, 1, 2, 3]])
B.T  # column vector (4x1 matrix)
## array([[0],
##        [1],
##        [2],
##        [3]])
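One caveat worth noting: .T has no effect on a 1-D array, because there is no second axis to swap. To get a column vector out of a 1-D array, reshape it into a 2-D array first. A small sketch:

```python
import numpy as np

v = np.arange(4)                # 1-D array
print(v.T.shape)                # (4,): .T does nothing here
print(v.reshape((4, 1)).shape)  # (4, 1): an explicit column vector
```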

9.2 Matrix operations

Many common matrix operations work elementwise; this includes addition, subtraction, and multiplication with *. For instance:

A = np.array([[1, 2], [3, 4]])
B = np.array([[11, 12], [13, 14]])
A - B
## array([[-10, -10],
##        [-10, -10]])
A*B 
## array([[11, 24],
##        [39, 56]])

However, elementwise operations between matrices and vectors are less intuitive. Consider the same matrix A as above:

A = np.array([[1, 2], [3, 4]])
A
## array([[1, 2],
##        [3, 4]])

and you want to divide each row of it by a particular value: the first row by 2 and the second row by 3. Hence you may want to write

v = np.array([2, 3])
A/v
## array([[0.5       , 0.66666667],
##        [1.5       , 1.33333333]])

However, this divides the first column of A by 2 and the second column by 3. If you want the division to be done row-wise, you need to transform v into a column vector:

v = np.array([[2], [3]])
v
## array([[2],
##        [3]])
A/v
## array([[0.5       , 1.        ],
##        [1.        , 1.33333333]])
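Besides typing the column vector explicitly, common ways to convert a 1-D array into a column vector include .reshape() and indexing with np.newaxis (or its alias None); a sketch showing that they give the same result:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([2, 3])

r1 = A / v.reshape((2, 1))  # reshape into a 2x1 column
r2 = A / v[:, np.newaxis]   # add a new axis of length 1
r3 = A / v[:, None]         # None is an alias for np.newaxis
print(r1)
```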

Exercise 9.4

Use the same matrix A as above. Use matrix-vector operations to
  • multiply its first column by 10 and the second column by 100.
  • add 1 to its first row and 2 to its second row.

The solution

9.3 Matrix product

Exactly as in the case of vectors, the matrix product is done with @, not with *; the latter is elementwise multiplication.² For instance:

A = np.array([[1, 2, 3], [3, 4, 5]])
A
## array([[1, 2, 3],
##        [3, 4, 5]])
B = np.array([[-1], [0], [-2]])
B
## array([[-1],
##        [ 0],
##        [-2]])

Their matrix product is

A @ B
## array([[ -7],
##        [-13]])

If you get the dimensions wrong, numpy will give an error:

B @ A
## ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1)
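Inspecting the .shape attribute is an easy way to anticipate such errors: the inner dimensions of the two operands must match. A small sketch:

```python
import numpy as np

A = np.array([[1, 2, 3], [3, 4, 5]])
B = np.array([[-1], [0], [-2]])

print(A.shape, B.shape)  # (2, 3) (3, 1): inner dimensions match
C = A @ B                # valid, result is 2x1
try:
    B @ A                # inner dimensions 1 and 2 do not match
except ValueError:
    print("B @ A is not defined")
```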

Exercise 9.5 Create matrices \(\mathsf{A} = \begin{pmatrix} 1 & 1\\ 1 & 11 \end{pmatrix}\) and \(\mathsf{I}\), \(2\times2\) unit matrix. Compute matrix product \(\mathsf{A}\cdot\mathsf{I}\) and elementwise product \(\mathsf{A}\odot\mathsf{I}\).

See the solution

Exercise 9.6 Create row vectors \[\begin{equation*} \boldsymbol{x}_{1}^{T} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \end{pmatrix} \qquad \boldsymbol{x}_{2}^{T} = \begin{pmatrix} -1 & 2 & -3 & 4 & -5 \end{pmatrix} \qquad \boldsymbol{x}_{3}^{T} = \begin{pmatrix} 5 & 4 & 3 & 2 & 1 \end{pmatrix} \end{equation*}\] Note: if \(\boldsymbol{x}^{T}\) is a row vector, then \(\boldsymbol{x}\) is a column vector, i.e. \[\begin{equation} \boldsymbol{x}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix} \end{equation}\] Stack all three row vectors into a matrix (create this by stacking the row vectors): \[\begin{equation*} \mathsf{X} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ -1 & 2 & -3 & 4 & -5\\ 5 & 4 & 3 & 2 & 1\\ \end{pmatrix} \end{equation*}\] Create column vector \[\begin{equation*} \qquad \boldsymbol{\beta} = \begin{pmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{pmatrix}^{T} \end{equation*}\]

Remember: \[\begin{equation} \begin{pmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{pmatrix}^{T} = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \end{pmatrix} \end{equation}\] Now compute:

  • \(\boldsymbol{x}_1^T \cdot \boldsymbol{\beta}\)
  • \(\boldsymbol{x}_2^T \cdot \boldsymbol{\beta}\)
  • \(\boldsymbol{x}_3^T \cdot \boldsymbol{\beta}\)
  • \(\mathsf{X} \cdot \boldsymbol{\beta}\)

See the solution

Exercise 9.7 Let:

X = np.array([[1, 1], [2, 2], [3, 3]])
y = np.array([[1], [2], [3]])

Explain the difference between

b2 = np.array([[1], [1]])
y - X @ b2
## array([[-1],
##        [-2],
##        [-3]])

and

b1 = np.array([1, 1])
y - X @ b1
## array([[-1, -3, -5],
##        [ 0, -2, -4],
##        [ 1, -1, -3]])

Why does numpy give different answers?

The solution

9.4 Inverse matrix

Inverse matrix can be computed with np.linalg.inv():

A = 1 + np.arange(4).reshape((2,2))
A
## array([[1, 2],
##        [3, 4]])
np.linalg.inv(A)
## array([[-2. ,  1. ],
##        [ 1.5, -0.5]])

As we hardly have any experience with matrix operations, we usually cannot evaluate whether the result looks reasonable. But you can test that

np.linalg.inv(A) @ A
## array([[1.00000000e+00, 0.00000000e+00],
##        [1.11022302e-16, 1.00000000e+00]])

results in (almost) the unit matrix. More specifically, you can see a numeric error: the bottom-left element is not “0” but “1.11e-16”. One can frequently see such errors when performing matrix operations.
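Rather than eyeballing such residuals, one can compare the product against the unit matrix with np.allclose, which tests equality up to floating-point tolerance; a quick sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
Ainv = np.linalg.inv(A)

# both products should equal the unit matrix up to rounding error
print(np.allclose(Ainv @ A, np.eye(2)))  # True
print(np.allclose(A @ Ainv, np.eye(2)))  # True
```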


  1. Numpy also has a distinct data type, matrix. It behaves mostly in a similar fashion as array, except for the multiplication operator *, which denotes the matrix product for matrices. This may create quite a bit of confusion, as seemingly similar objects behave differently. We will (mostly) stay with numpy arrays and not discuss numpy matrices in these notes.↩︎

  2. Fine print applies. * is elementwise product for numpy arrays. For numpy matrices, it is matrix product! This is one reason to avoid matrices and to stay consistently with arrays.↩︎