Chapter 9 Linear Algebra
Python (like other numerical computation languages) maps the mathematical concepts of matrix and vector to corresponding data structures and functions. This is very fortunate, because matrix manipulation is pretty much impossible without computers, apart from a limited number of simple tasks. Here we discuss numpy only, but python includes other, more-or-less compatible frameworks, including tensorflow and pytorch, that provide vectors and matrices.
9.1 Numpy Arrays as Vectors and Matrices
The basic data structures that correspond to matrices and vectors are numpy arrays. One-dimensional arrays are just vectors, two-dimensional arrays are matrices (and higher-dimensional arrays are tensors).
9.1.1 1-D Arrays as Vectors
Numpy 1-D arrays can be used as vectors in \(\mathbb{R}^n\) and they implement all basic operations: addition, scalar multiplication, and the inner (dot) product.
9.1.1.1 Creating vectors
We create three example vectors:
import numpy as np

a = np.array([1, 2, 3, 4])
a
## array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([9, 10, 11, 12])
9.1.1.2 Vector operations
Vector addition, subtraction, and scalar multiplication can be done using the vectorized operators `+`, `-`, and `*`:
a + b
## array([ 6, 8, 10, 12])
2*a
## array([2, 4, 6, 8])
We can also demonstrate that \(\boldsymbol{a}\), \(\boldsymbol{b}\) and \(\boldsymbol{c}\) are not linearly independent by showing that \(2\boldsymbol{b} - \boldsymbol{c} - \boldsymbol{a} = \boldsymbol{0}\):
2*b - c - a
## array([0, 0, 0, 0])
Note that the answer is not a number but a length-4 vector of zeros.
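Rather than eyeballing the zeros, we can let numpy confirm the result. A minimal sketch using `np.allclose`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([9, 10, 11, 12])

# compare the result with the zero vector, allowing for tiny numeric errors
print(np.allclose(2*b - c - a, 0))  # True
```

`np.allclose` is particularly useful once floating-point arithmetic is involved, where exact zeros are rare.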
Exercise 9.1 Word embeddings are low-dimensional vectors that describe human words. They are computed based on words’ co-occurrence in texts. As similar words tend to occur in similar contexts, they tend to have similar embedding vectors. Interestingly, one can also do certain mathematical operations with embedding vectors.
Below are the first five components (out of 100) for Berlin, Germany, France, and Paris (based on Stanford’s glove.twitter.27B.100d.txt):
word | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Berlin | -0.562 | 0.630 | -0.453 | -0.299 | -0.006 |
Germany | 0.194 | 0.507 | 0.287 | 0.132 | -0.281 |
France | 0.605 | -0.678 | -0.436 | -0.019 | -0.291 |
Paris | -0.074 | -0.855 | -0.689 | -0.057 | -0.139 |
Compute Berlin - Germany + France. How close do you get to Paris? (Compute the difference between the expression and Paris).
See the solution
9.1.1.3 Vector (matrix) product
Matrix multiplication (dot product) can be done with the `@` symbol as the multiplication sign; there is also the `np.dot` function and the `.dot` method. For instance, we create vectors \(\boldsymbol{a} = (1, 2)\) and
\(\boldsymbol{b} = (11, 12)^T\):
a = np.array([1, 2]) # row vector
b = np.array([[11], [12]]) # column vector
and compute
a @ b # inner (matrix) product
## array([35])
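The same product can be computed with the `np.dot` function or the `.dot` method mentioned above; a quick sketch of the three equivalent spellings:

```python
import numpy as np

a = np.array([1, 2])        # row vector
b = np.array([[11], [12]])  # column vector

# three equivalent ways to compute the same product
print(a @ b)         # prints [35]
print(np.dot(a, b))  # prints [35]
print(a.dot(b))      # prints [35]
```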
It is important to keep in mind that the ordinary multiplication sign `*` is not the matrix product but elementwise multiplication, with certain rules to handle mismatch in dimension (broadcasting). For instance
a * b # elementwise product (broadcast as dimensions do not match)
## array([[11, 22],
## [12, 24]])
This is not the matrix product! Note also that numpy did not give an error here (although it sometimes does), as the computation is still valid, just not what we want here.
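To see what broadcasting did here, it may help to inspect the shapes of the operands; a minimal sketch:

```python
import numpy as np

a = np.array([1, 2])        # shape (2,)
b = np.array([[11], [12]])  # shape (2, 1)

# broadcasting stretches both operands to the common shape (2, 2),
# so the result is the outer, not the inner, product of the two vectors
print((a * b).shape)  # prints (2, 2)
print(a * b)
```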
Exercise 9.2
Compute the following vector norms using basic numpy, without using the dedicated `np.linalg.norm()` function:
- Euclidean norm of vector \((1,1)\)
- Euclidean norm of \((1, 2, 2)\)
- Euclidean norm of \((3, 2, 0, 2, 0, 2, 0, 2)\) using matrix product (inner product), not element-wise product.
- Manhattan norm of \((1,1)\)
- Chessboard norm of \((2,1)\)
Now repeat the same with `np.linalg.norm()`.
Exercise 9.3 Compute the normalized versions of vectors in the previous exercise, using the given norms.
9.1.2 2-D Arrays as Matrices
In a similar fashion as we used 1-D arrays as vectors, we can use 2-D arrays as matrices.6
The operations `+`, `-`, and `*` work as expected: elementwise addition, elementwise subtraction, and elementwise multiplication. They can also be used for adding and subtracting constants, and for multiplying with a scalar. Here are a few examples:
A = np.array([[1, 2], [3, 4]])
A + 10 # add constant matrix of correct size
## array([[11, 12],
## [13, 14]])
A*10 # scalar multiplication
## array([[10, 20],
## [30, 40]])
These are intuitive results and do not usually create problems.
There are a few other handy ways to create matrices: `np.eye` creates a unit matrix of a given size:
np.eye(4) # unit matrix of size 4
## array([[1., 0., 0., 0.],
## [0., 1., 0., 0.],
## [0., 0., 1., 0.],
## [0., 0., 0., 1.]])
`np.diag` either creates a diagonal matrix from a given vector, or, if given a matrix, returns its diagonal:
np.diag([1, 2, 3]) # create a diagonal matrix
## array([[1, 0, 0],
## [0, 2, 0],
## [0, 0, 3]])
A = np.arange(12).reshape((3,4)) # note: non-square matrix
A
## array([[ 0, 1, 2, 3],
## [ 4, 5, 6, 7],
## [ 8, 9, 10, 11]])
np.diag(A) # return diagonal of the matrix
## array([ 0, 5, 10])
Obviously, for making matrices you can use all other functions that create arrays, such as `np.ones` and `np.zeros`.
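For instance, a short sketch using `np.zeros` and `np.ones`, which take the desired shape as a tuple:

```python
import numpy as np

# a 2x3 matrix of zeros and a 2x2 matrix of ones
print(np.zeros((2, 3)))
print(np.ones((2, 2)))
```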
Finally, one can transpose matrices using the `.T` attribute. For instance:
A = np.arange(4).reshape((2,2))
A
## array([[0, 1],
## [2, 3]])
A.T
## array([[0, 2],
## [1, 3]])
B = A.reshape((1,4)) # row vector (1x4 matrix)
B
## array([[0, 1, 2, 3]])
B.T # column vector (4x1 matrix)
## array([[0],
## [1],
## [2],
## [3]])
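Note that `.T` only has an effect on 2-D arrays; transposing a 1-D array is a no-op, which is a common source of confusion. A small sketch:

```python
import numpy as np

v = np.array([0, 1, 2, 3])      # 1-D array, shape (4,)
print(v.T.shape)                # prints (4,): unchanged, no row/column distinction
print(v.reshape((4, 1)).shape)  # prints (4, 1): an explicit column vector
```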
9.2 Matrix operations
Many common matrix operations, such as addition, subtraction, and `*`-multiplication, work elementwise. For instance
A = np.array([[1, 2], [3, 4]])
B = np.array([[11, 12], [13, 14]])
A - B
## array([[-10, -10],
## [-10, -10]])
A*B
## array([[11, 24],
## [39, 56]])
However, elementwise operations between matrices and vectors are less intuitive. Consider the same matrix A as above:
A = np.array([[1, 2], [3, 4]])
A
## array([[1, 2],
## [3, 4]])
and you want to divide each of its rows by a particular value: the first row by 2 and the second row by 3. Hence you may want to write
v = np.array([2, 3])
A/v
## array([[0.5 , 0.66666667],
## [1.5 , 1.33333333]])
However, this results in the first column of A divided by “2” and the second column by “3”. If you want the division to be done row-wise, you need to transform v into a column vector:
v = np.array([[2], [3]])
v
## array([[2],
## [3]])
A/v
## array([[0.5 , 1. ],
## [1. , 1.33333333]])
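Instead of typing the column vector by hand, one can reshape the original 1-D vector; a sketch of two equivalent ways, assuming the same A and v as above:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([2, 3])

# turn v into a column vector so the division is done row-wise
print(A / v.reshape((2, 1)))
print(A / v[:, np.newaxis])  # same result
```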
Exercise 9.4
Use the same matrix A as above. Use matrix-vector operations to
- multiply its first column by 10 and the second column by 100.
- add 1 to its first row and 2 to its second row.
9.3 Matrix product
Exactly as in the case of vectors, the matrix product is done with `@` and not with `*`; the latter is elementwise multiplication.7 For instance:
A = np.array([[1, 2, 3], [3, 4, 5]])
A
## array([[1, 2, 3],
## [3, 4, 5]])
B = np.array([[-1], [0], [-2]])
B
## array([[-1],
## [ 0],
## [-2]])
Their matrix product is
A @ B
## array([[ -7],
## [-13]])
If you get the dimensions wrong, numpy will give an error:
B @ A
## ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1)
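When debugging such errors, a good first step is to print the `.shape` attributes and check that the inner dimensions match; a quick sketch with the same A and B as above:

```python
import numpy as np

A = np.array([[1, 2, 3], [3, 4, 5]])
B = np.array([[-1], [0], [-2]])

print(A.shape, B.shape)  # prints (2, 3) (3, 1)
# A @ B works: the inner dimensions (3 and 3) match, the result is 2x1
# B @ A fails: the inner dimensions (1 and 2) do not match
print((A @ B).shape)     # prints (2, 1)
```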
Exercise 9.5 Create matrices \(\mathsf{A} = \begin{pmatrix} 1 & 1\\ 1 & 11 \end{pmatrix}\) and \(\mathsf{I}\), \(2\times2\) unit matrix. Compute matrix product \(\mathsf{A}\cdot\mathsf{I}\) and elementwise product \(\mathsf{A}\odot\mathsf{I}\).
See the solution
Exercise 9.6 Create row vectors \[\begin{equation*} \boldsymbol{x}_{1}^{T} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \end{pmatrix} \qquad \boldsymbol{x}_{2}^{T} = \begin{pmatrix} -1 & 2 & -3 & 4 & -5 \end{pmatrix} \qquad \boldsymbol{x}_{3}^{T} = \begin{pmatrix} 5 & 4 & 3 & 2 & 1 \end{pmatrix} \end{equation*}\] Note: if \(\boldsymbol{x}^{T}\) is a row vector, then \(\boldsymbol{x}\) is a column vector, i.e. \[\begin{equation} \boldsymbol{x}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix} \end{equation}\] Stack all three row vectors into a matrix (create this by stacking the row vectors): \[\begin{equation*} \mathsf{X} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ -1 & 2 & -3 & 4 & -5\\ 5 & 4 & 3 & 2 & 1\\ \end{pmatrix} \end{equation*}\] Create column vector \[\begin{equation*} \qquad \boldsymbol{\beta} = \begin{pmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{pmatrix}^{T} \end{equation*}\]
Remember: \[\begin{equation} \begin{pmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{pmatrix}^{T} = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \end{pmatrix} \end{equation}\] Now compute:
- \(\boldsymbol{x}_1^T \cdot \boldsymbol{\beta}\)
- \(\boldsymbol{x}_2^T \cdot \boldsymbol{\beta}\)
- \(\boldsymbol{x}_3^T \cdot \boldsymbol{\beta}\)
- \(\mathsf{X} \cdot \boldsymbol{\beta}\)
See the solution
Exercise 9.7 Let:
X = np.array([[1, 1], [2, 2], [3, 3]])
y = np.array([[1], [2], [3]])
Explain the difference between
b2 = np.array([[1], [1]])
y - X @ b2
## array([[-1],
## [-2],
## [-3]])
and
b1 = np.array([1, 1])
y - X @ b1
## array([[-1, -3, -5],
## [ 0, -2, -4],
## [ 1, -1, -3]])
Why does numpy give different answers?
9.4 Inverse matrix
The inverse matrix can be computed with `np.linalg.inv()`:
A = 1 + np.arange(4).reshape((2,2))
A
## array([[1, 2],
## [3, 4]])
np.linalg.inv(A)
## array([[-2. , 1. ],
## [ 1.5, -0.5]])
As we have hardly any experience with matrix operations, we usually cannot evaluate if the result looks reasonable. But you can test that
np.linalg.inv(A) @ A
## array([[1.00000000e+00, 0.00000000e+00],
## [1.11022302e-16, 1.00000000e+00]])
results in (almost) a unit matrix. More specifically, you can see a numeric error: the bottom-left element is not “0” but “1.11e-16”. One can frequently see such small errors when performing matrix operations.
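Instead of eyeballing such small numeric errors, one can compare the product with a unit matrix using `np.allclose`; a minimal sketch:

```python
import numpy as np

A = 1 + np.arange(4).reshape((2, 2))

# True if the product equals the unit matrix up to numeric tolerance
print(np.allclose(np.linalg.inv(A) @ A, np.eye(2)))  # True
```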
Numpy also has a distinct data type, matrix. It behaves mostly in a similar fashion as array, except for the multiplication operator: for matrices, `*` is the matrix product. This may create quite a bit of confusion, as seemingly similar objects behave differently. We will (mostly) stay with numpy arrays and not discuss numpy matrices in these notes.↩︎
Fine print applies: `*` is the elementwise product for numpy arrays. For numpy matrices, it is the matrix product! This is one reason to avoid matrices and to stay consistently with arrays.↩︎