Chapter 9 Linear Algebra
Python (like other numerical computation languages) maps the mathematical concepts of matrix and vector to corresponding data structures and functions. This is very fortunate, because apart from a limited number of simple tasks, matrix manipulation is all but impossible without computers. Here we discuss only numpy, but Python includes other, more-or-less compatible frameworks that provide vectors and matrices, including TensorFlow and PyTorch.
9.1 Numpy Arrays as Vectors and Matrices
The basic data structures that correspond to matrices and vectors are numpy arrays. One-dimensional arrays are just vectors, two-dimensional arrays are matrices (and higher-dimensional arrays are tensors).
9.1.1 1-D Arrays as Vectors
Numpy 1-D arrays can be used as vectors in \(\mathbb{R}^n\) and they implement all basic operations: addition, scalar multiplication, and the inner (dot) product.
9.1.1.1 Creating vectors
We create three example vectors:
import numpy as np

a = np.array([1, 2, 3, 4])
a
## array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = np.array([9, 10, 11, 12])
9.1.1.2 Vector operations
Vector addition and scalar multiplication can be done using the vectorized operators `+`, `-`, and `*`:
a + b
## array([ 6, 8, 10, 12])
2*a
## array([2, 4, 6, 8])
We can also demonstrate that \(\boldsymbol{a}\), \(\boldsymbol{b}\) and \(\boldsymbol{c}\) are not linearly independent by showing that \(2\boldsymbol{b} - \boldsymbol{c} - \boldsymbol{a} = \boldsymbol{0}\):
2*b - c - a
## array([0, 0, 0, 0])
Note that the answer is not a number but a length-4 vector of zeros.
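A quick way to check this programmatically is to test every component at once; a one-line sketch:

np.all(2*b - c - a == 0)  # True exactly when every component is zero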
Exercise 9.1 Word embeddings are low-dimensional vectors that describe human words. They are computed based on words’ co-occurrence in texts. As similar words tend to occur in similar contexts, they tend to have similar embedding vectors. Interestingly, one can also do certain mathematical operations with embedding vectors.
Below are the first five components (out of 100) for Berlin, Germany, France, and Paris (based on Stanford’s glove.twitter.27B.100d.txt):
word | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Berlin | -0.562 | 0.630 | -0.453 | -0.299 | -0.006 |
Germany | 0.194 | 0.507 | 0.287 | 0.132 | -0.281 |
France | 0.605 | -0.678 | -0.436 | -0.019 | -0.291 |
Paris | -0.074 | -0.855 | -0.689 | -0.057 | -0.139 |
Compute Berlin - Germany + France. How close do you get to Paris? (Compute the difference between the expression and Paris).
See the solution
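A possible setup for this exercise, using only the five components listed above (the variable names are just illustrative; the full exercise uses all 100 components):

berlin = np.array([-0.562, 0.630, -0.453, -0.299, -0.006])
germany = np.array([0.194, 0.507, 0.287, 0.132, -0.281])
france = np.array([0.605, -0.678, -0.436, -0.019, -0.291])
paris = np.array([-0.074, -0.855, -0.689, -0.057, -0.139])
berlin - germany + france - paris  # difference from the Paris vector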
9.1.1.3 Vector (matrix) product
Matrix multiplication (the dot product) can be done with the `@` symbol as the multiplication sign; there are also the `np.dot` function and the `.dot` method. For instance, we create vectors \(\boldsymbol{a} = (1, 2)\) and \(\boldsymbol{b} = (11, 12)^T\):
a = np.array([1, 2])  # row vector
b = np.array([[11], [12]])  # column vector
and compute
a @ b  # inner (matrix) product
## array([35])
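As noted above, the `np.dot` function and the `.dot` method compute the same product:

np.dot(a, b)  # function form
## array([35])
a.dot(b)  # method form
## array([35])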
It is important to keep in mind that the ordinary multiplication sign `*` is not the matrix product but elementwise multiplication, with certain rules (broadcasting) to handle mismatched dimensions. For instance:

a * b  # elementwise product (broadcast as dimensions do not match)
## array([[11, 22],
## [12, 24]])
This is not the matrix product! Note also that we did not get an error here (although in other cases we might), as the computation is still valid; it is just not what we want here.
Exercise 9.2 Compute the following vector norms using basic numpy, without using the dedicated `np.linalg.norm()` function:
- Euclidean norm of vector \((1,1)\)
- Euclidean norm of \((1, 2, 2)\)
- Euclidean norm of \((3, 2, 0, 2, 0, 2, 0, 2)\) using matrix product (inner product), not element-wise product.
- Manhattan norm of \((1,1)\)
- Chessboard norm of \((2,1)\)
Now repeat the same with `np.linalg.norm()`.
Exercise 9.3 Compute the normalized versions of vectors in the previous exercise, using the given norms.
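As a starting point for these two exercises, the norms are straightforward to express in basic numpy; a sketch with an arbitrary example vector (not one of the vectors from the exercises):

v = np.array([3, -4])  # arbitrary example vector
np.sqrt(v @ v)  # Euclidean norm via the inner product: 5.0
np.abs(v).sum()  # Manhattan norm: sum of absolute values, 7
np.abs(v).max()  # Chessboard norm: largest absolute value, 4
v / np.sqrt(v @ v)  # normalized to unit Euclidean norm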
9.1.2 2-D Arrays as Matrices
In a similar fashion as we used 1-D arrays as vectors, we can use 2-D arrays as matrices. The operations `+`, `-`, and `*` work as expected for two types of operations: elementwise addition, subtraction, and multiplication; and adding/subtracting a constant or multiplying by a scalar. Here are a few examples:
A = np.array([[1, 2], [3, 4]])
A + 10  # add constant matrix of correct size
## array([[11, 12],
## [13, 14]])
A * 10  # scalar multiplication
## array([[10, 20],
## [30, 40]])
These are intuitive results and do not usually create problems.
Exactly as in the case of vectors, the matrix product is done with `@` and not with `*`; the latter is elementwise multiplication.
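For instance, with the matrix `A` defined above (a brief comparison):

A @ A  # matrix product
## array([[ 7, 10],
##        [15, 22]])
A * A  # elementwise product: not the same thing!
## array([[ 1,  4],
##        [ 9, 16]])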
There are a few other handy ways to create matrices: `np.eye` creates a unit matrix of a given size:
np.eye(4)  # unit matrix of size 4
## array([[1., 0., 0., 0.],
## [0., 1., 0., 0.],
## [0., 0., 1., 0.],
## [0., 0., 0., 1.]])
`np.diag` either creates a diagonal matrix from a given vector or, if given a matrix, returns its diagonal:
np.diag([1, 2, 3])  # create a diagonal matrix
## array([[1, 0, 0],
## [0, 2, 0],
## [0, 0, 3]])
A = np.arange(12).reshape((3, 4))
A  # note: non-square matrix
## array([[ 0, 1, 2, 3],
## [ 4, 5, 6, 7],
## [ 8, 9, 10, 11]])
np.diag(A)  # return diagonal of the matrix
## array([ 0, 5, 10])
Obviously, one can use all the other functions that create arrays, e.g. `np.ones` and `np.zeros`.
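For instance (a quick illustration):

np.ones((2, 3))  # 2x3 matrix of ones
## array([[1., 1., 1.],
##        [1., 1., 1.]])
np.zeros((2, 2))  # 2x2 matrix of zeros
## array([[0., 0.],
##        [0., 0.]])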
Finally, one can transpose matrices using the `.T` attribute. Note that `.T` has no effect on 1-D arrays, so to manually create a column vector one can transpose a 2-D row vector:

np.array([[1, 2, 3, 4]]).T  # column vector 1, 2, 3, 4
## array([[1],
##        [2],
##        [3],
##        [4]])
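Equivalently (a small aside), one can reshape a 1-D array into a column vector:

(1 + np.arange(4)).reshape(-1, 1)  # the same column vector
## array([[1],
##        [2],
##        [3],
##        [4]])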
Exercise 9.4 Create matrices \(\mathsf{A} = \begin{pmatrix} 1 & 1\\ 1 & 11 \end{pmatrix}\) and \(\mathsf{I}\), \(2\times2\) unit matrix. Compute matrix product \(\mathsf{A}\cdot\mathsf{I}\) and elementwise product \(\mathsf{A}\odot\mathsf{I}\).
See the solution
Exercise 9.5 Create row vectors \[\begin{equation*} \boldsymbol{x}_{1}^{T} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \end{pmatrix} \qquad \boldsymbol{x}_{2}^{T} = \begin{pmatrix} -1 & 2 & -3 & 4 & -5 \end{pmatrix} \qquad \boldsymbol{x}_{3}^{T} = \begin{pmatrix} 5 & 4 & 3 & 2 & 1 \end{pmatrix} \end{equation*}\] Note: if \(\boldsymbol{x}^{T}\) is a row vector, then \(\boldsymbol{x}\) is a column vector, i.e. \[\begin{equation} \boldsymbol{x}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix} \end{equation}\] Stack all three row vectors into a matrix \[\begin{equation*} \mathsf{X} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ -1 & 2 & -3 & 4 & -5\\ 5 & 4 & 3 & 2 & 1\\ \end{pmatrix} \end{equation*}\] Create column vector \[\begin{equation*} \qquad \boldsymbol{\beta} = \begin{pmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{pmatrix}^{T} \end{equation*}\]
Remember: \[\begin{equation} \begin{pmatrix} 0.1 & 0.1 & 0.1 & 0.1 & 0.1 \end{pmatrix}^{T} = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \end{pmatrix} \end{equation}\] Now compute:
- \(\boldsymbol{x}_1^T \cdot \boldsymbol{\beta}\)
- \(\boldsymbol{x}_2^T \cdot \boldsymbol{\beta}\)
- \(\boldsymbol{x}_3^T \cdot \boldsymbol{\beta}\)
- \(\mathsf{X} \cdot \boldsymbol{\beta}\)
See the solution