r"""Broyden Root Finders.
This module contains several different root finders that implement
Broyden's method.
Mathematical Background
=======================
The basic idea is to use a linear approximation to the objective
function
.. math::
\d\vect{G} \approx \mat{J}\cdot \d\vect{x}
where :math:`\d\vect{G} = \vect{G}(x) - \vect{G}(x_0)` and
:math:`\d\vect{x} = \vect{x} - \vect{x}_0`.
The goal of the Broyden methods is to find the value
:math:`\vect{x}_0` such that :math:`\vect{G}(\vect{x}_0) = 0`. This
is done using the Newton step:
.. math::
\vect{x}_0 = \vect{x} - \mat{J}^{-1}\cdot\vect{G}.
If the Jacobian is exact, then this should give quadratic convergence
as one approaches the root. However, it can be expensive to compute
the Jacobian (at least :math:`N+1` evaluations of the full vector
function for a simple finite difference approximation), so instead,
the Broyden method uses a sequence of approximations, starting with
the identity and using a minimal update in the sense of satisfying the
secant condition while minimizing the Frobenius norm of the change.
In the following we shall use bra-ket notation for vectors.
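To make the idea concrete, the following is a minimal illustrative
sketch of such an iteration (the function `broyden_root` and its
interface are hypothetical and are *not* the solvers defined in this
module).  It uses the simplest inverse update -- the "second" form
discussed below -- starting from the identity:
>>> def broyden_root(G, x, tol=1e-12, max_iter=100):
...     B = np.eye(len(x))              # approximate inverse Jacobian
...     g = G(x)
...     for _ in range(max_iter):
...         if np.linalg.norm(g) < tol:
...             break
...         dx = -B.dot(g)              # quasi-Newton step
...         x = x + dx
...         g_new = G(x)
...         dg = g_new - g
...         # minimal update satisfying the secant condition B|dg> = |dx>
...         B = B + np.outer(dx - B.dot(dg), dg)/dg.dot(dg)
...         g = g_new
...     return x
>>> np.random.seed(0)
>>> A = np.eye(3) + 0.1*np.random.rand(3, 3)
>>> b = np.random.rand(3)
>>> x = broyden_root(lambda x: A.dot(x) - b, np.zeros(3))
>>> np.allclose(A.dot(x), b)
True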
We shall first discuss the standard Broyden method which imposes a
"minimal" update such that the secant condition $\ket{\d{G}} =
\mat{J}\ket{\d{x}}$ holds.
Standard Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~
For a given step :math:`\ket{\d{x}}` where the function changes by
:math:`\ket{\d{G}}` the update must satisfy the secant condition
.. math::
\ket{\d{G}} = \mat{J}\ket{\d{x}}.
We can write any such update in the following form
.. math::
\mat{J}^{-1} \rightarrow \mat{J}^{-1}
+ \frac{\left(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}}\right)\bra{v}}
{\braket{v|\d{G}}}
where :math:`\ket{v}` is any vector not orthogonal to :math:`\ket{\d{G}}`.
This ambiguity of the multidimensional secant method causes problems
if not properly resolved, but Broyden showed that good results can be
obtained if the update is chosen to minimize the Frobenius norm of the
change in the Jacobian:
.. math::
\norm{\d\mat{J}}_{\text{Frobenius}} = \sqrt{\sum_{ij}\d{J}_{ij}^2}.
This yields the update
.. math::
\mat{J} \rightarrow
\mat{J} + \frac{(\ket{\d{G}} - \mat{J}\ket{\d{x}})\bra{\d{x}}}
{\braket{\d{x}|\d{x}}}
which can be analytically inverted using the Sherman-Morrison formula
to find :math:`\ket{v} = \left[\mat{J}^{-1}\right]^{T}\ket{\d{x}}`:
.. math::
\mat{J}^{-1} \rightarrow \mat{J}^{-1}
+ \frac{\left(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}}\right)
\bra{\d{x}}\mat{J}^{-1}}
{\braket{\d{x}|\mat{J}^{-1}|\d{G}}}.
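As a quick numerical check (with arbitrary illustrative values) that
this is indeed the Sherman-Morrison inverse of the Jacobian update
above and that the secant condition holds:
>>> np.random.seed(1)
>>> N = 4
>>> J = np.eye(N) + 0.1*np.random.rand(N, N)
>>> B = np.linalg.inv(J)
>>> dx = np.random.rand(N)
>>> dG = np.random.rand(N)
>>> J1 = J + np.outer(dG - J.dot(dx), dx)/dx.dot(dx)
>>> B1 = B + np.outer(dx - B.dot(dG), B.T.dot(dx))/dx.dot(B.dot(dG))
>>> np.allclose(J1.dot(dx), dG)           # secant condition
True
>>> np.allclose(np.linalg.inv(J1), B1)    # Sherman-Morrison inverse
True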
An alternative update (Broyden's "second" method) is found by minimizing
the norm of the change in the inverse Jacobian, which yields :math:`\ket{v} =
\ket{\d{G}}`
.. math::
\mat{J}^{-1} \rightarrow
\mat{J}^{-1} + \frac{(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}})\bra{\d{G}}}
{\braket{\d{G}|\d{G}}},
however, we have found that this does not always work as well. It can
still be used even if the inverse Jacobian develops a singular
direction (which can cause trouble for the previous update).
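A similar check (again with arbitrary illustrative values) shows that
this update also satisfies the secant condition exactly:
>>> np.random.seed(2)
>>> N = 5
>>> B = np.eye(N) + 0.1*np.random.rand(N, N)
>>> dx = np.random.rand(N)
>>> dG = np.random.rand(N)
>>> B1 = B + np.outer(dx - B.dot(dG), dG)/dG.dot(dG)
>>> np.allclose(B1.dot(dG), dx)
True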
Generalized Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~~~~
This method is a generalization of the one described in
[Bierlaire:2006]_ and defines the new Jacobian $\mat{J}$ at the point
$x$ that minimizes a weighted combination of previous "residuals"
$\ket{r_i}$ along with a minimal change condition on a matrix residual
$\mat{R}$:
.. math::
\min_{\mat{J}}\left(\sum_{i}\norm{w_i\ket{r_i}}_{2}^{2}
+\norm{\mat{R}\mat{\Gamma}}^{2}_{F}\right)
where $w_i$ are weights and $\mat{\Gamma}$ will be discussed later and
specifies the weighting of the matrix residual. The residuals may
take the following form:
.. math::
\ket{r_i} &= \begin{cases}
\ket{\d{G}_i} - \mat{J}\ket{\d{x}_i} & \text{1)}\\
\ket{\d{x}_i} - \mat{J}^{-1}\ket{\d{G}_i} & \text{2)}
\end{cases}\\
\mat{R} &= \begin{cases}
\mat{J} - \mat{J}_{0} & \text{a)}\\
\mat{J}^{-1} - \mat{J}^{-1}_{0} & \text{b)}
\end{cases}
The method 1a) corresponds to the generalization of Broyden's first or
"Good" method while the combination 2b) corresponds to the
generalization of Broyden's second or "Bad" method. This second
method with $\mat{\Gamma} = \mat{1}$ is the method employed in
[Johnson:1988]_ with the slight modification that $\ket{\d{x}_i} =
\ket{x_{i+1}} - \ket{x_{i}}$. The two mixed cases 1b) and 2a) are
much more difficult to solve as they are linear in neither $\mat{J}$
nor $\mat{J}^{-1}$.
In what follows, we shall use the notation $\mat{B} = \mat{J}^{-1}$
for the inverse approximate Jacobian. Boldface capital letters like
this refer to "large" $N\times N$ matrices that we typically will not
be able to form. Boldface lowercase letters such as $\mat{w} =
\diag(w_i)$ denote "small" $n\times n$ matrices that we can easily
store. We define mixed matrices using bra-ket notation such as
$\ket{\d{\mat{G}}}$ and $\ket{\d{\mat{X}}}$ for the $N \times n$
matrices of the vectors $[\ket{\d{\mat{X}}}]_{:i} = \ket{\d{x}_i}$.
We may now write the minimization condition as:
.. math::
\min_{\mat{J}}
\tr\left(\ket{\mat{r}}\mat{w}^2\bra{\mat{r}}
+ \mat{R}\mat{\Gamma}\mat{\Gamma}^T\mat{R}^T\right).
Using the relationship $\partial J^{-1}_{MN}/\partial J_{AB} =
-J^{-1}_{MA}J^{-1}_{BN}$ we arrive at the following normal equations:
.. ::
.. (G - JX) W^2 X^T = (J - J0)Gamma
.. (G - BX) W^2 X^T = -B (B - B0)Gamma B
.. -B(X - BG) W^2 G^T B = (J - J0)Gamma
.. -B(X - BG) W^2 G^T B= -B (B - B0)Gamma B
.. math::
\mat{B}_{\text{1a)}} &= \mat{B}_{0} +
(\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
\bra{\d\mat{X}}\cdot\frac{1}{
\mat{B}_{0}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
+ \mat{\Gamma}\mat{\Gamma}^{T}}\cdot\mat{B}_{0}, \\
\mat{B}_{\text{2b)}} &= \mat{B}_{0}
+ (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
\bra{\d\mat{G}}\cdot\frac{1}{
\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
+ \mat{\Gamma}\mat{\Gamma}^{T}}
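As a numerical illustration (arbitrary values, with $\mat{\Gamma}$
chosen to span the orthogonal complement of $\ket{\d\mat{G}}$), the
2b) update satisfies all of the secant conditions simultaneously:
>>> np.random.seed(1)
>>> N, n = 6, 3
>>> dX = np.random.rand(N, n)
>>> dG = np.random.rand(N, n)
>>> B0 = np.eye(N) + 0.1*np.random.rand(N, N)
>>> w2 = np.diag(np.random.rand(n))
>>> Gamma = np.linalg.svd(dG)[0][:, n:]   # orthogonal complement of dG
>>> D = dG.dot(w2).dot(dG.T) + Gamma.dot(Gamma.T)
>>> B2b = B0 + (dX - B0.dot(dG)).dot(w2).dot(dG.T).dot(np.linalg.inv(D))
>>> np.allclose(B2b.dot(dG), dX)
True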
The weights have the following effects:
1) If not using the subspace decomposition method, then the weights affect the
inversion procedure. In particular, the ratio
$\mat{\Gamma}\mat{\Gamma}^{T}\mat{w}^{-2}$ is important for determining the
relative weighting of the old Jacobian and the new steps.
2) If components of $\ket{\d\mat{X}}$ and $\ket{\d\mat{G}}$ are linearly
dependent -- i.e. the secant conditions are not all consistent -- then the
weights determine how to mix these.
3) If using the subspace decomposition method, then for the linearly independent
components, the weights are formally irrelevant, but they can still be used
to determine which components should be truncated or ignored (see section
:ref:`sec:truncation-algorithm`). They will also play a role when dynamic
restarts are considered (see :ref:`sec:dynamic-restarts`).
.. note:: Here we present a little bit of analysis. The first step will be to
define $\ket{\mat{X}} \approx \ket{\d\mat{X}}\mat{w}$ and $\ket{\mat{G}}
\approx \ket{\d\mat{G}}\mat{w}$ so that they have full rank and we can write
$\ket{\mat{X}} = \ket{\uvect{X}}\mat{x}$, $\ket{\mat{G}} =
\ket{\uvect{G}}\mat{g}$ with $\mat{x}$ and $\mat{g}$ being invertible. (In
the memory limited context, these will be small. They can be computed using
an SVD or rank-revealing QR decomposition.) The two updates can now be
written as:
.. math::
\mat{B}_{2b)}
= \mat{B}_{0}
- \mat{B}_{0}\ket{\uvect{G}}\bra{\uvect{G}}
+ \ket{\mat{X}}\mat{g}^{-1}\bra{\uvect{G}}
= \mat{B}_{0}
+ (\ket{\mat{X}} - \mat{B}_{0}\ket{\mat{G}})
\mat{g}^{-1}(\mat{g}^T)^{-1}\bra{\mat{G}}.
The first equality expresses the update as first subtracting off the
projection of the previous approximation on the subspace spanned by
$\ket{\uvect{G}}$ and then reconstructing this from the new data. In this
sense the update is minimal and can be expressed as minimizing the Frobenius
norm. This follows from equation :eq:`proj_inv` with $\mat{\Gamma} =
\ket{\uvect{G}_{\perp}}$.
One might hope that, if the function is linear, previous updates would
be preserved so that after $N$ independent updates the approximation
becomes exact. This, however, turns out *not* to be true:
>>> np.random.seed(0)
>>> import numpy.random
>>> from numpy import mat
>>> from numpy.linalg import svd, qr, inv
>>> ns = numpy.random.randint(1,10,size=10)
>>> N = sum(ns)
>>> X_ = mat(np.random.rand(N,N))
>>> J_ = mat(np.random.rand(N,N))
>>> G_ = J_*X_
>>> B = mat(np.eye(N))
>>> B1 = mat(np.eye(N))
>>> n0 = 0
>>> for n in ns:
... X = X_[:,n0:n0+n]
... G = G_[:,n0:n0+n]
... n0 += n
... _, g = qr(G)
... _, x = qr(X)
... Xhat = svd(X)[0][:,:n]
... Xperp = svd(X)[0][:,n:]
... Ghat = svd(G)[0][:,:n]
... Gperp = svd(G)[0][:,n:]
... B = B + (X - B*G)*inv(g.T*g)*G.T
... B1 = B1 - B1*Ghat*Ghat.T + X*inv(Ghat.T*G)*Ghat.T
.. note:: I am trying to form a better intuition for these equations, so
consider the following. First assume that the $\mat{B}_0$ is not singular
and that the matrices $\ket{\d{\mat{X}}}$ and $\ket{\d\mat{G}}$ are not rank
deficient. (I.e., if there are $n$ previous points, then these have rank $n$.
Otherwise, use the weights to reduce the size so that they satisfy this
property appropriately. The weights can now be ignored.) In this case, one
can satisfy the secant conditions exactly if the equations have the following
structure:
.. math::
\mat{B} &= \mat{B}_{0} +
(\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot
\braket{\mat{A}|\d\mat{G}}^{-1}\bra{\mat{A}}
for an arbitrary choice of $\bra{\mat{A}}$ insofar as
$\braket{\mat{A}|\d\mat{G}}$ is not singular. The ambiguity in the choice of
$\bra{\mat{A}}$ is now captured in the desire to minimize the change from
$\mat{B}_{0}$ and we have:
.. math::
\bra{\mat{A}_{\text{1a)}}} & = \bra{\d\mat{X}}\mat{B}_{0},\\
\bra{\mat{A}_{\text{2b)}}} & = \bra{\d\mat{G}}.
>>> from numpy.random import rand
>>> from numpy import dot
>>> from numpy.linalg import inv, svd
>>> np.random.seed(0)
>>> dG = rand(5,3)
>>> dX = rand(5,3)
>>> B0 = rand(5,5)
>>> Gamma = dot(svd(dG)[0][:,-2:], rand(2,2))
>>> np.allclose(0, dot(dG.T, Gamma))
True
>>> np.allclose(dot(dG.T, inv(dot(dG, dG.T) + dot(Gamma, Gamma.T))),
... dot(inv(dot(dG.T, dG)), dG.T))
True
>>> A = dG.T
>>> R = dX - dot(B0, dG)
>>> B1 = B0 + dot(R, dot(inv(dot(A, dG)), A))
[Bierlaire:2006]_ suggests that a good choice of weights $w_i$ is
something like
.. math::
w_i = \norm{x - x_i}^{-2}
= \frac{1}{\braket{\d\mat{X}|\d\mat{X}}_{ii}}
which weights the small steps (and hence more likely to be in the
linear regime) more highly than long secant steps. [Johnson:1988]_
suggests a similar criterion based on $\norm{G_{i}}$ or a fixed set of
weights depending on iteration number.
A good choice of $\mat{B}_{0}$ and $\mat{\Gamma}$ is not so simple,
but we can be led by the original Broyden method which sets
$\mat{B}_{0}$ to be the previous approximate inverse Jacobian.
Another option is to set $\mat{B}_{0} = \mat{1}$ or some other
constant independent of iteration. If enough iterates are maintained,
then this might also work (but I am not aware of this method having
been explored).
.. _sec:dynamic-restarts:
Dynamic Restarts
----------------
One interesting idea is to include an additional Jacobian residual
term $\mat{R} = \mat{B} - \mat{B}_{-1}$ in addition to $\mat{R}_{0} =
\mat{B} - \mat{B}_{0}$ where $\mat{B}_{0}$ is constant (typically
$\mat{B}_{0}\propto \mat{1}$) and $\mat{B}_{-1}$ is the previous
approximation. If we also take $\mat{\Gamma}_{0} = w_{0}\mat{1}$,
then the weight $w_{0}$ will "pull" the inverse Jacobian back towards
$\mat{B}_{0}$. This is useful if one can formulate some slow but
approximately convergent iteration scheme. [Johnson:1988]_, for
example, advocates restarting the algorithm and resetting the Jacobian
approximation after some progress has been made. Instead, by dynamically
increasing $w_0$ (if the Broyden steps are not improving the residual
for example) one can continuously "restart" the approximation by
pulling it back toward $\mat{B}_{0}$. This gives the following
generalized equations (which we express in several forms here):
.. math::
\mat{J}_{\text{1a)}} &= \Bigl[
\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
+ \mat{J}_{-1}\mat{\Gamma}\mat{\Gamma}^{T}
+ \mat{J}_{0}w_0^2
\Bigr]\cdot\Bigl[
\ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{X}}
+ \mat{\Gamma}\mat{\Gamma}^{T}
+ \mat{1}w_0^2
\Bigr]^{-1},\\
&= \mat{J}_{-1} +
\Bigl(
(\ket{\d\mat{G}} - \mat{J}_{-1}\ket{\d\mat{X}})\cdot\mat{w}^2
\bra{\d\mat{X}}
+ (\mat{J}_{0} - \mat{J}_{-1})w_0^2
\Bigr)\cdot\frac{1}{
\ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{X}}
+ \mat{\Gamma}\mat{\Gamma}^{T} + \mat{1}w_0^2},\\
\mat{B}_{\text{1a)}} &= \mat{B}_{-1} +
\Bigl(
(\ket{\d\mat{X}} - \mat{B}_{-1}\ket{\d\mat{G}})\cdot\mat{w}^2
\bra{\d\mat{X}}
+ (\mat{1} - \mat{B}_{-1}\mat{B}_{0}^{-1})w_0^2
\Bigr)\cdot\frac{1}{
\mat{B}_{-1}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
+ \mat{\Gamma}\mat{\Gamma}^{T} + \mat{B}_{-1}\mat{B}_{0}^{-1}w_0^2
}\cdot\mat{B}_{-1}, \\
\mat{B}_{\text{2b)}} &= \Bigl[
\ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{G}}
+ \mat{B}_{-1}\mat{\Gamma}\mat{\Gamma}^{T}
+ \mat{B}_{0}w_0^2
\Bigr]\cdot\Bigl[
\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
+ \mat{\Gamma}\mat{\Gamma}^{T}
+ \mat{1}w_0^2
\Bigr]^{-1},\\
&= \mat{B}_{-1} +
\Bigl(
(\ket{\d\mat{X}} - \mat{B}_{-1}\ket{\d\mat{G}})\cdot\mat{w}^2
\bra{\d\mat{G}}
+ (\mat{B}_{0} - \mat{B}_{-1})w_0^2
\Bigr)\cdot\frac{1}{
\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
+ \mat{\Gamma}\mat{\Gamma}^{T} + \mat{1}w_0^2
}
This dynamic restart mechanism can be easily used if one can relate
the objective function $\vect{G}(\vect{x})$ to an approximately
convergent iteration:
.. math::
\vect{x} \mapsto \vect{F}(\vect{x})
= \vect{x} - \mat{B}_{0}\vect{G}(\vect{x}).
Pulling the approximate inverse Jacobian $\mat{B} \rightarrow
\mat{B}_{0}$ will pull the iterations back to this (hopefully)
convergent iteration. Note that the typical linear mixing scheme
$\vect{x} \mapsto (1-\alpha)\vect{x} + \alpha\vect{F}(\vect{x})$ is
equivalent to scaling $\mat{B}_{0} \rightarrow
\alpha\mat{B}_{0}$, which is also easily implemented.
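As a one-line numerical check of this equivalence (the values are
purely illustrative):
>>> np.random.seed(9)
>>> alpha = 0.3
>>> B0 = np.random.rand(4, 4)
>>> x, G = np.random.rand(4), np.random.rand(4)
>>> F = x - B0.dot(G)                    # F(x) = x - B0 G(x)
>>> np.allclose((1 - alpha)*x + alpha*F, x - alpha*B0.dot(G))
True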
.. note:: The diagonals of the matrix $\mat{J}_0 = \mat{B}_0^{-1}$ are
specified by the parameter :attr:`IProblem.j0_scale`. The
parameter $\alpha$ is treated separately in the solver.
Regularization
--------------
Finally, the choice of $\mat{\Gamma}\mat{\Gamma}^{T}$ is such that the
denominators are positive definite. Several choices are
discussed in [Bierlaire:2006]_:
"Geometric approach" :
Here we set $\mat{\Gamma} = \mat{0}$ which is fine unless there are
singular or dependent directions, which are common. It is also sufficient with
a dynamic restart where $w_0^2$ regularizes the denominator. It is
completely unsuitable, however, for a memory limited approach.
"Subspace decomposition" :
The subspace decomposition method makes use of the following formulae:
.. math::
\frac{1}{\ket{G}\bra{H} +
\ket{G_\perp}w_{r}^2\bra{G_\perp} + w_0^2\mat{1}} =
\frac{1}{w_0^2}
\left(\mat{1} - \ket{G}\frac{1}
{\mat{1}w_0^2 + \braket{H|G}}\bra{H}\right)
\left(\mat{1} - \frac{\ket{G_\perp}w_r^2\bra{G_\perp}}
{w_0^2 + w_r^2}\right),\\
\frac{1}{\ket{G}\bra{H} +
\ket{H_\perp}w_{r}^2\bra{H_\perp} + w_0^2\mat{1}} =
\frac{1}{w_0^2}
\left(\mat{1} - \frac{\ket{H_\perp}w_r^2\bra{H_\perp}}
{w_0^2 + w_r^2}\right)
\left(\mat{1} - \ket{G}\frac{1}
{\mat{1}w_0^2 + \braket{H|G}}\bra{H}\right)
>>> np.random.seed(0)
>>> w02 = 0.1
>>> wr2 = 0.2
>>> N = 10
>>> m = 5
>>> I = np.eye(N)
>>> i = np.eye(m)
>>> dG = np.mat(np.random.rand(N,m))
>>> H = np.mat(np.random.rand(N,m))
>>> dG_perp = np.linalg.svd(dG)[0][:,m:]
>>> assert(np.allclose(dG.T*dG_perp, 0))
>>> B = np.mat(np.random.rand(N,N))
>>> D = dG*H.T + dG_perp*wr2*dG_perp.T + I*w02
>>> Dinv = ((I - dG*np.linalg.inv(i*w02 + H.T*dG)*H.T)*
... (I - wr2*dG_perp/(wr2+w02)*dG_perp.T))/w02
>>> np.allclose(D*Dinv,I)
True
>>> H_perp = np.linalg.svd(H)[0][:,m:]
>>> assert(np.allclose(H.T*H_perp, 0))
>>> B = np.mat(np.random.rand(N,N))
>>> D = dG*H.T + H_perp*wr2*H_perp.T + I*w02
>>> Dinv = ((I - wr2*H_perp/(wr2+w02)*H_perp.T)*
... (I - dG*np.linalg.inv(i*w02 + H.T*dG)*H.T))/w02
>>> np.allclose(D*Dinv,I)
True
We start with the subspace decomposition for method 2b). Here we set
$\mat{\Gamma} = \ket{G_\perp}\mat{w}_r$, arriving at the inversion
formulae above. The fraction has the following forms for methods 1a)
and 2b) respectively:
.. math::
\frac{1}{\ket{G}\bra{X} +
\mat{B}^{-1}\ket{\Gamma}\bra{\Gamma} + w_0^2\mat{1}},\\
\frac{1}{\ket{G}\bra{G} +
\ket{\Gamma}\bra{\Gamma} + w_0^2\mat{1}}
The subspace decomposition method chooses $\ket{\Gamma}$ to be
orthogonal to either $\ket{X}$ or $\ket{G}$: it sets $\mat{\Gamma} =
\ket{\uvect{X}_{\perp}}$ (case 1a) or $\mat{\Gamma} =
\ket{\uvect{G}_{\perp}}$ (case 1a or 2b), the basis vectors orthogonal
to $\ket{\d\mat{X}}$ or $\ket{\d\mat{G}}$ respectively.
(Old notes) A couple of points. First, if $w_0 = 0$, then the results are
independent of the magnitude of $\mat{\Gamma}$. To see this, note that one
has explicitly (for case 2b) for example):
.. math::
\frac{1}{\ket{\uvect{G}}\mat{x}\bra{\uvect{G}}
+ \lambda\ket{\uvect{G}_\perp}\bra{\uvect{G}_\perp}} =
\ket{\uvect{G}}\mat{x}^{-1}\bra{\uvect{G}}
+ \frac{1}{\lambda}\ket{\uvect{G}_\perp}\bra{\uvect{G}_\perp}
The second term will be eliminated when acted upon by the numerator
factor $\bra{\d\mat{G}}$. This method poses some formal difficulty for the
inversion of the inverse formulation of case 1a) as the denominator contains
the mixed term $\mat{B}_{-1}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}$,
however, since one should generally have $\mat{B}_{-1}\ket{\d\mat{G}} \approx
\ket{\d\mat{X}}$, this should not pose any difficulty in practice.
Technically, we implement this by simply performing an SVD of the denominator
and dropping the orthogonal subspace.
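As a quick numerical check of the inversion identity above (with
arbitrary illustrative values):
>>> np.random.seed(11)
>>> N, n = 8, 3
>>> U = np.linalg.svd(np.random.rand(N, n))[0]
>>> Ghat, Gperp = U[:, :n], U[:, n:]
>>> x = np.eye(n) + 0.3*np.random.rand(n, n)
>>> lam = 0.5
>>> M = Ghat.dot(x).dot(Ghat.T) + lam*Gperp.dot(Gperp.T)
>>> Minv = Ghat.dot(np.linalg.inv(x)).dot(Ghat.T) + Gperp.dot(Gperp.T)/lam
>>> np.allclose(M.dot(Minv), np.eye(N))
True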
.. note:: The standard Broyden method is reproduced with this approach if
there is no restart parameter $w_0 = 0$ and only a single previous vector
$\ket{X}$ is kept.
"Numerical approach":
They recommend a numerical approach of performing a Modified
Cholesky decomposition :func:`mmf.math.linalg.modchol` of $\mat{A}
= \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}} = \mat{L}\mat{L}^{T} -
\mat{E}$ where $\mat{E} = \mat{\Gamma}\mat{\Gamma}^T$ is a
diagonal matrix as small as possible making $\mat{A} + \mat{E}$
safely positive definite and well conditioned.
Unfortunately, this Modified Cholesky based approach does not work
for method 1a) where the denominator is both large ($N\times N$)
and slightly asymmetric. (Ideally $\mat{B}_{0}\ket{\d\mat{G}}
\approx \ket{\d\mat{X}}$ so the asymmetry should be small.)
Our implementation of this is the same as the "direct inversion"
approach discussed below, but with a finite $\mat{\Gamma}$ such
that the smallest singular value is not less than
:attr:`JinvBroyden.min_svd` times the largest.
"Direct inversion":
If $\mat{C} = \mat{\Gamma}\mat{\Gamma}^{T} + w_{0}^2\mat{1}$ is easily
invertible, then the following formula suggested by
[Johnson:1988]_ can be used. (In [Johnson:1988]_ they set
$\mat{\Gamma} = w_{0}\mat{1}$, and so the requirement of minimizing
the effect of $\mat{\Gamma}$ comes in the limit $w_0 \rightarrow 0$. Note
that this is not exactly the same as our restart condition because in this
case the Jacobian is pulled back to the previous Jacobian, not back to the
restart Jacobian.):
.. math::
\frac{1}{\mat{C} + \ket{\mat{a}}\bra{\mat{b}}} =
\mat{C}^{-1} - \mat{C}^{-1}\ket{\mat{a}}\cdot
\frac{1}{\mat{1}
+ \braket{\mat{b}|\mat{C}^{-1}|\mat{a}}}
\cdot\bra{\mat{b}}\mat{C}^{-1}.
>>> from numpy.linalg import inv
>>> np.random.seed(3); N=10; n=3;
>>> X = np.mat(np.random.rand(N, n))
>>> G = np.mat(np.random.rand(N, n))
>>> C = np.mat(np.random.rand(N, N))
>>> I = np.mat(np.eye(N))
>>> i = np.mat(np.eye(n))
>>> np.allclose(inv(C + G*X.T),
... inv(C) - inv(C)*G*inv(i + X.T*inv(C)*G)*X.T*inv(C))
True
This only requires inverting the small $n\times n$ matrix $\mat{1} +
\braket{\mat{X}|\mat{C}^{-1}|\mat{G}}$. One further simplification arises when the
factor of $\bra{\mat{b}}$ appears to the left as is the case in both methods
without the restart:
.. math::
:label: proj_inv
\bra{\mat{b}}\frac{1}{\mat{C} + \ket{\mat{a}}\bra{\mat{b}}} =
\frac{1}{\mat{1} + \braket{\mat{b}|\mat{C}^{-1}|\mat{a}}}
\cdot\bra{\mat{b}}\mat{C}^{-1}.
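A small numerical check of :eq:`proj_inv` (with arbitrary illustrative
values; the dyads are kept small so that the denominators are well
conditioned):
>>> np.random.seed(7)
>>> N, n = 8, 3
>>> a = np.random.rand(N, n)/3
>>> b = np.random.rand(N, n)/3
>>> C = np.eye(N) + 0.1*np.random.rand(N, N)
>>> Cinv = np.linalg.inv(C)
>>> lhs = b.T.dot(np.linalg.inv(C + a.dot(b.T)))
>>> rhs = np.linalg.solve(np.eye(n) + b.T.dot(Cinv).dot(a), b.T.dot(Cinv))
>>> np.allclose(lhs, rhs)
True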
Symmetric Jacobian
------------------
If it is known that the Jacobian is symmetric, then this constraint
can be built into the minimization with a Lagrange multiplier.
.. todo:: Finish figuring out how to solve the symmetric problem::
>> A = rand(5,5)
>> B = rand(5,5)
>> B = B+B.T
>> I = eye(5)
>> M = zeros((25,25))
>> for i in xrange(5):
.. for j in xrange(i+1):
.. for a in xrange(5):
.. for b in xrange(5):
.. M[i+5*j, a+5*b] = (I[i,a]*A[j,b] + I[j,a]*A[i,b]
.. +I[j,b]*A[i,a] + I[i,b]*A[j,a])
See http://dx.doi.org/10.1016/0024-3795(89)90067-0
Memory-Limited Extended Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here we consider a slight generalization of the previous method in a memory
limited approach. First, we keep a limited set of $n$ previous points
$\ket{\mat{X}}$ and function evaluations at those points $\ket{\mat{G}}$. These
define two $n-1$ dimensional subspaces spanned by their differences:
$\ket{\uvect{X}} = \Sp\ket{\delta\mat{X}}$ and $\ket{\uvect{G}} =
\Sp\ket{\delta\mat{G}}$. This leads to the "stateless" version described
first. Second, we keep track of an additional $2m$ basis vectors for spanning
the domain and range of the inverse Jacobian approximation. Thus, we need
$N\times(m+n)$ storage. We express everything in terms of these subspaces, and
so work only with $(m+n-1)\times(m+n-1)$ dimensional matrices as described
below. Finally, for simplicity, we assume that the original problem is
appropriately scaled so that $\mat{B}_{0} = \mat{1}$.
Full "Inefficient" Version
--------------------------
The general update has the form:
.. math::
\mat{B} \rightarrow
\mat{B} +
\left((\ket{\d\mat{X}} - \mat{B}\ket{\d\mat{G}})\bra{H}
+ (1-\mat{B})w_0^2\right)\cdot
\frac{1}{\ket{\d\mat{G}}\bra{H} +
\ket{\mat{\Gamma}}w_{r}^2\bra{\mat{\Gamma}} + w_0^2\mat{1}}
where $\ket{H} \in \{\ket{\d\mat{X}}, \ket{\d\mat{G}}\}$ for methods 1a) and 2b)
respectively.
We start with the subspace decomposition method with $\mat{\Gamma} =
\ket{G_\perp}$ or $\mat{\Gamma} = \ket{H_\perp}$. The inverse Jacobian has the
following general form where $\ket{R} = \ket{\d\mat{X}} -
\mat{B}\ket{\d\mat{G}}$ and $\ket{H} \in \{\ket{\d\mat{X}},\ket{\d\mat{G}}\}$
for methods 1a) and 2b) respectively.
.. math::
\begin{aligned}
\mat{B} &\rightarrow
\mat{B} +
\left(\ket{R}\bra{H} + (1-\mat{B})w_0^2\right)\cdot
\frac{1}{\ket{\d\mat{G}}\bra{H} +
\ket{\d\mat{G}_\perp}w_{r}^2\bra{\d\mat{G}_\perp} + w_0^2\mat{1}},\\
&= \mat{1} + f(\mat{B} - \mat{1})\left(\mat{1} -
\ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
+ (\ket{\d\mat{X}} - \ket{\d\mat{G}})
\frac{1}{\mat{1}w_0^2 + \braket{H|\d\mat{G}}}
\left((1-f)\bra{H} +
f\braket{H|\d\mat{G}}
\braket{\d\mat{G}|\d\mat{G}}^{-1}\bra{\d\mat{G}}\right)
\end{aligned}
where $f \equiv w_r^2/(w_r^2 + w_0^2)$.
>>> np.random.seed(0)
>>> w02 = 0.1
>>> wr2 = 0.2
>>> N = 10
>>> m = 5
>>> I = np.eye(N)
>>> i = np.eye(m)
>>> dG = np.mat(np.random.rand(N,m))
>>> dX = np.mat(np.random.rand(N,m))
>>> H = np.mat(np.random.rand(N,m))
>>> dG_perp = np.linalg.svd(dG)[0][:,m:]
>>> dG_hat = np.linalg.svd(dG)[0][:,:m]
>>> assert(np.allclose(dG.T*dG_perp, 0))
>>> B = np.mat(np.random.rand(N,N))
>>> R = dX - B*dG
>>> B1 = B + (R*H.T + (I-B)*w02)*np.linalg.inv(
... dG*H.T + dG_perp*wr2*dG_perp.T + I*w02)
>>> f = wr2/(wr2+w02)
>>> B2 = (I + f*(B-I)*(I - dG_hat*dG_hat.T)
... + (dX - dG)*np.linalg.inv(i*w02 + H.T*dG)*(
... (1-f)*H.T + f*H.T*dG*np.linalg.inv(dG.T*dG)*dG.T))
>>> np.allclose(B1,B2)
True
.. math::
\begin{aligned}
\mat{B} &\rightarrow
\mat{B} +
\left(\ket{R}\bra{H} + (1-\mat{B})w_0^2\right)\cdot
\frac{1}{\ket{\d\mat{G}}\bra{H} +
\ket{H_\perp}w_{r}^2\bra{H_\perp} + w_0^2\mat{1}} \\
&= \mat{1} + f(\mat{B} - \mat{1})
+ \left[\ket{\d\mat{X}} - \ket{\d\mat{G}}
- f(\mat{B} - \mat{1})
(\ket{\d\mat{G}} + w_0^2\ket{H}\braket{H|H}^{-1})\right]
\frac{1}{\mat{1}w_0^2 + \braket{H|\d\mat{G}}}\bra{H}
\end{aligned}
>>> np.random.seed(0)
>>> w02 = 0.1
>>> wr2 = 0.2
>>> N = 10
>>> m = 5
>>> I = np.eye(N)
>>> i = np.eye(m)
>>> dG = np.mat(np.random.rand(N,m))
>>> dX = np.mat(np.random.rand(N,m))
>>> H = np.mat(np.random.rand(N,m))
>>> H_perp = np.linalg.svd(H)[0][:,m:]
>>> H_hat = np.linalg.svd(H)[0][:,:m]
>>> assert(np.allclose(H.T*H_perp, 0))
>>> B = np.mat(np.random.rand(N,N))
>>> R = dX - B*dG
>>> B1 = B + (R*H.T + (I-B)*w02)*np.linalg.inv(
... dG*H.T + H_perp*wr2*H_perp.T + I*w02)
>>> f = wr2/(wr2+w02)
>>> B2 = (I + f*(B-I)
... + (dX - dG - f*(B-I)*(dG + w02*H*np.linalg.inv(H.T*H)))
... *np.linalg.inv(i*w02 + H.T*dG)*H.T)
>>> np.allclose(B1, B2)
True
From these expressions we can determine the range and domain of the final
inverse Jacobian. Let the previous inverse be expressed as $\mat{B} = \mat{1} +
\ket{\alpha}\bra{\beta}$.
Method 1a)
++++++++++
For method 1a) $\ket{H} = \ket{\d\mat{X}}\mat{w}^2$. Choosing $\mat{\Gamma} =
\ket{\uvect{H}_\perp}$ gives the following, which reduces to Broyden's original
"Good" method in the limit of $w_0 \rightarrow 0$ and keeping exactly one
previous step.
.. math::
\mat{B} = \mat{1} + f(\mat{B} - \mat{1})
+ \left[\ket{\d\mat{X}} - \ket{\d\mat{G}}
- f(\mat{B} - \mat{1})
(\ket{\d\mat{G}}
+ w_0^2\ket{\d\mat{X}}
\braket{\d\mat{X}|\d\mat{X}}^{-1}\mat{w}^{-2})\right]
\frac{1}{\mat{1}w_0^2 + \mat{w}^2
\braket{\d\mat{X}|\d\mat{G}}}\mat{w}^2\bra{\d\mat{X}}.
The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the domain is
$\Sp(\bra{\beta}, \bra{\d\mat{X}}, \bra{\d\mat{G}})$. Choosing $\mat{\Gamma} =
\ket{\d\uvect{G}_\perp}$ gives the following:
.. math::
\mat{B} = \mat{1} +
f(\mat{B} - \mat{1})\left(
\mat{1} - \ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
+ (\ket{\d\mat{X}} - \ket{\d\mat{G}})
\frac{1}{\mat{1}w_0^2 + \mat{w}^2\braket{\d\mat{X}|\d\mat{G}}}\mat{w}^2
\left((1-f)\bra{\d\mat{X}}
+ f\braket{\d\mat{X}|\d\uvect{G}}\bra{\d\uvect{G}}\right)
The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the
domain is $\Sp(\bra{\beta}, \bra{\d\mat{X}})$.
Method 2b)
++++++++++
Both expressions are the same for method 2b) since $\ket{H} = \ket{\d\mat{G}}$
and the final expression for $\mat{\Gamma} = \ket{\d\uvect{G}_\perp}$ is the
following, which reduces to Broyden's original "Bad" method in the limit of $w_0
\rightarrow 0$ and keeping exactly one previous step.
.. math::
\mat{B} = \mat{1} +
f(\mat{B} - \mat{1})\left(\mat{1}
- \ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
+ (\ket{\d\mat{X}} - \ket{\d\mat{G}})
\frac{1}{\mat{1}w_0^2 +
\mat{w}^2\braket{\d\mat{G}|\d\mat{G}}}\mat{w}^2\bra{\d\mat{G}}.
The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the domain is
$\Sp(\bra{\beta}, \bra{\d\mat{G}})$.
.. note:: These reduce to the following in the limit of $f \rightarrow 0$ where
one sets $w_r\rightarrow 0$. This limit is a "stateless" version where the
Jacobian does not depend on the previous Jacobian (however, in preliminary
investigations, this has not performed well.)
.. math::
\mat{B}_{\text{1a)}} &= \mat{1} +
(\ket{\d\mat{X}} - \ket{\d\mat{G}})\cdot
\frac{1}{w_0^2\mat{1} + \mat{w}^2\cdot\braket{\d\mat{X}|\d\mat{G}}}
\cdot\mat{w}^2\cdot\bra{\d\mat{X}}\\
\mat{B}_{\text{2b)}} &= \mat{1} +
(\ket{\d\mat{X}} - \ket{\d\mat{G}})\cdot
\frac{1}{w_0^2\mat{1} + \mat{w}^2\cdot\braket{\d\mat{G}|\d\mat{G}}}
\cdot\mat{w}^2\cdot\bra{\d\mat{G}}
>>> np.random.seed(0)
>>> w02 = 0.1
>>> N = 10
>>> m = 5
>>> I = np.eye(N)
>>> dG = np.mat(np.random.rand(N,m))
>>> dX = np.mat(np.random.rand(N,m))
>>> B = np.mat(np.random.rand(N,N))
>>> R = dX - B*dG
>>> w2 = np.mat(np.diag(np.random.rand(m)**2))
>>> B1a = B + (R*w2*dX.T + (I - B)*w02)*np.linalg.inv(
... B*dG*w2*dX.T + B*w02)*B
>>> B2b = B + (R*w2*dG.T + (I - B)*w02)*np.linalg.inv(
... dG*w2*dG.T + I*w02)
>>> B1a_ = I + (dX - dG)*np.linalg.inv(
... np.eye(m)*w02 + w2*dX.T*dG)*w2*dX.T
>>> B2b_ = I + (dX - dG)*np.linalg.inv(
... np.eye(m)*w02 + w2*dG.T*dG)*w2*dG.T
>>> np.allclose(B1a, B1a_), np.allclose(B2b, B2b_)
(True, True)
We begin by representing the Jacobian and inverse in a limited
subspace as follows:
.. math::
\mat{J} &= \mat{1} + \ket{\uvect{\alpha}}\mat{j}\bra{\uvect{\beta}},\\
\mat{B} &= \mat{1} + \ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}}.
The inversion relation gives
.. math::
\mat{b} &= \frac{-1}{\mat{j}^{-1} +
\braket{\uvect{\beta}|\uvect{\alpha}}},\\
\mat{j} &= \frac{-1}{\mat{b}^{-1} +
\braket{\uvect{\beta}|\uvect{\alpha}}}.
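As a numerical check of this inversion relation (illustrative values,
with $\mat{j}$ scaled so that all of the inverses are well defined):
>>> np.random.seed(4)
>>> N, k = 9, 3
>>> alpha = np.linalg.qr(np.random.rand(N, k))[0]
>>> beta = np.linalg.qr(np.random.rand(N, k))[0]
>>> j = 0.5*np.eye(k) + 0.1*np.random.rand(k, k)
>>> J = np.eye(N) + alpha.dot(j).dot(beta.T)
>>> b = -np.linalg.inv(np.linalg.inv(j) + beta.T.dot(alpha))
>>> B = np.eye(N) + alpha.dot(b).dot(beta.T)
>>> np.allclose(J.dot(B), np.eye(N))
True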
We drop the use of the $\mat{\Gamma}$ matrix, instead relying on the
projectors $\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$ and
minimizing the following
.. math::
&\min_{\mat{J}}\left(\sum_{j}
\Norm{(\ket{\d{G}_i} - \mat{J}\ket{\d{X}_i})\mat{w}_{ij}}_{2}^{2}
+ w_0^{2} \norm{\mat{J} - \mat{J}_{0}}_{F}^{2}
+ w_r^{2} \norm{\mat{J} - \mat{1}}_{F}^{2}
\right),\\
&\min_{\mat{B}}\left(\sum_{j}
\Norm{(\ket{\d{X}_{i}} - \mat{B}\ket{\d{G}_{i}})\mat{w}_{ij}}_{2}^{2}
+ w_0^{2} \norm{\mat{B} - \mat{B}_{0}}_{F}^{2}
+ w_r^{2} \norm{\mat{B} - \mat{1}}_{F}^{2}
\right).
Here we have three weights. The matrix $\mat{w}$ specifies the priority
of the secant conditions: it may be a full matrix, allowing for more
complicated preference schemes involving all $n(n-1)/2$ differences
rather than just the $n-1$ basis vectors.
.. note:: It is assumed that $\mat{w}$ is symmetric in the sense that
$\mat{w}^2$ is used everywhere. The input need not be symmetric
and $\mat{w}^{2} \equiv \mat{w}\mat{w}^T$ will be used throughout.
.. note:: In order for this to work well, the Jacobian of the objective function
must be well represented by the identity $\mat{1}$ plus a correction of
low rank. If, for a given problem, the Jacobian has the form
$\mat{J} \approx \mat{J}_0 + \ket{}\bra{}$ then one should rescale the
function $\ket{\mat{G}} \rightarrow \mat{J}_0^{-1} \ket{\mat{G}}$ so that the
leading term is the identity. It is assumed here that this rescaling has
been performed.
This is related to the notion of "linear mixing" as an iterative scheme:
.. math::
x \rightarrow (1-\alpha)x + \alpha F(x).
If this is known to converge, then one can probably assume that $\mat{B}
\approx \alpha\mat{1} + \ket{}\bra{}$ and one should rescale the objective
$G(x) \rightarrow \alpha G(x)$. If this has been done consistently, then
pulling the Jacobian back toward the identity with a non-zero $w_r$ will pull
the Newton iteration back toward the linear mixing which should restore
convergence.
The two minimizations above correspond to methods 1a) and 2b)
respectively. Solving them gives
.. math::
\mat{j}_{\text{1a)}} &=
\left\{
\braket{\uvect{\alpha}|\d\mat{G}-\d\mat{X}}\mat{w}^2
\braket{\d\mat{X}|\uvect{\beta}}
+
w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}}
\mat{j}_{0}
\braket{\uvect{\beta}_{0}|\uvect{\beta}}
\right\}
\cdot
\frac{1}{\braket{\uvect{\beta}|\d\mat{X}}\mat{w}^2
\braket{\d\mat{X}|\uvect{\beta}} + w_{0}^2 + w_{r}^2},\\
\mat{b}_{\text{2b)}} &=
\left\{
\braket{\uvect{\alpha}|\d\mat{X}-\d\mat{G}}\mat{w}^2
\braket{\d\mat{G}|\uvect{\beta}}
+
w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}}
\mat{b}_{0}
\braket{\uvect{\beta}_{0}|\uvect{\beta}}
\right\}
\cdot
\frac{1}{\braket{\uvect{\beta}|\d\mat{G}}\mat{w}^2
\braket{\d\mat{G}|\uvect{\beta}} + w_{0}^2 + w_{r}^2}.
From these formulae, it is clear that no information will be lost if we
have the following:
1a)
.. math::
\ket{\uvect{\alpha}} &=
\Sp(\ket{\uvect{\alpha}_{0}}, \ket{\d\mat{X}-\d\mat{G}}),\\
\ket{\uvect{\beta}} &=
\Sp(\ket{\uvect{\beta}_{0}}, \ket{\d\mat{X}}),\\
2b)
.. math::
\ket{\uvect{\alpha}} &=
\Sp(\ket{\uvect{\alpha}_{0}}, \ket{\d\mat{X}-\d\mat{G}}),\\
\ket{\uvect{\beta}} &=
\Sp(\ket{\uvect{\beta}_{0}}, \ket{\d\mat{G}}),\\
The purpose of the $w_{0}$ term here is to minimize the change of the
approximation from the previous result. This is important for the
representation of the Jacobian in the subspace spanned by
$\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$ that lies outside of
the space of secant conditions. The parameter $w_{r}$ is new here and
"pulls" the solution back toward the identity $\mat{1}$, which ideally
describes a convergent (but slow) method. When $w_{r}$ is large this
simulates a "restart" -- resetting the Jacobian to the identity -- and
thus should be used when convergence is poor (or there is a divergence).
The main numerical issue here is dealing with poorly behaved matrices. To
deal with this, we introduce a parameter `singular_tol` with the goal
that all singular values of the dyadic $\mat{b}$ of the final inverse
Jacobian lie between `singular_tol` and `1/singular_tol`. To do this,
all matrix multiplications of the smaller matrices are done using the
SVD with singular values outside the range of `singular_tol**2`
truncated in intermediate calculations and a final truncation within
the required range.
Memory-limited Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To limit the amount of memory required, we store the following:
$\ket{\mat{X}}$, $\ket{\mat{G}}$ :
$N \times n$ matrices: the columns of which are the previous steps
and function evaluations. (The oldest steps may be replaced and
indices used to keep track of the ordering to minimize the
requirement for working with this array.)
$\ket{\uvect{A}}$, $\ket{\uvect{B}}$ :
$N \times \bigl(m - (n-1)\bigr)$ matrices representing the extra basis vectors
required to span the full space of $\ket{\uvect{\alpha}}$ and
$\ket{\uvect{\beta}}$.
In order to prevent working with the large matrices, we form the
following matrices:
.. math::
\begin{pmatrix}
\braket{\mat{X}-\mat{G}|\mat{X}-\mat{G}}
& \braket{\uvect{A}|\mat{X}-\mat{G}}\\
\braket{\mat{X}-\mat{G}|\uvect{A}}
& \braket{\uvect{A}|\uvect{A}}
\end{pmatrix},
\qquad
\begin{pmatrix}
\braket{\mat{X}|\mat{X}}
& \braket{\uvect{B}|\mat{X}}\\
\braket{\mat{X}|\uvect{B}}
& \braket{\uvect{B}|\uvect{B}}
\end{pmatrix},
\qquad
\begin{pmatrix}
\braket{\mat{G}|\mat{G}}
& \braket{\uvect{B}|\mat{G}}\\
\braket{\mat{G}|\uvect{B}}
& \braket{\uvect{B}|\uvect{B}}
\end{pmatrix}.
.. note:: This is predicated on the following analysis for extracting
the span of a set of vectors. Consider the SVD of
a set of vectors:
.. math::
\ket{\mat{X}} =
\begin{pmatrix}
\ket{\uvect{U}} & \ket{\uvect{U}_{\perp}}
\end{pmatrix}
\begin{pmatrix}
\mat{\sigma}\\
& \mat{0}
\end{pmatrix}
\begin{pmatrix}
\mat{V}^{\dagger}\\
\mat{V}_{\perp}^{\dagger}
\end{pmatrix}
The desired basis $\ket{\uvect{U}} = \Sp(\ket{\mat{X}})$ may be
expressed in terms of the original vectors as
.. math::
\ket{\uvect{U}} = \ket{\mat{X}}\mat{V}\mat{\sigma}^{-1}.
Hence, we may determine this from the original matrix by
considering the eigenvalue decomposition of the small matrix
.. math::
\braket{\mat{X}|\mat{X}} =
\begin{pmatrix}
\mat{V} & \mat{V}_{\perp}
\end{pmatrix}
\begin{pmatrix}
\mat{\sigma}^2\\
& \mat{0}
\end{pmatrix}
\begin{pmatrix}
\mat{V}^{\dagger}\\
\mat{V}_{\perp}^{\dagger}
\end{pmatrix}.
A rank revealing Cholesky decomposition of $\braket{\mat{X}|\mat{X}} =
\mat{r}^{T}\mat{r}$ may also be similarly used.
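As a small numerical check (illustrative values) that the basis
computed this way from the small matrix $\braket{\mat{X}|\mat{X}}$ is
orthonormal and spans the columns of $\ket{\mat{X}}$:
>>> np.random.seed(5)
>>> X = np.random.rand(10, 4)
>>> s2, V = np.linalg.eigh(np.dot(X.T, X))
>>> U = np.dot(X, V/np.sqrt(s2)[None, :])
>>> np.allclose(np.dot(U.T, U), np.eye(4))
True
>>> np.allclose(np.dot(U, np.dot(U.T, X)), X)   # U spans the columns of X
True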
A corollary is that, given an orthonormal basis
$\ket{\uvect{\alpha}_{0}}$ and a new set of vectors $\ket{\beta}$,
we may construct the augmented basis by diagonalizing the small
matrix
.. math::
\braket{\beta|\beta} -
\braket{\beta|\uvect{\alpha}_{0}}
\braket{\uvect{\alpha}_{0}|\beta}
&= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\
\begin{pmatrix}
\ket{\uvect{\alpha}_{0}} &
\ket{\uvect{\alpha}_{1}}
\end{pmatrix} &=
\begin{pmatrix}
\ket{\uvect{\alpha}_{0}} &
\ket{\beta}
\end{pmatrix}
\cdot
\begin{pmatrix}
\mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v}\mat{\sigma}^{-1}\\
\mat{0} & \mat{v}\mat{\sigma}^{-1}
\end{pmatrix}.
>>> np.random.seed(0)
>>> alpha_0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5]
>>> beta = np.random.rand(10, 2)
>>> alpha_0_beta = np.dot(alpha_0.T, beta)
>>> s2, v = np.linalg.eigh(np.dot(beta.T, beta)
... - np.dot(alpha_0_beta.T, alpha_0_beta))
>>> s = np.sqrt(s2)
>>> alpha_1 = np.dot(beta - np.dot(alpha_0, alpha_0_beta), v/s[None,:])
$\ket{\uvect{\alpha}_{0}}$ and $\ket{\uvect{\alpha}_{1}}$ are
orthogonal and normalized:
>>> np.allclose(np.dot(alpha_1.T, alpha_0), 0)
True
>>> np.allclose(np.dot(alpha_1.T, alpha_1), np.eye(2))
True
For large systems, we cannot store the full matrices. If the number
of iterations is small, then we can form all of the matrices in terms
of dyads (see :class:`DyadicSum`), however, if the number of
iterations becomes too large, then we must truncate the spaces. Let
us consider the following information with $N$ being the (large) size
of the space, and $n$ be the number of previous steps maintained.
$\ket{x}$, $\ket{x_0}$, $\ket{g}$, $\ket{g_0}$ :
1-d arrays of length `N` of the current and previous steps along
with the value of the objective function at these points.
$\ket{\uvect{X}}$, $\ket{\uvect{G}}$ :
$N \times (n-1)$ arrays of orthonormal basis vectors spanning the
space of difference vectors $\ket{x_i} - \ket{x_j}$ and $\ket{g_i}
- \ket{g_j}$ respectively.
$\ket{\uvect{B}_x}$, $\ket{\uvect{B}_g}$ :
$N \times m$ arrays of orthonormal basis vectors for representing
the inverse Jacobian.
Given these, we can express everything in terms of various $n \times
n$ matrices in these bases. Thus, the inverse Jacobian is expressed
as:
.. math::
\mat{B} = \mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}.
One might like to include an optional preconditioner $\mat{B}
\rightarrow \mat{B}\op{B}_0$. For method 1a) the preconditioner
factors to the right: to implement this, the simplest approach is to
simply precondition the initial function $G(x) \rightarrow
\op{B}_{0}G(x)$. For method 2b) a simple factorization does
not work quite as well. [Johnson:1988]_ also suggests allowing for a
factor $\alpha$ to reduce the weight of the initial identity
approximation:
.. math::
\mat{B} = \alpha\mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}.
We now express everything in terms of the basis vectors
$\ket{\uvect{X}}$ and $\ket{\uvect{G}}$. Thus, we define the
following matrices:
.. math::
\ket{\d\mat{X}} &= \ket{\uvect{X}}\mat{d}_x, \\
\ket{\d\mat{G}} &= \ket{\uvect{G}}\mat{d}_g.
This allows us to express the equations as:
.. math::
\mat{B}\ket{\uvect{G}} = \ket{\uvect{G}}\alpha
+ \ket{\uvect{X}}\mat{b}
.. math::
\mat{B}_{\text{1a)}} &\rightarrow
\mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}
+
\left(\ket{\uvect{X}}\mat{d}_x
- \mat{B}_{0}\ket{\uvect{G}}\mat{d}_g\right)
\cdot\mat{w}^2\cdot\mat{d}_{x}^{T}\bra{\uvect{X}}\cdot\frac{1}{
\mat{B}_{0}\ket{\uvect{G}}
\mat{d}_g\mat{w}^2\mat{d}_x^{T}\bra{\uvect{X}}
+ w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}\cdot
(\mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}), \\
\mat{B}_{\text{2b)}} &= \mat{B}_{0}
+ (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
\bra{\d\mat{G}}\cdot\frac{1}{
\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
+ w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}
For extremely large systems, we wish to represent the inverse Jacobian
as a dyadic sum:
.. math::
\mat{B} = (\mat{1} + \mat{U}\mat{\sigma}\mat{V}^{T})\mat{B}_{0}
where $\mat{U}$ and $\mat{V}$ are $n \times k$ where $n$ is the size
of the problem (large) and $k$ is the number of previous steps we
keep (small). The main difficulty with these formulae is inverting the
large matrix. The "Good" method makes this even more difficult
because the denominator is not symmetric while the "Bad" method --
apart from possibly suffering some of the numerical instabilities of
the "Bad" Broyden method -- does not easily allow the form of the
dyadic inverse Jacobian to be preserved (if $\mat{B}_{0}$ is
non-trivial).
.. _sec:truncation-algorithm:
Truncation Algorithm
--------------------
In the memory-limited implementation, one must choose how to discard
the previous steps. This choice must be made in two places: the set of
previous steps $\ket{x}$ and $\ket{g}$ to be retained, and the
truncation of the current inverse Jacobian approximation $\mat{B}$.
In principle, given $n < N$ previous steps,
it is likely that all of the steps are linearly independent, and
so the minimization condition is irrelevant - one simply solves the
$n$ secant equations in the subspace spanned by the difference
vectors. (The subspace decomposition method has this property -
Johnson's method does not quite have this property when $0 < w_0$.)
Once one needs to limit
.. [Broyden:1965] C. G. Broyden, `A Class of Methods for Solving
Nonlinear Simultaneous Equations
<http://www.jstor.org/stable/2003941>`_, Math. Comput. 19(92),
577-593 (1965)
.. [Johnson:1988] D. D. Johnson, `Modified Broyden's method for
accelerating convergence in self-consistent calculations
<http://dx.doi.org/10.1103/PhysRevB.38.12807>`_, Phys. Rev. B
38(18), 12807--12813 (1988)
.. [Bierlaire:2006] M. Bierlaire, F. Crittin, and M. Themans, `A
multi-iterate method to solve systems of nonlinear equations
<http://dx.doi.org/10.1016/j.ejor.2006.09.080>`_, European Journal
of Operational Research 183(1), 20 -- 41 (2006)
Parameters
==========
A modification of the Newton step can be used to allow manual control
of some (small) subset of the parameters. As an example (and in our
notation) we consider a set of chemical potentials :math:`\mu` and
the respective densities :math:`N`. Allowing the chemical potentials
to change too soon, before the other parameters have relaxed, can throw
off an iteration, leading to wild oscillations. Instead, one might
want to wait until a degree of convergence for fixed chemical
potentials has been achieved, and then let them vary.
We partition the problem and Jacobian as
.. math::
\begin{pmatrix}
\d{G}\\
\d{N}
\end{pmatrix} =
\mat{J}\cdot
\begin{pmatrix}
\d{x}\\
\d{\mu}
\end{pmatrix}
The full Broyden step would be to find :math:`\d{x}` and :math:`\d{\mu}`
from :math:`\d{G}` and :math:`\d{N}` using :math:`\mat{J}^{-1}`.
Instead, we need to find :math:`\d{x}` (and :math:`\d{N}`) from
:math:`\d{G}` and the specified "control" parameters :math:`\d{\mu}`.
To do this, we partition the inverse Jacobian as
.. math::
\mat{J}^{-1} = \begin{pmatrix}
\cdots & \cdots\\
\cdots & \mat{d}
\end{pmatrix}
One can solve the problem explicitly using the full partitioning, but in
practice, the matrices are large and one can really only form the small
block :math:`\mat{d}` corresponding to the small number of control
parameters. The following formulae allow one to compute the required
components using only this small matrix, and the inverse Jacobian
(often stored in a compact dyadic representation):
.. math::
\begin{pmatrix}
\d{G}\\
\d{N}
\end{pmatrix} &=
\left(
\mat{1} - \begin{pmatrix}
\mat{0} & \mat{0}\\
\mat{0} & \mat{d}^{-1}
\end{pmatrix}
\left[
\mat{J}^{-1} - \mat{1}\right]
\right)
\cdot
\begin{pmatrix}
\d{G}\\
\d{\mu}
\end{pmatrix}, \\
\begin{pmatrix}
\d{x}\\
\d{\mu}
\end{pmatrix} &=
\mat{J}^{-1}
\cdot
\begin{pmatrix}
\d{G}\\
\d{N}
\end{pmatrix}.
For calculational purposes we form
.. math::
\begin{pmatrix}
a\\
b
\end{pmatrix} &=
\mat{J}^{-1}
\cdot
\begin{pmatrix}
\d{G}\\
\d{\mu}
\end{pmatrix}, \\
\d{N} &= \d{\mu} - \mat{d}^{-1}(b - \d{\mu}), \\
\begin{pmatrix}
\d{x}\\
\d{\mu}
\end{pmatrix} &=
\mat{J}^{-1}
\cdot
\begin{pmatrix}
\d{G}\\
\d{N}
\end{pmatrix}.
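The following small check (with an arbitrary, well-conditioned inverse
Jacobian; the values are purely illustrative) verifies that this
procedure reproduces the specified control parameters :math:`\d{\mu}`
exactly:
>>> np.random.seed(6)
>>> N, m = 7, 2                         # m "control" parameters
>>> Jinv = np.eye(N) + 0.3*np.random.rand(N, N)
>>> d = Jinv[N-m:, N-m:]                # small lower-right block of J^{-1}
>>> dG = np.random.rand(N - m)
>>> dmu = np.random.rand(m)
>>> ab = Jinv.dot(np.concatenate([dG, dmu]))
>>> a, b = ab[:N-m], ab[N-m:]
>>> dN = dmu - np.linalg.solve(d, b - dmu)
>>> step = Jinv.dot(np.concatenate([dG, dN]))
>>> np.allclose(step[N-m:], dmu)        # control components reproduced
True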
.. todo:: Use :func:`scipy.linalg.fblas.dgemm` to allow matrix
multiplications etc. to be done in place. In general, minimize
memory footprint and make everything work for very large matrices
(maybe with an option if it is slower).
Sensitive Parameters
~~~~~~~~~~~~~~~~~~~~
In some situations, certain parameters may be considered "sensitive".
Typically, these are collective parameters. For example, in problems
representing functions over a grid, one may have a couple of
parameters (such as chemical potentials) where the dependent quantity
to be set to zero is an integral of other quantities. Premature
optimization of these parameters can cause problems.
A typical case is one where the other "local" parameters must first
achieve a certain degree of convergence before the dependent quantity
can be accurately estimated. If one attempts to adjust the sensitive
parameters before this convergence has been achieved, then the
parameters will fluctuate wildly.
What one needs to do is to hold these sensitive parameters fixed for a
while, and then adjust them once the local parameters have settled
down. In order to ensure that these "noisy" parameters do not mess up
the Jacobian, one must call :meth:`Broyden.update` or
:meth:`Broyden.update_J` with an appropriate argument `skip_inds` that
identifies the noisy parameters. One may use this in conjunction with
the previous techniques to provide steps that are consistent with
holding these sensitive parameters fixed until the "noise" is
sufficiently reduced that updates can be resumed.
Characterizing this "noise" can be tricky: one might for example look
for the change in error from a previous iteration. However, if a very
small step size has been taken (for example as a result of trying to
improve the Jacobian or because of a line-search), then this could
result in an incorrect assessment about the degree of convergence.
The simplest criterion is for the case of models where holding the
sensitive parameters fixed (and ignoring the error in this component)
will lead to a convergent iteration. Here the natural test for "noise"
is to check for a weak level of convergence of the other parameters
(maybe in conjunction with the change in the parameter error
from iteration to iteration). This is equivalent to an efficient
method of solving the following recursive problem: Define an outer
problem with only the sensitive parameters. The objective function
solves the inner problem with these parameters fixed, then the outer
problem solves for the sensitive parameters.
.. _sec-notes:
Linear Algebra
==============
Here we summarize some of the linear algebra required here.
Subspace Reduction
~~~~~~~~~~~~~~~~~~
One of the problems we shall encounter is that we have a matrix
expressed in a basis as
.. math::
\begin{pmatrix} \ket{a_0} & \ket{a_1} \end{pmatrix}
\begin{pmatrix}
\mat{c}_{11} & \mat{c}_{12}\\
\mat{c}_{21} & \mat{c}_{22}
\end{pmatrix}
\begin{pmatrix} \bra{b_0} \\ \bra{b_1} \end{pmatrix}
and we need to reduce the rank of the subspaces $\ket{a_1}$ and
$\bra{b_1}$ while holding subspaces $\ket{a_0}$ and $\bra{b_0}$
fixed. I am not yet sure what the optimal solution to this problem
is, but a very good solution can be obtained by performing two SVD's:
.. math::
\begin{pmatrix}
\mat{c}_{21} & \mat{c}_{22}
\end{pmatrix}
&=
\mat{q}_{a}\mat{\sigma}_{a}\mat{V}^{T}_{a},\\
\begin{pmatrix}
\mat{c}_{12} \\ \mat{c}_{22}
\end{pmatrix}
&=
\mat{V}_{b}
\begin{pmatrix}
\mat{\sigma}_{b}\\
\mat{0}
\end{pmatrix}
\mat{q}_{b}^{T}.
The truncation scheme is then to use the two transformations obtained
to determine the rearrangement. This typically approaches the optimal
truncation to within a couple of percent (though a tuned matrix could
probably break this).
To demonstrate this, we project out one state from a 4 by 4 matrix.
We compute the optimal projection using a non-linear solver:
>>> np.random.seed(0)
>>> def my_projection(A):
... A = np.asmatrix(A)
... P = np.asmatrix(np.diag([1.0, 1.0, 1.0, 0.0]))
... qa, _, _ = np.linalg.svd(A[2:,:])
... _, _, qbt = np.linalg.svd(A[:,2:])
... Qa = np.eye(4)
... Qbt = np.eye(4)
... Qa[2:,2:] = qa
... Qbt[2:, 2:] = qbt
... return Qa*P*Qa.T*A*Qbt.T*P*Qbt
>>> import scipy.optimize
>>> solver = sp.optimize.fmin_powell
>>> def optimal_projection(A):
... P = np.asmatrix(np.diag([1.0, 1.0, 1.0, 0.0]))
... def trans(A, x):
... t1, t2 = x
... qa = np.mat([[np.cos(t1), -np.sin(t1)],
... [np.sin(t1), np.cos(t1)]])
... qbt = np.mat([[np.cos(t2), -np.sin(t2)],
... [np.sin(t2), np.cos(t2)]])
... Qa = np.eye(4)
... Qbt = np.eye(4)
... Qa[2:,2:] = qa
... Qbt[2:,2:] = qbt
... return Qa.T*P*Qa*A*Qbt.T*P*Qbt
... def f(x):
... return np.linalg.norm(trans(A,x) - A, 2)
... x = solver(f, [0, 0], disp=False)
... return trans(A, x)
>>> def rel_err(A):
... e1 = np.linalg.norm(my_projection(A) - A, 2)
... e0 = np.linalg.norm(optimal_projection(A) - A, 2)
... return(abs(e0-e1)/abs(e0))
>>> errs = []
>>> for n in xrange(10):
... #print("%i of 10" % n)
... errs.append(rel_err(np.random.rand(4,4) - np.random.rand(4,4)))
>>> np.mean(errs), np.std(errs), np.max(errs) < 0.1 # doctest: +SKIP
(0.01..., 0.02..., True)
To do this we note that we can perform a block diagonalization
of the matrix using the SVD of the lower-right block $\mat{c}_{22} =
\mat{u}\mat{\sigma}\mat{v}^{T}$ and truncating the basis based on
this. This follows from the following decomposition:
.. math::
\begin{pmatrix}
\ket{a_0} &
\ket{a_0}\mat{c}_{12}\mat{c}_{22}^{-1} + \ket{a_1}
\end{pmatrix}
\begin{pmatrix}
\mat{c}_{11} - \mat{c}_{12}\mat{c}_{22}^{-1}\mat{c}_{21} \\
& \mat{c}_{22}
\end{pmatrix}
\begin{pmatrix}
\bra{b_0} \\
\mat{c}_{22}^{-1}\mat{c}_{21}\bra{b_0} + \bra{b_1}
\end{pmatrix}
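A quick numerical check of this decomposition (illustrative values;
$\mat{c}_{22}$ is shifted to be safely invertible):
>>> np.random.seed(8)
>>> c11, c12 = np.random.rand(2, 2), np.random.rand(2, 2)
>>> c21 = np.random.rand(2, 2)
>>> c22 = 2*np.eye(2) + np.random.rand(2, 2)
>>> a0, a1 = np.random.rand(6, 2), np.random.rand(6, 2)
>>> b0, b1 = np.random.rand(2, 6), np.random.rand(2, 6)
>>> C = np.vstack([np.hstack([c11, c12]), np.hstack([c21, c22])])
>>> M = np.hstack([a0, a1]).dot(C).dot(np.vstack([b0, b1]))
>>> c22inv = np.linalg.inv(c22)
>>> schur = c11 - c12.dot(c22inv).dot(c21)
>>> M2 = (a0.dot(schur).dot(b0)
...       + (a0.dot(c12).dot(c22inv) + a1).dot(c22).dot(c22inv.dot(c21).dot(b0) + b1))
>>> np.allclose(M, M2)
True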
Notes
=====
Here we summarize some problems, convergence characteristics etc.
1) In one case we got stuck in a loop adding a point that was already
included. What happened is that we were using
:attr:`JinvBroyden.x_method` `='min'` which was choosing a
particular point. A step was made from this point producing a new
point which was added, but the new point did not have a lower
$\norm{G(x)}$ so the same point was used over and over again with
no significant update to the Jacobian.
A similar problem can occur with :attr:`JinvBroyden.replace_method`
`='largest'`. The resolution consists of one of the following:
a) Using :attr:`JinvBroyden.x_method` `= 'recent'` and
:attr:`JinvBroyden.replace_method` `='oldest'` to ensure that
all the points are cycled through and that the newest point
is used. These are now the defaults.
b) Using a line-search to ensure that a downhill step is chosen to
ensure that the newest points have the smallest error.
To Do
=====
.. todo:: Check why things work better with limited memory options.
I.e. keeping a large :attr:`JinvBroyden.n_prev` can make things worse.
"""
from __future__ import division
__all__ = ['DyadicSum', 'JinvBroyden', 'ExtendedBroyden',
'_JinvExtendedBroyden', 'JinvExtendedBroydenFull']
import warnings
import copy
import math
import numpy as np
# We import these here in case we want to change in future to use more
# efficient implementations.
from numpy import dot
from numpy.linalg import eigh
import scipy as sp
from zope.interface import implements
import mmf.objects
from mmf.objects import StateVars, process_vars
from mmf.objects import Required, Computed, Delegate
import mmf.math.linalg as mmf_linalg
from mmf.solve.solver_interface import IJinv
_FINFO = np.finfo(float)
_EPS = _FINFO.eps
_TINY = _FINFO.tiny
_SINGULAR_TOL = 1e-6
class DyadicSum(StateVars):
r"""Represents a matrix as a sum of $n$ dyads of length $N$:
.. math::
\mat{M} = \mat{1} + \sum_{n}\ket{a_n}\sigma_n\bra{b_n}
= \mat{1} + \ket{\mat{A}}\mat{\sigma}\bra{\mat{B}}.
The two sets of bases are stored in $N \times n$ matrices
$\ket{\mat{A}}$ and $\ket{\mat{B}}$ and the $n\times n$ matrix
$\mat{\sigma}$.
If :attr:`use_svd` is `True`, then the singular value
decomposition of $\mat{\sigma}$ is used to keep $\mat{\sigma}$
diagonal with positive entries and only these diagonals are stored.
.. todo:: Perform an analysis of the inverse Jacobian, changing bases
if needed to make the dyads linearly independent.
.. note:: We assume everything is real here: no complex
conjugation is performed with transposition.
Examples
--------
>>> np.random.seed(3)
>>> N, n, m = 20, 5, 3
>>> At = np.random.random((N, n+m))
>>> B = np.random.random((n+m, N))
>>> at = At[:, :n]
>>> b = B[:n, :]
>>> sigma = np.eye(n)
>>> s = DyadicSum(at=at, b=b, sigma=sigma, n_max=n)
>>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(at, b))
True
Now we add the remaining terms. The dyadic sum should be the
best rank n approximation of `At*B`:
>>> s.add_dyad(at=At[:, n:], b=B[n:, :], sigma=np.eye(m))
>>> U, d, Vt = np.linalg.svd(np.dot(At, B))
>>> d[n:] = 0 # Make rank deficient approx
>>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(U*d, Vt))
True
Note that the approximation is not exact:
>>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(At, B))
False
Examples
--------
>>> J = DyadicSum()
>>> x = np.mat([[1.0],
... [2.0]])
>>> J*x - x
matrix([[ 0.],
[ 0.]])
>>> b = np.mat([[1.0, 0.0]])
>>> a = np.mat([[0.0],
... [2.0]])
>>> J.add_dyad(a, b)
>>> J*x
matrix([[ 1.],
[ 4.]])
>>> x.T*J
matrix([[ 5., 2.]])
>>> import mmf.archive
>>> A = mmf.archive.Archive()
>>> A.insert(J=J)
'J'
>>> s = str(A)
>>> d = {}
>>> exec(s, d)
>>> np.allclose(J.asarray(), d['J'].asarray())
True
Here is an example of limiting the size of the dyadic.
>>> np.random.seed(3)
>>> s1 = np.eye(10)
>>> s2 = DyadicSum(n_max=10)
>>> for n in xrange(100):
... a = np.random.random(10)
... b = np.random.random(10)
... _add_dyad(s1, a, b)
... s2.add_dyad(a, b)
This should still be exact because the space is only ten
dimensional.
>>> np.allclose(s1, s2.asarray())
True
The dynamic_range parameter is a more mathematical way of keeping
the basis fixed. It ensures that the most significant singular
values will be kept.
>>> s1 = DyadicSum(dynamic_range=0.01)
>>> s2 = DyadicSum()
>>> for n in xrange(100):
... a = np.random.random(20)
... b = np.random.random(20)
... s1.add_dyad(a, b)
... s2.add_dyad(a, b)
>>> d1 = np.linalg.svd(s1.asarray(), compute_uv=False)
>>> d2 = np.linalg.svd(s2.asarray(), compute_uv=False)
>>> (abs(d1 - d2)/max(d2)).max() < 0.02
True
"""
_state_vars = [
('at', np.empty((0, 0), dtype=float),
r"""`(N, n`) array $\ket{\mat{A}}$. We use the ket form
here because that is most often used and more efficient
(presently, :func:`numpy.dot` makes copies if the arrays
are not both `C_CONTIGUOUS`)."""),
('b', np.empty((0, 0), dtype=float),
r"""`(n, N`) array $\bra{\mat{B}}$."""),
('sigma', np.empty(0, dtype=float),
r"""`(n, n)` array $\mat{\sigma}$ or `(n, )` array
$\diag\mat{\sigma}$ if :attr:`use_svd`."""),
('n_max', np.inf,
r"""Maximum number $n$ of Dyads to store. If this is finite,
then the $N \times n$ matrices are pre-allocated, and when
$m$ new dyads are added, they replace the least significant $m$
dyads, requiring no more allocations, otherwise the dyads
are added as required."""),
('inplace', False,
r"""If :attr:`n_max` is finite, then this signifies that
the matrices :attr:`at` and :attr:`b` should be allocated
only once and all operations should be done inplace on
these. In particular, when new dyads are added, the least
significant components of the old matrix are dropped
*first* then the dyads are added.
If this is `False`, then the new dyads are added first and
*then* the least-significant components are dropped. This
latter form requires more storage, but may be more
accurate.
.. note:: Presently, the inplace operations cannot be performed
because `scipy` does not expose the correct BLAS
routines, so the only purpose of this flag is to
somewhat limit memory usage, but this is probably
quite ineffective at the moment."""),
('use_svd', True,
r"""If `True`, then :meth:`orthogonalize` will be run
whenever a new dyad is added. (Generally a good idea, but
can be slow if there are a lot of dyads)."""),
('dynamic_range', _EPS,
r"""Each call to :meth:`orthogonalize` will only keep the
singular values within this fraction of the highest."""),
]
process_vars()
def __init__(self, *v, **kw):
if 'at' in kw or 'b' in kw or 'sigma' in kw:
self.at = np.ascontiguousarray(self.at)
self.b = np.ascontiguousarray(self.b)
assert(self.at.shape[0] == self.b.shape[1])
assert ((self.at.shape[1], self.b.shape[0]) == self.sigma.shape or
(self.at.shape[1], self.b.shape[0]) == self.sigma.shape*2)
self.orthogonalize()
if 'n_max' in kw:
if self.n_max < np.inf:
self.use_svd = True
if len(self.b) != self.n_max:
# Reset n_max
self.orthogonalize()
if ('use_svd' in kw and not self.use_svd and self.n_max < np.inf):
raise ValueError(
"Finite n_max = %i requires svd." % (self.n_max, ))
def reset(self):
"""Reset the matrix.
Examples
--------
>>> M = DyadicSum()
>>> M.asarray()
array(1.0)
>>> M.add_dyad([1, 0], [1, 0])
>>> M.add_dyad([0, 2], [1, 0])
>>> M.add_dyad([3, 0], [0, 1])
>>> M.add_dyad([0, 4], [0, 1])
>>> M.asarray()
array([[ 2., 3.],
[ 2., 5.]])
>>> M.reset()
>>> M.asarray()
array(1.0)
"""
self.suspend()
self.at = np.empty((0, 0), dtype=float)
self.b = np.empty((0, 0), dtype=float)
self.sigma = np.empty(0, dtype=float)
self.resume()
def add_dyad(self, at, b, sigma=None):
r"""Add `|a>d<b|` to `J`.
Parameters
----------
at, b : array
These should be either 1-d arrays (for a single dyad) or
2-d arrays (`(N, m)` for `at` and `(m, N)` for `b`)
representing the kets `|a>` and bras `<b|`.
sigma : array
Matrix linking `at` and `b`. Must be provided if `len(at.T)`
and `len(b)` are different.
Examples
--------
Here is a regression test for a bug: adding a set of dyads
that would push the total number beyond `n_max`.
>>> np.random.seed(0)
>>> at = np.random.random((10, 3))
>>> b = np.random.random((3, 10))
>>> s = DyadicSum(n_max=2)
>>> s.add_dyad(at=at, b=b)
"""
# Checking
at = np.asarray(at)
b = np.asarray(b)
if 1 == len(at.shape):
at = at.reshape((len(at), 1))
if 1 == len(b.shape):
b = b.reshape((1, len(b)))
a = at.T
if sigma is None:
assert len(a) == len(b)
sigma = np.eye(len(a), len(b))
else:
sigma = np.asarray(sigma)
if 1 == len(sigma.shape):
sigma = np.diag(sigma)
assert len(a) == sigma.shape[0]
assert len(b) == sigma.shape[1]
assert a.shape[1] == b.shape[1]
if 0 < len(self.b):
assert self.at.shape[0] == at.shape[0]
assert self.b.shape[1] == b.shape[1]
if self.n_max == np.inf or not self.inplace:
# Just add them
at_ = np.hstack([self.at, at])
b_ = np.vstack([self.b, b])
self_sigma = self.sigma
if 1 == len(self_sigma.shape):
self_sigma = np.diag(self_sigma)
sigma_ = mmf_linalg.block_diag((self_sigma, sigma))
else:
assert len(a) <= self.n_max
assert len(b) <= self.n_max
at_ = self.at
b_ = self.b
sigma_ = np.diag(self.sigma)
na, nb = sigma.shape
sigma_[-na:, -nb:] = sigma
at_[:, -na:] = at
b_[-nb:, :] = b
else:
# Dyadic sum is empty: just add the new ones
at_ = at
b_ = b
sigma_ = sigma
self.orthogonalize(at=at_, b=b_, sigma=sigma_)
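# ------------------------------------------------------------------
# Illustrative sketch (standalone, not part of the class API):
# `add_dyad` accumulates rank-one terms so that the represented matrix
# is ``1 + sum_i |a_i><b_i|``.  Only the documented methods are used;
# the sizes and seed below are arbitrary choices for the example.
def _sketch_add_dyad_accumulation():
    import numpy as np
    np.random.seed(0)
    J = DyadicSum()
    dense = np.eye(5)
    for _ in range(3):
        a, b = np.random.rand(5), np.random.rand(5)
        J.add_dyad(a, b)                     # J -> J + |a><b|
        dense += a[:, None]*b[None, :]
    assert np.allclose(J.asarray(), dense)
# ------------------------------------------------------------------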
def orthogonalize(self, at=None, b=None, sigma=None):
r"""Perform a QR decomposition of the dyadic components to
make them orthogonal.
Notes
-----
Let $\ket{\mat{A}} =$ :attr:`at` and $\bra{\mat{B}} =$
:attr:`b` and $\mat{\sigma} =$ :attr:`sigma`:
.. math::
\ket{\mat{A}}\mat{\sigma}\bra{\mat{B}}
= \ket{\uvect{Q}_a}\mat{r}_a\mat{\sigma}\mat{r}_b^T\bra{\uvect{Q}_b}
= \ket{\uvect{Q}_a}\mat{u}\mat{d}\mat{v}^T\bra{\uvect{Q}_b}
= \ket{\uvect{A}}\mat{d}\bra{\uvect{B}}
In order to compute the QR factorization of $\ket{\mat{A}}$
without working with the large matrices, we could do a Cholesky
decomposition on
.. math::
\braket{\mat{A}|\mat{A}} &= \mat{r}_a^T\mat{r}_{a}\\
\ket{\uvect{Q}} &= \ket{\mat{A}}\mat{r}_{a}^{-1}
.. note:: These should be nicely accelerated using the BLAS
routines for performing the operations in place. In
principle, :func:`numpy.dot` should do this, but at present
there are issues with contiguity that force copies to be
made. Also, SciPy does not yet expose all BLAS operations,
in particular `_SYRK` is missing.
We perform a QR decomposition of these
.. math::
\mat{A} &= \mat{Q}_{A}\mat{R}_{A}, \\
\mat{B} &= \mat{Q}_{B}\mat{R}_{B}
and then an SVD of the product of the R factors
$\mat{U}\mat{D}\mat{V}^{\dagger} = \mat{R}_{A}\mat{R}_{B}^T$
to obtain the new basis:
.. math::
\mat{M} = \mat{A}\mat{B}^{T}
= \mat{Q}_{A}\mat{R}_{A}\mat{R}_{B}^{T}\mat{Q}_{B}^T
= \left(\mat{Q}_{A}\mat{U}\right)\mat{D}
\left(\mat{V}^{\dagger}\mat{Q}_{B}^T\right)
.. math::
\mat{A}_{\text{new}} &= \mat{Q}_{A}\mat{U}\sqrt{\mat{D}}, \\
\mat{B}_{\text{new}} &= \mat{Q}_{B}\mat{V}^{*}\sqrt{\mat{D}}.
"""
if at is None:
at = self.at
if 0 == np.prod(at.shape):
# Empty. Do nothing. This should only happen on
# default initialization.
return
if b is None:
b = self.b
if sigma is None:
sigma = self.sigma
if 1 == len(sigma.shape):
sigma = np.diag(sigma)
cholesky = False
if cholesky:
# Cholesky version: avoids working with large arrays.
# This could be optimized using BLAS and LAPACK here for
# both inplace operations and to avoid making copies.
# Right now there is no point in the added complication.
aa = dot(at, at.T)
bb = dot(b, b.T)
la = np.linalg.cholesky(aa)
lb = np.linalg.cholesky(bb)
ra = la.T
rb = lb.T
at = sp.linalg.cho_solve(c_and_lower=(lb, True), b=at.T).T
b = sp.linalg.cho_solve(c_and_lower=(lb, True), b=b)
else:
qa, ra = np.linalg.qr(at) # at = qa*ra
qb, rb = np.linalg.qr(b.T) # b = rb.T*qb.T
sigma = dot(ra, dot(sigma, rb.T))
if self.use_svd:
u, d_, vt = np.linalg.svd(sigma)
max_inds = np.where(d_/d_.max() >=
self.dynamic_range)[0]
if 0 < len(max_inds):
max_ind = min(max_inds[-1] + 1, self.n_max)
else:
max_ind = self.n_max
max_ind = min(max_ind, len(u))
at = dot(qa, u[:, :max_ind])
b = dot(vt[:max_ind, :], qb.T)
sigma = d_[:max_ind]
else:
at = qa
b = qb.T
self.__dict__['at'] = at
self.__dict__['b'] = b
self.__dict__['sigma'] = sigma
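# ------------------------------------------------------------------
# Illustrative sketch (standalone, not part of the class): the QR + SVD
# re-factorization used by `orthogonalize`, written with plain numpy so
# the algebra in the Notes can be checked directly.  The sizes and seed
# are arbitrary choices for the example.
def _sketch_qr_svd_compression():
    import numpy as np
    np.random.seed(0)
    N, m = 50, 6
    at = np.random.rand(N, m)                # |A>
    b = np.random.rand(m, N)                 # <B|
    sigma = np.diag(np.random.rand(m))
    qa, ra = np.linalg.qr(at)                # |A> = |Qa> ra
    qb, rb = np.linalg.qr(b.T)               # <B| = rb.T <Qb|
    u, d, vt = np.linalg.svd(np.dot(ra, np.dot(sigma, rb.T)))
    at_new = np.dot(qa, u)                   # new orthonormal |A>
    b_new = np.dot(vt, qb.T)                 # new orthonormal <B|
    # The product |A> sigma <B| is unchanged by the re-factorization;
    # truncating the smallest entries of `d` gives the compression.
    assert np.allclose(np.dot(at, np.dot(sigma, b)),
                       np.dot(at_new*d, b_new))
# ------------------------------------------------------------------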
def asarray(self):
"""Return the matrix.
Examples
--------
>>> M = DyadicSum()
>>> M.asarray()
array(1.0)
>>> M.add_dyad([1, 0], [1, 0])
>>> M.add_dyad([0, 2], [1, 0])
>>> M.add_dyad([3, 0], [0, 1])
>>> M.add_dyad([0, 4], [0, 1])
>>> M.asarray()
array([[ 2., 3.],
[ 2., 5.]])
"""
if len(self.b) == 0:
return np.array(1.0)
if 1 == len(self.sigma.shape):
M = dot(self.at*self.sigma, self.b)
else:
M = dot(self.at, dot(self.sigma, self.b))
return M + np.eye(*M.shape)
def diag(self, k=0):
r"""Return the diagonal of the matrix.
Examples
--------
>>> M = DyadicSum()
>>> M.diag()
array(1.0)
>>> M.add_dyad([1, 0], [1, 0])
>>> M.add_dyad([0, 2], [1, 0])
>>> M.add_dyad([3, 0], [0, 1])
>>> M.add_dyad([0, 4], [0, 1])
>>> M.diag()
array([ 2., 5.])
"""
if len(self.b) == 0:
return np.array(1.0)
b = self.b
if 1 == len(self.sigma.shape):
at = self.at*self.sigma
else:
at = dot(self.at, self.sigma)
na = at.shape[0]
nb = b.shape[1]
ka = max(0, -k)
kb = max(0, k)
n = min(na - ka, nb - kb)
d = (at[ka:ka+n, :].T*b[:, kb:kb+n]).sum(axis=0)
if 0 == k:
d += 1
return d
def __mul__(self, x):
r"""Matrix multiplication: Return self*x."""
if 0 == len(self.b):
res = x
else:
shape = x.shape
if 1 == len(shape):
x = x.reshape((len(x), 1))
if 1 == len(self.sigma.shape):
res = x + dot(self.at,
np.multiply(self.sigma[:, None],
dot(self.b, x)))
else:
res = x + dot(self.at, dot(self.sigma,
dot(self.b, x)))
res = res.reshape(shape)
return res
def __rmul__(self, x):
r"""Matrix multiplication: Return x*self."""
if 0 == len(self.b):
res = x
else:
shape = x.shape
if 1 == len(shape):
x = x.reshape((1, len(x)))
if 1 == len(self.sigma.shape):
res = x + dot(
np.multiply(dot(x, self.at),
self.sigma[None, :]), self.b)
else:
res = x + dot(dot(dot(x, self.at),
self.sigma), self.b)
res = res.reshape(shape)
return res
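# ------------------------------------------------------------------
# Illustrative sketch (standalone, not part of the class): applying
# ``J = 1 + |A> diag(sigma) <B|`` to a vector costs O(N*m) because the
# dense N x N matrix is never formed; this mirrors what `__mul__` above
# does.  Sizes and seed are arbitrary choices for the example.
def _sketch_dyadic_matvec():
    import numpy as np
    np.random.seed(0)
    N, m = 1000, 4
    at = np.random.rand(N, m)
    b = np.random.rand(m, N)
    sigma = np.random.rand(m)
    x = np.random.rand(N)
    y = x + np.dot(at, sigma*np.dot(b, x))           # factored product
    y_dense = np.dot(np.eye(N) + np.dot(at*sigma, b), x)
    assert np.allclose(y, y_dense)
# ------------------------------------------------------------------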
def __getitem__(self, key):
r"""Minimal support of two-dimensional indexing.
Examples
--------
>>> np.random.seed(1)
>>> d = DyadicSum()
>>> for n in xrange(10):
... d.add_dyad(np.random.rand(100), np.random.rand(100))
>>> inds = np.s_[[1, 2, 3, 5, 4, -98], 20:-2]
>>> np.allclose(d[inds],
... d.asarray()[inds])
True
>>> inds = np.array([1, 3, 4, 2])
>>> np.allclose(d[inds[:, None], inds[None, :]],
... d.asarray()[inds[:, None], inds[None, :]])
True
.. warning:: Assumes that the indexing is for extracting
sub-arrays. This is different from numpy.
>>> np.allclose(d[inds, inds],
... d.asarray()[inds, inds])
False
>>> np.allclose(d[inds, inds],
... d.asarray()[inds[:, None], inds[None, :]])
True
"""
if 2 == len(key):
at = self.at[key[0], :]
b = self.b[: , key[1]]
if 2 < len(b.shape):
b = np.rollaxis(b, 0, -1)
if 1 == len(self.sigma.shape):
res = dot(at*self.sigma, b).squeeze()
else:
res = dot(at, dot(self.sigma, b)).squeeze()
if res.shape == ():
res = res.reshape((1, 1))
ka = np.arange(self.at.shape[0])[key[0]].ravel()
kb = np.arange(self.b.shape[1])[key[1]].ravel()
res[np.where(ka[:, None] == kb[None, :])] += 1
else:
raise ValueError(
"% only supports two-dimensional indexing. Got %s" %
(self.__class__.__name__, repr(key)))
return res
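# ------------------------------------------------------------------
# Illustrative sketch (standalone, not part of the class): as the
# warning above notes, ``d[inds, inds]`` extracts the sub-matrix, i.e.
# it behaves like ``d.asarray()[np.ix_(inds, inds)]`` rather than like
# numpy's pointwise fancy indexing.
def _sketch_getitem_is_submatrix():
    import numpy as np
    np.random.seed(0)
    d = DyadicSum()
    d.add_dyad(np.random.rand(10), np.random.rand(10))
    inds = np.array([1, 3, 4, 2])
    assert np.allclose(d[inds, inds], d.asarray()[np.ix_(inds, inds)])
# ------------------------------------------------------------------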
class JinvExtendedBroydenFull(StateVars):
r"""Class to represent an inverse Jacobian for the Extended
Broyden method using full matrices (for small problems only).
"""
implements(IJinv)
_state_vars = [
('curr', None,\
r"""Index of current point in :attr`:x`. The Jacobian
was updated to be valid at this point. In the process of adding a
point, the current point may be pushed off of the stack (since only
:attr:`n_prev` points are kept. In this case, :attr:`curr` will be
``curr < -len(xs)`` (will only happen if :meth:`add_point` is
passed the flag `replace=False`."""),
('xs', [],\
r"List of previous steps $x_i$ (newest last)."),
('gs', [],\
r"List of previous objectives $g_i = G(x_i)$."),
('j_inv', None, r"""Current approximation."""),
('j_inv0', 1.0,\
r"""Convergent approximation $\mat{B}_0$ to pull back
to."""),
('method', 'J',\
r"""`'J'` for Broyden's good method 1a) and `'B'` for the \"bad\"
method 2b)."""),
('n_prev', 1000, r"""Number $n$ of previous steps to
keep and use."""),
('max_j_inv', 100.0,
"""If :attr:`robust`, then the inverse Jacobian is clipped
so that the maximum value is this."""),
('min_j_inv', 0.0001,
"""The inverse Jacobian can become singular. If this is
the case, then a step might have very small length even if
the error is not small (the step size can be reduced by a
factor of the minimum singular value of the inverse
Jacobian). If the step reduction is less than this value,
then the step is scaled up to have length
:attr:`min_step`."""),
('_eps', _EPS,\
r"""Two points in :attr:`xs` are considered equal if
the norm of their difference is less than this. No two
points in :attr:`xs` will have a distance less than
this."""),
('_x0', None, "Internal storage for :attr:`x0`"),
('_g0', None, "Internal storage for :attr:`g0`"),
]
process_vars()
def __init__(self, *v, **kw):
self.j_inv0 = np.asarray(self.j_inv0)
if self.j_inv is None:
self.j_inv = self.j_inv0
else:
self.j_inv = np.asarray(self.j_inv)
if 0 < len(self.xs):
N = len(self.xs[0])
if (self.j_inv.shape != (N, N) or
self.j_inv0.shape != (N, N)):
one = np.eye(N, N)
self.j_inv0 = np.dot(self.j_inv0, one)
self.j_inv = np.dot(self.j_inv, one)
@property
def x(self):
if self.xs != [] and self.curr is not None:
return self.xs[self.curr]
else:
return None
@property
def g(self):
if self.gs != [] and self.curr is not None:
return self.gs[self.curr]
else:
return None
@property
def x0(self):
if self._x0 is None:
return self.x
else:
return self._x0
@property
def g0(self):
if self._g0 is None:
return self.g
else:
return self._g0
def get_w(self, dXdX, dGdG, x_scale=1):
r"""Return weights `w, w_0, w_r`.
Parameters
----------
x_scale : array, optional
Typical scale over which `g(x)` is expected to be linear. This is
used to calculate the weights `w`:
.. math::
w_{mn} = \delta_{mn}
\frac{\texttt{x\_scale}_n}
{\norm{\vect{x} - \vect{x}_n}}.
Thus, this is greater than one for points where the approximation
should be linear and less than one elsewhere. This only really
affects the dynamic restart mechanism where `w_0=1`. In
this case, components where `w/w_0 > 1` will be maintained while
other components will be restarted.
Returns
-------
w : array
Array of weights for `dX`. This is $\mat{w}$.
w_r : array
Pull back weight pulling the Jacobian inverse back to the previous
approximation $\mat{B}_{-1}$. This is $w_r$ (assuming, for example,
$\mat{\Gamma} = w_r\uvect{G}_{\perp}$). As with the regular weights,
this only plays a role if we are employing a dynamic restart, in
which case the Jacobian is also pulled back toward the previous value
instead of to $\mat{B}_0$ for larger values of `w_r/w_0`.
Typically this is small so it only serves to project the inverse when
the other weights are zero.
"""
w = np.diag(
np.sqrt(np.divide(1.0,
np.diag(dXdX)/x_scale**2 + _SINGULAR_TOL)))
#w[-1,-1] = 1
#if 1 < len(w):
# w[-2, -2] = 1
#if 2 < len(w):
# w[-3, -3] = 1
#w = dXdX[-1,-1]/(np.diag(dXdX) + _SINGULAR_TOL)
w_r = np.sqrt(_SINGULAR_TOL)
return w, w_r
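# ------------------------------------------------------------------
# Illustrative sketch (standalone, not part of the class): the
# distance-based weights described in `get_w`.  Each previous point
# x_n is weighted by roughly x_scale/||x - x_n||, so nearby points
# (where the linear model should hold) dominate the update.  The
# regularizer 1e-12 below stands in for `_SINGULAR_TOL`.
def _sketch_distance_weights():
    import numpy as np
    np.random.seed(0)
    x = np.random.rand(5)
    xs = [np.random.rand(5) for _ in range(3)]
    x_scale = 1.0
    dX = np.array([x - x_n for x_n in xs]).T         # columns are x - x_n
    dXdX = np.dot(dX.T, dX)
    w = np.diag(np.sqrt(1.0/(np.diag(dXdX)/x_scale**2 + 1e-12)))
    expected = np.diag([x_scale/np.linalg.norm(x - x_n) for x_n in xs])
    assert np.allclose(w, expected, rtol=1e-5)
# ------------------------------------------------------------------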
def __mul__(self, dg, mutate=False):
r"""Return the matrix multiplication `j_inv*dg`.
Parameters
----------
dg : array
This should be in column vector format with dimension
`dg.shape = (N,_)`.
mutate : bool
If `True`, then `dg` will be mutated, otherwise a copy will
be returned. (Note: with the present implementation of
:func:`numpy.dot`, one copy of `dg` will be made).
"""
res = np.dot(self.j_inv, dg)
if mutate:
# Should use BLAS if we can... don't know how yet.
dg[::] = res
return res
def __rmul__(self, dxt, mutate=False):
r"""Return the matrix multiplication `dxt*j_inv`.
Parameters
----------
dxt : array
This should be in row vector format with dimension
`dg.shape = (_,N)`.
mutate : bool
If `True`, then `dxt` will be mutated, otherwise a copy will
be returned. (Note: with the present implementation of
:func:`numpy.dot`, one copy of `dxt` will be made).
"""
res = np.dot(dxt, self.j_inv)
if mutate:
# Should use BLAS if we can... don't know how yet.
dxt[::] = res
return res
def reset(self):
r"""Reset the Jacobian to a diagonal form."""
self.j_inv = self.j_inv0
def set_point(self, x, g, **kw):
r"""Add `x` and `g` and make this the current point `(x, g)` and update
the Jacobian to be valid here.
Parameters
----------
x, g : arrays
"""
x0 = self.x
g0 = self.g
x = np.asarray(x).ravel()
g = np.asarray(g).ravel()
self.curr = self.add_point(x, g)
self.update(**kw)
return self.incompatibility(x, g, x0, g0)
def add_point(self, x, g, replace=True):
r"""Add the point `(x, g)` to the Jacobian information. Does not
necessarily update the Jacobian (use :meth:`update`) but may. Does not
change the current point (use :meth:`set_point`).
Parameters
----------
x, g : arrays
replace : bool (optional)
If `True`, then any points in :attr:`xs` with distance less
than :attr:`_eps` from `x` (`inf` norm) are removed before
the new point is added. (Could be useful if points need to
be recomputed at a higher tolerance for example.)
If `False`, then new point is only added if there are no
points within distance :attr:`_eps` from `x` and a
:exc:`ValueError` exception is raised if `g` is not within
distance :attr:`_eps` from the corresponding value in
:attr:`gs`. The old points are not moved in this case.
"""
x = np.asarray(x).ravel()
g = np.asarray(g).ravel()
xs = self.xs
gs = self.gs
norms = np.array([abs(x - _x).max() for _x in self.xs])
inds = np.where(norms < self._eps)[0]
if 0 == len(inds):
xs.append(x)
gs.append(g)
curr = -1
else:
curr = inds[-1] - len(xs)
if replace:
for ind in reversed(inds):
del xs[ind]
del gs[ind]
xs.append(x)
gs.append(g)
curr = -1
else:
g_inds = [i for i in inds
          if abs(g - gs[i]).max() >= self._eps]
if 0 == len(g_inds):
pass
else:
raise ValueError(
("New `x` matches current `xs%s` but `g` does " +
"not match `gs%s`") % (str(inds), str(g_inds)))
self.suspend()
n_prev = self.n_prev
self.xs = self.xs[-n_prev:]
self.gs = self.gs[-n_prev:]
self.resume()
return curr
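# ------------------------------------------------------------------
# Illustrative sketch (standalone, not part of the class): `add_point`
# treats two points as identical when their infinity-norm distance is
# below `_eps`, so re-adding (essentially) the same `x` replaces the
# stored point instead of duplicating it.
def _sketch_add_point_deduplication():
    import numpy as np
    B = JinvExtendedBroydenFull()
    x = np.array([1.0, 2.0])
    B.add_point(x, np.array([0.1, 0.2]))
    B.add_point(x.copy(), np.array([0.1, 0.2]))      # within `_eps` of x
    assert len(B.xs) == 1
# ------------------------------------------------------------------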
def update(self, x0=None, g0=None,
skip_inds=None, include_skip=False, mags=1, warn=False,
restart=False, x_scale=1.0):
r"""Update the inverse Jacobian so that it is valid at the
point `(x, g)` (the current point by default). Does not necessarily add
`x` and `g` (use :meth:`add_point`) or set the current point (use
:meth:`set_point`) but may do either or both depending on the algorithm.
Return a measure of the incompatibility of the new data with
the current approximation. (Not implemented yet.)
Parameters
----------
restart : bool
If `True`, then the weight `w_0=1` is used to dynamically restart the
calculation by pulling the inverse Jacobian back to `B_0`.
Examples
--------
Here we perform some tests with a linear function
>>> np.random.seed(0)
>>> N = 4
>>> J = np.random.rand(N, N)
>>> x0 = np.random.rand(N)
>>> x = np.random.rand(N)
>>> def g(x):
... return np.dot(J, x - x0)
>>> B = JinvExtendedBroydenFull(method='J')
>>> B.add_point(x0, g(x0))
-1
>>> B.set_point(x, g(x))
0
>>> np.allclose(B*(g(x) - g(x0)), x - x0)
True
>>> for n in xrange(N):
... _x = np.random.rand(N)
... incompatibility = B.set_point(_x, g(_x));
>>> np.allclose(B.j_inv, np.linalg.inv(J))
True
>>> B = JinvExtendedBroydenFull(method='B')
>>> B.add_point(x0, g(x0))
-1
>>> B.set_point(x, g(x))
0
>>> np.allclose(B*(g(x) - g(x0)), x - x0)
True
>>> for n in xrange(N):
... _x = np.random.rand(N)
... incompatibility = B.set_point(_x, g(_x));
>>> np.allclose(B.j_inv, np.linalg.inv(J))
True
"""
if x0 is None:
x0 = self.x
g0 = self.g
self._x0 = None
self._g0 = None
else:
x0 = self._x0 = np.asarray(x0).ravel()
g0 = self._g0 = np.asarray(g0).ravel()
inds = np.where(np.array([abs(x0 - _x).max() for _x in self.xs])
>= self._eps)[0]
if 0 == len(inds):
# Can't update as there is no information
pass
else:
j_inv = np.asarray(self.j_inv)
j_inv0 = np.asarray(self.j_inv0)
dX = x0[:,None] - np.vstack(self.xs).T[:,inds]
dG = g0[:,None] - np.vstack(self.gs).T[:,inds]
one = np.eye(len(x0))
# get weights and apply
dXdX = np.dot(dX.T, dX)
dGdG = np.dot(dG.T, dG)
w, w_r = self.get_w(dGdG=dGdG, dXdX=dXdX, x_scale=x_scale)
if restart:
w_0 = 1.0
else:
w_0 = 0.0
dG = np.dot(dG, w)
dX = np.dot(dX, w)
BdG = np.dot(j_inv, dG)
numer = dX - BdG
# Form denominator and correct numerator
if 'J' == self.method:
# Broyden's good method
_U, _d, _Vt = np.linalg.svd(dX, full_matrices=True)
Gamma = _U[:,len(np.where(_d >= _SINGULAR_TOL)[0]):]
# j_inv*J0
_tmp = np.linalg.solve(j_inv0.T, j_inv.T).T
denom = (np.dot(BdG, dX.T)
+ w_r**2*np.dot(Gamma, Gamma.T)
+ w_0**2*_tmp)
numer = np.dot(numer, dX.T) + (one - _tmp)*w_0**2
else:
# Broyden's "bad" method
_U, _d, _Vt = np.linalg.svd(dG, full_matrices=True)
Gamma = _U[:,len(np.where(_d >= _SINGULAR_TOL)[0]):]
denom = (np.dot(dG, dG.T)
+ w_r**2*np.dot(Gamma, Gamma.T)
+ w_0**2*one)
numer = np.dot(numer, dG.T) + (j_inv0 - j_inv)*w_0**2
# Invert denominator and multiply using SVD. This is where the
# appropriate choice of Gamma must be made.
u, d, vt = np.linalg.svd(denom)
di = np.divide(1, d)
bad_inds = np.where(d < _SINGULAR_TOL)
if False:
# Subspace decomposition method
di[bad_inds] = 1
else:
# Subspace ignoring method???
di[bad_inds] = 0
denom_inv = np.dot(vt.T*di, u.T)
if 'J' == self.method:
# Broyden's "good" method
denom_inv = np.dot(denom_inv, j_inv)
j_inv_new = j_inv + np.dot(numer, denom_inv)
# Remove bad components.
u, d, vt = np.linalg.svd(j_inv_new)
bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL,
d > 1/_SINGULAR_TOL))
d[bad_inds] = 1
j_inv_new = np.dot(u*d, vt)
# Update variables
self.j_inv = j_inv_new
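# ------------------------------------------------------------------
# Illustrative sketch (standalone, not part of the class): the limit
# the update above approaches for a linear G.  Once N independent
# secant pairs (dX, dG) have been collected, the secant conditions
# dX = B dG determine B = J^{-1} exactly, which is what the doctests
# in `update` verify for the class itself.
def _sketch_multisecant_limit():
    import numpy as np
    np.random.seed(0)
    N = 4
    J = np.random.rand(N, N)
    x0 = np.random.rand(N)
    xs = [np.random.rand(N) for _ in range(N + 1)]
    gs = [np.dot(J, x - x0) for x in xs]
    dX = np.array([x - xs[-1] for x in xs[:-1]]).T   # N x N steps
    dG = np.array([g - gs[-1] for g in gs[:-1]]).T   # dG = J dX
    B = np.dot(dX, np.linalg.pinv(dG))
    assert np.allclose(B, np.linalg.inv(J))
# ------------------------------------------------------------------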
def incompatibility(self, x, g, x0, g0):
r"""Return a dimensionless measure of the compatibility of a step from
`x0` to `x` with the current Jacobian. An incompatibility of
0 means perfectly compatible (this is what should happen if
the function is linear and the Jacobian is correct along the
specified step).
"""
return 0
raise NotImplementedError
def asarray(self):
"""Return the Jacobian matrix.
Examples
--------
>>> M = JinvExtendedBroydenFull()
>>> M.asarray()
array(1.0)
"""
return np.array(self.j_inv)
def change_basis(self, f, f_inv=None):
if 0 == len(self.xs):
return
self.suspend()
# gs must be G(xs), can't use transformed value.
self.xs = []
self.gs = []
u, d, vt = np.linalg.svd(self.j_inv)
u = f(u)
if f_inv is None:
vt = f(vt.T).T
else:
vt = f_inv(vt)
self.j_inv = np.dot(u*d, vt)
u, d, vt = np.linalg.svd(self.j_inv0)
u = f(u)
if f_inv is None:
vt = f(vt.T).T
else:
vt = f_inv(vt)
self.j_inv0 = np.dot(u*d, vt)
self.resume()
def get_x(self, x_controls=None, min_step=_EPS, max_step=np.inf):
r"""Return `x`: the Broyden step to get to the root of `G(x)`.
This is the downhill Newton step from the point specified by
:attr:`curr` that satisfies the constraints `x_controls`.
Parameters
----------
x_controls : (array, array), optional
Tuple `(inds, vals)` indices and the respective values of the
control parameters.
min_step : float, optional
Minimum step length for testing. Used to give some
non-zero step `dx` of length ``sqrt(min_step)`` even if
`j_inv` is singular. This may be random. A warning will
be raised.
max_step : float, optional
Maximum length of the step.
Examples
--------
Here is a little test with a linear function
>>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]])
>>> j_inv = JinvExtendedBroydenFull(j_inv=np.linalg.inv(j))
>>> sol = np.array([1, 2, 3])
>>> def G(x):
... return np.dot(j, x - sol)
>>> x0 = np.array([0.0, 0.0, 0.0])
>>> x = np.array([1.0, 2.0, 3.0])
Now try to step to the solution, but hold the second variable as
a control at 1. Since the function is exactly linear and we
have explicitly given the correct Jacobian, we should end up
with a root in the other two parameters.
>>> j_inv.set_point(x=x, g=G(x))
0
>>> x = j_inv.get_x(x_controls=([1], [1.0]))
>>> x
array([ 1.5, 1. , 3.5])
>>> np.allclose(G(x), [0., 4.5, 0.])
True
This needs to work with very large arrays, so here is an
example. Here we use an example where the vectors have length
$2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly,
the arrays would be too large to be indexed.
>> np.random.seed(0)
>> n_x = 2**20
>> n_d = 2**11
>> a = np.random.random(n_x)
>> b = np.random.random(n_x)
>> g = np.random.random(n_x)
>> x = np.zeros(n_x)
>> inds = np.unique(np.random.randint(0, n_x, size=n_d))
>> n_d = len(inds)
>> vals = np.random.rand(n_d)
>> x_controls = (inds, vals)
>> j_inv = JinvExtendedBroydenFull(dyadic=True)
>> j_inv.j_inv.add_dyad(a, b)
>> j_inv.set_point(x=a, g=b)
0.0
>> x = j_inv.get_x(x_controls=x_controls)
>> np.allclose(x[inds], vals)
True
"""
if 0 == len(self.xs):
raise ValueError("No `xs` stored: call update first.")
x = self.x
g = self.g
j_inv = self.j_inv
if x_controls is not None and 0 < len(x_controls[0]):
inds, vals = x_controls
inds = np.asarray(inds, dtype=int)
vals = np.asarray(vals)
dg_dmu = copy.copy(g)
d = j_inv[inds[:, None], inds[None, :]]
dg_dmu[inds] = x[inds] - vals
dmu = _ket(dg_dmu[inds])
b = (np.asarray(j_inv*_ket(dg_dmu)))[inds, :]
dN = dmu - np.linalg.solve(d, b - dmu)
dg_dN = dg_dmu
dg_dN[inds] = dN[:]
dx_dmu = np.asarray(j_inv*_ket(dg_dN))
assert np.allclose(dx_dmu[inds], dmu)
# Set these to make sure no round-off errors have
# accumulated.
dx_dmu[inds].flat = x[inds] - vals
dx = -dx_dmu
else:
dx = -np.asarray(j_inv*_ket(g))
dx = np.asarray(dx).ravel()
g_mag = np.linalg.norm(g)
dx_mag = np.linalg.norm(dx)
if dx_mag == 0 and g_mag > 0:
warnings.warn("Warning: Singular Jinv, taking random step")
dx = np.random.rand(len(dx))*np.sqrt(min_step)
elif np.divide(dx_mag, g_mag) < self.min_j_inv:
warnings.warn("Warning: Singular Jinv, taking minimum step")
dx *= math.sqrt(min_step)/np.linalg.norm(dx)
dx_len = math.sqrt((dx*dx).sum())
if max_step < dx_len:
dx *= max_step/dx_len
return x + dx
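# ------------------------------------------------------------------
# Illustrative sketch (standalone, not part of the class): the
# unconstrained case of `get_x` amounts to a Newton step dx = -B.g
# whose length is clipped to `max_step`; the `x_controls` branch
# additionally holds selected components of `x` fixed (see the doctest
# above).  The matrices below are arbitrary choices for the example.
def _sketch_clipped_newton_step():
    import numpy as np
    np.random.seed(0)
    B = np.linalg.inv(np.random.rand(3, 3) + 3*np.eye(3))   # some B
    x, g = np.random.rand(3), np.random.rand(3)
    max_step = 0.1
    dx = -np.dot(B, g)
    dx_len = np.sqrt((dx*dx).sum())
    if max_step < dx_len:
        dx *= max_step/dx_len
    x_new = x + dx
    assert np.linalg.norm(x_new - x) <= max_step + 1e-12
# ------------------------------------------------------------------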
class _JinvExtendedBroyden(StateVars):
r"""Class to represent an inverse Jacobian for the Extended
Broyden method. Presently, this class does not do any of the
special processing to minimize the memory footprint described in
the module description.
We represent the inverse Jacobian as:
.. math::
\mat{B} = \mat{1} +
\ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}}
Notes
-----
Since python is implemented in C, the arrays are typically stored
as `C_CONTIGUOUS` and :func:`numpy.dot` is efficient if both
arguments are `C_CONTIGUOUS`, otherwise copies are made. Thus,
where one needs to compute $\braket{\mat{A}|\mat{B}}$, one should
store the transpose of the first matrix $\bra{\mat{A}}$.
To Do
-----
.. todo:: Reduce memory footprint. In particular, one can make
use of set of previous points $\ket{\mat{X}}$ and function
evaluations $\ket{\mat{G}}$ to reduce the space required by
the dyadic basis $\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$.
"""
implements(IJinv)
_state_vars = [
('X', Required,
r"$N\times n$ array $\ket{\mat{X}}$ of previous steps $x_i$."),
('G', Required,
r"$N\times n$ array $\ket{\mat{G}}$ of objectives $G(x_i)$."),
('_jinv', Delegate(DyadicSum)),
('_b', None,
r"Dyadic portion of the inverse Jacobian in the implied basis."),
('method', 1,
r"""`1` for Broyden's good method and `2` for the \"bad\"
method."""),
('n_prev_steps', np.inf, r"""Number $n$ of previous steps to
keep and use."""),
('_check', True, r"""If `True`, perform some potentially
expensive sanity checks."""),
]
process_vars()
def __init__(self, *v, **kw):
N, n = self.X.shape
assert self.G.shape == self.X.shape
if self._alpha is None:
self._alpha = np.zeros((N, 0), dtype=float)
self._betat = np.zeros((0, N), dtype=float)
m = self._alpha.shape[1]
assert m == self._betat.shape[0]
assert self._alpha.shape == (N, m)
assert self._betat.shape == (m, N)
if self._check:
# Check orthonormality of bases
assert np.allclose(np.dot(self._betat, self._alpha),
np.eye(m))
if self._b is None:
self._b = np.zeros((m, m), dtype=float)
assert self._b.shape == (m, m)
def __mul__(self, dg, mutate=False):
r"""Return the matrix multiplication `B*dg`.
Parameters
----------
dg : array
This should be in column vector format with dimension
`dg.shape = (N,_)`.
mutate : bool
If `True`, then `dg` will be mutated, otherwise a copy will
be returned. (Note: with the present implementation of
:func:`numpy.dot`, one copy of `dg` will be made).
Notes
-----
.. math::
\mat{B}\ket{\d{g}} = \ket{\d{g}} +
\begin{pmatrix}
\ket{G - X}\mat{q}_{a} &
\ket{\uvect{a}}
\end{pmatrix}
\mat{b}
\begin{pmatrix}
\mat{q}_{b}\bra{X,G}\\
\bra{\uvect{b}}
\end{pmatrix}\ket{\d{g}}
"""
dg = np.asarray(dg)
if 1 == len(dg.shape):
dg = dg[:, None]
na = self._qa.shape[1]
ma = self._a.shape[1]
if 1 == self.method:
# The default dot copies dg here, so hopefully it is small.
XGdg = dot(dg.T, self.X).T
else:
assert 2 == self.method
XGdg = dot(dg.T, self.G).T
tmp = dot(self._b, np.vstack([dot(self._qb, XGdg),
dot(self._bt, dg)]))
tmp1 = dot(self._qa, tmp[:na,:])
tmp2 = tmp[na:,:]
if mutate:
res = dg
else:
res = 1*dg
res += np.dot(self.G, tmp1)
res -= np.dot(self.X, tmp1)
res += np.dot(self._a, tmp2)
return res
def __rmul__(self, dxt, mutate=False):
r"""Return the matrix multiplication `dxt*B`.
Parameters
----------
dxt : array
This should be in row vector format with dimension
`dg.shape = (_,N)`.
mutate : bool
If `True`, then `dxt` will be mutated, otherwise a copy will
be returned. (Note: with the present implementation of
:func:`numpy.dot`, one copy of `dxt` will be made).
Notes
-----
.. math::
\bra{\d{x}}\mat{B} = \bra{\d{x}} +
\bra{\d{x}}
\begin{pmatrix}
\ket{G - X}\mat{q}_{a} &
\ket{\uvect{a}}
\end{pmatrix}
\mat{b}
\begin{pmatrix}
\mat{q}_{b}\bra{X,G}\\
\bra{\uvect{b}}
\end{pmatrix}
"""
nb = self._qb.shape[0]
mb = self._bt.shape[0]
tmp = dot(np.hstack([dot(dot(dxt, self.G) - dot(dxt, self.X),
self._qa),
dot(dxt, self._a)]), self._b)
tmp1 = dot(tmp[:, :nb], self._qb)
tmp2 = tmp[:, nb:]
if 1 == self.method:
# The default dot copies dg here, so hopefully it is small.
tmp1 = dot(self.X, tmp1.T).T
else:
assert 2 == self.method
tmp1 = dot(self.G, tmp1.T).T
if mutate:
res = dxt
else:
res = 1*dxt
res += tmp1
res += np.dot(tmp2, self._bt)
return res
def update(self, x0=None, g0=None):
r"""Update the inverse Jacobian so that it is valid at the
point `(x, g)`.
Return a measure of the incompatibility of the new data with
the current approximation.
Examples
--------
Here we perform some tests with a linear function
>>> np.random.seed(0)
>>> N = 4
>>> J = np.random.rand(N, N)
>>> x0 = np.random.rand(N)
>>> def g(x):
... return np.dot(J, x - x0)
>>> B = _JinvExtendedBroyden(X=np.zeros((N,1)),
... G=g(0*x0)[:,None])
>>> x = np.random.rand(N)
>>> B.update(x, g(x))
>>> np.allclose(B*(g(x) - g(x0)), x - x0)
True
>>> for n in xrange(N):
... x = np.random.rand(N)
... B.update(x, g(x))
Notes
-----
Our goal here is to minimize the amount of processing of large
matrices. As such, we try to work with projections as much as
possible. We start with the following large matrices:
$\ket{X}$, $\ket{G}$, $\ket{x}$, $\ket{g}$, $\ket{a_0}$,
$\ket{b_0}$.
We use the following:
.. math::
\ket{\delta\mat{x}} &= \ket{x} - \ket{X_0}, \qquad
\ket{\delta\mat{g}} = \ket{g} - \ket{G_0},\\
\ket{\d\mat{X}} &= \ket{\mat{X}} - \ket{x}, \qquad
\ket{\d\mat{G}} = \ket{\mat{G}} - \ket{g},\\
\ket{\uvect{\alpha}_0} &= \begin{pmatrix}
\ket{\mat{G} - \mat{X}}\mat{q}_{a} &
\ket{\mat{A}_0}
\end{pmatrix},\\
\bra{\uvect{\beta}_0} &= \begin{pmatrix}
\mat{q}_{b}\bra{\mat{X}, \mat{G}} \\
\bra{\mat{B}_0}
\end{pmatrix},\\
\ket{\uvect{\alpha}} &= \begin{pmatrix}
\ket{\uvect{\alpha}_0} &
\ket{\uvect{\alpha}_1}
\end{pmatrix},\\
\bra{\uvect{\beta}} &= \begin{pmatrix}
\bra{\uvect{\beta}_0}\\
\bra{\uvect{\beta}_1}
\end{pmatrix},\\
\ket{\uvect{\alpha}_1} &=
\ket{\uvect{\alpha}_0}\mat{c}_{a} +
\ket{\delta{g} - \delta{x}}\mat{d}_{a},\\
\bra{\uvect{\beta}_1} &=
\mat{c}_{b}^{T}\bra{\uvect{\beta}_0} +
\mat{d}_{b}^{T}\bra{\delta{x}, \delta{g}},\\
\mat{d}_{a} &= \mat{v}_{a}\mat{s}_{a}^{-1},\qquad
\mat{c}_{a} =
-\braket{\uvect{\alpha}_0|\delta{g} - \delta{x}}
\mat{v}_{a}\mat{s}_{a}^{-1},\\
\mat{d}_{b} &= \mat{v}_{b}\mat{s}_{b}^{-1},\qquad
\mat{c}_{b} =
-\braket{\uvect{\beta}_0|\delta{x},\delta{g}}
\mat{v}_{b}\mat{s}_{b}^{-1},\\
\mat{v}_{a}\mat{s}_{a}^2\mat{v}_{a}^{T} &=
\braket{\delta{g} - \delta{x}|\delta{g} - \delta{x}}
- \braket{\delta{g} - \delta{x}|\uvect{\alpha}_0}
\braket{\uvect{\alpha}_0|\delta{g} - \delta{x}},\\
\mat{v}_{b}\mat{s}_{b}^2\mat{v}_{b}^{T} &=
\braket{\delta{x},\delta{g}|\delta{x},\delta{g}}
- \braket{\delta{x},\delta{g}|\uvect{\beta}_0}
\braket{\uvect{\beta}_0|\delta{x},\delta{g}},\\
\braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}} &= \begin{pmatrix}
\braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_0} &
\braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_1}
\end{pmatrix},\\
\braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0} &= \begin{pmatrix}
\braket{\d\mat{G} - \d\mat{X}|\mat{G} - \mat{X}}\mat{q}_{a} &
\braket{\d\mat{G} - \d\mat{X}|\mat{A}_0}
\end{pmatrix},\\
\braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_1} &=
\braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0}\mat{c}_{a} +
\braket{\d\mat{G} - \d\mat{X}|\delta{g}-\delta{x}}\mat{d}_{a},\\
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}} &= \begin{pmatrix}
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} &
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1}
\end{pmatrix} ,\\
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} &= \begin{pmatrix}
\braket{\d\mat{X},\d\mat{G}|\mat{X},\mat{G}}\mat{q}_{b}^{T} &
\braket{\d\mat{X},\d\mat{G}|\mat{B}_0}
\end{pmatrix},\\
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} &=
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0}\mat{c}_{b} +
\braket{\d\mat{X},\d\mat{G}|\delta{x},\delta{g}}\mat{d}_{b},\\
\braket{\uvect{\alpha}|\uvect{\alpha}_0}
&= \braket{\uvect{\beta}|\uvect{\beta}_0} = \begin{pmatrix}
\mat{1}\\ \mat{0}
\end{pmatrix},\\
\braket{\beta|\alpha} &=
\begin{pmatrix}
\mat{q}_{b}\braket{\mat{X}, \mat{G}|
\mat{G} - \mat{X}}\mat{q}_{a} &
\mat{q}_{b}\braket{\mat{X}, \mat{G}|
\mat{G} - \mat{X}}\mat{q}_{a} \\
\end{pmatrix}
"""
# First we cast all of the arrays as matrices so we can
# use * as matrix multiplication. Large matrices are
# preceded by an underscore.
x = np.asarray(x0)
g = np.asarray(g0)
if 1 == len(x.shape):
x = x[:,None]
if 1 == len(g.shape):
g = g[:,None]
_X = np.asmatrix(self.X)
_G = np.asmatrix(self.G)
_b0 = np.asmatrix(self._bt.T)
_a0 = np.asmatrix(self._a)
qa = np.asmatrix(self._qa)
qb = np.asmatrix(self._qb)
# We assume that g is only one column, so a copy here is okay.
_dg = np.asmatrix(g - _G[:,0])
_dx = np.asmatrix(x - _X[:,0])
_dgdx = _dg - _dx
# Consider optimizing these copies away later
_dG = np.asmatrix(_G - g)
_dX = np.asmatrix(_X - x)
_dGdX = _dG - _dX
_GX = np.asmatrix(_G - _X)
if 1 == self.method:
_dxg = _dx
_XG = np.asmatrix(_X)
_dXG = _dX
else:
_dxg = _dg
_XG = np.asmatrix(_G)
_dXG = _dG
# These are all of the brackets required.
dXdX = _dX.T*_dX
dGdG = _dG.T*_dG
dXG_XG = _dXG.T*_XG
dXG_dxg = _dXG.T*_dxg
dXG_b0 = _dXG.T*_b0
dxg_XG = _dxg.T*_XG
dxg_GX = _dxg.T*_GX
dxg_dxg = _dxg.T*_dxg
dxg_b0 = _dxg.T*_b0
dxg_a0 = _dxg.T*_a0
dGdX_a0 = _dGdX.T*_a0
GX_a0 = _GX.T*_a0
XG_GX = _XG.T*_GX
X_a0 = _X.T*_a0
g_a0 = g.T*_a0
x_a0 = x.T*_a0
XG_a0 = _XG.T*_a0
b0_a0 = _b0.T*_a0
GX_b0 = _GX.T*_b0
dGdX_GX = _dGdX.T*_GX
dGdX_dgdx = _dGdX.T*_dgdx
dgdx_GX = _dgdx.T*_GX
dgdx_XG = _dgdx.T*_XG
dgdx_a0 = _dgdx.T*_a0
dxg_dgdx = _dxg.T*_dgdx
dgdx_dgdx = _dgdx.T*_dgdx
dgdx_b0 = _dgdx.T*_b0
# Now we form the small matrices
dxg_beta0 = np.hstack([dxg_XG*qb.T, dxg_b0])
ub, sb, _ = np.linalg.svd(dxg_dxg - dxg_beta0*dxg_beta0.T)
# Only keep significant basis vectors
inds = np.where(sb >= _SINGULAR_TOL)[0]
sb = sb[inds]
ub = ub[:, inds]
db = np.asmatrix(ub/(np.sqrt(sb)[None, :] + _SINGULAR_TOL**2))
cb = -dxg_beta0.T*db
dXG_beta0 = np.hstack([dXG_XG*qb.T, dXG_b0])
dXG_beta1 = dXG_beta0*cb + dXG_dxg*db
dXG_beta = np.hstack([dXG_beta0, dXG_beta1])
dGdX_a0 = GX_a0 - (g_a0 - x_a0)
dgdx_alpha0 = np.hstack([dgdx_GX*qa, dgdx_a0])
ua, sa, _ = np.linalg.svd(dgdx_dgdx -
dgdx_alpha0*dgdx_alpha0.T)
# Only keep significant basis vectors (assuming everything has
# been scaled to be of unit norm here...)
inds = np.where(sa >= _SINGULAR_TOL)[0]
sa = sa[inds]
ua = ua[:, inds]
da = np.asmatrix(ua/(np.sqrt(sa)[None, :] + _SINGULAR_TOL**2))
ca = -dgdx_alpha0.T*da
dGdX_alpha0 = np.hstack([dGdX_GX*qa, dGdX_a0])
dGdX_alpha1 = dGdX_alpha0*ca + dGdX_dgdx*da
dGdX_alpha = np.hstack([dGdX_alpha0, dGdX_alpha1])
# the basis vectors alpha and beta are stacked from three sets
# with sizes n, m0 and (m-m0)
N, n_ = self.X.shape
na0, na = self._qa.shape
nb, nb0 = self._qb.shape
ma0 = self._a.shape[1] # Previous size of aux space
mb0 = self._bt.shape[0]
ma = ma0 + ca.shape[1] # New size of aux space
mb = mb0 + cb.shape[1]
eye0 = np.asmatrix(np.eye(na + ma0, nb + mb0, dtype=float))
eye1 = np.asmatrix(np.eye(na + ma, nb + mb, dtype=float))
alpha_alpha0 = np.asmatrix(np.eye(na + ma, na + ma0, dtype=float))
beta_beta0 = np.asmatrix(np.eye(nb + mb, nb + mb0, dtype=float))
beta0_dgdx = np.vstack([qb*dgdx_XG.T,
dgdx_b0.T])
dxg_alpha0 = np.hstack([dxg_GX*qa, dxg_a0])
beta0_alpha0 = np.bmat(
[[qb*XG_GX*qa, qb*XG_a0],
[GX_b0.T*qa, b0_a0]])
beta0_alpha1 = beta0_alpha0*ca + beta0_dgdx*da
beta1_alpha0 = cb.T*beta0_alpha0 + db.T*dxg_alpha0
beta1_alpha1 = (cb.T*(beta0_alpha0*ca + beta0_dgdx*da) +
db.T*(dxg_alpha0*ca + dxg_dgdx*da))
beta_alpha = np.bmat([[beta0_alpha0, beta0_alpha1],
[beta1_alpha0, beta1_alpha1]])
############################################################
# Now we have the bases and all the required matrices, so we
# do the real work.
def inv(A):
if 0 == np.prod(A.shape):
# Empty matrix needs a degenerate case
return A.T
return np.linalg.pinv(A, rcond=_SINGULAR_TOL**2)
w, w0, wr = self.get_w(dXdX, dGdG)
if w is not None:
w = np.asarray(w)
if 1 == len(w.shape):
dGdX_alpha *= w[:, None]
dXG_beta *= w[:, None]
else:
dGdX_alpha = dot(w.T, dGdX_alpha)
dXG_beta = dot(w.T, dXG_beta)
if 1 == self.method:
num = dGdX_alpha.T*dXG_beta
if self._b is not None:
j0 = -inv(inv(self._b) + beta0_alpha0)
num += w0*w0*alpha_alpha0*j0*beta_beta0.T
denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb)
j = num*inv(denom)
b = -inv(inv(j) + beta_alpha)
elif 2 == self.method:
num = -dGdX_alpha.T*dXG_beta
if self._b is not None:
num += w0*w0*alpha_alpha0*self._b*beta_beta0.T
denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb)
b = num*inv(denom)
# Set singular values to zero so they don't contribute.
if 0 == np.prod(b.shape):
# There is no dyadic contribution.
b = None
_a = None
_bt = None
else:
u, d, vt = np.linalg.svd(b, full_matrices=False)
bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL,
1/_SINGULAR_TOL < d))[0]
d[bad_inds] = 0.0
b = dot(np.multiply(u,d[None, :]), vt)
# Now that we have b, we perform an svd in the auxiliary basis to
# determine what basis vectors to keep.
qa_, qbt_ = self._restrict(b, na, nb)
b[na:, nb:] = qa_*b[na:, nb:]*qbt_
# na + ma0 ma - ma0
# |alpha> = [alpha0, alpha0*ca + dgdx*da]
# = [GX*qa, [a0, [GX*qa, a0]*ca + dgdx*da]]
# = [GX*qa, A_new]
# = [GX*qa, [GX, a0, dgdx]*A]
# |A_new> = [GX, a0, dgdx]*A
# A_[na_ + ma0 + (ma - ma0), ma = ma0 + (ma-ma0)]
# A = [[0, qa*ca]
# [1, ca]
# [0, da]]
A = np.bmat(
[[np.zeros((na0, ma0)), qa*ca[:na,:]],
[np.eye(ma0, ma0), ca[na:,:]],
[np.zeros((da.shape[0], ma0)), da]])
Bt = np.bmat(
[[np.zeros((nb0, mb0)), qb.T*cb[:nb,:]],
[np.eye(mb0, mb0), cb[nb:,:]],
[np.zeros((db.shape[0], mb0)), db]]).T
A = A*qa_.T
Bt = qbt_.T*Bt
if ma > self.max_aux_basis_size:
ma = self.max_aux_basis_size
A = A[:, :ma]
b = b[:ma, :]
if mb > self.max_aux_basis_size:
mb = self.max_aux_basis_size
Bt = Bt[:mb, :]
b = b[:, :mb]
_a = np.hstack([_GX, _a0, _dgdx])*A
_bt = Bt*np.vstack([_XG.T,
_b0.T,
_dxg.T])
self.suspend()
self._a = _a
self._bt = _bt
self._b = b
# These are not needed: just for checking and debugging
_alpha0 = np.hstack([_GX*qa, _a0])
_alpha1 = _alpha0*ca + _dgdx*da
_alpha = np.hstack([_alpha0, _alpha1])
self.resume()
def get_w(self, dXdX, dGdG):
r"""Return weights `w, w0, wr`.
Returns
-------
w : array
Array of weights for `dx`.
"""
N, n_ = self.X.shape
ma = self._a.shape[1]
mb = self._bt.shape[0]
na = self._qa.shape[1]
nb = self._qb.shape[0]
w = np.eye(dXdX.shape[0])
w0 = _SINGULAR_TOL
wr = 0
return w, w0, wr
def _restrict(self, b, na=0, nb=0):
r"""Return unitary matrices `(qa, qb.T)` which can be used to
transform `b` so that truncating the matrix will introduce
minimal errors. The first `n0` rows and columns of `b` will
be held fixed.
The present implementation is an approximation: it is not
optimal.
Parameters
----------
b : array of shape (n,n)
Array to be rearranged and truncated.
na, nb : int
Number of rows and columns to be held fixed. If this is
`0`, then the SVD will be used: ``qa, d, qb.T = svd(b)``.
"""
ua, da, vat = np.linalg.svd(b[na:, :])
ub, db, vbt = np.linalg.svd(b[:, nb:])
return ua, vbt
@staticmethod
def _ortho(a0, b):
r"""Return `(A, B, s)` such that
`a1 = dot(a0, A) + dot(b, B)` is orthonormal and
orthogonal to `a0`.
Parameters
----------
a0 : array
The columns of this must be orthonormal.
b : array
Need not be orthonormal.
Returns
-------
A, B : array
s : array
This is the list of singular values for the
transformation. If these are too small, then the
corresponding dimensions should be dropped. I.e. use only
`a1[:, np.where(tol < s)[0]]`
Examples
--------
>>> np.random.seed(0)
>>> a0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5]
>>> b = np.random.rand(10, 2)
>>> A, B, s = _JinvExtendedBroyden._ortho(a0, b)
>>> a1 = np.dot(a0, A) + np.dot(b, B)
>>> np.allclose(np.dot(a1.T, a0), 0)
True
>>> np.allclose(np.dot(a1.T, a1), np.eye(2))
True
Notes
-----
If `a0` and `b` are big (in the first dimension), then it is
best if `a0` is `F_CONTIGUOUS` (i.e. the transpose of a
`C_CONTIGUOUS` array) and `b` is `C_CONTIGUOUS` to avoid
:func:`np.dot` from making unneeded copies. (One copy will
always be made when computing `dot(b.T, b)`).
Uses the relationship
.. math::
\braket{\beta|\beta} -
\braket{\beta|\uvect{\alpha}_{0}}
\braket{\uvect{\alpha}_{0}|\beta}
&= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\
\begin{pmatrix}
\ket{\uvect{\alpha}_{0}} &
\ket{\uvect{\alpha}_{1}}
\end{pmatrix} &=
\begin{pmatrix}
\ket{\uvect{\alpha}_{0}} &
\ket{\beta}
\end{pmatrix}
\cdot
\begin{pmatrix}
\mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v}
\mat{\sigma}^{-1}\\
\mat{0} & \mat{v}\mat{\sigma}^{-1}
\end{pmatrix}.
"""
a0_b = dot(a0.T, b)
s2, v = eigh(dot(b.T, b) - dot(a0_b.T, a0_b))
s = np.sqrt(s2)
A = -np.dot(a0_b, v/s[None, :])
B = v/s[None, :]
return (A, B, s)
def asarray(self):
"""Return the Jacobian matrix.
Examples
--------
>>> M = _JinvExtendedBroyden()
>>> M.asarray()
array(1.0)
"""
if np.prod(self._b.shape) == 0:
return np.array(1.0)
_alpha = np.hstack([dot(self.G - self.X, self._qa), self._a])
if 1 == self.method:
_XG = self.X
else:
_XG = self.G
_betat = np.vstack([dot(self._qb, _XG.T),
self._bt])
M = dot(dot(_alpha, self._b), _betat)
return M + np.eye(self.X.shape[0], self.G.shape[0], dtype=float)
class _JinvExtendedBroydenMemoryLimited(StateVars):
r"""Class to represent an inverse Jacobian for the Extended
Broyden method.
We represent the inverse Jacobian as:
.. math::
\mat{B} &= \mat{1} + \ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}},\\
\ket{\uvect{\alpha}} &= \begin{pmatrix}
\ket{G - X}\mat{q}_{a} & \ket{\uvect{a}}\end{pmatrix},\\
\bra{\uvect{\beta}} &= \begin{pmatrix}
\mat{q}_{b}\bra{X,G},\\
\bra{\uvect{b}}\end{pmatrix},
where the choice of $\ket{X}$ (method 1) or $\ket{G}$ (method 2)
for $\ket{\uvect{\beta}}$ depends on the method.
Notes
-----
Since python is implemented in C, the arrays are typically stored
as `C_CONTIGUOUS` and :func:`numpy.dot` is efficient if both
arguments are `C_CONTIGUOUS`, otherwise copies are made. Thus,
where one needs to compute $\braket{\mat{A}|\mat{B}}$, one should
store the transpose of the first matrix $\bra{\mat{A}}$.
To Do
-----
.. todo:: Add support for non-square matrices. This is only a
bookkeeping issue, but probably not that useful so maybe leave
for a while.
"""
implements(IJinv)
_state_vars = [
('X', Required,
r"$N\times n$ array $\ket{\mat{X}}$ of previous steps $x_i$."),
('G', Required,
r"$N\times n$ array $\ket{\mat{G}}$ of objectives $G(x_i)$."),
('_a', None,
r"$N\times m$ array of extra range basis vectors."),
('_bt', None, r"""$m\times N$ array of extra domain basis
vectors stored in row-vector format for improved
:func:`numpy.dot` performance."""),
('_qa', None, r"""Transformation $\mat{q}_{a}$ such that
$(\ket{G}-\ket{X})\mat{q}_{a}$ is the first part of the
basis for the range."""),
('_qb', None, r"""Transformation $\mat{q}_{b}$ such that
$\mat{q}_{b}*\bra{X,G}$ is the first part of the basis for
the domain."""),
('_b', None,
r"Dyadic portion of the inverse Jacobian in the implied basis."),
('method', 1,
r"""`1` for Broyden's good method and `2` for the \"bad\"
method."""),
('max_aux_basis_size', np.inf, r"""Maximum number of extra
basis states to store for representing the inverse
Jacobian."""),
('_check', True, r"""If `True`, perform some potentially
expensive sanity checks."""),
]
process_vars()
def __init__(self, *v, **kw):
N, n_ = self.X.shape
assert self.G.shape == self.X.shape
if self._a is None:
self._a = np.zeros((N, 0), dtype=float)
self._bt = np.zeros((0, N), dtype=float)
ma = self._a.shape[1]
mb = self._bt.shape[0]
assert self._a.shape == (N, ma)
assert self._bt.shape == (mb, N)
if self._qa is None:
_GX = self.G - self.X
GXGX = np.dot(_GX.T, _GX)
del _GX
u, d, vt = np.linalg.svd(GXGX)
inds = np.where(d/d.max() > _SINGULAR_TOL)[0]
self._qa = u[:,inds]/np.sqrt(d[inds])[None,:]
na = self._qa.shape[1]
assert self._qa.shape == (n_, na)
if self._qb is None:
if 1 == self.method:
_XG = self.X
else:
_XG = self.G
XGXG = np.dot(_XG.T, _XG)
del _XG
u, d, vt = np.linalg.svd(XGXG)
inds = np.where(d/d.max() > _SINGULAR_TOL)[0]
self._qb = (u[:,inds]/np.sqrt(d[inds])[None,:]).T
nb = self._qb.shape[0]
assert self._qb.shape == (nb, n_)
if self._check:
# Check orthonormality of bases
qa = np.asmatrix(self._qa)
_GX = self.G - self.X
GXGX = np.asmatrix(np.dot(_GX.T, _GX))
GXa0 = np.dot(_GX.T, self._a)
a0a0 = np.dot(self._a.T, self._a)
del _GX
assert np.allclose(qa.T*GXGX*qa, np.eye(na))
assert np.allclose(GXa0, 0)
assert np.allclose(a0a0, np.eye(ma))
qb = np.asmatrix(self._qb)
if 1 == self.method:
_XG = self.X
else:
_XG = self.G
b0XG = np.dot(self._bt, _XG)
b0b0 = np.dot(self._bt, self._bt.T)
XGXG = np.asmatrix(np.dot(_XG.T, _XG))
del _XG
assert np.allclose(qb*XGXG*qb.T, np.eye(nb))
assert np.allclose(b0XG, 0)
assert np.allclose(b0b0, np.eye(mb))
if self._b is None:
self._b = np.zeros((na + ma, nb + mb), dtype=float)
assert self._b.shape == (na + ma, nb + mb)
def __mul__(self, dg, mutate=False):
r"""Return the matrix multiplication `B*dg`.
Parameters
----------
dg : array
This should be in column vector format with dimension
`dg.shape = (N,_)`.
mutate : bool
If `True`, then `dg` will be mutated, otherwise a copy will
be returned. (Note: with the present implementation of
:func:`numpy.dot`, one copy of `dg` will be made).
Notes
-----
.. math::
\mat{B}\ket{\d{g}} = \ket{\d{g}} +
\begin{pmatrix}
\ket{G - X}\mat{q}_{a} &
\ket{\uvect{a}}
\end{pmatrix}
\mat{b}
\begin{pmatrix}
\mat{q}_{b}\bra{X,G}\\
\bra{\uvect{b}}
\end{pmatrix}\ket{\d{g}}
"""
dg = np.asarray(dg)
if 1 == len(dg.shape):
dg = dg[:, None]
na = self._qa.shape[1]
ma = self._a.shape[1]
if 1 == self.method:
# The default dot copies dg here, so hopefully it is small.
XGdg = dot(dg.T, self.X).T
else:
assert 2 == self.method
XGdg = dot(dg.T, self.G).T
tmp = dot(self._b, np.vstack([dot(self._qb, XGdg),
dot(self._bt, dg)]))
tmp1 = dot(self._qa, tmp[:na,:])
tmp2 = tmp[na:,:]
if mutate:
res = dg
else:
res = 1*dg
res += np.dot(self.G, tmp1)
res -= np.dot(self.X, tmp1)
res += np.dot(self._a, tmp2)
return res
def __rmul__(self, dxt, mutate=False):
r"""Return the matrix multiplication `dxt*B`.
Parameters
----------
dxt : array
This should be in row vector format with dimension
`dg.shape = (_,N)`.
mutate : bool
If `True`, then `dxt` will be mutated, otherwise a copy will
be returned. (Note: with the present implementation of
:func:`numpy.dot`, one copy of `dxt` will be made).
Notes
-----
.. math::
\bra{\d{x}}\mat{B} = \bra{\d{x}} +
\bra{\d{x}}
\begin{pmatrix}
\ket{G - X}\mat{q}_{a} &
\ket{\uvect{a}}
\end{pmatrix}
\mat{b}
\begin{pmatrix}
\mat{q}_{b}\bra{X,G}\\
\bra{\uvect{b}}
\end{pmatrix}
"""
nb = self._qb.shape[0]
mb = self._bt.shape[0]
tmp = dot(np.hstack([dot(dot(dxt, self.G) - dot(dxt, self.X),
self._qa),
dot(dxt, self._a)]), self._b)
tmp1 = dot(tmp[:, :nb], self._qb)
tmp2 = tmp[:, nb:]
if 1 == self.method:
# The default dot copies dg here, so hopefully it is small.
tmp1 = dot(self.X, tmp1.T).T
else:
assert 2 == self.method
tmp1 = dot(self.G, tmp1.T).T
if mutate:
res = dxt
else:
res = 1*dxt
res += tmp1
res += np.dot(tmp2, self._bt)
return res
def update(self, x0=None, g0=None):
r"""Update the inverse Jacobian so that it is valid at the
point `(x, g)`.
Return a measure of the incompatibility of the new data with
the current approximation.
Examples
--------
Here we perform some tests with a linear function
>>> np.random.seed(0)
>>> N = 4
>>> J = np.random.rand(N, N)
>>> x0 = np.random.rand(N)
>>> def g(x):
... return np.dot(J, x - x0)
>>> B = _JinvExtendedBroyden(X=np.zeros((N,1)),
... G=g(0*x0)[:,None])
>>> x = np.random.rand(N)
>>> B.update(x, g(x))
>>> np.allclose(B*(g(x) - g(x0)), x - x0)
True
>>> for n in xrange(N):
... x = np.random.rand(N)
... B.update(x, g(x))
Notes
-----
Our goal here is to minimize the amount of processing of large
matrices. As such, we try to work with projections as much as
possible. We start with the following large matrices:
$\ket{X}$, $\ket{G}$, $\ket{x}$, $\ket{g}$, $\ket{a_0}$,
$\ket{b_0}$.
We use the following:
.. math::
\ket{\delta\mat{x}} &= \ket{x} - \ket{X_0}, \qquad
\ket{\delta\mat{g}} = \ket{g} - \ket{G_0},\\
\ket{\d\mat{X}} &= \ket{\mat{X}} - \ket{x}, \qquad
\ket{\d\mat{G}} = \ket{\mat{G}} - \ket{g},\\
\ket{\uvect{\alpha}_0} &= \begin{pmatrix}
\ket{\mat{G} - \mat{X}}\mat{q}_{a} &
\ket{\mat{A}_0}
\end{pmatrix},\\
\bra{\uvect{\beta}_0} &= \begin{pmatrix}
\mat{q}_{b}\bra{\mat{X}, \mat{G}} \\
\bra{\mat{B}_0}
\end{pmatrix},\\
\ket{\uvect{\alpha}} &= \begin{pmatrix}
\ket{\uvect{\alpha}_0} &
\ket{\uvect{\alpha}_1}
\end{pmatrix},\\
\bra{\uvect{\beta}} &= \begin{pmatrix}
\bra{\uvect{\beta}_0}\\
\bra{\uvect{\beta}_1}
\end{pmatrix},\\
\ket{\uvect{\alpha}_1} &=
\ket{\uvect{\alpha}_0}\mat{c}_{a} +
\ket{\delta{g} - \delta{x}}\mat{d}_{a},\\
\bra{\uvect{\beta}_1} &=
\mat{c}_{b}^{T}\bra{\uvect{\beta}_0} +
\mat{d}_{b}^{T}\bra{\delta{x}, \delta{g}},\\
\mat{d}_{a} &= \mat{v}_{a}\mat{s}_{a}^{-1},\qquad
\mat{c}_{a} =
-\braket{\uvect{\alpha}_0|\delta{g} - \delta{x}}
\mat{v}_{a}\mat{s}_{a}^{-1},\\
\mat{d}_{b} &= \mat{v}_{b}\mat{s}_{b}^{-1},\qquad
\mat{c}_{b} =
-\braket{\uvect{\beta}_0|\delta{x},\delta{g}}
\mat{v}_{b}\mat{s}_{b}^{-1},\\
\mat{v}_{a}\mat{s}_{a}^2\mat{v}_{a}^{T} &=
\braket{\delta{g} - \delta{x}|\delta{g} - \delta{x}}
- \braket{\delta{g} - \delta{x}|\uvect{\alpha}_0}
\braket{\uvect{\alpha}_0|\delta{g} - \delta{x}},\\
\mat{v}_{b}\mat{s}_{b}^2\mat{v}_{b}^{T} &=
\braket{\delta{x},\delta{g}|\delta{x},\delta{g}}
- \braket{\delta{x},\delta{g}|\uvect{\beta}_0}
\braket{\uvect{\beta}_0|\delta{x},\delta{g}},\\
\braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}} &= \begin{pmatrix}
\braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_0} &
\braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_1}
\end{pmatrix},\\
\braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0} &= \begin{pmatrix}
\braket{\d\mat{G} - \d\mat{X}|\mat{G} - \mat{X}}\mat{q}_{a} &
\braket{\d\mat{G} - \d\mat{X}|\mat{A}_0}
\end{pmatrix},\\
\braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_1} &=
\braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0}\mat{c}_{a} +
\braket{\d\mat{G} - \d\mat{X}|\delta{g}-\delta{x}}\mat{d}_{a},\\
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}} &= \begin{pmatrix}
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} &
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1}
\end{pmatrix} ,\\
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} &= \begin{pmatrix}
\braket{\d\mat{X},\d\mat{G}|\mat{X},\mat{G}}\mat{q}_{b}^{T} &
\braket{\d\mat{X},\d\mat{G}|\mat{B}_0}
\end{pmatrix},\\
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} &=
\braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0}\mat{c}_{b} +
\braket{\d\mat{X},\d\mat{G}|\delta{x},\delta{g}}\mat{d}_{b},\\
\braket{\uvect{\alpha}|\uvect{\alpha}_0}
&= \braket{\uvect{\beta}|\uvect{\beta}_0} = \begin{pmatrix}
\mat{1}\\ \mat{0}
\end{pmatrix},\\
\braket{\beta|\alpha} &=
\begin{pmatrix}
\mat{q}_{b}\braket{\mat{X}, \mat{G}|
\mat{G} - \mat{X}}\mat{q}_{a} &
\mat{q}_{b}\braket{\mat{X}, \mat{G}|
\mat{G} - \mat{X}}\mat{q}_{a} \\
\end{pmatrix}
"""
# First we cast all of the arrays as matrices so we can
# use * as matrix multiplication. Large matrices are
# preceded by an underscore.
x = np.asarray(x0)
g = np.asarray(g0)
if 1 == len(x.shape):
x = x[:,None]
if 1 == len(g.shape):
g = g[:,None]
_X = np.asmatrix(self.X)
_G = np.asmatrix(self.G)
_b0 = np.asmatrix(self._bt.T)
_a0 = np.asmatrix(self._a)
qa = np.asmatrix(self._qa)
qb = np.asmatrix(self._qb)
# We assume that g is only one column, so a copy here is okay.
_dg = np.asmatrix(g - _G[:,0])
_dx = np.asmatrix(x - _X[:,0])
_dgdx = _dg - _dx
# Consider optimizing these copies away later
_dG = np.asmatrix(_G - g)
_dX = np.asmatrix(_X - x)
_dGdX = _dG - _dX
_GX = np.asmatrix(_G - _X)
if 1 == self.method:
_dxg = _dx
_XG = np.asmatrix(_X)
_dXG = _dX
else:
_dxg = _dg
_XG = np.asmatrix(_G)
_dXG = _dG
# These are all of the brackets required.
dXdX = _dX.T*_dX
dGdG = _dG.T*_dG
dXG_XG = _dXG.T*_XG
dXG_dxg = _dXG.T*_dxg
dXG_b0 = _dXG.T*_b0
dxg_XG = _dxg.T*_XG
dxg_GX = _dxg.T*_GX
dxg_dxg = _dxg.T*_dxg
dxg_b0 = _dxg.T*_b0
dxg_a0 = _dxg.T*_a0
dGdX_a0 = _dGdX.T*_a0
GX_a0 = _GX.T*_a0
XG_GX = _XG.T*_GX
X_a0 = _X.T*_a0
g_a0 = g.T*_a0
x_a0 = x.T*_a0
XG_a0 = _XG.T*_a0
b0_a0 = _b0.T*_a0
GX_b0 = _GX.T*_b0
dGdX_GX = _dGdX.T*_GX
dGdX_dgdx = _dGdX.T*_dgdx
dgdx_GX = _dgdx.T*_GX
dgdx_XG = _dgdx.T*_XG
dgdx_a0 = _dgdx.T*_a0
dxg_dgdx = _dxg.T*_dgdx
dgdx_dgdx = _dgdx.T*_dgdx
dgdx_b0 = _dgdx.T*_b0
# Now we form the small matrices
dxg_beta0 = np.hstack([dxg_XG*qb.T, dxg_b0])
ub, sb, _ = np.linalg.svd(dxg_dxg - dxg_beta0*dxg_beta0.T)
# Only keep significant basis vectors
inds = np.where(sb >= _SINGULAR_TOL)[0]
sb = sb[inds]
ub = ub[:, inds]
db = np.asmatrix(ub/(np.sqrt(sb)[None, :] + _SINGULAR_TOL**2))
cb = -dxg_beta0.T*db
dXG_beta0 = np.hstack([dXG_XG*qb.T, dXG_b0])
dXG_beta1 = dXG_beta0*cb + dXG_dxg*db
dXG_beta = np.hstack([dXG_beta0, dXG_beta1])
dGdX_a0 = GX_a0 - (g_a0 - x_a0)
dgdx_alpha0 = np.hstack([dgdx_GX*qa, dgdx_a0])
ua, sa, _ = np.linalg.svd(dgdx_dgdx -
dgdx_alpha0*dgdx_alpha0.T)
# Only keep significant basis vectors (assuming everything has
# been scaled to be of unit norm here...)
inds = np.where(sa >= _SINGULAR_TOL)[0]
sa = sa[inds]
ua = ua[:, inds]
da = np.asmatrix(ua/(np.sqrt(sa)[None, :] + _SINGULAR_TOL**2))
ca = -dgdx_alpha0.T*da
dGdX_alpha0 = np.hstack([dGdX_GX*qa, dGdX_a0])
dGdX_alpha1 = dGdX_alpha0*ca + dGdX_dgdx*da
dGdX_alpha = np.hstack([dGdX_alpha0, dGdX_alpha1])
# the basis vectors alpha and beta are stacked from three sets
# with sizes n, m0 and (m-m0)
N, n_ = self.X.shape
na0, na = self._qa.shape
nb, nb0 = self._qb.shape
ma0 = self._a.shape[1] # Previous size of aux space
mb0 = self._bt.shape[0]
ma = ma0 + ca.shape[1] # New size of aux space
mb = mb0 + cb.shape[1]
eye0 = np.asmatrix(np.eye(na + ma0, nb + mb0, dtype=float))
eye1 = np.asmatrix(np.eye(na + ma, nb + mb, dtype=float))
alpha_alpha0 = np.asmatrix(np.eye(na + ma, na + ma0, dtype=float))
beta_beta0 = np.asmatrix(np.eye(nb + mb, nb + mb0, dtype=float))
beta0_dgdx = np.vstack([qb*dgdx_XG.T,
dgdx_b0.T])
dxg_alpha0 = np.hstack([dxg_GX*qa, dxg_a0])
beta0_alpha0 = np.bmat(
[[qb*XG_GX*qa, qb*XG_a0],
[GX_b0.T*qa, b0_a0]])
beta0_alpha1 = beta0_alpha0*ca + beta0_dgdx*da
beta1_alpha0 = cb.T*beta0_alpha0 + db.T*dxg_alpha0
beta1_alpha1 = (cb.T*(beta0_alpha0*ca + beta0_dgdx*da) +
db.T*(dxg_alpha0*ca + dxg_dgdx*da))
beta_alpha = np.bmat([[beta0_alpha0, beta0_alpha1],
[beta1_alpha0, beta1_alpha1]])
############################################################
# Now we have the bases and all the required matrices, so we
# do the real work.
def inv(A):
if 0 == np.prod(A.shape):
# Empty matrix needs a degenerate case
return A.T
return np.linalg.pinv(A, rcond=_SINGULAR_TOL**2)
w, w0, wr = self.get_w(dXdX, dGdG)
if w is not None:
w = np.asarray(w)
if 1 == len(w.shape):
dGdX_alpha *= w[:, None]
dXG_beta *= w[:, None]
else:
dGdX_alpha = dot(w.T, dGdX_alpha)
dXG_beta = dot(w.T, dXG_beta)
if 1 == self.method:
num = dGdX_alpha.T*dXG_beta
if self._b is not None:
j0 = -inv(inv(self._b) + beta0_alpha0)
num += w0*w0*alpha_alpha0*j0*beta_beta0.T
denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb)
j = num*inv(denom)
b = -inv(inv(j) + beta_alpha)
elif 2 == self.method:
num = -dGdX_alpha.T*dXG_beta
if self._b is not None:
num += w0*w0*alpha_alpha0*self._b*beta_beta0.T
denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb)
b = num*inv(denom)
# Set singular values to zero so they don't contribute.
if 0 == np.prod(b.shape):
# There is no dyadic contribution.
b = None
_a = None
_bt = None
else:
u, d, vt = np.linalg.svd(b, full_matrices=False)
bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL,
1/_SINGULAR_TOL < d))[0]
d[bad_inds] = 0.0
b = dot(np.multiply(u,d[None, :]), vt)
# Now that we have b, we perform an svd in the auxiliary basis to
# determine what basis vectors to keep.
qa_, qbt_ = self._restrict(b, na, nb)
b[na:, nb:] = qa_*b[na:, nb:]*qbt_
# na + ma0 ma - ma0
# |alpha> = [alpha0, alpha0*ca + dgdx*da]
# = [GX*qa, [a0, [GX*qa, a0]*ca + dgdx*da]]
# = [GX*qa, A_new]
# = [GX*qa, [GX, a0, dgdx]*A]
# |A_new> = [GX, a0, dgdx]*A
# A_[na_ + ma0 + (ma - ma0), ma = ma0 + (ma-ma0)]
# A = [[0, qa*ca]
# [1, ca]
# [0, da]]
A = np.bmat(
[[np.zeros((na0, ma0)), qa*ca[:na,:]],
[np.eye(ma0, ma0), ca[na:,:]],
[np.zeros((da.shape[0], ma0)), da]])
Bt = np.bmat(
[[np.zeros((nb0, mb0)), qb.T*cb[:nb,:]],
[np.eye(mb0, mb0), cb[nb:,:]],
[np.zeros((db.shape[0], mb0)), db]]).T
A = A*qa_.T
Bt = qbt_.T*Bt
if ma > self.max_aux_basis_size:
ma = self.max_aux_basis_size
A = A[:, :ma]
b = b[:ma, :]
if mb > self.max_aux_basis_size:
mb = self.max_aux_basis_size
Bt = Bt[:mb, :]
b = b[:, :mb]
_a = np.hstack([_GX, _a0, _dgdx])*A
_bt = Bt*np.vstack([_XG.T,
_b0.T,
_dxg.T])
self.suspend()
self._a = _a
self._bt = _bt
self._b = b
# These are not needed: just for checking and debugging
_alpha0 = np.hstack([_GX*qa, _a0])
_alpha1 = _alpha0*ca + _dgdx*da
_alpha = np.hstack([_alpha0, _alpha1])
self.resume()
def get_w(self, dXdX, dGdG):
r"""Return weights `w, w0, wr`.
Returns
-------
w : array
Array of weights for `dx`.
"""
N, n_ = self.X.shape
ma = self._a.shape[1]
mb = self._bt.shape[0]
na = self._qa.shape[1]
nb = self._qb.shape[0]
w = np.eye(dXdX.shape[0])
w0 = _SINGULAR_TOL
wr = 0
return w, w0, wr
def _restrict(self, b, na=0, nb=0):
r"""Return unitary matrices `(qa, qb.T)` which can be used to
transform `b` so that truncating the matrix will introduce
minimal errors. The first `n0` rows and columns of `b` will
be held fixed.
The present implementation is an approximation: it is not
optimal.
Parameters
----------
b : array of shape (n,n)
Array to be rearranged and truncated.
na, nb : int
Number of rows and columns to be held fixed. If this is
`0`, then the SVD will be used: ``qa, d, qb.T = svd(b)``.
"""
ua, da, vat = np.linalg.svd(b[na:, :])
ub, db, vbt = np.linalg.svd(b[:, nb:])
return ua, vbt
@staticmethod
def _ortho(a0, b):
r"""Return `(A, B, s)` such that
`a1 = dot(a0, A) + dot(b, B)` is orthonormal and
orthogonal to `a0`.
Parameters
----------
a0 : array
The columns of this must be orthonormal.
b : array
Need not be orthonormal.
Returns
-------
A, B : array
s : array
This is the list of singular values for the
transformation. If these are too small, then the
corresponding dimensions should be dropped. I.e. use only
`a1[:, np.where(tol < s)[0]]`
Examples
--------
>>> np.random.seed(0)
>>> a0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5]
>>> b = np.random.rand(10, 2)
>>> A, B, s = _JinvExtendedBroydenMemoryLimited._ortho(a0, b)
>>> a1 = np.dot(a0, A) + np.dot(b, B)
>>> np.allclose(np.dot(a1.T, a0), 0)
True
>>> np.allclose(np.dot(a1.T, a1), np.eye(2))
True
Notes
-----
If `a0` and `b` are big (in the first dimension), then it is
best if `a0` is `F_CONTIGUOUS` (i.e. the transpose of a
`C_CONTIGUOUS` array) and `b` is `C_CONTIGUOUS` to avoid
:func:`np.dot` from making unneeded copies. (One copy will
always be made when computing `dot(b.T, b)`).
Uses the relationship
.. math::
\braket{\beta|\beta} -
\braket{\beta|\uvect{\alpha}_{0}}
\braket{\uvect{\alpha}_{0}|\beta}
&= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\
\begin{pmatrix}
\ket{\uvect{\alpha}_{0}} &
\ket{\uvect{\alpha}_{1}}
\end{pmatrix} &=
\begin{pmatrix}
\ket{\uvect{\alpha}_{0}} &
\ket{\beta}
\end{pmatrix}
\cdot
\begin{pmatrix}
\mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v}
\mat{\sigma}^{-1}\\
\mat{0} & \mat{v}\mat{\sigma}^{-1}
\end{pmatrix}.
"""
a0_b = dot(a0.T, b)
s2, v = eigh(dot(b.T, b) - dot(a0_b.T, a0_b))
s = np.sqrt(s2)
A = -np.dot(a0_b, v/s[None, :])
B = v/s[None, :]
return (A, B, s)
def asarray(self):
"""Return the Jacobian matrix.
Examples
--------
>>> M = _JinvExtendedBroyden()
>>> M.asarray()
array(1.0)
"""
if np.prod(self._b.shape) == 0:
return np.array(1.0)
_alpha = np.hstack([dot(self.G - self.X, self._qa), self._a])
if 1 == self.method:
_XG = self.X
else:
_XG = self.G
_betat = np.vstack([dot(self._qb, _XG.T),
self._bt])
M = dot(dot(_alpha, self._b), _betat)
return M + np.eye(self.X.shape[0], self.G.shape[0], dtype=float)
class JinvBroyden(StateVars):
r"""Class to represent an inverse Jacobian for the Generalized
Broyden method.
This is intended to solve root-finding problems
of the form
.. math::
\vect{G}(\vect{x}) = 0.
This is designed for large scale problems where it is typically
prohibitive to store the entire Jacobian, so a limited
memory approximation is typically made by using :attr:`dyadic` and
:attr:`n_prev`. As a result, we assume that the problem is
preconditioned so that the Jacobian is well approximated by the
identity.
.. math::
\mat{J} \approx \mat{1}.
In other words, we implicitly expect that the problem is
preconditioned in such a way that
.. math::
\vect{G}(\vect{x}) = \vect{x} - \vect{F}(\vect{x})
where the iteration $\vect{x} \mapsto \vect{F}(\vect{x})$ is
"almost" convergent. This assumption plays out in much of the
internal analysis about when behaviour is singular.
The core of this is one of the following two updates to the
inverse Jacobian:
.. math::
\mat{B}_{\text{1a)}} &= \mat{B}_{0} +
(\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
\bra{\d\mat{X}}\cdot\frac{1}{
\mat{B}_{0}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
+ w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}\cdot\mat{B}_{0}, \\
\mat{B}_{\text{2b)}} &= \mat{B}_{0}
+ (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
\bra{\d\mat{G}}\cdot\frac{1}{
\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
+ w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}
where the weights $\mat{w}$ are computed using :meth:`_get_weight`
and the inversion of the denominator is computed using
:meth:`_inversion`. The displacements are computed from the
current step $\ket{x}$ and $\ket{g}$ and the previous steps
$\ket{\mat{X}}$ and $\ket{\mat{G}}$, of which we store the
previous :attr:`n_prev` steps.
.. math::
\ket{\d\mat{X}} &= \ket{x} - \ket{\mat{X}}, \\
\ket{\d\mat{G}} &= \ket{g} - \ket{\mat{G}}.
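As an illustrative sketch (not part of the API), with a single
stored point and trivial weights these reduce to the familiar
rank-1 secant updates::

    import numpy as np
    np.random.seed(1)
    N = 4
    B0 = np.eye(N)                     # previous inverse Jacobian
    dx = np.random.rand(N, 1)          # |dX> (single column)
    dg = np.random.rand(N, 1)          # |dG>
    # 1a) "good" update: minimal change in the Jacobian
    B1a = B0 + np.dot(dx - np.dot(B0, dg),
                      np.dot(dx.T, B0))/np.dot(np.dot(dx.T, B0), dg)
    # 2b) "bad" update: minimal change in the inverse Jacobian
    B2b = B0 + np.dot(dx - np.dot(B0, dg), dg.T)/np.dot(dg.T, dg)
    # Both satisfy the secant condition B|dG> = |dX>
    assert np.allclose(np.dot(B1a, dg), dx)
    assert np.allclose(np.dot(B2b, dg), dx)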
Examples
--------
We start with a linear function. The Jacobian should fully
represent this.
>>> np.random.seed(0)
>>> A = np.random.random((3, 3))
>>> Ainv = np.linalg.inv(A)
>>> b = np.random.random(3)
>>> x0 = np.random.rand(3)
>>> def g(x):
... return np.dot(A, x) + b
For both :attr:`method` "good" (`'J'`) and "bad" (`'B'`) the
:attr:`inv_method` `'svd'` reproduces the Jacobian once the full
space is sampled:
>>> B = JinvBroyden(dyadic=True, inv_method='svd')
>>> for B.method in ['J', 'B']:
... B.reset_all()
... x_ = 1*x0
... for n in xrange(4):
... g_ = g(x_)
... incompatibility = B.set_point(x=x_, g=g_)
... x_ = B.get_x()
... np.allclose(B.j_inv.asarray(), Ainv)
True
True
The :attr:`inv_method` `'w0'` does not exactly reproduce this as
it "remembers" the initial Jacbian $\mat{1}$, but the results are
almost correct if `w0` is small enough:
>>> B.inv_method = 'w0'
>>> B.x_method = 'recent'
>>> B.w0 = 0.0001
>>> for B.method in ['J', 'B']:
... B.reset_all()
... x_ = 1*x0
... for n in xrange(4):
... g_ = g(x_)
... incompatibility = B.set_point(x=x_, g=g_)
... x_ = B.get_x()
... # abs(B.j_inv.asarray() - Ainv).max()
... print(np.allclose(B.j_inv.asarray(), Ainv),
... np.allclose(B.j_inv.asarray(), Ainv, atol=0.001),
... np.allclose(B.j_inv.asarray(), Ainv, atol=0.003))
(False, True, True)
(False, False, True)
This also demonstrates that the "good" method 1a) converges better
to the Jacobian.
Here we start with a non-trivial Jacobian but no vectors `xs`
and then update. This happens, for example, with
:meth:`change_basis`. We only keep `xs` and `gs` for exact `x =
g(x)` and this is violated by linear changes in the basis, so we
discard the points but keep the Jacobian.
>>> np.random.seed(0)
>>> B = JinvBroyden(dyadic=False, inv_method='svd', method='B')
>>> B0 = np.random.random((4, 4))
>>> B.j_inv = 1*B0
>>> x = np.random.random(4)
>>> g = np.random.random(4)
>>> B.set_point(x=x, g=g)
0.0
This does not yet modify the Jacobian because we have only one point
>>> abs(B0 - B.j_inv).max()
0.0
Now we update the Jacobian
>>> dx = np.random.random(4)
>>> dg = np.random.random(4)
>>> B.set_point(x=x+dx, g=g+dg) # doctest: +ELLIPSIS
0.79...
The new Jacobian now fits the slope properly,
>>> np.allclose(B.j_inv*dg[:, None], dx[:, None])
True
But has not been modified in orthogonal directions (only true for
:attr:`method` `'B'`).
>>> dg_perp = np.linalg.qr(np.vstack([dg, np.random.random(4)]).T)[0][:, 1]
>>> np.allclose(np.dot(dg_perp, dg), 0)
True
>>> np.allclose(np.dot(B0, dg_perp[:, None]), B.j_inv*dg_perp[:, None])
True
To reproduce the original Broyden algorithms, set
:attr:`n_prev` to `2`, :attr:`x_method` to `'recent'`,
:attr:`inv_method` to `'svd'`, and :attr:`replace_method` to `'oldest'`.
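A minimal configuration sketch, using only the attributes
documented above::

    B = JinvBroyden(n_prev=2, x_method='recent',
                    inv_method='svd', replace_method='oldest')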
"""
implements(IJinv)
_state_vars = [
('curr', None,\
r"""Index of current point in :attr`:xs`. The Jacobian
was updated to be valid at this point."""),
('xs', None,\
r"""Previous abscissa stored as row-vectors
$\bra{\mat{X}}$"""),
('gs', None,\
r"""Previous `g = G(x)` stored as row-vectors
$\bra{\mat{G}}$"""),
('n_prev', 10,\
r"Number of previous steps to store."),
('method', 'J',
r"""Denotes which method of update is used in
:meth:`update` (see that method for details)."""),
('x_method', 'recent',
r"""Denotes which method is used to determine the next
abscissa. The two options determine at which point
:meth:`update` updates the Jacobian inverse to be valid
and stores this index in :attr:`curr`. This is the point
from which :meth:`get_x` takes a downhill step. See
:meth:`update` for possible options.
.. warning:: Methods other than `'recent'` should be used
with caution. It is possible to get stuck in a loop
here where the step does not update the Jacobian and so
the same step is repeated over and over. See the
section :ref:`sec-notes`."""),
('replace_method', 'oldest',
r"""Determines which point is replaced in order to satisfy
the memory-limitation :attr:`n_prev`. See
:meth:`add_point` for possible values.
.. warning:: Methods other than `'oldest'` should be used
with caution. It is possible to get stuck in a loop
here where the newest point is continually replaced, making
no progress. See the section :ref:`sec-notes`."""),
('inv_method', 'svd',
r"""Method used by :meth:`_inversion` to regulate
singular denominators. See :meth:`_inversion` for
options."""),
('min_svd', 1e-8,
r"""Minimum singular value allowed in :meth:`_inversion`
for the 'svd' :attr:`inv_method`. This parameter does not seem to
be too sensitive although it may be in problems where the
Jacobian is singular."""),
('w0', 1,
r"""Weight for minimal Jacobian change. Large values
reduce the influence of the previous Jacobian inverse."""),
('w_min', np.sqrt(_EPS),
r"""Minimum weight (to ensure that $\mat{w}^2$ is
invertible)."""),
('dyadic', False,
r"""If `True`, then use a dyadic representation for Jinv. If
this is a number greater than 1, then this number is
passed as :attr:`DyadicSum.n_max` limiting the rank of the
dyadic representation."""),
('j_inv', 1, "Internal representation of matrix"),
('robust', False,
"""If `True`, then use the singular value decomposition to
analyse the Jacobian. Much slower, but could prevent some
numerical problems."""),
('max_j_inv', 100.0,
"""If :attr:`robust`, then the inverse Jacobian is clipped
so that the maximum value is this."""),
('min_j_inv', 0.0001,
"""The inverse Jacobian can become singular. If this is
the case, then a step might have very small length even if
the error is not small (the step size can be reduced by a
factor of the minimum singular value of the inverse
Jacobian). If the step reduction is less than this value,
then the step is scaled up to have length
:attr:`min_step`."""),
('minimize_inverse', False,
"""If `True`, then use the update the minimizes the change
in the Frobenius norm of the inverse Jacobian, otherwise
we use the Sherman-Morrison formula to update so as to
minimize the norm of the Jacobian."""),
('incompatibility_measure', 'dg',
r"""Specifies which criterion to use for measured the
compatibility of an update with the present inverse
Jacobian. See :attr:`incompatibility` for supported
methods."""),
('check_roundoff', True,
r"""If `True`, then do some extra calculations to check
for roundoff error."""),
('roundoff_cut', 0.001,
r"""If the estimated relative rounding errors in dg or dx
are greater than this and :attr:`check_roundoff`, then
these are not used."""),
('_j_inv_transform', NotImplemented,
"""Optional methods for updating the Jacobian inverse in
:meth:`change_basis`. For testing only."""),
('_dx_norm2_min', _EPS**2,
r"""Used in :meth:`update`. If the relative L2 norm
squared of `x0 - x` is less than this, then the point `x`
is removed from the analysis. This prevents naked
singularities in `dx`.
.. warning:: This parameter is crucial to obtain good
convergence. If it is too large, then the nearest --
most significant -- points will not be used, resulting
in poor convergence. If it is too small, then
ill-conditioned updates to the Jacobian may be made with
no information content because of round-off errors."""),
('_age', None,
r"""Age of the various points so that one can identify the
oldest."""),
('_x0', None, "Internal storage for :attr:`x0`"),
('_g0', None, "Internal storage for :attr:`g0`"),
]
process_vars()
def __init__(self, *v, **kw):
# pylint: disable-msg=W0231
if self.dyadic:
n_max = np.inf
if self.dyadic > 1:
n_max = self.dyadic
if self.j_inv is 1:
self.j_inv = DyadicSum(n_max=n_max)
elif isinstance(self.j_inv, DyadicSum):
self.j_inv.n_max = n_max
else:
raise NotImplementedError(
"No support yet for initializing a" +
" DyadicSum from an array.")
elif self.j_inv is not 1:
self.j_inv = np.asmatrix(self.j_inv)
@property
def x(self):
if self.xs is not None and self.curr is not None:
return self.xs[self.curr]
else:
return None
@property
def g(self):
if self.gs is not None and self.curr is not None:
return self.gs[self.curr]
else:
return None
@property
def x0(self):
if self._x0 is None:
return self.x
else:
return self._x0
@property
def g0(self):
if self._g0 is None:
return self.g
else:
return self._g0
def __mul__(self, dg):
if self.j_inv is 1:
return dg
else:
return np.asarray(self.j_inv.__mul__(dg))
def __rmul__(self, dxt):
if self.j_inv is 1:
return dxt
else:
return self.j_inv.__rmul__(dxt)
def _get_weight(self, dx, dg):
r"""Return `(w0, w)` where $\mat{w}$ is the non-singular
weight matrix and $w_0$ is the weight for the Jacobian
residual.
Parameters
----------
dx, dg : array
`(n, N)` arrays of previous differences $\bra{\d{X}} =
\bra{x} - \bra{\mat{X}}$ (and similarly for $G$) where
$\bra{x}$ is the current step and $\bra{X}$ are the
previous steps.
Notes
-----
Currently the weights are implemented as
.. math::
w_{ij} = w_{\text{max}}^{-1}
\delta_{ij}\frac{1}{\braket{\d\mat{X}|\d\mat{X}}_{ii}}
These are normalized so that the largest weight is 1 and all
weights are larger than :attr:`w_min`. (Note that in the
current implementation the computed weights are subsequently
overridden and set uniformly to 1; see the code below.)
"""
w = np.maximum(self.w_min, 1/(np.diag(dot(dx, dx.T))**2 + _TINY))
# Currently overridden: use uniform weights for all points.
w = 0.0*w + 1.0
return (self.w0, np.diag(w/w.max()))
def _inversion(self, dx, dg, j_inv):
r"""Return `(ket_a, bra_b)` where
.. math::
\ket{a} &= \ket{\d{\mat{X}}} - \mat{B}_{0}\ket{\d{\mat{G}}}\\
\bra{b} &= \bra{\mat{B}}
\frac{1}{\ket{\mat{A}}\bra{\mat{B}}
+ w_0^2\mat{\Gamma}\mat{\Gamma}^T}\mat{C}
1a) :
.. math::
\bra{\mat{B}} = \mat{w}^2\bra{\d\mat{X}}, \qquad
\ket{\mat{A}} = \mat{B}_{0}\ket{\d\mat{G}}, \qquad
\mat{C} = \mat{B}_{0},
2b) :
.. math::
\bra{\mat{B}} = \mat{w}^2\bra{\d\mat{G}}, \qquad
\ket{\mat{A}} = \ket{\d\mat{G}}, \qquad
\mat{C} = \mat{1}
Parameters
----------
dx : 2-d array
`(N, n)` array $\ket{\d{\mat{X}}}$.
dg : 2-d array
`(N, n)` array $\ket{\d{\mat{G}}}$.
j_inv : operator
`(N, N)` operator $\mat{B}_{0}$ representing the previous
inverse Jacobian. This is used to compute the residue and
to implement the 1a) `w0` method.
Notes
-----
The following methods permitted as specified by :attr:`inv_method`:
`'svd'` :
Subspace decomposition based on the singular value
decomposition and corresponds to the choice
.. math::
\mat{\Gamma} = \ket{\uvect{B}_{\perp}}
which is the orthonormal complement to the span
$\ket{\uvect{B}}$ of $\ket{\mat{B}}$.
In the basis $\left(\ket{\uvect{B}},
\ket{\uvect{B}_{\perp}}\right)$, the matrix
$\ket{\mat{A}}\bra{\mat{B}}$ is block lower
triangular:
.. math::
\begin{pmatrix}
\braket{\uvect{B}|\mat{A}}
\braket{\mat{B}|\uvect{B}} & \mat{0}\\
\braket{\uvect{B}_{\perp}|\mat{\tilde{A}}}
\braket{\mat{B}|\uvect{B}} & w_{0}^2\mat{1}
\end{pmatrix}^{-1}.
Hence the inverse is also block lower triangular so that the
action of $\bra{\mat{B}}$ will project out only the upper
component and we can ignore $w_{0}$, writing:
.. math::
\bra{b} &= \braket{\mat{B}|\mat{A}}^{-1}
\bra{\mat{B}}\mat{C}\\
\bra{b}_{\text{1a}} &=
\left(
\mat{w}^2\braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}}
\right)^{-1}
\mat{w}^2\bra{\d\mat{X}}\mat{B}_{0}
= \braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}}^{-1}
\bra{\d\mat{X}}\mat{B}_{0}\\
\bra{b}_{\text{2b}} &=
\left(
\mat{w}^2\braket{\d\mat{G}|\d\mat{G}}
\right)^{-1}
\mat{w}^2\bra{\d\mat{G}}
= \braket{\d\mat{G}|\d\mat{G}}^{-1}
\bra{\d\mat{G}}
This makes it clear that the weights are irrelevant in this
method: they only matter if the system is over-determined or
for restricting the subspace (see the sketch following this
list).
`'w0'` :
Here we set $\mat{\Gamma} = w_0\mat{1}$. This is
what is done in [Johnson:1988]_. The inversion formula
becomes:
.. math::
\bra{b} &=
\frac{1}{
\braket{\mat{B}|\mat{B}_{0}|\mat{A}}
+ w_0^2\mat{1}}
\bra{\mat{B}}\mat{C}\\
\bra{b}_{\text{1a}} &=
\frac{1}{
\mat{w}^2\braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}}
+ w_0^2\mat{1}}
\mat{w}^2\bra{\d\mat{X}}\mat{B}_{0}
= \frac{1}{
\braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}}
+ w_0^2\mat{w}^{-2}}\bra{\d\mat{X}}\mat{B}_{0}\\
\bra{b}_{2b} &=
\frac{1}{
\mat{w}^2\braket{\d\mat{G}|\d\mat{G}}
+ w_0^2\mat{1}}
\mat{w}^2\bra{\d\mat{G}}
= \frac{1}{
\braket{\d\mat{G}|\d\mat{G}}
+ w_0^2\mat{w}^{-2}}\bra{\d\mat{G}}
`'numeric'` :
This implements the numerical method. This is the same as
the `'w0'` method except that `w0` is chosen to be as small
as possible to ensure that the smallest singular value of
the denominator is larger than :attr:`min_svd` times the
largest.
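As a standalone sketch (not this method's code) of the `'svd'`
branch for case 2b), assuming full-rank differences and a
relative cutoff playing the role of :attr:`min_svd`::

    import numpy as np
    np.random.seed(0)
    N, n = 6, 3
    dg = np.random.rand(n, N)            # rows are the differences <dG|
    den = np.dot(dg, dg.T)               # <dG|dG>
    u, d, vt = np.linalg.svd(den)
    d_inv = np.where(d/d.max() < 1e-8, 0, 1/d)   # drop tiny directions
    bra_b = np.dot(np.dot(vt.T*d_inv, u.T), dg)
    # On the sampled subspace, bra_b acts as a left inverse of |dG>:
    assert np.allclose(np.dot(bra_b, dg.T), np.eye(n))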
.. note::
A desirable property is that the update works for perfectly
linear models in the sense that, given a linear
model $\ket{\mat{X}} = \ket{\mat{X}_0} +
\mat{J}^{-1}\ket{\mat{G}}$, the update will satisfy
$\mat{B}\ket{\mat{G}} = \mat{J}^{-1}\ket{\mat{G}}$: in
other words, the action of $\mat{B}$ and $\mat{A}$ is the
same on the subspace $\ket{\uvect{G}}$ spanned by
$\ket{\d\mat{G}}$. This requires that:
.. math::
\ket{\mat{G}}\braket{b|\uvect{G}} =
\ket{\mat{G}}\bra{\mat{B}}
\frac{1}{\ket{\mat{A}}\bra{\mat{B}}
+ w_0^2\mat{\Gamma}\mat{\Gamma}^{T}}\mat{C}
\ket{\uvect{G}} = \ket{\uvect{G}}
which is satisfied by both `'svd'` methods but only
approximated by the `'w0'` method in the limit $w_{0}
\rightarrow 0$.
One potential problem is that the weights might be very
small, causing a premature truncation of the SVD; however, a
separate decomposition of the components fails to preserve
the desired behaviour:
>>> np.random.seed(0)
>>> N = 10; n=5; m = 1
>>> A = np.mat(np.random.rand(N, N))
>>> B0 = np.mat(np.random.rand(N, N))
>>> B0 = 0*np.eye(N)
>>> G = np.random.rand(N, n-m)
>>> G = np.mat(np.dot(np.hstack((G, ) + (G[:, :1], )*m),
... np.random.rand(n, n)))
>>> Ug, dg, VTg = np.linalg.svd(G); Ug = Ug[:, :n]; VTg = VTg[:n, :]
>>> Pg = Ug[:, :n-m]
>>> X = A*G
>>> b = X
>>> a = G
>>> w = np.mat(np.diag(np.random.rand(n)))
>>> wi = np.linalg.inv(w)
>>> Qa, Ra = np.linalg.qr(a)
>>> Qb, Rb = np.linalg.qr(b)
>>> U, d, VT = np.linalg.svd(Ra*w*Rb.T)
>>> di = np.where(d > 1e-12, 1/d, 0) # Singular portions.
>>> Inv1 = Qb*VT.T*np.diag(di)*U.T*Qa.T
>>> np.allclose(G*w*b.T*Inv1*Pg, Pg)
True
>>> Ua, da, VTa = np.linalg.svd(a); Ua = Ua[:, :n]; VTa = VTa[:n, :]
>>> Ub, db, VTb = np.linalg.svd(b); Ub = Ub[:, :n]; VTb = VTb[:n, :]
>>> dai = np.where(da > 1e-12, 1/da, 0) # Singular portions.
>>> dbi = np.where(db > 1e-12, 1/db, 0) # Singular portions.
>>> Inv2 = Ub*np.diag(dbi)*VTb*wi*VTa.T*np.diag(dai)*Ua.T
>>> np.allclose(G*w*b.T*Inv2*Pg, Pg)
False
"""
w0, w = self._get_weight(dx=dx, dg=dg)
w2 = dot(w.T, w)
at = dg.T
if 'J' == self.method:
# Broyden's "good" method
if self.dyadic:
b = j_inv.__rmul__(dx)
else:
b = dot(dx, j_inv)
elif 'B' == self.method:
# Broyden's "bad" method
b = dg
else:
raise ValueError("Method 'method' = %s not supported." %
(self.method, ))
# Include weights
b = np.asarray(dot(w2, b))
if 'svd' == self.inv_method:
# Should optimize so that copies are not made.
u, d, vt = np.linalg.svd(dot(b, at))
d_inv = np.where(d/d.max() < self.min_svd, 0, 1/d)
inv = dot(vt.T*d_inv, u.T)
bra_b = dot(inv, b)
elif self.inv_method in ['w0', 'numeric']:
# Johnson's method
den = dot(b, at)
if 'numeric' == self.inv_method:
eigs = np.linalg.eigvals(den)
w02 = self.min_svd*abs(eigs).max() - min(0, eigs.real.min())
else:
w02 = w0**2
den += w02*np.eye(len(b))
bra_b = np.linalg.solve(den, b)
else:
raise ValueError("Method 'inv_method' = %s not supported." %
(self.inv_method, ))
ket_a = dx.T - np.asarray(j_inv*np.mat(dg.T))
return ket_a, bra_b
def add_point(self, x, g):
r"""Add point `(x, g)` to :attr:`xs` and :attr:`gs` and return
the index.
Notes
-----
If using a memory limited version, only :attr:`n_prev` points
are maintained. The order in which points are replaced is
specified by :attr:`replace_method` which may have one of the
following values:
`'furthest'`:
Replace the point that is furthest from `x`, thus keeping
points close to the current point.
`'largest'`:
Replace the point that has the largest $\norm{g}_{2}$, thus
keeping the points with the smallest residuals.
`'oldest'`:
Replace the oldest point, thus keeping the most recent
points.
Examples
--------
>>> np.random.seed(0)
>>> B = JinvBroyden()
>>> x0 = np.random.random(3)
>>> g0 = 2*x0
>>> B.add_point(x=x0, g=g0)
0
>>> B.add_point(x=2*x0, g=2*g0)
1
Adding a duplicate `x` value will just change `g`:
>>> len(B.xs)
2
>>> B.add_point(x=x0, g=g0/2)
0
>>> len(B.xs)
2
>>> np.allclose(B.gs[0], g0/2)
True
If adding the point would make more than :attr:`n_prev`
points, then either the `'furthest'` point from `x` or the point
with `'largest'` `g` is replaced, as specified by the flag
:attr:`replace_method`.
>>> B.n_prev = 2
>>> B.replace_method = 'furthest'
>>> B.add_point(x=3*x0, g=0*g0)
0
>>> B.replace_method = 'largest'
>>> B.add_point(x=3.1*x0, g=1*g0)
1
To reproduce the original Broyden behaviour, one should use
the `'oldest'` :attr:`replace_method`:
>>> B = JinvBroyden(replace_method='oldest', n_prev=3)
>>> for n in xrange(5):
... B.add_point(x=n*x0, g=n*g0), B._age
(0, array([0]))
(1, array([1, 0]))
(2, array([2, 1, 0]))
(0, array([0, 2, 1]))
(1, array([1, 0, 2]))
Here is an example of the original Broyden behaviour (these
results were precomputed using original Broyden routines):
>>> def g(x): return np.array([np.linalg.norm(x), x.sum()])
>>> B = JinvBroyden(dyadic=False, method='J', n_prev=2,
... x_method='recent', replace_method='oldest',
... inv_method='svd')
>>> x = np.array([1, 2])
>>> for n in xrange(12):
... incompat = B.set_point(x=x, g=g(x))
... x = B.get_x()
... print x
[-1.23606798 -1. ]
[-2.53373794 0.82504005]
[ 4.85954425 -4.07499765]
[-10.4372347 11.47369596]
[ 14.52239974 -14.63948627]
[-78.27732015 77.51003951]
[ 35.85727402 -35.8634207 ]
[ 132.21082218 -132.33155288]
[ 0.00993506 -0.01682379]
[-0.00230879 0.00360778]
[-0.00615491 0.0056663 ]
[-0.00027107 -0.00017255]
>>> B = JinvBroyden(dyadic=False, method='B', n_prev=2,
... x_method='recent', replace_method='oldest',
... inv_method='svd')
>>> x = np.array([1, 2])
>>> for n in xrange(12):
... incompat = B.set_point(x=x, g=g(x))
... x = B.get_x()
... print x
[-1.23606798 -1. ]
[-2.21588124 0.37800712]
[-1.74692251 0.06694124]
[-1.03087007 0.59532897]
[ 0.73055565 -0.48593282]
[ 2.35041873 -2.10818778]
[-0.27073117 0.2984435 ]
[-0.64849785 0.64808622]
[-0.0555911 0.06113293]
[-0.00685355 0.00753659]
[ 2.20253269e-07 -2.42442632e-07]
[ 5.12344665e-07 -5.17238905e-07]
>>> B = JinvBroyden(dyadic=False, method='J', n_prev=3,
... x_method='recent',
... inv_method='svd')
>>> x = np.array([1, 2])
>>> for n in xrange(5):
... incompat = B.set_point(x=x, g=g(x))
... x = B.get_x()
... print x
[-1.23606798 -1. ]
[-2.53373794 0.82504005]
[ 2.53310799 -2.53310799]
[ 15.48393424 -15.48393424]
[ ...e-12 ...e-12]
"""
# Special case of empty xs
if self.xs is None:
self.xs = 1.0*x[None, :]
self.gs = 1.0*g[None, :]
self._age = np.array([0])
return 0
# Increase ages:
self._age += 1
# Special case of duplicate point
ind = (x == self.xs).all(axis=1)
if ind.any():
# Don't add duplicate points, but change g and reset age.
ind = np.where(ind)[0][0]
self.gs[ind] = g
self._age[ind] = 0
return ind
# Regular case
if len(self.xs) < self.n_prev:
self.xs = np.vstack([self.xs, x])
self.gs = np.vstack([self.gs, g])
ind = len(self.xs) - 1
self._age = np.append(self._age, 0)
else:
if self.replace_method == 'furthest':
dx = x - self.xs
ind = np.argmax((dx*dx).sum(axis=1))
del dx
elif self.replace_method == 'largest':
ind = np.argmax((self.gs*self.gs).sum(axis=1))
elif self.replace_method == 'oldest':
ind = np.argmax(self._age)
else:
raise ValueError(
"Unsupported replace_method = %s" %
(self.replace_method,))
self.xs[ind] = x
self.gs[ind] = g
self._age[ind] = 0
return ind
def update(self, x0=None, g0=None, j_inv=None,
skip_inds=None, include_skip=False, mags=1, warn=False):
r"""Update the inverse Jacobian so that it is valid at the
point `(x0, g0)` (use :attr:`x` and :attr:`g` by default). Does not
necessarily add `x0` and `g0` (use :meth:`add_point`) or set the current
point (use :meth:`set_point`) but may do either or both depending on the
algorithm.
The update method is governed by :attr:`method` which may be
one of the following:
'J' :
Update to minimize the Jacobian and related residues. This
corresponds to Broyden's "good" algorithm.
'B' :
Update to minimize the inverse Jacobian and related
residues. This corresponds to Broyden's "bad" algorithm.
Parameters
----------
x0, g0 : array
New function value `g0 = G(x0)`.
skip_inds : None, array, optional
This is an array of the indices of sensitive parameters.
The change in these indices will be ignored. This is
important if there are "sensitive" or "noisy" parameters
that should only be trusted once a certain degree of
convergence has been achieved.
include_skip : bool
If `True` and `skip_inds` is provided, then cross-terms are
ignored, but both the main and skip blocks are updated.
mags : array, optional
This should be set to the vector of typical magnitudes for
`G`. One option is `abs(G0) + abs(G1)`. Assumed to be `1`
if not provided.
Notes
-----
The inverse Jacobian is updated so that
.. math::
\mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}}.
There are two senses in which the update can be considered as
minimal controlled by the flag :attr:`minimize_inverse`. If
this is `True`, we minimize :math:`\norm{J^{-1}_0 -
J^{-1}_1}_{\text{Frobenius}}`:
.. math::
J^{-1} \rightarrow
J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}}
{\braket{\d{G}|\d{G}}},
otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`:
.. math::
J \rightarrow
J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}}
{\braket{\d{x}|\d{x}}}.
This can be analytically inverted to give an update for the inverse:
.. math::
J^{-1} \rightarrow J^{-1}
+ \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{\d{x}}J^{-1}}
{\braket{\d{x}|J^{-1}|\d{G}}}.
Any update that satisfies the secant condition can be expressed as:
.. math::
J^{-1} \rightarrow J^{-1}
+ \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{v}}
{\braket{v|\d{G}}}.
as long as :math:`\braket{v|\d{G}}` is not zero. The two
conditions correspond to different choices of either
:math:`\bra{v}=\bra{\d{G}}` to minimize the inverse Jacobian
norm (Broyden's "bad" method) or
:math:`\bra{v}=\bra{\d{x}}J^{-1}` to minimize the Jacobian
norm (Broyden's "good" method). Numerical Recipes
has the latter but I am not sure it is better... It is more
complicated.
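As a standalone numerical check (a sketch, not using this
class), the inverse update above is indeed the Sherman-Morrison
inverse of the Jacobian update::

    import numpy as np
    np.random.seed(0)
    J = np.random.rand(4, 4) + 4*np.eye(4)   # well-conditioned test Jacobian
    J_inv = np.linalg.inv(J)
    dx = np.random.rand(4, 1)
    dg = np.random.rand(4, 1)
    # Minimal-change update of J satisfying the secant condition
    J1 = J + np.dot(dg - np.dot(J, dx), dx.T)/np.dot(dx.T, dx)
    # Corresponding Sherman-Morrison update of the inverse
    J1_inv = (J_inv + np.dot(np.dot(dx - np.dot(J_inv, dg), dx.T), J_inv)
              /np.dot(np.dot(dx.T, J_inv), dg))
    assert np.allclose(J1_inv, np.linalg.inv(J1))
    assert np.allclose(np.dot(J1_inv, dg), dx)   # secant condition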
Examples
--------
>>> B = JinvBroyden()
>>> B.set_point([1, 2], [1, 2])
0.0
>>> B.j_inv
1
>>> B.set_point([3, 4], [3, 4])
0.0
This should change g but not add a new x
>>> B.set_point([1, 2], [4, 5]) # doctest: +ELLIPSIS
1.06...
>>> B.xs
array([[ 1., 2.],
[ 3., 4.]])
>>> B.gs
array([[ 4., 5.],
[ 3., 4.]])
"""
if x0 is None:
x0 = self.x
g0 = self.g
self._x0 = None
self._g0 = None
else:
x0 = self._x0 = np.asarray(x0).ravel()
g0 = self._g0 = np.asarray(g0).ravel()
j_inv = self.j_inv
if j_inv is 1:
j_inv = np.eye(len(x0), dtype=float)
#if skip_inds is not None:
# skip_inds = np.asarray(skip_inds, dtype=int)
# if include_skip:
# dx_skip = 1*dx
# dg_skip = 1*dg
# dg[skip_inds] = 0 # Upper block
# dx[skip_inds] = 0
# dx_dgs = [(dx, dg)]
# if include_skip:
# dx_skip -= dx # Skip block
# dg_skip -= dg
# dx_dgs.append((dx_skip, dg_skip))
#else:
# dx_dgs = [(dx, dg)]
# Add points
if self.xs is None:
# Initial update. Nothing to do.
return
x0 = x0[None, :]
g0 = g0[None, :]
dx = self.xs - x0
dg = self.gs - g0
dx_norms = (dx*dx).sum(axis=1)/(x0*x0).sum(axis=1)
is_ = np.where(dx_norms >= self._dx_norm2_min)[0]
if 0 == len(is_):
# Do nothing.
return
#raise ValueError(
# "No points to update! " +
# "max(dx_norms) = %g < _dx_norm2_min = %g" %
# (dx_norms.max(), self._dx_norm2_min))
dx = dx[is_, :]
dg = dg[is_, :]
dx_norms = dx_norms[is_]
if self.check_roundoff:
# Check calculation for roundoff errors.
x_err = _EPS/np.sqrt(dx_norms + _TINY)
dg_norms = (dg*dg).sum(axis=1)/(g0*g0).sum(axis=1)
g_err = _EPS/np.sqrt(dg_norms + _TINY)
if (x_err > self.roundoff_cut).any() and warn:
print("Warning: %i roundoff errors in dx" %
(x_err > self.roundoff_cut).sum())
if (g_err > self.roundoff_cut).any() and warn:
print("Warning: %i roundoff errors in dg" %
(g_err > self.roundoff_cut).sum())
is_ = np.where(np.logical_and(x_err <= self.roundoff_cut,
g_err <= self.roundoff_cut))[0]
dx = dx[is_, :]
dg = dg[is_, :]
# B -> B + ket_a*bra_b
ket_a, bra_b = self._inversion(dx=dx, dg=dg, j_inv=j_inv)
self.j_inv = j_inv
self.add_dyad(at=ket_a, b=bra_b)
def set_point(self, x, g, **kw):
r"""Add `x` and `g` and make this the current point `(x, g)` and update
the Jacobian to be valid here.
Return a measure of the incompatibility of the new data with
the current approximation. This need not "add" the point to
the current data set (the functionality provided by
:meth:`add_point`).
Parameters
----------
x, g : array
New function value `g = G(x)`.
**kw :
See :meth:`update`.
As a measure of the significance of this correction, we return
the following dimensionless quantity:
.. math::
\chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}}
{\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}}
Examples
--------
>>> B = JinvBroyden()
>>> B.set_point([1, 2], [1, 2])
0.0
>>> B.j_inv
1
>>> B.set_point([3, 4], [3, 4])
0.0
This should change g but not add a new x
>>> B.set_point([1, 2], [4, 5]) # doctest: +ELLIPSIS
1.06...
>>> B.xs
array([[ 1., 2.],
[ 3., 4.]])
>>> B.gs
array([[ 4., 5.],
[ 3., 4.]])
"""
x0 = self.x
g0 = self.g
x = np.asarray(x).ravel()
g = np.asarray(g).ravel()
self.curr = self.add_point(x, g)
self.update(**kw)
return self.incompatibility(x, g, x0, g0)
def add_dyad(self, at, b):
r"""Add the dyad $\ket{a}\bra{b}$ to :attr:`j_inv`.
Parameters
----------
at : array
1-d array or 2-d `(N, n)` array $\ket{a}$.
b : array
1-d array or 2-d `(n, N)` array $\ket{b}$.
"""
at = np.asarray(at)
b = np.asarray(b)
if 1 == len(at.shape):
at = at[:, None]
if 1 == len(b.shape):
b = b[None, :]
if self.dyadic:
self.j_inv.add_dyad(at=at, b=b)
else:
self.j_inv += dot(at, b)
def get_x(self, x_controls=None, min_step=_EPS, max_step=np.inf):
r"""Return `x`: the Broyden step to get to the root of `G(x)`.
This is the downhill Newton step from the point specified by
:attr:`curr` that satisfies the constraints `x_controls`.
Parameters
----------
x_controls : (array, array), optional
Tuple `(inds, vals)` indices and the respective values of the
control parameters.
min_step : float, optional
Minimum step length for testing. Used to give some
non-zero step `dx` of length `sqrt(min_step)` even if
`j_inv` is singular. This may be random. A warning will
be raised.
max_step : float, optional
Maximum length of the step.
Examples
--------
Here is a little test with a linear function
>>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]])
>>> j_inv = JinvBroyden(j_inv=np.linalg.inv(j))
>>> sol = np.array([1, 2, 3])
>>> def G(x):
... return np.dot(j, x - sol)
>>> x0 = np.array([0.0, 0.0, 0.0])
>>> x = np.array([1.0, 2.0, 3.0])
Now try to step to the solution, but hold the second variable as
a control at 1. Since the function is exactly linear and we
have explicitly given the correct Jacobian, we should end up
with a root in the other two parameters.
>>> j_inv.set_point(x=x, g=G(x))
0.0
>>> x = j_inv.get_x(x_controls=([1], [1.0]))
>>> x
array([ 1.5, 1. , 3.5])
>>> np.allclose(G(x), [0., 4.5, 0.])
True
This needs to work with very large arrays, so here is an
example. Here we use an example where the vectors have length
$2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly,
the arrays would be too large to be indexed.
>>> np.random.seed(0)
>>> n_x = 2**20
>>> n_d = 2**11
>>> a = np.random.random(n_x)
>>> b = np.random.random(n_x)
>>> g = np.random.random(n_x)
>>> x = np.zeros(n_x)
>>> inds = np.unique(np.random.randint(0, n_x, size=n_d))
>>> n_d = len(inds)
>>> vals = np.random.rand(n_d)
>>> x_controls = (inds, vals)
>>> j_inv = JinvBroyden(dyadic=True)
>>> j_inv.j_inv.add_dyad(a, b)
>>> j_inv.set_point(x=a, g=b)
0.0
>>> x = j_inv.get_x(x_controls=x_controls)
>>> np.allclose(x[inds], vals)
True
"""
if self.curr is None or 0 == len(self.xs):
raise ValueError(
"No `xs` stored: call update first.")
x = self.xs[self.curr]
g = self.gs[self.curr]
j_inv = self.j_inv
if j_inv is 1 or self.dyadic and 0 == len(self.j_inv.at):
dx = -g
if x_controls is not None and 0 < len(x_controls[0]):
dx[x_controls[0]] = x_controls[1] - x[x_controls[0]]
else:
if x_controls is not None and 0 < len(x_controls[0]):
inds, vals = x_controls
inds = np.asarray(inds, dtype=int)
vals = np.asarray(vals)
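# Constrained Newton step: with d = j_inv[inds, inds] and
# dmu = x[inds] - vals, solve d*(dN - dmu) = dmu - b for an
# effective residual dN on the control indices so that
# (j_inv*dg_dN)[inds] == dmu, i.e. the step lands exactly on
# the requested control values.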
dg_dmu = copy.copy(g)
n_d = len(inds)
n_x = len(x)
d = j_inv[inds[:, None], inds[None, :]]
dg_dmu[inds] = x[inds] - vals
dmu = _ket(dg_dmu[inds])
b = (np.asarray(j_inv*_ket(dg_dmu)))[inds, :]
dN = dmu - np.linalg.solve(d, b - dmu)
dg_dN = dg_dmu
dg_dN[inds] = dN[:]
dx_dmu = np.asarray(j_inv*_ket(dg_dN))
assert np.allclose(dx_dmu[inds], dmu)
# Set these to make sure no round-off errors have
# accumulated.
dx_dmu[inds].flat = x[inds] - vals
dx = -dx_dmu
else:
dx = -np.asarray(j_inv*_ket(g))
dx = np.asarray(dx).ravel()
g_mag = np.linalg.norm(g)
dx_mag = np.linalg.norm(dx)
if dx_mag == 0 and g_mag > 0:
warnings.warn("Warning: Singular Jinv, taking random step")
dx = np.random.rand(len(dx))*np.sqrt(min_step)
elif np.divide(dx_mag, g_mag) < self.min_j_inv:
warnings.warn("Warning: Singular Jinv, taking minimum step")
dx *= math.sqrt(min_step)/np.linalg.norm(dx)
dx_len = math.sqrt((dx*dx).sum())
if max_step < dx_len:
dx *= max_step/dx_len
return x + dx
def reset(self):
r"""Reset the Jacobian but keep :attr:`xs` and :attr:`gs`.
See Also
--------
:meth:`reset_all`
"""
if self.dyadic:
self.j_inv.reset()
else:
self.j_inv = 1
def reset_all(self):
r"""Reset both the points :attr:`xs`, :attr:`gs`, and the
Jacobian.
See Also
--------
:meth:`reset`
"""
self.suspend()
self.xs = None
self.gs = None
self.curr = None
self.reset()
self.resume()
@property
def J(self):
r"""Current approximation of Jacobian."""
if self.dyadic:
j_inv = self.j_inv.asarray()
else:
j_inv = self.j_inv
return np.linalg.inv(j_inv)
def incompatibility(self, x, g, x0, g0, skip_inds=None, complement=False):
r"""Return a dimensionless measure of the compatibility of the
specified step with the current Jacobian. An incompatibility of
0 means perfectly compatible (this is what should happen if
the function is linear and the Jacobian is correct along the
specified step).
There are several options here as specified by
:attr:`incompatibility_measure`.
`dg` : This option considers the change in the inverse derivative
in the direction `dg`
.. math::
\chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}}
{\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}}
`sigma` : This option is valid with the :attr:`dyadic` option
and only considers the incompatibility in the space spanned
by the dyads.
Examples
--------
>>> def g(x):
... return x*x + x*np.sin(x.sum())
>>> j = JinvBroyden()
>>> x0 = np.zeros(10)
>>> g0 = g(x0)
>>> j.set_point(x=x0, g=g0)
0.0
>>> x1 = np.ones(10)
>>> g1 = g(x1)
>>> j.incompatibility(x=x1, g=g1, x0=x0, g0=g0) # doctest: +ELLIPSIS
0.377...
This does not modify the Jacobian as can be noted by the fact
that update returns the same incompatibility
>>> j.set_point(x=x1, g=g1) # doctest: +ELLIPSIS
0.377...
After an update, the step should be fully compatible:
>>> j.incompatibility(x=x1, g=g1, x0=x0, g0=g0) < 1e-12
True
"""
if self.xs is None:
# If there are no other points, then consider the update
# compatible
return 0
dx = x[None, :] - self.xs
dg = g[None, :] - self.gs
if skip_inds is not None:
skip_inds = np.asarray(skip_inds)
if complement:
dx_ = 1*dx
dg_ = 1*dg
dx_[skip_inds] = 0
dg_[skip_inds] = 0
dx -= dx_
dg -= dg_
else:
dx[skip_inds] = 0
dg[skip_inds] = 0
ng = np.linalg.norm(dg)
if 0 == ng:
res = np.linalg.norm(dx)/(ng + _TINY)
elif 'dg' == self.incompatibility_measure:
nx = np.linalg.norm(dx)
dg = dg/(ng + _TINY)
dx = dx/(ng + _TINY)
dx1 = self.__rmul__(dg)
if skip_inds is not None:
if complement:
dx1_ = 1*dx1
dx1_[skip_inds] = 0
dx1 -= dx1_
else:
dx1[skip_inds] = 0
res = (np.linalg.norm(dx - dx1)/
(max(np.linalg.norm(dx1), nx) + _TINY))
elif 'sigma' == self.incompatibility_measure and self.dyadic:
raise NotImplementedError(
"incompatibility_measure '%s' not implemented." %
(self.incompatibility_measure, ))
else:
raise NotImplementedError(
"incompatibility_measure '%s' not implemented." %
(self.incompatibility_measure, ))
return res
def change_basis(self, t):
r"""Use the functions `x_new = t(x)` and `g_new = t(g)` to
update the data corresponding to a change of basis. Raise an
:exc:`AttributeError` if the change cannot be implemented (or
don't define this method.)
Parameters
----------
t : function
Change of basis transformation (should be essentially
matrix multiplication, but for large bases one might want
to prevent forming this matrix explicitly.)
Notes
-----
Let the function be represented by the set of
points :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of
abscissa and :math:`f^{N}_j = f(x^{N}_j)` on the larger set.
The transformation is of the form:
.. math:: f^N_i = T_{ij}f^{n}_{j}
We update the inverse Jacobian in the following way:
.. math:: J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T}
Several other options have been considered, but in practice
this has proved to be not only the easiest to implement, but
also the most efficient. If we do not add and subtract the
identity, then the resultant inverse Jacobian will be rank
deficient and thus the resulting steps will never explore the
singular directions which is highly undesirable. One can also
think of this as directly transforming the inverse Jacobian
related to the fixed-point problem :math:`f \rightarrow f -
G[f]`.
Also, if using the dyadic representation, then this procedure
only requires applying the linear interpolation operator to
the components of the dyads: Transforming the identity portion
would change the form requiring a significantly more
complicated representation.
We have also explored using :math:`T^{-1}` in place of
:math:`T^T`, but this has several problems: 1) Computing
:math:`T^{-1}` can be expensive if the basis is large, 2)
Since :math:`T` is rank deficient, :math:`T^{-1}` is not
unique, but must be chosen from one of the pseudo-inverses and
it is not clear how to best do this, 3) For the dyadic
representation, we would like to work directly with the
interpolation operator on both components of the dyads, so
using the transpose is much easier than having to solve the
inverse problem.
If :math:`T^T\cdot T \approx \mat{1}`, then the interpretation
of the transformation is this: We have built up the Jacobian
inverse as minimal updates from the identity using the
displacements :math:`\ket{\d{x}}` and :math:`\ket{\d{G}}`.
The transformed inverse Jacobian using the above procedure is
what would have been obtained by minimal updates from the
identity using the transformed displacements
:math:`T\ket{\d{x}}` and :math:`T\ket{\d{G}}`. Thus, the
transformed Jacobian inverse should have the same amount of
information as the original, and should in general reduce the
number of operations required to form a good approximation by
the rank of :math:`J^{-1} - \mat{1}`.
It may be beneficial to choose the interpolation operator such
that :math:`T^T\cdot T = \mat{1}` over the smaller basis. We
have not investigated this, but standard interpolation
operators seem sufficiently close to the identity that this
extra complication is probably not justified.
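For concreteness, here is a small sketch (assuming SciPy spline
interpolation, as in the example below) of building the matrix
:math:`T` from `t` and applying the transform explicitly::

    import numpy as np
    from scipy.interpolate import splrep, splev
    x1 = np.linspace(-1, 1, 5)
    x2 = np.linspace(-1, 1, 9)
    t = lambda f: splev(x2, splrep(x1, f, s=0, k=1))
    # Build T column by column by applying t to the unit vectors
    T = np.array([t(e) for e in np.eye(len(x1))]).T
    j_inv_n = np.eye(len(x1)) + 0.1*np.random.rand(len(x1), len(x1))
    j_inv_N = np.eye(len(x2)) + np.dot(np.dot(T, j_inv_n - np.eye(len(x1))),
                                       T.T)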
The code contains additional update methods for testing
purposes to allow these results to be verified (except for the
dyadic representation which can only be easily transformed
using this method). Here we present results for a few
problems.
>>> from scipy.interpolate import *
>>> from mmf.objects import *
>>> def G(f, x, normalize=True):
... c1 = np.cosh(1)
... dx = np.diff(x[1:])
... ddf = np.diff(np.diff(f))/dx/dx
... a = 1
... if normalize:
... a = 1 + 2./dx/dx
... G = np.hstack([[f[0] - c1],
... (f[1:-1] - ddf)/a,
... [f[-1] - c1]])
... return G
>>> def test(N1, N2, normalize, meth, k=3, tol=1e-14):
... '''Return `(i1, i2, iN, gain)`. `i1` is the number of
... steps at low resolution. `i2` is the subsequent
... number of steps required to polish the
... high-resolution solution. `iN` is the number of
... steps at only the high resolution, and `gain = i1+i2
... - iN` is the gain in speed. (Negative is good)'''
... x1 = np.linspace(-1, 1, N1)
... x2 = np.linspace(-1, 1, N2)
... #def t(f):
... # 'The interpolation function'
... # return splev(x2, splrep(x1, f, s=0, k=k))
... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k))
... f = 0*x1 # Initial guess
... g = G(f, x1, normalize)
... j_inv = JinvBroyden(_j_inv_transform=meth)
... j_inv.set_point(x=f, g=g)
... err = np.max(abs(g))
... i1 = 1
... while err > tol:
... # Find solution over x1 abscissa
... f0, g0 = f, g
... f = j_inv.get_x()
... g = G(f, x1, normalize)
... err = np.max(abs(g))
... j_inv.set_point(x=f, g=g)
... i1 += 1
... j_inv.change_basis(t) # Change basis
... f1 = t(f)
... g1 = G(f1, x2, normalize)
... j_inv.set_point(x=f1, g=g1)
... err = np.max(abs(g1))
... i2 = 1
... while err > tol:
... # Find solution over x2 abscissa
... f0, g0 = f1, g1
... f1 = j_inv.get_x()
... g1 = G(f1, x2, normalize)
... err = np.max(abs(g1))
... j_inv.set_point(x=f1, g=g1)
... i2 += 1
... # Now try only at high resolution.
... f1 = 0*f1
... g1 = G(f1, x2, normalize)
... j_inv = JinvBroyden(_j_inv_transform=meth)
... j_inv.set_point(x=f1, g=g1)
... err = np.max(abs(g1))
... i3 = 1
... while err > tol:
... # Find solution over x2 abscissa
... f0, g0 = f1, g1
... f1 = j_inv.get_x()
... g1 = G(f1, x2, normalize)
... err = np.max(abs(g1))
... j_inv.set_point(x=f1, g=g1)
... i3 += 1
... return i1, i2, i3, (i1+i2) - i3
>>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10):
... meth = Container()
... for meth.inv_l in [0, 1]:
... for meth.inv_r in [0, 1]:
... for meth.set_diag in [0, 1]:
... res = test(N1, N2, normalize,
... k=k, tol=tol,
... meth=dict(meth.items()))
... print repr(meth), res
>>> tabulate(10, 50, k=1, tol=1e-12)
Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6)
Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6)
Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6)
Container(set_diag=1, inv_r=1, inv_l=0) (7, 39, 29, 17)
Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7)
Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 29, 18)
Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7)
Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 29, 19)
Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6)
Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6)
Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6)
Container(set_diag=1, inv_r=1, inv_l=0) (7, 40, 29, 18)
Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7)
Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 29, 18)
Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7)
Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 29, 19)
Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6)
Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6)
Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6)
Container(set_diag=1, inv_r=1, inv_l=0) (7, 42, 29, 20)
Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7)
Container(set_diag=1, inv_r=0, inv_l=1) (7, 42, 29, 20)
Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7)
Container(set_diag=1, inv_r=1, inv_l=1) (7, 43, 29, 21)
Container(set_diag=0, inv_r=0, inv_l=0) (7, 30, 30, 7)
Container(set_diag=1, inv_r=0, inv_l=0) (7, 30, 30, 7)
Container(set_diag=0, inv_r=1, inv_l=0) (7, 31, 30, 8)
Container(set_diag=1, inv_r=1, inv_l=0) (7, 38, 30, 15)
Container(set_diag=0, inv_r=0, inv_l=1) (7, 32, 30, 9)
Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 30, 17)
Container(set_diag=0, inv_r=1, inv_l=1) (7, 31, 30, 8)
Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 30, 18)
The results are as follows: First the number of iterations
performed on the small system. Second is the number of iterations
performed on the system with the interpolation applied to the
Jacobian. The sum of these indicates how many total
iterations were required to converge in the two-stage process.
Third is the number of iterations required to converge to the
final solution using only the large system. The last number
is the improvement of the two-stage method (negative is good).
Here are the old results with the old Broyden method::
Container(set_diag=0, inv_r=0, inv_l=0) (9, 66, 109, -34)
Container(set_diag=1, inv_r=0, inv_l=0) (9, 68, 109, -32)
Container(set_diag=0, inv_r=1, inv_l=0) (9, 95, 109, -5)
Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 109, -30)
Container(set_diag=0, inv_r=0, inv_l=1) (9, 99, 109, -1)
Container(set_diag=1, inv_r=0, inv_l=1) (9, 96, 109, -4)
Container(set_diag=0, inv_r=1, inv_l=1) (9, 77, 109, -23)
Container(set_diag=1, inv_r=1, inv_l=1) (9, 71, 109, -29)
"""
self.suspend()
j_inv = self.j_inv
# We won't transform the points and function values because
# the function is not linear, so this would make the values
# inaccurate. Instead, we reset these.
self.xs = None
self.gs = None
# Now apply transform to the Jacobian.
if j_inv is 1:
return
if self.dyadic:
j_inv.apply_transform(t)
else:
if not hasattr(self, '_j_inv_transform'):
# One could also apply t directly to the columns and
# then rows of j_inv - I but I don't think this will be
# any faster.
N, N = j_inv.shape
I = np.eye(N)
t_mat = np.mat(np.vstack([t(I[n, :])
for n in xrange(N)]).T)
M, N = t_mat.shape
j_inv = np.eye(M) + t_mat*(j_inv - I)*t_mat.T
else:
d = np.diag(j_inv)
N, N = j_inv.shape
I = np.eye(N)
t_mat = np.mat(np.vstack([t(I[n, :])
for n in xrange(N)]).T)
ti_mat = np.linalg.pinv(t_mat)
if self._j_inv_transform.get('inv_l', False):
tl = ti_mat.T
else:
tl = t_mat
if self._j_inv_transform.get('inv_r', False):
tr = ti_mat
else:
tr = t_mat.T
if self._j_inv_transform.get('direct', False):
j_inv = tl*j_inv*tr
else:
M, N = t_mat.shape
j_inv = np.eye(M) + tl*(j_inv - I)*tr
if self._j_inv_transform.get('set_diag', False):
m = np.arange(j_inv.shape[0])
j_inv[m, m] = t(d)
self.j_inv = j_inv
self.resume()
def _ket(x):
"""Return `x` as a ket (column) vector."""
return np.mat(x.ravel()).T
def _bra(x):
"""Return `x` as a bra (row) vector."""
return np.mat(x.ravel())
def _add_dyad(J, a, b):
"""Add the dyad `|a><b|` to the matrix `J` inplace and return
`J`."""
try:
J.add_dyad(a, b)
except AttributeError:
J += _ket(a)*_bra(b)
return J
# class Jinv(StateVars):
# r"""Class to represent an inverse Jacobian."""
# implements(IJinv)
# _state_vars = [
# ('dyadic', False,
# r"""If `True`, then use a dyadic representation for Jinv. If
# this is a number greater than 1, then this number is
# passed as :attr:`DyadicSum.n_max` limiting the rank of the
# dyadic representation."""),
# ('j_inv', 1, "Internal representation of matrix"),
# ('robust', False,
# """If `True`, then use the singular value decomposition to
# analyse the Jacobian. Much slower, but could prevent some
# numerical problems."""),
# ('max_j_inv', 100.0,
# """If :attr:`robust`, then the inverse Jacobian is clipped
# so that the maximum value is this."""),
# ('min_j_inv', 0.0001,
# """The inverse Jacobian can become singular. If this is
# the case, then a step might have very small length even if
# the error is not small (the step size can be reduced by a
# factor of the minimum singular value of the inverse
# Jacobian). If the step reduction is less than this value,
# then the step is scaled up to have length
# :attr:`min_step`."""),
# ('minimize_inverse', False,
# """If `True`, then use the update the minimizes the change
# in the Frobenius norm of the inverse Jacobian, otherwise
# we use the Sherman-Morrison formula to update so as to
# minimize the norm of the Jacobian."""),
# ('incompatibility_measure', 'dg',
# r"""Specifies which criterion to use for measured the
# compatibility of an update with the present inverse
# Jacobian. See :attr:`incompatibility` for supported
# methods."""),
# ('_j_inv_transform', NotImplemented,
# """Optional methods for updating the Jacobian inverse in
# :meth:`change_basis`. For testing only."""),
# ]
# process_vars()
# def __init__(self, *v, **kw):
# # pylint: disable-msg=W0231
# if self.dyadic:
# n_max = np.inf
# if self.dyadic > 1:
# n_max = self.dyadic
# if self.j_inv is 1:
# self.j_inv = DyadicSum(n_max=n_max)
# elif isinstance(self.j_inv, DyadicSum):
# self.j_inv.n_max = n_max
# else:
# raise NotImplementedError(
# "No support yet for initializing a" +
# " DyadicSum from an array.")
# def __mul__(self, dg):
# return self.j_inv.__mul__(dg)
# def __rmul__(self, dxt):
# return self.j_inv.__rmul__(dxt)
# def reset(self, m0=NotImplemented):
# self.j_inv.reset(m0=m0)
# def change_basis(self, t):
# r"""Use the function `x_new = t(x)` to update the data
# corresponding to a change of basis. Raise an
# :exc:`AttributeError` if the change cannot be implemented (or
# don't define this method.)
# Parameters
# ----------
# t : function
# Change of basis transformation (should be essentially
# matrix multiplication, but for large bases one might want
# to prevent forming this matrix explicitly.)
# Notes
# -----
# Let the function be represented by the set of
# points :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of
# abscissa and :math:`f^{N}_j = f(x^{N}_j)` on the larger set.
# The transformation is of the form:
# .. math:: f^N_i = T_{ij}f^{n}_{j}
# We update the inverse Jacobian in the following way:
# .. math:: J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T}
# Several other options have been considered, but in practise
# this has proved to be not only the easiest to implement, but
# also the most efficient. If we do not add and subtract the
# identity, then the resultant inverse Jacobian will be rank
# deficient and thus the resulting steps will never explore the
# singular directions which is highly undesirable. One can also
# think of this as directly transforming the inverse Jacobian
# related to the fixed-point problem :math:`f \rightarrow f -
# G[f]`.
# Also, if using the dyadic representation, then this procedure
# only requires applying the linear interpolation operator to
# the components of the dyads: Transforming the identity portion
# would change the form requiring a significantly more
# complicated representation.
# We have also explored using :math:`T^{-1}` in place of
# :math:`T^T`, but this has several problems: 1) Computing
# :math:`T^{-1}` can be expensive if the basis is large, 2)
# Since :math:`T` is rank deficient, :math:`T^{-1}` is not
# unique, but must be chosen from one of the pseudo-inverses and
# it is not clear how to best do this, 3) For the diadic
# representation, we would like to work directly with the
# interpolation operator on both components of the dyads, so
# using the transpose is much easier than having to solve the
# inverse problem.
# If :math:`T^T\cdot T \approx \mat{1}`, then the interpretation
# of the transformation is this: We have built up the Jacobian
# inverse as minimal updates from the identity using the
# displacements :math:`\ket{\d{x}}` and :math:`\ket{\d{G}}`.
# The transformed inverse Jacobian using the above proceedure is
# what would have been obtain by minimal updates from the
# identity using the transformed displacements
# :math:`T\ket{\d{x}}` and :math:`T\ket{\d{G}}`. Thus, the
# transformed Jacobian inverse should have the same amount of
# information as the original, and should in general reduce the
# number of operations required to form a good approximation by
# the rank of :math:`J^{-1} - \mat{1}`.
# It may be beneficial to choose the interpolation operator such
# that :math:`T^T\cdot T = \mat{1}` over the smaller basis. We
# have not investigated this, but standard interpolation
# operators seem sufficiently close to the identity that this
# extra complication is probably not justified.
# The code contains additional update methods for testing
# purposes to allow these results to be verified (except for the
# dyadic representation which can only be easily transformed
# using this method). Here we present results for a few
# problems.
# >>> from scipy.interpolate import *
# >>> def G(f, x, normalize=True):
# ... c1 = np.cosh(1)
# ... dx = np.diff(x[1:])
# ... ddf = np.diff(np.diff(f))/dx/dx
# ... a = 1
# ... if normalize:
# ... a = 1 + 2./dx/dx
# ... G = np.hstack([[f[0] - c1],
# ... (f[1:-1] - ddf)/a,
# ... [f[-1] - c1]])
# ... return G
# >>> def test(N1, N2, normalize, meth, k=3, tol=1e-14):
# ... '''Return `(i1, i2, iN, gain)`. `i1` is the number of
# ... steps at low resolution. `i2` is the subsequent
# ... number of steps required to polish the
# ... high-resolution solution. `iN` is the number of
# ... steps at only the high resolution, and `gain = i1+i2
# ... - iN` is the gain in speed. (Negative is good)'''
# ... x1 = np.linspace(-1, 1, N1)
# ... x2 = np.linspace(-1, 1, N2)
# ... #def t(f):
# ... # 'The interpolation function'
# ... # return splev(x2, splrep(x1, f, s=0, k=k))
# ... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k))
# ... f = 0*x1 # Initial guess
# ... g = G(f, x1, normalize)
# ... j_inv = Jinv(_j_inv_transform=meth)
# ... err = np.max(abs(g))
# ... i1 = 1
# ... while err > tol:
# ... # Find solution over x1 abscissa
# ... f0, g0 = f, g
# ... f = f + j_inv.get_dx(g0)
# ... g = G(f, x1, normalize)
# ... err = np.max(abs(g))
# ... j_inv.update(dg=g - g0, dx=f - f0)
# ... i1 += 1
# ... j_inv.change_basis(t) # Change basis
# ... f1 = t(f)
# ... g1 = G(f1, x2, normalize)
# ... err = np.max(abs(g1))
# ... i2 = 1
# ... while err > tol:
# ... # Find solution over x2 abscissa
# ... f0, g0 = f1, g1
# ... f1 = f1 + j_inv.get_dx(g0)
# ... g1 = G(f1, x2, normalize)
# ... err = np.max(abs(g1))
# ... j_inv.update(dx=f1 - f0, dg=g1 - g0)
# ... i2 += 1
# ... # Now try only at high resolution.
# ... f1 = 0*f1
# ... g1 = G(f1, x2, normalize)
# ... j_inv = Jinv(_j_inv_transform=meth)
# ... err = np.max(abs(g1))
# ... i3 = 1
# ... while err > tol:
# ... # Find solution over x2 abscissa
# ... f0, g0 = f1, g1
# ... f1 = f1 + j_inv.get_dx(g0)
# ... g1 = G(f1, x2, normalize)
# ... err = np.max(abs(g1))
# ... j_inv.update(dx=f1 - f0, dg=g1 - g0)
# ... i3 += 1
# ... return i1, i2, i3, (i1+i2) - i3
# >>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10):
# ... meth = Container()
# ... for meth.inv_l in [0, 1]:
# ... for meth.inv_r in [0, 1]:
# ... for meth.set_diag in [0, 1]:
# ... res = test(N1, N2, normalize,
# ... k=k, tol=tol,
# ... meth=dict(meth.items()))
# ... print repr(meth), res
# >>> tabulate(10, 50, k=1, tol=1e-15)
# Container(set_diag=0, inv_r=0, inv_l=0) (9, 66, 109, -34)
# Container(set_diag=1, inv_r=0, inv_l=0) (9, 68, 109, -32)
# Container(set_diag=0, inv_r=1, inv_l=0) (9, 95, 109, -5)
# Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 109, -30)
# Container(set_diag=0, inv_r=0, inv_l=1) (9, 99, 109, -1)
# Container(set_diag=1, inv_r=0, inv_l=1) (9, 96, 109, -4)
# Container(set_diag=0, inv_r=1, inv_l=1) (9, 77, 109, -23)
# Container(set_diag=1, inv_r=1, inv_l=1) (9, 71, 109, -29)
# """
# self.suspend()
# j_inv = self.j_inv
# if j_inv is 1:
# return
# if self.dyadic:
# j_inv.apply_transform(t)
# else:
# if not hasattr(self, '_j_inv_transform'):
# # One could also apply t directly to the columns and
# # then rows of j_inv - I but I don't think this will be
# # any faster.
# N, N = j_inv.shape
# I = np.eye(N)
# t_mat = np.mat(np.vstack([t(I[n, :])
# for n in xrange(N)]).T)
# M, N = t_mat.shape
# j_inv = np.eye(M) + t_mat*(j_inv - I)*t_mat.T
# else:
# d = np.diag(j_inv)
# N, N = j_inv.shape
# I = np.eye(N)
# t_mat = np.mat(np.vstack([t(I[n, :])
# for n in xrange(N)]).T)
# ti_mat = np.linalg.pinv(t_mat)
# if self._j_inv_transform.get('inv_l', False):
# tl = ti_mat.T
# else:
# tl = t_mat
# if self._j_inv_transform.get('inv_r', False):
# tr = ti_mat
# else:
# tr = t_mat.T
# if self._j_inv_transform.get('direct', False):
# j_inv = tl*j_inv*tr
# else:
# M, N = t_mat.shape
# j_inv = np.eye(M) + tl*(j_inv - I)*tr
# if self._j_inv_transform.get('set_diag', False):
# m = np.arange(j_inv.shape[0])
# j_inv[m, m] = t(d)
# self.j_inv = j_inv
# self.resume()
# @property
# def J(self):
# r"""Current approximation of Jacobian."""
# if self.dyadic:
# j_inv = self._j_inv.asarray()
# else:
# j_inv = self._j_inv
# return np.linalg.inv(j_inv)
# def incompatibility(self, dx, dg, skip_inds=None, complement=False):
# r"""Return a dimensionless measure of the compatibility of the
# specified step with the current Jacobian. An incompatibility of
# 0 means perfectly compatible (this is what should happen if
# the function is linear and the Jacobian is correct).
# There are several options here as specified by
# :attr:`incompatibility_measure`.
# `dg` : This option considers the change in the inverse derivative
# the direction `dg`
# .. math::
# \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}}
# {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}}
# `sigma` : This option is valid with the :attr:`dyadic` option
# and only considers the incompatibility in the space spanned
# by the dyads.
# Examples
# --------
# >>> def g(x):
# ... return x*x + x*np.sin(x.sum())
# >>> j = Jinv()
# >>> x0 = np.zeros(10)
# >>> g0 = g(x0)
# >>> dx = np.ones(10) - x0
# >>> dg = g(x0 + dx) - g0
# >>> j.incompatibility(dx, dg) # doctest: +ELLIPSIS
# 0.377...
# This does not modify the Jacobian as can be noted by the fact
# that update returns the same incompatibility
# >>> j.update(dx=dx, dg=dg) # doctest: +ELLIPSIS
# 0.377...
# After an update, the step should be fully compatibility:
# >>> j.incompatibility(dx, dg) < 1e-12
# True
# """
# if skip_inds is not None:
# skip_inds = np.asarray(skip_inds)
# if complement:
# dx_ = 1*dx
# dg_ = 1*dg
# dx_[skip_inds] = 0
# dg_[skip_inds] = 0
# dx -= dx_
# dg -= dg_
# else:
# dx[skip_inds] = 0
# dg[skip_inds] = 0
# ng = np.linalg.norm(dg)
# if 0 == ng:
# res = np.linalg.norm(dx)/(ng + _TINY)
# elif 'dg' == self.incompatibility_measure:
# nx = np.linalg.norm(dx)
# dg = dg/(ng + _TINY)
# dx = dx/(ng + _TINY)
# dx1 = np.asarray(self.j_inv*_ket(dg)).ravel()
# if skip_inds is not None:
# if complement:
# dx1_ = 1*dx1
# dx1_[skip_inds] = 0
# dx1 -= dx1_
# else:
# dx1[skip_inds] = 0
# res = (np.linalg.norm(dx - dx1)/
# (max(np.linalg.norm(dx1), nx) + _TINY))
# elif 'sigma' == self.incompatibility_measure and self.dyadic:
# raise NotImplementedError(
# "incompatibility_measure '%s' not implemented." %
# (self.incompatibility_measure, ))
# else:
# raise NotImplementedError(
# "incompatibility_measure '%s' not implemented." %
# (self.incompatibility_measure, ))
# return res
# def update(self, dx, dg, j_inv=None, skip_inds=None,
# include_skip=False, mags=1):
# r"""Update based on differences `dg` and `dx`. Return the
# incompatibility value returned by :meth:`incompatibility`
# Parameters
# ----------
# dg : array
# Change `dg = G(x1) - G(x0)`.
# dx : array
# Change `dx = x1 - x0`
# j_inv : matrix, optional
# If this is provided, then it is used to set the Jacobian
# (implementing Newton's method), otherwise the standard
# Broyden update is used.
# skip_inds : None, array, optional
# This is an array of the indices of sensitive parameters.
# The change in these indices will be ignored. This is
# important if there are "sensitive" or "noisy" parameters
# that should only be trusted once a certain degree of
# convergence has been achieved.
# include_skip : bool
# If `True` and `skip_inds` is provided, then cross-terms are
# ignored, but both the main and skip blocks are updated.
# mags : array, optional
# This should be set to the vector of typical magnitudes for
# `G`. One option is `abs(G0) + abs(G1)`. Assumed to be `1`
# if not provided.
# Notes
# -----
# The inverse Jacobian is updated so that
# .. math::
# \mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}}.
# There are two senses in which the update can be considered as
# minimal controlled by the flag :attr:`minimize_inverse`. If
# this is `True`, we minimize :math:`\norm{J^{-1}_0 -
# J^{-1}_1}_{\text{Frobenius}}`:
# .. math::
# J^{-1} \rightarrow
# J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}}
# {\braket{\d{G}|\d{G}}},
# otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`:
# .. math::
# J \rightarrow
# J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}}
# {\braket{\d{x}|\d{x}}}.
# This can be analytically inverted to give an update for the inverse:
# .. math::
# J^{-1} \rightarrow J^{-1}
# + \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{\d{x}}J^{-1}}
# {\braket{\d{x}|J^{-1}|\d{G}}}.
# Any update that satisfies the secant condition can be expressed as:
# .. math::
# J^{-1} \rightarrow J^{-1}
# + \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{v}}
# {\braket{v|\d{G}}}.
# as long as :math:`\braket{v|\d{G}}` is not zero. The two
# conditions correspond to different choices of either
# :math:`\bra{v}=\bra{\d{G}}` to minimize the inverse Jacobian
# norm (Broyden's "second" method) or
# :math:`\bra{v}=\bra{\d{x}}J^{-1}` to minimize the Jacobian
# norm (Broyden's "first" method). Numerical Recipes
# has the latter but I am not sure it is better... It is more
# complicated.
# As a measure of the significance of this correction, we return
# the following dimensionless quantity:
# .. math::
# \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}}
# {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}}
# """
# dg = np.asarray(dg).ravel()
# dx = np.asarray(dx).ravel()
# incompatibility = self.incompatibility(dx=dx, dg=dg)
# j_inv = self.j_inv
# if j_inv is 1:
# j_inv = np.eye(len(dx), dtype=float)
# if skip_inds is not None:
# skip_inds = np.asarray(skip_inds, dtype=int)
# if include_skip:
# dx_skip = 1*dx
# dg_skip = 1*dg
# dg[skip_inds] = 0 # Upper block
# dx[skip_inds] = 0
# dx_dgs = [(dx, dg)]
# if include_skip:
# dx_skip -= dx # Skip block
# dg_skip -= dg
# dx_dgs.append((dx_skip, dg_skip))
# else:
# dx_dgs = [(dx, dg)]
# for dx, dg in dx_dgs:
# if np.linalg.norm(dx) <= 0 or np.linalg.norm(dg) <= 0:
# # It could happen, such as when changing resolutions, that
# # this is called without actually generating a new step,
# # so we just pass here.
# return 0.0
# w = np.asarray(_ket(dx) - j_inv*_ket(dg)).ravel()
# # Zero out noisy components
# w = np.where(abs(w) > _EPS*mags, w, 0.0)
# if (w != 0).any():
# if self.minimize_inverse:
# vt = _bra(dg)
# else:
# vt = j_inv.__rmul__(_bra(dx))
# dot_prod = np.dot(vt, _ket(dg))
# dx_norm = np.linalg.norm(dx)
# dg_norm = np.linalg.norm(dg)
# reduction = abs(dot_prod/dx_norm/dg_norm)
# import pdb;pdb.set_trace()
# if reduction < self.min_j_inv:
# # Use other step
# vt = _bra(dg)
# t = np.dot(vt, dg)
# w = w/t
# _add_dyad(j_inv, w, vt)
# if self.robust:
# U, d, Vinv = np.linalg.svd(j_inv)
# print np.hstack((abs(np.dot(U.T, _ket(abs(G)))), _ket(d)))
# if self.max_j_inv < d.max():
# warnings.warn("Warning: Clipping Jacobian: %f"%d.max())
# d = np.where(d<self.max_j_inv, d, self.max_j_inv)
# j_inv = U*mat(np.diag(d))*Vinv
# self.j_inv = j_inv
# return incompatibility
# def get_dx(self, g, x=None, x_controls=None,
# min_step=_EPS, max_step=np.inf):
# r"""Return `dx`: the Broyden step to get to the root of `G(x)`.
# Parameters
# ----------
# g : array
# The step is determined from `dx = -j_inv*g`.
# x : array, optional
# If `x_controls` is provided, then `x` must also be provided to
# determine the offset of the controlled parameters.
# x_controls : (array, array), optional
# Tuple `(inds, vals)` indices and the respective values of the
# control parameters.
# min_step : float, optional
# Minimum step length for testing. Used to give some
# non-zero step `dx` of length `sqrt(min_step) even if
# `j_inv` is singular. This may be random. A warning will
# be raised.
# max_step : float, optional
# Maximum length of the step.
# Examples
# --------
# Here is a little test with a linear function
# >>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]])
# >>> j_inv = Jinv(j_inv=np.linalg.inv(j))
# >>> sol = np.array([1, 2, 3])
# >>> def G(x):
# ... return np.dot(j, x - sol)
# >>> x0 = np.array([0.0, 0.0, 0.0])
# >>> x = np.array([1.0, 2.0, 3.0])
# Now try to step to the solution, but hold the second variable as
# a control at 1. Since the function is exactly linear and we
# have explicitly given the correct Jacobian, we should end up
# with a root in the other two parameters.
# >>> x = x + j_inv.get_dx(g=G(x), x=x, x_controls=([1], [1.0]))
# >>> x
# array([ 1.5, 1. , 3.5])
# >>> G(x)
# array([ 0. , 4.5, 0. ])
# This needs to work with very large arrays, so here is an
# example. Here we use an example where the vectors have length
# $2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly,
# the arrays would be too large to be indexed.
# >>> np.random.seed(0)
# >>> n_x = 2**20
# >>> n_d = 2**11
# >>> a = np.random.rand(n_x)
# >>> b = np.random.rand(n_x)
# >>> g = np.random.rand(n_x)
# >>> x = np.zeros(n_x)
# >>> inds = np.unique1d(np.random.randint(0, n_x, size=n_d))
# >>> n_d = len(inds)
# >>> vals = np.random.rand(n_d)
# >>> x_controls = (inds, vals)
# >>> j_inv = Jinv(dyadic=True)
# >>> j_inv.j_inv.add_dyad(a, b)
# >>> dx = j_inv.get_dx(g=g, x=x, x_controls=x_controls)
# >>> np.allclose((x + dx)[inds], vals)
# True
# """
# j_inv = self.j_inv
# if j_inv is 1:
# dx = -g
# else:
# if x_controls is not None and 0 < len(x_controls[0]):
# inds, vals = x_controls
# inds = np.asarray(inds, dtype=int)
# vals = np.asarray(vals)
# dg_dmu = copy.copy(g)
# n_d = len(inds)
# n_x = len(x)
# d = j_inv[inds[:, None], inds[None, :]]
# dg_dmu[inds] = x[inds] - vals
# dmu = _ket(dg_dmu[inds])
# b = (j_inv*_ket(dg_dmu))[inds, :]
# dN = dmu - np.linalg.solve(d, b - dmu)
# dg_dN = dg_dmu
# dg_dN[inds] = dN[:]
# dx_dmu = j_inv*_ket(dg_dN)
# assert np.allclose(dx_dmu[inds], dmu)
# # Set these to make sure no round-off errors have
# # accumulated.
# dx_dmu[inds].flat = x[inds] - vals
# dx = -dx_dmu
# else:
# dx = -(j_inv*_ket(g))
# dx = np.asarray(dx).ravel()
# g_mag = np.linalg.norm(g)
# dx_mag = np.linalg.norm(dx)
# if dx_mag == 0 and g_mag > 0:
# warnings.warn("Warning: Singular Jinv, taking random step")
# dx = np.random.rand(len(dx))*np.sqrt(min_step)
# elif np.divide(dx_mag, g_mag) < self.min_j_inv:
# warnings.warn("Warning: Singular Jinv, taking minimum step")
# dx *= math.sqrt(min_step)/np.linalg.norm(dx)
# dx_len = math.sqrt(dot(dx, dx))
# if max_step < dx_len:
# dx *= max_step/dx_len
# return dx
# I am not yet satisfied with a good standard way of providing
# options. I would like something that provides the user with an easy
# way to
# 1) See the current set of options (i.e. a pretty implementation of
# __repr__)
# 2) Maintain sub-options (like JacobianOptions below)
# 3) Provide documentation about the options through help.
# 4) Be efficient.
#
# I have tried to meet these criteria with the Options class, but am
# not sure it is the best solution yet.
import mmf.objects
from mmf.objects import option
class JacobianOptions(mmf.objects.Options):
r"""These are options the affect the Jacobian calculation.
"""
@option
def max_h():
r"""Maximum step size to use in Jacobian computations."""
return 1.0
class BroydenOptions(mmf.objects.Options):
r"""These are options that affect the Broyden iteration.
Jacobian : Options controlling the Jacobian calculation.
"""
@option
def Jacobian():
r"""Jacobian computation options"""
return JacobianOptions()
class Function:
r"""Wrapper for function objects to allow for a variable
tolerance. This allows the Broyden algorithm to adaptively
update the tolerance. The function may take advantage of a low
tolerance to perform a faster calculation.
>>> import math
>>> def f(x, abs_tol):
... "Return -ln(1+abs(x)) to specified tolerance"
... x = abs(x)
... A = 0.0
... n = 1
... while True:
... An = (-x)**n/n
... A += An
... n += 1
... if abs(An) < abs_tol:
... break
... return A
>>> f1 = Function(f, abs_tol=1e-2)
>>> round(f1(0.1)+math.log(1.0+0.1), 2)
0.0
>>> round(f1(0.1)+math.log(1.0+0.1), 4)
0.0003
>>> f1 = Function(f, abs_tol=1e-4)
>>> round(f1(0.1)+math.log(1.0+0.1), 4)
0.0
"""
def __init__(self, f,
fixed_tol=_EPS,
abs_tol=None,
return_tol=None):
r"""The following forms are accepted. The function f must be
callable as shown:
Function(f, abs_tol=tol) # ans = f(x, tol, ...)
Function(f, return_tol=tol) # (ans, err) = f(x, tol...)
Function(f, fixed_tol=tol) # ans = f(x, ...)
Function(f) # ans = f(x, ...)
The default form assumes f computes to machine precision.
>>> import math
>>> def tolToPlaces(tol):
... return int(math.ceil(abs(-math.log10(tol))))
>>> def f(x, tol): return round(x, tolToPlaces(tol))
>>> f1 = Function(f, abs_tol=0.01)
>>> f1(0.1234)
0.12
>>> def f(x): return x
>>> f1 = Function(f, fixed_tol=1e-16)
>>> f1(0.1234)
0.1234
>>> def f(x, tol):
... tol = max(1e-3, tol)
... return (round(x, tolToPlaces(tol)), tol)
>>> f1 = Function(f, return_tol=1e-16)
>>> f1(0.1234)
0.123
"""
self._f = f
if abs_tol is not None:
self.abs_tol = abs_tol
self._type = 'abs_tol'
elif return_tol is not None:
self.abs_tol = return_tol
self._type = 'return_tol'
elif fixed_tol is not None:
self.abs_tol = fixed_tol
self._type = 'fixed_tol'
else:
raise TypeError('Function type not supported')
def __call__(self, x, *vargin, **kwargs):
if 'abs_tol' == self._type:
return self._f(x, self.abs_tol, *vargin, **kwargs)
elif 'fixed_tol' == self._type:
return self._f(x, *vargin, **kwargs)
elif 'return_tol' == self._type:
(ans, tol) = self._f(x, self.abs_tol, *vargin, **kwargs)
self.abs_tol = tol
return ans
else:
raise TypeError('Function type not supported')
def zeros(f, x0, f0=None, j0=None, f_tol=None, opts=None):
r"""Return dictionary of results containing solution to
multi-dimensional root problem f(x) = [0, 0, ...]
"""
results = {}
results['x'] = copy.copy(x0)
j = np.matrix([[1, 2, 3],
[-1, 2, -3],
[1, -2, -3]])
results['j'] = j
return results
def ComputeInitialJacobian():
r"""
if isempty(J0in) | ~all(size(J0in)==[NF, b.NX])
Broyden.J0 = np.eye(NF, b.NX);
else
# We do not consider the provided Jacobian as a restart because it
# might contain errors.
b.Restart = False
Broyden.J0 = J0in;
end
"""
def ComputeJacobian(f, x, y=None, opts=None):
r"""
Compute the Jacobian for function f(x) in a straightforward way
using finite differences: J = (f(x+h)-f(x))/h
"""
if opts is None:
opts = JacobianOptions()
if y is None: # Allow user to supply y=f(x)
y = f(x)
Nx = len(x)
Ny = len(y)
J = np.zeros((Ny, Nx), float)
h = math.sqrt(f.abs_tol)
if opts.max_h is not None:
h = min(h, opts.max_h)
for n in range(len(x)):
# Here is the trick for ensuring that x+h is accurate: we
# choose h so this is true.
h_n = ((x[n]+h)-x[n])
if x[n] < 0: # Make step in same direction as sign
h_n = -h_n # of x component .
x[n] += h_n
y_n = f(x)
x[n] -= h_n
J[:, n] = (y_n - y)/h_n
return J
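# A self-contained numpy sketch of the forward-difference scheme used by
# ComputeJacobian above, checked against the analytic Jacobian of a small
# quadratic map.  It does not call ComputeJacobian itself (which expects a
# Function wrapper); it only illustrates the h_n = (x + h) - x trick.  The
# test function `g` below is illustrative.
def _finite_difference_jacobian_sketch():
    import numpy as np

    def g(x):
        return np.array([x[0]**2 + x[1], x[0]*x[1], np.sin(x[2])])

    def g_jac(x):
        # Analytic Jacobian J_ij = dG_i/dx_j of g above.
        return np.array([[2*x[0], 1.0, 0.0],
                         [x[1], x[0], 0.0],
                         [0.0, 0.0, np.cos(x[2])]])

    x = np.array([0.3, -1.2, 0.7])
    y = g(x)
    h = np.sqrt(np.finfo(float).eps)
    J = np.zeros((len(y), len(x)))
    for n in range(len(x)):
        # Choose h_n so that x[n] + h_n is exactly representable.
        h_n = (x[n] + h) - x[n]
        if x[n] < 0:
            h_n = -h_n          # step in the same direction as sign(x[n])
        xp = x.copy()
        xp[n] += h_n
        J[:, n] = (g(xp) - y)/h_n
    assert np.allclose(J, g_jac(x), atol=1e-6)
    return J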
r"""
# We choose the direction for the step based on the sign of the
# component of x. The step-size is determined by the parameter
xdir = ceil((sign(x)+1)./2).*2-1;
h = sqrt(f_tol).*max(abs(x), 1).*xdir;
if norm(h, 2) > opt.JacobianMaxh
h = h.*opt.JacobianMaxh./norm(h, 2);
end
# Here is the trick for ensuring that x+h is accurate: we
# choose h so this is true.
h = (x+h) - x;
for n = 1:MaxJacobianSteps
if (3 <= opt.Verbose)
disp([num2str(n) ' of ' num2str(MaxJacobianSteps)]);
end
if (3 <= opt.Verbose)
disp(['h = ' num2str(h(n))]);
end
t = x(n);
x(n) = x(n)+h(n);
<<Compute Residue, Fx and FXarg from x>>
Fn = Fx;
x(n) = t;
Broyden.J0(:, n) = (Fn-Broyden.F0)/h(n);
end
end
if (10 <= opt.Verbose)
disp('Singular Values of Jacobian');
svd(Broyden.J0)
end
else
if 2 <= opt.Verbose
if ((opt.MaxFeval <= 0) | (Flag.Fcnt < opt.MaxFeval-(b.NX+1)))
disp('Not enough function evals left to compute Jacobian');
elseif ((opt.MaxTime <= 0) |(EstJacobianTime < RemainingTime))
disp('Not enough time left to compute Jacobian');
end
end
end
# The restart flag indicates if the iteration has been restarted (or
# started) with a new Jacobian. This might help a stuck iteration.
b.Restart = True
"""
class BroydenData(StateVars):
r"""Represents the current state of a Broyden calculation."""
_state_vars = [
('x', None, \
"""Current value of `x`, i.e. the point where the current
inverse Jacobian approximation :attr:`Jinv` is
valid."""),
('G', None, "Current residual `G(x)`."),
('Jinv', None, \
"""Current inverse Jacobian approximation."""),
('dyadic', False, \
r"""If `True`, then use a dyadic representation for Jinv. If
this is a number greater than 1, then this number is
passed as :attr:`DyadicSum.n_max` limiting the rank of the
dyadic representation."""),
]
process_vars()
def __init__(self, *v, **kw):
if self.x is not None:
self.x = np.asarray(self.x, dtype=float).ravel()
if self.G is not None:
self.G = np.asarray(self.G, dtype=float).ravel()
if (self.x is not None
and self.G is not None
and self.Jinv is None):
if self.dyadic:
n_max = np.inf
if self.dyadic > 1:
n_max = self.dyadic
self.Jinv = DyadicSum(n_max=n_max)
else:
self.Jinv = np.mat(np.eye(len(self.x), dtype=float))
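# BroydenData can store the inverse Jacobian as a DyadicSum, i.e. as the
# identity plus a low-rank sum of outer products.  A numpy-only sketch of
# why this saves memory: applying Jinv = 1 + sum_k |a_k><b_k| to a vector
# never requires forming the full N x N matrix.  The arrays `a` and `b`
# below (N x k columns of dyads) are illustrative and unrelated to
# DyadicSum's actual storage layout.
def _dyadic_apply_sketch(a, b, v):
    import numpy as np
    a = np.asarray(a, dtype=float)      # shape (N, k)
    b = np.asarray(b, dtype=float)      # shape (N, k)
    v = np.asarray(v, dtype=float)      # shape (N,)
    # (1 + a b^T) v = v + a (b^T v): O(N k) work, two N x k arrays of
    # storage instead of one N x N matrix.
    return v + np.dot(a, np.dot(b.T, v))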
class Broyden(StateVars):
r"""Implements a rudimentary Broyden root search search by keeping
track of the Jacobian an allowing for Broyden steps to be made.
This algorithm is used for solving the problem :math:`G(x) = 0`
using a quasi-Newton method:
.. math::
x \rightarrow x - J^{-1}G(x)
where :math:`J` is an approximation for the Jacobian :math:`J\equiv
\partial_{x}G(x)`. By default, one starts with the approximation
:math:`J=1`, which is equivalent to the step :math:`x \rightarrow
x - G(x)`. Thus, if one wishes to improve the convergence of a
reasonable fixed-point iteration scheme :math:`x\rightarrow F(x)`,
then a reasonable choice is :math:`G(x) = x - F(x)`.
Examples
--------
Here we present an example solving the problem and solution:
.. math::
x = x^2 \qquad x=1.
In this case, the Broyden algorithm is equivalent to the secant
algorithm with the first two points `x0` and `G(x0)`.
>>> def G(x):
... return x - x*x
>>> x0 = 2.0
>>> b = Broyden(x0=x0, G0=G(x0))
>>> b.x, b.J
(4.0, matrix([[ 1.]]))
>>> b.update(G(b.x)) # doctest: +ELLIPSIS
1.2...
>>> b.x, b.J # doctest: +ELLIPSIS
(1.6..., matrix([[-5.]]))
>>> b.update(G(b.x)) # doctest: +ELLIPSIS
0.08...
>>> b.x, b.J # doctest: +ELLIPSIS
(1.3913043478..., matrix([[-4.6]]))
>>> for n in xrange(8): b.update(G(b.x)) # doctest: +ELLIPSIS
0.56...
0.24...
0.23...
0.09...
0.02...
0.003...
...e-05
...e-07
>>> b.x, b.J
(1.0, matrix([[-1.]]))
Here is a two-dimensional problem:
.. math::
\begin{aligned}
x &= 2x^2, &
y &= y^2, &
x , y &= 0.5, 1.0
\end{aligned}
>>> def G(x):
... return [x[0] - x[0]*x[0]*2.0, x[1] - x[1]*x[1]]
>>> x0 = [2.0, 2.0]
>>> b = Broyden(x0=x0, G0=G(x0))
>>> for n in xrange(40):
... res = b.update(G(b.x))
>>> map(lambda r:round(r, 12), b.x)
[0.5, 1.0]
"""
_state_vars = [
('x0=broyden_data.x', None,
"""Current value of `x`, i.e. the point where the current
inverse Jacobian approximation :attr:`Jinv0` is
valid."""),
('G0=broyden_data.G', None, \
"Current function evaluation `G(x0)`."),
('Jinv0=broyden_data.Jinv', None, \
r"""Approximate Jacobian inverse at `x0`. (Defaults to
the identity)."""),
('max_step', np.inf, \
"""Maximium step length. Use to prevent a singular
Jacobian from shooting the solution far away."""),
('min_step', np.sqrt(_EPS), \
"""Minimum step length to take when the Jacobian is
singular. See :attr:`min_Jinv`. This should be a good
length for estimating the Jacobian through finite
differences, i.e. typically the square-root of the
precision to which the function is evaluated."""),
('step_factor', 1, \
"""A number between 0 and 1. The step taken is `x +
step_factor*dx`"""),
('robust', False, \
"""If `True`, then use the singular value decomposition to
analyse the Jacobian. Much slower, but could prevent some
numerical problems."""),
('max_Jinv', 100.0, \
"""If :attr:`robust`, then the inverse Jacobian is clipped
so that the maximum value is this."""),
('min_Jinv', 0.0001, \
"""The inverse Jacobian can become singular. If this is
the case, then a step might have very small length even if
the error is not small (the step size can be reduced by a
factor of the minimum singular value of the inverse
Jacobian). If the step reduction is less than this value,
then the step is scaled up to have length
:attr:`min_step`."""),
('minimize_inverse', False, \
"""If `True`, then use the update the minimizes the change
in the Frobenius norm of the inverse Jacobian, otherwise
we use the Sherman-Morrison formula to update so as to
minimize the norm of the Jacobian."""),
('x', Computed, "Next value of `x`."),
('_packing_info', (type(None), None), \
"Packing information to reshape `x`."),
('_max_step2', Computed, "Square of :attr:`max_step`"),
('broyden_data', Delegate(BroydenData, ['dyadic']), \
""":class:`BroydenData` instance representing the current
state."""),
('_Jinv_transform', NotImplemented, \
"""Optional methods for updating the Jacobian inverse in
:meth:`change_basis`. For testing only."""),
]
process_vars()
def __init__(self, *v, **kw):
self._max_step2 = self.max_step*self.max_step
if 'x0' in kw and kw['x0'] is not None:
self._define_packing_info(kw['x0'])
self.x0 = self._pack(self.x0)
N = self.x0.shape[0]
if 'G0' in kw and self.G0 is not None:
self.G0 = self._pack(self.G0)
if self.x0 is not None:
self._update_x(self.x0, self.G0)
def _define_packing_info(self, x):
r"""
Parameters
----------
x : array
Original representation.
Returns
-------
_packing_info : tuple
(type(x), shape)
"""
shape = np.asarray(x).shape
type_ = type(x)
self._packing_info = (type_, shape)
def _pack(self, x):
r"""Return `x` in a flat representation.
Parameters
----------
x : array
Original representation.
Returns
-------
x : array
Flat representation of array.
"""
return np.asarray(x, dtype=float).ravel()
def _unpack(self, x):
r"""Return `x` in the original shape.
Parameters
----------
x : array
Flat representation of array.
Returns
-------
x : array
Reshaped representation.
"""
type_, shape = self._packing_info
if issubclass(type_, list):
return x.tolist()
elif issubclass(type_, np.matrix):
return np.asmatrix(x).reshape(shape)
elif issubclass(type_, np.ndarray):
return np.asarray(x).reshape(shape)
elif (issubclass(type_, float) or
issubclass(type_, int) or
issubclass(type_, complex)):
return np.asarray(x).ravel()[0]
else:
raise ValueError("Type %s not supported."%type_)
def change_basis(self, t):
r"""Use the function `x_new = t(x)` to update the data
corresponding to a change of basis. Raise an
:exc:`AttributeError` if the change cannot be implemented (or
don't define this method.)
Parameters
----------
t : function
Change of basis transformation (should be essentially
matrix multiplication, but for large bases one might want
to prevent forming this matrix explicitly.)
Notes
-----
Let the function be represented by the set of
points :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of
abscissa and :math:`f^{N}_j = f(x^{N}_j)` on the larger set.
The transformation is of the form:
.. math:: f^N_i = T_{ij}f^{n}_{j}
We update the inverse Jacobian in the following way:
.. math:: J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T}
Several other options have been considered, but in practice
this has proved to be not only the easiest to implement, but
also the most efficient. If we do not add and subtract the
identity, then the resultant inverse Jacobian will be rank
deficient and thus the resulting steps will never explore the
singular directions which is highly undesirable. One can also
think of this as directly transforming the inverse Jacobian
related to the fixed-point problem :math:`f \rightarrow f -
G[f]`.
Also, if using the dyadic representation, then this procedure
only requires applying the linear interpolation operator to
the components of the dyads: Transforming the identity portion
would change the form requiring a significantly more
complicated representation.
We have also explored using :math:`T^{-1}` in place of
:math:`T^T`, but this has several problems: 1) Computing
:math:`T^{-1}` can be expensive if the basis is large, 2)
Since :math:`T` is rank deficient, :math:`T^{-1}` is not
unique, but must be chosen from one of the pseudo-inverses and
it is not clear how to best do this, 3) For the dyadic
representation, we would like to work directly with the
interpolation operator on both components of the dyads, so
using the transpose is much easier than having to solve the
inverse problem.
If :math:`T^T\cdot T \approx \mat{1}`, then the interpretation
of the transformation is this: We have built up the Jacobian
inverse as minimal updates from the identity using the
displacements :math:`\ket{\d{x}}` and :math:`\ket{\d{G}}`.
The transformed inverse Jacobian using the above procedure is
what would have been obtained by minimal updates from the
identity using the transformed displacements
:math:`T\ket{\d{x}}` and :math:`T\ket{\d{G}}`. Thus, the
transformed Jacobian inverse should have the same amount of
information as the original, and should in general reduce the
number of operations required to form a good approximation by
the rank of :math:`J^{-1} - \mat{1}`.
It may be beneficial to choose the interpolation operator such
that :math:`T^T\cdot T = \mat{1}` over the smaller basis. We
have not investigated this, but standard interpolation
operators seem sufficiently close to the identity that this
extra complication is probably not justified.
The code contains additional update methods for testing
purposes to allow these results to be verified (except for the
dyadic representation which can only be easily transformed
using this method). Here we present results for a few
problems.
>>> from scipy.interpolate import *
>>> from mmf.objects import *
>>> def G(f, x, normalize=True):
... c1 = np.cosh(1)
... dx = np.diff(x[1:])
... ddf = np.diff(np.diff(f))/dx/dx
... a = 1
... if normalize:
... a = 1 + 2./dx/dx
... G = np.hstack([[f[0] - c1],
... (f[1:-1] - ddf)/a,
... [f[-1] - c1]])
... return G
>>> def test(N1, N2, normalize, meth, k=3, tol=1e-14):
... '''Return `(i1, i2, iN, gain)`. `i1` is the number of
... steps at low resolution. `i2` is the subsequent
... number of steps required to polish the
... high-resolution solution. `iN` is the number of
... steps at only the high resolution, and `gain = i1+i2
... - iN` is the gain in speed. (Negative is good)'''
... x1 = np.linspace(-1, 1, N1)
... x2 = np.linspace(-1, 1, N2)
... #def t(f):
... # 'The interpolation function'
... # return splev(x2, splrep(x1, f, s=0, k=k))
... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k))
... f = 0*x1 # Initial guess
... g = G(f, x1, normalize)
... b = Broyden(x0=f, G0=g, _Jinv_transform=meth)
... err = np.max(abs(g))
... i1 = 1
... while err > tol:
... # Find solution over x1 abscissa
... f = b.x
... g = G(f, x1, normalize)
... err = np.max(abs(g))
... b.update(x=f, g=g)
... i1 += 1
... b.change_basis(t) # Change basis
... f1 = b.x
... g = G(f1, x2, normalize)
... err = np.max(abs(g))
... i2 = 1
... while err > tol:
... # Find solution over x2 abscissa
... f = b.x
... g = G(f, x2, normalize)
... err = np.max(abs(g))
... b.update(x=f, g=g)
... i2 += 1
... # Now try only at high resolution.
... f1 = 0*f1
... g = G(f1, x2, normalize)
... b = Broyden(x0=f1, G0=g, _Jinv_transform=meth)
... err = np.max(abs(g))
... i3 = 1
... while err > tol:
... # Find solution over x2 abscissa
... f1 = b.x
... g1 = G(f1, x2, normalize)
... err = np.max(abs(g1))
... b.update(x=f1, g=g1)
... i3 += 1
... return i1, i2, i3, (i1+i2) - i3
>>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10):
... meth = Container()
... for meth.inv_l in [0, 1]:
... for meth.inv_r in [0, 1]:
... for meth.set_diag in [0, 1]:
... res = test(N1, N2, normalize,
... k=k, tol=tol,
... meth=dict(meth.items()))
... print repr(meth), res
>>> tabulate(10, 50, k=1, tol=1e-15)
Container(set_diag=0, inv_r=0, inv_l=0) (9, 67, 111, -35)
Container(set_diag=1, inv_r=0, inv_l=0) (9, 77, 111, -25)
Container(set_diag=0, inv_r=1, inv_l=0) (9, 90, 111, -12)
Container(set_diag=1, inv_r=1, inv_l=0) (9, 68, 111, -34)
Container(set_diag=0, inv_r=0, inv_l=1) (9, 78, 111, -24)
Container(set_diag=1, inv_r=0, inv_l=1) (9, 71, 111, -31)
Container(set_diag=0, inv_r=1, inv_l=1) (9, 107, 111, 5)
Container(set_diag=1, inv_r=1, inv_l=1) (9, 73, 111, -29)
Container(set_diag=0, inv_r=0, inv_l=0) (9, 64, 105, -32)
Container(set_diag=1, inv_r=0, inv_l=0) (9, 82, 105, -14)
Container(set_diag=0, inv_r=1, inv_l=0) (9, 101, 105, 5)
Container(set_diag=1, inv_r=1, inv_l=0) (9, 68, 105, -28)
Container(set_diag=0, inv_r=0, inv_l=1) (9, 65, 105, -31)
Container(set_diag=1, inv_r=0, inv_l=1) (9, 69, 105, -27)
Container(set_diag=0, inv_r=1, inv_l=1) (9, 96, 105, 0)
Container(set_diag=1, inv_r=1, inv_l=1) (9, 75, 105, -21)
Here are some old results::
Container(set_diag=0, inv_r=0, inv_l=0) (9, 64, 105, -32)
Container(set_diag=1, inv_r=0, inv_l=0) (9, 82, 105, -14)
Container(set_diag=0, inv_r=1, inv_l=0) (9, 87, 105, -9)
Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 105, -26)
Container(set_diag=0, inv_r=0, inv_l=1) (9, 90, 105, -6)
Container(set_diag=1, inv_r=0, inv_l=1) (9, 68, 105, -28)
Container(set_diag=0, inv_r=1, inv_l=1) (9, 100, 105, 4)
Container(set_diag=1, inv_r=1, inv_l=1) (9, 75, 105, -21)
Container(set_diag=0, inv_r=0, inv_l=0) (9, 73, 134, -52)
Container(set_diag=1, inv_r=0, inv_l=0) (9, 77, 134, -48)
Container(set_diag=0, inv_r=1, inv_l=0) (9, 88, 134, -37)
Container(set_diag=1, inv_r=1, inv_l=0) (9, 71, 134, -54)
Container(set_diag=0, inv_r=0, inv_l=1) (9, 86, 134, -39)
Container(set_diag=1, inv_r=0, inv_l=1) (9, 71, 134, -54)
Container(set_diag=0, inv_r=1, inv_l=1) (9, 94, 134, -31)
Container(set_diag=1, inv_r=1, inv_l=1) (9, 70, 134, -55)
"""
self.suspend()
if self.x0 is not None:
self.x0 = t(self.x0)
self.G0 = None
Jinv = self.Jinv0
if Jinv is None:
return
if self.dyadic:
Jinv.apply_transform(t)
else:
if not hasattr(self, '_Jinv_transform'):
# One could also apply t directly to the columns and
# then rows of Jinv - I but I don't think this will be
# any faster.
N, N = Jinv.shape
I = np.eye(N)
t_mat = np.mat(np.vstack([t(I[n, :])
for n in xrange(N)]).T)
M, N = t_mat.shape
Jinv = np.eye(M) + t_mat*(Jinv - I)*t_mat.T
else:
d = np.diag(Jinv)
N, N = Jinv.shape
I = np.eye(N)
t_mat = np.mat(np.vstack([t(I[n, :])
for n in xrange(N)]).T)
ti_mat = np.linalg.pinv(t_mat)
if self._Jinv_transform.get('inv_l', False):
tl = ti_mat.T
else:
tl = t_mat
if self._Jinv_transform.get('inv_r', False):
tr = ti_mat
else:
tr = t_mat.T
if self._Jinv_transform.get('direct', False):
Jinv = tl*Jinv*tr
else:
M, N = t_mat.shape
Jinv = np.eye(M) + tl*(Jinv - I)*tr
if self._Jinv_transform.get('set_diag', False):
m = np.arange(Jinv.shape[0])
Jinv[m, m] = t(d)
self.Jinv0 = Jinv
self.resume()
@property
def J(self):
r"""Current approximation of Jacobian."""
return np.linalg.inv(self.Jinv0)
def update(self, g, x=None, Jinv=None, skip_inds=None):
r"""Update based on new value of `g(x)`. Changes the current
position (the new Jacobian should be valid at `x`). Return a
dimensionless measure of the change in Jinv along the
specified direction.
Use the property `x` to access the next value of `x`.
Parameters
----------
g : array
New value of `G(x)`.
x : array, optional
Value of abscissa `x` at which `g` is evaluated.
If `x` is `None`, then assume that `x == self.x`.
Jinv : matrix, optional
If this is provided, then it is used to set the Jacobian
(implementing Newton's method), otherwise the standard
Broyden update is used.
skip_inds : None, array, optional
This is an array of the indices of sensitive parameters.
The change in these indices will be ignored. This is
important if there are "sensitive" or "noisy" parameters
that should only be trusted once a certain degree of
convergence has been achieved.
Notes
-----
The inverse Jacobian is updated so that
.. math::
\mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}}
where the changes are with respect to :attr:`x0` and
:attr:`G0` respectively.
As a measure of the significance of this correction, we return
the following dimensionless quantity:
.. math::
\chi_G &= \frac{\norm{\vect{x}_2 - \vect{x}_1}}
{\norm{\vect{x}_2 - \vect{x}_0}}, \\
&= \frac{\norm{\left(\mat{J}^{-1}_{1}
- \mat{J}^{-1}_{0}\right)\cdot\uvect{G}}}
{\max_{n=0, 1}\norm{\mat{J}^{-1}_{n}\cdot\uvect{G}}}
Another option might be
.. math::
\frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}}
{\mat{J}^{-1}\norm{\d{G}}}
"""
if self.x0 is None or self.G0 is None:
# Initial call. Just store values and initialize.
self.suspend()
self.x0 = copy.copy(x)
self.G0 = g
self.resume()
return 1.0
if x is None:
x = self.x
x = self._pack(x)
g = self._pack(g)
if Jinv is None:
res = self.update_J(x=x, g=g, skip_inds=skip_inds)
else:
# Compute measure of change in Jinv
G0_ = _ket(self.G0)/(np.linalg.norm(self.G0) + _TINY)
J0G0 = self.Jinv0*G0_
JG0 = Jinv*G0_
res = np.linalg.norm(JG0 - J0G0)/(
max(np.linalg.norm(JG0), np.linalg.norm(J0G0)) + _TINY)
self.Jinv0 = Jinv
#self.update_J(x=x, g=g)
self.broyden_data.x[:] = x
self.broyden_data.G[:] = g
return res
def get_x(self, x_controls=None, x0=None, G0=None):
"""Return the new step subject to the constraints specified in
`x_controls`.
Parameters
----------
x0, G0 : array, optional
If these are provided, the new step is relative to these
points, otherwise :attr:`x0` and :attr:`G0` are used.
x_controls : (array, array), optional
If this is provided, then the specified elements are used
as control parameters when determining the step.
`x_controls` should be a pair `(inds, vals)` of indices
into the array `x` and values such that the step
should preserve ``x[inds] = c*vals + (1-c)*x0[inds]`` for
some ``0 < c <= 1``. (I.e. this is a convex extrapolation
from `x0` to `x`).
If the solver contains information about the function `G`
such as gradient information, then the extrapolated state
`x_new` will incorporate this information extrapolating the
step in such a way as to satisfy the controls.
"""
if x_controls is None and x0 is None and G0 is None:
return self.x
else:
if x0 is None:
x0 = self.x0
if G0 is None:
G0 = self.G0
dx = self._get_dx(x0, G0, x_controls)
return self._unpack(x0 + dx)
def _update_x(self, x, g, x_controls=None):
"""Update the new step :attr:`x`."""
dx = self._get_dx(x, g, x_controls)
self.__dict__['x'] = self._unpack(x + dx)
def update_J(self, x, g, skip_inds=None):
r"""Update the inverse Jacobian to include information about
`(x, g)` but without changing the present location (use
:meth:`update` to set the new position too.) Return a
dimensionless measure of the change in Jinv along the
specified direction.
Parameters
----------
x, g : array
These are the values at the new point that should be used
to update the inverse Jacobian `g = G(x)`.
skip_inds : None, array, optional
This is an array of the indices of sensitive parameters.
The change in these indices will be ignored. This is
important if there are "sensitive" or "noisy" parameters
that should only be trusted once a certain degree of
convergence has been achieved.
Notes
-----
At each step, we update the Jacobian in a minimal sense so
as to maintain the secant condition :math:`\ket{\d{x}} =
J^{-1}_1\cdot\ket{\d{G}}`.
There are two senses in which the update can be considered as
minimal controlled by the flag :attr:`minimize_inverse`. If
this is `True`, we minimize :math:`\norm{J^{-1}_0 -
J^{-1}_1}_{\text{Frobenius}}`:
.. math::
J^{-1} \rightarrow
J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}}
{\braket{\d{G}|\d{G}}},
otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`:
.. math::
J \rightarrow
J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}}
{\braket{\d{x}|\d{x}}}.
This can be analytically inverted to give an update for the inverse:
.. math::
J^{-1} \rightarrow J^{-1}
+ \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{\d{x}}J^{-1}}
{\braket{\d{x}|J^{-1}|\d{G}}}.
Any update that satisfies the secant condition can be expressed as:
.. math::
J^{-1} \rightarrow J^{-1}
+ \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{v}}
{\braket{v|\d{G}}}.
as long as :math:`\braket{v|\d{G}}` is not zero. The two
conditions correspond to different choices of either
:math:`\bra{v}=\bra{\d{G}}` to minimize the inverse Jacobian
norm (Broyden's "second" method) or
:math:`\bra{v}=\bra{\d{x}}J^{-1}` to minimize the Jacobian
norm (Broyden's "first" method). Numerical Recipes
has the latter but I am not sure it is better... It is more
complicated.
As a measure of the significance of this correction, we return
the following dimensionless quantity:
.. math::
\chi_G &= \frac{\norm{\vect{x}_2 - \vect{x}_1}}
{\norm{\vect{x}_2 - \vect{x}_0}}, \\
&= \frac{\norm{\left(\mat{J}^{-1}_{1}
- \mat{J}^{-1}_{0}\right)\cdot\uvect{G}}}
{\max_{n=0, 1}\norm{\mat{J}^{-1}_{n}\cdot\uvect{G}}}
Another option might be
.. math::
\frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}}
{\mat{J}^{-1}\norm{\d{G}}}
"""
G0 = self.broyden_data.G
x0 = self.broyden_data.x
Jinv0 = self.broyden_data.Jinv
if x0 is None or G0 is None:
# Initial call. Just store values and initialize.
self.suspend()
self.x0 = copy.copy(x)
self.G0 = g
self.resume()
return 1.0
x = self._pack(x)
g = self._pack(g)
dg = g - G0
dx = x - x0
# Unit vector and change for later measuring change in Jinv
# Could use dg here rather than G0
G0_ = _ket(G0/(np.linalg.norm(G0) + _TINY))
J0G0 = Jinv0*G0_
if skip_inds is not None:
skip_inds = np.asarray(skip_inds, dtype=int)
dg[skip_inds] = 0
dx[skip_inds] = 0
if np.linalg.norm(dx) <= 0 or np.linalg.norm(dg) <= 0:
# It could happen, such as when changing resolutions, that
# this is called without actually generating a new step,
# so we just pass here.
return 0.0
w = _ket(dx) - Jinv0*_ket(dg)
# Zero out noisy components
mags = _ket(abs(G0) + abs(g))
w = np.where(abs(w) > _EPS*mags, w, 0.0)
if (w != 0).any():
if self.minimize_inverse:
vt = _bra(dg)
else:
vt = _bra(dx)*Jinv0
dot_prod = dot(vt, _ket(dg))
dx_norm = np.linalg.norm(dx)
dg_norm = np.linalg.norm(dg)
reduction = abs(dot_prod/dx_norm/dg_norm)
if reduction < self.min_Jinv:
# Use other step
vt = _bra(dg)
t = dot(vt, dg)
w = w/t
_add_dyad(Jinv0, w, vt)
if self.robust:
U, d, Vinv = np.linalg.svd(Jinv0)
print np.hstack((abs(dot(U.T, _ket(abs(g)))), _ket(d)))
if self.max_Jinv < d.max():
warnings.warn("Warning: Clipping Jacobian: %f"%d.max())
d = np.where(d<self.max_Jinv, d, self.max_Jinv)
Jinv0 = U*np.mat(np.diag(d))*Vinv
self.broyden_data.Jinv = Jinv0
# Compute measure for change in Jinv.
JG0 = Jinv0*G0_
res = np.linalg.norm(JG0 - J0G0)/(
max(np.linalg.norm(JG0), np.linalg.norm(J0G0)) + _TINY)
# Update predicted position.
self._update_x(x=x, g=g)
return res
def _get_dx(self, x, g, x_controls):
r"""Get step from error `G(x)`.
Parameters
----------
x_controls : (array, array) or None
Tuple `(inds, vals)` indices and the respective values of the
control parameters.
Examples
--------
Here is a little test with a linear function
>>> J = np.array([[1, 2, 3], [1, -3, 2], [-2, 1, 4]], dtype=float)
>>> sol = np.array([1, 2, 3])
>>> def G(x):
... return np.dot(J, x - sol)
>>> x0 = np.array([0.0, 0.0, 0.0])
>>> b = Broyden(x0=x0, G0=G(x0), Jinv0=np.linalg.inv(J))
>>> x = np.array([1.0, 2.0, 3.0])
Now try to step to the solution, but hold the second variable as
a control at 1. Since the function is exactly linear and we
have explicitly given the correct Jacobian, we should end up
with a root in the other two parameters.
>>> res = b.update(x=x, g=G(x))
>>> b.x
array([ 1., 2., 3.])
>>> x = b.get_x(x_controls=([1], [1.0]))
>>> x
array([ 1.5, 1. , 3.5])
>>> np.allclose(G(x), [0., 4.5, 0.])
True
"""
if g is None:
dx = 0*x
return dx
elif self.broyden_data.Jinv is None:
dx = -g
else:
Jinv = self.broyden_data.Jinv
if x_controls is not None and 0 < len(x_controls[0]):
inds, vals = x_controls
inds = np.asarray(inds, dtype=int)
vals = np.asarray(vals)
dg_dmu = copy.copy(g)
n_d = len(inds)
n_x = len(x)
basis = np.mat(np.zeros((n_x, n_d), dtype=float))
basis[inds, np.arange(n_d)] = 1
dg_dmu[inds] = x[inds] - vals
d = basis.T*Jinv*basis
dmu = _ket(dg_dmu[inds])
b = (Jinv*_ket(dg_dmu))[inds, :]
dN = dmu - np.linalg.solve(d, b - dmu)
dg_dN = dg_dmu
dg_dN[inds] = dN[:]
dx_dmu = Jinv*_ket(dg_dN)
assert np.allclose(dx_dmu[inds], dmu)
# Set these to make sure no round-off errors have
# accumulated.
dx_dmu[inds].flat = x[inds] - vals
dx = -dx_dmu
else:
dx = -(Jinv*_ket(g))
dx = np.asarray(dx).ravel()
G_mag = np.linalg.norm(g)
dx_mag = np.linalg.norm(dx)
if dx_mag == 0 and G_mag > 0:
warnings.warn("Warning: Singular Jinv, taking random step")
dx = np.random.rand(len(dx))*self.min_step
elif np.divide(dx_mag, G_mag) < self.min_Jinv:
warnings.warn("Warning: Singular Jinv, taking minimum step")
dx *= self.min_step/np.linalg.norm(dx)
dx *= self.step_factor
dx_len2 = dot(dx, dx)
if self._max_step2 < dx_len2:
dx *= math.sqrt(self._max_step2/dx_len2)
return dx
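# A numpy-only sketch (not used by the class) of the two rank-1 updates
# discussed in Broyden.update_J: the direct inverse update with v = dG
# (the choice made when `minimize_inverse` is True, Broyden's "second" or
# "bad" method) and the Sherman-Morrison inverse of the Jacobian-norm
# minimizing update with v^T = dx^T Jinv (Broyden's "first" or "good"
# method).  Both satisfy the secant condition Jinv_new dg = dx.
def _broyden_update_sketch(j_inv, dx, dg, minimize_inverse=True):
    import numpy as np
    j_inv = np.asarray(j_inv, dtype=float)
    dx = np.asarray(dx, dtype=float).ravel()
    dg = np.asarray(dg, dtype=float).ravel()
    w = dx - np.dot(j_inv, dg)
    if minimize_inverse:
        vt = dg                     # minimize change in Jinv (second method)
    else:
        vt = np.dot(dx, j_inv)      # minimize change in J (first method)
    return j_inv + np.outer(w, vt)/np.dot(vt, dg)

# Illustrative check (kept as a comment, not a doctest):
#   j_inv = np.eye(4); dx = np.random.rand(4); dg = np.random.rand(4)
#   for m in (True, False):
#       assert np.allclose(
#           np.dot(_broyden_update_sketch(j_inv, dx, dg, m), dg), dx)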
class BroydenDyadic(Broyden):
r"""Implements a rudimentary Broyden search by keeping track of the
Jacobian an allowing for Broyden steps to be made. Uses a dyadic
sum representation to reduce the memory overhead.
Examples
--------
>> def G(x):
... return x - x*x
>> x0 = 2.0
>> b = BroydenDyadic(x0=x0, G0=G(x0))
>> b.x
4.0
>> b.update(G(b.x)) # doctest: +ELLIPSIS
6.0...
>> b.x # doctest: +ELLIPSIS
1.60000...
>> b.update(G(b.x)) # doctest: +ELLIPSIS
0.08...
>> b.x # doctest: +ELLIPSIS
1.3913043478...
>> for n in xrange(8): b.update(G(b.x)) # doctest: +ELLIPSIS
0.56...
0.24...
0.23...
0.09...
0.02...
0.003...
...e-05
...e-07
>> b.x
1.0
Here is a two-dimensional problem:
x = 2*x*x
y = y*y
x , y = 0.5, 1.0
>> def G(x):
... return [x[0] - x[0]*x[0]*2.0, x[1] - x[1]*x[1]]
>> x0 = [2.0, 2.0]
>> b = BroydenDyadic(x0=x0, G0=G(x0))
>> for n in xrange(30):
... res = b.update(G(b.x))
>> map(lambda r:round(r, 12), b.x)
[0.5, 1.0]
"""
_state_vars = [('dyadic=broyden_data.dyadic', True)]
process_vars()
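# A numpy-only sketch of the change-of-basis rule described in
# Broyden.change_basis, Jinv_N = 1 + T (Jinv_n - 1) T^T, using a simple
# piecewise-linear interpolation matrix T from a coarse to a fine grid.
# The grids and the coarse inverse Jacobian below are illustrative.
def _change_basis_sketch():
    import numpy as np
    np.random.seed(0)
    x1 = np.linspace(-1.0, 1.0, 5)        # coarse abscissa
    x2 = np.linspace(-1.0, 1.0, 11)       # fine abscissa
    n, N = len(x1), len(x2)
    # Build T column by column by interpolating the coarse unit vectors.
    I_n = np.eye(n)
    T = np.array([np.interp(x2, x1, I_n[k]) for k in range(n)]).T  # (N, n)
    j_inv_n = np.eye(n) + 0.1*np.random.rand(n, n)   # some coarse Jinv
    j_inv_N = np.eye(N) + np.dot(np.dot(T, j_inv_n - np.eye(n)), T.T)
    # On directions orthogonal to the range of T the transformed Jinv acts
    # as the identity, so no singular directions are introduced.
    return j_inv_N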
class ModifiedBroyden(object):
r"""Implements a modified Broyden search.
Similar to that described in
J. Chem. Phys. Vol. 108, No. 100, 4426,
but with a straightforward implementation (not efficient).
Notes
=====
N = length of vectors x and F(x)
Ni = subspace size
X = [x0 x1 ... xi]
G = [G(x0) G(x1) ... G(xi)] where G(x) = x - F(x)
_next_x = Current step.
>> def F(x):
... return x*x
>> x0 = 2.0
>> B = ModifiedBroyden(x0, F(x0))
>> B.x
4.0
>> B.update(F(B.x))
>> B.x # doctest: +ELLIPSIS
1.59999999...
>> B.update(F(B.x))
>> B.x # doctest: +ELLIPSIS
1.3913043478...
>> for n in xrange(8): B.update(F(B.x))
>> B.x
1.0
Here is a two-dimensional problem.
x = 2*x*x
y = y*y
x , y = 0.5, 1.0
>> def F(x):
... return [x[0]*x[0]*2.0, x[1]*x[1]]
>> x0 = [2.0, 2.0]
>> B = ModifiedBroyden(x0, F(x0))
>> for n in xrange(30):
... B.update(F(B.x))
>> map(lambda r:round(r, 12), B.x)
[0.5, 1.0]
"""
def __init__(self, x0, Fx0, alpha=1.0, W=None, max_step=np.inf,
mmf_method=True, DIIS=5):
r"""Initialize the Brodyen object.
:Parameters:
alpha : Initial Jacobian is alpha*eye(N)
W : W is a function for computing the weights w[i].
It will be passed the vector arange(Ni) and should
return the pair (w0, wi) where wi is a vector.
mmf_method : If True then use my form of DeltaG.
"""
self._pack, self._unpack = pack_unpack(x0)
self.max_step = max_step
self.max_step2 = max_step*max_step
if W is None:
# NOTE: several weightings were tried here; each successive
# definition overrides the previous one, so only the final
# "last 10 iterates" weighting is actually used.
def W(i):
r"""Reproduce standard Broyden method."""
w = array(i)*0.0
w[-1] = 1.0
return (0.001, w)
def W(i):
r"""A weighted Broyden."""
N = len(i)
w = array(i)/N
return (0.001, w)
def W(i):
r"""A weighted Broyden."""
N = len(i)
w = array(i)*0.0
w[-10:] = 1.0
return (0.001, w)
self.W = W
self.mmf_method = mmf_method
self.DIIS = DIIS
self.X = mat(self._pack(x0))
self.G = mat(self.X[:, 0] - self._pack(Fx0))
self.Jinv = alpha*mat(np.eye(self.N))
self.R = -self.Jinv*self.G
self._next_x = self.get_next_x(self.X[:, 0], self.G[:, 0])
@property
def kappa(self):
return self.G.T*self.G
@property
def N(self):
r"""Number of parameters."""
return self.X.shape[0]
@property
def Ni(self):
r"""Number of iterations."""
return self.X.shape[1]
def get_x(self):
return self._unpack(self._next_x)
def set_x(self, x):
self._next_x = self._pack(x)
x = property(get_x, set_x, doc="New point x.")
def update(self, Fx, x=None):
r"""Update based on new value of F(x).
Use the property x to access the next value of x.
"""
if x is None:
x = self._pack(self.x)
else:
x = self._pack(x)
Fx = self._pack(Fx)
G = x - Fx
if self.mmf_method:
dg = array(G - self.G)
dX = array(x - self.X)
else:
G_ = np.hstack((self.G, G))
X_ = np.hstack((self.X, x))
dg = array(G_[:, 1:] - G_[:, :-1])
dX = array(X_[:, 1:] - X_[:, :-1])
Nni = array(np.sqrt(np.diag(mat(dg.T)*mat(dg)))
).reshape((1, self.Ni)) + \
np.finfo(float).tiny
DeltaG = dg/Nni
DeltaX = dX/Nni
(w_tilde, w) = self.W(np.arange(0, self.Ni))
w = array(w).reshape((1, self.Ni))
w_tilde2 = w_tilde*w_tilde
DeltaG *= w
DeltaX *= w
An = w_tilde2*self.Jinv + mat(DeltaX)*mat(DeltaG).T
Bn = w_tilde2*np.eye(self.N) + mat(DeltaG)*mat(DeltaG).T
self.Jinv = np.linalg.solve(Bn.T, An.T).T
self.X = np.hstack((self.X, x))
self.G = np.hstack((self.G, G))
self.R = np.hstack((self.R, -self.Jinv*G))
self._next_x = self.get_next_x(x, G)
def get_next_x(self, x, G):
r"""Get new x from error G = x - F(x)."""
dx = -(self.Jinv*G)
if self.DIIS:
X = self.X
G = self.G
if self.mmf_method:
R = -self.Jinv*self.G
else:
R = self.R
Ni = min(self.Ni, self.DIIS)
Ri = R[:, -Ni:]
A = np.empty((Ni+1, Ni+1), dtype=float)
A[:Ni, :Ni] = Ri.T*Ri
A[-1, :] = -1.0
A[:, -1] = -1.0
A[-1, -1] = 0.0
B = np.zeros((Ni+1, 1))
B[-1, 0] = -1.0
c = mat(np.linalg.solve(A, B))
x_new = mat((X+R)[:, -Ni:])*c[:-1, :]
dx = x_new - x
dx_len2 = (dx.T*dx)[0, 0]
if self.max_step2 < dx_len2:
dx *= math.sqrt(self.max_step2/dx_len2)
return x + dx
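# A numpy-only sketch of the DIIS extrapolation used in
# ModifiedBroyden.get_next_x: minimize the norm of a linear combination of
# the stored residuals subject to the coefficients summing to one, then
# apply the same coefficients to the stored (x + r) columns.  `X` and `R`
# are illustrative (N, m) arrays of previous points and residuals.
def _diis_sketch(X, R):
    import numpy as np
    X = np.asarray(X, dtype=float)
    R = np.asarray(R, dtype=float)
    m = R.shape[1]
    A = np.empty((m + 1, m + 1), dtype=float)
    A[:m, :m] = np.dot(R.T, R)     # residual overlap matrix
    A[m, :] = -1.0                 # Lagrange-multiplier row/column
    A[:, m] = -1.0                 # enforcing sum(c) = 1
    A[m, m] = 0.0
    rhs = np.zeros(m + 1)
    rhs[m] = -1.0
    c = np.linalg.solve(A, rhs)[:m]
    return np.dot(X + R, c)        # extrapolated next point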
class ExtendedBroyden(StateVars):
r"""This is a class implementing the memory limited extended
Broyden update.
"""
_state_vars = [
('method', 1),
# ('X', Computed, r"""$N\times n$ array of the previous states.
# Also used to store basis information so that redundant
# storage is not required."""),
# ('G', Computed, r"""$N\times n$ array of the previous function
# evaluations. Also used to store basis information so that
# redundant storage is not required."""),
# ('A', Computed),
# ('B', Computed),
]
process_vars()
def get_J(self, dX, dG, w_0=0, w_r=0,
Qa=None, Qb=None, b0=None,
w=None, singular_tol=_SINGULAR_TOL):
r"""Return `(Qa, Qb, b)` such that the inverse Jacobian is
expressed as:
.. math::
\mat{B} = \mat{Q}_{a}\diag(b)\mat{Q}_{b}^{T}
Parameters
----------
dX, dG : array
These should be $N \times n$ arrays of differences $x_{i} -
x_{j}$ and $G(x_{i}) - G(x_{j})$. If there are $m$
independent points then there may be up to $n=m(m-1)/2$
independent differences, but because everything depends on
these linearly, only $n=m-1$ entries are needed. The
additional information can be codified in the weight matrix
`w`. (If desired, the larger array may be passed -- a
decomposition will be used to reduce the rank.)
.. note:: The code will effectively only work with the
matrices $\ket{\d\mat{X}}\mat{w}$ and
$\ket{\d\mat{G}}\mat{w}$, so one could in principle pass
in something like the full matrix of previous steps
$\mat{X}$ and include the differencing in $\mat{w}$.
w : array, (optional)
This is the weight matrix. If it is one-dimensional (with
length $n$) then it is assumed to be diagonal. A
two-dimensional array may also be specified as discussed
above. Assumed to be the identity if not provided.
Qa, Qb : array
These are the basis vectors in which the previous inverse
Jacobian is expressed:
.. math::
\mat{B}_{0} = \mat{Q}_{a}\mat{b}_{0}\mat{Q}_{b}^{T}
b : array
The non-identity part of the previous Jacobian expressed in
the basis `Qa` and `Qb`. If one-dimensional, then it will
be interpreted as the diagonals.
w_0 : float
The magnitude of this in relationship to `w` determines how
much to pull the Jacobian back to the previous Jacobian.
w_r : float
The magnitude of this in relationship to `w` determines how
much to pull the Jacobian back to the identity. This
simulates a "restart".
singular_tol : float
The minimum singular value of the Jacobian and inverse
Jacobian is limited by this. Values outside this range are
eliminated (along with the corresponding entries in the
basis vectors `Qa` and `Qb`).
Returns
-------
Qa, Qb : array
These are $N \times m$ orthonormal arrays
$\mat{Q}^{T}\mat{Q} = \mat{1}$ with $m \leq n$.
b : array
This diagonal array expresses the non-identity part of
the inverse Jacobian $\mat{B}$ in the specified basis.
This is computed from the SVD so these are positive entries
between `singular_tol` and `1/singular_tol`.
Notes
-----
This method computes using the following formulae
.. math::
\mat{b} &= \frac{-1}{\mat{j}^{-1}
+ \braket{\uvect{\beta}|\uvect{\alpha}}},\\
\mat{j}_{\text{1a)}} &=
\left\{
\braket{\uvect{\alpha}|\d\mat{G}-\d\mat{X}}\mat{w}^2
\braket{\d\mat{X}|\uvect{\beta}}
+
w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}}
\mat{j}_{0}
\braket{\uvect{\beta}_{0}|\uvect{\beta}}
\right\}
\cdot
\frac{1}{\braket{\uvect{\beta}|\d\mat{X}}\mat{w}^2
\braket{\d\mat{X}|\uvect{\beta}} + w_{0}^2 + w_{r}^2},\\
\mat{b}_{\text{2b)}} &=
\left\{
\braket{\uvect{\alpha}|\d\mat{X}-\d\mat{G}}\mat{w}^2
\braket{\d\mat{G}|\uvect{\beta}}
+
w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}}
\mat{b}_{0}
\braket{\uvect{\beta}_{0}|\uvect{\beta}}
\right\}
\cdot
\frac{1}{\braket{\uvect{\beta}|\d\mat{G}}\mat{w}^2
\braket{\d\mat{G}|\uvect{\beta}} + w_{0}^2 + w_{r}^2}.
Examples
--------
First we start with a problem of full rank:
>>> np.random.seed(0)
>>> dX = np.random.rand(10, 4)
>>> dG = np.random.rand(10, 4)
>>> w = np.ones(4)
>>> broyden = ExtendedBroyden(method=1)
>>> Qa, Qb, b = broyden.get_J(dX, dG, w=w)
>>> np.allclose(np.dot(Qa.T, Qa), np.eye(4))
True
>>> np.allclose(np.dot(Qb.T, Qb), np.eye(4))
True
In this case, the inverse Jacobian exactly satisfies the
secant condition:
>>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T)
>>> np.allclose(np.dot(B, dG), dX)
True
Broyden's "bad" method is also supported
>>> broyden.method = 2
>>> Qa, Qb, b = broyden.get_J(dX, dG, w=w)
>>> np.allclose(np.dot(Qa.T, Qa), np.eye(4))
True
>>> np.allclose(np.dot(Qb.T, Qb), np.eye(4))
True
>>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T)
>>> np.allclose(np.dot(B, dG), dX)
True
Now we consider a rank deficient problem
>>> dX[:,0] = dX[:,1]
>>> dG[:,0] = dG[:,1]
>>> Qa, Qb, b = broyden.get_J(dX, dG, w=w)
>>> b.shape
(3,)
>>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T)
>>> np.allclose(np.dot(B, dG), dX)
True
"""
def inv(A):
return np.linalg.pinv(A, rcond=singular_tol**2)
simple_algorithm = True
if w is not None:
w = np.asarray(w)
if simple_algorithm:
# This is the simple form of the algorithm where we
# process the large matrices.
Xa = dX - dG
if 1 == self.method:
# Broyden's "good" method
Xb = dX
elif 2 == self.method:
# Broyden's "bad" method
Xb = dG
else:
raise NotImplementedError
if w is not None:
if 1 == len(w.shape):
Xa *= w[None, :]
Xb *= w[None, :]
else:
Xa = dot(Xa, w)
Xb = dot(Xb, w)
if Qa is not None:
Xa = np.hstack([Xa, Qa])
if Qb is not None:
Xb = np.hstack([Xb, Qb])
alpha, r_a = np.linalg.qr(Xa)
beta, r_b = np.linalg.qr(Xb)
if 1 == self.method:
d_beta = dot(dX.T, beta)
beta_alpha = dot(beta.T, alpha)
if b0 is not None:
bj0 = -inv(inv(b0) + beta_alpha)
elif 2 == self.method:
d_beta = dot(dG.T, beta)
if b0 is not None:
bj0 = b0
num = dot(dot(alpha.T, dX - dG), d_beta)
if 1 == self.method:
num *= -1
if b0 is not None:
num += w_0**2*dot(
dot(dot(alpha.T, Qa), bj0),
dot(Qb.T, beta))
denom = dot(d_beta.T, d_beta)
denom += np.eye(len(denom), dtype=float)*(w_0**2 + w_r**2)
b = dot(num, inv(denom))
if 1 == self.method:
# We actually computed j rather than b
j = b
b = -inv(inv(j) + beta_alpha)
u, d, vt = np.linalg.svd(b)
inds = np.where(np.logical_and(
d >= singular_tol,
d <= 1./singular_tol))[0]
d = d[inds]
Qa = dot(alpha, u[:,inds])
Qb = dot(beta, vt.T[:,inds])
return Qa, Qb, d
else:
dX = np.asarray(dX)
dG = np.asarray(dG)
dXG = dX - dG
xgxg = dot(dXG.T, dXG)
gg = dot(dG.T, dG)
if w is not None:
if 1 == len(w.shape):
gg = w[:,None]*gg*w[None, :]
xgxg = w[:,None]*xgxg*w[None, :]
else:
gg = dot(dot(w.T, gg), w)
xgxg = dot(dot(w.T, xgxg), w)
d2_xg, u_xg = eigh(xgxg)
assert np.all(0 <= d2_xg)
d_xg = np.sqrt(d2_xg)
alpha_dXG = dot(u_xg*d_xg[None, :], u_xg.T)
if Qa is not None:
alpha0_dXG = dot(Qa.T, dXG)
alpha_dXG = np.hstack([alpha_dXG, alpha0_dXG])
alpha0_alpha = np.vstack([dot(alpha0_dXG,
u_xg/(d_xg[None, :] + _TINY)),
np.eye(Qa.shape[1], dtype=float)])
d2_g, u_g = eigh(gg)
assert np.all(0 <= d2_g)
d_g = np.sqrt(d2_g)
beta_dG = dot(u_g*d_g[None, :], u_g.T)
if Qb is not None:
beta0_dG = dot(Qb.T, dG)
beta_dG = np.hstack([beta_dG, beta0_dG])
beta0_beta = np.vstack([dot(beta0_dG,
u_g/(d_g[None, :] + _TINY)),
np.eye(Qb.shape[1], dtype=float)])
if b0 is not None:
alpha_b0_beta = 2*dot(dot(alpha0_alpha.T, b0),
beta0_beta)
else:
alpha_b0_beta = 0
num = (dot(alpha_dXG, beta_dG.T) +
w_0**2*alpha_b0_beta)
denom = dot(beta_dG, beta_dG.T)
denom += np.eye(len(denom), dtype=float)*(w_0**2 + w_r**2)
b = self._stable_divide(num, denom)
u, d, vt = np.linalg.svd(b)
u = dot(u_n, u)
vt = dot(vt, u_d.T)
inds = np.where(np.logical_and(
singular_tol <= abs(d),
abs(d) <= 1./singular_tol))[0]
Qa = dot(Qa, u[:, inds])
Qb = dot(Qb, vt.T[:, inds])
return Qa, Qb, d
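# A numpy-only sketch showing how the factorization returned by
# ExtendedBroyden.get_J is meant to be used: the inverse Jacobian
# B = 1 + Qa diag(b) Qb^T is applied to a vector without ever forming the
# full N x N matrix.  The argument names follow the Returns section above.
def _apply_extended_broyden_sketch(Qa, Qb, b, v):
    import numpy as np
    Qa = np.asarray(Qa, dtype=float)    # (N, m), orthonormal columns
    Qb = np.asarray(Qb, dtype=float)    # (N, m), orthonormal columns
    b = np.asarray(b, dtype=float)      # (m,), diagonal entries
    v = np.asarray(v, dtype=float)      # (N,)
    # B v = v + Qa (b * (Qb^T v)): O(N m) work for the low-rank part.
    return v + np.dot(Qa, b*np.dot(Qb.T, v))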