Source code for mmf.solve.broyden

r"""Broyden Root Finders.

This module contains several different root finders that implement
Broyden's method.

Mathematical Background
=======================
The basic idea is to use a linear approximation to the objective
function

.. math::
   \d\vect{G} \approx \mat{J}\cdot \d\vect{x}

where :math:`\d\vect{G} = \vect{G}(x) - \vect{G}(x_0)` and
:math:`\d\vect{x} = \vect{x} - \vect{x}_0`.

The goal of the Broyden methods is to find the value
:math:`\vect{x}_0` such that :math:`\vect{G}(\vect{x}_0) = 0`.  This
is done using the Newton step:

.. math::
   \vect{x}_0 = \vect{x} - \mat{J}^{-1}\cdot\vect{G}.

If the Jacobian is exact, then this should give quadratic convergence
as one approaches the root.  However, it can be expensive to compute
the Jacobian (at least :math:`N+1` evaluations of the vector function
:math:`\vect{G}` for a simple finite-difference approximation), so
instead the Broyden method uses a sequence of approximations, starting
with the identity and using a minimal update in the sense of satisfying
the secant condition while minimizing the Frobenius norm of the change.
In the following we shall use bra-ket notation for vectors.

We shall first discuss the standard Broyden method which imposes a
"minimal" update such that the secant condition $\ket{\d{G}} =
\mat{J}\ket{\d{x}}$ holds.

Standard Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~
For a given step :math:`\ket{\d{x}}` where the function changes by
:math:`\ket{\d{G}}` the update must satisfy the secant condition

.. math::
   \ket{\d{G}} = \mat{J}\ket{\d{x}}.

We can write any such update in the following form

.. math::
   \mat{J}^{-1} \rightarrow \mat{J}^{-1}
        + \frac{\left(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}}\right)\bra{v}}
               {\braket{v|\d{G}}}

where :math:`\ket{v}` is any vector not orthogonal to :math:`\ket{\d{G}}`.
This ambiguity of the multidimensional secant method causes problems
if not properly resolved, but Broyden showed that good results can be
obtained if the update is chosen to minimize the Frobenius norm of the
change in the Jacobian, where

.. math::
   \norm{\mat{J}}_{\text{Frobenius}} = \sqrt{\sum_{ij}J_{ij}^2}.

This yields the update

.. math::
   \mat{J} \rightarrow
      \mat{J} + \frac{(\ket{\d{G}} - \mat{J}\ket{\d{x}})\bra{\d{x}}}
                     {\braket{\d{x}|\d{x}}}

which can be analytically inverted using the Sherman-Morrison formula
to find :math:`\ket{v} = \left[\mat{J}^{-1}\right]^{T}\ket{\d{x}}`:

.. math::
   \mat{J}^{-1} \rightarrow \mat{J}^{-1}
       + \frac{\left(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}}\right)
               \bra{\d{x}}\mat{J}^{-1}}
              {\braket{\d{x}|\mat{J}^{-1}|\d{G}}}.
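
A quick numerical check (a sketch, not one of this module's doctests) that
the direct update and this inverse update are consistent and both satisfy
the secant condition::

   import numpy as np

   np.random.seed(1)
   N = 6
   J = np.random.rand(N, N) + N*np.eye(N)   # a well-conditioned "Jacobian"
   Jinv = np.linalg.inv(J)
   dx, dG = np.random.rand(N), np.random.rand(N)

   # "Good" Broyden update of J and the Sherman-Morrison form of its inverse.
   J_new = J + np.outer(dG - J.dot(dx), dx)/dx.dot(dx)
   Jinv_new = Jinv + np.outer(dx - Jinv.dot(dG),
                              Jinv.T.dot(dx))/dx.dot(Jinv.dot(dG))

   assert np.allclose(Jinv_new, np.linalg.inv(J_new))   # consistent inverses
   assert np.allclose(J_new.dot(dx), dG)                # secant condition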

An alternative update (Broyden's "second" method) is found by minimizing
the norm of the change in the inverse Jacobian, which yields :math:`\ket{v} =
\ket{\d{G}}`

.. math::
   \mat{J}^{-1} \rightarrow 
      \mat{J}^{-1} + \frac{(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}})\bra{\d{G}}}
                          {\braket{\d{G}|\d{G}}},

however, we have found that this does not always work as well.  It can
still be used, though, even if the inverse Jacobian develops a singular
direction (which can cause trouble for the previous update).
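
For orientation, here is a self-contained sketch of a complete Broyden
iteration built from the update above.  This is illustrative only (it is
*not* the interface provided by this module) and it assumes a reasonably
well-behaved objective function::

   import numpy as np

   def broyden_sketch(G, x0, tol=1e-10, max_iter=100):
       # Toy Broyden solver using the Frobenius-minimal update of J.
       x = np.asarray(x0, dtype=float)
       g = G(x)
       J = np.eye(len(x))                  # initial Jacobian approximation
       for _ in range(max_iter):
           dx = -np.linalg.solve(J, g)     # quasi-Newton step
           x_new = x + dx
           g_new = G(x_new)
           # Rank-1 update enforcing the secant condition J dx = dG
           J = J + np.outer(g_new - g - J.dot(dx), dx)/dx.dot(dx)
           x, g = x_new, g_new
           if np.linalg.norm(g) < tol:
               break
       return x

   # G(x) = x - cos(x) has the root x_i = 0.739085... in each component.
   x = broyden_sketch(lambda x: x - np.cos(x), np.zeros(4))
   assert np.allclose(x - np.cos(x), 0)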

Generalized Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~~~~
This method is a generalization of the one described in
[Bierlaire:2006]_ and defines the new Jacobian $\mat{J}$ at the point
$x$ that minimizes a weighted combination of previous "residuals"
$\ket{r_i}$ along with a minimal change condition on a matrix residual
$\mat{R}$:

.. math::
   \min_{\mat{J}}\left(\sum_{i}\norm{w_i\ket{r_i}}_{2}^{2}
         +\norm{\mat{R}\mat{\Gamma}}^{2}_{F}\right)

where $w_i$ are weights and $\mat{\Gamma}$ will be discussed later and
specifies the weighting of the matrix residual.  The residuals may
take the following form:

.. math::
   \ket{r_i} &= \begin{cases}
      \ket{\d{G}_i} - \mat{J}\ket{\d{x}_i} & \text{1)}\\
      \ket{\d{x}_i} - \mat{J}^{-1}\ket{\d{G}_i} & \text{2)}
   \end{cases}\\
   \mat{R} &= \begin{cases}
      \mat{J} - \mat{J}_{0} & \text{a)}\\
      \mat{J}^{-1} - \mat{J}^{-1}_{0} & \text{b)}
   \end{cases}

The method 1a) corresponds to the generalization of Broyden's first or
"Good" method while the combination 2b) corresponds to the
generalization of Broyden's second or "Bad" method.  This second
method with $\mat{\Gamma} = \mat{1}$ is the method employed in
[Johnson:1988]_ with the slight modification that $\ket{\d{x}_i} =
\ket{x_{i+1}} - \ket{x_{i}}$.  The two mixed cases 1b) and 2a) are
much more difficult to solve as they are linear in neither $\mat{J}$
nor $\mat{J}^{-1}$.

In what follows, we shall use the notation $\mat{B} = \mat{J}^{-1}$
for the inverse approximate Jacobian.  Boldface capital letters like
this refer to "large" $N\times N$ matrices that we typically will not
be able to form.  Boldface lowercase letters such as $\mat{w} =
\diag(w_i)$ denote "small" $n\times n$ matrices that we can easily
store.  We define mixed matrices using bra-ket notation such as
$\ket{\d{\mat{G}}}$ and $\ket{\d{\mat{X}}}$ for the $N \times n$
matrices of the vectors $[\ket{\d{\mat{X}}}]_{:i} = \ket{\d{x}_i}$.
We may now write the minimization condition as:

.. math::
   \min_{\mat{J}}
      \tr\left(\ket{\mat{r}}\mat{w}^2\bra{\mat{r}}
            + \mat{R}\mat{\Gamma}\mat{\Gamma}^T\mat{R}^T\right).

Using the relationship $\partial J^{-1}_{MN}/\partial J_{AB} =
-J^{-1}_{MA}J^{-1}_{BN}$ we arrive at the following normal equations:

.. ::
..    (G - JX) W^2 X^T = (J - J0)Gamma
..    (G - BX) W^2 X^T = -B (B - B0)Gamma B
..    -B(X - BG) W^2 G^T B = (J - J0)Gamma
..    -B(X - BG) W^2 G^T B= -B (B - B0)Gamma B

.. math::
   \mat{B}_{\text{1a)}} &= \mat{B}_{0} +
     (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
     \bra{\d\mat{X}}\cdot\frac{1}{
       \mat{B}_{0}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{\Gamma}\mat{\Gamma}^{T}}\cdot\mat{B}_{0}, \\
   \mat{B}_{\text{2b)}} &= \mat{B}_{0}
     + (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
     \bra{\d\mat{G}}\cdot\frac{1}{
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
       + \mat{\Gamma}\mat{\Gamma}^{T}}

The weights have the following effects:

1) If not using the subspace decomposition method, then the weights affect the
   inversion procedure.  In particular, the ratio
   $\mat{\Gamma}\mat{\Gamma}^{T}\mat{w}^{-2}$ is important for determining the
   relative weighting of the old Jacobian and the new steps.
2) If components of $\ket{\d\mat{X}}$ and $\ket{\d\mat{G}}$ are linearly
   dependent -- i.e. the secant conditions are not all consistent -- then the
   weights determine how to mix these.
3) If using the subspace decomposition method, then for the linearly independent
   components, the weights are formally irrelevant, but they can still be used
   to determine which components should be truncated or ignored (see section
   :ref:`sec:truncation-algorithm`).  They will also play a role when dynamic
   restarts are considered (see :ref:`sec:dynamic-restarts`).
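
As a concrete illustration of point 3) above: if $\mat{\Gamma}$ is chosen to
span the orthogonal complement of $\ket{\d\mat{G}}$, then the 2b) update
satisfies every secant condition exactly, independent of the weights.  A
minimal numerical sketch (not part of the module)::

   import numpy as np

   np.random.seed(2)
   N, m = 8, 3
   dX, dG = np.random.rand(N, m), np.random.rand(N, m)
   B0 = np.random.rand(N, N)
   w2 = np.diag(np.random.rand(m) + 0.5)       # arbitrary weights w_i^2
   Gamma = np.linalg.svd(dG)[0][:, m:]         # spans the complement of dG

   denom = dG.dot(w2).dot(dG.T) + Gamma.dot(Gamma.T)
   B2b = B0 + (dX - B0.dot(dG)).dot(w2).dot(dG.T).dot(np.linalg.inv(denom))

   assert np.allclose(B2b.dot(dG), dX)         # all secant conditions hold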

.. note:: Here we present a little bit of analysis.  The first step will be to
   define $\ket{\mat{X}} \approx \ket{\d\mat{X}}\mat{w}$ and $\ket{\mat{G}}
   \approx \ket{\d\mat{G}}\mat{w}$ so that they have full rank and we can write
   $\ket{\mat{X}} = \ket{\uvect{X}}\mat{x}$, $\ket{\mat{G}} =
   \ket{\uvect{G}}\mat{g}$ with $\mat{x}$ and $\mat{g}$ being invertible.  (In
   the memory limited context, these will be small.  They can be computed using
   an SVD or rank-revealing QR decomposition.)  The two updates can now be
   written as:

   .. math::
      \mat{B}_{2b)}
      = \mat{B}_{0} 
        - \mat{B}_{0}\ket{\uvect{G}}\bra{\uvect{G}} 
        + \ket{\mat{X}}\mat{g}^{-1}\bra{\uvect{G}}
      = \mat{B}_{0} 
        + (\ket{\mat{X}} - \mat{B}_{0}\ket{\mat{G}})
        \mat{g}^{-1}(\mat{g}^T)^{-1}\bra{\mat{G}}.

   The first equality expresses the update as first subtracting off the
   projection of the previous approximation on the subspace spanned by
   $\ket{\uvect{G}}$ and then reconstructing this from the new data.  In this
   sense the update is minimal and can be expressed as minimizing the Frobenius
   norm.  This follows from equation :eq:`proj_inv` with $\mat{\Gamma} =
   \ket{\uvect{G}_{\perp}}$.

   One might hope that, if the function is linear, previous updates would be
   preserved so that after $N$ independent updates the approximation becomes
   exact.  This turns out *not* to be true; the following code sets up a
   numerical exploration of the two forms of the update:

   >>> np.random.seed(0)
   >>> import numpy.random
   >>> from numpy import mat
   >>> from numpy.linalg import svd, qr, inv
   >>> ns = numpy.random.randint(1,10,size=10)
   >>> N = sum(ns)
   >>> X_ = mat(np.random.rand(N,N))
   >>> J_ = mat(np.random.rand(N,N))
   >>> G_ = J_*X_
   >>> B = mat(np.eye(N))
   >>> B1 = mat(np.eye(N))
   >>> n0 = 0
   >>> for n in ns:
   ...    X = X_[:,n0:n0+n]
   ...    G = G_[:,n0:n0+n]
   ...    n0 += n
   ...    _, g = qr(G)
   ...    _, x = qr(X)
   ...    Xhat = svd(X)[0][:,:n]
   ...    Xperp = svd(X)[0][:,n:]
   ...    Ghat = svd(G)[0][:,:n]
   ...    Gperp = svd(G)[0][:,n:]
   ...    B = B + (X - B*G)*inv(g.T*g)*G.T
   ...    B1 = B1 - B1*Ghat*Ghat.T + X*inv(g)*Ghat.T



        

.. note:: I am trying to form a better intuition for these equations, so
   consider the following.  First assume that $\mat{B}_0$ is not singular
   and that the matrices $\ket{\d{\mat{X}}}$ and $\ket{\d\mat{G}}$ are not rank
   deficient.  (I.e., if there are $n$ previous points, then these have rank $n$.
   Otherwise, use the weights to reduce the size so that they satisfy this
   property appropriately.  The weights can now be ignored.)  In this case, one
   can satisfy the secant conditions exactly if the equations have the following
   structure:

   .. math::
      \mat{B} &= \mat{B}_{0} +
        (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot
        \braket{\mat{A}|\d\mat{G}}^{-1}\bra{\mat{A}}

   for an arbitrary choice of $\bra{\mat{A}}$ insofar as
   $\braket{\mat{A}|\d\mat{G}}$ is not singular.  The ambiguity in the choice of
   $\bra{\mat{A}}$ is now captured in the desire to minimize the change from
   $\mat{B}_{0}$ and we have:

   .. math::
      \bra{\mat{A}_{\text{1a)}}} & = \bra{\d\mat{X}}\mat{B}_{0},\\
      \bra{\mat{A}_{\text{2b)}}} & = \bra{\d\mat{G}}.

   >>> from numpy.random import rand
   >>> from numpy import dot
   >>> from numpy.linalg import inv, svd
   >>> np.random.seed(0)
   >>> dG = rand(5,3)
   >>> dX = rand(5,3)
   >>> B0 = rand(5,5)
   >>> Gamma = dot(svd(dG)[0][:,-2:], rand(2,2))
   >>> np.allclose(0, dot(dG.T, Gamma))
   True
   >>> np.allclose(dot(dG.T, inv(dot(dG, dG.T) + dot(Gamma, Gamma.T))),
   ...             dot(inv(dot(dG.T, dG)), dG.T))
   True

   The secant conditions are then satisfied exactly:

   >>> A = dG.T
   >>> R = dX - dot(B0, dG)
   >>> B1 = B0 + dot(R, dot(inv(dot(A, dG)), A))
   >>> np.allclose(dot(B1, dG), dX)
   True
   
   
[Bierlaire:2006]_ suggests that a good choice of weights $w_i$ is
something like

.. math::
   w_i = \norm{x - x_i}^{-2}
       = \frac{1}{\braket{\d\mat{X}|\d\mat{X}}_{ii}}

which weights the small steps (which are more likely to be in the
linear regime) more heavily than the long secant steps.  [Johnson:1988]_
suggests a similar criterion based on $\norm{G_{i}}$ or a fixed set of
weights depending on iteration number.

A good choice of $\mat{B}_{0}$ and $\mat{\Gamma}$ is not so simple,
but we can be led by the original Broyden method, which sets
$\mat{B}_{0}$ to be the previous approximate inverse Jacobian.
Another option is to set $\mat{B}_{0} = \mat{1}$ or some other
constant independent of iteration.  If enough iterates are maintained,
then this might also work (but I am not aware of this method having
been explored).

.. _sec:dynamic-restarts:

Dynamic Restarts
----------------

One interesting idea is to include an additional Jacobian residual
term $\mat{R} = \mat{B} - \mat{B}_{-1}$ in addition to $\mat{R}_{0} =
\mat{B} - \mat{B}_{0}$ where $\mat{B}_{0}$ is constant (typically
$\mat{B}_{0}\propto \mat{1}$) and $\mat{B}_{-1}$ is the previous
approximation.  If we also take $\mat{\Gamma}_{0} = w_{0}\mat{1}$,
then the weight $w_{0}$ will "pull" the inverse Jacobian back towards
$\mat{B}_{0}$.  This is useful if one can formulate some slow but
approximately convergent iteration scheme.  [Johnson:1988]_, for
example, advocates restarting the algorithm and resetting the Jacobian
approximation after some progress has been made.  Instead, by dynamically
increasing $w_0$ (if the Broyden steps are not improving the residual,
for example) one can continuously "restart" the approximation by
pulling it back toward $\mat{B}_{0}$. This gives the following
generalized equations (which we express in several forms here):

.. math::
   \mat{J}_{\text{1a)}} &= \Bigl[
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{J}_{-1}\mat{\Gamma}\mat{\Gamma}^{T}
       + \mat{J}_{0}w_0^2
     \Bigr]\cdot\Bigl[
       \ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{\Gamma}\mat{\Gamma}^{T} 
       + \mat{1}w_0^2
     \Bigr]^{-1},\\
   &= \mat{J}_{-1} +
     \Bigl(
       (\ket{\d\mat{G}} - \mat{J}_{-1}\ket{\d\mat{X}})\cdot\mat{w}^2
       \bra{\d\mat{X}}
       + (\mat{J}_{0} - \mat{J}_{-1})w_0^2
     \Bigr)\cdot\frac{1}{
       \ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{\Gamma}\mat{\Gamma}^{T} + \mat{1}w_0^2},\\
   \mat{B}_{\text{1a)}} &= \mat{B}_{-1} + 
   \Bigl(
     (\ket{\d\mat{X}} - \mat{B}_{-1}\ket{\d\mat{G}})\cdot\mat{w}^2
     \bra{\d\mat{X}}
     + (\mat{1} - \mat{B}_{-1}\mat{B}_{0}^{-1})w_0^2
   \Bigr)\cdot\frac{1}{
       \mat{B}_{-1}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{\Gamma}\mat{\Gamma}^{T} + \mat{B}_{-1}\mat{B}_{0}^{-1}w_0^2
       }\cdot\mat{B}_{-1}, \\
   \mat{B}_{\text{2b)}} &= \Bigl[
       \ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{G}}
       + \mat{B}_{-1}\mat{\Gamma}\mat{\Gamma}^{T}
       + \mat{B}_{0}w_0^2
     \Bigr]\cdot\Bigl[
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
       + \mat{\Gamma}\mat{\Gamma}^{T} 
       + \mat{1}w_0^2
     \Bigr]^{-1},\\
   &= \mat{B}_{-1} +
     \Bigl(
       (\ket{\d\mat{X}} - \mat{B}_{-1}\ket{\d\mat{G}})\cdot\mat{w}^2
       \bra{\d\mat{G}}
      + (\mat{B}_{0} - \mat{B}_{-1})w_0^2
     \Bigr)\cdot\frac{1}{
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
       + \mat{\Gamma}\mat{\Gamma}^{T} + \mat{1}w_0^2
       }

This dynamic restart mechanism can be easily used if one can relate
the objective function $\vect{G}(\vect{x})$ to an approximately
convergent iteration:

.. math::
   \vect{x} \mapsto \vect{F}(\vect{x}) 
   = \vect{x} - \mat{B}_{0}\vect{G}(\vect{x}).

Pulling the approximate inverse Jacobian $\mat{B} \rightarrow
\mat{B}_{0}$ will pull the iterations back to this (hopefully)
convergent iteration.  Note that the typical linear mixing scheme
$\vect{x} \mapsto (1-\alpha)\vect{x} + \alpha\vect{F}(\vect{x})$ is
simply equivalent to scaling $\mat{B}_{0} \rightarrow
\alpha\mat{B}_{0}$ which may also be simply implemented.
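
The following sketch illustrates the two limits of the 2b) restart formula
(assuming $\mat{B}_{0} = \mat{1}$, unit weights, and $\mat{\Gamma} =
\ket{\d\uvect{G}_{\perp}}$; illustrative only)::

   import numpy as np

   np.random.seed(4)
   N, m = 8, 3
   dX, dG = np.random.rand(N, m), np.random.rand(N, m)
   Bm1 = np.random.rand(N, N)                  # previous inverse Jacobian
   B0 = np.eye(N)                              # restart Jacobian B_0
   Gamma = np.linalg.svd(dG)[0][:, m:]         # complement of dG
   w2 = np.eye(m)

   def B2b(w02):
       num = (dX - Bm1.dot(dG)).dot(w2).dot(dG.T) + (B0 - Bm1)*w02
       den = dG.dot(w2).dot(dG.T) + Gamma.dot(Gamma.T) + np.eye(N)*w02
       return Bm1 + num.dot(np.linalg.inv(den))

   assert np.allclose(B2b(0.0).dot(dG), dX)    # w_0 = 0: secant conditions
   assert np.allclose(B2b(1e12), B0)           # w_0 large: pulled to B_0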

.. note:: The diagonals of the matrix $\mat{J}_0 = \mat{B}_0^{-1}$ are
   specified by the parameter :attr:`IProblem.j0_scale`.  The
   parameter $\alpha$ is treated separately in the solver.

Regularization
--------------

Finally, the choice of $\mat{\Gamma}\mat{\Gamma}^{T}$ is such that the
denominators are positive definite.  Several choices are
discussed in [Bierlaire:2006]_:

"Geometric approach" : 
   Here we set $\mat{\Gamma} = \mat{0}$, which is fine unless there are
   singular or dependent directions, which are common.  It is also sufficient
   with a dynamic restart where $w_0^2$ regularizes the denominator.  It is
   completely unsuitable, however, for a memory limited approach.
"Subspace decomposition" :
   The subspace decomposition method makes use of the following formulae:

   .. math::
        \frac{1}{\ket{G}\bra{H} +
          \ket{G_\perp}w_{r}^2\bra{G_\perp} + w_0^2\mat{1}} = 
        \frac{1}{w_0^2}
        \left(\mat{1} - \ket{G}\frac{1}
             {\mat{1}w_0^2 + \braket{H|G}}\bra{H}\right)
        \left(\mat{1} - \frac{\ket{G_\perp}w_r^2\bra{G_\perp}}
                           {w_0^2 + w_r^2}\right),\\
        \frac{1}{\ket{G}\bra{H} +
          \ket{H_\perp}w_{r}^2\bra{H_\perp} + w_0^2\mat{1}} = 
        \frac{1}{w_0^2}
        \left(\mat{1} - \frac{\ket{H_\perp}w_r^2\bra{H_\perp}}
                           {w_0^2 + w_r^2}\right)
        \left(\mat{1} - \ket{G}\frac{1}
             {\mat{1}w_0^2 + \braket{H|G}}\bra{H}\right)

   >>> np.random.seed(0)
   >>> w02 = 0.1
   >>> wr2 = 0.2
   >>> N = 10
   >>> m = 5
   >>> I = np.eye(N)
   >>> i = np.eye(m)
   >>> dG = np.mat(np.random.rand(N,m))
   >>> H = np.mat(np.random.rand(N,m))
   >>> dG_perp = np.linalg.svd(dG)[0][:,m:]
   >>> assert(np.allclose(dG.T*dG_perp, 0))
   >>> B = np.mat(np.random.rand(N,N))
   >>> D = dG*H.T + dG_perp*wr2*dG_perp.T + I*w02
   >>> Dinv = ((I - dG*np.linalg.inv(i*w02 + H.T*dG)*H.T)*
   ...         (I - wr2*dG_perp/(wr2+w02)*dG_perp.T))/w02
   >>> np.allclose(D*Dinv,I)
   True
   >>> H_perp = np.linalg.svd(H)[0][:,m:]
   >>> assert(np.allclose(H.T*H_perp, 0))
   >>> B = np.mat(np.random.rand(N,N))
   >>> D = dG*H.T + H_perp*wr2*H_perp.T + I*w02
   >>> Dinv = ((I - wr2*H_perp/(wr2+w02)*H_perp.T)*
   ...     (I - dG*np.linalg.inv(i*w02 + H.T*dG)*H.T))/w02
   >>> np.allclose(D*Dinv,I)
   True

   We start with the subspace decomposition for method 2b).  Here we set
   $\mat{\Gamma} = \ket{G_\perp}\mat{w}_r$, which leads to the inversion
   formulae above.  The denominator has the following forms for methods 1a)
   and 2b) respectively:

   .. math::
        \frac{1}{\ket{G}\bra{X} +
          \mat{B}^{-1}\ket{\Gamma}\bra{\Gamma} + w_0^2\mat{1}},\\
        \frac{1}{\ket{G}\bra{G} +
          \ket{\Gamma}\bra{\Gamma} + w_0^2\mat{1}}.

   The subspace decomposition method chooses $\ket{\Gamma}$ to be orthogonal
   to these: it sets $\mat{\Gamma} = \ket{\uvect{X}_{\perp}}$ (case 1a) or
   $\mat{\Gamma} = \ket{\uvect{G}_{\perp}}$ (case 1a or 2b), the basis
   vectors orthogonal to $\ket{\d\mat{X}}$ or $\ket{\d\mat{G}}$
   respectively.


   (Old notes) A couple of points.  First, if $w_0 = 0$, then the results are
   independent of the magnitude of $\mat{\Gamma}$.  To see this, note that one
   has explicitly (for case 2b) for example):

   .. math::
      \frac{1}{\ket{\uvect{G}}\mat{x}\bra{\uvect{G}} 
        + \lambda\ket{\uvect{G}_\perp}\bra{\uvect{G}_\perp}} =
      \ket{\uvect{G}}\mat{x}^{-1}\bra{\uvect{G}} 
        + \frac{1}{\lambda}\ket{\uvect{G}_\perp}\bra{\uvect{G}_\perp}

   The second term will be eliminated when acted upon by the numerator
   factor $\bra{\d\mat{G}}$.  This method poses some formal difficulty for the
   inversion of the inverse formulation of case 2b) as the denominator contains
   the mixed term $\mat{B}_{-1}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}$,
   however, since one should generally have $\mat{B}_{-1}\ket{\d\mat{G}} \approx
   \ket{\d\mat{X}}$, this should not pose any difficulty in practice.
   Technically, we implement this by simply performing an SVD of the denominator
   and dropping the orthogonal subspace.

   .. note:: The standard Broyden method is reproduced with this approach if
      there is no restart parameter $w_0 = 0$ and only a single previous vector
      $\ket{X}$ is kept.
"Numerical approach":
   They recommend a numerical approach of performing a Modified
   Cholesky decomposition :func:`mmf.math.linalg.modchol` of $\mat{A}
   = \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}} = \mat{L}\mat{L}^{T} -
   \mat{E}$ where $\mat{E} = \mat{\Gamma}\mat{\Gamma}^T$ is a
   diagonal matrix as small as possible making $\mat{A} + \mat{E}$
   safely positive definite and well conditioned.

   Unfortunately, this Modified Cholesky based approach does not work
   for method 1a) where the denominator is both large ($N\times N$)
   and slightly asymmetric.  (Ideally $\mat{B}_{0}\ket{\d\mat{G}}
   \approx \ket{\d\mat{X}}$ so the asymmetry should be small.)

   Our implementation of this is the same as the "direct inversion"
   approach discussed below, but with a finite $\mat{\Gamma}$ such
   that the smallest singular value is not less than
   :attr:`JinvBroyden.min_svd` times the largest.
"Direct inversion":
   If $\mat{C} = \mat{\Gamma}\mat{\Gamma}^{T} + w_{0}^2\mat{1}$ is easily
   invertible, then the following formula suggested by 
   [Johnson:1988]_ can be used.  (In [Johnson:1988]_ they set
   $\mat{\Gamma} = w_{0}\mat{1}$, and so the requirement of minimizing
   the effect of $\mat{\Gamma}$ comes in the limit $w_0 \rightarrow 0$.  Note
   that this is not exactly the same as our restart condition because in this
   case the Jacobian is pulled back to the previous Jacobian, not back to the
   restart Jacobian.):

   .. math::
      \frac{1}{\mat{C} + \ket{\mat{a}}\bra{\mat{b}}} =
      \mat{C}^{-1} - \mat{C}^{-1}\ket{\mat{a}}\cdot
        \frac{1}{\mat{1}
                  + \braket{\mat{b}|\mat{C}^{-1}|\mat{a}}}
      \cdot\bra{\mat{b}}\mat{C}^{-1}.

   >>> from numpy.linalg import inv
   >>> np.random.seed(3); N=10; n=3;
   >>> X = np.mat(np.random.rand(N, n))
   >>> G = np.mat(np.random.rand(N, n))
   >>> C = np.mat(np.random.rand(N, N))
   >>> I = np.mat(np.eye(N))
   >>> i = np.mat(np.eye(n))
   >>> np.allclose(inv(C + G*X.T),
   ...             inv(C) - inv(C)*G*inv(i + X.T*inv(C)*G)*X.T*inv(C))
   True

   This only requires inverting the small $n\times n$ matrix $\mat{1} +
   \braket{\mat{X}|\mat{C}^{-1}|\mat{G}}$.  One further simplification arises
   when the factor of $\bra{\mat{b}}$ appears to the left, as is the case in
   both methods without the restart:

   .. math::
      :label: proj_inv

      \bra{\mat{b}}\frac{1}{\mat{C} + \ket{\mat{a}}\bra{\mat{b}}} =
      \frac{1}{\mat{1} + \braket{\mat{b}|\mat{C}^{-1}|\mat{a}}}
      \cdot\bra{\mat{b}}\mat{C}^{-1}.
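
   This identity is easy to check numerically (a sketch, not one of the
   module's doctests)::

      import numpy as np
      from numpy.linalg import inv

      np.random.seed(5)
      N, n = 9, 3
      a = np.random.rand(N, n)
      b = np.random.rand(N, n)
      C = np.random.rand(N, N) + N*np.eye(N)   # keep C safely invertible

      lhs = b.T.dot(inv(C + a.dot(b.T)))
      rhs = inv(np.eye(n) + b.T.dot(inv(C)).dot(a)).dot(b.T).dot(inv(C))
      assert np.allclose(lhs, rhs)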


Symmetric Jacobian
------------------
If it is known that the Jacobian is symmetric, then this constraint
can be built into the minimization with a Lagrange multiplier.

.. todo:: Finish figuring out how to solve the symmetric problem::

   >> A = rand(5,5)
   >> B = rand(5,5)
   >> B = B+B.T
   >> I = eye(5)
   >> M = zeros((25,25))
   >> for i in xrange(5):
   ..     for j in xrange(i+1):
   ..         for a in xrange(5):
   ..             for b in xrange(5):
   ..                 M[i+5*j, a+5*b] = (I[i,a]*A[j,b] + I[j,a]*A[i,b]
   ..                                   +I[j,b]*A[i,a] + I[i,b]*A[j,a])

   See http://dx.doi.org/10.1016/0024-3795(89)90067-0


Memory-Limited Extended Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we consider a slight generalization of the previous method in a memory
limited approach.  First, we keep a limited set of $n$ previous points
$\ket{\mat{X}}$ and function evaluations at those points $\ket{\mat{G}}$.  These
define two $n-1$ dimensional subspaces spanned by their differences:
$\ket{\uvect{X}} = \Sp\ket{\delta\mat{X}}$ and $\ket{\uvect{G}} =
\Sp\ket{\delta\mat{G}}$.  This leads to the "stateless" version described
first. Second, we keep track of an additional $2m$ basis vectors for spanning
the domain and range of the inverse Jacobian approximation.  Thus, we need
$N\times(m+n)$ storage.  We express everything in terms of these subspaces, and
so work only with $(m+n-1)\times(m+n-1)$ dimensional matrices as described
below.  Finally, for simplicity, we assume that the original problem is
appropriately scaled so that $\mat{B}_{0} = \mat{1}$.

Full "Inefficient" Version
--------------------------

The general update has the form:

.. math::
     \mat{B} \rightarrow 
     \mat{B} + 
      \left((\ket{\d\mat{X}} - \mat{B}\ket{\d\mat{G}})\bra{H} 
      + (1-\mat{B})w_0^2\right)\cdot
      \frac{1}{\ket{\d\mat{G}}\bra{H} +
        \ket{\mat{\Gamma}}w_{r}^2\bra{\mat{\Gamma}} + w_0^2\mat{1}}

where $\ket{H} \in \{\ket{\d\mat{X}}, \ket{\d\mat{G}}\}$ for methods 1a) and 2b)
respectively.


We start with the subspace decomposition method with $\mat{\Gamma} =
\ket{G_\perp}$ or $\mat{\Gamma} = \ket{H_\perp}$.  The inverse Jacobian has the
following general form where $\ket{R} = \ket{\d\mat{X}} -
\mat{B}\ket{\d\mat{G}}$ and $\ket{H} \in \{\ket{\d\mat{X}},\ket{\d\mat{G}}\}$
for methods 1a) and 2b) respectively.

.. math::
   \begin{aligned}
     \mat{B} &\rightarrow 
     \mat{B} + 
      \left(\ket{R}\bra{H} + (1-\mat{B})w_0^2\right)\cdot
      \frac{1}{\ket{\d\mat{G}}\bra{H} +
        \ket{\d\mat{G}_\perp}w_{r}^2\bra{\d\mat{G}_\perp} + w_0^2\mat{1}},\\
    &= \mat{1} + f(\mat{B} - \mat{1})\left(\mat{1} -
       \ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
    + (\ket{\d\mat{X}} - \ket{\d\mat{G}})
      \frac{1}{\mat{1}w_0^2 + \braket{H|\d\mat{G}}}
      \left((1-f)\bra{H} +
            f\braket{H|\d\mat{G}}
            \braket{\d\mat{G}|\d\mat{G}}^{-1}\bra{\d\mat{G}}\right)
  \end{aligned}

>>> np.random.seed(0)
>>> w02 = 0.1
>>> wr2 = 0.2
>>> N = 10
>>> m = 5
>>> I = np.eye(N)
>>> i = np.eye(m)
>>> dG = np.mat(np.random.rand(N,m))
>>> dX = np.mat(np.random.rand(N,m))
>>> H = np.mat(np.random.rand(N,m))
>>> dG_perp = np.linalg.svd(dG)[0][:,m:]
>>> dG_hat = np.linalg.svd(dG)[0][:,:m]
>>> assert(np.allclose(dG.T*dG_perp, 0))
>>> B = np.mat(np.random.rand(N,N))
>>> R = dX - B*dG
>>> B1 = B + (R*H.T + (I-B)*w02)*np.linalg.inv(
...     dG*H.T + dG_perp*wr2*dG_perp.T + I*w02)
>>> f = wr2/(wr2+w02)
>>> B2 = (I + f*(B-I)*(I - dG_hat*dG_hat.T) 
...             + (dX - dG)*np.linalg.inv(i*w02 + H.T*dG)*(
...          (1-f)*H.T + f*H.T*dG*np.linalg.inv(dG.T*dG)*dG.T))
>>> np.allclose(B1,B2)
True

.. math::
   \begin{aligned}
     \mat{B} &\rightarrow 
     \mat{B} + 
      \left(\ket{R}\bra{H} + (1-\mat{B})w_0^2\right)\cdot
      \frac{1}{\ket{\d\mat{G}}\bra{H} +
        \ket{H_\perp}w_{r}^2\bra{H_\perp} + w_0^2\mat{1}} \\
    &= \mat{1} + f(\mat{B} - \mat{1})
    + \left[\ket{\d\mat{X}} - \ket{\d\mat{G}} 
            - f(\mat{B} - \mat{1})
               (\ket{\d\mat{G}} + w_0^2\ket{H}\braket{H|H}^{-1})\right]
      \frac{1}{\mat{1}w_0^2 + \braket{H|\d\mat{G}}}\bra{H}
   \end{aligned}

>>> np.random.seed(0)
>>> w02 = 0.1
>>> wr2 = 0.2
>>> N = 10
>>> m = 5
>>> I = np.eye(N)
>>> i = np.eye(m)
>>> dG = np.mat(np.random.rand(N,m))
>>> dX = np.mat(np.random.rand(N,m))
>>> H = np.mat(np.random.rand(N,m))
>>> H_perp = np.linalg.svd(H)[0][:,m:]
>>> H_hat = np.linalg.svd(H)[0][:,:m]
>>> assert(np.allclose(H.T*H_perp, 0))
>>> B = np.mat(np.random.rand(N,N))
>>> R = dX - B*dG
>>> B1 = B + (R*H.T + (I-B)*w02)*np.linalg.inv(
...     dG*H.T + H_perp*wr2*H_perp.T + I*w02)
>>> f = wr2/(wr2+w02)
>>> B2 = (I + f*(B-I)
...         + (dX - dG - f*(B-I)*(dG + w02*H*np.linalg.inv(H.T*H)))
...            *np.linalg.inv(i*w02 + H.T*dG)*H.T)
>>> np.allclose(B1, B2)
True

From these expressions we can determine the range and domain of the final
inverse Jacobian.  Let the previous inverse be expressed as $\mat{B} = \mat{1} +
\ket{\alpha}\bra{\beta}$.

Method 1a)
++++++++++

For method 1a) $\ket{H} = \ket{\d\mat{X}}\mat{w}^2$. Choosing $\mat{\Gamma} =
\ket{\uvect{H}_\perp}$ gives the following, which reduces to Broyden's original
"Good" method in the limit of $w_0 \rightarrow 0$ and keeping exactly one
previous step.

.. math::
    \mat{B} = \mat{1} + f(\mat{B} - \mat{1})
    + \left[\ket{\d\mat{X}} - \ket{\d\mat{G}} 
            - f(\mat{B} - \mat{1})
               (\ket{\d\mat{G}} 
                + w_0^2\ket{\d\mat{X}}
                  \braket{\d\mat{X}|\d\mat{X}}^{-1}\mat{w}^{-2})\right]
      \frac{1}{\mat{1}w_0^2 + \mat{w}^2
               \braket{\d\mat{X}|\d\mat{G}}}\mat{w}^2\bra{\d\mat{X}}.


The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the domain is
$\Sp(\bra{\beta}, \bra{\d\mat{X}}, \bra{\d\mat{G}})$.  Choosing $\mat{\Gamma} =
\ket{\d\uvect{G}_\perp}$ gives the following:

.. math::
    \mat{B} = \mat{1} + 
     f(\mat{B} - \mat{1})\left(
         \mat{1} - \ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
    + (\ket{\d\mat{X}} - \ket{\d\mat{G}})
      \frac{1}{\mat{1}w_0^2 + \mat{w}^2\braket{\d\mat{X}|\d\mat{G}}}\mat{w}^2
      \left((1-f)\bra{\d\mat{X}} 
       + f\braket{\d\mat{X}|\d\uvect{G}}\bra{\d\uvect{G}}\right)

The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the
domain is $\Sp(\bra{\beta}, \bra{\d\mat{X}})$.

Method 2b)
++++++++++

Both expressions are the same for method 2b) since $\ket{H} = \ket{\d\mat{G}}$
and the final expression for $\mat{\Gamma} = \ket{\d\uvect{G}_\perp}$ is the
following, which reduces to Broyden's original "Bad" method in the limit of $w_0
\rightarrow 0$ and keeping exactly one previous step.

.. math::
    \mat{B} = \mat{1} + 
     f(\mat{B} - \mat{1})\left(\mat{1} 
      - \ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
    + (\ket{\d\mat{X}} - \ket{\d\mat{G}})
      \frac{1}{\mat{1}w_0^2 + 
               \mat{w}^2\braket{\d\mat{G}|\d\mat{G}}}\mat{w}^2\bra{\d\mat{G}}.

The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the domain is
$\Sp(\bra{\beta}, \bra{\d\mat{G}})$.

.. note:: These reduce to the following in the limit $f \rightarrow 0$, where
   one sets $w_r\rightarrow 0$.  This limit is a "stateless" version in which
   the Jacobian does not depend on the previous Jacobian (however, in
   preliminary investigations, this has not performed well):

   .. math::
      \mat{B}_{\text{1a)}} &= \mat{1} + 
        (\ket{\d\mat{X}} - \ket{\d\mat{G}})\cdot
        \frac{1}{w_0^2\mat{1} + \mat{w}^2\cdot\braket{\d\mat{X}|\d\mat{G}}}
        \cdot\mat{w}^2\cdot\bra{\d\mat{X}}\\
      \mat{B}_{\text{2b)}} &= \mat{1} +
        (\ket{\d\mat{X}} - \ket{\d\mat{G}})\cdot
        \frac{1}{w_0^2\mat{1} + \mat{w}^2\cdot\braket{\d\mat{G}|\d\mat{G}}}
        \cdot\mat{w}^2\cdot\bra{\d\mat{G}}

   >>> np.random.seed(0)
   >>> w02 = 0.1
   >>> N = 10
   >>> m = 5
   >>> I = np.eye(N)
   >>> dG = np.mat(np.random.rand(N,m))
   >>> dX = np.mat(np.random.rand(N,m))
   >>> B = np.mat(np.random.rand(N,N))
   >>> R = dX - B*dG
   >>> w2 = np.mat(np.diag(np.random.rand(m)**2))
   >>> B1a = B + (R*w2*dX.T + (I - B)*w02)*np.linalg.inv(
   ...     B*dG*w2*dX.T + B*w02)*B
   >>> B2b = B + (R*w2*dG.T + (I - B)*w02)*np.linalg.inv(
   ...     dG*w2*dG.T + I*w02)
   >>> B1a_ = I + (dX - dG)*np.linalg.inv(
   ...     np.eye(m)*w02 + w2*dX.T*dG)*w2*dX.T
   >>> B2b_ = I + (dX - dG)*np.linalg.inv(
   ...     np.eye(m)*w02 + w2*dG.T*dG)*w2*dG.T
   >>> np.allclose(B1a, B1a_), np.allclose(B2b, B2b_)
   (True, True)
   

We begin by representing the Jacobian and inverse in a limited
subspace as follows:

.. math::
   \mat{J} &= \mat{1} + \ket{\uvect{\alpha}}\mat{j}\bra{\uvect{\beta}},\\
   \mat{B} &= \mat{1} + \ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}}.

The inversion relation gives

.. math::
   \mat{b} &= \frac{-1}{\mat{j}^{-1} +
                       \braket{\uvect{\beta}|\uvect{\alpha}}},\\
   \mat{j} &= \frac{-1}{\mat{b}^{-1} +
                       \braket{\uvect{\beta}|\uvect{\alpha}}}.
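
A quick numerical check of this inversion relation with randomly chosen
small matrices (illustrative sketch)::

   import numpy as np
   from numpy.linalg import inv, qr

   np.random.seed(6)
   N, n = 10, 4
   alpha = qr(np.random.rand(N, n))[0][:, :n]  # orthonormal |alpha>
   beta = qr(np.random.rand(N, n))[0][:, :n]   # orthonormal |beta>
   j = np.random.rand(n, n)

   J = np.eye(N) + alpha.dot(j).dot(beta.T)
   b = -inv(inv(j) + beta.T.dot(alpha))
   B = np.eye(N) + alpha.dot(b).dot(beta.T)

   assert np.allclose(J.dot(B), np.eye(N))     # B is indeed the inverse of J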


We drop the use of the $\mat{\Gamma}$ matrix, instead relying on the
projectors $\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$ and
minimizing the following

.. math::
   &\min_{\mat{J}}\left(\sum_{j}
      \Norm{(\ket{\d{G}_i} - \mat{J}\ket{\d{X}_i})\mat{w}_{ij}}_{2}^{2}
      + w_0^{2} \norm{\mat{J} - \mat{J}_{0}}_{F}^{2}
      + w_r^{2} \norm{\mat{J} - \mat{1}}_{F}^{2}
    \right),\\
   &\min_{\mat{B}}\left(\sum_{j}
      \Norm{(\ket{\d{X}_{i}} - \mat{B}\ket{\d{G}_{i}})\mat{w}_{ij}}_{2}^{2}
      + w_0^{2} \norm{\mat{B} - \mat{B}_{0}}_{F}^{2}
      + w_r^{2} \norm{\mat{B} - \mat{1}}_{F}^{2}
    \right).

Here we have three weights.  The matrix $\mat{w}$ specifies the priority
of the secant conditions: this may be a full matrix, allowing for more
complicated preference schemes involving all $n(n-1)/2$ differences rather
than just the $n-1$ basis vectors.

.. note:: It is assumed that $\mat{w}$ is symmetric in the sense that
   $\mat{w}^2$ is used everywhere.  The input need not be symmetric
   and $\mat{w}^{2} \equiv \mat{w}\mat{w}^T$ will be used throughout.

.. note:: In order for this to work well, the Jacobian of the objective function
   must be well represented by the identity $\mat{1}$ plus a correction of
   low rank.  If, for a given problem, the Jacobian has the form 
   $\mat{J} \approx \mat{J}_0 + \ket{}\bra{}$ then one should rescale the
   function $\ket{\mat{G}} \rightarrow \mat{J}_0^{-1} \ket{\mat{G}}$ so that the
   leading term is the identity.  It is assumed here that this rescaling has
   been performed.

   This is related to the notion of "linear mixing" as an iterative scheme:

   .. math::
      x \rightarrow (1-\alpha)x + \alpha F(x).

   If this is known to converge, then one can probably assume that $\mat{B}
   \approx \alpha\mat{1} + \ket{}\bra{}$ and one should rescale the objective
   $G(x) \rightarrow \alpha G(x)$.  If this has been done consistently, then
   pulling the Jacobian back toward the identity with a non-zero $w_r$ will pull
   the Newton iteration back toward the linear mixing which should restore
   convergence.

These two minimizations correspond to methods 1a) and 2b) respectively, and give:

.. math::
   \mat{j}_{\text{1a)}} &=
     \left\{
       \braket{\uvect{\alpha}|\d\mat{G}-\d\mat{X}}\mat{w}^2
       \braket{\d\mat{X}|\uvect{\beta}}
       +
       w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}}
              \mat{j}_{0}
              \braket{\uvect{\beta}_{0}|\uvect{\beta}}
     \right\}
   \cdot
   \frac{1}{\braket{\uvect{\beta}|\d\mat{X}}\mat{w}^2
            \braket{\d\mat{X}|\uvect{\beta}} + w_{0}^2 + w_{r}^2},\\
   \mat{b}_{\text{2b)}} &= 
     \left\{
       \braket{\uvect{\alpha}|\d\mat{X}-\d\mat{G}}\mat{w}^2
       \braket{\d\mat{G}|\uvect{\beta}}
       +
       w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}}
              \mat{b}_{0}
              \braket{\uvect{\beta}_{0}|\uvect{\beta}}       
     \right\}
   \cdot
   \frac{1}{\braket{\uvect{\beta}|\d\mat{G}}\mat{w}^2
            \braket{\d\mat{G}|\uvect{\beta}} + w_{0}^2 + w_{r}^2}.

From these formulae, it is clear that no information will be lost if we
have the following:

  1a)
    .. math::
       \ket{\uvect{\alpha}} &=
         \Sp(\ket{\uvect{\alpha}_{0}}, \ket{\d\mat{X}-\d\mat{G}}),\\
       \ket{\uvect{\beta}} &=
         \Sp(\ket{\uvect{\beta}_{0}}, \ket{\d\mat{X}}).
  2b)
    .. math::
       \ket{\uvect{\alpha}} &=
         \Sp(\ket{\uvect{\alpha}_{0}}, \ket{\d\mat{X}-\d\mat{G}}),\\
       \ket{\uvect{\beta}} &=
         \Sp(\ket{\uvect{\beta}_{0}}, \ket{\d\mat{G}}).

The purpose of the $w_{0}$ term here is to minimize the change of the
approximation from the previous result.  This is important for the
representation of the Jacobian in the subspace spanned by
$\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$ that lies outside of
the space of secant conditions.  The parameter $w_{r}$ is new here and
"pulls" the solution back toward the identity $\mat{1}$, which ideally
describes a convergent (but slow) method.  When large, this simulates a
"restart" -- resetting the Jacobian to the identity -- and thus should
be used when convergence is poor (or there is a divergence).

The numerical issue here is dealing with poorly behaved matrices.  To
deal with this, we introduce a parameter `singular_tol` with the goal
that all singular values of the dyadic $\mat{b}$ of the final inverse
Jacobian lie between `singular_tol` and `1/singular_tol`.  To do this,
all matrix multiplications of the smaller matrices are done using the
SVD, with singular values outside the range of `singular_tol**2`
truncated in intermediate calculations and a final truncation within
the required range.
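
The following sketch shows the kind of clipping intended here;
`clip_singular_values` is a hypothetical helper used only for illustration
(the actual implementation applies the truncation to the small matrices as
described above)::

   import numpy as np

   def clip_singular_values(m, singular_tol=1e-6):
       # Keep the singular values of a small square matrix m within
       # [singular_tol, 1/singular_tol] (hypothetical helper).
       U, s, Vt = np.linalg.svd(m)
       s = np.clip(s, singular_tol, 1.0/singular_tol)
       return U.dot(np.diag(s)).dot(Vt)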



Memory-limited Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To limit the amount of memory required, we store the following:

$\ket{\mat{X}}$, $\ket{\mat{G}}$ :
   $N \times n$ matrices: the columns of which are the previous steps
   and function evaluations.  (The oldest steps may be replaced and
   indices used to keep track of the ordering to minimize the
   requirement for working with this array.)
$\ket{\uvect{A}}$, $\ket{\uvect{B}}$ :
   $N \times m - (n-1)$ matrices representing the extra basis vectors
   required to span the full space of $\ket{\uvect{\alpha}}$ and
   $\ket{\uvect{\beta}}$.

In order to prevent working with the large matrices, we form the
following matrices:

.. math::
   \begin{pmatrix}
     \braket{\mat{X}-\mat{G}|\mat{X}-\mat{G}} 
     & \braket{\uvect{A}|\mat{X}-\mat{G}}\\
     \braket{\mat{X}-\mat{G}|\uvect{A}} 
     & \braket{\uvect{A}|\uvect{A}}
   \end{pmatrix},
   \qquad
   \begin{pmatrix}
     \braket{\mat{X}|\mat{X}} 
     & \braket{\uvect{B}|\mat{X}}\\
     \braket{\mat{X}|\uvect{B}} 
     & \braket{\uvect{B}|\uvect{B}}
   \end{pmatrix},
   \qquad
   \begin{pmatrix}
     \braket{\mat{G}|\mat{G}} 
     & \braket{\uvect{B}|\mat{G}}\\
     \braket{\mat{G}|\uvect{B}} 
     & \braket{\uvect{B}|\uvect{B}}
   \end{pmatrix}.

.. note:: This is predicated on the following analysis for extracting
   the span of a set of vectors.  Consider the SVD of 
   a set of vectors:

   .. math::
      \ket{\mat{X}} = 
      \begin{pmatrix}
        \ket{\uvect{U}} & \ket{\uvect{U}_{\perp}}
      \end{pmatrix}
      \begin{pmatrix}
      \mat{\sigma}\\
        & \mat{0}
      \end{pmatrix}
      \begin{pmatrix}
        \mat{V}^{\dagger}\\
        \mat{V}_{\perp}^{\dagger}
      \end{pmatrix}
    
   The desired basis $\ket{\uvect{U}} = \Sp(\ket{\mat{X}})$ may be
   expressed in terms of the original vectors as

   .. math::
      \ket{\uvect{U}} = \ket{\mat{X}}\mat{V}\mat{\sigma}^{-1}.

   Hence, we may determine this from the original matrix by
   considering the eigenvalue decomposition of the small matrix

   .. math::
      \braket{\mat{X}|\mat{X}} = 
      \begin{pmatrix}
        \mat{V} & \mat{V}_{\perp}
      \end{pmatrix}
      \begin{pmatrix}
        \mat{\sigma}^2\\
          & \mat{0}
      \end{pmatrix}
      \begin{pmatrix}
        \mat{V}^{\dagger}\\
        \mat{V}_{\perp}^{\dagger}
      \end{pmatrix}.

   A rank revealing Cholesky decomposition of $\braket{\mat{X}|\mat{X}} =
   \mat{r}^{T}\mat{r}$ may also be similarly used.
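
   This construction is easy to check numerically (a small sketch)::

      import numpy as np

      np.random.seed(7)
      N, n = 12, 4
      X = np.random.rand(N, n)
      s2, V = np.linalg.eigh(X.T.dot(X))     # small n x n eigenproblem
      U = X.dot(V)/np.sqrt(s2)               # |U> = |X> V sigma^{-1}

      assert np.allclose(U.T.dot(U), np.eye(n))    # orthonormal
      assert np.allclose(U.dot(U.T).dot(X), X)     # spans |X>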

   A corollary is that, given an orthonormal basis
   $\ket{\uvect{\alpha}_{0}}$ and a new set of vectors $\ket{\beta}$,
   we may construct the augmented basis by diagonalizing the small
   matrix

   .. math::
      \braket{\beta|\beta} - 
      \braket{\beta|\uvect{\alpha}_{0}}
      \braket{\uvect{\alpha}_{0}|\beta} 
      &= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\
      \begin{pmatrix}
        \ket{\uvect{\alpha}_{0}} &
        \ket{\uvect{\alpha}_{1}}
      \end{pmatrix} &=
      \begin{pmatrix}
        \ket{\uvect{\alpha}_{0}} &
        \ket{\beta}
      \end{pmatrix}
      \cdot
      \begin{pmatrix}
        \mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v}\mat{\sigma}^{-1}\\
        \mat{0} & \mat{v}\mat{\sigma}^{-1}
      \end{pmatrix}.

   >>> np.random.seed(0)
   >>> alpha_0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5]
   >>> beta = np.random.rand(10, 2)
   >>> alpha_0_beta = np.dot(alpha_0.T, beta)
   >>> s2, v = np.linalg.eigh(np.dot(beta.T, beta) 
   ...                        - np.dot(alpha_0_beta.T, alpha_0_beta))
   >>> s = np.sqrt(s2)
   >>> alpha_1 = np.dot(beta - np.dot(alpha_0, alpha_0_beta), v/s[None,:])
   
   $\ket{\uvect{\alpha}_{0}}$ and $\ket{\uvect{\alpha}_{1}}$ are
   orthogonal and normalized:

   >>> np.allclose(np.dot(alpha_1.T, alpha_0), 0)
   True
   >>> np.allclose(np.dot(alpha_1.T, alpha_1), np.eye(2))
   True

For large systems, we cannot store the full matrices.  If the number
of iterations is small, then we can form all of the matrices in terms
of dyads (see :class:`DyadicSum`), however, if the number of
iterations becomes too large, then we must truncate the spaces.  Let
us consider the following information, with $N$ being the (large) size
of the space and $n$ the number of previous steps maintained.

$\ket{x}$, $\ket{x_0}$, $\ket{g}$, $\ket{g_0}$ : 
   1-d arrays of length `N` of the current and previous steps along
   with the value of the objective function at these points.
$\ket{\uvect{X}}$, $\ket{\uvect{G}}$ :
   $N \times (n-1)$ arrays of orthonormal basis vectors spanning the
   space of difference vectors $\ket{x_i} - \ket{x_j}$ and $\ket{g_i}
   - \ket{g_j}$ respectively.
$\ket{\uvect{B}_x}$, $\ket{\uvect{B}_g}$ :
   $N \times m$ arrays of orthonormal basis vectors for representing
   the inverse Jacobian.

Given these, we can express everything in terms of various $n \times
n$ matrices in these bases.  Thus, the inverse Jacobian is expressed
as:

.. math::
   \mat{B} = \mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}.

One might like to include an optional preconditioner $\mat{B}
\rightarrow \mat{B}\op{B}_0$.  For method 1a) the preconditioner
factors to the right: to implement this, the simplest approach is to
simply precondition the initial function $G(x) \rightarrow
\op{B}_{0}G(x)$.  For method 2b) a simple factorization does
not work quite as well.  [Johnson:1988]_ also suggests allowing for a
factor $\alpha$ to reduce the weight of the initial identity
approximation:

.. math::
   \mat{B} = \alpha\mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}.

We now express everything in terms of the basis vectors
$\ket{\uvect{X}}$ and $\ket{\uvect{G}}$.  Thus, we define the
following matrices:


.. math::
   \ket{\d\mat{X}} &= \ket{\uvect{X}}\mat{d}_x, \\
   \ket{\d\mat{G}} &= \ket{\uvect{G}}\mat{d}_g.

This allows us to express the equations as:

.. math::
   \mat{B}\ket{\uvect{G}} = \ket{\uvect{G}}\alpha
    + \ket{\uvect{X}}\mat{b}

.. math::
   \mat{B}_{\text{1a)}} &\rightarrow
     \mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}
     +
     \left(\ket{\uvect{X}}\mat{d}_x 
           - \mat{B}_{0}\ket{\uvect{G}}\mat{d}_g\right)
     \cdot\mat{w}^2\cdot\mat{d}_{x}^{T}\bra{\uvect{X}}\cdot\frac{1}{
       \mat{B}_{0}\ket{\uvect{G}}
       \mat{d}_g\mat{w}^2\mat{d}_x^{T}\bra{\uvect{X}}
       + w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}\cdot
       (\mat{1} +  \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}), \\
   \mat{B}_{\text{2b)}} &= \mat{B}_{0}
     + (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
     \bra{\d\mat{G}}\cdot\frac{1}{
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
       + w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}


For extremely large systems, we wish to represent the inverse Jacobian
as a dyadic sum:

.. math::
   \mat{B} = (\mat{1} + \mat{U}\mat{\sigma}\mat{V}^{T})\mat{B}_{0}

where $\mat{U}$ and $\mat{V}$ are $N \times k$, where $N$ is the size
of the problem (large) and $k$ is the number of previous steps we
keep (small).  The main difficulty with these formulae is inverting the
large matrix.  The "Good" method makes this even more difficult
because the denominator is not symmetric while the "Bad" method --
apart from possibly suffering some of the numerical instabilities of
the "Bad" Broyden method -- does not easily allow the form of the
dyadic inverse Jacobian to be preserved (if $\mat{B}_{0}$ is
non-trivial).

.. _sec:truncation-algorithm:

Truncation Algorithm
--------------------
In the memory-limited implementation, one must choose how to discard
the previous steps.  This choice must be made in two places: the set of
previous steps $\ket{x}$ and $\ket{g}$ to be retained, and the
truncation of the current inverse Jacobian approximation $\mat{B}$.



In principle, given $n < N$ previous steps,
it is likely that all of the steps are linearly independent, and
so the minimization condition is irrelevant -- one simply solves the
$n$ secant equations in the subspace spanned by the difference
vectors.  (The subspace decomposition method has this property --
Johnson's method does not quite have this property when $0 < w_0$.)

Once one needs to limit 


.. [Broyden:1965] C. G. Broyden, `A Class of Methods for Solving
   Nonlinear Simultaneous Equations
   <http://www.jstor.org/stable/2003941>`_, Math. Comput. 19(92),
   577-593 (1965)

.. [Johnson:1988] D. D. Johnson, `Modified Broyden's method for
   accelerating convergence in self-consistent calculations
   <http://dx.doi.org/10.1103/PhysRevB.38.12807>`_, Phys. Rev. B
   38(18), 12807--12813 (1988)

.. [Bierlaire:2006] M. Bierlaire, F. Crittin, and M. Themans, `A
   multi-iterate method to solve systems of nonlinear equations
   <http://dx.doi.org/10.1016/j.ejor.2006.09.080>`_, European Journal
   of Operational Research 183(1), 20 -- 41 (2006)

Parameters
==========
A modification of the Newton step can be used to allow manual control
of some (small) subset of the parameters.  As an example (and in our
notation) we consider a set of chemical potentials :math:`\mu` and
the respective densities :math:`N`.  Allowing the chemical potentials
to change too soon, before the other parameters have relaxed, can throw
off an iteration, leading to wild oscillations.  Instead, one might
want to wait until a degree of convergence for fixed chemical
potentials has been achieved, and then let them vary.

We partition the problem and Jacobian as

.. math::
   \begin{pmatrix}
      \d{G}\\
      \d{N}
   \end{pmatrix} = 
   \mat{J}\cdot
   \begin{pmatrix}
     \d{x}\\
     \d{\mu}
   \end{pmatrix}

The full Broyden step would be to find :math:`\d{x}` and :math:`\d{\mu}`
from :math:`\d{G}` and :math:`\d{N}` using :math:`\mat{J}^{-1}`.
Instead, we need to find :math:`\d{x}` (and :math:`\d{N}`) from
:math:`\d{G}` and the specified "control" parameters :math:`\d{\mu}`.

To do this, we partition the inverse Jacobian as

.. math::
   \mat{J}^{-1} = \begin{pmatrix}
     \cdots & \cdots\\
     \cdots & \mat{d}
   \end{pmatrix}

One can solve the problem explicitly using the full partitioning, but in
practice the matrices are large and one can really only form the small
block :math:`\mat{d}` corresponding to the small number of control
parameters.  The following formulae allow one to compute the required
components using only this small matrix and the inverse Jacobian
(often stored in a compact dyadic representation):

.. math::
   \begin{pmatrix}
     \d{G}\\
     \d{N}
   \end{pmatrix} &= 
   \left(
     \mat{1} - \begin{pmatrix}
       \mat{0} & \mat{0}\\
       \mat{0} & \mat{d}^{-1}
     \end{pmatrix}
     \left[
       \mat{J}^{-1} - \mat{1}\right]
   \right)
   \cdot
   \begin{pmatrix}
     \d{G}\\
     \d{\mu}
   \end{pmatrix}, \\
   \begin{pmatrix}
     \d{x}\\
     \d{\mu}
   \end{pmatrix} &= 
   \mat{J}^{-1}
   \cdot
   \begin{pmatrix}
     \d{G}\\
     \d{N}
   \end{pmatrix}.

For calculational purposes we form 

.. math::
   \begin{pmatrix}
     a\\
     b
   \end{pmatrix} &= 
   \mat{J}^{-1}
   \cdot
   \begin{pmatrix}
     \d{G}\\
     \d{\mu}
   \end{pmatrix}, \\
   \d{N} &= \d{\mu} - \mat{d}^{-1}(b - \d{\mu}), \\
   \begin{pmatrix}
     \d{x}\\
     \d{\mu}
   \end{pmatrix} &= 
   \mat{J}^{-1}
   \cdot
   \begin{pmatrix}
     \d{G}\\
     \d{N}
   \end{pmatrix}.
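
A small numerical sketch of this two-solve procedure, using a random
stand-in for the inverse Jacobian (illustrative only)::

   import numpy as np

   np.random.seed(8)
   Nx, Nmu = 6, 2                     # "local" parameters x and controls mu
   Jinv = np.random.rand(Nx + Nmu, Nx + Nmu) + (Nx + Nmu)*np.eye(Nx + Nmu)
   d = Jinv[Nx:, Nx:]                 # small control-control block
   dG = np.random.rand(Nx)
   dmu = np.random.rand(Nmu)

   ab = Jinv.dot(np.concatenate([dG, dmu]))
   b = ab[Nx:]
   dN = dmu - np.linalg.solve(d, b - dmu)
   step = Jinv.dot(np.concatenate([dG, dN]))

   assert np.allclose(step[Nx:], dmu)    # the control part reproduces d(mu)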
   
.. todo:: Use :func:`scipy.linalg.fblas.dgemm` to allow matrix
   multiplications etc. to be done in place.  In general, minimize
   memory footprint and make everything work for very large matrices
   (maybe with an option if it is slower).

Sensitive Parameters
~~~~~~~~~~~~~~~~~~~~
In some situations, certain parameters may be considered "sensitive".
Typically, these are collective parameters.  For example, in problems
representing functions over a grid, one may have a couple of
parameters (such as chemical potentials) where the dependent quantity
to be set to zero is an integral of other quantities.  Premature
optimization of these parameters can cause problems.

A typical case is one where the other "local" parameters must first
achieve a certain degree of convergence before the dependent quantity
can be accurately estimated.  If one attempts to adjust the sensitive
parameters before this convergence has been achieved, then they
will fluctuate wildly.

What one needs to do is to hold these sensitive parameters fixed for a
while, and then adjust them once the local parameters have settled
down.  In order to ensure that these "noisy" parameters do not mess up
the Jacobian, one must call :meth:`Broyden.update` or
:meth:`Broyden.update_J` with an appropriate argument `skip_inds` that
identifies the noisy parameters.  One may use this in conjunction with
the previous techniques to provide steps that are consistent with
holding these sensitive parameters fixed until the "noise" is
sufficiently reduced that updates can be resumed.

Characterizing this "noise" can be tricky: one might for example look
for the change in error from a previous iteration.  However, if a very
small step size has been taken (for example as a result of trying to
improve the Jacobian or because of a line-search), then this could
result in an incorrect assessment about the degree of convergence.

The simplest criterion is for the case of models where holding the
sensitive parameters fixed (and ignoring the error in this component)
will lead to a convergent iteration.  Here the natural test for "noise"
is to check for a weak level of convergence of the other parameters
(maybe in conjunction with the change in the parameter error from
iteration to iteration).  This is equivalent to an efficient
method of solving the following recursive problem: Define an outer
problem with only the sensitive parameters.  The objective function
solves the inner problem with these parameters fixed, then the outer
problem solves for the sensitive parameters.

.. _sec-notes:

Linear Algebra
==============
Here we summarize some of the linear algebra required here.

Subspace Reduction
~~~~~~~~~~~~~~~~~~
One of the problems we shall encounter is that we have a matrix
expressed in a basis as

.. math::
   \begin{pmatrix} \ket{a_0} & \ket{a_1} \end{pmatrix}
   \begin{pmatrix} 
     \mat{c}_{11} & \mat{c}_{12}\\
     \mat{c}_{21} & \mat{c}_{22}
   \end{pmatrix}
   \begin{pmatrix} \bra{b_0} \\ \bra{b_1} \end{pmatrix}

and we need to reduce the rank of the subspaces $\ket{a_1}$ and
$\bra{b_1}$ while holding subspaces $\ket{a_0}$ and $\bra{b_0}$
fixed.  I am not yet sure what the optimal solution to this problem
is, but a very good solution can be obtained by performing two SVD's:

.. math::
   \begin{pmatrix}
     \mat{c}_{21} & \mat{c}_{22}
   \end{pmatrix}
   &= 
   \mat{q}_{a}\mat{\sigma}_{a}\mat{V}^{T}_{a},\\
   \begin{pmatrix}
     \mat{c}_{12} \\ \mat{c}_{22}
   \end{pmatrix}
   &= 
   \mat{V}_{b}
   \begin{pmatrix} 
     \mat{\sigma}_{b}\\
     \mat{0}
   \end{pmatrix}
   \mat{q}_{b}^{T}.

The truncation scheme is then to use the two transformations obtained
to determine the rearrangement.  This typically approaches the optimal
truncation to within a couple of percent (though a tuned matrix could
probably break this).

To demonstrate this, we project out one state from a 4 by 4 matrix.
We compute the optimal projection using a non-linear solver:

>>> np.random.seed(0)
>>> def my_projection(A):
...     A = np.asmatrix(A)
...     P = np.asmatrix(np.diag([1.0, 1.0, 1.0, 0.0]))
...     qa, _, _ = np.linalg.svd(A[2:,:])
...     _, _, qbt = np.linalg.svd(A[:,2:])
...     Qa = np.eye(4)
...     Qbt = np.eye(4)     # separate arrays (avoid aliasing Qa and Qbt)
...     Qa[2:,2:] = qa
...     Qbt[2:, 2:] = qbt
...     return Qa*P*Qa.T*A*Qbt.T*P*Qbt
>>> solver = sp.optimize.fmin_powell
>>> def optimal_projection(A):
...     P = np.asmatrix(np.diag([1.0, 1.0, 1.0, 0.0]))
...     def trans(A, x):
...         t1, t2 = x
...         qa = np.mat([[np.cos(t1), -np.sin(t1)],
...                      [np.sin(t1), np.cos(t1)]])
...         qbt = np.mat([[np.cos(t2), -np.sin(t2)],
...                       [np.sin(t2), np.cos(t2)]])
...         Qa = np.eye(4)
...         Qbt = np.eye(4)     # separate arrays (avoid aliasing)
...         Qa[2:,2:] = qa
...         Qbt[2:,2:] = qbt
...         return Qa.T*P*Qa*A*Qbt.T*P*Qbt
...     def f(x):
...         return np.linalg.norm(trans(A,x) - A, 2)
...     x = solver(f, [0, 0], disp=False)
...     return trans(A, x)
>>> def rel_err(A):
...     e1 = np.linalg.norm(my_projection(A) - A, 2)
...     e0 = np.linalg.norm(optimal_projection(A) - A, 2)
...     return(abs(e0-e1)/abs(e0))
>>> errs = []
>>> for n in xrange(10):
...     #print("%i of 10" % n)
...     errs.append(rel_err(np.random.rand(4,4) - np.random.rand(4,4)))
>>> np.mean(errs), np.std(errs), np.max(errs) < 0.1 # doctest: +SKIP
(0.01..., 0.02..., True)





Alternatively, we note that we can perform a block diagonalization
of the matrix using the SVD of the lower-right block $\mat{c}_{22} =
\mat{u}\mat{\sigma}\mat{v}^{T}$ and truncating the basis based on
this.  This follows from the following decomposition:

.. math::
   \begin{pmatrix} 
     \ket{a_0} &
     \ket{a_0}\mat{c}_{12}\mat{c}_{22}^{-1} + \ket{a_1}
   \end{pmatrix}
   \begin{pmatrix} 
     \mat{c}_{11} - \mat{c}_{12}\mat{c}_{22}^{-1}\mat{c}_{21} \\
     & \mat{c}_{22}
   \end{pmatrix}
   \begin{pmatrix}
     \bra{b_0} \\ 
     \mat{c}_{22}^{-1}\mat{c}_{21}\bra{b_0} + \bra{b_1}
   \end{pmatrix}
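
Written out as plain matrices (taking the bases to be columns of the
identity), this factorization is easy to verify numerically (sketch)::

   import numpy as np

   np.random.seed(9)
   n0, n1 = 3, 2
   c = np.random.rand(n0 + n1, n0 + n1)
   c11, c12 = c[:n0, :n0], c[:n0, n0:]
   c21, c22 = c[n0:, :n0], c[n0:, n0:]
   c22inv = np.linalg.inv(c22)

   left = np.bmat([[np.eye(n0), c12.dot(c22inv)],
                   [np.zeros((n1, n0)), np.eye(n1)]])
   mid = np.bmat([[c11 - c12.dot(c22inv).dot(c21), np.zeros((n0, n1))],
                  [np.zeros((n1, n0)), c22]])
   right = np.bmat([[np.eye(n0), np.zeros((n0, n1))],
                    [c22inv.dot(c21), np.eye(n1)]])

   assert np.allclose(left.dot(mid).dot(right), c)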

Notes
=====
Here we summarize some problems, convergence characteristics etc.

1) In one case we got stuck in a loop adding a point that was already
   included. What happened is that we were using
   :attr:`JinvBroyden.x_method` `='min'` which was choosing a
   particular point.  A step was made from this point producing a new
   point which was added, but the new point did not have a lower
   $\norm{G(x)}$ so the same point was used over and over again with
   no significant update to the Jacobian.

   A similar problem can occur with :attr:`JinvBroyden.replace_method`
   `='largest'`.  The resolution consists of one of the following:

   a) Using :attr:`JinvBroyden.x_method` `= 'recent'` and
      :attr:`JinvBroyden.replace_method` `='oldest'` to ensure
      that all the points are cycled through and that the newest point
      is used.  These are now the defaults.
   b) Using a line-search to ensure that a downhill step is chosen to
      ensure that the newest points have the smallest error.

To Do
=====
.. todo:: Check why things work better with limited memory options.
   I.e. keeping a large :attr:`JinvBroyden.n_prev` can make things worse.

"""
from __future__ import division

__all__ = ['DyadicSum', 'JinvBroyden', 'ExtendedBroyden', 
           '_JinvExtendedBroyden', 'JinvExtendedBroydenFull']

import warnings
import copy
import math

import numpy as np

# We import these here in case we want to change in future to use more
# efficient implementations.
from numpy import dot
from numpy.linalg import eigh

import scipy as sp

from zope.interface import implements

import mmf.objects
from mmf.objects import StateVars, process_vars
from mmf.objects import Required, Computed, Delegate

import mmf.math.linalg as mmf_linalg

from mmf.solve.solver_interface import IJinv


_FINFO = np.finfo(float)
_EPS = _FINFO.eps
_TINY = _FINFO.tiny
_SINGULAR_TOL = 1e-6

[docs]class DyadicSum(StateVars): r"""Represents a matrix as a sum of $n$ dyads of length $N$: .. math:: \mat{M} = \mat{1} + \sum_{n}\ket{a_n}\sigma_n\bra{b_n} = \mat{1} + \ket{\mat{A}}\mat{\sigma}\bra{\mat{B}}. The two sets of bases are stored in $N \times n$ matrices $\ket{\mat{A}}$ and $\ket{\mat{B}}$ and the $n\times n$ matrix $\mat{\sigma}$. If :attr:`use_svd` is `True`, then the singular value decomposition of $\mat{\sigma}$ is used to keep $\mat{\sigma}$ diagonal with positive entries and only these diagonals are stored. .. todo:: Perform an analysis of the inverse jacobian, changing bases if needed to make the dyads linearly independent. .. note:: We assume everything is real here: no complex conjugation is performed with transposition. Examples -------- >>> np.random.seed(3) >>> N, n, m = 20, 5, 3 >>> At = np.random.random((N, n+m)) >>> B = np.random.random((n+m, N)) >>> at = At[:, :n] >>> b = B[:n, :] >>> sigma = np.eye(n) >>> s = DyadicSum(at=at, b=b, sigma=sigma, n_max=n) >>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(at, b)) True Now we add the remaining terms. The dyadic sum should be the best rank n approximation of `At*B`: >>> s.add_dyad(at=At[:, n:], b=B[n:, :], sigma=np.eye(m)) >>> U, d, Vt = np.linalg.svd(np.dot(At, B)) >>> d[n:] = 0 # Make rank deficient approx >>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(U*d, Vt)) True Note that the approximation is not exact: >>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(At, B)) False Examples -------- >>> J = DyadicSum() >>> x = np.mat([[1.0], ... [2.0]]) >>> J*x - x matrix([[ 0.], [ 0.]]) >>> b = np.mat([[1.0, 0.0]]) >>> a = np.mat([[0.0], ... [2.0]]) >>> J.add_dyad(a, b) >>> J*x matrix([[ 1.], [ 4.]]) >>> x.T*J matrix([[ 5., 2.]]) >>> import mmf.archive >>> A = mmf.archive.Archive() >>> A.insert(J=J) 'J' >>> s = str(A) >>> d = {} >>> exec(s, d) >>> np.allclose(J.asarray(), d['J'].asarray()) True Here is an example of limiting the size of the dyadic. >>> np.random.seed(3) >>> s1 = np.eye(10) >>> s2 = DyadicSum(n_max=10) >>> for n in xrange(100): ... a = np.random.random(10) ... b = np.random.random(10) ... _add_dyad(s1, a, b) ... s2.add_dyad(a, b) This should still be exact because the space is only ten dimensional. >>> np.allclose(s1, s2.asarray()) True The dynamic_range parameter is a more mathematical way of keeping the basis fixed. It ensures that the most significant singular values will be kept. >>> s1 = DyadicSum(dynamic_range=0.01) >>> s2 = DyadicSum() >>> for n in xrange(100): ... a = np.random.random(20) ... b = np.random.random(20) ... s1.add_dyad(a, b) ... s2.add_dyad(a, b) >>> d1 = np.linalg.svd(s1.asarray(), compute_uv=False) >>> d2 = np.linalg.svd(s2.asarray(), compute_uv=False) >>> (abs(d1 - d2)/max(d2)).max() < 0.02 True """ _state_vars = [ ('at', np.empty((0, 0), dtype=float), r"""`(N, n`) array $\ket{\mat{A}}$. We use the ket form here because that is most often used and more efficient (presently, :func:`numpy.dot` makes copies if the arrays are not both `C_CONTIGUOUS`."""), ('b', np.empty((0, 0), dtype=float), r"""`(n, N`) array $\bra{\mat{B}}$."""), ('sigma', np.empty(0, dtype=float), r"""`(n, n)` array $\mat{\sigma}$ or `(n, )` array $\diag\mat{\sigma}$ if :attr:`use_svd`."""), ('n_max', np.inf, r"""Maximum number $n$ of Dyads to store. 
If this is finite, then the $N \times n$ matrices are pre-allocated, and when $m$ new dyads are added, they replace the least significant $m$ dyads, requiring no more allocations; otherwise, the dyads are added as required."""), ('inplace', False, r"""If :attr:`n_max` is finite, then this signifies that the matrices :attr:`at` and :attr:`b` should be allocated only once and all operations should be done inplace on these. In particular, when new dyads are added, the least significant components of the old matrix are dropped *first* and then the dyads are added. If this is `False`, then the new dyads are added first and *then* the least-significant components are dropped. This latter form requires more storage, but may be more accurate. .. note:: Presently, the inplace operations cannot be performed because `scipy` does not expose the correct BLAS routines, so the only purpose of this flag is to somewhat limit memory usage, but this is probably quite ineffective at the moment."""), ('use_svd', True, r"""If `True`, then :meth:`orthogonalize` will be run whenever a new dyad is added. (Generally a good idea, but can be slow if there are a lot of dyads)."""), ('dynamic_range', _EPS, r"""Each call to :meth:`orthogonalize` will only keep the singular values within this fraction of the highest."""), ] process_vars()
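As a rough, self-contained illustration of this storage layout (plain numpy, not this class), the matrix represented by the stored factors is the identity plus a rank-$n$ product::

    import numpy as np

    np.random.seed(0)
    N, n = 6, 2                        # vector length and number of dyads
    at = np.random.rand(N, n)          # columns |a_n>
    b = np.random.rand(n, N)           # rows <b_n|
    sigma = np.random.rand(n)          # diag(sigma) in the use_svd form

    # Dense matrix represented by these factors; the class itself never
    # needs to build this N x N array.
    M = np.eye(N) + np.dot(at*sigma, b)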
[docs] def __init__(self, *v, **kw): if 'at' in kw or 'b' in kw or 'sigma' in kw: self.at = np.ascontiguousarray(self.at) self.b = np.ascontiguousarray(self.b) assert(self.at.shape[0] == self.b.shape[1]) assert ((self.at.shape[1], self.b.shape[0]) == self.sigma.shape or (self.at.shape[1], self.b.shape[0]) == self.sigma.shape*2) self.orthogonalize() if 'n_max' in kw: if self.n_max < np.inf: self.use_svd = True if len(self.b) != self.n_max: # Reset n_max self.orthogonalize() if ('use_svd' in kw and not self.use_svd and self.n_max < np.inf): raise ValueError( "Finite n_max = %i requires svd." % (self.n_max, ))
[docs] def reset(self): """Reset the matrix. Examples -------- >>> M = DyadicSum() >>> M.asarray() array(1.0) >>> M.add_dyad([1, 0], [1, 0]) >>> M.add_dyad([0, 2], [1, 0]) >>> M.add_dyad([3, 0], [0, 1]) >>> M.add_dyad([0, 4], [0, 1]) >>> M.asarray() array([[ 2., 3.], [ 2., 5.]]) >>> M.reset() >>> M.asarray() array(1.0) """ self.suspend() self.at = np.empty((0, 0), dtype=float) self.b = np.empty((0, 0), dtype=float) self.sigma = np.empty(0, dtype=float) self.resume()
[docs] def add_dyad(self, at, b, sigma=None): r"""Add `|a>d<b|` to `J`. Parameters ---------- at, b : array These should be either 1-d arrays (for a single dyad) or 2-d arrays (`(N, m)` for `at` and `(m, N)` for `b`) arrays representing the bra's `<a|` and `<b|`. sigma : array Matrix linking `at` and `b`. Must be provided if `len(at.T)` and `len(b)` are different. Examples -------- Here is a regression test for a bug: adding a set of dyads that would push the total number beyond `n_max`. >>> np.random.seed(0) >>> at = np.random.random((10, 3)) >>> b = np.random.random((3, 10)) >>> s = DyadicSum(n_max=2) >>> s.add_dyad(at=at, b=b) """ # Checking at = np.asarray(at) b = np.asarray(b) if 1 == len(at.shape): at = at.reshape((len(at), 1)) if 1 == len(b.shape): b = b.reshape((1, len(b))) a = at.T if sigma is None: assert len(a) == len(b) sigma = np.eye(len(a), len(b)) else: sigma = np.asarray(sigma) if 1 == len(sigma.shape): sigma = np.diag(sigma) assert len(a) == sigma.shape[0] assert len(b) == sigma.shape[1] assert a.shape[1] == b.shape[1] if 0 < len(self.b): assert self.at.shape[0] == at.shape[0] assert self.b.shape[1] == b.shape[1] if self.n_max == np.inf or not self.inplace: # Just add them at_ = np.hstack([self.at, at]) b_ = np.vstack([self.b, b]) self_sigma = self.sigma if 1 == len(self_sigma.shape): self_sigma = np.diag(self_sigma) sigma_ = mmf_linalg.block_diag((self_sigma, sigma)) else: assert len(a) <= self.n_max assert len(b) <= self.n_max at_ = self.at b_ = self.b sigma_ = np.diag(self.sigma) na, nb = sigma.shape sigma_[-na:, -nb:] = sigma at_[:, -na:] = at b_[-nb:, :] = b else: # Dyadic sum is empty: just add the new ones at_ = at b_ = b sigma_ = sigma self.orthogonalize(at=at_, b=b_, sigma=sigma_)
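Conceptually, when :attr:`n_max` is not active, :meth:`add_dyad` just stacks the new factors beside the old ones and lets :meth:`orthogonalize` recompress the result.  A rough plain-numpy sketch of the stacking step (using :func:`scipy.linalg.block_diag` so the snippet is self-contained; `mmf_linalg.block_diag` plays this role in the method above)::

    import numpy as np
    from scipy.linalg import block_diag

    np.random.seed(1)
    N = 8
    at_old, b_old, sig_old = np.random.rand(N, 2), np.random.rand(2, N), np.eye(2)
    at_new, b_new, sig_new = np.random.rand(N, 1), np.random.rand(1, N), np.eye(1)

    at_ = np.hstack([at_old, at_new])
    b_ = np.vstack([b_old, b_new])
    sigma_ = block_diag(sig_old, sig_new)

    # The stacked factors represent the sum of the two low-rank pieces.
    assert np.allclose(np.dot(at_, np.dot(sigma_, b_)),
                       np.dot(at_old, np.dot(sig_old, b_old)) +
                       np.dot(at_new, np.dot(sig_new, b_new)))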
[docs] def orthogonalize(self, at=None, b=None, sigma=None): r"""Perform a QR decomposition of the dyadic components to make them orthogonal. Notes ----- Let $\ket{\mat{A}} =$ :attr:`at` and $\bra{\mat{B}} =$ :attr:`b` and $\mat{\sigma} =$ :attr:`sigma`: .. math:: \ket{\mat{A}}\mat{\sigma}\bra{\mat{B}} = \ket{\uvect{Q}_a}\mat{r}_a\mat{\sigma}\mat{r}_b^T\bra{\uvect{Q}_b} = \ket{\uvect{Q}_a}\mat{u}\mat{d}\mat{v}^T\bra{\uvect{Q}_b} = \ket{\uvect{A}}\mat{d}\bra{\uvect{B}} In order to compute the QR factorization of $\ket{\mat{A}}$ without working with the large matrices, we could do a Cholesky decomposition on .. math:: \braket{\mat{A}|\mat{A}} &= \mat{r}_a^T\mat{r}_{a}\\ \ket{\uvect{Q}} &= \ket{\mat{A}}\mat{r}_{a}^{-1} .. note:: These should be nicely accelerated using the BLAS routines for performing the operations in place. In principle, :func:`numpy.dot` should do this, but at present there are issues with contiguity that force copies to be made. Also, SciPy does not yet expose all BLAS operations, in particular `_SYRK` is missing. We perform a QR decomposition of these .. math:: \mat{A} &= \mat{Q}_{A}\mat{R}_{A}, \\ \mat{B} &= \mat{Q}_{B}\mat{R}_{B} and then an SVD of the product of the R factors $\mat{U}\mat{D}\mat{V}^{\dagger} = \mat{R}_{A}\mat{R}_{B}^T$ to obtain the new basis: .. math:: \mat{M} = \mat{A}\mat{B}^{T} = \mat{Q}_{A}\mat{R}_{A}\mat{R}_{B}^{T}\mat{Q}_{B}^T = \left(\mat{Q}_{A}\mat{U}\right)\mat{D} \left(\mat{V}^{\dagger}\mat{Q}_{B}^T\right) .. math:: \mat{A}_{\text{new}} &= \mat{Q}_{A}\mat{U}\sqrt{\mat{D}}, \\ \mat{B}_{\text{new}} &= \mat{Q}_{B}\mat{V}^{*}\sqrt{\mat{D}}. """ if at is None: at = self.at if 0 == np.prod(at.shape): # Empty. Do nothing. This should only happen on # default initialization. return if b is None: b = self.b if sigma is None: sigma = self.sigma if 1 == len(sigma.shape): sigma = np.diag(sigma) cholesky = False if cholesky: # Cholesky version: avoids working with large arrays. # This could be optimized using BLAS and LAPACK here for # both inplace operations and to avoid making copies. # Right now there is no point in the added complication. aa = dot(at, at.T) bb = dot(b, b.T) la = np.linalg.cholesky(aa) lb = np.linalg.cholesky(bb) ra = la.T rb = lb.T at = sp.linalg.cho_solve(clow=(lb, True), b=at.T).T b = sp.linalg.cho_solve(clow=(lb, True), b=b) else: qa, ra = np.linalg.qr(at) # at = qa*ra qb, rb = np.linalg.qr(b.T) # b = rb.T*qb.T sigma = dot(ra, dot(sigma, rb.T)) if self.use_svd: u, d_, vt = np.linalg.svd(sigma) max_inds = np.where(d_/d_.max() >= self.dynamic_range)[0] if 0 < len(max_inds): max_ind = min(max_inds[-1] + 1, self.n_max) else: max_ind = self.n_max max_ind = min(max_ind, len(u)) at = dot(qa, u[:, :max_ind]) b = dot(vt[:max_ind, :], qb.T) sigma = d_[:max_ind] else: at = qa b = qb.T self.__dict__['at'] = at self.__dict__['b'] = b self.__dict__['sigma'] = sigma
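The Notes above boil down to a QR factorization of each factor followed by an SVD of the small $n \times n$ core.  A minimal standalone sketch of that recompression (hypothetical `recompress` helper; ignores `n_max` and the Cholesky variant)::

    import numpy as np

    def recompress(at, sigma, b, dynamic_range=1e-12):
        """QR both factors, SVD the small core, drop tiny singular values."""
        qa, ra = np.linalg.qr(at)                  # at = qa * ra
        qb, rb = np.linalg.qr(b.T)                 # b.T = qb * rb
        u, d, vt = np.linalg.svd(np.dot(ra, np.dot(sigma, rb.T)))
        k = len(np.where(d/d.max() >= dynamic_range)[0])
        return np.dot(qa, u[:, :k]), d[:k], np.dot(vt[:k, :], qb.T)

    np.random.seed(2)
    at, b, sigma = np.random.rand(7, 3), np.random.rand(3, 7), np.eye(3)
    at2, d2, b2 = recompress(at, sigma, b)
    # The recompressed factors reproduce the same low-rank matrix.
    assert np.allclose(np.dot(at2*d2, b2), np.dot(at, np.dot(sigma, b)))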
[docs] def apply_transform(self, f, fr=None): """Apply the linear transform `f` to the vectors forming the dyads. If `fr` is `None`, this is applied to the right (bra's), otherwise `f` is applied to both sides. This is equivalent to the transformation `J ->I + F*(J - I)*Fr.T`. .. note:: This should be modified to be done in place if possible. Examples -------- """ if fr is None: fr = f if 0 < len(self.at): self.suspend() self.at = f(self.at) self.b = fr(self.b.T).T self.resume() self.orthogonalize()
[docs] def asarray(self): """Return the matrix. Examples -------- >>> M = DyadicSum() >>> M.asarray() array(1.0) >>> M.add_dyad([1, 0], [1, 0]) >>> M.add_dyad([0, 2], [1, 0]) >>> M.add_dyad([3, 0], [0, 1]) >>> M.add_dyad([0, 4], [0, 1]) >>> M.asarray() array([[ 2., 3.], [ 2., 5.]]) """ if len(self.b) == 0: return np.array(1.0) if 1 == len(self.sigma.shape): M = dot(self.at*self.sigma, self.b) else: M = dot(self.at, dot(self.sigma, self.b)) return M + np.eye(*M.shape)
[docs] def diag(self, k=0): r"""Return the diagonal of the matrix. Examples -------- >>> M = DyadicSum() >>> M.diag() array(1.0) >>> M.add_dyad([1, 0], [1, 0]) >>> M.add_dyad([0, 2], [1, 0]) >>> M.add_dyad([3, 0], [0, 1]) >>> M.add_dyad([0, 4], [0, 1]) >>> M.diag() array([ 2., 5.]) """ if len(self.b) == 0: return np.array(1.0) b = self.b if 1 == len(self.sigma.shape): at = self.at*self.sigma else: at = dot(self.at, self.sigma) na = at.shape[0] nb = b.shape[1] ka = max(0, -k) kb = max(0, k) n = min(na - ka, nb - kb) d = (at[ka:ka+n, :].T*b[:, kb:kb+n]).sum(axis=0) if 0 == k: d += 1 return d
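The element-wise trick used above can be checked against a dense construction; a self-contained version of the $k=0$ case::

    import numpy as np

    np.random.seed(3)
    N, n = 6, 2
    at, b, sigma = np.random.rand(N, n), np.random.rand(n, N), np.random.rand(n)

    # diag(1 + at*diag(sigma)*b) without forming the dense matrix.
    d = 1 + ((at*sigma).T*b).sum(axis=0)
    assert np.allclose(d, np.diag(np.eye(N) + np.dot(at*sigma, b)))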
def __mul__(self, x): r"""Matrix multiplication: Return self*x.""" if 0 == len(self.b): res = x else: shape = x.shape if 1 == len(shape): x = x.reshape((len(x), 1)) if 1 == len(self.sigma.shape): res = x + dot(self.at, np.multiply(self.sigma[:, None], dot(self.b, x))) else: res = x + dot(self.at, dot(self.sigma, dot(self.b, x))) res = res.reshape(shape) return res def __rmul__(self, x): r"""Matrix multiplication: Return x*self.""" if 0 == len(self.b): res = x else: shape = x.shape if 1 == len(shape): x = x.reshape((1, len(x))) if 1 == len(self.sigma.shape): res = x + dot( np.multiply(dot(x, self.at), self.sigma[None, :]), self.b) else: res = x + dot(dot(dot(x, self.at), self.sigma), self.b) res = res.reshape(shape) return res def __getitem__(self, key): r"""Minimal support of two-dimensional indexing. Examples -------- >>> np.random.seed(1) >>> d = DyadicSum() >>> for n in xrange(10): ... d.add_dyad(np.random.rand(100), np.random.rand(100)) >>> inds = np.s_[[1, 2, 3, 5, 4, -98], 20:-2] >>> np.allclose(d[inds], ... d.asarray()[inds]) True >>> inds = np.array([1, 3, 4, 2]) >>> np.allclose(d[inds[:, None], inds[None, :]], ... d.asarray()[inds[:, None], inds[None, :]]) True .. warning:: Assumes that the indexing is for extracting sub-arrays. This is different from numpy. >>> np.allclose(d[inds, inds], ... d.asarray()[inds, inds]) False >>> np.allclose(d[inds, inds], ... d.asarray()[inds[:, None], inds[None, :]]) True """ if 2 == len(key): at = self.at[key[0], :] b = self.b[:, key[1]] if 2 < len(b.shape): b = np.rollaxis(b, 0, -1) if 1 == len(self.sigma.shape): res = dot(at*self.sigma, b).squeeze() else: res = dot(at, dot(self.sigma, b)).squeeze() if res.shape == (): res = res.reshape((1, 1)) ka = np.arange(self.at.shape[0])[key[0]].ravel() kb = np.arange(self.b.shape[1])[key[1]].ravel() res[np.where(ka[:, None] == kb[None, :])] += 1 else: raise ValueError( "%s only supports two-dimensional indexing. Got %s" % (self.__class__.__name__, repr(key))) return res
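For reference, `__mul__` above costs $O(Nn)$ per product because the identity-plus-dyads structure is applied factor by factor.  A standalone sketch of the diagonal-`sigma` case (hypothetical `dyadic_matvec` helper, not the class API)::

    import numpy as np

    def dyadic_matvec(at, sigma, b, x):
        """(1 + at*diag(sigma)*b) applied to x without the N x N matrix."""
        return x + np.dot(at, sigma*np.dot(b, x))

    np.random.seed(4)
    N, n = 1000, 3
    at, b, sigma = np.random.rand(N, n), np.random.rand(n, N), np.random.rand(n)
    x = np.random.rand(N)
    assert np.allclose(dyadic_matvec(at, sigma, b, x),
                       np.dot(np.eye(N) + np.dot(at*sigma, b), x))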
[docs]class JinvExtendedBroydenFull(StateVars): r"""Class to represent an inverse Jacobian for the Extended Broyden method using full matrices (for small problems only). """ implements(IJinv) _state_vars = [ ('curr', None,\ r"""Index of current point in :attr`:x`. The Jacobian was updated to be valid at this point. In the process of adding a point, the current point may be pushed off of the stack (since only :attr:`n_prev` points are kept. In this case, :attr:`curr` will be ``curr < -len(xs)`` (will only happen if :meth:`add_point` is passed the flag `replace=False`."""), ('xs', [],\ r"List of previous steps $x_i$ (newest last)."), ('gs', [],\ r"List of previous objectives $g_i = G(x_i)$."), ('j_inv', None, r"""Current approximation."""), ('j_inv0', 1.0,\ r"""Convergent approximation $\mat{B}_0$ to pull back to."""), ('method', 'J',\ r"""`'J'` for Broyden's good method 1a) and `'B'` for the \"bad\" method 2b)."""), ('n_prev', 1000, r"""Number $n$ of previous steps to keep and use."""), ('max_j_inv', 100.0, """If :attr:`robust`, then the inverse Jacobian is clipped so that the maximum value is this."""), ('min_j_inv', 0.0001, """The inverse Jacobian can become singular. If this is the case, then a step might have very small length even if the error is not small (the step size can be reduced by a factor of the minimum singular value of the inverse Jacobian). If the step reduction is less than this value, then the step is scaled up to have length :attr:`min_step`."""), ('_eps', _EPS,\ r"""Two points in :attr:`xs` are considered equal if the norm of their difference is less than this. No two points in :attr:`xs` will have a distance less than this."""), ('_x0', None, "Internal storage for :attr:`x0`"), ('_g0', None, "Internal storage for :attr:`g0`"), ] process_vars() def __init__(self, *v, **kw): self.j_inv0 = np.asarray(self.j_inv0) if self.j_inv is None: self.j_inv = self.j_inv0 else: self.j_inv = np.asarray(self.j_inv) if 0 < len(self.xs): N = len(self.xs[0]) if (self.j_inv.shape != (N, N) or self.j_inv0.shape != (N, N)): one = np.eye(N, N) self.j_inv0 = np.dot(self.j_inv0, one) self.j_inv = np.dot(self.j_inv, one) @property def x(self): if self.xs != [] and self.curr is not None: return self.xs[self.curr] else: return None @property def g(self): if self.gs != [] and self.curr is not None: return self.gs[self.curr] else: return None @property def x0(self): if self._x0 is None: return self.x else: return self._x0 @property def g0(self): if self._g0 is None: return self.g else: return self._g0 def get_w(self, dXdX, dGdG, x_scale=1): r"""Return weights `w, w_0, w_r`. Parameters ---------- x_scale : array, optional Typical scale over which `g(x)` is expected to be linear. This is used to calculate the weights `w`: .. math:: w_{mn} = \delta_{mn} \frac{\texttt{x\_scale}_n} {\norm{\vect{x} - \vect{x}_n}}. Thus, thus is greater than one for points where the approximation should be linear and less than one elsewhere. This only really affects the dynamic restart mechanism where `w_0=1`. In this case, components where `w/w_0 > 1` will be maintained while other components will be restarted. Returns ------- w : array Array of weights for `dX`. This is $\mat{w}$. w_r : array Pull back weight pulling the Jacobian inverse back to the previous approximation $\mat{B}_{-1}$. This is $w_r$ (assumes $\mat{\Gamma} = w_r\uvect{G}_{\perp}$ for example)). 
As with the regular weights, this only plays a role if we are employing a dynamic restart, in which case the Jacobian is also pulled back toward the previous value instead of to $\mat{B}_0$ for larger values of `w_r/w_0`. Typically this is small so it only serves to project the inverse when the other weights are zero. """ w = np.diag( np.sqrt(np.divide(1.0, np.diag(dXdX)/x_scale**2 + _SINGULAR_TOL))) #w[-1,-1] = 1 #if 1 < len(w): # w[-2, -2] = 1 #if 2 < len(w): # w[-3, -3] = 1 #w = dXdX[-1,-1]/(np.diag(dXdX) + _SINGULAR_TOL) w_r = np.sqrt(_SINGULAR_TOL) return w, w_r def __mul__(self, dg, mutate=False): r"""Return the matrix multiplication `j_inv*dg`. Parameters ---------- dg : array This should be in column vector format with dimension `dg.shape = (N,_)`. mutate : bool If `True`, then `dg` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dg` will be made). """ res = np.dot(self.j_inv, dg) if mutate: # Should use BLAS if we can... don't know how yet. dg[::] = res return res def __rmul__(self, dxt, mutate=False): r"""Return the matrix multiplication `dxt*j_inv`. Parameters ---------- dxt : array This should be in row vector format with dimension `dg.shape = (_,N)`. mutate : bool If `True`, then `dxt` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dxt` will be made). """ res = np.dot(dxt, self.j_inv) if mutate: # Should use BLAS if we can... don't know how yet. dxt[::] = res return res def reset(self): r"""Reset the Jacobian to a diagonal form.""" self.j_inv = self.j_inv0 def set_point(self, x, g, **kw): r"""Add `x` and `g` and make this the current point `(x, g)` and update the Jacobian to be valid here. Parameters ---------- x, g : arrays """ x0 = self.x g0 = self.g x = np.asarray(x).ravel() g = np.asarray(g).ravel() self.curr = self.add_point(x, g) self.update(**kw) return self.incompatibility(x, g, x0, g0) def add_point(self, x, g, replace=True): r"""Add the point `(x, g)` to the Jacobian information. Does not necessarily update the Jacobian (use :meth:`update`) but may. Does not change the current point (use :meth:`set_point`). Parameters ---------- x, g : arrays replace : bool (optional) If `True`, then any points in :attr:`xs` with distance less than :attr:`_eps` from `x` (`inf` norm) are removed before the new point is added. (Could be useful if points need to be recomputed at a higher tolerance for example.) If `False`, then new point is only added if there are no points within distance :attr:`_eps` from `x` and a :exc:`ValueError` exception is raised if `g` is not within distance :attr:`_eps` from the corresponding value in :attr:`gs`. The old points are not moved in this case. 
""" x = np.asarray(x).ravel() g = np.asarray(g).ravel() xs = self.xs gs = self.gs norms = np.array([abs(x - _x).max() for _x in self.xs]) inds = np.where(norms < self._eps)[0] if 0 == len(inds): xs.append(x) gs.append(g) curr = -1 else: curr = inds[-1] - len(xs) if replace: for ind in reversed(inds): del xs[ind] del gs[ind] xs.append(x) gs.append(g) curr = -1 else: g_inds = np.where([abs(g - _g) for _g in self.gs[inds]])[0] if 0 == len(g_inds): pass else: raise ValueError( ("New `x` matches current `xs%s` but `g` does " + "not match `gs%s`") % (str(inds), str(g_inds))) self.suspend() n_prev = self.n_prev self.xs = self.xs[-n_prev:] self.gs = self.gs[-n_prev:] self.resume() return curr def update(self, x0=None, g0=None, skip_inds=None, include_skip=False, mags=1, warn=False, restart=False, x_scale=1.0): r"""Update the inverse Jacobian so that it is valid at the point `(x, g)` (the current point by default). Does not necessarily add `x` and `g` (use :meth:`add_point`) or set the current point (use :meth:`set_point`) but may do either or both depending on the algorithm. Return a measure of the incompatibility of the new data with the current approximation. (Not implemented yet.) Parameters ---------- restart : bool If `True`, then the weight `w_0=1` is used to dynamically restart the calculation by pulling the inverse Jacobian back to `B_0`. Examples -------- Here we perform some tests with a linear function >>> np.random.seed(0) >>> N = 4 >>> J = np.random.rand(N, N) >>> x0 = np.random.rand(N) >>> x = np.random.rand(N) >>> def g(x): ... return np.dot(J, x - x0) >>> B = JinvExtendedBroydenFull(method='J') >>> B.add_point(x0, g(x0)) -1 >>> B.set_point(x, g(x)) 0 >>> np.allclose(B*(g(x) - g(x0)), x - x0) True >>> for n in xrange(N): ... _x = np.random.rand(N) ... incompatibility = B.set_point(_x, g(_x)); >>> np.allclose(B.j_inv, np.linalg.inv(J)) True >>> B = JinvExtendedBroydenFull(method='B') >>> B.add_point(x0, g(x0)) -1 >>> B.set_point(x, g(x)) 0 >>> np.allclose(B*(g(x) - g(x0)), x - x0) True >>> for n in xrange(N): ... _x = np.random.rand(N) ... 
incompatibility = B.set_point(_x, g(_x)); >>> np.allclose(B.j_inv, np.linalg.inv(J)) True """ if x0 is None: x0 = self.x g0 = self.g self._x0 = None self._g0 = None else: x0 = self._x0 = np.asarray(x0).ravel() g0 = self._g0 = np.asarray(g0).ravel() inds = np.where(np.array([abs(x0 - _x).max() for _x in self.xs]) >= self._eps)[0] if 0 == len(inds): # Can't update as there is no information pass else: j_inv = np.asarray(self.j_inv) j_inv0 = np.asarray(self.j_inv0) dX = x0[:,None] - np.vstack(self.xs).T[:,inds] dG = g0[:,None] - np.vstack(self.gs).T[:,inds] one = np.eye(len(x0)) # get weights and apply dXdX = np.dot(dX.T, dX) dGdG = np.dot(dG.T, dG) w, w_r = self.get_w(dGdG=dGdG, dXdX=dXdX, x_scale=x_scale) if restart: w_0 = 1.0 else: w_0 = 0.0 dG = np.dot(dG, w) dX = np.dot(dX, w) BdG = np.dot(j_inv, dG) numer = dX - BdG # Form denominator and correct numerator if 'J' == self.method: # Broyden's good method _U, _d, _Vt = np.linalg.svd(dX, full_matrices=True) Gamma = _U[:,len(np.where(_d >= _SINGULAR_TOL)[0]):] # j_inv*J0 _tmp = np.linalg.solve(j_inv0.T, j_inv.T).T denom = (np.dot(BdG, dX.T) + w_r**2*np.dot(Gamma, Gamma.T) + w_0**2*_tmp) numer = np.dot(numer, dX.T) + (one - _tmp)*w_0**2 else: # Broyden's "bad" method _U, _d, _Vt = np.linalg.svd(dG, full_matrices=True) Gamma = _U[:,len(np.where(_d >= _SINGULAR_TOL)[0]):] denom = (np.dot(dG, dG.T) + w_r**2*np.dot(Gamma, Gamma.T) + w_0**2*one) numer = np.dot(numer, dG.T) + (j_inv0 - j_inv)*w_0**2 # Invert denominator and multiply using SVD. This is where the # appropriate choice of Gamma must be made. u, d, vt = np.linalg.svd(denom) di = np.divide(1, d) bad_inds = np.where(d < _SINGULAR_TOL) if False: # Subspace decomposition method di[bad_inds] = 1 else: # Subspace ignoring method??? di[bad_inds] = 0 denom_inv = np.dot(vt.T*di, u.T) if 'J' == self.method: # Broyden's "good" method denom_inv = np.dot(denom_inv, j_inv) j_inv_new = j_inv + np.dot(numer, denom_inv) # Remove bad components. u, d, vt = np.linalg.svd(j_inv_new) bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL, d > 1/_SINGULAR_TOL)) d[bad_inds] = 1 j_inv_new = np.dot(u*d, vt) # Update variables self.j_inv = j_inv_new def incompatibility(self, x, g, x0, g0): r"""Return a dimensionless measure of the compatibility of a step from `x0` to `x` with the current Jacobian. An incompatibility of 0 means perfectly compatible (this is what should happen if the function is linear and the Jacobian is correct along the specified step). """ return 0 raise NotImplementedError def asarray(self): """Return the Jacobian matrix. Examples -------- >>> M = JinvExtendedBroydenFull() >>> M.asarray() array(1.0) """ return np.array(self.j_inv) def change_basis(self, f, f_inv=None): if 0 == len(self.xs): return self.suspend() # gs must be G(xs), can't use transformed value. self.xs = [] self.gs = [] u, d, vt = np.linalg.svd(self.j_inv) u = f(u) if f_inv is None: vt = f(vt.T).T else: vt = f_inv(vt) self.j_inv = np.dot(u*d, vt) u, d, vt = np.linalg.svd(self.j_inv0) u = f(u) if f_inv is None: vt = f(vt.T).T else: vt = f_inv(vt) self.j_inv0 = np.dot(u*d, vt) self.resume() def get_x(self, x_controls=None, min_step=_EPS, max_step=np.inf): r"""Return `x`: the Broyden step to get to the root of `G(x)`. This is the downhill Newton step from the point specified by :attr:`curr` that satisfies the constraints `x_controls`. Parameters ---------- x_controls : (array, array), optional Tuple `(inds, vals)` indices and the respective values of the control parameters. min_step : float, optional Minimum step length for testing. 
Used to give some non-zero step `dx` of length `sqrt(min_step) even if `j_inv` is singular. This may be random. A warning will be raised. max_step : float, optional Maximum length of the step. Examples -------- Here is a little test with a linear function >>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]]) >>> j_inv = JinvExtendedBroydenFull(j_inv=np.linalg.inv(j)) >>> sol = np.array([1, 2, 3]) >>> def G(x): ... return np.dot(j, x - sol) >>> x0 = np.array([0.0, 0.0, 0.0]) >>> x = np.array([1.0, 2.0, 3.0]) Now try to step to the solution, but hold the second variable as a control at 1. Since the function is exactly linear and we have explicitly given the correct Jacobian, we should end up with a root in the other two parameters. >>> j_inv.set_point(x=x, g=G(x)) 0 >>> x = j_inv.get_x(x_controls=([1], [1.0])) >>> x array([ 1.5, 1. , 3.5]) >>> np.allclose(G(x), [0., 4.5, 0.]) True This needs to work with very large arrays, so here is an example. Here we use an example where the vectors have length $2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly, the arrays would be too large to be indexed. >> np.random.seed(0) >> n_x = 2**20 >> n_d = 2**11 >> a = np.random.random(n_x) >> b = np.random.random(n_x) >> g = np.random.random(n_x) >> x = np.zeros(n_x) >> inds = np.unique(np.random.randint(0, n_x, size=n_d)) >> n_d = len(inds) >> vals = np.random.rand(n_d) >> x_controls = (inds, vals) >> j_inv = JinvExtendedBroydenFull(dyadic=True) >> j_inv.j_inv.add_dyad(a, b) >> j_inv.set_point(x=a, g=b) 0.0 >> x = j_inv.get_x(x_controls=x_controls) >> np.allclose(x[inds], vals) True """ if 0 == len(self.xs): raise ValueError("No `xs` stored: call update first.") x = self.x g = self.g j_inv = self.j_inv if x_controls is not None and 0 < len(x_controls[0]): inds, vals = x_controls inds = np.asarray(inds, dtype=int) vals = np.asarray(vals) dg_dmu = copy.copy(g) d = j_inv[inds[:, None], inds[None, :]] dg_dmu[inds] = x[inds] - vals dmu = _ket(dg_dmu[inds]) b = (np.asarray(j_inv*_ket(dg_dmu)))[inds, :] dN = dmu - np.linalg.solve(d, b - dmu) dg_dN = dg_dmu dg_dN[inds] = dN[:] dx_dmu = np.asarray(j_inv*_ket(dg_dN)) assert np.allclose(dx_dmu[inds], dmu) # Set these to make sure no round-off errors have # accumulated. dx_dmu[inds].flat = x[inds] - vals dx = -dx_dmu else: dx = -np.asarray(j_inv*_ket(g)) dx = np.asarray(dx).ravel() g_mag = np.linalg.norm(g) dx_mag = np.linalg.norm(dx) if dx_mag == 0 and g_mag > 0: warnings.warn("Warning: Singular Jinv, taking random step") dx = np.random.rand(len(dx))*np.sqrt(min_step) elif np.divide(dx_mag, g_mag) < self.min_j_inv: warnings.warn("Warning: Singular Jinv, taking minimum step") dx *= math.sqrt(min_step)/np.linalg.norm(dx) dx_len = math.sqrt((dx*dx).sum()) if max_step < dx_len: dx *= max_step/dx_len return x + dx
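One step of :meth:`update` that is easy to isolate is the final clean-up: singular values of the new inverse Jacobian that are tiny or huge are reset to 1, i.e. back to the (preconditioned) identity in those directions.  A self-contained toy version of that clipping::

    import numpy as np

    _SINGULAR_TOL = 1e-6                        # module-level tolerance above

    j_inv = np.diag([2.0, 1e-9, 1e9, 0.5])      # toy inverse Jacobian
    u, d, vt = np.linalg.svd(j_inv)
    bad = np.logical_or(d < _SINGULAR_TOL, d > 1/_SINGULAR_TOL)
    d[bad] = 1.0                                # reset singular directions to 1
    j_inv = np.dot(u*d, vt)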
[docs]class _JinvExtendedBroyden(StateVars): r"""Class to represent an inverse Jacobian for the Extended Broyden method. Presently, this class does not do any of the special processing to minimize the memory footprint described in the module description. We represent the inverse Jacobian as: .. math:: \mat{B} = \mat{1} + \ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}} Notes ----- Since python is implemented in C, the arrays are typically stored as `C_CONTIGUOUS` and :func:`numpy.dot` is efficient if both arguments are `C_CONTIGUOUS`, otherwise copies are made. Thus, where one needs to compute $\braket{\mat{A}|\mat{B}}$, one should store the transpose of the first matrix $\bra{\mat{A}}$. To Do ----- .. todo:: Reduce memory footprint. In particular, one can make use of set of previous points $\ket{\mat{X}}$ and function evaluations $\ket{\mat{G}}$ to reduce the space required by the dyadic basis $\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$. """ implements(IJinv) _state_vars = [ ('X', Required, r"$N\times n$ array $\ket{\mat{X}}$ of previous steps $x_i$."), ('G', Required, r"$N\times n$ array $\ket{\mat{G}}$ of objectives $G(x_i)$."), ('_jinv', Delegate(DyadicSum)), ('_b', None, r"Dyadic portion of the inverse Jacobian in the implied basis."), ('method', 1, r"""`1` for Broyden's good method and `2` for the \"bad\" method."""), ('n_prev_steps', np.inf, r"""Number $n$ of previous steps to keep and use."""), ('_check', True, r"""If `True`, perform some potentially expensive sanity checks."""), ] process_vars()
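The contiguity remark in the Notes can be checked directly.  A small plain-numpy illustration of why the transposed bra matrices (such as :attr:`_bt`) are stored explicitly rather than transposed on the fly, given that (as noted above) :func:`numpy.dot` of that era made copies of non-`C_CONTIGUOUS` operands::

    import numpy as np

    A = np.random.rand(1000, 50)
    At = np.ascontiguousarray(A.T)        # store the bra <A| directly
    print(A.T.flags['C_CONTIGUOUS'])      # False -- a transpose is only a view
    print(At.flags['C_CONTIGUOUS'])       # True
    B = np.random.rand(1000, 50)
    AtB = np.dot(At, B)                   # <A|B> from contiguous operands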
[docs] def __init__(self, *v, **kw): N, n = self.X.shape assert self.G.shape == self.X.shape if self._alpha is None: self._alpha = np.zeros((N, 0), dtype=float) self._betat = np.zeros((0, N), dtype=float) m = self._alpha.shape[1] assert m == self._betat.shape[0] assert self._alpha.shape == (N, m) assert self._betat.shape == (m, N) if self._check: # Check orthonormality of bases assert np.allclose(np.dot(self._betat, self._alpha), np.eye(m)) if self._b is None: self._b = np.zeros((m, m), dtype=float) assert self._b.shape == (m, m)
def __mul__(self, dg, mutate=False): r"""Return the matrix multiplication `B*dg`. Parameters ---------- dg : array This should be in column vector format with dimension `dg.shape = (N,_)`. mutate : bool If `True`, then `dg` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dg` will be made). Notes ----- .. math:: \mat{B}\ket{\d{g}} = \ket{\d{g}} + \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}} \end{pmatrix} \mat{b} \begin{pmatrix} \mat{q}_{b}\bra{X,G}\\ \bra{\uvect{b}} \end{pmatrix}\ket{\d{g}} """ dg = np.asarray(dg) if 1 == len(dg.shape): dg = dg[:, None] na = self._qa.shape[1] ma = self._a.shape[1] if 1 == self.method: # The default dot copies dg here, so hopefully it is small. XGdg = dot(dg.T, self.X).T else: assert 2 == self.method XGdg = dot(dg.T, self.G).T tmp = dot(self._b, np.vstack([dot(self._qb, XGdg), dot(self._bt, dg)])) tmp1 = dot(self._qa, tmp[:na,:]) tmp2 = tmp[na:,:] if mutate: res = dg else: res = 1*dg res += np.dot(self.G, tmp1) res -= np.dot(self.X, tmp1) res += np.dot(self._a, tmp2) return res def __rmul__(self, dxt, mutate=False): r"""Return the matrix multiplication `dxt*B`. Parameters ---------- dxt : array This should be in row vector format with dimension `dg.shape = (_,N)`. mutate : bool If `True`, then `dxt` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dxt` will be made). Notes ----- .. math:: \bra{\d{x}}\mat{B} = \bra{\d{x}} + \bra{\d{x}} \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}} \end{pmatrix} \mat{b} \begin{pmatrix} \mat{q}_{b}\bra{X,G}\\ \bra{\uvect{b}} \end{pmatrix} """ nb = self._qb.shape[0] mb = self._bt.shape[0] tmp = dot(np.hstack([dot(dot(dxt, self.G) - dot(dxt, self.X), self._qa), dot(dxt, self._a)]), self._b) tmp1 = dot(tmp[:, :nb], self._qb) tmp2 = tmp[:, nb:] if 1 == self.method: # The default dot copies dg here, so hopefully it is small. tmp1 = dot(self.X, tmp1.T).T else: assert 2 == self.method tmp1 = dot(self.G, tmp1.T).T if mutate: res = dxt else: res = 1*dxt res += tmp1 res += np.dot(tmp2, self._bt) return res
[docs] def update(self, x0=None, g0=None): r"""Update the inverse Jacobian so that it is valid at the point `(x, g)`. Return a measure of the incompatibility of the new data with the current approximation. Examples -------- Here we perform some tests with a linear function >>> np.random.seed(0) >>> N = 4 >>> J = np.random.rand(N, N) >>> x0 = np.random.rand(N) >>> def g(x): ... return np.dot(J, x - x0) >>> B = _JinvExtendedBroyden(X=np.zeros((N,1)), ... G=g(0*x0)[:,None]) >>> x = np.random.rand(N) >>> B.update(x, g(x)) >>> np.allclose(B*(g(x) - g(x0)), x - x0) True >>> for n in xrange(N): ... x = np.random.rand(N) ... B.update(x, g(x)) Notes ----- Our goal here is to minimize the amount of processing of large matrices. As such, we try to work with projections as much as possible. We start with the following large matrices: $\ket{X}$, $\ket{G}$, $\ket{x}$, $\ket{g}$, $\ket{a_0}$, $\ket{b_0}$. We use the following: .. math:: \ket{\delta\mat{x}} &= \ket{x} - \ket{X_0}, \qquad \ket{\delta\mat{g}} = \ket{g} - \ket{G_0},\\ \ket{\d\mat{X}} &= \ket{\mat{X}} - \ket{x}, \qquad \ket{\d\mat{G}} = \ket{\mat{G}} - \ket{g},\\ \ket{\uvect{\alpha}_0} &= \begin{pmatrix} \ket{\mat{G} - \mat{X}}\mat{q}_{a} & \ket{\mat{A}_0} \end{pmatrix},\\ \bra{\uvect{\beta}_0} &= \begin{pmatrix} \mat{q}_{b}\bra{\mat{X}, \mat{G}} \\ \bra{\mat{B}_0} \end{pmatrix},\\ \ket{\uvect{\alpha}} &= \begin{pmatrix} \ket{\uvect{\alpha}_0} & \ket{\uvect{\alpha}_1} \end{pmatrix},\\ \bra{\uvect{\beta}} &= \begin{pmatrix} \bra{\uvect{\beta}_0}\\ \bra{\uvect{\beta}_1} \end{pmatrix},\\ \ket{\uvect{\alpha}_1} &= \ket{\uvect{\alpha}_0}\mat{c}_{a} + \ket{\delta{g} - \delta{x}}\mat{d}_{a},\\ \bra{\uvect{\beta}_1} &= \mat{c}_{b}^{T}\bra{\uvect{\beta}_0} + \mat{d}_{b}^{T}\bra{\delta{x}, \delta{g}},\\ \mat{d}_{a} &= \mat{v}_{a}\mat{s}_{a}^{-1},\qquad \mat{c}_{a} = -\braket{\uvect{\alpha}_0|\delta{g} - \delta{x}} \mat{v}_{a}\mat{s}_{a}^{-1},\\ \mat{d}_{b} &= \mat{v}_{b}\mat{s}_{b}^{-1},\qquad \mat{c}_{b} = -\braket{\uvect{\beta}_0|\delta{x},\delta{g}} \mat{v}_{b}\mat{s}_{b}^{-1},\\ \mat{v}_{a}\mat{s}_{a}^2\mat{v}_{a}^{T} &= \braket{\delta{g} - \delta{x}|\delta{g} - \delta{x}} - \braket{\delta{g} - \delta{x}|\uvect{\alpha}_0} \braket{\uvect{\alpha}_0|\delta{g} - \delta{x}},\\ \mat{v}_{b}\mat{s}_{b}^2\mat{v}_{b}^{T} &= \braket{\delta{x},\delta{g}|\delta{x},\delta{g}} - \braket{\delta{x},\delta{g}|\uvect{\beta}_0} \braket{\uvect{\beta}_0|\delta{x},\delta{g}},\\ \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}} &= \begin{pmatrix} \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_0} & \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_1} \end{pmatrix},\\ \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0} &= \begin{pmatrix} \braket{\d\mat{G} - \d\mat{X}|\mat{G} - \mat{X}}\mat{q}_{a} & \braket{\d\mat{G} - \d\mat{X}|\mat{A}_0} \end{pmatrix},\\ \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_1} &= \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0}\mat{c}_{a} + \braket{\d\mat{G} - \d\mat{X}|\delta{g}-\delta{x}}\mat{d}_{a},\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}} &= \begin{pmatrix} \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} & \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} \end{pmatrix} ,\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} &= \begin{pmatrix} \braket{\d\mat{X},\d\mat{G}|\mat{X},\mat{G}}\mat{q}_{b}^{T} & \braket{\d\mat{X},\d\mat{G}|\mat{B}_0} \end{pmatrix},\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} &= \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0}\mat{c}_{b} + \braket{\d\mat{X},\d\mat{G}|\delta{x},\delta{g}}\mat{d}_{b},\\ \braket{\uvect{\alpha}|\uvect{\alpha}_0} &= 
\braket{\uvect{\beta}|\uvect{\beta}_0} = \begin{pmatrix} \mat{1}\\ \mat{0} \end{pmatrix},\\ \braket{\beta|\alpha} &= \begin{pmatrix} \mat{q}_{b}\braket{\mat{X}, \mat{G}| \mat{G} - \mat{X}}\mat{q}_{a} & \mat{q}_{b}\braket{\mat{X}, \mat{G}| \mat{G} - \mat{X}}\mat{q}_{a} \\ \end{pmatrix} """ # First we cast all of the arrays as matrices so we can # use * as matrix multiplication. Large matrices are # preceeded by an underscore. x = np.asarray(x0) g = np.asarray(g0) if 1 == len(x.shape): x = x[:,None] if 1 == len(g.shape): g = g[:,None] _X = np.asmatrix(self.X) _G = np.asmatrix(self.G) _b0 = np.asmatrix(self._bt.T) _a0 = np.asmatrix(self._a) qa = np.asmatrix(self._qa) qb = np.asmatrix(self._qb) # We assume that g is only one column, so a copy here is okay. _dg = np.asmatrix(g - _G[:,0]) _dx = np.asmatrix(x - _X[:,0]) _dgdx = _dg - _dx # Consider optimizing these copies away later _dG = np.asmatrix(_G - g) _dX = np.asmatrix(_X - x) _dGdX = _dG - _dX _GX = np.asmatrix(_G - _X) if 1 == self.method: _dxg = _dx _XG = np.asmatrix(_X) _dXG = _dX else: _dxg = _dg _XG = np.asmatrix(_G) _dXG = _dG # These are all of the brackets required. dXdX = _dX.T*_dX dGdG = _dG.T*_dG dXG_XG = _dXG.T*_XG dXG_dxg = _dXG.T*_dxg dXG_b0 = _dXG.T*_b0 dxg_XG = _dxg.T*_XG dxg_GX = _dxg.T*_GX dxg_dxg = _dxg.T*_dxg dxg_b0 = _dxg.T*_b0 dxg_a0 = _dxg.T*_a0 dGdX_a0 = _dGdX.T*_a0 GX_a0 = _GX.T*_a0 XG_GX = _XG.T*_GX X_a0 = _X.T*_a0 g_a0 = g.T*_a0 x_a0 = x.T*_a0 XG_a0 = _XG.T*_a0 b0_a0 = _b0.T*_a0 GX_b0 = _GX.T*_b0 dGdX_GX = _dGdX.T*_GX dGdX_dgdx = _dGdX.T*_dgdx dgdx_GX = _dgdx.T*_GX dgdx_XG = _dgdx.T*_XG dgdx_a0 = _dgdx.T*_a0 dxg_dgdx = _dxg.T*_dgdx dgdx_dgdx = _dgdx.T*_dgdx dgdx_b0 = _dgdx.T*_b0 # Now we form the small matrices dxg_beta0 = np.hstack([dxg_XG*qb.T, dxg_b0]) ub, sb, _ = np.linalg.svd(dxg_dxg - dxg_beta0*dxg_beta0.T) # Only keep significant basis vectors inds = np.where(sb >= _SINGULAR_TOL)[0] sb = sb[inds] ub = ub[:, inds] db = np.asmatrix(ub/(np.sqrt(sb)[None, :] + _SINGULAR_TOL**2)) cb = -dxg_beta0.T*db dXG_beta0 = np.hstack([dXG_XG*qb.T, dXG_b0]) dXG_beta1 = dXG_beta0*cb + dXG_dxg*db dXG_beta = np.hstack([dXG_beta0, dXG_beta1]) dGdX_a0 = GX_a0 - (g_a0 - x_a0) dgdx_alpha0 = np.hstack([dgdx_GX*qa, dgdx_a0]) ua, sa, _ = np.linalg.svd(dgdx_dgdx - dgdx_alpha0*dgdx_alpha0.T) # Only keep significant basis vectors (assuming everything has # been scaled to be of unit norm here...) 
inds = np.where(sa >= _SINGULAR_TOL)[0] sa = sa[inds] ua = ua[:, inds] da = np.asmatrix(ua/(np.sqrt(sa)[None, :] + _SINGULAR_TOL**2)) ca = -dgdx_alpha0.T*da dGdX_alpha0 = np.hstack([dGdX_GX*qa, dGdX_a0]) dGdX_alpha1 = dGdX_alpha0*ca + dGdX_dgdx*da dGdX_alpha = np.hstack([dGdX_alpha0, dGdX_alpha1]) # the basis vectors alpha and beta are stacked from three sets # with sizes n, m0 and (m-m0) N, n_ = self.X.shape na0, na = self._qa.shape nb, nb0 = self._qb.shape ma0 = self._a.shape[1] # Previous size of aux space mb0 = self._bt.shape[0] ma = ma0 + ca.shape[1] # New size of aux space mb = mb0 + cb.shape[1] eye0 = np.asmatrix(np.eye(na + ma0, nb + mb0, dtype=float)) eye1 = np.asmatrix(np.eye(na + ma, nb + mb, dtype=float)) alpha_alpha0 = np.asmatrix(np.eye(na + ma, na + ma0, dtype=float)) beta_beta0 = np.asmatrix(np.eye(nb + mb, nb + mb0, dtype=float)) beta0_dgdx = np.vstack([qb*dgdx_XG.T, dgdx_b0.T]) dxg_alpha0 = np.hstack([dxg_GX*qa, dxg_a0]) beta0_alpha0 = np.bmat( [[qb*XG_GX*qa, qb*XG_a0], [GX_b0.T*qa, b0_a0]]) beta0_alpha1 = beta0_alpha0*ca + beta0_dgdx*da beta1_alpha0 = cb.T*beta0_alpha0 + db.T*dxg_alpha0 beta1_alpha1 = (cb.T*(beta0_alpha0*ca + beta0_dgdx*da) + db.T*(dxg_alpha0*ca + dxg_dgdx*da)) beta_alpha = np.bmat([[beta0_alpha0, beta0_alpha1], [beta1_alpha0, beta1_alpha1]]) ############################################################ # Now we have the bases and all the required matrices, so we # do the real work. def inv(A): if 0 == np.prod(A.shape): # Empty matrix needs a degenerate case return A.T return np.linalg.pinv(A, rcond=_SINGULAR_TOL**2) w, w0, wr = self.get_w(dXdX, dGdG) if w is not None: w = np.asarray(w) if 1 == len(w.shape): dGdX_alpha *= w[:, None] dXG_beta *= w[:, None] else: dGdX_alpha = dot(w.T, dGdX_alpha) dXG_beta = dot(w.T, dXG_beta) if 1 == self.method: num = dGdX_alpha.T*dXG_beta if self._b is not None: j0 = -inv(inv(self._b) + beta0_alpha0) num += w0*w0*alpha_alpha0*j0*beta_beta0.T denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb) j = num*inv(denom) b = -inv(inv(j) + beta_alpha) elif 2 == self.method: num = -dGdX_alpha.T*dXG_beta if self._b is not None: num += w0*w0*alpha_alpha0*self._b*beta_beta0.T denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb) b = num*inv(denom) # Set singular values to zero so they don't contribute. if 0 == np.prod(b.shape): # There is no dyadic contribution. b = None _a = None _bt = None else: u, d, vt = np.linalg.svd(b, full_matrices=False) bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL, 1/_SINGULAR_TOL < d))[0] d[bad_inds] = 0.0 b = dot(np.multiply(u,d[None, :]), vt) # Now that we have b, we perform an svd in the auxiliary basis to # determine what basis vectors to keep. 
qa_, qbt_ = self._restrict(b, na, nb) b[na:, nb:] = qa_*b[na:, nb:]*qbt_ # na + ma0 ma - ma0 # |alpha> = [alpha0, alpha0*ca + dgdx*da] # = [GX*qa, [a0, [GX*qa, a0]*ca + dgdx*da]] # = [GX*qa, A_new] # = [GX*qa, [GX, a0, dgdx]*A] # |A_new> = [GX, a0, dgdx]*A # A_[na_ + ma0 + (ma - ma0), ma = ma0 + (ma-ma0)] # A = [[0, qa*ca] # [1, ca] # [0, da]] A = np.bmat( [[np.zeros((na0, ma0)), qa*ca[:na,:]], [np.eye(ma0, ma0), ca[na:,:]], [np.zeros((da.shape[0], ma0)), da]]) Bt = np.bmat( [[np.zeros((nb0, mb0)), qb.T*cb[:nb,:]], [np.eye(mb0, mb0), cb[nb:,:]], [np.zeros((db.shape[0], mb0)), db]]).T A = A*qa_.T Bt = qbt_.T*Bt if ma > self.max_aux_basis_size: ma = self.max_aux_basis_size A = A[:, :ma] b = b[:ma, :] if mb > self.max_aux_basis_size: mb = self.max_aux_basis_size Bt = Bt[:mb, :] b = b[:, :mb] _a = np.hstack([_GX, _a0, _dgdx])*A _bt = Bt*np.vstack([_XG.T, _b0.T, _dxg.T]) self.suspend() self._a = _a self._bt = _bt self._b = b # These are not needed: just for checking and debugging _alpha0 = np.hstack([_GX*qa, _a0]) _alpha1 = _alpha0*ca + _dgdx*da _alpha = np.hstack([_alpha0, _alpha1]) self.resume()
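The construction of $\mat{c}_a$, $\mat{d}_a$ (and their `b` counterparts) above only ever touches small Gram matrices; the large $N$-vectors enter once at the very end.  A self-contained sketch of that basis-extension step, in the spirit of :meth:`_ortho` below (hypothetical variable names)::

    import numpy as np

    tol = 1e-6
    np.random.seed(0)
    N = 50
    alpha0 = np.linalg.qr(np.random.rand(N, 4))[0]    # orthonormal N x 4 basis
    v = np.random.rand(N, 1)                          # new direction(s)

    a0_v = np.dot(alpha0.T, v)                        # small 4 x 1 bracket
    gram = np.dot(v.T, v) - np.dot(a0_v.T, a0_v)      # <v|v> - <v|a0><a0|v>
    s2, u = np.linalg.eigh(gram)
    keep = np.where(s2 > tol**2)[0]                   # drop nearly dependent parts
    d = u[:, keep]/np.sqrt(s2[keep])[None, :]
    c = -np.dot(a0_v, d)
    alpha1 = np.dot(alpha0, c) + np.dot(v, d)         # new orthonormal column(s)

    assert np.allclose(np.dot(alpha1.T, alpha0), 0)
    assert np.allclose(np.dot(alpha1.T, alpha1), np.eye(len(keep)))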
[docs] def get_w(self, dXdX, dGdG): r"""Return weights `w, w0, wr`. Returns ------- w : array Array of weights for `dx`. """ N, n_ = self.X.shape ma = self._a.shape[1] mb = self._bt.shape[0] na = self._qa.shape[1] nb = self._qb.shape[0] w = np.eye(dXdX.shape[0]) w0 = _SINGULAR_TOL wr = 0 return w, w0, wr
def _restrict(self, b, na=0, nb=0): r"""Return unitary matrices `(qa, qb.T)` which can be used to transform `b` so that truncating the matrix will introduce minimal errors. The first `n0` rows and columns of `b` will be held fixed. The present implementation is an approximation: it is not optimal. Parameters ---------- b : array of shape (n,n) Array to be rearranged and truncated. na, nb : int Number of rows and columns to be held fixed. If this is `0`, then the SVD will be used: ``qa, d, qb.T = svd(b)``. """ ua, da, vat = np.linalg.svd(b[na:, :]) ub, db, vbt = np.linalg.svd(b[:, nb:]) return ua, vbt @staticmethod def _ortho(a0, b): r"""Return `(A, B, s)` such that `a1 = dot(a0, A) + dot(b, B)` is orthonormal and orthogonal to `a0`. Parameters ---------- a0 : array The columns of this must be orthonormal. b : array Need not be orthonormal. Returns ------- A, B : array s : array This is the list of singular values for the transformation. If these are too small, then the corresponding dimensions should be dropped. I.e. use only `a1[:, np.where(tol < s)[0]]` Examples -------- >>> np.random.seed(0) >>> a0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5] >>> b = np.random.rand(10, 2) >>> A, B, s = JinvExtendedBroyden._ortho(a0, b) >>> a1 = np.dot(a0, A) + np.dot(b, B) >>> np.allclose(np.dot(a1.T, a0), 0) True >>> np.allclose(np.dot(a1.T, a1), np.eye(2)) True Notes ----- If `a0` and `b` are big (in the first dimension), then it is best if `a0` is `F_CONTIGUOUS` (i.e. the transpose of a `C_CONTIGUOUS` array) and `b` is `C_CONTIGUOUS` to avoid :func:`np.dot` from making unneeded copies. (One copy will always be made when computing `dot(b.T, b)`). Uses the relationship .. math:: \braket{\beta|\beta} - \braket{\beta|\uvect{\alpha}_{0}} \braket{\uvect{\alpha}_{0}|\beta} &= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\ \begin{pmatrix} \ket{\uvect{\alpha}_{0}} & \ket{\uvect{\alpha}_{1}} \end{pmatrix} &= \begin{pmatrix} \ket{\uvect{\alpha}_{0}} & \ket{\beta} \end{pmatrix} \cdot \begin{pmatrix} \mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v} \mat{\sigma}^{-1}\\ \mat{0} & \mat{v}\mat{\sigma}^{-1} \end{pmatrix}. """ a0_b = dot(a0.T, b) s2, v = eigh(dot(b.T, b) - dot(a0_b.T, a0_b)) s = np.sqrt(s2) A = -np.dot(a0_b, v/s[None, :]) B = v/s[None, :] return (A, B, s)
[docs] def asarray(self): """Return the Jacobian matrix. Examples -------- >>> M = _JinvExtendedBroyden() >>> M.asarray() array(1.0) """ if np.prod(self._b.shape) == 0: return np.array(1.0) _alpha = np.hstack([dot(self.G - self.X, self._qa), self._a]) if 1 == self.method: _XG = self.X else: _XG = self.G _betat = np.vstack([dot(self._qb, _XG.T), self._bt]) M = dot(dot(_alpha, self._b), _betat) return M + np.eye(self.X.shape[0], self.G.shape[0], dtype=float)
class _JinvExtendedBroydenMemoryLimited(StateVars): r"""Class to represent an inverse Jacobian for the Extended Broyden method. We represent the inverse Jacobian as: .. math:: \mat{B} &= \mat{1} + \ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}},\\ \ket{\uvect{\alpha}} &= \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}}\end{pmatrix},\\ \bra{\uvect{\beta}} &= \begin{pmatrix} \mat{q}_{b}\bra{X,G},\\ \bra{\uvect{b}}\end{pmatrix}, with the choice of $\ket{X}$ (method 1) or $\ket{G}$ (method 2) for $\ket{\uvect{\beta}}$ depends on the method. Notes ----- Since python is implemented in C, the arrays are typically stored as `C_CONTIGUOUS` and :func:`numpy.dot` is efficient if both arguments are `C_CONTIGUOUS`, otherwise copies are made. Thus, where one needs to compute $\braket{\mat{A}|\mat{B}}$, one should store the transpose of the first matrix $\bra{\mat{A}}$. To Do ----- .. todo:: Add support for non-square matrices. This is only a bookkeeping issue, but probably not that useful so maybe leave for a while. """ implements(IJinv) _state_vars = [ ('X', Required, r"$N\times n$ array $\ket{\mat{X}}$ of previous steps $x_i$."), ('G', Required, r"$N\times n$ array $\ket{\mat{G}}$ of objectives $G(x_i)$."), ('_a', None, r"$N\times m$ array of extra range basis vectors."), ('_bt', None, r"""$m\times N$ array of extra domain basis vectors stored in row-vector format for improved :func:`numpy.dot` performance."""), ('_qa', None, r"""Transformation $\mat{q}_{a}$ such that $(\ket{G}-\ket{X})\mat{q}_{a}$ is the first part of the basis for the range."""), ('_qb', None, r"""Transformation $\mat{q}_{b}$ such that $\mat{q}_{b}*\bra{X,G}$ is the first part of the basis for the domain."""), ('_b', None, r"Dyadic portion of the inverse Jacobian in the implied basis."), ('method', 1, r"""`1` for Broyden's good method and `2` for the \"bad\" method."""), ('max_aux_basis_size', np.inf, r"""Maxiumum number of extra basis states to store for representing the inverse Jacobian."""), ('_check', True, r"""If `True`, perform some potentially expensive sanity checks."""), ] process_vars() def __init__(self, *v, **kw): N, n_ = self.X.shape assert self.G.shape == self.X.shape if self._a is None: self._a = np.zeros((N, 0), dtype=float) self._bt = np.zeros((0, N), dtype=float) ma = self._a.shape[1] mb = self._bt.shape[0] assert self._a.shape == (N, ma) assert self._bt.shape == (mb, N) if self._qa is None: _GX = self.G - self.X GXGX = np.dot(_GX.T, _GX) del _GX u, d, vt = np.linalg.svd(GXGX) inds = np.where(d/d.max() > _SINGULAR_TOL)[0] self._qa = u[:,inds]/np.sqrt(d[inds])[None,:] na = self._qa.shape[1] assert self._qa.shape == (n_, na) if self._qb is None: if 1 == self.method: _XG = self.X else: _XG = self.G XGXG = np.dot(_XG.T, _XG) del _XG u, d, vt = np.linalg.svd(XGXG) inds = np.where(d/d.max() > _SINGULAR_TOL)[0] self._qb = (u[:,inds]/np.sqrt(d[inds])[None,:]).T nb = self._qb.shape[0] assert self._qb.shape == (nb, n_) if self._check: # Check orthonormality of basese qa = np.asmatrix(self._qa) _GX = self.G - self.X GXGX = np.asmatrix(np.dot(_GX.T, _GX)) GXa0 = np.dot(_GX.T, self._a) a0a0 = np.dot(self._a.T, self._a) del _GX assert np.allclose(qa.T*GXGX*qa, np.eye(na)) assert np.allclose(GXa0, 0) assert np.allclose(a0a0, np.eye(ma)) qb = np.asmatrix(self._qb) if 1 == self.method: _XG = self.X else: _XG = self.G b0XG = np.dot(self._bt, _XG) b0b0 = np.dot(self._bt, self._bt.T) XGXG = np.asmatrix(np.dot(_XG.T, _XG)) del _XG assert np.allclose(qb*XGXG*qb.T, np.eye(nb)) assert np.allclose(b0XG, 0) assert np.allclose(b0b0, 
np.eye(mb)) if self._b is None: self._b = np.zeros((na + ma, nb + mb), dtype=float) assert self._b.shape == (na + ma, nb + mb) def __mul__(self, dg, mutate=False): r"""Return the matrix multiplication `B*dg`. Parameters ---------- dg : array This should be in column vector format with dimension `dg.shape = (N,_)`. mutate : bool If `True`, then `dg` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dg` will be made). Notes ----- .. math:: \mat{B}\ket{\d{g}} = \ket{\d{g}} + \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}} \end{pmatrix} \mat{b} \begin{pmatrix} \mat{q}_{b}\bra{X,G}\\ \bra{\uvect{b}} \end{pmatrix}\ket{\d{g}} """ dg = np.asarray(dg) if 1 == len(dg.shape): dg = dg[:, None] na = self._qa.shape[1] ma = self._a.shape[1] if 1 == self.method: # The default dot copies dg here, so hopefully it is small. XGdg = dot(dg.T, self.X).T else: assert 2 == self.method XGdg = dot(dg.T, self.G).T tmp = dot(self._b, np.vstack([dot(self._qb, XGdg), dot(self._bt, dg)])) tmp1 = dot(self._qa, tmp[:na,:]) tmp2 = tmp[na:,:] if mutate: res = dg else: res = 1*dg res += np.dot(self.G, tmp1) res -= np.dot(self.X, tmp1) res += np.dot(self._a, tmp2) return res def __rmul__(self, dxt, mutate=False): r"""Return the matrix multiplication `dxt*B`. Parameters ---------- dxt : array This should be in row vector format with dimension `dg.shape = (_,N)`. mutate : bool If `True`, then `dxt` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dxt` will be made). Notes ----- .. math:: \bra{\d{x}}\mat{B} = \bra{\d{x}} + \bra{\d{x}} \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}} \end{pmatrix} \mat{b} \begin{pmatrix} \mat{q}_{b}\bra{X,G}\\ \bra{\uvect{b}} \end{pmatrix} """ nb = self._qb.shape[0] mb = self._bt.shape[0] tmp = dot(np.hstack([dot(dot(dxt, self.G) - dot(dxt, self.X), self._qa), dot(dxt, self._a)]), self._b) tmp1 = dot(tmp[:, :nb], self._qb) tmp2 = tmp[:, nb:] if 1 == self.method: # The default dot copies dg here, so hopefully it is small. tmp1 = dot(self.X, tmp1.T).T else: assert 2 == self.method tmp1 = dot(self.G, tmp1.T).T if mutate: res = dxt else: res = 1*dxt res += tmp1 res += np.dot(tmp2, self._bt) return res def update(self, x0=None, g0=None): r"""Update the inverse Jacobian so that it is valid at the point `(x, g)`. Return a measure of the incompatibility of the new data with the current approximation. Examples -------- Here we perform some tests with a linear function >>> np.random.seed(0) >>> N = 4 >>> J = np.random.rand(N, N) >>> x0 = np.random.rand(N) >>> def g(x): ... return np.dot(J, x - x0) >>> B = _JinvExtendedBroyden(X=np.zeros((N,1)), ... G=g(0*x0)[:,None]) >>> x = np.random.rand(N) >>> B.update(x, g(x)) >>> np.allclose(B*(g(x) - g(x0)), x - x0) True >>> for n in xrange(N): ... x = np.random.rand(N) ... B.update(x, g(x)) Notes ----- Our goal here is to minimize the amount of processing of large matrices. As such, we try to work with projections as much as possible. We start with the following large matrices: $\ket{X}$, $\ket{G}$, $\ket{x}$, $\ket{g}$, $\ket{a_0}$, $\ket{b_0}$. We use the following: .. 
math:: \ket{\delta\mat{x}} &= \ket{x} - \ket{X_0}, \qquad \ket{\delta\mat{g}} = \ket{g} - \ket{G_0},\\ \ket{\d\mat{X}} &= \ket{\mat{X}} - \ket{x}, \qquad \ket{\d\mat{G}} = \ket{\mat{G}} - \ket{g},\\ \ket{\uvect{\alpha}_0} &= \begin{pmatrix} \ket{\mat{G} - \mat{X}}\mat{q}_{a} & \ket{\mat{A}_0} \end{pmatrix},\\ \bra{\uvect{\beta}_0} &= \begin{pmatrix} \mat{q}_{b}\bra{\mat{X}, \mat{G}} \\ \bra{\mat{B}_0} \end{pmatrix},\\ \ket{\uvect{\alpha}} &= \begin{pmatrix} \ket{\uvect{\alpha}_0} & \ket{\uvect{\alpha}_1} \end{pmatrix},\\ \bra{\uvect{\beta}} &= \begin{pmatrix} \bra{\uvect{\beta}_0}\\ \bra{\uvect{\beta}_1} \end{pmatrix},\\ \ket{\uvect{\alpha}_1} &= \ket{\uvect{\alpha}_0}\mat{c}_{a} + \ket{\delta{g} - \delta{x}}\mat{d}_{a},\\ \bra{\uvect{\beta}_1} &= \mat{c}_{b}^{T}\bra{\uvect{\beta}_0} + \mat{d}_{b}^{T}\bra{\delta{x}, \delta{g}},\\ \mat{d}_{a} &= \mat{v}_{a}\mat{s}_{a}^{-1},\qquad \mat{c}_{a} = -\braket{\uvect{\alpha}_0|\delta{g} - \delta{x}} \mat{v}_{a}\mat{s}_{a}^{-1},\\ \mat{d}_{b} &= \mat{v}_{b}\mat{s}_{b}^{-1},\qquad \mat{c}_{b} = -\braket{\uvect{\beta}_0|\delta{x},\delta{g}} \mat{v}_{b}\mat{s}_{b}^{-1},\\ \mat{v}_{a}\mat{s}_{a}^2\mat{v}_{a}^{T} &= \braket{\delta{g} - \delta{x}|\delta{g} - \delta{x}} - \braket{\delta{g} - \delta{x}|\uvect{\alpha}_0} \braket{\uvect{\alpha}_0|\delta{g} - \delta{x}},\\ \mat{v}_{b}\mat{s}_{b}^2\mat{v}_{b}^{T} &= \braket{\delta{x},\delta{g}|\delta{x},\delta{g}} - \braket{\delta{x},\delta{g}|\uvect{\beta}_0} \braket{\uvect{\beta}_0|\delta{x},\delta{g}},\\ \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}} &= \begin{pmatrix} \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_0} & \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_1} \end{pmatrix},\\ \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0} &= \begin{pmatrix} \braket{\d\mat{G} - \d\mat{X}|\mat{G} - \mat{X}}\mat{q}_{a} & \braket{\d\mat{G} - \d\mat{X}|\mat{A}_0} \end{pmatrix},\\ \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_1} &= \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0}\mat{c}_{a} + \braket{\d\mat{G} - \d\mat{X}|\delta{g}-\delta{x}}\mat{d}_{a},\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}} &= \begin{pmatrix} \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} & \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} \end{pmatrix} ,\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} &= \begin{pmatrix} \braket{\d\mat{X},\d\mat{G}|\mat{X},\mat{G}}\mat{q}_{b}^{T} & \braket{\d\mat{X},\d\mat{G}|\mat{B}_0} \end{pmatrix},\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} &= \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0}\mat{c}_{b} + \braket{\d\mat{X},\d\mat{G}|\delta{x},\delta{g}}\mat{d}_{b},\\ \braket{\uvect{\alpha}|\uvect{\alpha}_0} &= \braket{\uvect{\beta}|\uvect{\beta}_0} = \begin{pmatrix} \mat{1}\\ \mat{0} \end{pmatrix},\\ \braket{\beta|\alpha} &= \begin{pmatrix} \mat{q}_{b}\braket{\mat{X}, \mat{G}| \mat{G} - \mat{X}}\mat{q}_{a} & \mat{q}_{b}\braket{\mat{X}, \mat{G}| \mat{G} - \mat{X}}\mat{q}_{a} \\ \end{pmatrix} """ # First we cast all of the arrays as matrices so we can # use * as matrix multiplication. Large matrices are # preceeded by an underscore. x = np.asarray(x0) g = np.asarray(g0) if 1 == len(x.shape): x = x[:,None] if 1 == len(g.shape): g = g[:,None] _X = np.asmatrix(self.X) _G = np.asmatrix(self.G) _b0 = np.asmatrix(self._bt.T) _a0 = np.asmatrix(self._a) qa = np.asmatrix(self._qa) qb = np.asmatrix(self._qb) # We assume that g is only one column, so a copy here is okay. 
_dg = np.asmatrix(g - _G[:,0]) _dx = np.asmatrix(x - _X[:,0]) _dgdx = _dg - _dx # Consider optimizing these copies away later _dG = np.asmatrix(_G - g) _dX = np.asmatrix(_X - x) _dGdX = _dG - _dX _GX = np.asmatrix(_G - _X) if 1 == self.method: _dxg = _dx _XG = np.asmatrix(_X) _dXG = _dX else: _dxg = _dg _XG = np.asmatrix(_G) _dXG = _dG # These are all of the brackets required. dXdX = _dX.T*_dX dGdG = _dG.T*_dG dXG_XG = _dXG.T*_XG dXG_dxg = _dXG.T*_dxg dXG_b0 = _dXG.T*_b0 dxg_XG = _dxg.T*_XG dxg_GX = _dxg.T*_GX dxg_dxg = _dxg.T*_dxg dxg_b0 = _dxg.T*_b0 dxg_a0 = _dxg.T*_a0 dGdX_a0 = _dGdX.T*_a0 GX_a0 = _GX.T*_a0 XG_GX = _XG.T*_GX X_a0 = _X.T*_a0 g_a0 = g.T*_a0 x_a0 = x.T*_a0 XG_a0 = _XG.T*_a0 b0_a0 = _b0.T*_a0 GX_b0 = _GX.T*_b0 dGdX_GX = _dGdX.T*_GX dGdX_dgdx = _dGdX.T*_dgdx dgdx_GX = _dgdx.T*_GX dgdx_XG = _dgdx.T*_XG dgdx_a0 = _dgdx.T*_a0 dxg_dgdx = _dxg.T*_dgdx dgdx_dgdx = _dgdx.T*_dgdx dgdx_b0 = _dgdx.T*_b0 # Now we form the small matrices dxg_beta0 = np.hstack([dxg_XG*qb.T, dxg_b0]) ub, sb, _ = np.linalg.svd(dxg_dxg - dxg_beta0*dxg_beta0.T) # Only keep significant basis vectors inds = np.where(sb >= _SINGULAR_TOL)[0] sb = sb[inds] ub = ub[:, inds] db = np.asmatrix(ub/(np.sqrt(sb)[None, :] + _SINGULAR_TOL**2)) cb = -dxg_beta0.T*db dXG_beta0 = np.hstack([dXG_XG*qb.T, dXG_b0]) dXG_beta1 = dXG_beta0*cb + dXG_dxg*db dXG_beta = np.hstack([dXG_beta0, dXG_beta1]) dGdX_a0 = GX_a0 - (g_a0 - x_a0) dgdx_alpha0 = np.hstack([dgdx_GX*qa, dgdx_a0]) ua, sa, _ = np.linalg.svd(dgdx_dgdx - dgdx_alpha0*dgdx_alpha0.T) # Only keep significant basis vectors (assuming everything has # been scaled to be of unit norm here...) inds = np.where(sa >= _SINGULAR_TOL)[0] sa = sa[inds] ua = ua[:, inds] da = np.asmatrix(ua/(np.sqrt(sa)[None, :] + _SINGULAR_TOL**2)) ca = -dgdx_alpha0.T*da dGdX_alpha0 = np.hstack([dGdX_GX*qa, dGdX_a0]) dGdX_alpha1 = dGdX_alpha0*ca + dGdX_dgdx*da dGdX_alpha = np.hstack([dGdX_alpha0, dGdX_alpha1]) # the basis vectors alpha and beta are stacked from three sets # with sizes n, m0 and (m-m0) N, n_ = self.X.shape na0, na = self._qa.shape nb, nb0 = self._qb.shape ma0 = self._a.shape[1] # Previous size of aux space mb0 = self._bt.shape[0] ma = ma0 + ca.shape[1] # New size of aux space mb = mb0 + cb.shape[1] eye0 = np.asmatrix(np.eye(na + ma0, nb + mb0, dtype=float)) eye1 = np.asmatrix(np.eye(na + ma, nb + mb, dtype=float)) alpha_alpha0 = np.asmatrix(np.eye(na + ma, na + ma0, dtype=float)) beta_beta0 = np.asmatrix(np.eye(nb + mb, nb + mb0, dtype=float)) beta0_dgdx = np.vstack([qb*dgdx_XG.T, dgdx_b0.T]) dxg_alpha0 = np.hstack([dxg_GX*qa, dxg_a0]) beta0_alpha0 = np.bmat( [[qb*XG_GX*qa, qb*XG_a0], [GX_b0.T*qa, b0_a0]]) beta0_alpha1 = beta0_alpha0*ca + beta0_dgdx*da beta1_alpha0 = cb.T*beta0_alpha0 + db.T*dxg_alpha0 beta1_alpha1 = (cb.T*(beta0_alpha0*ca + beta0_dgdx*da) + db.T*(dxg_alpha0*ca + dxg_dgdx*da)) beta_alpha = np.bmat([[beta0_alpha0, beta0_alpha1], [beta1_alpha0, beta1_alpha1]]) ############################################################ # Now we have the bases and all the required matrices, so we # do the real work. 
def inv(A): if 0 == np.prod(A.shape): # Empty matrix needs a degenerate case return A.T return np.linalg.pinv(A, rcond=_SINGULAR_TOL**2) w, w0, wr = self.get_w(dXdX, dGdG) if w is not None: w = np.asarray(w) if 1 == len(w.shape): dGdX_alpha *= w[:, None] dXG_beta *= w[:, None] else: dGdX_alpha = dot(w.T, dGdX_alpha) dXG_beta = dot(w.T, dXG_beta) if 1 == self.method: num = dGdX_alpha.T*dXG_beta if self._b is not None: j0 = -inv(inv(self._b) + beta0_alpha0) num += w0*w0*alpha_alpha0*j0*beta_beta0.T denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb) j = num*inv(denom) b = -inv(inv(j) + beta_alpha) elif 2 == self.method: num = -dGdX_alpha.T*dXG_beta if self._b is not None: num += w0*w0*alpha_alpha0*self._b*beta_beta0.T denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb) b = num*inv(denom) # Set singular values to zero so they don't contribute. if 0 == np.prod(b.shape): # There is no dyadic contribution. b = None _a = None _bt = None else: u, d, vt = np.linalg.svd(b, full_matrices=False) bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL, 1/_SINGULAR_TOL < d))[0] d[bad_inds] = 0.0 b = dot(np.multiply(u,d[None, :]), vt) # Now that we have b, we perform an svd in the auxiliary basis to # determine what basis vectors to keep. qa_, qbt_ = self._restrict(b, na, nb) b[na:, nb:] = qa_*b[na:, nb:]*qbt_ # na + ma0 ma - ma0 # |alpha> = [alpha0, alpha0*ca + dgdx*da] # = [GX*qa, [a0, [GX*qa, a0]*ca + dgdx*da]] # = [GX*qa, A_new] # = [GX*qa, [GX, a0, dgdx]*A] # |A_new> = [GX, a0, dgdx]*A # A_[na_ + ma0 + (ma - ma0), ma = ma0 + (ma-ma0)] # A = [[0, qa*ca] # [1, ca] # [0, da]] A = np.bmat( [[np.zeros((na0, ma0)), qa*ca[:na,:]], [np.eye(ma0, ma0), ca[na:,:]], [np.zeros((da.shape[0], ma0)), da]]) Bt = np.bmat( [[np.zeros((nb0, mb0)), qb.T*cb[:nb,:]], [np.eye(mb0, mb0), cb[nb:,:]], [np.zeros((db.shape[0], mb0)), db]]).T A = A*qa_.T Bt = qbt_.T*Bt if ma > self.max_aux_basis_size: ma = self.max_aux_basis_size A = A[:, :ma] b = b[:ma, :] if mb > self.max_aux_basis_size: mb = self.max_aux_basis_size Bt = Bt[:mb, :] b = b[:, :mb] _a = np.hstack([_GX, _a0, _dgdx])*A _bt = Bt*np.vstack([_XG.T, _b0.T, _dxg.T]) self.suspend() self._a = _a self._bt = _bt self._b = b # These are not needed: just for checking and debugging _alpha0 = np.hstack([_GX*qa, _a0]) _alpha1 = _alpha0*ca + _dgdx*da _alpha = np.hstack([_alpha0, _alpha1]) self.resume() def get_w(self, dXdX, dGdG): r"""Return weights `w, w0, wr`. Returns ------- w : array Array of weights for `dx`. """ N, n_ = self.X.shape ma = self._a.shape[1] mb = self._bt.shape[0] na = self._qa.shape[1] nb = self._qb.shape[0] w = np.eye(dXdX.shape[0]) w0 = _SINGULAR_TOL wr = 0 return w, w0, wr def _restrict(self, b, na=0, nb=0): r"""Return unitary matrices `(qa, qb.T)` which can be used to transform `b` so that truncating the matrix will introduce minimal errors. The first `n0` rows and columns of `b` will be held fixed. The present implementation is an approximation: it is not optimal. Parameters ---------- b : array of shape (n,n) Array to be rearranged and truncated. na, nb : int Number of rows and columns to be held fixed. If this is `0`, then the SVD will be used: ``qa, d, qb.T = svd(b)``. """ ua, da, vat = np.linalg.svd(b[na:, :]) ub, db, vbt = np.linalg.svd(b[:, nb:]) return ua, vbt @staticmethod def _ortho(a0, b): r"""Return `(A, B, s)` such that `a1 = dot(a0, A) + dot(b, B)` is orthonormal and orthogonal to `a0`. Parameters ---------- a0 : array The columns of this must be orthonormal. b : array Need not be orthonormal. 
Returns ------- A, B : array s : array This is the list of singular values for the transformation. If these are too small, then the corresponding dimensions should be dropped. I.e. use only `a1[:, np.where(tol < s)[0]]` Examples -------- >>> np.random.seed(0) >>> a0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5] >>> b = np.random.rand(10, 2) >>> A, B, s = JinvExtendedBroyden._ortho(a0, b) >>> a1 = np.dot(a0, A) + np.dot(b, B) >>> np.allclose(np.dot(a1.T, a0), 0) True >>> np.allclose(np.dot(a1.T, a1), np.eye(2)) True Notes ----- If `a0` and `b` are big (in the first dimension), then it is best if `a0` is `F_CONTIGUOUS` (i.e. the transpose of a `C_CONTIGUOUS` array) and `b` is `C_CONTIGUOUS` to avoid :func:`np.dot` from making unneeded copies. (One copy will always be made when computing `dot(b.T, b)`). Uses the relationship .. math:: \braket{\beta|\beta} - \braket{\beta|\uvect{\alpha}_{0}} \braket{\uvect{\alpha}_{0}|\beta} &= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\ \begin{pmatrix} \ket{\uvect{\alpha}_{0}} & \ket{\uvect{\alpha}_{1}} \end{pmatrix} &= \begin{pmatrix} \ket{\uvect{\alpha}_{0}} & \ket{\beta} \end{pmatrix} \cdot \begin{pmatrix} \mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v} \mat{\sigma}^{-1}\\ \mat{0} & \mat{v}\mat{\sigma}^{-1} \end{pmatrix}. """ a0_b = dot(a0.T, b) s2, v = eigh(dot(b.T, b) - dot(a0_b.T, a0_b)) s = np.sqrt(s2) A = -np.dot(a0_b, v/s[None, :]) B = v/s[None, :] return (A, B, s) def asarray(self): """Return the Jacobian matrix. Examples -------- >>> M = _JinvExtendedBroyden() >>> M.asarray() array(1.0) """ if np.prod(self._b.shape) == 0: return np.array(1.0) _alpha = np.hstack([dot(self.G - self.X, self._qa), self._a]) if 1 == self.method: _XG = self.X else: _XG = self.G _betat = np.vstack([dot(self._qb, _XG.T), self._bt]) M = dot(dot(_alpha, self._b), _betat) return M + np.eye(self.X.shape[0], self.G.shape[0], dtype=float)
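
############################################################
# Illustrative sketch (an assumption of this edit, not part of the
# module's API): the textbook two-point Broyden rank-1 updates of the
# inverse Jacobian that the limited-memory classes in this module
# generalize.  The helper name `_broyden_rank1_update` and its
# signature are hypothetical and are not used elsewhere.
def _broyden_rank1_update(j_inv, dx, dg, minimize_inverse=False):
    r"""Return the secant (rank-1 Broyden) update of `j_inv`.

    Both choices satisfy the secant condition
    ``dot(j_inv_new, dg) == dx``.  If `minimize_inverse` is `True`,
    the update minimizes the change in the Frobenius norm of the
    inverse Jacobian (``<v| = <dg|``, the "bad" method); otherwise it
    minimizes the change in the Jacobian and the inverse is obtained
    through the Sherman-Morrison formula (``<v| = <dx|B``, the "good"
    method).

    Examples
    --------
    >>> np.random.seed(0)
    >>> J = np.random.rand(4, 4) + 4*np.eye(4)
    >>> dx = np.random.rand(4)
    >>> dg = np.dot(J, dx)
    >>> B = _broyden_rank1_update(np.eye(4), dx, dg)
    >>> np.allclose(np.dot(B, dg), dx)
    True
    """
    j_inv = np.asarray(j_inv, dtype=float)
    dx = np.asarray(dx).ravel()
    dg = np.asarray(dg).ravel()
    if minimize_inverse:
        vt = dg                    # minimizes the inverse Jacobian change
    else:
        vt = np.dot(dx, j_inv)     # minimizes the Jacobian change
    # B -> B + (|dx> - B|dg>)<v| / <v|dg>
    return j_inv + np.outer(dx - np.dot(j_inv, dg), vt)/np.dot(vt, dg)
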
class JinvBroyden(StateVars):
    r"""Class to represent an inverse Jacobian for the Generalized
    Broyden method.

    This is intended to solve root-finding problems of the form

    .. math::
       \vect{G}(\vect{x}) = 0.

    This is designed for large scale problems where it is typically
    prohibitive to store the entire Jacobian, so a limited memory
    approximation is typically made by using :attr:`dyadic` and
    :attr:`n_prev`.  As a result, we assume that the problem is
    preconditioned so that the Jacobian is well approximated by the
    identity.

    .. math::
       \mat{J} \approx \mat{1}.

    In other words, we implicitly expect that the problem is
    preconditioned in such a way that

    .. math::
       \vect{G}(\vect{x}) = \vect{x} - \vect{F}(\vect{x})

    where the iteration $\vect{x} \mapsto \vect{F}(\vect{x})$ is
    "almost" convergent.  This assumption plays out in much of the
    internal analysis about when behaviour is singular.

    The core of this is one of the following two updates to the
    inverse Jacobian:

    .. math::
       \mat{B}_{\text{1a)}} &= \mat{B}_{0} +
          (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
          \bra{\d\mat{X}}\cdot\frac{1}{
            \mat{B}_{0}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
            + w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}\cdot\mat{B}_{0}, \\
       \mat{B}_{\text{2b)}} &= \mat{B}_{0} +
          (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
          \bra{\d\mat{G}}\cdot\frac{1}{
            \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
            + w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}

    where the weights $\mat{w}$ are computed using :meth:`_get_weight`
    and the inversion of the denominator is computed using
    :meth:`_inversion`.  The displacements are computed from the
    current step $\ket{x}$ and $\ket{g}$ and the previous steps
    $\ket{\mat{X}}$ and $\ket{\mat{G}}$, of which we store the
    previous :attr:`n_prev` steps.

    .. math::
       \ket{\d\mat{X}} &= \ket{x} - \ket{\mat{X}}, \\
       \ket{\d\mat{G}} &= \ket{g} - \ket{\mat{G}}.

    Examples
    --------
    We start with a linear function.  The Jacobian should fully
    represent this.

    >>> np.random.seed(0)
    >>> A = np.random.random((3, 3))
    >>> Ainv = np.linalg.inv(A)
    >>> b = np.random.random(3)
    >>> x0 = np.random.rand(3)
    >>> def g(x):
    ...     return np.dot(A, x) + b

    For both :attr:`method` "good" (`'J'`) and "bad" (`'B'`) the
    :attr:`inv_method` `'svd'` reproduces the Jacobian once the full
    space is sampled:

    >>> B = JinvBroyden(dyadic=True, inv_method='svd')
    >>> for B.method in ['J', 'B']:
    ...     B.reset_all()
    ...     x_ = 1*x0
    ...     for n in xrange(4):
    ...         g_ = g(x_)
    ...         incompatibility = B.set_point(x=x_, g=g_)
    ...         x_ = B.get_x()
    ...     np.allclose(B.j_inv.asarray(), Ainv)
    True
    True

    The :attr:`inv_method` `'w0'` does not exactly reproduce this as
    it "remembers" the initial Jacobian $\mat{1}$, but the results are
    almost correct if `w0` is small enough:

    >>> B.inv_method = 'w0'
    >>> B.x_method = 'recent'
    >>> B.w0 = 0.0001
    >>> for B.method in ['J', 'B']:
    ...     B.reset_all()
    ...     x_ = 1*x0
    ...     for n in xrange(4):
    ...         g_ = g(x_)
    ...         incompatibility = B.set_point(x=x_, g=g_)
    ...         x_ = B.get_x()
    ...     # abs(B.j_inv.asarray() - Ainv).max()
    ...     print(np.allclose(B.j_inv.asarray(), Ainv),
    ...           np.allclose(B.j_inv.asarray(), Ainv, atol=0.001),
    ...           np.allclose(B.j_inv.asarray(), Ainv, atol=0.003))
    (False, True, True)
    (False, False, True)

    This also demonstrates that the "good" method 1a) converges better
    to the Jacobian.

    Here we start with a non-trivial Jacobian but no vectors `xs` and
    then update.  This happens, for example, with :meth:`change_basis`.
We only keep `xs` and `gs` for exact `x = g(x)` and this is violated by linear changes in the basis, so we discard the points but keep the Jacobian. >>> np.random.seed(0) >>> B = JinvBroyden(dyadic=False, inv_method='svd', method='B') >>> B0 = np.random.random((4, 4)) >>> B.j_inv = 1*B0 >>> x = np.random.random(4) >>> g = np.random.random(4) >>> B.set_point(x=x, g=g) 0.0 This does not yet modify the Jacobian because we have only one point >>> abs(B0 - B.j_inv).max() 0.0 Now we update the Jacobian >>> dx = np.random.random(4) >>> dg = np.random.random(4) >>> B.set_point(x=x+dx, g=g+dg) # doctest: +ELLIPSIS 0.79... The new Jacobian now fits the slope properly, >>> np.allclose(B.j_inv*dg[:, None], dx[:, None]) True But has not been modified in orthogonal directions (only true for :attr:`method` `'B'`). >>> dg_perp = np.linalg.qr(np.vstack([dg, np.random.random(4)]).T)[0][:, 1] >>> np.allclose(np.dot(dg_perp, dg), 0) True >>> np.allclose(np.dot(B0, dg_perp[:, None]), B.j_inv*dg_perp[:, None]) True If you want to reproduce the original Broyden algorithms, then one should set: :attr:`n_prev` to `2`, :attr:`x_method` to `'recent'`, :attr:`inv_method` to `'svd'`, and :attr:`replace_method` to `'oldest'`. """ implements(IJinv) _state_vars = [ ('curr', None,\ r"""Index of current point in :attr`:xs`. The Jacobian was updated to be valid at this point."""), ('xs', None,\ r"""Previous abscissa stored as row-vectors $\bra{\mat{X}}$"""), ('gs', None,\ r"""Previous `g = G(x)` stored as row-vectors $\bra{\mat{G}}$"""), ('n_prev', 10,\ r"Number of previous steps to store."), ('method', 'J', r"""Denotes which method of update is used in :meth:`update` (see for details)."""), ('x_method', 'recent', r"""Denotes which method is used to determine the next abscissa. The two options determine at which point :meth:`update` updates the Jacobian inverse to be valid and stores this index in :attr:`curr`. This is the point from which :meth:`get_x` takes a downhill step. See :meth:`update` for possible options. .. warning:: Methods other than `'recent'` should be used with caution. It is possible to get stuck into a loop here where the step does not update the Jacobian and so the same step is repeated over and over. See the section :ref:`sec-notes`."""), ('replace_method', 'oldest', r"""Determines which point is replaced in order to satisfy the memory-limitation :attr:`n_prev`. See :meth:`add_point` for possible values. .. warning:: Methods other than `'oldest'` should be used with caution. It is possible to get stuck into a loop here where newest point is continually replaced, making no progress. See yje section :ref:`sec-notes`."""), ('inv_method', 'svd', r"""Method used by :meth:`_inversion` to regulate singular denominators. See :meth:`_inversion` for options."""), ('min_svd', 1e-8, r"""Minimum singular value allowed in :meth:`_inversion` for the 'svd' :attr:`inv_method`. This parameter does not seem to be too sensitive although it may be in problems where the Jacobian is singular."""), ('w0', 1, r"""Weight for minimal Jacobian change. Large values reduce the influence of the previous Jacobian inverse."""), ('w_min', np.sqrt(_EPS), r"""Minimum weight (to ensure that $\mat{w}^2$ is invertible)."""), ('dyadic', False, r"""If `True`, then use a dyadic representation for Jinv. 
         If this is a number greater than 1, then this number is
         passed as :attr:`DyadicSum.n_max` limiting the rank of the
         dyadic representation."""),
        ('j_inv', 1, "Internal representation of matrix"),
        ('robust', False,
         """If `True`, then use the singular value decomposition to
         analyse the Jacobian.  Much slower, but could prevent some
         numerical problems."""),
        ('max_j_inv', 100.0,
         """If :attr:`robust`, then the inverse Jacobian is clipped
         so that the maximum value is this."""),
        ('min_j_inv', 0.0001,
         """The inverse Jacobian can become singular.  If this is
         the case, then a step might have very small length even if
         the error is not small (the step size can be reduced by a
         factor of the minimum singular value of the inverse
         Jacobian).  If the step reduction is less than this value,
         then the step is scaled up to have length
         :attr:`min_step`."""),
        ('minimize_inverse', False,
         """If `True`, then use the update that minimizes the change
         in the Frobenius norm of the inverse Jacobian, otherwise we
         use the Sherman-Morrison formula to update so as to
         minimize the norm of the Jacobian."""),
        ('incompatibility_measure', 'dg',
         r"""Specifies which criterion to use for measuring the
         compatibility of an update with the present inverse
         Jacobian.  See :attr:`incompatibility` for supported
         methods."""),
        ('check_roundoff', True,
         r"""If `True`, then do some extra calculations to check for
         roundoff error."""),
        ('roundoff_cut', 0.001,
         r"""If the estimated relative rounding errors in dg or dx
         are greater than this and :attr:`check_roundoff`, then
         these are not used."""),
        ('_j_inv_transform', NotImplemented,
         """Optional methods for updating the Jacobian inverse in
         :meth:`change_basis`.  For testing only."""),
        ('_dx_norm2_min', _EPS**2,
         r"""Used in :meth:`update`.  If the relative L2 norm
         squared of `x0 - x` is less than this, then the point `x`
         is removed from the analysis.  This prevents naked
         singularities in `dx`.

         .. warning:: This parameter is crucial to obtain good
            convergence.  If it is too large, then the nearest --
            most significant -- points will not be used, resulting
            in poor convergence.
If it is too small, then ill-condition updates to the Jacobian may be made with no information content because of round off errors."""), ('_age', None, r"""Age of the various points so that one can identify the oldest."""), ('_x0', None, "Internal storage for :attr:`x0`"), ('_g0', None, "Internal storage for :attr:`g0`"), ] process_vars() def __init__(self, *v, **kw): # pylint: disable-msg=W0231 if self.dyadic: n_max = np.inf if self.dyadic > 1: n_max = self.dyadic if self.j_inv is 1: self.j_inv = DyadicSum(n_max=n_max) elif isinstance(self.j_inv, DyadicSum): self.j_inv.n_max = n_max else: raise NotImplementedError( "No support yet for initializing a" + " DyadicSum from an array.") elif self.j_inv is not 1: self.j_inv = np.asmatrix(self.j_inv) @property def x(self): if self.xs is not None and self.curr is not None: return self.xs[self.curr] else: return None @property def g(self): if self.gs is not None and self.curr is not None: return self.gs[self.curr] else: return None @property def x0(self): if self._x0 is None: return self.x else: return self._x0 @property def g0(self): if self._g0 is None: return self.g else: return self._g0 def __mul__(self, dg): if self.j_inv is 1: return dg else: return np.asarray(self.j_inv.__mul__(dg)) def __rmul__(self, dxt): if self.j_inv is 1: return dxt else: return self.j_inv.__rmul__(dxt) def _get_weight(self, dx, dg): r"""Return `(w0, w)` where $\mat{w}$ is the non-singular weight matrix and $w_0$ is the weight for the Jacbian residual. Parameters ---------- dx, dg : array `(n, N)` arrays of previous differences $\bra{\d{X}} = \bra{x} - \bra{\mat{X}}$ (and similarly for $G$) where $\bra{x}$ is the current step and $\bra{X}$ are the previous steps. Notes ----- Currently the weights are implemented as .. math:: w_{ij} = w_{\text{max}}^{-1} \delta_{ij}\frac{1}{\braket{\d\mat{X}|\d\mat{X}}_{ii}} These are normalized so that the largest weight is 1 and all weights are larger than :attr:`w_min`. """ w = np.maximum(self.w_min, 1/(np.diag(dot(dx, dx.T))**2 + _TINY)) w = 0.0*w + 1.0; return (self.w0, np.diag(w/w.max())) def _inversion(self, dx, dg, j_inv): r"""Return `(ket_a, bra_b)` where .. math:: \ket{a} &= \ket{\d{\mat{X}}} - \mat{B}_{0}\ket{\d{\mat{G}}}\\ \bra{b} &= \bra{\mat{B}} \frac{1}{\ket{\mat{A}}\bra{\mat{B}} + w_0^2\mat{\Gamma}\mat{\Gamma}^T}\mat{C} 1a) : .. math:: \bra{\mat{B}} = \mat{w}^2\bra{\d\mat{X}}, \qquad \ket{\mat{A}} = \mat{B}_{0}\ket{\d\mat{G}}, \qquad \mat{C} = \mat{B}_{0}, 2b) : .. math:: \bra{\mat{B}} = \mat{w}^2\bra{\d\mat{G}}, \qquad \ket{\mat{A}} = \ket{\d\mat{G}}, \qquad \mat{C} = \mat{1} Parameters ---------- dx : 2-d array `(N, n)` array $\ket{\d{\mat{X}}}$. dg : 2-d array `(N, n)` array $\ket{\d{\mat{G}}}$. j_inv : operator `(N, N)` operator $\mat{B}_{0}$ representing the previous inverse Jacobian. This is used to compute the residue and to implement the 1a) `w0` method. Notes ----- The following methods permitted as specified by :attr:`inv_method`: `'svd'` : Subspace decomposition based on the singular value decomposition and corresponds to the choice .. math:: \mat{\Gamma} = \ket{\uvect{B}_{\perp}} which is the orthonormal complement to the span $\ket{\uvect{B}}$ of $\ket{\mat{B}}$. In the basis $\left(\ket{\uvect{B}}, \ket{\uvect{B}_{\perp}}\right)$, the matrix $\ket{\mat{A}}\bra{\mat{B}}$ is block lower triangular: .. math:: \begin{pmatrix} \braket{\uvect{B}|\mat{A}} \braket{\mat{B}|\uvect{B}} & \mat{0}\\ \braket{\uvect{B}_{\perp}|\mat{\tilde{A}}} \braket{\mat{B}|\uvect{B}} & w_{0}^2\mat{1} \end{pmatrix}^{-1}. 
Hence the inverse is also block lower triangular so that the action of $\bra{\mat{B}}$ will project out only the upper component and we can ignore $w_{0}$, writing: .. math:: \bra{b} &= \braket{\mat{B}|\mat{A}}^{-1} \bra{\mat{B}}\mat{C}\\ \bra{b}_{\text{1a}} &= \left( \mat{w}^2\braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}} \right)^{-1} \mat{w}^2\bra{\d\mat{X}}\mat{B}_{0} = \braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}}^{-1} \bra{\d\mat{X}}\mat{B}_{0}\\ \bra{b}_{\text{2b}} &= \left( \mat{w}^2\braket{\d\mat{G}|\d\mat{G}} \right)^{-1} \mat{w}^2\bra{\d\mat{G}} = \braket{\d\mat{G}|\d\mat{G}}^{-1} \bra{\d\mat{G}} This makes it clear that the weights are irrelevant in this method: They are only significant if the system is over-determined or for restricting the subspace. `'w0'` : Here we set $\mat{\Gamma} = w_0\mat{1}$. This is what is done in [Johnson:1988]_. The inversion formula becomes: .. math:: \bra{b} &= \frac{1}{ \braket{\mat{B}|\mat{B}_{0}|\mat{A}} + w_0^2\mat{1}} \bra{\mat{B}}\mat{C}\\ \bra{b}_{\text{1a}} &= \frac{1}{ \mat{w}^2\braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}} + w_0^2\mat{1}} \mat{w}^2\bra{\d\mat{X}}\mat{B}_{0} = \frac{1}{ \braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}} + w_0^2\mat{w}^{-2}}\bra{\d\mat{X}}\mat{B}_{0}\\ \bra{b}_{2b} &= \frac{1}{ \mat{w}^2\braket{\d\mat{G}|\d\mat{G}} + w_0^2\mat{1}} \mat{w}^2\bra{\d\mat{G}} = \frac{1}{ \braket{\d\mat{G}|\d\mat{G}} + w_0^2\mat{w}^{-2}}\bra{\d\mat{G}} `'numeric'` : This implements the numerical method. This is the same as the `'w0'` method except that `w0` is chosen to be as small as possible to ensure that the smallest singular value of the denominator is larger than :attr:`min_svd` times the largest. .. note:: A desirable property is that the update work for perfectly linear models in the sense that, give a linear model: $\ket{\mat{X}} = \ket{\mat{X}_0} + \mat{J}^{-1}\ket{\mat{G}}$, the update will satisfy $\mat{B}\ket{\mat{G}} = \mat{J}^{-1}\ket{\mat{G}}$: In other words, the action of $\mat{B}$ and $\mat{A}$ are the same on the subspace $\ket{\uvect{G}}$ spanned by $\ket{\d\mat{G}}$. This require that: .. math:: \ket{\mat{G}}\braket{b|\uvect{G}} = \ket{\mat{G}}\bra{\mat{B}} \frac{1}{\ket{\mat{A}}\bra{\mat{B}} + w_0^2\mat{\Gamma}\mat{\Gamma}^{T}}\mat{C} \ket{\uvect{G}} = \ket{\uvect{G}} which is satisfied by both `'svd'` methods but only approximated by the `'w0'` method in the limit $w_{0} \rightarrow 0$. One potential problem is that the weights might be very small, cause a premature truncation of the SVD, however, a separate decomposition of the components fails to preserve the desired behaviour: >>> np.random.seed(0) >>> N = 10; n=5; m = 1 >>> A = np.mat(np.random.rand(N, N)) >>> B0 = np.mat(np.random.rand(N, N)) >>> B0 = 0*np.eye(N) >>> G = np.random.rand(N, n-m) >>> G = np.mat(np.dot(np.hstack((G, ) + (G[:, :1], )*m), ... np.random.rand(n, n))) >>> Ug, dg, VTg = np.linalg.svd(G); Ug = Ug[:, :n]; VTg = VTg[:n, :] >>> Pg = Ug[:, :n-m] >>> X = A*G >>> b = X >>> a = G >>> w = np.mat(np.diag(np.random.rand(n))) >>> wi = np.linalg.inv(w) >>> Qa, Ra = np.linalg.qr(a) >>> Qb, Rb = np.linalg.qr(b) >>> U, d, VT = np.linalg.svd(Ra*w*Rb.T) >>> di = np.where(d > 1e-12, 1/d, 0) # Singular portions. >>> Inv1 = Qb*VT.T*np.diag(di)*U.T*Qa.T >>> np.allclose(G*w*b.T*Inv1*Pg, Pg) True >>> Ua, da, VTa = np.linalg.svd(a); Ua = Ua[:, :n]; VTa = VTa[:n, :] >>> Ub, db, VTb = np.linalg.svd(b); Ub = Ub[:, :n]; VTb = VTb[:n, :] >>> dai = np.where(da > 1e-12, 1/da, 0) # Singular portions. >>> dbi = np.where(db > 1e-12, 1/db, 0) # Singular portions. 
>>> Inv2 = Ub*np.diag(dbi)*VTb*wi*VTa.T*np.diag(dai)*Ua.T >>> np.allclose(G*w*b.T*Inv2*Pg, Pg) False """ w0, w = self._get_weight(dx=dx, dg=dg) w2 = dot(w.T, w) at = dg.T if 'J' == self.method: # Broyden's "good" method if self.dyadic: b = j_inv.__rmul__(dx) else: b = dot(dx, j_inv) elif 'B' == self.method: # Broyden's "bad" method b = dg else: raise ValueError("Method 'method' = %s not supported." % (self.method, )) # Include weights b = np.asarray(dot(w2, b)) if 'svd' == self.inv_method: # Should optimize so that copies are not made. u, d, vt = np.linalg.svd(dot(b, at)) d_inv = np.where(d/d.max() < self.min_svd, 0, 1/d) inv = dot(vt.T*d_inv, u.T) bra_b = dot(inv, b) elif self.inv_method in ['w0', 'numeric']: # Johnson's method den = dot(b, at) if 'numeric' == self.inv_method: eigs = np.linalg.eigvals(den) w02 = self.min_svd*abs(eigs).max() - min(0, eigs.real.min()) else: w02 = w0**2 den += w02*np.eye(len(b)) bra_b = np.linalg.solve(den, b) else: raise ValueError("Method 'inv_method' = %s not supported." % (self.inv_method, )) ket_a = dx.T - np.asarray(j_inv*np.mat(dg.T)) return ket_a, bra_b def add_point(self, x, g): r"""Add point `(x, g)` to :attr:`xs` and :attr:`gs` and return the index. Notes ----- If using a memory limited version, only :attr:`n_prev` points are maintained. The order in which points are replaced is specified by :attr:`replace_method` which may have one of the following values: `'furthest'`: Replace the point that is furthest from `x`, thus keeping points close to the solution. `'largest'`: Replace the point that has the largest $\norm{g}_{2}$, thus keeping points close to the solution. `'oldest'`: Replace the oldest point, thus keeping the most recent points. Examples -------- >>> np.random.seed(0) >>> B = JinvBroyden() >>> x0 = np.random.random(3) >>> g0 = 2*x0 >>> B.add_point(x=x0, g=g0) 0 >>> B.add_point(x=2*x0, g=2*g0) 1 Adding a duplicate `x` value will just change `g`: >>> len(B.xs) 2 >>> B.add_point(x=x0, g=g0/2) 0 >>> len(B.xs) 2 >>> np.allclose(B.gs[0], g0/2) True If adding the point would make more than :attr:`n_prev` points, then either the `'furthest'` point from `x` or the point with `'largest'` `g` is replaced, as specified by the flag :attr:`replace_method`. >>> B.n_prev = 2 >>> B.replace_method = 'furthest' >>> B.add_point(x=3*x0, g=0*g0) 0 >>> B.replace_method = 'largest' >>> B.add_point(x=3.1*x0, g=1*g0) 1 To reproduce the original Broyden behaviour, one should use the `'oldest'` :attr:`replace_method`: >>> B = JinvBroyden(replace_method='oldest', n_prev=3) >>> for n in xrange(5): ... B.add_point(x=n*x0, g=n*g0), B._age (0, array([0])) (1, array([1, 0])) (2, array([2, 1, 0])) (0, array([0, 2, 1])) (1, array([1, 0, 2])) Here is an example of the original Broyden behaviour (these results were precomputed using original Broyden routines): >>> def g(x): return np.array([np.linalg.norm(x), x.sum()]) >>> B = JinvBroyden(dyadic=False, method='J', n_prev=2, ... x_method='recent', replace_method='oldest', ... inv_method='svd') >>> x = np.array([1, 2]) >>> for n in xrange(12): ... incompat = B.set_point(x=x, g=g(x)) ... x = B.get_x() ... print x [-1.23606798 -1. ] [-2.53373794 0.82504005] [ 4.85954425 -4.07499765] [-10.4372347 11.47369596] [ 14.52239974 -14.63948627] [-78.27732015 77.51003951] [ 35.85727402 -35.8634207 ] [ 132.21082218 -132.33155288] [ 0.00993506 -0.01682379] [-0.00230879 0.00360778] [-0.00615491 0.0056663 ] [-0.00027107 -0.00017255] >>> B = JinvBroyden(dyadic=False, method='B', n_prev=2, ... x_method='recent', replace_method='oldest', ... 
inv_method='svd') >>> x = np.array([1, 2]) >>> for n in xrange(12): ... incompat = B.set_point(x=x, g=g(x)) ... x = B.get_x() ... print x [-1.23606798 -1. ] [-2.21588124 0.37800712] [-1.74692251 0.06694124] [-1.03087007 0.59532897] [ 0.73055565 -0.48593282] [ 2.35041873 -2.10818778] [-0.27073117 0.2984435 ] [-0.64849785 0.64808622] [-0.0555911 0.06113293] [-0.00685355 0.00753659] [ 2.20253269e-07 -2.42442632e-07] [ 5.12344665e-07 -5.17238905e-07] >>> B = JinvBroyden(dyadic=False, method='J', n_prev=3, ... x_method='recent', ... inv_method='svd') >>> x = np.array([1, 2]) >>> for n in xrange(5): ... incompat = B.set_point(x=x, g=g(x)) ... x = B.get_x() ... print x [-1.23606798 -1. ] [-2.53373794 0.82504005] [ 2.53310799 -2.53310799] [ 15.48393424 -15.48393424] [ ...e-12 ...e-12] """ # Special case of empty xs if self.xs is None: self.xs = 1.0*x[None, :] self.gs = 1.0*g[None, :] self._age = np.array([0]) return 0 # Increase ages: self._age += 1 # Special case of duplicate point ind = (x == self.xs).all(axis=1) if ind.any(): # Don't add duplicate points, but change g and reset age. ind = np.where(ind)[0][0] self.gs[ind] = g self._age[ind] = 0 return ind # Regular case if len(self.xs) < self.n_prev: self.xs = np.vstack([self.xs, x]) self.gs = np.vstack([self.gs, g]) ind = len(self.xs) - 1 self._age = np.append(self._age, 0) else: if self.replace_method == 'furthest': dx = x - self.xs ind = np.argmax((dx*dx).sum(axis=1)) del dx elif self.replace_method == 'largest': ind = np.argmax((self.gs*self.gs).sum(axis=1)) elif self.replace_method == 'oldest': ind = np.argmax(self._age) else: raise ValueError( "Unsupported replace_method = %s" % (self.replace_method,)) self.xs[ind] = x self.gs[ind] = g self._age[ind] = 0 return ind def update(self, x0=None, g0=None, j_inv=None, skip_inds=None, include_skip=False, mags=1, warn=False): r"""Update the inverse Jacobian so that it is valid at the point `(x0, g0)` (use :attr:`x` and :attr:`g` by default). Does not necessarily add `x0` and `g0` (use :meth:`add_point`) or set the current point (use :meth:`set_point`) but may do either or both depending on the algorithm. The update method is governed by :attr:`method` which may be one of the following: 'J' : Update to minimize the Jacobian and related residues. This corresponds to Broyden's "good" algorithm. 'B' : Update to minimize the inverse Jacobian and related residues. This corresponds to Broyden's "bad" algorithm. Parameters ---------- x0, g0 : array New function value `g0 = G(x0)`. skip_inds : None, array, optional This is an array of the indices of sensitive parameters. The change in these indices will be ignored. This is important if there are "sensitive" or "noisy" parameters that should only be trusted once a certain degree of convergence has been achieved. include_skip : bool If `True` and `skip_inds` is provided, then cross-terms are ignored, but both the main and skip blocks are updated. mags : array, optional This should be set to the vector of typical magnitudes for `G`. One option is `abs(G0) + abs(G1)`. Assumed to be `1` if not provided. Notes ----- The inverse Jacobian is updated so that .. math:: \mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}}. There are two senses in which the update can be considered as minimal controlled by the flag :attr:`minimize_inverse`. If this is `True`, we minimize :math:`\norm{J^{-1}_0 - J^{-1}_1}_{\text{Frobenius}}`: .. 
math::
           J^{-1} \rightarrow
             J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}}
                           {\braket{\d{G}|\d{G}}},

        otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`:

        .. math::
           J \rightarrow
             J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}}
                      {\braket{\d{x}|\d{x}}}.

        This can be analytically inverted to give an update for the
        inverse:

        .. math::
           J^{-1} \rightarrow J^{-1}
             + \frac{\left(\ket{\d{x}} - J^{-1}\ket{\d{G}}\right)
                     \bra{\d{x}}J^{-1}}
                    {\braket{\d{x}|J^{-1}|\d{G}}}.

        Any update that satisfies the secant condition can be
        expressed as:

        .. math::
           J^{-1} \rightarrow J^{-1}
             + \frac{\left(\ket{\d{x}} - J^{-1}\ket{\d{G}}\right)\bra{v}}
                    {\braket{v|\d{G}}}.

        as long as :math:`\braket{v|\d{G}}` is not zero.  The two
        conditions correspond to different choices of either
        :math:`\bra{v}=\bra{\d{G}}`, which minimizes the inverse
        Jacobian norm (Broyden's "bad" method), or
        :math:`\bra{v}=\bra{\d{x}}J^{-1}`, which minimizes the
        Jacobian norm (Broyden's "good" method).  Numerical Recipes
        has the latter but I am not sure it is better... It is more
        complicated.

        Examples
        --------
        >>> B = JinvBroyden()
        >>> B.set_point([1, 2], [1, 2])
        0.0
        >>> B.j_inv
        1
        >>> B.set_point([3, 4], [3, 4])
        0.0

        This should change g but not add a new x

        >>> B.set_point([1, 2], [4, 5])   # doctest: +ELLIPSIS
        1.06...
        >>> B.xs
        array([[ 1.,  2.],
               [ 3.,  4.]])
        >>> B.gs
        array([[ 4.,  5.],
               [ 3.,  4.]])
        """
        if x0 is None:
            x0 = self.x
            g0 = self.g
            self._x0 = None
            self._g0 = None
        else:
            x0 = self._x0 = np.asarray(x0).ravel()
            g0 = self._g0 = np.asarray(g0).ravel()

        j_inv = self.j_inv
        if j_inv is 1:
            j_inv = np.eye(len(x0), dtype=float)

        #if skip_inds is not None:
        #    skip_inds = np.asarray(skip_inds, dtype=int)
        #    if include_skip:
        #        dx_skip = 1*dx
        #        dg_skip = 1*dg
        #    dg[skip_inds] = 0   # Upper block
        #    dx[skip_inds] = 0
        #    dx_dgs = [(dx, dg)]
        #    if include_skip:
        #        dx_skip -= dx   # Skip block
        #        dg_skip -= dg
        #        dx_dgs.append((dx_skip, dg_skip))
        #else:
        #    dx_dgs = [(dx, dg)]

        # Add points
        if self.xs is None:
            # Initial update.  Nothing to do.
            return

        x0 = x0[None, :]
        g0 = g0[None, :]
        dx = self.xs - x0
        dg = self.gs - g0
        dx_norms = (dx*dx).sum(axis=1)/(x0*x0).sum(axis=1)
        is_ = np.where(dx_norms >= self._dx_norm2_min)[0]
        if 0 == len(is_):
            # Do nothing.
            return
            #raise ValueError(
            #    "No points to update! " +
            #    "max(dx_norms) = %g < _dx_norm2_min = %g" %
            #    (dx_norms.max(), self._dx_norm2_min))

        dx = dx[is_, :]
        dg = dg[is_, :]
        dx_norms = dx_norms[is_]

        if self.check_roundoff:
            # Check calculation for roundoff errors.
            x_err = _EPS/np.sqrt(dx_norms + _TINY)
            dg_norms = (dg*dg).sum(axis=1)/(g0*g0).sum(axis=1)
            g_err = _EPS/np.sqrt(dg_norms + _TINY)
            if (x_err > self.roundoff_cut).any() and warn:
                print("Warning: %i roundoff errors in dx" %
                      (x_err > self.roundoff_cut).sum())
            if (g_err > self.roundoff_cut).any() and warn:
                print("Warning: %i roundoff errors in dg" %
                      (g_err > self.roundoff_cut).sum())
            is_ = np.where(np.logical_and(x_err <= self.roundoff_cut,
                                          g_err <= self.roundoff_cut))[0]
            dx = dx[is_, :]
            dg = dg[is_, :]

        # B -> B + ket_a*bra_b
        ket_a, bra_b = self._inversion(dx=dx, dg=dg, j_inv=j_inv)
        self.j_inv = j_inv
        self.add_dyad(at=ket_a, b=bra_b)

    def set_point(self, x, g, **kw):
        r"""Add `x` and `g` and make this the current point `(x, g)`
        and update the Jacobian to be valid here.  Return a measure
        of the incompatibility of the new data with the current
        approximation.

        This need not "add" the point to the current data set (the
        functionality provided by :meth:`add_point`).

        Parameters
        ----------
        x, g : array
           New function value `g = G(x)`.
        **kw :
           See :meth:`update`.
As a measure of the significance of this correction, we return the following dimensionless quantity: .. math:: \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}} Examples -------- >>> B = JinvBroyden() >>> B.set_point([1, 2], [1, 2]) 0.0 >>> B.j_inv 1 >>> B.set_point([3, 4], [3, 4]) 0.0 This should change g but not add a new x >>> B.set_point([1, 2], [4, 5]) # doctest: +ELLIPSIS 1.06... >>> B.xs array([[ 1., 2.], [ 3., 4.]]) >>> B.gs array([[ 4., 5.], [ 3., 4.]]) """ x0 = self.x g0 = self.g x = np.asarray(x).ravel() g = np.asarray(g).ravel() self.curr = self.add_point(x, g) self.update(**kw) return self.incompatibility(x, g, x0, g0) def add_dyad(self, at, b): r"""Add the dyad $\ket{a}\bra{b}$ to :attr:`j_inv`. Parameters ---------- at : array 1-d array or 2-d `(N, n)` array $\ket{a}$. b : array 1-d array or 2-d `(n, N)` array $\ket{b}$. """ at = np.asarray(at) b = np.asarray(b) if 1 == len(at.shape): at = at[:, None] if 1 == len(b.shape): b = b[None, :] if self.dyadic: self.j_inv.add_dyad(at=at, b=b) else: self.j_inv += dot(at, b) def get_x(self, x_controls=None, min_step=_EPS, max_step=np.inf): r"""Return `x`: the Broyden step to get to the root of `G(x)`. This is the downhill Newton step from the point specified by :attr:`curr` that satisfies the constraints `x_controls`. Parameters ---------- x_controls : (array, array), optional Tuple `(inds, vals)` indices and the respective values of the control parameters. min_step : float, optional Minimum step length for testing. Used to give some non-zero step `dx` of length `sqrt(min_step) even if `j_inv` is singular. This may be random. A warning will be raised. max_step : float, optional Maximum length of the step. Examples -------- Here is a little test with a linear function >>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]]) >>> j_inv = JinvBroyden(j_inv=np.linalg.inv(j)) >>> sol = np.array([1, 2, 3]) >>> def G(x): ... return np.dot(j, x - sol) >>> x0 = np.array([0.0, 0.0, 0.0]) >>> x = np.array([1.0, 2.0, 3.0]) Now try to step to the solution, but hold the second variable as a control at 1. Since the function is exactly linear and we have explicitly given the correct Jacobian, we should end up with a root in the other two parameters. >>> j_inv.set_point(x=x, g=G(x)) 0.0 >>> x = j_inv.get_x(x_controls=([1], [1.0])) >>> x array([ 1.5, 1. , 3.5]) >>> np.allclose(G(x), [0., 4.5, 0.]) True This needs to work with very large arrays, so here is an example. Here we use an example where the vectors have length $2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly, the arrays would be too large to be indexed. 
>>> np.random.seed(0) >>> n_x = 2**20 >>> n_d = 2**11 >>> a = np.random.random(n_x) >>> b = np.random.random(n_x) >>> g = np.random.random(n_x) >>> x = np.zeros(n_x) >>> inds = np.unique(np.random.randint(0, n_x, size=n_d)) >>> n_d = len(inds) >>> vals = np.random.rand(n_d) >>> x_controls = (inds, vals) >>> j_inv = JinvBroyden(dyadic=True) >>> j_inv.j_inv.add_dyad(a, b) >>> j_inv.set_point(x=a, g=b) 0.0 >>> x = j_inv.get_x(x_controls=x_controls) >>> np.allclose(x[inds], vals) True """ if self.curr is None or 0 == len(self.xs): raise ValueError( "No `xs` stored: call update first.") x = self.xs g = self.gs x = self.xs[self.curr] g = self.gs[self.curr] j_inv = self.j_inv if j_inv is 1 or self.dyadic and 0 == len(self.j_inv.at): dx = -g if x_controls is not None and 0 < len(x_controls[0]): dx[x_controls[0]] = x_controls[1] - x[x_controls[0]] else: if x_controls is not None and 0 < len(x_controls[0]): inds, vals = x_controls inds = np.asarray(inds, dtype=int) vals = np.asarray(vals) dg_dmu = copy.copy(g) n_d = len(inds) n_x = len(x) d = j_inv[inds[:, None], inds[None, :]] dg_dmu[inds] = x[inds] - vals dmu = _ket(dg_dmu[inds]) b = (np.asarray(j_inv*_ket(dg_dmu)))[inds, :] dN = dmu - np.linalg.solve(d, b - dmu) dg_dN = dg_dmu dg_dN[inds] = dN[:] dx_dmu = np.asarray(j_inv*_ket(dg_dN)) assert np.allclose(dx_dmu[inds], dmu) # Set these to make sure no round-off errors have # accumulated. dx_dmu[inds].flat = x[inds] - vals dx = -dx_dmu else: dx = -np.asarray(j_inv*_ket(g)) dx = np.asarray(dx).ravel() g_mag = np.linalg.norm(g) dx_mag = np.linalg.norm(dx) if dx_mag == 0 and g_mag > 0: warnings.warn("Warning: Singular Jinv, taking random step") dx = np.random.rand(len(dx))*np.sqrt(min_step) elif np.divide(dx_mag, g_mag) < self.min_j_inv: warnings.warn("Warning: Singular Jinv, taking minimum step") dx *= math.sqrt(min_step)/np.linalg.norm(dx) dx_len = math.sqrt((dx*dx).sum()) if max_step < dx_len: dx *= max_step/dx_len return x + dx def reset(self): r"""Reset the Jacobian but keep :attr:`xs` and :attr:`gs`. See Also -------- :meth:`reset_all` """ if self.dyadic: self.j_inv.reset() else: self.j_inv = 1 def reset_all(self): r"""Reset both the points :attr:`xs`, :attr:`gs`, and the Jacobian. See Also -------- :meth:`reset` """ self.suspend() self.xs = None self.gs = None self.curr = None self.reset() self.resume() @property def J(self): r"""Current approximation of Jacobian.""" if self.dyadic: j_inv = self._j_inv.asarray() else: j_inv = self._j_inv return np.linalg.inv(j_inv) def incompatibility(self, x, g, x0, g0, skip_inds=None, complement=False): r"""Return a dimensionless measure of the compatibility of the specified step with the current Jacobian. An incompatibility of 0 means perfectly compatible (this is what should happen if the function is linear and the Jacobian is correct along the specified step). There are several options here as specified by :attr:`incompatibility_measure`. `dg` : This option considers the change in the inverse derivative the direction `dg` .. math:: \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}} `sigma` : This option is valid with the :attr:`dyadic` option and only considers the incompatibility in the space spanned by the dyads. Examples -------- >>> def g(x): ... return x*x + x*np.sin(x.sum()) >>> j = JinvBroyden() >>> x0 = np.zeros(10) >>> g0 = g(x0) >>> j.set_point(x=x0, g=g0) 0.0 >>> x1 = np.ones(10) >>> g1 = g(x1) >>> j.incompatibility(x=x1, g=g1, x0=x0, g0=g0) # doctest: +ELLIPSIS 0.377... 
        This does not modify the Jacobian as can be noted by the fact
        that update returns the same incompatibility

        >>> j.set_point(x=x1, g=g1)   # doctest: +ELLIPSIS
        0.377...

        After an update, the step should be fully compatible:

        >>> j.incompatibility(x=x1, g=g1, x0=x0, g0=g0) < 1e-12
        True
        """
        if self.xs is None:
            # If there are no other points, then consider the update
            # compatible
            return 0

        dx = x[None, :] - self.xs
        dg = g[None, :] - self.gs

        if skip_inds is not None:
            skip_inds = np.asarray(skip_inds)
            if complement:
                dx_ = 1*dx
                dg_ = 1*dg
                dx_[skip_inds] = 0
                dg_[skip_inds] = 0
                dx -= dx_
                dg -= dg_
            else:
                dx[skip_inds] = 0
                dg[skip_inds] = 0

        ng = np.linalg.norm(dg)
        if 0 == ng:
            res = np.linalg.norm(dx)/(ng + _TINY)
        elif 'dg' == self.incompatibility_measure:
            nx = np.linalg.norm(dx)
            dg = dg/(ng + _TINY)
            dx = dx/(ng + _TINY)
            dx1 = self.__rmul__(dg)
            if skip_inds is not None:
                if complement:
                    dx1_ = 1*dx1
                    dx1_[skip_inds] = 0
                    dx1 -= dx1_
                else:
                    dx1[skip_inds] = 0
            res = (np.linalg.norm(dx - dx1)/
                   (max(np.linalg.norm(dx1), nx) + _TINY))
        elif 'sigma' == self.incompatibility_measure and self.dyadic:
            raise NotImplementedError(
                "incompatibility_measure '%s' not implemented." %
                (self.incompatibility_measure, ))
        else:
            raise NotImplementedError(
                "incompatibility_measure '%s' not implemented." %
                (self.incompatibility_measure, ))
        return res

    def change_basis(self, t):
        r"""Use the functions `x_new = t(x)` and `g_new = t(g)` to
        update the data corresponding to a change of basis.  Raise an
        :exc:`AttributeError` if the change cannot be implemented (or
        don't define this method.)

        Parameters
        ----------
        t : function
           Change of basis transformation (should be essentially
           matrix multiplication, but for large bases one might want
           to prevent forming this matrix explicitly.)

        Notes
        -----
        Let the function be represented by the set of points
        :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of abscissa
        and :math:`f^{N}_j = f(x^{N}_j)` on the larger set.  The
        transformation is of the form:

        .. math:: f^N_i = T_{ij}f^{n}_{j}

        We update the inverse Jacobian in the following way:

        .. math:: J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T}

        Several other options have been considered, but in practice
        this has proved to be not only the easiest to implement, but
        also the most efficient.  If we do not add and subtract the
        identity, then the resultant inverse Jacobian will be rank
        deficient and thus the resulting steps will never explore the
        singular directions which is highly undesirable.  One can
        also think of this as directly transforming the inverse
        Jacobian related to the fixed-point problem :math:`f
        \rightarrow f - G[f]`.

        Also, if using the dyadic representation, then this procedure
        only requires applying the linear interpolation operator to
        the components of the dyads: Transforming the identity
        portion would change the form requiring a significantly more
        complicated representation.

        We have also explored using :math:`T^{-1}` in place of
        :math:`T^T`, but this has several problems: 1) Computing
        :math:`T^{-1}` can be expensive if the basis is large, 2)
        Since :math:`T` is rank deficient, :math:`T^{-1}` is not
        unique, but must be chosen from one of the pseudo-inverses
        and it is not clear how to best do this, 3) For the dyadic
        representation, we would like to work directly with the
        interpolation operator on both components of the dyads, so
        using the transpose is much easier than having to solve the
        inverse problem.
If :math:`T^T\cdot T \approx \mat{1}`, then the interpretation of the transformation is this: We have built up the Jacobian inverse as minimal updates from the identity using the displacements :math:`\ket{\d{x}}` and :math:`\ket{\d{G}}`. The transformed inverse Jacobian using the above proceedure is what would have been obtain by minimal updates from the identity using the transformed displacements :math:`T\ket{\d{x}}` and :math:`T\ket{\d{G}}`. Thus, the transformed Jacobian inverse should have the same amount of information as the original, and should in general reduce the number of operations required to form a good approximation by the rank of :math:`J^{-1} - \mat{1}`. It may be beneficial to choose the interpolation operator such that :math:`T^T\cdot T = \mat{1}` over the smaller basis. We have not investigated this, but standard interpolation operators seem sufficiently close to the identity that this extra complication is probably not justified. The code contains additional update methods for testing purposes to allow these results to be verified (except for the dyadic representation which can only be easily transformed using this method). Here we present results for a few problems. >>> from scipy.interpolate import * >>> from mmf.objects import * >>> def G(f, x, normalize=True): ... c1 = np.cosh(1) ... dx = np.diff(x[1:]) ... ddf = np.diff(np.diff(f))/dx/dx ... a = 1 ... if normalize: ... a = 1 + 2./dx/dx ... G = np.hstack([[f[0] - c1], ... (f[1:-1] - ddf)/a, ... [f[-1] - c1]]) ... return G >>> def test(N1, N2, normalize, meth, k=3, tol=1e-14): ... '''Return `(i1, i2, iN, gain)`. `i1` is the number of ... steps at low resolution. `i2` is the subsequent ... number of steps required to polish the ... high-resolution solution. `iN` is the number of ... steps at only the high resolution, and `gain = i1+i2 ... - iN` is the gain in speed. (Negative is good)''' ... x1 = np.linspace(-1, 1, N1) ... x2 = np.linspace(-1, 1, N2) ... #def t(f): ... # 'The interpolation function' ... # return splev(x2, splrep(x1, f, s=0, k=k)) ... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k)) ... f = 0*x1 # Initial guess ... g = G(f, x1, normalize) ... j_inv = JinvBroyden(_j_inv_transform=meth) ... j_inv.set_point(x=f, g=g) ... err = np.max(abs(g)) ... i1 = 1 ... while err > tol: ... # Find solution over x1 abscissa ... f0, g0 = f, g ... f = j_inv.get_x() ... g = G(f, x1, normalize) ... err = np.max(abs(g)) ... j_inv.set_point(x=f, g=g) ... i1 += 1 ... j_inv.change_basis(t) # Change basis ... f1 = t(f) ... g1 = G(f1, x2, normalize) ... j_inv.set_point(x=f1, g=g1) ... err = np.max(abs(g1)) ... i2 = 1 ... while err > tol: ... # Find solution over x2 abscissa ... f0, g0 = f1, g1 ... f1 = j_inv.get_x() ... g1 = G(f1, x2, normalize) ... err = np.max(abs(g1)) ... j_inv.set_point(x=f1, g=g1) ... i2 += 1 ... # Now try only at high resolution. ... f1 = 0*f1 ... g1 = G(f1, x2, normalize) ... j_inv = JinvBroyden(_j_inv_transform=meth) ... j_inv.set_point(x=f1, g=g1) ... err = np.max(abs(g1)) ... i3 = 1 ... while err > tol: ... # Find solution over x2 abscissa ... f0, g0 = f1, g1 ... f1 = j_inv.get_x() ... g1 = G(f1, x2, normalize) ... err = np.max(abs(g1)) ... j_inv.set_point(x=f1, g=g1) ... i3 += 1 ... return i1, i2, i3, (i1+i2) - i3 >>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10): ... meth = Container() ... for meth.inv_l in [0, 1]: ... for meth.inv_r in [0, 1]: ... for meth.set_diag in [0, 1]: ... res = test(N1, N2, normalize, ... k=k, tol=tol, ... meth=dict(meth.items())) ... 
print repr(meth), res >>> tabulate(10, 50, k=1, tol=1e-12) Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=1, inv_l=0) (7, 39, 29, 17) Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 29, 18) Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 29, 19) Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=1, inv_l=0) (7, 40, 29, 18) Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 29, 18) Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 29, 19) Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=1, inv_l=0) (7, 42, 29, 20) Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=0, inv_l=1) (7, 42, 29, 20) Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=1, inv_l=1) (7, 43, 29, 21) Container(set_diag=0, inv_r=0, inv_l=0) (7, 30, 30, 7) Container(set_diag=1, inv_r=0, inv_l=0) (7, 30, 30, 7) Container(set_diag=0, inv_r=1, inv_l=0) (7, 31, 30, 8) Container(set_diag=1, inv_r=1, inv_l=0) (7, 38, 30, 15) Container(set_diag=0, inv_r=0, inv_l=1) (7, 32, 30, 9) Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 30, 17) Container(set_diag=0, inv_r=1, inv_l=1) (7, 31, 30, 8) Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 30, 18) The results are as follows: First the number of iterations performed on the small system. Second is the number of iterations performed on the system with the interpolation applied to the Jacobian. The sum of these indicates how many total iterations were required to converge in the two-stage process. Third is the number of iterations required to converge to the final solution using only the large system. The last number is the improvement of the two-stage method (negative is good). Here are the old results with the old Broyden method:: Container(set_diag=0, inv_r=0, inv_l=0) (9, 66, 109, -34) Container(set_diag=1, inv_r=0, inv_l=0) (9, 68, 109, -32) Container(set_diag=0, inv_r=1, inv_l=0) (9, 95, 109, -5) Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 109, -30) Container(set_diag=0, inv_r=0, inv_l=1) (9, 99, 109, -1) Container(set_diag=1, inv_r=0, inv_l=1) (9, 96, 109, -4) Container(set_diag=0, inv_r=1, inv_l=1) (9, 77, 109, -23) Container(set_diag=1, inv_r=1, inv_l=1) (9, 71, 109, -29) """ self.suspend() j_inv = self.j_inv # We won't transform the points and function values because # the function is not linear, so this would make the values # inaccurate. Instead, we reset these. self.xs = None self.gs = None # Now apply transform to the Jacobian. if j_inv is 1: return if self.dyadic: j_inv.apply_transform(t) else: if not hasattr(self, '_j_inv_transform'): # One could also apply t directly to the columns and # then rows of j_inv - I but I don't think this will be # any faster. 
N, N = j_inv.shape I = np.eye(N) t_mat = np.mat(np.vstack([t(I[n, :]) for n in xrange(N)]).T) M, N = t_mat.shape j_inv = np.eye(M) + t_mat*(j_inv - I)*t_mat.T else: d = np.diag(j_inv) N, N = j_inv.shape I = np.eye(N) t_mat = np.mat(np.vstack([t(I[n, :]) for n in xrange(N)]).T) ti_mat = np.linalg.pinv(t_mat) if self._j_inv_transform.get('inv_l', False): tl = ti_mat.T else: tl = t_mat if self._j_inv_transform.get('inv_r', False): tr = ti_mat else: tr = t_mat.T if self._j_inv_transform.get('direct', False): j_inv = tl*j_inv*tr else: M, N = t_mat.shape j_inv = np.eye(M) + tl*(j_inv - I)*tr if self._j_inv_transform.get('set_diag', False): m = np.arange(j_inv.shape[0]) j_inv[m, m] = t(d) self.j_inv = j_inv self.resume()
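
############################################################
# Illustrative sketch (an assumption of this edit, not used by the
# module): a stripped-down version of the denominator regularization
# described in :meth:`JinvBroyden._inversion`, omitting the weights
# `w**2`.  Here `b` holds the rows <B| and `at` the columns |A>; the
# helper name `_regularized_bra_b` is hypothetical, and the defaults
# mirror :attr:`JinvBroyden.w0` and :attr:`JinvBroyden.min_svd`.
def _regularized_bra_b(b, at, inv_method='svd', w0=1.0, min_svd=1e-8):
    """Return <b| ~ <B| (|A><B| + regularization)^{-1} for a toy case."""
    b = np.asarray(b, dtype=float)      # (n, N) rows <B|
    at = np.asarray(at, dtype=float)    # (N, n) columns |A>
    den = np.dot(b, at)                 # (n, n) denominator <B|A>
    if 'svd' == inv_method:
        # Pseudo-inverse on the span of <B|: drop singular directions.
        u, d, vt = np.linalg.svd(den)
        d_inv = np.where(d/d.max() < min_svd, 0, 1/d)
        return np.dot(np.dot(vt.T*d_inv, u.T), b)
    elif 'w0' == inv_method:
        # Johnson-style regularization: add w0**2 to the diagonal.
        return np.linalg.solve(den + w0**2*np.eye(len(den)), b)
    raise ValueError("Unknown inv_method %r" % (inv_method,))
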
def _ket(x): """Return `x` as a ket (column) vector.""" return np.mat(x.ravel()).T def _bra(x): """Return `x` as a bra (row) vector.""" return np.mat(x.ravel()) def _add_dyad(J, a, b): """Add the dyad `|a><b|` to the matrix `J` inplace and return `J`.""" try: J.add_dyad(a, b) except AttributeError: J += _ket(a)*_bra(b) # class Jinv(StateVars): # r"""Class to represent an inverse Jacobian.""" # implements(IJinv) # _state_vars = [ # ('dyadic', False, # r"""If `True`, then use a dyadic representation for Jinv. If # this is a number greater than 1, then this number is # passed as :attr:`DyadicSum.n_max` limiting the rank of the # dyadic representation."""), # ('j_inv', 1, "Internal representation of matrix"), # ('robust', False, # """If `True`, then use the singular value decomposition to # analyse the Jacobian. Much slower, but could prevent some # numerical problems."""), # ('max_j_inv', 100.0, # """If :attr:`robust`, then the inverse Jacobian is clipped # so that the maximum value is this."""), # ('min_j_inv', 0.0001, # """The inverse Jacobian can become singular. If this is # the case, then a step might have very small length even if # the error is not small (the step size can be reduced by a # factor of the minimum singular value of the inverse # Jacobian). If the step reduction is less than this value, # then the step is scaled up to have length # :attr:`min_step`."""), # ('minimize_inverse', False, # """If `True`, then use the update the minimizes the change # in the Frobenius norm of the inverse Jacobian, otherwise # we use the Sherman-Morrison formula to update so as to # minimize the norm of the Jacobian."""), # ('incompatibility_measure', 'dg', # r"""Specifies which criterion to use for measured the # compatibility of an update with the present inverse # Jacobian. See :attr:`incompatibility` for supported # methods."""), # ('_j_inv_transform', NotImplemented, # """Optional methods for updating the Jacobian inverse in # :meth:`change_basis`. For testing only."""), # ] # process_vars() # def __init__(self, *v, **kw): # # pylint: disable-msg=W0231 # if self.dyadic: # n_max = np.inf # if self.dyadic > 1: # n_max = self.dyadic # if self.j_inv is 1: # self.j_inv = DyadicSum(n_max=n_max) # elif isinstance(self.j_inv, DyadicSum): # self.j_inv.n_max = n_max # else: # raise NotImplementedError( # "No support yet for initializing a" + # " DyadicSum from an array.") # def __mul__(self, dg): # return self.j_inv.__mul__(dg) # def __rmul__(self, dxt): # return self.j_inv.__rmul__(dxt) # def reset(self, m0=NotImplemented): # self.j_inv.reset(m0=m0) # def change_basis(self, t): # r"""Use the function `x_new = t(x)` to update the data # corresponding to a change of basis. Raise an # :exc:`AttributeError` if the change cannot be implemented (or # don't define this method.) # Parameters # ---------- # t : function # Change of basis transformation (should be essentially # matrix multiplication, but for large bases one might want # to prevent forming this matrix explicitly.) # Notes # ----- # Let the function be represented by the set of # points :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of # abscissa and :math:`f^{N}_j = f(x^{N}_j)` on the larger set. # The transformation is of the form: # .. math:: f^N_i = T_{ij}f^{n}_{j} # We update the inverse Jacobian in the following way: # .. math:: J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T} # Several other options have been considered, but in practise # this has proved to be not only the easiest to implement, but # also the most efficient. 
If we do not add and subtract the # identity, then the resultant inverse Jacobian will be rank # deficient and thus the resulting steps will never explore the # singular directions which is highly undesirable. One can also # think of this as directly transforming the inverse Jacobian # related to the fixed-point problem :math:`f \rightarrow f - # G[f]`. # Also, if using the dyadic representation, then this procedure # only requires applying the linear interpolation operator to # the components of the dyads: Transforming the identity portion # would change the form requiring a significantly more # complicated representation. # We have also explored using :math:`T^{-1}` in place of # :math:`T^T`, but this has several problems: 1) Computing # :math:`T^{-1}` can be expensive if the basis is large, 2) # Since :math:`T` is rank deficient, :math:`T^{-1}` is not # unique, but must be chosen from one of the pseudo-inverses and # it is not clear how to best do this, 3) For the diadic # representation, we would like to work directly with the # interpolation operator on both components of the dyads, so # using the transpose is much easier than having to solve the # inverse problem. # If :math:`T^T\cdot T \approx \mat{1}`, then the interpretation # of the transformation is this: We have built up the Jacobian # inverse as minimal updates from the identity using the # displacements :math:`\ket{\d{x}}` and :math:`\ket{\d{G}}`. # The transformed inverse Jacobian using the above proceedure is # what would have been obtain by minimal updates from the # identity using the transformed displacements # :math:`T\ket{\d{x}}` and :math:`T\ket{\d{G}}`. Thus, the # transformed Jacobian inverse should have the same amount of # information as the original, and should in general reduce the # number of operations required to form a good approximation by # the rank of :math:`J^{-1} - \mat{1}`. # It may be beneficial to choose the interpolation operator such # that :math:`T^T\cdot T = \mat{1}` over the smaller basis. We # have not investigated this, but standard interpolation # operators seem sufficiently close to the identity that this # extra complication is probably not justified. # The code contains additional update methods for testing # purposes to allow these results to be verified (except for the # dyadic representation which can only be easily transformed # using this method). Here we present results for a few # problems. # >>> from scipy.interpolate import * # >>> def G(f, x, normalize=True): # ... c1 = np.cosh(1) # ... dx = np.diff(x[1:]) # ... ddf = np.diff(np.diff(f))/dx/dx # ... a = 1 # ... if normalize: # ... a = 1 + 2./dx/dx # ... G = np.hstack([[f[0] - c1], # ... (f[1:-1] - ddf)/a, # ... [f[-1] - c1]]) # ... return G # >>> def test(N1, N2, normalize, meth, k=3, tol=1e-14): # ... '''Return `(i1, i2, iN, gain)`. `i1` is the number of # ... steps at low resolution. `i2` is the subsequent # ... number of steps required to polish the # ... high-resolution solution. `iN` is the number of # ... steps at only the high resolution, and `gain = i1+i2 # ... - iN` is the gain in speed. (Negative is good)''' # ... x1 = np.linspace(-1, 1, N1) # ... x2 = np.linspace(-1, 1, N2) # ... #def t(f): # ... # 'The interpolation function' # ... # return splev(x2, splrep(x1, f, s=0, k=k)) # ... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k)) # ... f = 0*x1 # Initial guess # ... g = G(f, x1, normalize) # ... j_inv = Jinv(_j_inv_transform=meth) # ... err = np.max(abs(g)) # ... i1 = 1 # ... while err > tol: # ... 
# Find solution over x1 abscissa # ... f0, g0 = f, g # ... f = f + j_inv.get_dx(g0) # ... g = G(f, x1, normalize) # ... err = np.max(abs(g)) # ... j_inv.update(dg=g - g0, dx=f - f0) # ... i1 += 1 # ... j_inv.change_basis(t) # Change basis # ... f1 = t(f) # ... g1 = G(f1, x2, normalize) # ... err = np.max(abs(g1)) # ... i2 = 1 # ... while err > tol: # ... # Find solution over x2 abscissa # ... f0, g0 = f1, g1 # ... f1 = f1 + j_inv.get_dx(g0) # ... g1 = G(f1, x2, normalize) # ... err = np.max(abs(g1)) # ... j_inv.update(dx=f1 - f0, dg=g1 - g0) # ... i2 += 1 # ... # Now try only at high resolution. # ... f1 = 0*f1 # ... g1 = G(f1, x2, normalize) # ... j_inv = Jinv(_j_inv_transform=meth) # ... err = np.max(abs(g1)) # ... i3 = 1 # ... while err > tol: # ... # Find solution over x2 abscissa # ... f0, g0 = f1, g1 # ... f1 = f1 + j_inv.get_dx(g0) # ... g1 = G(f1, x2, normalize) # ... err = np.max(abs(g1)) # ... j_inv.update(dx=f1 - f0, dg=g1 - g0) # ... i3 += 1 # ... return i1, i2, i3, (i1+i2) - i3 # >>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10): # ... meth = Container() # ... for meth.inv_l in [0, 1]: # ... for meth.inv_r in [0, 1]: # ... for meth.set_diag in [0, 1]: # ... res = test(N1, N2, normalize, # ... k=k, tol=tol, # ... meth=dict(meth.items())) # ... print repr(meth), res # >>> tabulate(10, 50, k=1, tol=1e-15) # Container(set_diag=0, inv_r=0, inv_l=0) (9, 66, 109, -34) # Container(set_diag=1, inv_r=0, inv_l=0) (9, 68, 109, -32) # Container(set_diag=0, inv_r=1, inv_l=0) (9, 95, 109, -5) # Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 109, -30) # Container(set_diag=0, inv_r=0, inv_l=1) (9, 99, 109, -1) # Container(set_diag=1, inv_r=0, inv_l=1) (9, 96, 109, -4) # Container(set_diag=0, inv_r=1, inv_l=1) (9, 77, 109, -23) # Container(set_diag=1, inv_r=1, inv_l=1) (9, 71, 109, -29) # """ # self.suspend() # j_inv = self.j_inv # if j_inv is 1: # return # if self.dyadic: # j_inv.apply_transform(t) # else: # if not hasattr(self, '_j_inv_transform'): # # One could also apply t directly to the columns and # # then rows of j_inv - I but I don't think this will be # # any faster. # N, N = j_inv.shape # I = np.eye(N) # t_mat = np.mat(np.vstack([t(I[n, :]) # for n in xrange(N)]).T) # M, N = t_mat.shape # j_inv = np.eye(M) + t_mat*(j_inv - I)*t_mat.T # else: # d = np.diag(j_inv) # N, N = j_inv.shape # I = np.eye(N) # t_mat = np.mat(np.vstack([t(I[n, :]) # for n in xrange(N)]).T) # ti_mat = np.linalg.pinv(t_mat) # if self._j_inv_transform.get('inv_l', False): # tl = ti_mat.T # else: # tl = t_mat # if self._j_inv_transform.get('inv_r', False): # tr = ti_mat # else: # tr = t_mat.T # if self._j_inv_transform.get('direct', False): # j_inv = tl*j_inv*tr # else: # M, N = t_mat.shape # j_inv = np.eye(M) + tl*(j_inv - I)*tr # if self._j_inv_transform.get('set_diag', False): # m = np.arange(j_inv.shape[0]) # j_inv[m, m] = t(d) # self.j_inv = j_inv # self.resume() # @property # def J(self): # r"""Current approximation of Jacobian.""" # if self.dyadic: # j_inv = self._j_inv.asarray() # else: # j_inv = self._j_inv # return np.linalg.inv(j_inv) # def incompatibility(self, dx, dg, skip_inds=None, complement=False): # r"""Return a dimensionless measure of the compatibility of the # specified step with the current Jacobian. An incompatibility of # 0 means perfectly compatible (this is what should happen if # the function is linear and the Jacobian is correct). # There are several options here as specified by # :attr:`incompatibility_measure`. 
# `dg` : This option considers the change in the inverse derivative # the direction `dg` # .. math:: # \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} # {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}} # `sigma` : This option is valid with the :attr:`dyadic` option # and only considers the incompatibility in the space spanned # by the dyads. # Examples # -------- # >>> def g(x): # ... return x*x + x*np.sin(x.sum()) # >>> j = Jinv() # >>> x0 = np.zeros(10) # >>> g0 = g(x0) # >>> dx = np.ones(10) - x0 # >>> dg = g(x0 + dx) - g0 # >>> j.incompatibility(dx, dg) # doctest: +ELLIPSIS # 0.377... # This does not modify the Jacobian as can be noted by the fact # that update returns the same incompatibility # >>> j.update(dx=dx, dg=dg) # doctest: +ELLIPSIS # 0.377... # After an update, the step should be fully compatibility: # >>> j.incompatibility(dx, dg) < 1e-12 # True # """ # if skip_inds is not None: # skip_inds = np.asarray(skip_inds) # if complement: # dx_ = 1*dx # dg_ = 1*dg # dx_[skip_inds] = 0 # dg_[skip_inds] = 0 # dx -= dx_ # dg -= dg_ # else: # dx[skip_inds] = 0 # dg[skip_inds] = 0 # ng = np.linalg.norm(dg) # if 0 == ng: # res = np.linalg.norm(dx)/(ng + _TINY) # elif 'dg' == self.incompatibility_measure: # nx = np.linalg.norm(dx) # dg = dg/(ng + _TINY) # dx = dx/(ng + _TINY) # dx1 = np.asarray(self.j_inv*_ket(dg)).ravel() # if skip_inds is not None: # if complement: # dx1_ = 1*dx1 # dx1_[skip_inds] = 0 # dx1 -= dx1_ # else: # dx1[skip_inds] = 0 # res = (np.linalg.norm(dx - dx1)/ # (max(np.linalg.norm(dx1), nx) + _TINY)) # elif 'sigma' == self.incompatibility_measure and self.dyadic: # raise NotImplementedError( # "incompatibility_measure '%s' not implemented." % # (self.incompatibility_measure, )) # else: # raise NotImplementedError( # "incompatibility_measure '%s' not implemented." % # (self.incompatibility_measure, )) # return res # def update(self, dx, dg, j_inv=None, skip_inds=None, # include_skip=False, mags=1): # r"""Update based on differences `dg` and `dx`. Return the # incompatibility value returned by :meth:`incompatibility` # Parameters # ---------- # dg : array # Change `dg = G(x1) - G(x0)`. # dx : array # Change `dx = x1 - x0` # j_inv : matrix, optional # If this is provided, then it is used to set the Jacobian # (implementing Newton's method), otherwise the standard # Broyden update is used. # skip_inds : None, array, optional # This is an array of the indices of sensitive parameters. # The change in these indices will be ignored. This is # important if there are "sensitive" or "noisy" parameters # that should only be trusted once a certain degree of # convergence has been achieved. # include_skip : bool # If `True` and `skip_inds` is provided, then cross-terms are # ignored, but both the main and skip blocks are updated. # mags : array, optional # This should be set to the vector of typical magnitudes for # `G`. One option is `abs(G0) + abs(G1)`. Assumed to be `1` # if not provided. # Notes # ----- # The inverse Jacobian is updated so that # .. math:: # \mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}}. # There are two senses in which the update can be considered as # minimal controlled by the flag :attr:`minimize_inverse`. If # this is `True`, we minimize :math:`\norm{J^{-1}_0 - # J^{-1}_1}_{\text{Frobenius}}`: # .. math:: # J^{-1} \rightarrow # J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}} # {\braket{\d{G}|\d{G}}}, # otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`: # .. 
math:: # J \rightarrow # J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}} # {\braket{\d{x}|\d{x}}}. # This can be analytically inverted to give an update for the inverse: # .. math:: # J^{-1} \rightarrow J^{-1} # + \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{\d{x}}J^{-1}} # {\braket{\d{x}|J^{-1}|\d{G}}}. # Any update that satisfies the secant condition can be expressed as: # .. math:: # J^{-1} \rightarrow J^{-1} # + \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{v}} # {\braket{v|\d{G}}}. # as long as :math:`\braket{v|\d{G}}` is not zero. The two # conditions correspond to different choices of either # :math:`\bra{v}=\bra{\d{G}}` to minimize the Jacobian norm # (Broyden's "second" method) or # :math:`\bra{v}=\bra{\d{x}}J^{-1}` to minimize the inverse # Jacobian norm (Broyden's "second" method). Numerical Recipies # has the latter but I am not sure it is better... It is more # complicated. # As a measure of the significance of this correction, we return # the following dimensionless quantity: # .. math:: # \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} # {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}} # """ # dg = np.asarray(dg).ravel() # dx = np.asarray(dx).ravel() # incompatibility = self.incompatibility(dx=dx, dg=dg) # j_inv = self.j_inv # if j_inv is 1: # j_inv = np.eye(len(dx), dtype=float) # if skip_inds is not None: # skip_inds = np.asarray(skip_inds, dtype=int) # if include_skip: # dx_skip = 1*dx # dg_skip = 1*dg # dg[skip_inds] = 0 # Upper block # dx[skip_inds] = 0 # dx_dgs = [(dx, dg)] # if include_skip: # dx_skip -= dx # Skip block # dg_skip -= dg # dx_dgs.append((dx_skip, dg_skip)) # else: # dx_dgs = [(dx, dg)] # for dx, dg in dx_dgs: # if np.linalg.norm(dx) <= 0 or np.linalg.norm(dg) <= 0: # # It could happen, such as when changing resolutions, that # # this is called without actually generating a new step, # # so we just pass here. # return 0.0 # w = np.asarray(_ket(dx) - j_inv*_ket(dg)).ravel() # # Zero out noisy components # w = np.where(abs(w) > _EPS*mags, w, 0.0) # if (w != 0).any(): # if self.minimize_inverse: # vt = _bra(dg) # else: # vt = j_inv.__rmul__(_bra(dx)) # dot_prod = np.dot(vt, _ket(dg)) # dx_norm = np.linalg.norm(dx) # dg_norm = np.linalg.norm(dg) # reduction = abs(dot_prod/dx_norm/dg_norm) # import pdb;pdb.set_trace() # if reduction < self.min_j_inv: # # Use other step # vt = _bra(dg) # t = np.dot(vt, dg) # w = w/t # _add_dyad(j_inv, w, vt) # if self.robust: # U, d, Vinv = np.linalg.svd(j_inv) # print np.hstack((abs(np.dot(U.T, _ket(abs(G)))), _ket(d))) # if self.max_j_inv < d.max(): # warnings.warn("Warning: Clipping Jacobian: %f"%d.max()) # d = np.where(d<self.max_j_inv, d, self.max_j_inv) # j_inv = U*mat(np.diag(d))*Vinv # self.j_inv = j_inv # return incompatibility # def get_dx(self, g, x=None, x_controls=None, # min_step=_EPS, max_step=np.inf): # r"""Return `dx`: the Broyden step to get to the root of `G(x)`. # Parameters # ---------- # g : array # The step is determined from `dx = -j_inv*g`. # x : array, optional # If `x_controls` is provided, then `x` must also be provided to # determine the offset of the controlled parameters. # x_controls : (array, array), optional # Tuple `(inds, vals)` indices and the respective values of the # control parameters. # min_step : float, optional # Minimum step length for testing. Used to give some # non-zero step `dx` of length `sqrt(min_step) even if # `j_inv` is singular. This may be random. A warning will # be raised. # max_step : float, optional # Maximum length of the step. 
# Examples # -------- # Here is a little test with a linear function # >>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]]) # >>> j_inv = Jinv(j_inv=np.linalg.inv(j)) # >>> sol = np.array([1, 2, 3]) # >>> def G(x): # ... return np.dot(j, x - sol) # >>> x0 = np.array([0.0, 0.0, 0.0]) # >>> x = np.array([1.0, 2.0, 3.0]) # Now try to step to the solution, but hold the second variable as # a control at 1. Since the function is exactly linear and we # have explicitly given the correct Jacobian, we should end up # with a root in the other two parameters. # >>> x = x + j_inv.get_dx(g=G(x), x=x, x_controls=([1], [1.0])) # >>> x # array([ 1.5, 1. , 3.5]) # >>> G(x) # array([ 0. , 4.5, 0. ]) # This needs to work with very large arrays, so here is an # example. Here we use an example where the vectors have length # $2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly, # the arrays would be too large to be indexed. # >>> np.random.seed(0) # >>> n_x = 2**20 # >>> n_d = 2**11 # >>> a = np.random.rand(n_x) # >>> b = np.random.rand(n_x) # >>> g = np.random.rand(n_x) # >>> x = np.zeros(n_x) # >>> inds = np.unique1d(np.random.randint(0, n_x, size=n_d)) # >>> n_d = len(inds) # >>> vals = np.random.rand(n_d) # >>> x_controls = (inds, vals) # >>> j_inv = Jinv(dyadic=True) # >>> j_inv.j_inv.add_dyad(a, b) # >>> dx = j_inv.get_dx(g=g, x=x, x_controls=x_controls) # >>> np.allclose((x + dx)[inds], vals) # True # """ # j_inv = self.j_inv # if j_inv is 1: # dx = -g # else: # if x_controls is not None and 0 < len(x_controls[0]): # inds, vals = x_controls # inds = np.asarray(inds, dtype=int) # vals = np.asarray(vals) # dg_dmu = copy.copy(g) # n_d = len(inds) # n_x = len(x) # d = j_inv[inds[:, None], inds[None, :]] # dg_dmu[inds] = x[inds] - vals # dmu = _ket(dg_dmu[inds]) # b = (j_inv*_ket(dg_dmu))[inds, :] # dN = dmu - np.linalg.solve(d, b - dmu) # dg_dN = dg_dmu # dg_dN[inds] = dN[:] # dx_dmu = j_inv*_ket(dg_dN) # assert np.allclose(dx_dmu[inds], dmu) # # Set these to make sure no round-off errors have # # accumulated. # dx_dmu[inds].flat = x[inds] - vals # dx = -dx_dmu # else: # dx = -(j_inv*_ket(g)) # dx = np.asarray(dx).ravel() # g_mag = np.linalg.norm(g) # dx_mag = np.linalg.norm(dx) # if dx_mag == 0 and g_mag > 0: # warnings.warn("Warning: Singular Jinv, taking random step") # dx = np.random.rand(len(dx))*np.sqrt(min_step) # elif np.divide(dx_mag, g_mag) < self.min_j_inv: # warnings.warn("Warning: Singular Jinv, taking minimum step") # dx *= math.sqrt(min_step)/np.linalg.norm(dx) # dx_len = math.sqrt(dot(dx, dx)) # if max_step < dx_len: # dx *= max_step/dx_len # return dx # I am not yet satsified with a good standard way of providing # options. I would like something that provides the user with an easy # way to # 1) See the current set of options (i.e. a pretty implementation of # __repr__) # 2) Maintain sub-options (like JacobianOptions below) # 3) Provide documentation about the options through help. # 4) Be efficient. # # I have tried to meet these criteria with the Options class, but am # not sure it is the best solution yet. from mmf.objects import option class JacobianOptions(mmf.objects.Options): r"""These are options the affect the Jacobian calculation. """ @option def max_h(): r"""Maximum step size to use in Jacobian computations.""" return 1.0 class BroydenOptions(mmf.objects.Options): r"""These are options that affect the Broyden iteration. Jacobian : Options controlling the Jacobian calculation. 
""" @option def Jacobian(): r"""Jacobian computation options""" return JacobianOptions() class Function: r"""Wrapper for function objects to allow for a variable tolerance. This allows for the broyden algorithm to adaptively update the tolerance. The function may take advantage of a low tolerance to perform a faster calculation. >>> import math >>> def f(x, abs_tol): ... "Return -ln(1+abs(x)) to specified tolerance" ... x = abs(x) ... A = 0.0 ... n = 1 ... while True: ... An = (-x)**n/n ... A += An ... n += 1 ... if abs(An) < abs_tol: ... break ... return A >>> f1 = Function(f, abs_tol=1e-2) >>> round(f1(0.1)+math.log(1.0+0.1), 2) 0.0 >>> round(f1(0.1)+math.log(1.0+0.1), 4) 0.0003 >>> f1 = Function(f, abs_tol=1e-4) >>> round(f1(0.1)+math.log(1.0+0.1), 4) 0.0 """ def __init__(self, f, fixed_tol=_EPS, abs_tol=None, return_tol=None): r"""The following forms are accepted. The function f must be callable as shown: Function(f, abs_tol=tol) # ans = f(x, tol, ...) Function(f, return_tol=tol) # (ans, err) = f(x, tol...) Function(f, fixed_tol=tol) # ans = f(x, ...) Function(f) # ans = f(x, ...) The default form assumes f computes to machine precision. >>> import math >>> def tolToPlaces(tol): ... return int(math.ceil(abs(-math.log10(tol)))) >>> def f(x, tol): return round(x, tolToPlaces(tol)) >>> f1 = Function(f, abs_tol=0.01) >>> f1(0.1234) 0.12 >>> def f(x): return x >>> f1 = Function(f, fixed_tol=1e-16) >>> f1(0.1234) 0.1234 >>> def f(x, tol): ... tol = max(1e-3, tol) ... return (round(x, tolToPlaces(tol)), tol) >>> f1 = Function(f, return_tol=1e-16) >>> f1(0.1234) 0.123 """ self._f = f if abs_tol is not None: self.abs_tol = abs_tol self._type = 'abs_tol' elif return_tol is not None: self.abs_tol = return_tol self._type = 'return_tol' elif fixed_tol is not None: self.abs_tol = fixed_tol self._type = 'fixed_tol' else: raise TypeError('Function type not supported') def __call__(self, x, *vargin, **kwargs): if 'abs_tol' is self._type: return self._f(x, self.abs_tol, *vargin, **kwargs) elif 'fixed_tol' is self._type: return self._f(x, *vargin, **kwargs) if 'return_tol' is self._type: (ans, tol) = self._f(x, self.abs_tol, *vargin, **kwargs) self.abs_tol = tol return ans else: raise TypeError('Function type not supported') def zeros(f, x0, f0=None, j0=None, f_tol=None, opts=None): r"""Return dictionary of results containing solution to multi-dimensional root problem f(x) = [0, 0, ...] """ results = {} results['x'] = copy.copy(x0) j = np.matrix([[1, 2, 3], [-1, 2, -3], [1, -2, -3]]) results['j'] = j return results def ComputeInitialJacobian(): r""" if isempty(J0in) | ~all(size(J0in)==[NF, b.NX]) Broyden.J0 = np.eye(NF, b.NX); else # We do not consider the provided Jacobian as a restart because it # might contain errors. b.Restart = False Broyden.J0 = J0in; end """ def ComputeJacobian(f, x, y=None, opts=None): r""" Compute the Jacobian for function f(x) in a straightforward way using finite differences: J = (f(x+h)-f(x))/h """ if opts is None: opts = JacobianOptions() if y is None: # Allow user to supply y=f(x) y = f(x) Nx = len(x) Ny = len(y) J = np.zeros((Ny, Nx), float) h = math.sqrt(f.abs_tol) if opts.max_h is not None: h = min(h, opts.max_h) for n in range(len(x)): # Here is the trick for ensuring that x+h is accurate: we # choose h so this is true. h_n = ((x[n]+h)-x[n]) if x[n] < 0: # Make step in same direction as sign h_n = -h_n # of x component . 
x[n] += h_n y_n = f(x) x[n] -= h_n J[:, n] = (y_n - y)/h_n return J r""" # We choose the direction for the step based on the sign of the # component of x. The step-size is determined by the parameter xdir = ceil((sign(x)+1)./2).*2-1; h = sqrt(f_tol).*max(abs(x), 1).*xdir; if norm(h, 2) > opt.JacobianMaxh h = h.*opt.JacobianMaxh./norm(h, 2); end # Here is the trick for ensuring that x+h is accurate: we # choose h so this is true. h = (x+h) - x; for n = 1:MaxJacobianSteps if (3 <= opt.Verbose) disp([num2str(n) ' of ' num2str(MaxJacobianSteps)]); end if (3 <= opt.Verbose) disp(['h = ' num2str(h(n))]); end t = x(n); x(n) = x(n)+h(n); <<Compute Residue, Fx and FXarg from x>> Fn = Fx; x(n) = t; Broyden.J0(:, n) = (Fn-Broyden.F0)/h(n); end end if (10 <= opt.Verbose) disp('Singular Values of Jacobian'); svd(Broyden.J0) end else if 2 <= opt.Verbose if ((opt.MaxFeval <= 0) | (Flag.Fcnt < opt.MaxFeval-(b.NX+1))) disp('Not enough function evals left to compute Jacobian'); elseif ((opt.MaxTime <= 0) |(EstJacobianTime < RemainingTime)) disp('Not enough time left to compute Jacobian'); end end end # The restart flag indicates if the iteration has been restarted (or # started) with a new Jacobian. This might help a stuck iteration. b.Restart = True """ class BroydenData(StateVars): r"""Represents the current state of a Broyden calculation.""" _state_vars = [ ('x', None, \ """Current value of `x`, i.e. the point where the current inverse Jacobian approximation :attr:`Jinv` is valid."""), ('G', None, "Current residual `G(x)`."), ('Jinv', None, \ """Current inverse Jacobian approximation."""), ('dyadic', False, \ r"""If `True`, then use a dyadic representation for Jinv. If this is a number greater than 1, then this number is passed as :attr:`DyadicSum.n_max` limiting the rank of the dyadic representation."""), ] process_vars() def __init__(self, *v, **kw): if self.x is not None: self.x = np.asarray(self.x, dtype=float).ravel() if self.G is not None: self.G = np.asarray(self.G, dtype=float).ravel() if (self.x is not None and self.G is not None and self.Jinv is None): if self.dyadic: n_max = np.inf if self.dyadic > 1: n_max = self.dyadic self.Jinv = DyadicSum(n_max=n_max) else: self.Jinv = np.mat(np.eye(len(self.x))) class Broyden(StateVars): r"""Implements a rudimentary Broyden root search search by keeping track of the Jacobian an allowing for Broyden steps to be made. This algorithm is used for solving the problem :math:`G(x) = 0` using a quasi-Newton method: .. math:: x \rightarrow x - J^{-1}G(x) where :math:`J` is an approximation for the Jacobian :math:`J\equiv \partial_{x}G(x)`. By default, one starts with the approximation :math:`J=1`, which is equivalent to the step :math:`x \rightarrow x - G(x)`. Thus, if one wishes to improve the convergence of a reasonable fixed-point iteration scheme :math:`x\rightarrow F(x)`, then a reasonable choice is :math:`G(x) = x - F(x)`. Examples -------- Here we present an example solving the problem and solution: .. math:: x = x^2 \qquad x=1. In this case, the Broyden algorithm is equivalent to the secant algorithm with the first two points `x0` and `G(x0)`. >>> def G(x): ... return x - x*x >>> x0 = 2.0 >>> b = Broyden(x0=x0, G0=G(x0)) >>> b.x, b.J (4.0, matrix([[ 1.]])) >>> b.update(G(b.x)) # doctest: +ELLIPSIS 1.2... >>> b.x, b.J # doctest: +ELLIPSIS (1.6..., matrix([[-5.]])) >>> b.update(G(b.x)) # doctest: +ELLIPSIS 0.08... >>> b.x, b.J # doctest: +ELLIPSIS (1.3913043478..., matrix([[-4.6]])) >>> for n in xrange(8): b.update(G(b.x)) # doctest: +ELLIPSIS 0.56... 
0.24... 0.23... 0.09... 0.02... 0.003... ...e-05 ...e-07 >>> b.x, b.J (1.0, matrix([[-1.]])) Here is a two-dimensional problem: .. math:: \begin{aligned} x &= 2x^2, & y &= y^2, & x , y &= 0.5, 1.0 \end{aligned} >>> def G(x): ... return [x[0] - x[0]*x[0]*2.0, x[1] - x[1]*x[1]] >>> x0 = [2.0, 2.0] >>> b = Broyden(x0=x0, G0=G(x0)) >>> for n in xrange(40): ... res = b.update(G(b.x)) >>> map(lambda r:round(r, 12), b.x) [0.5, 1.0] """ _state_vars = [ ('x0=broyden_data.x', None, """Current value of `x`, i.e. the point where the current inverse Jacobian approximation :attr:`Jinv0` is valid."""), ('G0=broyden_data.G', None, \ "Current function evaluation `G(x0)`."), ('Jinv0=broyden_data.Jinv', None, \ r"""Approximate Jacobian inverse at `x0`. (Defaults to the identity)."""), ('max_step', np.inf, \ """Maximium step length. Use to prevent a singular Jacobian from shooting the solution far away."""), ('min_step', np.sqrt(_EPS), \ """Minimum step length to take when the Jacobian is singular. See :attr:`min_Jinv`. This should be a good length for estimating the Jacobian through finite differences, i.e. typically the square-root of the precision to which the function is evaluate."""), ('step_factor', 1, \ """A number between 0 and 1. The step taken is `x + step_factor*dx`"""), ('robust', False, \ """If `True`, then use the singular value decomposition to analyse the Jacobian. Much slower, but could prevent some numerical problems."""), ('max_Jinv', 100.0, \ """If :attr:`robust`, then the inverse Jacobian is clipped so that the maximum value is this."""), ('min_Jinv', 0.0001, \ """The inverse Jacobian can become singular. If this is the case, then a step might have very small length even if the error is not small (the step size can be reduced by a factor of the minimum singular value of the inverse Jacobian). If the step reduction is less than this value, then the step is scaled up to have length :attr:`min_step`."""), ('minimize_inverse', False, \ """If `True`, then use the update the minimizes the change in the Frobenius norm of the inverse Jacobian, otherwise we use the Sherman-Morrison formula to update so as to minimize the norm of the Jacobian."""), ('x', Computed, "Next value of `x`."), ('_packing_info', (type(None), None), \ "Packing information to reshape `x`."), ('_max_step2', Computed, "Square of :attr:`max_step`"), ('broyden_data', Delegate(BroydenData, ['dyadic']), \ """:class:`BroydenData` instance representing the current state."""), ('_Jinv_transform', NotImplemented, \ """Optional methods for updating the Jacobian inverse in :meth:`change_basis`. For testing only."""), ] process_vars() def __init__(self, *v, **kw): self._max_step2 = self.max_step*self.max_step if 'x0' in kw and kw['x0'] is not None: self._define_packing_info(kw['x0']) self.x0 = self._pack(self.x0) N = self.x0.shape[0] if 'G0' in kw and self.G0 is not None: self.G0 = self._pack(self.G0) if self.x0 is not None: self._update_x(self.x0, self.G0) def _define_packing_info(self, x): r""" Parameters ---------- x : array Original representation. Returns ------- _packing_info : typle (type(x), shape) """ shape = np.asarray(x).shape type_ = type(x) self._packing_info = (type_, shape) def _pack(self, x): r"""Return `x` in a flat representation. Parameters ---------- x : array Original representation. Returns ------- x : array Flat representation of array. """ return np.asarray(x, dtype=float).ravel() def _unpack(self, x): r"""Return `x` in the original shape. Parameters ---------- x : array Flat representation of array. 
        Returns
        -------
        x : array
           Reshaped representation.
        """
        type_, shape = self._packing_info
        if issubclass(type_, list):
            return x.tolist()
        elif issubclass(type_, np.matrix):
            return np.asmatrix(x).reshape(shape)
        elif issubclass(type_, np.ndarray):
            return np.asarray(x).reshape(shape)
        elif (issubclass(type_, float)
              or issubclass(type_, int)
              or issubclass(type_, complex)):
            return np.asarray(x).ravel()[0]
        else:
            raise ValueError("Type %s not supported." % type_)

    def change_basis(self, t):
        r"""Use the function `x_new = t(x)` to update the data
        corresponding to a change of basis.  Raise an
        :exc:`AttributeError` if the change cannot be implemented (or
        don't define this method.)

        Parameters
        ----------
        t : function
           Change of basis transformation (should be essentially
           matrix multiplication, but for large bases one might want
           to prevent forming this matrix explicitly.)

        Notes
        -----
        Let the function be represented by the set of points
        :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of abscissa
        and :math:`f^{N}_j = f(x^{N}_j)` on the larger set.  The
        transformation is of the form:

        .. math::
           f^N_i = T_{ij}f^{n}_{j}

        We update the inverse Jacobian in the following way:

        .. math::
           J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T}

        Several other options have been considered, but in practice
        this has proved to be not only the easiest to implement, but
        also the most efficient.  If we do not add and subtract the
        identity, then the resultant inverse Jacobian will be rank
        deficient and thus the resulting steps will never explore the
        singular directions, which is highly undesirable.  One can
        also think of this as directly transforming the inverse
        Jacobian associated with the fixed-point problem :math:`f
        \rightarrow f - G[f]`.  Also, if using the dyadic
        representation, this procedure only requires applying the
        linear interpolation operator to the components of the dyads:
        transforming the identity portion would change the form,
        requiring a significantly more complicated representation.

        We have also explored using :math:`T^{-1}` in place of
        :math:`T^T`, but this has several problems: 1) computing
        :math:`T^{-1}` can be expensive if the basis is large; 2)
        since :math:`T` is rank deficient, :math:`T^{-1}` is not
        unique, but must be chosen from one of the pseudo-inverses,
        and it is not clear how best to do this; 3) for the dyadic
        representation, we would like to work directly with the
        interpolation operator on both components of the dyads, so
        using the transpose is much easier than having to solve the
        inverse problem.

        If :math:`T^T\cdot T \approx \mat{1}`, then the
        interpretation of the transformation is this: we have built
        up the inverse Jacobian as minimal updates from the identity
        using the displacements :math:`\ket{\d{x}}` and
        :math:`\ket{\d{G}}`.  The transformed inverse Jacobian
        produced by the above procedure is what would have been
        obtained by minimal updates from the identity using the
        transformed displacements :math:`T\ket{\d{x}}` and
        :math:`T\ket{\d{G}}`.  Thus, the transformed inverse Jacobian
        should contain the same amount of information as the
        original, and should in general reduce the number of
        operations required to form a good approximation by the rank
        of :math:`J^{-1} - \mat{1}`.

        It may be beneficial to choose the interpolation operator
        such that :math:`T^T\cdot T = \mat{1}` over the smaller
        basis.  We have not investigated this, but standard
        interpolation operators seem to satisfy this closely enough
        that the extra complication is probably not justified.
The code contains additional update methods for testing purposes to allow these results to be verified (except for the dyadic representation which can only be easily transformed using this method). Here we present results for a few problems. >>> from scipy.interpolate import * >>> from mmf.objects import * >>> def G(f, x, normalize=True): ... c1 = np.cosh(1) ... dx = np.diff(x[1:]) ... ddf = np.diff(np.diff(f))/dx/dx ... a = 1 ... if normalize: ... a = 1 + 2./dx/dx ... G = np.hstack([[f[0] - c1], ... (f[1:-1] - ddf)/a, ... [f[-1] - c1]]) ... return G >>> def test(N1, N2, normalize, meth, k=3, tol=1e-14): ... '''Return `(i1, i2, iN, gain)`. `i1` is the number of ... steps at low resolution. `i2` is the subsequent ... number of steps required to polish the ... high-resolution solution. `iN` is the number of ... steps at only the high resolution, and `gain = i1+i2 ... - iN` is the gain in speed. (Negative is good)''' ... x1 = np.linspace(-1, 1, N1) ... x2 = np.linspace(-1, 1, N2) ... #def t(f): ... # 'The interpolation function' ... # return splev(x2, splrep(x1, f, s=0, k=k)) ... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k)) ... f = 0*x1 # Initial guess ... g = G(f, x1, normalize) ... b = Broyden(x0=f, G0=g, _Jinv_transform=meth) ... err = np.max(abs(g)) ... i1 = 1 ... while err > tol: ... # Find solution over x1 abscissa ... f = b.x ... g = G(f, x1, normalize) ... err = np.max(abs(g)) ... b.update(x=f, g=g) ... i1 += 1 ... b.change_basis(t) # Change basis ... f1 = b.x ... g = G(f1, x2, normalize) ... err = np.max(abs(g)) ... i2 = 1 ... while err > tol: ... # Find solution over x2 abscissa ... f = b.x ... g = G(f, x2, normalize) ... err = np.max(abs(g)) ... b.update(x=f, g=g) ... i2 += 1 ... # Now try only at high resolution. ... f1 = 0*f1 ... g = G(f1, x2, normalize) ... b = Broyden(x0=f1, G0=g, _Jinv_transform=meth) ... err = np.max(abs(g)) ... i3 = 1 ... while err > tol: ... # Find solution over x2 abscissa ... f1 = b.x ... g1 = G(f1, x2, normalize) ... err = np.max(abs(g1)) ... b.update(x=f1, g=g1) ... i3 += 1 ... return i1, i2, i3, (i1+i2) - i3 >>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10): ... meth = Container() ... for meth.inv_l in [0, 1]: ... for meth.inv_r in [0, 1]: ... for meth.set_diag in [0, 1]: ... res = test(N1, N2, normalize, ... k=k, tol=tol, ... meth=dict(meth.items())) ... 
print repr(meth), res >>> tabulate(10, 50, k=1, tol=1e-15) Container(set_diag=0, inv_r=0, inv_l=0) (9, 67, 111, -35) Container(set_diag=1, inv_r=0, inv_l=0) (9, 77, 111, -25) Container(set_diag=0, inv_r=1, inv_l=0) (9, 90, 111, -12) Container(set_diag=1, inv_r=1, inv_l=0) (9, 68, 111, -34) Container(set_diag=0, inv_r=0, inv_l=1) (9, 78, 111, -24) Container(set_diag=1, inv_r=0, inv_l=1) (9, 71, 111, -31) Container(set_diag=0, inv_r=1, inv_l=1) (9, 107, 111, 5) Container(set_diag=1, inv_r=1, inv_l=1) (9, 73, 111, -29) Container(set_diag=0, inv_r=0, inv_l=0) (9, 64, 105, -32) Container(set_diag=1, inv_r=0, inv_l=0) (9, 82, 105, -14) Container(set_diag=0, inv_r=1, inv_l=0) (9, 101, 105, 5) Container(set_diag=1, inv_r=1, inv_l=0) (9, 68, 105, -28) Container(set_diag=0, inv_r=0, inv_l=1) (9, 65, 105, -31) Container(set_diag=1, inv_r=0, inv_l=1) (9, 69, 105, -27) Container(set_diag=0, inv_r=1, inv_l=1) (9, 96, 105, 0) Container(set_diag=1, inv_r=1, inv_l=1) (9, 75, 105, -21) Here are some old results:: Container(set_diag=0, inv_r=0, inv_l=0) (9, 64, 105, -32) Container(set_diag=1, inv_r=0, inv_l=0) (9, 82, 105, -14) Container(set_diag=0, inv_r=1, inv_l=0) (9, 87, 105, -9) Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 105, -26) Container(set_diag=0, inv_r=0, inv_l=1) (9, 90, 105, -6) Container(set_diag=1, inv_r=0, inv_l=1) (9, 68, 105, -28) Container(set_diag=0, inv_r=1, inv_l=1) (9, 100, 105, 4) Container(set_diag=1, inv_r=1, inv_l=1) (9, 75, 105, -21) Container(set_diag=0, inv_r=0, inv_l=0) (9, 73, 134, -52) Container(set_diag=1, inv_r=0, inv_l=0) (9, 77, 134, -48) Container(set_diag=0, inv_r=1, inv_l=0) (9, 88, 134, -37) Container(set_diag=1, inv_r=1, inv_l=0) (9, 71, 134, -54) Container(set_diag=0, inv_r=0, inv_l=1) (9, 86, 134, -39) Container(set_diag=1, inv_r=0, inv_l=1) (9, 71, 134, -54) Container(set_diag=0, inv_r=1, inv_l=1) (9, 94, 134, -31) Container(set_diag=1, inv_r=1, inv_l=1) (9, 70, 134, -55) """ self.suspend() if self.x0 is not None: self.x0 = t(self.x0) self.G0 = None Jinv = self.Jinv0 if Jinv is None: return if self.dyadic: Jinv.apply_transform(t) else: if not hasattr(self, '_Jinv_transform'): # One could also apply t directly to the columns and # then rows of Jinv - I but I don't think this will be # any faster. N, N = Jinv.shape I = np.eye(N) t_mat = np.mat(np.vstack([t(I[n, :]) for n in xrange(N)]).T) M, N = t_mat.shape Jinv = np.eye(M) + t_mat*(Jinv - I)*t_mat.T else: d = np.diag(Jinv) N, N = Jinv.shape I = np.eye(N) t_mat = np.mat(np.vstack([t(I[n, :]) for n in xrange(N)]).T) ti_mat = np.linalg.pinv(t_mat) if self._Jinv_transform.get('inv_l', False): tl = ti_mat.T else: tl = t_mat if self._Jinv_transform.get('inv_r', False): tr = ti_mat else: tr = t_mat.T if self._Jinv_transform.get('direct', False): Jinv = tl*Jinv*tr else: M, N = t_mat.shape Jinv = np.eye(M) + tl*(Jinv - I)*tr if self._Jinv_transform.get('set_diag', False): m = np.arange(Jinv.shape[0]) Jinv[m, m] = t(d) self.Jinv0 = Jinv self.resume() @property def J(self): r"""Current approximation of Jacobian.""" return np.linalg.inv(self.Jinv0) def update(self, g, x=None, Jinv=None, skip_inds=None): r"""Update based on new value of `g(x)`. Changes the current position (the new Jacobian should be valid at `x`). Return a dimensionless measure of the change in Jinv along the specified direction. Use the property `x` to access the next value of `x`. Parameters ---------- g : array New value of `G(x)`. x : array, optional Value of abscissa `x` at which `g` is evaluated. If `x` is `None`, then assume that `x == self.x`. 
Jinv : matrix, optional If this is provided, then it is used to set the Jacobian (implementing Newton's method), otherwise the standard Broyden update is used. skip_inds : None, array, optional This is an array of the indices of sensitive parameters. The change in these indices will be ignored. This is important if there are "sensitive" or "noisy" parameters that should only be trusted once a certain degree of convergence has been achieved. Notes ----- The inverse Jacobian is updated so that .. math:: \mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}} where the changes are with respect to :attr:`x0` and :attr:`G0` respectively. As a measure of the significance of this correction, we return the following dimensionless quantity: .. math:: \chi_G &= \frac{\norm{\vect{x}_2 - \vect{x}_1}} {\norm{\vect{x}_2 - \vect{x}_0}}, \\ &= \frac{\norm{\left(\mat{J}^{-1}_{1} - \mat{J}^{-1}_{0}\right)\cdot\uvect{G}}} {\max_{n=0, 1}\norm{\mat{J}^{-1}_{n}\cdot\uvect{G}}} Another option might be .. math:: \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} {\mat{J}^{-1}\norm{\d{G}}} """ if self.x0 is None or self.G0 is None: # Initial call. Just store values an initialize self.suspend() self.x0 = copy.copy(x) self.G0 = g self.resume() return 1.0 if x is None: x = self.x x = self._pack(x) g = self._pack(g) if Jinv is None: res = self.update_J(x=x, g=g, skip_inds=skip_inds) else: # Compute measure of change in Jinv G0_ = _ket(self.G0)/(np.linalg.norm(self.G0) + _TINY) J0G0 = self.Jinv0*G0_ JG0 = Jinv*G0_ res = np.linalg.norm(JG0 - J0G0)/( max(np.linalg.norm(JG0), np.linalg.norm(J0G0)) + _TINY) self.Jinv0 = Jinv #self.update_J(x=x, g=g) self.broyden_data.x[:] = x self.broyden_data.G[:] = g return res def get_x(self, x_controls=None, x0=None, G0=None): """Return the new step subject to the constraints specified in `x_controls`. Parameters ---------- x0, G0 : array, optional If these are provided, the new step is relative to these points, otherwise :attr:`x0` and :attr:`G0` are used. x_controls : (array, array), optional If this is provided, then the specified elements are used as control parameters when determining the step. `x_controls` should be a pair `(inds, vals)` of indices into the array `x` and values such that the step should preserve ``x[ind] = c*vals + (1-c)*x0[ins]`` for some ``0 < c =< 1``. (I.e. this is a convex extrapolation from `x0` to `x`). If the solver contains information about the function `G` such as gradient information, then the extrapolated state `x_new` will incorporate this information extrapolating the step in such a way as to satisfy the controls. """ if x_controls is None and x0 is None and G0 is None: return self.x else: if x0 is None: x0 = self.x0 if G0 is None: G0 = self.G0 dx = self._get_dx(x0, G0, x_controls) return self._unpack(x0 + dx) def _update_x(self, x, g, x_controls=None): """Update the new step :attr:`x`.""" dx = self._get_dx(x, g, x_controls) self.__dict__['x'] = self._unpack(x + dx) def update_J(self, x, g, skip_inds=None): r"""Update the inverse Jacobian to include information about `(x, J)` but without changing the present location (use :meth:`update` to set the new position too`.) Return a dimensionless measure of the change in Jinv along the specified direction. Parameters ---------- x, g : array These are the values at the new point that should be used to update the inverse Jacobian `g = G(x)`. skip_inds : None, array, optional This is an array of the indices of sensitive parameters. The change in these indices will be ignored. 
            This is important if there are "sensitive" or "noisy"
            parameters that should only be trusted once a certain
            degree of convergence has been achieved.

        Notes
        -----
        At each step, we update the Jacobian in a minimal sense so as
        to maintain the secant condition :math:`\ket{\d{x}} =
        J^{-1}_1\cdot\ket{\d{G}}`.

        There are two senses in which the update can be considered
        minimal, controlled by the flag :attr:`minimize_inverse`.  If
        this is `True`, we minimize :math:`\norm{J^{-1}_0 -
        J^{-1}_1}_{\text{Frobenius}}`:

        .. math::
           J^{-1} \rightarrow
              J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}}
                            {\braket{\d{G}|\d{G}}},

        otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`:

        .. math::
           J \rightarrow
              J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}}
                       {\braket{\d{x}|\d{x}}}.

        This can be analytically inverted to give an update for the
        inverse:

        .. math::
           J^{-1} \rightarrow J^{-1}
              + \frac{\left(\ket{\d{x}} - J^{-1}\ket{\d{G}}\right)
                      \bra{\d{x}}J^{-1}}
                     {\braket{\d{x}|J^{-1}|\d{G}}}.

        Any update that satisfies the secant condition can be
        expressed as

        .. math::
           J^{-1} \rightarrow J^{-1}
              + \frac{\left(\ket{\d{x}} - J^{-1}\ket{\d{G}}\right)\bra{v}}
                     {\braket{v|\d{G}}}

        as long as :math:`\braket{v|\d{G}}` is not zero.  The two
        conditions correspond to different choices of :math:`\bra{v}`:
        :math:`\bra{v}=\bra{\d{G}}` minimizes the norm of the inverse
        Jacobian (Broyden's "second" method), while
        :math:`\bra{v}=\bra{\d{x}}J^{-1}` minimizes the norm of the
        Jacobian (Broyden's "first" method).  Numerical Recipes uses
        the latter, but I am not sure it is better... It is more
        complicated.

        As a measure of the significance of this correction, we
        return the following dimensionless quantity:

        .. math::
           \chi_G &= \frac{\norm{\vect{x}_2 - \vect{x}_1}}
                          {\norm{\vect{x}_2 - \vect{x}_0}}, \\
                  &= \frac{\norm{\left(\mat{J}^{-1}_{1} -
                            \mat{J}^{-1}_{0}\right)\cdot\uvect{G}}}
                          {\max_{n=0, 1}\norm{\mat{J}^{-1}_{n}\cdot\uvect{G}}}

        Another option might be

        .. math::
           \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}}
                {\mat{J}^{-1}\norm{\d{G}}}
        """
        G0 = self.broyden_data.G
        x0 = self.broyden_data.x
        Jinv0 = self.broyden_data.Jinv

        if x0 is None or G0 is None:
            # Initial call.  Just store the values and initialize.
            self.suspend()
            self.x0 = copy.copy(x)
            self.G0 = g
            self.resume()
            return 1.0

        x = self._pack(x)
        g = self._pack(g)
        dg = g - G0
        dx = x - x0

        # Unit vector and change for later measuring the change in Jinv.
        # Could use dg here rather than G0.
        G0_ = _ket(G0/(np.linalg.norm(G0) + _TINY))
        J0G0 = Jinv0*G0_

        if skip_inds is not None:
            skip_inds = np.asarray(skip_inds, dtype=int)
            dg[skip_inds] = 0
            dx[skip_inds] = 0

        if np.linalg.norm(dx) <= 0 or np.linalg.norm(dg) <= 0:
            # It could happen, such as when changing resolutions, that
            # this is called without actually generating a new step,
            # so we just pass here.
            return 0.0

        w = _ket(dx) - Jinv0*_ket(dg)

        # Zero out noisy components.
        mags = _ket(abs(G0) + abs(g))
        w = np.where(abs(w) > _EPS*mags, w, 0.0)
        if (w != 0).any():
            if self.minimize_inverse:
                vt = _bra(dg)
            else:
                vt = _bra(dx)*Jinv0
            dot_prod = dot(vt, _ket(dg))
            dx_norm = np.linalg.norm(dx)
            dg_norm = np.linalg.norm(dg)
            reduction = abs(dot_prod/dx_norm/dg_norm)
            if reduction < self.min_Jinv:
                # Use the other step
                vt = _bra(dg)
            t = dot(vt, dg)
            w = w/t
            _add_dyad(Jinv0, w, vt)

        if self.robust:
            U, d, Vinv = np.linalg.svd(Jinv0)
            print np.hstack((abs(dot(U.T, _ket(abs(g)))), _ket(d)))
            if self.max_Jinv < d.max():
                warnings.warn("Warning: Clipping Jacobian: %f" % d.max())
                d = np.where(d < self.max_Jinv, d, self.max_Jinv)
                Jinv0 = U*np.mat(np.diag(d))*Vinv

        self.broyden_data.Jinv = Jinv0

        # Compute measure for change in Jinv.
JG0 = Jinv0*G0_ res = np.linalg.norm(JG0 - J0G0)/( max(np.linalg.norm(JG0), np.linalg.norm(J0G0)) + _TINY) # Update predicted position. self._update_x(x=x, g=g) return res def _get_dx(self, x, g, x_controls): r"""Get step from error `G(x)`. Parameters ---------- x_controls : (array, array) or None Tuple `(inds, vals)` indices and the respective values of the control parameters. Examples -------- Here is a little test with a linear function >>> J = np.array([[1, 2, 3], [1, -3, 2], [-2, 1, 4]], dtype=float) >>> sol = np.array([1, 2, 3]) >>> def G(x): ... return np.dot(J, x - sol) >>> x0 = np.array([0.0, 0.0, 0.0]) >>> b = Broyden(x0=x0, G0=G(x0), Jinv0=np.linalg.inv(J)) >>> x = np.array([1.0, 2.0, 3.0]) Now try to step to the solution, but hold the second variable as a control at 1. Since the function is exactly linear and we have explicitly given the correct Jacobian, we should end up with a root in the other two parameters. >>> res = b.update(x=x, g=G(x)) >>> b.x array([ 1., 2., 3.]) >>> x = b.get_x(x_controls=([1], [1.0])) >>> x array([ 1.5, 1. , 3.5]) >>> np.allclose(G(x), [0., 4.5, 0.]) True """ if g is None: dx = 0*x return dx elif self.broyden_data.Jinv is None: dx = -g else: Jinv = self.broyden_data.Jinv if x_controls is not None and 0 < len(x_controls[0]): inds, vals = x_controls inds = np.asarray(inds, dtype=int) vals = np.asarray(vals) dg_dmu = copy.copy(g) n_d = len(inds) n_x = len(x) basis = np.mat(np.zeros((n_x, n_d), dtype=float)) basis[inds, np.arange(n_d)] = 1 dg_dmu[inds] = x[inds] - vals d = basis.T*Jinv*basis dmu = _ket(dg_dmu[inds]) b = (Jinv*_ket(dg_dmu))[inds, :] dN = dmu - np.linalg.solve(d, b - dmu) dg_dN = dg_dmu dg_dN[inds] = dN[:] dx_dmu = Jinv*_ket(dg_dN) assert np.allclose(dx_dmu[inds], dmu) # Set these to make sure no round-off errors have # accumulated. dx_dmu[inds].flat = x[inds] - vals dx = -dx_dmu else: dx = -(Jinv*_ket(g)) dx = np.asarray(dx).ravel() G_mag = np.linalg.norm(g) dx_mag = np.linalg.norm(dx) if dx_mag == 0 and G_mag > 0: warnings.warn("Warning: Singular Jinv, taking random step") dx = np.random.rand(len(dx))*self.min_step elif np.divide(dx_mag, G_mag) < self.min_Jinv: warnings.warn("Warning: Singular Jinv, taking minimum step") dx *= self.min_step/np.linalg.norm(dx) dx *= self.step_factor dx_len2 = dot(dx, dx) if self._max_step2 < dx_len2: dx *= math.sqrt(self._max_step2/dx_len2) return dx class BroydenDyadic(Broyden): r"""Implements a rudimentary Broyden search by keeping track of the Jacobian an allowing for Broyden steps to be made. Uses a dyadic sum representation to reduce the memory overhead. Examples -------- >> def G(x): ... return x - x*x >> x0 = 2.0 >> b = BroydenDyadic(x0=x0, G0=G(x0)) >> b.x 4.0 >> b.update(G(b.x)) # doctest: +ELLIPSIS 6.0... >> b.x # doctest: +ELLIPSIS 1.60000... >> b.update(G(b.x)) # doctest: +ELLIPSIS 0.08... >> b.x # doctest: +ELLIPSIS 1.3913043478... >> for n in xrange(8): b.update(G(b.x)) # doctest: +ELLIPSIS 0.56... 0.24... 0.23... 0.09... 0.02... 0.003... ...e-05 ...e-07 >> b.x 1.0 Here is a two-dimensional problem: x = 2*x*x y = y*y x , y = 0.5, 1.0 >> def G(x): ... return [x[0] - x[0]*x[0]*2.0, x[1] - x[1]*x[1]] >> x0 = [2.0, 2.0] >> b = BroydenDyadic(x0=x0, G0=G(x0)) >> for n in xrange(30): ... res = b.update(G(b.x)) >> map(lambda r:round(r, 12), b.x) [0.5, 1.0] """ _state_vars = [('dyadic=broyden_data.dyadic', True)] process_vars() class ModifiedBroyden(object): r"""Implements a modified Broyden search. Similar to that described in J. Chem. Phys. Vol. 108, No. 100, 4426. 
    but with a straightforward implementation (not efficient).

    Notes
    =====
    N = length of vectors x and F(x)
    Ni = subspace size
    X = [x0 x1 ... xi]
    G = [G(x0) G(x1) ... G(xi)] where G(x) = x - F(x)
    _next_x = Current step.

    >> def F(x):
    ...     return x*x
    >> x0 = 2.0
    >> B = ModifiedBroyden(x0, F(x0))
    >> B.x
    4.0
    >> B.update(F(B.x))
    >> B.x                      # doctest: +ELLIPSIS
    1.59999999...
    >> B.update(F(B.x))
    >> B.x                      # doctest: +ELLIPSIS
    1.3913043478...
    >> for n in xrange(8): B.update(F(B.x))
    >> B.x
    1.0

    Here is a two-dimensional problem.

        x = 2*x*x
        y = y*y
        x, y = 0.5, 1.0

    >> def F(x):
    ...     return [x[0]*x[0]*2.0, x[1]*x[1]]
    >> x0 = [2.0, 2.0]
    >> B = ModifiedBroyden(x0, F(x0))
    >> for n in xrange(30):
    ...     B.update(F(B.x))
    >> map(lambda r: round(r, 12), B.x)
    [0.5, 1.0]
    """
    def __init__(self, x0, Fx0, alpha=1.0, W=None, max_step=np.inf,
                 mmf_method=True, DIIS=5):
        r"""Initialize the Broyden object.

        :Parameters:
          alpha : Initial Jacobian is alpha*eye(N)
          W : W is a function for computing the weights w[i].  It
              will be passed the vector arange(Ni) and should return
              the pair (w0, wi) where wi is a vector.
          mmf_method : If True then use my form of DeltaG.
        """
        self._pack, self._unpack = pack_unpack(x0)
        self.max_step = max_step
        self.max_step2 = max_step*max_step
        if W is None:
            def W(i):
                r"""Reproduce standard Broyden method."""
                w = array(i)*0.0
                w[-1] = 1.0
                return (0.001, w)

            def W(i):
                r"""A Weighted Broyden."""
                N = len(i)
                w = array(i)/N
                return (0.001, w)

            def W(i):
                r"""A Weighted Broyden."""
                N = len(i)
                w = array(i)*0.0
                w[-10:] = 1.0
                return (0.001, w)
        self.W = W
        self.mmf_method = mmf_method
        self.DIIS = DIIS
        self.X = mat(self._pack(x0))
        self.G = mat(self.X[:, 0] - self._pack(Fx0))
        self.Jinv = alpha*mat(np.eye(self.N))
        self.R = -self.Jinv*self.G
        self._next_x = self.get_next_x(self.X[:, 0], self.G[:, 0])

    @property
    def kappa(self):
        return self.G.T*self.G

    @property
    def N(self):
        r"""Number of parameters."""
        return self.X.shape[0]

    @property
    def Ni(self):
        r"""Number of iterations."""
        return self.X.shape[1]

    def get_x(self):
        return self._unpack(self._next_x)

    def set_x(self, x):
        self._next_x = self._pack(x)

    x = property(get_x, set_x, doc="New point x.")

    def update(self, Fx, x=None):
        r"""Update based on the new value of F(x).  Use the property
        x to access the next value of x.
        """
        if x is None:
            x = self._pack(self.x)
        else:
            x = self._pack(x)
        Fx = self._pack(Fx)
        G = x - Fx
        if self.mmf_method:
            dg = array(G - self.G)
            dX = array(x - self.X)
        else:
            G_ = np.hstack((self.G, G))
            X_ = np.hstack((self.X, x))
            dg = array(G_[:, 1:] - G_[:, :-1])
            dX = array(X_[:, 1:] - X_[:, :-1])
        Nni = array(np.sqrt(np.diag(mat(dg.T)*mat(dg)))
                    ).reshape((1, self.Ni)) + np.finfo(float).tiny
        DeltaG = dg/Nni
        DeltaX = dX/Nni
        (w_tilde, w) = self.W(np.arange(0, self.Ni))
        w = array(w).reshape((1, self.Ni))
        w_tilde2 = w_tilde*w_tilde
        DeltaG *= w
        DeltaX *= w
        An = w_tilde2*self.Jinv + mat(DeltaX)*mat(DeltaG).T
        Bn = w_tilde2*np.eye(self.N) + mat(DeltaG)*mat(DeltaG).T
        self.Jinv = np.linalg.solve(Bn.T, An.T).T
        self.X = np.hstack((self.X, x))
        self.G = np.hstack((self.G, G))
        self.R = np.hstack((self.R, -self.Jinv*G))
        self._next_x = self.get_next_x(x, G)

    def get_next_x(self, x, G):
        r"""Get the new x from the error G = x - F(x)."""
        dx = -(self.Jinv*G)
        if self.DIIS:
            X = self.X
            G = self.G
            if self.mmf_method:
                R = -self.Jinv*self.G
            else:
                R = self.R
            Ni = min(self.Ni, self.DIIS)
            Ri = R[:, -Ni:]
            A = np.empty((Ni+1, Ni+1), dtype=float)
            A[:Ni, :Ni] = Ri.T*Ri
            A[-1, :] = -1.0
            A[:, -1] = -1.0
            A[-1, -1] = 0.0
            B = np.zeros((Ni+1, 1))
            B[-1, 0] = -1.0
            c = mat(np.linalg.solve(A, B))
            x_new = mat((X + R)[:, -Ni:])*c[:-1, :]
            dx = x_new - x
        dx_len2 = (dx.T*dx)[0, 0]
        if self.max_step2 < dx_len2:
            dx *= math.sqrt(self.max_step2/dx_len2)
        return x + dx
[docs]class ExtendedBroyden(StateVars): r"""This is a class implementing the memory limited extended Broyden update. """ _state_vars = [ ('method', 1), # ('X', Computed, r"""$N\times n$ array of the previous states. # Also used to store basis information so that redundant # storage is not required."""), # ('G', Computed, r"""$N\times n$ array of the previous function # evaluations. Also used to store basis information so that # redundant storage is not required."""), # ('A', Computed), # ('B', Computed), ] process_vars() def get_J(self, dX, dG, w_0=0, w_r=0, Qa=None, Qb=None, b0=None, w=None, singular_tol=_SINGULAR_TOL): r"""Return `(Qa, Qb, b)` such that the inverse Jacobian is expressed as: .. math:: \mat{B} = \mat{Q}_{a}\diag(b)\mat{Q}_{b}^{T} Parameters ---------- dX, dG : array These should be $N \times n$ arrays of differences $x_{i} - x_{j}$ and $G(x_{i}) - G(x_{j})$. If there are $m$ independent points then there may be up to $n=m(m-1)/2$ independent differences, but because everything depends on these linearly, only $n=m-1$ entries are needed. The additional information can be codified in the weight matrix `w`. (If desired, the larger array may be passed -- a decomposition will be used to reduce the rank.) .. note:: The code will effectively only work with the matrices $\ket{\d\mat{X}}\mat{w}$ and $\ket{\d\mat{G}}\mat{w}$, so one could in principle pass in something like the full matrix of previous steps $\mat{X}$ and include the differencing in $\mat{w}$. w : array, (optional) This is the weight matrix. If it is one-dimensional (with length $n$) then it is assumed to be diagonal. A two-dimensional array may also be specified as discussed above. Assumed to be the identity if not provided. Qa, Qb : array These are the basis vectors in which the previous inverse Jacobian is expressed: .. math:: \mat{B}_{0} = \mat{Q}_{a}\mat{b}_{0}\mat{Q}_{b}^{T} b : array The non-identity part of the previous Jacobian expressed in the basis `Qa` and `Qb`. If one-dimensional, then it will be interpreted as the diagonals. w_0 : float The magnitude of this in relationship to `w` determines how much to pull the Jacobian back to the previous Jacobian. w_r : float The magnitude of this in relationship to `w` determines how much to pull the Jacobian back to the identity. This simulates a "restart". singular_tol : float The minimum singular value of the Jacobian and inverse Jacobian is limited by this. Values outside this range are eliminated (along with the corresponding entries in the basis vectors `Qa` and `Qb`). Returns ------- Qa, Qb : array These are $N \times m$ orthonormal arrays $\mat{Q}^{T}\mat{Q} = \mat{1}$ with $m \leq n$. b : array This diagonal array expresses the non-identity part of the inverse Jacobian $\mat{B}$ in the specified basis. This is computed from the SVD so these are positive entries between `min_svd` and `1/min_svd`. Notes ----- This method computes using the following formulae .. 
math:: \mat{b} &= \frac{-1}{\mat{j}^{-1} + \braket{\uvect{\beta}|\uvect{\alpha}}},\\ \mat{j}_{\text{1a)}} &= \left\{ \braket{\uvect{\alpha}|\d\mat{G}-\d\mat{X}}\mat{w}^2 \braket{\d\mat{X}|\uvect{\beta}} + w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}} \mat{j}_{0} \braket{\uvect{\beta}_{0}|\uvect{\beta}} \right\} \cdot \frac{1}{\braket{\uvect{\beta}|\d\mat{X}}\mat{w}^2 \braket{\d\mat{X}|\uvect{\beta}} + w_{0}^2 + w_{r}^2},\\ \mat{b}_{\text{2b)}} &= \left\{ \braket{\uvect{\alpha}|\d\mat{X}-\d\mat{G}}\mat{w}^2 \braket{\d\mat{G}|\uvect{\beta}} + w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}} \mat{b}_{0} \braket{\uvect{\beta}_{0}|\uvect{\beta}} \right\} \cdot \frac{1}{\braket{\uvect{\beta}|\d\mat{G}}\mat{w}^2 \braket{\d\mat{G}|\uvect{\beta}} + w_{0}^2 + w_{r}^2}. Examples -------- First we start with a problem of full rank: >>> np.random.seed(0) >>> dX = np.random.rand(10, 4) >>> dG = np.random.rand(10, 4) >>> w = np.ones(4) >>> broyden = ExtendedBroyden(method=1) >>> Qa, Qb, b = broyden.get_J(dX, dG, w=w) >>> np.allclose(np.dot(Qa.T, Qa), np.eye(4)) True >>> np.allclose(np.dot(Qb.T, Qb), np.eye(4)) True In this case, the inverse Jacobian exactly satisfies the secant condition: >>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T) >>> np.allclose(np.dot(B, dG), dX) True Broyden's "bad" method is also supported >>> broyden.method = 2 >>> Qa, Qb, b = broyden.get_J(dX, dG, w=w) >>> np.allclose(np.dot(Qa.T, Qa), np.eye(4)) True >>> np.allclose(np.dot(Qb.T, Qb), np.eye(4)) True >>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T) >>> np.allclose(np.dot(B, dG), dX) True Now we consider a rank deficient problem >>> dX[:,0] = dX[:,1] >>> dG[:,0] = dG[:,1] >>> Qa, Qb, b = broyden.get_J(dX, dG, w=w) >>> b.shape (3,) >>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T) >>> np.allclose(np.dot(B, dG), dX) True """ def inv(A): return np.linalg.pinv(A, rcond=singular_tol**2) simple_algorithm = True if w is not None: w = np.asarray(w) if simple_algorithm: # This is the simple form of the algorithm where we # process the large matrices. 
Xa = dX - dG if 1 == self.method: # Broyden's "good" method Xb = dX elif 2 == self.method: # Broyden's "bad" method Xb = dG else: raise NotImplementedError if w is not None: if 1 == len(w.shape): Xa *= w[None, :] Xb *= w[None, :] else: Xa = dot(Xa, w) Xb = dot(Xb, w) if Qa is not None: Xa = np.hstack([Xa, Qa]) if Qb is not None: Xb = np.hstack([Xb, Qb]) alpha, r_a = np.linalg.qr(Xa) beta, r_b = np.linalg.qr(Xb) if 1 == self.method: d_beta = dot(dX.T, beta) beta_alpha = dot(beta.T, alpha) if b0 is not None: bj0 = -inv(inv(b0) + beta_alpha) elif 2 == self.method: d_beta = dot(dG.T, beta) if b0 is not None: bj0 = b0 num = dot(dot(alpha.T, dX - dG), d_beta) if 1 == self.method: num *= -1 if b0 is not None: num += w_0**2*dot( dot(dot(alpha.T, Qa), bj0), dot(Qb.T, beta)) denom = dot(d_beta.T, d_beta) denom += np.eye(len(denom), dtype=float)*(w_0**2 + w_r**2) b = dot(num, inv(denom)) if 1 == self.method: # We actually computed j rather than b j = b b = -inv(inv(j) + beta_alpha) u, d, vt = np.linalg.svd(b) inds = np.where(np.logical_and( d >= singular_tol, d <= 1./singular_tol))[0] d = d[inds] Qa = dot(alpha, u[:,inds]) Qb = dot(beta, vt.T[:,inds]) return Qa, Qb, d else: dX = np.asarray(dX) dG = np.asarray(dG) dXG = dX - dG xgxg = dot(dXG.T, dXG) gg = dot(dG.T, dG) if w is not None: if 1 == len(w.shape): gg = w[:,None]*gg*w[None, :] xgxg = w[:,None]*xgxg*w[None, :] else: gg = dot(dot(w.T, gg), w) xgxg = dot(dot(w.T, xgxg), w) d2_xg, u_xg = eigh(xgxg) assert np.all(0 <= d2_xg) d_xg = np.sqrt(d2_xg) alpha_dXG = dot(u_xg*d_xg[None, :], u_xg.T) if Qa is not None: alpha0_dXG = dot(Qa.T, dXG) alpha_dXG = np.hstack([alpha_dXG, alpha0_dXG]) alpha0_alpha = np.vstack([dot(alpha0_dXG, u_xg/(d_xg[None, :] + _TINY)), np.eye(Qa.shape[1], dtype=float)]) d2_g, u_g = eigh(gg) assert np.all(0 <= d2_g) d_g = np.sqrt(d2_g) beta_dG = dot(u_g*d_g[None, :], u_g.T) if Qb is not None: beta0_dG = dot(Qb.T, dG) beta_dG = np.hstack([beta_dG, beta0_dG]) beta0_beta = np.vstack([dot(beta0_dG, u_g/(d_g[None, :] + _TINY)), np.eye(Qb.shape[1], dtype=float)]) if b0 is not None: alpha_b0_beta = 2*dot(dot(alpha0_alpha.T, b0), beta0_beta) else: alpha_b0_beta = 0 num = (dot(alpha_dXG, beta_dG.T) + w_0**2*alpha_b0_beta) denom = dot(beta_dG, beta_dG.T) denom += np.eye(len(denom), dtype=float)*(w_0**2 + w_r**2) b = self._stable_divide(num, denom) u, d, vt = np.linalg.svd(b) u = dot(u_n, u) vt = dot(vt, u_d.T) inds = np.where(np.logical_and( min_svd <= abs(d), abs(d) <= 1./min_svd))[0] Qa = dot(Qa, u[:, inds]) Qb = dot(Qb, vt.T[:, inds]) return Qa, Qb, d
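# ----------------------------------------------------------------------
# A small usage sketch for the factors returned by ExtendedBroyden.get_J
# (assumptions: `Qa`, `Qb`, `b` come from get_J as in the doctests above,
# where the full inverse Jacobian is B = 1 + Qa*diag(b)*Qb.T).  The point
# of this representation is that B never needs to be formed as a dense
# N x N matrix: applying it to a vector only needs the N x m factors.
# The helper name `_apply_extended_jinv` is illustrative only.
def _apply_extended_jinv(Qa, Qb, b, g):
    """Return B*g where B = 1 + Qa*diag(b)*Qb.T (memory-limited form)."""
    g = np.asarray(g, dtype=float)
    return g + np.dot(Qa, b*np.dot(Qb.T, g))

# For example, the quasi-Newton step for a residual g = G(x) would be
# dx = -_apply_extended_jinv(Qa, Qb, b, g), which costs O(N*m) work and
# storage rather than O(N**2).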