Source code for mmf.solve.broyden

r"""Broyden Root Finders.

This module contains several different root finders that implement
Broyden's method.

Mathematical Background
=======================
The basic idea is to use a linear approximation to the objective
function

.. math::
   \d\vect{G} \approx \mat{J}\cdot \d\vect{x}

where :math:`\d\vect{G} = \vect{G}(x) - \vect{G}(x_0)` and
:math:`\d\vect{x} = \vect{x} - \vect{x}_0`.

The goal of the Broyden methods is to find the value
:math:`\vect{x}_0` such that :math:`\vect{G}(\vect{x}_0) = 0`.  This
is done using the Newton step:

.. math::
   \vect{x}_0 = \vect{x} - \mat{J}^{-1}\cdot\vect{G}.

If the Jacobian is exact, then this should give quadratic convergence
as one approaches the root.  However, it can be expensive to compute
the Jacobian (at least :math:`N+1` evaluations of the vector function
:math:`\vect{G}` for a simple finite-difference approximation), so
instead the Broyden method uses a sequence of approximations, starting
with the identity and using a minimal update in the sense of satisfying
the secant condition while minimizing the Frobenius norm of the change.
In the following we shall use bra-ket notation for vectors.

We shall first discuss the standard Broyden method which imposes a
"minimal" update such that the secant condition $\ket{\d{G}} =
\mat{J}\ket{\d{x}}$ holds.

Standard Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~
For a given step :math:`\ket{\d{x}}` where the function changes by
:math:`\ket{\d{G}}` the update must satisfy the secant condition

.. math::
   \ket{\d{G}} = \mat{J}\ket{\d{x}}.

We can write any such update in the following form

.. math::
   \mat{J}^{-1} \rightarrow \mat{J}^{-1}
        + \frac{\left(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}}\right)\bra{v}}
               {\braket{v|\d{G}}}

where :math:`\ket{v}` is any vector not orthogonal to :math:`\ket{\d{G}}`.
This ambiguity of the multidimensional secant method causes problems
if not properly resolved, but Broyden showed that good results can be
obtained if the update is chosen to minimize the Frobenius norm of the
change in the Jacobian, where

.. math::
   \norm{\mat{J}}_{\text{Frobenius}} = \sqrt{\sum_{ij}J_{ij}^2}.

This yields the update

.. math::
   \mat{J} \rightarrow
      \mat{J} + \frac{(\ket{\d{G}} - \mat{J}\ket{\d{x}})\bra{\d{x}}}
                     {\braket{\d{x}|\d{x}}}

which can be analytically inverted using the Sherman-Morrison formula
to find :math:`\ket{v} = \left[\mat{J}^{-1}\right]^{T}\ket{\d{x}}`:

.. math::
   \mat{J}^{-1} \rightarrow \mat{J}^{-1}
       + \frac{\left(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}}\right)
               \bra{\d{x}}\mat{J}^{-1}}
              {\braket{\d{x}|\mat{J}^{-1}|\d{G}}}.
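
A quick numerical check (a sketch, not one of this module's doctests) that
the direct update and this inverse update are consistent and both satisfy
the secant condition::

   import numpy as np

   np.random.seed(1)
   N = 6
   J = np.random.rand(N, N) + N*np.eye(N)   # a well-conditioned "Jacobian"
   Jinv = np.linalg.inv(J)
   dx, dG = np.random.rand(N), np.random.rand(N)

   # "Good" Broyden update of J and the Sherman-Morrison form of its inverse.
   J_new = J + np.outer(dG - J.dot(dx), dx)/dx.dot(dx)
   Jinv_new = Jinv + np.outer(dx - Jinv.dot(dG),
                              Jinv.T.dot(dx))/dx.dot(Jinv.dot(dG))

   assert np.allclose(Jinv_new, np.linalg.inv(J_new))   # consistent inverses
   assert np.allclose(J_new.dot(dx), dG)                # secant condition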

An alternative update (Broyden's "second" method) is found by minimizing
the norm of the change in the inverse Jacobian, which yields :math:`\ket{v} =
\ket{\d{G}}`

.. math::
   \mat{J}^{-1} \rightarrow 
      \mat{J}^{-1} + \frac{(\ket{\d{x}} - \mat{J}^{-1}\ket{\d{G}})\bra{\d{G}}}
                          {\braket{\d{G}|\d{G}}},

however, we have found that this does not always work as well.  It can
still be used, though, even if the inverse Jacobian develops a singular
direction (which can cause trouble for the previous update).
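
For orientation, here is a self-contained sketch of a complete Broyden
iteration built from the update above.  This is illustrative only (it is
*not* the interface provided by this module) and it assumes a reasonably
well-behaved objective function::

   import numpy as np

   def broyden_sketch(G, x0, tol=1e-10, max_iter=100):
       # Toy Broyden solver using the Frobenius-minimal update of J.
       x = np.asarray(x0, dtype=float)
       g = G(x)
       J = np.eye(len(x))                  # initial Jacobian approximation
       for _ in range(max_iter):
           dx = -np.linalg.solve(J, g)     # quasi-Newton step
           x_new = x + dx
           g_new = G(x_new)
           # Rank-1 update enforcing the secant condition J dx = dG
           J = J + np.outer(g_new - g - J.dot(dx), dx)/dx.dot(dx)
           x, g = x_new, g_new
           if np.linalg.norm(g) < tol:
               break
       return x

   # G(x) = x - cos(x) has the root x_i = 0.739085... in each component.
   x = broyden_sketch(lambda x: x - np.cos(x), np.zeros(4))
   assert np.allclose(x - np.cos(x), 0)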

Generalized Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~~~~
This method is a generalization of the one described in
[Bierlaire:2006]_ and defines the new Jacobian $\mat{J}$ at the point
$x$ that minimizes a weighted combination of previous "residuals"
$\ket{r_i}$ along with a minimal change condition on a matrix residual
$\mat{R}$:

.. math::
   \min_{\mat{J}}\left(\sum_{i}\norm{w_i\ket{r_i}}_{2}^{2}
         +\norm{\mat{R}\mat{\Gamma}}^{2}_{F}\right)

where $w_i$ are weights and $\mat{\Gamma}$ will be discussed later and
specifies the weighting of the matrix residual.  The residuals may
take the following form:

.. math::
   \ket{r_i} &= \begin{cases}
      \ket{\d{G}_i} - \mat{J}\ket{\d{x}_i} & \text{1)}\\
      \ket{\d{x}_i} - \mat{J}^{-1}\ket{\d{G}_i} & \text{2)}
   \end{cases}\\
   \mat{R} &= \begin{cases}
      \mat{J} - \mat{J}_{0} & \text{a)}\\
      \mat{J}^{-1} - \mat{J}^{-1}_{0} & \text{b)}
   \end{cases}

The method 1a) corresponds to the generalization of Broyden's first or
"Good" method while the combination 2b) corresponds to the
generalization of Broyden's second or "Bad" method.  This second
method with $\mat{\Gamma} = \mat{1}$ is the method employed in
[Johnson:1988]_ with the slight modification that $\ket{\d{x}_i} =
\ket{x_{i+1}} - \ket{x_{i}}$.  The two mixed cases 1b) and 2a) are
much more difficult to solve as they are linear in neither $\mat{J}$
nor $\mat{J}^{-1}$.

In what follows, we shall use the notation $\mat{B} = \mat{J}^{-1}$
for the inverse approximate Jacobian.  Boldface capital letters like
this refer to "large" $N\times N$ matrices that we typically will not
be able to form.  Boldface lowercase letters such as $\mat{w} =
\diag(w_i)$ denote "small" $n\times n$ matrices that we can easily
store.  We define mixed matrices using bra-ket notation such as
$\ket{\d{\mat{G}}}$ and $\ket{\d{\mat{X}}}$ for the $N \times n$
matrices of the vectors $[\ket{\d{\mat{X}}}]_{:i} = \ket{\d{x}_i}$.
We may now write the minimization condition as:

.. math::
   \min_{\mat{J}}
      \tr\left(\ket{\mat{r}}\mat{w}^2\bra{\mat{r}}
            + \mat{R}\mat{\Gamma}\mat{\Gamma}^T\mat{R}^T\right).

Using the relationship $\partial J^{-1}_{MN}/\partial J_{AB} =
-J^{-1}_{MA}J^{-1}_{BN}$ we arrive at the following normal equations:

.. ::
..    (G - JX) W^2 X^T = (J - J0)Gamma
..    (G - BX) W^2 X^T = -B (B - B0)Gamma B
..    -B(X - BG) W^2 G^T B = (J - J0)Gamma
..    -B(X - BG) W^2 G^T B= -B (B - B0)Gamma B

.. math::
   \mat{B}_{\text{1a)}} &= \mat{B}_{0} +
     (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
     \bra{\d\mat{X}}\cdot\frac{1}{
       \mat{B}_{0}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{\Gamma}\mat{\Gamma}^{T}}\cdot\mat{B}_{0}, \\
   \mat{B}_{\text{2b)}} &= \mat{B}_{0}
     + (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
     \bra{\d\mat{G}}\cdot\frac{1}{
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
       + \mat{\Gamma}\mat{\Gamma}^{T}}

The weights have the following effects:

1) If not using the subspace decomposition method, then the weights affect the
   inversion procedure.  In particular, the ratio
   $\mat{\Gamma}\mat{\Gamma}^{T}\mat{w}^{-2}$ is important for determining the
   relative weighting of the old Jacobian and the new steps.
2) If components of $\ket{\d\mat{X}}$ and $\ket{\d\mat{G}}$ are linearly
   dependent -- i.e. the secant conditions are not all consistent -- then the
   weights determine how to mix these.
3) If using the subspace decomposition method, then for the linearly independent
   components, the weights are formally irrelevant, but they can still be used
   to determine which components should be truncated or ignored (see section
   :ref:`sec:truncation-algorithm`).  They will also play a role when dynamic
   restarts are considered (see :ref:`sec:dynamic-restarts`).
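
As a concrete illustration of point 3) above: if $\mat{\Gamma}$ is chosen to
span the orthogonal complement of $\ket{\d\mat{G}}$, then the 2b) update
satisfies every secant condition exactly, independent of the weights.  A
minimal numerical sketch (not part of the module)::

   import numpy as np

   np.random.seed(2)
   N, m = 8, 3
   dX, dG = np.random.rand(N, m), np.random.rand(N, m)
   B0 = np.random.rand(N, N)
   w2 = np.diag(np.random.rand(m) + 0.5)       # arbitrary weights w_i^2
   Gamma = np.linalg.svd(dG)[0][:, m:]         # spans the complement of dG

   denom = dG.dot(w2).dot(dG.T) + Gamma.dot(Gamma.T)
   B2b = B0 + (dX - B0.dot(dG)).dot(w2).dot(dG.T).dot(np.linalg.inv(denom))

   assert np.allclose(B2b.dot(dG), dX)         # all secant conditions hold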

.. note:: Here we present a little bit of analysis.  The first step will be to
   define $\ket{\mat{X}} \approx \ket{\d\mat{X}}\mat{w}$ and $\ket{\mat{G}}
   \approx \ket{\d\mat{G}}\mat{w}$ so that they have full rank and we can write
   $\ket{\mat{X}} = \ket{\uvect{X}}\mat{x}$, $\ket{\mat{G}} =
   \ket{\uvect{G}}\mat{g}$ with $\mat{x}$ and $\mat{g}$ being invertible.  (In
   the memory limited context, these will be small.  They can be computed using
   an SVD or rank-revealing QR decomposition.)  The two updates can now be
   written as:

   .. math::
      \mat{B}_{2b)}
      = \mat{B}_{0} 
        - \mat{B}_{0}\ket{\uvect{G}}\bra{\uvect{G}} 
        + \ket{\mat{X}}\mat{g}^{-1}\bra{\uvect{G}}
      = \mat{B}_{0} 
        + (\ket{\mat{X}} - \mat{B}_{0}\ket{\mat{G}})
        \mat{g}^{-1}(\mat{g}^T)^{-1}\bra{\mat{G}}.

   The first equality expresses the update as first subtracting off the
   projection of the previous approximation on the subspace spanned by
   $\ket{\uvect{G}}$ and then reconstructing this from the new data.  In this
   sense the update is minimal and can be expressed as minimizing the Frobenius
   norm.  This follows from equation :eq:`proj_inv` with $\mat{\Gamma} =
   \ket{\uvect{G}_{\perp}}$.

   One might hope that, if the function is linear, previous updates would be
   preserved so that after $N$ independent updates the approximation becomes
   exact.  This turns out *not* to be true; the following code sets up a
   numerical exploration of the two forms of the update:

   >>> np.random.seed(0)
   >>> import numpy.random
   >>> from numpy import mat
   >>> from numpy.linalg import svd, qr, inv
   >>> ns = numpy.random.randint(1,10,size=10)
   >>> N = sum(ns)
   >>> X_ = mat(np.random.rand(N,N))
   >>> J_ = mat(np.random.rand(N,N))
   >>> G_ = J_*X_
   >>> B = mat(np.eye(N))
   >>> B1 = mat(np.eye(N))
   >>> n0 = 0
   >>> for n in ns:
   ...    X = X_[:,n0:n0+n]
   ...    G = G_[:,n0:n0+n]
   ...    n0 += n
   ...    _, g = qr(G)
   ...    _, x = qr(X)
   ...    Xhat = svd(X)[0][:,:n]
   ...    Xperp = svd(X)[0][:,n:]
   ...    Ghat = svd(G)[0][:,:n]
   ...    Gperp = svd(G)[0][:,n:]
   ...    B = B + (X - B*G)*inv(g.T*g)*G.T
   ...    B1 = B1 - B1*Ghat*Ghat.T + X*inv(g)*Ghat.T



        

.. note:: I am trying to form a better intuition for these equations, so
   consider the following.  First assume that $\mat{B}_0$ is not singular
   and that the matrices $\ket{\d{\mat{X}}}$ and $\ket{\d\mat{G}}$ are not rank
   deficient.  (I.e., if there are $n$ previous points, then these have rank $n$.
   Otherwise, use the weights to reduce the size so that they satisfy this
   property appropriately.  The weights can now be ignored.)  In this case, one
   can satisfy the secant conditions exactly if the equations have the following
   structure:

   .. math::
      \mat{B} &= \mat{B}_{0} +
        (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot
        \braket{\mat{A}|\d\mat{G}}^{-1}\bra{\mat{A}}

   for an arbitrary choice of $\bra{\mat{A}}$ insofar as
   $\braket{\mat{A}|\d\mat{G}}$ is not singular.  The ambiguity in the choice of
   $\bra{\mat{A}}$ is now captured in the desire to minimize the change from
   $\mat{B}_{0}$ and we have:

   .. math::
      \bra{\mat{A}_{\text{1a)}}} & = \bra{\d\mat{X}}\mat{B}_{0},\\
      \bra{\mat{A}_{\text{2b)}}} & = \bra{\d\mat{G}}.

   >>> from numpy.random import rand
   >>> from numpy import dot
   >>> from numpy.linalg import inv, svd
   >>> np.random.seed(0)
   >>> dG = rand(5,3)
   >>> dX = rand(5,3)
   >>> B0 = rand(5,5)
   >>> Gamma = dot(svd(dG)[0][:,-2:], rand(2,2))
   >>> np.allclose(0, dot(dG.T, Gamma))
   True
   >>> np.allclose(dot(dG.T, inv(dot(dG, dG.T) + dot(Gamma, Gamma.T))),
   ...             dot(inv(dot(dG.T, dG)), dG.T))
   True

   The secant conditions are then satisfied exactly:

   >>> A = dG.T
   >>> R = dX - dot(B0, dG)
   >>> B1 = B0 + dot(R, dot(inv(dot(A, dG)), A))
   >>> np.allclose(dot(B1, dG), dX)
   True
   
   
[Bierlaire:2006]_ suggests that a good choice of weights $w_i$ is
something like

.. math::
   w_i = \norm{x - x_i}^{-2}
       = \frac{1}{\braket{\d\mat{X}|\d\mat{X}}_{ii}}

which weights the small steps (which are more likely to be in the
linear regime) more heavily than the long secant steps.  [Johnson:1988]_
suggests a similar criterion based on $\norm{G_{i}}$ or a fixed set of
weights depending on iteration number.

A good choice of $\mat{B}_{0}$ and $\mat{\Gamma}$ is not so simple,
but we can be led by the original Broyden method, which sets
$\mat{B}_{0}$ to be the previous approximate inverse Jacobian.
Another option is to set $\mat{B}_{0} = \mat{1}$ or some other
constant independent of iteration.  If enough iterates are maintained,
then this might also work (but I am not aware of this method having
been explored).

.. _sec:dynamic-restarts:

Dynamic Restarts
----------------

One interesting idea is to include an additional Jacobian residual
term $\mat{R} = \mat{B} - \mat{B}_{-1}$ in addition to $\mat{R}_{0} =
\mat{B} - \mat{B}_{0}$ where $\mat{B}_{0}$ is constant (typically
$\mat{B}_{0}\propto \mat{1}$) and $\mat{B}_{-1}$ is the previous
approximation.  If we also take $\mat{\Gamma}_{0} = w_{0}\mat{1}$,
then the weight $w_{0}$ will "pull" the inverse Jacobian back towards
$\mat{B}_{0}$.  This is useful if one can formulate some slow but
approximately convergent iteration scheme.  [Johnson:1988]_, for
example, advocates restarting the algorithm and resetting the Jacobian
approximation after some progress has been made.  Instead, by dynamically
increasing $w_0$ (if the Broyden steps are not improving the residual,
for example) one can continuously "restart" the approximation by
pulling it back toward $\mat{B}_{0}$. This gives the following
generalized equations (which we express in several forms here):

.. math::
   \mat{J}_{\text{1a)}} &= \Bigl[
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{J}_{-1}\mat{\Gamma}\mat{\Gamma}^{T}
       + \mat{J}_{0}w_0^2
     \Bigr]\cdot\Bigl[
       \ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{\Gamma}\mat{\Gamma}^{T} 
       + \mat{1}w_0^2
     \Bigr]^{-1},\\
   &= \mat{J}_{-1} +
     \Bigl(
       (\ket{\d\mat{G}} - \mat{J}_{-1}\ket{\d\mat{X}})\cdot\mat{w}^2
       \bra{\d\mat{X}}
       + (\mat{J}_{0} - \mat{J}_{-1})w_0^2
     \Bigr)\cdot\frac{1}{
       \ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{\Gamma}\mat{\Gamma}^{T} + \mat{1}w_0^2},\\
   \mat{B}_{\text{1a)}} &= \mat{B}_{-1} + 
   \Bigl(
     (\ket{\d\mat{X}} - \mat{B}_{-1}\ket{\d\mat{G}})\cdot\mat{w}^2
     \bra{\d\mat{X}}
     + (\mat{1} - \mat{B}_{-1}\mat{B}_{0}^{-1})w_0^2
   \Bigr)\cdot\frac{1}{
       \mat{B}_{-1}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
       + \mat{\Gamma}\mat{\Gamma}^{T} + \mat{B}_{-1}\mat{B}_{0}^{-1}w_0^2
       }\cdot\mat{B}_{-1}, \\
   \mat{B}_{\text{2b)}} &= \Bigl[
       \ket{\d\mat{X}}\mat{w}^2\bra{\d\mat{G}}
       + \mat{B}_{-1}\mat{\Gamma}\mat{\Gamma}^{T}
       + \mat{B}_{0}w_0^2
     \Bigr]\cdot\Bigl[
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
       + \mat{\Gamma}\mat{\Gamma}^{T} 
       + \mat{1}w_0^2
     \Bigr]^{-1},\\
   &= \mat{B}_{-1} +
     \Bigl(
       (\ket{\d\mat{X}} - \mat{B}_{-1}\ket{\d\mat{G}})\cdot\mat{w}^2
       \bra{\d\mat{G}}
      + (\mat{B}_{0} - \mat{B}_{-1})w_0^2
     \Bigr)\cdot\frac{1}{
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
       + \mat{\Gamma}\mat{\Gamma}^{T} + \mat{1}w_0^2
       }

This dynamic restart mechanism can be easily used if one can relate
the objective function $\vect{G}(\vect{x})$ to an approximately
convergent iteration:

.. math::
   \vect{x} \mapsto \vect{F}(\vect{x}) 
   = \vect{x} - \mat{B}_{0}\vect{G}(\vect{x}).

Pulling the approximate inverse Jacobian $\mat{B} \rightarrow
\mat{B}_{0}$ will pull the iterations back to this (hopefully)
convergent iteration.  Note that the typical linear mixing scheme
$\vect{x} \mapsto (1-\alpha)\vect{x} + \alpha\vect{F}(\vect{x})$ is
simply equivalent to scaling $\mat{B}_{0} \rightarrow
\alpha\mat{B}_{0}$ which may also be simply implemented.
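
The following sketch illustrates the two limits of the 2b) restart formula
(assuming $\mat{B}_{0} = \mat{1}$, unit weights, and $\mat{\Gamma} =
\ket{\d\uvect{G}_{\perp}}$; illustrative only)::

   import numpy as np

   np.random.seed(4)
   N, m = 8, 3
   dX, dG = np.random.rand(N, m), np.random.rand(N, m)
   Bm1 = np.random.rand(N, N)                  # previous inverse Jacobian
   B0 = np.eye(N)                              # restart Jacobian B_0
   Gamma = np.linalg.svd(dG)[0][:, m:]         # complement of dG
   w2 = np.eye(m)

   def B2b(w02):
       num = (dX - Bm1.dot(dG)).dot(w2).dot(dG.T) + (B0 - Bm1)*w02
       den = dG.dot(w2).dot(dG.T) + Gamma.dot(Gamma.T) + np.eye(N)*w02
       return Bm1 + num.dot(np.linalg.inv(den))

   assert np.allclose(B2b(0.0).dot(dG), dX)    # w_0 = 0: secant conditions
   assert np.allclose(B2b(1e12), B0)           # w_0 large: pulled to B_0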

.. note:: The diagonals of the matrix $\mat{J}_0 = \mat{B}_0^{-1}$ are
   specified by the parameter :attr:`IProblem.j0_scale`.  The
   parameter $\alpha$ is treated separately in the solver.

Regularization
--------------

Finally, the choice of $\mat{\Gamma}\mat{\Gamma}^{T}$ is such that the
denominators are positive definite.  Several choices are
discussed in [Bierlaire:2006]_:

"Geometric approach" : 
   Here we set $\mat{\Gamma} = \mat{0}$, which is fine unless there are
   singular or dependent directions, which are common.  It is also sufficient
   with a dynamic restart where $w_0^2$ regularizes the denominator.  It is
   completely unsuitable, however, for a memory limited approach.
"Subspace decomposition" :
   The subspace decomposition method makes use of the following formulae:

   .. math::
        \frac{1}{\ket{G}\bra{H} +
          \ket{G_\perp}w_{r}^2\bra{G_\perp} + w_0^2\mat{1}} = 
        \frac{1}{w_0^2}
        \left(\mat{1} - \ket{G}\frac{1}
             {\mat{1}w_0^2 + \braket{H|G}}\bra{H}\right)
        \left(\mat{1} - \frac{\ket{G_\perp}w_r^2\bra{G_\perp}}
                           {w_0^2 + w_r^2}\right),\\
        \frac{1}{\ket{G}\bra{H} +
          \ket{H_\perp}w_{r}^2\bra{H_\perp} + w_0^2\mat{1}} = 
        \frac{1}{w_0^2}
        \left(\mat{1} - \frac{\ket{H_\perp}w_r^2\bra{H_\perp}}
                           {w_0^2 + w_r^2}\right)
        \left(\mat{1} - \ket{G}\frac{1}
             {\mat{1}w_0^2 + \braket{H|G}}\bra{H}\right)

   >>> np.random.seed(0)
   >>> w02 = 0.1
   >>> wr2 = 0.2
   >>> N = 10
   >>> m = 5
   >>> I = np.eye(N)
   >>> i = np.eye(m)
   >>> dG = np.mat(np.random.rand(N,m))
   >>> H = np.mat(np.random.rand(N,m))
   >>> dG_perp = np.linalg.svd(dG)[0][:,m:]
   >>> assert(np.allclose(dG.T*dG_perp, 0))
   >>> B = np.mat(np.random.rand(N,N))
   >>> D = dG*H.T + dG_perp*wr2*dG_perp.T + I*w02
   >>> Dinv = ((I - dG*np.linalg.inv(i*w02 + H.T*dG)*H.T)*
   ...         (I - wr2*dG_perp/(wr2+w02)*dG_perp.T))/w02
   >>> np.allclose(D*Dinv,I)
   True
   >>> H_perp = np.linalg.svd(H)[0][:,m:]
   >>> assert(np.allclose(H.T*H_perp, 0))
   >>> B = np.mat(np.random.rand(N,N))
   >>> D = dG*H.T + H_perp*wr2*H_perp.T + I*w02
   >>> Dinv = ((I - wr2*H_perp/(wr2+w02)*H_perp.T)*
   ...     (I - dG*np.linalg.inv(i*w02 + H.T*dG)*H.T))/w02
   >>> np.allclose(D*Dinv,I)
   True

   We start with the subspace decomposition for method 2b).  Here we set
   $\mat{\Gamma} = \ket{G_\perp}\mat{w}_r$, which leads to the inversion
   formulae above.  The denominator has the following forms for methods 1a)
   and 2b) respectively:

   .. math::
        \frac{1}{\ket{G}\bra{X} +
          \mat{B}^{-1}\ket{\Gamma}\bra{\Gamma} + w_0^2\mat{1}},\\
        \frac{1}{\ket{G}\bra{G} +
          \ket{\Gamma}\bra{\Gamma} + w_0^2\mat{1}}.

   The subspace decomposition method chooses $\ket{\Gamma}$ to be orthogonal
   to these: it sets $\mat{\Gamma} = \ket{\uvect{X}_{\perp}}$ (case 1a) or
   $\mat{\Gamma} = \ket{\uvect{G}_{\perp}}$ (case 1a or 2b), the basis
   vectors orthogonal to $\ket{\d\mat{X}}$ or $\ket{\d\mat{G}}$
   respectively.


   (Old notes) A couple of points.  First, if $w_0 = 0$, then the results are
   independent of the magnitude of $\mat{\Gamma}$.  To see this, note that one
   has explicitly (for case 2b) for example):

   .. math::
      \frac{1}{\ket{\uvect{G}}\mat{x}\bra{\uvect{G}} 
        + \lambda\ket{\uvect{G}_\perp}\bra{\uvect{G}_\perp}} =
      \ket{\uvect{G}}\mat{x}^{-1}\bra{\uvect{G}} 
        + \frac{1}{\lambda}\ket{\uvect{G}_\perp}\bra{\uvect{G}_\perp}

   The second term will be eliminated when acted upon by the numerator
   factor $\bra{\d\mat{G}}$.  This method poses some formal difficulty for the
   inversion of the inverse formulation of case 2b) as the denominator contains
   the mixed term $\mat{B}_{-1}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}$,
   however, since one should generally have $\mat{B}_{-1}\ket{\d\mat{G}} \approx
   \ket{\d\mat{X}}$, this should not pose any difficulty in practice.
   Technically, we implement this by simply performing an SVD of the denominator
   and dropping the orthogonal subspace.

   .. note:: The standard Broyden method is reproduced with this approach if
      there is no restart parameter $w_0 = 0$ and only a single previous vector
      $\ket{X}$ is kept.
"Numerical approach":
   They recommend a numerical approach of performing a Modified
   Cholesky decomposition :func:`mmf.math.linalg.modchol` of $\mat{A}
   = \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}} = \mat{L}\mat{L}^{T} -
   \mat{E}$ where $\mat{E} = \mat{\Gamma}\mat{\Gamma}^T$ is a
   diagonal matrix as small as possible making $\mat{A} + \mat{E}$
   safely positive definite and well conditioned.

   Unfortunately, this Modified Cholesky based approach does not work
   for method 1a) where the denominator is both large ($N\times N$)
   and slightly asymmetric.  (Ideally $\mat{B}_{0}\ket{\d\mat{G}}
   \approx \ket{\d\mat{X}}$ so the asymmetry should be small.)

   Our implementation of this is the same as the "direct inversion"
   approach discussed below, but with a finite $\mat{\Gamma}$ such
   that the smallest singular value is not less than
   :attr:`JinvBroyden.min_svd` times the largest.
"Direct inversion":
   If $\mat{C} = \mat{\Gamma}\mat{\Gamma}^{T} + w_{0}^2\mat{1}$ is easily
   invertible, then the following formula suggested by 
   [Johnson:1988]_ can be used.  (In [Johnson:1988]_ they set
   $\mat{\Gamma} = w_{0}\mat{1}$, and so the requirement of minimizing
   the effect of $\mat{\Gamma}$ comes in the limit $w_0 \rightarrow 0$.  Note
   that this is not exactly the same as our restart condition because in this
   case the Jacobian is pulled back to the previous Jacobian, not back to the
   restart Jacobian.):

   .. math::
      \frac{1}{\mat{C} + \ket{\mat{a}}\bra{\mat{b}}} =
      \mat{C}^{-1} - \mat{C}^{-1}\ket{\mat{a}}\cdot
        \frac{1}{\mat{1}
                  + \braket{\mat{b}|\mat{C}^{-1}|\mat{a}}}
      \cdot\bra{\mat{b}}\mat{C}^{-1}.

   >>> from numpy.linalg import inv
   >>> np.random.seed(3); N=10; n=3;
   >>> X = np.mat(np.random.rand(N, n))
   >>> G = np.mat(np.random.rand(N, n))
   >>> C = np.mat(np.random.rand(N, N))
   >>> I = np.mat(np.eye(N))
   >>> i = np.mat(np.eye(n))
   >>> np.allclose(inv(C + G*X.T),
   ...             inv(C) - inv(C)*G*inv(i + X.T*inv(C)*G)*X.T*inv(C))
   True

   This only requires inverting the small $n\times n$ matrix $\mat{1} +
   \braket{\mat{X}|\mat{C}^{-1}|\mat{G}}$.  One further simplification arises
   when the factor of $\bra{\mat{b}}$ appears to the left, as is the case in
   both methods without the restart:

   .. math::
      :label: proj_inv

      \bra{\mat{b}}\frac{1}{\mat{C} + \ket{\mat{a}}\bra{\mat{b}}} =
      \frac{1}{\mat{1} + \braket{\mat{b}|\mat{C}^{-1}|\mat{a}}}
      \cdot\bra{\mat{b}}\mat{C}^{-1}.
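
   This identity is easy to check numerically (a sketch, not one of the
   module's doctests)::

      import numpy as np
      from numpy.linalg import inv

      np.random.seed(5)
      N, n = 9, 3
      a = np.random.rand(N, n)
      b = np.random.rand(N, n)
      C = np.random.rand(N, N) + N*np.eye(N)   # keep C safely invertible

      lhs = b.T.dot(inv(C + a.dot(b.T)))
      rhs = inv(np.eye(n) + b.T.dot(inv(C)).dot(a)).dot(b.T).dot(inv(C))
      assert np.allclose(lhs, rhs)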


Symmetric Jacobian
------------------
If it is known that the Jacobian is symmetric, then this constraint
can be built into the minimization with a Lagrange multiplier.

.. todo:: Finish figuring out how to solve the symmetric problem::

   >> A = rand(5,5)
   >> B = rand(5,5)
   >> B = B+B.T
   >> I = eye(5)
   >> M = zeros((25,25))
   >> for i in xrange(5):
   ..     for j in xrange(i+1):
   ..         for a in xrange(5):
   ..             for b in xrange(5):
   ..                 M[i+5*j, a+5*b] = (I[i,a]*A[j,b] + I[j,a]*A[i,b]
   ..                                   +I[j,b]*A[i,a] + I[i,b]*A[j,a])

   See http://dx.doi.org/10.1016/0024-3795(89)90067-0


Memory-Limited Extended Broyden Method
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we consider a slight generalization of the previous method in a memory
limited approach.  First, we keep a limited set of $n$ previous points
$\ket{\mat{X}}$ and function evaluations at those points $\ket{\mat{G}}$.  These
define two $n-1$ dimensional subspaces spanned by their differences:
$\ket{\uvect{X}} = \Sp\ket{\delta\mat{X}}$ and $\ket{\uvect{G}} =
\Sp\ket{\delta\mat{G}}$.  This leads to the "stateless" version described
first. Second, we keep track of an additional $2m$ basis vectors for spanning
the domain and range of the inverse Jacobian approximation.  Thus, we need
$N\times(m+n)$ storage.  We express everything in terms of these subspaces, and
so work only with $(m+n-1)\times(m+n-1)$ dimensional matrices as described
below.  Finally, for simplicity, we assume that the original problem is
appropriately scaled so that $\mat{B}_{0} = \mat{1}$.

Full "Inefficient" Version
--------------------------

The general update has the form:

.. math::
     \mat{B} \rightarrow 
     \mat{B} + 
      \left((\ket{\d\mat{X}} - \mat{B}\ket{\d\mat{G}})\bra{H} 
      + (1-\mat{B})w_0^2\right)\cdot
      \frac{1}{\ket{\d\mat{G}}\bra{H} +
        \ket{\mat{\Gamma}}w_{r}^2\bra{\mat{\Gamma}} + w_0^2\mat{1}}

where $\ket{H} \in \{\ket{\d\mat{X}}, \ket{\d\mat{G}}\}$ for methods 1a) and 2b)
respectively.


We start with the subspace decomposition method with $\mat{\Gamma} =
\ket{G_\perp}$ or $\mat{\Gamma} = \ket{H_\perp}$.  The inverse Jacobian has the
following general form where $\ket{R} = \ket{\d\mat{X}} -
\mat{B}\ket{\d\mat{G}}$ and $\ket{H} \in \{\ket{\d\mat{X}},\ket{\d\mat{G}}\}$
for methods 1a) and 2b) respectively.

.. math::
   \begin{aligned}
     \mat{B} &\rightarrow 
     \mat{B} + 
      \left(\ket{R}\bra{H} + (1-\mat{B})w_0^2\right)\cdot
      \frac{1}{\ket{\d\mat{G}}\bra{H} +
        \ket{\d\mat{G}_\perp}w_{r}^2\bra{\d\mat{G}_\perp} + w_0^2\mat{1}},\\
    &= \mat{1} + f(\mat{B} - \mat{1})\left(\mat{1} -
       \ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
    + (\ket{\d\mat{X}} - \ket{\d\mat{G}})
      \frac{1}{\mat{1}w_0^2 + \braket{H|\d\mat{G}}}
      \left((1-f)\bra{H} +
            f\braket{H|\d\mat{G}}
            \braket{\d\mat{G}|\d\mat{G}}^{-1}\bra{\d\mat{G}}\right)
  \end{aligned}

>>> np.random.seed(0)
>>> w02 = 0.1
>>> wr2 = 0.2
>>> N = 10
>>> m = 5
>>> I = np.eye(N)
>>> i = np.eye(m)
>>> dG = np.mat(np.random.rand(N,m))
>>> dX = np.mat(np.random.rand(N,m))
>>> H = np.mat(np.random.rand(N,m))
>>> dG_perp = np.linalg.svd(dG)[0][:,m:]
>>> dG_hat = np.linalg.svd(dG)[0][:,:m]
>>> assert(np.allclose(dG.T*dG_perp, 0))
>>> B = np.mat(np.random.rand(N,N))
>>> R = dX - B*dG
>>> B1 = B + (R*H.T + (I-B)*w02)*np.linalg.inv(
...     dG*H.T + dG_perp*wr2*dG_perp.T + I*w02)
>>> f = wr2/(wr2+w02)
>>> B2 = (I + f*(B-I)*(I - dG_hat*dG_hat.T) 
...             + (dX - dG)*np.linalg.inv(i*w02 + H.T*dG)*(
...          (1-f)*H.T + f*H.T*dG*np.linalg.inv(dG.T*dG)*dG.T))
>>> np.allclose(B1,B2)
True

.. math::
   \begin{aligned}
     \mat{B} &\rightarrow 
     \mat{B} + 
      \left(\ket{R}\bra{H} + (1-\mat{B})w_0^2\right)\cdot
      \frac{1}{\ket{\d\mat{G}}\bra{H} +
        \ket{H_\perp}w_{r}^2\bra{H_\perp} + w_0^2\mat{1}} \\
    &= \mat{1} + f(\mat{B} - \mat{1})
    + \left[\ket{\d\mat{X}} - \ket{\d\mat{G}} 
            - f(\mat{B} - \mat{1})
               (\ket{\d\mat{G}} + w_0^2\ket{H}\braket{H|H}^{-1})\right]
      \frac{1}{\mat{1}w_0^2 + \braket{H|\d\mat{G}}}\bra{H}
   \end{aligned}

>>> np.random.seed(0)
>>> w02 = 0.1
>>> wr2 = 0.2
>>> N = 10
>>> m = 5
>>> I = np.eye(N)
>>> i = np.eye(m)
>>> dG = np.mat(np.random.rand(N,m))
>>> dX = np.mat(np.random.rand(N,m))
>>> H = np.mat(np.random.rand(N,m))
>>> H_perp = np.linalg.svd(H)[0][:,m:]
>>> H_hat = np.linalg.svd(H)[0][:,:m]
>>> assert(np.allclose(H.T*H_perp, 0))
>>> B = np.mat(np.random.rand(N,N))
>>> R = dX - B*dG
>>> B1 = B + (R*H.T + (I-B)*w02)*np.linalg.inv(
...     dG*H.T + H_perp*wr2*H_perp.T + I*w02)
>>> f = wr2/(wr2+w02)
>>> B2 = (I + f*(B-I)
...         + (dX - dG - f*(B-I)*(dG + w02*H*np.linalg.inv(H.T*H)))
...            *np.linalg.inv(i*w02 + H.T*dG)*H.T)
>>> np.allclose(B1, B2)
True

From these expressions we can determine the range and domain of the final
inverse Jacobian.  Let the previous inverse be expressed as $\mat{B} = \mat{1} +
\ket{\alpha}\bra{\beta}$.

Method 1a)
++++++++++

For method 1a) $\ket{H} = \ket{\d\mat{X}}\mat{w}^2$. Choosing $\mat{\Gamma} =
\ket{\uvect{H}_\perp}$ gives the following, which reduces to Broyden's original
"Good" method in the limit of $w_0 \rightarrow 0$ and keeping exactly one
previous step.

.. math::
    \mat{B} = \mat{1} + f(\mat{B} - \mat{1})
    + \left[\ket{\d\mat{X}} - \ket{\d\mat{G}} 
            - f(\mat{B} - \mat{1})
               (\ket{\d\mat{G}} 
                + w_0^2\ket{\d\mat{X}}
                  \braket{\d\mat{X}|\d\mat{X}}^{-1}\mat{w}^{-2})\right]
      \frac{1}{\mat{1}w_0^2 + \mat{w}^2
               \braket{\d\mat{X}|\d\mat{G}}}\mat{w}^2\bra{\d\mat{X}}.


The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the domain is
$\Sp(\bra{\beta}, \bra{\d\mat{X}}, \bra{\d\mat{G}})$.  Choosing $\mat{\Gamma} =
\ket{\d\uvect{G}_\perp}$ gives the following:

.. math::
    \mat{B} = \mat{1} + 
     f(\mat{B} - \mat{1})\left(
         \mat{1} - \ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
    + (\ket{\d\mat{X}} - \ket{\d\mat{G}})
      \frac{1}{\mat{1}w_0^2 + \mat{w}^2\braket{\d\mat{X}|\d\mat{G}}}\mat{w}^2
      \left((1-f)\bra{\d\mat{X}} 
       + f\braket{\d\mat{X}|\d\uvect{G}}\bra{\d\uvect{G}}\right)

The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the
domain is $\Sp(\bra{\beta}, \bra{\d\mat{X}})$.

Method 2b)
++++++++++

Both expressions are the same for method 2b) since $\ket{H} = \ket{\d\mat{G}}$
and the final expression for $\mat{\Gamma} = \ket{\d\uvect{G}_\perp}$ is the
following, which reduces to Broyden's original "Bad" method in the limit of $w_0
\rightarrow 0$ and keeping exactly one previous step.

.. math::
    \mat{B} = \mat{1} + 
     f(\mat{B} - \mat{1})\left(\mat{1} 
      - \ket{\d\uvect{G}}\bra{\d\uvect{G}}\right)
    + (\ket{\d\mat{X}} - \ket{\d\mat{G}})
      \frac{1}{\mat{1}w_0^2 + 
               \mat{w}^2\braket{\d\mat{G}|\d\mat{G}}}\mat{w}^2\bra{\d\mat{G}}.

The range is $\Sp(\ket{\alpha}, \ket{\d\mat{X} - \d\mat{G}})$ and the domain is
$\Sp(\bra{\beta}, \bra{\d\mat{G}})$.

.. note:: These reduce to the following in the limit $f \rightarrow 0$, where
   one sets $w_r\rightarrow 0$.  This limit is a "stateless" version in which
   the Jacobian does not depend on the previous Jacobian (however, in
   preliminary investigations, this has not performed well):

   .. math::
      \mat{B}_{\text{1a)}} &= \mat{1} + 
        (\ket{\d\mat{X}} - \ket{\d\mat{G}})\cdot
        \frac{1}{w_0^2\mat{1} + \mat{w}^2\cdot\braket{\d\mat{X}|\d\mat{G}}}
        \cdot\mat{w}^2\cdot\bra{\d\mat{X}}\\
      \mat{B}_{\text{2b)}} &= \mat{1} +
        (\ket{\d\mat{X}} - \ket{\d\mat{G}})\cdot
        \frac{1}{w_0^2\mat{1} + \mat{w}^2\cdot\braket{\d\mat{G}|\d\mat{G}}}
        \cdot\mat{w}^2\cdot\bra{\d\mat{G}}

   >>> np.random.seed(0)
   >>> w02 = 0.1
   >>> N = 10
   >>> m = 5
   >>> I = np.eye(N)
   >>> dG = np.mat(np.random.rand(N,m))
   >>> dX = np.mat(np.random.rand(N,m))
   >>> B = np.mat(np.random.rand(N,N))
   >>> R = dX - B*dG
   >>> w2 = np.mat(np.diag(np.random.rand(m)**2))
   >>> B1a = B + (R*w2*dX.T + (I - B)*w02)*np.linalg.inv(
   ...     B*dG*w2*dX.T + B*w02)*B
   >>> B2b = B + (R*w2*dG.T + (I - B)*w02)*np.linalg.inv(
   ...     dG*w2*dG.T + I*w02)
   >>> B1a_ = I + (dX - dG)*np.linalg.inv(
   ...     np.eye(m)*w02 + w2*dX.T*dG)*w2*dX.T
   >>> B2b_ = I + (dX - dG)*np.linalg.inv(
   ...     np.eye(m)*w02 + w2*dG.T*dG)*w2*dG.T
   >>> np.allclose(B1a, B1a_), np.allclose(B2b, B2b_)
   (True, True)
   

We begin by representing the Jacobian and inverse in a limited
subspace as follows:

.. math::
   \mat{J} &= \mat{1} + \ket{\uvect{\alpha}}\mat{j}\bra{\uvect{\beta}},\\
   \mat{B} &= \mat{1} + \ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}}.

The inversion relation gives

.. math::
   \mat{b} &= \frac{-1}{\mat{j}^{-1} +
                       \braket{\uvect{\beta}|\uvect{\alpha}}},\\
   \mat{j} &= \frac{-1}{\mat{b}^{-1} +
                       \braket{\uvect{\beta}|\uvect{\alpha}}}.
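
A quick numerical check of this inversion relation with randomly chosen
small matrices (illustrative sketch)::

   import numpy as np
   from numpy.linalg import inv, qr

   np.random.seed(6)
   N, n = 10, 4
   alpha = qr(np.random.rand(N, n))[0][:, :n]  # orthonormal |alpha>
   beta = qr(np.random.rand(N, n))[0][:, :n]   # orthonormal |beta>
   j = np.random.rand(n, n)

   J = np.eye(N) + alpha.dot(j).dot(beta.T)
   b = -inv(inv(j) + beta.T.dot(alpha))
   B = np.eye(N) + alpha.dot(b).dot(beta.T)

   assert np.allclose(J.dot(B), np.eye(N))     # B is indeed the inverse of J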


We drop the use of the $\mat{\Gamma}$ matrix, instead relying on the
projectors $\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$ and
minimizing the following

.. math::
   &\min_{\mat{J}}\left(\sum_{j}
      \Norm{(\ket{\d{G}_i} - \mat{J}\ket{\d{X}_i})\mat{w}_{ij}}_{2}^{2}
      + w_0^{2} \norm{\mat{J} - \mat{J}_{0}}_{F}^{2}
      + w_r^{2} \norm{\mat{J} - \mat{1}}_{F}^{2}
    \right),\\
   &\min_{\mat{B}}\left(\sum_{j}
      \Norm{(\ket{\d{X}_{i}} - \mat{B}\ket{\d{G}_{i}})\mat{w}_{ij}}_{2}^{2}
      + w_0^{2} \norm{\mat{B} - \mat{B}_{0}}_{F}^{2}
      + w_r^{2} \norm{\mat{B} - \mat{1}}_{F}^{2}
    \right).

Here we have three weights.  The matrix $\mat{w}$ specifies the priority
of the secant conditions: this may be a full matrix, allowing for more
complicated preference schemes involving all $n(n-1)/2$ differences rather
than just the $n-1$ basis vectors.

.. note:: It is assumed that $\mat{w}$ is symmetric in the sense that
   $\mat{w}^2$ is used everywhere.  The input need not be symmetric
   and $\mat{w}^{2} \equiv \mat{w}\mat{w}^T$ will be used throughout.

.. note:: In order for this to work well, the Jacobian of the objective function
   must be well represented by the identity $\mat{1}$ plus a correction of
   low rank.  If, for a given problem, the Jacobian has the form 
   $\mat{J} \approx \mat{J}_0 + \ket{}\bra{}$ then one should rescale the
   function $\ket{\mat{G}} \rightarrow \mat{J}_0^{-1} \ket{\mat{G}}$ so that the
   leading term is the identity.  It is assumed here that this rescaling has
   been performed.

   This is related to the notion of "linear mixing" as an iterative scheme:

   .. math::
      x \rightarrow (1-\alpha)x + \alpha F(x).

   If this is known to converge, then one can probably assume that $\mat{B}
   \approx \alpha\mat{1} + \ket{}\bra{}$ and one should rescale the objective
   $G(x) \rightarrow \alpha G(x)$.  If this has been done consistently, then
   pulling the Jacobian back toward the identity with a non-zero $w_r$ will pull
   the Newton iteration back toward the linear mixing which should restore
   convergence.

These two minimizations correspond to methods 1a) and 2b) respectively, and give:

.. math::
   \mat{j}_{\text{1a)}} &=
     \left\{
       \braket{\uvect{\alpha}|\d\mat{G}-\d\mat{X}}\mat{w}^2
       \braket{\d\mat{X}|\uvect{\beta}}
       +
       w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}}
              \mat{j}_{0}
              \braket{\uvect{\beta}_{0}|\uvect{\beta}}
     \right\}
   \cdot
   \frac{1}{\braket{\uvect{\beta}|\d\mat{X}}\mat{w}^2
            \braket{\d\mat{X}|\uvect{\beta}} + w_{0}^2 + w_{r}^2},\\
   \mat{b}_{\text{2b)}} &= 
     \left\{
       \braket{\uvect{\alpha}|\d\mat{X}-\d\mat{G}}\mat{w}^2
       \braket{\d\mat{G}|\uvect{\beta}}
       +
       w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}}
              \mat{b}_{0}
              \braket{\uvect{\beta}_{0}|\uvect{\beta}}       
     \right\}
   \cdot
   \frac{1}{\braket{\uvect{\beta}|\d\mat{G}}\mat{w}^2
            \braket{\d\mat{G}|\uvect{\beta}} + w_{0}^2 + w_{r}^2}.

From these formulae, it is clear that no information will be lost if we
have the following:

  1a)
    .. math::
       \ket{\uvect{\alpha}} &=
         \Sp(\ket{\uvect{\alpha}_{0}}, \ket{\d\mat{X}-\d\mat{G}}),\\
       \ket{\uvect{\beta}} &=
         \Sp(\ket{\uvect{\beta}_{0}}, \ket{\d\mat{X}}).
  2b)
    .. math::
       \ket{\uvect{\alpha}} &=
         \Sp(\ket{\uvect{\alpha}_{0}}, \ket{\d\mat{X}-\d\mat{G}}),\\
       \ket{\uvect{\beta}} &=
         \Sp(\ket{\uvect{\beta}_{0}}, \ket{\d\mat{G}}).

The purpose of the $w_{0}$ term here is to minimize the change of the
approximation from the previous result.  This is important for the
representation of the Jacobian in the subspace spanned by
$\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$ that lies outside of
the space of secant conditions.  The parameter $w_{r}$ is new here and
"pulls" the solution back toward the identity $\mat{1}$, which ideally
describes a convergent (but slow) method.  When large, this simulates a
"restart" -- resetting the Jacobian to the identity -- and thus should
be used when convergence is poor (or there is a divergence).

The numerical issue here is dealing with poorly behaved matrices.  To
deal with this, we introduce a parameter `singular_tol` with the goal
that all singular values of the dyadic $\mat{b}$ of the final inverse
Jacobian lie between `singular_tol` and `1/singular_tol`.  To do this,
all matrix multiplications of the smaller matrices are done using the
SVD, with singular values outside the range of `singular_tol**2`
truncated in intermediate calculations and a final truncation within
the required range.
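
The following sketch shows the kind of clipping intended here;
`clip_singular_values` is a hypothetical helper used only for illustration
(the actual implementation applies the truncation to the small matrices as
described above)::

   import numpy as np

   def clip_singular_values(m, singular_tol=1e-6):
       # Keep the singular values of a small square matrix m within
       # [singular_tol, 1/singular_tol] (hypothetical helper).
       U, s, Vt = np.linalg.svd(m)
       s = np.clip(s, singular_tol, 1.0/singular_tol)
       return U.dot(np.diag(s)).dot(Vt)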



Memory-limited Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To limit the amount of memory required, we store the following:

$\ket{\mat{X}}$, $\ket{\mat{G}}$ :
   $N \times n$ matrices: the columns of which are the previous steps
   and function evaluations.  (The oldest steps may be replaced and
   indices used to keep track of the ordering to minimize the
   requirement for working with this array.)
$\ket{\uvect{A}}$, $\ket{\uvect{B}}$ :
   $N \times m - (n-1)$ matrices representing the extra basis vectors
   required to span the full space of $\ket{\uvect{\alpha}}$ and
   $\ket{\uvect{\beta}}$.

In order to prevent working with the large matrices, we form the
following matrices:

.. math::
   \begin{pmatrix}
     \braket{\mat{X}-\mat{G}|\mat{X}-\mat{G}} 
     & \braket{\uvect{A}|\mat{X}-\mat{G}}\\
     \braket{\mat{X}-\mat{G}|\uvect{A}} 
     & \braket{\uvect{A}|\uvect{A}}
   \end{pmatrix},
   \qquad
   \begin{pmatrix}
     \braket{\mat{X}|\mat{X}} 
     & \braket{\uvect{B}|\mat{X}}\\
     \braket{\mat{X}|\uvect{B}} 
     & \braket{\uvect{B}|\uvect{B}}
   \end{pmatrix},
   \qquad
   \begin{pmatrix}
     \braket{\mat{G}|\mat{G}} 
     & \braket{\uvect{B}|\mat{G}}\\
     \braket{\mat{G}|\uvect{B}} 
     & \braket{\uvect{B}|\uvect{B}}
   \end{pmatrix}.

.. note:: This is predicated on the following analysis for extracting
   the span of a set of vectors.  Consider the SVD of 
   a set of vectors:

   .. math::
      \ket{\mat{X}} = 
      \begin{pmatrix}
        \ket{\uvect{U}} & \ket{\uvect{U}_{\perp}}
      \end{pmatrix}
      \begin{pmatrix}
      \mat{\sigma}\\
        & \mat{0}
      \end{pmatrix}
      \begin{pmatrix}
        \mat{V}^{\dagger}\\
        \mat{V}_{\perp}^{\dagger}
      \end{pmatrix}
    
   The desired basis $\ket{\uvect{U}} = \Sp(\ket{\mat{X}})$ may be
   expressed in terms of the original vectors as

   .. math::
      \ket{\uvect{U}} = \ket{\mat{X}}\mat{V}\mat{\sigma}^{-1}.

   Hence, we may determine this from the original matrix by
   considering the eigenvalue decomposition of the small matrix

   .. math::
      \braket{\mat{X}|\mat{X}} = 
      \begin{pmatrix}
        \mat{V} & \mat{V}_{\perp}
      \end{pmatrix}
      \begin{pmatrix}
        \mat{\sigma}^2\\
          & \mat{0}
      \end{pmatrix}
      \begin{pmatrix}
        \mat{V}^{\dagger}\\
        \mat{V}_{\perp}^{\dagger}
      \end{pmatrix}.

   A rank revealing Cholesky decomposition of $\braket{\mat{X}|\mat{X}} =
   \mat{r}^{T}\mat{r}$ may also be similarly used.
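
   This construction is easy to check numerically (a small sketch)::

      import numpy as np

      np.random.seed(7)
      N, n = 12, 4
      X = np.random.rand(N, n)
      s2, V = np.linalg.eigh(X.T.dot(X))     # small n x n eigenproblem
      U = X.dot(V)/np.sqrt(s2)               # |U> = |X> V sigma^{-1}

      assert np.allclose(U.T.dot(U), np.eye(n))    # orthonormal
      assert np.allclose(U.dot(U.T).dot(X), X)     # spans |X>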

   A corollary is that, given an orthonormal basis
   $\ket{\uvect{\alpha}_{0}}$ and a new set of vectors $\ket{\beta}$,
   we may construct the augmented basis by diagonalizing the small
   matrix

   .. math::
      \braket{\beta|\beta} - 
      \braket{\beta|\uvect{\alpha}_{0}}
      \braket{\uvect{\alpha}_{0}|\beta} 
      &= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\
      \begin{pmatrix}
        \ket{\uvect{\alpha}_{0}} &
        \ket{\uvect{\alpha}_{1}}
      \end{pmatrix} &=
      \begin{pmatrix}
        \ket{\uvect{\alpha}_{0}} &
        \ket{\beta}
      \end{pmatrix}
      \cdot
      \begin{pmatrix}
        \mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v}\mat{\sigma}^{-1}\\
        \mat{0} & \mat{v}\mat{\sigma}^{-1}
      \end{pmatrix}.

   >>> np.random.seed(0)
   >>> alpha_0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5]
   >>> beta = np.random.rand(10, 2)
   >>> alpha_0_beta = np.dot(alpha_0.T, beta)
   >>> s2, v = np.linalg.eigh(np.dot(beta.T, beta) 
   ...                        - np.dot(alpha_0_beta.T, alpha_0_beta))
   >>> s = np.sqrt(s2)
   >>> alpha_1 = np.dot(beta - np.dot(alpha_0, alpha_0_beta), v/s[None,:])
   
   $\ket{\uvect{\alpha}_{0}}$ and $\ket{\uvect{\alpha}_{1}}$ are
   orthogonal and normalized:

   >>> np.allclose(np.dot(alpha_1.T, alpha_0), 0)
   True
   >>> np.allclose(np.dot(alpha_1.T, alpha_1), np.eye(2))
   True

For large systems, we cannot store the full matrices.  If the number
of iterations is small, then we can form all of the matrices in terms
of dyads (see :class:`DyadicSum`), however, if the number of
iterations becomes too large, then we must truncate the spaces.  Let
us consider the following information, with $N$ being the (large) size
of the space and $n$ the number of previous steps maintained.

$\ket{x}$, $\ket{x_0}$, $\ket{g}$, $\ket{g_0}$ : 
   1-d arrays of length `N` of the current and previous steps along
   with the value of the objective function at these points.
$\ket{\uvect{X}}$, $\ket{\uvect{G}}$ :
   $N \times (n-1)$ arrays of orthonormal basis vectors spanning the
   space of difference vectors $\ket{x_i} - \ket{x_j}$ and $\ket{g_i}
   - \ket{g_j}$ respectively.
$\ket{\uvect{B}_x}$, $\ket{\uvect{B}_g}$ :
   $N \times m$ arrays of orthonormal basis vectors for representing
   the inverse Jacobian.

Given these, we can express everything in terms of various $n \times
n$ matrices in these bases.  Thus, the inverse Jacobian is expressed
as:

.. math::
   \mat{B} = \mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}.

One might like to include an optional preconditioner $\mat{B}
\rightarrow \mat{B}\op{B}_0$.  For method 1a) the preconditioner
factors to the right: to implement this, the simplest approach is to
simply precondition the initial function $G(x) \rightarrow
\op{B}_{0}G(x)$.  For method 2b) a simple factorization does
not work quite as well.  [Johnson:1988]_ also suggests allowing for a
factor $\alpha$ to reduce the weight of the initial identity
approximation:

.. math::
   \mat{B} = \alpha\mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}.

We now express everything in terms of the basis vectors
$\ket{\uvect{X}}$ and $\ket{\uvect{G}}$.  Thus, we define the
following matrices:


.. math::
   \ket{\d\mat{X}} &= \ket{\uvect{X}}\mat{d}_x, \\
   \ket{\d\mat{G}} &= \ket{\uvect{G}}\mat{d}_g.

This allows us to express the equations as:

.. math::
   \mat{B}\ket{\uvect{G}} = \ket{\uvect{G}}\alpha
    + \ket{\uvect{X}}\mat{b}

.. math::
   \mat{B}_{\text{1a)}} &\rightarrow
     \mat{1} + \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}
     +
     \left(\ket{\uvect{X}}\mat{d}_x 
           - \mat{B}_{0}\ket{\uvect{G}}\mat{d}_g\right)
     \cdot\mat{w}^2\cdot\mat{d}_{x}^{T}\bra{\uvect{X}}\cdot\frac{1}{
       \mat{B}_{0}\ket{\uvect{G}}
       \mat{d}_g\mat{w}^2\mat{d}_x^{T}\bra{\uvect{X}}
       + w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}\cdot
       (\mat{1} +  \ket{\uvect{X}}\mat{b}\bra{\uvect{G}}), \\
   \mat{B}_{\text{2b)}} &= \mat{B}_{0}
     + (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
     \bra{\d\mat{G}}\cdot\frac{1}{
       \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
       + w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}


For extremely large systems, we wish to represent the inverse Jacobian
as a dyadic sum:

.. math::
   \mat{B} = (\mat{1} + \mat{U}\mat{\sigma}\mat{V}^{T})\mat{B}_{0}

where $\mat{U}$ and $\mat{V}$ are $N \times k$, where $N$ is the size
of the problem (large) and $k$ is the number of previous steps we
keep (small).  The main difficulty with these formulae is inverting the
large matrix.  The "Good" method makes this even more difficult
because the denominator is not symmetric while the "Bad" method --
apart from possibly suffering some of the numerical instabilities of
the "Bad" Broyden method -- does not easily allow the form of the
dyadic inverse Jacobian to be preserved (if $\mat{B}_{0}$ is
non-trivial).

.. _sec:truncation-algorithm:

Truncation Algorithm
--------------------
In the memory-limited implementation, one must choose how to discard
the previous steps.  This choice must be made in two places: the set of
previous steps $\ket{x}$ and $\ket{g}$ to be retained, and the
truncation of the current inverse Jacobian approximation $\mat{B}$.



In principle, given $n < N$ previous steps,
it is likely that all of the steps are linearly independent, and
so the minimization condition is irrelevant -- one simply solves the
$n$ secant equations in the subspace spanned by the difference
vectors.  (The subspace decomposition method has this property --
Johnson's method does not quite have this property when $0 < w_0$.)

Once one needs to limit 


.. [Broyden:1965] C. G. Broyden, `A Class of Methods for Solving
   Nonlinear Simultaneous Equations
   <http://www.jstor.org/stable/2003941>`_, Math. Comput. 19(92),
   577-593 (1965)

.. [Johnson:1988] D. D. Johnson, `Modified Broyden's method for
   accelerating convergence in self-consistent calculations
   <http://dx.doi.org/10.1103/PhysRevB.38.12807>`_, Phys. Rev. B
   38(18), 12807--12813 (1988)

.. [Bierlaire:2006] M. Bierlaire, F. Crittin, and M. Themans, `A
   multi-iterate method to solve systems of nonlinear equations
   <http://dx.doi.org/10.1016/j.ejor.2006.09.080>`_, European Journal
   of Operational Research 183(1), 20 -- 41 (2006)

Parameters
==========
A modification of the Newton step can be used to allow manual control
of some (small) subset of the parameters.  As an example (and in our
notation) we consider a set of chemical potentials :math:`\mu` and
the respective densities :math:`N`.  Allowing the chemical potentials
to change too soon, before the other parameters have relaxed, can throw
off an iteration, leading to wild oscillations.  Instead, one might
want to wait until a degree of convergence for fixed chemical
potentials has been achieved, and then let them vary.

We partition the problem and Jacobian as

.. math::
   \begin{pmatrix}
      \d{G}\\
      \d{N}
   \end{pmatrix} = 
   \mat{J}\cdot
   \begin{pmatrix}
     \d{x}\\
     \d{\mu}
   \end{pmatrix}

The full Broyden step would be to find :math:`\d{x}` and :math:`\d{\mu}`
from :math:`\d{G}` and :math:`\d{N}` using :math:`\mat{J}^{-1}`.
Instead, we need to find :math:`\d{x}` (and :math:`\d{N}`) from
:math:`\d{G}` and the specified "control" parameters :math:`\d{\mu}`.

To do this, we partition the inverse Jacobian as

.. math::
   \mat{J}^{-1} = \begin{pmatrix}
     \cdots & \cdots\\
     \cdots & \mat{d}
   \end{pmatrix}

One can solve the problem explicitly using the full partitioning, but in
practice the matrices are large and one can really only form the small
block :math:`\mat{d}` corresponding to the small number of control
parameters.  The following formulae allow one to compute the required
components using only this small matrix and the inverse Jacobian
(often stored in a compact dyadic representation):

.. math::
   \begin{pmatrix}
     \d{G}\\
     \d{N}
   \end{pmatrix} &= 
   \left(
     \mat{1} - \begin{pmatrix}
       \mat{0} & \mat{0}\\
       \mat{0} & \mat{d}^{-1}
     \end{pmatrix}
     \left[
       \mat{J}^{-1} - \mat{1}\right]
   \right)
   \cdot
   \begin{pmatrix}
     \d{G}\\
     \d{\mu}
   \end{pmatrix}, \\
   \begin{pmatrix}
     \d{x}\\
     \d{\mu}
   \end{pmatrix} &= 
   \mat{J}^{-1}
   \cdot
   \begin{pmatrix}
     \d{G}\\
     \d{N}
   \end{pmatrix}.

For calculational purposes we form 

.. math::
   \begin{pmatrix}
     a\\
     b
   \end{pmatrix} &= 
   \mat{J}^{-1}
   \cdot
   \begin{pmatrix}
     \d{G}\\
     \d{\mu}
   \end{pmatrix}, \\
   \d{N} &= \d{\mu} - \mat{d}^{-1}(b - \d{\mu}), \\
   \begin{pmatrix}
     \d{x}\\
     \d{\mu}
   \end{pmatrix} &= 
   \mat{J}^{-1}
   \cdot
   \begin{pmatrix}
     \d{G}\\
     \d{N}
   \end{pmatrix}.
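
A small numerical sketch of this two-solve procedure, using a random
stand-in for the inverse Jacobian (illustrative only)::

   import numpy as np

   np.random.seed(8)
   Nx, Nmu = 6, 2                     # "local" parameters x and controls mu
   Jinv = np.random.rand(Nx + Nmu, Nx + Nmu) + (Nx + Nmu)*np.eye(Nx + Nmu)
   d = Jinv[Nx:, Nx:]                 # small control-control block
   dG = np.random.rand(Nx)
   dmu = np.random.rand(Nmu)

   ab = Jinv.dot(np.concatenate([dG, dmu]))
   b = ab[Nx:]
   dN = dmu - np.linalg.solve(d, b - dmu)
   step = Jinv.dot(np.concatenate([dG, dN]))

   assert np.allclose(step[Nx:], dmu)    # the control part reproduces d(mu)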
   
.. todo:: Use :func:`scipy.linalg.fblas.dgemm` to allow matrix
   multiplications etc. to be done in place.  In general, minimize
   memory footprint and make everything work for very large matrices
   (maybe with an option if it is slower).

Sensitive Parameters
~~~~~~~~~~~~~~~~~~~~
In some situations, certain parameters may be considered "sensitive".
Typically, these are collective parameters.  For example, in problems
representing functions over a grid, one may have a couple of
parameters (such as chemical potentials) where the dependent quantity
to be set to zero is an integral of other quantities.  Premature
optimization of these parameters can cause problems.

A typical case is one where the other "local" parameters must first
achieve a certain degree of convergence before the dependent quantity
can be accurately estimated.  If one attempts to adjust the sensitive
parameters before this convergence has been achieved, then they
will fluctuate wildly.

What one needs to do is to hold these sensitive parameters fixed for a
while, and then adjust them once the local parameters have settled
down.  In order to ensure that these "noisy" parameters do not mess up
the Jacobian, one must call :meth:`Broyden.update` or
:meth:`Broyden.update_J` with an appropriate argument `skip_inds` that
identifies the noisy parameters.  One may use this in conjunction with
the previous techniques to provide steps that are consistent with
holding these sensitive parameters fixed until the "noise" is
sufficiently reduced that updates can be resumed.

Characterizing this "noise" can be tricky: one might for example look
for the change in error from a previous iteration.  However, if a very
small step size has been taken (for example as a result of trying to
improve the Jacobian or because of a line-search), then this could
result in an incorrect assessment about the degree of convergence.

The simplest criterion is for the case of models where holding the
sensitive parameters fixed (and ignoring the error in this component)
will lead to a convergent iteration.  Here the natural test for "noise"
is to check for a weak level of convergence of the other parameters
(maybe in conjunction with the change in the parameter error from
iteration to iteration).  This is equivalent to an efficient
method of solving the following recursive problem: Define an outer
problem with only the sensitive parameters.  The objective function
solves the inner problem with these parameters fixed, then the outer
problem solves for the sensitive parameters.

.. _sec-notes:

Linear Algebra
==============
Here we summarize some of the linear algebra required here.

Subspace Reduction
~~~~~~~~~~~~~~~~~~
One of the problems we shall encounter is that we have a matrix
expressed in a basis as

.. math::
   \begin{pmatrix} \ket{a_0} & \ket{a_1} \end{pmatrix}
   \begin{pmatrix} 
     \mat{c}_{11} & \mat{c}_{12}\\
     \mat{c}_{21} & \mat{c}_{22}
   \end{pmatrix}
   \begin{pmatrix} \bra{b_0} \\ \bra{b_1} \end{pmatrix}

and we need to reduce the rank of the subspaces $\ket{a_1}$ and
$\bra{b_1}$ while holding subspaces $\ket{a_0}$ and $\bra{b_0}$
fixed.  I am not yet sure what the optimal solution to this problem
is, but a very good solution can be obtained by performing two SVD's:

.. math::
   \begin{pmatrix}
     \mat{c}_{21} & \mat{c}_{22}
   \end{pmatrix}
   &= 
   \mat{q}_{a}\mat{\sigma}_{a}\mat{V}^{T}_{a},\\
   \begin{pmatrix}
     \mat{c}_{12} \\ \mat{c}_{22}
   \end{pmatrix}
   &= 
   \mat{V}_{b}
   \begin{pmatrix} 
     \mat{\sigma}_{b}\\
     \mat{0}
   \end{pmatrix}
   \mat{q}_{b}^{T}.

The truncation scheme is then to use the two transformations obtained
to determine the rearrangement.  This typically approaches the optimal
truncation to within a couple of percent (though a tuned matrix could
probably break this).

To demonstrate this, we project out one state from a 4 by 4 matrix.
We compute the optimal projection using a non-linear solver:

>>> np.random.seed(0)
>>> def my_projection(A):
...     A = np.asmatrix(A)
...     P = np.asmatrix(np.diag([1.0, 1.0, 1.0, 0.0]))
...     qa, _, _ = np.linalg.svd(A[2:,:])
...     _, _, qbt = np.linalg.svd(A[:,2:])
...     Qa = np.eye(4)
...     Qbt = np.eye(4)     # separate arrays (avoid aliasing Qa and Qbt)
...     Qa[2:,2:] = qa
...     Qbt[2:, 2:] = qbt
...     return Qa*P*Qa.T*A*Qbt.T*P*Qbt
>>> solver = sp.optimize.fmin_powell
>>> def optimal_projection(A):
...     P = np.asmatrix(np.diag([1.0, 1.0, 1.0, 0.0]))
...     def trans(A, x):
...         t1, t2 = x
...         qa = np.mat([[np.cos(t1), -np.sin(t1)],
...                      [np.sin(t1), np.cos(t1)]])
...         qbt = np.mat([[np.cos(t2), -np.sin(t2)],
...                       [np.sin(t2), np.cos(t2)]])
...         Qa = np.eye(4)
...         Qbt = np.eye(4)     # separate arrays (avoid aliasing)
...         Qa[2:,2:] = qa
...         Qbt[2:,2:] = qbt
...         return Qa.T*P*Qa*A*Qbt.T*P*Qbt
...     def f(x):
...         return np.linalg.norm(trans(A,x) - A, 2)
...     x = solver(f, [0, 0], disp=False)
...     return trans(A, x)
>>> def rel_err(A):
...     e1 = np.linalg.norm(my_projection(A) - A, 2)
...     e0 = np.linalg.norm(optimal_projection(A) - A, 2)
...     return(abs(e0-e1)/abs(e0))
>>> errs = []
>>> for n in xrange(10):
...     #print("%i of 10" % n)
...     errs.append(rel_err(np.random.rand(4,4) - np.random.rand(4,4)))
>>> np.mean(errs), np.std(errs), np.max(errs) < 0.1 # doctest: +SKIP
(0.01..., 0.02..., True)





Alternatively, we note that we can perform a block diagonalization
of the matrix using the SVD of the lower-right block $\mat{c}_{22} =
\mat{u}\mat{\sigma}\mat{v}^{T}$ and truncating the basis based on
this.  This follows from the following decomposition:

.. math::
   \begin{pmatrix} 
     \ket{a_0} &
     \ket{a_0}\mat{c}_{12}\mat{c}_{22}^{-1} + \ket{a_1}
   \end{pmatrix}
   \begin{pmatrix} 
     \mat{c}_{11} - \mat{c}_{12}\mat{c}_{22}^{-1}\mat{c}_{21} \\
     & \mat{c}_{22}
   \end{pmatrix}
   \begin{pmatrix}
     \bra{b_0} \\ 
     \mat{c}_{22}^{-1}\mat{c}_{21}\bra{b_0} + \bra{b_1}
   \end{pmatrix}
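
Written out as plain matrices (taking the bases to be columns of the
identity), this factorization is easy to verify numerically (sketch)::

   import numpy as np

   np.random.seed(9)
   n0, n1 = 3, 2
   c = np.random.rand(n0 + n1, n0 + n1)
   c11, c12 = c[:n0, :n0], c[:n0, n0:]
   c21, c22 = c[n0:, :n0], c[n0:, n0:]
   c22inv = np.linalg.inv(c22)

   left = np.bmat([[np.eye(n0), c12.dot(c22inv)],
                   [np.zeros((n1, n0)), np.eye(n1)]])
   mid = np.bmat([[c11 - c12.dot(c22inv).dot(c21), np.zeros((n0, n1))],
                  [np.zeros((n1, n0)), c22]])
   right = np.bmat([[np.eye(n0), np.zeros((n0, n1))],
                    [c22inv.dot(c21), np.eye(n1)]])

   assert np.allclose(left.dot(mid).dot(right), c)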

Notes
=====
Here we summarize some problems, convergence characteristics etc.

1) In one case we got stuck in a loop adding a point that was already
   included. What happened is that we were using
   :attr:`JinvBroyden.x_method` `='min'` which was choosing a
   particular point.  A step was made from this point producing a new
   point which was added, but the new point did not have a lower
   $\norm{G(x)}$ so the same point was used over and over again with
   no significant update to the Jacobian.

   A similar problem can occur with :attr:`JinvBroyden.replace_method`
   `='largest'`.  The resolution consists of one of the following:

   a) Using :attr:`JinvBroyden.x_method` `= 'recent'` and
      :attr:`JinvBroyden.replace_method` `='oldest'` to ensure
      that all the points are cycled through and that the newest point
      is used.  These are now the defaults.
   b) Using a line-search to ensure that a downhill step is chosen to
      ensure that the newest points have the smallest error.

To Do
=====
.. todo:: Check why things work better with limited memory options.
   I.e. keeping a large :attr:`JinvBroyden.n_prev` can make things worse.

"""
from __future__ import division

__all__ = ['DyadicSum', 'JinvBroyden', 'ExtendedBroyden', 
           '_JinvExtendedBroyden', 'JinvExtendedBroydenFull']

import warnings
import copy
import math

import numpy as np

# We import these here in case we want to change in future to use more
# efficient implementations.
from numpy import dot
from numpy.linalg import eigh

import scipy as sp

from zope.interface import implements

import mmf.objects
from mmf.objects import StateVars, process_vars
from mmf.objects import Required, Computed, Delegate

import mmf.math.linalg as mmf_linalg

from mmf.solve.solver_interface import IJinv


_FINFO = np.finfo(float)
_EPS = _FINFO.eps
_TINY = _FINFO.tiny
_SINGULAR_TOL = 1e-6

[docs]class DyadicSum(StateVars): r"""Represents a matrix as a sum of $n$ dyads of length $N$: .. math:: \mat{M} = \mat{1} + \sum_{n}\ket{a_n}\sigma_n\bra{b_n} = \mat{1} + \ket{\mat{A}}\mat{\sigma}\bra{\mat{B}}. The two sets of bases are stored in $N \times n$ matrices $\ket{\mat{A}}$ and $\ket{\mat{B}}$ and the $n\times n$ matrix $\mat{\sigma}$. If :attr:`use_svd` is `True`, then the singular value decomposition of $\mat{\sigma}$ is used to keep $\mat{\sigma}$ diagonal with positive entries and only these diagonals are stored. .. todo:: Perform an analysis of the inverse jacobian, changing bases if needed to make the dyads linearly independent. .. note:: We assume everything is real here: no complex conjugation is performed with transposition. Examples -------- >>> np.random.seed(3) >>> N, n, m = 20, 5, 3 >>> At = np.random.random((N, n+m)) >>> B = np.random.random((n+m, N)) >>> at = At[:, :n] >>> b = B[:n, :] >>> sigma = np.eye(n) >>> s = DyadicSum(at=at, b=b, sigma=sigma, n_max=n) >>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(at, b)) True Now we add the remaining terms. The dyadic sum should be the best rank n approximation of `At*B`: >>> s.add_dyad(at=At[:, n:], b=B[n:, :], sigma=np.eye(m)) >>> U, d, Vt = np.linalg.svd(np.dot(At, B)) >>> d[n:] = 0 # Make rank deficient approx >>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(U*d, Vt)) True Note that the approximation is not exact: >>> np.allclose(np.dot(s.at*s.sigma, s.b), np.dot(At, B)) False Examples -------- >>> J = DyadicSum() >>> x = np.mat([[1.0], ... [2.0]]) >>> J*x - x matrix([[ 0.], [ 0.]]) >>> b = np.mat([[1.0, 0.0]]) >>> a = np.mat([[0.0], ... [2.0]]) >>> J.add_dyad(a, b) >>> J*x matrix([[ 1.], [ 4.]]) >>> x.T*J matrix([[ 5., 2.]]) >>> import mmf.archive >>> A = mmf.archive.Archive() >>> A.insert(J=J) 'J' >>> s = str(A) >>> d = {} >>> exec(s, d) >>> np.allclose(J.asarray(), d['J'].asarray()) True Here is an example of limiting the size of the dyadic. >>> np.random.seed(3) >>> s1 = np.eye(10) >>> s2 = DyadicSum(n_max=10) >>> for n in xrange(100): ... a = np.random.random(10) ... b = np.random.random(10) ... _add_dyad(s1, a, b) ... s2.add_dyad(a, b) This should still be exact because the space is only ten dimensional. >>> np.allclose(s1, s2.asarray()) True The dynamic_range parameter is a more mathematical way of keeping the basis fixed. It ensures that the most significant singular values will be kept. >>> s1 = DyadicSum(dynamic_range=0.01) >>> s2 = DyadicSum() >>> for n in xrange(100): ... a = np.random.random(20) ... b = np.random.random(20) ... s1.add_dyad(a, b) ... s2.add_dyad(a, b) >>> d1 = np.linalg.svd(s1.asarray(), compute_uv=False) >>> d2 = np.linalg.svd(s2.asarray(), compute_uv=False) >>> (abs(d1 - d2)/max(d2)).max() < 0.02 True """ _state_vars = [ ('at', np.empty((0, 0), dtype=float), r"""`(N, n`) array $\ket{\mat{A}}$. We use the ket form here because that is most often used and more efficient (presently, :func:`numpy.dot` makes copies if the arrays are not both `C_CONTIGUOUS`."""), ('b', np.empty((0, 0), dtype=float), r"""`(n, N`) array $\bra{\mat{B}}$."""), ('sigma', np.empty(0, dtype=float), r"""`(n, n)` array $\mat{\sigma}$ or `(n, )` array $\diag\mat{\sigma}$ if :attr:`use_svd`."""), ('n_max', np.inf, r"""Maximum number $n$ of Dyads to store. 
If this is finite, then the $N \times n$ matrices are pre-allocated, and when $m$ new dyads are added, they replace the least significant $m$ dyads, requiring no more allocations; otherwise, the dyads are added as required."""), ('inplace', False, r"""If :attr:`n_max` is finite, then this signifies that the matrices :attr:`at` and :attr:`b` should be allocated only once and all operations should be done inplace on these. In particular, when new dyads are added, the least significant components of the old matrix are dropped *first* and then the dyads are added. If this is `False`, then the new dyads are added first and *then* the least-significant components are dropped. This latter form requires more storage, but may be more accurate. .. note:: Presently, the inplace operations cannot be performed because `scipy` does not expose the correct BLAS routines, so the only purpose of this flag is to somewhat limit memory usage, but this is probably quite ineffective at the moment."""), ('use_svd', True, r"""If `True`, then :meth:`orthogonalize` will be run whenever a new dyad is added. (Generally a good idea, but can be slow if there are a lot of dyads)."""), ('dynamic_range', _EPS, r"""Each call to :meth:`orthogonalize` will only keep the singular values within this fraction of the highest."""), ] process_vars()
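As a rough, self-contained illustration of this storage layout (plain numpy, not this class), the matrix represented by the stored factors is the identity plus a rank-$n$ product::

    import numpy as np

    np.random.seed(0)
    N, n = 6, 2                        # vector length and number of dyads
    at = np.random.rand(N, n)          # columns |a_n>
    b = np.random.rand(n, N)           # rows <b_n|
    sigma = np.random.rand(n)          # diag(sigma) in the use_svd form

    # Dense matrix represented by these factors; the class itself never
    # needs to build this N x N array.
    M = np.eye(N) + np.dot(at*sigma, b)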
[docs] def __init__(self, *v, **kw): if 'at' in kw or 'b' in kw or 'sigma' in kw: self.at = np.ascontiguousarray(self.at) self.b = np.ascontiguousarray(self.b) assert(self.at.shape[0] == self.b.shape[1]) assert ((self.at.shape[1], self.b.shape[0]) == self.sigma.shape or (self.at.shape[1], self.b.shape[0]) == self.sigma.shape*2) self.orthogonalize() if 'n_max' in kw: if self.n_max < np.inf: self.use_svd = True if len(self.b) != self.n_max: # Reset n_max self.orthogonalize() if ('use_svd' in kw and not self.use_svd and self.n_max < np.inf): raise ValueError( "Finite n_max = %i requires svd." % (self.n_max, ))
[docs] def reset(self): """Reset the matrix. Examples -------- >>> M = DyadicSum() >>> M.asarray() array(1.0) >>> M.add_dyad([1, 0], [1, 0]) >>> M.add_dyad([0, 2], [1, 0]) >>> M.add_dyad([3, 0], [0, 1]) >>> M.add_dyad([0, 4], [0, 1]) >>> M.asarray() array([[ 2., 3.], [ 2., 5.]]) >>> M.reset() >>> M.asarray() array(1.0) """ self.suspend() self.at = np.empty((0, 0), dtype=float) self.b = np.empty((0, 0), dtype=float) self.sigma = np.empty(0, dtype=float) self.resume()
[docs] def add_dyad(self, at, b, sigma=None): r"""Add `|a>d<b|` to `J`. Parameters ---------- at, b : array These should be either 1-d arrays (for a single dyad) or 2-d arrays (`(N, m)` for `at` and `(m, N)` for `b`) arrays representing the bra's `<a|` and `<b|`. sigma : array Matrix linking `at` and `b`. Must be provided if `len(at.T)` and `len(b)` are different. Examples -------- Here is a regression test for a bug: adding a set of dyads that would push the total number beyond `n_max`. >>> np.random.seed(0) >>> at = np.random.random((10, 3)) >>> b = np.random.random((3, 10)) >>> s = DyadicSum(n_max=2) >>> s.add_dyad(at=at, b=b) """ # Checking at = np.asarray(at) b = np.asarray(b) if 1 == len(at.shape): at = at.reshape((len(at), 1)) if 1 == len(b.shape): b = b.reshape((1, len(b))) a = at.T if sigma is None: assert len(a) == len(b) sigma = np.eye(len(a), len(b)) else: sigma = np.asarray(sigma) if 1 == len(sigma.shape): sigma = np.diag(sigma) assert len(a) == sigma.shape[0] assert len(b) == sigma.shape[1] assert a.shape[1] == b.shape[1] if 0 < len(self.b): assert self.at.shape[0] == at.shape[0] assert self.b.shape[1] == b.shape[1] if self.n_max == np.inf or not self.inplace: # Just add them at_ = np.hstack([self.at, at]) b_ = np.vstack([self.b, b]) self_sigma = self.sigma if 1 == len(self_sigma.shape): self_sigma = np.diag(self_sigma) sigma_ = mmf_linalg.block_diag((self_sigma, sigma)) else: assert len(a) <= self.n_max assert len(b) <= self.n_max at_ = self.at b_ = self.b sigma_ = np.diag(self.sigma) na, nb = sigma.shape sigma_[-na:, -nb:] = sigma at_[:, -na:] = at b_[-nb:, :] = b else: # Dyadic sum is empty: just add the new ones at_ = at b_ = b sigma_ = sigma self.orthogonalize(at=at_, b=b_, sigma=sigma_)
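Conceptually, when :attr:`n_max` is not active, :meth:`add_dyad` just stacks the new factors beside the old ones and lets :meth:`orthogonalize` recompress the result.  A rough plain-numpy sketch of the stacking step (using :func:`scipy.linalg.block_diag` so the snippet is self-contained; `mmf_linalg.block_diag` plays this role in the method above)::

    import numpy as np
    from scipy.linalg import block_diag

    np.random.seed(1)
    N = 8
    at_old, b_old, sig_old = np.random.rand(N, 2), np.random.rand(2, N), np.eye(2)
    at_new, b_new, sig_new = np.random.rand(N, 1), np.random.rand(1, N), np.eye(1)

    at_ = np.hstack([at_old, at_new])
    b_ = np.vstack([b_old, b_new])
    sigma_ = block_diag(sig_old, sig_new)

    # The stacked factors represent the sum of the two low-rank pieces.
    assert np.allclose(np.dot(at_, np.dot(sigma_, b_)),
                       np.dot(at_old, np.dot(sig_old, b_old)) +
                       np.dot(at_new, np.dot(sig_new, b_new)))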
[docs] def orthogonalize(self, at=None, b=None, sigma=None): r"""Perform a QR decomposition of the dyadic components to make them orthogonal. Notes ----- Let $\ket{\mat{A}} =$ :attr:`at` and $\bra{\mat{B}} =$ :attr:`b` and $\mat{\sigma} =$ :attr:`sigma`: .. math:: \ket{\mat{A}}\mat{\sigma}\bra{\mat{B}} = \ket{\uvect{Q}_a}\mat{r}_a\mat{\sigma}\mat{r}_b^T\bra{\uvect{Q}_b} = \ket{\uvect{Q}_a}\mat{u}\mat{d}\mat{v}^T\bra{\uvect{Q}_b} = \ket{\uvect{A}}\mat{d}\bra{\uvect{B}} In order to compute the QR factorization of $\ket{\mat{A}}$ without working with the large matrices, we could do a Cholesky decomposition on .. math:: \braket{\mat{A}|\mat{A}} &= \mat{r}_a^T\mat{r}_{a}\\ \ket{\uvect{Q}} &= \ket{\mat{A}}\mat{r}_{a}^{-1} .. note:: These should be nicely accelerated using the BLAS routines for performing the operations in place. In principle, :func:`numpy.dot` should do this, but at present there are issues with contiguity that force copies to be made. Also, SciPy does not yet expose all BLAS operations, in particular `_SYRK` is missing. We perform a QR decomposition of these .. math:: \mat{A} &= \mat{Q}_{A}\mat{R}_{A}, \\ \mat{B} &= \mat{Q}_{B}\mat{R}_{B} and then an SVD of the product of the R factors $\mat{U}\mat{D}\mat{V}^{\dagger} = \mat{R}_{A}\mat{R}_{B}^T$ to obtain the new basis: .. math:: \mat{M} = \mat{A}\mat{B}^{T} = \mat{Q}_{A}\mat{R}_{A}\mat{R}_{B}^{T}\mat{Q}_{B}^T = \left(\mat{Q}_{A}\mat{U}\right)\mat{D} \left(\mat{V}^{\dagger}\mat{Q}_{B}^T\right) .. math:: \mat{A}_{\text{new}} &= \mat{Q}_{A}\mat{U}\sqrt{\mat{D}}, \\ \mat{B}_{\text{new}} &= \mat{Q}_{B}\mat{V}^{*}\sqrt{\mat{D}}. """ if at is None: at = self.at if 0 == np.prod(at.shape): # Empty. Do nothing. This should only happen on # default initialization. return if b is None: b = self.b if sigma is None: sigma = self.sigma if 1 == len(sigma.shape): sigma = np.diag(sigma) cholesky = False if cholesky: # Cholesky version: avoids working with large arrays. # This could be optimized using BLAS and LAPACK here for # both inplace operations and to avoid making copies. # Right now there is no point in the added complication. aa = dot(at, at.T) bb = dot(b, b.T) la = np.linalg.cholesky(aa) lb = np.linalg.cholesky(bb) ra = la.T rb = lb.T at = sp.linalg.cho_solve(clow=(lb, True), b=at.T).T b = sp.linalg.cho_solve(clow=(lb, True), b=b) else: qa, ra = np.linalg.qr(at) # at = qa*ra qb, rb = np.linalg.qr(b.T) # b = rb.T*qb.T sigma = dot(ra, dot(sigma, rb.T)) if self.use_svd: u, d_, vt = np.linalg.svd(sigma) max_inds = np.where(d_/d_.max() >= self.dynamic_range)[0] if 0 < len(max_inds): max_ind = min(max_inds[-1] + 1, self.n_max) else: max_ind = self.n_max max_ind = min(max_ind, len(u)) at = dot(qa, u[:, :max_ind]) b = dot(vt[:max_ind, :], qb.T) sigma = d_[:max_ind] else: at = qa b = qb.T self.__dict__['at'] = at self.__dict__['b'] = b self.__dict__['sigma'] = sigma
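The Notes above boil down to a QR factorization of each factor followed by an SVD of the small $n \times n$ core.  A minimal standalone sketch of that recompression (hypothetical `recompress` helper; ignores `n_max` and the Cholesky variant)::

    import numpy as np

    def recompress(at, sigma, b, dynamic_range=1e-12):
        """QR both factors, SVD the small core, drop tiny singular values."""
        qa, ra = np.linalg.qr(at)                  # at = qa * ra
        qb, rb = np.linalg.qr(b.T)                 # b.T = qb * rb
        u, d, vt = np.linalg.svd(np.dot(ra, np.dot(sigma, rb.T)))
        k = len(np.where(d/d.max() >= dynamic_range)[0])
        return np.dot(qa, u[:, :k]), d[:k], np.dot(vt[:k, :], qb.T)

    np.random.seed(2)
    at, b, sigma = np.random.rand(7, 3), np.random.rand(3, 7), np.eye(3)
    at2, d2, b2 = recompress(at, sigma, b)
    # The recompressed factors reproduce the same low-rank matrix.
    assert np.allclose(np.dot(at2*d2, b2), np.dot(at, np.dot(sigma, b)))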
[docs] def apply_transform(self, f, fr=None): """Apply the linear transform `f` to the vectors forming the dyads. If `fr` is `None`, this is applied to the right (bra's), otherwise `f` is applied to both sides. This is equivalent to the transformation `J ->I + F*(J - I)*Fr.T`. .. note:: This should be modified to be done in place if possible. Examples -------- """ if fr is None: fr = f if 0 < len(self.at): self.suspend() self.at = f(self.at) self.b = fr(self.b.T).T self.resume() self.orthogonalize()
[docs] def asarray(self): """Return the matrix. Examples -------- >>> M = DyadicSum() >>> M.asarray() array(1.0) >>> M.add_dyad([1, 0], [1, 0]) >>> M.add_dyad([0, 2], [1, 0]) >>> M.add_dyad([3, 0], [0, 1]) >>> M.add_dyad([0, 4], [0, 1]) >>> M.asarray() array([[ 2., 3.], [ 2., 5.]]) """ if len(self.b) == 0: return np.array(1.0) if 1 == len(self.sigma.shape): M = dot(self.at*self.sigma, self.b) else: M = dot(self.at, dot(self.sigma, self.b)) return M + np.eye(*M.shape)
[docs] def diag(self, k=0): r"""Return the diagonal of the matrix. Examples -------- >>> M = DyadicSum() >>> M.diag() array(1.0) >>> M.add_dyad([1, 0], [1, 0]) >>> M.add_dyad([0, 2], [1, 0]) >>> M.add_dyad([3, 0], [0, 1]) >>> M.add_dyad([0, 4], [0, 1]) >>> M.diag() array([ 2., 5.]) """ if len(self.b) == 0: return np.array(1.0) b = self.b if 1 == len(self.sigma.shape): at = self.at*self.sigma else: at = dot(self.at, self.sigma) na = at.shape[0] nb = b.shape[1] ka = max(0, -k) kb = max(0, k) n = min(na - ka, nb - kb) d = (at[ka:ka+n, :].T*b[:, kb:kb+n]).sum(axis=0) if 0 == k: d += 1 return d
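The element-wise trick used above can be checked against a dense construction; a self-contained version of the $k=0$ case::

    import numpy as np

    np.random.seed(3)
    N, n = 6, 2
    at, b, sigma = np.random.rand(N, n), np.random.rand(n, N), np.random.rand(n)

    # diag(1 + at*diag(sigma)*b) without forming the dense matrix.
    d = 1 + ((at*sigma).T*b).sum(axis=0)
    assert np.allclose(d, np.diag(np.eye(N) + np.dot(at*sigma, b)))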
def __mul__(self, x): r"""Matrix multiplication: Return self*x.""" if 0 == len(self.b): res = x else: shape = x.shape if 1 == len(shape): x = x.reshape((len(x), 1)) if 1 == len(self.sigma.shape): res = x + dot(self.at, np.multiply(self.sigma[:, None], dot(self.b, x))) else: res = x + dot(self.at, dot(self.sigma, dot(self.b, x))) res = res.reshape(shape) return res def __rmul__(self, x): r"""Matrix multiplication: Return x*self.""" if 0 == len(self.b): res = x else: shape = x.shape if 1 == len(shape): x = x.reshape((1, len(x))) if 1 == len(self.sigma.shape): res = x + dot( np.multiply(dot(x, self.at), self.sigma[None, :]), self.b) else: res = x + dot(dot(dot(x, self.at), self.sigma), self.b) res = res.reshape(shape) return res def __getitem__(self, key): r"""Minimal support of two-dimensional indexing. Examples -------- >>> np.random.seed(1) >>> d = DyadicSum() >>> for n in xrange(10): ... d.add_dyad(np.random.rand(100), np.random.rand(100)) >>> inds = np.s_[[1, 2, 3, 5, 4, -98], 20:-2] >>> np.allclose(d[inds], ... d.asarray()[inds]) True >>> inds = np.array([1, 3, 4, 2]) >>> np.allclose(d[inds[:, None], inds[None, :]], ... d.asarray()[inds[:, None], inds[None, :]]) True .. warning:: Assumes that the indexing is for extracting sub-arrays. This is different from numpy. >>> np.allclose(d[inds, inds], ... d.asarray()[inds, inds]) False >>> np.allclose(d[inds, inds], ... d.asarray()[inds[:, None], inds[None, :]]) True """ if 2 == len(key): at = self.at[key[0], :] b = self.b[:, key[1]] if 2 < len(b.shape): b = np.rollaxis(b, 0, -1) if 1 == len(self.sigma.shape): res = dot(at*self.sigma, b).squeeze() else: res = dot(at, dot(self.sigma, b)).squeeze() if res.shape == (): res = res.reshape((1, 1)) ka = np.arange(self.at.shape[0])[key[0]].ravel() kb = np.arange(self.b.shape[1])[key[1]].ravel() res[np.where(ka[:, None] == kb[None, :])] += 1 else: raise ValueError( "%s only supports two-dimensional indexing. Got %s" % (self.__class__.__name__, repr(key))) return res
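For reference, `__mul__` above costs $O(Nn)$ per product because the identity-plus-dyads structure is applied factor by factor.  A standalone sketch of the diagonal-`sigma` case (hypothetical `dyadic_matvec` helper, not the class API)::

    import numpy as np

    def dyadic_matvec(at, sigma, b, x):
        """(1 + at*diag(sigma)*b) applied to x without the N x N matrix."""
        return x + np.dot(at, sigma*np.dot(b, x))

    np.random.seed(4)
    N, n = 1000, 3
    at, b, sigma = np.random.rand(N, n), np.random.rand(n, N), np.random.rand(n)
    x = np.random.rand(N)
    assert np.allclose(dyadic_matvec(at, sigma, b, x),
                       np.dot(np.eye(N) + np.dot(at*sigma, b), x))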
[docs]class JinvExtendedBroydenFull(StateVars): r"""Class to represent an inverse Jacobian for the Extended Broyden method using full matrices (for small problems only). """ implements(IJinv) _state_vars = [ ('curr', None,\ r"""Index of current point in :attr`:x`. The Jacobian was updated to be valid at this point. In the process of adding a point, the current point may be pushed off of the stack (since only :attr:`n_prev` points are kept. In this case, :attr:`curr` will be ``curr < -len(xs)`` (will only happen if :meth:`add_point` is passed the flag `replace=False`."""), ('xs', [],\ r"List of previous steps $x_i$ (newest last)."), ('gs', [],\ r"List of previous objectives $g_i = G(x_i)$."), ('j_inv', None, r"""Current approximation."""), ('j_inv0', 1.0,\ r"""Convergent approximation $\mat{B}_0$ to pull back to."""), ('method', 'J',\ r"""`'J'` for Broyden's good method 1a) and `'B'` for the \"bad\" method 2b)."""), ('n_prev', 1000, r"""Number $n$ of previous steps to keep and use."""), ('max_j_inv', 100.0, """If :attr:`robust`, then the inverse Jacobian is clipped so that the maximum value is this."""), ('min_j_inv', 0.0001, """The inverse Jacobian can become singular. If this is the case, then a step might have very small length even if the error is not small (the step size can be reduced by a factor of the minimum singular value of the inverse Jacobian). If the step reduction is less than this value, then the step is scaled up to have length :attr:`min_step`."""), ('_eps', _EPS,\ r"""Two points in :attr:`xs` are considered equal if the norm of their difference is less than this. No two points in :attr:`xs` will have a distance less than this."""), ('_x0', None, "Internal storage for :attr:`x0`"), ('_g0', None, "Internal storage for :attr:`g0`"), ] process_vars() def __init__(self, *v, **kw): self.j_inv0 = np.asarray(self.j_inv0) if self.j_inv is None: self.j_inv = self.j_inv0 else: self.j_inv = np.asarray(self.j_inv) if 0 < len(self.xs): N = len(self.xs[0]) if (self.j_inv.shape != (N, N) or self.j_inv0.shape != (N, N)): one = np.eye(N, N) self.j_inv0 = np.dot(self.j_inv0, one) self.j_inv = np.dot(self.j_inv, one) @property def x(self): if self.xs != [] and self.curr is not None: return self.xs[self.curr] else: return None @property def g(self): if self.gs != [] and self.curr is not None: return self.gs[self.curr] else: return None @property def x0(self): if self._x0 is None: return self.x else: return self._x0 @property def g0(self): if self._g0 is None: return self.g else: return self._g0 def get_w(self, dXdX, dGdG, x_scale=1): r"""Return weights `w, w_0, w_r`. Parameters ---------- x_scale : array, optional Typical scale over which `g(x)` is expected to be linear. This is used to calculate the weights `w`: .. math:: w_{mn} = \delta_{mn} \frac{\texttt{x\_scale}_n} {\norm{\vect{x} - \vect{x}_n}}. Thus, thus is greater than one for points where the approximation should be linear and less than one elsewhere. This only really affects the dynamic restart mechanism where `w_0=1`. In this case, components where `w/w_0 > 1` will be maintained while other components will be restarted. Returns ------- w : array Array of weights for `dX`. This is $\mat{w}$. w_r : array Pull back weight pulling the Jacobian inverse back to the previous approximation $\mat{B}_{-1}$. This is $w_r$ (assumes $\mat{\Gamma} = w_r\uvect{G}_{\perp}$ for example)). 
As with the regular weights, this only plays a role if we are employing a dynamic restart, in which case the Jacobian is also pulled back toward the previous value instead of to $\mat{B}_0$ for larger values of `w_r/w_0`. Typically this is small so it only serves to project the inverse when the other weights are zero. """ w = np.diag( np.sqrt(np.divide(1.0, np.diag(dXdX)/x_scale**2 + _SINGULAR_TOL))) #w[-1,-1] = 1 #if 1 < len(w): # w[-2, -2] = 1 #if 2 < len(w): # w[-3, -3] = 1 #w = dXdX[-1,-1]/(np.diag(dXdX) + _SINGULAR_TOL) w_r = np.sqrt(_SINGULAR_TOL) return w, w_r def __mul__(self, dg, mutate=False): r"""Return the matrix multiplication `j_inv*dg`. Parameters ---------- dg : array This should be in column vector format with dimension `dg.shape = (N,_)`. mutate : bool If `True`, then `dg` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dg` will be made). """ res = np.dot(self.j_inv, dg) if mutate: # Should use BLAS if we can... don't know how yet. dg[::] = res return res def __rmul__(self, dxt, mutate=False): r"""Return the matrix multiplication `dxt*j_inv`. Parameters ---------- dxt : array This should be in row vector format with dimension `dg.shape = (_,N)`. mutate : bool If `True`, then `dxt` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dxt` will be made). """ res = np.dot(dxt, self.j_inv) if mutate: # Should use BLAS if we can... don't know how yet. dxt[::] = res return res def reset(self): r"""Reset the Jacobian to a diagonal form.""" self.j_inv = self.j_inv0 def set_point(self, x, g, **kw): r"""Add `x` and `g` and make this the current point `(x, g)` and update the Jacobian to be valid here. Parameters ---------- x, g : arrays """ x0 = self.x g0 = self.g x = np.asarray(x).ravel() g = np.asarray(g).ravel() self.curr = self.add_point(x, g) self.update(**kw) return self.incompatibility(x, g, x0, g0) def add_point(self, x, g, replace=True): r"""Add the point `(x, g)` to the Jacobian information. Does not necessarily update the Jacobian (use :meth:`update`) but may. Does not change the current point (use :meth:`set_point`). Parameters ---------- x, g : arrays replace : bool (optional) If `True`, then any points in :attr:`xs` with distance less than :attr:`_eps` from `x` (`inf` norm) are removed before the new point is added. (Could be useful if points need to be recomputed at a higher tolerance for example.) If `False`, then new point is only added if there are no points within distance :attr:`_eps` from `x` and a :exc:`ValueError` exception is raised if `g` is not within distance :attr:`_eps` from the corresponding value in :attr:`gs`. The old points are not moved in this case. 
""" x = np.asarray(x).ravel() g = np.asarray(g).ravel() xs = self.xs gs = self.gs norms = np.array([abs(x - _x).max() for _x in self.xs]) inds = np.where(norms < self._eps)[0] if 0 == len(inds): xs.append(x) gs.append(g) curr = -1 else: curr = inds[-1] - len(xs) if replace: for ind in reversed(inds): del xs[ind] del gs[ind] xs.append(x) gs.append(g) curr = -1 else: g_inds = np.where([abs(g - _g) for _g in self.gs[inds]])[0] if 0 == len(g_inds): pass else: raise ValueError( ("New `x` matches current `xs%s` but `g` does " + "not match `gs%s`") % (str(inds), str(g_inds))) self.suspend() n_prev = self.n_prev self.xs = self.xs[-n_prev:] self.gs = self.gs[-n_prev:] self.resume() return curr def update(self, x0=None, g0=None, skip_inds=None, include_skip=False, mags=1, warn=False, restart=False, x_scale=1.0): r"""Update the inverse Jacobian so that it is valid at the point `(x, g)` (the current point by default). Does not necessarily add `x` and `g` (use :meth:`add_point`) or set the current point (use :meth:`set_point`) but may do either or both depending on the algorithm. Return a measure of the incompatibility of the new data with the current approximation. (Not implemented yet.) Parameters ---------- restart : bool If `True`, then the weight `w_0=1` is used to dynamically restart the calculation by pulling the inverse Jacobian back to `B_0`. Examples -------- Here we perform some tests with a linear function >>> np.random.seed(0) >>> N = 4 >>> J = np.random.rand(N, N) >>> x0 = np.random.rand(N) >>> x = np.random.rand(N) >>> def g(x): ... return np.dot(J, x - x0) >>> B = JinvExtendedBroydenFull(method='J') >>> B.add_point(x0, g(x0)) -1 >>> B.set_point(x, g(x)) 0 >>> np.allclose(B*(g(x) - g(x0)), x - x0) True >>> for n in xrange(N): ... _x = np.random.rand(N) ... incompatibility = B.set_point(_x, g(_x)); >>> np.allclose(B.j_inv, np.linalg.inv(J)) True >>> B = JinvExtendedBroydenFull(method='B') >>> B.add_point(x0, g(x0)) -1 >>> B.set_point(x, g(x)) 0 >>> np.allclose(B*(g(x) - g(x0)), x - x0) True >>> for n in xrange(N): ... _x = np.random.rand(N) ... 
incompatibility = B.set_point(_x, g(_x)); >>> np.allclose(B.j_inv, np.linalg.inv(J)) True """ if x0 is None: x0 = self.x g0 = self.g self._x0 = None self._g0 = None else: x0 = self._x0 = np.asarray(x0).ravel() g0 = self._g0 = np.asarray(g0).ravel() inds = np.where(np.array([abs(x0 - _x).max() for _x in self.xs]) >= self._eps)[0] if 0 == len(inds): # Can't update as there is no information pass else: j_inv = np.asarray(self.j_inv) j_inv0 = np.asarray(self.j_inv0) dX = x0[:,None] - np.vstack(self.xs).T[:,inds] dG = g0[:,None] - np.vstack(self.gs).T[:,inds] one = np.eye(len(x0)) # get weights and apply dXdX = np.dot(dX.T, dX) dGdG = np.dot(dG.T, dG) w, w_r = self.get_w(dGdG=dGdG, dXdX=dXdX, x_scale=x_scale) if restart: w_0 = 1.0 else: w_0 = 0.0 dG = np.dot(dG, w) dX = np.dot(dX, w) BdG = np.dot(j_inv, dG) numer = dX - BdG # Form denominator and correct numerator if 'J' == self.method: # Broyden's good method _U, _d, _Vt = np.linalg.svd(dX, full_matrices=True) Gamma = _U[:,len(np.where(_d >= _SINGULAR_TOL)[0]):] # j_inv*J0 _tmp = np.linalg.solve(j_inv0.T, j_inv.T).T denom = (np.dot(BdG, dX.T) + w_r**2*np.dot(Gamma, Gamma.T) + w_0**2*_tmp) numer = np.dot(numer, dX.T) + (one - _tmp)*w_0**2 else: # Broyden's "bad" method _U, _d, _Vt = np.linalg.svd(dG, full_matrices=True) Gamma = _U[:,len(np.where(_d >= _SINGULAR_TOL)[0]):] denom = (np.dot(dG, dG.T) + w_r**2*np.dot(Gamma, Gamma.T) + w_0**2*one) numer = np.dot(numer, dG.T) + (j_inv0 - j_inv)*w_0**2 # Invert denominator and multiply using SVD. This is where the # appropriate choice of Gamma must be made. u, d, vt = np.linalg.svd(denom) di = np.divide(1, d) bad_inds = np.where(d < _SINGULAR_TOL) if False: # Subspace decomposition method di[bad_inds] = 1 else: # Subspace ignoring method??? di[bad_inds] = 0 denom_inv = np.dot(vt.T*di, u.T) if 'J' == self.method: # Broyden's "good" method denom_inv = np.dot(denom_inv, j_inv) j_inv_new = j_inv + np.dot(numer, denom_inv) # Remove bad components. u, d, vt = np.linalg.svd(j_inv_new) bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL, d > 1/_SINGULAR_TOL)) d[bad_inds] = 1 j_inv_new = np.dot(u*d, vt) # Update variables self.j_inv = j_inv_new def incompatibility(self, x, g, x0, g0): r"""Return a dimensionless measure of the compatibility of a step from `x0` to `x` with the current Jacobian. An incompatibility of 0 means perfectly compatible (this is what should happen if the function is linear and the Jacobian is correct along the specified step). """ return 0 raise NotImplementedError def asarray(self): """Return the Jacobian matrix. Examples -------- >>> M = JinvExtendedBroydenFull() >>> M.asarray() array(1.0) """ return np.array(self.j_inv) def change_basis(self, f, f_inv=None): if 0 == len(self.xs): return self.suspend() # gs must be G(xs), can't use transformed value. self.xs = [] self.gs = [] u, d, vt = np.linalg.svd(self.j_inv) u = f(u) if f_inv is None: vt = f(vt.T).T else: vt = f_inv(vt) self.j_inv = np.dot(u*d, vt) u, d, vt = np.linalg.svd(self.j_inv0) u = f(u) if f_inv is None: vt = f(vt.T).T else: vt = f_inv(vt) self.j_inv0 = np.dot(u*d, vt) self.resume() def get_x(self, x_controls=None, min_step=_EPS, max_step=np.inf): r"""Return `x`: the Broyden step to get to the root of `G(x)`. This is the downhill Newton step from the point specified by :attr:`curr` that satisfies the constraints `x_controls`. Parameters ---------- x_controls : (array, array), optional Tuple `(inds, vals)` indices and the respective values of the control parameters. min_step : float, optional Minimum step length for testing. 
Used to give some non-zero step `dx` of length `sqrt(min_step) even if `j_inv` is singular. This may be random. A warning will be raised. max_step : float, optional Maximum length of the step. Examples -------- Here is a little test with a linear function >>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]]) >>> j_inv = JinvExtendedBroydenFull(j_inv=np.linalg.inv(j)) >>> sol = np.array([1, 2, 3]) >>> def G(x): ... return np.dot(j, x - sol) >>> x0 = np.array([0.0, 0.0, 0.0]) >>> x = np.array([1.0, 2.0, 3.0]) Now try to step to the solution, but hold the second variable as a control at 1. Since the function is exactly linear and we have explicitly given the correct Jacobian, we should end up with a root in the other two parameters. >>> j_inv.set_point(x=x, g=G(x)) 0 >>> x = j_inv.get_x(x_controls=([1], [1.0])) >>> x array([ 1.5, 1. , 3.5]) >>> np.allclose(G(x), [0., 4.5, 0.]) True This needs to work with very large arrays, so here is an example. Here we use an example where the vectors have length $2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly, the arrays would be too large to be indexed. >> np.random.seed(0) >> n_x = 2**20 >> n_d = 2**11 >> a = np.random.random(n_x) >> b = np.random.random(n_x) >> g = np.random.random(n_x) >> x = np.zeros(n_x) >> inds = np.unique(np.random.randint(0, n_x, size=n_d)) >> n_d = len(inds) >> vals = np.random.rand(n_d) >> x_controls = (inds, vals) >> j_inv = JinvExtendedBroydenFull(dyadic=True) >> j_inv.j_inv.add_dyad(a, b) >> j_inv.set_point(x=a, g=b) 0.0 >> x = j_inv.get_x(x_controls=x_controls) >> np.allclose(x[inds], vals) True """ if 0 == len(self.xs): raise ValueError("No `xs` stored: call update first.") x = self.x g = self.g j_inv = self.j_inv if x_controls is not None and 0 < len(x_controls[0]): inds, vals = x_controls inds = np.asarray(inds, dtype=int) vals = np.asarray(vals) dg_dmu = copy.copy(g) d = j_inv[inds[:, None], inds[None, :]] dg_dmu[inds] = x[inds] - vals dmu = _ket(dg_dmu[inds]) b = (np.asarray(j_inv*_ket(dg_dmu)))[inds, :] dN = dmu - np.linalg.solve(d, b - dmu) dg_dN = dg_dmu dg_dN[inds] = dN[:] dx_dmu = np.asarray(j_inv*_ket(dg_dN)) assert np.allclose(dx_dmu[inds], dmu) # Set these to make sure no round-off errors have # accumulated. dx_dmu[inds].flat = x[inds] - vals dx = -dx_dmu else: dx = -np.asarray(j_inv*_ket(g)) dx = np.asarray(dx).ravel() g_mag = np.linalg.norm(g) dx_mag = np.linalg.norm(dx) if dx_mag == 0 and g_mag > 0: warnings.warn("Warning: Singular Jinv, taking random step") dx = np.random.rand(len(dx))*np.sqrt(min_step) elif np.divide(dx_mag, g_mag) < self.min_j_inv: warnings.warn("Warning: Singular Jinv, taking minimum step") dx *= math.sqrt(min_step)/np.linalg.norm(dx) dx_len = math.sqrt((dx*dx).sum()) if max_step < dx_len: dx *= max_step/dx_len return x + dx
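One step of :meth:`update` that is easy to isolate is the final clean-up: singular values of the new inverse Jacobian that are tiny or huge are reset to 1, i.e. back to the (preconditioned) identity in those directions.  A self-contained toy version of that clipping::

    import numpy as np

    _SINGULAR_TOL = 1e-6                        # module-level tolerance above

    j_inv = np.diag([2.0, 1e-9, 1e9, 0.5])      # toy inverse Jacobian
    u, d, vt = np.linalg.svd(j_inv)
    bad = np.logical_or(d < _SINGULAR_TOL, d > 1/_SINGULAR_TOL)
    d[bad] = 1.0                                # reset singular directions to 1
    j_inv = np.dot(u*d, vt)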
[docs]class _JinvExtendedBroyden(StateVars): r"""Class to represent an inverse Jacobian for the Extended Broyden method. Presently, this class does not do any of the special processing to minimize the memory footprint described in the module description. We represent the inverse Jacobian as: .. math:: \mat{B} = \mat{1} + \ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}} Notes ----- Since python is implemented in C, the arrays are typically stored as `C_CONTIGUOUS` and :func:`numpy.dot` is efficient if both arguments are `C_CONTIGUOUS`, otherwise copies are made. Thus, where one needs to compute $\braket{\mat{A}|\mat{B}}$, one should store the transpose of the first matrix $\bra{\mat{A}}$. To Do ----- .. todo:: Reduce memory footprint. In particular, one can make use of set of previous points $\ket{\mat{X}}$ and function evaluations $\ket{\mat{G}}$ to reduce the space required by the dyadic basis $\ket{\uvect{\alpha}}$ and $\ket{\uvect{\beta}}$. """ implements(IJinv) _state_vars = [ ('X', Required, r"$N\times n$ array $\ket{\mat{X}}$ of previous steps $x_i$."), ('G', Required, r"$N\times n$ array $\ket{\mat{G}}$ of objectives $G(x_i)$."), ('_jinv', Delegate(DyadicSum)), ('_b', None, r"Dyadic portion of the inverse Jacobian in the implied basis."), ('method', 1, r"""`1` for Broyden's good method and `2` for the \"bad\" method."""), ('n_prev_steps', np.inf, r"""Number $n$ of previous steps to keep and use."""), ('_check', True, r"""If `True`, perform some potentially expensive sanity checks."""), ] process_vars()
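The contiguity remark in the Notes can be checked directly.  A small plain-numpy illustration of why the transposed bra matrices (such as :attr:`_bt`) are stored explicitly rather than transposed on the fly, given that (as noted above) :func:`numpy.dot` of that era made copies of non-`C_CONTIGUOUS` operands::

    import numpy as np

    A = np.random.rand(1000, 50)
    At = np.ascontiguousarray(A.T)        # store the bra <A| directly
    print(A.T.flags['C_CONTIGUOUS'])      # False -- a transpose is only a view
    print(At.flags['C_CONTIGUOUS'])       # True
    B = np.random.rand(1000, 50)
    AtB = np.dot(At, B)                   # <A|B> from contiguous operands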
[docs] def __init__(self, *v, **kw): N, n = self.X.shape assert self.G.shape == self.X.shape if self._alpha is None: self._alpha = np.zeros((N, 0), dtype=float) self._betat = np.zeros((0, N), dtype=float) m = self._alpha.shape[1] assert m == self._betat.shape[0] assert self._alpha.shape == (N, m) assert self._betat.shape == (m, N) if self._check: # Check orthonormality of bases assert np.allclose(np.dot(self._betat, self._alpha), np.eye(m)) if self._b is None: self._b = np.zeros((m, m), dtype=float) assert self._b.shape == (m, m)
def __mul__(self, dg, mutate=False): r"""Return the matrix multiplication `B*dg`. Parameters ---------- dg : array This should be in column vector format with dimension `dg.shape = (N,_)`. mutate : bool If `True`, then `dg` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dg` will be made). Notes ----- .. math:: \mat{B}\ket{\d{g}} = \ket{\d{g}} + \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}} \end{pmatrix} \mat{b} \begin{pmatrix} \mat{q}_{b}\bra{X,G}\\ \bra{\uvect{b}} \end{pmatrix}\ket{\d{g}} """ dg = np.asarray(dg) if 1 == len(dg.shape): dg = dg[:, None] na = self._qa.shape[1] ma = self._a.shape[1] if 1 == self.method: # The default dot copies dg here, so hopefully it is small. XGdg = dot(dg.T, self.X).T else: assert 2 == self.method XGdg = dot(dg.T, self.G).T tmp = dot(self._b, np.vstack([dot(self._qb, XGdg), dot(self._bt, dg)])) tmp1 = dot(self._qa, tmp[:na,:]) tmp2 = tmp[na:,:] if mutate: res = dg else: res = 1*dg res += np.dot(self.G, tmp1) res -= np.dot(self.X, tmp1) res += np.dot(self._a, tmp2) return res def __rmul__(self, dxt, mutate=False): r"""Return the matrix multiplication `dxt*B`. Parameters ---------- dxt : array This should be in row vector format with dimension `dg.shape = (_,N)`. mutate : bool If `True`, then `dxt` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dxt` will be made). Notes ----- .. math:: \bra{\d{x}}\mat{B} = \bra{\d{x}} + \bra{\d{x}} \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}} \end{pmatrix} \mat{b} \begin{pmatrix} \mat{q}_{b}\bra{X,G}\\ \bra{\uvect{b}} \end{pmatrix} """ nb = self._qb.shape[0] mb = self._bt.shape[0] tmp = dot(np.hstack([dot(dot(dxt, self.G) - dot(dxt, self.X), self._qa), dot(dxt, self._a)]), self._b) tmp1 = dot(tmp[:, :nb], self._qb) tmp2 = tmp[:, nb:] if 1 == self.method: # The default dot copies dg here, so hopefully it is small. tmp1 = dot(self.X, tmp1.T).T else: assert 2 == self.method tmp1 = dot(self.G, tmp1.T).T if mutate: res = dxt else: res = 1*dxt res += tmp1 res += np.dot(tmp2, self._bt) return res
[docs] def update(self, x0=None, g0=None): r"""Update the inverse Jacobian so that it is valid at the point `(x, g)`. Return a measure of the incompatibility of the new data with the current approximation. Examples -------- Here we perform some tests with a linear function >>> np.random.seed(0) >>> N = 4 >>> J = np.random.rand(N, N) >>> x0 = np.random.rand(N) >>> def g(x): ... return np.dot(J, x - x0) >>> B = _JinvExtendedBroyden(X=np.zeros((N,1)), ... G=g(0*x0)[:,None]) >>> x = np.random.rand(N) >>> B.update(x, g(x)) >>> np.allclose(B*(g(x) - g(x0)), x - x0) True >>> for n in xrange(N): ... x = np.random.rand(N) ... B.update(x, g(x)) Notes ----- Our goal here is to minimize the amount of processing of large matrices. As such, we try to work with projections as much as possible. We start with the following large matrices: $\ket{X}$, $\ket{G}$, $\ket{x}$, $\ket{g}$, $\ket{a_0}$, $\ket{b_0}$. We use the following: .. math:: \ket{\delta\mat{x}} &= \ket{x} - \ket{X_0}, \qquad \ket{\delta\mat{g}} = \ket{g} - \ket{G_0},\\ \ket{\d\mat{X}} &= \ket{\mat{X}} - \ket{x}, \qquad \ket{\d\mat{G}} = \ket{\mat{G}} - \ket{g},\\ \ket{\uvect{\alpha}_0} &= \begin{pmatrix} \ket{\mat{G} - \mat{X}}\mat{q}_{a} & \ket{\mat{A}_0} \end{pmatrix},\\ \bra{\uvect{\beta}_0} &= \begin{pmatrix} \mat{q}_{b}\bra{\mat{X}, \mat{G}} \\ \bra{\mat{B}_0} \end{pmatrix},\\ \ket{\uvect{\alpha}} &= \begin{pmatrix} \ket{\uvect{\alpha}_0} & \ket{\uvect{\alpha}_1} \end{pmatrix},\\ \bra{\uvect{\beta}} &= \begin{pmatrix} \bra{\uvect{\beta}_0}\\ \bra{\uvect{\beta}_1} \end{pmatrix},\\ \ket{\uvect{\alpha}_1} &= \ket{\uvect{\alpha}_0}\mat{c}_{a} + \ket{\delta{g} - \delta{x}}\mat{d}_{a},\\ \bra{\uvect{\beta}_1} &= \mat{c}_{b}^{T}\bra{\uvect{\beta}_0} + \mat{d}_{b}^{T}\bra{\delta{x}, \delta{g}},\\ \mat{d}_{a} &= \mat{v}_{a}\mat{s}_{a}^{-1},\qquad \mat{c}_{a} = -\braket{\uvect{\alpha}_0|\delta{g} - \delta{x}} \mat{v}_{a}\mat{s}_{a}^{-1},\\ \mat{d}_{b} &= \mat{v}_{b}\mat{s}_{b}^{-1},\qquad \mat{c}_{b} = -\braket{\uvect{\beta}_0|\delta{x},\delta{g}} \mat{v}_{b}\mat{s}_{b}^{-1},\\ \mat{v}_{a}\mat{s}_{a}^2\mat{v}_{a}^{T} &= \braket{\delta{g} - \delta{x}|\delta{g} - \delta{x}} - \braket{\delta{g} - \delta{x}|\uvect{\alpha}_0} \braket{\uvect{\alpha}_0|\delta{g} - \delta{x}},\\ \mat{v}_{b}\mat{s}_{b}^2\mat{v}_{b}^{T} &= \braket{\delta{x},\delta{g}|\delta{x},\delta{g}} - \braket{\delta{x},\delta{g}|\uvect{\beta}_0} \braket{\uvect{\beta}_0|\delta{x},\delta{g}},\\ \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}} &= \begin{pmatrix} \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_0} & \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_1} \end{pmatrix},\\ \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0} &= \begin{pmatrix} \braket{\d\mat{G} - \d\mat{X}|\mat{G} - \mat{X}}\mat{q}_{a} & \braket{\d\mat{G} - \d\mat{X}|\mat{A}_0} \end{pmatrix},\\ \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_1} &= \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0}\mat{c}_{a} + \braket{\d\mat{G} - \d\mat{X}|\delta{g}-\delta{x}}\mat{d}_{a},\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}} &= \begin{pmatrix} \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} & \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} \end{pmatrix} ,\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} &= \begin{pmatrix} \braket{\d\mat{X},\d\mat{G}|\mat{X},\mat{G}}\mat{q}_{b}^{T} & \braket{\d\mat{X},\d\mat{G}|\mat{B}_0} \end{pmatrix},\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} &= \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0}\mat{c}_{b} + \braket{\d\mat{X},\d\mat{G}|\delta{x},\delta{g}}\mat{d}_{b},\\ \braket{\uvect{\alpha}|\uvect{\alpha}_0} &= 
\braket{\uvect{\beta}|\uvect{\beta}_0} = \begin{pmatrix} \mat{1}\\ \mat{0} \end{pmatrix},\\ \braket{\beta|\alpha} &= \begin{pmatrix} \mat{q}_{b}\braket{\mat{X}, \mat{G}| \mat{G} - \mat{X}}\mat{q}_{a} & \mat{q}_{b}\braket{\mat{X}, \mat{G}| \mat{G} - \mat{X}}\mat{q}_{a} \\ \end{pmatrix} """ # First we cast all of the arrays as matrices so we can # use * as matrix multiplication. Large matrices are # preceeded by an underscore. x = np.asarray(x0) g = np.asarray(g0) if 1 == len(x.shape): x = x[:,None] if 1 == len(g.shape): g = g[:,None] _X = np.asmatrix(self.X) _G = np.asmatrix(self.G) _b0 = np.asmatrix(self._bt.T) _a0 = np.asmatrix(self._a) qa = np.asmatrix(self._qa) qb = np.asmatrix(self._qb) # We assume that g is only one column, so a copy here is okay. _dg = np.asmatrix(g - _G[:,0]) _dx = np.asmatrix(x - _X[:,0]) _dgdx = _dg - _dx # Consider optimizing these copies away later _dG = np.asmatrix(_G - g) _dX = np.asmatrix(_X - x) _dGdX = _dG - _dX _GX = np.asmatrix(_G - _X) if 1 == self.method: _dxg = _dx _XG = np.asmatrix(_X) _dXG = _dX else: _dxg = _dg _XG = np.asmatrix(_G) _dXG = _dG # These are all of the brackets required. dXdX = _dX.T*_dX dGdG = _dG.T*_dG dXG_XG = _dXG.T*_XG dXG_dxg = _dXG.T*_dxg dXG_b0 = _dXG.T*_b0 dxg_XG = _dxg.T*_XG dxg_GX = _dxg.T*_GX dxg_dxg = _dxg.T*_dxg dxg_b0 = _dxg.T*_b0 dxg_a0 = _dxg.T*_a0 dGdX_a0 = _dGdX.T*_a0 GX_a0 = _GX.T*_a0 XG_GX = _XG.T*_GX X_a0 = _X.T*_a0 g_a0 = g.T*_a0 x_a0 = x.T*_a0 XG_a0 = _XG.T*_a0 b0_a0 = _b0.T*_a0 GX_b0 = _GX.T*_b0 dGdX_GX = _dGdX.T*_GX dGdX_dgdx = _dGdX.T*_dgdx dgdx_GX = _dgdx.T*_GX dgdx_XG = _dgdx.T*_XG dgdx_a0 = _dgdx.T*_a0 dxg_dgdx = _dxg.T*_dgdx dgdx_dgdx = _dgdx.T*_dgdx dgdx_b0 = _dgdx.T*_b0 # Now we form the small matrices dxg_beta0 = np.hstack([dxg_XG*qb.T, dxg_b0]) ub, sb, _ = np.linalg.svd(dxg_dxg - dxg_beta0*dxg_beta0.T) # Only keep significant basis vectors inds = np.where(sb >= _SINGULAR_TOL)[0] sb = sb[inds] ub = ub[:, inds] db = np.asmatrix(ub/(np.sqrt(sb)[None, :] + _SINGULAR_TOL**2)) cb = -dxg_beta0.T*db dXG_beta0 = np.hstack([dXG_XG*qb.T, dXG_b0]) dXG_beta1 = dXG_beta0*cb + dXG_dxg*db dXG_beta = np.hstack([dXG_beta0, dXG_beta1]) dGdX_a0 = GX_a0 - (g_a0 - x_a0) dgdx_alpha0 = np.hstack([dgdx_GX*qa, dgdx_a0]) ua, sa, _ = np.linalg.svd(dgdx_dgdx - dgdx_alpha0*dgdx_alpha0.T) # Only keep significant basis vectors (assuming everything has # been scaled to be of unit norm here...) 
inds = np.where(sa >= _SINGULAR_TOL)[0] sa = sa[inds] ua = ua[:, inds] da = np.asmatrix(ua/(np.sqrt(sa)[None, :] + _SINGULAR_TOL**2)) ca = -dgdx_alpha0.T*da dGdX_alpha0 = np.hstack([dGdX_GX*qa, dGdX_a0]) dGdX_alpha1 = dGdX_alpha0*ca + dGdX_dgdx*da dGdX_alpha = np.hstack([dGdX_alpha0, dGdX_alpha1]) # the basis vectors alpha and beta are stacked from three sets # with sizes n, m0 and (m-m0) N, n_ = self.X.shape na0, na = self._qa.shape nb, nb0 = self._qb.shape ma0 = self._a.shape[1] # Previous size of aux space mb0 = self._bt.shape[0] ma = ma0 + ca.shape[1] # New size of aux space mb = mb0 + cb.shape[1] eye0 = np.asmatrix(np.eye(na + ma0, nb + mb0, dtype=float)) eye1 = np.asmatrix(np.eye(na + ma, nb + mb, dtype=float)) alpha_alpha0 = np.asmatrix(np.eye(na + ma, na + ma0, dtype=float)) beta_beta0 = np.asmatrix(np.eye(nb + mb, nb + mb0, dtype=float)) beta0_dgdx = np.vstack([qb*dgdx_XG.T, dgdx_b0.T]) dxg_alpha0 = np.hstack([dxg_GX*qa, dxg_a0]) beta0_alpha0 = np.bmat( [[qb*XG_GX*qa, qb*XG_a0], [GX_b0.T*qa, b0_a0]]) beta0_alpha1 = beta0_alpha0*ca + beta0_dgdx*da beta1_alpha0 = cb.T*beta0_alpha0 + db.T*dxg_alpha0 beta1_alpha1 = (cb.T*(beta0_alpha0*ca + beta0_dgdx*da) + db.T*(dxg_alpha0*ca + dxg_dgdx*da)) beta_alpha = np.bmat([[beta0_alpha0, beta0_alpha1], [beta1_alpha0, beta1_alpha1]]) ############################################################ # Now we have the bases and all the required matrices, so we # do the real work. def inv(A): if 0 == np.prod(A.shape): # Empty matrix needs a degenerate case return A.T return np.linalg.pinv(A, rcond=_SINGULAR_TOL**2) w, w0, wr = self.get_w(dXdX, dGdG) if w is not None: w = np.asarray(w) if 1 == len(w.shape): dGdX_alpha *= w[:, None] dXG_beta *= w[:, None] else: dGdX_alpha = dot(w.T, dGdX_alpha) dXG_beta = dot(w.T, dXG_beta) if 1 == self.method: num = dGdX_alpha.T*dXG_beta if self._b is not None: j0 = -inv(inv(self._b) + beta0_alpha0) num += w0*w0*alpha_alpha0*j0*beta_beta0.T denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb) j = num*inv(denom) b = -inv(inv(j) + beta_alpha) elif 2 == self.method: num = -dGdX_alpha.T*dXG_beta if self._b is not None: num += w0*w0*alpha_alpha0*self._b*beta_beta0.T denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb) b = num*inv(denom) # Set singular values to zero so they don't contribute. if 0 == np.prod(b.shape): # There is no dyadic contribution. b = None _a = None _bt = None else: u, d, vt = np.linalg.svd(b, full_matrices=False) bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL, 1/_SINGULAR_TOL < d))[0] d[bad_inds] = 0.0 b = dot(np.multiply(u,d[None, :]), vt) # Now that we have b, we perform an svd in the auxiliary basis to # determine what basis vectors to keep. 
qa_, qbt_ = self._restrict(b, na, nb) b[na:, nb:] = qa_*b[na:, nb:]*qbt_ # na + ma0 ma - ma0 # |alpha> = [alpha0, alpha0*ca + dgdx*da] # = [GX*qa, [a0, [GX*qa, a0]*ca + dgdx*da]] # = [GX*qa, A_new] # = [GX*qa, [GX, a0, dgdx]*A] # |A_new> = [GX, a0, dgdx]*A # A_[na_ + ma0 + (ma - ma0), ma = ma0 + (ma-ma0)] # A = [[0, qa*ca] # [1, ca] # [0, da]] A = np.bmat( [[np.zeros((na0, ma0)), qa*ca[:na,:]], [np.eye(ma0, ma0), ca[na:,:]], [np.zeros((da.shape[0], ma0)), da]]) Bt = np.bmat( [[np.zeros((nb0, mb0)), qb.T*cb[:nb,:]], [np.eye(mb0, mb0), cb[nb:,:]], [np.zeros((db.shape[0], mb0)), db]]).T A = A*qa_.T Bt = qbt_.T*Bt if ma > self.max_aux_basis_size: ma = self.max_aux_basis_size A = A[:, :ma] b = b[:ma, :] if mb > self.max_aux_basis_size: mb = self.max_aux_basis_size Bt = Bt[:mb, :] b = b[:, :mb] _a = np.hstack([_GX, _a0, _dgdx])*A _bt = Bt*np.vstack([_XG.T, _b0.T, _dxg.T]) self.suspend() self._a = _a self._bt = _bt self._b = b # These are not needed: just for checking and debugging _alpha0 = np.hstack([_GX*qa, _a0]) _alpha1 = _alpha0*ca + _dgdx*da _alpha = np.hstack([_alpha0, _alpha1]) self.resume()
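The construction of $\mat{c}_a$, $\mat{d}_a$ (and their `b` counterparts) above only ever touches small Gram matrices; the large $N$-vectors enter once at the very end.  A self-contained sketch of that basis-extension step, in the spirit of :meth:`_ortho` below (hypothetical variable names)::

    import numpy as np

    tol = 1e-6
    np.random.seed(0)
    N = 50
    alpha0 = np.linalg.qr(np.random.rand(N, 4))[0]    # orthonormal N x 4 basis
    v = np.random.rand(N, 1)                          # new direction(s)

    a0_v = np.dot(alpha0.T, v)                        # small 4 x 1 bracket
    gram = np.dot(v.T, v) - np.dot(a0_v.T, a0_v)      # <v|v> - <v|a0><a0|v>
    s2, u = np.linalg.eigh(gram)
    keep = np.where(s2 > tol**2)[0]                   # drop nearly dependent parts
    d = u[:, keep]/np.sqrt(s2[keep])[None, :]
    c = -np.dot(a0_v, d)
    alpha1 = np.dot(alpha0, c) + np.dot(v, d)         # new orthonormal column(s)

    assert np.allclose(np.dot(alpha1.T, alpha0), 0)
    assert np.allclose(np.dot(alpha1.T, alpha1), np.eye(len(keep)))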
[docs] def get_w(self, dXdX, dGdG): r"""Return weights `w, w0, wr`. Returns ------- w : array Array of weights for `dx`. """ N, n_ = self.X.shape ma = self._a.shape[1] mb = self._bt.shape[0] na = self._qa.shape[1] nb = self._qb.shape[0] w = np.eye(dXdX.shape[0]) w0 = _SINGULAR_TOL wr = 0 return w, w0, wr
def _restrict(self, b, na=0, nb=0): r"""Return unitary matrices `(qa, qb.T)` which can be used to transform `b` so that truncating the matrix will introduce minimal errors. The first `n0` rows and columns of `b` will be held fixed. The present implementation is an approximation: it is not optimal. Parameters ---------- b : array of shape (n,n) Array to be rearranged and truncated. na, nb : int Number of rows and columns to be held fixed. If this is `0`, then the SVD will be used: ``qa, d, qb.T = svd(b)``. """ ua, da, vat = np.linalg.svd(b[na:, :]) ub, db, vbt = np.linalg.svd(b[:, nb:]) return ua, vbt @staticmethod def _ortho(a0, b): r"""Return `(A, B, s)` such that `a1 = dot(a0, A) + dot(b, B)` is orthonormal and orthogonal to `a0`. Parameters ---------- a0 : array The columns of this must be orthonormal. b : array Need not be orthonormal. Returns ------- A, B : array s : array This is the list of singular values for the transformation. If these are too small, then the corresponding dimensions should be dropped. I.e. use only `a1[:, np.where(tol < s)[0]]` Examples -------- >>> np.random.seed(0) >>> a0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5] >>> b = np.random.rand(10, 2) >>> A, B, s = JinvExtendedBroyden._ortho(a0, b) >>> a1 = np.dot(a0, A) + np.dot(b, B) >>> np.allclose(np.dot(a1.T, a0), 0) True >>> np.allclose(np.dot(a1.T, a1), np.eye(2)) True Notes ----- If `a0` and `b` are big (in the first dimension), then it is best if `a0` is `F_CONTIGUOUS` (i.e. the transpose of a `C_CONTIGUOUS` array) and `b` is `C_CONTIGUOUS` to avoid :func:`np.dot` from making unneeded copies. (One copy will always be made when computing `dot(b.T, b)`). Uses the relationship .. math:: \braket{\beta|\beta} - \braket{\beta|\uvect{\alpha}_{0}} \braket{\uvect{\alpha}_{0}|\beta} &= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\ \begin{pmatrix} \ket{\uvect{\alpha}_{0}} & \ket{\uvect{\alpha}_{1}} \end{pmatrix} &= \begin{pmatrix} \ket{\uvect{\alpha}_{0}} & \ket{\beta} \end{pmatrix} \cdot \begin{pmatrix} \mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v} \mat{\sigma}^{-1}\\ \mat{0} & \mat{v}\mat{\sigma}^{-1} \end{pmatrix}. """ a0_b = dot(a0.T, b) s2, v = eigh(dot(b.T, b) - dot(a0_b.T, a0_b)) s = np.sqrt(s2) A = -np.dot(a0_b, v/s[None, :]) B = v/s[None, :] return (A, B, s)
[docs] def asarray(self): """Return the Jacobian matrix. Examples -------- >>> M = _JinvExtendedBroyden() >>> M.asarray() array(1.0) """ if np.prod(self._b.shape) == 0: return np.array(1.0) _alpha = np.hstack([dot(self.G - self.X, self._qa), self._a]) if 1 == self.method: _XG = self.X else: _XG = self.G _betat = np.vstack([dot(self._qb, _XG.T), self._bt]) M = dot(dot(_alpha, self._b), _betat) return M + np.eye(self.X.shape[0], self.G.shape[0], dtype=float)
class _JinvExtendedBroydenMemoryLimited(StateVars): r"""Class to represent an inverse Jacobian for the Extended Broyden method. We represent the inverse Jacobian as: .. math:: \mat{B} &= \mat{1} + \ket{\uvect{\alpha}}\mat{b}\bra{\uvect{\beta}},\\ \ket{\uvect{\alpha}} &= \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}}\end{pmatrix},\\ \bra{\uvect{\beta}} &= \begin{pmatrix} \mat{q}_{b}\bra{X,G},\\ \bra{\uvect{b}}\end{pmatrix}, with the choice of $\ket{X}$ (method 1) or $\ket{G}$ (method 2) for $\ket{\uvect{\beta}}$ depends on the method. Notes ----- Since python is implemented in C, the arrays are typically stored as `C_CONTIGUOUS` and :func:`numpy.dot` is efficient if both arguments are `C_CONTIGUOUS`, otherwise copies are made. Thus, where one needs to compute $\braket{\mat{A}|\mat{B}}$, one should store the transpose of the first matrix $\bra{\mat{A}}$. To Do ----- .. todo:: Add support for non-square matrices. This is only a bookkeeping issue, but probably not that useful so maybe leave for a while. """ implements(IJinv) _state_vars = [ ('X', Required, r"$N\times n$ array $\ket{\mat{X}}$ of previous steps $x_i$."), ('G', Required, r"$N\times n$ array $\ket{\mat{G}}$ of objectives $G(x_i)$."), ('_a', None, r"$N\times m$ array of extra range basis vectors."), ('_bt', None, r"""$m\times N$ array of extra domain basis vectors stored in row-vector format for improved :func:`numpy.dot` performance."""), ('_qa', None, r"""Transformation $\mat{q}_{a}$ such that $(\ket{G}-\ket{X})\mat{q}_{a}$ is the first part of the basis for the range."""), ('_qb', None, r"""Transformation $\mat{q}_{b}$ such that $\mat{q}_{b}*\bra{X,G}$ is the first part of the basis for the domain."""), ('_b', None, r"Dyadic portion of the inverse Jacobian in the implied basis."), ('method', 1, r"""`1` for Broyden's good method and `2` for the \"bad\" method."""), ('max_aux_basis_size', np.inf, r"""Maxiumum number of extra basis states to store for representing the inverse Jacobian."""), ('_check', True, r"""If `True`, perform some potentially expensive sanity checks."""), ] process_vars() def __init__(self, *v, **kw): N, n_ = self.X.shape assert self.G.shape == self.X.shape if self._a is None: self._a = np.zeros((N, 0), dtype=float) self._bt = np.zeros((0, N), dtype=float) ma = self._a.shape[1] mb = self._bt.shape[0] assert self._a.shape == (N, ma) assert self._bt.shape == (mb, N) if self._qa is None: _GX = self.G - self.X GXGX = np.dot(_GX.T, _GX) del _GX u, d, vt = np.linalg.svd(GXGX) inds = np.where(d/d.max() > _SINGULAR_TOL)[0] self._qa = u[:,inds]/np.sqrt(d[inds])[None,:] na = self._qa.shape[1] assert self._qa.shape == (n_, na) if self._qb is None: if 1 == self.method: _XG = self.X else: _XG = self.G XGXG = np.dot(_XG.T, _XG) del _XG u, d, vt = np.linalg.svd(XGXG) inds = np.where(d/d.max() > _SINGULAR_TOL)[0] self._qb = (u[:,inds]/np.sqrt(d[inds])[None,:]).T nb = self._qb.shape[0] assert self._qb.shape == (nb, n_) if self._check: # Check orthonormality of basese qa = np.asmatrix(self._qa) _GX = self.G - self.X GXGX = np.asmatrix(np.dot(_GX.T, _GX)) GXa0 = np.dot(_GX.T, self._a) a0a0 = np.dot(self._a.T, self._a) del _GX assert np.allclose(qa.T*GXGX*qa, np.eye(na)) assert np.allclose(GXa0, 0) assert np.allclose(a0a0, np.eye(ma)) qb = np.asmatrix(self._qb) if 1 == self.method: _XG = self.X else: _XG = self.G b0XG = np.dot(self._bt, _XG) b0b0 = np.dot(self._bt, self._bt.T) XGXG = np.asmatrix(np.dot(_XG.T, _XG)) del _XG assert np.allclose(qb*XGXG*qb.T, np.eye(nb)) assert np.allclose(b0XG, 0) assert np.allclose(b0b0, 
np.eye(mb)) if self._b is None: self._b = np.zeros((na + ma, nb + mb), dtype=float) assert self._b.shape == (na + ma, nb + mb) def __mul__(self, dg, mutate=False): r"""Return the matrix multiplication `B*dg`. Parameters ---------- dg : array This should be in column vector format with dimension `dg.shape = (N,_)`. mutate : bool If `True`, then `dg` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dg` will be made). Notes ----- .. math:: \mat{B}\ket{\d{g}} = \ket{\d{g}} + \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}} \end{pmatrix} \mat{b} \begin{pmatrix} \mat{q}_{b}\bra{X,G}\\ \bra{\uvect{b}} \end{pmatrix}\ket{\d{g}} """ dg = np.asarray(dg) if 1 == len(dg.shape): dg = dg[:, None] na = self._qa.shape[1] ma = self._a.shape[1] if 1 == self.method: # The default dot copies dg here, so hopefully it is small. XGdg = dot(dg.T, self.X).T else: assert 2 == self.method XGdg = dot(dg.T, self.G).T tmp = dot(self._b, np.vstack([dot(self._qb, XGdg), dot(self._bt, dg)])) tmp1 = dot(self._qa, tmp[:na,:]) tmp2 = tmp[na:,:] if mutate: res = dg else: res = 1*dg res += np.dot(self.G, tmp1) res -= np.dot(self.X, tmp1) res += np.dot(self._a, tmp2) return res def __rmul__(self, dxt, mutate=False): r"""Return the matrix multiplication `dxt*B`. Parameters ---------- dxt : array This should be in row vector format with dimension `dg.shape = (_,N)`. mutate : bool If `True`, then `dxt` will be mutated, otherwise a copy will be returned. (Note: with the present implementation of :func:`numpy.dot`, one copy of `dxt` will be made). Notes ----- .. math:: \bra{\d{x}}\mat{B} = \bra{\d{x}} + \bra{\d{x}} \begin{pmatrix} \ket{G - X}\mat{q}_{a} & \ket{\uvect{a}} \end{pmatrix} \mat{b} \begin{pmatrix} \mat{q}_{b}\bra{X,G}\\ \bra{\uvect{b}} \end{pmatrix} """ nb = self._qb.shape[0] mb = self._bt.shape[0] tmp = dot(np.hstack([dot(dot(dxt, self.G) - dot(dxt, self.X), self._qa), dot(dxt, self._a)]), self._b) tmp1 = dot(tmp[:, :nb], self._qb) tmp2 = tmp[:, nb:] if 1 == self.method: # The default dot copies dg here, so hopefully it is small. tmp1 = dot(self.X, tmp1.T).T else: assert 2 == self.method tmp1 = dot(self.G, tmp1.T).T if mutate: res = dxt else: res = 1*dxt res += tmp1 res += np.dot(tmp2, self._bt) return res def update(self, x0=None, g0=None): r"""Update the inverse Jacobian so that it is valid at the point `(x, g)`. Return a measure of the incompatibility of the new data with the current approximation. Examples -------- Here we perform some tests with a linear function >>> np.random.seed(0) >>> N = 4 >>> J = np.random.rand(N, N) >>> x0 = np.random.rand(N) >>> def g(x): ... return np.dot(J, x - x0) >>> B = _JinvExtendedBroyden(X=np.zeros((N,1)), ... G=g(0*x0)[:,None]) >>> x = np.random.rand(N) >>> B.update(x, g(x)) >>> np.allclose(B*(g(x) - g(x0)), x - x0) True >>> for n in xrange(N): ... x = np.random.rand(N) ... B.update(x, g(x)) Notes ----- Our goal here is to minimize the amount of processing of large matrices. As such, we try to work with projections as much as possible. We start with the following large matrices: $\ket{X}$, $\ket{G}$, $\ket{x}$, $\ket{g}$, $\ket{a_0}$, $\ket{b_0}$. We use the following: .. 
math:: \ket{\delta\mat{x}} &= \ket{x} - \ket{X_0}, \qquad \ket{\delta\mat{g}} = \ket{g} - \ket{G_0},\\ \ket{\d\mat{X}} &= \ket{\mat{X}} - \ket{x}, \qquad \ket{\d\mat{G}} = \ket{\mat{G}} - \ket{g},\\ \ket{\uvect{\alpha}_0} &= \begin{pmatrix} \ket{\mat{G} - \mat{X}}\mat{q}_{a} & \ket{\mat{A}_0} \end{pmatrix},\\ \bra{\uvect{\beta}_0} &= \begin{pmatrix} \mat{q}_{b}\bra{\mat{X}, \mat{G}} \\ \bra{\mat{B}_0} \end{pmatrix},\\ \ket{\uvect{\alpha}} &= \begin{pmatrix} \ket{\uvect{\alpha}_0} & \ket{\uvect{\alpha}_1} \end{pmatrix},\\ \bra{\uvect{\beta}} &= \begin{pmatrix} \bra{\uvect{\beta}_0}\\ \bra{\uvect{\beta}_1} \end{pmatrix},\\ \ket{\uvect{\alpha}_1} &= \ket{\uvect{\alpha}_0}\mat{c}_{a} + \ket{\delta{g} - \delta{x}}\mat{d}_{a},\\ \bra{\uvect{\beta}_1} &= \mat{c}_{b}^{T}\bra{\uvect{\beta}_0} + \mat{d}_{b}^{T}\bra{\delta{x}, \delta{g}},\\ \mat{d}_{a} &= \mat{v}_{a}\mat{s}_{a}^{-1},\qquad \mat{c}_{a} = -\braket{\uvect{\alpha}_0|\delta{g} - \delta{x}} \mat{v}_{a}\mat{s}_{a}^{-1},\\ \mat{d}_{b} &= \mat{v}_{b}\mat{s}_{b}^{-1},\qquad \mat{c}_{b} = -\braket{\uvect{\beta}_0|\delta{x},\delta{g}} \mat{v}_{b}\mat{s}_{b}^{-1},\\ \mat{v}_{a}\mat{s}_{a}^2\mat{v}_{a}^{T} &= \braket{\delta{g} - \delta{x}|\delta{g} - \delta{x}} - \braket{\delta{g} - \delta{x}|\uvect{\alpha}_0} \braket{\uvect{\alpha}_0|\delta{g} - \delta{x}},\\ \mat{v}_{b}\mat{s}_{b}^2\mat{v}_{b}^{T} &= \braket{\delta{x},\delta{g}|\delta{x},\delta{g}} - \braket{\delta{x},\delta{g}|\uvect{\beta}_0} \braket{\uvect{\beta}_0|\delta{x},\delta{g}},\\ \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}} &= \begin{pmatrix} \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_0} & \braket{\d\mat{G}-\d\mat{X}|\uvect{\alpha}_1} \end{pmatrix},\\ \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0} &= \begin{pmatrix} \braket{\d\mat{G} - \d\mat{X}|\mat{G} - \mat{X}}\mat{q}_{a} & \braket{\d\mat{G} - \d\mat{X}|\mat{A}_0} \end{pmatrix},\\ \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_1} &= \braket{\d\mat{G} - \d\mat{X}|\uvect{\alpha}_0}\mat{c}_{a} + \braket{\d\mat{G} - \d\mat{X}|\delta{g}-\delta{x}}\mat{d}_{a},\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}} &= \begin{pmatrix} \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} & \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} \end{pmatrix} ,\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0} &= \begin{pmatrix} \braket{\d\mat{X},\d\mat{G}|\mat{X},\mat{G}}\mat{q}_{b}^{T} & \braket{\d\mat{X},\d\mat{G}|\mat{B}_0} \end{pmatrix},\\ \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_1} &= \braket{\d\mat{X},\d\mat{G}|\uvect{\beta}_0}\mat{c}_{b} + \braket{\d\mat{X},\d\mat{G}|\delta{x},\delta{g}}\mat{d}_{b},\\ \braket{\uvect{\alpha}|\uvect{\alpha}_0} &= \braket{\uvect{\beta}|\uvect{\beta}_0} = \begin{pmatrix} \mat{1}\\ \mat{0} \end{pmatrix},\\ \braket{\beta|\alpha} &= \begin{pmatrix} \mat{q}_{b}\braket{\mat{X}, \mat{G}| \mat{G} - \mat{X}}\mat{q}_{a} & \mat{q}_{b}\braket{\mat{X}, \mat{G}| \mat{G} - \mat{X}}\mat{q}_{a} \\ \end{pmatrix} """ # First we cast all of the arrays as matrices so we can # use * as matrix multiplication. Large matrices are # preceeded by an underscore. x = np.asarray(x0) g = np.asarray(g0) if 1 == len(x.shape): x = x[:,None] if 1 == len(g.shape): g = g[:,None] _X = np.asmatrix(self.X) _G = np.asmatrix(self.G) _b0 = np.asmatrix(self._bt.T) _a0 = np.asmatrix(self._a) qa = np.asmatrix(self._qa) qb = np.asmatrix(self._qb) # We assume that g is only one column, so a copy here is okay. 
_dg = np.asmatrix(g - _G[:,0]) _dx = np.asmatrix(x - _X[:,0]) _dgdx = _dg - _dx # Consider optimizing these copies away later _dG = np.asmatrix(_G - g) _dX = np.asmatrix(_X - x) _dGdX = _dG - _dX _GX = np.asmatrix(_G - _X) if 1 == self.method: _dxg = _dx _XG = np.asmatrix(_X) _dXG = _dX else: _dxg = _dg _XG = np.asmatrix(_G) _dXG = _dG # These are all of the brackets required. dXdX = _dX.T*_dX dGdG = _dG.T*_dG dXG_XG = _dXG.T*_XG dXG_dxg = _dXG.T*_dxg dXG_b0 = _dXG.T*_b0 dxg_XG = _dxg.T*_XG dxg_GX = _dxg.T*_GX dxg_dxg = _dxg.T*_dxg dxg_b0 = _dxg.T*_b0 dxg_a0 = _dxg.T*_a0 dGdX_a0 = _dGdX.T*_a0 GX_a0 = _GX.T*_a0 XG_GX = _XG.T*_GX X_a0 = _X.T*_a0 g_a0 = g.T*_a0 x_a0 = x.T*_a0 XG_a0 = _XG.T*_a0 b0_a0 = _b0.T*_a0 GX_b0 = _GX.T*_b0 dGdX_GX = _dGdX.T*_GX dGdX_dgdx = _dGdX.T*_dgdx dgdx_GX = _dgdx.T*_GX dgdx_XG = _dgdx.T*_XG dgdx_a0 = _dgdx.T*_a0 dxg_dgdx = _dxg.T*_dgdx dgdx_dgdx = _dgdx.T*_dgdx dgdx_b0 = _dgdx.T*_b0 # Now we form the small matrices dxg_beta0 = np.hstack([dxg_XG*qb.T, dxg_b0]) ub, sb, _ = np.linalg.svd(dxg_dxg - dxg_beta0*dxg_beta0.T) # Only keep significant basis vectors inds = np.where(sb >= _SINGULAR_TOL)[0] sb = sb[inds] ub = ub[:, inds] db = np.asmatrix(ub/(np.sqrt(sb)[None, :] + _SINGULAR_TOL**2)) cb = -dxg_beta0.T*db dXG_beta0 = np.hstack([dXG_XG*qb.T, dXG_b0]) dXG_beta1 = dXG_beta0*cb + dXG_dxg*db dXG_beta = np.hstack([dXG_beta0, dXG_beta1]) dGdX_a0 = GX_a0 - (g_a0 - x_a0) dgdx_alpha0 = np.hstack([dgdx_GX*qa, dgdx_a0]) ua, sa, _ = np.linalg.svd(dgdx_dgdx - dgdx_alpha0*dgdx_alpha0.T) # Only keep significant basis vectors (assuming everything has # been scaled to be of unit norm here...) inds = np.where(sa >= _SINGULAR_TOL)[0] sa = sa[inds] ua = ua[:, inds] da = np.asmatrix(ua/(np.sqrt(sa)[None, :] + _SINGULAR_TOL**2)) ca = -dgdx_alpha0.T*da dGdX_alpha0 = np.hstack([dGdX_GX*qa, dGdX_a0]) dGdX_alpha1 = dGdX_alpha0*ca + dGdX_dgdx*da dGdX_alpha = np.hstack([dGdX_alpha0, dGdX_alpha1]) # the basis vectors alpha and beta are stacked from three sets # with sizes n, m0 and (m-m0) N, n_ = self.X.shape na0, na = self._qa.shape nb, nb0 = self._qb.shape ma0 = self._a.shape[1] # Previous size of aux space mb0 = self._bt.shape[0] ma = ma0 + ca.shape[1] # New size of aux space mb = mb0 + cb.shape[1] eye0 = np.asmatrix(np.eye(na + ma0, nb + mb0, dtype=float)) eye1 = np.asmatrix(np.eye(na + ma, nb + mb, dtype=float)) alpha_alpha0 = np.asmatrix(np.eye(na + ma, na + ma0, dtype=float)) beta_beta0 = np.asmatrix(np.eye(nb + mb, nb + mb0, dtype=float)) beta0_dgdx = np.vstack([qb*dgdx_XG.T, dgdx_b0.T]) dxg_alpha0 = np.hstack([dxg_GX*qa, dxg_a0]) beta0_alpha0 = np.bmat( [[qb*XG_GX*qa, qb*XG_a0], [GX_b0.T*qa, b0_a0]]) beta0_alpha1 = beta0_alpha0*ca + beta0_dgdx*da beta1_alpha0 = cb.T*beta0_alpha0 + db.T*dxg_alpha0 beta1_alpha1 = (cb.T*(beta0_alpha0*ca + beta0_dgdx*da) + db.T*(dxg_alpha0*ca + dxg_dgdx*da)) beta_alpha = np.bmat([[beta0_alpha0, beta0_alpha1], [beta1_alpha0, beta1_alpha1]]) ############################################################ # Now we have the bases and all the required matrices, so we # do the real work. 
def inv(A): if 0 == np.prod(A.shape): # Empty matrix needs a degenerate case return A.T return np.linalg.pinv(A, rcond=_SINGULAR_TOL**2) w, w0, wr = self.get_w(dXdX, dGdG) if w is not None: w = np.asarray(w) if 1 == len(w.shape): dGdX_alpha *= w[:, None] dXG_beta *= w[:, None] else: dGdX_alpha = dot(w.T, dGdX_alpha) dXG_beta = dot(w.T, dXG_beta) if 1 == self.method: num = dGdX_alpha.T*dXG_beta if self._b is not None: j0 = -inv(inv(self._b) + beta0_alpha0) num += w0*w0*alpha_alpha0*j0*beta_beta0.T denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb) j = num*inv(denom) b = -inv(inv(j) + beta_alpha) elif 2 == self.method: num = -dGdX_alpha.T*dXG_beta if self._b is not None: num += w0*w0*alpha_alpha0*self._b*beta_beta0.T denom = dXG_beta.T*dXG_beta + (w0*w0 + wr*wr)*np.eye(nb+mb) b = num*inv(denom) # Set singular values to zero so they don't contribute. if 0 == np.prod(b.shape): # There is no dyadic contribution. b = None _a = None _bt = None else: u, d, vt = np.linalg.svd(b, full_matrices=False) bad_inds = np.where(np.logical_or(d < _SINGULAR_TOL, 1/_SINGULAR_TOL < d))[0] d[bad_inds] = 0.0 b = dot(np.multiply(u,d[None, :]), vt) # Now that we have b, we perform an svd in the auxiliary basis to # determine what basis vectors to keep. qa_, qbt_ = self._restrict(b, na, nb) b[na:, nb:] = qa_*b[na:, nb:]*qbt_ # na + ma0 ma - ma0 # |alpha> = [alpha0, alpha0*ca + dgdx*da] # = [GX*qa, [a0, [GX*qa, a0]*ca + dgdx*da]] # = [GX*qa, A_new] # = [GX*qa, [GX, a0, dgdx]*A] # |A_new> = [GX, a0, dgdx]*A # A_[na_ + ma0 + (ma - ma0), ma = ma0 + (ma-ma0)] # A = [[0, qa*ca] # [1, ca] # [0, da]] A = np.bmat( [[np.zeros((na0, ma0)), qa*ca[:na,:]], [np.eye(ma0, ma0), ca[na:,:]], [np.zeros((da.shape[0], ma0)), da]]) Bt = np.bmat( [[np.zeros((nb0, mb0)), qb.T*cb[:nb,:]], [np.eye(mb0, mb0), cb[nb:,:]], [np.zeros((db.shape[0], mb0)), db]]).T A = A*qa_.T Bt = qbt_.T*Bt if ma > self.max_aux_basis_size: ma = self.max_aux_basis_size A = A[:, :ma] b = b[:ma, :] if mb > self.max_aux_basis_size: mb = self.max_aux_basis_size Bt = Bt[:mb, :] b = b[:, :mb] _a = np.hstack([_GX, _a0, _dgdx])*A _bt = Bt*np.vstack([_XG.T, _b0.T, _dxg.T]) self.suspend() self._a = _a self._bt = _bt self._b = b # These are not needed: just for checking and debugging _alpha0 = np.hstack([_GX*qa, _a0]) _alpha1 = _alpha0*ca + _dgdx*da _alpha = np.hstack([_alpha0, _alpha1]) self.resume() def get_w(self, dXdX, dGdG): r"""Return weights `w, w0, wr`. Returns ------- w : array Array of weights for `dx`. """ N, n_ = self.X.shape ma = self._a.shape[1] mb = self._bt.shape[0] na = self._qa.shape[1] nb = self._qb.shape[0] w = np.eye(dXdX.shape[0]) w0 = _SINGULAR_TOL wr = 0 return w, w0, wr def _restrict(self, b, na=0, nb=0): r"""Return unitary matrices `(qa, qb.T)` which can be used to transform `b` so that truncating the matrix will introduce minimal errors. The first `n0` rows and columns of `b` will be held fixed. The present implementation is an approximation: it is not optimal. Parameters ---------- b : array of shape (n,n) Array to be rearranged and truncated. na, nb : int Number of rows and columns to be held fixed. If this is `0`, then the SVD will be used: ``qa, d, qb.T = svd(b)``. """ ua, da, vat = np.linalg.svd(b[na:, :]) ub, db, vbt = np.linalg.svd(b[:, nb:]) return ua, vbt @staticmethod def _ortho(a0, b): r"""Return `(A, B, s)` such that `a1 = dot(a0, A) + dot(b, B)` is orthonormal and orthogonal to `a0`. Parameters ---------- a0 : array The columns of this must be orthonormal. b : array Need not be orthonormal. 
Returns ------- A, B : array s : array This is the list of singular values for the transformation. If these are too small, then the corresponding dimensions should be dropped. I.e. use only `a1[:, np.where(tol < s)[0]]` Examples -------- >>> np.random.seed(0) >>> a0 = np.linalg.qr(np.random.rand(10, 5))[0][:,:5] >>> b = np.random.rand(10, 2) >>> A, B, s = JinvExtendedBroyden._ortho(a0, b) >>> a1 = np.dot(a0, A) + np.dot(b, B) >>> np.allclose(np.dot(a1.T, a0), 0) True >>> np.allclose(np.dot(a1.T, a1), np.eye(2)) True Notes ----- If `a0` and `b` are big (in the first dimension), then it is best if `a0` is `F_CONTIGUOUS` (i.e. the transpose of a `C_CONTIGUOUS` array) and `b` is `C_CONTIGUOUS` to avoid :func:`np.dot` from making unneeded copies. (One copy will always be made when computing `dot(b.T, b)`). Uses the relationship .. math:: \braket{\beta|\beta} - \braket{\beta|\uvect{\alpha}_{0}} \braket{\uvect{\alpha}_{0}|\beta} &= \mat{v} \mat{\sigma}^2 \mat{v}^{T},\\ \begin{pmatrix} \ket{\uvect{\alpha}_{0}} & \ket{\uvect{\alpha}_{1}} \end{pmatrix} &= \begin{pmatrix} \ket{\uvect{\alpha}_{0}} & \ket{\beta} \end{pmatrix} \cdot \begin{pmatrix} \mat{1} & -\braket{\uvect{\alpha}_{0}|\beta}\mat{v} \mat{\sigma}^{-1}\\ \mat{0} & \mat{v}\mat{\sigma}^{-1} \end{pmatrix}. """ a0_b = dot(a0.T, b) s2, v = eigh(dot(b.T, b) - dot(a0_b.T, a0_b)) s = np.sqrt(s2) A = -np.dot(a0_b, v/s[None, :]) B = v/s[None, :] return (A, B, s) def asarray(self): """Return the Jacobian matrix. Examples -------- >>> M = _JinvExtendedBroyden() >>> M.asarray() array(1.0) """ if np.prod(self._b.shape) == 0: return np.array(1.0) _alpha = np.hstack([dot(self.G - self.X, self._qa), self._a]) if 1 == self.method: _XG = self.X else: _XG = self.G _betat = np.vstack([dot(self._qb, _XG.T), self._bt]) M = dot(dot(_alpha, self._b), _betat) return M + np.eye(self.X.shape[0], self.G.shape[0], dtype=float)
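
############################################################
# Illustrative sketch (an assumption of this edit, not part of the
# module's API): the textbook two-point Broyden rank-1 updates of the
# inverse Jacobian that the limited-memory classes in this module
# generalize.  The helper name `_broyden_rank1_update` and its
# signature are hypothetical and are not used elsewhere.
def _broyden_rank1_update(j_inv, dx, dg, minimize_inverse=False):
    r"""Return the secant (rank-1 Broyden) update of `j_inv`.

    Both choices satisfy the secant condition
    ``dot(j_inv_new, dg) == dx``.  If `minimize_inverse` is `True`,
    the update minimizes the change in the Frobenius norm of the
    inverse Jacobian (``<v| = <dg|``, the "bad" method); otherwise it
    minimizes the change in the Jacobian and the inverse is obtained
    through the Sherman-Morrison formula (``<v| = <dx|B``, the "good"
    method).

    Examples
    --------
    >>> np.random.seed(0)
    >>> J = np.random.rand(4, 4) + 4*np.eye(4)
    >>> dx = np.random.rand(4)
    >>> dg = np.dot(J, dx)
    >>> B = _broyden_rank1_update(np.eye(4), dx, dg)
    >>> np.allclose(np.dot(B, dg), dx)
    True
    """
    j_inv = np.asarray(j_inv, dtype=float)
    dx = np.asarray(dx).ravel()
    dg = np.asarray(dg).ravel()
    if minimize_inverse:
        vt = dg                    # minimizes the inverse Jacobian change
    else:
        vt = np.dot(dx, j_inv)     # minimizes the Jacobian change
    # B -> B + (|dx> - B|dg>)<v| / <v|dg>
    return j_inv + np.outer(dx - np.dot(j_inv, dg), vt)/np.dot(vt, dg)
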
class JinvBroyden(StateVars):
    r"""Class to represent an inverse Jacobian for the Generalized
    Broyden method.

    This is intended to solve root-finding problems of the form

    .. math::
       \vect{G}(\vect{x}) = 0.

    This is designed for large scale problems where it is typically
    prohibitive to store the entire Jacobian, so a limited memory
    approximation is typically made by using :attr:`dyadic` and
    :attr:`n_prev`.  As a result, we assume that the problem is
    preconditioned so that the Jacobian is well approximated by the
    identity.

    .. math::
       \mat{J} \approx \mat{1}.

    In other words, we implicitly expect that the problem is
    preconditioned in such a way that

    .. math::
       \vect{G}(\vect{x}) = \vect{x} - \vect{F}(\vect{x})

    where the iteration $\vect{x} \mapsto \vect{F}(\vect{x})$ is
    "almost" convergent.  This assumption plays out in much of the
    internal analysis about when behaviour is singular.

    The core of this is one of the following two updates to the
    inverse Jacobian:

    .. math::
       \mat{B}_{\text{1a)}} &= \mat{B}_{0} +
          (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
          \bra{\d\mat{X}}\cdot\frac{1}{
            \mat{B}_{0}\ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{X}}
            + w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}\cdot\mat{B}_{0}, \\
       \mat{B}_{\text{2b)}} &= \mat{B}_{0} +
          (\ket{\d\mat{X}} - \mat{B}_{0}\ket{\d\mat{G}})\cdot\mat{w}^2
          \bra{\d\mat{G}}\cdot\frac{1}{
            \ket{\d\mat{G}}\mat{w}^2\bra{\d\mat{G}}
            + w_{0}^2\mat{\Gamma}\mat{\Gamma}^{T}}

    where the weights $\mat{w}$ are computed using :meth:`_get_weight`
    and the inversion of the denominator is computed using
    :meth:`_inversion`.  The displacements are computed from the
    current step $\ket{x}$ and $\ket{g}$ and the previous steps
    $\ket{\mat{X}}$ and $\ket{\mat{G}}$, of which we store the
    previous :attr:`n_prev` steps.

    .. math::
       \ket{\d\mat{X}} &= \ket{x} - \ket{\mat{X}}, \\
       \ket{\d\mat{G}} &= \ket{g} - \ket{\mat{G}}.

    Examples
    --------
    We start with a linear function.  The Jacobian should fully
    represent this.

    >>> np.random.seed(0)
    >>> A = np.random.random((3, 3))
    >>> Ainv = np.linalg.inv(A)
    >>> b = np.random.random(3)
    >>> x0 = np.random.rand(3)
    >>> def g(x):
    ...     return np.dot(A, x) + b

    For both :attr:`method` "good" (`'J'`) and "bad" (`'B'`) the
    :attr:`inv_method` `'svd'` reproduces the Jacobian once the full
    space is sampled:

    >>> B = JinvBroyden(dyadic=True, inv_method='svd')
    >>> for B.method in ['J', 'B']:
    ...     B.reset_all()
    ...     x_ = 1*x0
    ...     for n in xrange(4):
    ...         g_ = g(x_)
    ...         incompatibility = B.set_point(x=x_, g=g_)
    ...         x_ = B.get_x()
    ...     np.allclose(B.j_inv.asarray(), Ainv)
    True
    True

    The :attr:`inv_method` `'w0'` does not exactly reproduce this as
    it "remembers" the initial Jacobian $\mat{1}$, but the results are
    almost correct if `w0` is small enough:

    >>> B.inv_method = 'w0'
    >>> B.x_method = 'recent'
    >>> B.w0 = 0.0001
    >>> for B.method in ['J', 'B']:
    ...     B.reset_all()
    ...     x_ = 1*x0
    ...     for n in xrange(4):
    ...         g_ = g(x_)
    ...         incompatibility = B.set_point(x=x_, g=g_)
    ...         x_ = B.get_x()
    ...     # abs(B.j_inv.asarray() - Ainv).max()
    ...     print(np.allclose(B.j_inv.asarray(), Ainv),
    ...           np.allclose(B.j_inv.asarray(), Ainv, atol=0.001),
    ...           np.allclose(B.j_inv.asarray(), Ainv, atol=0.003))
    (False, True, True)
    (False, False, True)

    This also demonstrates that the "good" method 1a) converges better
    to the Jacobian.

    Here we start with a non-trivial Jacobian but no vectors `xs` and
    then update.  This happens, for example, with :meth:`change_basis`.
We only keep `xs` and `gs` for exact `x = g(x)` and this is violated by linear changes in the basis, so we discard the points but keep the Jacobian. >>> np.random.seed(0) >>> B = JinvBroyden(dyadic=False, inv_method='svd', method='B') >>> B0 = np.random.random((4, 4)) >>> B.j_inv = 1*B0 >>> x = np.random.random(4) >>> g = np.random.random(4) >>> B.set_point(x=x, g=g) 0.0 This does not yet modify the Jacobian because we have only one point >>> abs(B0 - B.j_inv).max() 0.0 Now we update the Jacobian >>> dx = np.random.random(4) >>> dg = np.random.random(4) >>> B.set_point(x=x+dx, g=g+dg) # doctest: +ELLIPSIS 0.79... The new Jacobian now fits the slope properly, >>> np.allclose(B.j_inv*dg[:, None], dx[:, None]) True But has not been modified in orthogonal directions (only true for :attr:`method` `'B'`). >>> dg_perp = np.linalg.qr(np.vstack([dg, np.random.random(4)]).T)[0][:, 1] >>> np.allclose(np.dot(dg_perp, dg), 0) True >>> np.allclose(np.dot(B0, dg_perp[:, None]), B.j_inv*dg_perp[:, None]) True If you want to reproduce the original Broyden algorithms, then one should set: :attr:`n_prev` to `2`, :attr:`x_method` to `'recent'`, :attr:`inv_method` to `'svd'`, and :attr:`replace_method` to `'oldest'`. """ implements(IJinv) _state_vars = [ ('curr', None,\ r"""Index of current point in :attr`:xs`. The Jacobian was updated to be valid at this point."""), ('xs', None,\ r"""Previous abscissa stored as row-vectors $\bra{\mat{X}}$"""), ('gs', None,\ r"""Previous `g = G(x)` stored as row-vectors $\bra{\mat{G}}$"""), ('n_prev', 10,\ r"Number of previous steps to store."), ('method', 'J', r"""Denotes which method of update is used in :meth:`update` (see for details)."""), ('x_method', 'recent', r"""Denotes which method is used to determine the next abscissa. The two options determine at which point :meth:`update` updates the Jacobian inverse to be valid and stores this index in :attr:`curr`. This is the point from which :meth:`get_x` takes a downhill step. See :meth:`update` for possible options. .. warning:: Methods other than `'recent'` should be used with caution. It is possible to get stuck into a loop here where the step does not update the Jacobian and so the same step is repeated over and over. See the section :ref:`sec-notes`."""), ('replace_method', 'oldest', r"""Determines which point is replaced in order to satisfy the memory-limitation :attr:`n_prev`. See :meth:`add_point` for possible values. .. warning:: Methods other than `'oldest'` should be used with caution. It is possible to get stuck into a loop here where newest point is continually replaced, making no progress. See yje section :ref:`sec-notes`."""), ('inv_method', 'svd', r"""Method used by :meth:`_inversion` to regulate singular denominators. See :meth:`_inversion` for options."""), ('min_svd', 1e-8, r"""Minimum singular value allowed in :meth:`_inversion` for the 'svd' :attr:`inv_method`. This parameter does not seem to be too sensitive although it may be in problems where the Jacobian is singular."""), ('w0', 1, r"""Weight for minimal Jacobian change. Large values reduce the influence of the previous Jacobian inverse."""), ('w_min', np.sqrt(_EPS), r"""Minimum weight (to ensure that $\mat{w}^2$ is invertible)."""), ('dyadic', False, r"""If `True`, then use a dyadic representation for Jinv. 
         If this is a number greater than 1, then this number is
         passed as :attr:`DyadicSum.n_max` limiting the rank of the
         dyadic representation."""),
        ('j_inv', 1, "Internal representation of matrix"),
        ('robust', False,
         """If `True`, then use the singular value decomposition to
         analyse the Jacobian.  Much slower, but could prevent some
         numerical problems."""),
        ('max_j_inv', 100.0,
         """If :attr:`robust`, then the inverse Jacobian is clipped
         so that the maximum value is this."""),
        ('min_j_inv', 0.0001,
         """The inverse Jacobian can become singular.  If this is
         the case, then a step might have very small length even if
         the error is not small (the step size can be reduced by a
         factor of the minimum singular value of the inverse
         Jacobian).  If the step reduction is less than this value,
         then the step is scaled up to have length
         :attr:`min_step`."""),
        ('minimize_inverse', False,
         """If `True`, then use the update that minimizes the change
         in the Frobenius norm of the inverse Jacobian, otherwise we
         use the Sherman-Morrison formula to update so as to
         minimize the norm of the Jacobian."""),
        ('incompatibility_measure', 'dg',
         r"""Specifies which criterion to use for measuring the
         compatibility of an update with the present inverse
         Jacobian.  See :attr:`incompatibility` for supported
         methods."""),
        ('check_roundoff', True,
         r"""If `True`, then do some extra calculations to check for
         roundoff error."""),
        ('roundoff_cut', 0.001,
         r"""If the estimated relative rounding errors in dg or dx
         are greater than this and :attr:`check_roundoff`, then
         these are not used."""),
        ('_j_inv_transform', NotImplemented,
         """Optional methods for updating the Jacobian inverse in
         :meth:`change_basis`.  For testing only."""),
        ('_dx_norm2_min', _EPS**2,
         r"""Used in :meth:`update`.  If the relative L2 norm
         squared of `x0 - x` is less than this, then the point `x`
         is removed from the analysis.  This prevents naked
         singularities in `dx`.

         .. warning:: This parameter is crucial to obtain good
            convergence.  If it is too large, then the nearest --
            most significant -- points will not be used, resulting
            in poor convergence.
If it is too small, then ill-condition updates to the Jacobian may be made with no information content because of round off errors."""), ('_age', None, r"""Age of the various points so that one can identify the oldest."""), ('_x0', None, "Internal storage for :attr:`x0`"), ('_g0', None, "Internal storage for :attr:`g0`"), ] process_vars() def __init__(self, *v, **kw): # pylint: disable-msg=W0231 if self.dyadic: n_max = np.inf if self.dyadic > 1: n_max = self.dyadic if self.j_inv is 1: self.j_inv = DyadicSum(n_max=n_max) elif isinstance(self.j_inv, DyadicSum): self.j_inv.n_max = n_max else: raise NotImplementedError( "No support yet for initializing a" + " DyadicSum from an array.") elif self.j_inv is not 1: self.j_inv = np.asmatrix(self.j_inv) @property def x(self): if self.xs is not None and self.curr is not None: return self.xs[self.curr] else: return None @property def g(self): if self.gs is not None and self.curr is not None: return self.gs[self.curr] else: return None @property def x0(self): if self._x0 is None: return self.x else: return self._x0 @property def g0(self): if self._g0 is None: return self.g else: return self._g0 def __mul__(self, dg): if self.j_inv is 1: return dg else: return np.asarray(self.j_inv.__mul__(dg)) def __rmul__(self, dxt): if self.j_inv is 1: return dxt else: return self.j_inv.__rmul__(dxt) def _get_weight(self, dx, dg): r"""Return `(w0, w)` where $\mat{w}$ is the non-singular weight matrix and $w_0$ is the weight for the Jacbian residual. Parameters ---------- dx, dg : array `(n, N)` arrays of previous differences $\bra{\d{X}} = \bra{x} - \bra{\mat{X}}$ (and similarly for $G$) where $\bra{x}$ is the current step and $\bra{X}$ are the previous steps. Notes ----- Currently the weights are implemented as .. math:: w_{ij} = w_{\text{max}}^{-1} \delta_{ij}\frac{1}{\braket{\d\mat{X}|\d\mat{X}}_{ii}} These are normalized so that the largest weight is 1 and all weights are larger than :attr:`w_min`. """ w = np.maximum(self.w_min, 1/(np.diag(dot(dx, dx.T))**2 + _TINY)) w = 0.0*w + 1.0; return (self.w0, np.diag(w/w.max())) def _inversion(self, dx, dg, j_inv): r"""Return `(ket_a, bra_b)` where .. math:: \ket{a} &= \ket{\d{\mat{X}}} - \mat{B}_{0}\ket{\d{\mat{G}}}\\ \bra{b} &= \bra{\mat{B}} \frac{1}{\ket{\mat{A}}\bra{\mat{B}} + w_0^2\mat{\Gamma}\mat{\Gamma}^T}\mat{C} 1a) : .. math:: \bra{\mat{B}} = \mat{w}^2\bra{\d\mat{X}}, \qquad \ket{\mat{A}} = \mat{B}_{0}\ket{\d\mat{G}}, \qquad \mat{C} = \mat{B}_{0}, 2b) : .. math:: \bra{\mat{B}} = \mat{w}^2\bra{\d\mat{G}}, \qquad \ket{\mat{A}} = \ket{\d\mat{G}}, \qquad \mat{C} = \mat{1} Parameters ---------- dx : 2-d array `(N, n)` array $\ket{\d{\mat{X}}}$. dg : 2-d array `(N, n)` array $\ket{\d{\mat{G}}}$. j_inv : operator `(N, N)` operator $\mat{B}_{0}$ representing the previous inverse Jacobian. This is used to compute the residue and to implement the 1a) `w0` method. Notes ----- The following methods permitted as specified by :attr:`inv_method`: `'svd'` : Subspace decomposition based on the singular value decomposition and corresponds to the choice .. math:: \mat{\Gamma} = \ket{\uvect{B}_{\perp}} which is the orthonormal complement to the span $\ket{\uvect{B}}$ of $\ket{\mat{B}}$. In the basis $\left(\ket{\uvect{B}}, \ket{\uvect{B}_{\perp}}\right)$, the matrix $\ket{\mat{A}}\bra{\mat{B}}$ is block lower triangular: .. math:: \begin{pmatrix} \braket{\uvect{B}|\mat{A}} \braket{\mat{B}|\uvect{B}} & \mat{0}\\ \braket{\uvect{B}_{\perp}|\mat{\tilde{A}}} \braket{\mat{B}|\uvect{B}} & w_{0}^2\mat{1} \end{pmatrix}^{-1}. 
Hence the inverse is also block lower triangular so that the action of $\bra{\mat{B}}$ will project out only the upper component and we can ignore $w_{0}$, writing: .. math:: \bra{b} &= \braket{\mat{B}|\mat{A}}^{-1} \bra{\mat{B}}\mat{C}\\ \bra{b}_{\text{1a}} &= \left( \mat{w}^2\braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}} \right)^{-1} \mat{w}^2\bra{\d\mat{X}}\mat{B}_{0} = \braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}}^{-1} \bra{\d\mat{X}}\mat{B}_{0}\\ \bra{b}_{\text{2b}} &= \left( \mat{w}^2\braket{\d\mat{G}|\d\mat{G}} \right)^{-1} \mat{w}^2\bra{\d\mat{G}} = \braket{\d\mat{G}|\d\mat{G}}^{-1} \bra{\d\mat{G}} This makes it clear that the weights are irrelevant in this method: They are only significant if the system is over-determined or for restricting the subspace. `'w0'` : Here we set $\mat{\Gamma} = w_0\mat{1}$. This is what is done in [Johnson:1988]_. The inversion formula becomes: .. math:: \bra{b} &= \frac{1}{ \braket{\mat{B}|\mat{B}_{0}|\mat{A}} + w_0^2\mat{1}} \bra{\mat{B}}\mat{C}\\ \bra{b}_{\text{1a}} &= \frac{1}{ \mat{w}^2\braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}} + w_0^2\mat{1}} \mat{w}^2\bra{\d\mat{X}}\mat{B}_{0} = \frac{1}{ \braket{\d\mat{X}|\mat{B}_{0}|\d\mat{G}} + w_0^2\mat{w}^{-2}}\bra{\d\mat{X}}\mat{B}_{0}\\ \bra{b}_{2b} &= \frac{1}{ \mat{w}^2\braket{\d\mat{G}|\d\mat{G}} + w_0^2\mat{1}} \mat{w}^2\bra{\d\mat{G}} = \frac{1}{ \braket{\d\mat{G}|\d\mat{G}} + w_0^2\mat{w}^{-2}}\bra{\d\mat{G}} `'numeric'` : This implements the numerical method. This is the same as the `'w0'` method except that `w0` is chosen to be as small as possible to ensure that the smallest singular value of the denominator is larger than :attr:`min_svd` times the largest. .. note:: A desirable property is that the update work for perfectly linear models in the sense that, give a linear model: $\ket{\mat{X}} = \ket{\mat{X}_0} + \mat{J}^{-1}\ket{\mat{G}}$, the update will satisfy $\mat{B}\ket{\mat{G}} = \mat{J}^{-1}\ket{\mat{G}}$: In other words, the action of $\mat{B}$ and $\mat{A}$ are the same on the subspace $\ket{\uvect{G}}$ spanned by $\ket{\d\mat{G}}$. This require that: .. math:: \ket{\mat{G}}\braket{b|\uvect{G}} = \ket{\mat{G}}\bra{\mat{B}} \frac{1}{\ket{\mat{A}}\bra{\mat{B}} + w_0^2\mat{\Gamma}\mat{\Gamma}^{T}}\mat{C} \ket{\uvect{G}} = \ket{\uvect{G}} which is satisfied by both `'svd'` methods but only approximated by the `'w0'` method in the limit $w_{0} \rightarrow 0$. One potential problem is that the weights might be very small, cause a premature truncation of the SVD, however, a separate decomposition of the components fails to preserve the desired behaviour: >>> np.random.seed(0) >>> N = 10; n=5; m = 1 >>> A = np.mat(np.random.rand(N, N)) >>> B0 = np.mat(np.random.rand(N, N)) >>> B0 = 0*np.eye(N) >>> G = np.random.rand(N, n-m) >>> G = np.mat(np.dot(np.hstack((G, ) + (G[:, :1], )*m), ... np.random.rand(n, n))) >>> Ug, dg, VTg = np.linalg.svd(G); Ug = Ug[:, :n]; VTg = VTg[:n, :] >>> Pg = Ug[:, :n-m] >>> X = A*G >>> b = X >>> a = G >>> w = np.mat(np.diag(np.random.rand(n))) >>> wi = np.linalg.inv(w) >>> Qa, Ra = np.linalg.qr(a) >>> Qb, Rb = np.linalg.qr(b) >>> U, d, VT = np.linalg.svd(Ra*w*Rb.T) >>> di = np.where(d > 1e-12, 1/d, 0) # Singular portions. >>> Inv1 = Qb*VT.T*np.diag(di)*U.T*Qa.T >>> np.allclose(G*w*b.T*Inv1*Pg, Pg) True >>> Ua, da, VTa = np.linalg.svd(a); Ua = Ua[:, :n]; VTa = VTa[:n, :] >>> Ub, db, VTb = np.linalg.svd(b); Ub = Ub[:, :n]; VTb = VTb[:n, :] >>> dai = np.where(da > 1e-12, 1/da, 0) # Singular portions. >>> dbi = np.where(db > 1e-12, 1/db, 0) # Singular portions. 
>>> Inv2 = Ub*np.diag(dbi)*VTb*wi*VTa.T*np.diag(dai)*Ua.T >>> np.allclose(G*w*b.T*Inv2*Pg, Pg) False """ w0, w = self._get_weight(dx=dx, dg=dg) w2 = dot(w.T, w) at = dg.T if 'J' == self.method: # Broyden's "good" method if self.dyadic: b = j_inv.__rmul__(dx) else: b = dot(dx, j_inv) elif 'B' == self.method: # Broyden's "bad" method b = dg else: raise ValueError("Method 'method' = %s not supported." % (self.method, )) # Include weights b = np.asarray(dot(w2, b)) if 'svd' == self.inv_method: # Should optimize so that copies are not made. u, d, vt = np.linalg.svd(dot(b, at)) d_inv = np.where(d/d.max() < self.min_svd, 0, 1/d) inv = dot(vt.T*d_inv, u.T) bra_b = dot(inv, b) elif self.inv_method in ['w0', 'numeric']: # Johnson's method den = dot(b, at) if 'numeric' == self.inv_method: eigs = np.linalg.eigvals(den) w02 = self.min_svd*abs(eigs).max() - min(0, eigs.real.min()) else: w02 = w0**2 den += w02*np.eye(len(b)) bra_b = np.linalg.solve(den, b) else: raise ValueError("Method 'inv_method' = %s not supported." % (self.inv_method, )) ket_a = dx.T - np.asarray(j_inv*np.mat(dg.T)) return ket_a, bra_b def add_point(self, x, g): r"""Add point `(x, g)` to :attr:`xs` and :attr:`gs` and return the index. Notes ----- If using a memory limited version, only :attr:`n_prev` points are maintained. The order in which points are replaced is specified by :attr:`replace_method` which may have one of the following values: `'furthest'`: Replace the point that is furthest from `x`, thus keeping points close to the solution. `'largest'`: Replace the point that has the largest $\norm{g}_{2}$, thus keeping points close to the solution. `'oldest'`: Replace the oldest point, thus keeping the most recent points. Examples -------- >>> np.random.seed(0) >>> B = JinvBroyden() >>> x0 = np.random.random(3) >>> g0 = 2*x0 >>> B.add_point(x=x0, g=g0) 0 >>> B.add_point(x=2*x0, g=2*g0) 1 Adding a duplicate `x` value will just change `g`: >>> len(B.xs) 2 >>> B.add_point(x=x0, g=g0/2) 0 >>> len(B.xs) 2 >>> np.allclose(B.gs[0], g0/2) True If adding the point would make more than :attr:`n_prev` points, then either the `'furthest'` point from `x` or the point with `'largest'` `g` is replaced, as specified by the flag :attr:`replace_method`. >>> B.n_prev = 2 >>> B.replace_method = 'furthest' >>> B.add_point(x=3*x0, g=0*g0) 0 >>> B.replace_method = 'largest' >>> B.add_point(x=3.1*x0, g=1*g0) 1 To reproduce the original Broyden behaviour, one should use the `'oldest'` :attr:`replace_method`: >>> B = JinvBroyden(replace_method='oldest', n_prev=3) >>> for n in xrange(5): ... B.add_point(x=n*x0, g=n*g0), B._age (0, array([0])) (1, array([1, 0])) (2, array([2, 1, 0])) (0, array([0, 2, 1])) (1, array([1, 0, 2])) Here is an example of the original Broyden behaviour (these results were precomputed using original Broyden routines): >>> def g(x): return np.array([np.linalg.norm(x), x.sum()]) >>> B = JinvBroyden(dyadic=False, method='J', n_prev=2, ... x_method='recent', replace_method='oldest', ... inv_method='svd') >>> x = np.array([1, 2]) >>> for n in xrange(12): ... incompat = B.set_point(x=x, g=g(x)) ... x = B.get_x() ... print x [-1.23606798 -1. ] [-2.53373794 0.82504005] [ 4.85954425 -4.07499765] [-10.4372347 11.47369596] [ 14.52239974 -14.63948627] [-78.27732015 77.51003951] [ 35.85727402 -35.8634207 ] [ 132.21082218 -132.33155288] [ 0.00993506 -0.01682379] [-0.00230879 0.00360778] [-0.00615491 0.0056663 ] [-0.00027107 -0.00017255] >>> B = JinvBroyden(dyadic=False, method='B', n_prev=2, ... x_method='recent', replace_method='oldest', ... 
inv_method='svd') >>> x = np.array([1, 2]) >>> for n in xrange(12): ... incompat = B.set_point(x=x, g=g(x)) ... x = B.get_x() ... print x [-1.23606798 -1. ] [-2.21588124 0.37800712] [-1.74692251 0.06694124] [-1.03087007 0.59532897] [ 0.73055565 -0.48593282] [ 2.35041873 -2.10818778] [-0.27073117 0.2984435 ] [-0.64849785 0.64808622] [-0.0555911 0.06113293] [-0.00685355 0.00753659] [ 2.20253269e-07 -2.42442632e-07] [ 5.12344665e-07 -5.17238905e-07] >>> B = JinvBroyden(dyadic=False, method='J', n_prev=3, ... x_method='recent', ... inv_method='svd') >>> x = np.array([1, 2]) >>> for n in xrange(5): ... incompat = B.set_point(x=x, g=g(x)) ... x = B.get_x() ... print x [-1.23606798 -1. ] [-2.53373794 0.82504005] [ 2.53310799 -2.53310799] [ 15.48393424 -15.48393424] [ ...e-12 ...e-12] """ # Special case of empty xs if self.xs is None: self.xs = 1.0*x[None, :] self.gs = 1.0*g[None, :] self._age = np.array([0]) return 0 # Increase ages: self._age += 1 # Special case of duplicate point ind = (x == self.xs).all(axis=1) if ind.any(): # Don't add duplicate points, but change g and reset age. ind = np.where(ind)[0][0] self.gs[ind] = g self._age[ind] = 0 return ind # Regular case if len(self.xs) < self.n_prev: self.xs = np.vstack([self.xs, x]) self.gs = np.vstack([self.gs, g]) ind = len(self.xs) - 1 self._age = np.append(self._age, 0) else: if self.replace_method == 'furthest': dx = x - self.xs ind = np.argmax((dx*dx).sum(axis=1)) del dx elif self.replace_method == 'largest': ind = np.argmax((self.gs*self.gs).sum(axis=1)) elif self.replace_method == 'oldest': ind = np.argmax(self._age) else: raise ValueError( "Unsupported replace_method = %s" % (self.replace_method,)) self.xs[ind] = x self.gs[ind] = g self._age[ind] = 0 return ind def update(self, x0=None, g0=None, j_inv=None, skip_inds=None, include_skip=False, mags=1, warn=False): r"""Update the inverse Jacobian so that it is valid at the point `(x0, g0)` (use :attr:`x` and :attr:`g` by default). Does not necessarily add `x0` and `g0` (use :meth:`add_point`) or set the current point (use :meth:`set_point`) but may do either or both depending on the algorithm. The update method is governed by :attr:`method` which may be one of the following: 'J' : Update to minimize the Jacobian and related residues. This corresponds to Broyden's "good" algorithm. 'B' : Update to minimize the inverse Jacobian and related residues. This corresponds to Broyden's "bad" algorithm. Parameters ---------- x0, g0 : array New function value `g0 = G(x0)`. skip_inds : None, array, optional This is an array of the indices of sensitive parameters. The change in these indices will be ignored. This is important if there are "sensitive" or "noisy" parameters that should only be trusted once a certain degree of convergence has been achieved. include_skip : bool If `True` and `skip_inds` is provided, then cross-terms are ignored, but both the main and skip blocks are updated. mags : array, optional This should be set to the vector of typical magnitudes for `G`. One option is `abs(G0) + abs(G1)`. Assumed to be `1` if not provided. Notes ----- The inverse Jacobian is updated so that .. math:: \mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}}. There are two senses in which the update can be considered as minimal controlled by the flag :attr:`minimize_inverse`. If this is `True`, we minimize :math:`\norm{J^{-1}_0 - J^{-1}_1}_{\text{Frobenius}}`: .. 
math::
           J^{-1} \rightarrow
             J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}}
                           {\braket{\d{G}|\d{G}}},

        otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`:

        .. math::
           J \rightarrow
             J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}}
                      {\braket{\d{x}|\d{x}}}.

        This can be analytically inverted to give an update for the
        inverse:

        .. math::
           J^{-1} \rightarrow J^{-1}
             + \frac{\left(\ket{\d{x}} - J^{-1}\ket{\d{G}}\right)
                     \bra{\d{x}}J^{-1}}
                    {\braket{\d{x}|J^{-1}|\d{G}}}.

        Any update that satisfies the secant condition can be
        expressed as:

        .. math::
           J^{-1} \rightarrow J^{-1}
             + \frac{\left(\ket{\d{x}} - J^{-1}\ket{\d{G}}\right)\bra{v}}
                    {\braket{v|\d{G}}}.

        as long as :math:`\braket{v|\d{G}}` is not zero.  The two
        conditions correspond to different choices of either
        :math:`\bra{v}=\bra{\d{G}}`, which minimizes the inverse
        Jacobian norm (Broyden's "bad" method), or
        :math:`\bra{v}=\bra{\d{x}}J^{-1}`, which minimizes the
        Jacobian norm (Broyden's "good" method).  Numerical Recipes
        has the latter but I am not sure it is better... It is more
        complicated.

        Examples
        --------
        >>> B = JinvBroyden()
        >>> B.set_point([1, 2], [1, 2])
        0.0
        >>> B.j_inv
        1
        >>> B.set_point([3, 4], [3, 4])
        0.0

        This should change g but not add a new x

        >>> B.set_point([1, 2], [4, 5])   # doctest: +ELLIPSIS
        1.06...
        >>> B.xs
        array([[ 1.,  2.],
               [ 3.,  4.]])
        >>> B.gs
        array([[ 4.,  5.],
               [ 3.,  4.]])
        """
        if x0 is None:
            x0 = self.x
            g0 = self.g
            self._x0 = None
            self._g0 = None
        else:
            x0 = self._x0 = np.asarray(x0).ravel()
            g0 = self._g0 = np.asarray(g0).ravel()

        j_inv = self.j_inv
        if j_inv is 1:
            j_inv = np.eye(len(x0), dtype=float)

        #if skip_inds is not None:
        #    skip_inds = np.asarray(skip_inds, dtype=int)
        #    if include_skip:
        #        dx_skip = 1*dx
        #        dg_skip = 1*dg
        #    dg[skip_inds] = 0   # Upper block
        #    dx[skip_inds] = 0
        #    dx_dgs = [(dx, dg)]
        #    if include_skip:
        #        dx_skip -= dx   # Skip block
        #        dg_skip -= dg
        #        dx_dgs.append((dx_skip, dg_skip))
        #else:
        #    dx_dgs = [(dx, dg)]

        # Add points
        if self.xs is None:
            # Initial update.  Nothing to do.
            return

        x0 = x0[None, :]
        g0 = g0[None, :]
        dx = self.xs - x0
        dg = self.gs - g0
        dx_norms = (dx*dx).sum(axis=1)/(x0*x0).sum(axis=1)
        is_ = np.where(dx_norms >= self._dx_norm2_min)[0]
        if 0 == len(is_):
            # Do nothing.
            return
            #raise ValueError(
            #    "No points to update! " +
            #    "max(dx_norms) = %g < _dx_norm2_min = %g" %
            #    (dx_norms.max(), self._dx_norm2_min))

        dx = dx[is_, :]
        dg = dg[is_, :]
        dx_norms = dx_norms[is_]

        if self.check_roundoff:
            # Check calculation for roundoff errors.
            x_err = _EPS/np.sqrt(dx_norms + _TINY)
            dg_norms = (dg*dg).sum(axis=1)/(g0*g0).sum(axis=1)
            g_err = _EPS/np.sqrt(dg_norms + _TINY)
            if (x_err > self.roundoff_cut).any() and warn:
                print("Warning: %i roundoff errors in dx" %
                      (x_err > self.roundoff_cut).sum())
            if (g_err > self.roundoff_cut).any() and warn:
                print("Warning: %i roundoff errors in dg" %
                      (g_err > self.roundoff_cut).sum())
            is_ = np.where(np.logical_and(x_err <= self.roundoff_cut,
                                          g_err <= self.roundoff_cut))[0]
            dx = dx[is_, :]
            dg = dg[is_, :]

        # B -> B + ket_a*bra_b
        ket_a, bra_b = self._inversion(dx=dx, dg=dg, j_inv=j_inv)
        self.j_inv = j_inv
        self.add_dyad(at=ket_a, b=bra_b)

    def set_point(self, x, g, **kw):
        r"""Add `x` and `g` and make this the current point `(x, g)`
        and update the Jacobian to be valid here.  Return a measure
        of the incompatibility of the new data with the current
        approximation.

        This need not "add" the point to the current data set (the
        functionality provided by :meth:`add_point`).

        Parameters
        ----------
        x, g : array
           New function value `g = G(x)`.
        **kw :
           See :meth:`update`.
As a measure of the significance of this correction, we return the following dimensionless quantity: .. math:: \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}} Examples -------- >>> B = JinvBroyden() >>> B.set_point([1, 2], [1, 2]) 0.0 >>> B.j_inv 1 >>> B.set_point([3, 4], [3, 4]) 0.0 This should change g but not add a new x >>> B.set_point([1, 2], [4, 5]) # doctest: +ELLIPSIS 1.06... >>> B.xs array([[ 1., 2.], [ 3., 4.]]) >>> B.gs array([[ 4., 5.], [ 3., 4.]]) """ x0 = self.x g0 = self.g x = np.asarray(x).ravel() g = np.asarray(g).ravel() self.curr = self.add_point(x, g) self.update(**kw) return self.incompatibility(x, g, x0, g0) def add_dyad(self, at, b): r"""Add the dyad $\ket{a}\bra{b}$ to :attr:`j_inv`. Parameters ---------- at : array 1-d array or 2-d `(N, n)` array $\ket{a}$. b : array 1-d array or 2-d `(n, N)` array $\ket{b}$. """ at = np.asarray(at) b = np.asarray(b) if 1 == len(at.shape): at = at[:, None] if 1 == len(b.shape): b = b[None, :] if self.dyadic: self.j_inv.add_dyad(at=at, b=b) else: self.j_inv += dot(at, b) def get_x(self, x_controls=None, min_step=_EPS, max_step=np.inf): r"""Return `x`: the Broyden step to get to the root of `G(x)`. This is the downhill Newton step from the point specified by :attr:`curr` that satisfies the constraints `x_controls`. Parameters ---------- x_controls : (array, array), optional Tuple `(inds, vals)` indices and the respective values of the control parameters. min_step : float, optional Minimum step length for testing. Used to give some non-zero step `dx` of length `sqrt(min_step) even if `j_inv` is singular. This may be random. A warning will be raised. max_step : float, optional Maximum length of the step. Examples -------- Here is a little test with a linear function >>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]]) >>> j_inv = JinvBroyden(j_inv=np.linalg.inv(j)) >>> sol = np.array([1, 2, 3]) >>> def G(x): ... return np.dot(j, x - sol) >>> x0 = np.array([0.0, 0.0, 0.0]) >>> x = np.array([1.0, 2.0, 3.0]) Now try to step to the solution, but hold the second variable as a control at 1. Since the function is exactly linear and we have explicitly given the correct Jacobian, we should end up with a root in the other two parameters. >>> j_inv.set_point(x=x, g=G(x)) 0.0 >>> x = j_inv.get_x(x_controls=([1], [1.0])) >>> x array([ 1.5, 1. , 3.5]) >>> np.allclose(G(x), [0., 4.5, 0.]) True This needs to work with very large arrays, so here is an example. Here we use an example where the vectors have length $2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly, the arrays would be too large to be indexed. 
>>> np.random.seed(0) >>> n_x = 2**20 >>> n_d = 2**11 >>> a = np.random.random(n_x) >>> b = np.random.random(n_x) >>> g = np.random.random(n_x) >>> x = np.zeros(n_x) >>> inds = np.unique(np.random.randint(0, n_x, size=n_d)) >>> n_d = len(inds) >>> vals = np.random.rand(n_d) >>> x_controls = (inds, vals) >>> j_inv = JinvBroyden(dyadic=True) >>> j_inv.j_inv.add_dyad(a, b) >>> j_inv.set_point(x=a, g=b) 0.0 >>> x = j_inv.get_x(x_controls=x_controls) >>> np.allclose(x[inds], vals) True """ if self.curr is None or 0 == len(self.xs): raise ValueError( "No `xs` stored: call update first.") x = self.xs g = self.gs x = self.xs[self.curr] g = self.gs[self.curr] j_inv = self.j_inv if j_inv is 1 or self.dyadic and 0 == len(self.j_inv.at): dx = -g if x_controls is not None and 0 < len(x_controls[0]): dx[x_controls[0]] = x_controls[1] - x[x_controls[0]] else: if x_controls is not None and 0 < len(x_controls[0]): inds, vals = x_controls inds = np.asarray(inds, dtype=int) vals = np.asarray(vals) dg_dmu = copy.copy(g) n_d = len(inds) n_x = len(x) d = j_inv[inds[:, None], inds[None, :]] dg_dmu[inds] = x[inds] - vals dmu = _ket(dg_dmu[inds]) b = (np.asarray(j_inv*_ket(dg_dmu)))[inds, :] dN = dmu - np.linalg.solve(d, b - dmu) dg_dN = dg_dmu dg_dN[inds] = dN[:] dx_dmu = np.asarray(j_inv*_ket(dg_dN)) assert np.allclose(dx_dmu[inds], dmu) # Set these to make sure no round-off errors have # accumulated. dx_dmu[inds].flat = x[inds] - vals dx = -dx_dmu else: dx = -np.asarray(j_inv*_ket(g)) dx = np.asarray(dx).ravel() g_mag = np.linalg.norm(g) dx_mag = np.linalg.norm(dx) if dx_mag == 0 and g_mag > 0: warnings.warn("Warning: Singular Jinv, taking random step") dx = np.random.rand(len(dx))*np.sqrt(min_step) elif np.divide(dx_mag, g_mag) < self.min_j_inv: warnings.warn("Warning: Singular Jinv, taking minimum step") dx *= math.sqrt(min_step)/np.linalg.norm(dx) dx_len = math.sqrt((dx*dx).sum()) if max_step < dx_len: dx *= max_step/dx_len return x + dx def reset(self): r"""Reset the Jacobian but keep :attr:`xs` and :attr:`gs`. See Also -------- :meth:`reset_all` """ if self.dyadic: self.j_inv.reset() else: self.j_inv = 1 def reset_all(self): r"""Reset both the points :attr:`xs`, :attr:`gs`, and the Jacobian. See Also -------- :meth:`reset` """ self.suspend() self.xs = None self.gs = None self.curr = None self.reset() self.resume() @property def J(self): r"""Current approximation of Jacobian.""" if self.dyadic: j_inv = self._j_inv.asarray() else: j_inv = self._j_inv return np.linalg.inv(j_inv) def incompatibility(self, x, g, x0, g0, skip_inds=None, complement=False): r"""Return a dimensionless measure of the compatibility of the specified step with the current Jacobian. An incompatibility of 0 means perfectly compatible (this is what should happen if the function is linear and the Jacobian is correct along the specified step). There are several options here as specified by :attr:`incompatibility_measure`. `dg` : This option considers the change in the inverse derivative the direction `dg` .. math:: \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}} `sigma` : This option is valid with the :attr:`dyadic` option and only considers the incompatibility in the space spanned by the dyads. Examples -------- >>> def g(x): ... return x*x + x*np.sin(x.sum()) >>> j = JinvBroyden() >>> x0 = np.zeros(10) >>> g0 = g(x0) >>> j.set_point(x=x0, g=g0) 0.0 >>> x1 = np.ones(10) >>> g1 = g(x1) >>> j.incompatibility(x=x1, g=g1, x0=x0, g0=g0) # doctest: +ELLIPSIS 0.377... 
        This does not modify the Jacobian as can be noted by the fact
        that update returns the same incompatibility

        >>> j.set_point(x=x1, g=g1)   # doctest: +ELLIPSIS
        0.377...

        After an update, the step should be fully compatible:

        >>> j.incompatibility(x=x1, g=g1, x0=x0, g0=g0) < 1e-12
        True
        """
        if self.xs is None:
            # If there are no other points, then consider the update
            # compatible
            return 0

        dx = x[None, :] - self.xs
        dg = g[None, :] - self.gs

        if skip_inds is not None:
            skip_inds = np.asarray(skip_inds)
            if complement:
                dx_ = 1*dx
                dg_ = 1*dg
                dx_[skip_inds] = 0
                dg_[skip_inds] = 0
                dx -= dx_
                dg -= dg_
            else:
                dx[skip_inds] = 0
                dg[skip_inds] = 0

        ng = np.linalg.norm(dg)
        if 0 == ng:
            res = np.linalg.norm(dx)/(ng + _TINY)
        elif 'dg' == self.incompatibility_measure:
            nx = np.linalg.norm(dx)
            dg = dg/(ng + _TINY)
            dx = dx/(ng + _TINY)
            dx1 = self.__rmul__(dg)
            if skip_inds is not None:
                if complement:
                    dx1_ = 1*dx1
                    dx1_[skip_inds] = 0
                    dx1 -= dx1_
                else:
                    dx1[skip_inds] = 0
            res = (np.linalg.norm(dx - dx1)/
                   (max(np.linalg.norm(dx1), nx) + _TINY))
        elif 'sigma' == self.incompatibility_measure and self.dyadic:
            raise NotImplementedError(
                "incompatibility_measure '%s' not implemented." %
                (self.incompatibility_measure, ))
        else:
            raise NotImplementedError(
                "incompatibility_measure '%s' not implemented." %
                (self.incompatibility_measure, ))
        return res

    def change_basis(self, t):
        r"""Use the functions `x_new = t(x)` and `g_new = t(g)` to
        update the data corresponding to a change of basis.  Raise an
        :exc:`AttributeError` if the change cannot be implemented (or
        don't define this method.)

        Parameters
        ----------
        t : function
           Change of basis transformation (should be essentially
           matrix multiplication, but for large bases one might want
           to prevent forming this matrix explicitly.)

        Notes
        -----
        Let the function be represented by the set of points
        :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of abscissa
        and :math:`f^{N}_j = f(x^{N}_j)` on the larger set.  The
        transformation is of the form:

        .. math:: f^N_i = T_{ij}f^{n}_{j}

        We update the inverse Jacobian in the following way:

        .. math:: J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T}

        Several other options have been considered, but in practice
        this has proved to be not only the easiest to implement, but
        also the most efficient.  If we do not add and subtract the
        identity, then the resultant inverse Jacobian will be rank
        deficient and thus the resulting steps will never explore the
        singular directions which is highly undesirable.  One can
        also think of this as directly transforming the inverse
        Jacobian related to the fixed-point problem :math:`f
        \rightarrow f - G[f]`.

        Also, if using the dyadic representation, then this procedure
        only requires applying the linear interpolation operator to
        the components of the dyads: Transforming the identity
        portion would change the form requiring a significantly more
        complicated representation.

        We have also explored using :math:`T^{-1}` in place of
        :math:`T^T`, but this has several problems: 1) Computing
        :math:`T^{-1}` can be expensive if the basis is large, 2)
        Since :math:`T` is rank deficient, :math:`T^{-1}` is not
        unique, but must be chosen from one of the pseudo-inverses
        and it is not clear how to best do this, 3) For the dyadic
        representation, we would like to work directly with the
        interpolation operator on both components of the dyads, so
        using the transpose is much easier than having to solve the
        inverse problem.
If :math:`T^T\cdot T \approx \mat{1}`, then the interpretation of the transformation is this: We have built up the Jacobian inverse as minimal updates from the identity using the displacements :math:`\ket{\d{x}}` and :math:`\ket{\d{G}}`. The transformed inverse Jacobian using the above proceedure is what would have been obtain by minimal updates from the identity using the transformed displacements :math:`T\ket{\d{x}}` and :math:`T\ket{\d{G}}`. Thus, the transformed Jacobian inverse should have the same amount of information as the original, and should in general reduce the number of operations required to form a good approximation by the rank of :math:`J^{-1} - \mat{1}`. It may be beneficial to choose the interpolation operator such that :math:`T^T\cdot T = \mat{1}` over the smaller basis. We have not investigated this, but standard interpolation operators seem sufficiently close to the identity that this extra complication is probably not justified. The code contains additional update methods for testing purposes to allow these results to be verified (except for the dyadic representation which can only be easily transformed using this method). Here we present results for a few problems. >>> from scipy.interpolate import * >>> from mmf.objects import * >>> def G(f, x, normalize=True): ... c1 = np.cosh(1) ... dx = np.diff(x[1:]) ... ddf = np.diff(np.diff(f))/dx/dx ... a = 1 ... if normalize: ... a = 1 + 2./dx/dx ... G = np.hstack([[f[0] - c1], ... (f[1:-1] - ddf)/a, ... [f[-1] - c1]]) ... return G >>> def test(N1, N2, normalize, meth, k=3, tol=1e-14): ... '''Return `(i1, i2, iN, gain)`. `i1` is the number of ... steps at low resolution. `i2` is the subsequent ... number of steps required to polish the ... high-resolution solution. `iN` is the number of ... steps at only the high resolution, and `gain = i1+i2 ... - iN` is the gain in speed. (Negative is good)''' ... x1 = np.linspace(-1, 1, N1) ... x2 = np.linspace(-1, 1, N2) ... #def t(f): ... # 'The interpolation function' ... # return splev(x2, splrep(x1, f, s=0, k=k)) ... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k)) ... f = 0*x1 # Initial guess ... g = G(f, x1, normalize) ... j_inv = JinvBroyden(_j_inv_transform=meth) ... j_inv.set_point(x=f, g=g) ... err = np.max(abs(g)) ... i1 = 1 ... while err > tol: ... # Find solution over x1 abscissa ... f0, g0 = f, g ... f = j_inv.get_x() ... g = G(f, x1, normalize) ... err = np.max(abs(g)) ... j_inv.set_point(x=f, g=g) ... i1 += 1 ... j_inv.change_basis(t) # Change basis ... f1 = t(f) ... g1 = G(f1, x2, normalize) ... j_inv.set_point(x=f1, g=g1) ... err = np.max(abs(g1)) ... i2 = 1 ... while err > tol: ... # Find solution over x2 abscissa ... f0, g0 = f1, g1 ... f1 = j_inv.get_x() ... g1 = G(f1, x2, normalize) ... err = np.max(abs(g1)) ... j_inv.set_point(x=f1, g=g1) ... i2 += 1 ... # Now try only at high resolution. ... f1 = 0*f1 ... g1 = G(f1, x2, normalize) ... j_inv = JinvBroyden(_j_inv_transform=meth) ... j_inv.set_point(x=f1, g=g1) ... err = np.max(abs(g1)) ... i3 = 1 ... while err > tol: ... # Find solution over x2 abscissa ... f0, g0 = f1, g1 ... f1 = j_inv.get_x() ... g1 = G(f1, x2, normalize) ... err = np.max(abs(g1)) ... j_inv.set_point(x=f1, g=g1) ... i3 += 1 ... return i1, i2, i3, (i1+i2) - i3 >>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10): ... meth = Container() ... for meth.inv_l in [0, 1]: ... for meth.inv_r in [0, 1]: ... for meth.set_diag in [0, 1]: ... res = test(N1, N2, normalize, ... k=k, tol=tol, ... meth=dict(meth.items())) ... 
print repr(meth), res >>> tabulate(10, 50, k=1, tol=1e-12) Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=1, inv_l=0) (7, 39, 29, 17) Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 29, 18) Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 29, 19) Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=1, inv_l=0) (7, 40, 29, 18) Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 29, 18) Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 29, 19) Container(set_diag=0, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=0, inv_l=0) (7, 28, 29, 6) Container(set_diag=0, inv_r=1, inv_l=0) (7, 28, 29, 6) Container(set_diag=1, inv_r=1, inv_l=0) (7, 42, 29, 20) Container(set_diag=0, inv_r=0, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=0, inv_l=1) (7, 42, 29, 20) Container(set_diag=0, inv_r=1, inv_l=1) (7, 29, 29, 7) Container(set_diag=1, inv_r=1, inv_l=1) (7, 43, 29, 21) Container(set_diag=0, inv_r=0, inv_l=0) (7, 30, 30, 7) Container(set_diag=1, inv_r=0, inv_l=0) (7, 30, 30, 7) Container(set_diag=0, inv_r=1, inv_l=0) (7, 31, 30, 8) Container(set_diag=1, inv_r=1, inv_l=0) (7, 38, 30, 15) Container(set_diag=0, inv_r=0, inv_l=1) (7, 32, 30, 9) Container(set_diag=1, inv_r=0, inv_l=1) (7, 40, 30, 17) Container(set_diag=0, inv_r=1, inv_l=1) (7, 31, 30, 8) Container(set_diag=1, inv_r=1, inv_l=1) (7, 41, 30, 18) The results are as follows: First the number of iterations performed on the small system. Second is the number of iterations performed on the system with the interpolation applied to the Jacobian. The sum of these indicates how many total iterations were required to converge in the two-stage process. Third is the number of iterations required to converge to the final solution using only the large system. The last number is the improvement of the two-stage method (negative is good). Here are the old results with the old Broyden method:: Container(set_diag=0, inv_r=0, inv_l=0) (9, 66, 109, -34) Container(set_diag=1, inv_r=0, inv_l=0) (9, 68, 109, -32) Container(set_diag=0, inv_r=1, inv_l=0) (9, 95, 109, -5) Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 109, -30) Container(set_diag=0, inv_r=0, inv_l=1) (9, 99, 109, -1) Container(set_diag=1, inv_r=0, inv_l=1) (9, 96, 109, -4) Container(set_diag=0, inv_r=1, inv_l=1) (9, 77, 109, -23) Container(set_diag=1, inv_r=1, inv_l=1) (9, 71, 109, -29) """ self.suspend() j_inv = self.j_inv # We won't transform the points and function values because # the function is not linear, so this would make the values # inaccurate. Instead, we reset these. self.xs = None self.gs = None # Now apply transform to the Jacobian. if j_inv is 1: return if self.dyadic: j_inv.apply_transform(t) else: if not hasattr(self, '_j_inv_transform'): # One could also apply t directly to the columns and # then rows of j_inv - I but I don't think this will be # any faster. 
N, N = j_inv.shape I = np.eye(N) t_mat = np.mat(np.vstack([t(I[n, :]) for n in xrange(N)]).T) M, N = t_mat.shape j_inv = np.eye(M) + t_mat*(j_inv - I)*t_mat.T else: d = np.diag(j_inv) N, N = j_inv.shape I = np.eye(N) t_mat = np.mat(np.vstack([t(I[n, :]) for n in xrange(N)]).T) ti_mat = np.linalg.pinv(t_mat) if self._j_inv_transform.get('inv_l', False): tl = ti_mat.T else: tl = t_mat if self._j_inv_transform.get('inv_r', False): tr = ti_mat else: tr = t_mat.T if self._j_inv_transform.get('direct', False): j_inv = tl*j_inv*tr else: M, N = t_mat.shape j_inv = np.eye(M) + tl*(j_inv - I)*tr if self._j_inv_transform.get('set_diag', False): m = np.arange(j_inv.shape[0]) j_inv[m, m] = t(d) self.j_inv = j_inv self.resume()
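
############################################################
# Illustrative sketch (an assumption of this edit, not used by the
# module): a stripped-down version of the denominator regularization
# described in :meth:`JinvBroyden._inversion`, omitting the weights
# `w**2`.  Here `b` holds the rows <B| and `at` the columns |A>; the
# helper name `_regularized_bra_b` is hypothetical, and the defaults
# mirror :attr:`JinvBroyden.w0` and :attr:`JinvBroyden.min_svd`.
def _regularized_bra_b(b, at, inv_method='svd', w0=1.0, min_svd=1e-8):
    """Return <b| ~ <B| (|A><B| + regularization)^{-1} for a toy case."""
    b = np.asarray(b, dtype=float)      # (n, N) rows <B|
    at = np.asarray(at, dtype=float)    # (N, n) columns |A>
    den = np.dot(b, at)                 # (n, n) denominator <B|A>
    if 'svd' == inv_method:
        # Pseudo-inverse on the span of <B|: drop singular directions.
        u, d, vt = np.linalg.svd(den)
        d_inv = np.where(d/d.max() < min_svd, 0, 1/d)
        return np.dot(np.dot(vt.T*d_inv, u.T), b)
    elif 'w0' == inv_method:
        # Johnson-style regularization: add w0**2 to the diagonal.
        return np.linalg.solve(den + w0**2*np.eye(len(den)), b)
    raise ValueError("Unknown inv_method %r" % (inv_method,))
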
def _ket(x): """Return `x` as a ket (column) vector.""" return np.mat(x.ravel()).T def _bra(x): """Return `x` as a bra (row) vector.""" return np.mat(x.ravel()) def _add_dyad(J, a, b): """Add the dyad `|a><b|` to the matrix `J` inplace and return `J`.""" try: J.add_dyad(a, b) except AttributeError: J += _ket(a)*_bra(b) # class Jinv(StateVars): # r"""Class to represent an inverse Jacobian.""" # implements(IJinv) # _state_vars = [ # ('dyadic', False, # r"""If `True`, then use a dyadic representation for Jinv. If # this is a number greater than 1, then this number is # passed as :attr:`DyadicSum.n_max` limiting the rank of the # dyadic representation."""), # ('j_inv', 1, "Internal representation of matrix"), # ('robust', False, # """If `True`, then use the singular value decomposition to # analyse the Jacobian. Much slower, but could prevent some # numerical problems."""), # ('max_j_inv', 100.0, # """If :attr:`robust`, then the inverse Jacobian is clipped # so that the maximum value is this."""), # ('min_j_inv', 0.0001, # """The inverse Jacobian can become singular. If this is # the case, then a step might have very small length even if # the error is not small (the step size can be reduced by a # factor of the minimum singular value of the inverse # Jacobian). If the step reduction is less than this value, # then the step is scaled up to have length # :attr:`min_step`."""), # ('minimize_inverse', False, # """If `True`, then use the update the minimizes the change # in the Frobenius norm of the inverse Jacobian, otherwise # we use the Sherman-Morrison formula to update so as to # minimize the norm of the Jacobian."""), # ('incompatibility_measure', 'dg', # r"""Specifies which criterion to use for measured the # compatibility of an update with the present inverse # Jacobian. See :attr:`incompatibility` for supported # methods."""), # ('_j_inv_transform', NotImplemented, # """Optional methods for updating the Jacobian inverse in # :meth:`change_basis`. For testing only."""), # ] # process_vars() # def __init__(self, *v, **kw): # # pylint: disable-msg=W0231 # if self.dyadic: # n_max = np.inf # if self.dyadic > 1: # n_max = self.dyadic # if self.j_inv is 1: # self.j_inv = DyadicSum(n_max=n_max) # elif isinstance(self.j_inv, DyadicSum): # self.j_inv.n_max = n_max # else: # raise NotImplementedError( # "No support yet for initializing a" + # " DyadicSum from an array.") # def __mul__(self, dg): # return self.j_inv.__mul__(dg) # def __rmul__(self, dxt): # return self.j_inv.__rmul__(dxt) # def reset(self, m0=NotImplemented): # self.j_inv.reset(m0=m0) # def change_basis(self, t): # r"""Use the function `x_new = t(x)` to update the data # corresponding to a change of basis. Raise an # :exc:`AttributeError` if the change cannot be implemented (or # don't define this method.) # Parameters # ---------- # t : function # Change of basis transformation (should be essentially # matrix multiplication, but for large bases one might want # to prevent forming this matrix explicitly.) # Notes # ----- # Let the function be represented by the set of # points :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of # abscissa and :math:`f^{N}_j = f(x^{N}_j)` on the larger set. # The transformation is of the form: # .. math:: f^N_i = T_{ij}f^{n}_{j} # We update the inverse Jacobian in the following way: # .. math:: J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T} # Several other options have been considered, but in practise # this has proved to be not only the easiest to implement, but # also the most efficient. 
If we do not add and subtract the # identity, then the resultant inverse Jacobian will be rank # deficient and thus the resulting steps will never explore the # singular directions which is highly undesirable. One can also # think of this as directly transforming the inverse Jacobian # related to the fixed-point problem :math:`f \rightarrow f - # G[f]`. # Also, if using the dyadic representation, then this procedure # only requires applying the linear interpolation operator to # the components of the dyads: Transforming the identity portion # would change the form requiring a significantly more # complicated representation. # We have also explored using :math:`T^{-1}` in place of # :math:`T^T`, but this has several problems: 1) Computing # :math:`T^{-1}` can be expensive if the basis is large, 2) # Since :math:`T` is rank deficient, :math:`T^{-1}` is not # unique, but must be chosen from one of the pseudo-inverses and # it is not clear how to best do this, 3) For the diadic # representation, we would like to work directly with the # interpolation operator on both components of the dyads, so # using the transpose is much easier than having to solve the # inverse problem. # If :math:`T^T\cdot T \approx \mat{1}`, then the interpretation # of the transformation is this: We have built up the Jacobian # inverse as minimal updates from the identity using the # displacements :math:`\ket{\d{x}}` and :math:`\ket{\d{G}}`. # The transformed inverse Jacobian using the above proceedure is # what would have been obtain by minimal updates from the # identity using the transformed displacements # :math:`T\ket{\d{x}}` and :math:`T\ket{\d{G}}`. Thus, the # transformed Jacobian inverse should have the same amount of # information as the original, and should in general reduce the # number of operations required to form a good approximation by # the rank of :math:`J^{-1} - \mat{1}`. # It may be beneficial to choose the interpolation operator such # that :math:`T^T\cdot T = \mat{1}` over the smaller basis. We # have not investigated this, but standard interpolation # operators seem sufficiently close to the identity that this # extra complication is probably not justified. # The code contains additional update methods for testing # purposes to allow these results to be verified (except for the # dyadic representation which can only be easily transformed # using this method). Here we present results for a few # problems. # >>> from scipy.interpolate import * # >>> def G(f, x, normalize=True): # ... c1 = np.cosh(1) # ... dx = np.diff(x[1:]) # ... ddf = np.diff(np.diff(f))/dx/dx # ... a = 1 # ... if normalize: # ... a = 1 + 2./dx/dx # ... G = np.hstack([[f[0] - c1], # ... (f[1:-1] - ddf)/a, # ... [f[-1] - c1]]) # ... return G # >>> def test(N1, N2, normalize, meth, k=3, tol=1e-14): # ... '''Return `(i1, i2, iN, gain)`. `i1` is the number of # ... steps at low resolution. `i2` is the subsequent # ... number of steps required to polish the # ... high-resolution solution. `iN` is the number of # ... steps at only the high resolution, and `gain = i1+i2 # ... - iN` is the gain in speed. (Negative is good)''' # ... x1 = np.linspace(-1, 1, N1) # ... x2 = np.linspace(-1, 1, N2) # ... #def t(f): # ... # 'The interpolation function' # ... # return splev(x2, splrep(x1, f, s=0, k=k)) # ... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k)) # ... f = 0*x1 # Initial guess # ... g = G(f, x1, normalize) # ... j_inv = Jinv(_j_inv_transform=meth) # ... err = np.max(abs(g)) # ... i1 = 1 # ... while err > tol: # ... 
# Find solution over x1 abscissa # ... f0, g0 = f, g # ... f = f + j_inv.get_dx(g0) # ... g = G(f, x1, normalize) # ... err = np.max(abs(g)) # ... j_inv.update(dg=g - g0, dx=f - f0) # ... i1 += 1 # ... j_inv.change_basis(t) # Change basis # ... f1 = t(f) # ... g1 = G(f1, x2, normalize) # ... err = np.max(abs(g1)) # ... i2 = 1 # ... while err > tol: # ... # Find solution over x2 abscissa # ... f0, g0 = f1, g1 # ... f1 = f1 + j_inv.get_dx(g0) # ... g1 = G(f1, x2, normalize) # ... err = np.max(abs(g1)) # ... j_inv.update(dx=f1 - f0, dg=g1 - g0) # ... i2 += 1 # ... # Now try only at high resolution. # ... f1 = 0*f1 # ... g1 = G(f1, x2, normalize) # ... j_inv = Jinv(_j_inv_transform=meth) # ... err = np.max(abs(g1)) # ... i3 = 1 # ... while err > tol: # ... # Find solution over x2 abscissa # ... f0, g0 = f1, g1 # ... f1 = f1 + j_inv.get_dx(g0) # ... g1 = G(f1, x2, normalize) # ... err = np.max(abs(g1)) # ... j_inv.update(dx=f1 - f0, dg=g1 - g0) # ... i3 += 1 # ... return i1, i2, i3, (i1+i2) - i3 # >>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10): # ... meth = Container() # ... for meth.inv_l in [0, 1]: # ... for meth.inv_r in [0, 1]: # ... for meth.set_diag in [0, 1]: # ... res = test(N1, N2, normalize, # ... k=k, tol=tol, # ... meth=dict(meth.items())) # ... print repr(meth), res # >>> tabulate(10, 50, k=1, tol=1e-15) # Container(set_diag=0, inv_r=0, inv_l=0) (9, 66, 109, -34) # Container(set_diag=1, inv_r=0, inv_l=0) (9, 68, 109, -32) # Container(set_diag=0, inv_r=1, inv_l=0) (9, 95, 109, -5) # Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 109, -30) # Container(set_diag=0, inv_r=0, inv_l=1) (9, 99, 109, -1) # Container(set_diag=1, inv_r=0, inv_l=1) (9, 96, 109, -4) # Container(set_diag=0, inv_r=1, inv_l=1) (9, 77, 109, -23) # Container(set_diag=1, inv_r=1, inv_l=1) (9, 71, 109, -29) # """ # self.suspend() # j_inv = self.j_inv # if j_inv is 1: # return # if self.dyadic: # j_inv.apply_transform(t) # else: # if not hasattr(self, '_j_inv_transform'): # # One could also apply t directly to the columns and # # then rows of j_inv - I but I don't think this will be # # any faster. # N, N = j_inv.shape # I = np.eye(N) # t_mat = np.mat(np.vstack([t(I[n, :]) # for n in xrange(N)]).T) # M, N = t_mat.shape # j_inv = np.eye(M) + t_mat*(j_inv - I)*t_mat.T # else: # d = np.diag(j_inv) # N, N = j_inv.shape # I = np.eye(N) # t_mat = np.mat(np.vstack([t(I[n, :]) # for n in xrange(N)]).T) # ti_mat = np.linalg.pinv(t_mat) # if self._j_inv_transform.get('inv_l', False): # tl = ti_mat.T # else: # tl = t_mat # if self._j_inv_transform.get('inv_r', False): # tr = ti_mat # else: # tr = t_mat.T # if self._j_inv_transform.get('direct', False): # j_inv = tl*j_inv*tr # else: # M, N = t_mat.shape # j_inv = np.eye(M) + tl*(j_inv - I)*tr # if self._j_inv_transform.get('set_diag', False): # m = np.arange(j_inv.shape[0]) # j_inv[m, m] = t(d) # self.j_inv = j_inv # self.resume() # @property # def J(self): # r"""Current approximation of Jacobian.""" # if self.dyadic: # j_inv = self._j_inv.asarray() # else: # j_inv = self._j_inv # return np.linalg.inv(j_inv) # def incompatibility(self, dx, dg, skip_inds=None, complement=False): # r"""Return a dimensionless measure of the compatibility of the # specified step with the current Jacobian. An incompatibility of # 0 means perfectly compatible (this is what should happen if # the function is linear and the Jacobian is correct). # There are several options here as specified by # :attr:`incompatibility_measure`. 
# `dg` : This option considers the change in the inverse derivative # the direction `dg` # .. math:: # \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} # {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}} # `sigma` : This option is valid with the :attr:`dyadic` option # and only considers the incompatibility in the space spanned # by the dyads. # Examples # -------- # >>> def g(x): # ... return x*x + x*np.sin(x.sum()) # >>> j = Jinv() # >>> x0 = np.zeros(10) # >>> g0 = g(x0) # >>> dx = np.ones(10) - x0 # >>> dg = g(x0 + dx) - g0 # >>> j.incompatibility(dx, dg) # doctest: +ELLIPSIS # 0.377... # This does not modify the Jacobian as can be noted by the fact # that update returns the same incompatibility # >>> j.update(dx=dx, dg=dg) # doctest: +ELLIPSIS # 0.377... # After an update, the step should be fully compatibility: # >>> j.incompatibility(dx, dg) < 1e-12 # True # """ # if skip_inds is not None: # skip_inds = np.asarray(skip_inds) # if complement: # dx_ = 1*dx # dg_ = 1*dg # dx_[skip_inds] = 0 # dg_[skip_inds] = 0 # dx -= dx_ # dg -= dg_ # else: # dx[skip_inds] = 0 # dg[skip_inds] = 0 # ng = np.linalg.norm(dg) # if 0 == ng: # res = np.linalg.norm(dx)/(ng + _TINY) # elif 'dg' == self.incompatibility_measure: # nx = np.linalg.norm(dx) # dg = dg/(ng + _TINY) # dx = dx/(ng + _TINY) # dx1 = np.asarray(self.j_inv*_ket(dg)).ravel() # if skip_inds is not None: # if complement: # dx1_ = 1*dx1 # dx1_[skip_inds] = 0 # dx1 -= dx1_ # else: # dx1[skip_inds] = 0 # res = (np.linalg.norm(dx - dx1)/ # (max(np.linalg.norm(dx1), nx) + _TINY)) # elif 'sigma' == self.incompatibility_measure and self.dyadic: # raise NotImplementedError( # "incompatibility_measure '%s' not implemented." % # (self.incompatibility_measure, )) # else: # raise NotImplementedError( # "incompatibility_measure '%s' not implemented." % # (self.incompatibility_measure, )) # return res # def update(self, dx, dg, j_inv=None, skip_inds=None, # include_skip=False, mags=1): # r"""Update based on differences `dg` and `dx`. Return the # incompatibility value returned by :meth:`incompatibility` # Parameters # ---------- # dg : array # Change `dg = G(x1) - G(x0)`. # dx : array # Change `dx = x1 - x0` # j_inv : matrix, optional # If this is provided, then it is used to set the Jacobian # (implementing Newton's method), otherwise the standard # Broyden update is used. # skip_inds : None, array, optional # This is an array of the indices of sensitive parameters. # The change in these indices will be ignored. This is # important if there are "sensitive" or "noisy" parameters # that should only be trusted once a certain degree of # convergence has been achieved. # include_skip : bool # If `True` and `skip_inds` is provided, then cross-terms are # ignored, but both the main and skip blocks are updated. # mags : array, optional # This should be set to the vector of typical magnitudes for # `G`. One option is `abs(G0) + abs(G1)`. Assumed to be `1` # if not provided. # Notes # ----- # The inverse Jacobian is updated so that # .. math:: # \mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}}. # There are two senses in which the update can be considered as # minimal controlled by the flag :attr:`minimize_inverse`. If # this is `True`, we minimize :math:`\norm{J^{-1}_0 - # J^{-1}_1}_{\text{Frobenius}}`: # .. math:: # J^{-1} \rightarrow # J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}} # {\braket{\d{G}|\d{G}}}, # otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`: # .. 
math:: # J \rightarrow # J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}} # {\braket{\d{x}|\d{x}}}. # This can be analytically inverted to give an update for the inverse: # .. math:: # J^{-1} \rightarrow J^{-1} # + \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{\d{x}}J^{-1}} # {\braket{\d{x}|J^{-1}|\d{G}}}. # Any update that satisfies the secant condition can be expressed as: # .. math:: # J^{-1} \rightarrow J^{-1} # + \frac{\ket{\d{x}} - J^{-1}\ket{\d{G}}\bra{v}} # {\braket{v|\d{G}}}. # as long as :math:`\braket{v|\d{G}}` is not zero. The two # conditions correspond to different choices of either # :math:`\bra{v}=\bra{\d{G}}` to minimize the Jacobian norm # (Broyden's "second" method) or # :math:`\bra{v}=\bra{\d{x}}J^{-1}` to minimize the inverse # Jacobian norm (Broyden's "second" method). Numerical Recipies # has the latter but I am not sure it is better... It is more # complicated. # As a measure of the significance of this correction, we return # the following dimensionless quantity: # .. math:: # \chi_G = \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} # {\max_{i=0, 1}\norm{\mat{J}^{-1}_{i}\d{G}}} # """ # dg = np.asarray(dg).ravel() # dx = np.asarray(dx).ravel() # incompatibility = self.incompatibility(dx=dx, dg=dg) # j_inv = self.j_inv # if j_inv is 1: # j_inv = np.eye(len(dx), dtype=float) # if skip_inds is not None: # skip_inds = np.asarray(skip_inds, dtype=int) # if include_skip: # dx_skip = 1*dx # dg_skip = 1*dg # dg[skip_inds] = 0 # Upper block # dx[skip_inds] = 0 # dx_dgs = [(dx, dg)] # if include_skip: # dx_skip -= dx # Skip block # dg_skip -= dg # dx_dgs.append((dx_skip, dg_skip)) # else: # dx_dgs = [(dx, dg)] # for dx, dg in dx_dgs: # if np.linalg.norm(dx) <= 0 or np.linalg.norm(dg) <= 0: # # It could happen, such as when changing resolutions, that # # this is called without actually generating a new step, # # so we just pass here. # return 0.0 # w = np.asarray(_ket(dx) - j_inv*_ket(dg)).ravel() # # Zero out noisy components # w = np.where(abs(w) > _EPS*mags, w, 0.0) # if (w != 0).any(): # if self.minimize_inverse: # vt = _bra(dg) # else: # vt = j_inv.__rmul__(_bra(dx)) # dot_prod = np.dot(vt, _ket(dg)) # dx_norm = np.linalg.norm(dx) # dg_norm = np.linalg.norm(dg) # reduction = abs(dot_prod/dx_norm/dg_norm) # import pdb;pdb.set_trace() # if reduction < self.min_j_inv: # # Use other step # vt = _bra(dg) # t = np.dot(vt, dg) # w = w/t # _add_dyad(j_inv, w, vt) # if self.robust: # U, d, Vinv = np.linalg.svd(j_inv) # print np.hstack((abs(np.dot(U.T, _ket(abs(G)))), _ket(d))) # if self.max_j_inv < d.max(): # warnings.warn("Warning: Clipping Jacobian: %f"%d.max()) # d = np.where(d<self.max_j_inv, d, self.max_j_inv) # j_inv = U*mat(np.diag(d))*Vinv # self.j_inv = j_inv # return incompatibility # def get_dx(self, g, x=None, x_controls=None, # min_step=_EPS, max_step=np.inf): # r"""Return `dx`: the Broyden step to get to the root of `G(x)`. # Parameters # ---------- # g : array # The step is determined from `dx = -j_inv*g`. # x : array, optional # If `x_controls` is provided, then `x` must also be provided to # determine the offset of the controlled parameters. # x_controls : (array, array), optional # Tuple `(inds, vals)` indices and the respective values of the # control parameters. # min_step : float, optional # Minimum step length for testing. Used to give some # non-zero step `dx` of length `sqrt(min_step) even if # `j_inv` is singular. This may be random. A warning will # be raised. # max_step : float, optional # Maximum length of the step. 
# Examples # -------- # Here is a little test with a linear function # >>> j = np.array([[1., 2, 3], [1, -3, 2], [-2, 1, 4]]) # >>> j_inv = Jinv(j_inv=np.linalg.inv(j)) # >>> sol = np.array([1, 2, 3]) # >>> def G(x): # ... return np.dot(j, x - sol) # >>> x0 = np.array([0.0, 0.0, 0.0]) # >>> x = np.array([1.0, 2.0, 3.0]) # Now try to step to the solution, but hold the second variable as # a control at 1. Since the function is exactly linear and we # have explicitly given the correct Jacobian, we should end up # with a root in the other two parameters. # >>> x = x + j_inv.get_dx(g=G(x), x=x, x_controls=([1], [1.0])) # >>> x # array([ 1.5, 1. , 3.5]) # >>> G(x) # array([ 0. , 4.5, 0. ]) # This needs to work with very large arrays, so here is an # example. Here we use an example where the vectors have length # $2^{20}$ and we have $2^{11} = 2048$ controls. Implemented poorly, # the arrays would be too large to be indexed. # >>> np.random.seed(0) # >>> n_x = 2**20 # >>> n_d = 2**11 # >>> a = np.random.rand(n_x) # >>> b = np.random.rand(n_x) # >>> g = np.random.rand(n_x) # >>> x = np.zeros(n_x) # >>> inds = np.unique1d(np.random.randint(0, n_x, size=n_d)) # >>> n_d = len(inds) # >>> vals = np.random.rand(n_d) # >>> x_controls = (inds, vals) # >>> j_inv = Jinv(dyadic=True) # >>> j_inv.j_inv.add_dyad(a, b) # >>> dx = j_inv.get_dx(g=g, x=x, x_controls=x_controls) # >>> np.allclose((x + dx)[inds], vals) # True # """ # j_inv = self.j_inv # if j_inv is 1: # dx = -g # else: # if x_controls is not None and 0 < len(x_controls[0]): # inds, vals = x_controls # inds = np.asarray(inds, dtype=int) # vals = np.asarray(vals) # dg_dmu = copy.copy(g) # n_d = len(inds) # n_x = len(x) # d = j_inv[inds[:, None], inds[None, :]] # dg_dmu[inds] = x[inds] - vals # dmu = _ket(dg_dmu[inds]) # b = (j_inv*_ket(dg_dmu))[inds, :] # dN = dmu - np.linalg.solve(d, b - dmu) # dg_dN = dg_dmu # dg_dN[inds] = dN[:] # dx_dmu = j_inv*_ket(dg_dN) # assert np.allclose(dx_dmu[inds], dmu) # # Set these to make sure no round-off errors have # # accumulated. # dx_dmu[inds].flat = x[inds] - vals # dx = -dx_dmu # else: # dx = -(j_inv*_ket(g)) # dx = np.asarray(dx).ravel() # g_mag = np.linalg.norm(g) # dx_mag = np.linalg.norm(dx) # if dx_mag == 0 and g_mag > 0: # warnings.warn("Warning: Singular Jinv, taking random step") # dx = np.random.rand(len(dx))*np.sqrt(min_step) # elif np.divide(dx_mag, g_mag) < self.min_j_inv: # warnings.warn("Warning: Singular Jinv, taking minimum step") # dx *= math.sqrt(min_step)/np.linalg.norm(dx) # dx_len = math.sqrt(dot(dx, dx)) # if max_step < dx_len: # dx *= max_step/dx_len # return dx # I am not yet satsified with a good standard way of providing # options. I would like something that provides the user with an easy # way to # 1) See the current set of options (i.e. a pretty implementation of # __repr__) # 2) Maintain sub-options (like JacobianOptions below) # 3) Provide documentation about the options through help. # 4) Be efficient. # # I have tried to meet these criteria with the Options class, but am # not sure it is the best solution yet. from mmf.objects import option class JacobianOptions(mmf.objects.Options): r"""These are options the affect the Jacobian calculation. """ @option def max_h(): r"""Maximum step size to use in Jacobian computations.""" return 1.0 class BroydenOptions(mmf.objects.Options): r"""These are options that affect the Broyden iteration. Jacobian : Options controlling the Jacobian calculation. 
""" @option def Jacobian(): r"""Jacobian computation options""" return JacobianOptions() class Function: r"""Wrapper for function objects to allow for a variable tolerance. This allows for the broyden algorithm to adaptively update the tolerance. The function may take advantage of a low tolerance to perform a faster calculation. >>> import math >>> def f(x, abs_tol): ... "Return -ln(1+abs(x)) to specified tolerance" ... x = abs(x) ... A = 0.0 ... n = 1 ... while True: ... An = (-x)**n/n ... A += An ... n += 1 ... if abs(An) < abs_tol: ... break ... return A >>> f1 = Function(f, abs_tol=1e-2) >>> round(f1(0.1)+math.log(1.0+0.1), 2) 0.0 >>> round(f1(0.1)+math.log(1.0+0.1), 4) 0.0003 >>> f1 = Function(f, abs_tol=1e-4) >>> round(f1(0.1)+math.log(1.0+0.1), 4) 0.0 """ def __init__(self, f, fixed_tol=_EPS, abs_tol=None, return_tol=None): r"""The following forms are accepted. The function f must be callable as shown: Function(f, abs_tol=tol) # ans = f(x, tol, ...) Function(f, return_tol=tol) # (ans, err) = f(x, tol...) Function(f, fixed_tol=tol) # ans = f(x, ...) Function(f) # ans = f(x, ...) The default form assumes f computes to machine precision. >>> import math >>> def tolToPlaces(tol): ... return int(math.ceil(abs(-math.log10(tol)))) >>> def f(x, tol): return round(x, tolToPlaces(tol)) >>> f1 = Function(f, abs_tol=0.01) >>> f1(0.1234) 0.12 >>> def f(x): return x >>> f1 = Function(f, fixed_tol=1e-16) >>> f1(0.1234) 0.1234 >>> def f(x, tol): ... tol = max(1e-3, tol) ... return (round(x, tolToPlaces(tol)), tol) >>> f1 = Function(f, return_tol=1e-16) >>> f1(0.1234) 0.123 """ self._f = f if abs_tol is not None: self.abs_tol = abs_tol self._type = 'abs_tol' elif return_tol is not None: self.abs_tol = return_tol self._type = 'return_tol' elif fixed_tol is not None: self.abs_tol = fixed_tol self._type = 'fixed_tol' else: raise TypeError('Function type not supported') def __call__(self, x, *vargin, **kwargs): if 'abs_tol' is self._type: return self._f(x, self.abs_tol, *vargin, **kwargs) elif 'fixed_tol' is self._type: return self._f(x, *vargin, **kwargs) if 'return_tol' is self._type: (ans, tol) = self._f(x, self.abs_tol, *vargin, **kwargs) self.abs_tol = tol return ans else: raise TypeError('Function type not supported') def zeros(f, x0, f0=None, j0=None, f_tol=None, opts=None): r"""Return dictionary of results containing solution to multi-dimensional root problem f(x) = [0, 0, ...] """ results = {} results['x'] = copy.copy(x0) j = np.matrix([[1, 2, 3], [-1, 2, -3], [1, -2, -3]]) results['j'] = j return results def ComputeInitialJacobian(): r""" if isempty(J0in) | ~all(size(J0in)==[NF, b.NX]) Broyden.J0 = np.eye(NF, b.NX); else # We do not consider the provided Jacobian as a restart because it # might contain errors. b.Restart = False Broyden.J0 = J0in; end """ def ComputeJacobian(f, x, y=None, opts=None): r""" Compute the Jacobian for function f(x) in a straightforward way using finite differences: J = (f(x+h)-f(x))/h """ if opts is None: opts = JacobianOptions() if y is None: # Allow user to supply y=f(x) y = f(x) Nx = len(x) Ny = len(y) J = np.zeros((Ny, Nx), float) h = math.sqrt(f.abs_tol) if opts.max_h is not None: h = min(h, opts.max_h) for n in range(len(x)): # Here is the trick for ensuring that x+h is accurate: we # choose h so this is true. h_n = ((x[n]+h)-x[n]) if x[n] < 0: # Make step in same direction as sign h_n = -h_n # of x component . 
x[n] += h_n y_n = f(x) x[n] -= h_n J[:, n] = (y_n - y)/h_n return J r""" # We choose the direction for the step based on the sign of the # component of x. The step-size is determined by the parameter xdir = ceil((sign(x)+1)./2).*2-1; h = sqrt(f_tol).*max(abs(x), 1).*xdir; if norm(h, 2) > opt.JacobianMaxh h = h.*opt.JacobianMaxh./norm(h, 2); end # Here is the trick for ensuring that x+h is accurate: we # choose h so this is true. h = (x+h) - x; for n = 1:MaxJacobianSteps if (3 <= opt.Verbose) disp([num2str(n) ' of ' num2str(MaxJacobianSteps)]); end if (3 <= opt.Verbose) disp(['h = ' num2str(h(n))]); end t = x(n); x(n) = x(n)+h(n); <<Compute Residue, Fx and FXarg from x>> Fn = Fx; x(n) = t; Broyden.J0(:, n) = (Fn-Broyden.F0)/h(n); end end if (10 <= opt.Verbose) disp('Singular Values of Jacobian'); svd(Broyden.J0) end else if 2 <= opt.Verbose if ((opt.MaxFeval <= 0) | (Flag.Fcnt < opt.MaxFeval-(b.NX+1))) disp('Not enough function evals left to compute Jacobian'); elseif ((opt.MaxTime <= 0) |(EstJacobianTime < RemainingTime)) disp('Not enough time left to compute Jacobian'); end end end # The restart flag indicates if the iteration has been restarted (or # started) with a new Jacobian. This might help a stuck iteration. b.Restart = True """ class BroydenData(StateVars): r"""Represents the current state of a Broyden calculation.""" _state_vars = [ ('x', None, \ """Current value of `x`, i.e. the point where the current inverse Jacobian approximation :attr:`Jinv` is valid."""), ('G', None, "Current residual `G(x)`."), ('Jinv', None, \ """Current inverse Jacobian approximation."""), ('dyadic', False, \ r"""If `True`, then use a dyadic representation for Jinv. If this is a number greater than 1, then this number is passed as :attr:`DyadicSum.n_max` limiting the rank of the dyadic representation."""), ] process_vars() def __init__(self, *v, **kw): if self.x is not None: self.x = np.asarray(self.x, dtype=float).ravel() if self.G is not None: self.G = np.asarray(self.G, dtype=float).ravel() if (self.x is not None and self.G is not None and self.Jinv is None): if self.dyadic: n_max = np.inf if self.dyadic > 1: n_max = self.dyadic self.Jinv = DyadicSum(n_max=n_max) else: self.Jinv = np.mat(np.eye(len(self.x))) class Broyden(StateVars): r"""Implements a rudimentary Broyden root search search by keeping track of the Jacobian an allowing for Broyden steps to be made. This algorithm is used for solving the problem :math:`G(x) = 0` using a quasi-Newton method: .. math:: x \rightarrow x - J^{-1}G(x) where :math:`J` is an approximation for the Jacobian :math:`J\equiv \partial_{x}G(x)`. By default, one starts with the approximation :math:`J=1`, which is equivalent to the step :math:`x \rightarrow x - G(x)`. Thus, if one wishes to improve the convergence of a reasonable fixed-point iteration scheme :math:`x\rightarrow F(x)`, then a reasonable choice is :math:`G(x) = x - F(x)`. Examples -------- Here we present an example solving the problem and solution: .. math:: x = x^2 \qquad x=1. In this case, the Broyden algorithm is equivalent to the secant algorithm with the first two points `x0` and `G(x0)`. >>> def G(x): ... return x - x*x >>> x0 = 2.0 >>> b = Broyden(x0=x0, G0=G(x0)) >>> b.x, b.J (4.0, matrix([[ 1.]])) >>> b.update(G(b.x)) # doctest: +ELLIPSIS 1.2... >>> b.x, b.J # doctest: +ELLIPSIS (1.6..., matrix([[-5.]])) >>> b.update(G(b.x)) # doctest: +ELLIPSIS 0.08... >>> b.x, b.J # doctest: +ELLIPSIS (1.3913043478..., matrix([[-4.6]])) >>> for n in xrange(8): b.update(G(b.x)) # doctest: +ELLIPSIS 0.56... 
0.24... 0.23... 0.09... 0.02... 0.003... ...e-05 ...e-07 >>> b.x, b.J (1.0, matrix([[-1.]])) Here is a two-dimensional problem: .. math:: \begin{aligned} x &= 2x^2, & y &= y^2, & x , y &= 0.5, 1.0 \end{aligned} >>> def G(x): ... return [x[0] - x[0]*x[0]*2.0, x[1] - x[1]*x[1]] >>> x0 = [2.0, 2.0] >>> b = Broyden(x0=x0, G0=G(x0)) >>> for n in xrange(40): ... res = b.update(G(b.x)) >>> map(lambda r:round(r, 12), b.x) [0.5, 1.0] """ _state_vars = [ ('x0=broyden_data.x', None, """Current value of `x`, i.e. the point where the current inverse Jacobian approximation :attr:`Jinv0` is valid."""), ('G0=broyden_data.G', None, \ "Current function evaluation `G(x0)`."), ('Jinv0=broyden_data.Jinv', None, \ r"""Approximate Jacobian inverse at `x0`. (Defaults to the identity)."""), ('max_step', np.inf, \ """Maximium step length. Use to prevent a singular Jacobian from shooting the solution far away."""), ('min_step', np.sqrt(_EPS), \ """Minimum step length to take when the Jacobian is singular. See :attr:`min_Jinv`. This should be a good length for estimating the Jacobian through finite differences, i.e. typically the square-root of the precision to which the function is evaluate."""), ('step_factor', 1, \ """A number between 0 and 1. The step taken is `x + step_factor*dx`"""), ('robust', False, \ """If `True`, then use the singular value decomposition to analyse the Jacobian. Much slower, but could prevent some numerical problems."""), ('max_Jinv', 100.0, \ """If :attr:`robust`, then the inverse Jacobian is clipped so that the maximum value is this."""), ('min_Jinv', 0.0001, \ """The inverse Jacobian can become singular. If this is the case, then a step might have very small length even if the error is not small (the step size can be reduced by a factor of the minimum singular value of the inverse Jacobian). If the step reduction is less than this value, then the step is scaled up to have length :attr:`min_step`."""), ('minimize_inverse', False, \ """If `True`, then use the update the minimizes the change in the Frobenius norm of the inverse Jacobian, otherwise we use the Sherman-Morrison formula to update so as to minimize the norm of the Jacobian."""), ('x', Computed, "Next value of `x`."), ('_packing_info', (type(None), None), \ "Packing information to reshape `x`."), ('_max_step2', Computed, "Square of :attr:`max_step`"), ('broyden_data', Delegate(BroydenData, ['dyadic']), \ """:class:`BroydenData` instance representing the current state."""), ('_Jinv_transform', NotImplemented, \ """Optional methods for updating the Jacobian inverse in :meth:`change_basis`. For testing only."""), ] process_vars() def __init__(self, *v, **kw): self._max_step2 = self.max_step*self.max_step if 'x0' in kw and kw['x0'] is not None: self._define_packing_info(kw['x0']) self.x0 = self._pack(self.x0) N = self.x0.shape[0] if 'G0' in kw and self.G0 is not None: self.G0 = self._pack(self.G0) if self.x0 is not None: self._update_x(self.x0, self.G0) def _define_packing_info(self, x): r""" Parameters ---------- x : array Original representation. Returns ------- _packing_info : typle (type(x), shape) """ shape = np.asarray(x).shape type_ = type(x) self._packing_info = (type_, shape) def _pack(self, x): r"""Return `x` in a flat representation. Parameters ---------- x : array Original representation. Returns ------- x : array Flat representation of array. """ return np.asarray(x, dtype=float).ravel() def _unpack(self, x): r"""Return `x` in the original shape. Parameters ---------- x : array Flat representation of array. 
        Returns
        -------
        x : array
           Reshaped representation.
        """
        type_, shape = self._packing_info
        if issubclass(type_, list):
            return x.tolist()
        elif issubclass(type_, np.matrix):
            return np.asmatrix(x).reshape(shape)
        elif issubclass(type_, np.ndarray):
            return np.asarray(x).reshape(shape)
        elif (issubclass(type_, float)
              or issubclass(type_, int)
              or issubclass(type_, complex)):
            return np.asarray(x).ravel()[0]
        else:
            raise ValueError("Type %s not supported." % type_)

    def change_basis(self, t):
        r"""Use the function `x_new = t(x)` to update the data
        corresponding to a change of basis.  Raise an
        :exc:`AttributeError` if the change cannot be implemented (or
        don't define this method.)

        Parameters
        ----------
        t : function
           Change of basis transformation (should be essentially
           matrix multiplication, but for large bases one might want
           to prevent forming this matrix explicitly.)

        Notes
        -----
        Let the function be represented by the set of points
        :math:`f^{n}_i = f(x^{n}_i)` on the smaller set of abscissa
        and :math:`f^{N}_j = f(x^{N}_j)` on the larger set.  The
        transformation is of the form:

        .. math::
           f^N_i = T_{ij}f^{n}_{j}

        We update the inverse Jacobian in the following way:

        .. math::
           J^{-1}_N = \mat{1} + T(J^{-1}_n - \mat{1})T^{T}

        Several other options have been considered, but in practice
        this has proved to be not only the easiest to implement, but
        also the most efficient.  If we do not add and subtract the
        identity, then the resultant inverse Jacobian will be rank
        deficient and thus the resulting steps will never explore the
        singular directions, which is highly undesirable.  One can
        also think of this as directly transforming the inverse
        Jacobian associated with the fixed-point problem :math:`f
        \rightarrow f - G[f]`.  Also, if using the dyadic
        representation, this procedure only requires applying the
        linear interpolation operator to the components of the dyads:
        transforming the identity portion would change the form,
        requiring a significantly more complicated representation.

        We have also explored using :math:`T^{-1}` in place of
        :math:`T^T`, but this has several problems: 1) computing
        :math:`T^{-1}` can be expensive if the basis is large; 2)
        since :math:`T` is rank deficient, :math:`T^{-1}` is not
        unique, but must be chosen from one of the pseudo-inverses,
        and it is not clear how best to do this; 3) for the dyadic
        representation, we would like to work directly with the
        interpolation operator on both components of the dyads, so
        using the transpose is much easier than having to solve the
        inverse problem.

        If :math:`T^T\cdot T \approx \mat{1}`, then the
        interpretation of the transformation is this: we have built
        up the inverse Jacobian as minimal updates from the identity
        using the displacements :math:`\ket{\d{x}}` and
        :math:`\ket{\d{G}}`.  The transformed inverse Jacobian
        produced by the above procedure is what would have been
        obtained by minimal updates from the identity using the
        transformed displacements :math:`T\ket{\d{x}}` and
        :math:`T\ket{\d{G}}`.  Thus, the transformed inverse Jacobian
        should contain the same amount of information as the
        original, and should in general reduce the number of
        operations required to form a good approximation by the rank
        of :math:`J^{-1} - \mat{1}`.

        It may be beneficial to choose the interpolation operator
        such that :math:`T^T\cdot T = \mat{1}` over the smaller
        basis.  We have not investigated this, but standard
        interpolation operators seem to satisfy this closely enough
        that the extra complication is probably not justified.
The code contains additional update methods for testing purposes to allow these results to be verified (except for the dyadic representation which can only be easily transformed using this method). Here we present results for a few problems. >>> from scipy.interpolate import * >>> from mmf.objects import * >>> def G(f, x, normalize=True): ... c1 = np.cosh(1) ... dx = np.diff(x[1:]) ... ddf = np.diff(np.diff(f))/dx/dx ... a = 1 ... if normalize: ... a = 1 + 2./dx/dx ... G = np.hstack([[f[0] - c1], ... (f[1:-1] - ddf)/a, ... [f[-1] - c1]]) ... return G >>> def test(N1, N2, normalize, meth, k=3, tol=1e-14): ... '''Return `(i1, i2, iN, gain)`. `i1` is the number of ... steps at low resolution. `i2` is the subsequent ... number of steps required to polish the ... high-resolution solution. `iN` is the number of ... steps at only the high resolution, and `gain = i1+i2 ... - iN` is the gain in speed. (Negative is good)''' ... x1 = np.linspace(-1, 1, N1) ... x2 = np.linspace(-1, 1, N2) ... #def t(f): ... # 'The interpolation function' ... # return splev(x2, splrep(x1, f, s=0, k=k)) ... t = lambda f: splev(x2, splrep(x1, f, s=0, k=k)) ... f = 0*x1 # Initial guess ... g = G(f, x1, normalize) ... b = Broyden(x0=f, G0=g, _Jinv_transform=meth) ... err = np.max(abs(g)) ... i1 = 1 ... while err > tol: ... # Find solution over x1 abscissa ... f = b.x ... g = G(f, x1, normalize) ... err = np.max(abs(g)) ... b.update(x=f, g=g) ... i1 += 1 ... b.change_basis(t) # Change basis ... f1 = b.x ... g = G(f1, x2, normalize) ... err = np.max(abs(g)) ... i2 = 1 ... while err > tol: ... # Find solution over x2 abscissa ... f = b.x ... g = G(f, x2, normalize) ... err = np.max(abs(g)) ... b.update(x=f, g=g) ... i2 += 1 ... # Now try only at high resolution. ... f1 = 0*f1 ... g = G(f1, x2, normalize) ... b = Broyden(x0=f1, G0=g, _Jinv_transform=meth) ... err = np.max(abs(g)) ... i3 = 1 ... while err > tol: ... # Find solution over x2 abscissa ... f1 = b.x ... g1 = G(f1, x2, normalize) ... err = np.max(abs(g1)) ... b.update(x=f1, g=g1) ... i3 += 1 ... return i1, i2, i3, (i1+i2) - i3 >>> def tabulate(N1, N2, k=3, normalize=True, tol=1e-10): ... meth = Container() ... for meth.inv_l in [0, 1]: ... for meth.inv_r in [0, 1]: ... for meth.set_diag in [0, 1]: ... res = test(N1, N2, normalize, ... k=k, tol=tol, ... meth=dict(meth.items())) ... 
print repr(meth), res >>> tabulate(10, 50, k=1, tol=1e-15) Container(set_diag=0, inv_r=0, inv_l=0) (9, 67, 111, -35) Container(set_diag=1, inv_r=0, inv_l=0) (9, 77, 111, -25) Container(set_diag=0, inv_r=1, inv_l=0) (9, 90, 111, -12) Container(set_diag=1, inv_r=1, inv_l=0) (9, 68, 111, -34) Container(set_diag=0, inv_r=0, inv_l=1) (9, 78, 111, -24) Container(set_diag=1, inv_r=0, inv_l=1) (9, 71, 111, -31) Container(set_diag=0, inv_r=1, inv_l=1) (9, 107, 111, 5) Container(set_diag=1, inv_r=1, inv_l=1) (9, 73, 111, -29) Container(set_diag=0, inv_r=0, inv_l=0) (9, 64, 105, -32) Container(set_diag=1, inv_r=0, inv_l=0) (9, 82, 105, -14) Container(set_diag=0, inv_r=1, inv_l=0) (9, 101, 105, 5) Container(set_diag=1, inv_r=1, inv_l=0) (9, 68, 105, -28) Container(set_diag=0, inv_r=0, inv_l=1) (9, 65, 105, -31) Container(set_diag=1, inv_r=0, inv_l=1) (9, 69, 105, -27) Container(set_diag=0, inv_r=1, inv_l=1) (9, 96, 105, 0) Container(set_diag=1, inv_r=1, inv_l=1) (9, 75, 105, -21) Here are some old results:: Container(set_diag=0, inv_r=0, inv_l=0) (9, 64, 105, -32) Container(set_diag=1, inv_r=0, inv_l=0) (9, 82, 105, -14) Container(set_diag=0, inv_r=1, inv_l=0) (9, 87, 105, -9) Container(set_diag=1, inv_r=1, inv_l=0) (9, 70, 105, -26) Container(set_diag=0, inv_r=0, inv_l=1) (9, 90, 105, -6) Container(set_diag=1, inv_r=0, inv_l=1) (9, 68, 105, -28) Container(set_diag=0, inv_r=1, inv_l=1) (9, 100, 105, 4) Container(set_diag=1, inv_r=1, inv_l=1) (9, 75, 105, -21) Container(set_diag=0, inv_r=0, inv_l=0) (9, 73, 134, -52) Container(set_diag=1, inv_r=0, inv_l=0) (9, 77, 134, -48) Container(set_diag=0, inv_r=1, inv_l=0) (9, 88, 134, -37) Container(set_diag=1, inv_r=1, inv_l=0) (9, 71, 134, -54) Container(set_diag=0, inv_r=0, inv_l=1) (9, 86, 134, -39) Container(set_diag=1, inv_r=0, inv_l=1) (9, 71, 134, -54) Container(set_diag=0, inv_r=1, inv_l=1) (9, 94, 134, -31) Container(set_diag=1, inv_r=1, inv_l=1) (9, 70, 134, -55) """ self.suspend() if self.x0 is not None: self.x0 = t(self.x0) self.G0 = None Jinv = self.Jinv0 if Jinv is None: return if self.dyadic: Jinv.apply_transform(t) else: if not hasattr(self, '_Jinv_transform'): # One could also apply t directly to the columns and # then rows of Jinv - I but I don't think this will be # any faster. N, N = Jinv.shape I = np.eye(N) t_mat = np.mat(np.vstack([t(I[n, :]) for n in xrange(N)]).T) M, N = t_mat.shape Jinv = np.eye(M) + t_mat*(Jinv - I)*t_mat.T else: d = np.diag(Jinv) N, N = Jinv.shape I = np.eye(N) t_mat = np.mat(np.vstack([t(I[n, :]) for n in xrange(N)]).T) ti_mat = np.linalg.pinv(t_mat) if self._Jinv_transform.get('inv_l', False): tl = ti_mat.T else: tl = t_mat if self._Jinv_transform.get('inv_r', False): tr = ti_mat else: tr = t_mat.T if self._Jinv_transform.get('direct', False): Jinv = tl*Jinv*tr else: M, N = t_mat.shape Jinv = np.eye(M) + tl*(Jinv - I)*tr if self._Jinv_transform.get('set_diag', False): m = np.arange(Jinv.shape[0]) Jinv[m, m] = t(d) self.Jinv0 = Jinv self.resume() @property def J(self): r"""Current approximation of Jacobian.""" return np.linalg.inv(self.Jinv0) def update(self, g, x=None, Jinv=None, skip_inds=None): r"""Update based on new value of `g(x)`. Changes the current position (the new Jacobian should be valid at `x`). Return a dimensionless measure of the change in Jinv along the specified direction. Use the property `x` to access the next value of `x`. Parameters ---------- g : array New value of `G(x)`. x : array, optional Value of abscissa `x` at which `g` is evaluated. If `x` is `None`, then assume that `x == self.x`. 
Jinv : matrix, optional If this is provided, then it is used to set the Jacobian (implementing Newton's method), otherwise the standard Broyden update is used. skip_inds : None, array, optional This is an array of the indices of sensitive parameters. The change in these indices will be ignored. This is important if there are "sensitive" or "noisy" parameters that should only be trusted once a certain degree of convergence has been achieved. Notes ----- The inverse Jacobian is updated so that .. math:: \mat{J}^{-1}\ket{\d{G}} = \ket{\d{x}} where the changes are with respect to :attr:`x0` and :attr:`G0` respectively. As a measure of the significance of this correction, we return the following dimensionless quantity: .. math:: \chi_G &= \frac{\norm{\vect{x}_2 - \vect{x}_1}} {\norm{\vect{x}_2 - \vect{x}_0}}, \\ &= \frac{\norm{\left(\mat{J}^{-1}_{1} - \mat{J}^{-1}_{0}\right)\cdot\uvect{G}}} {\max_{n=0, 1}\norm{\mat{J}^{-1}_{n}\cdot\uvect{G}}} Another option might be .. math:: \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}} {\mat{J}^{-1}\norm{\d{G}}} """ if self.x0 is None or self.G0 is None: # Initial call. Just store values an initialize self.suspend() self.x0 = copy.copy(x) self.G0 = g self.resume() return 1.0 if x is None: x = self.x x = self._pack(x) g = self._pack(g) if Jinv is None: res = self.update_J(x=x, g=g, skip_inds=skip_inds) else: # Compute measure of change in Jinv G0_ = _ket(self.G0)/(np.linalg.norm(self.G0) + _TINY) J0G0 = self.Jinv0*G0_ JG0 = Jinv*G0_ res = np.linalg.norm(JG0 - J0G0)/( max(np.linalg.norm(JG0), np.linalg.norm(J0G0)) + _TINY) self.Jinv0 = Jinv #self.update_J(x=x, g=g) self.broyden_data.x[:] = x self.broyden_data.G[:] = g return res def get_x(self, x_controls=None, x0=None, G0=None): """Return the new step subject to the constraints specified in `x_controls`. Parameters ---------- x0, G0 : array, optional If these are provided, the new step is relative to these points, otherwise :attr:`x0` and :attr:`G0` are used. x_controls : (array, array), optional If this is provided, then the specified elements are used as control parameters when determining the step. `x_controls` should be a pair `(inds, vals)` of indices into the array `x` and values such that the step should preserve ``x[ind] = c*vals + (1-c)*x0[ins]`` for some ``0 < c =< 1``. (I.e. this is a convex extrapolation from `x0` to `x`). If the solver contains information about the function `G` such as gradient information, then the extrapolated state `x_new` will incorporate this information extrapolating the step in such a way as to satisfy the controls. """ if x_controls is None and x0 is None and G0 is None: return self.x else: if x0 is None: x0 = self.x0 if G0 is None: G0 = self.G0 dx = self._get_dx(x0, G0, x_controls) return self._unpack(x0 + dx) def _update_x(self, x, g, x_controls=None): """Update the new step :attr:`x`.""" dx = self._get_dx(x, g, x_controls) self.__dict__['x'] = self._unpack(x + dx) def update_J(self, x, g, skip_inds=None): r"""Update the inverse Jacobian to include information about `(x, J)` but without changing the present location (use :meth:`update` to set the new position too`.) Return a dimensionless measure of the change in Jinv along the specified direction. Parameters ---------- x, g : array These are the values at the new point that should be used to update the inverse Jacobian `g = G(x)`. skip_inds : None, array, optional This is an array of the indices of sensitive parameters. The change in these indices will be ignored. 
            This is important if there are "sensitive" or "noisy"
            parameters that should only be trusted once a certain
            degree of convergence has been achieved.

        Notes
        -----
        At each step, we update the Jacobian in a minimal sense so as
        to maintain the secant condition :math:`\ket{\d{x}} =
        J^{-1}_1\cdot\ket{\d{G}}`.

        There are two senses in which the update can be considered
        minimal, controlled by the flag :attr:`minimize_inverse`.  If
        this is `True`, we minimize :math:`\norm{J^{-1}_0 -
        J^{-1}_1}_{\text{Frobenius}}`:

        .. math::
           J^{-1} \rightarrow
              J^{-1} + \frac{(\ket{\d{x}} - J^{-1}\ket{\d{G}})\bra{\d{G}}}
                            {\braket{\d{G}|\d{G}}},

        otherwise we minimize :math:`\norm{J_0 - J_1}_{\text{Frobenius}}`:

        .. math::
           J \rightarrow
              J + \frac{(\ket{\d{G}} - J\ket{\d{x}})\bra{\d{x}}}
                       {\braket{\d{x}|\d{x}}}.

        This can be analytically inverted to give an update for the
        inverse:

        .. math::
           J^{-1} \rightarrow J^{-1}
              + \frac{\left(\ket{\d{x}} - J^{-1}\ket{\d{G}}\right)
                      \bra{\d{x}}J^{-1}}
                     {\braket{\d{x}|J^{-1}|\d{G}}}.

        Any update that satisfies the secant condition can be
        expressed as

        .. math::
           J^{-1} \rightarrow J^{-1}
              + \frac{\left(\ket{\d{x}} - J^{-1}\ket{\d{G}}\right)\bra{v}}
                     {\braket{v|\d{G}}}

        as long as :math:`\braket{v|\d{G}}` is not zero.  The two
        conditions correspond to different choices of :math:`\bra{v}`:
        :math:`\bra{v}=\bra{\d{G}}` minimizes the norm of the inverse
        Jacobian (Broyden's "second" method), while
        :math:`\bra{v}=\bra{\d{x}}J^{-1}` minimizes the norm of the
        Jacobian (Broyden's "first" method).  Numerical Recipes uses
        the latter, but I am not sure it is better... It is more
        complicated.

        As a measure of the significance of this correction, we
        return the following dimensionless quantity:

        .. math::
           \chi_G &= \frac{\norm{\vect{x}_2 - \vect{x}_1}}
                          {\norm{\vect{x}_2 - \vect{x}_0}}, \\
                  &= \frac{\norm{\left(\mat{J}^{-1}_{1} -
                            \mat{J}^{-1}_{0}\right)\cdot\uvect{G}}}
                          {\max_{n=0, 1}\norm{\mat{J}^{-1}_{n}\cdot\uvect{G}}}

        Another option might be

        .. math::
           \frac{\norm{(\mat{J}^{-1} - \mat{J}_{0}^{-1})\d{G}}}
                {\mat{J}^{-1}\norm{\d{G}}}
        """
        G0 = self.broyden_data.G
        x0 = self.broyden_data.x
        Jinv0 = self.broyden_data.Jinv

        if x0 is None or G0 is None:
            # Initial call.  Just store the values and initialize.
            self.suspend()
            self.x0 = copy.copy(x)
            self.G0 = g
            self.resume()
            return 1.0

        x = self._pack(x)
        g = self._pack(g)
        dg = g - G0
        dx = x - x0

        # Unit vector and change for later measuring the change in Jinv.
        # Could use dg here rather than G0.
        G0_ = _ket(G0/(np.linalg.norm(G0) + _TINY))
        J0G0 = Jinv0*G0_

        if skip_inds is not None:
            skip_inds = np.asarray(skip_inds, dtype=int)
            dg[skip_inds] = 0
            dx[skip_inds] = 0

        if np.linalg.norm(dx) <= 0 or np.linalg.norm(dg) <= 0:
            # It could happen, such as when changing resolutions, that
            # this is called without actually generating a new step,
            # so we just pass here.
            return 0.0

        w = _ket(dx) - Jinv0*_ket(dg)

        # Zero out noisy components.
        mags = _ket(abs(G0) + abs(g))
        w = np.where(abs(w) > _EPS*mags, w, 0.0)
        if (w != 0).any():
            if self.minimize_inverse:
                vt = _bra(dg)
            else:
                vt = _bra(dx)*Jinv0
            dot_prod = dot(vt, _ket(dg))
            dx_norm = np.linalg.norm(dx)
            dg_norm = np.linalg.norm(dg)
            reduction = abs(dot_prod/dx_norm/dg_norm)
            if reduction < self.min_Jinv:
                # Use the other step
                vt = _bra(dg)
            t = dot(vt, dg)
            w = w/t
            _add_dyad(Jinv0, w, vt)

        if self.robust:
            U, d, Vinv = np.linalg.svd(Jinv0)
            print np.hstack((abs(dot(U.T, _ket(abs(g)))), _ket(d)))
            if self.max_Jinv < d.max():
                warnings.warn("Warning: Clipping Jacobian: %f" % d.max())
                d = np.where(d < self.max_Jinv, d, self.max_Jinv)
                Jinv0 = U*np.mat(np.diag(d))*Vinv

        self.broyden_data.Jinv = Jinv0

        # Compute measure for change in Jinv.
JG0 = Jinv0*G0_ res = np.linalg.norm(JG0 - J0G0)/( max(np.linalg.norm(JG0), np.linalg.norm(J0G0)) + _TINY) # Update predicted position. self._update_x(x=x, g=g) return res def _get_dx(self, x, g, x_controls): r"""Get step from error `G(x)`. Parameters ---------- x_controls : (array, array) or None Tuple `(inds, vals)` indices and the respective values of the control parameters. Examples -------- Here is a little test with a linear function >>> J = np.array([[1, 2, 3], [1, -3, 2], [-2, 1, 4]], dtype=float) >>> sol = np.array([1, 2, 3]) >>> def G(x): ... return np.dot(J, x - sol) >>> x0 = np.array([0.0, 0.0, 0.0]) >>> b = Broyden(x0=x0, G0=G(x0), Jinv0=np.linalg.inv(J)) >>> x = np.array([1.0, 2.0, 3.0]) Now try to step to the solution, but hold the second variable as a control at 1. Since the function is exactly linear and we have explicitly given the correct Jacobian, we should end up with a root in the other two parameters. >>> res = b.update(x=x, g=G(x)) >>> b.x array([ 1., 2., 3.]) >>> x = b.get_x(x_controls=([1], [1.0])) >>> x array([ 1.5, 1. , 3.5]) >>> np.allclose(G(x), [0., 4.5, 0.]) True """ if g is None: dx = 0*x return dx elif self.broyden_data.Jinv is None: dx = -g else: Jinv = self.broyden_data.Jinv if x_controls is not None and 0 < len(x_controls[0]): inds, vals = x_controls inds = np.asarray(inds, dtype=int) vals = np.asarray(vals) dg_dmu = copy.copy(g) n_d = len(inds) n_x = len(x) basis = np.mat(np.zeros((n_x, n_d), dtype=float)) basis[inds, np.arange(n_d)] = 1 dg_dmu[inds] = x[inds] - vals d = basis.T*Jinv*basis dmu = _ket(dg_dmu[inds]) b = (Jinv*_ket(dg_dmu))[inds, :] dN = dmu - np.linalg.solve(d, b - dmu) dg_dN = dg_dmu dg_dN[inds] = dN[:] dx_dmu = Jinv*_ket(dg_dN) assert np.allclose(dx_dmu[inds], dmu) # Set these to make sure no round-off errors have # accumulated. dx_dmu[inds].flat = x[inds] - vals dx = -dx_dmu else: dx = -(Jinv*_ket(g)) dx = np.asarray(dx).ravel() G_mag = np.linalg.norm(g) dx_mag = np.linalg.norm(dx) if dx_mag == 0 and G_mag > 0: warnings.warn("Warning: Singular Jinv, taking random step") dx = np.random.rand(len(dx))*self.min_step elif np.divide(dx_mag, G_mag) < self.min_Jinv: warnings.warn("Warning: Singular Jinv, taking minimum step") dx *= self.min_step/np.linalg.norm(dx) dx *= self.step_factor dx_len2 = dot(dx, dx) if self._max_step2 < dx_len2: dx *= math.sqrt(self._max_step2/dx_len2) return dx class BroydenDyadic(Broyden): r"""Implements a rudimentary Broyden search by keeping track of the Jacobian an allowing for Broyden steps to be made. Uses a dyadic sum representation to reduce the memory overhead. Examples -------- >> def G(x): ... return x - x*x >> x0 = 2.0 >> b = BroydenDyadic(x0=x0, G0=G(x0)) >> b.x 4.0 >> b.update(G(b.x)) # doctest: +ELLIPSIS 6.0... >> b.x # doctest: +ELLIPSIS 1.60000... >> b.update(G(b.x)) # doctest: +ELLIPSIS 0.08... >> b.x # doctest: +ELLIPSIS 1.3913043478... >> for n in xrange(8): b.update(G(b.x)) # doctest: +ELLIPSIS 0.56... 0.24... 0.23... 0.09... 0.02... 0.003... ...e-05 ...e-07 >> b.x 1.0 Here is a two-dimensional problem: x = 2*x*x y = y*y x , y = 0.5, 1.0 >> def G(x): ... return [x[0] - x[0]*x[0]*2.0, x[1] - x[1]*x[1]] >> x0 = [2.0, 2.0] >> b = BroydenDyadic(x0=x0, G0=G(x0)) >> for n in xrange(30): ... res = b.update(G(b.x)) >> map(lambda r:round(r, 12), b.x) [0.5, 1.0] """ _state_vars = [('dyadic=broyden_data.dyadic', True)] process_vars() class ModifiedBroyden(object): r"""Implements a modified Broyden search. Similar to that described in J. Chem. Phys. Vol. 108, No. 100, 4426. 
    but with a straightforward implementation (not efficient).

    Notes
    =====
    N = length of vectors x and F(x)
    Ni = subspace size
    X = [x0 x1 ... xi]
    G = [G(x0) G(x1) ... G(xi)] where G(x) = x - F(x)
    _next_x = Current step.

    >> def F(x):
    ...     return x*x
    >> x0 = 2.0
    >> B = ModifiedBroyden(x0, F(x0))
    >> B.x
    4.0
    >> B.update(F(B.x))
    >> B.x                      # doctest: +ELLIPSIS
    1.59999999...
    >> B.update(F(B.x))
    >> B.x                      # doctest: +ELLIPSIS
    1.3913043478...
    >> for n in xrange(8): B.update(F(B.x))
    >> B.x
    1.0

    Here is a two-dimensional problem.

        x = 2*x*x
        y = y*y
        x, y = 0.5, 1.0

    >> def F(x):
    ...     return [x[0]*x[0]*2.0, x[1]*x[1]]
    >> x0 = [2.0, 2.0]
    >> B = ModifiedBroyden(x0, F(x0))
    >> for n in xrange(30):
    ...     B.update(F(B.x))
    >> map(lambda r: round(r, 12), B.x)
    [0.5, 1.0]
    """
    def __init__(self, x0, Fx0, alpha=1.0, W=None, max_step=np.inf,
                 mmf_method=True, DIIS=5):
        r"""Initialize the Broyden object.

        :Parameters:
          alpha : Initial Jacobian is alpha*eye(N)
          W : W is a function for computing the weights w[i].  It
              will be passed the vector arange(Ni) and should return
              the pair (w0, wi) where wi is a vector.
          mmf_method : If True then use my form of DeltaG.
        """
        self._pack, self._unpack = pack_unpack(x0)
        self.max_step = max_step
        self.max_step2 = max_step*max_step
        if W is None:
            def W(i):
                r"""Reproduce standard Broyden method."""
                w = array(i)*0.0
                w[-1] = 1.0
                return (0.001, w)

            def W(i):
                r"""A Weighted Broyden."""
                N = len(i)
                w = array(i)/N
                return (0.001, w)

            def W(i):
                r"""A Weighted Broyden."""
                N = len(i)
                w = array(i)*0.0
                w[-10:] = 1.0
                return (0.001, w)
        self.W = W
        self.mmf_method = mmf_method
        self.DIIS = DIIS
        self.X = mat(self._pack(x0))
        self.G = mat(self.X[:, 0] - self._pack(Fx0))
        self.Jinv = alpha*mat(np.eye(self.N))
        self.R = -self.Jinv*self.G
        self._next_x = self.get_next_x(self.X[:, 0], self.G[:, 0])

    @property
    def kappa(self):
        return self.G.T*self.G

    @property
    def N(self):
        r"""Number of parameters."""
        return self.X.shape[0]

    @property
    def Ni(self):
        r"""Number of iterations."""
        return self.X.shape[1]

    def get_x(self):
        return self._unpack(self._next_x)

    def set_x(self, x):
        self._next_x = self._pack(x)

    x = property(get_x, set_x, doc="New point x.")

    def update(self, Fx, x=None):
        r"""Update based on the new value of F(x).  Use the property
        x to access the next value of x.
        """
        if x is None:
            x = self._pack(self.x)
        else:
            x = self._pack(x)
        Fx = self._pack(Fx)
        G = x - Fx
        if self.mmf_method:
            dg = array(G - self.G)
            dX = array(x - self.X)
        else:
            G_ = np.hstack((self.G, G))
            X_ = np.hstack((self.X, x))
            dg = array(G_[:, 1:] - G_[:, :-1])
            dX = array(X_[:, 1:] - X_[:, :-1])
        Nni = array(np.sqrt(np.diag(mat(dg.T)*mat(dg)))
                    ).reshape((1, self.Ni)) + np.finfo(float).tiny
        DeltaG = dg/Nni
        DeltaX = dX/Nni
        (w_tilde, w) = self.W(np.arange(0, self.Ni))
        w = array(w).reshape((1, self.Ni))
        w_tilde2 = w_tilde*w_tilde
        DeltaG *= w
        DeltaX *= w
        An = w_tilde2*self.Jinv + mat(DeltaX)*mat(DeltaG).T
        Bn = w_tilde2*np.eye(self.N) + mat(DeltaG)*mat(DeltaG).T
        self.Jinv = np.linalg.solve(Bn.T, An.T).T
        self.X = np.hstack((self.X, x))
        self.G = np.hstack((self.G, G))
        self.R = np.hstack((self.R, -self.Jinv*G))
        self._next_x = self.get_next_x(x, G)

    def get_next_x(self, x, G):
        r"""Get the new x from the error G = x - F(x)."""
        dx = -(self.Jinv*G)
        if self.DIIS:
            X = self.X
            G = self.G
            if self.mmf_method:
                R = -self.Jinv*self.G
            else:
                R = self.R
            Ni = min(self.Ni, self.DIIS)
            Ri = R[:, -Ni:]
            A = np.empty((Ni+1, Ni+1), dtype=float)
            A[:Ni, :Ni] = Ri.T*Ri
            A[-1, :] = -1.0
            A[:, -1] = -1.0
            A[-1, -1] = 0.0
            B = np.zeros((Ni+1, 1))
            B[-1, 0] = -1.0
            c = mat(np.linalg.solve(A, B))
            x_new = mat((X + R)[:, -Ni:])*c[:-1, :]
            dx = x_new - x
        dx_len2 = (dx.T*dx)[0, 0]
        if self.max_step2 < dx_len2:
            dx *= math.sqrt(self.max_step2/dx_len2)
        return x + dx
[docs]class ExtendedBroyden(StateVars): r"""This is a class implementing the memory limited extended Broyden update. """ _state_vars = [ ('method', 1), # ('X', Computed, r"""$N\times n$ array of the previous states. # Also used to store basis information so that redundant # storage is not required."""), # ('G', Computed, r"""$N\times n$ array of the previous function # evaluations. Also used to store basis information so that # redundant storage is not required."""), # ('A', Computed), # ('B', Computed), ] process_vars() def get_J(self, dX, dG, w_0=0, w_r=0, Qa=None, Qb=None, b0=None, w=None, singular_tol=_SINGULAR_TOL): r"""Return `(Qa, Qb, b)` such that the inverse Jacobian is expressed as: .. math:: \mat{B} = \mat{Q}_{a}\diag(b)\mat{Q}_{b}^{T} Parameters ---------- dX, dG : array These should be $N \times n$ arrays of differences $x_{i} - x_{j}$ and $G(x_{i}) - G(x_{j})$. If there are $m$ independent points then there may be up to $n=m(m-1)/2$ independent differences, but because everything depends on these linearly, only $n=m-1$ entries are needed. The additional information can be codified in the weight matrix `w`. (If desired, the larger array may be passed -- a decomposition will be used to reduce the rank.) .. note:: The code will effectively only work with the matrices $\ket{\d\mat{X}}\mat{w}$ and $\ket{\d\mat{G}}\mat{w}$, so one could in principle pass in something like the full matrix of previous steps $\mat{X}$ and include the differencing in $\mat{w}$. w : array, (optional) This is the weight matrix. If it is one-dimensional (with length $n$) then it is assumed to be diagonal. A two-dimensional array may also be specified as discussed above. Assumed to be the identity if not provided. Qa, Qb : array These are the basis vectors in which the previous inverse Jacobian is expressed: .. math:: \mat{B}_{0} = \mat{Q}_{a}\mat{b}_{0}\mat{Q}_{b}^{T} b : array The non-identity part of the previous Jacobian expressed in the basis `Qa` and `Qb`. If one-dimensional, then it will be interpreted as the diagonals. w_0 : float The magnitude of this in relationship to `w` determines how much to pull the Jacobian back to the previous Jacobian. w_r : float The magnitude of this in relationship to `w` determines how much to pull the Jacobian back to the identity. This simulates a "restart". singular_tol : float The minimum singular value of the Jacobian and inverse Jacobian is limited by this. Values outside this range are eliminated (along with the corresponding entries in the basis vectors `Qa` and `Qb`). Returns ------- Qa, Qb : array These are $N \times m$ orthonormal arrays $\mat{Q}^{T}\mat{Q} = \mat{1}$ with $m \leq n$. b : array This diagonal array expresses the non-identity part of the inverse Jacobian $\mat{B}$ in the specified basis. This is computed from the SVD so these are positive entries between `min_svd` and `1/min_svd`. Notes ----- This method computes using the following formulae .. 
math:: \mat{b} &= \frac{-1}{\mat{j}^{-1} + \braket{\uvect{\beta}|\uvect{\alpha}}},\\ \mat{j}_{\text{1a)}} &= \left\{ \braket{\uvect{\alpha}|\d\mat{G}-\d\mat{X}}\mat{w}^2 \braket{\d\mat{X}|\uvect{\beta}} + w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}} \mat{j}_{0} \braket{\uvect{\beta}_{0}|\uvect{\beta}} \right\} \cdot \frac{1}{\braket{\uvect{\beta}|\d\mat{X}}\mat{w}^2 \braket{\d\mat{X}|\uvect{\beta}} + w_{0}^2 + w_{r}^2},\\ \mat{b}_{\text{2b)}} &= \left\{ \braket{\uvect{\alpha}|\d\mat{X}-\d\mat{G}}\mat{w}^2 \braket{\d\mat{G}|\uvect{\beta}} + w_{0}^2\braket{\uvect{\alpha}|\uvect{\alpha}_{0}} \mat{b}_{0} \braket{\uvect{\beta}_{0}|\uvect{\beta}} \right\} \cdot \frac{1}{\braket{\uvect{\beta}|\d\mat{G}}\mat{w}^2 \braket{\d\mat{G}|\uvect{\beta}} + w_{0}^2 + w_{r}^2}. Examples -------- First we start with a problem of full rank: >>> np.random.seed(0) >>> dX = np.random.rand(10, 4) >>> dG = np.random.rand(10, 4) >>> w = np.ones(4) >>> broyden = ExtendedBroyden(method=1) >>> Qa, Qb, b = broyden.get_J(dX, dG, w=w) >>> np.allclose(np.dot(Qa.T, Qa), np.eye(4)) True >>> np.allclose(np.dot(Qb.T, Qb), np.eye(4)) True In this case, the inverse Jacobian exactly satisfies the secant condition: >>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T) >>> np.allclose(np.dot(B, dG), dX) True Broyden's "bad" method is also supported >>> broyden.method = 2 >>> Qa, Qb, b = broyden.get_J(dX, dG, w=w) >>> np.allclose(np.dot(Qa.T, Qa), np.eye(4)) True >>> np.allclose(np.dot(Qb.T, Qb), np.eye(4)) True >>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T) >>> np.allclose(np.dot(B, dG), dX) True Now we consider a rank deficient problem >>> dX[:,0] = dX[:,1] >>> dG[:,0] = dG[:,1] >>> Qa, Qb, b = broyden.get_J(dX, dG, w=w) >>> b.shape (3,) >>> B = np.eye(10) + np.dot(Qa*b[None, :], Qb.T) >>> np.allclose(np.dot(B, dG), dX) True """ def inv(A): return np.linalg.pinv(A, rcond=singular_tol**2) simple_algorithm = True if w is not None: w = np.asarray(w) if simple_algorithm: # This is the simple form of the algorithm where we # process the large matrices. 
Xa = dX - dG if 1 == self.method: # Broyden's "good" method Xb = dX elif 2 == self.method: # Broyden's "bad" method Xb = dG else: raise NotImplementedError if w is not None: if 1 == len(w.shape): Xa *= w[None, :] Xb *= w[None, :] else: Xa = dot(Xa, w) Xb = dot(Xb, w) if Qa is not None: Xa = np.hstack([Xa, Qa]) if Qb is not None: Xb = np.hstack([Xb, Qb]) alpha, r_a = np.linalg.qr(Xa) beta, r_b = np.linalg.qr(Xb) if 1 == self.method: d_beta = dot(dX.T, beta) beta_alpha = dot(beta.T, alpha) if b0 is not None: bj0 = -inv(inv(b0) + beta_alpha) elif 2 == self.method: d_beta = dot(dG.T, beta) if b0 is not None: bj0 = b0 num = dot(dot(alpha.T, dX - dG), d_beta) if 1 == self.method: num *= -1 if b0 is not None: num += w_0**2*dot( dot(dot(alpha.T, Qa), bj0), dot(Qb.T, beta)) denom = dot(d_beta.T, d_beta) denom += np.eye(len(denom), dtype=float)*(w_0**2 + w_r**2) b = dot(num, inv(denom)) if 1 == self.method: # We actually computed j rather than b j = b b = -inv(inv(j) + beta_alpha) u, d, vt = np.linalg.svd(b) inds = np.where(np.logical_and( d >= singular_tol, d <= 1./singular_tol))[0] d = d[inds] Qa = dot(alpha, u[:,inds]) Qb = dot(beta, vt.T[:,inds]) return Qa, Qb, d else: dX = np.asarray(dX) dG = np.asarray(dG) dXG = dX - dG xgxg = dot(dXG.T, dXG) gg = dot(dG.T, dG) if w is not None: if 1 == len(w.shape): gg = w[:,None]*gg*w[None, :] xgxg = w[:,None]*xgxg*w[None, :] else: gg = dot(dot(w.T, gg), w) xgxg = dot(dot(w.T, xgxg), w) d2_xg, u_xg = eigh(xgxg) assert np.all(0 <= d2_xg) d_xg = np.sqrt(d2_xg) alpha_dXG = dot(u_xg*d_xg[None, :], u_xg.T) if Qa is not None: alpha0_dXG = dot(Qa.T, dXG) alpha_dXG = np.hstack([alpha_dXG, alpha0_dXG]) alpha0_alpha = np.vstack([dot(alpha0_dXG, u_xg/(d_xg[None, :] + _TINY)), np.eye(Qa.shape[1], dtype=float)]) d2_g, u_g = eigh(gg) assert np.all(0 <= d2_g) d_g = np.sqrt(d2_g) beta_dG = dot(u_g*d_g[None, :], u_g.T) if Qb is not None: beta0_dG = dot(Qb.T, dG) beta_dG = np.hstack([beta_dG, beta0_dG]) beta0_beta = np.vstack([dot(beta0_dG, u_g/(d_g[None, :] + _TINY)), np.eye(Qb.shape[1], dtype=float)]) if b0 is not None: alpha_b0_beta = 2*dot(dot(alpha0_alpha.T, b0), beta0_beta) else: alpha_b0_beta = 0 num = (dot(alpha_dXG, beta_dG.T) + w_0**2*alpha_b0_beta) denom = dot(beta_dG, beta_dG.T) denom += np.eye(len(denom), dtype=float)*(w_0**2 + w_r**2) b = self._stable_divide(num, denom) u, d, vt = np.linalg.svd(b) u = dot(u_n, u) vt = dot(vt, u_d.T) inds = np.where(np.logical_and( min_svd <= abs(d), abs(d) <= 1./min_svd))[0] Qa = dot(Qa, u[:, inds]) Qb = dot(Qb, vt.T[:, inds]) return Qa, Qb, d
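# ----------------------------------------------------------------------
# A small usage sketch for the factors returned by ExtendedBroyden.get_J
# (assumptions: `Qa`, `Qb`, `b` come from get_J as in the doctests above,
# where the full inverse Jacobian is B = 1 + Qa*diag(b)*Qb.T).  The point
# of this representation is that B never needs to be formed as a dense
# N x N matrix: applying it to a vector only needs the N x m factors.
# The helper name `_apply_extended_jinv` is illustrative only.
def _apply_extended_jinv(Qa, Qb, b, g):
    """Return B*g where B = 1 + Qa*diag(b)*Qb.T (memory-limited form)."""
    g = np.asarray(g, dtype=float)
    return g + np.dot(Qa, b*np.dot(Qb.T, g))

# For example, the quasi-Newton step for a residual g = G(x) would be
# dx = -_apply_extended_jinv(Qa, Qb, b, g), which costs O(N*m) work and
# storage rather than O(N**2).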