r"""Least squares fitting with error estimates.
Linear Least Squares
====================
The simplest least-squares is the linear least squares approach which
involves fitting the coefficients $a_{n}$ in an expansion of the form:
.. math::
y = \sum a_n f_{n}(x).
To view this as a linear algebra problem, define the matrix
.. math::
\mat{X}_{m,n} = f_{n}(x_{m})
One can then write the problem as a standard least-squares problem of
minimizing the $\chi^2$ fitness measure over the vector of parameters
$\mat{a}$:
.. math::
\chi^2 = \norm{\mat{\delta_y}^{-1}(\mat{y} - \mat{X}\cdot \mat{a})}_{2}^2.
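
For concreteness, here is a minimal sketch of how such a design matrix can be
built and the weighted problem solved directly using only :mod:`numpy` (the
basis functions and data here are made up for illustration).  The function
:func:`lsqf` below packages this together with the covariance matrix and a
quality-of-fit estimate::

    import numpy as np

    # Hypothetical basis functions f_n(x) and data (x, y, dy).
    fs = [lambda x: 1.0 + 0*x, lambda x: x, lambda x: x**2]
    x = np.linspace(0, 1, 20)
    y = 1.0 + 2.0*x + 3.0*x**2 + np.random.normal(0, 0.1, len(x))
    dy = 0.1*np.ones_like(x)

    # Design matrix X[m, n] = f_n(x_m); weight the rows by 1/dy to form chi^2.
    X = np.array([fn(x) for fn in fs]).T
    a, res, rank, sv = np.linalg.lstsq(X/dy[:, None], y/dy)
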
Error Estimates
===============
.. warning:: The error estimates discussed here are only valid under the strict
assumption that the errors in the input data are normally distributed. In
this case -- and when the model is linear over the region of variation --
then the covariance matrix $C$ may be used to provide confidence estimates as
discussed below.
For a detailed discussion, please see [PTVF:2007]_: if your errors are not
normally distributed, you may still minimize $\chi^2$ to fit your parameters
and use the covariance matrix $\mat{C}$ to determine the shape of the
confidence contours, but you *may not* determine which confidence level those
contours correspond to.  To obtain that estimate, you must perform a Monte
Carlo simulation using the proper distribution function.
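
Schematically, such a Monte Carlo estimate might look like the following
sketch, where `draw_errors` stands for a sampler of the actual (non-normal)
error distribution and `fit` for one of the fitting routines below; both
names are hypothetical placeholders::

    import numpy as np

    def monte_carlo_parameters(fit, f, x, p_best, draw_errors, M=1000):
        # Refit synthetic data sets drawn about the best-fit model and
        # return the ensemble of fitted parameters; percentiles of this
        # ensemble give the desired confidence regions.
        samples = []
        for _ in range(M):
            y_synth = f(x, p_best) + draw_errors(len(x))
            samples.append(fit(x, y_synth))
        return np.array(samples)
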
The following discussion assumes that the input data has normally distributed
errors.
Single Parameter Confidence Interval
++++++++++++++++++++++++++++++++++++
When considering a single parameter on its own, the error returned represents
the $1\sigma$ confidence interval (for normally distributed data) for that
parameter assuming that the additional parameters are chosen to minimize
$\chi^2$. We provide two types of error and covariance estimates here depending
on the value of the `scale_cov` argument:
1) If `scale_cov = False`, then the error estimates describe the standard
deviation of the distribution of the fitted parameters from a set of data
with uncorrelated normal errors with the prescribed standard deviations `dy`.
   If the model is poor, or the errors are underestimated (very low quality of
   fit `Q`), then this will give unreasonably small error estimates for the
   parameters.
2) If `scale_cov = True`, then the covariance matrix is scaled by the
reduced chi squared $\chi^2_{R} = \chi^2/\nu$ where $\nu =
N - p$ is the number of degrees of freedom ($N$ data points and $p$
parameters). This is equivalent to the previous case if
$\chi^2_{R} = 1$, and so can be thought of as scaling the
   error estimates `dy` so that the model fits.  In essence, this treats the
   problem as pure curve fitting and yields much more reasonable parameter
   error estimates, but they no longer carry the statistical interpretation of
   the previous case.  This is what the :mod:`scipy` documentation of
   :func:`scipy.optimize.leastsq` means when it states: "This matrix must be
   multiplied by the residual standard deviation to get the covariance of the
   parameter estimates".
.. note::
   One can obtain the `scale_cov = True` errors after the fact by simply
   multiplying the covariance matrix by the reduced chi-squared,
   $\mat{C} \rightarrow \mat{C}\chi^2_{R}$.  Likewise, the errors may be
   scaled as $\delta_a \rightarrow \delta_a \sqrt{\chi^2_{R}}$.

In short, if the error estimates `dy` are trusted, then do not scale
the covariance matrix; take the quality of fit `Q` seriously and
reject the parameter estimates if it is too small.  On the other
hand, if the errors are not known, or one is simply trying to
visually fit the data, then one should use the scaled covariance
matrix.
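
For example, given the unscaled results from :func:`lsqf` one can perform this
scaling after the fact (a sketch; `fs`, `x`, `y`, and `dy` are as in the
examples below)::

    import numpy as np

    a, cov, Q, F, residuals, chi2_r = lsqf(fs, x, y, dy, return_residuals=True)
    cov_scaled = cov*chi2_r                   # C -> C*chi^2_R
    da_scaled = np.sqrt(np.diag(cov_scaled))  # da -> da*sqrt(chi^2_R)
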
If you need different confidence bands, then note that the error is related to
the diagonal of the covariance matrix via:
.. math::
\delta a_0 = \pm \sqrt{\Delta C_{00}}.
One can compute $\Delta$ for the appropriate confidence level using
:func:`Delta`. Note that for a single parameter $\nu=1$ and the standard
$1\sigma$ ($68.3\%$) confidence interval, $\Delta = 1$::

    >>> Delta(nu=1.0, sigma=1.0)
    1.0

Thus, you can simply multiply the error by $\sqrt{\Delta}$ to obtain
a different confidence interval.
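
For example, to convert a quoted $1\sigma$ error `da` on a single parameter
into a $2\sigma$ ($95.4\%$) interval (a sketch; here `da` stands for the error
returned by one of the fitting routines below)::

    import numpy as np

    da_2sigma = np.sqrt(Delta(nu=1.0, sigma=2.0))*da   # ~ 2*da since Delta ~ 4
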
Multiple Parameter Confidence Intervals
+++++++++++++++++++++++++++++++++++++++
The previous characterization of errors -- and that returned in the parameter
error estimates -- is valid when each parameter is considered independently
under the assumption that the other parameters are adjusted to minimize
$\chi^2$. It is in general not valid when considering a group of parameters.
Instead, the confidence region for a group of parameters is in general an
ellipsoid constructed by the following procedure (see [PTVF:2007]_):
1) Determine $\Delta$ such that $\Gamma_{q}(\nu/2, \Delta/2) = 1-p$
where $p = 68.3\%$ for one standard deviation, $p = 95.4\%$ for two
standard deviations etc. Here $\nu$ is the number of parameters.
2) The confidence ellipse is:
.. math::
\vect{\delta}_a^T\cdot \mat{c}^{-1} \cdot \vect{\delta}_{a} = \Delta
where $\mat{c}$ is the block of the total covariance matrix
containing the corresponding parameters of interest, and
$\vect{\delta}_{a}$ is the vector of the errors in those
   parameters.  This analysis assumes that the remaining parameters are
   adjusted to minimize $\chi^2$ as the chosen parameters are varied; as a
   result, the region is slightly smaller than the extremal bounds of the
   full confidence ellipsoid.
We present the result of this here in terms of a non-linear least
squares formula. Suppose that $y = f(x, \vect{a})$, then within the
error ellipse we have
.. math::
y = f(x, \vect{a} + \vect{\delta}_{a})
\approx f + \partial_{\vect{a}}f\cdot \vect{\delta}_{a}.
If we diagonalize $\mat{c} = \mat{U}\mat{d}^2\mat{U}^{T}$ where
$\mat{d}$ is diagonal, then we can write the maximum value of $y$ as
.. math::

   f + \sqrt{\Delta} \max_{\norm{\vect{q}} = 1}
   \partial_{\vect{a}}f\cdot \mat{U}\mat{d}\vect{q}.

Thus, the extremal variations are
$\pm\sqrt{\Delta}\norm{\partial_{\vect{a}}f\cdot \mat{U}\mat{d}}$, which gives
the range of errors in $y$.  For the linear model, $\partial_{a_{n}}f = f_{n}$.
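
For the linear model this propagation can be written down directly in terms of
the covariance matrix returned by :func:`lsq` or :func:`lsqf`.  The following
sketch is essentially what the function `F` returned by :func:`lsqf` does
internally (here `X` is the matrix of derivatives $f_{n}(x_{m})$)::

    import numpy as np

    def y_error_band(X, cov, sigma=1.0):
        # Half-width of the error band: dy(x)^2 = Delta * (df/da) C (df/da)^T
        # evaluated row by row, where Delta corresponds to the joint
        # confidence region for all of the parameters.
        D = Delta(nu=cov.shape[0], sigma=sigma)
        return np.sqrt(D*np.einsum('ij,jk,ik->i', X, cov, X))
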
Example
+++++++
Here we demonstrate this with a polynomial fit with data sampled with a Gaussian
distribution. We generate the data with a fixed standard deviation, but first
underestimate the errors.  We then apply `scale_cov=True` and show that this
is consistent with a proper error estimate.
.. plot::
    :width: 100%
    :include-source:

    import numpy as np
    import matplotlib.pyplot as plt
    import mmf.fit.lsq

    np.random.seed(0)
    fs = [lambda x: 1, lambda x: x, lambda x: x*x]
    def f(x, p): return fs[0](x)*p[0] + fs[1](x)*p[1] + fs[2](x)*p[2]
    x = np.linspace(0.3, 1.3, 10); p0 = [1, 2, 3]; y0 = f(x, p0)
    X = np.linspace(-0.1, 1.5, 100)
    dy0 = 2.0   # Large scatter in points
    y = y0 + np.random.normal(0, dy0, len(y0))
    dy = 0.5    # Underestimate of errors

    plt.figure(figsize=[16, 6])
    plt.subplot(121)
    (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=False)
    plt.errorbar(x, y, 0*y + dy, fmt='.b', ecolor='b')
    plt.errorbar(0, p[0], np.sqrt(cov[0, 0]), fmt='dk')
    plt.plot(X, F(X, a=p)[0], '-k',
             X, F(X, a=p)[0] + F(X, a=p)[1], '--b',
             X, F(X, a=p)[0] - F(X, a=p)[1], '--b')
    plt.title("Q=%g" % Q)
    plt.axis((-0.1, 1.5, 0, 15))

    plt.subplot(122)
    (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=True)
    plt.errorbar(x, y, 0*y + dy0, fmt='.g', ecolor='g')
    plt.errorbar(0, p[0], np.sqrt(cov[0, 0]), fmt='dk')
    plt.plot(X, F(X, a=p)[0], '-k',
             X, F(X, a=p)[0] + F(X, a=p)[1], ':g',
             X, F(X, a=p)[0] - F(X, a=p)[1], ':g')
    (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy0, scale_cov=False)
    plt.errorbar(0, p[0], np.sqrt(cov[0, 0]), fmt='dk')
    plt.plot(X, F(X, a=p)[0], '-k',
             X, F(X, a=p)[0] + F(X, a=p)[1], '--g',
             X, F(X, a=p)[0] - F(X, a=p)[1], '--g')
    plt.title("Q=%g" % Q)
    plt.axis((-0.1, 1.5, 0, 15))

As an example, here is a plot of some data scattered with a standard
deviation of `2` but fit with underestimated errors of `0.5` (left).
The quality of this fit is extremely poor (`Q=0`) and the error bounds (blue
dashed curves) are correspondingly underestimated. This is the type
of fluctuation that the fitted curve would exhibit if the data
fluctuated with the smaller standard deviation, but does not describe
the scatter in the data. If the error bars are trusted here and the
errors are normally distributed etc. then one would have to conclude
that the model is not very good.
On the right we consider the same fit, but with `scale_cov = True` (green dotted
curves) which much better captures the visual scatter of the data. In fact,
this agrees with the error bounds (green dotted curves) that are obtained by
setting the appropriate standard deviation `dy = 2` (green error bars) with
`scale_cov = False`. In this case the quality of the fit `Q = 0.4` is good and
the error bounds agree well "visually" with the scatter in the data.  This shows
how -- if one has normally distributed errors of the same size -- one can use
`scale_cov=True` to estimate the errors.
Finally, since this is a polynomial model, one might think that the error in the
intercept would be simply the error in the parameter $a_0$ (shown as a black
error bar with a diamond at $x=0$). This is the $1\sigma$ confidence interval
for the single parameter $a_0$ when all other parameters are adjusted to
minimize the $\chi^2$. This corresponds to $\Delta(\sigma=1, \nu=1)=1$. The
error band shown in the previous plot corresponds to the joint error in which
all three parameters vary together, $\sqrt{\Delta(\sigma=1, \nu=3)}=1.88\cdots$,
which is the larger ellipse shown below.  This is a projection of the $1\sigma$ "joint"
distribution ellipsoid for all three parameters.
.. plot::
    :width: 100%
    :include-source:

    import numpy as np
    import matplotlib.pyplot as plt
    import mmf.fit.lsq

    np.random.seed(0)
    fs = [lambda x: 1, lambda x: x, lambda x: x*x]
    def f(x, p): return fs[0](x)*p[0] + fs[1](x)*p[1] + fs[2](x)*p[2]
    x = np.linspace(0.3, 1.3, 10); p0 = [1, 2, 3]; y0 = f(x, p0)
    X = np.linspace(-0.1, 1.5, 100)
    dy = 2.0
    y = y0 + np.random.normal(0, dy, len(y0))
    (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=False)

    # Generate confidence ellipses for a pair of parameters, e.g. a_0 and a_1
    def error_band(Cproj, sigma=1.0, nu=1.0):
        u, d, vt = np.linalg.svd(Cproj)
        Delta = mmf.fit.lsq.Delta(sigma=sigma, nu=nu)
        theta = np.linspace(0, 2.0*np.pi, 100)
        dx = np.sqrt(Delta)*np.exp(1j*theta)
        dp = np.dot(np.dot(u*np.sqrt(d), vt), [dx.real, dx.imag])
        assert np.allclose(Delta,
                           (np.dot(dp.T, np.linalg.inv(Cproj))*dp.T).sum(axis=1))
        return dp

    plt.figure(figsize=[16, 6])
    ls = ['k-', 'g:', 'b--']

    plt.subplot(121)
    C01 = cov[:2, :2]
    for _l, _nu in enumerate([1, 2, 3]):
        dp = error_band(C01, nu=_nu)
        plt.plot(p[1] + dp[1], p[0] + dp[0], ls[_l])
        Delta = mmf.fit.lsq.Delta(sigma=1.0, nu=_nu)
        plt.axhline(p[0] + np.sqrt(Delta*cov[0, 0]), c=ls[_l][0], ls=ls[_l][1:])
        plt.axhline(p[0] - np.sqrt(Delta*cov[0, 0]), c=ls[_l][0], ls=ls[_l][1:])
    plt.xlabel("a_1")
    plt.ylabel("a_0")

    plt.subplot(122)
    C02 = cov[[0, 2], :][:, [0, 2]]
    for _l, _nu in enumerate([1, 2, 3]):
        dp = error_band(C02, nu=_nu)
        plt.plot(p[2] + dp[1], p[0] + dp[0], ls[_l])
        Delta = mmf.fit.lsq.Delta(sigma=1.0, nu=_nu)
        plt.axhline(p[0] + np.sqrt(Delta*cov[0, 0]), c=ls[_l][0], ls=ls[_l][1:])
        plt.axhline(p[0] - np.sqrt(Delta*cov[0, 0]), c=ls[_l][0], ls=ls[_l][1:])
    plt.xlabel("a_2")
    plt.ylabel("a_0")

The functions defined here perform both linear least-squares fitting and
non-linear least-squares fitting, the latter via :func:`scipy.optimize.leastsq`.
References
==========
.. [PTVF:2007] William H. Press, Saul A. Teukolsky, William T. Vetterling, and
Brian P. Flannery, "Numerical Recipes: The Art of Scientific Computing",
Cambridge University Press, (2007)
"""
from __future__ import division
__all__ = ['leastsq',
'lsq',
'lsqf',
'lsqm',
'Delta',
]
import math
import numpy as np
import scipy as sp
import scipy.special
import scipy.optimize
import scipy.stats
def leastsq(f, p0, x, y, dy=None, df=None, scale_cov=False,
            epsfcn=0.0, factor=100, ftol=1e-8, diag=None):
    r"""Return `(p, dp, Q, cov)` (or `(p, dp, Q, cov, F)` if `df` is
    provided) for a least-squares fit of the function `y = f(x, p)`
    for the parameters `p`.
Parameters
----------
f : function
This should be a vectorized function `f(x, p)`. A common
example for polynomial fitting is::
def f(x, p):
return np.polyval(p, x)
df : function, (optional)
This should return the derivatives of `f` with respect to the
parameters as a column vector (broadcast as `x[None, :]` and
`p[:,None]`), for example, for polynomial fitting::
def df(x, p):
return x**np.arange(len(p))[::-1,None]
p0 : array
Initial guess for parameters.
x, y : array
This is the data
dy : array, (optional)
The errors. These should be the standard deviations for the
values `y`. (The whole analysis is predicated on the
assumption of Gaussian errors.)
scale_cov : bool, (optional)
If this is `True`, then the covariance matrix and error
estimates will be scaled by `chi/sqrt(nu)` where `nu = len(x) -
len(p)` is the number of degrees of freedom. This scaling
allows one to extract parameter estimates even if the errors
`dy` are not properly estimated.
.. note:: We do not use Bessel's correction here (which would count the
number of degrees of freedom as `nu = len(x) - len(p) -1`). This
might lead to a slight bias in the error estimates but allows the code
to work in plausible cases such as fitting two-points with one
parameter. We follow [PTVF:2007]_ here, not Wikipedia for
example.
Returns
-------
p : array
Parameter estimates (mean values).
dp : array
Estimate of the errors in parameters assuming that they are
independent. Note that the actual 1-sigma confidence region
for the parameters is the ellipsoid
        .. math::
           \vect{\delta}_{p}^{T} \cdot \mat{c}^{-1} \cdot \vect{\delta}_{p} = \Delta.
The meaning of these errors are that if a set of data is
generated with uncorrelated normally distributed errors in `y`
with standard deviation `dy` about mean values `f(x, p)`, then
the estimates for the parameters `p` resulting from the fits
will be approximately normally distributed with standard
deviation `dp`.
cov : array
Two-dimensional covariance matrix $\mat{c}$. This defines the
1-sigma confidence interval if the errors are normally
distributed.
Q : float
Quality of fit (should be between 0.01 and 0.1). If errors are
not specified or are inaccurate, then this is meaningless.
Examples
--------
We start with a linear example of polynomial fitting as discussed
above:
>>> def f(x, p):
... return np.polyval(p, x)
>>> def df(x, p):
... return x**np.arange(len(p))[::-1,None]
Here are our parameters `f = 1 + 2*x + 3*x**2` and some errors:
>>> p0 = [3, 2, 1]
>>> dp0 = [0.03, 0.02, 0.01]
Let's generate some normally distributed data over an abscissa
with some errors `dy` that we can estimate using standard error
analysis:
>>> np.random.seed(1)
>>> x = np.linspace(0, 3, 50)
>>> dy = np.sqrt(((df(x, p0)*np.array(dp0)[:,None])**2).sum(axis=0))
>>> y = f(x, p0) + np.random.normal(0, dy, len(x))
Now we do the fit: we try with and without `df` to ensure that
both are the same:
>>> p,dp,Q,cov = leastsq(f=f,p0=[0,0,0],x=x,y=y,dy=dy)
>>> p1,dp1,Q1,cov1,F = leastsq(f=f,p0=[0,0,0],x=x,y=y,dy=dy,df=df)
>>> (np.allclose([p, dp], [p1, dp1]), np.allclose(cov, cov1))
(True, True)
Providing the derivative allows us to reduce the number of
function calls though. Now we generate a bunch of data and
tabulate the fitted parameter values to see if the error estimates
are reasonable:
>>> P_ = np.array([
... leastsq(f=f, p0=p, x=x, dy=dy, df=df,
... y = f(x, p0) + np.random.normal(0, dy))[0]
... for n in xrange(100)])
>>> P_.mean(axis=0)
array([ 3.00..., 2.00..., 0.99...])
>>> P_.std(axis=0)
array([ 0.012..., 0.022..., 0.0058...])
This agrees with the previously calculated errors of the fit:
>>> dp
array([ 0.012..., 0.020..., 0.0058...])
    .. note:: The precise meaning of this error estimate is that it describes
       the standard deviation of the fitted parameter values given
       independent normally distributed errors in the data with
       standard deviation `dy`.  These are not estimates of the
       distribution of the original parameters, which would be
       determined directly from the errors in the data:

       >>> np.sqrt(np.dot(np.linalg.pinv((df(x, p0).T**2)), dy**2))
       array([ 0.03, 0.02, 0.01])

       This relationship between `dy` and the distribution of the
       original parameters has nothing to do with the fit.  Were we to
       provide more data for the fit, the error in the parameters `dp`
       would decrease: we would be determining the mean of the parameter
       distribution more and more precisely.

We also consider the extrapolation error using the model with the
fitted parameters
>>> x0 = 5.0 # Extrapolate to here
>>> y0_ = np.array([
... f(x0, leastsq(f=f, p0=p, x=x, dy=dy, df=df,
... y = f(x, p0) + np.random.normal(0, dy))[0])
... for n in xrange(200)])
>>> y0_.mean(), y0_.std()
(86.00..., 0.227...)
>>> F(x0)
(86.18..., 0.220...)
This agrees with the previously calculated errors of the fit:
>>> dp
array([ 0.012..., 0.020..., 0.0058...])
Now we consider a non-linear model:
    .. math::
       f(x) = \sqrt{\left(\frac{x^2}{2m} - \mu\right)^2 + \Delta^2}
First we generate some data, then we fit and extract the parameters
>>> import numpy as np
>>> from numpy import linspace, array, mean, std, ones, sqrt
>>> from numpy.random import normal as rand
>>> np.random.seed(1)
>>> def f(x, m, mu, Delta):
... return sqrt((x**2/2/m-mu)**2+Delta**2)
>>> m, dm = 2.0, 0.01
>>> mu, dmu = 2.0, 0.02
>>> Delta, dDelta = 2.0, 0.03
>>> x = linspace(0, 3, 50)
>>> def get_y_dy(x, N=1000):
... ys = [f(x0, rand(m,dm), rand(mu,dmu), rand(Delta,dDelta)) for
... x0 in x*ones(N)]
... y = ys[0]
... dy = std(ys)
... return (y,dy)
>>> y_dy = [get_y_dy(x0) for x0 in x]
>>> y = [ydy[0] for ydy in y_dy]
>>> dy = [ydy[1] for ydy in y_dy]
>>> #from pylab import errorbar,ion,clf
>>> #ion();clf()
>>> #errorbar(p,y,dy);
>>> #import pdb;pdb.set_trace()
>>> def fn(p, params):
... m, mu, Delta = params
... return f(p,m,mu,Delta)
>>> p, err, Q, cov = leastsq(fn, [m, mu, Delta], x, y, dy)
    Is the fit good?
>>> Q
0.6...
Yes. This is a Gaussian model and so the fit should be good. In
general, if the Gaussian approximation is not perfect, then `Q`
may be lower. Values of 0.001 may still be acceptable. We check
that the error estimates are also reasonable:
>>> abs(p - array([m, mu, Delta]))/err
array([ 1.3..., 1.6..., 1.6...])
A brief note about how the errors and covariances scale. If we scale the
errors `dy` by a common factor `s`, then the fit `p` remains unchanged while
the error-estimates scale by `s` and the covariance matrix scales by `s**2`:
>>> s = 0.5
>>> dy1 = s*np.array(dy)
>>> p1, err1, Q1, cov1 = leastsq(fn, [m, mu, Delta], x, y, dy1)
>>> np.allclose(p, p1)
True
>>> np.allclose(err1, s*err)
True
>>> np.allclose(cov1, s**2*cov)
True
Note that the `Q` value is horrible though because we have made the errors
too small.
>>> Q1
1...e-16
    The argument `scale_cov` scales the errors in this way so that the reduced
    chi-squared $\chi^2_R \propto s^{-2}$ becomes one, and uses this to
    estimate the errors.
>>> chi2 = (((fn(x, p1) - y)/dy1)**2).sum()
>>> chi2_r = chi2/(len(x) - len(p1))
>>> s1 = np.sqrt(chi2_r)
>>> p2, err2, Q2, cov2 = leastsq(fn, [m, mu, Delta], x, y, dy1,
... scale_cov=True)
>>> np.allclose(err2, err1*s1)
True
"""
x = np.array(x)
y = np.array(y)
N = len(y) # Number of data points
try:
Na = len(p0) # Number of parameters
except TypeError:
Na = 1
if dy is None:
dy = y*0.0 + 1.0
else:
dy = np.array(dy)
def residuals(p, x=x, y=y, dy=dy):
return (f(x, p) - y)/dy
if df is None:
d_residuals = None
else:
def d_residuals(p, x=x, y=y, dy=dy):
return df(x, p)/dy
# This sometimes fails even where there is a reasonable answer.
# See scipy ticket #984
#def _f(x, *p):
# return f(x, p)
#p, pcov = scipy.optimize.curve_fit(
# _f, xdata=x, ydata=y, p0=p0, sigma=dy,
# epsfcn=epsfcn, factor=factor)
res = sp.optimize.leastsq(residuals, x0=p0, full_output=1,
Dfun=d_residuals, col_deriv=True,
epsfcn=epsfcn, factor=factor,
ftol=ftol, diag=diag)
(p, pcov, infodict, errmsg, ier) = res
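    # Note: because the residuals are already weighted by 1/dy, the matrix
    # returned by MINPACK, pcov ~ (J^T J)^{-1}, is directly the parameter
    # covariance matrix; no additional scaling by the residual variance is
    # needed unless scale_cov is requested below.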
#return res
#fjac = infodict['fjac']
#ipvt = infodict['ipvt']
# Compute the one sigma error estimates from the chi square:
chi_sq = (residuals(p)**2).sum()
    if pcov is None:
        # The fit failed to produce a covariance estimate: return infinite
        # errors rather than crashing when taking the diagonal below.
        pcov = np.inf*np.ones((Na, Na))
#D = Delta(N - Na, 1) # This does not seem to work.
#pcov *= D
if N <= Na:
# Special case since gammainc only works for positive values.
Q = 0.0
        pcov = np.inf*np.ones((Na, Na))
chi_sq_red = np.inf
elif chi_sq == 0.0:
Q = 1.0
chi_sq_red = 0.0
else:
Q = 1.0 - scipy.special.gammainc((N-Na)/2.0, chi_sq/2.0)
chi_sq_red = chi_sq/(N - Na)
if scale_cov:
pcov *= chi_sq_red
shape = np.shape(p0)
# Compute independent error estimates in the parameters
err = np.reshape(np.sqrt(np.diag(pcov)),shape)
p = np.reshape(p, shape)
if df is not None:
# If so, we can provide an extrapolation estimate. Note that
# I have not figured out how to do the sigma extrapolation yet.
def F(x, p=p, f=f, df=df, # sigma=1
):
r"""Returns (y,dy) where y = sum(a[n]*f[n](x)) and dy is the
error estimate."""
y = f(x, p)
J = df(x, p)
dy = np.sqrt(np.diag(np.dot(np.dot(J.T, pcov), J))).ravel()
            if np.asarray(y).shape == ():
dy = dy[0]
return y, dy
return p, err, Q, pcov, F
else:
return p, err, Q, pcov
def lsq(X, y, dy=None):
    r"""Return `(a, cov, Q, residuals)` for the linear least-squares
    problem
.. math::
\min_{a}\norm{X\cdot a - y}^2_2.
Parameters
----------
    X : 2d array (N x Na)
        Abscissa.  The least-squares problem is `norm(dot(X, a) - y)`.
y : (N x 1) array of floats
Data values.
dy : (N x 1) array of floats, optional
Errors in data (assumed to be normally distributed). Assumed to
be `1` if not provided.
Returns
-------
a : vector
Vector of parameters estimates
cov : matrix
Covariance matrix
    Q : float
        Estimate of the goodness of the fit.  (`Q` should be between
        0.001 and 0.1.  Larger values indicate that the errors
        have been over-estimated.  Smaller values mean that the
        model is bad, or the errors have been underestimated, or
        the errors are not normally distributed.)
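
    Examples
    --------
    A minimal sketch (here with exact, noise-free data so that the fit
    recovers the coefficients of `y = 1 + 2*x` to numerical precision; with
    exact data and no `dy`, the quality `Q` is not meaningful):

    >>> x = np.linspace(0, 1, 10)
    >>> X = np.vstack([np.ones_like(x), x]).T
    >>> a, cov, Q, residuals = lsq(X, 1 + 2*x)
    >>> np.allclose(a, [1, 2])
    True
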
References
----------
See [PTVF:2007]_.
"""
N, Na = X.shape
y = np.array(y).reshape((N,1))
if dy is None:
dy = 1
dy = np.asarray(dy)
    if dy.shape != ():
dy = np.array(dy).reshape((N,1))
if np.min(dy) == 0.0:
dy += 1e-20
A = X/dy
y = y.ravel()
dy = dy.ravel()
Y = y/dy
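    # Solve the weighted problem via the SVD of the weighted design matrix,
    # A = U diag(d) V^T, so that a = V diag(1/d) U^T Y and the covariance
    # matrix is cov = (A^T A)^{-1} = V diag(1/d^2) V^T.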
U, d, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T
if N < Na:
        raise NotImplementedError("Needs to be tested...")
# Pad missing entries with large values:
dU = np.ones(Na,float)/np.finfo(float).eps
dU[:N] = 1./d*np.dot(U.T,Y)
a = np.dot(V.T,dU)
d2inv = np.ones(Na,float)/np.finfo(float).eps**2.0
d2inv[:N] = 1./d**2
cov = np.dot(V.T*d2inv, V)
else:
a = np.dot(np.dot(Y.T, U/d), V.T)
cov = np.dot(V/d**2, V.T)
residuals = np.dot(A,a) - Y
residuals2 = np.dot(residuals, residuals)
chi_sq = np.sum(residuals2)
if N <= Na:
# Special case since gammainc only works for positive values.
Q = 0.0
elif chi_sq == 0.0:
Q = 1.0
else:
Q = 1.0 - scipy.special.gammainc((N-Na)/2.0,chi_sq/2.0)
return a, cov, Q, residuals
def Delta(nu, sigma):
r"""Return $\Delta$ required to estimate the errors for $\nu$
parameters within a confidence interval of $\sigma$ standard
deviations.
.. math::
\gamma_{q}(\nu/2, \Delta/2) = p(\sigma)
    where $p(\sigma)$ is the desired confidence from the normal
    distribution, i.e. $68.3\%$ for $\sigma = 1$, $95.4\%$ for
    $\sigma=2$, etc.
Examples
--------
>>> Delta(1, 1)
1.0
>>> Delta(1, 2)
3.9999...
>>> Delta(2, 1)
2.295...
>>> Delta(4, 2)
9.715...
"""
p = 2*scipy.stats.distributions.norm.cdf(sigma) - 1
def f(Delta):
return scipy.special.gammainc(nu/2.0, Delta/2.0) - p
Delta_max = 1
while f(Delta_max) < 0:
Delta_max *= 2
return scipy.optimize.brentq(f, 0, Delta_max)
def lsqf(f, x, y, dy, scale_cov=False, return_residuals=False):
r"""Return `(a, cov, Q, F)` or `(a, cov, Q, F, residuals, chi2r)` for a
least squares fit of the data `(x, y)` to the function::
y = a[0]*f[0](x) + a[1]*f[1](x) + ...
Parameters
----------
f : 1d array of Na functions
x : 1d array of N parameters
Abscissa
y : 1d array of N floats
Data values
dy : 1d array of N floats
Errors in data (assumed to be normally distributed).
scale_cov : bool, (optional)
If this is `True`, then the covariance matrix and error
estimates will be scaled by `chi/sqrt(nu)` where `nu = len(x) -
len(p)` is the number of degrees of freedom. This scaling
allows one to extract parameter estimates even if the errors
`dy` are not properly estimated.
return_residuals : bool
If `True`, then data residuals will be returned.
Returns
-------
a : vector
Vector of parameters estimates
cov : matrix
Covariance matrix
    Q : float
        Estimate of the goodness of the fit.  (`Q` should be between
        0.001 and 0.1.  Larger values indicate that the errors
        have been over-estimated.  Smaller values mean that the
        model is bad, or the errors have been underestimated, or
        the errors are not normally distributed.)
F : function
This function takes abscissa `x -> (y,dy)` where `y = f(x)` is
the best fit and dy is the estimated error. The error
estimate uses the full covariance matrix.
residuals : Residuals if `return_residuals` is `True`.
chi2r : Reduced chi square if `return_residuals` is `True`.
Examples
--------
Generate a bunch of points on the parabola y = x**2-2*x+1 with
errors and fit them.
>>> scipy.random.seed(0)
>>> N = 100
>>> f = [lambda x:1.0,lambda x:x,lambda x:x**2]
>>> a0 = [1.0,-2.0,1.0]
>>> Na = len(a0)
>>> def F(x,a=a0,f=f):
... y = np.zeros(len(x),float)
... for n in xrange(len(a)):
... y += a[n]*f[n](x)
... return y
>>> x = scipy.random.uniform(-3,3,N)
>>> dy = scipy.random.uniform(0,0.1,N)
>>> y = scipy.random.normal(F(x, a=a0),dy)
>>> a, cov, Q, F = lsqf(f,x,y,dy)
>>> assert Q < 0.9 and 0.1 < Q, "Q value not good for data! Q=%f" % Q
>>> #from pylab import *;ion();clf()
>>> #errorbar(x,y-F(x, a=a)[0],dy,fmt='.')
>>> #X = np.linspace(-3,3,1000)
>>> #Y,dY = F(X, a=a)
>>> #plot(X,dY,X,-dY)
References
----------
See [PTVF:2007]_.
"""
Na = len(f)
N = len(x)
if isinstance(x,np.ndarray) or np.isscalar(x[0]):
# Vectorize if x is an array
x_is_array = True
x = np.array(x).reshape(N)
else:
# Allow for a list of tuples for example...
x_is_array = False
X = np.empty((N,Na),float)
for n in xrange(Na):
if x_is_array:
X[:,n] = f[n](x)
else:
for m in xrange(N):
# Explicit loop to allow for general abscissa
X[m,n] = f[n](x[m])
a, cov, Q, residuals = lsq(X, y, dy)
chi_sq = np.sum(residuals**2)
if N <= Na:
chi_sq_red = np.inf
else:
chi_sq_red = chi_sq/(N - Na)
if scale_cov:
cov *= chi_sq_red
# Compute error estimates
assert np.allclose(cov, cov.T)
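    # Factor cov = (U*d) (U*d)^T with d = sqrt(eigenvalues) so that Ud acts as
    # a matrix square root of the covariance; F below uses it to propagate the
    # parameter errors to an error band in y.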
d, U = np.linalg.eigh(cov); d = np.sqrt(d)
Ud = np.asarray(U*d)
def F(x, a=a, f=f, Ud=Ud, sigma=1):
r"""Returns (y,dy) where y = sum(a[n]*f[n](x)) and dy is the
error estimate.
Parameters
----------
x : float or array
Abscissa: the function is vectorized, so arrays can be passed (as long
as the functions `f` are vectorized).
f : [function]
List of functions. This can be replaced. For example, you might
replace this by a list of the derivatives of ``f[n](x)`` to compute
the error in the extrapolated derivative.
a : [float]
            Fit coefficients.  The model is ``sum([a[n]*f[n](x) for n in ...])``.
            This should not be changed.
Ud: matrix
This should not be changed.
sigma : float
This should not be changed.
"""
try:
N = len(x)
except TypeError:
x = [x]
N = 1
x = np.array(x)
y = np.zeros(N,float)
dy = np.zeros(N,float)
X = np.empty((N,len(a)),float)
for n in xrange(len(a)):
if x_is_array:
X[:,n] = f[n](x)
y += a[n]*f[n](x)
else:
for m in xrange(N):
# Explicit loop to allow for general abscissa
X[m,n] = f[n](x[m])
y[m] += a[n]*f[n](x[m])
X = np.asarray(X)
tmp = np.dot(X, Ud)
dy = np.sqrt(Delta(nu=len(a), sigma=sigma)
*np.diag(np.dot(tmp, tmp.T))).reshape(y.shape)
return y, dy
if return_residuals:
return a, cov, Q, F, residuals, chi_sq_red
else:
return a, cov, Q, F
def lsqm(model, x, y, dy, active=None):
    r"""Perform a least-squares fit using the functions defined in the
    model dictionary and return a dictionary of results with each fitted
    parameter stored as the complex number `value + 1j*error`, along with
    the entries `'residuals'`, `'Q'`, `'Cov'`, `'F'`, and `'chi2r'`."""
if active is None:
active = model.keys()
f = [model[param] for param in active]
a, cov, Q, F, residuals, chi2r = lsqf(f,x,y,dy,return_residuals=True)
res = {}
for n in xrange(len(active)):
res[active[n]] = a[n] + 1j*np.sqrt(cov[n,n])
res['residuals'] = residuals
res['Q'] = Q
res['Cov'] = cov
res['F'] = F
res['chi2r'] = chi2r
return res
def _test_1(N=100):
r"""Test a linear fit."""
f = [lambda x:1.0,
lambda x:x]
a0 = [1.0,-2.0]
Na = len(a0)
def F(x, a=a0, f=f):
y = np.zeros(len(x),float)
for n in xrange(len(a)):
y += a[n]*f[n](x)
return y
scipy.random.seed(0)
x = scipy.random.uniform(-3,3,N)
dy = scipy.random.uniform(0,0.1,N)
y = scipy.random.normal(F(x, a=a0),dy)
a, cov, Q, F = lsqf(f,x,y,dy)
print("Q=%f, a=%s, err=%s"%(Q,str(a),str(np.sqrt(np.diag(cov)))))
assert Q < 0.9 and 0.1 < Q, "Q value not good for data! Q=%f" % Q
#from pylab import *;ion();clf()
#errorbar(x,y-F(x, a=a)[0],dy,fmt='.')
X = np.linspace(-3,3,1000)
Y,dY = F(X, a=a)
#plot(X,dY,X,-dY)
return Q, a, np.sqrt(np.diag(cov))
def _test_linear(N=6, M=200, plot=False):
r"""Linear least squares problem.
Here we generate some data to test the linear version. We apply
this to the polynomial fitting problem.
"""
fs = [lambda x:x*x*x, lambda x:x*x, lambda x:x, lambda x:1.0 + 0*x]
def f(x, p):
return sum(f_(x)*p_ for (f_, p_) in zip(fs, p))
def df(x, p):
return np.vstack([f_(x) for f_ in fs])
x = np.linspace(0, 3, N)
p0 = [4,3,2,1]
y0 = f(x, p0)
np.random.seed(0)
dy = np.random.rand(N)/10 + 0.01
y = y0 + np.random.normal(0, dy)
p, cov, Q, F = lsqf(fs, x, y, dy)
p1, dp1, Q1, cov1, F1 = leastsq(f=f, p0=p0, x=x, y=y, dy=dy, df=df)
assert np.allclose(p, p1)
assert np.allclose(cov, cov1)
p_ = []
cov_ = []
Q_ = []
chi2_ = []
for i in xrange(M):
y = y0 + np.random.normal(0, dy)
p, cov, Q, F = lsqf(fs, x=x, y=y, dy=dy)
chi2_.append(np.sum(((y - f(x, p))/dy)**2))
#p, dp, Q, cov, F = leastsq(f=f, p0=p0, x=x,
# y=y0 + np.random.normal(0, dy),
# dy=dy, df=df)
p_.append(p)
Q_.append(Q)
cov_.append(cov.ravel())
p_ = np.array(p_)
chi2_ = np.array(chi2_)
cov_ = np.array(cov_)
cov = cov_.mean(axis=0).reshape(cov.shape)
dcov = cov_.std(axis=0).reshape(cov.shape)
    if plot:
        import matplotlib.pyplot as plt
        # Compare the scatter of the fitted parameters with the error
        # estimate implied by the (mean) covariance matrix.
        dp = np.sqrt(np.diag(cov))
        for i in xrange(len(p0)):
            plt.hist(p_[:, i], int(np.sqrt(M)), normed=True)
            _x = np.linspace(*(plt.axis()[:2] + (100,)))
            plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], dp[i]))
    return p_, cov, dcov
def _test_nl(N=100, M=1000, plot=False):
r"""Nonlinear test problem.
Here we fit the following function to some data with errors:
.. math::
f(x) &= \sqrt{\left(\frac{x^2}{2m} - \mu\right)^2 + \Delta^2},\\
f_{,m}(x) &= -\frac{x^2}{2m^2f(x)}\left(\frac{x^2}{2m} - \mu\right),\\
f_{,\mu}(x) &= -\frac{1}{f(x)}\left(\frac{x^2}{2m} - \mu\right),\\
f_{,\Delta}(x) &= \frac{\Delta}{f(x)},\\
First we generate `N` points of data with Gaussian errors, then we
fit and extract the parameters. We do this `M` times and bin the
results to show the distribution.
Note that the error estimates are consistent with the distributed
errors, but not with the original parameter distribution. I am
not quite sure why... The latter discrepancy may be due to the
non-linearity in the model, but linear models do not fare any better.
"""
from numpy.random import normal as rand
if plot:
import matplotlib.pyplot as plt
# First define the function to be modelled using sympy so we can
# compute derivatives.
import sympy
x, m, mu, Delta = sympy.symbols(['x', 'm', 'mu', 'Delta'])
f_expr = sympy.sqrt((x**2/2/m - mu)**2 + Delta**2)
    _d = {'sqrt': np.sqrt}   # 'sqrt' must be available when f is evaluated.
exec "def f(x, m, mu, Delta): return %s" % str(f_expr) in _d
f = _d['f']
# These are the parameters and the standard deviations
m0, dm0 = 2.0, 0.02
mu0, dmu0 = 3.0, 0.04
Delta0, dDelta0 = 4.0, 0.06
p0 = [m0, mu0, Delta0]
dp0 = [dm0, dmu0, dDelta0]
# Generate a set of abscissa
x = np.linspace(0, 3, N)
# Generate the random parameters with a normal distribution
m_ = rand(m0, dm0, (N, M))
mu_ = rand(mu0, dmu0, (N, M))
Delta_ = rand(Delta0, dDelta0, (N, M))
# Broadcast the parameters
y_ = f(x[:, None], m_, mu_, Delta_)
# Check the distribution of the errors. Here are the mean and std
# from the data:
y_avg, y_std = y_.mean(axis=1), y_.std(axis=1)
# And the predicted values using linear response
    args = dict(x=x, m=m0, mu=mu0, Delta=Delta0, sqrt=np.sqrt)
#y0 = eval(str(f_expr), args)
dy0 = np.sqrt(eval(str((f_expr.diff('m')*dm0)**2 +
(f_expr.diff('mu')*dmu0)**2 +
(f_expr.diff('Delta')*dDelta0)**2), args))
# Check that these are consistent with zero.
    assert abs((y_std - dy0).mean()/(y_std - dy0).std()) < 0.2
# Now use the data to generate a set of fits
def fn(x, params):
m, mu, Delta = params
return f(x, m, mu, Delta)
p_ = []
err_ = []
Q_ = []
for i in xrange(M):
p, err, Q, cov = leastsq(fn, p0, x, y_[:,i], dy0)
p_.append(p)
Q_.append(Q)
err_.append(err)
p_ = np.array(p_)
err_ = np.array(err_)
if plot:
for i in xrange(3):
            plt.hist(p_[:, i], int(np.sqrt(M)), normed=True)
_x = np.linspace(*(plt.axis()[:2] + (100,)))
plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], dp0[i]), ':')
plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], err_.mean(axis=0)[i]))
return x, y_, f