Source code for mmf.fit.lsq

r"""Least squares fitting with error estimates.

Linear Least Squares
====================
The simplest least-squares is the linear least squares approach which
involves fitting the coefficients $a_{n}$ in an expansion of the form:

.. math::
   y = \sum a_n f_{n}(x).

To view this as a linear algebra problem, define the matrix

.. math::
   \mat{X}_{m,n} = f_{n}(x_{m})

One can then write the problem as a standard least-squares problem of
minimizing the $\chi^2$ fitness measure over the vector of parameters
$\mat{a}$:

.. math::
   \chi^2 = \norm{\mat{\delta_y}^{-1}(\mat{y} - \mat{X}\cdot \mat{a})}_{2}^2.
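
For concreteness, here is a minimal sketch of this weighted problem using only
NumPy (the basis functions, data, and errors are illustrative; the function
:func:`lsqf` defined below constructs the same matrix $\mat{X}$ internally)::

    import numpy as np

    x = np.linspace(0, 1, 20)                       # abscissa
    y = 1.0 + 2.0*x + 3.0*x**2                      # data (noise-free here)
    dy = 0.1*np.ones_like(x)                        # assumed errors
    fs = [lambda x: 1.0 + 0*x, lambda x: x, lambda x: x*x]

    X = np.array([fn(x) for fn in fs]).T            # X[m, n] = f_n(x_m)

    # Weight each row by 1/dy and solve the standard least-squares problem:
    a, res, rank, s = np.linalg.lstsq(X/dy[:, None], y/dy)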

Error Estimates
===============
.. warning:: The error estimates discussed here are only valid under the strict
   assumption that the errors in the input data are normally distributed.  In
   this case -- and when the model is linear over the region of variation --
   then the covariance matrix $C$ may be used to provide confidence estimates as
   discussed below.

   For a detailed discussion, please see [PTVF:2007]_: if you have non-normal
   errors, you may still minimize the $\chi^2$ to fit your parameters and use
   the covariance matrix $\mat{C}$ to determine the confidence contours, but
   you *may not* estimate which confidence level the contours correspond to.
   To get this estimate, you must perform a Monte Carlo simulation using the
   proper distribution function, as sketched below.
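
A minimal sketch of such a Monte Carlo calibration (the linear model, the
Laplace-distributed errors, and the use of :func:`lsqf` defined below are
purely illustrative assumptions)::

    import numpy as np
    import mmf.fit.lsq

    fs = [lambda x: 1.0 + 0*x, lambda x: x]         # illustrative model
    x = np.linspace(0, 1, 50)
    a_true = [1.0, 2.0]
    scale = 0.1                                     # Laplace scale parameter
    dy = np.sqrt(2)*scale + 0*x                     # matching standard deviation

    np.random.seed(3)
    a_fit = []
    for n in range(1000):
        # Draw synthetic data from the *actual* (non-normal) error
        # distribution, refit, and record the fitted parameters.
        y = a_true[0] + a_true[1]*x + np.random.laplace(0, scale, len(x))
        a, cov, Q, F = mmf.fit.lsq.lsqf(fs, x, y, dy)
        a_fit.append(a)

    # The empirical distribution of a_fit -- not the normal-theory Delta
    # values -- now determines the confidence contours.
    da = np.array(a_fit).std(axis=0)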

The following discussion assumes that the input data has normally distributed
errors.

Single Parameter Confidence Interval
++++++++++++++++++++++++++++++++++++

When considering a single parameter on its own, the error returned represents
the $1\sigma$ confidence interval (for normally distributed data) for that
parameter assuming that the additional parameters are chosen to minimize
$\chi^2$.  We provide two types of error and covariance estimates here depending
on the value of the `scale_cov` argument:

1) If `scale_cov = False`, then the error estimates describe the standard
   deviation of the distribution of the fitted parameters from a set of data
   with uncorrelated normal errors with the prescribed standard deviations `dy`.
   If the model is poor, or the errors are underestimated (very low quality of
   fit `Q`), then this will give unreasonably small error estimates for the
   parameters.

2) If `scale_cov = True`, then the covariance matrix is scaled by the
   reduced chi squared $\chi^2_{R} = \chi^2/\nu$ where $\nu =
   N - p$ is the number of degrees of freedom ($N$ data points and $p$
   parameters).  This is equivalent to the previous case if
   $\chi^2_{R} = 1$, and so can be thought of as scaling the
   error estimates `dy` to make the model fit.  In essence, this provides a
   much more reasonable error estimate when the problem is viewed as curve
   fitting, but the errors no longer have the statistical interpretation of
   the prior case.  This is what the :mod:`scipy`
   documentation means when it states: "This matrix must be multiplied by the
   residual standard deviation to get the covariance of the parameter
   estimates" in the documentation of :func:`scipy.optimize.leastsq`. 

   .. note::
   
      One can obtain the `scale_cov = True` errors after the fact by simply
      multiplying the covariance matrix by the reduced chi-squared, $\mat{C}
      \rightarrow \mat{C}\chi^2_{R}$.  Likewise, the errors can be scaled by
      $\delta_a \rightarrow \delta_a \sqrt{\chi^2_{R}}$.

In short, if the error estimates `dy` are trusted, then do not scale
the covariance matrix, and take the quality of the fit `Q` seriously,
rejecting the parameter estimates if it is too small.  On the other
hand, if the errors are not known, or one is simply trying to visually
fit the data, then one should use the scaled covariance matrix.
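
As a concrete sketch of the post-hoc scaling described in the note above (the
model and data here are illustrative; :func:`lsqf`, defined below, returns the
reduced chi-squared directly when called with `return_residuals=True`)::

    import numpy as np
    import mmf.fit.lsq

    fs = [lambda x: 1.0 + 0*x, lambda x: x]         # illustrative linear model
    np.random.seed(2)
    x = np.linspace(0, 1, 30)
    y = 1.0 + 2.0*x + np.random.normal(0, 0.3, len(x))
    dy = 0.1 + 0*x                                  # deliberately underestimated

    a, cov, Q, F, residuals, chi2r = mmf.fit.lsq.lsqf(
        fs, x, y, dy, return_residuals=True)

    cov_scaled = cov*chi2r                          # C -> C*chi^2_R
    da_scaled = np.sqrt(np.diag(cov_scaled))        # da -> da*sqrt(chi^2_R)

    # This reproduces what lsqf(..., scale_cov=True) would return.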

If you need different confidence bands, then note that the error is related to
the diagonal of the covariance matrix via:

.. math::
   \delta a_0 = \pm \sqrt{\Delta C_{00}}.

One can compute $\Delta$ for the appropriate confidence level using
:func:`Delta`.  Note that for a single parameter $\nu=1$ and the standard
$1\sigma$ ($68.3\%$) confidence interval, $\Delta = 1$::

    >>> Delta(nu=1.0,sigma=1.0)
    1.0

Thus, you can simply multiply the error by $\sqrt{\Delta}$ to obtain
the desired confidence interval.
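
For example, the scaling factor needed to widen a single-parameter error bar
from the $1\sigma$ to the $2\sigma$ ($95.4\%$) level is very nearly $2$::

    >>> np.sqrt(Delta(nu=1.0, sigma=2.0))
    1.99...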

Multiple Parameter Confidence Intervals
+++++++++++++++++++++++++++++++++++++++

The previous characterization of errors -- and that returned in the parameter
error estimates -- is valid when each parameter is considered independently
under the assumption that the other parameters are adjusted to minimize
$\chi^2$. It is in general not valid when considering a group of parameters.
Instead, the confidence region for multiple parameters is in general an
ellipsoid constructed by the following procedure (see [PTVF:2007]_):

1) Determine $\Delta$ such that $\Gamma_{q}(\nu/2, \Delta/2) = 1-p$
   where $p = 68.3\%$ for one standard deviation, $p = 95.4\%$ for two
   standard deviations etc.  Here $\nu$ is the number of parameters.
2) The confidence ellipse is:

   .. math::
      \vect{\delta}_a^T\cdot \mat{c}^{-1} \cdot \vect{\delta}_{a} = \Delta

   where $\mat{c}$ is the block of the total covariance matrix
   containing the corresponding parameters of interest, and
   $\vect{\delta}_{a}$ is the vector of the errors in those
   parameters.  This analysis assumes that the other parameters are
   adjusted to minimize $\chi^2$ as the chosen parameters are varied; as a
   result, this region is slightly smaller than the extremal bounds of the
   confidence ellipse (see the sketch below).
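
As a minimal numerical sketch of this procedure for two parameters (the
$2\times 2$ covariance block below is made up for illustration, and a Cholesky
factorization is used in place of the SVD employed in the plotting example
later)::

    import numpy as np
    import mmf.fit.lsq

    c = np.array([[0.04, 0.01],                     # illustrative 2x2 block of
                  [0.01, 0.09]])                    # the covariance matrix

    # Step 1: Delta for the 1-sigma (68.3%) joint region of nu=2 parameters.
    D = mmf.fit.lsq.Delta(nu=2, sigma=1)            # approximately 2.295

    # Step 2: points on the confidence ellipse delta_a^T c^{-1} delta_a = Delta.
    theta = np.linspace(0, 2*np.pi, 200)
    L = np.linalg.cholesky(c)                       # c = L L^T
    da = np.sqrt(D)*np.dot(L, [np.cos(theta), np.sin(theta)])

    # Check: every point satisfies the quadratic form (up to round-off).
    q = (np.dot(np.linalg.inv(c), da)*da).sum(axis=0)
    assert np.allclose(q, D)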

We now express this result for a general non-linear least-squares
model.  Suppose that $y = f(x, \vect{a})$; then within the
error ellipse we have

.. math::
   y = f(x, \vect{a} + \vect{\delta}_{a}) 
     \approx f + \partial_{\vect{a}}f\cdot \vect{\delta}_{a}.

If we diagonalize $\mat{c} = \mat{U}\mat{d}^2\mat{U}^{T}$ where
$\mat{d}$ is diagonal, then we can write the maximum value of $y$ as

.. math::
   f + \sqrt{\Delta} \max_{\norm{\vect{q}} = 1}
                     \partial_{\vect{a}}f\cdot \mat{U}\mat{d}\vect{q}.

Since the maximum of a linear form over unit vectors $\vect{q}$ is the norm of
its coefficient vector, the extremal values of $y$ are simply $f \pm
\sqrt{\Delta}\norm{\partial_{\vect{a}}f\cdot \mat{U}\mat{d}}_{2}$, which gives
the range of errors in $y$ (see the sketch below).  For the linear model
$\partial_{a_{n}}f = f_{n}(x)$.
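
A short sketch of this propagation for a linear model (the coefficients and
covariance matrix are illustrative; the function `F` returned by :func:`lsqf`
below performs this computation for you)::

    import numpy as np
    import mmf.fit.lsq

    fs = [lambda x: 1.0 + 0*x, lambda x: x, lambda x: x*x]
    a = np.array([1.0, 2.0, 3.0])                   # illustrative fit parameters
    cov = np.array([[0.010, 0.002, 0.001],          # illustrative covariance
                    [0.002, 0.040, 0.003],          # matrix (symmetric,
                    [0.001, 0.003, 0.090]])         # positive definite)

    X = np.linspace(0, 1, 5)
    dfa = np.array([fn(X) for fn in fs]).T          # rows are [f_0(x), f_1(x), ...]

    d2, U = np.linalg.eigh(cov)                     # cov = U d^2 U^T
    Ud = U*np.sqrt(d2)                              # columns of U scaled by d
    D = mmf.fit.lsq.Delta(nu=len(a), sigma=1)

    y = np.dot(dfa, a)
    tmp = np.dot(dfa, Ud)
    dy = np.sqrt(D*(tmp**2).sum(axis=1))            # error band for y(x)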

Example
+++++++
Here we demonstrate this with a polynomial fit to data sampled with Gaussian
noise.  We generate the data with a fixed standard deviation, but first
underestimate the errors.  We then apply `scale_cov=True` and show that this
is consistent with a proper error estimate.

.. plot::
   :width: 100%
   :include-source:

   plt.figure(figsize=[16,6])
   plt.subplot(121)

   import mmf.fit.lsq
   np.random.seed(0)
   fs = [lambda x:1, lambda x:x, lambda x:x*x]
   def f(x, p): return fs[0](x)*p[0] + fs[1](x)*p[1] + fs[2](x)*p[2]
   x = np.linspace(0.3, 1.3, 10); p0 = [1, 2, 3]; y0 = f(x, p0)
   X = np.linspace(-0.1, 1.5, 100)

   dy0 = 2.0                 # Large scatter in points
   y = y0 + np.random.normal(0, dy0, len(y0))
   dy = 0.5                  # Underestimate of errors

   plt.subplot(121)
   (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=False)

   plt.errorbar(x, y, 0*y + dy, fmt='.b', ecolor='b')
   plt.errorbar(0, p[0], np.sqrt(cov[0,0]), fmt='dk')
   plt.plot(X, F(X, a=p)[0], '-k', 
            X, F(X, a=p)[0] + F(X, a=p)[1], '--b',
            X, F(X, a=p)[0] - F(X, a=p)[1], '--b')
   plt.title("Q=%g" % Q)
   plt.axis((-0.1,1.5,0,15))

   plt.subplot(122)
   (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=True)

   plt.errorbar(x, y, 0*y + dy0, fmt='.g', ecolor='g')
   plt.errorbar(0, p[0], np.sqrt(cov[0,0]), fmt='dk')
   plt.plot(X, F(X, a=p)[0], '-k', 
            X, F(X, a=p)[0] + F(X, a=p)[1], ':g',
            X, F(X, a=p)[0] - F(X, a=p)[1], ':g')

   (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy0, scale_cov=False)
   plt.errorbar(0, p[0], np.sqrt(cov[0,0]), fmt='dk')
   plt.plot(X, F(X, a=p)[0], '-k', 
            X, F(X, a=p)[0] + F(X, a=p)[1], '--g',
            X, F(X, a=p)[0] - F(X, a=p)[1], '--g')
   plt.title("Q=%g" % Q)

   plt.axis((-0.1,1.5,0,15))

As an example, here is a plot of some data scattered with a standard
deviation of `2` but fit with underestimated errors of `0.5` (left).
The quality of this fit is extremely poor (`Q=0`) and the error bounds (blue
dashed curves) are correspondingly underestimated.  This is the type
of fluctuation that the fitted curve would exhibit if the data
fluctuated with the smaller standard deviation, but does not describe
the scatter in the data.  If the error bars are trusted here and the
errors are normally distributed etc. then one would have to conclude
that the model is not very good.

On the right we consider the same fit, but with `scale_cov = True` (green dotted
curves), which captures the visual scatter of the data much better.  In fact,
this agrees with the error bounds (green dashed curves) that are obtained by
setting the appropriate standard deviation `dy = 2` (green error bars) with
`scale_cov = False`.  In this case the quality of the fit `Q = 0.4` is good and
the error bounds agree well "visually" with the scatter in the data.  This shows
how -- if one has normally distributed errors of the same size -- one can use
`scale_cov=True` to estimate the errors.

Finally, since this is a polynomial model, one might think that the error in the
intercept would be simply the error in the parameter $a_0$ (shown as a black
error bar with a diamond at $x=0$).  This is the $1\sigma$ confidence interval
for the single parameter $a_0$ when all other parameters are adjusted to
minimize the $\chi^2$.  This corresponds to $\Delta(\sigma=1, \nu=1)=1$.  The
error band shown in the previous plot corresponds to the joint error for all
three parameters, $\sqrt{\Delta(\sigma=1, \nu=3)}=1.88\cdots$, which is the
largest ellipse shown below.  This is a projection of the $1\sigma$ "joint"
distribution ellipsoid for all three parameters.

.. plot::
   :width: 100%
   :include-source:

   plt.figure(figsize=[16,6])

   import mmf.fit.lsq
   np.random.seed(0)
   fs = [lambda x:1, lambda x:x, lambda x:x*x]
   def f(x, p): return fs[0](x)*p[0] + fs[1](x)*p[1] + fs[2](x)*p[2]
   x = np.linspace(0.3, 1.3, 10); p0 = [1, 2, 3]; y0 = f(x, p0)
   X = np.linspace(-0.1, 1.5, 100)

   dy = 2.0
   y = y0 + np.random.normal(0, dy, len(y0))

   (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=False)

   # Generate ellipses for 2-parameters a_0 and a_1
   def error_band(Cproj, sigma=1.0, nu=1.0):
       u, d, vt = np.linalg.svd(Cproj)

       Delta = mmf.fit.lsq.Delta(sigma=sigma, nu=nu)
       theta = np.linspace(0, 2.0*np.pi, 100)
       dx = np.sqrt(Delta)*np.exp(1j*theta)
       dp = np.dot(np.dot(u*np.sqrt(d),vt), [dx.real, dx.imag])
       assert np.allclose(Delta, 
                          (np.dot(dp.T, np.linalg.inv(Cproj))*dp.T).sum(axis=1))

       return dp

   plt.subplot(121)
   C01 = cov[:2,:2]
   ls = ['k-', 'g:', 'b--']
   for _l, _nu in enumerate([1,2,3]):
       dp = error_band(C01, nu=_nu)
       plt.plot(p[1] + dp[1], p[0] + dp[0], ls[_l])
       Delta = mmf.fit.lsq.Delta(sigma=1.0, nu=_nu)
       plt.axhline(p[0] + np.sqrt(Delta*cov[0,0]), c=ls[_l][0], ls=ls[_l][1:])
       plt.axhline(p[0] - np.sqrt(Delta*cov[0,0]), c=ls[_l][0], ls=ls[_l][1:])
       
   plt.xlabel("a_1")
   plt.ylabel("a_0")

   plt.subplot(122)
   C02 = cov[[0,2],:][:,[0,2]]
   for _l, _nu in enumerate([1,2,3]):
       dp = error_band(C02, nu=_nu)
       plt.plot(p[2] + dp[1], p[0] + dp[0], ls[_l])
       Delta = mmf.fit.lsq.Delta(sigma=1.0, nu=_nu)
       plt.axhline(p[0] + np.sqrt(Delta*cov[0,0]), c=ls[_l][0], ls=ls[_l][1:])
       plt.axhline(p[0] - np.sqrt(Delta*cov[0,0]), c=ls[_l][0], ls=ls[_l][1:])

   plt.xlabel("a_2")
   plt.ylabel("a_0")

The functions defined here perform linear least-squares fitting (via a
singular value decomposition) and non-linear least-squares fitting using
:func:`scipy.optimize.leastsq`.


References
==========

.. [PTVF:2007] William H. Press, Saul A. Teukolsky, William T. Vetterling, and
   Brian P. Flannery, "Numerical Recipes: The Art of Scientific Computing",
   Cambridge University Press, (2007)

"""
from __future__ import division
__all__ = ['leastsq',
           'lsq', 
           'lsqf',
           'lsqm',
           'Delta',
           ]

import math

import numpy as np
import scipy as sp
import scipy.special
import scipy.optimize
import scipy.stats



def leastsq(f, p0, x, y, dy=None, df=None, scale_cov=False,
            epsfcn=0.0, factor=100, ftol=1e-8, diag=None):
    r"""Return `(p, dp, Q, cov)` (or `(p, dp, Q, cov, F)` if `df` is provided)
    for a least-squares fit of the function `y = f(x, p)` for the parameters
    `p`.

    Parameters
    ----------
    f : function
        This should be a vectorized function `f(x, p)`.  A common example for
        polynomial fitting is::

            def f(x, p):
                return np.polyval(p, x)

    df : function, (optional)
        This should return the derivatives of `f` with respect to the
        parameters as a column vector (broadcast as `x[None, :]` and
        `p[:, None]`), for example, for polynomial fitting::

            def df(x, p):
                return x**np.arange(len(p))[::-1, None]

    p0 : array
        Initial guess for parameters.
    x, y : array
        This is the data.
    dy : array, (optional)
        The errors.  These should be the standard deviations for the values
        `y`.  (The whole analysis is predicated on the assumption of Gaussian
        errors.)
    scale_cov : bool, (optional)
        If this is `True`, then the covariance matrix will be scaled by the
        reduced chi square `chi**2/nu` (and the error estimates by
        `chi/sqrt(nu)`) where `nu = len(x) - len(p)` is the number of degrees
        of freedom.  This scaling allows one to extract parameter estimates
        even if the errors `dy` are not properly estimated.

        .. note:: We do not use Bessel's correction here (which would count
           the number of degrees of freedom as `nu = len(x) - len(p) - 1`).
           This might lead to a slight bias in the error estimates but allows
           the code to work in plausible cases such as fitting two points with
           one parameter.  We follow [PTVF:2007]_ here, not Wikipedia for
           example.

    Returns
    -------
    p : array
        Parameter estimates (mean values).
    dp : array
        Estimate of the errors in parameters assuming that they are
        independent.  Note that the actual 1-sigma confidence region for the
        parameters is the ellipsoid

        .. math::
           \vect{\delta}_{p}^{T} \cdot \mat{c}^{-1} \cdot \vect{\delta}_{p} = 1.

        The meaning of these errors is that if a set of data is generated with
        uncorrelated normally distributed errors in `y` with standard
        deviation `dy` about mean values `f(x, p)`, then the estimates for the
        parameters `p` resulting from the fits will be approximately normally
        distributed with standard deviation `dp`.
    Q : float
        Quality of fit (should be between 0.01 and 0.1).  If errors are not
        specified or are inaccurate, then this is meaningless.
    cov : array
        Two-dimensional covariance matrix $\mat{c}$.  This defines the 1-sigma
        confidence interval if the errors are normally distributed.

    Examples
    --------
    We start with a linear example of polynomial fitting as discussed above:

    >>> def f(x, p):
    ...     return np.polyval(p, x)
    >>> def df(x, p):
    ...     return x**np.arange(len(p))[::-1, None]

    Here are our parameters `f = 1 + 2*x + 3*x**2` and some errors:

    >>> p0 = [3, 2, 1]
    >>> dp0 = [0.03, 0.02, 0.01]

    Let's generate some normally distributed data over an abscissa with some
    errors `dy` that we can estimate using standard error analysis:

    >>> np.random.seed(1)
    >>> x = np.linspace(0, 3, 50)
    >>> dy = np.sqrt(((df(x, p0)*np.array(dp0)[:,None])**2).sum(axis=0))
    >>> y = f(x, p0) + np.random.normal(0, dy, len(x))

    Now we do the fit: we try with and without `df` to ensure that both are
    the same:

    >>> p,dp,Q,cov = leastsq(f=f,p0=[0,0,0],x=x,y=y,dy=dy)
    >>> p1,dp1,Q1,cov1,F = leastsq(f=f,p0=[0,0,0],x=x,y=y,dy=dy,df=df)
    >>> (np.allclose([p, dp], [p1, dp1]), np.allclose(cov, cov1))
    (True, True)

    Providing the derivative allows us to reduce the number of function calls
    though.

    Now we generate a bunch of data and tabulate the fitted parameter values
    to see if the error estimates are reasonable:

    >>> P_ = np.array([
    ...     leastsq(f=f, p0=p, x=x, dy=dy, df=df,
    ...             y = f(x, p0) + np.random.normal(0, dy))[0]
    ...     for n in xrange(100)])
    >>> P_.mean(axis=0)
    array([ 3.00..., 2.00..., 0.99...])
    >>> P_.std(axis=0)
    array([ 0.012..., 0.022..., 0.0058...])

    This agrees with the previously calculated errors of the fit:

    >>> dp
    array([ 0.012..., 0.020..., 0.0058...])

    .. note:: The precise meaning of this error estimate is to describe the
       standard deviation of the fitted parameter values given independent
       normally distributed errors in the data with standard deviation `dy`.
       These are not estimates of the distribution of the original parameters,
       which would be determined directly from the errors in the data:

       >>> np.sqrt(np.dot(np.linalg.pinv((df(x, p0).T**2)), dy**2))
       array([ 0.03, 0.02, 0.01])

       This relationship between `dy` and the distribution of the original
       parameters has nothing to do with the fit.  Were we to provide more
       data for the fit, the error in the parameters `dp` would decrease: we
       are determining the mean of the parameter distribution more and more
       precisely.

    We also consider the extrapolation error using the model with the fitted
    parameters:

    >>> x0 = 5.0                # Extrapolate to here
    >>> y0_ = np.array([
    ...     f(x0, leastsq(f=f, p0=p, x=x, dy=dy, df=df,
    ...                   y = f(x, p0) + np.random.normal(0, dy))[0])
    ...     for n in xrange(200)])
    >>> y0_.mean(), y0_.std()
    (86.00..., 0.227...)
    >>> F(x0)
    (86.18..., 0.220...)

    Now we consider a non-linear model:

    .. math::
       f(p) = \sqrt{\left(\frac{p^2}{2m} - \mu\right)^2 + \Delta^2}

    First we generate some data, then we fit and extract the parameters:

    >>> import numpy as np
    >>> from numpy import linspace, array, mean, std, ones, sqrt
    >>> from numpy.random import normal as rand
    >>> np.random.seed(1)
    >>> def f(x, m, mu, Delta):
    ...     return sqrt((x**2/2/m-mu)**2+Delta**2)
    >>> m, dm = 2.0, 0.01
    >>> mu, dmu = 2.0, 0.02
    >>> Delta, dDelta = 2.0, 0.03
    >>> x = linspace(0, 3, 50)
    >>> def get_y_dy(x, N=1000):
    ...     ys = [f(x0, rand(m,dm), rand(mu,dmu), rand(Delta,dDelta)) for
    ...           x0 in x*ones(N)]
    ...     y = ys[0]
    ...     dy = std(ys)
    ...     return (y,dy)
    >>> y_dy = [get_y_dy(x0) for x0 in x]
    >>> y = [ydy[0] for ydy in y_dy]
    >>> dy = [ydy[1] for ydy in y_dy]
    >>> #from pylab import errorbar,ion,clf
    >>> #ion();clf()
    >>> #errorbar(p,y,dy);
    >>> #import pdb;pdb.set_trace()
    >>> def fn(p, params):
    ...     m, mu, Delta = params
    ...     return f(p,m,mu,Delta)
    >>> p, err, Q, cov = leastsq(fn, [m, mu, Delta], x, y, dy)

    Is the fit good?

    >>> Q
    0.6...

    Yes.  This is a Gaussian model and so the fit should be good.  In general,
    if the Gaussian approximation is not perfect, then `Q` may be lower.
    Values of 0.001 may still be acceptable.

    We check that the error estimates are also reasonable:

    >>> abs(p - array([m, mu, Delta]))/err
    array([ 1.3..., 1.6..., 1.6...])

    A brief note about how the errors and covariances scale.  If we scale the
    errors `dy` by a common factor `s`, then the fit `p` remains unchanged
    while the error-estimates scale by `s` and the covariance matrix scales by
    `s**2`:

    >>> s = 0.5
    >>> dy1 = s*np.array(dy)
    >>> p1, err1, Q1, cov1 = leastsq(fn, [m, mu, Delta], x, y, dy1)
    >>> np.allclose(p, p1)
    True
    >>> np.allclose(err1, s*err)
    True
    >>> np.allclose(cov1, s**2*cov)
    True

    Note that the `Q` value is horrible though because we have made the errors
    too small:

    >>> Q1
    1...e-16

    The argument `scale_cov` scales the errors in this way so that the reduced
    $\chi^2_{R} \propto s^{-2}$ would be one, and uses this to estimate the
    errors:

    >>> chi2 = (((fn(x, p1) - y)/dy1)**2).sum()
    >>> chi2_r = chi2/(len(x) - len(p1))
    >>> s1 = np.sqrt(chi2_r)
    >>> p2, err2, Q2, cov2 = leastsq(fn, [m, mu, Delta], x, y, dy1,
    ...                              scale_cov=True)
    >>> np.allclose(err2, err1*s1)
    True
    """
    x = np.array(x)
    y = np.array(y)
    N = len(y)                  # Number of data points
    try:
        Na = len(p0)            # Number of parameters
    except TypeError:
        Na = 1

    if dy is None:
        dy = y*0.0 + 1.0
    else:
        dy = np.array(dy)

    def residuals(p, x=x, y=y, dy=dy):
        return (f(x, p) - y)/dy

    if df is None:
        d_residuals = None
    else:
        def d_residuals(p, x=x, y=y, dy=dy):
            return df(x, p)/dy

    # This sometimes fails even where there is a reasonable answer.
    # See scipy ticket #984
    #def _f(x, *p):
    #    return f(x, p)
    #p, pcov = scipy.optimize.curve_fit(
    #    _f, xdata=x, ydata=y, p0=p0, sigma=dy,
    #    epsfcn=epsfcn, factor=factor)

    res = sp.optimize.leastsq(residuals, x0=p0, full_output=1,
                              Dfun=d_residuals, col_deriv=True,
                              epsfcn=epsfcn, factor=factor, ftol=ftol,
                              diag=diag)
    (p, pcov, infodict, errmsg, ier) = res
    #return res
    #fjac = infodict['fjac']
    #ipvt = infodict['ipvt']

    # Compute the one sigma error estimates from the chi square:
    chi_sq = (residuals(p)**2).sum()
    if pcov is None:
        pcov = np.inf

    #D = Delta(N - Na, 1)       # This does not seem to work.
    #pcov *= D

    if N <= Na:
        # Special case since gammainc only works for positive values.
        Q = 0.0
        pcov = np.inf
        chi_sq_red = np.inf
    elif chi_sq == 0.0:
        Q = 1.0
        chi_sq_red = 0.0
    else:
        Q = 1.0 - scipy.special.gammainc((N-Na)/2.0, chi_sq/2.0)
        chi_sq_red = chi_sq/(N - Na)

    if scale_cov:
        pcov *= chi_sq_red

    shape = np.shape(p0)

    # Compute independent error estimates in the parameters
    err = np.reshape(np.sqrt(np.diag(pcov)), shape)
    p = np.reshape(p, shape)

    if df is not None:
        # If so, we can provide an extrapolation estimate.  Note that
        # I have not figured out how to do the sigma extrapolation yet.
        def F(x, p=p, f=f, df=df,
              # sigma=1
              ):
            r"""Returns (y,dy) where y = f(x, p) and dy is the error
            estimate."""
            y = f(x, p)
            J = df(x, p)
            dy = np.sqrt(np.diag(np.dot(np.dot(J.T, pcov), J))).ravel()
            if np.asarray(y).shape == ():
                dy = dy[0]
            return y, dy
        return p, err, Q, pcov, F
    else:
        return p, err, Q, pcov


def lsq(X, y, dy=None):
    r"""Return `(a, cov, Q, residuals)` for a least squares fit of the linear
    least squares problem

    .. math::
       \min_{a}\norm{X\cdot a - y}^2_2.

    Parameters
    ----------
    X : 2d array of shape (N, Na)
        Abscissa.  Least square problem is `norm(dot(X, a) - y)`.
    y : (N x 1) array of floats
        Data values.
    dy : (N x 1) array of floats, optional
        Errors in data (assumed to be normally distributed).  Assumed to be
        `1` if not provided.

    Returns
    -------
    a : vector
        Vector of parameter estimates.
    cov : matrix
        Covariance matrix.
    Q : float
        Estimate of the goodness of the fit.  `Q` should be between 0.001 and
        0.1.  Larger values indicate that the errors have been over-estimated.
        Smaller values mean that the model is bad, or the errors have been
        underestimated, or the errors are not normally distributed.
    residuals : vector
        Weighted residuals `dot(X, a) - y` (in units of `dy`).

    References
    ----------
    See [PTVF:2007]_.
    """
    N, Na = X.shape
    y = np.array(y).reshape((N, 1))
    if dy is None:
        dy = 1
    dy = np.asarray(dy)
    if dy.shape != ():
        dy = np.array(dy).reshape((N, 1))
    if np.min(dy) == 0.0:
        dy += 1e-20

    A = X/dy
    y = y.ravel()
    dy = dy.ravel()
    Y = y/dy

    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt.T

    if N < Na:
        raise NotImplementedError("Needs to be tested...")
        # Pad missing entries with large values:
        dU = np.ones(Na, float)/np.finfo(float).eps
        dU[:N] = 1./d*np.dot(U.T, Y)
        a = np.dot(V.T, dU)
        d2inv = np.ones(Na, float)/np.finfo(float).eps**2.0
        d2inv[:N] = 1./d**2
        cov = np.dot(V.T*d2inv, V)
    else:
        a = np.dot(np.dot(Y.T, U/d), V.T)
        cov = np.dot(V/d**2, V.T)

    residuals = np.dot(A, a) - Y
    residuals2 = np.dot(residuals, residuals)
    chi_sq = np.sum(residuals2)

    if N <= Na:
        # Special case since gammainc only works for positive values.
        Q = 0.0
    elif chi_sq == 0.0:
        Q = 1.0
    else:
        Q = 1.0 - scipy.special.gammainc((N-Na)/2.0, chi_sq/2.0)

    return a, cov, Q, residuals


def Delta(nu, sigma):
    r"""Return $\Delta$ required to estimate the errors for $\nu$ parameters
    within a confidence interval of $\sigma$ standard deviations.

    .. math::
       \gamma_{q}(\nu/2, \Delta/2) = p(\sigma)

    where $p(\sigma)$ is the desired confidence from the normal distribution,
    i.e. $68.3\%$ for $\sigma = 1$, $95.4\%$ for $\sigma = 2$ etc.

    Examples
    --------
    >>> Delta(1, 1)
    1.0
    >>> Delta(1, 2)
    3.9999...
    >>> Delta(2, 1)
    2.295...
    >>> Delta(4, 2)
    9.715...
    """
    p = 2*scipy.stats.distributions.norm.cdf(sigma) - 1

    def f(Delta):
        return scipy.special.gammainc(nu/2.0, Delta/2.0) - p

    Delta_max = 1
    while f(Delta_max) < 0:
        Delta_max *= 2
    return scipy.optimize.brentq(f, 0, Delta_max)


def lsqf(f, x, y, dy, scale_cov=False, return_residuals=False):
    r"""Return `(a, cov, Q, F)` or `(a, cov, Q, F, residuals, chi2r)` for a
    least squares fit of the data `(x, y)` to the function::

        y = a[0]*f[0](x) + a[1]*f[1](x) + ...

    Parameters
    ----------
    f : 1d array of Na functions
    x : 1d array of N floats
        Abscissa.
    y : 1d array of N floats
        Data values.
    dy : 1d array of N floats
        Errors in data (assumed to be normally distributed).
    scale_cov : bool, (optional)
        If this is `True`, then the covariance matrix will be scaled by the
        reduced chi square `chi**2/nu` (and the error estimates by
        `chi/sqrt(nu)`) where `nu = len(x) - len(p)` is the number of degrees
        of freedom.  This scaling allows one to extract parameter estimates
        even if the errors `dy` are not properly estimated.
    return_residuals : bool
        If `True`, then data residuals will be returned.

    Returns
    -------
    a : vector
        Vector of parameter estimates.
    cov : matrix
        Covariance matrix.
    Q : float
        Estimate of the goodness of the fit.  `Q` should be between 0.001 and
        0.1.  Larger values indicate that the errors have been over-estimated.
        Smaller values mean that the model is bad, or the errors have been
        underestimated, or the errors are not normally distributed.
    F : function
        This function takes abscissa `x -> (y, dy)` where `y = f(x)` is the
        best fit and `dy` is the estimated error.  The error estimate uses the
        full covariance matrix.
    residuals :
        Residuals if `return_residuals` is `True`.
    chi2r :
        Reduced chi square if `return_residuals` is `True`.

    Examples
    --------
    Generate a bunch of points on the parabola y = x**2-2*x+1 with errors and
    fit them.

    >>> scipy.random.seed(0)
    >>> N = 100
    >>> f = [lambda x:1.0,lambda x:x,lambda x:x**2]
    >>> a0 = [1.0,-2.0,1.0]
    >>> Na = len(a0)
    >>> def F(x,a=a0,f=f):
    ...     y = np.zeros(len(x),float)
    ...     for n in xrange(len(a)):
    ...         y += a[n]*f[n](x)
    ...     return y
    >>> x = scipy.random.uniform(-3,3,N)
    >>> dy = scipy.random.uniform(0,0.1,N)
    >>> y = scipy.random.normal(F(x, a=a0),dy)
    >>> a, cov, Q, F = lsqf(f,x,y,dy)
    >>> assert Q < 0.9 and 0.1 < Q, "Q value not good for data! Q=%f" % Q
    >>> #from pylab import *;ion();clf()
    >>> #errorbar(x,y-F(x, a=a)[0],dy,fmt='.')
    >>> #X = np.linspace(-3,3,1000)
    >>> #Y,dY = F(X, a=a)
    >>> #plot(X,dY,X,-dY)

    References
    ----------
    See [PTVF:2007]_.
    """
    Na = len(f)
    N = len(x)
    if isinstance(x, np.ndarray) or np.isscalar(x[0]):
        # Vectorize if x is an array
        x_is_array = True
        x = np.array(x).reshape(N)
    else:
        # Allow for a list of tuples for example...
        x_is_array = False

    X = np.empty((N, Na), float)
    for n in xrange(Na):
        if x_is_array:
            X[:, n] = f[n](x)
        else:
            for m in xrange(N):
                # Explicit loop to allow for general abscissa
                X[m, n] = f[n](x[m])

    a, cov, Q, residuals = lsq(X, y, dy)

    chi_sq = np.sum(residuals**2)
    if N <= Na:
        chi_sq_red = np.inf
    else:
        chi_sq_red = chi_sq/(N - Na)

    if scale_cov:
        cov *= chi_sq_red

    # Compute error estimates
    assert np.allclose(cov, cov.T)
    d, U = np.linalg.eigh(cov)
    d = np.sqrt(d)
    Ud = np.asarray(U*d)

    def F(x, a=a, f=f, Ud=Ud, sigma=1):
        r"""Returns (y,dy) where y = sum(a[n]*f[n](x)) and dy is the error
        estimate.

        Parameters
        ----------
        x : float or array
            Abscissa: the function is vectorized, so arrays can be passed (as
            long as the functions `f` are vectorized).
        f : [function]
            List of functions.  This can be replaced.  For example, you might
            replace this by a list of the derivatives of ``f[n](x)`` to
            compute the error in the extrapolated derivative.
        a : [float]
            Fit coefficients.  The model is ``sum([a[n]*f[n](x) for n in
            ...])``.  This should not be changed.
        Ud : matrix
            This should not be changed.
        sigma : float
            This should not be changed.
        """
        try:
            N = len(x)
        except TypeError:
            x = [x]
            N = 1
        x = np.array(x)
        y = np.zeros(N, float)
        dy = np.zeros(N, float)
        X = np.empty((N, len(a)), float)
        for n in xrange(len(a)):
            if x_is_array:
                X[:, n] = f[n](x)
                y += a[n]*f[n](x)
            else:
                for m in xrange(N):
                    # Explicit loop to allow for general abscissa
                    X[m, n] = f[n](x[m])
                    y[m] += a[n]*f[n](x[m])
        X = np.asarray(X)
        tmp = np.dot(X, Ud)
        dy = np.sqrt(Delta(nu=len(a), sigma=sigma)
                     * np.diag(np.dot(tmp, tmp.T))).reshape(y.shape)
        return y, dy

    if return_residuals:
        return a, cov, Q, F, residuals, chi_sq_red
    else:
        return a, cov, Q, F


def lsqm(model, x, y, dy, active=None):
    r"""Perform a least squares fit using the functions defined in the model
    dictionary and return a dictionary of results with the values and errors
    stored as complex numbers `value + 1j*error`."""
    if active is None:
        active = model.keys()
    f = [model[param] for param in active]
    a, cov, Q, F, residuals, chi2r = lsqf(f, x, y, dy, return_residuals=True)
    res = {}
    for n in xrange(len(active)):
        res[active[n]] = a[n] + 1j*np.sqrt(cov[n, n])
    res['residuals'] = residuals
    res['Q'] = Q
    res['Cov'] = cov
    res['F'] = F
    res['chi2r'] = chi2r
    return res


def _test_1(N=100):
    r"""Test a linear fit."""
    f = [lambda x: 1.0, lambda x: x]
    a0 = [1.0, -2.0]
    Na = len(a0)

    def F(x, a=a0, f=f):
        y = np.zeros(len(x), float)
        for n in xrange(len(a)):
            y += a[n]*f[n](x)
        return y

    scipy.random.seed(0)
    x = scipy.random.uniform(-3, 3, N)
    dy = scipy.random.uniform(0, 0.1, N)
    y = scipy.random.normal(F(x, a=a0), dy)
    a, cov, Q, F = lsqf(f, x, y, dy)
    print("Q=%f, a=%s, err=%s" % (Q, str(a), str(np.sqrt(np.diag(cov)))))
    assert Q < 0.9 and 0.1 < Q, "Q value not good for data! Q=%f" % Q
    #from pylab import *;ion();clf()
    #errorbar(x,y-F(x, a=a)[0],dy,fmt='.')
    X = np.linspace(-3, 3, 1000)
    Y, dY = F(X, a=a)
    #plot(X,dY,X,-dY)
    return Q, a, np.sqrt(np.diag(cov))


def _test_linear(N=6, M=200, plot=False):
    r"""Linear least squares problem.

    Here we generate some data to test the linear version.  We apply this to
    the polynomial fitting problem.
    """
    fs = [lambda x: x*x*x, lambda x: x*x, lambda x: x, lambda x: 1.0 + 0*x]

    def f(x, p):
        return sum(f_(x)*p_ for (f_, p_) in zip(fs, p))

    def df(x, p):
        return np.vstack([f_(x) for f_ in fs])

    x = np.linspace(0, 3, N)
    p0 = [4, 3, 2, 1]
    y0 = f(x, p0)
    np.random.seed(0)
    dy = np.random.rand(N)/10 + 0.01
    y = y0 + np.random.normal(0, dy)

    p, cov, Q, F = lsqf(fs, x, y, dy)
    p1, dp1, Q1, cov1, F1 = leastsq(f=f, p0=p0, x=x, y=y, dy=dy, df=df)
    assert np.allclose(p, p1)
    assert np.allclose(cov, cov1)

    p_ = []
    cov_ = []
    Q_ = []
    chi2_ = []
    for i in xrange(M):
        y = y0 + np.random.normal(0, dy)
        p, cov, Q, F = lsqf(fs, x=x, y=y, dy=dy)
        chi2_.append(np.sum(((y - f(x, p))/dy)**2))
        #p, dp, Q, cov, F = leastsq(f=f, p0=p0, x=x,
        #                           y=y0 + np.random.normal(0, dy),
        #                           dy=dy, df=df)
        p_.append(p)
        Q_.append(Q)
        cov_.append(cov.ravel())
    p_ = np.array(p_)
    chi2_ = np.array(chi2_)
    cov_ = np.array(cov_)
    cov = cov_.mean(axis=0).reshape(cov.shape)
    dcov = cov_.std(axis=0).reshape(cov.shape)

    if plot:
        for i in xrange(3):
            plt.hist(p_[:, i], np.sqrt(M), normed=True)
            _x = np.linspace(*(plt.axis()[:2] + (100,)))
            plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], dp0[i]), ':')
            plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], err_.mean(axis=0)[i]))

    return x, y_, f


def _test_nl(N=100, M=1000, plot=False):
    r"""Nonlinear test problem.

    Here we fit the following function to some data with errors:

    .. math::
       f(x) &= \sqrt{\left(\frac{x^2}{2m} - \mu\right)^2 + \Delta^2},\\
       f_{,m}(x) &= -\frac{x^2}{2m^2f(x)}\left(\frac{x^2}{2m} - \mu\right),\\
       f_{,\mu}(x) &= -\frac{1}{f(x)}\left(\frac{x^2}{2m} - \mu\right),\\
       f_{,\Delta}(x) &= \frac{\Delta}{f(x)}.

    First we generate `N` points of data with Gaussian errors, then we fit and
    extract the parameters.  We do this `M` times and bin the results to show
    the distribution.

    Note that the error estimates are consistent with the distributed errors,
    but not with the original parameter distribution.  I am not quite sure
    why...  The latter discrepancy may be due to the non-linearity in the
    model, but linear models do not fare any better.
    """
    from numpy.random import normal as rand
    if plot:
        import matplotlib.pyplot as plt

    # First define the function to be modelled using sympy so we can
    # compute derivatives.
    import sympy
    x, m, mu, Delta = sympy.symbols(['x', 'm', 'mu', 'Delta'])
    f_expr = sympy.sqrt((x**2/2/m - mu)**2 + Delta**2)
    _d = {}
    exec "def f(x, m, mu, Delta): return %s" % str(f_expr) in _d
    f = _d['f']

    # These are the parameters and the standard deviations
    m0, dm0 = 2.0, 0.02
    mu0, dmu0 = 3.0, 0.04
    Delta0, dDelta0 = 4.0, 0.06
    p0 = [m0, mu0, Delta0]
    dp0 = [dm0, dmu0, dDelta0]

    # Generate a set of abscissa
    x = np.linspace(0, 3, N)

    # Generate the random parameters with a normal distribution
    m_ = rand(m0, dm0, (N, M))
    mu_ = rand(mu0, dmu0, (N, M))
    Delta_ = rand(Delta0, dDelta0, (N, M))

    # Broadcast the parameters
    y_ = f(x[:, None], m_, mu_, Delta_)

    # Check the distribution of the errors.  Here are the mean and std
    # from the data:
    y_avg, y_std = y_.mean(axis=1), y_.std(axis=1)

    # And the predicted values using linear response
    args = dict(x=x, m=m0, mu=mu0, Delta=Delta0)
    #y0 = eval(str(f_expr), args)
    dy0 = np.sqrt(eval(str((f_expr.diff('m')*dm0)**2 +
                           (f_expr.diff('mu')*dmu0)**2 +
                           (f_expr.diff('Delta')*dDelta0)**2), args))

    # Check that these are consistent with zero.
    assert abs((y_std - dy0).mean()/(y_std - dy0).std() < 0.2)

    # Now use the data to generate a set of fits
    def fn(x, params):
        m, mu, Delta = params
        return f(x, m, mu, Delta)

    p_ = []
    err_ = []
    Q_ = []
    for i in xrange(M):
        p, err, Q, cov = leastsq(fn, p0, x, y_[:, i], dy0)
        p_.append(p)
        Q_.append(Q)
        err_.append(err)
    p_ = np.array(p_)
    err_ = np.array(err_)

    if plot:
        for i in xrange(3):
            plt.hist(p_[:, i], np.sqrt(M), normed=True)
            _x = np.linspace(*(plt.axis()[:2] + (100,)))
            plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], dp0[i]), ':')
            plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], err_.mean(axis=0)[i]))

    return x, y_, f

#ion();clf()
#errorbar(p,y,dy);
#import pdb;pdb.set_trace()
#p, err, Q, cov = leastsq(fn, [m, mu, Delta], x, y, dy)