r"""Least squares fitting with error estimates.
Linear Least Squares
====================
The simplest least-squares is the linear least squares approach which
involves fitting the coefficients $a_{n}$ in an expansion of the form:
.. math::
y = \sum a_n f_{n}(x).
To view this as a linear algebra problem, define the matrix
.. math::
\mat{X}_{m,n} = f_{n}(x_{m})
One can then write the problem as a standard least-squares problem of
minimizing the $\chi^2$ fitness measure over the vector of parameters
$\mat{a}$:
.. math::
\chi^2 = \norm{\mat{\delta_y}^{-1}(\mat{y} - \mat{X}\cdot \mat{a})}_{2}^2.
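
For concreteness, here is a minimal sketch of how such a design matrix can be
built and the weighted problem solved directly using only :mod:`numpy` (the
basis functions and data here are made up for illustration).  The function
:func:`lsqf` below packages this together with the covariance matrix and a
quality-of-fit estimate::

    import numpy as np

    # Hypothetical basis functions f_n(x) and data (x, y, dy).
    fs = [lambda x: 1.0 + 0*x, lambda x: x, lambda x: x**2]
    x = np.linspace(0, 1, 20)
    y = 1.0 + 2.0*x + 3.0*x**2 + np.random.normal(0, 0.1, len(x))
    dy = 0.1*np.ones_like(x)

    # Design matrix X[m, n] = f_n(x_m); weight the rows by 1/dy to form chi^2.
    X = np.array([fn(x) for fn in fs]).T
    a, res, rank, sv = np.linalg.lstsq(X/dy[:, None], y/dy)
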
Error Estimates
===============
.. warning:: The error estimates discussed here are only valid under the strict
assumption that the errors in the input data are normally distributed. In
this case -- and when the model is linear over the region of variation --
then the covariance matrix $C$ may be used to provide confidence estimates as
discussed below.
For a detailed discussion, please see [PTVF:2007]_: if your errors are not
normally distributed, you may still minimize $\chi^2$ to fit your parameters
and use the covariance matrix $\mat{C}$ to determine the shape of the
confidence contours, but you *may not* determine which confidence level those
contours correspond to.  To obtain that estimate, you must perform a Monte
Carlo simulation using the proper distribution function.
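
Schematically, such a Monte Carlo estimate might look like the following
sketch, where `draw_errors` stands for a sampler of the actual (non-normal)
error distribution and `fit` for one of the fitting routines below; both
names are hypothetical placeholders::

    import numpy as np

    def monte_carlo_parameters(fit, f, x, p_best, draw_errors, M=1000):
        # Refit synthetic data sets drawn about the best-fit model and
        # return the ensemble of fitted parameters; percentiles of this
        # ensemble give the desired confidence regions.
        samples = []
        for _ in range(M):
            y_synth = f(x, p_best) + draw_errors(len(x))
            samples.append(fit(x, y_synth))
        return np.array(samples)
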
The following discussion assumes that the input data has normally distributed
errors.
Single Parameter Confidence Interval
++++++++++++++++++++++++++++++++++++
When considering a single parameter on its own, the error returned represents
the $1\sigma$ confidence interval (for normally distributed data) for that
parameter assuming that the additional parameters are chosen to minimize
$\chi^2$. We provide two types of error and covariance estimates here depending
on the value of the `scale_cov` argument:
1) If `scale_cov = False`, then the error estimates describe the standard
deviation of the distribution of the fitted parameters from a set of data
with uncorrelated normal errors with the prescribed standard deviations `dy`.
   If the model is poor, or the errors are underestimated (very low quality of
   fit `Q`), then this will give unreasonably small error estimates for the
   parameters.
2) If `scale_cov = True`, then the covariance matrix is scaled by the
reduced chi squared $\chi^2_{R} = \chi^2/\nu$ where $\nu =
N - p$ is the number of degrees of freedom ($N$ data points and $p$
parameters). This is equivalent to the previous case if
$\chi^2_{R} = 1$, and so can be thought of as scaling the
   error estimates `dy` so that the model fits.  In essence, this treats the
   problem as pure curve fitting and yields much more reasonable parameter
   error estimates, but they no longer carry the statistical interpretation of
   the previous case.  This is what the :mod:`scipy` documentation of
   :func:`scipy.optimize.leastsq` means when it states: "This matrix must be
   multiplied by the residual standard deviation to get the covariance of the
   parameter estimates".
.. note::
   One can obtain the `scale_cov = True` errors after the fact by simply
   multiplying the covariance matrix by the reduced chi-squared,
   $\mat{C} \rightarrow \mat{C}\chi^2_{R}$.  Likewise, the errors may be
   scaled as $\delta_a \rightarrow \delta_a \sqrt{\chi^2_{R}}$.

In short, if the error estimates `dy` are trusted, then do not scale
the covariance matrix; take the quality of fit `Q` seriously and
reject the parameter estimates if it is too small.  On the other
hand, if the errors are not known, or one is simply trying to
visually fit the data, then one should use the scaled covariance
matrix.
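
For example, given the unscaled results from :func:`lsqf` one can perform this
scaling after the fact (a sketch; `fs`, `x`, `y`, and `dy` are as in the
examples below)::

    import numpy as np

    a, cov, Q, F, residuals, chi2_r = lsqf(fs, x, y, dy, return_residuals=True)
    cov_scaled = cov*chi2_r                   # C -> C*chi^2_R
    da_scaled = np.sqrt(np.diag(cov_scaled))  # da -> da*sqrt(chi^2_R)
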
If you need different confidence bands, then note that the error is related to
the diagonal of the covariance matrix via:
.. math::
\delta a_0 = \pm \sqrt{\Delta C_{00}}.
One can compute $\Delta$ for the appropriate confidence level using
:func:`Delta`. Note that for a single parameter $\nu=1$ and the standard
$1\sigma$ ($68.3\%$) confidence interval, $\Delta = 1$::

    >>> Delta(nu=1.0, sigma=1.0)
    1.0

Thus, you can simply multiply the error by $\sqrt{\Delta}$ to obtain
a different confidence interval.
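
For example, to convert a quoted $1\sigma$ error `da` on a single parameter
into a $2\sigma$ ($95.4\%$) interval (a sketch; here `da` stands for the error
returned by one of the fitting routines below)::

    import numpy as np

    da_2sigma = np.sqrt(Delta(nu=1.0, sigma=2.0))*da   # ~ 2*da since Delta ~ 4
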
Multiple Parameter Confidence Intervals
+++++++++++++++++++++++++++++++++++++++
The previous characterization of errors -- and that returned in the parameter
error estimates -- is valid when each parameter is considered independently
under the assumption that the other parameters are adjusted to minimize
$\chi^2$. It is in general not valid when considering a group of parameters.
Instead, the confidence region for a group of parameters is in general an
ellipsoid constructed by the following procedure (see [PTVF:2007]_):
1) Determine $\Delta$ such that $\Gamma_{q}(\nu/2, \Delta/2) = 1-p$
where $p = 68.3\%$ for one standard deviation, $p = 95.4\%$ for two
standard deviations etc. Here $\nu$ is the number of parameters.
2) The confidence ellipse is:
.. math::
\vect{\delta}_a^T\cdot \mat{c}^{-1} \cdot \vect{\delta}_{a} = \Delta
where $\mat{c}$ is the block of the total covariance matrix
containing the corresponding parameters of interest, and
$\vect{\delta}_{a}$ is the vector of the errors in those
   parameters.  This analysis assumes that the remaining parameters are
   adjusted to minimize $\chi^2$ as the chosen parameters are varied; as a
   result, the region is slightly smaller than the extremal bounds of the
   full confidence ellipsoid.
We present the result of this here in terms of a non-linear least
squares formula. Suppose that $y = f(x, \vect{a})$, then within the
error ellipse we have
.. math::
y = f(x, \vect{a} + \vect{\delta}_{a})
\approx f + \partial_{\vect{a}}f\cdot \vect{\delta}_{a}.
If we diagonalize $\mat{c} = \mat{U}\mat{d}^2\mat{U}^{T}$ where
$\mat{d}$ is diagonal, then we can write the maximum value of $y$ as
.. math::

   f + \sqrt{\Delta} \max_{\norm{\vect{q}} = 1}
   \partial_{\vect{a}}f\cdot \mat{U}\mat{d}\vect{q}.

Thus, the extremal variations are
$\pm\sqrt{\Delta}\norm{\partial_{\vect{a}}f\cdot \mat{U}\mat{d}}$, which gives
the range of errors in $y$.  For the linear model, $\partial_{a_{n}}f = f_{n}$.
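
For the linear model this propagation can be written down directly in terms of
the covariance matrix returned by :func:`lsq` or :func:`lsqf`.  The following
sketch is essentially what the function `F` returned by :func:`lsqf` does
internally (here `X` is the matrix of derivatives $f_{n}(x_{m})$)::

    import numpy as np

    def y_error_band(X, cov, sigma=1.0):
        # Half-width of the error band: dy(x)^2 = Delta * (df/da) C (df/da)^T
        # evaluated row by row, where Delta corresponds to the joint
        # confidence region for all of the parameters.
        D = Delta(nu=cov.shape[0], sigma=sigma)
        return np.sqrt(D*np.einsum('ij,jk,ik->i', X, cov, X))
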
Example
+++++++
Here we demonstrate this with a polynomial fit with data sampled with a Gaussian
distribution. We generate the data with a fixed standard deviation, but first
underestimate the errors.  We then apply `scale_cov=True` and show that this
is consistent with a proper error estimate.
.. plot::
    :width: 100%
    :include-source:

    import numpy as np
    import matplotlib.pyplot as plt
    import mmf.fit.lsq

    np.random.seed(0)
    fs = [lambda x: 1, lambda x: x, lambda x: x*x]
    def f(x, p): return fs[0](x)*p[0] + fs[1](x)*p[1] + fs[2](x)*p[2]
    x = np.linspace(0.3, 1.3, 10); p0 = [1, 2, 3]; y0 = f(x, p0)
    X = np.linspace(-0.1, 1.5, 100)
    dy0 = 2.0   # Large scatter in points
    y = y0 + np.random.normal(0, dy0, len(y0))
    dy = 0.5    # Underestimate of errors

    plt.figure(figsize=[16, 6])
    plt.subplot(121)
    (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=False)
    plt.errorbar(x, y, 0*y + dy, fmt='.b', ecolor='b')
    plt.errorbar(0, p[0], np.sqrt(cov[0, 0]), fmt='dk')
    plt.plot(X, F(X, a=p)[0], '-k',
             X, F(X, a=p)[0] + F(X, a=p)[1], '--b',
             X, F(X, a=p)[0] - F(X, a=p)[1], '--b')
    plt.title("Q=%g" % Q)
    plt.axis((-0.1, 1.5, 0, 15))

    plt.subplot(122)
    (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=True)
    plt.errorbar(x, y, 0*y + dy0, fmt='.g', ecolor='g')
    plt.errorbar(0, p[0], np.sqrt(cov[0, 0]), fmt='dk')
    plt.plot(X, F(X, a=p)[0], '-k',
             X, F(X, a=p)[0] + F(X, a=p)[1], ':g',
             X, F(X, a=p)[0] - F(X, a=p)[1], ':g')
    (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy0, scale_cov=False)
    plt.errorbar(0, p[0], np.sqrt(cov[0, 0]), fmt='dk')
    plt.plot(X, F(X, a=p)[0], '-k',
             X, F(X, a=p)[0] + F(X, a=p)[1], '--g',
             X, F(X, a=p)[0] - F(X, a=p)[1], '--g')
    plt.title("Q=%g" % Q)
    plt.axis((-0.1, 1.5, 0, 15))

As an example, here is a plot of some data scattered with a standard
deviation of `2` but fit with underestimated errors of `0.5` (left).
The quality of this fit is extremely poor (`Q=0`) and the error bounds (blue
dashed curves) are correspondingly underestimated. This is the type
of fluctuation that the fitted curve would exhibit if the data
fluctuated with the smaller standard deviation, but does not describe
the scatter in the data. If the error bars are trusted here and the
errors are normally distributed etc. then one would have to conclude
that the model is not very good.
On the right we consider the same fit, but with `scale_cov = True` (green dotted
curves) which much better captures the visual scatter of the data. In fact,
this agrees with the error bounds (green dotted curves) that are obtained by
setting the appropriate standard deviation `dy = 2` (green error bars) with
`scale_cov = False`. In this case the quality of the fit `Q = 0.4` is good and
the error bounds agree well "visually" with the scatter in the data.  This shows
how -- if one has normally distributed errors of the same size -- one can use
`scale_cov=True` to estimate the errors.
Finally, since this is a polynomial model, one might think that the error in the
intercept would be simply the error in the parameter $a_0$ (shown as a black
error bar with a diamond at $x=0$). This is the $1\sigma$ confidence interval
for the single parameter $a_0$ when all other parameters are adjusted to
minimize the $\chi^2$. This corresponds to $\Delta(\sigma=1, \nu=1)=1$. The
error band shown in the previous plot corresponds to the joint error in which
all three parameters vary together, $\sqrt{\Delta(\sigma=1, \nu=3)}=1.88\cdots$,
which is the larger ellipse shown below.  This is a projection of the $1\sigma$ "joint"
distribution ellipsoid for all three parameters.
.. plot::
    :width: 100%
    :include-source:

    import numpy as np
    import matplotlib.pyplot as plt
    import mmf.fit.lsq

    np.random.seed(0)
    fs = [lambda x: 1, lambda x: x, lambda x: x*x]
    def f(x, p): return fs[0](x)*p[0] + fs[1](x)*p[1] + fs[2](x)*p[2]
    x = np.linspace(0.3, 1.3, 10); p0 = [1, 2, 3]; y0 = f(x, p0)
    X = np.linspace(-0.1, 1.5, 100)
    dy = 2.0
    y = y0 + np.random.normal(0, dy, len(y0))
    (p, cov, Q, F) = mmf.fit.lsq.lsqf(fs, x, y, dy=dy, scale_cov=False)

    # Generate confidence ellipses for a pair of parameters, e.g. a_0 and a_1
    def error_band(Cproj, sigma=1.0, nu=1.0):
        u, d, vt = np.linalg.svd(Cproj)
        Delta = mmf.fit.lsq.Delta(sigma=sigma, nu=nu)
        theta = np.linspace(0, 2.0*np.pi, 100)
        dx = np.sqrt(Delta)*np.exp(1j*theta)
        dp = np.dot(np.dot(u*np.sqrt(d), vt), [dx.real, dx.imag])
        assert np.allclose(Delta,
                           (np.dot(dp.T, np.linalg.inv(Cproj))*dp.T).sum(axis=1))
        return dp

    plt.figure(figsize=[16, 6])
    ls = ['k-', 'g:', 'b--']

    plt.subplot(121)
    C01 = cov[:2, :2]
    for _l, _nu in enumerate([1, 2, 3]):
        dp = error_band(C01, nu=_nu)
        plt.plot(p[1] + dp[1], p[0] + dp[0], ls[_l])
        Delta = mmf.fit.lsq.Delta(sigma=1.0, nu=_nu)
        plt.axhline(p[0] + np.sqrt(Delta*cov[0, 0]), c=ls[_l][0], ls=ls[_l][1:])
        plt.axhline(p[0] - np.sqrt(Delta*cov[0, 0]), c=ls[_l][0], ls=ls[_l][1:])
    plt.xlabel("a_1")
    plt.ylabel("a_0")

    plt.subplot(122)
    C02 = cov[[0, 2], :][:, [0, 2]]
    for _l, _nu in enumerate([1, 2, 3]):
        dp = error_band(C02, nu=_nu)
        plt.plot(p[2] + dp[1], p[0] + dp[0], ls[_l])
        Delta = mmf.fit.lsq.Delta(sigma=1.0, nu=_nu)
        plt.axhline(p[0] + np.sqrt(Delta*cov[0, 0]), c=ls[_l][0], ls=ls[_l][1:])
        plt.axhline(p[0] - np.sqrt(Delta*cov[0, 0]), c=ls[_l][0], ls=ls[_l][1:])
    plt.xlabel("a_2")
    plt.ylabel("a_0")

The functions defined here perform both linear least-squares fitting and
non-linear least-squares fitting, the latter via :func:`scipy.optimize.leastsq`.
References
==========
.. [PTVF:2007] William H. Press, Saul A. Teukolsky, William T. Vetterling, and
Brian P. Flannery, "Numerical Recipes: The Art of Scientific Computing",
Cambridge University Press, (2007)
"""
from __future__ import division
__all__ = ['leastsq',
'lsq',
'lsqf',
'lsqm',
'Delta',
]
import math
import numpy as np
import scipy as sp
import scipy.special
import scipy.optimize
import scipy.stats
def leastsq(f, p0, x, y, dy=None, df=None, scale_cov=False,
            epsfcn=0.0, factor=100, ftol=1e-8, diag=None):
    r"""Return `(p, dp, Q, cov)` (or `(p, dp, Q, cov, F)` if `df` is
    provided) for a least-squares fit of the function `y = f(x, p)`
    for the parameters `p`.
Parameters
----------
f : function
This should be a vectorized function `f(x, p)`. A common
example for polynomial fitting is::
def f(x, p):
return np.polyval(p, x)
df : function, (optional)
This should return the derivatives of `f` with respect to the
parameters as a column vector (broadcast as `x[None, :]` and
`p[:,None]`), for example, for polynomial fitting::
def df(x, p):
return x**np.arange(len(p))[::-1,None]
p0 : array
Initial guess for parameters.
x, y : array
This is the data
dy : array, (optional)
The errors. These should be the standard deviations for the
values `y`. (The whole analysis is predicated on the
assumption of Gaussian errors.)
scale_cov : bool, (optional)
If this is `True`, then the covariance matrix and error
estimates will be scaled by `chi/sqrt(nu)` where `nu = len(x) -
len(p)` is the number of degrees of freedom. This scaling
allows one to extract parameter estimates even if the errors
`dy` are not properly estimated.
.. note:: We do not use Bessel's correction here (which would count the
number of degrees of freedom as `nu = len(x) - len(p) -1`). This
might lead to a slight bias in the error estimates but allows the code
to work in plausible cases such as fitting two-points with one
parameter. We follow [PTVF:2007]_ here, not Wikipedia for
example.
Returns
-------
p : array
Parameter estimates (mean values).
dp : array
Estimate of the errors in parameters assuming that they are
independent. Note that the actual 1-sigma confidence region
for the parameters is the ellipsoid
        .. math::
           \vect{\delta}_{p}^{T} \cdot \mat{c}^{-1} \cdot \vect{\delta}_{p} = \Delta.
The meaning of these errors are that if a set of data is
generated with uncorrelated normally distributed errors in `y`
with standard deviation `dy` about mean values `f(x, p)`, then
the estimates for the parameters `p` resulting from the fits
will be approximately normally distributed with standard
deviation `dp`.
cov : array
Two-dimensional covariance matrix $\mat{c}$. This defines the
1-sigma confidence interval if the errors are normally
distributed.
Q : float
Quality of fit (should be between 0.01 and 0.1). If errors are
not specified or are inaccurate, then this is meaningless.
Examples
--------
We start with a linear example of polynomial fitting as discussed
above:
>>> def f(x, p):
... return np.polyval(p, x)
>>> def df(x, p):
... return x**np.arange(len(p))[::-1,None]
Here are our parameters `f = 1 + 2*x + 3*x**2` and some errors:
>>> p0 = [3, 2, 1]
>>> dp0 = [0.03, 0.02, 0.01]
Let's generate some normally distributed data over an abscissa
with some errors `dy` that we can estimate using standard error
analysis:
>>> np.random.seed(1)
>>> x = np.linspace(0, 3, 50)
>>> dy = np.sqrt(((df(x, p0)*np.array(dp0)[:,None])**2).sum(axis=0))
>>> y = f(x, p0) + np.random.normal(0, dy, len(x))
Now we do the fit: we try with and without `df` to ensure that
both are the same:
>>> p,dp,Q,cov = leastsq(f=f,p0=[0,0,0],x=x,y=y,dy=dy)
>>> p1,dp1,Q1,cov1,F = leastsq(f=f,p0=[0,0,0],x=x,y=y,dy=dy,df=df)
>>> (np.allclose([p, dp], [p1, dp1]), np.allclose(cov, cov1))
(True, True)
Providing the derivative allows us to reduce the number of
function calls though. Now we generate a bunch of data and
tabulate the fitted parameter values to see if the error estimates
are reasonable:
>>> P_ = np.array([
... leastsq(f=f, p0=p, x=x, dy=dy, df=df,
... y = f(x, p0) + np.random.normal(0, dy))[0]
... for n in xrange(100)])
>>> P_.mean(axis=0)
array([ 3.00..., 2.00..., 0.99...])
>>> P_.std(axis=0)
array([ 0.012..., 0.022..., 0.0058...])
This agrees with the previously calculated errors of the fit:
>>> dp
array([ 0.012..., 0.020..., 0.0058...])
    .. note:: The precise meaning of this error estimate is that it describes
       the standard deviation of the fitted parameter values given
       independent normally distributed errors in the data with
       standard deviation `dy`.  These are not estimates of the
       distribution of the original parameters, which would be
       determined directly from the errors in the data:

       >>> np.sqrt(np.dot(np.linalg.pinv((df(x, p0).T**2)), dy**2))
       array([ 0.03, 0.02, 0.01])

       This relationship between `dy` and the distribution of the
       original parameters has nothing to do with the fit.  Were we to
       provide more data for the fit, the error in the parameters `dp`
       would decrease: we would be determining the mean of the parameter
       distribution more and more precisely.

We also consider the extrapolation error using the model with the
fitted parameters
>>> x0 = 5.0 # Extrapolate to here
>>> y0_ = np.array([
... f(x0, leastsq(f=f, p0=p, x=x, dy=dy, df=df,
... y = f(x, p0) + np.random.normal(0, dy))[0])
... for n in xrange(200)])
>>> y0_.mean(), y0_.std()
(86.00..., 0.227...)
>>> F(x0)
(86.18..., 0.220...)
This agrees with the previously calculated errors of the fit:
>>> dp
array([ 0.012..., 0.020..., 0.0058...])
Now we consider a non-linear model:
    .. math::
       f(x) = \sqrt{\left(\frac{x^2}{2m} - \mu\right)^2 + \Delta^2}
First we generate some data, then we fit and extract the parameters
>>> import numpy as np
>>> from numpy import linspace, array, mean, std, ones, sqrt
>>> from numpy.random import normal as rand
>>> np.random.seed(1)
>>> def f(x, m, mu, Delta):
... return sqrt((x**2/2/m-mu)**2+Delta**2)
>>> m, dm = 2.0, 0.01
>>> mu, dmu = 2.0, 0.02
>>> Delta, dDelta = 2.0, 0.03
>>> x = linspace(0, 3, 50)
>>> def get_y_dy(x, N=1000):
... ys = [f(x0, rand(m,dm), rand(mu,dmu), rand(Delta,dDelta)) for
... x0 in x*ones(N)]
... y = ys[0]
... dy = std(ys)
... return (y,dy)
>>> y_dy = [get_y_dy(x0) for x0 in x]
>>> y = [ydy[0] for ydy in y_dy]
>>> dy = [ydy[1] for ydy in y_dy]
>>> #from pylab import errorbar,ion,clf
>>> #ion();clf()
>>> #errorbar(p,y,dy);
>>> #import pdb;pdb.set_trace()
>>> def fn(p, params):
... m, mu, Delta = params
... return f(p,m,mu,Delta)
>>> p, err, Q, cov = leastsq(fn, [m, mu, Delta], x, y, dy)
    Is the fit good?
>>> Q
0.6...
Yes. This is a Gaussian model and so the fit should be good. In
general, if the Gaussian approximation is not perfect, then `Q`
may be lower. Values of 0.001 may still be acceptable. We check
that the error estimates are also reasonable:
>>> abs(p - array([m, mu, Delta]))/err
array([ 1.3..., 1.6..., 1.6...])
A brief note about how the errors and covariances scale. If we scale the
errors `dy` by a common factor `s`, then the fit `p` remains unchanged while
the error-estimates scale by `s` and the covariance matrix scales by `s**2`:
>>> s = 0.5
>>> dy1 = s*np.array(dy)
>>> p1, err1, Q1, cov1 = leastsq(fn, [m, mu, Delta], x, y, dy1)
>>> np.allclose(p, p1)
True
>>> np.allclose(err1, s*err)
True
>>> np.allclose(cov1, s**2*cov)
True
Note that the `Q` value is horrible though because we have made the errors
too small.
>>> Q1
1...e-16
    The argument `scale_cov` scales the errors in this way so that the reduced
    chi-squared $\chi^2_R \propto s^{-2}$ becomes one, and uses this to
    estimate the errors.
>>> chi2 = (((fn(x, p1) - y)/dy1)**2).sum()
>>> chi2_r = chi2/(len(x) - len(p1))
>>> s1 = np.sqrt(chi2_r)
>>> p2, err2, Q2, cov2 = leastsq(fn, [m, mu, Delta], x, y, dy1,
... scale_cov=True)
>>> np.allclose(err2, err1*s1)
True
"""
x = np.array(x)
y = np.array(y)
N = len(y) # Number of data points
try:
Na = len(p0) # Number of parameters
except TypeError:
Na = 1
if dy is None:
dy = y*0.0 + 1.0
else:
dy = np.array(dy)
def residuals(p, x=x, y=y, dy=dy):
return (f(x, p) - y)/dy
if df is None:
d_residuals = None
else:
def d_residuals(p, x=x, y=y, dy=dy):
return df(x, p)/dy
# This sometimes fails even where there is a reasonable answer.
# See scipy ticket #984
#def _f(x, *p):
# return f(x, p)
#p, pcov = scipy.optimize.curve_fit(
# _f, xdata=x, ydata=y, p0=p0, sigma=dy,
# epsfcn=epsfcn, factor=factor)
res = sp.optimize.leastsq(residuals, x0=p0, full_output=1,
Dfun=d_residuals, col_deriv=True,
epsfcn=epsfcn, factor=factor,
ftol=ftol, diag=diag)
(p, pcov, infodict, errmsg, ier) = res
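    # Note: because the residuals are already weighted by 1/dy, the matrix
    # returned by MINPACK, pcov ~ (J^T J)^{-1}, is directly the parameter
    # covariance matrix; no additional scaling by the residual variance is
    # needed unless scale_cov is requested below.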
#return res
#fjac = infodict['fjac']
#ipvt = infodict['ipvt']
# Compute the one sigma error estimates from the chi square:
chi_sq = (residuals(p)**2).sum()
    if pcov is None:
        # The fit failed to produce a covariance estimate: return infinite
        # errors rather than crashing when taking the diagonal below.
        pcov = np.inf*np.ones((Na, Na))
#D = Delta(N - Na, 1) # This does not seem to work.
#pcov *= D
if N <= Na:
# Special case since gammainc only works for positive values.
Q = 0.0
        pcov = np.inf*np.ones((Na, Na))
chi_sq_red = np.inf
elif chi_sq == 0.0:
Q = 1.0
chi_sq_red = 0.0
else:
Q = 1.0 - scipy.special.gammainc((N-Na)/2.0, chi_sq/2.0)
chi_sq_red = chi_sq/(N - Na)
if scale_cov:
pcov *= chi_sq_red
shape = np.shape(p0)
# Compute independent error estimates in the parameters
err = np.reshape(np.sqrt(np.diag(pcov)),shape)
p = np.reshape(p, shape)
if df is not None:
# If so, we can provide an extrapolation estimate. Note that
# I have not figured out how to do the sigma extrapolation yet.
def F(x, p=p, f=f, df=df, # sigma=1
):
r"""Returns (y,dy) where y = sum(a[n]*f[n](x)) and dy is the
error estimate."""
y = f(x, p)
J = df(x, p)
dy = np.sqrt(np.diag(np.dot(np.dot(J.T, pcov), J))).ravel()
            if np.asarray(y).shape == ():
dy = dy[0]
return y, dy
return p, err, Q, pcov, F
else:
return p, err, Q, pcov
def lsq(X, y, dy=None):
    r"""Return `(a, cov, Q, residuals)` for the linear least-squares
    problem
.. math::
\min_{a}\norm{X\cdot a - y}^2_2.
Parameters
----------
    X : 2d array (N x Na)
        Abscissa.  The least-squares problem is `norm(dot(X, a) - y)`.
y : (N x 1) array of floats
Data values.
dy : (N x 1) array of floats, optional
Errors in data (assumed to be normally distributed). Assumed to
be `1` if not provided.
Returns
-------
a : vector
Vector of parameters estimates
cov : matrix
Covariance matrix
    Q : float
        Estimate of the goodness of the fit.  (`Q` should be between
        0.001 and 0.1.  Larger values indicate that the errors
        have been over-estimated.  Smaller values mean that the
        model is bad, or the errors have been underestimated, or
        the errors are not normally distributed.)
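
    Examples
    --------
    A minimal sketch (here with exact, noise-free data so that the fit
    recovers the coefficients of `y = 1 + 2*x` to numerical precision; with
    exact data and no `dy`, the quality `Q` is not meaningful):

    >>> x = np.linspace(0, 1, 10)
    >>> X = np.vstack([np.ones_like(x), x]).T
    >>> a, cov, Q, residuals = lsq(X, 1 + 2*x)
    >>> np.allclose(a, [1, 2])
    True
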
References
----------
See [PTVF:2007]_.
"""
N, Na = X.shape
y = np.array(y).reshape((N,1))
if dy is None:
dy = 1
dy = np.asarray(dy)
    if dy.shape != ():
dy = np.array(dy).reshape((N,1))
if np.min(dy) == 0.0:
dy += 1e-20
A = X/dy
y = y.ravel()
dy = dy.ravel()
Y = y/dy
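    # Solve the weighted problem via the SVD of the weighted design matrix,
    # A = U diag(d) V^T, so that a = V diag(1/d) U^T Y and the covariance
    # matrix is cov = (A^T A)^{-1} = V diag(1/d^2) V^T.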
U, d, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T
if N < Na:
        raise NotImplementedError("Needs to be tested...")
# Pad missing entries with large values:
dU = np.ones(Na,float)/np.finfo(float).eps
dU[:N] = 1./d*np.dot(U.T,Y)
a = np.dot(V.T,dU)
d2inv = np.ones(Na,float)/np.finfo(float).eps**2.0
d2inv[:N] = 1./d**2
cov = np.dot(V.T*d2inv, V)
else:
a = np.dot(np.dot(Y.T, U/d), V.T)
cov = np.dot(V/d**2, V.T)
residuals = np.dot(A,a) - Y
residuals2 = np.dot(residuals, residuals)
chi_sq = np.sum(residuals2)
if N <= Na:
# Special case since gammainc only works for positive values.
Q = 0.0
elif chi_sq == 0.0:
Q = 1.0
else:
Q = 1.0 - scipy.special.gammainc((N-Na)/2.0,chi_sq/2.0)
return a, cov, Q, residuals
def Delta(nu, sigma):
r"""Return $\Delta$ required to estimate the errors for $\nu$
parameters within a confidence interval of $\sigma$ standard
deviations.
.. math::
\gamma_{q}(\nu/2, \Delta/2) = p(\sigma)
    where $p(\sigma)$ is the desired confidence from the normal
    distribution, i.e. $68.3\%$ for $\sigma = 1$, $95.4\%$ for
    $\sigma=2$, etc.
Examples
--------
>>> Delta(1, 1)
1.0
>>> Delta(1, 2)
3.9999...
>>> Delta(2, 1)
2.295...
>>> Delta(4, 2)
9.715...
"""
p = 2*scipy.stats.distributions.norm.cdf(sigma) - 1
def f(Delta):
return scipy.special.gammainc(nu/2.0, Delta/2.0) - p
Delta_max = 1
while f(Delta_max) < 0:
Delta_max *= 2
return scipy.optimize.brentq(f, 0, Delta_max)
def lsqf(f, x, y, dy, scale_cov=False, return_residuals=False):
r"""Return `(a, cov, Q, F)` or `(a, cov, Q, F, residuals, chi2r)` for a
least squares fit of the data `(x, y)` to the function::
y = a[0]*f[0](x) + a[1]*f[1](x) + ...
Parameters
----------
f : 1d array of Na functions
x : 1d array of N parameters
Abscissa
y : 1d array of N floats
Data values
dy : 1d array of N floats
Errors in data (assumed to be normally distributed).
scale_cov : bool, (optional)
If this is `True`, then the covariance matrix and error
estimates will be scaled by `chi/sqrt(nu)` where `nu = len(x) -
len(p)` is the number of degrees of freedom. This scaling
allows one to extract parameter estimates even if the errors
`dy` are not properly estimated.
return_residuals : bool
If `True`, then data residuals will be returned.
Returns
-------
a : vector
Vector of parameters estimates
cov : matrix
Covariance matrix
    Q : float
        Estimate of the goodness of the fit.  (`Q` should be between
        0.001 and 0.1.  Larger values indicate that the errors
        have been over-estimated.  Smaller values mean that the
        model is bad, or the errors have been underestimated, or
        the errors are not normally distributed.)
F : function
This function takes abscissa `x -> (y,dy)` where `y = f(x)` is
the best fit and dy is the estimated error. The error
estimate uses the full covariance matrix.
residuals : Residuals if `return_residuals` is `True`.
chi2r : Reduced chi square if `return_residuals` is `True`.
Examples
--------
Generate a bunch of points on the parabola y = x**2-2*x+1 with
errors and fit them.
>>> scipy.random.seed(0)
>>> N = 100
>>> f = [lambda x:1.0,lambda x:x,lambda x:x**2]
>>> a0 = [1.0,-2.0,1.0]
>>> Na = len(a0)
>>> def F(x,a=a0,f=f):
... y = np.zeros(len(x),float)
... for n in xrange(len(a)):
... y += a[n]*f[n](x)
... return y
>>> x = scipy.random.uniform(-3,3,N)
>>> dy = scipy.random.uniform(0,0.1,N)
>>> y = scipy.random.normal(F(x, a=a0),dy)
>>> a, cov, Q, F = lsqf(f,x,y,dy)
>>> assert Q < 0.9 and 0.1 < Q, "Q value not good for data! Q=%f" % Q
>>> #from pylab import *;ion();clf()
>>> #errorbar(x,y-F(x, a=a)[0],dy,fmt='.')
>>> #X = np.linspace(-3,3,1000)
>>> #Y,dY = F(X, a=a)
>>> #plot(X,dY,X,-dY)
References
----------
See [PTVF:2007]_.
"""
Na = len(f)
N = len(x)
if isinstance(x,np.ndarray) or np.isscalar(x[0]):
# Vectorize if x is an array
x_is_array = True
x = np.array(x).reshape(N)
else:
# Allow for a list of tuples for example...
x_is_array = False
X = np.empty((N,Na),float)
for n in xrange(Na):
if x_is_array:
X[:,n] = f[n](x)
else:
for m in xrange(N):
# Explicit loop to allow for general abscissa
X[m,n] = f[n](x[m])
a, cov, Q, residuals = lsq(X, y, dy)
chi_sq = np.sum(residuals**2)
if N <= Na:
chi_sq_red = np.inf
else:
chi_sq_red = chi_sq/(N - Na)
if scale_cov:
cov *= chi_sq_red
# Compute error estimates
assert np.allclose(cov, cov.T)
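    # Factor cov = (U*d) (U*d)^T with d = sqrt(eigenvalues) so that Ud acts as
    # a matrix square root of the covariance; F below uses it to propagate the
    # parameter errors to an error band in y.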
d, U = np.linalg.eigh(cov); d = np.sqrt(d)
Ud = np.asarray(U*d)
def F(x, a=a, f=f, Ud=Ud, sigma=1):
r"""Returns (y,dy) where y = sum(a[n]*f[n](x)) and dy is the
error estimate.
Parameters
----------
x : float or array
Abscissa: the function is vectorized, so arrays can be passed (as long
as the functions `f` are vectorized).
f : [function]
List of functions. This can be replaced. For example, you might
replace this by a list of the derivatives of ``f[n](x)`` to compute
the error in the extrapolated derivative.
a : [float]
            Fit coefficients.  The model is ``sum([a[n]*f[n](x) for n in ...])``.
            This should not be changed.
Ud: matrix
This should not be changed.
sigma : float
This should not be changed.
"""
try:
N = len(x)
except TypeError:
x = [x]
N = 1
x = np.array(x)
y = np.zeros(N,float)
dy = np.zeros(N,float)
X = np.empty((N,len(a)),float)
for n in xrange(len(a)):
if x_is_array:
X[:,n] = f[n](x)
y += a[n]*f[n](x)
else:
for m in xrange(N):
# Explicit loop to allow for general abscissa
X[m,n] = f[n](x[m])
y[m] += a[n]*f[n](x[m])
X = np.asarray(X)
tmp = np.dot(X, Ud)
dy = np.sqrt(Delta(nu=len(a), sigma=sigma)
*np.diag(np.dot(tmp, tmp.T))).reshape(y.shape)
return y, dy
if return_residuals:
return a, cov, Q, F, residuals, chi_sq_red
else:
return a, cov, Q, F
def lsqm(model, x, y, dy, active=None):
    r"""Perform a least-squares fit using the functions defined in the
    model dictionary and return a dictionary of results with each fitted
    parameter stored as the complex number `value + 1j*error`, along with
    the entries `'residuals'`, `'Q'`, `'Cov'`, `'F'`, and `'chi2r'`."""
if active is None:
active = model.keys()
f = [model[param] for param in active]
a, cov, Q, F, residuals, chi2r = lsqf(f,x,y,dy,return_residuals=True)
res = {}
for n in xrange(len(active)):
res[active[n]] = a[n] + 1j*np.sqrt(cov[n,n])
res['residuals'] = residuals
res['Q'] = Q
res['Cov'] = cov
res['F'] = F
res['chi2r'] = chi2r
return res
def _test_1(N=100):
r"""Test a linear fit."""
f = [lambda x:1.0,
lambda x:x]
a0 = [1.0,-2.0]
Na = len(a0)
def F(x, a=a0, f=f):
y = np.zeros(len(x),float)
for n in xrange(len(a)):
y += a[n]*f[n](x)
return y
scipy.random.seed(0)
x = scipy.random.uniform(-3,3,N)
dy = scipy.random.uniform(0,0.1,N)
y = scipy.random.normal(F(x, a=a0),dy)
a, cov, Q, F = lsqf(f,x,y,dy)
print("Q=%f, a=%s, err=%s"%(Q,str(a),str(np.sqrt(np.diag(cov)))))
assert Q < 0.9 and 0.1 < Q, "Q value not good for data! Q=%f" % Q
#from pylab import *;ion();clf()
#errorbar(x,y-F(x, a=a)[0],dy,fmt='.')
X = np.linspace(-3,3,1000)
Y,dY = F(X, a=a)
#plot(X,dY,X,-dY)
return Q, a, np.sqrt(np.diag(cov))
def _test_linear(N=6, M=200, plot=False):
r"""Linear least squares problem.
Here we generate some data to test the linear version. We apply
this to the polynomial fitting problem.
"""
fs = [lambda x:x*x*x, lambda x:x*x, lambda x:x, lambda x:1.0 + 0*x]
def f(x, p):
return sum(f_(x)*p_ for (f_, p_) in zip(fs, p))
def df(x, p):
return np.vstack([f_(x) for f_ in fs])
x = np.linspace(0, 3, N)
p0 = [4,3,2,1]
y0 = f(x, p0)
np.random.seed(0)
dy = np.random.rand(N)/10 + 0.01
y = y0 + np.random.normal(0, dy)
p, cov, Q, F = lsqf(fs, x, y, dy)
p1, dp1, Q1, cov1, F1 = leastsq(f=f, p0=p0, x=x, y=y, dy=dy, df=df)
assert np.allclose(p, p1)
assert np.allclose(cov, cov1)
p_ = []
cov_ = []
Q_ = []
chi2_ = []
for i in xrange(M):
y = y0 + np.random.normal(0, dy)
p, cov, Q, F = lsqf(fs, x=x, y=y, dy=dy)
chi2_.append(np.sum(((y - f(x, p))/dy)**2))
#p, dp, Q, cov, F = leastsq(f=f, p0=p0, x=x,
# y=y0 + np.random.normal(0, dy),
# dy=dy, df=df)
p_.append(p)
Q_.append(Q)
cov_.append(cov.ravel())
p_ = np.array(p_)
chi2_ = np.array(chi2_)
cov_ = np.array(cov_)
cov = cov_.mean(axis=0).reshape(cov.shape)
dcov = cov_.std(axis=0).reshape(cov.shape)
    if plot:
        import matplotlib.pyplot as plt
        # Compare the scatter of the fitted parameters with the error
        # estimate implied by the (mean) covariance matrix.
        dp = np.sqrt(np.diag(cov))
        for i in xrange(len(p0)):
            plt.hist(p_[:, i], int(np.sqrt(M)), normed=True)
            _x = np.linspace(*(plt.axis()[:2] + (100,)))
            plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], dp[i]))
    return p_, cov, dcov
def _test_nl(N=100, M=1000, plot=False):
r"""Nonlinear test problem.
Here we fit the following function to some data with errors:
.. math::
f(x) &= \sqrt{\left(\frac{x^2}{2m} - \mu\right)^2 + \Delta^2},\\
f_{,m}(x) &= -\frac{x^2}{2m^2f(x)}\left(\frac{x^2}{2m} - \mu\right),\\
f_{,\mu}(x) &= -\frac{1}{f(x)}\left(\frac{x^2}{2m} - \mu\right),\\
f_{,\Delta}(x) &= \frac{\Delta}{f(x)},\\
First we generate `N` points of data with Gaussian errors, then we
fit and extract the parameters. We do this `M` times and bin the
results to show the distribution.
Note that the error estimates are consistent with the distributed
errors, but not with the original parameter distribution. I am
not quite sure why... The latter discrepancy may be due to the
non-linearity in the model, but linear models do not fare any better.
"""
from numpy.random import normal as rand
if plot:
import matplotlib.pyplot as plt
# First define the function to be modelled using sympy so we can
# compute derivatives.
import sympy
x, m, mu, Delta = sympy.symbols(['x', 'm', 'mu', 'Delta'])
f_expr = sympy.sqrt((x**2/2/m - mu)**2 + Delta**2)
    _d = {'sqrt': np.sqrt}   # 'sqrt' must be available when f is evaluated.
exec "def f(x, m, mu, Delta): return %s" % str(f_expr) in _d
f = _d['f']
# These are the parameters and the standard deviations
m0, dm0 = 2.0, 0.02
mu0, dmu0 = 3.0, 0.04
Delta0, dDelta0 = 4.0, 0.06
p0 = [m0, mu0, Delta0]
dp0 = [dm0, dmu0, dDelta0]
# Generate a set of abscissa
x = np.linspace(0, 3, N)
# Generate the random parameters with a normal distribution
m_ = rand(m0, dm0, (N, M))
mu_ = rand(mu0, dmu0, (N, M))
Delta_ = rand(Delta0, dDelta0, (N, M))
# Broadcast the parameters
y_ = f(x[:, None], m_, mu_, Delta_)
# Check the distribution of the errors. Here are the mean and std
# from the data:
y_avg, y_std = y_.mean(axis=1), y_.std(axis=1)
# And the predicted values using linear response
    args = dict(x=x, m=m0, mu=mu0, Delta=Delta0, sqrt=np.sqrt)
#y0 = eval(str(f_expr), args)
dy0 = np.sqrt(eval(str((f_expr.diff('m')*dm0)**2 +
(f_expr.diff('mu')*dmu0)**2 +
(f_expr.diff('Delta')*dDelta0)**2), args))
# Check that these are consistent with zero.
    assert abs((y_std - dy0).mean()/(y_std - dy0).std()) < 0.2
# Now use the data to generate a set of fits
def fn(x, params):
m, mu, Delta = params
return f(x, m, mu, Delta)
p_ = []
err_ = []
Q_ = []
for i in xrange(M):
p, err, Q, cov = leastsq(fn, p0, x, y_[:,i], dy0)
p_.append(p)
Q_.append(Q)
err_.append(err)
p_ = np.array(p_)
err_ = np.array(err_)
if plot:
for i in xrange(3):
            plt.hist(p_[:, i], int(np.sqrt(M)), normed=True)
_x = np.linspace(*(plt.axis()[:2] + (100,)))
plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], dp0[i]), ':')
plt.plot(_x, sp.stats.norm.pdf(_x, p0[i], err_.mean(axis=0)[i]))
return x, y_, f