
CML

CML (Constrained Maximum Likelihood) is a set of procedures written in the GAUSS programming language (Schoenberg, 1995) for the estimation of the parameters of models via the maximum likelihood method, with general constraints on the parameters.

CML solves the general weighted maximum likelihood problem

\[
L = \sum_{i=1}^N w_i \log P(Y_i;\theta)
\]

where $N$ is the number of observations, $w_i$ is a weight, and $P(Y_i;\theta)$ is the probability of $Y_i$ given $\theta$, a vector of parameters, subject to the linear constraints

\[
A\theta = B, \qquad C\theta \ge D,
\]

the nonlinear constraints

\[
G(\theta) = 0, \qquad H(\theta) \ge 0,
\]

and bounds

\[
\theta_L \le \theta \le \theta_U.
\]

$G(\theta)$ and $H(\theta)$ are functions provided by the user and must be differentiable at least once with respect to $\theta$.
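As a minimal sketch of the objective (in Python rather than GAUSS, with a hypothetical normal model and illustrative names that are not CML's API), the weighted log-likelihood can be written as:

```python
import numpy as np

def weighted_loglik(theta, y, w):
    """Weighted log-likelihood L = sum_i w_i * log P(y_i; theta) for an
    illustrative normal model; theta = (mu, log_sigma).
    (Hypothetical example, not CML's interface.)"""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)   # reparameterize to keep sigma > 0
    logp = -0.5 * np.log(2 * np.pi) - log_sigma - 0.5 * ((y - mu) / sigma) ** 2
    return float(np.sum(w * logp))

y = np.array([0.5, -0.2, 1.1, 0.3])
w = np.ones_like(y)             # unit weights reduce to ordinary ML
L = weighted_loglik(np.array([0.0, 0.0]), y, w)
```

With unit weights this is the ordinary log-likelihood; doubling every weight doubles $L$, which is what makes the weighted form a strict generalization.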

CML finds values for the parameters in $\theta$ such that $L$ is maximized using the Sequential Quadratic Programming (SQP) method. In this method the parameters are updated in a series of iterations beginning with a vector of starting values. Let $\theta_t$ be the current parameter values. Then the succeeding values are

\[
\theta_{t+1} = \theta_t + \rho\, d,
\]

where $d$ is a $K \times 1$ direction vector and $\rho$ is a scalar step length.

Define

\[
\Sigma(\theta) = \frac{\partial^2 L}{\partial\theta\,\partial\theta'}, \qquad
\Psi(\theta) = \frac{\partial L}{\partial\theta},
\]

and the Jacobians

\[
\dot G(\theta) = \frac{\partial G(\theta)}{\partial\theta'}, \qquad
\dot H(\theta) = \frac{\partial H(\theta)}{\partial\theta'}.
\]

For the purposes of this exposition, and without loss of generality, we may assume that the linear constraints and bounds have been incorporated into $G$ and $H$.

The direction, $d$, is the solution to the quadratic program

\[
\min_d \; \tfrac{1}{2}\, d'\, \Sigma(\theta_t)\, d + \Psi(\theta_t)'\, d
\]

subject to

\[
\dot G(\theta_t)\, d + G(\theta_t) = 0, \qquad
\dot H(\theta_t)\, d + H(\theta_t) \ge 0.
\]

This solution requires that $\Sigma(\theta_t)$ be positive definite. (In practice, CML incorporates a slight modification in the quadratic programming solution which relaxes this requirement to positive semi-definite.)
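For the equality-constrained part of this quadratic program, the direction can be obtained directly from the KKT system. The following is a hedged numerical sketch (Python rather than GAUSS; the names S, g, Gd, Gval stand in for $\Sigma(\theta_t)$, $\Psi(\theta_t)$, $\dot G(\theta_t)$, and $G(\theta_t)$ and are not CML's):

```python
import numpy as np

# Sketch of the SQP direction step with equality constraints only:
#   minimize 0.5 d' S d + g' d   subject to   Gd @ d + Gval = 0,
# solved via its KKT system (assumes S is positive definite).

def sqp_direction(S, g, Gd, Gval):
    m = Gd.shape[0]
    K = np.block([[S, Gd.T], [Gd, np.zeros((m, m))]])
    rhs = np.concatenate([-g, -Gval])
    sol = np.linalg.solve(K, rhs)
    d, kappa = sol[:S.shape[0]], sol[S.shape[0]:]   # direction, Lagrangeans
    return d, kappa

S = np.array([[2.0, 0.0], [0.0, 4.0]])   # stand-in for Sigma(theta_t)
g = np.array([1.0, -1.0])                # stand-in for Psi(theta_t)
Gd = np.array([[1.0, 1.0]])              # constraint Jacobian
Gval = np.array([0.5])                   # constraint value G(theta_t)
d, kappa = sqp_direction(S, g, Gd, Gval)
```

The solution satisfies the linearized constraint exactly, and the stationarity condition $S d + g + \dot G' \kappa = 0$; inequality constraints require an active-set or interior-point QP solver on top of this.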

In practice, linear constraints are specified separately from $G$ and $H$ because their Jacobians are known and easy to compute, and the bounds are more easily handled separately from the linear inequality constraints.

The SQP method requires the calculation of a Hessian, $\Sigma$, and various gradients and Jacobians, $\Psi$, $\dot G$, and $\dot H$. CML computes these numerically if procedures to compute them are not supplied.

Descent Algorithms. The Hessian may be very expensive to compute at every iteration, and poor start values may produce an ill-conditioned Hessian. For these reasons alternative algorithms are provided in CML for updating the Hessian rather than computing it directly at each iteration. These algorithms, as well as step length methods, may be modified during the execution of CML.

Beginning with an initial estimate of the Hessian, or a conformable identity matrix, an update is calculated. The update at each iteration adds more ``information'' to the estimate of the Hessian, improving its ability to project the direction of the descent. Thus after several iterations the secant algorithm should do nearly as well as Newton iteration with much less computation.

There are two basic types of secant methods used in CML: BFGS (Broyden, Fletcher, Goldfarb, and Shanno) and DFP (Davidon, Fletcher, and Powell). Both are rank-two updates, that is, they are analogous to adding two rows of new data to a previously computed moment matrix. The Cholesky factorization of the estimate of the Hessian is updated using the functions CHOLUP and CHOLDN.

These methods are complementary (Luenberger, 1984, page 268). Like the Gauss-Newton method, BFGS and DFP use both first and second derivative information; however, in DFP and BFGS the Hessian is approximated, reducing considerably the computational requirements. Because they do not explicitly calculate the second derivatives, they are sometimes called ``quasi-Newton'' methods. While typically needing more iterations than the Gauss-Newton method, these approximations can be expected to converge in less overall time (though Gauss-Newton with analytical second derivatives can be quite competitive).

The secant methods are commonly implemented as updates of the inverse of the Hessian. This is not the best method numerically for the BFGS algorithm (Gill and Murray, 1972). This version of CML, following Gill and Murray (1972), updates the Cholesky factorization of the Hessian instead. The new direction is computed using a Cholesky solve, as applied to the updated Cholesky factorization of the Hessian and the gradient.
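The BFGS rank-two update can be sketched in a few lines. This is the textbook direct-Hessian form, not CML's Cholesky-factor implementation; on a quadratic objective the observed gradient change is exactly $y = Qs$, and the update enforces the secant condition $H_{new}\, s = y$:

```python
import numpy as np

# Minimal sketch of the BFGS rank-two Hessian update:
#   H_new = H - (H s s' H) / (s' H s) + (y y') / (y' s),
# where s is the parameter step and y the change in the gradient.

def bfgs_update(H, s, y):
    Hs = H @ s
    return H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / (y @ s)

Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # true Hessian of a toy quadratic
H = np.eye(2)                            # start from a conformable identity matrix
s = np.array([1.0, 0.5])                 # step taken
y = Q @ s                                # observed gradient change
H = bfgs_update(H, s, y)
```

After the update, $H s = y$ holds exactly, which is the sense in which each iteration adds ``information'' about curvature along the most recent step.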

Line Search Methods. Define the merit function

\[
m(\theta) = L + \max|\kappa| \sum_j |g_j(\theta)| - \max|\lambda| \sum_\ell \min\bigl(0, h_\ell(\theta)\bigr),
\]

where $g_j$ is the $j$-th row of $G$, $h_\ell$ is the $\ell$-th row of $H$, $\kappa$ is the vector of Lagrangean coefficients of the equality constraints, and $\lambda$ the vector of Lagrangean coefficients of the inequality constraints. The line search finds a value of $\rho$ that minimizes or decreases

\[
m(\theta_t + \rho\, d).
\]

Given $\theta_t$ and $d$, this is a function of the single variable $\rho$. Several line search methods are available in CML: STEPBT, a polynomial fitting method; BRENT and BHHHSTEP, golden section methods; and HALF, a step-halving method.
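A step-halving search in the spirit of the HALF method is the simplest of these to sketch (Python, with a toy convex merit function standing in for $m$; not CML's internal code):

```python
import numpy as np

# Sketch of a step-halving line search: repeatedly halve rho until the
# merit function decreases relative to its value at the current point.

def half_line_search(merit, theta, d, rho=1.0, max_halves=20):
    m0 = merit(theta)
    for _ in range(max_halves):
        if merit(theta + rho * d) < m0:
            return rho
        rho *= 0.5
    return rho

merit = lambda th: float(th @ th)   # toy stand-in for m(theta)
theta = np.array([1.0, 1.0])
d = np.array([-1.0, -4.0])          # descent direction, but a full step overshoots
rho = half_line_search(merit, theta, d)
```

Here the full step $\rho = 1$ overshoots the minimum and increases the merit function, so one halving to $\rho = 0.5$ is accepted. Polynomial-fitting methods like STEPBT typically need fewer merit evaluations than this.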

Constraints. There are two general types of constraints, nonlinear equality constraints and nonlinear inequality constraints. However, for computational convenience they are divided into five types: linear equality, linear inequality, nonlinear equality, nonlinear inequality, and bounds.

Linear equality constraints are of the form

\[
A\theta = B,
\]

where $A$ is an $m_1 \times K$ matrix of known constants, $B$ an $m_1 \times 1$ vector of known constants, and $\theta$ the vector of parameters.

Linear inequality constraints are of the form

\[
C\theta \ge D,
\]

where $C$ is an $m_2 \times K$ matrix of known constants, and $D$ an $m_2 \times 1$ vector of known constants.

Nonlinear equality constraints are of the form

\[
G(\theta) = 0,
\]

where $\theta$ is the vector of parameters, and $G$ is an arbitrary, user-supplied function.

Nonlinear inequality constraints are of the form

\[
H(\theta) \ge 0,
\]

where $\theta$ is the vector of parameters, and $H$ is an arbitrary, user-supplied function.

Bounds are a type of linear inequality constraint. For computational convenience they are specified separately from the other inequality constraints.
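The reduction of bounds to linear inequality constraints is mechanical: $\theta_L \le \theta \le \theta_U$ is equivalent to $C\theta \ge D$ with $C = [I;\,-I]$ and $D = [\theta_L;\,-\theta_U]$. A quick check (illustrative only; CML keeps bounds separate for efficiency):

```python
import numpy as np

# Bounds theta_L <= theta <= theta_U written as C theta >= D,
# with C stacking I over -I and D stacking theta_L over -theta_U.

theta_L = np.array([0.0, -1.0])
theta_U = np.array([2.0, 1.0])
K = len(theta_L)
C = np.vstack([np.eye(K), -np.eye(K)])
D = np.concatenate([theta_L, -theta_U])

theta = np.array([1.0, 0.5])            # a point inside the bounds
feasible = bool(np.all(C @ theta >= D))
```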

Covariance Matrix of the Parameters. CML computes a covariance matrix of the parameters that is an approximate estimate when there are constrained parameters in the model (Gallant, 1987; Hartmann and Hartwig, 1995). When the model includes inequality constraints, however, confidence limits computed from the usual t-statistics (dividing the parameter estimates by their standard errors) are incorrect because they do not account for the boundaries that the inequality constraints place on the distributions of the parameters. Confidence limits must therefore be computed by other means. Methods based on inversion of the Wald and likelihood ratio statistics are presented in the next section. The Wald method, however, requires an estimate of the covariance matrix of the parameters.

An argument based on a Taylor-series approximation to the likelihood function (e.g., Amemiya, 1985, page 111) shows that

\[
\sqrt{N}\,(\hat\theta - \theta) \;\sim\; N\!\left(0,\; A^{-1} B A^{-1}\right),
\]

where

\[
A = -\,\mathrm{E}\!\left[\frac{1}{N}\,\frac{\partial^2 L}{\partial\theta\,\partial\theta'}\right], \qquad
B = \mathrm{E}\!\left[\frac{1}{N}\,\frac{\partial L}{\partial\theta}\,\frac{\partial L}{\partial\theta'}\right].
\]

Estimates of $A$ and $B$ are

\[
\hat A = -\,\frac{1}{N}\,\frac{\partial^2 L}{\partial\theta\,\partial\theta'}\bigg|_{\hat\theta}, \qquad
\hat B = \frac{1}{N}\sum_{i=1}^N
\frac{\partial \log P(Y_i;\theta)}{\partial\theta}\,
\frac{\partial \log P(Y_i;\theta)}{\partial\theta'}\bigg|_{\hat\theta}.
\]

Assuming the correct specification of the model, $\mathrm{plim}(\hat A) = \mathrm{plim}(\hat B)$, and thus

\[
\sqrt{N}\,(\hat\theta - \theta) \;\sim\; N\!\left(0,\; \hat A^{-1}\right).
\]
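The unconstrained covariance estimate can be checked numerically on a model with a known answer. Below, a hedged Python sketch for the i.i.d. normal mean model with known $\sigma = 1$, where the true $\mathrm{Var}(\hat\mu) = 1/N$; the finite-difference step and names are illustrative, not CML's internals:

```python
import numpy as np

# Illustrate cov(theta_hat) ~ (N * A_hat)^{-1}, with A_hat the negative
# scaled second derivative of L at the maximum, for a normal mean model
# with sigma = 1 (known answer: Var(mu_hat) = 1/N).

def loglik(mu, y):
    return float(np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2))

rng = np.random.default_rng(0)
y = rng.normal(size=200)
N = len(y)
mu_hat = y.mean()                    # the ML estimate

h = 1e-4                             # central finite difference for d2L/dmu2
d2L = (loglik(mu_hat + h, y) - 2 * loglik(mu_hat, y) + loglik(mu_hat - h, y)) / h**2
A_hat = -d2L / N                     # should be close to 1 for this model
cov = 1.0 / (N * A_hat)              # should be close to 1/N
```

This is the same numerical-derivative fallback described above: when no analytic Hessian procedure is supplied, the second derivatives are computed by finite differences.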

Without loss of generality we may consider two types of constraints, the nonlinear equality and the nonlinear inequality constraints (the linear constraints are included in the nonlinear, and the bounds are regarded as a type of linear inequality). Furthermore, the inequality constraints may be treated as equality constraints with the introduction of ``slack'' parameters into the model:

\[
H(\theta) \ge 0
\]

is changed to

\[
H(\theta) = \zeta^2,
\]

where $\zeta$ is a conformable vector of slack parameters.

Further we distinguish active from inactive inequality constraints. Active inequality constraints have nonzero Lagrangeans, $\lambda_j$, and zero slack parameters, $\zeta_j$, while the reverse is true for inactive inequality constraints. Keeping this in mind, define the diagonal matrix, $Z$, containing the slack parameters, $\zeta_j$, for the inactive constraints, and another diagonal matrix, $\Lambda$, containing the Lagrangean coefficients. Also, define $H_\oplus(\theta)$ representing the active constraints, and $H_\ominus(\theta)$ the inactive.

The likelihood function augmented by constraints is then

\[
L_A = L + \kappa' G(\theta) + \lambda' \bigl( H(\theta) - \zeta^2 \bigr),
\]

and the Hessian of the augmented likelihood is

\[
\frac{\partial^2 L_A}{\partial(\theta,\kappa,\lambda,\zeta)\,\partial(\theta,\kappa,\lambda,\zeta)'} =
\begin{bmatrix}
\Sigma(\theta) & \dot G' & \dot H' & 0 \\
\dot G & 0 & 0 & 0 \\
\dot H & 0 & 0 & -2Z \\
0 & 0 & -2Z & -2\Lambda
\end{bmatrix},
\]

where the dots denote the Jacobians with respect to $\theta$, $\dot G$ and $\dot H$. The covariance matrix of the parameters, Lagrangeans, and slack parameters is the Moore-Penrose inverse of this matrix.

Usually we are interested only in the covariance matrix of the parameters, i.e., the upper left portion of the inverse associated with the parameters. This portion of the inverse can be produced using methods described in Hartmann and Hartwig (1995). For example, construct the partitioned array of Jacobians of the equality and active inequality constraints,

\[
\dot G_\oplus = \begin{bmatrix} \dot G \\ \dot H_\oplus \end{bmatrix}.
\]

Let $\Xi$ be the orthonormal basis for the null space of $\dot G_\oplus$; then the covariance matrix of the parameters is

\[
\Xi \bigl( \Xi'\, \Sigma(\hat\theta)\, \Xi \bigr)^{-1} \Xi'.
\]

Rows of this matrix associated with active inequality constraints may not be available, i.e., the rows and columns of $\Xi ( \Xi' \Sigma \Xi )^{-1} \Xi'$ associated with those parameters may be all zeros.
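The null-space projection step can be sketched numerically. Below, a hedged Python illustration: the null-space basis is extracted from an SVD of the active-constraint Jacobian, and S stands in for the (scaled, negative) Hessian of $L$; the names and the example constraint are illustrative, not CML's internals:

```python
import numpy as np

# Constrained covariance Xi (Xi' S Xi)^{-1} Xi', where Xi is an
# orthonormal basis for the null space of the Jacobian J of the
# active constraints, computed from the SVD of J.

def constrained_cov(S, J):
    _, sv, Vt = np.linalg.svd(J)
    rank = int(np.sum(sv > 1e-12))
    Xi = Vt[rank:].T                 # orthonormal null-space basis of J
    return Xi @ np.linalg.inv(Xi.T @ S @ Xi) @ Xi.T

S = np.diag([4.0, 2.0, 1.0])         # stand-in information matrix
J = np.array([[1.0, 0.0, 0.0]])      # active constraint pins the first parameter
cov = constrained_cov(S, J)
```

Consistent with the remark above, the row and column of the covariance matrix belonging to the pinned parameter come out as zeros, while the free parameters keep their unconstrained variances.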

Heteroskedastic-consistent Covariance Matrix. CML computes a heteroskedastic-consistent covariance matrix of the parameters when requested. Define $\hat B$, as above, evaluated at the estimates. Then the covariance matrix of the parameters is $\hat A^{-1} \hat B \hat A^{-1}$.
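A hedged sketch of this sandwich estimator for the normal mean model with known $\sigma = 1$, using per-observation scores (illustrative names, not CML's API):

```python
import numpy as np

# Heteroskedastic-consistent (sandwich) covariance A_hat^{-1} B_hat A_hat^{-1}
# for the scalar normal mean model: the score for observation i is
# d log P(y_i; mu)/d mu = y_i - mu, and -(1/N) d2L/dmu2 = 1.

rng = np.random.default_rng(1)
y = rng.normal(size=500)
N = len(y)
mu_hat = y.mean()

scores = y - mu_hat                  # per-observation scores at mu_hat
A_hat = 1.0                          # exact for this model
B_hat = np.mean(scores ** 2)         # (1/N) sum of squared scores
cov = (1.0 / A_hat) * B_hat * (1.0 / A_hat) / N   # sandwich Var(mu_hat)
```

For this model the sandwich estimate reduces to the sample variance of $y$ divided by $N$, which is why it remains valid when the assumed error variance is wrong.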



R. Schoenberg
Fri Sep 12 09:21:35 PDT 1997