gmchol(A[, test]) | Return (L, e): the Gill-Murray generalized Cholesky |
colmod(A) | Modified Cholesky factorization |
mchol2(A) | Based on the MATLAB code |
These are a collection of codes based on the [GMW81] algorithm.
Return (L, e): the Gill-Murray generalized Cholesky decomposition of M = A + diag(e) = dot(L, L.T) where
Parameters : | A : array
test : bool
|
---|---|
Returns : | L : 2d array
e : 1d array
|
Notes
****************************************************************/ * This is the Gill/Murray cholesky routine. Reference: / * */ * Gill, Jeff and Gary King. ``What to do When Your Hessian */ * is Not Invertible: Alternatives to Model Respecification */ * in Nonlinear Estimation,’’ Sociological Methods and */ * Research, Vol. 32, No. 1 (2004): Pp. 54–87. */ * */ * Abstract */ * */ * What should a researcher do when statistical analysis */ * software terminates before completion with a message that */ * the Hessian is not invertable? The standard textbook advice */ * is to respecify the model, but this is another way of saying */ * that the researcher should change the question being asked. */ * Obviously, however, computer programs should not be in the */ * business of deciding what questions are worthy of */ * study. Although noninvertable Hessians are sometimes signals */ * of poorly posed questions, nonsensical models, or */ * inappropriate estimators, they also frequently occur when */ * information about the quantities of interest exists in the */ * data, through the likelihood function. We explain the */ * problem in some detail and lay out two preliminary proposals */ * for ways of dealing with noninvertable Hessians without */ * changing the question asked. */ ***************************************************************/
/* calculates the Gill-Murray generalized choleski decomposition / / input matrix A must be non-singular and symmetric / / Author: Jeff Gill. Part of the Hessian Project. */ local i,j,k,n,sum,R,theta_j,norm_A,phi_j,delta,xi_j,gamm,E,beta_j; n = rows(A); R = eye(n); E = zeros(n,n); norm_A = maxc(sumc(abs(A))); gamm = maxc(abs(diag(A))); delta = maxc(maxc(__macheps*norm_A~__macheps)); for j (1, n, 1);
theta_j = 0; for i (1, n, 1);
sum = 0; for k (1, (i-1), 1);
sum = sum + R[k,i]*R[k,j];endfor; R[i,j] = (A[i,j] - sum)/R[i,i]; if (A[i,j] -sum) > theta_j;
theta_j = A[i,j] - sum;endif; if i > j;
R[i,j] = 0;endif;
endfor; sum = 0; for k (1, (j-1), 1);
sum = sum + R[k,j]^2;endfor; phi_j = A[j,j] - sum; if (j+1) <= n;
xi_j = maxc(abs(A[(j+1):n,j]));
- else;
- xi_j = maxc(abs(A[n,j]));
endif; beta_j = sqrt(maxc(maxc(gamm~(xi_j/n)~__macheps))); if delta >= maxc(abs(phi_j)~((theta_j^2)/(beta_j^2)));
E[j,j] = delta - phi_j;
- elseif abs(phi_j) >= maxc(((delta^2)/(beta_j^2))~delta);
- E[j,j] = abs(phi_j) - phi_j;
- elseif ((theta_j^2)/(beta_j^2)) >= maxc(delta~abs(phi_j));
- E[j,j] = ((theta_j^2)/(beta_j^2)) - phi_j;
endif; R[j,j] = sqrt(A[j,j] - sum + E[j,j]);
endfor; retp(R’R);
endp;
Examples
>>> np.random.seed(3)
>>> A = np.random.rand(100,100)*2 - 1
>>> A = A + A.T
>>> L, e = gmchol(A, test=True)
>>> np.allclose(np.dot(L,L.T), (A + np.diag(e)))
True
>>> -e.max()/np.linalg.eigvalsh(A).min() < 1000
True
>>> A2 = np.array([[-0.451, -0.041, 0.124],
... [-0.041, -0.265, 0.061],
... [ 0.124, 0.061, -0.517]])
>>> L, e = gmchol(A2, test=True)
>>> e.max()/np.linalg.eigvalsh(A).min()
Modified Cholesky factorization
R = cholmod(A) returns the upper Cholesky factor of A (same as chol) if A is (sufficiently) positive definite, and otherwise returns a modified factor R with diagonal entries >= sqrt(delta) and offdiagonal entries <= beta in absolute value, where delta and beta are defined in terms of size of diagonal and offdiagonal entries of A and the machine precision; see below. The idea is to ensure that E = A - R’*R is reasonably small if A is not too far from being indefinite. If A is sparse, so is R. The output parameter indef is set to 0 if `A is sufficiently positive definite and to 1 if the factorization is modified.
The point of modified Cholesky is to avoid computing eigenvalues, but for dense matrices, MODCHOL takes longer than calling the built-in EIG, because of the cost of interpreting the code, even though it only has one loop and uses vector operations.
reference: Nocedal and Wright, Algorithm 3.4 and subsequent discussion (not Algorithm 3.5, which is more complicated) original algorithm is due to Gill and Murray, 1974 written by M. Overton, overton@cs.nyu.edu, last modified Feb 2005
convenient to work with A = LDL’ where D is diagonal, L is unit lower triangular, and so R = (LD^(1/2))’
Notes
This is based on the following MATLAB code from Michael L. Overton: http://cs.nyu.edu/overton/g22_opt/codes/cholmod.m
function [R, indef, E] = cholmod(A)
if sum(sum(abs(A-A'))) > 0
error('A is not symmetric')
end
% set parameters governing bounds on L and D (eps is machine epsilon)
n = length(A);
diagA = diag(A);
gamma = max(abs(diagA)); % max diagonal entry
xi = max(max(abs(A - diag(diagA)))); % max offidagonal entry
delta = eps*(max([gamma+xi, 1]));
beta = sqrt(max([gamma, xi/n, eps]));
indef = 0;
% initialize d and L
d = zeros(n,1);
if issparse(A)
L = speye(n); % sparse identity
else
L = eye(n);
end
% there are no inner for loops, everything implemented with
% vector operations for a reasonable level of efficiency
for j = 1:n
K = 1:j-1; % column index: all columns to left of diagonal
% d(K) doesn't work in case K is empty
djtemp = A(j,j) - L(j,K)*(d(K,1).*L(j,K)'); % C(j,j) in book
if j < n
I = j+1:n; % row index: all rows below diagonal
Ccol = A(I,j) - L(I,K)*(d(K,1).*L(j,K)'); % C(I,j) in book
theta = max(abs(Ccol));
% guarantees d(j) not too small and L(I,j) not too big
% in sufficiently positive definite case, d(j) = djtemp
d(j) = max([abs(djtemp), (theta/beta)^2, delta]);
L(I,j) = Ccol/d(j);
else
d(j) = max([abs(djtemp), delta]);
end
if d(j) > djtemp % A was not sufficiently positive definite
indef = 1;
end
end
% convert to usual output format: replace L by L*sqrt(D) and transpose
for j=1:n
L(:,j) = L(:,j)*sqrt(d(j)); % L = L*diag(sqrt(d)) bad in sparse case
end;
R = L';
if nargout == 3
E = A - R'*R;
end
Examples
>>> np.random.seed(3)
>>> A = np.random.rand(100,100)*2-1
>>> A = A + A.T
>>> L, e = colmod(A)
>>> np.allclose(np.dot(L, L.T) - np.diag(e), A)
True
>>> 0 < np.linalg.eigvalsh(A + np.diag(e)).min()
True
>>> -e.max()/np.linalg.eigvalsh(A).min() < 1000
True
by Brian Borchers and Michael Zibulevsky:
%
% [L,D,E,pneg]=mchol2(G)
%
% Given a symmetric matrix G, find a matrix E of "small" norm and c
% L, and D such that G+E is Positive Definite, and
%
% G+E = L*D*L'
%
% Also, calculate a direction pneg, such that if G is not PSD, then
%
% pneg'*G*pneg < 0
%
% Reference: Gill, Murray, and Wright, "Practical Optimization", p111.
% Author: Brian Borchers (borchers@nmt.edu)
% Modification: Michael Zibulevsky (mzib@ee.technion.ac.il)
%
function [L,D,E,pneg]=mcholmz3(G)
%
% n gives the size of the matrix.
%
n=size(G,1);
%
% gamma, zi, nu, and beta2 are quantities used by the algorithm.
%
gamma=max(diag(G));
zi=max(max(G-diag(diag(G))));
nu=max([1,sqrt(n^2-1)]);
beta2=max([gamma, zi/nu, 1.0E-15]);
C=diag(diag(G));
L=zeros(n);
D=zeros(n,1); % use vestors as diagonal matrices
E=zeros(n,1); %%****************** **************
%
% Loop through, calculating column j of L for j=1:n
%
for j=1:n,
bb=[1:j-1];
ee=[j+1:n];
if (j > 1),
L(j,bb)=C(j,bb)./D(bb)'; % Calculate the jth row of L.
end;
if (j >= 2)
if (j < n),
C(ee,j)=G(ee,j)- C(ee,bb)*L(j,bb)'; % Update the jth column of C.
end;
else
C(ee,j)=G(ee,j);
end;
%%%% Update theta.
if (j == n)
theta(j)=0;
else
theta(j)=max(abs(C(ee,j)));
end;
%%%%% Update D
D(j)=max([eps,abs(C(j,j)),theta(j)^2/beta2]');
%%%%%% Update E.
E(j)=D(j)-C(j,j);
%%%%%%% Update C again...
%for i=j+1:n,
% C(i,i)=C(i,i)-C(i,j)^2/D(j,j);
%end;
ind=[j*(n+1)+1 : n+1 : n*n]';
C(ind)=C(ind)-(1/D(j))*C(ee,j).^2;
end;
%
% Put 1's on the diagonal of L
%
%for j=1:n,
% L(j,j)=1;
%end;
ind=[1 : n+1 : n*n]';
L(ind)=1;
%
% if needed, finded a descent direction.
%
if (nargout == 4)
[m,col]=min(diag(C));
rhs=zeros(n,1);
rhs(col)=1;
pneg=L'
end;
return
%%%%%%%%%%%%%%%%%%%%%% Test %%%%%%%%%%%%%
n=3; G=rand(n);G=G+G’;eigG=eig(G), [L,D,E,pneg]=mchol(G) [L1,D1,E1,pneg1]=mcholmz1(G); LL=L-L1, DD=D-D1, EE=E-E1, pp=pneg-pneg1
n=3; G=rand(n);G=G+G’;eigG=eig(G), [L,D,E,pneg]=mchol(G) [L2,D2,E2,pneg2]=mcholmz2(G); LL=L-L2, DD=D-diag(D2), EE=E-diag(E2), pp=pneg-pneg2