next up previous
Next: Individual functions Up: Tools for modelling categorical Previous: Introduction

Object oriented methods

model-formula-proto is a prototype for a model formula class. To create a new model formula use the (as-formula) function

(as-formula '((factor a) (factor b) (term x) (interaction (list a b)) (interaction (list a x) :is-factor (list t nil))))

The argument must be quoted. I'm sure there is some way of getting round this --- the equivalent of deparse(substitute()) in S but I'm not a good enough LISP hacker to know what it is. Please tell me if you know -- it probably involves using a macro rather than a function.

Note that the :is-factor keyword in (interaction) is optional but that the default is to assume everything is a factor. If one of the variables is continuous and you forget :is-factor the results will be obviously wrong (even if you don't run out of stack space). This is the best I can do with a function that accepts lists of numbers as input. The Right Thing to do is probably what S does: force the user to predeclare factor variables as objects and treat everything else as a metric variable. This is not too difficult and I may well do a revised version that makes everything completely object oriented. See the Individual functions section for more documentation --- as-formula is just an object version of (formula)

The formula object responds to messages

 
:formula

:design-matrix

:name-list

:block-names

:block-indices

The first sets or returns the formula (the argument to (as-formula)) and recomputes if necessary. The other four return the appropriate things: a design matrix, a list of column labels for it, a list of names for each block of columns (ie each factor, interaction or metric variable), and a list of lists of column numbers for each block.

There is a function (regression-formula) which is exactly the same as (regression-model) except that it takes a model-formula-proto object instead of the x argument. It then stores the formula in a new slot in the regression-model object and optionally invokes a new method :display-with-formula which gives Wald tests for the various factors, interactions and other terms in the model. The method :display-with-formula will work on glim-protos as well, but a new set of constructor functions similar to (regression-formula) would be needed or else the :isnew method would need to be changed. Changing the :isnew method is the most elegant solution; I haven't done it because I am reluctant to override built-in code at this stage.

My Generalised Estimating Equations prototype deals with model formulae fairly transparently: it will accept a formula object, a design matrix, a sequence or a list of sequences to specify the predictors.

Example:

> (def a (repeat (iseq 1 2) 30))
A
>
(def b (repeat (iseq 1 3) 20))
B
>(def x (normal-rand 60))
X
> (def y (+ x a (normal-rand 60)))
Y

> (def formula1 (as-formula '((term x) (factor a) (factor b)
 (interaction (list a b)))))
FORMULA1
> (send formula1 :name-list) (send formula1 :name-list)
("((A 2) (B 2))" "((A 2) (B 3))" "(B 2)" "(B 3)" "(A 2)" "X")
> (send formula1 :block-names)
("(INTERACTION A B)" "B" "A" "X")
> (send formula1 :block-indices)
((1 0) (3 2) (4) (5))

> (regression-model (send formula1 :design-matrix) y :response-name "Why?" 
:predictor-names (send formula1 :name-list))

Least Squares Estimates:

Constant                  0.867491      (0.308863)
((A 2) (B 2))            -0.481680      (0.621703)
((A 2) (B 3))            -0.666604      (0.624665)
(B 2)                     0.782557      (0.441825)
(B 3)                     0.538866      (0.444595)
(A 2)                      1.16702      (0.437606)
X                          1.30912      (0.155051)

R Squared:                0.634662    
Sigma hat:                0.976272    
Number of cases:                60
Degrees of freedom:             53

#<Object: 1674520, prototype = REGRESSION-MODEL-PROTO>

> (def modela (regression-formula formula1 y :response-name "Why?"))
Block               Wald Chisq     p-value
X                      71.287      0.0000
A                      7.1119      0.0077
B                      3.3431      0.1880
(INTERACTION A B)      1.2326      0.5399
MODELA
>  (send modela :display)

Least Squares Estimates:

Constant                  0.867491      (0.308863)
((A 2) (B 2))            -0.481680      (0.621703)
((A 2) (B 3))            -0.666604      (0.624665)
(B 2)                     0.782557      (0.441825)
(B 3)                     0.538866      (0.444595)
(A 2)                      1.16702      (0.437606)
X                          1.30912      (0.155051)

R Squared:                0.634662    
Sigma hat:                0.976272    
Number of cases:                60
Degrees of freedom:             53

NIL

> (send modela :display-with-formula)
Block               Wald Chisq     p-value
X                      71.287      0.0000
A                      7.1119      0.0077
B                      3.3431      0.1880
(INTERACTION A B)      1.2326      0.5399
NIL
> (send modela :display-with-formula nil)
Block               Wald Chisq     p-value
X                      71.287      0.0000
      Variable            Estimate       Std.Err.       p-value
     X                      1.3091      (0.155051)     0.0000
A                      7.1119      0.0077
      Variable            Estimate       Std.Err.       p-value
     (A 2)                  1.1670      (0.437606)     0.0077
B                      3.3431      0.1880
      Variable            Estimate       Std.Err.       p-value
     (B 3)                 0.53887      (0.444595)     0.2255
     (B 2)                 0.78256      (0.441825)     0.0765
(INTERACTION A B)      1.2326      0.5399
      Variable            Estimate       Std.Err.       p-value
     ((A 2) (B 3))        -0.66660      (0.624665)     0.2859
     ((A 2) (B 2))        -0.48168      (0.621703)     0.4385
NIL



next up previous
Next: Individual functions Up: Tools for modelling categorical Previous: Introduction



Thomas Lumley
Sat Apr 20 15:07:20 PDT 1996