model-formula-proto is a prototype for a model formula class. To create a new model formula use the (as-formula) function
(as-formula '((factor a) (factor b) (term x) (interaction (list a b)) (interaction (list a x) :is-factor (list t nil))))
The argument must be quoted. I'm sure there is some way of getting round this --- the equivalent of deparse(substitute()) in S but I'm not a good enough LISP hacker to know what it is. Please tell me if you know -- it probably involves using a macro rather than a function.
Note that the :is-factor keyword in (interaction) is optional but that the default is to assume everything is a factor. If one of the variables is continuous and you forget :is-factor the results will be obviously wrong (even if you don't run out of stack space). This is the best I can do with a function that accepts lists of numbers as input. The Right Thing to do is probably what S does: force the user to predeclare factor variables as objects and treat everything else as a metric variable. This is not too difficult and I may well do a revised version that makes everything completely object oriented. See the Individual functions section for more documentation --- as-formula is just an object version of (formula)
The formula object responds to messages
:formula:design-matrix
:name-list
:block-names
:block-indices
The first sets or returns the formula (the argument to (as-formula)) and recomputes if necessary. The other four return the appropriate things: a design matrix, a list of column labels for it, a list of names for each block of columns (ie each factor, interaction or metric variable), and a list of lists of column numbers for each block.
There is a function (regression-formula) which is exactly the same as (regression-model) except that it takes a model-formula-proto object instead of the x argument. It then stores the formula in a new slot in the regression-model object and optionally invokes a new method :display-with-formula which gives Wald tests for the various factors, interactions and other terms in the model. The method :display-with-formula will work on glim-protos as well, but a new set of constructor functions similar to (regression-formula) would be needed or else the :isnew method would need to be changed. Changing the :isnew method is the most elegant solution; I haven't done it because I am reluctant to override built-in code at this stage.
My Generalised Estimating Equations prototype deals with model formulae fairly transparently: it will accept a formula object, a design matrix, a sequence or a list of sequences to specify the predictors.
Example:
> (def a (repeat (iseq 1 2) 30)) A > (def b (repeat (iseq 1 3) 20)) B >(def x (normal-rand 60)) X > (def y (+ x a (normal-rand 60))) Y
> (def formula1 (as-formula '((term x) (factor a) (factor b) (interaction (list a b))))) FORMULA1 > (send formula1 :name-list) (send formula1 :name-list) ("((A 2) (B 2))" "((A 2) (B 3))" "(B 2)" "(B 3)" "(A 2)" "X") > (send formula1 :block-names) ("(INTERACTION A B)" "B" "A" "X") > (send formula1 :block-indices) ((1 0) (3 2) (4) (5))
> (regression-model (send formula1 :design-matrix) y :response-name "Why?" :predictor-names (send formula1 :name-list)) Least Squares Estimates: Constant 0.867491 (0.308863) ((A 2) (B 2)) -0.481680 (0.621703) ((A 2) (B 3)) -0.666604 (0.624665) (B 2) 0.782557 (0.441825) (B 3) 0.538866 (0.444595) (A 2) 1.16702 (0.437606) X 1.30912 (0.155051) R Squared: 0.634662 Sigma hat: 0.976272 Number of cases: 60 Degrees of freedom: 53 #<Object: 1674520, prototype = REGRESSION-MODEL-PROTO>
> (def modela (regression-formula formula1 y :response-name "Why?")) Block Wald Chisq p-value X 71.287 0.0000 A 7.1119 0.0077 B 3.3431 0.1880 (INTERACTION A B) 1.2326 0.5399 MODELA > (send modela :display) Least Squares Estimates: Constant 0.867491 (0.308863) ((A 2) (B 2)) -0.481680 (0.621703) ((A 2) (B 3)) -0.666604 (0.624665) (B 2) 0.782557 (0.441825) (B 3) 0.538866 (0.444595) (A 2) 1.16702 (0.437606) X 1.30912 (0.155051) R Squared: 0.634662 Sigma hat: 0.976272 Number of cases: 60 Degrees of freedom: 53 NIL
> (send modela :display-with-formula) Block Wald Chisq p-value X 71.287 0.0000 A 7.1119 0.0077 B 3.3431 0.1880 (INTERACTION A B) 1.2326 0.5399 NIL > (send modela :display-with-formula nil) Block Wald Chisq p-value X 71.287 0.0000 Variable Estimate Std.Err. p-value X 1.3091 (0.155051) 0.0000 A 7.1119 0.0077 Variable Estimate Std.Err. p-value (A 2) 1.1670 (0.437606) 0.0077 B 3.3431 0.1880 Variable Estimate Std.Err. p-value (B 3) 0.53887 (0.444595) 0.2255 (B 2) 0.78256 (0.441825) 0.0765 (INTERACTION A B) 1.2326 0.5399 Variable Estimate Std.Err. p-value ((A 2) (B 3)) -0.66660 (0.624665) 0.2859 ((A 2) (B 2)) -0.48168 (0.621703) 0.4385 NIL