next up previous
Next: Primitive functions used Up: Tools for modelling categorical Previous: Object oriented methods

Individual functions

The (formula) function takes a quoted list describing a model as its argument and returns (design-matrix names block-names block-indices) where design-matrix is a design matrix, names is a list of names for its columns, block-names is a list of names for multicolumn model terms such as categorical predictors and block-indices is a list of lists of integers describing which columns of the design matrix comprise which blocks.

The canonical form for input is like this

(formula '((factor a) (factor b) (term x) (interaction (list a b)) (interaction (list a x) :is-factor (list t nil))))

The function extracts names from the quoted list so if you try something clever to save typing like storing (interaction (list a x) :is-factor (list t nil)) in a variable you may get silly results. Don't do that, then.

You can change the default variable names (though not the default block names) using the relevant keyword arguments in factor and interaction. See the documentation for the functions (below) and the example to find out how. You need to specify the new name everywhere in order to make the labels come out consistent. It's probably easier to use the right variable name to begin with.

Note that the :is-factor keyword in (interaction) is optional but that the default is to assume everything is a factor. If one of the variables is continuous and you forget :is-factor the results will be obviously wrong (even if you don't run out of stack space). This is the best I can do with a function that accepts lists of numbers as input.

(formula-display) takes a vector of coefficient estimates, a covariance matrix and the output of (formula) and displays the Wald chisquare test for each block (factor, term or interaction), with or without the separate estimates and p-values for each variable.

(formula-display beta cov-mat model-formula &key (block-only t) (intercept t)) beta is the coefficient estimates, cov-mat is the covariance matrix, model-formula is the output of the original formula command. Set block-only to nil to get tests for each separate variable. If the model does not have an intercept set intercept to nil. It is obviously better to have this installed as a method for your regression objects.

Example:
> (def a (repeat (iseq 1 2) 30))
A
>
(def b (repeat (iseq 1 3) 20))
B
>(def x (normal-rand 60))
X
> (def y (+ x a (normal-rand 60)))
Y
> (def result (formula '((term x) (factor a) (factor b)
 (interaction (list a b)))))
RESULT

>  (def model1 (regression-model (first result) y :predictor-names 
(second result)))

Least Squares Estimates:

Constant                  0.867491      (0.308863)
((A 2) (B 2))            -0.481680      (0.621703)
((A 2) (B 3))            -0.666604      (0.624665)
(B 2)                     0.782557      (0.441825)
(B 3)                     0.538866      (0.444595)
(A 2)                      1.16702      (0.437606)
X                          1.30912      (0.155051)

R Squared:                0.634662    
Sigma hat:                0.976272    
Number of cases:                60
Degrees of freedom:             53


MODEL1

>  (def model1.cov (* (^ (send model1 :sigma-hat) 2) (send model1 :xtxinv)))
MODEL1.COV
> (formula-display (send model1 :coef-estimates) model1.cov result
 :block-only nil)
Block               Wald Chisq     p-value
X                      71.287      0.0000
      Variable            Estimate       Std.Err.       p-value
     X                      1.3091      (0.155051)     0.0000
A                      7.1119      0.0077
      Variable            Estimate       Std.Err.       p-value
     (A 2)                  1.1670      (0.437606)     0.0077
B                      3.3431      0.1880
      Variable            Estimate       Std.Err.       p-value
     (B 3)                 0.53887      (0.444595)     0.2255
     (B 2)                 0.78256      (0.441825)     0.0765
(INTERACTION A B)      1.2326      0.5399
      Variable            Estimate       Std.Err.       p-value
     ((A 2) (B 3))        -0.66660      (0.624665)     0.2859
     ((A 2) (B 2))        -0.48168      (0.621703)     0.4385

> (formula-display (send model1 :coef-estimates) model1.cov result)

Block               Wald Chisq     p-value
X                      71.287      0.0000
A                      7.1119      0.0077
B                      3.3431      0.1880
(INTERACTION A B)      1.2326      0.5399
NIL

Now we recode the predictors and relabel.

>  (def a (repeat (list "Male" "Female") 30))
A
>  (def b (repeat (list "Never" "Past" "Current") 20))
B
> (def result1 (formula '((term x :namebase "Age") (factor a :namebase "Sex") 
(factor b :namebase "Smoking") (interaction (list a b) 
:namebases (list "Sex" "Smoking")))))
RESULT1
> (formula-display (send model1 :coef-estimates) model1.cov result1)

Block               Wald Chisq     p-value
Age                    71.287      0.0000
Sex                    7.1119      0.0077
Smoking                3.3431      0.1880
(INTERACTION Sex Smoking)    1.2326     0.5399
NIL

> (formula-display (send model1 :coef-estimates) model1.cov result1 
:block-only nil)
Block               Wald Chisq     p-value
Age                    71.287      0.0000
      Variable            Estimate       Std.Err.       p-value
     Age                    1.3091      (0.155051)     0.0000
Sex                    7.1119      0.0077
      Variable            Estimate       Std.Err.       p-value
     (Sex Male)             1.1670      (0.437606)     0.0077
Smoking                3.3431      0.1880
      Variable            Estimate       Std.Err.       p-value
     (Smoking Past)        0.53887      (0.444595)     0.2255
     (Smoking Never)       0.78256      (0.441825)     0.0765
(INTERACTION Sex Smoking)    1.2326     0.5399
      Variable            Estimate       Std.Err.       p-value
     ((Sex Male) (Smoking Past))  -0.66660     (0.624665) 0.2859
     ((Sex Male) (Smoking Never))  -0.48168     (0.621703) 0.4385
NIL



next up previous
Next: Primitive functions used Up: Tools for modelling categorical Previous: Object oriented methods



Thomas Lumley
Sat Apr 20 15:07:20 PDT 1996