next up previous
Next: Diagnostics Up: How to use Previous: Fitting a model

Model formulae

It is possible to use a model formula object to include factor (class) and interaction terms more easily in a model. The support is less simple than either SAS or S-PLUS and less complete, allowing interactions but not nesting. The function as-formula creates formula objects which contain a design matrix :design-matrix, a list of column labels :name-list, a list of block (eg factor or interaction) names :block-names and a list of lists of column numbers specifying each block :block-indices. The form of a call to as-formula is

(as-formula '((factor a) (factor b) (term x) (interaction (list a b))
 (interaction (list a x) :is-factor (list t nil))))
where A and B are factor variables and X is a metric (continuous) variable. The keyword argument :is-factor to interaction indicates whether of the terms in the interaction should be treated as factors. It is optional; the default is to treat them all as factors. The reason for this default is partly that it is more common and partly that making a mistake is more likely to be obvious. Note that the argument to (as-formula) must be quoted, as in the example.

The gee-model command and gee-proto object have some built-in recognition of formula objects, such as the ability to use a formula object as the :x argument to gee-model. The file modelformula.lsp which defines the objects also contains methods for regression-model-proto and some more primitive functions which may be useful for creating and manipulating design matrices. In particular the functions interaction, factor and term can be used on their own to create contrast matrices and variable names. More documentation of this is in modelformula.txt. The test file contains some further examples.

The main difference caused by using a model formula argument to gee-model is that the default output changes to report the Wald chisquare and p-value for each block. This is produced by the message :display-with-formula :block-only nil. To get only the test statistics, use the message :display-with-formula :block-only t. To get the output as if a model formula had not been given use :display. The default value of the block-only keyword is controlled by a global variable which is initially nil.

In the example given above can be simplified by the use of model formulae:

> (def karim-model (as-formula '((term age) (factor gender)
 (factor smoker) (factor times))))
KARIM-MODEL
> (def example2 (gee-model :y num-visits :x karim-model :g id 
:error poisson-error :correlation (stat-m-dependence 3) :times times))
 No link specified - canonical link used
Iteration 1: quasideviance = 517.114
Iteration 2: quasideviance = 478.439
Iteration 3: quasideviance = 475.966
Iteration 4: quasideviance = 476.062
Iteration 5: quasideviance = 476.087
Iteration 6: quasideviance = 476.088
Iteration 7: quasideviance = 476.088

GEE Estimates:

Block               Wald Chisq     p-value
Intercept             0.43793      0.5081
      Variable            Estimate       Std.Err.       p-value
     Intercept             0.24092      (0.364061)     0.5081
TIMES                  26.659      0.0000
      Variable            Estimate       Std.Err.       p-value
     (TIMES 4)             -1.1285      (0.221787)     0.0000
     (TIMES 3)            -0.30748      (0.202848)     0.1296
     (TIMES 2)            -0.43532      (0.195649)     0.0261
SMOKER                0.56194      0.4535
      Variable            Estimate       Std.Err.       p-value
     (SMOKER Y)            0.17699      (0.236101)     0.4535
GENDER                 1.0184      0.3129
      Variable            Estimate       Std.Err.       p-value
     (GENDER M)           -0.20550      (0.203637)     0.3129
AGE                   0.23661      0.6267
      Variable            Estimate       Std.Err.       p-value
     AGE                    3.15713E-3  (6.490476E-3)  0.6267

Scale Estimate:            1.83565    
Independence model deviance:   476.088    
Number of cases:               292
Link:                    #<Glim Link Object: LOG-LINK>
Variance function:       Poisson error
Stationary  3 - dependence Working Model: 
 correlation matrix =
#2a(
    (  1.0      0.21      0.19      0.34    )
    ( 0.21       1.0      0.21      0.19    )
    ( 0.19      0.21       1.0      0.21    )
    ( 0.34      0.19      0.21       1.0    )
   )
EXAMPLE2
where GENDER and SMOKER are character versions of the binary numerical variables SEX and SMOKE used in the earlier model. The predictor names are automatically generated from the variable names and values. This is particularly useful when the variable names and values are easily interpretable as in this example. As can be seen here, the reference category for the indicator variables is the first one in alphabetic order (more generally in ASCII order).



next up previous
Next: Diagnostics Up: How to use Previous: Fitting a model



Thomas Lumley
Sun Dec 8 16:10:41 PST 1996