*****************************************
***** comments on exercise #4 *****
*****************************************
In question 2 we are looking at both the DUMMY VARIABLE model that
is given in part (b) and the "GROUPED LINEAR" model that is given in
part (a). Our first objective is to verify that we can interpret
each of these models, and then by doing some work with part (c) we
can see that the linear model is actually nested within the dummy
variable model. If that's true, then we can check the adequacy of
the linear model by comparing the log likelihood for the linear model
to the log likelihood for the dummy variable model. This gives us
a useful practical result (i.e., a basis for checking the linear
assumption of the model).
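One common way to carry out this comparison is a likelihood ratio test.
The lines below are a minimal sketch, not the required solution. They
assume one record per subject, a 0/1 outcome variable with the
hypothetical name "case" (the exercise's actual outcome name is not
given here), ALC stored as alc with values 1-4, and Stata 11 or later
for the i. factor-variable prefix:

```stata
* "case" is a hypothetical outcome name; substitute the real one.
* Linear model, part (a): two regression parameters
logit case alc
estimates store linear

* Dummy variable model, part (b): i.alc uses ALC=1 as the reference level
logit case i.alc
estimates store dummy

* Likelihood ratio test of the linear model against the dummy variable model
lrtest linear dummy
```

Under the linear model the statistic 2*(logL_dummy - logL_linear) is
approximately chi-square with 2 degrees of freedom (four parameters
versus two), so a small p-value points away from linearity in the log odds.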
First, note that the model we fit in (a) uses the variable ALC
directly and has just two regression parameters:
     logit[ pi(X) ] = beta0 + beta1*ALC
which implies the fitted log odds:
     ALC=1     beta0 + beta1*(1)
     ALC=2     beta0 + beta1*(2)
     ALC=3     beta0 + beta1*(3)
     ALC=4     beta0 + beta1*(4)
From this we should be able to state an interpretation for beta1 -- this is
a log odds ratio comparing...
Second, in order to interpret the dummy variable model, we can make a
similar table (or see the lecture notes on page 202 for a related but
more involved model).
The model in part (c) is an unusual model that we rarely would entertain
in practice, but by considering it we should be able to "see" that indeed
the linear model is nested within the dummy variable model. It's pretty
clear that the linear model is a special case of model (c) -- just determine
which coefficients have to be set equal to zero. If we can also show that
model (c) "is equivalent to" the dummy variable model, then we have shown that
model (a) is nested within model (c), which is equivalent to model (b);
therefore, model (a) is nested within model (b).
In order to see that models (c) and (b) are equivalent, consider evaluating
the fitted log odds for each model:
               fitted log odds for (b)     fitted log odds for (c)
     ALC=1
     ALC=2
     ALC=3
     ALC=4
Also, look at the values of the log likelihoods, which give a global summary
of fit to the observed data. What do you notice in these comparisons?
STATA details:
* In order to fit model (c) you'll need to create the dummy variables
for ALC(3) and ALC(4). For ALC(3) you can use either of the following
(ALC(4) is analogous):
     generate alc3 = (alc==3)
or
     generate alc3 = alc
     recode alc3 1=0 2=0 3=1 4=0
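Putting these pieces together, the comparison of models (b) and (c) might
be run as in the sketch below. It rests on several assumptions: the outcome
name "case" is hypothetical, model (c) is taken to be the linear term in
ALC plus the ALC(3) and ALC(4) dummies (check this against the exercise
sheet), the data have one record per subject (data stored as counts would
need a frequency weight), and i.alc requires Stata 11 or later.

```stata
* "case" is a hypothetical outcome name; substitute the real one.
* Skip the two generate lines if alc3 and alc4 already exist.
* The "if" keeps records with missing alc missing instead of coding them 0.
generate alc3 = (alc==3) if !missing(alc)
generate alc4 = (alc==4) if !missing(alc)

* Model (c), as assumed here: linear term plus the two dummies
logit case alc alc3 alc4
predict xb_c, xb
scalar ll_c = e(ll)

* Model (b): dummy variable model with ALC=1 as the reference level
logit case i.alc
predict xb_b, xb
scalar ll_b = e(ll)

* Fitted log odds at each ALC level, side by side
tabstat xb_b xb_c, by(alc) statistics(mean)

* Log likelihoods for the two fits
display "log likelihood (b) = " ll_b "   (c) = " ll_c
```

The tabstat output lines up with the two-column table above, one row per
ALC level, and the display line prints the two log likelihoods for comparison.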