*****************************************
*****   comments on exercise #4     *****
*****************************************

In question 2 we are looking at both the DUMMY VARIABLE model that is
given in part (b) and the "GROUPED LINEAR" model that is given in
part (a).  Our first objective is to verify that we can interpret each
of these models.  Then, by doing some work with part (c), we can see
that the linear model is actually nested within the dummy variable
model.  If that's true, then we can check the adequacy of the linear
model by comparing the log likelihood for the linear model to the log
likelihood for the dummy variable model.  This becomes a useful
practical result (i.e., a basis for checking the linear assumption of
the model).

First, note that the model we fit in (a) uses the variable ALC directly
and has just two regression parameters:

     logit[ pi(X) ] = beta0 + beta1*ALC

which implies the fitted log odds:

     ALC=1     beta0 + beta1*(1)
     ALC=2     beta0 + beta1*(2)
     ALC=3     beta0 + beta1*(3)
     ALC=4     beta0 + beta1*(4)

From this we should be able to state an interpretation for beta1 --
this is a log odds ratio comparing...

Second, in order to interpret the dummy variable model we can make a
similar table (or see the lecture notes on page 202 for a related but
more involved model).

The model in part (c) is an unusual model that we would rarely
entertain in practice, but by considering it we should be able to "see"
that the linear model is indeed nested within the dummy variable model.
It's pretty clear that the linear model is a special case of model (c)
-- just determine which coefficients have to be set equal to zero.  If
we can also show that model (c) "is equivalent to" the dummy variable
model, then we have shown that model (a) is nested within model (c),
which is equivalent to model (b); therefore, model (a) is nested within
model (b).

In order to see that models (c) and (b) are equivalent, consider
evaluating the fitted log odds for each model:

               fitted log odds for (b)     fitted log odds for (c)
     ALC=1
     ALC=2
     ALC=3
     ALC=4

Also, look at the values of the log likelihoods, a global summary of
fit to the observed data.  What do you notice in these comparisons?

STATA details:

  * In order to fit model (c) you'll need to create the dummy
    variables for ALC(3) and ALC(4).  You can use either:

         generate alc3 = (alc==3)

    or

         generate alc3 = alc
         recode alc3 1=0 2=0 3=1 4=0

    and likewise for the ALC(4) dummy.
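
  * A rough sketch of how the three fits and the log likelihood
    comparison might be set up.  Two things here are assumptions, not
    taken from the exercise: the outcome name "case" is hypothetical
    (substitute your own 0/1 outcome variable), and model (c) is
    written as ALC plus the two dummies, based only on the description
    above, so check it against the exercise.  The i.alc notation needs
    factor-variable support (Stata 11 or later).

```stata
* ASSUMPTION: "case" is a hypothetical name for the 0/1 outcome variable.
* Dummy variables for ALC=3 and ALC=4 (set to missing where alc is missing).
generate alc3 = (alc==3) if !missing(alc)
generate alc4 = (alc==4) if !missing(alc)

* Model (a): grouped linear model, ALC entered directly.
logit case alc
estimates store linear

* Model (b): dummy variable model, one indicator per non-reference level.
logit case i.alc
estimates store dummy

* Model (c): ASSUMED form, ALC plus the dummies for ALC=3 and ALC=4.
logit case alc alc3 alc4
estimates store modelc

* Likelihood ratio test of the linear model against the dummy variable model.
lrtest linear dummy
```

    Each logit run prints its log likelihood above the coefficient
    table, so the values for (b) and (c) can be read off and compared
    directly.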