Regression models

Generalized linear models, including the linear model, are estimated by svyglm. It takes almost the same arguments as glm; the difference is that the data argument to glm is replaced by a design argument to svyglm. Similarly, svycoxph fits Cox models to survey data.

In this example we use the dclus2 two-stage cluster sample from the California Academic Performance Index, which was created in an earlier example. The syntax and options for svyglm are the same for designs with and without replicate weights.
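
For readers who do not have the earlier example at hand, the following sketch rebuilds dclus2 from the svydesign call echoed in the output in this section, assuming the api data sets that ship with the survey package. The replicate-weight design rclus2 and the names fit_lin and fit_rep are illustrative and not part of the original example.

```r
library(survey)

# Load the California API data sets bundled with the survey package
data(api)

# Two-stage cluster sample: districts (dnum), then schools (snum).
# This is the svydesign() call echoed in the output in this section.
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

# Assumed illustration: the same design converted to replicate weights
rclus2 <- as.svrepdesign(dclus2)

# svyglm() is called the same way for either kind of design
fit_lin <- svyglm(api00 ~ ell + meals + mobility, design = dclus2)
fit_rep <- svyglm(api00 ~ ell + meals + mobility, design = rclus2)
```

Only the design object changes between the two calls; the formula and the remaining options are written identically.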

The outcome variable is the 2000 API score (api00), predicted by the percentages of students learning English (ell), receiving subsidized meals (meals) and having moved to the school within the past year (mobility). This is a linear regression model, so no family argument to svyglm is needed.

 > summary(svyglm(api00 ~ ell + meals + mobility, design = dclus2))

 Call:
 svyglm.survey.design(formula = api00 ~ ell + meals + mobility,
     design = dclus2)

 Survey design:
 svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept) 811.4907    30.8795  26.279   <2e-16 ***
 ell          -2.0592     1.4075  -1.463    0.146
 meals        -1.7772     1.1053  -1.608    0.110
 mobility      0.3253     0.5305   0.613    0.541
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 (Dispersion parameter for gaussian family taken to be 8296.727)

 Number of Fisher Scoring iterations: 2
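
For an outcome that is not Gaussian, the family argument is supplied just as in glm. The sketch below is an illustration rather than part of the original example: it assumes the binary variable sch.wide (whether the school met its school-wide growth target) from the same api data, and the name fit_logit is hypothetical. It uses quasibinomial() instead of binomial() because sampling weights otherwise trigger a warning about non-integer numbers of successes.

```r
library(survey)
data(api)

# Same two-stage design as in the rest of this section
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

# Logistic regression for a binary outcome: family is now required.
# quasibinomial() gives the same point estimates as binomial() but
# avoids the non-integer-counts warning caused by the weights.
fit_logit <- svyglm(sch.wide ~ ell + meals + mobility,
                    design = dclus2,
                    family = quasibinomial())

summary(fit_logit)
```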

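A useful property of regression models is that they provide another way to get domain estimates. Suppose we want the mean of api00 for each school type. Dropping the intercept (the -1 in the formula) makes each coefficient the estimated mean for one school type, and the estimates and standard errors agree with those from svyby with svymean: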

 > summary(svyglm(api00 ~ stype - 1, design = dclus2))

 Call:
 svyglm.survey.design(formula = api00 ~ stype - 1, design = dclus2)

 Survey design:
 svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

 Coefficients:
        Estimate Std. Error t value Pr(>|t|)
 stypeE   692.81      30.28   22.88   <2e-16 ***
 stypeH   598.34      16.96   35.27   <2e-16 ***
 stypeM   642.35      45.34   14.17   <2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 
 (Dispersion parameter for gaussian family taken to be 17389.33)

 Number of Fisher Scoring iterations: 2

 > svyby(~api00,~stype,dclus2,svymean, keep.var=TRUE)
   stype statistic.api00       SE
 E     E        692.8104 30.28244
 H     H        598.3407 16.96500
 M     M        642.3520 45.34363

This equivalence helps in thinking about domain estimators and how they handle more complex designs.
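
The agreement between the two outputs above can also be checked programmatically. This is a minimal sketch, assuming dclus2 is built from the svydesign call shown in the output; the names fit_dom, by_type and comparison are illustrative.

```r
library(survey)
data(api)
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

# Regression with one indicator per school type and no intercept
fit_dom <- svyglm(api00 ~ stype - 1, design = dclus2)

# Direct domain means for the same school types
by_type <- svyby(~api00, ~stype, dclus2, svymean)

# Put the two sets of estimates and standard errors side by side
comparison <- data.frame(
  stype    = as.character(by_type$stype),
  glm_mean = as.numeric(coef(fit_dom)),
  by_mean  = as.numeric(coef(by_type)),
  glm_se   = as.numeric(sqrt(diag(vcov(fit_dom)))),
  by_se    = as.numeric(SE(by_type))
)
print(comparison)
```

Each row corresponds to one school type, with the regression-based and svyby-based columns expected to agree up to numerical rounding.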

Thomas Lumley

Last modified: Mon Jun 13 15:43:36 PDT 2005