Difference between revisions of "Main Page/Research/Papers/fast food and arterials/AJPH/comments"

From phurvitz
 

Latest revision as of 01:01, 27 January 2009

Reviewer | Comment | PMH response
1 "you imply that once road density is considered SES no longer matters, but you do not consider that SES is also highly related to road density, both poverty and Fast Food are attracted to high density road/highway locations." Hmm, not that SES no longer matters, but that the relationship between fast food restaurant density and road (arterial/freeway) density is stronger than the relationship between fast food density and poverty (measured either by median household income or % below poverty).
1 "Your findings also do not conflict with previous work that suggests that access to fast food is a factor in obesity prevalence; in fact your study supports this hypothesis." We do not claim that our findings conflict with previous work. However, if FF access were a factor in obesity, we would expect most people in King County to be obese, since >90% of residences have fast food within 2 miles.
1 "Page 6 describe how distance from residence to fast food is determined – crow fly or network distance" Will clarify that this is Euclidean (straight-line) distance, not network distance.
1 "Level of poverty (7-8%) with such a low portion of the sample in this category would it be better to use median household income rather than % below poverty in your models? (see page 9)" Will use median household income instead. Its distribution is nearly normal, whereas % below poverty is skewed.
1 "Page 8 Table 3 and the text do not match, the last line on page 8 says ‘percent non white was again not significantly associated, but in the table % non white for all tracts was listed at p=.046. Also, should pop density be included in the table, it is described in the text as if it were also in the table?" As we are dropping the Pearson correlations because of zero inflation in the dependent variable, this is no longer an issue.
1 "Discussion I do not follow your argument in the last paragraph (page 11) ‘Physical access to fast food measured by proximity or density may not be the most important determinant of fast food use’ I agree it is not the most important factor, but your study indirectly supports the hypothesis that access IS an important factor. Your study supports the hypothesis that amount of traffic passing by a location is an important factor in determining location of fast food outlet. It also supports that idea that more affluent people live in less dense locations further away from fast food (and nearer white table cloth restaurants?)" The point is that nearly everyone in King County has fairly ready access to fast food. There is no differential in access that explains the difference in obesity across SES. Simple physical access does not necessarily imply use.

Also, we did not test any residential density measures, and the less dense areas of King County are not necessarily higher-income.

1 "Table 4 Should you explain the implications of your findings that %non white is significant in both models and in the opposite direction than what was expected and found in the Pearson correlations (table 3)" We eliminated the Pearson correlations because their assumptions were violated.
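2 "By the way, what kind of correlation is there between freeway/arterial density and SES?" There is a moderate relationship, though not a very strong one (|r| of roughly 0.5 across 373 tracts): arterial density is negatively correlated with median household income (r = -0.44) and positively correlated with % below poverty (r = 0.53).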
<pre>
> cor.test(artdnsmi2, medhhinc)

        Pearson's product-moment correlation

data:  artdnsmi2 and medhhinc
t = -9.523, df = 371, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5213 -0.3578
sample estimates:
    cor
-0.4432

> cor.test(artdnsmi2, pct.below.pov)

        Pearson's product-moment correlation

data:  artdnsmi2 and pct.below.pov
t = 11.90, df = 371, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4481 0.5955
sample estimates:
   cor
0.5257
</pre>
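The t statistics and 95% confidence intervals printed for these correlations follow directly from r and the number of tracts. Here is a minimal base-R sketch, assuming n = 373 tracts (df = 371 = n - 2) and the Fisher z interval that `cor.test` reports for Pearson's r. `cor_summary` is a hypothetical helper, not one of the project's scripts:

```r
# Recompute the cor.test summary from r and n (base R only).
# t = r * sqrt(n - 2) / sqrt(1 - r^2); the CI is built on Fisher's z = atanh(r)
# with standard error 1 / sqrt(n - 3), then transformed back with tanh().
cor_summary <- function(r, n, conf = 0.95) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  z      <- atanh(r)
  half   <- qnorm(1 - (1 - conf) / 2) / sqrt(n - 3)
  c(t = t_stat, lower = tanh(z - half), upper = tanh(z + half))
}

n_tracts <- 373  # df = 371 in the cor.test output, so n = 373
income  <- cor_summary(-0.4432, n_tracts)  # arterial density vs median hh income
poverty <- cor_summary(0.5257, n_tracts)   # arterial density vs % below poverty
print(round(income, 4))
print(round(poverty, 4))
```

The printed r values are rounded to four digits, so the recomputed statistics match the cor.test output only to about three decimal places.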


2 "The authors should think about the difference between analysis that aims to explain why a situation exists and analysis that aims to describe a situation that might help to explain outcomes of interest. By incorporating freeway/arterial density, they seem to be aiming for the former. But this is a less important question. If there are greater concentrations of fast food outlets in low SES areas, we need to understand the implications of this situation and find ways to respond to it, regardless of the cause." Of course it is important to respond to problems, but understanding their causes is important as well. If fast food restaurants are located in certain areas because of jurisdictional policies (or the lack thereof) rather than deliberate targeting of low-SES areas, the policy response could be very different. Interventions can be designed on either the supply side or the demand side.
2 "The one concern I have methodologically is in the use of census tracts for measuring fast food density. The question is whether the density within one’s own census tract is a good measure of one’s access to fast food, when density can vary considerably from tract to tract. If the fast food outlets near me, for example, happen to be on the other side of the arterial street used as a boundary for my census tract, I might have 0 density, while the tract right next door has a high density. An alternative is to use an average density for the tract and its adjacent tracts, weighted or unweighted. This approach presents its own challenges, and doesn’t guarantee a more accurate result, but the authors need to discuss the possibility that the definition of areas (census tracts, in this case) can influence the results. This is an example of the modifiable area unit problem (MAUP), on which there is a rather substantial literature." We agree that this is a problem. At the risk of passing the buck, we designed the study using the same census units that have been used in similar studies. We are working on a method, based on kernel density estimation, for generating density measures that depend less on census tract boundaries (KDE paper).
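The reviewer's alternative, averaging density over a tract and its adjacent tracts, can be sketched in base R. Everything here is hypothetical: four made-up tracts in a row, a hand-built binary contiguity matrix `adj`, and invented counts and areas. The unweighted version averages the tract densities. The weighted version pools restaurants and area across the neighborhood, which is one way to weight by tract area.

```r
# Hypothetical tracts A-B-C-D in a row; each touches only its immediate neighbors.
counts <- c(A = 0, B = 5, C = 1, D = 0)          # fast food restaurants per tract
sq_mi  <- c(A = 1.0, B = 0.5, C = 1.0, D = 2.0)  # tract land area (square miles)

tracts <- names(counts)
adj <- matrix(0, 4, 4, dimnames = list(tracts, tracts))  # binary contiguity
adj["A", "B"] <- adj["B", "A"] <- 1
adj["B", "C"] <- adj["C", "B"] <- 1
adj["C", "D"] <- adj["D", "C"] <- 1

own_density <- counts / sq_mi          # the tract-only measure
nbhd <- adj + diag(length(tracts))     # neighborhood = tract plus adjacent tracts

# Unweighted: mean of the tract densities within each neighborhood.
unweighted <- as.vector(nbhd %*% own_density) / rowSums(nbhd)
# Area-weighted: pooled restaurants divided by pooled area.
pooled <- as.vector(nbhd %*% counts) / as.vector(nbhd %*% sq_mi)

print(data.frame(tract = tracts, own = own_density,
                 unweighted = unweighted, pooled = pooled))
```

Tract A has no restaurants of its own but borders B, so both neighborhood measures give it a nonzero density. That is the boundary effect the reviewer describes. As the reviewer notes, this does not guarantee a more accurate result.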
2 "I was also curious about the measure of distance from each dwelling to the nearest fast food restaurant. This is a much better measure of accessibility than average density for a tract. Yet the authors seem to use this solely for descriptive purposes. Is there some way to use this measure as a dependent variable? What if you use the average distance for all residences within a tract, rather than tract density? This would largely eliminate the problem with the census tract density measure noted above." Interesting. I ran the multiple linear regression stats using the mean distance and the general pattern in results is the same. The problem with the mean distance value is that the GIS calculates the distance only to the closest restaurant. What we'd need to do to refine this would be to get the number of ff restaurants within a certain distance of each parcel. But this distance would be arbitrary. We can achieve the same result with the KDE measure but in a more straightforward manner.
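The nearest-restaurant distance, a count within a fixed radius, and a kernel density estimate can be compared on made-up coordinates. This sketch rests on several assumptions:

- planar coordinates in miles
- Euclidean distance
- a 1-mile radius
- a Gaussian kernel with a 0.5-mile bandwidth

The radius and bandwidth are illustrative only. They are not the settings used in the KDE paper.

```r
# Hypothetical coordinates in miles (not study data).
restaurants <- cbind(x = c(0.3, 3.3, 3.0, 2.4), y = c(0.0, 0.0, 0.5, 0.0))
parcels     <- cbind(x = c(0.0, 3.0, 6.0),      y = c(0.0, 0.0, 6.0))

# Euclidean distance matrix: rows = parcels, columns = restaurants.
euclid <- function(p, r) {
  sqrt(outer(p[, "x"], r[, "x"], "-")^2 + outer(p[, "y"], r[, "y"], "-")^2)
}
d <- euclid(parcels, restaurants)

nearest  <- apply(d, 1, min)  # distance to the closest restaurant only
within_1 <- rowSums(d <= 1)   # restaurants within a 1-mile radius

# Gaussian kernel density: estimated restaurants per square mile at each parcel.
h   <- 0.5
kde <- rowSums(exp(-d^2 / (2 * h^2))) / (2 * pi * h^2)

print(data.frame(nearest = round(nearest, 2), within_1 = within_1, kde = round(kde, 3)))
```

Parcels 1 and 2 are both 0.3 miles from their nearest restaurant, so the nearest-distance measure cannot tell them apart. Parcel 2, however, has three restaurants within a mile and a higher kernel density. The kernel replaces the hard radius with a smooth distance decay, and its bandwidth `h` plays a role similar to the radius.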
3 "Table 2 reports the descriptive statistics for the variables used in the analysis reports two fast food density variables: restaurants per square mile and restaurants per 1000 persons. It is not clear which of these was used as the dependent variable in the reported models. Also, this leads to the question: Do the results differ depending on the rate variable that is used?" We used area-based density. The results do differ when population-based density is used. Percent nonwhite becomes significant in the models. Median household income becomes more significant than arterial density.
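The two rate variables in Table 2 can rank the same tracts differently, which is one reason the results change with the choice. A toy example with three hypothetical tracts (all values invented):

```r
# Hypothetical tracts; values are invented for illustration.
tr <- data.frame(
  tract = c("T1", "T2", "T3"),
  n_ff  = c(0, 4, 4),          # fast food restaurants
  sq_mi = c(1.2, 0.5, 8.0),    # land area in square miles
  pop   = c(4000, 8000, 2000)  # residents
)
tr$per_sq_mi   <- tr$n_ff / tr$sq_mi         # area-based density (used in the models)
tr$per_1000pop <- tr$n_ff / (tr$pop / 1000)  # population-based density
print(tr)
```

Small, populous T2 has a high density by area but a low one per resident. Large, sparsely populated T3 shows the reverse, even though both tracts contain four restaurants.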
3 "OLS regression is not the most appropriate estimation method for the reported models. Of the 373 census tracts in King County, WA, 196 do not have a fast food restaurant. The first set of OLS models reported in Table 4 include all tracts. Therefore, over half of the observations in these first 3 models have “0” for the dependent variable. OLS regression is not the most appropriate technique in this instance. The large number of zeroes in the dependent variable likely leads to violations of OLS assumptions. It is more appropriate to treat the dependent variables as counts of the number of fast food restaurants and estimate a count-data model. More appropriate estimation techniques include: Poisson regression, zero-inflated Poisson regression, and negative binomial regression." We are using a negative binomial regression model in this revision.
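Before choosing among the count models the reviewer lists, a quick check of excess zeros and overdispersion can help. The sketch below is hypothetical and uses invented counts, not the King County data (where 196 of 373 tracts, about 53%, have no fast food restaurant). It compares the observed share of zeros with the share a Poisson distribution of the same mean would produce, and gives the method-of-moments NB2 dispersion from Var = mu + alpha * mu^2. The actual model with covariates would be fit with GLM software such as R's `MASS::glm.nb`.

```python
import math
from statistics import mean, variance

# Hypothetical sketch: quick checks for excess zeros and overdispersion in
# tract-level restaurant counts before choosing a count model. The counts
# below are invented, not the King County data.


def count_diagnostics(counts):
    m = float(mean(counts))
    v = float(variance(counts))  # sample variance (n - 1 denominator)
    return {
        "mean": m,
        "variance": v,
        "observed_zero_share": sum(1 for c in counts if c == 0) / len(counts),
        # Share of zeros a Poisson distribution with the same mean would give.
        "poisson_zero_share": math.exp(-m),
        # Method-of-moments NB2 dispersion: Var = mu + alpha * mu^2.
        "nb2_alpha": (v - m) / m ** 2 if m > 0 else float("nan"),
    }


if __name__ == "__main__":
    counts = [0, 0, 0, 0, 0, 1, 2, 5, 0, 12]
    for key, value in count_diagnostics(counts).items():
        print(f"{key}: {value:.3f}")
```

Here 60% of the invented tracts are zeros versus about 13.5% expected under a Poisson with mean 2, and the variance (about 14.9) far exceeds the mean, giving alpha of about 3.2. That combination is the pattern that favors a negative binomial (or zero-inflated) model over a plain Poisson.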
3 "It is not clear what is gained by estimating a set of models for only tracts with fast food restaurants. By sampling on the dependent variable, and thereby excluding over half of the observations, the utility of these models is not clear. At least provide a more complete justification for doing this. There is still concern that OLS may not be the most appropriate model and that a count-data model would be more appropriate for these models, as well. Justification needs to be provided for using OLS in this context." In this revision we fit a generalized linear model (GLM) with a negative binomial distribution, which allows us to use all tracts.
3 "Residual diagnostics should be reported for the OLS models. Examine the normality of the residuals (this is especially important due to the relatively small number of observations and for what is likely skewed dependent variables). While Pearson’s r is reported for some variables to assess collinearity, this is not a sufficient diagnostic in a multivariate context. Instead, report variance inflation factors (VIFs) or tolerance statistics." No OLS models are used in this revision.
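Variance inflation factors depend only on the predictors, so they remain a usable collinearity check whichever outcome model is fit. A minimal sketch, assuming pure Python and invented predictor values: the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix, which is the same as 1 / (1 - R_j^2) from regressing predictor j on the others. Tolerance is simply 1 / VIF.

```python
# Hypothetical sketch: variance inflation factors computed as the diagonal of
# the inverse of the predictors' correlation matrix, VIF_j = 1 / (1 - R_j^2).
# Variable names and values are invented for illustration.


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """columns: dict mapping predictor name -> list of tract values."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


if __name__ == "__main__":
    predictors = {
        "arterial_density": [1, 2, 3, 4, 5],  # invented values
        "pct_poverty": [2, 1, 4, 3, 5],       # invented values
    }
    print(vifs(predictors))  # r = 0.8 here, so each VIF is 1 / (1 - 0.64)
```

With only two predictors the VIF reduces to 1 / (1 - r^2); the matrix form extends to any number of tract-level covariates.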
3 "It appears that correlation coefficients were first used to reduce the number of variables to be included in the model. Avoid this stepwise-like approach. Instead, identify the variables that are relevant based on previous research, estimate the models with these, then maybe reduce the model based on concerns such as collinearity." Correlation coefficients are not used in this revision.
3 "Not sure what the utility is of Table 3. Why report correlation coefficients for independent variables with the dependent variable when multivariate models are also reported?" Correlation coefficients are not used in this revision.
3 "Be careful in asserting the amount of variation in the dependent variable accounted for by the inclusion of the SES variables. While the R2 value increases a small amount between models #2 and #3, it is not appropriate to imply that the SES variables account for little variation. Although the author does not state this directly, the discussion of the change in R2 between the two models could be interpreted this way. I suspect that arterial and freeway density is correlated with the SES variables. Thus, all that can be stated is that the unique influence of the SES variables is small. It cannot be assumed that arterial and freeway density accounts for all of the variance in the dependent variable identified by the R2 in model #2." Noted; good point. Because we are now using a negative binomial GLM, there is no direct analog for R^2, so we do not report it.
3 "Does the title make sense? Explaining more fully how “fast cars” relates to the analysis." The title was intended to be catchy, but we did not measure speed. The title has been reverted to a more descriptive one.
3 "Are Figures 1-3 necessary? If so, they should be discussed more fully in the manuscript." We have added some discussion of the figures to the text.
Solet Using percent nonwhite as an independent variable may result in major misclassification. For King County, the Asian-PI population is the largest minority, greater than twice that for African Americans, the next largest group. I'm not sure what the correlation of A-PI is with poverty or the use of fast food restaurants, but obesity for A-PIs is lower than that of whites, and diabetes deaths, while elevated in A-PIs compared to whites, are substantially lower than for African Americans. This could be resolved by using dummy variables in the regression model for each race, or at least the top three (white, API and AA and other)--either that or explain why not. We decided early on to split race into white/nonwhite because each individual minority group makes up a small proportion of the population. Does this dichotomy make sense given those small proportions?
Solet Using tract-level census measures of median hh income and percent living below the poverty level as independent variables without any adjustment is flawed because both measures are based on a sample and have a variance that doesn't seem to be accounted for in the regression model. Instead, they are treated as if they were measures of individuals, rather than area-based aggregates or measures of central tendency based on a sample. The Census has documentation on how to calculate variances for these measures and there are various methods for including the variance in regression models. I am certainly not an expert on that subject. This is the sort of thing I would consult a biostatistician about. Problem 1 may have the effect of diluting the association between race and ff density. Problem 2 may have the effect of biasing high the reported association between SES measures like HH income and poverty and ff density. This does not appear to be straightforward. And who is the biostatistician on our team, anyway? Even so, if the SES associations are biased upward and arterial density is still the stronger predictor, our point is made.
Solet I'm no GIS expert, but I also question aggregating, and including as an independent variable, the arterial density of a tract. Since the ff establishments are geocoded, why not calculate something like the number of ff establishments located on arterials? This probably needs to be adjusted in some way--maybe use percent of ff restaurants located on arterials?--but it just seems like a more precise measure than arterial density. Maybe you've already thought about that and discarded it for some reason. I also wondered whether there was some way of incorporating the individual parcel data through some kind of mixed model, rather than aggregating them to a central tendency measure. But my mind is not completing that jump right now. We are looking at tract-level variables for everything. It's all aggregated.
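The reviewer's suggested measure, the percent of a tract's fast food restaurants located on arterials, could be computed from the geocoded points roughly as below. This is a hypothetical sketch: it assumes projected x/y coordinates in feet, represents arterials as straight line segments, and treats a restaurant as "on" an arterial if it falls within an invented 100 ft buffer.

```python
import math

# Hypothetical sketch: share of a tract's fast food restaurants lying within a
# buffer distance of any arterial segment. Assumes projected x/y coordinates
# in feet; the segments, points, and 100 ft buffer are invented.


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def share_on_arterials(restaurants, segments, buffer_ft):
    """Fraction of restaurants within buffer_ft of any arterial segment."""
    if not restaurants:
        return float("nan")  # undefined for tracts with no restaurants
    on = sum(
        1 for r in restaurants
        if any(point_segment_distance(r, a, b) <= buffer_ft for a, b in segments)
    )
    return on / len(restaurants)


if __name__ == "__main__":
    arterials = [((0, 0), (1000, 0))]
    restaurants = [(500, 50), (1050, 0), (500, 300), (2000, 0)]
    print(share_on_arterials(restaurants, arterials, buffer_ft=100))
```

One design consequence: the percentage is undefined for tracts with no fast food restaurants, which in this study is over half of the tracts, so it could not simply replace arterial density as a covariate for all tracts.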
Solet Are freeways like I5 and SR99 (below 65th St.), which have 0 probability of being a site for ff restaurants, included in the calculation of arterial density? Yes, though I think these segments are a minority; most highways and arterials outside the densest urban areas will have fast food restaurants along them.
Solet Why not use census block group as the geographic unit of analysis, since, being smaller, it may be more sensitive to changes in SES, etc. (although aggregate measures based on block groups may also have a larger variance)? There may be a good reason for using tracts... it's just not stated, and it looks arbitrary. As far as I know, not all of the census variables are available at the block group level. Again, we are following the same census units that other researchers have used.