Main Page/Research/Papers/fast food and arterials/Method

From phurvitz
Jump to: navigation, search

Method for fast food restaurants (FFRs) & arterial densities paper

Data Sources

  1. Roadway lines (KCGIS)
  2. Food source points (UFL classification of PHSKC food source data).
  3. Census tracts (US Census)
  4. Parcel polygons (KCGIS)

GIS Analysis

Arterials/Freeways

  1. Reselect road type != minor & local
  2. Identity or Intersect with tracts
  3. Summarize for total length of Art/Fwy length per tract

FFR

  1. Reselect food source points where L4 = fast food
  2. Identity with tracts
  3. Summarize to get count of FFR per tract

Parcels

  1. Reselect residential parcels (num_units > 0)
  2. Convert to centroids
  3. Identity or spatial join with tracts to get STFID on each parcel centroid point

FFR distance to Residential Parcel Centroids

Calculating the network distance between sets of points is processing and time intensive. It is possible to perform a similar operation that uses the CostDistance function in raster analysis.

  1. Convert the road line dataset to a raster using a relatively small cell size (3 m or 10 ft).
  2. Create a raster where the value of the road cells equals 1 and the rest of the cells are no data. This creates a "flow network."
  3. Create a line data set from the rasterized road line data. This is necessary to have a line data set for "snapping" source points to the raster representation of the road lines.
  4. A point data set is used as the source for network distance.
    1. Use the Generate Near Table ArcToolbox tool to create a table containing the XY locations of the nearest location on the vector -> raster -> vector line data set. Use the option to get location. This results in an XY coordinate for each point, representing the location on the line that is closest to the point.
    2. Display XY data to create a set of points. Export as a feature class.
    Now we have a point data set representing the sources, where the locations are "snapped" to the raster representation of the road network.
  5. Use the Costdistance ArcToolbox tool using the point data as the source and the rasterized "flow network."
    Now we have a raster that represents the network distance for each cell to the closest point data source.
  6. Convert the costdistance raster to a point data set.
    Now we have a large and spatially dense set of points where each point's value is the network distance to the source point.
  7. A point data set is used as the features from which to find the closest source in network distance.
    1. Perform a spatial join of the costdistance points onto the features of interest. This puts the attributes of the closest costdistance point onto the point of interest, including a field called Grid-code that represents the network distance to the closest source point.

Voilà, now we have a set of point features of interest that are encoded with their network distance to the closest source points of interest.

Statistical Analysis

  1. Generate a data frame of census variables (tract level)
    1. Race (as % nonwhite)
    2. Median HH income
    3. Percent of persons living below poverty
  2. merge (R) arterial length, FF count per tract, polygon area to tract data frame using STFID
  3. Calculate fields
    1. FFR density
      count of FFR / area of tract
      count of FFR / residents per tract
    2. Freeway/Arterial density
      length of fwy-art / area of tract
  4. Standardize measures
    Use scale in R
  5. Negative binomial regression (glm.nb in the MASS library)


# basic functions
source ("http://gis.washington.edu/phurvitz/R/functions.R") 

# load libraries 
library(MASS)
if ( ! lib.is.loaded ( "RODBC" )) { 
  library (RODBC) 
} 
library(BMA)

# get the scriptname 
strScriptName <- script.name() 
print(strScriptName) 
 
# where are we? 
setwd ("C:/users/phurvitz/htdocs/phurvitz/phd/research/fastfood/kingco_food_sources/kde_summary_stats/kde_by_census/") 
#setwd("C:/users/phurvitz/htdocs/phurvitz/phd/research/fastfood/papers/fast_foods-arterials")


# access db 
if ( !exists("ffkdedb")) { 
  ffkdedb <- odbcConnectAccess ("ff_kde_summary_data.mdb") 
} 
 
# read tractdat 
if (!exists ("tractdat")) { 
#  tractdat <- sqlFetch (ffkdedb, "tract.race.income.age.edu.own.hhsize.kde") 
  tractdat <- sqlFetch (ffkdedb, "q_propval_ses_artdens_tract") 
  colnames(tractdat) <- fix.colnames(tractdat)
} 

# stfid
tractdat$stfid <- as.character(tractdat$stfid)

# rescale area
tractdat$area.mi2 <- tractdat$area.ft2/5280^2
tractdat$logareami2 <- log(tractdat$area.mi2)
tractdat$area.km2 <- tractdat$area.mi2 * 2.59
tractdat$logareakm2 <- log(tractdat$area.km2)

# rescale population
tractdat$logpop <- log(tractdat$tot.pop)

# convert arterial density
tractdat$artdnskm2 <- tractdat$artdnsmi2 * (1.609/2.59)

# counts of zero per tract 
tractdat$ff.count <- ifelse(is.na(tractdat$ff.count), 0, tractdat$ff.count) 
tractdat$valperunit.mf <- ifelse(is.na(tractdat$valperunit.mf), 0, tractdat$valperunit.mf) 
tractdat$valperunit.sf <- ifelse(is.na(tractdat$valperunit.sf), 0, tractdat$valperunit.sf) 

# densities 
tractdat$ffdnsmi2 <- tractdat$ff.count / tractdat$area.mi2
tractdat$ffdnskm2 <- tractdat$ff.count / tractdat$area.km2
tractdat$popdnsmi2 <- tractdat$tot.pop / tractdat$area.mi2
tractdat$popdnskm2 <- tractdat$tot.pop / tractdat$area.km2
tractdat$ffdnscap <- (tractdat$ff.count / tractdat$tot.pop)
tractdat$ffdnscapk <- (tractdat$ff.count / (tractdat$tot.pop/1000))

#rescale $$
tractdat$medhhinc.k <- tractdat$medhhinc/1000
tractdat$valperunit.all.k <- tractdat$valperunit.all/1000

#-------------------- 
# standardized fields
detach(tractdat)
attach(tractdat)
analysis.dat <- as.data.frame(cbind(medhhinc.k,pctnonwhite, valperunit.all.k, artdnsmi2, artdnskm2))
tractdat.std <- as.data.frame(cbind(ff.count, logareami2, logareakm2, logpop, as.data.frame(lapply(analysis.dat, scale))))
tractdat.std$ff.count.std <- scale(tractdat.std$ff.count)


#-------------------- 
# mean and median per stfid
#stfids <- unique(as.character(dat.dist$stfid))
#dist.summary <- as.data.frame(matrix(data=NA, nrow=0, ncol=3))
#colnames(dist.summary) <- c("stfid", "ff.dist.mean", "ff.dist.median")
#for (i in 1:length(stfids)) {
#    stfid <- stfids[i]
#    parcels <- dat.dist[dat.dist$stfid==stfid,]
#    dist.mean <- round(mean(parcels$dist.ff),1)
#    dist.median <- median(parcels$dist.ff)
#    dist.summary <- rbind(dist.summary, c(stfid, dist.mean, dist.median))
#    # add this to the tractdat
#    tractdat[tractdat$stfid==stfid,41] <- dist.mean
#    tractdat[tractdat$stfid==stfid,42] <- dist.median    
#}
#colnames(tractdat) <- c(colnames(tractdat)[1:40], "dist.mean.ff", "dist.median.ff")
#tractdat$log.dist.mean.ff <- log(tractdat$dist.mean.ff)
tractdat$logpop <- log(tractdat$tot.pop)

#-------------------- 
#negative binomial GLMs based on fast food area density

# SAE
#mod.nb.art.ses.only <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat)
#mod.nb.art.only <- glm.nb(ff.count ~ offset(logareami2) + artdnsmi2, data=tractdat)
#mod.nb.full <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat)
mod.nb.art.ses.only <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite, data=tractdat)
mod.nb.art.only <- glm.nb(ff.count ~ offset(logareami2) + artdnsmi2, data=tractdat)
mod.nb.full <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + artdnsmi2, data=tractdat)

for (m in c("mod.nb.art.only", "mod.nb.art.ses.only", "mod.nb.full")) {
  md <- eval(parse(text=m))
  mc <- round(as.data.frame(summary(md)$coefficients),3)
  aicx <- eval(parse(text=paste("AIC(", m, ")", sep="")))
  bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))")))
  mc[1,5] <- round(aicx,1)
  mc[1,6] <- round(bicx,1)
  mc[-1,5:6] <- ""
  colnames(mc)[2] <- "Std_Error"
  colnames(mc)[5:6] <- c("aic", "bic")
  cat(paste("\n",m,"\n", sep=""))
  write.csv (mc)
}


# SI
#mod.nb.art.ses.only.si <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat)
#mod.nb.art.only.si <- glm.nb(ff.count ~ offset(logareakm2) + artdnskm2, data=tractdat)
#mod.nb.full.si <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnskm2, data=tractdat)

mod.nb.art.ses.only.si <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite , data=tractdat)
mod.nb.art.only.si <- glm.nb(ff.count ~ offset(logareakm2) + artdnskm2, data=tractdat)
mod.nb.full.si <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat)

for (m in c("mod.nb.art.ses.only.si", "mod.nb.art.only.si", "mod.nb.full.si")) {
  md <- eval(parse(text=m))
  mc <- round(as.data.frame(summary(md)$coefficients),3)
  aicx <- eval(parse(text=paste("AIC(", m, ")", sep="")))
  bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))")))
  mc[1,5] <- round(aicx,1)
  mc[1,6] <- round(bicx,1)
  mc[-1,5:6] <- ""
  colnames(mc)[2] <- "Std_Error"
  colnames(mc)[5:6] <- c("aic", "bic")
  cat(paste("\n","ff density by area: ",m, "\n", sep=""))
  write.csv (mc)
}


#-------------------- 
# STANDARDIZED COEFFICIENTS!!!!

#negative binomial GLMs based on fast food area density 
#mod.nb.art.ses.only.std <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat.std)
#mod.nb.art.only.std <- glm.nb(ff.count ~ offset(logareami2) + artdnsmi2, data=tractdat.std)
#mod.nb.full.std <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat.std)


# SAE
mod.nb.art.ses.only.std <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite, data=tractdat.std)
mod.nb.art.only.std <- glm.nb(ff.count ~ offset(logareami2) + artdnsmi2, data=tractdat.std)
mod.nb.full.std <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + artdnsmi2, data=tractdat.std)

for (m in c("mod.nb.art.only.std", "mod.nb.art.ses.only.std", "mod.nb.full.std")) {
  md <- eval(parse(text=m))
  mc <- round(as.data.frame(summary(md)$coefficients),3)
  aicx <- eval(parse(text=paste("AIC(", m, ")", sep="")))
  bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))")))
  mc[1,5] <- round(aicx,1)
  mc[1,6] <- round(bicx,1)
  mc[-1,5:6] <- ""
  colnames(mc)[2] <- "Std_Error"
  colnames(mc)[5:6] <- c("aic", "bic")
  cat(paste("\n",m,"\n", sep=""))
  write.csv (mc)
}


# SI
# deprecated: no res unit values
#mod.nb.art.ses.only.si.std <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat.std)
#mod.nb.art.only.si.std <- glm.nb(ff.count ~ offset(logareakm2) + artdnskm2, data=tractdat)
#mod.nb.full.si.std <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnskm2, data=tractdat.std)

# deprecated: must use standardized response var
#mod.nb.art.ses.only.si.std <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite , data=tractdat.std)
#mod.nb.art.only.si.std <- glm.nb(ff.count ~ offset(logareakm2) + artdnskm2, data=tractdat)
#mod.nb.full.si.std <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat.std)

mod.nb.art.ses.only.si.std <- lm(ff.count.std ~ offset(logareakm2) + medhhinc.k + pctnonwhite , data=tractdat.std)
mod.nb.art.only.si.std <- lm(ff.count.std ~ offset(logareakm2) + artdnskm2, data=tractdat.std)
mod.nb.full.si.std <- lm(ff.count.std ~ offset(logareakm2) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat.std)

for (m in c("mod.nb.art.ses.only.si.std", "mod.nb.art.only.si.std", "mod.nb.full.si.std")) {
  md <- eval(parse(text=m))
  mc <- round(as.data.frame(summary(md)$coefficients),3)
  aicx <- eval(parse(text=paste("AIC(", m, ")", sep="")))
  bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))")))
  mc[1,5] <- round(aicx,1)
  mc[1,6] <- round(bicx,1)
  mc[-1,5:6] <- ""
  colnames(mc)[2] <- "Std_Error"
  colnames(mc)[5:6] <- c("aic", "bic")
  cat(paste("\n","ff density by area (std): ", m,"\n", sep=""))
  write.csv (mc)
}




#-------------------- 
#negative binomial GLMs based on fast food population density
#mod.nb.art.only.pop <- glm.nb(ff.count ~ offset(logpop) + artdnsmi2, data=tractdat)
#mod.nb.art.ses.only.pop <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat)
#mod.nb.full.pop <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat)

# SAE
mod.nb.art.only.pop <- glm.nb(ff.count ~ offset(logpop) + artdnsmi2, data=tractdat)
mod.nb.art.ses.only.pop <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite, data=tractdat)
mod.nb.full.pop <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + artdnsmi2, data=tractdat)

for (m in c("mod.nb.art.ses.only.pop", "mod.nb.art.only.pop", "mod.nb.full.pop")) {
  md <- eval(parse(text=m))
  mc <- round(as.data.frame(summary(md)$coefficients),3)
  aicx <- eval(parse(text=paste("AIC(", m, ")", sep="")))
  bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))")))
  mc[1,5] <- round(aicx,1)
  mc[1,6] <- round(bicx,1)
  mc[-1,5:6] <- ""
  colnames(mc)[2] <- "Std_Error"
  colnames(mc)[5:6] <- c("aic", "bic")
  cat(paste("\n","ff density by pop: ",m,"\n", sep=""))
  mc1 <- mc
  write.csv (mc1)
}

# SI

# deprecated: no res unit values
#mod.nb.art.ses.only.pop.si <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat)
#mod.nb.art.only.pop.si <- glm.nb(ff.count ~ offset(logpop) + artdnskm2, data=tractdat)
#mod.nb.full.pop.si <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnskm2, data=tractdat)

# deprecated: must use standardized response var
#mod.nb.art.ses.only.pop.si <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite, data=tractdat)
#mod.nb.art.only.pop.si <- glm.nb(ff.count ~ offset(logpop) + artdnskm2, data=tractdat)
#mod.nb.full.pop.si <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat)

mod.nb.art.ses.only.pop.si <- lm(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite, data=tractdat)
mod.nb.art.only.pop.si <- lm(ff.count ~ offset(logpop) + artdnskm2, data=tractdat)
mod.nb.full.pop.si <- lm(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat)

for (m in c("mod.nb.art.ses.only.pop.si", "mod.nb.art.only.pop.si", "mod.nb.full.pop.si")) {
  md <- eval(parse(text=m))
  mc <- round(as.data.frame(summary(md)$coefficients),3)
  aicx <- eval(parse(text=paste("AIC(", m, ")", sep="")))
  bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))")))
  mc[1,5] <- round(aicx,1)
  mc[1,6] <- round(bicx,1)
  mc[-1,5:6] <- ""
  colnames(mc)[2] <- "Std_Error"
  colnames(mc)[5:6] <- c("aic", "bic")
  cat(paste("\n","ff density by pop: ", m,"\n", sep=""))
  write.csv (mc)
}


# population STANDARDIZED, SI

#mod.nb.art.ses.only.si.std <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat.std)
#mod.nb.art.only.si.std <- glm.nb(ff.count ~ offset(logpop) + artdnskm2, data=tractdat)
#mod.nb.full.si.std <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnskm2, data=tractdat.std)
mod.nb.art.ses.only.si.std <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite , data=tractdat.std)
mod.nb.art.only.si.std <- glm.nb(ff.count ~ offset(logpop) + artdnskm2, data=tractdat.std)
mod.nb.full.si.std <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat.std)


cat ("FF density by population\n")
for (m in c("mod.nb.art.ses.only.si.std", "mod.nb.art.only.si.std", "mod.nb.full.si.std")) {
  md <- eval(parse(text=m))
  mc <- round(as.data.frame(summary(md)$coefficients),3)
  aicx <- eval(parse(text=paste("AIC(", m, ")", sep="")))
  bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))")))
  mc[1,5] <- round(aicx,1)
  mc[1,6] <- round(bicx,1)
  mc[-1,5:6] <- ""
  colnames(mc)[2] <- "Std_Error"
  colnames(mc)[5:6] <- c("aic", "bic")
  cat(paste("\n", "ff density by pop (std): ", m,"\n", sep=""))
  write.csv (mc)
}

# hmmm
tractdat.std$ff.count.std <- scale(tractdat.std$ff.count)


#-------------------- 
# linear models using log(dist.mean.ff)
#mod.lm.art.only <- lm(log.dist.mean.ff ~ artdnsmi2, data=tractdat)
#mod.lm.art.none <- lm(log.dist.mean.ff ~ medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat)
#mod.lm.full <- lm(log.dist.mean.ff ~ medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat)

#for (m in c("mod.lm.art.only", "mod.lm.art.none", "mod.lm.full")) {
#  md <- eval(parse(text=m))
#  mc <- round(as.data.frame(summary(md)$coefficients),3)
#  r2 <- round(summary(md)$adj.r.squared, 2)
#  mc[1,5] <- r2
#  mc[-1,5] <- ""
#  colnames(mc)[5] <- c("adj.r2")
#  cat(paste("\n",m,"\n", sep=""))
#  print (mc)
#}

###########
#AIC and BIC
aic.art.only <- AIC(mod.nb.art.only)
aic.art.none <- AIC(mod.nb.art.ses.only)
aic.full <- AIC(mod.nb.full)

bic.art.only <- AIC(mod.nb.art.only, k=log(nrow(tractdat)))
bic.art.none <- AIC(mod.nb.art.ses.only, k=log(nrow(tractdat)))
bic.full <- AIC(mod.nb.full, k=log(nrow(tractdat)))

#mod.bic.art.only <- bic.glm(ff.count ~ offset(logarea) + artdnsmi2, 
#    data=tractdat, glm.family=neg.bin(summary(mod.nb.art.only)$theta))
mod.bic.art.none <- bic.glm(ff.count ~ offset(logareami2) + + medhhinc.k + pctnonwhite + valperunit.all.k, 
    data=tractdat, glm.family=neg.bin(summary(mod.nb.art.ses.only)$theta))
mod.bic.full <- bic.glm(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, 
    data=tractdat, glm.family=neg.bin(summary(mod.nb.full)$theta))

#########################
# fast food distance
dat.dist <- sqlFetch (ffkdedb, "parcel_res-nocomm_tract_value_ffdist")
colnames(dat.dist) <- fix.colnames(dat.dist)
dat.dist$stfid <- as.character(dat.dist$stfid)
dat.dist$ff.count <- ifelse(is.na(dat.dist$ff.count), 0, dat.dist$ff.count)
dat.dist$stfid <- ifelse(is.na(dat.dist$stfid), "", dat.dist$stfid)
    

###############
# zelig
library(Zelig)
z.out.art.none <- zelig(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k, 
    data=tractdat, model="negbin")
z.out.full <- zelig(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, 
    data=tractdat, model="negbin")