Main Page/Research/Papers/fast food and arterials/Method
From phurvitz
< Main Page | Research | Papers | fast food and arterials
Revision as of 21:29, 24 March 2009 by Phil Hurvitz (talk | contribs) (New page: __FORCETOC__ Method for fast food restaurants (FFRs) & arterial densities paper ==Data Sources== # Roadway lines (KCGIS) # Food source points (UFL classification of PHSKC food source data...)
Method for fast food restaurants (FFRs) & arterial densities paper
Contents
Data Sources
- Roadway lines (KCGIS)
- Food source points (UFL classification of PHSKC food source data).
- Census tracts (US Census)
- Parcel polygons (KCGIS)
GIS Analysis
Arterials/Freeways
- Reselect road type != minor & local
- Identity or Intersect with tracts
- Summarize for total length of Art/Fwy length per tract
FFR
- Reselect food source points where L4 = fast food
- Identity with tracts
- Summarize to get count of FFR per tract
Parcels
- Reselect residential parcels (num_units > 0)
- Convert to centroids
- Identity or spatial join with tracts to get STFID on each parcel centroid point
FFR distance to Residential Parcel Centroids
Calculating the network distance between sets of points is processing and time intensive. It is possible to perform a similar operation that uses the CostDistance function in raster analysis.
- Convert the road line dataset to a raster using a relatively small cell size (3 m or 10 ft).
- Create a raster where the value of the road cells equals 1 and the rest of the cells are no data. This creates a "flow network."
- Create a line data set from the rasterized road line data. This is necessary to have a line data set for "snapping" source points to the raster representation of the road lines.
- A point data set is used as the source for network distance.
- Use the Generate Near Table ArcToolbox tool to create a table containing the XY locations of the nearest location on the vector -> raster -> vector line data set. Use the option to get location. This results in an XY coordinate for each point, representing the location on the line that is closest to the point.
- Display XY data to create a set of points. Export as a feature class.
- Now we have a point data set representing the sources, where the locations are "snapped" to the raster representation of the road network.
- Use the Costdistance ArcToolbox tool using the point data as the source and the rasterized "flow network."
- Now we have a raster that represents the network distance for each cell to the closest point data source.
- Convert the costdistance raster to a point data set.
- Now we have a large and spatially dense set of points where each point's value is the network distance to the source point.
- A point data set is used as the features from which to find the closest source in network distance.
- Perform a spatial join of the costdistance points onto the features of interest. This puts the attributes of the closest costdistance point onto the point of interest, including a field called Grid-code that represents the network distance to the closest source point.
Voilà, now we have a set of point features of interest that are encoded with their network distance to the closest source points of interest.
Statistical Analysis
- Generate a data frame of census variables (tract level)
- Race (as % nonwhite)
- Median HH income
- Percent of persons living below poverty
- merge (R) arterial length, FF count per tract, polygon area to tract data frame using STFID
- Calculate fields
- FFR density
- count of FFR / area of tract
- count of FFR / residents per tract
- Freeway/Arterial density
- length of fwy-art / area of tract
- FFR density
- Standardize measures
- Use scale in R
- Negative binomial regression (glm.nb in the MASS library)
# basic functions source ("http://gis.washington.edu/phurvitz/R/functions.R") # load libraries library(MASS) if ( ! lib.is.loaded ( "RODBC" )) { library (RODBC) } library(BMA) # get the scriptname strScriptName <- script.name() print(strScriptName) # where are we? setwd ("C:/users/phurvitz/htdocs/phurvitz/phd/research/fastfood/kingco_food_sources/kde_summary_stats/kde_by_census/") #setwd("C:/users/phurvitz/htdocs/phurvitz/phd/research/fastfood/papers/fast_foods-arterials") # access db if ( !exists("ffkdedb")) { ffkdedb <- odbcConnectAccess ("ff_kde_summary_data.mdb") } # read tractdat if (!exists ("tractdat")) { # tractdat <- sqlFetch (ffkdedb, "tract.race.income.age.edu.own.hhsize.kde") tractdat <- sqlFetch (ffkdedb, "q_propval_ses_artdens_tract") colnames(tractdat) <- fix.colnames(tractdat) } # stfid tractdat$stfid <- as.character(tractdat$stfid) # rescale area tractdat$area.mi2 <- tractdat$area.ft2/5280^2 tractdat$logareami2 <- log(tractdat$area.mi2) tractdat$area.km2 <- tractdat$area.mi2 * 2.59 tractdat$logareakm2 <- log(tractdat$area.km2) # rescale population tractdat$logpop <- log(tractdat$tot.pop) # convert arterial density tractdat$artdnskm2 <- tractdat$artdnsmi2 * (1.609/2.59) # counts of zero per tract tractdat$ff.count <- ifelse(is.na(tractdat$ff.count), 0, tractdat$ff.count) tractdat$valperunit.mf <- ifelse(is.na(tractdat$valperunit.mf), 0, tractdat$valperunit.mf) tractdat$valperunit.sf <- ifelse(is.na(tractdat$valperunit.sf), 0, tractdat$valperunit.sf) # densities tractdat$ffdnsmi2 <- tractdat$ff.count / tractdat$area.mi2 tractdat$ffdnskm2 <- tractdat$ff.count / tractdat$area.km2 tractdat$popdnsmi2 <- tractdat$tot.pop / tractdat$area.mi2 tractdat$popdnskm2 <- tractdat$tot.pop / tractdat$area.km2 tractdat$ffdnscap <- (tractdat$ff.count / tractdat$tot.pop) tractdat$ffdnscapk <- (tractdat$ff.count / (tractdat$tot.pop/1000)) #rescale $$ tractdat$medhhinc.k <- tractdat$medhhinc/1000 tractdat$valperunit.all.k <- tractdat$valperunit.all/1000 #-------------------- # standardized fields detach(tractdat) attach(tractdat) analysis.dat <- as.data.frame(cbind(medhhinc.k,pctnonwhite, valperunit.all.k, artdnsmi2, artdnskm2)) tractdat.std <- as.data.frame(cbind(ff.count, logareami2, logareakm2, logpop, as.data.frame(lapply(analysis.dat, scale)))) tractdat.std$ff.count.std <- scale(tractdat.std$ff.count) #-------------------- # mean and median per stfid #stfids <- unique(as.character(dat.dist$stfid)) #dist.summary <- as.data.frame(matrix(data=NA, nrow=0, ncol=3)) #colnames(dist.summary) <- c("stfid", "ff.dist.mean", "ff.dist.median") #for (i in 1:length(stfids)) { # stfid <- stfids[i] # parcels <- dat.dist[dat.dist$stfid==stfid,] # dist.mean <- round(mean(parcels$dist.ff),1) # dist.median <- median(parcels$dist.ff) # dist.summary <- rbind(dist.summary, c(stfid, dist.mean, dist.median)) # # add this to the tractdat # tractdat[tractdat$stfid==stfid,41] <- dist.mean # tractdat[tractdat$stfid==stfid,42] <- dist.median #} #colnames(tractdat) <- c(colnames(tractdat)[1:40], "dist.mean.ff", "dist.median.ff") #tractdat$log.dist.mean.ff <- log(tractdat$dist.mean.ff) tractdat$logpop <- log(tractdat$tot.pop) #-------------------- #negative binomial GLMs based on fast food area density # SAE #mod.nb.art.ses.only <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat) #mod.nb.art.only <- glm.nb(ff.count ~ offset(logareami2) + artdnsmi2, data=tractdat) #mod.nb.full <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat) mod.nb.art.ses.only <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite, data=tractdat) mod.nb.art.only <- glm.nb(ff.count ~ offset(logareami2) + artdnsmi2, data=tractdat) mod.nb.full <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + artdnsmi2, data=tractdat) for (m in c("mod.nb.art.only", "mod.nb.art.ses.only", "mod.nb.full")) { md <- eval(parse(text=m)) mc <- round(as.data.frame(summary(md)$coefficients),3) aicx <- eval(parse(text=paste("AIC(", m, ")", sep=""))) bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))"))) mc[1,5] <- round(aicx,1) mc[1,6] <- round(bicx,1) mc[-1,5:6] <- "" colnames(mc)[2] <- "Std_Error" colnames(mc)[5:6] <- c("aic", "bic") cat(paste("\n",m,"\n", sep="")) write.csv (mc) } # SI #mod.nb.art.ses.only.si <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat) #mod.nb.art.only.si <- glm.nb(ff.count ~ offset(logareakm2) + artdnskm2, data=tractdat) #mod.nb.full.si <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnskm2, data=tractdat) mod.nb.art.ses.only.si <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite , data=tractdat) mod.nb.art.only.si <- glm.nb(ff.count ~ offset(logareakm2) + artdnskm2, data=tractdat) mod.nb.full.si <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat) for (m in c("mod.nb.art.ses.only.si", "mod.nb.art.only.si", "mod.nb.full.si")) { md <- eval(parse(text=m)) mc <- round(as.data.frame(summary(md)$coefficients),3) aicx <- eval(parse(text=paste("AIC(", m, ")", sep=""))) bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))"))) mc[1,5] <- round(aicx,1) mc[1,6] <- round(bicx,1) mc[-1,5:6] <- "" colnames(mc)[2] <- "Std_Error" colnames(mc)[5:6] <- c("aic", "bic") cat(paste("\n","ff density by area: ",m, "\n", sep="")) write.csv (mc) } #-------------------- # STANDARDIZED COEFFICIENTS!!!! #negative binomial GLMs based on fast food area density #mod.nb.art.ses.only.std <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat.std) #mod.nb.art.only.std <- glm.nb(ff.count ~ offset(logareami2) + artdnsmi2, data=tractdat.std) #mod.nb.full.std <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat.std) # SAE mod.nb.art.ses.only.std <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite, data=tractdat.std) mod.nb.art.only.std <- glm.nb(ff.count ~ offset(logareami2) + artdnsmi2, data=tractdat.std) mod.nb.full.std <- glm.nb(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + artdnsmi2, data=tractdat.std) for (m in c("mod.nb.art.only.std", "mod.nb.art.ses.only.std", "mod.nb.full.std")) { md <- eval(parse(text=m)) mc <- round(as.data.frame(summary(md)$coefficients),3) aicx <- eval(parse(text=paste("AIC(", m, ")", sep=""))) bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))"))) mc[1,5] <- round(aicx,1) mc[1,6] <- round(bicx,1) mc[-1,5:6] <- "" colnames(mc)[2] <- "Std_Error" colnames(mc)[5:6] <- c("aic", "bic") cat(paste("\n",m,"\n", sep="")) write.csv (mc) } # SI # deprecated: no res unit values #mod.nb.art.ses.only.si.std <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat.std) #mod.nb.art.only.si.std <- glm.nb(ff.count ~ offset(logareakm2) + artdnskm2, data=tractdat) #mod.nb.full.si.std <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnskm2, data=tractdat.std) # deprecated: must use standardized response var #mod.nb.art.ses.only.si.std <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite , data=tractdat.std) #mod.nb.art.only.si.std <- glm.nb(ff.count ~ offset(logareakm2) + artdnskm2, data=tractdat) #mod.nb.full.si.std <- glm.nb(ff.count ~ offset(logareakm2) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat.std) mod.nb.art.ses.only.si.std <- lm(ff.count.std ~ offset(logareakm2) + medhhinc.k + pctnonwhite , data=tractdat.std) mod.nb.art.only.si.std <- lm(ff.count.std ~ offset(logareakm2) + artdnskm2, data=tractdat.std) mod.nb.full.si.std <- lm(ff.count.std ~ offset(logareakm2) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat.std) for (m in c("mod.nb.art.ses.only.si.std", "mod.nb.art.only.si.std", "mod.nb.full.si.std")) { md <- eval(parse(text=m)) mc <- round(as.data.frame(summary(md)$coefficients),3) aicx <- eval(parse(text=paste("AIC(", m, ")", sep=""))) bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))"))) mc[1,5] <- round(aicx,1) mc[1,6] <- round(bicx,1) mc[-1,5:6] <- "" colnames(mc)[2] <- "Std_Error" colnames(mc)[5:6] <- c("aic", "bic") cat(paste("\n","ff density by area (std): ", m,"\n", sep="")) write.csv (mc) } #-------------------- #negative binomial GLMs based on fast food population density #mod.nb.art.only.pop <- glm.nb(ff.count ~ offset(logpop) + artdnsmi2, data=tractdat) #mod.nb.art.ses.only.pop <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat) #mod.nb.full.pop <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat) # SAE mod.nb.art.only.pop <- glm.nb(ff.count ~ offset(logpop) + artdnsmi2, data=tractdat) mod.nb.art.ses.only.pop <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite, data=tractdat) mod.nb.full.pop <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + artdnsmi2, data=tractdat) for (m in c("mod.nb.art.ses.only.pop", "mod.nb.art.only.pop", "mod.nb.full.pop")) { md <- eval(parse(text=m)) mc <- round(as.data.frame(summary(md)$coefficients),3) aicx <- eval(parse(text=paste("AIC(", m, ")", sep=""))) bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))"))) mc[1,5] <- round(aicx,1) mc[1,6] <- round(bicx,1) mc[-1,5:6] <- "" colnames(mc)[2] <- "Std_Error" colnames(mc)[5:6] <- c("aic", "bic") cat(paste("\n","ff density by pop: ",m,"\n", sep="")) mc1 <- mc write.csv (mc1) } # SI # deprecated: no res unit values #mod.nb.art.ses.only.pop.si <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat) #mod.nb.art.only.pop.si <- glm.nb(ff.count ~ offset(logpop) + artdnskm2, data=tractdat) #mod.nb.full.pop.si <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnskm2, data=tractdat) # deprecated: must use standardized response var #mod.nb.art.ses.only.pop.si <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite, data=tractdat) #mod.nb.art.only.pop.si <- glm.nb(ff.count ~ offset(logpop) + artdnskm2, data=tractdat) #mod.nb.full.pop.si <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat) mod.nb.art.ses.only.pop.si <- lm(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite, data=tractdat) mod.nb.art.only.pop.si <- lm(ff.count ~ offset(logpop) + artdnskm2, data=tractdat) mod.nb.full.pop.si <- lm(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat) for (m in c("mod.nb.art.ses.only.pop.si", "mod.nb.art.only.pop.si", "mod.nb.full.pop.si")) { md <- eval(parse(text=m)) mc <- round(as.data.frame(summary(md)$coefficients),3) aicx <- eval(parse(text=paste("AIC(", m, ")", sep=""))) bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))"))) mc[1,5] <- round(aicx,1) mc[1,6] <- round(bicx,1) mc[-1,5:6] <- "" colnames(mc)[2] <- "Std_Error" colnames(mc)[5:6] <- c("aic", "bic") cat(paste("\n","ff density by pop: ", m,"\n", sep="")) write.csv (mc) } # population STANDARDIZED, SI #mod.nb.art.ses.only.si.std <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat.std) #mod.nb.art.only.si.std <- glm.nb(ff.count ~ offset(logpop) + artdnskm2, data=tractdat) #mod.nb.full.si.std <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnskm2, data=tractdat.std) mod.nb.art.ses.only.si.std <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite , data=tractdat.std) mod.nb.art.only.si.std <- glm.nb(ff.count ~ offset(logpop) + artdnskm2, data=tractdat.std) mod.nb.full.si.std <- glm.nb(ff.count ~ offset(logpop) + medhhinc.k + pctnonwhite + artdnskm2, data=tractdat.std) cat ("FF density by population\n") for (m in c("mod.nb.art.ses.only.si.std", "mod.nb.art.only.si.std", "mod.nb.full.si.std")) { md <- eval(parse(text=m)) mc <- round(as.data.frame(summary(md)$coefficients),3) aicx <- eval(parse(text=paste("AIC(", m, ")", sep=""))) bicx <- eval(parse(text=paste("AIC(", m, ", k=log(nrow(tractdat)))"))) mc[1,5] <- round(aicx,1) mc[1,6] <- round(bicx,1) mc[-1,5:6] <- "" colnames(mc)[2] <- "Std_Error" colnames(mc)[5:6] <- c("aic", "bic") cat(paste("\n", "ff density by pop (std): ", m,"\n", sep="")) write.csv (mc) } # hmmm tractdat.std$ff.count.std <- scale(tractdat.std$ff.count) #-------------------- # linear models using log(dist.mean.ff) #mod.lm.art.only <- lm(log.dist.mean.ff ~ artdnsmi2, data=tractdat) #mod.lm.art.none <- lm(log.dist.mean.ff ~ medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat) #mod.lm.full <- lm(log.dist.mean.ff ~ medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat) #for (m in c("mod.lm.art.only", "mod.lm.art.none", "mod.lm.full")) { # md <- eval(parse(text=m)) # mc <- round(as.data.frame(summary(md)$coefficients),3) # r2 <- round(summary(md)$adj.r.squared, 2) # mc[1,5] <- r2 # mc[-1,5] <- "" # colnames(mc)[5] <- c("adj.r2") # cat(paste("\n",m,"\n", sep="")) # print (mc) #} ########### #AIC and BIC aic.art.only <- AIC(mod.nb.art.only) aic.art.none <- AIC(mod.nb.art.ses.only) aic.full <- AIC(mod.nb.full) bic.art.only <- AIC(mod.nb.art.only, k=log(nrow(tractdat))) bic.art.none <- AIC(mod.nb.art.ses.only, k=log(nrow(tractdat))) bic.full <- AIC(mod.nb.full, k=log(nrow(tractdat))) #mod.bic.art.only <- bic.glm(ff.count ~ offset(logarea) + artdnsmi2, # data=tractdat, glm.family=neg.bin(summary(mod.nb.art.only)$theta)) mod.bic.art.none <- bic.glm(ff.count ~ offset(logareami2) + + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat, glm.family=neg.bin(summary(mod.nb.art.ses.only)$theta)) mod.bic.full <- bic.glm(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat, glm.family=neg.bin(summary(mod.nb.full)$theta)) ######################### # fast food distance dat.dist <- sqlFetch (ffkdedb, "parcel_res-nocomm_tract_value_ffdist") colnames(dat.dist) <- fix.colnames(dat.dist) dat.dist$stfid <- as.character(dat.dist$stfid) dat.dist$ff.count <- ifelse(is.na(dat.dist$ff.count), 0, dat.dist$ff.count) dat.dist$stfid <- ifelse(is.na(dat.dist$stfid), "", dat.dist$stfid) ############### # zelig library(Zelig) z.out.art.none <- zelig(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k, data=tractdat, model="negbin") z.out.full <- zelig(ff.count ~ offset(logareami2) + medhhinc.k + pctnonwhite + valperunit.all.k + artdnsmi2, data=tractdat, model="negbin")