U. Washington, Geography 207, specifying SIMs

University of Washington
Geography 207

Fitting an Unconstrained Spatial Interaction Model
from: David Plane and Peter Rogerson (1994), The Geographical Analysis of Population, pp. 200-201. New York: John Wiley.

"To operationalize the [spatial interaction model where the interaction is total migration between places i and j]
M_ij = k P_i P_j d_ij^-b, the most common approach is to use log-linear regression analysis. We first transform the variables in the equations by taking the logarithms of both sides:

log M_ij = log k + log P_i + log P_j - b log d_ij

[Alternatively], natural logarithms (logs to base e) are used, whereas in the [above] equation common logarithms (e.g., those to base 10) suffice. Note that the impact of this step is to turn our formerly multiplicative models into additive ones.
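The effect of the log transformation can be checked numerically. The short Python sketch below uses made-up values for k, b, the populations, and the distance (none of them come from the text) and confirms that the logarithm of the multiplicative model equals the additive expression. It also repeats the check with natural logarithms, since the additive form holds in either base.

```python
import math

# Illustrative values only (assumptions, not taken from Plane and Rogerson).
k, b = 0.001, 1.5
P_i, P_j, d_ij = 500_000.0, 1_200_000.0, 150.0

# Multiplicative form: M_ij = k * P_i * P_j * d_ij^(-b)
M_ij = k * P_i * P_j * d_ij ** (-b)

# Additive form after taking common (base-10) logarithms of both sides.
log10_sum = math.log10(k) + math.log10(P_i) + math.log10(P_j) - b * math.log10(d_ij)

# The same identity written with natural logarithms (base e).
ln_sum = math.log(k) + math.log(P_i) + math.log(P_j) - b * math.log(d_ij)

print(math.isclose(math.log10(M_ij), log10_sum))
print(math.isclose(math.log(M_ij), ln_sum))
```

Changing the base rescales every log term by the same constant, so the exponent b is unaffected by the choice; only the numerical value of the intercept term changes.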

"Next, using any standard computerized statistical package that supports multiple regression applications, we 'fit' the models. The equation typically estimated by a regression program is a slightly modified version of the [above] equation:

log M_ij = a₀ + a₁ log P_i + a₂ log P_j - b log d_ij

Here, a₀ = log k. Also, two new parameters have been added on the population variables that make the models a bit more flexible. The a₁ and a₂ parameters may diverge somewhat from 1.0, allowing the estimated flows to be something other than directly proportional to origin and destination populations.

"By fitting a multiple regression model we mean that the computer program finds for us the optimum values of the parameters a₀ , a₁ , a₂, and b . These fitted parameter values are those for which the model-predicted flows { M_ij} most accurately replicate the matrix of actual migration figures { m_ij}, in the sense of minimizing the sum of the squared deviations between the logarithms of the model-predicted flows and the actual flows. The regression packages picks out the values for the four parameters that result in M_ij values that minimize the sum of squared errors [between all the observed values of m_ij and the values M_ijderived from the application of the parameter values to the population and distance values associated with each data point].

"To illustrate the results of this fitting procedure, we entered into a statistical package the 30 actual migration flows between the six New England states for 1985-90, the corresponding distances between each origin and each destination state's population centroid, and the origin and destination state populations. Specifying multiple regression in the form of the [above] equation, the following estimated equation was found:

log M_ij = -3.919 + 0.940 log P_i + 0.570 log P_j - 0.746 log d_ij
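A fit of this kind can be sketched in a few lines of Python with NumPy. The 30 New England flows are not reproduced in this handout, so the sketch below generates synthetic flows from assumed "true" parameters and then recovers them by ordinary least squares on the log-transformed variables. The variable names, the random data, and the choice of np.linalg.lstsq are all assumptions made for illustration; any regression package would do the same job.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption): 30 origin-destination pairs.
n = 30
P_i = rng.uniform(5e5, 6e6, n)    # origin populations
P_j = rng.uniform(5e5, 6e6, n)    # destination populations
d_ij = rng.uniform(50.0, 400.0, n)  # distances between centroids

# Assumed "true" parameters used only to generate the synthetic flows.
true_a0, true_a1, true_a2, true_b = -4.0, 0.9, 0.6, 0.75
noise = rng.normal(0.0, 0.02, n)
log_m = (true_a0 + true_a1 * np.log10(P_i) + true_a2 * np.log10(P_j)
         - true_b * np.log10(d_ij) + noise)

# Design matrix: intercept column plus the three logged predictors.
X = np.column_stack([np.ones(n), np.log10(P_i), np.log10(P_j), np.log10(d_ij)])

# Least squares picks the coefficients that minimize the sum of squared
# deviations between fitted and observed log flows.
coef, *_ = np.linalg.lstsq(X, log_m, rcond=None)
a0, a1, a2 = coef[0], coef[1], coef[2]
b = -coef[3]  # the model is written with "- b log d_ij"

residuals = log_m - X @ coef
print(f"a0={a0:.3f}  a1={a1:.3f}  a2={a2:.3f}  b={b:.3f}")
print(f"sum of squared errors (log scale): {np.sum(residuals ** 2):.5f}")
```

Because the design matrix enters log d_ij with a plus sign, the estimate of b is the negative of the fitted distance coefficient.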

The a₀ regression parameter must then be exponentiated to transform the equation back into its original, multiplicative form:
k = 10^a₀.

Our fitted gravity model is thus M_ij = (0.000120503) P_i^0.940 P_j^0.570 / d_ij^0.746
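As a quick check on the back-transformation, the Python sketch below recomputes k from the reported a₀ and evaluates the fitted model for one origin-destination pair. The populations and distance passed in are hypothetical, and the function name predicted_flow is just a label chosen here. Inputs should be in whatever units were used when fitting, which the excerpt does not state.

```python
import math

# Fitted parameters reported in the text.
a0, a1, a2, b = -3.919, 0.940, 0.570, 0.746

# Undo the base-10 logarithm on the intercept: k = 10^a0 (about 0.0001205).
k = 10 ** a0

def predicted_flow(p_i, p_j, d_ij):
    """Multiplicative gravity model: k * P_i^a1 * P_j^a2 / d_ij^b."""
    return k * p_i ** a1 * p_j ** a2 / d_ij ** b

# Hypothetical inputs, chosen only to exercise the function.
p_i, p_j, d_ij = 1_000_000, 3_000_000, 100.0
flow = predicted_flow(p_i, p_j, d_ij)

# The multiplicative and log-linear forms should give the same answer.
log_flow = a0 + a1 * math.log10(p_i) + a2 * math.log10(p_j) - b * math.log10(d_ij)

print(f"k = {k:.9f}")
print(f"predicted flow = {flow:.1f}")
print(math.isclose(math.log10(flow), log_flow))
```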

[One then uses this model to estimate the "expected" migration flows. One can compare these flows to the actual flows to understand what other factors might be influencing the migration patterns. One can use these parameters to estimate flows in the near future, given projections of future populations. One can compare these parameters to parameters estimated from data on other U.S. regions or other times in history, to understand how migration propensities and the friction of distance are different in different regions or at different times.]
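The first of these uses, comparing expected with actual flows, can be sketched in Python as follows. The two records are invented placeholders (the labels, populations, distance, and observed counts are not real data), and the helper names are hypothetical. A positive log residual marks a flow larger than the model expects from population size and distance alone.

```python
import math

# Fitted parameters reported in the text.
A0, A1, A2, B = -3.919, 0.940, 0.570, 0.746

def expected_flow(p_i, p_j, d_ij):
    """Model-predicted flow from the fitted log-linear equation."""
    log_m = A0 + A1 * math.log10(p_i) + A2 * math.log10(p_j) - B * math.log10(d_ij)
    return 10 ** log_m

def log_residual(observed, expected):
    """Difference between observed and expected flows on the log scale."""
    return math.log10(observed) - math.log10(expected)

# Placeholder records: (label, origin pop., destination pop., distance, observed flow).
records = [
    ("A to B", 1_000_000, 3_000_000, 100.0, 9_000),
    ("B to A", 3_000_000, 1_000_000, 100.0, 4_000),
]

for label, p_i, p_j, d_ij, observed in records:
    expected = expected_flow(p_i, p_j, d_ij)
    print(f"{label}: expected {expected:,.0f}, observed {observed:,}, "
          f"log residual {log_residual(observed, expected):+.3f}")
```

Since the fitted origin exponent (0.940) exceeds the destination exponent (0.570), the model expects a larger flow out of the bigger place than into it, even over the same distance.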