University of Washington, Geography 367, notes on network analysis

University of Washington
Department of Geography
Professor Harrington
Network Analysis

Contents:

Urban transportation planning models

Network analysis

Routing algorithms

These notes provide an overview of some basic analytic approaches for
- analyzing aggregate transportation demand (e.g., in the form that transportation planners must do),
- assessing the adequacy of transportation systems, and
- analyzing logistics for individual carriers.
Each approach has tremendous demands for geographic information, and each approach is eased by using GIS. In fact, it is possible to use pre-packaged programs and algorithms that use these approaches, and yet make them totally invisible to the user. These notes will help you look inside the black box of transportation analysis, to be able to use the results more intelligently.

URBAN TRANSPORTATION PLANNING MODELS

Component One: Trip generation
Transportation activity is viewed as derived demand:

The number of trips from Oi to D_j is a function of the number of people or activities in O and the number of activities or people in D.
Each residential zone O is classified according to rate of auto ownership, number of people per household, LFPR, etc.

The total number of trips from each O can be estimated or forecast based on some set of existing data, using regression analysis.

Component Two: Trip distribution
The total number of trips to each D can be estimated or forecast based on some set of existing data, using regression analysis.

Each activity zone D is specified according to the type and density of activities (shopping, working, school, recreation).
For example, each D could be characterized as its total retail square footage multiplied by its total number of workers.

Each OD flow can be estimated or forecast based on some set of existing data, using regression analysis applied to a gravity model: Iij = k OiDjdij-a

Why would you multiply the size of Oiand the size of D_j, to understand the number of trips between them?
Why would you divide this by the distance between Oi and D_j, to understand the number of trips between them?
Why would we need a constant, k, to understand the number of trips between Oi and D_j?
Why would distance be increased by an exponent, a, to understand the number of trips between Oi and D_j?

The model above would be specified by applying actual data for I, O, D, and d for a set of places to a statistical model that looks like: Ln Iij = ln k + ln Oi + ln D_j-a ln d_ij This would give you values for k and a, which you could apply in the first formula to other places O and D.

Component Three: Modal split
The proportion of OiDj trips using private automobiles is seen to be a function of:

the ratio of time required for the auto trip versus a public-transit mode;
the trip purpose (journeys to work and journeys to school are more likely to use public transit than journeys to shop: why?)
the trip origin and destination (how accessible by public transit?)
the trip time of day.

The actual functions are estimated from some existing data (e.g., sample surveys) and applied to other circumstances (e.g., all zones and trips modeled for the urban area).

Component Four: Route assignment
What proportion of the auto or truck trips from Oito D_j will use each of the possible road routes between the two zones? What proportion of the transit trips from Oi to Dj will use each of the possible transit routes between the two zones?

This is seen to be a function of travel time; most trips are assumed to take the shortest time path.

To increase accuracy, capacity constraints are placed on each route, and when the simulation model reaches capacity on any particular route, the remaining mode-specific trips are assigned to other routes between specific zones.

To understand route assignment better, we need to review network analysis.

NETWORK ANALYSIS

Network: a number of people or places among which there are one-on-one interactions – friendships, airline routes, telephone calls, automobile trips, roads

How can we systematically select the shortest distance routes between Oi and D_j? (Note that for transportation-system modeling or for an individual's decision making, we need to know alternative routes, ordered according to distance).

First, we must abstract the transport system into a graph or matrix.
The network graph illustrates the network in schematic fashion, with each possible O and D presented as a point (“node” or “vertex”) and each possible direct link between an O and a D presented as a straight line segment (“linkage” or “edge”).
(See page 217 figure in Chou; Figure 9.1 in Taaffe, Gauthier, and O'Kelly [TGO])

The connectivity matrix abstracts the network as a table, with each possible node (vertex) presented as a row (an origin) and as a column (a destination). Each possible direct link between an O and a D is presented as a “1” in the appropriate cell; if there is no direct OiDj link, a “0” appears in the appropriate cell. Thus, network graphs or matrices are two formats for topological data (TOPOLOGY: connections and distances among spatial data elements). Within GIS, including topology implies providing the GIS with a spatial data matrix of explicit connections among points, adjacencies among areas, etc.
(See TGO’s Figure 9.8; see this link)

For a simple system, the matrix tells you nothing you can’t see in the graph. The matrix is necessary for three reasons:

Matrix manipulation for network analysis: connectivity
If we multiply the matrix above by itself (see note 3), yielding C² (or a squared connectivity matrix), we have a new matrix that tells us the number of two-edge routes from each O to each D.
(Compare Chou's figures on pages 223-5; See TGO’s Figure 9.11, showing how to multiply a matrix).

If we multiply C² by C¹ to yield C³ , we have the number of three-link connections between each O and D.
(See Chou's page 226; See TGO's Figure 9.12)

If we add the entries, cell by cell, in C¹ , C², C³, and C⁴ , we get a matrix of the total number of possible ways to get from each O to each D. This is generally called the T-matrix, for total accessibility.
(See Chou's page 227 or TGO's Figure 9.13; note the row totals, which are measures of relative accessibility of each origin).

In some cases, when actual distance is not as very meaningful as the number of connections (e.g., airline travel, from the perspective of the traveler; or connections within an integrated circuit; or connections among linked computers on the Internet), these topological distances are all we need.

However, for walking or driving routes within a city, we care about the total distance or time, not the number of connections.

A simple way to approach this is through our graph, this time adding distance or time values (we call this a valued graph of a network).
An alternative is an L matrix, with time or distance entered in the cells rather than “yes/no” to a connection. In the example below, note that we insert “zero” along the principal diagonal and “infinity” in cells where there is no direct connection.

(See Chou's page 230; TGO’s Figure 9.19).

We can derive a total valued matrix by adding the L matrix to itself, repeatedly until we cover the diameter (maximum distance -- see Note 1) of the network. This tells us the minimum distance (in time or ground distance) between each O and each D: a very important piece of information.
(See TGO’s Figures 9.20 and 9.21; pages 220-2 in Chou)

These measures are useful in several ways.

1. They provide us with the d_ij we need for any kind of transportation modeling.
2. They show us the minimum distance between any O and D, expressed in number of links, in time, or in ground distance.
3. They allow us to understand how an additional edge (link) or the removal of an edge will affect the accessibility of a node and the connectivity of the network.

ROUTING ALGORITHMS

What we’ve done so far, however, does not tell us exactly what set of routes we need to follow to get from O to D most efficiently. We need to develop a routing algorithm. This algorithm (see note 2) will allow us (or a computer) to use the matrix of direct links between nodes to establish and record exactly which links are to be used to get from Oi to D_j.

(See Chou's page 235; TGO Figure 13.7; J.P. Rodrigue, Transportation Geography on the Web)
We will build a new matrix, R⁴ . Each cell entry (OiDj) of R⁴ tells us the first node that we have to go through to get from Oi to D_j.

If the cell entry is indeed D_j, then we know we have a direct link (a “nonstop flight” as it were) between i and j.
If the cell entry is not D_j (let’s say the entry in cell O₁D₆ is 2), then we have to go to the matrix to find how we get from O₂ to D₆.
The cell entry in O₂D₆ is not 6; it’s 3. Thus we know we have to go to the matrix to find how we can get from O₃ to D₆.
The cell entry in O₃D₆ is not 6; it’s 5. Thus we know we have to go to the matrix to find how we can get from O₅ to D₆.
The cell entry in O₅D₆ is 5. Thus we know we have a direct link at last.

We ask the computer to keep track of the links we’ve traced and the distances involved, and it can return to us the shortest route and its total (time or mileage) distance.
(See TGO Figure 13.8)

How can a GIS make use of this insight to construct minimum-distance routes?

The network of possible routes is entered into the GIS as a topological matrix: what nodes link directly to what nodes.

The distances don’t have to be explicitly entered, because the GIS has the actual location of each node; it can calculate the distance along each direct link.

1. It can take the approach outlined above:

Identify whether the origin and destination share a direct link, and if so, assigning the route to that link.
If the origin and destination don’t share a direct link, then identify the first link along the route, store that, then identify the link from the second node to the desired destination, and on and on.

2. It can take the approach outlined in Chou, pages 230-235 -- note that this approach requires that the network we're using is not the entire network, but the "minimum spanning tree" (pp. 230-233) that starts at the origin point of the route we want to optimize.

Identify the shortest link from the origin.
Is this the desired destination? If yes, record the route. If no, use the shortest link from this intermediate node.
Is this the desired destination? If yes, record the route. If no, use the shortest link from this intermediate node.
And on, until we’ve arrived at the desired destination.
This minimum branching tree algorithm is not guaranteed to find the closest total route link.

In either case, the GIS can link multiple destinations, to establish the minimum distance for a delivery route, such as we’ll be doing in the first case. We’ll come up with a set of customers, and develop a route among them.

Read Chou, pages 236-248, for a presentation of such a routing problem. The essential logic (which is all that you need remember) is that each iteration identifies the link (or edge) between two nodes that will save the most resources out of all the remaining options. Each time a specific link is identified in an iteration, that (direction-specific) pair of nodes is removed from the options for the next iteration -- because we're assuming we don't want to go from (e.g.) E to F twice.

We’ll then use our information about our customers, and our spatial information about this route, to help us identify similar businesses that are near are route. Perhaps they might be potential customers? If so, we could serve them without much additional cost. (See notes on targeting).

Next: how to determine potential customers from the characteristics of current customers.

NOTE:

1. Chou (p. 221) defines diameter as “the maximum number of steps required to move from any node to any other node through the shortest possible routes within a connected network.” This is perhaps more explicit than, but the same implication as Taaffe, Gauthier, and O’Kelly (p. 252) definition of diameter as “the number of linkages needed to connect the two most remote nodes on the network.”

2. "An algorithm is a set of mathematical expressions or logical rules that can be followed repeatedly to find the solution to a question" [Chou, p. 234]

3. Each cell (x,y) in a new matrix (AB) which is the product of two other matrices (A and B), is the sum of:

the product of the first cell in the Xth row of the matrix A times the first cell in the Yth column of matrix B, plus
the product of the second cell in the Xth row of matrix A times the second cell in the Yth column of matrix B, plus
the product of the third cell in the Xth row of matrix A times the third cell in the Yth column of matrix B;
and on, until we've exhausted the length of the Xth row in matrix A and the length of the Yth row in matrix B.

Note that the rows of matrix A must have the same length as the columns of matrix B. In this simple network analysis, we're multiplying a square matrix by itself, so that's not a problem.
In a connectivity matrix, cell (3,1) is 0 if there's no direct link between 3 and 1; 1 if there is. Cell (1,4) is 0 if there's no direct link between 1 and 4; 1 if there is. What does the product of cell (3,1) and cell (1,4) tell us? Why would we be interested in adding this product to the product of (3,2)(2,4), to the product of (3,3)(3,4), to the product of (3,4)(4,4), to the product of (3,5)(5,4)?

copyright James W. Harrington, Jr.
revised 28 January 2004