Addressing management questions using the analysis of multivariate population data

Collaborators: Eric Ward, NOAA/NWFSC; Mark Scheuerell, NOAA/NWFSC

Status (5/2013): on-going (scroll down for a list of past and current NOAA applications)

One of the fundamental questions to be asked as a first step in a recovery planning effort or status evaluation is: How well are populations doing, and what are they likely to do in the near future? The answer can determine whether a population's status needs improvement and how likely recovery goals are to be met. It can also help inform harvest levels for commercially valuable species and serve as a tool for prioritizing future research. While quantifying a population's risk seems simple, it is an enormously difficult problem in fisheries because our data are much more poorly behaved than data in fields such as finance, engineering, or the social sciences. Many fish biologists are familiar with John Shepherd's quote "Counting fish is like counting trees, except they're invisible and they move". With large populations that are poorly sampled, our counts are always uncertain and our data are typically 'gappy': depending on survey designs or research funding, entire years may be missed. Natural populations are also affected by external forcing: large environmental drivers (e.g. sea surface temperature) may have positive or negative impacts on population abundance, and disentangling these effects from biased counts is difficult. Similar drivers may affect the spatial distribution of populations, which can also bias abundance estimates if it goes unnoticed. Accounting for this observation error and these environmental drivers when analyzing fisheries data is a difficult statistical problem.
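
To make the observation-error problem concrete, here is a small simulation, a sketch in which the variances, starting abundance, and series length are arbitrary assumptions. The true log abundance follows a random walk with process variance q, and each count is observed with independent log-scale observation error of variance r. Because each observed change is y_t - y_{t-1} = w_t + v_t - v_{t-1}, attributing all variability to process error estimates q + 2r rather than q.

```python
import numpy as np

def simulate_counts(n_years, q, r, x0=np.log(1000.0), seed=1):
    """Simulate log abundance as a random walk (process variance q) observed
    with independent observation error (variance r).
    Returns (true log states, observed log counts)."""
    rng = np.random.default_rng(seed)
    process_errors = rng.normal(0.0, np.sqrt(q), size=n_years)
    states = x0 + np.cumsum(process_errors)                           # true log abundance
    observations = states + rng.normal(0.0, np.sqrt(r), size=n_years)  # noisy counts
    return states, observations

# Hypothetical process and observation variances.
q, r = 0.02, 0.05
states, obs = simulate_counts(20000, q, r)

# Process-error-only estimate: variance of year-to-year changes in the observed log counts.
naive_q = np.var(np.diff(obs), ddof=1)
print(f"true q = {q}, naive estimate = {naive_q:.3f}, q + 2r = {q + 2 * r}")
```

With these settings the naive estimate lands near q + 2r, several times the true process variance, which is the kind of misattribution a state-space model is designed to avoid.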

The types of models used to analyze noisy population data generally fall into the class called "multivariate autoregressive state-space" (MARSS) time-series models: we often have multiple data streams ("multivariate"), we try to attribute some of the total variance to observation error and some to process variance ("state-space"), and the current population size carries information about the population size in the immediate future ("autoregressive"). Many fisheries models, such as stock assessments or hierarchical models, can be cast in MARSS form. Fitting these models is not straightforward, however, and specialized statistical algorithms are required to fit them to noisy and gappy fisheries data. In 1982, an "Expectation-Maximization" (EM) algorithm was developed for this class of model (Shumway and Stoffer 1982). This algorithm is especially useful for analyzing fisheries data because it is robust when working with multivariate data with many missing values. But prior to the work by Holmes and Ward, this algorithm was not available in any statistical package, and thus it was out of reach for many fisheries biologists.
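
For reference, the basic MARSS model can be written as below, using the letters of the MARSS package's own model specification (covariate terms omitted). Here x_t is the hidden state (for example, log abundances), y_t holds the observed data streams, B and u describe the autoregressive process with process variance Q, and Z and a map states to observations with observation-error variance R:

$$
\begin{aligned}
\mathbf{x}_t &= \mathbf{B}\,\mathbf{x}_{t-1} + \mathbf{u} + \mathbf{w}_t, & \mathbf{w}_t &\sim \mathrm{MVN}(\mathbf{0}, \mathbf{Q}) \\
\mathbf{y}_t &= \mathbf{Z}\,\mathbf{x}_t + \mathbf{a} + \mathbf{v}_t, & \mathbf{v}_t &\sim \mathrm{MVN}(\mathbf{0}, \mathbf{R})
\end{aligned}
$$

The "autoregressive" part is the first line, the "state-space" split of variance is Q versus R, and "multivariate" means x_t and y_t are vectors.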

Over the last 8 years, we have worked to develop faster, more robust algorithms for fitting MARSS models and to make these algorithms widely available in free software (R). In 2004, we started working on univariate time-series models for data with high observation error and began distributing software in workshops taught at the Ecological Society meetings and meetings of the Marine Mammal Society. In 2005-2006, we (with Steven Viscido) helped develop a software package for multivariate time-series models (LAMBDA), but this package did not incorporate observation error. In 2008, we received funding from CAMEO-NOAA to support development of an R package for fitting multivariate time-series models with observation error, along with extensive documentation and training. The goal was a package that fits very general MARSS models using a constrained EM algorithm (Holmes 2012). This algorithm allowed us to fit a much broader class of MARSS models than the original 1982 Shumway and Stoffer algorithm. The package is targeted toward scientists in all fields that use MARSS models (economics, finance, engineering, image analysis, social sciences, and environmental sciences). The first version of the MARSS package was submitted to the CRAN repository in June 2010 and continues to be updated (the current version is 3.4). Based on download statistics, it is the most commonly accessed R package for fitting these models via maximum likelihood (dlm is the other major package for these models but is Bayesian-focused). The MARSS package is used by researchers across many scientific fields. In Winter 2012-2013, we (Holmes, Ward and Scheuerell from NWFSC) taught a course on the analysis of fisheries time-series data at the University of Washington that relied heavily on the MARSS package.

This work has resulted in numerous publications (complete list below) and has provided advice to managers for species ranging from rockfish to salmon to killer whales. It has also demonstrated the utility of MARSS models for multiple types of data: in evaluating hypotheses about synchrony between killer whale populations, we used raw counts; in evaluating the metapopulation structure of Chinook salmon on the Columbia River, we used genetic (allele frequency) data; in evaluating rockfish status in Puget Sound, we used catch per unit effort (CPUE) data; and in examining linkages between salmon productivity and environmental drivers, we used time series of recruits per spawner (R/S). We are currently consulting on a number of MARSS-related applications at the Northwest Fisheries Science Center and other NOAA offices (the Southwest, Southeast, and Northeast Fisheries Science Centers). Below we list most of the current and past NOAA applications. Non-NOAA applications are also increasing; recent examples include using the MARSS package to assess the conservation status of terrestrial mammals in the Serengeti and of willow flycatchers in Arizona.

Applications of MARSS modeling to assist fisheries management decisions

Applications: Inferring spatial structure, and evaluating hypotheses about causal mechanisms

Applications: Analysis of community structure

Development of statistical tools for multivariate stochastic time series data

Collaborators: Eric Ward, NWFSC; Brian Dennis, Idaho State Univ; Mark Scheuerell, NWFSC

Status (5/2013): on-going

Much of my time since mid-2008 has been devoted to this project, which has led to the MARSS R package. The motivation was to develop methods and model-selection algorithms for messy multivariate ecological time-series data. Economists have been working with these kinds of models (they call them VAR models) for 30+ years, but their algorithms don't always work for ecological data, since we have lots of missing values and short time series. So we need different types of algorithms, and different types of inference, since model uncertainty is a big issue.
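
One reason state-space methods cope with gappy series is that the Kalman filter, which the E-step of the EM algorithm builds on, simply skips the update in years with no data. Below is a minimal univariate sketch under assumed settings: a random walk with drift u, process variance q, observation variance r, and missing years coded as NaN. The function name and example numbers are hypothetical, and this is not the MARSS package's implementation.

```python
import math
import numpy as np

def kalman_filter_rw(y, u, q, r, x0, p0):
    """Kalman filter for a univariate random walk with drift observed with error:
        x_t = x_{t-1} + u + w_t,  w_t ~ N(0, q)
        y_t = x_t + v_t,          v_t ~ N(0, r)
    (x0, p0) is the prior mean and variance of the state before the first year.
    Missing observations (NaN) add nothing to the log-likelihood; the one-step
    prediction is carried forward unchanged.
    Returns (filtered state means, filtered variances, log-likelihood)."""
    x_filt, p_filt = x0, p0
    means, variances = [], []
    loglik = 0.0
    for obs in y:
        # Predict one step ahead.
        x_pred = x_filt + u
        p_pred = p_filt + q
        if np.isnan(obs):
            # No data this year: keep the prediction as the filtered estimate.
            x_filt, p_filt = x_pred, p_pred
        else:
            innovation = obs - x_pred
            f = p_pred + r                      # innovation variance
            k = p_pred / f                      # Kalman gain
            x_filt = x_pred + k * innovation
            p_filt = (1.0 - k) * p_pred
            loglik += -0.5 * (math.log(2.0 * math.pi * f) + innovation**2 / f)
        means.append(x_filt)
        variances.append(p_filt)
    return np.array(means), np.array(variances), loglik

# Hypothetical log counts with two missing survey years.
y = np.log([820, 760, np.nan, 700, np.nan, 610, 640])
means, variances, ll = kalman_filter_rw(y, u=-0.03, q=0.01, r=0.02, x0=y[0], p0=0.1)
```

Maximizing this log-likelihood over u, q and r uses every available year without interpolating the gaps; note how the filtered variance grows through each missing year and shrinks again when data resume.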

The MARSS package is just a small piece of this work, since the package lags about 1-2 years behind our research. Some of our other recent work in this area involves the following:

Forecasting population extinction and trends using time-series modeling and diffusion approximations

Collaborators: Eric Ward, NWFSC; John Sabo, ASU; Bill Fagan, UMaryland

Status (5/2013): on-going

Real population processes are stochastic, so any analysis of population data must deal with this characteristic in some fashion. The pre-2000 approach was to interpret population data via models attributing the variability in the data to either measurement error or process error alone. However, population data almost always contain multiple sources of variability: process error, measurement error, non-linear feedbacks, etc. Misattributing the sources of variability has multiple consequences, ranging from misestimation of the population behavior to misestimation of the level of uncertainty associated with the analysis. My research is focused on developing practical approaches for dealing with noisy data. The motivation behind this work is to understand how complex a model we need to predict passage probabilities (i.e. quasi-extinction). It has been known for some time that a simple random walk will often fit observed population time series just as well as a more complex model. Some days, I think about this as meaning that stochastic population processes can be written as a linear component (the random walk) with non-linear components of higher and higher order added (like a Taylor series); the random walk (perhaps with drift) represents the first-order term. Other days, I think about it in terms of dominant and sub-dominant eigenvalues. My research focuses on estimating that random walk component and then testing, with large databases, the idea that this random walk can describe the bulk of the quasi-extinction behavior.
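
The random-walk-with-drift view leads to a closed-form diffusion approximation for passage probabilities. The sketch below assumes log abundance behaves like Brownian motion with drift mu and variance sigma2 per year, and computes the probability of falling a log distance x_d below the current level within T years, using the standard first-passage (inverse Gaussian) formula. The function names and parameter values are hypothetical. The naive estimator included treats all variability as process error, so with noisy counts sigma2 would need an observation-error-corrected estimate instead.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def passage_probability(mu, sigma2, x_d, T):
    """Probability that Brownian motion with drift mu and variance sigma2 per
    year, started at 0, reaches -x_d (x_d > 0) at some time within T years.
    Note: exp(-2 mu x_d / sigma2) can overflow for strongly negative mu and tiny sigma2."""
    s = math.sqrt(sigma2 * T)
    term1 = normal_cdf((-x_d - mu * T) / s)
    term2 = math.exp(-2.0 * mu * x_d / sigma2) * normal_cdf((-x_d + mu * T) / s)
    return term1 + term2

def naive_drift_and_variance(counts):
    """Process-error-only estimates from a complete count series:
    mean and sample variance of the annual log growth rates."""
    growth = np.diff(np.log(counts))
    return growth.mean(), growth.var(ddof=1)

# Hypothetical example: probability of a 50% decline (x_d = ln 2) within 50 years.
p = passage_probability(mu=-0.02, sigma2=0.01, x_d=math.log(2.0), T=50)
```

With zero drift the formula reduces to the reflection-principle result 2Φ(-x_d/√(σ²T)), and with positive drift it approaches exp(-2μx_d/σ²) as T grows, which is a quick sanity check on any estimates of mu and sigma2.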



Analysis of demographic changes in large charismatic marine mammals

Collaborators: Lowell Fritz, NMML; Anne York, York Data Analysis; Eric Ward, NWFSC; Ken Balcomb, CWR

Status (5/2013): on-going depending on what's happening with recovery...

From the mid-1970s through 2000, the western stock of Steller sea lion (Eumetopias jubatus) declined by over 80%. This fish- and squid-eating predator, the largest eared seal (Otariidae), is distributed across the North Pacific Ocean. The western stock breeds on rookeries west of 144°W in Alaska and Russia and the eastern stock breeds to the east and south to the Channel Islands off California. In 1997, the western stock of Steller sea lion was listed as endangered under the U.S. Endangered Species Act, which created new challenges for managers of Alaska’s groundfish fishery, the most productive in the United States. Since 2000, over $120 million, the largest budget for a U.S. endangered species, has been devoted to reducing uncertainty about the factors negatively affecting the population: food limitation, killer whale predation, disease, and direct or indirect impacts from fishing. But despite well-funded and large-scale coordinated research, the complexity, indirectness and cumulative effects of these factors have made it difficult to determine which were responsible for the decline and which are primary threats to recovery. This project is focused on using population models combined with data on the numbers and age distribution of Steller sea lions in the central Gulf of Alaska to estimate the historical changes in survivorship and fecundity that drove the decline.
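
As an illustration of how changes in survivorship and fecundity translate into population growth or decline, the sketch below builds a simple age-structured (Leslie matrix) projection model and reports its dominant eigenvalue, the asymptotic annual growth rate. The number of age classes and all vital rates are hypothetical placeholders, not Steller sea lion estimates. The project itself fits population models to count and age-distribution data rather than fixing rates by hand.

```python
import numpy as np

def leslie_matrix(fecundity, survival):
    """Leslie matrix: age-specific fecundities on the first row and survival
    from age class i to i+1 on the subdiagonal."""
    n = len(fecundity)
    if len(survival) != n - 1:
        raise ValueError("need one fewer survival rate than age classes")
    L = np.zeros((n, n))
    L[0, :] = fecundity
    L[np.arange(1, n), np.arange(n - 1)] = survival
    return L

def growth_rate(L):
    """Dominant eigenvalue (spectral radius) of a projection matrix."""
    return max(abs(np.linalg.eigvals(L)))

# Hypothetical vital rates for five age classes.
fec = [0.0, 0.0, 0.2, 0.3, 0.3]
surv = [0.8, 0.85, 0.9, 0.9]
baseline = growth_rate(leslie_matrix(fec, surv))

# The same model with first-year survival reduced by 20%.
reduced_juv = growth_rate(leslie_matrix(fec, [0.8 * 0.8] + surv[1:]))
```

Fitting such a model to observed counts and age structure amounts to asking which time-varying survival and fecundity values make the projected numbers match the data.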



Estimation of community models using time-series data

Collaborators: Steven Viscido, Eric Ward, Mark Scheuerell, Stephanie Hampton, Steve Katz, Kevin See

Status (5/2013): finishing up the last submissions

Current ecosystem models such as EcoSim build a model of the strengths of species' interactions within a community primarily from diet information, generally combined with an assumed linear or non-linear function describing how diet changes with the density of individuals. This approach views the community interactions as deterministic and the data (such as diet data and population sizes) as observations, with error, of this deterministic process. Another approach views the community dynamics as stochastic and the data as one possible realization of this stochastic process. This approach was recently proposed by Ives et al. (2003). In particular, it uses time series of population estimates for the species within the community to statistically estimate a community model. Ives et al. use a particular type of stochastic process: a first-order multivariate (or vector) autoregressive process, abbreviated MAR(1). The first-order assumption implies that enough information can be obtained about a community at a single point in time to predict the immediate changes in species' abundances. MAR(1) processes assume that the interactions among species, and between species and environmental variables, are linear (at least after suitable transformation). Previous research (Ives 1995a, b) has demonstrated that MAR(1) models provide relatively simple approximations to nonlinear, non-first-order processes and can therefore be used to describe the general stochastic properties of complex communities. The advantage of this approach is that it provides a statistical framework for estimating a community model, and thus for comparing different models that might conceivably have produced the observed data. Diet data can still be used to help constrain the model, but they enter as a constraint or as a prior in the estimation process.
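
A minimal two-species sketch of the MAR(1) idea: simulate log abundances x_t = B x_{t-1} + u + w_t, then recover the interaction matrix B by regressing each species' log abundance on the previous year's log abundances. The interaction strengths, noise level, series length, and function names are assumptions chosen for illustration. Least squares is used here only because it is the simplest estimator for a long, complete series with no observation error, which is exactly the situation fisheries data rarely provide.

```python
import numpy as np

def simulate_mar1(B, u, sd, n_steps, seed=0):
    """Simulate a MAR(1) process x_t = B x_{t-1} + u + w_t with independent
    N(0, sd^2) process errors, started at its stationary mean (I - B)^{-1} u."""
    rng = np.random.default_rng(seed)
    k = len(u)
    x = np.zeros((n_steps, k))
    x[0] = np.linalg.solve(np.eye(k) - B, u)
    for t in range(1, n_steps):
        x[t] = B @ x[t - 1] + u + rng.normal(0.0, sd, size=k)
    return x

def fit_mar1_least_squares(x):
    """Least-squares estimates of B and u from a complete, error-free series."""
    predictors = np.column_stack([x[:-1], np.ones(len(x) - 1)])
    coef, *_ = np.linalg.lstsq(predictors, x[1:], rcond=None)
    B_hat = coef[:-1].T   # row i holds the effects of each species on species i
    u_hat = coef[-1]
    return B_hat, u_hat

# Hypothetical community: species 2 depresses species 1; species 1 benefits species 2.
B = np.array([[0.5, -0.2],
              [0.1, 0.6]])
u = np.array([1.0, 0.5])
x = simulate_mar1(B, u, sd=0.1, n_steps=5000)
B_hat, u_hat = fit_mar1_least_squares(x)

# Stability check: all eigenvalues of the estimated B lie inside the unit circle.
stable = np.abs(np.linalg.eigvals(B_hat)).max() < 1.0
```

Comparing candidate models then means refitting with some entries of B fixed at zero (or constrained by diet information) and comparing fits; with short, gappy, noisy series this is where the state-space form above becomes necessary.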
This approach may also provide a statistically rigorous procedure for estimating community models using the type of data typically available in a fisheries management setting, e.g. stock assessments and regular stock survey data. The risk, of course, is that there simply is not enough information in count data to infer a community model. Then the question is how to use the information available to constrain the problem -- there are different philosophical approaches to that.





Science web tools

Status (5/2013): this project is now finished and is maintained FishBox

Interactive web-based content collaboration and workshop tools: The Iugo-Cafe Project. These are the tools that came out of that project: