The immediate goal of assessing a simulation model is to detect those features of the phenomenon that the proposed model cannot adequately reproduce; the ultimate goal is then to locate the sources of these deficiencies, whether they lie in the optimization search, the model's proposed mathematical structure, the formulation and selection of the objective functions used to quantify the discrepancy between the model's predictions and the observed phenomenon, or the model's underlying process or conceptual structure (see figure below).
The core problem blends modelling, inference, and optimization, raising conceptual challenges for each field. For modelers, the challenge is to acknowledge that models can only be assessed, never validated or verified, and that the assessment is only as good as (is conditional on) the choice of phenomenon features underlying the objectives and their quantification. This leads to an evolutionary spiral of learning about both the model structure and the most informative ways of interrogating it (the criteria, or assessment characteristics, based on the key phenomenon features). For statisticians, the challenge is to shift from inference conditional on the assumed model structure to inference about the model structure, more akin to diagnostics based on goodness of fit and investigation of the model's Bayesian predictive distribution.
If the observations are insufficiently rich to support statistical inference, the problem can be approached from the perspective of multi-criteria optimization. For optimizers (indeed, for everybody), the challenge is then to recognize that the information basis for assessment comes from the tradeoffs revealed by the Pareto frontier, and that simply maximizing the number of satisfied objectives (or constraints) is insufficient.
It is easiest to grasp the core issues of this problem from a familiar perspective:
(Figure: Pareto Optimal Model Assessment Cycle, after Reynolds and Ford, Ecology 1999)
Model assessment is concerned with the Zeroth problem of statistics (Mallows, The American Statistician, 1998): model specification. The goal is to develop a coherent logic of structural inference for settings where the model structure is based on some degree of scientific content (in contrast to being simply a parsimonious empirical summary of observed relationships).
Mechanistic models are widely used in the biological and ecological sciences to define and assess hypothesized processes giving rise to an observed phenomenon. Generally, considerable uncertainty is associated with the selection of both the model components and their mathematical representations. Even when the primary focus is prediction rather than model assessment, the predictive impact of this structural uncertainty often greatly outweighs that of the parameter uncertainty (see Wood & Thomas, Proceedings of the Royal Society of London B, 1999, for a nice demonstration).
Thus the initial inference task when developing such models is structural inference or model assessment rather than parameter inference (Mallows 1998, Reynolds & Ford 1999). From a statistical perspective, the issue is whether the (joint) predictive distribution derived from the proposed model structure is coherent with the observed (joint) distribution.
Note that this refers to the predictive distribution of the chosen characteristics of interest, which will commonly only be implicitly defined by the model structure (Diggle & Gratton 1984). This can generate a conflict if traditional inference methods are used uncritically: standard methods start by assuming this coherency via conditioning on the model, thus identifying the sample's sufficient statistics. Yet the sufficient statistics often are not the assessment characteristics of interest. If this distinction is not recognized, standard model fitting followed by goodness-of-fit assessment may either fail to detect deficiencies or improperly identify deficiencies that really stem from the contrasting information content of the sufficient statistics versus the assessment characteristics of interest (Reynolds & Golinelli 2004). In a sense, this reverses the sequence of traditional statistical inference methods.
When only limited observations are available on the characteristics of interest, their joint distribution cannot be adequately estimated, undermining standard statistical approaches for estimating and assessing the coherency of the model's posterior predictive distribution. Alternatively, the coherency problem can be seen as a multi-criteria optimization problem: are there any parameterizations that allow the proposed model structure to simultaneously and adequately reproduce all of these characteristics? In other words, what is the model's Pareto frontier?
Model assessment investigates the Pareto frontier of the multi-objective minimization problem defined by the trio (Reynolds and Ford, Ecology, 1999):
M() | the simulation model, producing multivariate output
X ⊆ ℜm | the feasible parameter space
F = (F1(M()), ..., Fn(M())): ℜm ⇒ ℜn | vector of objective functions measuring n distinct features of model performance
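To make the trio concrete, here is a minimal sketch (not taken from the cited papers or from WHORL) encoding a hypothetical two-parameter growth model M(), a box-constrained feasible space X, and two assumed objective functions F1 and F2; the model form, bounds, and target values are invented purely for illustration.

```python
import numpy as np

# Minimal sketch (assumptions, not the authors' code): the assessment trio
# for a hypothetical two-parameter logistic-style model M(), a box-constrained
# feasible space X, and two illustrative objective functions.

def M(theta):
    """Hypothetical simulation model: returns multivariate output
    (here, a 50-step growth trajectory) for parameter vector theta."""
    growth, capacity = theta
    t = np.arange(50)
    return capacity / (1.0 + np.exp(-growth * (t - 25)))

# Feasible parameter space X (assumed bounds, purely for illustration).
X_BOUNDS = [(0.05, 1.0),    # growth rate
            (10.0, 200.0)]  # carrying capacity

def F(theta, target_final=120.0, target_halftime=24):
    """Vector of objective functions F = (F1, F2): discrepancies between
    model predictions and two assumed target features of the phenomenon."""
    y = M(theta)
    f1 = abs(y[-1] - target_final)  # mismatch in final size
    f2 = abs(int(np.argmin(np.abs(y - y[-1] / 2.0))) - target_halftime)  # mismatch in timing
    return np.array([f1, f2])
```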
Definition: Pareto Optimality
Parameterization X dominates Y (X >Pareto Y) ⇔
∀ i, 1 ≤ i ≤ n, Fi(M(X)) ≤ Fi(M(Y)), and
∃ i, 1 ≤ i ≤ n, such that Fi(M(X)) < Fi(M(Y)).
X is non-dominant to Y (X || Y) ⇔
∃ i, j, i ≠ j, such that Fi(M(X)) < Fi(M(Y)) and Fj(M(Y)) < Fj(M(X)).
The Pareto optimal set, PF(X) ⊆ X, is the set of all non-dominated solutions with respect to the vector of objective functions F, i.e., the set of solutions that are mutually non-dominated and are not dominated by any other Y in the search range.
The Pareto frontier, FF ⊆ ℜn, is the associated set of their objective vectors, FF(X) = {(F1(M(X)), ..., Fn(M(X))) | X ∈ PF(X)}.
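The dominance relation and the non-dominated filtering translate directly into code. The sketch below (reusing the hypothetical M, F, and X_BOUNDS from the earlier sketch) extracts the Pareto optimal set and frontier from a random sample of parameterizations by brute-force pairwise comparison; a practical assessment would typically use an evolutionary multi-objective search instead.

```python
import numpy as np

def dominates(fx, fy):
    """X >Pareto Y: Fi(M(X)) <= Fi(M(Y)) for all i, with strict
    inequality for at least one i (all objectives are minimized)."""
    return bool(np.all(fx <= fy) and np.any(fx < fy))

def pareto_optimal_set(params, objectives):
    """Brute-force filter: keep the mutually non-dominated parameterizations
    (the Pareto optimal set PF) and their objective vectors (the frontier FF)."""
    keep = [i for i, fi in enumerate(objectives)
            if not any(dominates(fj, fi)
                       for j, fj in enumerate(objectives) if j != i)]
    return [params[i] for i in keep], [objectives[i] for i in keep]

# Usage sketch: random sampling of the feasible space, then frontier extraction.
rng = np.random.default_rng(0)
thetas = [np.array([rng.uniform(lo, hi) for lo, hi in X_BOUNDS])
          for _ in range(500)]
fvals = [F(th) for th in thetas]
PF, FF = pareto_optimal_set(thetas, fvals)
```

The brute-force filter is quadratic in the number of sampled parameterizations, which is adequate only for small samples; its purpose here is simply to mirror the definition above.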
Broad guidelines for selecting objective functions are given in Reynolds and Ford, Ecology, 1999. Note that a key step of the model assessment cycle is evaluating the adequacy of the chosen assessment criteria. This issue is also briefly addressed in Komuro et al., Ecological Modelling, 2006.
The objective functions may be continuous measures of discrepancy between a model prediction and a target value. Often, however, a lack of quantitative data on the phenomenon limits the discrepancy measure to a simple binary function: does the prediction fall within an acceptable range of values (Horn 1981, Reynolds & Ford 1999)? That is, the objective is an implicitly defined constraint. In model assessment this is simply one end of a spectrum of objective function definitions, a spectrum driven by data availability and by the nature of the feature and of the model (e.g., deterministic or stochastic). Fuzzy objective functions are also possible. In general, a model assessment can use both continuous objectives and constraints, though most optimization software assumes all objectives have the same form: either all continuous or all constraints.
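As a concrete illustration of this spectrum, the sketch below mixes one continuous discrepancy with one binary, range-check constraint. The thresholds and ranges are assumptions chosen only to show the form, and F_mixed, continuous_objective, and binary_objective are hypothetical names (it reuses the hypothetical M from the first sketch), not functions from any cited software.

```python
import numpy as np

def continuous_objective(prediction, target):
    """Continuous discrepancy between a model prediction and a target value."""
    return abs(prediction - target)

def binary_objective(prediction, lower, upper):
    """Constraint-style objective: 0 if the prediction falls within the
    acceptable range, 1 otherwise (a simple pass/fail criterion)."""
    return 0.0 if lower <= prediction <= upper else 1.0

def F_mixed(theta):
    """Illustrative mixed objective vector: one continuous discrepancy and
    one range-check constraint, both computed on features of M(theta)."""
    y = M(theta)  # hypothetical model from the first sketch
    return np.array([
        continuous_objective(y[-1], target=120.0),         # assumed target value
        binary_objective(y.max() - y.min(), 50.0, 150.0),   # assumed acceptable range
    ])
```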
Software for Generating Pareto Frontiers
The assessment of the individual-based model of stand development and canopy competition, WHORL, demonstrates the basic steps of the multi-criteria assessment process. The difficult stage of interpreting the assessment results to gain insight into sources of model deficiencies is further illustrated in Komuro et al., Lecture Notes in Computer Science, Vol. 4403, 2007; and Komuro et al., Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2007, Singapore.