Exact and approximate dynamic programming

My recent work has focused on (i) deriving the structure of optimal policies for analytically tractable stochastic dynamic programs, and (ii) developing mathematical programming- and simulation-based algorithms for approximate solution of large-scale stochastic dynamic programs. Some examples of my work in this area are listed here.

1.   Model predictive control for spatiotemporally adaptive intensity modulated radiation therapy for caner: The standard approach in intensity modulated radiation therapy for cancer is to optimize the radiation intensity profile at the beginning of a multi-week treatment course so as to maximize radiation dose to the tumor while limiting dose to the healthy tissue nearby. We have developed a model predictive control framework that, in contrast to this conventional static-deterministic approach, adapts to the stochastic spatiotemporal evolution of individual patients' tumor-response over the treatment course. This is achieved by using quantitative functional images to learn each patient's tumor-behavior and by incorporating this information into the state of a stochastic control formulation where the objective is to minimize the expected number of tumor cells remaining at the end of the treatment course subject to upper bound constraints on the biologically effective dose delivered to the healthy tissue. The dimension of the resulting state- and action-spaces can number in the thousands, and we have developed an interior point algorithm to solve the large-scale convex programs that arise in our model predictive control approach to this problem. Computer simulations have shown that our method can result in a significant improvement in treatment efficacy as compared to the conventional static-deterministic approach.


2.   Lagrangian relaxation, basis function approximation, and constraint generation for large-scale weakly coupled stochastic dynamic programs: In weakly coupled stochastic dynamic programs, the state- and action-spaces are Cartesian products of finite sets. The transition probabilities are multiplicatively separable and the rewards are additively separable over these sets. Feasible actions are defined by coupling constraints across different components of the action-vector. Exact solution of these problems is intractable owing to the curse-of-dimensionality. Problem decomposition using Lagrangian relaxation is one possible approximation approach that works well in some cases. However, in many realistic weakly coupled problems, the decomposed problems themselves are multidimensional, and hence the relaxed problem is intractable. We have developed a hybrid method based on Lagrangian relaxation, basis function approximation of value functions, and constraint generation to tackle such large-scale weakly coupled stochastic dynamic programs. We applied this methodology to allocation and advanced scheduling problems and observed that the sequence of decisions obtained by our hybrid method significantly outperformed other heuristic policies.


3.   Approximate fictitious play for model-free, and multi-action stochastic dynamic programs: The challenge in model-free stochastic dynamic programs is that the distribution of uncertainty and the expected rewards are not known to the decision maker a priori. Thus the decision maker faces the daunting task of simultaneously learning and optimizing the dynamic system under consideration. We have developed an approximation approach rooted in the game theoretic learning paradigm of fictitious play to solve such model-free problems. The idea is to model the states of the problem as players in a game of common interest, namely, that of optimizing the expected system performance. We have shown that a carefully designed mechanism for repeated interaction among these players enables them to asymptotically achieve their objective. Computational experiments with stochastic inventory control problems have shown that this approach can be effective in practice. In multi-action stochastic dynamic programs, the state-space is relatively small but it is the large dimension of the decision vector that renders exact solution intractable. We have developed an approximate solution method that animates this problem as a game of common interest between players that represent different components of the decision vector. We have shown that, by implementing a fictitious play-type best response dynamics, these players converge to a locally optimal value of the problem. We have applied this algorithm to jointly optimize strategic, tactical and operational decisions in operations management.


4.   Nonstationary, infinite-horizon dynamic programming: Dynamic systems typically studied in Operations Research often do not have a pre-determined time of extinction. Thus, using a finite planning horizon introduces end-of-study effects and results in short-sighted decisions. Therefore, infinite-horizon models of such problems are ubiquitous. However, a limiting assumption in most of this literature is that problem data do not change over time. A key example is stationary stochastic dynamic programs. Relaxing the stationarity assumption is difficult because, while computing an optimal policy in a stationary infinite-horizon dynamic program can be reduced to a finite-dimensional problem through Bellman's equations, a truly nonstationary problem is necessarily infinite-dimensional. These nonstationary infinite-horizon dynamic programs have traditionally been studied using a heuristic, rolling horizon approach. Our work instead attempts to develop finitely implementable algorithms that asymptotically discover optimal policies in these problems.


5.   Optimal design of online, sequential auctions: Small and large retailers, and private and public institutions use online, sequential auctions as a revenue management and inventory clearing tool. Auctioneers need to choose multiple auction design parameters such as the minimum bid and the lot size in each auction so as to maximize their total expected revenue. We have developed stochastic dynamic programming formulations of these auction design problems and have derived the structure of optimal minimum bid and lot size policies.


6.   Dynamic programming for optimal Markov chain sampling: Markov chain samplers used in stochastic search algorithms for optimization are often parameterized by "temperature." This temperature is initially set at a high value and then it is reduced gradually to zero. As a result, the parameterized Markov chain is run in multiple, sequential phases defined by a temperature schedule. Thus, the question is how one should allocate a given computational budget such as a fixed number of iterations to different phases. We formulated this problem as a dynamic program where the objective was to minimize the total variation distance between the Markov chain and the target distribution at the end of all phases. A solution of this dynamic program yields an optimal Markov chain sampling policy. In practice, the dynamic program is solved approximately by utilizing bounds on the total variation distance, which in turn rely on bounds on the second largest eigenvalue of the Markov chainŐs transition matrix.


Markov chain sampling

Another component of my work focuses on the design and analysis of Markov chain sampling algorithms, and on their application to optimization. Some examples of my work in this area are listed here.

1.   Decentralized search with local information, and the small world problem: We have designed a family of Markov chains to perform decentralized search with local information on a disc and on the surface of a sphere. This work extends Jon KleinbergŐs analysis of the discrete small world problem to continuous domains. We have shown that there is a unique Markov chain within this family whose expected hitting time from a source to a destination is polylogarithmic in the relative size of the disc or the sphere. Using this family of Markov chains, we have designed a stochastic search algorithm called adaptive parameterized improving hit-and-run for solving Lipschitz continuous optimization problems. Computational experiments have shown that this algorithm outperforms the traditional improving hit-and-run algorithm on several test problems.


2.   Sampling from integer subsets defined by membership oracles: We have designed a Markov chain sampling algorithm called discrete hit-and-run to sample from probability distributions defined over arbitrary, bounded subsets of integer lattices. The subset may be defined only through a membership oracle. We have shown that the mixing time of this Markov chain is polynomial in the dimension of the lattice for some simple subsets.


3.   Stochastic search for simulation-optimization: I am interested in designing stochastic search algorithms for solving continuous optimization problems where the objective function equals the expected value of a stochastic performance measure that can only be estimated by repeated simulations. Our work in this area has thus far focused on convergence analysis of a class of Markov chain sampling methods for such simulation-optimization problems.