Research
Exact
and approximate dynamic programming
My recent work has focused on (i)
deriving the structure of optimal policies for analytically tractable
stochastic dynamic programs, and (ii) developing mathematical programming- and simulation-based
algorithms for approximate solution of large-scale stochastic dynamic programs.
Some examples of my work in this area are listed here.
1.
Model predictive control for spatiotemporally adaptive intensity
modulated radiation therapy for caner: The
standard approach in intensity modulated radiation therapy for cancer is to
optimize the radiation intensity profile at the beginning of a multi-week
treatment course so as to maximize radiation dose to the tumor while limiting
dose to the healthy tissue nearby. We have developed a model predictive control
framework that, in contrast to this conventional static-deterministic approach,
adapts to the stochastic spatiotemporal evolution of individual patients'
tumor-response over the treatment course. This is achieved by using
quantitative functional images to learn each patient's tumor-behavior and by
incorporating this information into the state of a stochastic control
formulation where the objective is to minimize the expected number of tumor
cells remaining at the end of the treatment course subject to upper bound
constraints on the biologically effective dose delivered to the healthy tissue.
The dimension of the resulting state- and action-spaces can number in the
thousands, and we have developed an interior point algorithm to solve the
large-scale convex programs that arise in our model predictive control approach
to this problem. Computer simulations have shown that our method can result in
a significant improvement in treatment efficacy as compared to the conventional
static-deterministic approach.
2.
Lagrangian relaxation, basis function approximation, and constraint generation
for large-scale weakly coupled stochastic dynamic programs: In weakly coupled stochastic dynamic programs, the
state- and action-spaces are Cartesian products of finite sets. The transition
probabilities are multiplicatively separable and the rewards are additively
separable over these sets. Feasible actions are defined by coupling
constraints across different components of the action-vector. Exact solution of
these problems is intractable owing to the curse-of-dimensionality. Problem
decomposition using Lagrangian relaxation is one
possible approximation approach that works well in some cases. However, in many
realistic weakly coupled problems, the decomposed problems themselves are
multidimensional, and hence the relaxed problem is intractable. We have
developed a hybrid method based on Lagrangian relaxation, basis function approximation of value functions,
and constraint generation to tackle such large-scale weakly coupled stochastic
dynamic programs. We applied this methodology to allocation and advanced
scheduling problems and observed that the sequence of decisions obtained by our
hybrid method significantly outperformed other heuristic policies.
3.
Approximate fictitious play for
model-free, and multi-action stochastic dynamic programs: The challenge in model-free stochastic dynamic programs is
that the distribution of uncertainty and the expected rewards
are not known to the decision maker a priori. Thus the decision
maker faces the daunting task of simultaneously learning and optimizing the
dynamic system under consideration. We have developed an approximation approach
rooted in the game theoretic learning paradigm of fictitious play to solve such
model-free problems. The idea is to model the states of the problem as players
in a game of common interest, namely, that of optimizing the expected system
performance. We have shown that a carefully designed mechanism for repeated interaction
among these players enables them to asymptotically achieve their objective.
Computational experiments with stochastic inventory control problems have shown
that this approach can be effective in practice. In multi-action stochastic
dynamic programs, the state-space is relatively small but it is the large
dimension of the decision vector that renders exact solution intractable. We
have developed an approximate solution method that animates this problem as a
game of common interest between players that represent different components of
the decision vector. We have shown that, by implementing a fictitious play-type
best response dynamics, these players converge to a locally optimal value
of the problem. We have applied this algorithm to jointly optimize strategic,
tactical and operational decisions in operations management.
4.
Nonstationary, infinite-horizon dynamic programming: Dynamic systems typically studied in Operations
Research often do not have a pre-determined time of extinction. Thus, using a
finite planning horizon introduces end-of-study effects and results in short-sighted decisions. Therefore, infinite-horizon
models of such problems are ubiquitous. However, a limiting assumption in most
of this literature is that problem data do not change over time. A key example
is stationary stochastic
dynamic programs. Relaxing the stationarity
assumption is difficult because, while computing an optimal policy in a
stationary infinite-horizon dynamic program can be reduced to a
finite-dimensional problem through Bellman's equations, a truly nonstationary problem is necessarily infinite-dimensional.
These nonstationary infinite-horizon dynamic programs
have traditionally been studied using a heuristic, rolling horizon approach.
Our work instead attempts to develop finitely implementable algorithms that
asymptotically discover optimal policies in these problems.
5.
Optimal design of online,
sequential auctions: Small and large retailers, and
private and public institutions use online, sequential auctions as a revenue
management and inventory clearing tool. Auctioneers need to choose multiple
auction design parameters such as the minimum bid and the lot size in each
auction so as to maximize their total expected revenue. We have developed
stochastic dynamic programming formulations of these auction design problems
and have derived the structure of optimal minimum bid and lot size
policies.
6.
Dynamic programming for optimal
Markov chain sampling: Markov chain samplers used in
stochastic search algorithms for optimization are often parameterized by
"temperature." This temperature is initially set at a high value and
then it is reduced gradually to zero. As a result, the parameterized Markov
chain is run in multiple, sequential phases defined by a temperature schedule.
Thus, the question is how one should allocate a given computational budget
such as a fixed number of iterations to different phases. We formulated this
problem as a dynamic program where the objective was to minimize the total
variation distance between the Markov chain and the target distribution at the
end of all phases. A solution of this dynamic program yields an optimal
Markov chain sampling policy. In practice, the dynamic program is solved
approximately by utilizing bounds on the total variation distance, which in
turn rely on bounds on the second largest eigenvalue of the Markov chainŐs
transition matrix.
Markov
chain sampling
Another component of my work focuses on the design and
analysis of Markov chain sampling algorithms, and on their application to
optimization. Some examples of my work in this area are listed here.
1.
Decentralized search with local information, and the small
world problem: We have designed a family of Markov
chains to perform decentralized search with local information on a disc and on
the surface of a sphere. This work extends Jon KleinbergŐs analysis of the
discrete small world problem to continuous domains. We have shown that there is
a unique Markov chain within this family whose expected hitting time from a
source to a destination is polylogarithmic in the
relative size of the disc or the sphere. Using this family of Markov chains, we
have designed a stochastic search algorithm called adaptive parameterized
improving hit-and-run for solving Lipschitz
continuous optimization problems. Computational experiments have shown that
this algorithm outperforms the traditional improving hit-and-run algorithm on
several test problems.
2.
Sampling from integer subsets defined by membership oracles: We have designed a Markov chain sampling algorithm called
discrete hit-and-run to sample from probability distributions defined over
arbitrary, bounded subsets of integer lattices. The subset may be defined only
through a membership oracle. We have shown that the mixing time of this Markov
chain is polynomial in the dimension of the lattice for some simple subsets.
3.
Stochastic search for simulation-optimization: I am interested in designing stochastic search algorithms for
solving continuous optimization problems where the objective function equals
the expected value of a stochastic performance measure that can only be
estimated by repeated simulations. Our work in this area has thus far focused
on convergence analysis of a class of Markov chain sampling methods for such
simulation-optimization problems.