A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion
DUFOUR, François
Institut Polytechnique de Bordeaux [Bordeaux INP]
Quality control and dynamic reliability [CQFD]
Institut de Mathématiques de Bordeaux [IMB]
GENADOT, Alexandre
Quality control and dynamic reliability [CQFD]
Institut de Mathématiques de Bordeaux [IMB]
Language
English
Journal article
This item is published in
SIAM Journal on Control and Optimization. 2020-01, vol. 58, n° 4, p. 2535-2566
Society for Industrial and Applied Mathematics
Abstract in English
In this work, we study discrete-time Markov decision processes (MDPs) under constraints with Borel state and action spaces, where all the performance functions have the same form of the expected total reward (ETR) criterion over the infinite time horizon. One of our objectives is to propose a convex programming formulation for this type of MDP. It will be shown that the values of the constrained control problem and the associated convex program coincide and that, if there exists an optimal solution to the convex program, then there exists a stationary randomized policy which is optimal for the MDP. It will also be shown that, in the framework of constrained control problems, the supremum of the expected total rewards over the set of randomized policies is equal to the supremum of the expected total rewards over the set of stationary randomized policies. We consider standard hypotheses such as the so-called continuity-compactness conditions and a Slater-type condition. Our assumptions are weak enough to deal with cases that have not yet been addressed in the literature. An example is presented to illustrate our results with respect to those of the literature.
English keywords
Markov decision process
Expected total reward criterion
Occupation measure
Constraints
Convex program
Origin
Imported from HAL
Research centers