DUFOUR, François; GENADOT, Alexandre

doi:10.1137/19M1255811

hal.structure.identifier	Institut Polytechnique de Bordeaux [Bordeaux INP]
hal.structure.identifier	Quality control and dynamic reliability [CQFD]
hal.structure.identifier	Institut de Mathématiques de Bordeaux [IMB]
dc.contributor.author	DUFOUR, François
hal.structure.identifier	Quality control and dynamic reliability [CQFD]
hal.structure.identifier	Institut de Mathématiques de Bordeaux [IMB]
dc.contributor.author	GENADOT, Alexandre
dc.date.accessioned	2024-04-04T02:48:41Z
dc.date.available	2024-04-04T02:48:41Z
dc.date.issued	2020-01
dc.identifier.issn	0363-0129
dc.identifier.uri	https://oskar-bordeaux.fr/handle/20.500.12278/191762
dc.description.abstractEn	In this work, we study discrete-time Markov decision processes (MDPs) under constraints with Borel state and action spaces and where all the performance functions have the same form of the expected total reward (ETR) criterion over the infinite time horizon. One of our objectives is to propose a convex programming formulation for this type of MDPs. It will be shown that the values of the constrained control problem and the associated convex program coincide and that if there exists an optimal solution to the convex program then there exists a stationary randomized policy which is optimal for the MDP. It will also be shown that in the framework of constrained control problems, the supremum of the expected total rewards over the set of randomized policies is equal to the supremum of the expected total rewards over the set of stationary randomized policies. We consider standard hypotheses such as the so-called continuity-compactness conditions and a Slater-type condition. Our assumptions are weak enough to deal with cases that have not yet been addressed in the literature. An example is presented to illustrate our results with respect to those of the literature.
dc.language.iso	en
dc.publisher	Society for Industrial and Applied Mathematics
dc.subject.en	Markov decision process
dc.subject.en	Expected total reward criterion
dc.subject.en	Occupation measure
dc.subject.en	Constraints
dc.subject.en	Convex program
dc.title.en	A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion
dc.type	Journal article
dc.identifier.doi	10.1137/19M1255811
dc.subject.hal	Mathematics [math]/Optimization and Control [math.OC]
bordeaux.journal	SIAM Journal on Control and Optimization
bordeaux.page	2535-2566
bordeaux.volume	58
bordeaux.hal.laboratories	Institut de Mathématiques de Bordeaux (IMB) - UMR 5251
bordeaux.issue	4
bordeaux.institution	Université de Bordeaux
bordeaux.institution	Bordeaux INP
bordeaux.institution	CNRS
bordeaux.peerReviewed	yes
hal.identifier	hal-03033727
hal.version	1
hal.popular	no
hal.audience	International
hal.origin.link	https://hal.archives-ouvertes.fr//hal-03033727v1

Files in this item

No files are associated with this item.

This item appears in the following collection(s)

Institut de Mathématiques de Bordeaux (IMB) - UMR 5251
