hal.structure.identifier: Institut Polytechnique de Bordeaux [Bordeaux INP]
hal.structure.identifier: Quality control and dynamic reliability [CQFD]
hal.structure.identifier: Institut de Mathématiques de Bordeaux [IMB]
dc.contributor.author: DUFOUR, François
hal.structure.identifier: Quality control and dynamic reliability [CQFD]
hal.structure.identifier: Institut de Mathématiques de Bordeaux [IMB]
dc.contributor.author: GENADOT, Alexandre
dc.date.accessioned: 2024-04-04T02:48:41Z
dc.date.available: 2024-04-04T02:48:41Z
dc.date.issued: 2020-01
dc.identifier.issn: 0363-0129
dc.identifier.uri: https://oskar-bordeaux.fr/handle/20.500.12278/191762
dc.description.abstractEn: In this work, we study discrete-time Markov decision processes (MDPs) under constraints with Borel state and action spaces, where all the performance functions have the same form of the expected total reward (ETR) criterion over the infinite time horizon. One of our objectives is to propose a convex programming formulation for this type of MDP. It will be shown that the values of the constrained control problem and the associated convex program coincide and that, if there exists an optimal solution to the convex program, then there exists a stationary randomized policy which is optimal for the MDP. It will also be shown that, in the framework of constrained control problems, the supremum of the expected total rewards over the set of randomized policies is equal to the supremum of the expected total rewards over the set of stationary randomized policies. We consider standard hypotheses such as the so-called continuity-compactness conditions and a Slater-type condition. Our assumptions are weak enough to cover cases that have not yet been addressed in the literature. An example is presented to illustrate our results with respect to those of the literature.
dc.language.iso: en
dc.publisher: Society for Industrial and Applied Mathematics
dc.subject.en: Markov decision process
dc.subject.en: Expected total reward criterion
dc.subject.en: Occupation measure
dc.subject.en: Constraints
dc.subject.en: Convex program
dc.title.en: A Convex Programming Approach for Discrete-Time Markov Decision Processes under the Expected Total Reward Criterion
dc.type: Journal article
dc.identifier.doi: 10.1137/19M1255811
dc.subject.hal: Mathematics [math]/Optimization and control [math.OC]
bordeaux.journal: SIAM Journal on Control and Optimization
bordeaux.page: 2535-2566
bordeaux.volume: 58
bordeaux.hal.laboratories: Institut de Mathématiques de Bordeaux (IMB) - UMR 5251*
bordeaux.issue: 4
bordeaux.institution: Université de Bordeaux
bordeaux.institution: Bordeaux INP
bordeaux.institution: CNRS
bordeaux.peerReviewed: yes
hal.identifier: hal-03033727
hal.version: 1
hal.popular: no
hal.audience: International
hal.origin.link: https://hal.archives-ouvertes.fr//hal-03033727v1
bordeaux.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=SIAM%20Journal%20on%20Control%20and%20Optimization&rft.date=2020-01&rft.volume=58&rft.issue=4&rft.spage=2535-2566&rft.epage=2535-2566&rft.eissn=0363-0129&rft.issn=0363-0129&rft.au=DUFOUR,%20Fran%C3%A7ois&GENADOT,%20Alexandre&rft.genre=article