Markov decision processes: the control of high-dimensional systemsMany business problems are characterized by a periodically recurring decision problem. The repetitive aspect of the decision problem does not make the decision making necessarily easier. This is due to the consequences that decisions have on both the short and long term. The short term effect consists primarily of the incurred costs due to exercising a decision until the next decision epoch. The effect on the long term manifests itself by a potentially different state of environment at the next decision epoch; a decision, that has a low cost now, can indeed result in situations in which decisions with only high costs can be selected. In order to minimize the costs, the decisions cannot be studied in isolation, but have to related to future states as well. A Markov decision process is a mathematical formalization to describe and analyze the above mentioned problem. The term "Markov" specifies that the future states of the system depend on the current state only and not on the past states. This property seems to limit the applicability of Markov decision processes, however, many relevant practical problems posses this property. One can think of, for example, the computation of the optimal order size in inventory management systems, the making of work schedules such that enough capacity is available to provide a certain service level, and the routing in communication networks such that waiting times are minimized. Solving these problems by directly applying Markov decision theory is not straightforward, since the space of solutions is high-dimensional. This implies that computation of optimal decision rules is not feasible within reasonable time. In this thesis we develop algorithms for the computation of (nearly) optimal decision rules in high-dimensional systems. In Chapter 1 we start with a survey of the literature on this topic. Next, the formal theory of Markov decision processes is discussed in Chapter 2. Extra emphasis is put on the role of value functions within this theory. The value function plays a central role in one-step policy improvement and in neuro-dynamic programming; these are two techniques to approximate optimal decision rules. In Chapter 3 we systematically analyze the value function of birth-death processes. The well-known queueing systems, such as the multi-server queue, the single server queue, and the infinity server queue, are special cases of it. In Chapter 4 we study more complex queueing systems, such as the priority queue and the tandem queue. In Chapter 5 we consider Markov decision processes in which the decision maker is confronted with additional uncertainty, apart from the uncertainty of the future development of the system. This uncertainty can be due to partial observation of the state of the system, or because of partial specification of transitions between states of the system when a decision is taken. We show that the latter is a special case of the former, and that the problem can be cast as a standard Markov decision process. However, the reformulation causes the system to be high-dimensional again. Although the number of state configurations is very large, only a limited fraction of them can be actually attained. This make it possible to reduce the system in dimensionality and to analyze it. These techniques will be illustrated in Chapter 6 and 7 by means of routing problems and multi-armed bandit problems. Ph.D. Thesis, VU University Amsterdam, 2002. [pdf]
|
