INTEGRATED RESOURCE ALLOCATION AND PLANNING IN STOCHASTIC MULTIAGENT ENVIRONMENTS by Dmitri A. Dolgov A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2006 Doctoral Committee: Professor Edmund H. Durfee, Chair Professor Kang G. Shin Professor Demosthenis Teneketzis Professor Michael P.
Wellman Associate Professor Satinder Singh Baveja UMI Number: 3224868 INFORMATION TO USERS The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleed-through, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.
® UMI UMI Microform 3224868 Copyright 2006 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company 300 North Zeeb Road P.
Box 1346 Ann Arbor, MI 48106-1346 © Dmitri A. Dolgov All rights reserved 2006 To my parents and Anya. ii ACKNOWLEDGEMENTS First and foremost, I owe a lot of gratitude to my advisor, Ed Durfee. Without his support, patience, academic guidance, and technical advice, this dissertation would not have been possible.
I especially want to thank Ed for giving me the freedom to explore and supporting me when I got excited about problems, even at times when he didn’t quite share my enthusiasm. I truly believe Ed’s top work priority is the growth of his students, and for that I offer my deepest thanks! I’m very grateful to the members of my committee, Kang Shin, Satinder Singh, Demos Teneketsis, and Michael Wellman for their valuable advice and insightful comments on my work. Michael Wellman deserves special credit for doing a very thorough job reviewing my thesis. I’m very thankful to my teachers at the University of Michigan, especially professors Martha Pollack and Kevin Compton.
The administrators of the Michigan AI Lab deserve many thanks for their help in navigating various financial and administrative jungles. The support of Kelly Cormier, Colleen Neilson, and Wendy Anderson was truly indispensable. My friends in the AI Lab contributed significantly to making my life there fun, productive, and sometimes both. Thanks to all of you, and especially Jeff C.! Among my non-AI friends, I thank Tami & Mike U., and Lisa & Derek C.
for making my life in Ann Arbor much more enjoyable. ABC deserves a mention for continued and consistent support. Also, to all the people with whom I played soccer and went skiing, thanks. I would have long withered without those games and trips! To my parents I owe so much more than can be expressed on paper or in words.
Mom, Dad, thanks for everything! Finally, I thank my fiance, friend, colleague, and editor, Anya, for her love and encouragement. iv TABLE OF CONTENTS DEDICATION. và va ii ACKNOWLEDGEMENTS .0000 pee eee iii LIST OF FIGURES. ee viii ABSTRACT ©.ga và va xi CHAPTER 1.1 Resource Allocation and Stochastic Planning .3 Overview of the Thesis.
Non-Consumable Resources: Single-Agent Model .1 Planning Under Uncertainty: Markov Decision Processes.2 Agent Model: MDPs with Resources and Capacity Constraints 19 2.3 Properties of the Single-Agent Constrained MDP .5 Binary Resource Costs. Allocation of Non-Consumable Resources .1 Multiagent Resource Allocation.2 Avoiding Bundle Enumeration .3 Distributing the Winner-Determination Problem .4 Preserving Information Privacy. cv gà và và và 72 4. Constrained MDPs with Multiple Discount Factors.1 Justification for Costs and Multiple Discounts.
75 4,2 Stationary Deterministic Policies for Constrained MDPs .3 Stationary Deterministic Policies for MDPs with Multiple Discounts 2. ng ng kg kg kia 84 43.5 Discussion and Generalization.1 Single-Agent Problem Formulation .2 Problem Properties and Complexity .3 Solution for MDPs with Constraints on the Expected Resource CostS oe 110 5.4 Probabilistic Constraints: Linear Approximation .5 Probabilistic Constraints: Polynomial Approximation.1 Calculating the Probability of Exceeding Cost Bounds125 5.2 Computing the Moments .4 Restricting to Stationary Deterministic Policies.6 Multiagent Resource Allocation. cv KT va 147 6. Multiagent MDPs with Local and Asymmetric Dependencies 150 6.1 Graphical Multiagent MDPs .2 Properties of Graphical Multiagent MDPs.3 Maximizing Social Welfare .4 Maximizing Own Welfare .1 Acyclic Dependency Graphs .2 Cyclic Dependency Graphs.00 ee eee ee ee 173 6.
Approximate Planning and Resource Allocation with Fac- tored MDP.1 Factored MDPs and Approximate Linear Programming .2 Approximation of the DualLP.2 Resource Allocation with Factored MDPs .4 Discussion: Folding Resources into MDPs .1 Summary of Contributions .2 Open Questions and Future Directions. Quà kg v kk sa 223 vii LIST OF FIGURES Figure 1. Q vn ng va 5 2.1 Unconstrained MDP example, delivery domain.2 Mapping general utility functions to constrained MDPs.3 Reduction for NP-completeness proof of constrained MDP.1 Preserving information privacy example.2 Running time for a MDP-based WDP MILP for different constraint levels, 2 .3 Comparison of the running time of MDP-based and flat combinato- rial auetÏlONS. c Q Q LH HQ ng nu gà gà xa ko 67 3.4 Scaling the MDP-based winner-determination MILP to more agents.5 Scaling of the MDP-based winner-determination MILP with the number of resource typeéS.6 Complexity of MDP-based winner-determination MILP as a function of complexity of actions’ resource requirements.7 Complexity of MDP-based winner-determination MILP when re- source requirements scale with the number of resource types.1 Example: MDP with constraint on expected cost.2 Example: MDP with two discount factors.3 Value of deterministic and randomized policies for constrained MDPs.4 Complexity profile for constrained MDPs as a function of constraint level.
20 cu gà kg NV k va 4.5 Performance profile for finding optimal deterministic policies for constrained MDPs.6 Solution time for MDP with two discount factors.1 Optimal policies for problems with constraints on consumable resources require randomization and are not uniformly optimal.2 Reduction of HC to MDP with consumable resources and probabilis- tic constraints. HQ ky và va 5.3 MDPs with probabilistic constraints are non-linear and non-convex.4 Sub-optimality of linear Markov approximation for the delivery example.9 Iterative Markov approximation for the delivery example.6 Probability of exceeding cost bounds for the linear Markov approx- imationnN ©. nà à v kg kg KV v V k Ka 5.7 Quality of single-shot Markov approximation.8 Effect of iteratively adjusting the Markov bound.9 Distribution of total cost, comparison to normal.10 Simple problem with two states and two actions.11 Quality of third-degree Legendre approximation of acdf.12 Polynomial approximation of a pdf for several policies.13 Scaling with the number of agents: consumable and non-consumable YESOUICES.14 Scaling with the number of resource types: consumable and non- consumable reSOUTC@S.15 Scaling with the number of resource types and action resource requirements. ix Agent dependency graph.
QC 153 Illustration for Theorem 6. so 164 Assembly line example. eee ee 166 Additive rewards. Two-agent problems.
175 Multiagent problems, additive rewards: existence of equilibrium strategies. ng hà k v kg k kg kia 179 Example of a simple cluster graph that forms a junction tree. 196 Efficiency of resource allocation with factored MDPs. 202 Comparison of ALP methods.
209 ABSTRACT Resource allocation is a ubiquitous problem that arises whenever scarce resources have to be distributed among multiple autonomous entities (e., people, companies, robots). Stochastic planning is also a very common problem that focuses on developing models and algorithms for behaving optimally in uncertain environments. The motivation for this dissertation is that the problems of resource allocation and stochastic planning are often very strongly intertwined: the utility for acquiring some resources is commonly determined by what goals can be achieved using these resources, while devising the best course of action for achieving these goals can involve solving a complex stochastic planning problem. An integrated approach to modeling and solving the problems of resource allocation and stochastic planning allows us to exploit problem structure that would otherwise be lost if the problems were considered separately.
The overarching goal of this dissertation is to develop computationally efficient mechanisms for allocating consumable and non-consumable resources among agents whose preferences for these resources are induced by stochastic planning problems. Towards this end, we develop new models of planning problems, based on the framework of Markov decision processes (MDPs), where the action sets are explicitly parameterized by the available resources. Given these models, we design algorithms based on linear and integer programming that simultaneously solve for optimal allocations of resources and strategies for acting in the stochastic environments. These algorithms then form the core of our mechanisms for allocating resources xi in cooperative as well as competitive multiagent settings.
We show analytically and empirically that the integrated approach leads to drastic (in many cases, exponential) improvements in computational efficiency over methods that consider the problems separately. To complement the above, we develop mechanisms that, in addition to exploiting structure in agents’ preferences arising from regularities in the underlying planning problems, also exploit structure within the agents’ MDPs. By utilizing and extending techniques based on approximate linear programming, we adapt our resource- allocation mechanisms to well-structured planning problems, represented as factored MDPs. This leads to algorithms that scale up to even larger resource-allocation problems, where the agents’ preferences are induced by MDPs with extremely large state spaces.
xI CHAPTER 1 Introduction The problem of resource allocation among multiple autonomous entities (agents) is ubiquitous in the modern world. A business manager has to distribute a limited budget among the departments of the company. A computing center faces the problem of allocating its limited resources among computational tasks. A government is in charge of distributing the scarce resources of its country (e., wireless spectrum) among its people and businesses.
Making the right allocation decisions in these and other similar scenarios can be of critical importance. However, what does it mean to allocate the resources in the “right” way? A good manager wants to maximize the long-term profitability of her business. A process that allocates resources to tasks in a computing environment should aim to get as much useful work done as possible. The goal of a professionally run and honest government is to do what is best for its people.
In all the above, the goal of the resource-allocation process is to maximize a measure of global utility that can be obtained by the agents in the system by using these resources. This, in turn, raises the question of what determines the value of a particular set of resources to an agent. Resources are used by the agents to pursue their goals and to obtain rewards on achieving the latter. However, in order to be able to achieve those goals, the agents often need to solve a nontrivial planning problem.
For example, when making decisions regarding budget allocation, a manager needs to consider how the money will be used and what will be accomplished as a result. Ina computing center, it is important to understand what sequence of steps is required to complete a task before resources can be allocated to it. Similarly, before a government can make an informed decision about how to allocate the wireless spectrum in the best possible way, someone has to evaluate the possible uses of the frequencies and the resulting benefits. Further, in any realistic domain, such a planning process is complicated by the fact that an agent faces multiple interdependent objectives, whose achievement requires executing sequences of actions whose outcomes are uncertain.
For instance, profitability of a business is subject to many external factors (demand, competition, legislation) that can seldom be predicted with certainty. The focus of this dissertation is on the interconnected problems of resource allocation and sequential decision making under uncertainty about the dynamics of the environments the agents operate in. When viewed primarily from the resource allocation perspective, this work can be characterized as a study of algorithms for resource allocation where the values of the resources to the agents are defined by the agents’ stochastic planning problems.