Dynamic Scheduling for Parallel Server Systems in Heavy Traffic: Graphical Structure, Decoupled Workload Matrix and Some Sufficient Conditions for Solvability of the Brownian Control Problem

V. Pesic and R. J. Williams
We consider a problem of dynamic scheduling for parallel server systems. With the exception of a few special cases, such stochastic control problems cannot be solved exactly. Here we focus on an approach pioneered by J. M. Harrison that utilizes approximating diffusion control problems. This approach has been very successfully used when the diffusion control problem can be reduced to an equivalent control problem for a one-dimensional workload process. However, it has been an outstanding open problem to make substantial progress on solving the diffusion control problem when the workload is more than one-dimensional. In this paper, towards breaking this dimension barrier, we present some new structural results concerning the diffusion control problem for parallel server systems with arbitrary workload dimension.

Specifically, we characterize the structure of a certain server-buffer graph associated with parallel server systems. We prove that this graph is a forest of trees. We then exploit this graphical structure to prove that there exists a matrix, used to define the workload process, that has a block diagonal-like structure. We call this a decoupled workload matrix. When the workload dimension is one, there is a single tree in the forest and our decoupled workload matrix agrees with a choice of workload matrix proposed by Harrison. However, when the workload dimension is more than one, our decoupled workload matrix is frequently different from Harrison's workload matrix.

We demonstrate that our decoupled workload matrix simplifies the structure of the control problem for the workload process by proving that when the original diffusion control problem has linear holding costs, the equivalent workload formulation also has a linear cost function. This is somewhat surprising as in general, one expects the workload formulation to have a convex, piecewise linear holding cost.

To illustrate the simplificiation afforded by our decoupled workload matrix, we give sufficient conditions for a certain least control process to be an optimal control for the diffusion control problem. Under these conditions, we propose a continuous review threshold-type control policy for the original parallel server system that exploits pooling of servers within trees in the server-buffer graph and uses non-basic activities connecting different trees in a critical manner. We call this partial pooling. We conjecture that this threshold policy is asymptotically optimal in the heavy traffic limit.

We illustrate the solution of the diffusion control problem and our proposed threshold control policy for a three-buffer, three-server example. In a separate work, we prove that this control policy is asymptotically optimal for this example. We believe that this is the first instance of solution of the diffusion control problem for a parallel server system with partial pooling and use of non-basic activities in the usual heavy traffic regime.

Published in Stochastic Systems, 6 (2016), 26-89. For a preprint, click here. For the published version, click here.