Multiserver-job response time under multilevel scaling

Published online by Cambridge University Press: 27 March 2026

Isaac Grosof*
Affiliation:
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, USA
Hayriye Ayhan
Affiliation:
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, USA
*Corresponding author: Isaac Grosof; Email: izzy.grosof@northwestern.edu

Abstract

We study the multiserver-job setting in the load-focused multilevel scaling limit, where system load approaches capacity much faster than the growth of the number of servers $n$. We consider the “1 and $n$” system, where each job requires either one server or all $n$. Within the multilevel scaling limit, we examine three regimes: load dominated by $n$-server jobs, 1-server jobs, or balanced. In each regime, we characterize the asymptotic growth rate of the boundary of the stability region and the scaled mean queue length. We demonstrate that mean queue length peaks near balanced load via theory, numerics, and simulation.
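The three regimes can be illustrated with a short calculation. The sketch below is not taken from the article: it assumes that the "load" of a class is its expected server-time demand per arrival (probability × servers needed × mean service time), and it uses the parametrization $p_n = n^{-\alpha}$ that appears in the figure captions. The function name `n_server_load_share` is hypothetical.

```python
def n_server_load_share(n, alpha, mu_1=1.0, mu_n=1.0):
    """Fraction of server-time demand contributed by n-server jobs.

    Assumption (not from the article): load of a class is measured as
    probability * servers needed * mean service time.
    """
    p_n = n ** (-alpha)              # probability that an arrival needs all n servers
    demand_n = p_n * n / mu_n        # server-time demand from n-server jobs
    demand_1 = (1.0 - p_n) / mu_1    # server-time demand from 1-server jobs
    return demand_n / (demand_n + demand_1)


if __name__ == "__main__":
    # alpha = 0.5, 1, 2 match the parametrizations p_n = n^{-0.5}, n^{-1}, n^{-2}.
    for alpha in (0.5, 1.0, 2.0):
        shares = [round(n_server_load_share(n, alpha), 3) for n in (10, 100, 1000)]
        print(f"alpha={alpha}: share for n=10,100,1000 -> {shares}")
```

Under this assumption, with $\mu_1 = \mu_n = 1$ the share tends to 1 as $n$ grows when $\alpha < 1$, stays near $1/2$ when $\alpha = 1$, and tends to 0 when $\alpha > 1$, which lines up with the $n$-server dominated, balanced, and 1-server dominated labels.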

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-ShareAlike licence (http://creativecommons.org/licenses/by-sa/4.0), which permits re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press.

Figure 1. The distribution of the number of CPUs requested in Google’s Borg trace, published in Tirmazi et al. [32]. The number of CPUs is normalized to the size of the smallest request observed, not an absolute value. The peak of the distribution is around 500 normalized CPUs, and there is a significant probability mass anywhere from $1$ to $10^5$ normalized CPUs.

Figure 2. Exact versus asymptotic formulas in the $n$-server dominated load regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-0.5}$ and $\mu_1=\mu_n=1$.

Figure 3. Exact versus asymptotic formulas in the balanced load regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-1}$ and $\mu_1=\mu_n=1$.

Figure 4. Exact versus asymptotic formulas in the $1$-server dominated regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-2}$ and $\mu_1=\mu_n=1$.

Figure 5. Main setting with server needs $1$ and $n$, $\mu_1 = \mu_n = 1$, and $p_n = 1/n^\alpha$. (a) Throughput $\mu/n$, calculated using Lemma 5.3. (b) Scaled mean queue length $\mathbb{E}[\Delta(Y_d)]/n$, calculated using Lemma 5.5.

Figure 6. Simulation results ($10^8$ jobs per data point) for mean queue length when $p_{10} = 1/n^\alpha$ and $\lambda$ is chosen so that a constant fraction of capacity is utilized. (a) Main setting with server needs $1$ and $10$, $n=10$ servers, and $\mu_1 = \mu_{10} = 1$. (b) Equal-area setting with server needs $1$ and $10$, $n=10$ servers, and $\mu_1 = 1, \mu_{10} = 10$.
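A simulation of this kind can be sketched as a small event-driven program. The code below is an illustrative sketch, not the authors' simulator. It assumes Poisson arrivals, exponential service times, and first-come-first-served scheduling in which the job at the head of the line blocks all later jobs until enough servers are free. It reports the time-average number of jobs in the system, which may differ from the exact queue-length metric used in the article. The arrival rate `lam` is a free input here, whereas the article chooses $\lambda$ as a fixed fraction of capacity. The function name `simulate_msj_fcfs` is hypothetical.

```python
import heapq
import random
from collections import deque


def simulate_msj_fcfs(n, p_n, lam, mu_1=1.0, mu_n=1.0, num_jobs=100_000, seed=0):
    """Time-average number of jobs in a '1 and n' multiserver-job system.

    Assumptions (not from the article): Poisson(lam) arrivals, exponential
    service, FCFS with head-of-line blocking.
    """
    rng = random.Random(seed)
    t = 0.0
    free = n                  # idle servers
    waiting = deque()         # server needs of waiting jobs, in arrival order
    departures = []           # min-heap of (departure_time, server_need)
    next_arrival = rng.expovariate(lam)
    in_system = 0
    arrived = 0
    area = 0.0                # integral of in_system over time

    while arrived < num_jobs:
        if departures and departures[0][0] <= next_arrival:
            # Next event is a service completion: release its servers.
            t_next, need = heapq.heappop(departures)
            area += in_system * (t_next - t)
            t = t_next
            free += need
            in_system -= 1
        else:
            # Next event is an arrival: sample its server need.
            area += in_system * (next_arrival - t)
            t = next_arrival
            waiting.append(n if rng.random() < p_n else 1)
            in_system += 1
            arrived += 1
            next_arrival = t + rng.expovariate(lam)

        # FCFS: admit jobs from the head of the line while they fit.
        while waiting and waiting[0] <= free:
            need = waiting.popleft()
            free -= need
            rate = mu_n if need == n else mu_1
            heapq.heappush(departures, (t + rng.expovariate(rate), need))

    return area / t


if __name__ == "__main__":
    n = 10
    for alpha in (0.5, 1.0, 2.0):
        # lam = 1.0 is an arbitrary illustrative value, not a fraction of capacity.
        print(alpha, simulate_msj_fcfs(n, n ** (-alpha), lam=1.0))
```

With `n=1` and `p_n=0` the sketch reduces to an M/M/1 queue, which gives a quick sanity check against the known mean number in system $\rho/(1-\rho)$.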

Figure 7. Scaled service rate setting with server needs $1$ and $n$, $\mu_1 = n, \mu_n = 1$ and $p_n = 1/n^\alpha$. (a) Throughput $\mu/n^2$, calculated using Lemma 5.3. (b) Scaled mean queue length $\mathbb{E}[\Delta(Y_d)]/n^2$, calculated using Lemma 5.5.

Figure 8. Calculations and simulation for the half-size large jobs setting with server needs $1$ and $n/2$, $\mu_1 = \mu_{n/2} = 1$ and $p_n = 1/n^\alpha$. (a) Throughput $\mu/n$, calculated using [14, Section 5], [4, Section 5.1]. (b) Simulation results ($10^8$ jobs per data point) for mean queue length when $\lambda$ is chosen so that a constant fraction of capacity is utilized and $n=10$ servers.

Figure 9. Simulation results ($10^8$ jobs per data point) for the three-class setting with server needs $1$, $5$, and $10$, $n=10$ servers, $\mu_1 = \mu_5= \mu_{10} = 1$, and $p_5 = p_{10} = n^{-\alpha}/2$.

Figure A1. Exact versus asymptotic formulas in the $n$-server dominated regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-0.5}$ and $\mu_1=10, \mu_n=1$.

Figure A2. Exact versus asymptotic formulas in the balanced load regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-1}$ and $\mu_1=10, \mu_n=1$.

Figure A3. Exact versus asymptotic formulas in the $1$-server dominated regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-2}$ and $\mu_1=10, \mu_n=1$.

Figure A4. Exact versus asymptotic formulas in the $n$-server dominated regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-0.5}$ and $\mu_1=1, \mu_n=10$.

Figure A5. Exact versus asymptotic formulas in the balanced load regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-1}$ and $\mu_1=1, \mu_n=10$.

Figure A6. Exact versus asymptotic formulas in the $1$-server dominated regime for $\mu$ and $\mathbb{E}[\Delta(Y_d)]$ as functions of $n$. Parametrization: $p_n=n^{-2}$ and $\mu_1=1, \mu_n=10$.