Service interruptions in large-scale service systems.

Publication: Management Science
Publication Date: 01-SEP-09

Article Excerpt
1. Introduction

A clear trend in call centers and many other service systems is increasing scale. Whereas, formerly, a large call center might employ hundreds of agents, today a large call center employs thousands of agents.

1.1. Economy of Scale

As discussed by Gans et al. (2003) and Aksin et al. (2007) in recent surveys of customer contact centers and their modelling, there is a good reason to see increasing scale, because there is a significant economy of scale, strongly supported by queueing theory. Whitt (1992) applied queueing theory to explain, support, and quantify the economy of scale in many-server service systems. This economy of scale was further exposed in a cost-benefit framework by Borst et al. (2004). Indeed, from the first study of many-server queueing models by A. K. Erlang a century ago, the advantage of large scale has been recognized and quantified; see Brockmeyer et al. (1948). Simulations and mathematical analysis show that performance tends to improve as the number of servers increases with the utilization per server held fixed. Alternatively, the utilization per server can approach the upper limit 100% as the number of servers increases with various performance measures held fixed. Qualitatively, the advantage of large scale is supported by stochastic comparisons in multiserver queueing models, as in Smith and Whitt (1981). There it is shown that the appropriate overall measures of performance are improved (the level of congestion is decreased) when two separate service systems with common service times are combined.
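To make the economy of scale concrete, the following minimal sketch evaluates the classical Erlang C delay probability for an M/M/n queue while the utilization per server is held fixed. The choice of the M/M/n model (no abandonment), the function names, and the 90% utilization are assumptions made for illustration; they are not taken from the paper.

```python
def erlang_b(n, offered_load):
    """Erlang B blocking probability, computed with the standard stable recursion."""
    b = 1.0  # blocking probability with zero servers
    for k in range(1, n + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def erlang_c(n, offered_load):
    """Erlang C probability that an arrival must wait in an M/M/n queue."""
    if offered_load >= n:
        raise ValueError("need offered_load < n for a stable queue")
    rho = offered_load / n  # utilization per server
    b = erlang_b(n, offered_load)
    return b / (1.0 - rho * (1.0 - b))


if __name__ == "__main__":
    rho = 0.9  # utilization held fixed (illustrative value)
    for n in (10, 100, 1000):
        print(n, erlang_c(n, rho * n))
```

Under these assumptions, the printed delay probability falls as n grows, even though every server is equally busy in all three cases.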

As we will explain in §3, the advantage of large scale is also supported by stochastic-process limits for many-server queueing models in the quality-and-efficiency-driven (QED) many-server heavy-traffic limiting regime; e.g., see Halfin and Whitt (1981), Garnett et al. (2002), and Pang et al. (2007). In the QED regime, the scale (as represented by the arrival rate and the number of servers) is allowed to increase without bound, where these two measures of scale increase together appropriately, so that supply matches demand. In that limit, simultaneously, waiting times become negligible (quality), while server utilizations approach 100% (efficiency).
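One common way to let the arrival rate and the number of servers "increase together appropriately" is square-root staffing, n = R + beta*sqrt(R), where R is the offered load. The sketch below is an illustration under that assumption, using an M/M/n model without abandonment; the function names and the value beta = 1 are hypothetical choices, not the paper's.

```python
import math


def erlang_c(n, offered_load):
    """Erlang C delay probability for M/M/n, via the Erlang B recursion."""
    b = 1.0
    for k in range(1, n + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / n
    return b / (1.0 - rho * (1.0 - b))


def qed_summary(offered_load, beta=1.0, mu=1.0):
    """Staff with n = ceil(R + beta*sqrt(R)); return (n, utilization, P(wait), mean wait)."""
    n = math.ceil(offered_load + beta * math.sqrt(offered_load))
    utilization = offered_load / n
    p_wait = erlang_c(n, offered_load)
    lam = offered_load * mu
    mean_wait = p_wait / (n * mu - lam)  # M/M/n mean waiting time in queue
    return n, utilization, p_wait, mean_wait


if __name__ == "__main__":
    for load in (100, 1000, 10000):
        print(qed_summary(load))
```

In this sketch the utilization climbs toward 100% and the mean wait shrinks as the offered load grows, while the delay probability stays roughly constant, which is the qualitative behavior described above.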

1.2. Problems Posed by Large Scale

The purpose of this paper is to temper the enthusiasm for large scale. Here we emphasize that the quality-and-efficiency gains from large scale are achieved at some risk, because there can be severe congestion if the system does not operate as planned. In particular, here we show that large scale makes the system more vulnerable to service interruptions when (i) most customers remain waiting in the system until they can be served, and (ii) many servers are unable to function during the interruption, as may occur with a system-wide computer failure. Moreover, we show how to quantify the performance impact of the interruptions in a relatively tractable way, so that the trade-off between efficiency gains and interruption costs associated with increasing scale can be assessed.

We were motivated to consider this case because of our own personal experience at a large Department of Motor Vehicles (DMV) office in New York City, where hundreds of people were waiting to complete various processing tasks. During that visit, there was a system-wide computer failure that lasted for about 90 minutes, which stopped all service, and yet almost all customers waited to complete their business. Customer patience in that setting is understandable, because there was a large invested cost in coming to the DMV office in the first place, and in acquiring a place in line, and the length of the interruption was uncertain. It was natural to hope that the difficulty might be resolved any minute. Because of the large scale, hundreds of customers experienced hours of extra delays. Much more serious consequences could occur in a large hospital, transportation system or food-distribution system. Clearly, customer delays constitute only one component of the cost of service interruptions, but they are an important component.

Our results here are consistent with other recent research exposing difficulties associated with increasing scale. From Whitt (2006a, b) and Bassamboo et al. (2006a, b), we see that large-scale service systems are vulnerable to uncertainty about the arrival and service rates. Whitt (2006a) showed that the sensitivity of the principal performance measures to the arrival rate and the service rate increases with increasing scale. In particular, the arrival-rate and service-rate elasticities of various performance measures grow proportionally to the square root of the number of servers in the QED regime. (For example, if E is the arrival-rate elasticity of the delay probability, then a 1% increase in the arrival rate tends to produce an E% increase in the delay probability.)
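The elasticity in the parenthetical example can be written out explicitly. The block below is a sketch of the standard definition; the symbols $P$ (delay probability), $E_\lambda$, and the constant $c$ are notation introduced here for illustration and do not come from the paper.

```latex
E_\lambda \;=\; \frac{\partial P}{\partial \lambda}\,\frac{\lambda}{P}
\qquad\Longrightarrow\qquad
\frac{\Delta P}{P} \;\approx\; E_\lambda\,\frac{\Delta \lambda}{\lambda},
\qquad
E_\lambda \;\approx\; c\,\sqrt{n} \quad \text{in the QED regime.}
```

As a purely hypothetical illustration, with $E_\lambda = 20$ a 1% increase in the arrival rate would raise a delay probability of 0.20 to roughly 0.24.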

We next describe our modelling and analysis approach. At the end of §2, we indicate how the rest of the paper is organized.

2. Modelling

The congestion impact of a service interruption clearly depends on what happens to the customers during the interruption: Arrivals may either continue or stop. The customers already in the system may either remain waiting or they may leave without receiving service.

2.1. Pure-Delay Model and Pure-Loss Model

We will consider a range of cases, but we will primarily focus on two extreme cases: First, arrivals may continue and all customers may remain waiting; second, new arrivals may refuse to enter and all customers in the system may leave immediately without receiving service. The first case has delays but no losses, whereas the second case has losses but no delays. Consistent with intuition, the number of customers affected in the pure-delay case tends to be much greater than in the pure-loss case, because waiting customers not only experience their own delays, but increase the delays of other customers.

We are primarily concerned with the pure-delay case, which requires more careful analysis. In the pure-delay case, the congestion impact of service interruptions increases with increasing scale. As we explain in §3, increasing scale allows the server utilization (or traffic intensity) to be higher, given standard quality-of-service constraints. (That is the much-touted economy of scale in this context.) As a consequence, the recovery rate after the interruption has ended tends to be slower. Thus, with increasing scale, the recovery time tends to increase and the performance during the recovery period degrades significantly. The bad performance spreads to customers that arrive long after the interruption has ended.
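The slower recovery can be illustrated with a deterministic fluid approximation. This is a simplified sketch and not the paper's analysis: it assumes a complete interruption of length D during which a backlog of about lam*D customers accumulates, that all n servers are busy from the moment service resumes, and that the backlog then drains at the net rate n*mu - lam. The function name and the numerical values are hypothetical.

```python
def fluid_recovery_time(lam, mu, n, down_time):
    """Approximate time to clear the backlog after a full interruption (fluid model)."""
    capacity = n * mu
    if lam >= capacity:
        raise ValueError("arrival rate must be below total service capacity")
    backlog = lam * down_time          # customers accumulated while service is down
    return backlog / (capacity - lam)  # drain at net rate n*mu - lam


if __name__ == "__main__":
    # Same interruption (half a mean service time), two system sizes,
    # both staffed with n = R + sqrt(R) servers for offered load R.
    for lam, n in ((100.0, 110), (10000.0, 10100)):
        print(n, fluid_recovery_time(lam, mu=1.0, n=n, down_time=0.5))
```

Under these assumptions the recovery time equals rho*D/(1 - rho), where rho = lam/(n*mu) is the utilization. In the example, a hundredfold increase in scale raises the utilization from about 91% to about 99% and lengthens the recovery from 5 to 50 mean service times.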

In contrast, in the pure-loss case, the congestion impact is much less. Many customers fail to receive service at all, which naturally may be regarded as a more serious penalty, but there is little impact on other customers that arrive after the interruption has ended. Even though the impact of lost service may be great, relatively few customers will be affected if interruptions are rare. In contrast, with large-scale pure-delay systems, even rare short interruptions can have a dramatic impact on congestion, because they can produce long recovery times.

2.2. A More General Model

But the two extremes discussed above are not the only cases. In practice, service systems tend to operate in between these two extremes, often having customer abandonment after some waiting. Fortunately, abandonment usually tends to make the system behave more like the pure-loss model; see §6.

To provide a basis for further systematic analysis, we also consider a more general model that covers a wide range of intermediate cases, allowing customer abandonment at various rates. When the system is operating normally, customers will be served and abandon from queue at nominal rates. In particular, when there is no interruption, we assume that the system behaves as the Markovian M/M/n + M (Erlang-A or Palm) model with unlimited waiting space, the first-come-first-served (FCFS) service discipline, arrival rate $\lambda$, individual service rate $\mu_1$, and individual abandonment rate $\theta_1$.

Here is what happens during the service interruptions: First, we assume that arrivals continue arriving at rate $\lambda$, even during the interruption. When an interruption occurs, it lasts for a random length of time, the down time D. Throughout that interruption, a random number F of the servers remain functioning, which may range from 0 to n; we think of F as being proportional to n, so that F/n is the random proportion of functioning servers.

Customers in service at functioning servers continue receiving service, but at a new service rate, $\mu_2$ instead of $\mu_1$. That rate $\mu_2$ may be slower than $\mu_1$, reflecting service degradation caused by the interruption, or that rate may be faster, because of a special effort to provide exceptional service during the interruption. Customers in queue continue waiting, but abandon at a new rate, $\theta_2$ instead of $\theta_1$. We would expect to have $\theta_2 > \theta_1$, but we treat the general case.

There are several possible assumptions for the customers that are in service at servers that cease functioning. We assume that these customers remain at these servers, but have high priority (over customers waiting in queue or new arrivals) for functioning servers as they become available, which preserves the FCFS order. These customers at nonfunctioning servers may also abandon from the system, and do so at a new rate $\theta_3$. We would expect to have $\theta_3 > \theta_1$, but again we treat the general case. We assume that all the service and abandonment times are independent exponential random variables.

The pure-delay model with a system-wide service interruption is obtained as the special case in which there are no functioning servers (F = 0) and the abandonment rates $\theta_1$, $\theta_2$, and $\theta_3$ are all 0, whereas the pure-loss model is obtained as the special case in which, again, F = 0, but these abandonment rates are all infinite. We quantify performance, conditional on the pair of random variables (D, F), as a function of the six-tuple of model parameters $(\lambda, \mu_1, \theta_1, \mu_2, \theta_2, \theta_3)$. However, we especially emphasize the severe performance degradation in the pure-delay case with F = 0 and $\theta_1 = \theta_2 = \theta_3 = 0$.
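The parameterization above can be recorded in a small data structure. The following is a minimal sketch; the class names, field names, and helper functions are hypothetical and chosen only to mirror the six-tuple and the pair (D, F) described in the text. Because the service rate during the interruption plays no role when F = 0, the helpers simply reuse the normal service rate as a placeholder.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InterruptionModel:
    """The six model parameters; field names are illustrative."""
    lam: float     # arrival rate (unchanged during an interruption)
    mu1: float     # service rate in normal operation
    theta1: float  # abandonment rate from queue in normal operation
    mu2: float     # service rate at functioning servers during an interruption
    theta2: float  # abandonment rate from queue during an interruption
    theta3: float  # abandonment rate of customers at nonfunctioning servers

    def __post_init__(self):
        if self.lam <= 0 or self.mu1 <= 0:
            raise ValueError("lam and mu1 must be positive")
        if min(self.mu2, self.theta1, self.theta2, self.theta3) < 0:
            raise ValueError("rates must be nonnegative")


@dataclass(frozen=True)
class Interruption:
    """One interruption: down time D and number F of functioning servers."""
    down_time: float
    functioning: int


def pure_delay(lam, mu):
    """Special case in which all abandonment rates are 0 (use with F = 0)."""
    return InterruptionModel(lam, mu, 0.0, mu, 0.0, 0.0)


def pure_loss(lam, mu):
    """Special case in which all abandonment rates are infinite (use with F = 0)."""
    return InterruptionModel(lam, mu, math.inf, mu, math.inf, math.inf)
```

For example, `pure_delay(100.0, 1.0)` paired with `Interruption(down_time=0.5, functioning=0)` would describe a system-wide outage in which every customer waits.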

2.3. Models of the Service Interruptions

There are two different ways to look...
