Your estimated wait time is...
A look at the exponential distribution, a useful model for operations management. Plus, how variability in customer arrivals can lead to killer wait times.
Take enough business classes, and you might start to believe that everything random can be modeled with a normal distribution: asset returns, demand forecasts, process quality control, and so on. Sometimes this assumption is justified (usually by the central limit theorem), and other times it’s made out of expediency. This post is dedicated to a different distribution: the exponential distribution. While not as famous as its bell-shaped cousin, the exponential distribution is still plenty useful, especially in the realm of operations management.
Modeling wait times
To analyze any process, you need to model the stress on the system. In many cases, the stress on the system is customer demand. Take a restaurant, for example. The decisions that a restaurant makes—number of waiters to staff, number of tables to set up, etc.—are all largely determined by expectations about how many customers will visit. To be more precise, a restaurant wants to know about the inflow of customers: how many customers will visit per time unit (e.g., customers/hour). Of course, you never know for sure how many customers will visit in an hour. You should know, however, how many customers visit per hour on average. (This number might fluctuate throughout the day, in which case you might want to have different models for busy vs. quiet times.)
Let’s say that the average number of customers per hour is given by λ. This means that the average time between customer arrivals is (1/λ) hours. For example, if you see an average of three (3) customers per hour, then the average time between customer arrivals is 1/3 hours, or 20 minutes. Given only the average arrival rate, you might think that there are lots of possible ways to model customer arrivals. However, with one additional assumption, our choices disappear. That assumption is memorylessness. In a nutshell, arrival times being “memoryless” means that the probability of waiting more than 10 minutes (e.g.) for the next customer is the same regardless of when we saw the last customer. (You may remember a similar idea when I wrote about the expected value of geometric distributions in the coupon taker problem article.) With this assumption, there is exactly one probability distribution that can model the inter-arrival times of the next customer: the exponential distribution.1 The exponential distribution has a single parameter, λ, which represents the average number of customers that arrive over a unit time interval. Its probability density function (PDF) is given by

f(x) = λe^(−λx)
This PDF is defined for x in the interval [0, ∞). In order for f(x) to be a valid probability density, it must 1) be nonnegative everywhere (it is), and 2) “sum up” to 1. Since x is a continuous variable, “summing up” to 1 really means that the integral over the domain of x is 1. Let’s confirm that:

∫₀^∞ λe^(−λx) dx = [−e^(−λx)]₀^∞ = 0 − (−1) = 1
To find the probability that the time until the next customer arrival falls in a certain interval, you find the area under the curve between the endpoints of that interval. For example, suppose that λ = 3. The probability that the next customer will arrive in the next 20 minutes (i.e., 1/3 hour) is

P(X ≤ 1/3) = ∫₀^(1/3) 3e^(−3x) dx = 1 − e^(−3 · 1/3) = 1 − e^(−1) ≈ 0.632
I already hinted that the average time between customer arrivals is (1/λ). In the appendix, I’ll prove that this is the case using the definition of expected value. In the preceding calculation, 20 minutes happens to be the mean time between customer arrivals. You can see in the integral calculation that the λ’s “cancel out,” which tells us that there is a 63.2% chance that the next arrival occurs before the mean arrival interval, regardless of λ. This is counterintuitive: shouldn’t there be a 50-50 chance that the next customer arrives before (or after) the average arrival time? That’s the difference between the median and the mean. The fact that more arrivals occur before the average time than after reflects the skewness of the exponential distribution.
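You can check both facts numerically. Here’s a short Python sketch (the names and the choice of `random.expovariate` are mine, not from the original post) that draws a large sample of inter-arrival times with λ = 3 and verifies that the mean is about 1/3 hour and that about 63.2% of gaps fall below the mean:

```python
import random

random.seed(42)
lam = 3.0  # average customers per hour

# Draw many exponential inter-arrival times; expovariate(lam) has mean 1/lam.
gaps = [random.expovariate(lam) for _ in range(100_000)]

mean_gap = sum(gaps) / len(gaps)
frac_before_mean = sum(g <= 1 / lam for g in gaps) / len(gaps)

print(f"average gap: {mean_gap:.3f} hours (theory: {1/lam:.3f})")
print(f"share of gaps below the mean: {frac_before_mean:.3f} (theory: 0.632)")
```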

Capacity, utilization, and wait times
Let’s shift gears now and see what the exponential distribution can teach us about operations. Suppose that a new customer arrives at your business exactly every 20 minutes. If you spend 10 minutes with each customer, then the cadence of your day is pretty clear. One customer arrives, and you see them immediately. You spend 10 minutes helping that customer, then you have a 10-minute break before you see the next person. No customers need to wait to be seen. But what if demand were random, with an average of one customer every 20 minutes? In this case, some customers may have to wait. For example, if your second customer arrives eight minutes after your first, then your second customer will have to wait two minutes before being seen. What’s the probability that the second person will have to wait? If we assume that inter-arrival times are exponentially distributed with λ = 3, then the second customer will wait if they arrive less than 10 minutes (= 1/6 of an hour) after the first. The same calculation as before shows that this happens ~39% of the time. Thus, even though there should be plenty of time to service the first customer before the second arrives, the odds are still pretty good that the second person will have to wait.
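That ~39% figure is the same area-under-the-curve calculation as before, just with a 10-minute window. A quick check (variable names are mine):

```python
import math

lam = 3.0          # arrivals per hour
service = 10 / 60  # 10-minute service time, in hours

# P(next arrival lands inside the service window) = 1 - exp(-lam * service)
p_wait = 1 - math.exp(-lam * service)
print(f"P(second customer waits) = {p_wait:.1%}")  # 1 - e^(-0.5) ≈ 39.3%
```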
In the scenario just described, the second customer’s wait (if any) is likely to be short. However, there is a nightmare scenario where customers keep arriving earlier than the average time, and thus the queue keeps growing. The easiest way to assess the cumulative impact of random inter-arrival times is to simulate the situation. I did this in R. (Code available upon request.) The idea is straightforward. First, draw 100 random numbers from an exponential distribution to represent the inter-arrival times of 100 customers. Next, take the cumulative sum of this vector to get the true arrival times for each person. Then, loop through each customer to figure out the time at which they get served. During this step, you also figure out how many people are waiting in line when each customer arrives. Using that framework, I computed the following metrics:
Average wait time = 3.4 minutes
Longest the line ever got = 3 customers
Average queue length = 0.58 customers
Average time between customer arrivals = 20.5 minutes
Percentage of customers who need to wait = 48%
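The original R code isn’t shown here, but the procedure described above is easy to sketch in Python (this is my own port, under the post’s assumptions: λ = 3 arrivals/hour, a fixed 10-minute service time, one server, 100 customers; exact numbers will differ run to run):

```python
import random

random.seed(1)
lam = 3.0          # arrivals per hour
service = 10 / 60  # deterministic 10-minute service time, in hours
n = 100

# Step 1: inter-arrival times; Step 2: cumulative sum -> arrival times.
gaps = [random.expovariate(lam) for _ in range(n)]
arrivals = []
t = 0.0
for g in gaps:
    t += g
    arrivals.append(t)

# Step 3: loop through customers to find when each one starts service.
server_free = 0.0
waits = []
for a in arrivals:
    start = max(a, server_free)  # wait only if the server is still busy
    waits.append(start - a)
    server_free = start + service

avg_wait_min = 60 * sum(waits) / n
pct_waiting = sum(w > 1e-12 for w in waits) / n
print(f"average wait: {avg_wait_min:.1f} min; {pct_waiting:.0%} of customers wait")
```

Bumping `service` up toward the 20-minute average gap reproduces the blow-up described next.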
Although these stats are unlikely to cause a riot, we can see that the variability does force some customers to wait. These numbers assume a service time of 10 minutes, compared to the average inter-arrival time of 20 minutes. What if service instead took 15 minutes? Or 19 minutes? I reran the simulation with that change and observed the following:
As you can see, the time spent waiting increases severely as the service time gets closer to the average inter-arrival time. In fact, the wait times get far worse in the last case if we keep running the clock: with 1,000 customers, the average wait time balloons to 2.5 hours, and the line peaks at 31 customers! In other words, 100 customers isn’t enough for the process to “stabilize” in the last case (nor in the middle one). Importantly, in all three cases we would be able to serve everyone with no wait times, were it not for the randomness in arrivals.
To formalize things a little, let’s define the capacity (μ) of the shop. Since each customer can be served in 10 minutes (under the initial assumption), the capacity of the server is μ = 6 customers per hour. The average demand is 3 customers per hour, so we say that the utilization (ρ) is (3 / 6) = 50%.2 The punchline from the above table can then be restated:
As utilization increases, average wait times and queue lengths increase as well. They blow up as utilization nears 100%.
The situation gets considerably worse if you treat the service time as variable. (I assumed that each customer was serviced in exactly 10 minutes.) The exponential distribution can also be used to model service time. If you make that assumption, the average long-term wait time W is known exactly:

W = ρ / (μ(1 − ρ))
In particular, W approaches infinity as ρ approaches 100%. (Note that if you are operating over capacity, your wait line will grow without bound. The customers arrive faster than you can serve them.) What can you do with this information if you’re the owner of the shop? The most impactful thing would be to increase your capacity, either by adding servers or making the existing average service time shorter. This impacts both ρ and μ. The other thing you can do is reduce the variability in your service time. You can’t tell from the formula I wrote, but the average wait time goes down by about 50% if the service time has standard deviation 0, for fixed μ and ρ.
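The blow-up near 100% utilization is easy to see by plugging a few values of ρ into the formula (the helper function below is my own, keeping the shop’s capacity of μ = 6 customers/hour):

```python
mu = 6.0  # capacity: customers per hour

def avg_wait_hours(rho, mu):
    # Average long-term wait with exponential service: W = rho / (mu * (1 - rho))
    return rho / (mu * (1 - rho))

for rho in (0.5, 0.75, 0.95, 0.99):
    print(f"rho = {rho:.0%}: average wait = {60 * avg_wait_hours(rho, mu):.0f} min")
```

At ρ = 50% the average wait is 10 minutes; by ρ = 99% it is over 16 hours.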
That’s all for today. I hope you leave with a newfound appreciation for the exponential distribution. Please subscribe and share with your friends for more content like this.
Appendix: Mean and variance of the exponential model
As long as you’re comfortable with improper integrals (and integration by parts), it’s not too bad to calculate the mean and standard deviation of the exponential distribution. Let’s start with the mean:

E[X] = ∫₀^∞ x · λe^(−λx) dx = [−xe^(−λx)]₀^∞ + ∫₀^∞ e^(−λx) dx = 0 + (1/λ) · ∫₀^∞ λe^(−λx) dx = 1/λ
In the last step, I multiplied and divided by λ because I knew that the integral of the PDF would be 1. If you want the integration by parts practice, I set u = x, and dv = λe^(−λx) dx.
To compute the variance, I’ll use the formula Var[X] = E[X²] − (E[X])².

E[X²] = ∫₀^∞ x² · λe^(−λx) dx = [−x²e^(−λx)]₀^∞ + 2 · ∫₀^∞ xe^(−λx) dx = (2/λ) · ∫₀^∞ x · λe^(−λx) dx = (2/λ)(1/λ) = 2/λ²

Var[X] = 2/λ² − (1/λ)² = 1/λ²
For this integration by parts, u = x², and dv = λe^(−λx) dx. This shows that the mean and standard deviation of inter-arrival times are both (1/λ), which suggests that the arrival times are highly variable.
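If you’d rather trust a computer than an improper integral, a Monte Carlo check (my own sketch, with λ = 3) confirms that the sample mean and sample standard deviation both land near 1/λ:

```python
import math
import random

random.seed(7)
lam = 3.0
xs = [random.expovariate(lam) for _ in range(200_000)]

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
sd = math.sqrt(var)
print(f"mean = {mean:.4f}, sd = {sd:.4f}, 1/lam = {1/lam:.4f}")
```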
The proof that memorylessness forces the exponential distribution is a delectable differential equations argument. I’m not going to make you read it, but please let me know if you want a write-up. (Or read the Wikipedia article.)
Don’t ask me why Substack’s rho (ρ) looks like it went through a blender. Obviously it’s too late to switch, but doing math in this editor can be a real PITA.