|
Article Excerpt We consider a newsvendor problem with partially observed Markovian demand. Demand is observed if it is less than the inventory. Otherwise, only the event that it is larger than or equal to the inventory is observed. These observations are used to update the demand distribution from one period to the next. The state of the resulting dynamic programming equation is the current demand distribution, which is generally infinite dimensional. We use unnormalized probabilities to convert the nonlinear state transition equation to a linear one. This helps in proving the existence of an optimal feedback ordering policy. So as to learn more about the demand, the optimal order is set to exceed the myopic optimal order. The optimal cost decreases as the demand distribution decreases in the hazard rate order. In a special case with finitely many demand values, we characterize a near-optimal solution by establishing that the value function is piecewise linear.
Key words: unobserved unmet demand; Markovian demand; newsvendor problem MSC2000 subject classification: Primary: 90B05; secondary: 93C41, 49L20 OR/MS subject classification: Primary: inventory/production, uncertainty, stochastic; secondary: dynamic programming/optimal control, models
1. Introduction. The newsvendor problem studies the optimization of the inventory level at the beginning of a sales season to meet the demand during the season (e.g., p. 961 of Hillier and Lieberman [10] and p. 342 of Chopra and Meindl [6]). When the inventory level is more than the demand during the season, costs are incurred on the basis of the leftover inventory at the end of the season. Otherwise, costs are incurred depending on the unmet demand during the season. Although there is an extensive literature on this problem, only recent work has started to emphasize the unobservability of the unmet demand.
We consider a multiperiod newsvendor problem in which the demand in each period is observed fully when it is met from the available inventory. Otherwise, only the event that the demand is larger than or equal to the inventory is observed. When the underlying demand distribution is not known, but estimated from the demand observations, such partial demand observations limit the data available for estimation as well as for optimization. This class of problems is called estimation and/or optimization with censored (demand) data.
Ding et al. [8], Lu et al. [12], and Bensoussan et al. [4] study a multiperiod newsvendor model with censored demand, which is taken to be independently and identically distributed. By assuming that the leftover inventories are salvaged and unfilled demands are lost in each period, they decouple the periods from the viewpoint of inventory, but not from that of the Bayesian demand updates. That is, the state of the system becomes only the distribution of the demand, which is updated in each period based on the partial observations available at that time. Prior to these authors, Lariviere and Porteus [11] obtained similar results, but for a more restricted case of exponential demand distributions with gamma conjugate priors.
Unlike Ding et al. [8] and Lariviere and Porteus [11], this paper models the demand with a stationary Markov process whose transition probability is known. Furthermore, we develop a Zakai-type equation (Zakai [17]) for the evolution of the probability distribution of the demand over time. This facilitates the analysis of the dynamic programming equation for the problem. We prove that the value function is the unique solution of the DP equation and we show that there exists an optimal feedback policy for the problem. Furthermore, we establish that the optimal order quantity is at least as large as that in a myopic solution.
The problem studied in this paper can be classified as an example of problems with partial observations (Bensoussan [1], Bensoussan et al. [2, 4], Monahan [13]). A related example is also given by Treharne and Sox [16]. They have a periodic-review inventory model with Markov-modulated demands. The state of this demand is not known, and is estimated in a Bayesian fashion by using the observed sale in each period.
The plan for this paper is as follows. In the next section, we obtain the evolution equation for the demand distribution. In [section] 3, we provide a dynamic programming equation to find the optimal order quantity, and simplify the equation by using the unnormalized probabilities. Next, we establish the existence of an optimal feedback policy, and provide an equation satisfied by the optimal order quantity. In [section] 5, we compare the optimal and myopic solutions and establish that the value function is monotone in hazard rate order. We study the case of the demands taking a finite number of values in [section] 6, and conclude the paper in [section] 7.
2. Evolution of demand distribution. Let ([OMEGA], F, P) be the probability space and let n > 1 be the indices for the periods. Let [x.sub.n] > denote the demand occurring at the beginning of period n. The demand is modeled by a Markov process with the transition probabilities given by
p(x | [xi]) := P([x.sub.n+1] = x | [x.sub.n] = [xi]).
The inventory available to satisfy the demand [x.sub.n], or a part thereof, is called [y.sub.n]. We can think of y, to be the order placed and delivered at the beginning of period n before the nth period demand [x.sub.n] arrives. Then the amount [z.sub.n] of sales is given by
[z.sub.n]:=min{[x.sub.n],[y.sub.n]}. (1)
When [x.sub.n] [y.sub.n], the inventory is not sufficient to meet the demand in period n. In that case, the amount of sales is [y.sub.n] and [x.sub.n] - [y.sub.n] is the unmet demand. When the demand is not met, the magnitude of the unmet demand is not observed by the inventory manager (IM). Indeed, the IM observes only the sales.
Let [F.sub.n] be the sigma algebra generated by the sales {[z.sub.j]: j [less than or equal to] < n}, i.e.,
[F.sub.n]:=[sigma]([z.sub.1], ... ,[z.sub.n]).
Thus, [F.sub.n] is the history available to the IM at the end of period n. Because the IM decides on y, at the beginning of period n, [y.sub.n] is [F.sub.n-1] measurable. However, [x.sub.n], being partially observed, is not in general [F.sub.n] measurable.
Let the function L(x, y), which depends on the demand x and the available inventory level y ordered to meet the demand, denote the one-period cost function. Ding et al. [8] assume that excess inventory in a period, over and above the demand, is salvaged, and the unmet demand in a period is lost. This results in the one-period cost function
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (2)
where h, c, and b are, respectively, the salvage value per unit, the ordering cost per unit, and the shortage cost per unit. It is reasonable to assume that [less than or equal to] h < c < b. We use the same cost function and also observe that
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (3)
With the discount factor < [alpha] < 1 and with y defining the sequence of inventory levels y = {[y.sub.1], [y.sub.2], ...}, our objective is to choose y so as to minimize
J(y):=[[infinity].sub.n=1][[alpha].sup.n-1]EL([x.sub.n], [y.sub.n]). (4)
A standard assumption in the infinite-horizon inventory literature with identically and independently distributed demands or Markovian demands is that the mean demand is finite. This ensures a finite value function. Because in our model, demand could grow over time, we must make an assumption to limit the rate of demand growth. Specifically, we assume E([x.sub.1]) < [infinity] and
E([x.sub.n+1]|[x.sub.n] = [xi]) = [[integral].sup.[infinity].sub.0] xp(x|[xi])dx[less than or equal to][c.sub.0][xi] for n [greater than or equal to] 1, (5)
for a constant [c.sub.0] < 1/[alpha]. Note that if the demand process is a supermartingale, then (5) is satisfied with [c.sub.0] = 1. These conditions ensure that [[summation].sup.[infinity].sub.n=1][[alpha].sup.n-1]E([x.sub.n] < [infinity]. By (3), this sum, when multiplied by the unit shortage cost b, is greater than or equal to the total discounted cost associated with the policy of ordering zero in every period. This cost being an upper bound on the value function ensures a finite value function. Later, in [section] 4.1, we restate the inequality part of (5) in the form of (19), which is used for the subsequent analysis in the paper.
Let [[pi.sub.n](x) = P([x.sub.n] = x | [F.sub.n-1]) be the probability density function of the demand [x.sub.n]. This density materializes at the beginning of period n after observing [z.sub.n-1]. The corresponding cumulative density function is denoted by [[PI].sub.n]. Starting with a given [[pi].sub.1], we can evolve this distribution over time as
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (6)
This evolution is justified rigorously in Appendix A. The first and the second terms on the right-hand side of (6) correspond, respectively, to the events [[x.sub.n] > [y.sub.n]] and [x, < y,,]. In the first event, the demand is more than or equal to the inventory, so it is not observed. Without observing the current demand [x.sub.n], the distribution [[pi].sub.n+1] is found by updating [[pi].sub.n] in a Bayesian manner. In the second event [[x.sub.n] < [y.sub.n]], the demand is observed as [x.sub.n] = [z.sub.n], and therefore [[pi].sub.n+1] = p(x | [z.sub.n]). Once the demand is observed as [x.sub.n] = [z.sub.n], the past inventory decisions {[y.sub.1], ... ,[y.sub.n]} do not affect the distribution of [x.sub.n+1], which is directly obtained as p(x | [z.sub.n]) by using the fact that the demand is a Markov process.
We have thus derived the equation for the evolution of the conditional distribution of the demand. With this distribution as the state variable, we now proceed to derive the dynamic programming equation for our newsvendor problem.
3. Dynamic programming. We begin with
EL([x.sub.n+1], [y.sub.n+1]) = E[E(L([x.sub.n+1], [y.sub.n+1])|[F.sub.n-1], [z.sub.n])] = E [integral] L(x, [y.sub.n+1])[[pi].sub.n+1](x)dx.
When an integral is taken over [0, [infinity]), we suppress the limits to save on notation. From (4) and (6), we define the value function V(*) as follows:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (7)
By the optimality principle and (6),
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (8)
where
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (9)
Note that we obtain the first term on the right-hand side of (9) by taking the expectation of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]. The argument of V in this first term does not involve x, so that it does not enter into the expectation operation. Physically, it means that under [[x.sub.1] [greater than or equal to] y], [x.sub.1] is not observed, and so the event [[x.sub.1] [greater than or equal to] y] itself determines [[pi].sub.2]. This fact is reflected in the argument of V. In obtaining the second term, we see that under the event [[x.sub.1] < y], we have [x.sub.1] = [z.sub.1]. That is why [z.sub.1] is present in the argument of V in the second term. Hence, V stays inside the integral. These observations imply that the value of y that minimizes over all production-inventory costs also takes into account the "optimal" amount of censoring.
Finally, substituting (9) into (8), we obtain the DP equation
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]...
|