We study the problem faced by a monopolistic company that is dynamically pricing a perishable product or service and simultaneously learning the demand characteristics of its customers. In the learning procedure, the company observes the sales history over consecutive learning stages and predicts consumer demand by applying an aggregating algorithm (AA) to a pool of online stochastic predictors. Numerical implementation uses finite-sample distribution approximations that are periodically updated using the most recent sales data. These are subsequently altered with a random step characterizing the stochastic predictors. The company's pricing policy is optimized with a simulation-based procedure integrated with AA. The methodology of the paper is general and independent of specific distributional assumptions. We illustrate this procedure on a demand model for a market in which customers are aware that pricing is dynamic, may time their purchases strategically, and compete for a limited product supply. We derive the form of this demand model using a game-theoretic consumer choice model and study its structural properties. Numerical experiments demonstrate that the learning procedure is robust to deviations of the actual market from the model of the market used in learning.
Subject classifications: marketing: pricing; games: stochastic; artificial intelligence: online learning.
Area of review: Revenue Management.
History: Received March 2006; revisions received September 2006, April 2007, August 2007, December 2007; accepted January 2008. Published online in Articles in Advance January 21, 2009.
1. Introduction
One of the fundamental challenges faced by any company is that of assessing customer response to changes in the price of the products or services it sells, and this task is particularly important for organizations that are experimenting with controlled dynamic pricing. Fortunately, frequent price changes also produce frequent opportunities for measurement of customer responses and, in principle, the possibility of obtaining near real-time estimates for customer demand models. In this paper, we develop an approach to this type of online learning of customer behavior; that is, learning that takes place as sales unfold. The approach works with a discrete-time approximation to the sales process and can be applied to learning the parameters of any demand model that produces estimates of consumer purchase probability at each time step of the approximation. An important aspect of the method is that learning is integrated with pricing so that pricing policy formation and consumer demand prediction proceed concurrently.
Our approach is based on the Aggregating Algorithm (AA)--a particularly general method for online learning developed by Vovk (1990). To illustrate its generality, we apply it to a model of consumer behavior that allows for strategic consumers who know that pricing is dynamic and may delay their purchases to times of anticipated lower price. The possibility of such strategic behavior has been of increasing interest recently because of the rapid growth in information available to customers through Internet sales channels and "price-shopper" websites. In many cases, customers are able to monitor both prices and availabilities of products over time and may develop accurate guesses about a company's future prices. Ironically, companies that employ carefully controlled dynamic pricing may be more vulnerable to strategic consumer behavior than companies employing ad hoc price adjustments because controlled dynamic pricing can lead to pricing policies with regular features; for example, monotone-decreasing prices in situations where "price skimming" appears to be optimal (see Besanko and Winston 1990).
Dynamic pricing is properly viewed as one approach to the general problem of revenue management, and there is now an extensive literature on revenue management and related practices. For surveys, see Bitran and Caldentey (2003), Elmaghraby and Keskinocak (2003), and McGill and van Ryzin (1999). Broad discussions of revenue management can be found in recent books by Talluri and van Ryzin (2004) and Phillips (2005). Many revenue management applications depend on forecasts of consumer behavior that are generated from stochastic models of demand, for example, demand as a function of time and price. Unfortunately, such stochastic demand-response models typically assume characteristics of demand that cannot be known precisely in practice. This uncertainty in demand characteristics has long been recognized in economics, marketing, pricing, inventory management, and revenue management, and there have been efforts to develop methods for learning of demand-response functions over time. For example, Balvers and Cosimano (1990) study a pricing problem with learning of demand that is a linear function of price. The model does not consider any limits on sales due to inventory levels. Carvalho and Puterman (2005) also study learning and pricing when capacity is unlimited. The authors consider a finite time horizon and focus on specific parametric forms of the customer arrival distribution and the probability of sale (both the number of arrivals and the actual sales are observable). The parameters are assumed to be fixed and unknown. The authors explore a trade-off between learning and pricing using a "one step look ahead" heuristic based on a two-period version of the problem. Petruzzi and Dada (2002) consider a stocking and pricing model with a fixed but unknown perturbation of some given demand function.
Papers by Bertsimas and Perakis (2006), Aviv and Pazgal (2005a), and Lin (2006) study the pricing of a fixed stock of items over a finite horizon with demand learning. Bertsimas and Perakis (2006) consider learning of all demand characteristics, including the price sensitivity, but assume a linear demand model with normal perturbations. The other two papers assume a known reservation price distribution. Aviv and Pazgal (2005b) present a general framework for dynamic pricing when stochastic properties of demand are affected by the current state of the world. The number of possible states considered by the authors is finite. They use partially observed Markov decision processes as a modeling basis and information-structure modification heuristics to provide a tractable implementation. A recent work by Besbes and Zeevi (2007) considers a joint learning and pricing method for a network revenue management problem involving multiple products utilizing multiple resources. The approach assumes a Poisson model with demand rates determined by unknown functions of price. While the model of demand in their paper is nonparametric, the authors simplify the problem by only considering demand rates that do not depend explicitly on time (in contrast to this paper). The policies considered involve a "brief" period of learning (experimentation with prices selected from a grid of prices) followed by static pricing. The authors establish asymptotic optimality of the policy given that the resource capacities and the demand rates simultaneously tend to infinity.
All the learning-focused papers cited above consider restricted forms of demand models that do not consider potentially complex consumer behavior. The prior work on dynamic pricing, which assumes known demand models, has allowed for varying degrees of consumer sophistication. For example, the classical model by Gallego and van Ryzin (1994) assumes myopic consumers who make a purchase as soon as the price is below their valuation for the product, whereas other models allow for strategic consumers who may benefit by delaying their purchase decisions (see Besanko and Winston 1990, Elmaghraby et al. 2008, Aviv and Pazgal 2008, Liu and van Ryzin 2008, Su 2007, and Levin et al. 2005). In the case of strategic consumers, the demand model should also capture competition between the customers if the product supply is limited. Although the case of myopic consumers is amenable to existing learning approaches, it is difficult to extend these approaches to the instances of more complex consumer behavior, in particular, strategic behavior. Indeed, one of the typical approaches in dynamic pricing with demand uncertainty (with or without learning) is policy optimization by dynamic programming techniques, but the complexity of demand learning with strategic consumers renders an exact dynamic programming approach computationally intractable.
The main contribution of this paper is the presentation of an integrated procedure to both determine prices and estimate customer behavior under general parametric uncertainty. We accomplish this with an adaptation of the Aggregating Algorithm (AA) of Vovk (1999). The AA methodology belongs to the class of online methods and was originally developed to address the problem of combining expert advice (Vovk 1999). Similar techniques have been applied to the problem of online portfolio selection since the work of Cover (1991).
In our online approach, the company observes the sales history over consecutive learning stages and predicts future demand by applying the AA to a pool of stochastic predictors. Numerical implementation uses finite-sample approximations to the pool of predictors. These are periodically updated using the most recent sales data and are subsequently altered by a random step that maintains diversity of the predictors. This is similar to a method applied to online portfolio selection by Levina (2004). The company's pricing policy is optimized by a simulation-based method that is integrated with AA.
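The update cycle described in this paragraph can be illustrated with a minimal sketch. It is not the implementation used in the paper. It assumes, purely for illustration, that each predictor in the finite sample is a single purchase probability `theta` per time step, that predictors are reweighted by the likelihood of each observed outcome (a Bayes-style mixture), and that the random step is Gaussian. All function names and numerical values are hypothetical.

```python
import random


def bernoulli_likelihood(theta, sold):
    """Probability of the observed outcome if the purchase probability is theta."""
    return theta if sold else 1.0 - theta


def update_weights(weights, likelihoods):
    """Multiply each weight by its predictor's likelihood, then renormalise."""
    posterior = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(posterior)
    if total == 0.0:
        # The outcome was impossible under every predictor; keep the old weights.
        return list(weights)
    return [p / total for p in posterior]


def mixture_prediction(thetas, weights):
    """Weighted average of the predictors' purchase probabilities."""
    return sum(w * t for w, t in zip(weights, thetas))


def resample_and_perturb(thetas, weights, step, rng):
    """Redraw the sample by weight, then add a random step to keep it diverse.

    The Gaussian step and the clipping to (0, 1) are assumptions of this sketch.
    """
    drawn = rng.choices(thetas, weights=weights, k=len(thetas))
    eps = 1e-6
    moved = [min(1.0 - eps, max(eps, t + rng.gauss(0.0, step))) for t in drawn]
    return moved, [1.0 / len(moved)] * len(moved)


# Hypothetical learning stage: five predictors, four observed time steps.
rng = random.Random(0)
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
posterior_weights = [1.0 / len(thetas)] * len(thetas)
for sold in [True, True, False, True]:
    likelihoods = [bernoulli_likelihood(t, sold) for t in thetas]
    posterior_weights = update_weights(posterior_weights, likelihoods)

prediction = mixture_prediction(thetas, posterior_weights)
new_thetas, new_weights = resample_and_perturb(thetas, posterior_weights, 0.05, rng)
```

After three sales in four steps, the weight concentrates on the predictors near 0.7, and the resampled pool starts the next stage with equal weights around the values the data favoured.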
We illustrate the versatility of this integrated procedure on a demand model for a complex market in which customers are aware that pricing is dynamic, may time their purchases strategically, and are competing for a limited product supply. The model of consumer demand used in this illustration is adapted from a game-theoretic, strategic consumer-choice model described in Levin et al. (2005). In that model, a fully rational consumer's decision is characterized as a probability of purchase at each time and state of the sales process. Summation of these probabilities across consumers defines the demand model. A key departure from that model in this paper is that we assume limited rationality of consumers with respect to anticipated future prices.
A number of structural properties of the revised consumer-choice model are relevant to the implementation of online learning described here. In particular, we show that strategic consumer response to price is inherently dependent on time and the remaining capacity of the firm. We also show that this model supports an intuitively appealing decision rule for strategic consumers--that they will attempt to purchase when their consumer surplus from an immediate purchase is greater than the discounted expected surplus from all future purchasing opportunities. The expected surplus is thus identified as a key component of strategic consumer behavior. We then derive important properties for the expected surplus that can be used to construct an empirical consumer demand model. (Such an empirical approximation is needed because exact computation of the expected surplus, when combined with online learning, is not practical in problems of realistic size.) Numerical experiments demonstrate that the learning procedure is robust to discrepancies between the detailed strategic market response and the model of that response used in learning.
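The decision rule stated above can be contrasted with the myopic rule from the classical model in a short sketch. The expected future surplus is treated here as a given input, whereas the paper derives its properties and approximates it empirically. The function names and all numbers are hypothetical.

```python
def myopic_buys(valuation, price):
    """Myopic consumer: buys as soon as the price is below the valuation."""
    return price < valuation


def strategic_buys(valuation, price, discount, expected_future_surplus):
    """Strategic consumer: attempts to buy only if the surplus from buying now
    exceeds the discounted expected surplus from future opportunities."""
    immediate_surplus = valuation - price
    return immediate_surplus > discount * expected_future_surplus


# Hypothetical consumer: valuation 100, current price 80, discount factor 0.9.
buys_if_myopic = myopic_buys(100.0, 80.0)
# Immediate surplus is 20; waiting is worth 0.9 * 30 = 27, so the consumer waits.
buys_when_waiting_is_attractive = strategic_buys(100.0, 80.0, 0.9, 30.0)
# Waiting is worth only 0.9 * 10 = 9, so the consumer attempts to buy now.
buys_when_waiting_is_poor = strategic_buys(100.0, 80.0, 0.9, 10.0)
```

The same price and valuation lead to different decisions depending on the expected future surplus, which itself depends on time and remaining capacity in the model described here.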
This paper is organized as follows. In §2, we discuss a general class of time, inventory, and price-dependent demand models that includes the case of strategic consumers. In §2.2, we outline a simple Bayesian approach to online learning of the parameters of the general demand model for any pricing policy. In §2.3, we discuss a specialization of the AA that implements demand learning using a general Bayesian approach with finite-sample approximations. Pricing policy optimization is addressed in subsequent sections. In §3, we identify restricted pricing policy classes that are practical to implement and facilitate dynamic pricing with demand learning. In §4, we show how learning can be integrated with optimization of the pricing policy through an online procedure that utilizes the AA for learning and simulation-based optimization for pricing. In §5, we discuss the application of this procedure to...