|
Article Excerpt Nonlogit maximum-likelihood estimators are inconsistent when using data on a subset of the choices available to agents. I show that the semiparametric, multinomial maximum-score estimator is consistent when using data on a subset of choices. No information is required for choices outside of the subset. The required conditions about the error terms are the same conditions as for using all the choices. Estimation can proceed under additional restrictions if agents have unobserved, random consideration sets. A solution exists for instrumenting endogenous continuous variables. Monte Carlo experiments show the estimator performs well using small subsets of choices.
1. Introduction
* Demand estimates are typically inconsistent when estimation uses data on a subset of the choices available to agents in the data-generating process. This article develops an unordered, discrete-choice estimator that is consistent with data on a subset of the available choices.
One motivation for estimation using data on a subset of choices is data availability. Chevalier and Goolsbee (2003) and Bajari, Fox, and Ryan (2006) use data on purchases from the online retailer Amazon. Not all products in a given retail category are offered on Amazon. One assumption is that a consumer sees all products, both on and offline, and only buys on Amazon if the chosen product is offered by Amazon. Using data from one retailer imposes choice-based sampling.
Another motivation for using a subset of all choices is that the computational burden of estimating discrete-choice models increases with the size of the choice set. Large choice sets occur often in applications. Train, McFadden, and Ben-Akiva (1987) estimate the demand for telephone calling plans, where each plan is a combination of several options such as a monthly fee, usage charge, and so forth. Bayer, McMillan, and Reuben (2004) estimate a housing-demand model where the choices are hundreds of thousands of houses in a large metropolitan area. Bajari and Fox (2007) study an FCC spectrum auction, where the combinatorics of gathering individual spectrum licenses for sale into packages of multiple licenses results in a choice set with more elements than the number of atoms in the universe.
The only consistent estimators that use data on a subset of choices, computing choice probabilities as though the subset used in estimation were the true choice set, have been developed by McFadden (1978) and Bierlaire, Bolduc, and McFadden (2006). These estimators rely on known closed-form choice probabilities for a class of discrete-choice models characterized by the error terms having a "block additive generalized extreme value" (GEV) distribution. The block additive GEV class includes only the pure multinomial logit among well-known estimators; the class does not include the nested logit and methods with auxiliary distributions for agent heterogeneity, such as the random coefficients and mixed logit.
The GEV class is defined to include only choice models where the marginal distributions for the error terms are type I extreme value with a common scale parameter. The type I extreme value distribution is restrictive. For example, it rules out the economically interesting possibility that the marginal distribution has multiple modes: either a consumer hates or loves a product.
Although I am motivated by the same concerns as McFadden (1978), namely computational burden and missing data on choice characteristics, this article uses a different set of mathematical tools to address estimation. I work with semiparametric discrete-choice models, where the term "semiparametric" refers to the fact that I specify a set of parameters to estimate, but I do not specify a particular functional form for the distribution of the error term. Working with semiparametric models forces me to consider identification using properties of models that hold across a wide range of possible distributions for the error terms, rather than imposing a known function to directly compute choice probabilities. A semiparametric proof of consistency is also a constructive proof of semiparametric identification, so this article clarifies the identification of multinomial choice models using data on a subset of choices.
The pioneering work of Manski (1975) introduces semiparametric maximum-score estimation for discrete-choice models. Maximum score estimators are consistent when the choice probabilities for a given agent are rank ordered by the agent's deterministic choice payoffs. The property of rank ordering choice probabilities by non-stochastic payoffs can hold across a wide number of distributions for the error terms, and is the key to semiparametric identification and estimation using maximum-score methods. This article's major contribution is to prove that multinomial maximum score is consistent using data on a subset of choices. The reason is that when one conditions on observations that selected a choice from a subset, the choice probabilities in the subset are still rank ordered by the deterministic payoffs. All maximum score needs is rank ordering, and so maximum score is consistent when the researcher has choice and covariate data on a subset of the choices available to decision makers. More choices could improve the finite-sample accuracy of the estimates, but only two choices are needed for consistency. Although I modernize and therefore weaken the sufficient conditions on the covariates and error terms needed for consistency of multinomial maximum-score estimators in Manski (1975), the assumptions for consistency using a subset of choices are the same as or weaker than the conditions for using data on all choices.
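The conditioning argument in the preceding paragraph can be written out in one line. As a sketch, write P(i) for the probability that an agent selects choice i from the full choice set J, and let S be the subset of choices with data; the symbol S and the shorthand P(i) are introduced here for illustration and are not the article's notation. Assuming the agent selects from S with positive probability, for i and j in S:

```latex
\[
P(i \mid \text{selection} \in S)
  = \frac{P(i)}{\sum_{k \in S} P(k)},
\qquad\text{so}\qquad
P(i \mid \text{selection} \in S) > P(j \mid \text{selection} \in S)
\iff P(i) > P(j)
\iff x_i'\beta > x_j'\beta .
\]
```

The denominator is common to every choice in S, so dividing by it cannot reorder the probabilities. The final equivalence is the rank-ordering property for the full choice set, with x_i'β denoting the deterministic payoff of choice i.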
I introduce a multinomial maximum-score estimator that focuses on comparisons between the deterministic payoffs of pairs of choices. I call the estimator pairwise maximum score. Pairwise maximum score makes full use of the rank-order property driving identification and consistency by considering the relative ranking of choice probabilities for, if included, all pairs of choices. By using the restrictions of the rank-order property for comparisons of all pairs of choices, pairwise maximum score may improve the finite-sample accuracy of the estimates.
Evaluating the pairwise maximum-score objective function requires only addition, multiplication, and pairwise comparison. Further, the estimator is consistent using covariate and choice data on a subset of choices in the true model. Compare this to estimating a semiparametric or parametric maximum-likelihood method using covariate data on all choices in the true model. In maximum likelihood with a large number of choices, a fitted probability that is a function of perhaps millions of covariates needs to be computed for each parameter value. Computing this fitted probability can be a daunting numerical challenge that involves evaluating densities in the far right tails. The role of maximum score as a computationally simple alternative to maximum likelihood differs from the usual pitch for semiparametric methods, which concerns relaxing distributional assumptions.
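To illustrate how little arithmetic the objective needs, the following is a minimal Python sketch of one plausible form of a pairwise maximum-score objective. It is an assumption-laden illustration, not the article's formal definition: the function name `pairwise_max_score`, the data layout, and the choice to average over agents are all hypothetical. For each agent, it counts the observed alternatives whose deterministic payoff falls strictly below that of the selected choice.

```python
def pairwise_max_score(beta, observations):
    """Average number of pairwise payoff comparisons 'won' by the selected choice.

    beta:         list of d parameters (hypothetical layout).
    observations: list of (x, chosen) pairs, where x is a list of covariate
                  vectors, one per choice in the observed subset, and chosen
                  is the index in x of the agent's selection.
    """
    total = 0
    for x, chosen in observations:
        # Deterministic payoff x_i' beta for each observed choice:
        # only multiplication and addition.
        payoffs = [sum(b * v for b, v in zip(beta, row)) for row in x]
        # Pairwise comparisons of the selection against every other choice.
        for j, payoff_j in enumerate(payoffs):
            if j != chosen and payoffs[chosen] > payoff_j:
                total += 1
    return total / len(observations)


# Illustrative data: two agents, two covariates per choice.
example_observations = [
    ([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], 0),  # agent 1 selects choice 0
    ([[1.0, 0.0], [0.0, 1.0]], 1),              # agent 2 selects choice 1
]
score = pairwise_max_score([1.0, 0.5], example_observations)
```

No density or integral is evaluated, and choices outside the observed subset never enter the computation. The result is a step function of `beta`, which is why the optimization advice mentioned later concerns maximizing a step function.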
I present implementation advice about how to use pairwise maximum score in applications. The availability of new global optimization routines has decreased the computational difficulty of numerically maximizing a step function. I also introduce a two-stage instrumental variables estimator.
I present Monte Carlo studies about the finite-sample properties of maximum score when using data on a subset of choices. The Monte Carlo experiments compare maximum score to the parametric logit estimator for a true model where the error terms do not have the extreme value distribution, so that the logit is inconsistent.
This article revisits the multinomial maximum-score estimator of Manski (1975). Most other work on maximum-score estimators, including Manski's later work, focuses on the binary (two) choice case. I do not believe multinomial maximum-score methods have been used in any applications other than Briesch, Chintagunta, and Matzkin (2002) and my own (Bajari and Fox, 2007; Bajari, Fox, and Ryan, 2006). By improving and describing new properties of multinomial maximum score, I hope to make maximum score an attractive and practical method that can be used by applied researchers.
2. Rank ordering of choice probabilities
* The identification strategy relies only on the rank ordering of choice probabilities. This section defines rank ordering and provides sufficient conditions on the distribution of errors in a random utility model for rank ordering.
[] Model. The model is completely standard. Consider a single-agent, unordered discrete-choice, random-utility model. An agent makes a choice among i = 1, ..., J products. In a duplication of notation, J refers to both the set of choices and the number of choices. The computational cost in estimation that this article in part addresses arises when J is large. The number and set of choices can vary from agent to agent. Ignoring ties, the agent picks choice i if
$$u_i > u_j \quad \forall j \in J,\ j \neq i. \tag{1}$$
The agent chooses i when the payoff from i exceeds the payoffs from all J - 1 alternatives. If i satisfies (1), I refer to i as the selection.
The payoff from choosing $i \in J$ is
$$u_i = x_i'\beta + \epsilon_i,$$
where $x_i$ is a vector of $d$ covariates, $\beta$ is a vector of $d$ parameters multiplying $x_i$, and $\epsilon_i$ represents the sum of factors the agent feels are important but are unobserved to the econometrician. Let the $J \times d$ matrix $x$ be the observable covariates for all choices. Also let $\epsilon$ be the vector of all $J$ $\epsilon_i$'s.
In applications, $x_i$ is typically formed from the characteristics of choice i and the interactions of the characteristics of choice i with the characteristics of the agent. Point identification requires varying the x's across observations on agents. Variation across agents can come from differences in the characteristics of agents facing the same choice set, or from agents facing different choice sets. (1)
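The decision rule in (1) and the payoff specification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the standard normal errors in `simulate_selection` are an arbitrary choice made for the example, since the model deliberately leaves the error distribution unspecified.

```python
import random


def deterministic_payoffs(x, beta):
    """Return x_i' beta for every choice i, given the J x d covariate rows x."""
    return [sum(b * v for b, v in zip(beta, row)) for row in x]


def select(x, beta, eps):
    """Return the index of the choice with the highest total payoff u_i.

    Ties are ignored in the text; here the lowest index wins a tie.
    """
    u = [p + e for p, e in zip(deterministic_payoffs(x, beta), eps)]
    return max(range(len(u)), key=u.__getitem__)


def simulate_selection(x, beta, rng):
    """Draw one error per choice and apply the decision rule.

    Standard normal errors are an assumption made only for this sketch.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    return select(x, beta, eps)


# Illustrative use: three choices, two covariates.
x_example = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
selection = simulate_selection(x_example, [1.0, 2.0], random.Random(0))
```

Repeating `simulate_selection` over many agents with different covariates would produce the kind of choice data that the estimator takes as input.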
[] The rank ordering property. The estimators in this article are semiparametric, as I will not specify a parametric functional form for $F(\epsilon \mid x, J)$, the distribution function of the $J$ $\epsilon_i$'s. This distinguishes semiparametric estimation from parametric estimators such as the logit and probit, which assume a known functional form for $F(\epsilon \mid x, J)$. Instead, semiparametric estimators require that some particular property of the underlying choice model holds across a range of functional forms for the distribution of the error terms.
For the case of discrete choice, the pioneering work of Manski (1975) introduces the property that a given agent makes choices that have higher deterministic payoffs with greater frequency. The deterministic payoffs of choices rank order the choice probabilities. Manski's formal property is
Assumption 1. For a given agent, and for $i, j \in J$,
$$x_i'\beta > x_j'\beta \tag{2}$$
if and only if
$$P(i \mid x, J, \beta) > P(j \mid x, J, \beta).$$
Here $P(i \mid x, J, \beta)$ is the probability of i being selected. The probability is an integral over the $J$ unknown $\epsilon_i$'s over the domain where the decision rule in (1) is satisfied. The probability $P(i \mid x, J, \beta)$ is a...
|