Scoring rules, generalized entropy, and utility maximization.

Publication: Operations Research
Publication Date: 01-SEP-08

Article Excerpt
Information measures arise in many disciplines, including forecasting (where scoring rules are used to provide incentives for probability estimation), signal processing (where information gain is measured in physical units of relative entropy), decision analysis (where new information can lead to improved decisions), and finance (where investors optimize portfolios based on their private information and risk preferences). In this paper, we generalize the two most commonly used parametric families of scoring rules and demonstrate their relation to well-known generalized entropies and utility functions, shedding new light on the characteristics of alternative scoring rules as well as duality relationships between utility maximization and entropy minimization. In particular, we show that weighted forms of the pseudospherical and power scoring rules correspond exactly to measures of relative entropy (divergence) with convenient properties, and they also correspond exactly to the solutions of expected utility maximization problems in which a risk-averse decision maker whose utility function belongs to the linear-risk-tolerance family interacts with a risk-neutral betting opponent or a complete market for contingent claims in either a one-period or a two-period setting. When the market is incomplete, the corresponding problems of maximizing linear-risk-tolerance utility with the risk-tolerance coefficient [beta] are the duals of the problems of minimizing the pseudospherical or power divergence of order [beta] between the decision maker's subjective probability distribution and the set of risk-neutral distributions that support asset prices.

Subject classifications: decision analysis: theory; probability: entropy; utility/preference: theory; finance: portfolio.

Area of review: Decision Analysis.

History: Received September 2006; revision received March 2007; accepted August 2007. Published online in Articles in Advance August 21, 2008.

1. Introduction

Suppose that there is uncertainty concerning which of a set of n mutually exclusive and exhaustive events will occur, and the initial representation of that uncertainty consists of a "baseline" probability distribution q = ([q.sub.1], ..., [q.sub.n]), which could be the subjective prior distribution of an individual or a distribution obtained from a statistical model or from market prices for contingent claims. If new information is subsequently received from an experiment or an expert's forecast, causing the baseline distribution to be revised to another distribution p, how should the quantity or value of the information be measured?

The need for a quantitative measure of information--or more generally, a practical measure of the distance from one distribution p to some other distribution q--arises in many fields, and the considerable literature on this topic includes (at least) three distinct but intertwined strands: scoring rules, entropy, and decision analysis. Scoring rules are reward functions for eliciting and evaluating probability forecasts, and the expected score associated with a forecast can be interpreted as a measure of the value of the forecaster's information. Entropy is a measure of the channel capacity required to communicate a stream of signals generated by a stationary process, and relative entropy measures the reduction in channel capacity that is possible when new information yields an updated signal distribution. Decision analysis provides a general framework for measuring information in terms of gains in expected utility as well as for determining how to optimally use information to choose portfolios of financial assets.

These information-theoretic tools have been used for many decades, but new applications and theoretical developments have emerged during the last few years on several fronts, including experimental economics, Bayesian statistics, and financial engineering. The objective of this paper is to add to this recent stream of interdisciplinary literature by broadening the concept of a scoring rule to include a not-necessarily-uniform baseline distribution and to show that this leads immediately to tight connections with some well-known measures of divergence (relative entropy) as well as with models of utility maximization in markets under uncertainty. First, in [section]2 it is shown that the power and pseudospherical scoring rules (of which the quadratic and spherical rules are special cases) can be normalized so that they are continuous functions of their power parameter (denoted by [beta]) on the entire real line and weighted by a baseline distribution q to reward updating of probabilities in relative rather than absolute terms. In [section]3, the forecaster's expected gains under these weighted scoring rules are shown to correspond exactly to two well-known parametric families of generalized divergence that both reduce to the Kullback-Leibler divergence at [beta] = 1. Section 4 introduces two canonical decision problems in which an individual with probability distribution p bets optimally against a nonstrategic, less well-informed opponent (or market) with distribution q. The decision maker's utility function is assumed to belong to the normalized linear-risk-tolerance (LRT) family of utility functions, which includes the familiar exponential, logarithmic, and power functions and is indexed by a single parameter, namely, the risk-tolerance coefficient (also denoted by [beta]). The solution of one canonical decision problem with LRT utility is shown to yield the weighted pseudospherical scoring rule and its associated relative entropy measure, with the same value of [beta], while the second canonical problem yields the weighted power scoring rule and its associated relative entropy measure. Section 5 generalizes the results of the earlier sections to the situation in which a decision maker with LRT utility optimally invests in an incomplete market for contingent claims, highlighting the duality between expected-utility maximization and relative-entropy minimization. Concluding comments are given in [section]6.

2. Weighted Scoring Rules

Scoring rules are reward functions for eliciting and evaluating probabilities, and they have played an important role in the foundations of subjective probability theory (de Finetti 1937, 1974; Good 1952; Winkler 1967, 1996; Savage 1971; Lindley 1982) as well as practical applications such as incentive schemes for paying weather forecasters (Brier 1950) and subjects in economic experiments (Selten 1998) and for evaluating the quality of forecasts used in risk analysis (Cooke 1991). Consider an individual (the "forecaster") who is asked to assess a probability distribution over a set of n mutually exclusive and collectively exhaustive events. Let p denote the forecaster's true distribution, let r denote her reported distribution (if different from p), and let [e.sub.i] denote the probability distribution that assigns probability one to event i and zero to all other events, i.e., the indicator vector for event i. A scoring rule is conventionally expressed as a function S(r, p), linear in its second argument, such that the score obtained if event i occurs is S(r, [e.sub.i]), and the forecaster's expected score for reporting r when her true distribution is p is S(r, p) = [[summation].sub.i] [p.sub.i] S(r, [e.sub.i]). It is assumed that the forecaster's objective is to maximize her expected score, which means that either she is risk neutral and S(r, [e.sub.i]) is measured in units of money or else she is not risk neutral and S(r, [e.sub.i]) is measured in units of utility.

The scoring rule is defined to be (strictly) proper if it encourages honest reporting in the sense that S(p, p) [greater than or equal to] S(r, p) for every r and p (with equality only when r = p), so that the forecaster whose true distribution is p maximizes her expected score by truthfully reporting p rather than some other distribution. The forecaster's optimal expected score that is obtained when her distribution is p will be denoted by merely suppressing the first argument: S(p) [equivalent to] S(p, p). A proper scoring rule is uniquely determined by its optimal-expected-score function, as noted by McCarthy (1956) and further elaborated by Hendrickson and Buehler (1971) and Savage (1971). In particular, if S(*) is a differentiable function, then S(*, *) satisfies

S(r, p) = S(r) + [nabla]S(r) * (p - r), (1)

where [nabla]S (r) denotes the gradient of S(*) evaluated at r, and conversely every function S that is (strictly) convex and differentiable uniquely defines a (strictly) proper scoring rule.
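As an illustration of Equation (1), the following minimal sketch builds a scoring rule from a convex expected-score function and checks properness numerically. The choice S(p) = [[summation].sub.i] [p.sub.i]^2 is an assumption made here for concreteness; it yields the quadratic rule in one common normalization, S(r, [e.sub.i]) = 2[r.sub.i] - [[summation].sub.j] [r.sub.j]^2. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def optimal_score(r):
    """Assumed convex expected-score function S(r) = sum_i r_i^2."""
    r = np.asarray(r, dtype=float)
    return float(np.dot(r, r))


def grad_optimal_score(r):
    """Gradient of S at r, i.e. 2r for the assumed quadratic S."""
    return 2.0 * np.asarray(r, dtype=float)


def score(r, p):
    """Expected score S(r, p) = S(r) + grad S(r) . (p - r), as in Equation (1)."""
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    return optimal_score(r) + float(np.dot(grad_optimal_score(r), p - r))


# Illustrative distributions (made up for this example).
p_true = np.array([0.5, 0.3, 0.2])
r_report = np.array([0.4, 0.4, 0.2])

honest = score(p_true, p_true)        # S(p, p), which equals S(p)
dishonest = score(r_report, p_true)   # S(r, p) for a misreport r

# Linearity in the second argument: S(r, p) = sum_i p_i * S(r, e_i).
per_event = np.array([score(r_report, e) for e in np.eye(3)])
linear_form = float(np.dot(p_true, per_event))
```

With this choice of S, the gap S(p, p) - S(r, p) works out to the squared Euclidean distance between p and r, so honest reporting is strictly better whenever r differs from p.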

The expected-score function of a proper scoring rule is closely related to a measure of distance between probability distributions known as a Bregman divergence (Bregman 1967), which generalizes the Kullback-Leibler divergence. Any strictly convex function F defines a Bregman divergence [B.sub.F] (p || r) as follows:

[B.sub.F](p || r) = F(p) - F(r) - [nabla]F(r) * (p - r). (2)
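Equation (2) can be evaluated directly once F and its gradient are supplied. The sketch below is illustrative only: the helper names are hypothetical, and the two choices of F (negative Shannon entropy and the squared Euclidean norm) are assumptions picked because their Bregman divergences are well known, namely the Kullback-Leibler divergence (for probability vectors with strictly positive entries) and the squared Euclidean distance.

```python
import numpy as np


def bregman_divergence(F, grad_F, p, r):
    """B_F(p || r) = F(p) - F(r) - grad F(r) . (p - r), as in Equation (2)."""
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    return float(F(p) - F(r) - np.dot(grad_F(r), p - r))


def neg_entropy(p):
    """F(p) = sum_i p_i log p_i; assumes every p_i > 0."""
    return float(np.sum(p * np.log(p)))


def grad_neg_entropy(p):
    """Gradient of negative entropy: log p_i + 1."""
    return np.log(p) + 1.0


def sq_norm(p):
    """F(p) = sum_i p_i^2."""
    return float(np.dot(p, p))


def grad_sq_norm(p):
    """Gradient of the squared norm: 2p."""
    return 2.0 * p


# Illustrative distributions (made up for this example).
p = np.array([0.5, 0.3, 0.2])
r = np.array([0.4, 0.4, 0.2])

kl_via_bregman = bregman_divergence(neg_entropy, grad_neg_entropy, p, r)
kl_direct = float(np.sum(p * np.log(p / r)))
sq_via_bregman = bregman_divergence(sq_norm, grad_sq_norm, p, r)
```

Both divergences are zero when p = r and positive otherwise, which follows from the strict convexity of F.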

Letting F(p) = S(p), it follows that for any strictly proper scoring rule, the function S(p) - S(r, p), which represents the forecaster's expected loss for reporting r when the true distribution is p, is a Bregman divergence, and vice versa. A Bregman divergence [B.sub.F](p || r) is therefore a decision-theoretic measure of the "information deficit" that is faced by a decision maker who acts on the basis of the distribution r when the true distribution is p. In this capacity, Bregman divergences (and their corresponding strictly proper scoring rules) provide a potentially rich class of loss functions that can be used for robust Bayesian inference, as discussed by Grunwald and Dawid (2004), Dawid (2007), and Gneiting and Raftery (2007). A problem of this kind can be framed as a game against nature in which nature chooses a true distribution p from some convex set P, such as the set of distributions satisfying a mean value constraint. The robust Bayes problem for the decision maker is to determine the distribution r that minimizes her maximum expected loss over all p [member of] P, where the expected loss (in our terms) is the negative expected score -S(r, p). Grunwald...
