...between the theoretical coverage of the high-density predictive interval (HDPI) and the observed coverage than those corresponding to selecting the best model. The performance of the different procedures is illustrated with simulations and some known engineering data.
KEY WORDS: Bayes information criterion; Bayesian model averaging; Fractional Bayes factor; Intrinsic Bayes factor.
1. INTRODUCTION
In many engineering situations where the response variable of interest is a polynomial function of an independent variable, an important problem is to determine the degree of the polynomial. From the frequentist standpoint, the most common approaches are (1) applying a variable selection method (e.g., forward or backward selection), which uses the t statistic for testing the coefficient of the highest-order polynomial, and (2) selecting the model by an order determination criterion, such as that of Akaike (1973) and others. From the Bayesian standpoint, two alternative options are available: (1) determining the order of the polynomial by means of the Bayes factors and (2) using an asymptotic approximation to the posterior model probabilities, such as the criteria of Schwarz (1978), Philips and Guttman (1998), and others.
Although these approaches are very useful for selecting the model that seems to have generated the data, they are less useful for forecasting when there is considerable uncertainty about the degree of the polynomial. In particular, highest posterior density prediction intervals, or confidence intervals for the parameters, may be too narrow, because the uncertainty about the degree of the polynomial is not fully taken into account. In this article we first compare different procedures for computing the posterior probabilities of the different polynomial degrees, and then account for model uncertainty in forecasting by using Bayesian model averaging (BMA).
The main idea of BMA is as follows. Suppose that we have a set of possible models, $M_1, M_2, \ldots, M_K$, that can generate a given dataset $y$. Suppose also that we have prior probabilities $P(M_i)$ and are able to compute the posterior probabilities of the models given the available data, $P(M_i \mid y)$. Then the predictive distribution of a new observation $y_f$ can be obtained by weighting the predictive distribution of each model by its posterior probability $P(M_i \mid y)$. Accordingly, BMA takes into account the uncertainty about the different models, as was pointed out in the seminal work of Leamer (1978). (See Draper and Guttman 1987; George 1999; Draper 1995; Chatfield 1995; Kass and Raftery 1995; Hoeting et al. 1999; Raftery et al. 1997; Fernandez et al. 2002 for different applications of this procedure.)
The probability $P(M_i \mid y)$ is proportional to $p(y \mid M_i)P(M_i)$, and $p(y \mid M_i)$ is obtained by averaging the likelihood over the possible parameter values, which requires prior distributions for the model parameters. If we do not have clear prior information about the parameters and want to use a reference or noninformative prior for them, then the probabilities $P(M_i \mid y)$ cannot be determined. To illustrate this problem, suppose that model $M_i$ depends on a parameter vector $\theta_i$ and that the prior for this parameter vector, $p(\theta_i \mid M_i)$, is improper; that is, $p(\theta_i \mid M_i) \propto g(\theta_i)$, so that $p(\theta_i \mid M_i) = c_i\, g(\theta_i)$, where the integral of $g(\theta_i)$ diverges and the constant $c_i$ therefore cannot be fixed by normalization. Then the marginal distribution of the data when $M_i$ holds is given by
$$p(y \mid M_i) = c_i \int p(y \mid \theta_i, M_i)\, g(\theta_i)\, d\theta_i,$$
and the posterior probability that model $M_i$ holds is
$$p(M_i \mid y) = \frac{c_i}{m(y)} \left\{ \int p(y \mid \theta_i, M_i)\, g(\theta_i)\, d\theta_i \right\} p(M_i), \qquad (1)$$
where $m(y) = \sum_{i=1}^{K} p(y \mid M_i)\, p(M_i)$; with this normalizing constant, $\sum_{i=1}^{K} p(M_i \mid y) = 1$. Thus this probability, which is needed both for choosing among the models and for computing a forecast by BMA, depends on the unknown constant $c_i$. The Bayes factor for comparing two models, $M_i$ and $M_j$, is
$$B_{ij} = \frac{p(y \mid M_i)}{p(y \mid M_j)} = \frac{c_i}{c_j}\, \frac{\int p(y \mid \theta_i, M_i)\, g(\theta_i)\, d\theta_i}{\int p(y \mid \theta_j, M_j)\, g(\theta_j)\, d\theta_j}, \qquad (2)$$
so that the posterior odds are $p(M_i \mid y)/p(M_j \mid y) = B_{ij}\, p(M_i)/p(M_j)$. Both the Bayes factor and the posterior odds depend on the unknown and indeterminate ratio $c_i/c_j$.
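To make the dependence on $c_i$ concrete, the following Python sketch evaluates (1) and the Bayes factor (2) for models whose integrals $\int p(y \mid \theta_i, M_i)\, g(\theta_i)\, d\theta_i$ are assumed to have been computed already and are passed on the log scale. The function names and inputs are illustrative assumptions, not part of the procedures compared in this article. Rescaling a single $c_i$, which an improper prior leaves arbitrary, changes the posterior model probabilities even though the data and the integrals are unchanged.

```python
import math


def posterior_model_probs(log_integrals, log_c, prior_probs):
    """Posterior model probabilities following equation (1).

    log_integrals[i]: log of the integral of p(y | theta_i, M_i) g(theta_i) (assumed given)
    log_c[i]:         log of the constant c_i of the improper prior (arbitrary)
    prior_probs[i]:   prior model probability p(M_i)
    """
    # log of c_i * integral_i * p(M_i); dividing by m(y) is done by normalizing below
    log_w = [lc + li + math.log(p)
             for lc, li, p in zip(log_c, log_integrals, prior_probs)]
    top = max(log_w)  # subtract the maximum to avoid overflow in exp
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def bayes_factor(log_integral_i, log_integral_j, log_c_i, log_c_j):
    """B_ij from equation (2): (c_i / c_j) times the ratio of the two integrals."""
    return math.exp((log_c_i + log_integral_i) - (log_c_j + log_integral_j))


# Same data and integrals; only the arbitrary constant c_1 differs.
print(posterior_model_probs([0.0, 0.0], [0.0, 0.0], [0.5, 0.5]))
print(posterior_model_probs([0.0, 0.0], [math.log(10.0), 0.0], [0.5, 0.5]))
```

With two models having equal integrals and equal prior probabilities, $c_1 = c_2$ gives posterior probabilities $(0.5, 0.5)$, whereas $c_1 = 10\,c_2$ gives $(10/11, 1/11)$ and a Bayes factor of 10.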
Once this problem is solved, we can compute forecasts that take into account all sources of uncertainty, as follows. For a given model $M_i$, the posterior predictive distribution of a future observation $y_f$, $p(y_f \mid y, M_i)$, where we assume that $y_f$ is independent of $y$ given $\theta_i$ and $M_i$, is given by
$$p(y_f \mid y, M_i) = \int p(y_f \mid \theta_i, M_i)\, p(\theta_i \mid y, M_i)\, d\theta_i, \qquad (3)$$
where $p(\theta_i \mid y, M_i)$ is the posterior distribution for the parameters involved in model $M_i$. This predictive distribution takes into account the variability of the parameters, measured by $p(\theta_i \mid y, M_i)$. The unconditional predictive distribution is then found by
$$p(y_f \mid y) = \sum_{k=1}^{K} p(y_f \mid y, M_k)\, p(M_k \mid y). \qquad (4)$$
We use (4) in the sequel and refer to it as BMA: the predictive distribution of $y_f$ given the data $y$ is a weighted average of the predictive distributions of $y_f$ under the models $M_k$, $k = 1, \ldots, K$, with weights given by the posterior probabilities that each model $M_k$ holds.
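As a numerical illustration of (4), the Python sketch below evaluates the BMA predictive density and an equal-tailed predictive interval when each model's predictive distribution is taken to be normal. The normal form, the function names, and the use of an equal-tailed interval (instead of the HDPI discussed in the article) are simplifying assumptions; under the usual noninformative prior, the model-specific predictive distributions of a normal regression model are Student-t rather than normal. The weights play the role of $p(M_k \mid y)$ and must sum to 1.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def bma_pdf(x, weights, means, sds):
    """Equation (4) with normal model predictives: sum_k p(M_k | y) N(x; mean_k, sd_k^2)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def bma_cdf(x, weights, means, sds):
    """Cumulative distribution function of the mixture in equation (4)."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))


def bma_equal_tailed_interval(weights, means, sds, level=0.95, tol=1e-10):
    """Equal-tailed interval of the BMA predictive, found by bisection on its CDF."""
    alpha = 1.0 - level
    # Bracket wide enough to hold essentially all of every component's mass.
    lo0 = min(m - 20.0 * s for m, s in zip(means, sds))
    hi0 = max(m + 20.0 * s for m, s in zip(means, sds))

    def quantile(p):
        lo, hi = lo0, hi0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if bma_cdf(mid, weights, means, sds) < p:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return quantile(alpha / 2.0), quantile(1.0 - alpha / 2.0)


# A single model versus an equal-weight average of two models with different means.
print(bma_equal_tailed_interval([1.0], [0.0], [1.0]))
print(bma_equal_tailed_interval([0.5, 0.5], [0.0, 3.0], [1.0, 1.0]))
```

In this toy example the interval from the two-model average is much wider than the single-model interval, which illustrates why intervals computed from one selected model can be too narrow when the models disagree.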
Substituting (3) into (4), this equation can also be written as
$$p(y_f \mid y) = \sum_{k=1}^{K} p(M_k \mid y) \int p(y_f \mid \theta_k, M_k)\, p(\theta_k \mid y, M_k)\, d\theta_k,$$
which shows that by using BMA, we are taking into account both the parameter variability, as measured by the weighting over the possible parameter values made by the integral, and the model variability, as measured by the weighting over the possible models.
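The double average above also suggests a simple Monte Carlo scheme, sketched here in Python under the assumption that one can draw from each posterior $p(\theta_k \mid y, M_k)$ and from each sampling model $p(y_f \mid \theta_k, M_k)$: draw a model with probability $p(M_k \mid y)$, then a parameter value from that model's posterior, then $y_f$. The samplers passed in are hypothetical placeholders; the toy example uses scalar normal posteriors only to make the code runnable.

```python
import random


def sample_bma_predictive(weights, draw_theta, draw_yf, n_draws, seed=0):
    """Draw from p(y_f | y) by composition.

    weights[k]:    posterior model probability p(M_k | y)
    draw_theta[k]: function(rng) -> a draw of theta_k from p(theta_k | y, M_k)
    draw_yf[k]:    function(theta, rng) -> a draw of y_f from p(y_f | theta_k, M_k)
    """
    rng = random.Random(seed)
    models = range(len(weights))
    draws = []
    for _ in range(n_draws):
        k = rng.choices(models, weights=weights)[0]  # model uncertainty
        theta = draw_theta[k](rng)                   # parameter uncertainty
        draws.append(draw_yf[k](theta, rng))         # observation noise
    return draws


# Toy example: two models whose parameter is a scalar mean with a normal posterior.
draw_theta = [lambda rng: rng.gauss(0.0, 0.1), lambda rng: rng.gauss(3.0, 0.1)]
draw_yf = [lambda theta, rng: rng.gauss(theta, 1.0)] * 2
ys = sample_bma_predictive([0.7, 0.3], draw_theta, draw_yf, n_draws=20000)
# Monte Carlo estimate of the BMA predictive mean 0.7 * 0.0 + 0.3 * 3.0 = 0.9
print(sum(ys) / len(ys))
```

Each draw mixes over both sources of variability, so summaries of `ys` (quantiles, for instance) approximate those of $p(y_f \mid y)$ without evaluating any integral explicitly.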
Here we focus on the general polynomial regression model $M_j$,
$$y = \beta_0 + \beta_1 x + \cdots + \beta_j x^j + \epsilon,$$
where $\epsilon$ is $N(0, \sigma^2)$ and the degree $j$ is unknown but is assumed to satisfy $0 \le j \le d$. To estimate $j$, a sample of values $(x_i, y_i)$ is obtained for $i = 1, \ldots, n$. Thus, for some $j$, the observations are generated by
$$y = X_j \beta_j + \epsilon, \qquad (5)$$
where $\beta_j = (\beta_0, \ldots, \beta_j)'$, $y = (y_1, \ldots, y_n)'$, and $X_j = (\mathbf{1}, x, x^2, \ldots, x^j)$ is the $n \times (j+1)$ design matrix whose first column $\mathbf{1}$ is the $n \times 1$ vector of ones and whose remaining columns are the $n \times 1$ vectors $x^k = (x_1^k, \ldots, x_n^k)'$. Then, under model $M_j$,
$$E(y \mid M_j) = \sum_{i=0}^{j} \beta_i x^i, \qquad j = 0, 1, \ldots, d.$$
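In code, the design matrix $X_j$ of (5) is a Vandermonde matrix with increasing powers. The Python sketch below, which assumes NumPy is available, builds $X_j$ with `numpy.vander` and computes the least-squares estimate of $\beta_j$ and the residual sum of squares for every candidate degree $j = 0, \ldots, d$; under the usual noninformative prior for $(\beta_j, \sigma)$, the posterior distribution of $\beta_j$ is centered at this least-squares estimate. The helper names are illustrative.

```python
import numpy as np


def design_matrix(x, j):
    """X_j = (1, x, x^2, ..., x^j): the n x (j + 1) design matrix of model (5)."""
    return np.vander(np.asarray(x, dtype=float), N=j + 1, increasing=True)


def fit_all_degrees(x, y, d):
    """Least-squares estimate of beta_j and residual sum of squares, j = 0, ..., d."""
    y = np.asarray(y, dtype=float)
    fits = {}
    for j in range(d + 1):
        X = design_matrix(x, j)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        fits[j] = (beta, float(resid @ resid))
    return fits


# Noise-free quadratic: degrees j >= 2 reproduce the data exactly.
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 0.5 * x**2
for j, (beta, rss) in fit_all_degrees(x, y, 3).items():
    print(j, np.round(beta, 6), rss)
```

Because the models are nested, the residual sum of squares never increases with $j$, which is why a criterion for the degree must penalize model size in some way.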
The rest of the article is organized as follows. Section 2 introduces three priors for the model space: one that is noninformative and two that favor the parsimony principle with respect to the degree of the polynomial. Section 3 presents three different approaches for computing the posterior probabilities of the models given the available data: the intrinsic Bayes factor (IBF) of Berger and Pericchi (1996b), the fractional Bayes factor (FBF) proposed by O'Hagan (1995), and an approximate method based on the Bayesian information criterion (BIC), proposed by Schwarz (1978). These methods are compared in a Monte Carlo study in Section 4 and using some real data examples in Section 5. Finally, Section 6 gives some concluding remarks.
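For orientation, the BIC approximation mentioned above can be sketched in a few lines. The Python code below, which again assumes NumPy, uses the standard Gaussian-likelihood form $\mathrm{BIC}_j = n \log(\mathrm{RSS}_j / n) + (j+1)\log n$ and approximates $P(M_j \mid y) \propto \exp(-\mathrm{BIC}_j/2)\, P(M_j)$, with a uniform prior over degrees as the default. This is a generic textbook version, not necessarily the exact variant used later in the article, and the function name is hypothetical.

```python
import numpy as np


def bic_posterior_probs(x, y, d, prior=None):
    """Approximate P(M_j | y), j = 0, ..., d, by exp(-BIC_j / 2) P(M_j), normalized.

    BIC_j = n log(RSS_j / n) + (j + 1) log n for the Gaussian polynomial model.
    Counting sigma^2 as an extra parameter would add the same log n to every
    BIC_j, so it cancels after normalization.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    if prior is None:
        prior = np.full(d + 1, 1.0 / (d + 1))  # uniform prior over degrees (assumption)
    prior = np.asarray(prior, dtype=float)
    bic = np.empty(d + 1)
    for j in range(d + 1):
        X = np.vander(x, N=j + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        bic[j] = n * np.log(rss / n) + (j + 1) * np.log(n)
    log_w = -0.5 * bic + np.log(prior)
    log_w -= log_w.max()  # subtract the maximum before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


# Simulated quadratic with Gaussian noise; candidate degrees 0 to 4.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.standard_normal(50)
print(np.round(bic_posterior_probs(x, y, 4), 3))
```

The resulting probabilities can be used directly as the weights $p(M_k \mid y)$ in (4); the sketch assumes every residual sum of squares is strictly positive, since the logarithm is undefined for an exact fit.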
2. THE PRIOR FOR THE MODELS
We consider three possible choices for the prior distribution p([M.sub.j]). The first choice is the uniform distribution over the set of possible orders,...
More articles from Technometrics