Sampling schemes for Bayesian variable selection in generalized linear models.

Publication: Journal of Computational & Graphical Statistics
Publication Date: 01-JUN-04
Format: Online - approximately 9501 words

Bayesian approaches to prediction and the assessment of predictive uncertainty in generalized linear models are often based on averaging predictions over different models, and this requires methods for accounting for model uncertainty. When there are linear dependencies among potential predictor variables in a generalized linear model, existing Markov chain Monte Carlo algorithms for sampling from the posterior distribution on the model and parameter space in Bayesian variable selection problems may not work well. This article describes a sampling algorithm, based on the Swendsen-Wang algorithm for the Ising model, that works well when the predictors are far from orthogonal. In problems of variable selection for generalized linear models we can index different models by a binary parameter vector, where each binary variable indicates whether or not a given predictor variable is included in the model. The posterior distribution on the model is a distribution on this collection of binary strings, and by thinking of this posterior distribution as a binary spatial field we apply a sampling scheme inspired by the Swendsen-Wang algorithm for the Ising model in order to sample from the model posterior distribution. The algorithm we describe extends a similar algorithm for variable selection problems in linear models. The benefits of the algorithm are demonstrated for both real and simulated data.

Key Words: Auxiliary variables; Bayesian variable selection; Markov chain Monte Carlo; Reversible jump; Swendsen-Wang algorithm.

1. INTRODUCTION

Bayesian approaches to prediction and assessment of predictive uncertainty in linear and generalized linear models are often based on averaging predictions over different models, and this requires methods for accounting for model uncertainty (Hoeting, Madigan, Raftery, and Volinsky 1999). Continuing developments in Markov chain Monte Carlo (MCMC) methods for Bayesian computation are making it possible to explore model uncertainty in increasingly complex problems, and in this article we discuss Markov chain Monte Carlo sampling algorithms for estimation of posterior model probabilities in generalized linear models.

The sampling schemes we consider are extensions of the algorithm described by Nott and Green (2004) for linear models. Given a set of potential predictor variables in a linear or generalized linear model, we can identify each subset of the predictor variables with a binary string (one binary variable for each potential predictor, with a one indicating inclusion in the model and a zero indicating omission). Hence we can think of the posterior distribution on the possible models as a distribution on a set of binary strings [see, e.g., George and McCulloch (1993) for an example of this binary indicator formulation of Bayesian variable selection problems]. If we think of this posterior distribution as a binary spatial field, we can draw an analogy between our posterior distribution and the Ising model of statistical physics, and we can explore model uncertainty with sampling schemes similar to the well-known Swendsen-Wang algorithm for the Ising model (Swendsen and Wang 1987). As in the Swendsen-Wang algorithm, we introduce auxiliary variables which, conditional on their values, reduce the interactions among components of the posterior distribution. The Swendsen-Wang algorithm is an application of the slice sampler (Damien, Wakefield, and Walker 1999; Mira and Tierney 2002; Neal 2002). The sampler of Nott and Green (2004) for variable selection in linear models is a Metropolis-Hastings algorithm whose proposal distributions are derived from a slice sampler for an Ising model that approximates the posterior distribution with the regression coefficients integrated out. For generalized linear models it is usually not possible to integrate out the regression coefficients analytically, so in this article we extend the original Nott and Green (2004) algorithm by combining the slice sampler with reversible jump MCMC methods to obtain a variable selection algorithm for generalized linear models. Further details are given in the next section.
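
The Swendsen-Wang idea can be illustrated on its home ground, an Ising-type distribution over binary strings. The sketch below is the textbook Swendsen-Wang update for a ferromagnetic model with an external field, written for $\gamma \in \{0,1\}^p$. The parameterization $p(\gamma) \propto \exp(\sum_i h_i\gamma_i + \sum_{i<j} J_{ij}\,1[\gamma_i = \gamma_j])$ with $J_{ij} \ge 0$, and the names `h`, `J`, and `swendsen_wang_step`, are illustrative assumptions. This is not Nott and Green's proposal mechanism, which the text describes only as based on an approximating Ising model. Bonds (the auxiliary variables) can open only between agreeing components; given the bonds, each connected cluster is resampled as a block.

```python
import itertools

import numpy as np


def swendsen_wang_step(gamma, h, J, rng):
    """One Swendsen-Wang update for gamma in {0,1}^p under the assumed model
    p(gamma) ~ exp(sum_i h[i]*gamma[i] + sum_{i<j} J[i,j]*1[gamma[i] == gamma[j]]),
    where J is symmetric with non-negative entries (ferromagnetic case)."""
    p = len(gamma)
    parent = list(range(p))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Auxiliary bond variables: a bond can open only if gamma[i] == gamma[j],
    # and then it opens with probability 1 - exp(-J[i, j]).
    for i in range(p):
        for j in range(i + 1, p):
            if J[i, j] > 0 and gamma[i] == gamma[j]:
                if rng.random() < 1.0 - np.exp(-J[i, j]):
                    parent[find(i)] = find(j)

    # Given the bonds, clusters are independent: each cluster C is set jointly
    # to 1 with probability sigmoid(sum of h over C), otherwise to 0.
    roots = np.array([find(i) for i in range(p)])
    new_gamma = np.empty(p, dtype=int)
    for r in np.unique(roots):
        members = roots == r
        prob_one = 1.0 / (1.0 + np.exp(-h[members].sum()))
        new_gamma[members] = int(rng.random() < prob_one)
    return new_gamma


def exact_marginals(h, J):
    """P(gamma_i = 1) by brute-force enumeration, for checking small examples."""
    p = len(h)
    states = np.array(list(itertools.product([0, 1], repeat=p)))
    logw = states @ h
    for i in range(p):
        for j in range(i + 1, p):
            logw = logw + J[i, j] * (states[:, i] == states[:, j])
    w = np.exp(logw - logw.max())
    return (w / w.sum()) @ states
```

Averaging the sampled $\gamma$ over many sweeps of a small model should reproduce the enumerated marginals; when all couplings are very strong, a single cluster forms and all components flip together.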

Our algorithm improves on existing algorithms for sampling the posterior distribution on the model and parameter space for generalized linear models, especially when there are near linear dependencies among the potential predictors. When the predictors are far from orthogonal, conventional sampling schemes for exploring the posterior distribution on the model and parameters of generalized linear models may perform poorly. A review of various sampling schemes for Bayesian variable selection is given by Dellaportas, Forster, and Ntzoufras (2002). The sampling schemes they discuss are all variants of general methodologies for Bayesian computation in model selection problems described by Carlin and Chib (1995) and Green (1995). The algorithms can be divided into local strategies, which change only one variable and one regression coefficient at a time, and more global strategies, in which more general moves are allowed. Both kinds of strategy can be developed in the framework of reversible jump MCMC or of Metropolized versions of the Carlin and Chib sampler, although choosing the proposal distribution can be difficult when attempting global moves. The local strategies work well when the potential predictor variables are orthogonal or nearly so, and when the coefficients of the predictors have a similar interpretation across different models. Here we are interested in situations where there are near linear dependencies among the predictors, so that more global schemes are necessary.
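
For contrast, a local strategy of the kind just described can be sketched in a few lines. The code below is a generic reversible jump birth/death move that toggles one indicator at a time, followed by random-walk updates of the included coefficients; it is not the article's algorithm. It assumes a logistic regression with no intercept, independent $N(0, \sigma^2)$ priors on included coefficients, and a uniform prior over $\gamma$; the names `local_rj_sweep`, `sigma`, and `rw_scale` are hypothetical. Because a birth move draws the new coefficient from its own prior and the Jacobian is 1, the acceptance ratio reduces to a likelihood ratio.

```python
import numpy as np


def log_lik(beta, X, y):
    """Bernoulli log-likelihood with logit link; beta is zero for excluded terms."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def local_rj_sweep(gamma, beta, X, y, rng, sigma=2.0, rw_scale=0.1):
    """One iteration of a simple local sampler for (gamma, beta).

    Assumptions (not from the article): N(0, sigma^2) priors on included
    coefficients, a uniform prior on gamma, and no intercept term.
    """
    p = X.shape[1]
    cur = log_lik(beta, X, y)

    # Between-model move: toggle one randomly chosen variable j.
    j = rng.integers(p)
    prop_gamma, prop_beta = gamma.copy(), beta.copy()
    if gamma[j] == 0:
        # Birth: draw the new coefficient from its prior; the prior density
        # cancels the proposal density, leaving the likelihood ratio.
        prop_gamma[j], prop_beta[j] = 1, rng.normal(0.0, sigma)
    else:
        # Death: the reverse move, with the same cancellation.
        prop_gamma[j], prop_beta[j] = 0, 0.0
    prop = log_lik(prop_beta, X, y)
    if np.log(rng.random()) < prop - cur:
        gamma, beta, cur = prop_gamma, prop_beta, prop

    # Within-model move: random-walk Metropolis on each included coefficient,
    # with the N(0, sigma^2) prior entering the acceptance ratio.
    for k in np.flatnonzero(gamma):
        cand = beta.copy()
        cand[k] += rw_scale * rng.normal()
        new = log_lik(cand, X, y)
        log_ratio = new - cur - (cand[k] ** 2 - beta[k] ** 2) / (2 * sigma**2)
        if np.log(rng.random()) < log_ratio:
            beta, cur = cand, new
    return gamma, beta
```

Each between-model step changes a single indicator. When two predictors are nearly collinear, swapping one for the other has to pass through an intermediate model with both or neither included, which is the usual reason such samplers mix slowly in the situation the article targets.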

The structure of the article is as follows. In the next section we discuss the problem of variable selection for generalized linear models and describe an MCMC sampling scheme inspired by the Swendsen-Wang algorithm, an auxiliary variable MCMC algorithm for the Ising model. There we use a BIC approximation to the posterior distribution on the model space to define the auxiliary variables in our algorithm, and then sample from the exact joint distribution over the model space, regression coefficients, and auxiliary variables using the reversible jump method of Green (1995). The novelty of this article lies in the way we combine the idea behind the algorithm of Nott and Green (2004) for Bayesian linear models with reversible jump Markov chain Monte Carlo to obtain a sampling scheme for Bayesian variable selection in generalized linear models that is applicable when the regression coefficients cannot be integrated out of the posterior distribution analytically. In Section 3 we discuss the empirical performance of our sampling scheme in some examples, and in Section 4 we give some concluding remarks.
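
The excerpt does not spell out the BIC approximation, so the following is only a minimal sketch of the standard version, $\log p(\gamma \mid y) \approx \log L(\hat\beta_\gamma) - \tfrac{k_\gamma}{2}\log n + \text{const}$, where $k_\gamma$ is the number of fitted coefficients. It assumes a logistic regression, an intercept included in every model, and a uniform prior over $\gamma$; the function names are hypothetical. Enumerating all $2^p$ models is for illustration only: in the article the approximation is used to define auxiliary variables, not to replace sampling.

```python
import itertools

import numpy as np


def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Logistic-regression MLE by Newton-Raphson (equivalently IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)                   # GLM working weights
        grad = X.T @ (y - mu)                 # score vector
        info = X.T @ (X * w[:, None])         # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def log_lik(X, y, beta):
    """Bernoulli log-likelihood: sum_i y_i*eta_i - log(1 + exp(eta_i))."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def bic_log_weight(gamma, X, y):
    """BIC approximation to log p(gamma | y), up to an additive constant.

    Assumptions: logistic regression, an intercept in every model, and a
    uniform prior over gamma.
    """
    n = len(y)
    Xg = np.column_stack([np.ones(n), X[:, np.flatnonzero(gamma)]])
    beta_hat = fit_logistic(Xg, y)
    return log_lik(Xg, y, beta_hat) - 0.5 * Xg.shape[1] * np.log(n)


def bic_model_probs(X, y):
    """Normalized BIC weights over all 2^p models (feasible only for small p)."""
    models = list(itertools.product([0, 1], repeat=X.shape[1]))
    logw = np.array([bic_log_weight(np.array(g), X, y) for g in models])
    w = np.exp(logw - logw.max())
    return dict(zip(models, w / w.sum()))
```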

2. BAYESIAN VARIABLE SELECTION FOR GENERALIZED LINEAR MODELS

2.1 NOTATION AND PRIOR DISTRIBUTIONS

Let $y_1, \ldots, y_n$ be $n$ observations of a response variable, and let $x_1, \ldots, x_n$ be the corresponding observations of a $p \times 1$ set of predictor variables which are thought to contain information about the response. In a generalized linear model it is assumed that the response distribution comes from the exponential family. Writing $p(y_i; \theta_i, \phi)$ for the distribution of the $i$th response, $i = 1, \ldots, n$, we have

$$p(y_i; \theta_i, \phi) = \exp\left( \frac{A_i\{y_i\theta_i - b(\theta_i)\}}{\phi} + c\left(y_i, \frac{\phi}{A_i}\right) \right),$$

where $b(\cdot)$ and $c(\cdot)$ are known functions, and $\theta_i$ and $\phi$ are parameters, with $\phi$ common to the distributions of all the $y_i$. The values $A_i$ are known weights, and in what follows we assume that $\phi$ is known (this is always true for binomial and Poisson response distributions, for example, where $\phi = 1$). The mean $\mu_i$ of $y_i$ is directly related to the parameter $\theta_i$:

$$\mu_i = b'(\theta_i).$$
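
To make the exponential-family form concrete, the sketch below evaluates $\log p(y_i; \theta_i, \phi)$ directly from the displayed density for two familiar cases and checks numerically that $b'(\theta_i) = \mu_i$. The specific choices (Poisson with $\theta = \log\mu$, $b(\theta) = e^\theta$, $c = -\log y!$; Bernoulli with $\theta = \log\{\mu/(1-\mu)\}$, $b(\theta) = \log(1 + e^\theta)$, $c = 0$; $A_i = \phi = 1$) are standard textbook forms supplied here rather than taken from the excerpt, and the function names are hypothetical.

```python
import math


def expfam_logpdf(y, theta, phi, A, b, c):
    """log p(y; theta, phi) = A*(y*theta - b(theta))/phi + c(y, phi/A)."""
    return A * (y * theta - b(theta)) / phi + c(y, phi / A)


def poisson_b(theta):
    return math.exp(theta)


def poisson_c(y, scale):
    # For the Poisson case phi = A = 1, so the scale phi/A is not needed.
    return -math.lgamma(y + 1)


def bernoulli_b(theta):
    return math.log1p(math.exp(theta))


def bernoulli_c(y, scale):
    return 0.0


def mean_from_b(b, theta, h=1e-6):
    """Central-difference approximation to b'(theta), which should equal mu."""
    return (b(theta + h) - b(theta - h)) / (2 * h)
```

For instance, with $\mu = 3.2$ and $y = 4$, `expfam_logpdf(4, math.log(3.2), 1.0, 1.0, poisson_b, poisson_c)` agrees with the logarithm of the Poisson probability $e^{-3.2}3.2^4/4!$.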

Let $x_i = (x_{i1}, \ldots, x_{ip})^T$ be the values of the $p$ predictors for the $i$th response, and let $X$ be the $n \times p$ design matrix with $i$th row $x_i^T$.

We describe the dependence of $\mu_i$ on $x_i$ by

$$g(\mu_i) = x_i^T \beta,$$

where $\beta = (\beta_1, \ldots, \beta_p)^T$ is a set of parameters and $g(\cdot)$ is a smooth monotone function called the link function. We assume in what follows that the link function is known; see Ntzoufras, Dellaportas, and Forster (2001) and the references therein for methods for describing uncertainty about the link function in a Bayesian framework.
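
As a small illustration of the link function, the sketch below pairs three common choices of $g$ with their inverses and computes $\mu_i = g^{-1}(x_i^T\beta)$ for every row of $X$. The selection of links and the names `LINKS` and `glm_mean` are assumptions for illustration. The logit link is the canonical link for the binomial family and the log link for the Poisson family; with a canonical link, $\theta_i = x_i^T\beta$.

```python
import numpy as np

# Each entry is a pair (g, g_inverse); g maps the mean mu to the linear predictor.
LINKS = {
    "logit": (lambda mu: np.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + np.exp(-eta))),
    "log": (np.log, np.exp),
    "identity": (lambda mu: mu, lambda eta: eta),
}


def glm_mean(X, beta, link):
    """Mean mu_i = g^{-1}(x_i^T beta) for every row x_i^T of X."""
    _, g_inv = LINKS[link]
    return g_inv(X @ beta)
```

Applying $g$ to the returned means recovers the linear predictor $X\beta$, which is a quick check that each pair is a genuine inverse.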

To make inferences about $\beta$ we use a hierarchical prior on $\beta$ which gives positive prior probability to some of the components of $\beta$ being equal to zero; this is what allows variable selection. Using the notation of Dellaportas, Forster, and Ntzoufras (2002), let $\gamma = (\gamma_1, \ldots, \gamma_p)^T$ be a set of binary variables, where $\gamma_i$ indicates inclusion or omission of variable $i$ from the model ($\beta_i = 0$ if and only if $\gamma_i = 0$, and $\beta_i \neq 0$ if $\gamma_i = 1$). Let $\beta_\gamma$ be the vector of nonzero components of $\beta$, and let $X_\gamma$ be the design matrix obtained by deleting from $X$ those columns $i$ for which $\gamma_i = 0$. Given $\gamma$, the prior on $\beta_\gamma$ is normal,

$$\beta_\gamma \mid \gamma \sim N(\mu_\gamma, \Sigma_\gamma).$$

In the absence of prior information about $\beta_\gamma$, we set $\mu_\gamma$ to be the zero vector....
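
A minimal sketch of the indicator bookkeeping, in Python with NumPy. Given $\gamma$, it forms $X_\gamma$, draws $\beta_\gamma$ from the normal prior with the zero mean stated above, and embeds $\beta_\gamma$ in a length-$p$ vector $\beta$ that is zero wherever $\gamma_i = 0$, so that $X\beta = X_\gamma\beta_\gamma$. The excerpt ends before $\Sigma_\gamma$ is specified, so the choice $\Sigma_\gamma = s^2 I$ and the names `select_columns`, `draw_beta_from_prior`, and `prior_scale` are hypothetical.

```python
import numpy as np


def select_columns(X, gamma):
    """Return X_gamma: the columns i of X with gamma[i] == 1."""
    return X[:, np.asarray(gamma, dtype=bool)]


def draw_beta_from_prior(gamma, rng, prior_scale=1.0):
    """Draw beta_gamma ~ N(0, Sigma_gamma) and embed it in a length-p vector
    that is zero wherever gamma[i] == 0.

    Hypothetical choice: Sigma_gamma = prior_scale**2 * I (the excerpt stops
    before the article's own Sigma_gamma is given).
    """
    gamma = np.asarray(gamma, dtype=bool)
    k = int(gamma.sum())
    beta = np.zeros(gamma.size)
    if k == 0:
        return np.zeros(0), beta            # null model: no coefficients
    mu_gamma = np.zeros(k)                  # zero prior mean, as in the text
    sigma_gamma = prior_scale**2 * np.eye(k)
    beta_gamma = rng.multivariate_normal(mu_gamma, sigma_gamma)
    beta[gamma] = beta_gamma
    return beta_gamma, beta
```

Storing the full-length $\beta$ alongside $\gamma$ keeps the linear predictor computable as $X\beta$ for every model, while $\beta_\gamma$ carries only the free parameters of the current model.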

More articles from Journal of Computational & Graphical Statistics
Automatic smoothing with wavelets for a wide class of distributions., June 01, 2004
Optimal pair matching with two control groups., June 01, 2004
Variable length Markov chains: methodology, computing, and software., June 01, 2004
An algorithm for a letter-based representation of all-pairwise compari..., June 01, 2004
Computation of confidence regions for optimal factor levels in constra..., June 01, 2004
