Home | Industry Information | Business News | Browse by Publication | J | Journal of Computational & Graphical Statistics

Bayesian inference for the RC(m) association model.

Publication: Journal of Computational & Graphical Statistics
Publication Date: 01-MAR-05
Format: Online - approximately 9224 words
Delivery: Immediate Online Access

Article Excerpt
Describing the structure in a two-way contingency table in terms of an RC(m) association model, we are concerned with the computation of posterior distributions of the model parameters using prior distributions which take into account the nonlinear restrictions of the model. We are further of...

View more below

Read this article now - Try Goliath Business News - FREE!   
You can view this article PLUS...

  • Over 5 million business articles
  • Hundreds of the most trusted magazines, newswires, and journals (see list)
  • Premium business information that is timely and relevant
  • Unlimited Access

Now for a Limited Time, try Goliath Business News - Free for 7 Days!
Tell Me More   Terms and Conditions

Purchase this article for $4.95

Already a subscriber? Log in to view full article

...involved with the determination the order of association m, based on Bayesian arguments. Using projection methods, a prior distribution over the parameters of the simpler RC(m) model is induced from a prior of the parameters of the saturated model. The fit of the assumed RC(m) model is evaluated using the posterior distribution of its distance from the full model. Our methods are illustrated with a popular dataset.

Key Words: Contingency tables; Kullback--Leibler projection: MCMC methods.

1. INTRODUCTION

Let [PI] = ([[pi].sub.ij]) be the I x J probability table corresponding to a frequency table y : ([y.sub.ij]) with n = [[summation of].sup.I.sub.i=1] [[summation of].sup.I.sub.j=1] [y.sub.ij], that is produced by the cross-classification of two categorical variables with I and J categories, respectively. The underlying sampling scheme can be either multinomial with expected cell frequencies [[epsilon].sub.ij] = n[[pi].sub.ij]) or independent Poisson for each cell of the table with means [[epsilon].sub.ij]. The saturated log-linear model is expressed as

(1.1) [[eta].sub.ij] = log([[zeta].sub.ij]) = [lambda] + [[lambda].sup.(1).sub.i] + [[lambda].sup.(2).sub.j] + [[lambda].sup.(12).sub.ij] i = 1, ..., I, j = 1, ..., J,

where the row and column main effects, [[lambda].sup.(1).sub.i] and [[lambda].sup.(2).sub.j] sum over i and j, respectively, to zero. The zero-sum constraints (over i and j) hold also for the interaction parameters [[lambda].sup.(12).sub.ij]). The interaction terms matrix is of full rank [M.sup.*] = min (I, J) - 1. Hence, we may consider

its generalized singular value decomposition

[[lambda].sup.(12).sub.ij] = [[M.sup.*].summation over k=1] [[phi].sub.k][[mu].sub.ik][v.sub.jk],

where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] and the row and column scores [[mu].sub.ik], [v.sub.jk] satisfy the constraints

(1.2) [I.summation over i=1][w.sub.1i][[mu].sub.ik] = [J.summation over j=1][w.sub.2j][v.sub.jk] = 0, k = 1, ..., [M.sup.*],

[I.summation over i=1][w.sub.1i][[mu].sub.ik][[mu].sub.il] = [J.summation over j=1][w.sub.2j][v.sub.jk][v.sub.jl] = [[delta].sub.kl], k, l = 1, ..., [M.sup.*].

[w.sub.1i], [w.sub.2j] are some positive weights and [[delta].sub.kl] denotes Kronecker's [delta]. Common values of the weights are [w.sub.1i] = [w.sub.2j] = 1, for all i, j or [w.sub.1i] = [[pi].sub.i]. and [w.sub.2j] = [pi]. j, i -= 1, ..., I, j = 1, ..., J. This decomposition leads to a reparameterization of (1.1) called RC([M.sup.*]) association model. If we retain only the first m eigenvalues, assuming that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] we obtain the association model RC(m) of mth order (Goodman 1985)

(1.3) [[eta].sub.ij] = log([[xi].sub.ij] = [lambda] + [[lambda].sup.(1).sub.i] + [[lambda].sup.(2).sub.j] + [m.summation over k=1] [[phi].sub.k][[mu].sub.ik][v.sub.jk], i = 1, ..., I, j = 1, ..., j.

The representation of the log-odds ratio for any 2x2 subtable formed from rows i and i' and columns j and j' as a sum of m components

log ([[pi].sub.ij][[pi].sub.i'j']/[[pi].sub.ij'][[pi].sub.i'j]) = [m.summation over k=1] [[phi].sub.k]([[mu].sub.ik] - [[mu].sub.i'k])([v.sub.jk] - [v.sub.j'k])

offers an insight into the meaning of these parameters. Another interpretation of the quantity [[phi].sub.m] is provided by expressing it as a contrast of the parameters [[eta].sub.ij]'s (Goodman 1991)

[[phi].sub.m] = [I.summation over i=1][J.summation over j=1][[mu].sub.im][v.sub.jm][[eta].sub.ij].

For m = 1, (1.3) reduces to the multiplicative row-column association model RC. In the case of ordered contingency tables, if we assign to the rows and columns of the table scores that reflect their ordinality, we obtain the uniform association model, with one parameter additional to the independence model. Bayesian methods that incorporate the category orderings have been proposed by Chuang (1982); Agresti and Chuang (1989); Evans, Gilula, and Guttman (1993); and Albert (1997). Agresti and Chuang (1989) considered a prior distribution for the cell proportions or for the parameters of the saturated log linear model and chose the components of the prior mean to satisfy a simple model for ordinal data such as the uniform association model. Evans et al. (1993) set a prior distribution for the parameters of the saturated log linear model and then they estimated the posterior distribution of the parameters of the RC(1) model using the values that minimize the Euclidean squared distance between the interaction terms of the two models. As a referee pointed out, this approach can be extended in RC(m) models. However, their approach can be thought of as only an approximation of the correct posterior distribution. Furthermore, such an approach does not allow for the implementation of the full Bayesian inference including evaluation of the posterior model weights and implementation of Bayesian model averaging.

The contribution of this article is two-fold. First, we present a Markov chain Monte Carlo (MCMC) algorithm that is used to compute the posterior distributions of the parameters of the RC(m) model. Second, we are concerned with the assessment of the fit of such models. The methodology we introduce in this article is important because it can be considered as the first step for full Bayesian analysis in RC(m) models. Further work on this aspect may include incorporation of prior information, estimation of posterior model probabilities, and Bayesian model averaging. In Section 2, we propose vague prior distributions for the parameters of the RC(m) model that take into account the nonlinear restrictions of the model and we describe in detail the MCMC algorithm. We proceed in Section 3 using projection methods to evaluate the fit of the interaction components and determine the number of components that are required to produce a reasonable approximation of the data. We propose rules to assess the fit of the RC(m) model based on the posterior distribution of its Kullback-Leibler distance from the saturated RC([M.sup.*]). We use a popular dataset to illustrate our results in Section 4. We conclude with a discussion on the issues of incorporation and elicitation of prior information as well as the extension of the MCMC methodology over parameter spaces of different dimension.

2. BAYESIAN ESTIMATION OF THE RC(m) MODEL THROUGH MARKOV CHAIN MONTE CARLO

Without loss of generality, we consider the Poisson sampling. The likelihood function of the corresponding RC(m) model is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

where [[eta].sub.ij] is given by (1.3), [[LAMBDA].sup.(1)] = ([[lambda].sup.(1).sub.i]) and [[LAMBDA].sup.(2)] = ([[lambda].sup.(2).sub.i]) are the I x 1 and J x 1 vectors of the row and column main effects, [M.sub.m] = ([[mu].sub.ik]), [N.sub.m] = ([v.sub.jk]) are the I x m and J x m, matrices of the row and columns scores, and [[PHI].sub.m] = diag([[phi].sub.1], ..., [[phi].sub.m]). For the parameters of the main effects we use sum-to-zero constraints and fairly "vague" normal prior distributions. Therefore we set

[[lambda].sup.(1).sup.1] = -[I.summation i=2] [[lambda].sup.(1).sup.i], [[lambda].sup.(2).sup.1] = -[J.summation j=2] [[lambda].sup.(2).sup.j]

and we assume that [lambda] ~ Normal(0, [[sigma].sup.2.sub.0]), [[lambda].sup.(2).sup.i] ~ Normal (0, [[sigma].sup.2.sub.1]), [[lambda].sup.(2).sup.j] ~ Normal [[sigma].sup.2.sub.2] for i = 2, ..., I and j = 2, ..., J. In our examples we select [[sigma].sup.2.sub.0] = [[sigma].sup.2.sub.1] = [[sigma].sup.2.sub.2] = 1,000.

For the distribution of [[phi].sub.1] we use the log-normal with prior mean [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] and variance [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] of the log [[phi].sub.1]. In our example we set [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] = 1,000. All the other conditional distributions f([[phi].sub.k]|[[phi].sub.k-1]), k = 2, ..., m, are defined as uniform distributions on the intervals (0, [[phi].sub.k-1]). Hence, the joint prior distribution is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]

where I(X) is equal to one if X is true and zero otherwise.

For the row and column scores [M.sub.m] and [N.sub.m], respectively, we use the uniform prior suggested by Viele and Srinivasan (2000) in the context of two-way analysis of variance models with multiplicative interaction terms. The columns [[mu].sub.k] = [([[mu].sub.1k], ..., [[mu].sub.Ik]).sup.T] k = 1, ..., m of the matrix [M.sub.m] (similarly for [N.sub.m]) are generated successively to satisfy the constraints (1.2). Each score vector has unit length and therefore it can be visualized as a point on the surface of a hypersphere of unit radius; its elements sum to zero implying that it is orthogonal to the unit vector and they are also orthogonal to each other. Hence, we can regard the first column vector [[mu].sub.1] as a point on the surface of a sphere that is the intersection of an I dimensional hypersphere [S.sub.I] of unit radius with a hyperplane in [R.sup.I] orthogonal to the unit vector. The additional score vectors need to be also orthogonal...

NOTE: All illustrations and photos have been removed from this article.



More articles from Journal of Computational & Graphical Statistics
Kernel logistic regression and the import vector machine., March 01, 2005
Regression tree analysis using TARGET., March 01, 2005
Multicategory [psi]-learning and support vector machine: computational..., March 01, 2005
Generalized partially linear measurement error models., March 01, 2005
Classification of mixtures of spatial point processes via partial Baye..., March 01, 2005

Looking for additional articles?
Search our database of over 3 million articles.

Looking for more in-depth information on this industry?
Search our complete database of Industry & Market reports by text, subject, publication name or publication date.

About Goliath
Whether you're looking for sales prospects, competitive information, company analysis or best practices in managing your organization, Goliath can help you meet your business needs.

Our extensive business information databases empower business professionals with both the breadth and depth of credible, authoritative information they need to support their business goals. Whether it be strategic planning, sales prospecting, company research or defining management best practices - Goliath is your leading source for accurate information.