|
Article Excerpt 1. INTRODUCTION
It is common in the social sciences to encounter [2.sup.n] contingency tables, where n can be as large as several hundreds. These tables arise from, for instance, collecting the responses of a sample of individuals to a survey, a personality inventory, or an educational test consisting of n items, each with two possible responses. For instance, Chang, D'Zurilla, and Maydeu-Olivares (1994) considered modeling the responses of 393 individuals to the Beck Hopelessness Scale (BHS) (Beck, Weissman, Lester, and Trexler 1974), a set of n = 20 true-or-false questions used to predict depression, suicidal ideation, and suicidal intent. There are [2.sup.20] (>[10.sup.6]) cells in the contingency table.
A researcher confronted with the problem of modeling such a [2.sup.n] contingency table faces several challenges. Perhaps the most important challenge is how to assess the overall goodness of fit of the hypothesized model. For large n, binary contingency tables most often become sparse, and the empirical type I error rates of [X.sup.2] and [G.sup.2] test statistics do not match their expected rates under their asymptotic null distribution. This problem can be overcome by generating the empirical sampling distribution of the statistic using the parametric bootstrap method (e.g., Collins, Fidler, Wugalter, and Long 1993; Bartholomew and Tzamourani 1999). However, this approach may be very time-consuming if the researcher is interested in comparing the fit of several models.
If, as is often the case, the overall tests suggests significant misfit, then a second challenge that a researcher must confront is to identify the source of the misfit. Inspecting cell residuals is often not very useful toward this aim. It is difficult to find trends when inspecting these residuals, and even for moderate n the number of residuals that need to be inspected is too large. And perhaps most important, Bartholomew and Tzamourani (1999) pointed out that because the cell frequencies are integers and the expected frequencies in large tables must be very small, the resulting residuals will be either very small or very large. To overcome these two challenges, numerous authors, particularly in psychometrics, have advocated using residuals for pairs and triplets of variables to assess the goodness of fit in [2.sup.n] contingency tables. Key references in this literature include works by Reiser (1996), Reiser and Lin (1999), Reiser and VandenBergh (1994), Bartholomew and Tzamourani (1999), and Bartholomew and Leung (2002).
A third challenge that a researcher may face when dealing with large binary tables is a parameter estimation problem. Take, for instance, latent trait models (for an overview, see Bartholomew and Knott 1999), which are extremely popular in the social sciences. If the distribution of the latent traits is assumed to be multivariate normal, as is most often the case, then computing the binary pattern probabilities becomes very difficult as the number of latent traits increases. However, estimation for these models using only univariate and bivariate information is relatively straightforward. There is a long tradition in psychometrics of using estimation methods that use information only from the low-order marginals of the table (e.g., Christoffersson 1975; Muthen 1978, 1984, 1993). Here we refer to testing and estimation methods that use only the information contained in the low-order margins of the contingency table as limited-information methods. There have also been some proposals in statistics in using limited-information methods (Joe 1997, chap. 10). Limited-information methods naturally yield limited-information testing procedures, whose asymptotic properties are well known (see Christoffersson 1975; Muthen 1978, 1993; Maydeu-Olivares 2001). However, the asymptotic distribution of full-information test statistics when the parameters have been estimated using limited-information procedures has never been studied.
What is needed is a unified treatment of limited- and full-information estimation and testing in [2.sup.n] contingency tables. We provide such a framework in this article under multivariate Bernoulli (MVB) sampling. In Section 2 we provide a convenient representation of the MVB distribution using its joint moments. From the asymptotic distribution of sample joint moments (marginal proportions), we obtain the asymptotic distribution of marginal residuals. In Section 3 we propose a family of limited information quadratic form statistics based on these marginal residuals to assess the goodness of fit of simple null hypotheses. These statistics are asymptotically chi-squared distributed under the null hypothesis, and Pearson's full-information [X.sup.2] statistic is a special case of this family. In Section 4, we extend the results of Section 3 to composite null hypotheses, the common situation for applications. We consider two classes of estimators: minimum variance full-information estimators, such as maximum likelihood, and consistent and asymptotically normal estimators, including limited-information estimators. We propose a family of limited-information goodness-of-fit test statistics whose members are asymptotically chi-squared distributed for both classes of estimators. To study asymptotic power of our new statistics, we derive results for the asymptotic distribution under a sequence of local alternatives for testing one form of a nested null model. In Section 5 we propose a family of limited-information estimators that is closely linked to our proposed family of limited-information goodness-of-fit tests. These estimators are computationally advantageous when the multivariate binary probabilities are difficult to compute. We show that these estimators are highly efficient for one common latent trait model. In Section 6 we include an example of binary item response data from Bartholomew and Knott (1999) and a summary from the BHS to illustrate our results. Finally, in Section 7 we provide conclusions and a discussion of further research.
2. MULTIVARIATE BERNOULLI DISTRIBUTIONS AND ASYMPTOTIC DISTRIBUTION OF SAMPLE MOMENTS
In this section we characterize the MVB distribution in terms of multivariate moments and define the notation used in the remainder of the article. Consider an n-dimensional random vector Y = ([Y.sub.1],..., [Y.sub.n])' of Bernoulli random variables, with [dot.[pi].sub.i] = Pr([Y.sub.i] = 1), i = 1,..., n, and joint distribution
[[pi].sub.y] = Pr([Y.sub.i] = [y.sub.i], i = 1,..., n),
y = ([y.sub.1],..., [y.sub.n]), [y.sub.i] [member of] {0, 1}. (1)
When we consider a parametric model with parameter vector [theta], we write [[pi].sub.y]([theta]) for an individual probability and [pi]([theta]) for the vector of [2.sup.n] joint probabilities. One convenient way of ordering the elements of [pi]([theta]) is by order of the values of y'1 = 0, 1,..., n, and by lexicographic ordering within a constant sum. An example with n = 3 is given later.
The n-variate Bernoulli distribution may be alternatively characterized by the ([2.sup.n] - 1)-dimensional vector [dot.[pi]] of its joint moments (Teugels 1990), [dot.[pi]'] = ([dot.[pi]'.sub.1], [dot.[pi]'.sub.2],..., [dot.[pi]'.sub.n])', where [dot.[pi]'.sub.1] = ([dot.[pi].sub.1],..., [dot.[pi].sub.n]), [dot.[pi].sub.2] is the ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII])-dimensional vector of bivariate noncentral moments with elements E([Y.sub.i][Y.sub.j]) = Pr([Y.sub.i] = 1, [Y.sub.j] = 1) = [dot.[pi].sub.ij], j < i, and so on, up to [dot.[pi].sub.n] = E([Y.sub.1] ... [Y.sub.n]) = Pr([Y.sub.1] = ... = [Y.sub.n] = 1).
There is a ([2.sup.n] - 1) X [2.sup.n] matrix T of 1's and 0's, of full row rank, such that [dot.[pi]] = T[pi]. T is an upper triangular matrix if [pi] is ordered as described earlier. For example, for n = 3, we have
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
The first column of T is a column of 0's, so we can partition T = (0 [dot.T]) and write [dot.[pi]] = [dot.T][v.[pi]] with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Because [[pi].sub.0 ... 0] = 1 - 1'[v.[pi]], the inverse relationship between [dot.[pi]] and [pi] is
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
Alternatively, T can be partitioned according to the partitioning of [dot.[pi]],
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
Furthermore, the vector of joint moments of the MVB distribution up to order r [less than or equal to] n, denoted by [[pi].sub.r] = ([dot.[pi]'.sub.1],..., [dot.[pi]'.sub.r])', can be written as
[[pi].sub.r] = [T.sub.r][pi],
where [T.sub.r] = ([T'.sub.n1],..., [T'.sub.nr])'. Note that by definition, [[pi].sub.n] = [dot.[pi]].
For a random sample of size N from (1), let p and [dot.p] denote the [2.sup.n]-dimensional vector of cell proportions, and the ([2.sup.n] - 1)-dimensional vector of sample joint moments. Then we have
[square root of N]([dot.p] - [dot.[pi]]) = T[square root of N](p - [pi]). (2)
Because
[square root of N](p - [pi]) [d.[right arrow]] N(0, [GAMMA]),
where [GAMMA] = D - [pi][pi]' and D = diag ([pi]) (Agresti 1990), it follows from (2) that
[square root of N]([dot.p] - [dot.[pi]]) [d.[right arrow]] N(0, [XI]), [XI] = T[GAMMA]T'.
Let [dot.p.sub.a] and [dot.p.sub.b] be any two elements of [dot.p] (not necessarily univariate proportions). Then the elements of [XI] are of the form N var([dot.p.sub.a]) = [dot.[pi].sub.a](1 - [dot.[pi].sub.a]) and N cov([dot.p.sub.a], [dot.p.sub.b]) = [dot.[pi].sub.a[union]b] - [dot.[pi].sub.a][dot.[pi].sub.b], so that, for example, when n [greater than or equal to] 3, for i [not equal to] j, j = k, N var([dot.p.sub.ij]) = [dot.[pi].sub.ij](1 - [dot.[pi].sub.ij]) and N cov([dot.p.sub.ij], [dot.p.sub.k]) = [dot.[pi].sub.ij] - [dot.[pi].sub.ij][dot.[pi].sub.k] = [dot.[pi].sub.ij](1 - [dot.[pi].sub.k]); whereas for i, j, and k distinct, N cov([dot.p.sub.ij], [dot.p.sub.k]) = [dot.[pi].sub.ijk] - [dot.[pi].sub.ij][dot.[pi].sub.k].
Also, let [p.sub.r] be the vector of sample moments up to order r, with dimension [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then we have
[square root of N]([p.sub.r] - [[pi].sub.r]) [d.[right arrow]] N(0, [[XI].sub.r]), [[XI].sub.r] = [T.sub.r][GAMMA][T'.sub.r]. (3)
Because [T.sub.r] is of full row rank s, [[XI].sub.r] is also of full rank s (see Rao 1973, p. 30).
3. LIMITED INFORMATION TESTS OF SIMPLE NULL HYPOTHESES
Consider a simple null hypothesis [H.sub.0]:[pi] = [[pi].sub.0] versus [H.sub.1]:[pi] [not equal to] [[pi].sub.0]. The two statistics most widely used in this situation are the likelihood ratio test statistic, [G.sup.2] = 2N [[summation].sub.c][p.sub.c] ln[[p.sub.c]/[[pi].sub.c]], and Pearson's test statistic, [X.sup.2] = N[[summation].sub.c]([p.sub.c] - [[pi].sub.c])[.sup.2]/([[pi].sub.c]). Under the null hypothesis (e.g., Agresti 1990), [G.sup.2] = [X.sup.2] + [o.sub.p](1) [d.[right arrow]] [[chi square].sub.[2.sup.n]-1]. However, in sparse tables, when N/[2.sup.n] is small, the empirical distribution of these statistics is not well approximated by their limiting chi-squared distribution (e.g., Koehler and Larntz 1980), with [X.sup.2] having a better small-sample performance than [G.sup.2].
The poor approximation of [X.sup.2] to its reference asymptotic distribution in sparse [2.sup.n] tables can be attributed to the fact that the mean and variance of its reference asymptotic distribution are [2.sup.n] - 1 and 2([2.sup.n] - 1), but E([X.sup.2]) = [2.sup.n] - 1 and var([X.sup.2]) = 2([2.sup.n] - 1) + [N.sup.-1][2 - 2 * [2.sup.n] - [2.sup.2n] + [[summation].sub.c][[pi].sub.c.sup.-1]] (Read and Cressie 1988, pp. 176-179). Thus the discrepancy between the empirical variance of [X.sup.2] and its variance under its reference asymptotic distribution can be large when some probabilities [[pi].sub.c] are small, and for sparse tables with [[pi].sub.c] [much less than] [2.sup.-n], the type I error [X.sup.2] will be larger than the [alpha] level based on its asymptotic critical value.
We show in the Appendix that [X.sup.2] is a member (with r = n) of the...
|