|
1. INTRODUCTION
It is well known that the F test is robust to the normality assumption if the number of factor levels or groups is small (fixed) and the sample or group sizes are large (tend to infinity); see Arnold (1980). In this setting the theory of weighted least squares statistics is also well understood; see Arnold (1981, chap. 13). However, the case where the number of factor levels is large (tends to infinity) is still not adequately developed.
When the number of levels, a, goes to infinity, the asymptotic distribution of the F statistic F = MST/MSE, where MST is the mean square for treatment and MSE is the mean square for error, is found by obtaining the asymptotic distribution of [a.sup.1/2](F - 1). The MSE typically converges in probability to a constant and thus, by Slutsky's theorem, the problem reduces to finding the asymptotic distribution of [a.sup.1/2](MST - MSE). Boos and Brownie (1995) presented some results in this direction, but used a specialized technique applicable to only a few models. Because MST - MSE is a quadratic form, the most direct way to find its asymptotic distribution is to apply results for the asymptotic normality of quadratic forms; see de Jong (1987) and Jiang (1996) and references therein. However, it is not straightforward to apply these results. Akritas and Arnold (2000) developed an approach that is based on finding the joint limiting distribution of (MST, MSE). With this technique they covered a very general class of models and also obtained the asymptotic distribution of the statistics under fixed alternatives. Independently and using different asymptotic techniques, Bathke (2002) also generalized the results of Boos and Brownie (1995) to fixed effects balanced multifactor designs.
The aforementioned results all pertain to the homoscedastic case with small (fixed) group sizes. However, the assumption that a large number of populations are homoscedastic is difficult to ascertain when the group size from each population is small. As demonstrated by Scheffe (1959, chap. 10), the F test is sensitive to departures from the homoscedasticity assumption, particularly in the unbalanced case. The homoscedastic procedure based on the asymptotic theory for large a is equally sensitive. For example, 1,000 simulation replications with a = 30 levels, group sizes 14, 15, 7, 4, 4, 4, 4, 6, 5, 4, 4, 4, 6, 5, 5, 4, 6, 6, 8, 4, 5, 4, 5, 6, 4, 5, 4, 6, 4, 5, and heteroscedastic normal errors with corresponding variances (1 + .133i)[.sup.2], i = 1,..., 30, at [alpha] = .05, yielded achieved [alpha] levels of .141 and .178 for the classical F test and that based on Theorem 2.2(a), respectively. [For the same setting but homoscedastic errors, the procedure of Theorem 2.2(a) achieved an [alpha] level of .070.] For the same setting, the unweighted heteroscedastic test procedure for large a (see Theorem 2.5), achieved an [alpha] level of .073. [Also see Remark 2.3(ii) for a similar simulation in the balanced case.]
Even under homoscedasticity, the usual F test is not asymptotically valid in the unbalanced case if the group sizes are small; see Section 2.1. Although an asymptotically valid procedure using the F-test statistic is provided in Section 2.1, it requires estimation of the fourth moment.
The purpose of the present article is to provide test procedures that are valid and perform well in unbalanced and/or heteroscedastic situations when a tends to infinity. We consider both the classical weighted statistic and an unweighted statistic that appears to be new. Using exact calculations under normality, we demonstrate that the classical weighted statistic is very unstable if the group sizes are small, which explains Krutchkoff's (1989) observation. Asymptotic approximation to the distribution of the weighted statistic requires the average group size to tend to infinity faster than [a.sup.1/2]. The procedure that uses the new unweighted statistic is applicable also with small group sizes. Its asymptotic and small sample properties are preferable to those of the procedure based on the F-test statistic, even in the homoscedastic case. Indeed, it does not require estimation of the fourth moment and its asymptotic theory uses weaker conditions.
The technique we apply is based on an application of the projection principle. It allows us to study, directly and elegantly, the asymptotic null distribution of the quadratic form MST - MSE. The novelty of the technique rests on the choice of the class of random variables onto which to project. Although for simplicity we focus on the one-way model, it is clear that the projection principle and the idea of choosing the class of variables onto which to project apply to general multifactor models; see Wang (2003) and Wang (2004). The basic technique is demonstrated for the homoscedastic case, where the transparency of the method permits derivation of the asymptotic theory under weaker assumptions on the moments and group sizes in the unbalanced case than those of Akritas and Arnold (2000). In doing so, we also consider the case where the group sizes are allowed to tend to infinity together with the number of levels. This setting was also considered by Portnoy (1984), but from the M-estimation point of view.
The one-way layout F statistic coincides with the lack-of-fit statistic for testing the hypothesis of constant regression against a general alternative with replicated observations, and thus the present results have direct bearing on this problem as well. The literature of lack-of-fit testing in regression is quite extensive: see Eubank and Hart (1992), Muller (1992), Hardle and Mammen (1993), Hart (1997), and Dette and Munk (1998), to mention a few. It is quite interesting that the asymptotic validity of the common lack-of-fit test in the case of replicated observations has never been considered. Our study of local alternatives reveals that the classical lack-of-fit test with replicated observations cannot detect alternatives that converge to the null hypothesis at rate [a.sup.-1/2], but rather at rates that resemble those in the nonparametric literature. Given the calculation in the case of normal variables with known variance presented in Fan (1996), this is not surprising.
Section 2 gives the test statistics and their limiting null distribution with some comments on their performance. In Section 3 we present the projection method in the context of quadratic forms. Section 4 gives asymptotic results under local alternatives. Some simulation results are discussed in Section 5. Proofs of the results presented in Sections 2 and 4 are given in the Appendix.
In all that follows, [X.sub.ij], i = 1,..., a, j = 1,..., [n.sub.i], denotes a double sequence of independent random variables, [S.sub.i.sup.2] = ([n.sub.i] - 1)[.sup.-1] [[summation].sub.j=1.sup.[n.sub.i]] ([X.sub.ij] - [bar.X.sub.i.])[.sup.2] and
MST = [1/[a - 1]][a.summation over (i=1)][n.sub.i]([bar.X.sub.i.] - [bar.X.sub...])[.sup.2],
MSE = [1/[N - a]][a.summation over (i=1)]([n.sub.i] - 1)[S.sub.i.sup.2], (1.1)
[F.sub.a] = MST/MSE,
where [bar.X.sub.i.] = [n.sub.i.sup.-1][[summation].sub.j=1.sup.[n.sub.i]] [X.sub.ij] and [bar.X.sub...] = [N.sup.-1] [[summation].sub.i=1.sup.a] [[summation].sub.j=1.sup.[n.sub.i]] [X.sub.ij] with N = [n.sub.1] +...+ [n.sub.a]. In case the group sizes [n.sub.i] = [n.sub.i](a) [right arrow] [infinity] as a [right arrow] [infinity], we also write [S.sub.i.sup.2](a) and N(a).
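The quantities in (1.1) can be computed directly from the raw groups. The following is a minimal sketch, not code from the article: the function name `one_way_stats` and the toy data are illustrative, and the MSE is assumed to be the usual pooled form, the sum of ([n.sub.i] - 1)[S.sub.i.sup.2] divided by N - a, which is consistent with the [F.sub.a-1,N-a] reference distribution used in Section 2.1.

```python
from statistics import fmean, variance


def one_way_stats(groups):
    """Return (MST, MSE, F_a) for a list of groups of observations.

    Assumes every group has at least two observations, so that each
    sample variance S_i^2 (divisor n_i - 1) is defined.
    """
    a = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    means = [fmean(g) for g in groups]            # group means
    grand = sum(sum(g) for g in groups) / N       # grand mean over all N points
    s2 = [variance(g) for g in groups]            # S_i^2
    mst = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means)) / (a - 1)
    mse = sum((n - 1) * v for n, v in zip(sizes, s2)) / (N - a)  # assumed pooled form
    return mst, mse, mst / mse


# Illustrative unbalanced data with a = 3 groups (hypothetical values).
groups = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 7.0, 9.0, 11.0]]
mst, mse, f_a = one_way_stats(groups)
print(mst, mse, f_a)
```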
2. MAIN RESULTS
In this section we present and discuss the asymptotic null distribution of the proposed test statistics. Corresponding results under local alternatives are stated in Section 4.
2.1 Homoscedastic Models
Let [U.sub.a] have an [F.sub.a-1,N-a] distribution, which is the distribution of [F.sub.a] under homoscedasticity and normality. Let also N/a [right arrow] b, [a.sup.-1][[summation].sub.i=1.sup.a][n.sub.i.sup.-1] [right arrow] [b.sub.1]; thus, in the balanced case, b = n and b[b.sub.1] = 1. It is easily verified that if b < [infinity], [a.sup.1/2]([U.sub.a] - 1) [d.[right arrow]] N(0, 2b/(b - 1)) as a [right arrow] [infinity], and if N/a also tends to infinity with a, then [a.sup.1/2]([U.sub.a] - 1) [d.[right arrow]] N(0, 2). Thus, Theorem 2.1 asserts that the usual F test is asymptotically, as a [right arrow] [infinity], correct in the balanced homoscedastic case even without the normality assumption. Theorem 2.2, however, shows that the usual F procedure for unbalanced models is robust to departures from the normality assumption only if b[b.sub.1] = 1 or the group sizes are also large.
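The limiting variance 2b/(b - 1) can be checked against the exact variance of an F distribution. The sketch below uses the standard formula for the variance of an F variable with d1 and d2 degrees of freedom (valid for d2 > 4). It is a moment-based sanity check under the assumption N = ab, not a proof of convergence in distribution, and the function name `scaled_f_variance` is illustrative.

```python
def scaled_f_variance(a, N):
    """Return a * Var(U_a) for U_a ~ F(a - 1, N - a); requires N - a > 4."""
    d1, d2 = a - 1, N - a
    var = 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))
    return a * var


b = 5  # illustrative balanced design: n = b observations per group, so N = a * b
limit = 2 * b / (b - 1)
for a in (10, 100, 10_000):
    # The scaled variance should approach the limit as a grows.
    print(a, scaled_f_variance(a, a * b), limit)
```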
Theorem 2.1 (Balanced case). Let [X.sub.ij], i = 1,..., a, j = 1,..., n, be an iid (independent, identically distributed) sequence of random variables with E[X.sub.ij] = [mu] and 0 < Var [X.sub.ij] = [[sigma].sup.2] < [infinity].
(a) If n [greater than or equal to] 2 remains fixed, then
[a.sup.1/2]([F.sub.a] - 1) [d.[right arrow]] N(0, [2n/[n - 1]]) as a [right arrow] [infinity].
(b) If n = n(a) [right arrow] [infinity], as a [right arrow] [infinity], then
[a.sup.1/2]([F.sub.a] - 1) [d.[right arrow]] N(0, 2) as a [right arrow] [infinity].
Note that the result of part (b) of Theorem 2.1 is easily guessed from part (a). Its proof, however, involves a rather interesting application of the Lindeberg condition.
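Theorem 2.1(a) suggests a normal approximation for the balanced F statistic when a is large and n is fixed. The sketch below turns that approximation into an approximate p-value. It assumes an upper-tailed test (rejecting for large [F.sub.a]), which the excerpt does not state explicitly, and the function name `balanced_asymptotic_pvalue` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def balanced_asymptotic_pvalue(f_a, a, n):
    """Approximate upper-tail p-value from sqrt(a) * (F_a - 1) ~ N(0, 2n/(n - 1)).

    f_a : observed F statistic
    a   : number of groups (assumed large)
    n   : common group size (fixed, at least 2)
    """
    if n < 2:
        raise ValueError("each group needs at least two observations")
    z = sqrt(a) * (f_a - 1.0) / sqrt(2.0 * n / (n - 1.0))
    return 1.0 - NormalDist().cdf(z)


# Illustrative call: a = 50 groups of size n = 4 and an observed F of 1.6.
print(balanced_asymptotic_pvalue(1.6, 50, 4))
```

For part (b), where n also grows, the same sketch would apply with the variance 2n/(n - 1) replaced by its limit 2.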
Theorem 2.2 (Unbalanced case). Let [X.sub.ij], i = 1,..., a, j = 1,..., [n.sub.i], be an iid sequence of random variables with E[X.sub.ij] = [mu] and 0 < Var [X.sub.ij] = [[sigma].sup.2] < [infinity].
(a) If, for some [delta] > 0, E|[X.sub.ij]|[.sup.4+[delta]] < [infinity], [sup.sub.a[greater than or equal to]1][a.sup.-1][[summation].sub.i=1.sup.a] [n.sub.i.sup.4+[delta]] < [infinity],
[bar.n] = [bar.n](a) = [1/a][a.summation over (i=1)][n.sub.i] [right arrow] b [member of] (1, [infinity]),
and
[1/a][a.summation over (i=1)][1/[n.sub.i]] [right arrow] [b.sub.1] as a [right arrow] [infinity],
then
[a.sup.1/2]([F.sub.a] - 1) [d.[right arrow]] N(0, [[tau].sup.2]) as a [right arrow] [infinity],
where, letting [[mu].sub.4] = E[([X.sub.ij] - [mu])[.sup.4]/[[sigma].sup.4]],
[[tau].sup.2] = [2b/[b - 1]] + ([[mu].sub.4] - 3)[[b(b[b.sub.1] - 1)]/[(b - 1)[.sup.2]]].
(b) Let [n.sub.i] = [n.sub.i](a), and set n(a) = min{[n.sub.i](a); i = 1,..., a} and [kappa](a) = max{[n.sub.i](a); i = 1,..., a}. Assume that
n(a) [right arrow] [infinity] as a [right arrow] [infinity],
and
[kappa](a)/n(a) [less than or equal to] C < [infinity] for all a.
If E[X.sub.ij.sup.4] < [infinity], then
[a.sup.1/2]([F.sub.a] - 1) [d.[right arrow]] N(0, 2) as a [right arrow] [infinity].
The reason we need higher moments in the unbalanced case is that the square terms (i.e., [X.sub.ij.sup.2]) do not cancel [see (A.6)]. The preceding results with fixed [n.sub.i]'s overlap with those of Boos and Brownie (1995) and Akritas and Arnold (2000), although the present assumptions are slightly weaker. The new results under homoscedasticity pertain to the case given in Section 4, where the group sizes also tend to infinity and the limiting distribution is under local alternatives.
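The variance [[tau].sup.2] of Theorem 2.2(a) shows how imbalance and kurtosis interact. The sketch below evaluates it with the finite-a averages of [n.sub.i] and 1/[n.sub.i] plugged in for the limits b and [b.sub.1]. This plug-in is an assumption made for illustration, and the function name `tau_squared` is hypothetical. The standardized fourth moment [[mu].sub.4] is supplied by the caller.

```python
def tau_squared(sizes, mu4):
    """Evaluate tau^2 = 2b/(b-1) + (mu4 - 3) * b * (b*b1 - 1) / (b-1)^2.

    sizes : group sizes n_1, ..., n_a (their average must exceed 1)
    mu4   : standardized fourth moment E[(X - mu)^4] / sigma^4
    b and b1 are replaced by their finite-a counterparts (an assumption).
    """
    a = len(sizes)
    b = sum(sizes) / a                      # average group size
    b1 = sum(1.0 / n for n in sizes) / a    # average reciprocal group size
    return 2 * b / (b - 1) + (mu4 - 3) * b * (b * b1 - 1) / (b - 1) ** 2


print(tau_squared([4] * 10, 9.0))   # balanced: b * b1 = 1, kurtosis term vanishes
print(tau_squared([2, 6], 3.0))     # normal kurtosis: kurtosis term vanishes
print(tau_squared([2, 6], 9.0))     # unbalanced with heavy tails: variance inflates
```

Since the arithmetic mean is never smaller than the harmonic mean, b[b.sub.1] is at least 1, so in an unbalanced design a fourth moment above 3 can only increase [[tau].sup.2] relative to 2b/(b - 1).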
2.2 Heteroscedastic Models
2.2.1 The Possible Statistics. Under heteroscedasticity it is possible to have both weighted and unweighted statistics. In this section we first introduce the two statistics and then present their asymptotic theory.
The unweighted statistic, whose version for the unbalanced case is new, is based on the observation that in the balanced case, E(MST) = E(MSE) under the null hypothesis, so that the statistic MST - MSE is still centered. In the unbalanced case this is not true, but centering can be achieved by replacing MSE with MSE* = (a - 1)[.sup.-1] [[summation].sub.i=1.sup.a](1 - [n.sub.i]/N)[S.sub.i.sup.2]. This leads to the statistic
[T.sub.a] = [a.sup.-1/2][a.summation over (i=1)][[n.sub.i]([bar.X.sub.i.] - [bar.X.sub...])[.sup.2] - (1 - [[n.sub.i]/N])[S.sub.i.sup.2]], (2.1)
which, for the balanced case only, is closely connected to [F.sub.a] via the relationship
[T.sub.a] = (1 - [1/a])([a.sup.1/2]([F.sub.a] - 1))([1/a][a.summation over (i=1)][S.sub.i.sup.2]) = (1 - [1/a])([a.sup.1/2](MST - MSE)). (2.2)
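The statistic (2.1) and the balanced-case identity (2.2) can be checked numerically. The sketch below is illustrative only: the function name `unweighted_statistic` and the toy data are not from the article. It computes [T.sub.a] from raw groups and compares it with (1 - 1/a)[a.sup.1/2](MST - MSE) on a balanced layout, where MSE reduces to the average of the [S.sub.i.sup.2].

```python
from math import sqrt
from statistics import fmean, variance


def unweighted_statistic(groups):
    """Compute T_a of (2.1); every group needs at least two observations."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    total = 0.0
    for g in groups:
        n = len(g)
        # n_i * (group mean - grand mean)^2 minus (1 - n_i/N) * S_i^2
        total += n * (fmean(g) - grand) ** 2 - (1 - n / N) * variance(g)
    return total / sqrt(a)


# Balanced toy data: a = 4 groups of size n = 3 (hypothetical values).
balanced = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0], [5.0, 7.0, 6.0], [0.0, 1.0, 5.0]]
a, n = len(balanced), len(balanced[0])
grand = fmean([x for g in balanced for x in g])
mst = sum(n * (fmean(g) - grand) ** 2 for g in balanced) / (a - 1)
mse = fmean([variance(g) for g in balanced])  # pooled MSE in the balanced case

t_a = unweighted_statistic(balanced)
rhs = (1 - 1 / a) * sqrt(a) * (mst - mse)     # right-hand side of (2.2)
print(t_a, rhs)
```

The same function accepts unequal group sizes, where identity (2.2) no longer holds but the centering by (1 - [n.sub.i]/N)[S.sub.i.sup.2] is retained.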
The Wald-type weighted statistic is
[T.sub.W] = [bar.X'.sub..]C'(CVC')[.sup.-1]C[bar.X.sub..], (2.3)
where...
|