1. INTRODUCTION
Additive models constitute an important family of structured multivariate
nonparametric models. They model a random sample $\{(Y_i, X_i)\}_{i=1}^{n}$ by
$$Y_i = \alpha + \sum_{d=1}^{D} m_d(X_{di}) + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{1}$$
where $\{\varepsilon_i\}$ is a sequence of iid random variables with mean zero and finite variance $\sigma^2$. Additive models, suggested by Friedman and Stuetzle (1981) and Hastie and Tibshirani (1990), have been widely used in multivariate nonparametric modeling. Because all of the unknown functions are one-dimensional, the difficulty associated with the so-called "curse of dimensionality" is substantially reduced (for details, see Stone 1985; Hastie and Tibshirani 1990). In fact, Fan, Hardle, and Mammen (1998) showed that an additive component can be estimated as well as in the case where the rest of the components are known. Similar oracle properties were obtained by Linton (1997) and Mammen, Linton, and Nielsen (1999). Several methods for estimating the additive functions have been proposed, including the marginal integration methods of Tjostheim and Auestad (1994) and Linton and Nielsen (1995), the backfitting algorithms of Buja, Hastie, and Tibshirani (1989) and Opsomer and Ruppert (1998), the estimating equation methods of Mammen et al. (1999), the Fourier series approximation approach of Amato, Antoniadis, and De Feis (2002), the linear wavelet strategies of Amato and Antoniadis (2001), and the nonlinear wavelet estimation method of Sardy and Tseng (2004), which uses the block coordinate relaxation algorithm of Sardy, Bruce, and Tseng (2000), among others. Among these methods, the backfitting algorithm is considered a useful fitting tool and has received much attention for its ease of implementation. Hardle and Hall (1993) and Ansley and Kohn (1994) explored the convergence of the algorithm based on projection smoothers. Opsomer and Ruppert (1997) studied asymptotic properties of the backfitting estimators for a bivariate additive model based on a nonprojection smoother, namely local polynomial regression, and Wand (1999) and Opsomer (2000) extended the results to general $D$-dimensional additive models.
Recently, Hastie and Tibshirani (2000) considered Bayesian backfitting, which is a stochastic generalization of the backfitting algorithm discussed earlier. A simulation study comparing the finite-sample properties of backfitting and marginal integration methods was conducted by Sperlich, Linton, and Hardle (1999).
After fitting the additive model via a backfitting algorithm, one often asks whether a specific additive component in (1) is significant or admits a certain parametric form, such as a polynomial function. This amounts to testing whether the additive component is zero or of a polynomial form. However, only limited tools are available for such frequently asked questions: compared with estimation, testing in the additive model has received little attention, and to our knowledge the literature contains virtually no formal theoretical work on testing in the present setting. Recently, Hardle, Sperlich, and Spokoiny (2001) used wavelets along with the adaptive Neyman idea (Fan and Gijbels 1996) to test additive components. Although this procedure is useful, it is tailored to their specific problem and is not easy to comprehend. In contrast, we develop an easily understandable and generally applicable approach to testing problems, based on comparing likelihood functions under the null and alternative hypotheses. If the likelihood of the best-fitting model under the alternative hypothesis is much larger than that under the null hypothesis, then the null hypothesis looks implausible and should be rejected. How do we determine the critical value? Does the null distribution of the likelihood ratio test depend on nuisance parameters? These questions are poorly understood, particularly for additive models, and addressing them leads us to unveil a new phenomenon for additive models.
Fan, Zhang, and Zhang (2001) proposed generalized likelihood ratio (GLR) tests and showed that the Wilks type of results hold for a variety of useful models, including univariate nonparametric regression models, varying-coefficient models, and their extensions. The procedure was motivated by the fact that the nonparametric maximum likelihood estimate (MLE) usually does not exist, and even when it does, the resulting maximum likelihood ratio test is not optimal. The idea is to replace the MLE with a nonparametric estimate, which yields a more relaxed family of tests, called GLR tests; Fan et al. (2001) showed that the resulting tests are optimal. Just as likelihood ratio tests are widely applicable to parametric models, GLR tests should be useful in our setting. In general, however, the distribution of $\varepsilon_i$ is unknown, so the likelihood function is unavailable. Two important questions about GLR tests therefore arise naturally: first, it is unclear how to construct a GLR statistic when the error distribution of $\varepsilon_i$ is unknown and may take many forms; second, it remains unknown whether a GLR test so constructed follows the Wilks type of results and shares certain optimality properties. In this article we develop GLR tests and their bias-corrected versions for the additive model to address these questions. This not only provides useful tools for answering frequently asked questions in additive modeling, but also enriches the GLR test theory. Our results, together with those of Fan et al. (2001), convincingly show the generality of the Wilks phenomenon and the wide applicability of GLR tests, and should encourage other researchers to apply GLR tests to related problems.
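To make the GLR idea concrete, here is a minimal sketch under an explicit assumption: if the errors were Gaussian, profiling out $\sigma^2$ reduces the log-likelihood ratio to $\lambda_n = (n/2)\log(\mathrm{RSS}_0/\mathrm{RSS}_1)$, the form used by Fan, Zhang, and Zhang (2001). The sketch fits a two-component additive model by backfitting local linear smoothers and compares it with the null fit that drops the second component. The Epanechnikov kernel, the bandwidths, the simulated data, and the helper names (`local_linear_smoother`, `backfit`, `glr_statistic`) are illustrative assumptions rather than the article's implementation, and no critical value is computed.

```python
import numpy as np


def local_linear_smoother(x, h):
    """Smoother matrix whose jth row holds the local linear weights at x[j] (Epanechnikov kernel)."""
    S = np.empty((x.size, x.size))
    for j, x0 in enumerate(x):
        diff = x - x0
        k = 0.75 * np.clip(1.0 - (diff / h) ** 2, 0.0, None) / h   # K_h(x_i - x0)
        Xd = np.column_stack([np.ones_like(diff), diff])           # local linear design
        XtK = Xd.T * k
        S[j] = np.linalg.solve(XtK @ Xd, XtK)[0]
    return S


def backfit(Y, smoothers, tol=1e-10, max_iter=500):
    """Backfitting with centered smoothers; returns alpha-hat and the fitted components."""
    centered = [S - S.mean(axis=0) for S in smoothers]            # (I - 11^T/n) S
    alpha = Y.mean()
    m = [np.zeros_like(Y) for _ in centered]
    for _ in range(max_iter):
        change = 0.0
        for d, Sc in enumerate(centered):
            partial = Y - alpha - sum(m[k] for k in range(len(m)) if k != d)
            new = Sc @ partial
            change = max(change, np.abs(new - m[d]).max())
            m[d] = new
        if change < tol:
            break
    return alpha, m


def glr_statistic(Y, smoothers_null, smoothers_alt):
    """lambda_n = (n/2) log(RSS_0 / RSS_1); this form assumes Gaussian errors."""
    def rss(smoothers):
        alpha, m = backfit(Y, smoothers)
        return np.sum((Y - alpha - sum(m)) ** 2)
    return 0.5 * Y.size * np.log(rss(smoothers_null) / rss(smoothers_alt))


rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
Y = 1.0 + np.sin(2 * np.pi * x1) + np.cos(2 * np.pi * x2) + 0.3 * rng.standard_normal(n)

S1 = local_linear_smoother(x1, 0.15)
S2 = local_linear_smoother(x2, 0.15)
lam = glr_statistic(Y, [S1], [S1, S2])     # H0: m_2 = 0 versus H1: m_2 != 0
print(f"GLR statistic: {lam:.1f}")
```

A large value of `lam` means that dropping the second component inflates the residual sum of squares considerably; deciding how large is "large" is exactly the null-distribution question raised above.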
The technical derivations of GLR tests for the additive model (1) based on local polynomial fitting and a backfitting algorithm are very involved, because the backfitting estimators lack simple expressions. Furthermore, the GLR statistics involve nonparametric estimators in complicated nonlinear forms. Although these statistics can be approximated by generalized quadratic forms, deriving the quadratic approximations and the distributions of quadratic functionals involving a backfitting estimator is technically challenging. Because the additive model and the local polynomial smoother are widely used in multivariate nonparametric modeling, we make a determined effort in this article to examine the null distributions and powers of the GLR tests for the additive model. This effort enables us to answer some important questions, such as whether the Wilks type of results hold for additive models and whether the intuitively appealing GLR tests are powerful enough.
We prove that, under general assumptions on the error distribution of $\varepsilon_i$, the proposed GLR tests follow the Wilks type of results and are asymptotically optimal for nonparametric hypothesis testing. In addition, unlike the classical Wilks type of results and their generalization by Fan et al. (2001), the additivity of degrees of freedom does not hold; it holds only in a more generalized sense (see Thm. 2). Furthermore, testing a hypothesis on one additive component has the same asymptotic null distribution as in the case where the rest of the components are known (Remark 1). These adaptive results are in line with the oracle property given by Fan et al. (1998) and Mammen et al. (1999). Our theoretical results for the proposed GLR tests shed light on the validity of the Wilks phenomenon and point to future research directions in nonparametric inference.
This article proceeds as follows. In Section 2 we describe the backfitting estimators based on a local polynomial smoother. In Section 3 we develop the theoretical framework for the GLR tests. We introduce the bias-corrected GLR tests and a conditional bootstrap method for approximating the null distributions of the GLR statistics in Section 4. In Section 5 we demonstrate the performance of GLR tests on simulated data, and in Section 6 we provide an example of testing on a real dataset. We defer technical proofs to Appendix B.
2. BACKFITTING ESTIMATORS
To ensure identifiability of the additive component functions $m_d(x_d)$, we impose the constraint $E\{m_d(X_{di})\} = 0$ for all $d$. Fitting the additive component $m_d(x_d)$ in (1) requires choosing bandwidths $\{h_d\}$. The optimal choice of $h_d$ can be obtained as in Opsomer and Ruppert (1998) and Opsomer (2000). Here we follow the notation of Opsomer (2000). Put $K_{h_d}(x) = h_d^{-1}K(x/h_d)$, $K_s(v) = v^{s-1}K(v)$, $H_d = \mathrm{diag}(1, h_d, \ldots, h_d^{p_d})$, $\mathbf{m}_d = \{m_d(X_{d1}), \ldots, m_d(X_{dn})\}^T$, and $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$. The smoothing matrices for local polynomial regression are
$$S_d = (s_{d,X_{d1}}, \ldots, s_{d,X_{dn}})^T,$$
where $s_{d,x_d}^T$ represents the equivalent kernel (Fan and Gijbels 1996) for the $d$th covariate at the point $x_d$,
$$s_{d,x_d}^T = e_1^T \left\{(\mathbf{X}^d_{x_d})^T K_{x_d} \mathbf{X}^d_{x_d}\right\}^{-1} (\mathbf{X}^d_{x_d})^T K_{x_d}, \tag{2}$$
with $e_i$ the vector with a 1 in the $i$th position and 0's elsewhere, the matrix $K_{x_d} = \mathrm{diag}\{K_{h_d}(X_{d1} - x_d), \ldots, K_{h_d}(X_{dn} - x_d)\}$ for a kernel function $K(x)$ and bandwidths $h_d$,
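$$\mathbf{X}^d_{x_d} = \begin{pmatrix} 1 & X_{d1} - x_d & \cdots & (X_{d1} - x_d)^{p_d} \\ \vdots & \vdots & & \vdots \\ 1 & X_{dn} - x_d & \cdots & (X_{dn} - x_d)^{p_d} \end{pmatrix},$$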
and $p_d$ is the degree of the local polynomial used to fit $m_d(x)$. The intercept $\alpha = E(Y_i)$ is typically estimated by $\hat{\alpha} = \sum_{i=1}^{n} Y_i/n$. The $\mathbf{m}_d$'s can be estimated by solving the following set of normal equations (see Buja et al. 1989; Opsomer and Ruppert 1998):
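$$\begin{pmatrix} I_n & S_1^* & \cdots & S_1^* \\ S_2^* & I_n & \cdots & S_2^* \\ \vdots & \vdots & \ddots & \vdots \\ S_D^* & S_D^* & \cdots & I_n \end{pmatrix} \begin{pmatrix} \hat{\mathbf{m}}_1 \\ \hat{\mathbf{m}}_2 \\ \vdots \\ \hat{\mathbf{m}}_D \end{pmatrix} = \begin{pmatrix} S_1^* \\ S_2^* \\ \vdots \\ S_D^* \end{pmatrix} \mathbf{Y},$$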
where $S_d^* = (I_n - \mathbf{1}\mathbf{1}^T/n)S_d$ is the centered smoother matrix. In practice, the backfitting algorithm (Buja et al. 1989) is usually used to solve these equations, and the backfitting estimators converge to the solution
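$$\begin{pmatrix} \hat{\mathbf{m}}_1 \\ \vdots \\ \hat{\mathbf{m}}_D \end{pmatrix} = M^{-1} C \mathbf{Y}, \quad M = \begin{pmatrix} I_n & S_1^* & \cdots & S_1^* \\ S_2^* & I_n & \cdots & S_2^* \\ \vdots & \vdots & \ddots & \vdots \\ S_D^* & S_D^* & \cdots & I_n \end{pmatrix}, \quad C = \begin{pmatrix} S_1^* \\ S_2^* \\ \vdots \\ S_D^* \end{pmatrix}, \tag{3}$$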
provided that the inverse of M exists.
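The estimator in (3) can be checked numerically. The sketch below builds each $S_d$ row by row from the equivalent kernel (2), centers it as $S_d^* = (I_n - \mathbf{1}\mathbf{1}^T/n)S_d$, assembles $M$ and $C$, solves $M^{-1}C\mathbf{Y}$ directly, and confirms that the Gauss-Seidel backfitting iteration converges to the same vectors. It assumes an Epanechnikov kernel, local linear fits ($p_d = 1$), a common bandwidth, and simulated uniform designs; the helper names are hypothetical.

```python
import numpy as np


def local_poly_smoother(x, h, p=1):
    """S_d: row j is the equivalent kernel e_1^T (X^T K X)^{-1} X^T K of eq. (2) at x[j]."""
    n = x.size
    S = np.empty((n, n))
    for j, x0 in enumerate(x):
        diff = x - x0
        Xd = np.vander(diff, p + 1, increasing=True)              # columns 1, (X_di - x0), ..., (X_di - x0)^p
        k = 0.75 * np.clip(1.0 - (diff / h) ** 2, 0.0, None) / h  # K_h(X_di - x0), Epanechnikov
        XtK = Xd.T * k                                            # X^T K_x
        S[j] = np.linalg.solve(XtK @ Xd, XtK)[0]                  # first row of (X^T K X)^{-1} X^T K
    return S


def center(S):
    """S* = (I_n - 11^T/n) S, i.e. subtract each column's mean from every row."""
    return S - S.mean(axis=0)


def block_system(S_stars):
    """M has I_n on the diagonal blocks and S_d* elsewhere in block row d; C stacks the S_d*."""
    D, n = len(S_stars), S_stars[0].shape[0]
    M = np.eye(n * D)
    for d in range(D):
        for k in range(D):
            if k != d:
                M[d * n:(d + 1) * n, k * n:(k + 1) * n] = S_stars[d]
    return M, np.vstack(S_stars)


def backfit(Y, S_stars, tol=1e-12, max_iter=1000):
    """Gauss-Seidel backfitting: m_d <- S_d* (Y - sum_{k != d} m_k) until the updates settle."""
    m = [np.zeros_like(Y) for _ in S_stars]
    for _ in range(max_iter):
        change = 0.0
        for d, Sd in enumerate(S_stars):
            new = Sd @ (Y - sum(m[k] for k in range(len(m)) if k != d))
            change = max(change, np.abs(new - m[d]).max())
            m[d] = new
        if change < tol:
            break
    return m


rng = np.random.default_rng(0)
n = 200
X = rng.uniform(size=(n, 2))
# Y is not centered: S_d* 1 = 0 for local polynomial smoothers, so the intercept drops out.
Y = 2.0 + np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) ** 2 + 0.2 * rng.standard_normal(n)

S_stars = [center(local_poly_smoother(X[:, d], h=0.2)) for d in range(2)]
M, C = block_system(S_stars)
m_direct = np.linalg.solve(M, C @ Y).reshape(2, n)   # eq. (3): stacked (m_1, m_2) = M^{-1} C Y
m_iter = np.array(backfit(Y, S_stars))
print("max |direct - backfitting|:", np.abs(m_direct - m_iter).max())
```

Because local polynomial smoothers reproduce constants, $S_d^*\mathbf{1} = \mathbf{0}$, which is why $\hat{\alpha}$ need not be subtracted first. The direct solve costs $O((nD)^3)$ operations and is only practical for small $n$; the iteration is the usual choice in practice.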
Following Opsomer (2000), we define the additive smoother matrix as
$$W_d = E_d M^{-1} C, \tag{4}$$
where $E_d$ is a partitioned $n \times nD$ matrix with an $n \times n$ identity matrix as its $d$th "block" and 0's elsewhere, so that the backfitting estimator of $\mathbf{m}_d$ is $\hat{\mathbf{m}}_d = W_d \mathbf{Y}$. Let $W_M^{[-d]}$ be the additive smoother matrix for data generated by the $(D-1)$-variate regression model $Y_i' = \sum_{k=1, k \neq d}^{D} m_k(X_{ki}) + \varepsilon_i$. Denote $\mathbf{m} = \sum_{d=1}^{D} \mathbf{m}_d$ and $W_M = \sum_{d=1}^{D} W_d$. The backfitting estimator of $\mathbf{m}$ is then $\hat{\mathbf{m}} = W_M \mathbf{Y}$.
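A short sketch of (4): with $M$ and $C$ as in (3), the additive smoother matrix $W_d$ is the $d$th $n \times n$ block row of $M^{-1}C$, which the code extracts by forming $E_d$ explicitly (slicing would give the same result). It assumes $D = 3$, local linear smoothers with an Epanechnikov kernel, and simulated data; the function names are illustrative.

```python
import numpy as np


def local_linear_smoother(x, h):
    """Local linear smoother matrix S_d (rows are equivalent kernels), Epanechnikov kernel."""
    S = np.empty((x.size, x.size))
    for j, x0 in enumerate(x):
        diff = x - x0
        k = 0.75 * np.clip(1.0 - (diff / h) ** 2, 0.0, None) / h
        Xd = np.column_stack([np.ones_like(diff), diff])
        XtK = Xd.T * k
        S[j] = np.linalg.solve(XtK @ Xd, XtK)[0]
    return S


def additive_smoother_matrices(S_stars):
    """Return [W_1, ..., W_D] with W_d = E_d M^{-1} C, as in eq. (4)."""
    D, n = len(S_stars), S_stars[0].shape[0]
    M = np.eye(n * D)
    for d in range(D):
        for k in range(D):
            if k != d:
                M[d * n:(d + 1) * n, k * n:(k + 1) * n] = S_stars[d]
    MinvC = np.linalg.solve(M, np.vstack(S_stars))          # M^{-1} C, shape (nD, n)
    W = []
    for d in range(D):
        E_d = np.zeros((n, n * D))
        E_d[:, d * n:(d + 1) * n] = np.eye(n)                # identity in the dth block
        W.append(E_d @ MinvC)                                # same as MinvC[d*n:(d+1)*n]
    return W


rng = np.random.default_rng(2)
n, D = 150, 3
X = rng.uniform(size=(n, D))
Y = 0.5 + np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + np.exp(X[:, 2]) + 0.2 * rng.standard_normal(n)

S_stars = [S - S.mean(axis=0) for S in (local_linear_smoother(X[:, d], 0.25) for d in range(D))]
W = additive_smoother_matrices(S_stars)
W_M = sum(W)
m_hat = [Wd @ Y for Wd in W]                                 # backfitting estimates of m_1, ..., m_D
fitted = Y.mean() + W_M @ Y                                  # alpha-hat + m-hat
```

Each $W_d$ has zero column sums, so every fitted component has empirical mean zero, the sample analogue of the constraint $E\{m_d(X_{di})\} = 0$.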
If $\|S_d^* W_M^{[-d]}\| < 1$ for some $d \in \{1, \ldots, D\}$ and some matrix norm $\|\cdot\|$, then by Lemma 2.1 of Opsomer (2000), the backfitting estimators exist and are unique, and
$$W_d = I_n - \left(I_n - S_d^* W_M^{[-d]}\right)^{-1}\left(I_n - S_d^*\right) = \left(I_n - S_d^* W_M^{[-d]}\right)^{-1} S_d^* \left(I_n - W_M^{[-d]}\right). \tag{5}$$
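Identity (5) can be checked numerically in the simplest case $D = 2$ with $d = 1$. There the remaining model has a single covariate, so its normal equations reduce to $\hat{\mathbf{m}}_2 = S_2^*\mathbf{Y}$ and hence $W_M^{[-1]} = S_2^*$. The sketch compares both expressions in (5) with the block-row form (4); the kernel, bandwidth, and simulated design are assumptions made for illustration.

```python
import numpy as np


def local_linear_smoother(x, h):
    """Local linear smoother matrix (rows are equivalent kernels), Epanechnikov kernel."""
    S = np.empty((x.size, x.size))
    for j, x0 in enumerate(x):
        diff = x - x0
        k = 0.75 * np.clip(1.0 - (diff / h) ** 2, 0.0, None) / h
        Xd = np.column_stack([np.ones_like(diff), diff])
        XtK = Xd.T * k
        S[j] = np.linalg.solve(XtK @ Xd, XtK)[0]
    return S


rng = np.random.default_rng(3)
n = 150
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
I = np.eye(n)
S1s, S2s = (S - S.mean(axis=0) for S in (local_linear_smoother(x1, 0.2), local_linear_smoother(x2, 0.2)))

# Eq. (4) for D = 2: W_1 is the first block row of M^{-1} C.
M = np.block([[I, S1s], [S2s, I]])
C = np.vstack([S1s, S2s])
W1_eq4 = np.linalg.solve(M, C)[:n]

# With D - 1 = 1 remaining covariate, backfitting reduces to one centered smoother: W_M^{[-1]} = S_2^*.
W_minus = S2s
A = I - S1s @ W_minus
W1_eq5a = I - np.linalg.solve(A, I - S1s)              # I - (I - S1* W)^{-1} (I - S1*)
W1_eq5b = np.linalg.solve(A, S1s @ (I - W_minus))      # (I - S1* W)^{-1} S1* (I - W)
print(np.abs(W1_eq4 - W1_eq5a).max(), np.abs(W1_eq4 - W1_eq5b).max())
```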
For a finite $n$, the foregoing existence and uniqueness condition can be verified numerically. To ensure the existence of the backfitting estimators when $n$ is sufficiently large, we consider only design points, denoted by $\mathbf{X}$, such that
$$\limsup_{n \to \infty} \left\|S_d^* W_M^{[-d]}\right\| < 1 \tag{6}$$
for a matrix norm $\|\cdot\|$. In practice, the smoothers $S_1, \ldots, S_D$ are applied over compact sets, so we need to deal only with design densities that have bounded support. In the case $D = 2$, a sufficient condition for (6) is
$$\sup_{x_1, x_2} \left| \frac{f_{12}(x_1, x_2)}{f_1(x_1) f_2(x_2)} - 1 \right| < 1,$$
where $f_d(x_d)$ is the density of $X_d$ and $f_{12}(x_1, x_2)$ is the joint density of $X_1$ and $X_2$. This is exactly restriction (4) of Opsomer and Ruppert (1997). Then, by Lemma B.2 in Appendix B and direct matrix multiplication,
$$\limsup_{n \to \infty} \left\|S_1^* S_2^*\right\|_r < 1,$$
where $\|A\|_r = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$ denotes the maximum absolute row sum norm. For $D > 2$, however, condition (6) is not easily replaced by other conditions. In fact, for the backfitting algorithm using any smoothing technique, condition (6) must be satisfied to ensure the existence of the backfitting estimators. Hence we restrict attention to design points $\mathbf{X}$ satisfying (6).
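For a given sample, condition (6) can be examined numerically. The sketch computes $\|S_1^* S_2^*\|_r$ for $D = 2$, and for $D = 3$ forms $W_M^{[-3]}$ from the two-covariate backfitting problem before computing $\|S_3^* W_M^{[-3]}\|_r$. It also reports the spectral radius, which bounds every submultiplicative matrix norm from below: a row-sum norm below 1 certifies the condition for this $n$, whereas a spectral radius of 1 or more rules out every such norm. This is only a finite-$n$ diagnostic, since (6) concerns the limit superior as $n \to \infty$. The kernel, bandwidth, independent uniform design, and helper names are illustrative assumptions.

```python
import numpy as np


def local_linear_smoother(x, h):
    """Local linear smoother matrix (rows are equivalent kernels), Epanechnikov kernel."""
    S = np.empty((x.size, x.size))
    for j, x0 in enumerate(x):
        diff = x - x0
        k = 0.75 * np.clip(1.0 - (diff / h) ** 2, 0.0, None) / h
        Xd = np.column_stack([np.ones_like(diff), diff])
        XtK = Xd.T * k
        S[j] = np.linalg.solve(XtK @ Xd, XtK)[0]
    return S


def centered(S):
    """(I_n - 11^T/n) S."""
    return S - S.mean(axis=0)


def additive_smoother_sum(S_stars):
    """W_M = sum_d E_d M^{-1} C for the additive model built from the given centered smoothers."""
    D, n = len(S_stars), S_stars[0].shape[0]
    M = np.eye(n * D)
    for d in range(D):
        for k in range(D):
            if k != d:
                M[d * n:(d + 1) * n, k * n:(k + 1) * n] = S_stars[d]
    MinvC = np.linalg.solve(M, np.vstack(S_stars))           # shape (nD, n)
    return MinvC.reshape(D, n, n).sum(axis=0)                # add up the D block rows


def row_sum_norm(A):
    """||A||_r = max_i sum_j |a_ij| (NumPy's ord=inf matrix norm)."""
    return np.linalg.norm(A, ord=np.inf)


rng = np.random.default_rng(4)
n = 200
X = rng.uniform(size=(n, 3))
S_stars = [centered(local_linear_smoother(X[:, d], 0.2)) for d in range(3)]

# D = 2, d = 1: condition (6) reads ||S_1^* S_2^*||_r < 1.
norm_2 = row_sum_norm(S_stars[0] @ S_stars[1])

# D = 3, d = 3: W_M^{[-3]} comes from backfitting on covariates 1 and 2 only.
W_minus_3 = additive_smoother_sum(S_stars[:2])
P = S_stars[2] @ W_minus_3
norm_3 = row_sum_norm(P)
rho_3 = np.abs(np.linalg.eigvals(P)).max()                   # spectral radius of S_3^* W_M^{[-3]}

print(f"||S1* S2*||_r = {norm_2:.3f}, ||S3* W_M^[-3]||_r = {norm_3:.3f}, spectral radius = {rho_3:.3f}")
```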
3. GENERALIZED LIKELIHOOD RATIO TESTS
3.1 The Generalized Likelihood Ratio Test
In this section we define the GLR statistics and develop their asymptotic theory under model (1), which is based on the local polynomial smoother and the backfitting algorithm. The Wilks phenomenon and optimality are unveiled in this general setting.
For simplicity, we first consider the hypothesis testing problem
$$H_0: m_D(x_D) = 0 \quad \text{vs.} \quad H_1: m_D(x_D) \neq 0. \tag{7}$$
This tests whether the Dth variable has any significant contribution to the dependent variable. The testing problem is a nonparametric null hypothesis versus a...