1. INTRODUCTION
Semiparametric models arise frequently in applications and have two indices, a parametric index $\theta$ and a nonparametric index $\eta$. Often, inference about $\theta$ is the primary interest and $\eta$ is a nuisance parameter. An example is the Cox (1972) regression model in survival analysis, where the components of $\theta$ are the hazard ratios for the covariates and $\eta$ is the baseline hazard function. Other applications include mixture models (Kiefer and Wolfowitz 1956), where $\theta$ is the parameter defining the mixed distributions and $\eta$ is an unspecified mixture distribution, and partially linear models (Green and Yandell 1985; Green 1987; Mammen and van de Geer 1997), where $\theta$ comprises the linear terms and $\eta$ contains the nonparametric effects.
In special cases, for example the celebrated partial likelihood for the Cox model with right-censored data, the profile likelihood for $\theta$ does not involve $\eta$. Unfortunately, the form of the profile likelihood is typically quite complicated, and $\eta$ is not easily eliminated. Profile likelihood inferences about $\theta$ have nevertheless been studied in specific examples (Nielsen, Gill, Andersen, and Sorensen 1992; Huang 1996; Roeder, Carroll, and Lindsay 1996), and Murphy and van der Vaart (2000) have provided a general justification for such practices. Under mild structural conditions, the profile likelihood for $\theta$ has an asymptotic quadratic expansion resembling that of a parametric likelihood. Furthermore, the maximum profile likelihood estimator for $\theta$, $\hat{\theta}_n$, is asymptotically normal with mean $\theta_0$, the true value of $\theta$, and covariance matrix $n^{-1}$ times the inverse of the efficient Fisher information matrix $\tilde{I}_0$, which is corrected for the presence of the infinite-dimensional nuisance parameter (Bickel, Klaassen, Ritov, and Wellner 1996; van der Vaart 1998).
Inferences about $\theta$ may be obtained without $\hat{\theta}_n$. The quadratic expansion of the profile likelihood permits the construction of confidence ellipsoids by inverting the log-likelihood ratio. However, translating this elegant theory into practice has been limited by computational difficulties. Even if the log profile likelihood ratio can be successfully inverted for a multivariate parameter, this inversion does not enable the construction of confidence intervals for each parameter subcomponent separately, as is standard practice in data analysis. For such confidence intervals, it would be necessary to further profile over all remaining components of $\theta$. A related problem for which inverting the log-likelihood ratio is not adequate is the construction of rectangular confidence regions for $\theta$, such as minimum volume confidence rectangles (Di Bucchianico, Einmahl, and Mushkudiani 2001) or rescaled marginal confidence intervals. For many practitioners, rectangular regions are preferable to ellipsoids for ease of interpretation. Procedures for computing these regions have not yet been developed for our setup.
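For reference, the likelihood ratio region mentioned at the start of the preceding paragraph is usually written as below. This is a sketch under stated assumptions rather than a formula quoted from the article: $pl_n$ denotes the profile likelihood, $p$ the dimension of $\theta$, and $\chi^2_{p,1-\alpha}$ the $(1-\alpha)$ quantile of the chi-square distribution with $p$ degrees of freedom, which is the calibration implied by a quadratic expansion whose curvature is $\tilde{I}_0$.

$$C_n(\alpha) = \left\{\theta : 2\left[\log pl_n(\hat{\theta}_n) - \log pl_n(\theta)\right] \le \chi^2_{p,1-\alpha}\right\}$$

Up to the remainder of the expansion, the boundary of $C_n(\alpha)$ is the ellipsoid $n(\theta - \hat{\theta}_n)^T \tilde{I}_0 (\theta - \hat{\theta}_n) = \chi^2_{p,1-\alpha}$, which is why an interval for a single component of $\theta$ cannot be read off from it without further profiling.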
In principle, having an estimator of $\theta$ and its variance simplifies these inferences considerably. However, computation of these quantities using the semiparametric likelihood poses stiff challenges relative to those encountered with parametric models. Finding the maximizer of the profile likelihood is done implicitly and typically involves numerical approximations. When the nuisance parameter is not $\sqrt{n}$ estimable, nonparametric functional estimation of $\eta$ for fixed $\theta$ may be required, which depends heavily on the proper choice of smoothing parameters. Even when $\eta$ is estimable at the parametric rate and without smoothing, $\tilde{I}_0$ does not ordinarily have a closed form. When it does have a closed form, it may include linear operators that are difficult to estimate well, and inverting the estimated linear operators may not be straightforward. The validity of these variance estimators must be established on a case-by-case basis.
The bootstrap is a possible solution to these problems, but theoretical justification is not available for semiparametric models where the nuisance parameter is not estimable at the $\sqrt{n}$ rate. The results of van der Vaart and Wellner (1996) apply only to estimators converging at the parametric rate. Even if this theory were adapted, the computational burden would be substantial, because maximization over both $\theta$ and $\eta$ would be needed for each bootstrap sample. A different approach to variance estimation may be based on corollary 3 of Murphy and van der Vaart (2000), which demonstrates that the curvature of the profile likelihood near $\hat{\theta}_n$ is asymptotically equal to $\tilde{I}_0$. In practice, one can perform second-order numerical differentiation by evaluating the profile likelihood on a hyperrectangular grid of $3^p$ equidistant points centered at $\hat{\theta}_n$, taking the appropriate differences, and then dividing by $4h^2$, where $p$ is the dimension of $\theta$ and $h$ is the spacing between grid points. Although the conditions on $h$ required for the asymptotic validity of this approach are well known, there are no clear-cut rules for choosing the grid spacing in a given dataset. Thus it would seem difficult to automate this technique for practical use.
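The numerical differentiation described above can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: `profile_hessian` and `toy_log_pl` are hypothetical names, the toy log profile likelihood is exactly quadratic with a made-up information matrix `info`, and only the grid points with at most two shifted coordinates are used. Mixed second differences are divided by $4h^2$, while the pure second differences on a grid of spacing $h$ are divided by $h^2$.

```python
def profile_hessian(log_pl, theta_hat, h):
    """Central-difference Hessian of log_pl at theta_hat with grid spacing h."""
    p = len(theta_hat)

    def f(shift):
        # Evaluate log_pl at theta_hat + h * shift, with shift in {-1, 0, 1}^p.
        return log_pl([t + h * s for t, s in zip(theta_hat, shift)])

    def shifted(i, si, j=None, sj=0):
        # Grid offset with coordinate i moved by si and (optionally) j by sj.
        shift = [0] * p
        shift[i] = si
        if j is not None:
            shift[j] = sj
        return shift

    f0 = f([0] * p)
    hess = [[0.0] * p for _ in range(p)]
    for i in range(p):
        # Pure second difference along coordinate i.
        hess[i][i] = (f(shifted(i, 1)) - 2.0 * f0 + f(shifted(i, -1))) / h**2
        for j in range(i + 1, p):
            # Mixed second difference over the four corner points.
            mixed = (f(shifted(i, 1, j, 1)) - f(shifted(i, 1, j, -1))
                     - f(shifted(i, -1, j, 1)) + f(shifted(i, -1, j, -1)))
            hess[i][j] = hess[j][i] = mixed / (4.0 * h**2)
    return hess


# Toy example (all values are illustrative assumptions).
n = 100
info = [[2.0, 0.5], [0.5, 1.0]]
theta_hat = [0.3, -1.2]


def toy_log_pl(theta):
    # Exactly quadratic log profile likelihood: -(n / 2) * d' * info * d.
    d = [t - c for t, c in zip(theta, theta_hat)]
    quad = sum(d[i] * info[i][j] * d[j] for i in range(2) for j in range(2))
    return -0.5 * n * quad


hess = profile_hessian(toy_log_pl, theta_hat, h=0.01)
# The information estimate is minus the Hessian divided by the sample size.
info_est = [[-hess[i][j] / n for j in range(2)] for i in range(2)]
```

In this toy case the answer does not depend on `h` because the function is exactly quadratic; with a real profile likelihood the result can be sensitive to the spacing, which is the difficulty raised in the text.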
To our knowledge, there does not exist a general theoretically justified and automatic method for approximating $\tilde{I}_0$. In this article, we propose an application of Markov chain Monte Carlo (MCMC) to the semiparametric profile likelihood. This method, given in Section 2, automatically approximates the maximizer of the profile likelihood and estimates its curvature without computing derivatives or requiring the use of a grid. Repeated maximization over $\theta$ is not needed, unlike with the bootstrap. A byproduct of the algorithm is an estimate of $\tilde{I}_0$ without functional estimation of the profile information matrix. The output from the Markov chain can be used directly to construct various confidence sets, including minimum volume confidence rectangles. The procedure's validity rests on a careful analysis of the stationary distribution of the chain, which involves an extension of the theory of Murphy and van der Vaart (2000).
The essence of our argument is that the "posterior" distribution of the profile likelihood with respect to a prior on $\theta$ is asymptotically equivalent to the distribution of $\hat{\theta}_n$. Note that inferences about $\theta$ might also be based on the marginal posterior of $\theta$ from the full likelihood with respect to a joint prior on $(\theta, \eta)$. Shen (2002) has shown that this approach yields valid inferences for $\hat{\theta}_n$ when $\theta$ is estimable at the parametric rate. The profile likelihood sampler greatly simplifies the theory and computations, because a prior is not explicitly specified for $\eta$. At the least, our approach is a useful alternative to fully Bayesian computations when $\eta$ is strictly a nuisance parameter. It may also enable exact Bayesian inference that complements asymptotic frequentist inference, if one accepts the use of the profile likelihood for Bayesian analysis.
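The idea of sampling from the profile likelihood "posterior" can be sketched in a toy setting. Everything specific here is an assumption made for illustration and is not taken from the article: the model is $N(\theta, \eta)$ with the variance $\eta$ playing the role of the nuisance parameter (so it is parametric, not infinite-dimensional), the prior on $\theta$ is flat, the chain is a plain random-walk Metropolis sampler, and `log_pl` and `profile_sampler` are hypothetical names. The chain mean is used as an approximation to the maximizer, and the inverse of $n$ times the chain variance as an estimate of the information.

```python
import math
import random


def log_pl(theta, xs):
    """Toy log profile likelihood for N(theta, eta) data, up to a constant.

    For fixed theta the maximizing variance is mean((x - theta)^2), which
    gives log pl_n(theta) = -(n / 2) * log(mean((x - theta)^2)) + const.
    """
    n = len(xs)
    return -0.5 * n * math.log(sum((x - theta) ** 2 for x in xs) / n)


def profile_sampler(xs, start, step, n_iter, burn_in, rng):
    """Random-walk Metropolis chain targeting pl_n(theta) under a flat prior."""
    theta, cur = start, log_pl(start, xs)
    draws = []
    for it in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        new = log_pl(prop, xs)
        # Accept with probability min(1, pl_n(prop) / pl_n(theta)).
        if rng.random() < math.exp(min(0.0, new - cur)):
            theta, cur = prop, new
        if it >= burn_in:
            draws.append(theta)
    return draws


# Simulated data and tuning constants (illustrative assumptions).
rng = random.Random(1)
n = 200
xs = [rng.gauss(1.0, 2.0) for _ in range(n)]
draws = profile_sampler(xs, start=0.0, step=0.3,
                        n_iter=22000, burn_in=2000, rng=rng)

m = len(draws)
theta_mcmc = sum(draws) / m                      # approximates the maximizer
var_mcmc = sum((d - theta_mcmc) ** 2 for d in draws) / (m - 1)
info_mcmc = 1.0 / (n * var_mcmc)                 # information estimate

# Closed-form benchmarks available only because the model is a toy.
x_bar = sum(xs) / n                              # maximizer of log_pl
s2 = sum((x - x_bar) ** 2 for x in xs) / n       # 1 / s2 is the plug-in information
```

No derivative, grid, or repeated maximization over $\theta$ appears in the loop; each iteration needs only one evaluation of the profile likelihood. The proposal scale `step` and the burn-in length are tuning choices made for this example.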
In Section 3 we briefly compare the asymptotic properties of our proposed procedure with numerical differentiation. In Sections 4-6 we show that the regularity conditions for the theoretical developments of Section 2 are satisfied by three practical examples. For two of these examples, profile likelihood computation is well established, but $\eta$ is not estimable at the parametric rate. In Section 4 we examine the Cox model with current status data, where the partial likelihood is not available for inference and the validity of Huang's (1996) direct nonparametric estimator of $\tilde{I}_0$ has not been established theoretically. The new methods perform well in both simulation and data analysis. In the simulation, we also compare the proposed approach with numerical differentiation. In Section 5 we study logistic regression with measurement error. This practical example was studied by Carroll, Gail, and Lubin (1993), Roeder et al. (1996), and Murphy and van der Vaart (2001), none of whom developed an automatic estimation procedure for $\tilde{I}_0$. In Section 6 we present the odds-rate regression model for right-censored data. In addition to establishing the theory for this situation, we present a data analysis using rectangular confidence regions. We give concluding remarks in Section 7.
2. USING MARKOV CHAIN MONTE CARLO
Given a full likelihood $l_n(\theta, \eta)$ based on a random sample $X_1, \ldots, X_n$, the profile likelihood for $\theta$ is defined as the function $pl_n(\theta) = l_n(\theta, \hat{\eta}_\theta)$, where $\hat{\eta}_\theta = \arg\max_{\eta} l_n(\theta, \eta)$. Although it is rarely possible to compute $pl_n(\theta)$ explicitly, its numerical evaluation is often feasible. The maximizer of $pl_n(\theta)$ is the maximum likelihood estimator (MLE) for $\theta$, given by the first component of the pair $(\hat{\theta}_n, \hat{\eta}_n)$ that maximizes $l_n(\theta, \eta)$. In the sequel, we require $\tilde{I}_0$ to be positive definite and
$$\sqrt{n}\,(\hat{\theta}_n - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \tilde{I}_0^{-1}\, \tilde{l}_0(X_i) + o_{P_0}(1), \tag{1}$$
where $\tilde{l}_0$ is the efficient score function and $P_0$ is the distribution under the true parameter $(\theta_0, \eta_0)$.
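The definition of $pl_n(\theta)$ at the start of this section, and the remark that it can often be evaluated numerically, can be illustrated with a small sketch. The setting is an assumption chosen only because it has a closed-form check: data from $N(\theta, \eta)$ with the variance $\eta$ as a (finite-dimensional) nuisance parameter, the likelihood handled on the log scale, and the inner maximization done by a golden-section search over $\log \eta$. The names `log_lik`, `golden_max`, and `log_profile_lik` are hypothetical.

```python
import math
import random


def log_lik(theta, eta, xs):
    """Log-likelihood of N(theta, eta) data; eta is the variance (nuisance)."""
    n = len(xs)
    ss = sum((x - theta) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi * eta) - 0.5 * ss / eta


def golden_max(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximizer lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximizer lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def log_profile_lik(theta, xs):
    """log pl_n(theta): maximize the log-likelihood over eta for fixed theta."""
    # Search over u = log(eta) so that the variance stays positive; the
    # log-likelihood is concave in u, so the search interval is unimodal.
    u_hat = golden_max(lambda u: log_lik(theta, math.exp(u), xs), -10.0, 10.0)
    return log_lik(theta, math.exp(u_hat), xs)


# Check against the closed form eta_hat(theta) = mean((x - theta)^2).
random.seed(0)
xs = [random.gauss(1.0, 2.0) for _ in range(200)]
theta = 0.5
numeric = log_profile_lik(theta, xs)
eta_closed = sum((x - theta) ** 2 for x in xs) / len(xs)
closed = log_lik(theta, eta_closed, xs)
```

In a genuinely semiparametric model the inner maximization is over a function rather than a scalar, so `golden_max` would be replaced by a model-specific routine; the outer structure, one inner maximization per value of $\theta$, stays the same.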
Under a set of structural conditions, Murphy and van der Vaart (2000) proved that for any random sequence $\{\tilde{\theta}_n\}$ that converges in probability to $\theta_0$,
$$\log pl_n(\tilde{\theta}_n) = \log pl_n(\hat{\theta}_n) - \frac{1}{2}\, n\, (\tilde{\theta}_n - \hat{\theta}_n)^T \tilde{I}_0\, (\tilde{\theta}_n - \hat{\theta}_n) + o_{P_0}\!\left(\sqrt{n}\,\|\tilde{\theta}_n - \theta_0\| + 1\right)^2. \tag{2}$$
Here and throughout the article, $\|\cdot\|$ denotes the Euclidean norm. The quadratic expansion (2) justifies using the profile likelihood in place of the full likelihood for inference about $\theta$. In principle, the curvature of the profile likelihood can be used to estimate the covariance matrix of $\hat{\theta}_n$ (see corollary 3 of Murphy and van der Vaart 2000). We now show...