1. INTRODUCTION
In survival analysis, the Cox multiplicative hazards model (Cox 1972) has been used extensively. In this model, the hazard rate function of the survival time given an external (possibly time-dependent) covariate vector Z(t) is assumed to be
$$\lambda(t \mid Z(t)) = \lambda(t)\exp\{\beta^{T} Z(t)\},$$
where $\lambda(t)$ is an unknown and unspecified baseline hazard function and $\beta$ is the regression coefficient for $Z(t)$. An efficient estimate for $\beta$ can be obtained by maximizing a partial likelihood function (Cox 1975; Andersen and Gill 1982). Because the proportionality in the multiplicative hazards model does not hold in many applications, one alternative form of modeling the hazard rate function is to assume that the hazard risks are additive across covariates, that is,
$$\lambda(t \mid Z(t)) = \mu(t) + \beta^{T} Z(t),$$
where $\mu(t)$ is an unknown baseline hazard function. The additive hazards model has been studied by Lin and Ying (1994). Furthermore, to accommodate both the multiplicative and additive hazards structures, Lin and Ying (1995) proposed a multiplicative-additive hazards model where the hazard function takes the form
$$\lambda(t \mid Z_1(t), Z_2(t)) = \lambda(t)\exp\{\beta_1^{T} Z_1(t)\} + \beta_2^{T} Z_2(t),$$
where $Z_1(t)$ and $Z_2(t)$ are different components of $Z(t)$. But all of these hazard-based regression models are restrictive in practice, because they may not be flexible enough to accommodate situations where hazard risks are neither multiplicative nor additive among groups. Therefore, it is desirable to obtain a class of hazard-based models that allows a wide range of hazard structures while at the same time retaining the simple structures of the multiplicative and additive hazards models.
In this article we propose a unified family of hazard-based regression models. Specifically, we propose a class of transformed hazards models by imposing both an additive structure and a known transformation $G(\cdot)$ on the hazard function. In this class, the hazard function for the survival times given covariate $Z(t)$ takes the form
$$G\{\lambda(t \mid Z(t))\} = \mu(t) + \beta^{T} Z(t), \tag{1}$$
where $\beta$ is the unknown regression coefficient vector, $\mu(t)$ is an unknown baseline hazard function, and $G(\cdot)$ is a known and increasing transformation function. Essentially, model (1) can be considered a partial linear regression model for the transformed hazard function. One example of the transformation $G(\cdot)$ is the Box-Cox transformation (Box and Cox 1964), in which $G(x)$ is given by
$$G(x) = \frac{x^{s} - 1}{s} \tag{2}$$
for $s > 0$, and we define $G(x) = \log(x)$ if $s = 0$. Within the Box-Cox transformation family, when $s = 1$ in (2), (1) is the additive hazards model, and if $s = 0$, then (1) becomes the multiplicative hazards model. Thus the transformed model in (1) with $G(\cdot)$ given by (2) can be considered a smoothed class of hazards models linking the additive and multiplicative hazards models, which are the extremes of this class if $s$ is restricted to the range $[0, 1]$. Because our proposed class (1) allows a much broader class of hazard patterns than are allowed in the proportional hazards and additive hazards models, it provides us with more flexible models for analyzing survival data.
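To make the two endpoints of the family concrete, the following minimal Python sketch evaluates the Box-Cox transformation in (2), its inverse, and the hazard implied by model (1). The function names (`box_cox`, `box_cox_inverse`, `hazard`) and the numerical inputs are illustrative assumptions, not taken from the article. Note that with $s = 1$ the transformation is $G(x) = x - 1$, so the implied hazard is $1 + \mu(t) + \beta^{T} Z(t)$; the constant can be absorbed into the unspecified baseline $\mu(t)$.

```python
import math

def box_cox(x, s):
    """Box-Cox transformation G(x) from (2); x must be positive."""
    if x <= 0:
        raise ValueError("the hazard must be positive")
    if s == 0:
        return math.log(x)
    return (x ** s - 1.0) / s

def box_cox_inverse(y, s):
    """Inverse of G, so that box_cox_inverse(box_cox(x, s), s) == x."""
    if s == 0:
        return math.exp(y)
    base = 1.0 + s * y
    if base <= 0:
        raise ValueError("1 + s*y must be positive to give a valid hazard")
    return base ** (1.0 / s)

def hazard(mu_t, beta, z, s):
    """Hazard implied by model (1): the inverse of G at mu(t) + beta^T z."""
    linear = mu_t + sum(b * zj for b, zj in zip(beta, z))
    return box_cox_inverse(linear, s)

# Same linear predictor, two members of the family (illustrative values).
additive = hazard(0.5, [0.2], [1.0], s=1)        # 1 + 0.5 + 0.2
multiplicative = hazard(0.5, [0.2], [1.0], s=0)  # exp(0.5) * exp(0.2)
```

Intermediate values of `s` in (0, 1) interpolate between these two hazard structures; for `s > 0` the sketch raises an error whenever the linear predictor would imply a non-positive hazard.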
Our goal in this article is to provide a unified framework for deriving an efficient estimate for $\beta$ in model (1) for any given transformation $G$, where $G^{-1}$ is three times continuously differentiable. In particular, we use the sieve maximum likelihood estimation approach to construct an estimate of $\beta$. We then examine the asymptotic properties of the resulting estimator.
The rest of this article is organized as follows. In Section 2 we present a general framework of sieve maximum likelihood estimation. In Section 3 we derive the asymptotic properties of the estimator, including consistency and asymptotic normality. In Section 4 we report on simulation studies that we conducted to examine the numerical properties of the proposed method in small samples. In Section 5 we analyze a lung cancer dataset using the proposed class of models and estimation procedure. We present a brief discussion in Section 6, and provide proofs of all theorems in the Appendix.
2. INFERENCE PROCEDURE
Suppose that we observe survival data with $n$ iid observations in a study with termination time $\tau$. We denote the observation for subject $i$ by $(Y_i = T_i \wedge C_i,\ \Delta_i = I(T_i \le C_i),\ \{Z_i(t) : t \in [0, \tau]\})$, where $T_i$ is the failure time of subject $i$, $C_i$ is the censoring time, $\{Z_i(t) : t \in [0, \tau]\}$ denotes the external covariate process, "$\wedge$" denotes the minimum of two values, and $I(\cdot)$ is the indicator function.
We assume that $C_i$ is independent of $T_i$ conditional on the covariates. Under the assumption that the transformation $G(\cdot)$ in the model (1) is strictly increasing and differentiable, the observed likelihood function of the parameters $(\beta, \mu)$ can be written as
$$L_n(\beta, \mu) = \prod_{i=1}^{n} \left\{H\left(\mu(Y_i) + \beta^{T} Z_i(Y_i)\right)\right\}^{\Delta_i} \exp\left\{-\int_0^{Y_i} H\left(\mu(t) + \beta^{T} Z_i(t)\right) dt\right\}, \tag{3}$$
where $H(\cdot)$ is the inverse function of $G(\cdot)$.
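As a rough illustration of how (3) could be evaluated, the sketch below computes its logarithm for a given $\beta$, a given baseline function, and a given inverse transformation. It rests on simplifying assumptions that are not part of the article: covariates are time-independent, the integral is approximated by a midpoint rule, and the function name `log_likelihood`, its arguments, and the example data are hypothetical.

```python
import math

def log_likelihood(beta, mu, H, data, n_grid=200):
    """Logarithm of (3), assuming time-independent covariates.

    beta : list of regression coefficients
    mu   : callable, baseline function t -> mu(t)
    H    : callable, inverse of the transformation G
    data : list of (y, delta, z) with observed time y, event indicator
           delta, and covariate list z
    The integral in (3) is approximated by the midpoint rule on n_grid
    subintervals (a choice made for this sketch only).
    """
    total = 0.0
    for y, delta, z in data:
        bz = sum(b * zj for b, zj in zip(beta, z))
        if delta == 1:
            # Contribution of an observed failure: log hazard at Y_i.
            total += math.log(H(mu(y) + bz))
        # Contribution of the cumulative hazard over [0, Y_i].
        step = y / n_grid
        cumulative = sum(H(mu((k + 0.5) * step) + bz) for k in range(n_grid)) * step
        total -= cumulative
    return total

# Hypothetical example: constant baseline and H = exp (the s = 0 case).
example_data = [(2.0, 1, [1.0]), (1.0, 0, [0.0])]
value = log_likelihood([0.5], lambda t: -1.0, math.exp, example_data)
```

With a constant baseline and `H = math.exp`, each subject has a constant hazard, so the result can be checked by hand against the exponential-model log-likelihood.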
To obtain estimates for $\beta$ and $\mu(t)$, we wish to maximize $L_n(\beta, \mu)$ in (3). But such a maximum does not exist, because one can always find some function $\mu(t)$ such that $L_n(\beta, \mu) = \infty$. Therefore, we must restrict $\mu(t)$ to some smaller functional space to ensure that the maximum of $L_n(\beta, \mu)$ exists. One important method of doing this is sieve maximum likelihood estimation, which has been used in many semiparametric estimation problems (Shen and Wong 1994; Shen 1997, 1998). In the sieve estimation method, the infinite-dimensional functional parameter $\mu(t)$ is restricted to a functional space with finite dimension, which is called the sieve space for $\mu(t)$. Moreover, the size of this sieve space increases with increasing sample size $n$, and as $n \to \infty$, the sieve space approximates the whole space for $\mu(t)$. However, for fixed sample size $n$, the choice of the sieve space for $\mu(t)$ cannot be arbitrary; the space should be chosen large enough that the bias of the sieve estimate for $\mu(t)$ does not dominate. On the other hand, the space must not be so large that the variation in estimating $\mu(t)$ dominates the variation in estimating $\beta$, which is the main parameter of interest. Once a sieve space is chosen, maximizing the likelihood function can be carried out on this space, which contains only a finite number of parameters.
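The need for a restricted space can be seen in a small numerical sketch. The setting below is deliberately simplified and is an assumption of this illustration, not the article's construction: there are no covariates, $H = \exp$ (the $s = 0$ case), the data are hypothetical, and $\mu(t)$ is a single spike of height `h` centered at an observed failure time and zero elsewhere. When the spike is allowed to narrow as it grows, the log-likelihood keeps increasing; when its width is held fixed, as it would be for a step function at a fixed resolution, tall spikes are penalized through the integral term.

```python
import math

def spike_log_likelihood(h, width, data, t0):
    """Logarithm of (3) with no covariates, H = exp, and mu(t) = h on
    [t0 - width/2, t0 + width/2] and 0 elsewhere.

    data : list of (y, delta) pairs.
    The integral of exp(mu) is computed exactly, because mu is a step
    function.
    """
    lo, hi = t0 - width / 2.0, t0 + width / 2.0
    total = 0.0
    for y, delta in data:
        if delta == 1 and lo <= y <= hi:
            total += h  # log H(mu(y)) = mu(y) = h inside the spike
        # Length of [0, y] that falls inside the spike.
        overlap = max(0.0, min(y, hi) - max(0.0, lo))
        total -= (y - overlap) + overlap * math.exp(h)
    return total

# Hypothetical data: (observed time, event indicator).
data = [(0.5, 1), (1.0, 1), (1.5, 0), (2.0, 1)]
heights = [1.0, 5.0, 10.0]

# Unrestricted mu: the spike narrows as it grows (width = exp(-2h)).
shrinking = [spike_log_likelihood(h, math.exp(-2.0 * h), data, 1.0) for h in heights]

# Fixed resolution: the spike cannot narrow below a width of 0.25.
fixed = [spike_log_likelihood(h, 0.25, data, 1.0) for h in heights]
```

In this sketch `shrinking` increases with `h` while `fixed` decreases, which mirrors the argument above: without a restriction on $\mu(t)$ the likelihood has no maximum, whereas a finite-dimensional space bounds it.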
Usually, the sieve space for $\mu(t)$ is constructed from a linear space with a finite number of basis functions. Many basis functions can be used for this purpose. The most commonly used basis functions include B-splines and wavelet basis functions. In this article, we use wavelet basis functions to construct a sieve space for $\mu(t)$ for both mathematical and computational convenience, as is demonstrated in the subsequent arguments. A sequence of wavelet basis functions can be obtained from a single function $\phi(t)$, which is called the "father" wavelet and satisfies the...