
Functional adaptive model estimation.

Publication: Journal of the American Statistical Association
Publication Date: 01-JUN-05
Full Article Title: Functional adaptive model estimation.(method of regression analysis)

Article Excerpt
1. INTRODUCTION

It is increasingly common to encounter regression problems where either the predictor, the response or both are functional in nature. Most of the previous work in this area involves a functional response. For instance, Moyeed and Diggle (1994) and Zeger and Diggle (1994) modeled the relationship between response, Y(t), and predictor, X(t), both measured over time, using the equation

$$Y(t) = \alpha_0(t) + \beta_0^T X(t) + \epsilon(t), \tag{1}$$

where $\alpha_0(t)$ is a smooth function of $t$, $\beta_0$ is a fixed but unknown vector of regression coefficients, and $\epsilon(t)$ is a mean-0 stationary Gaussian process. Hoover, Rice, Wu, and Yang (1998), Wu, Chiang, and Hoover (1998), and Lin and Ying (2001) used the varying-coefficient models proposed by Hastie and Tibshirani (1993) to extend (1) by allowing the regression coefficients to vary over time. Fahrmeir and Tutz (1994) and Liang and Zeger (1986) suggested an even more general framework where the response is modeled as a member of the exponential family of distributions.

We are interested in an alternative situation where the predictors are functional but the response is scalar. An example of such a situation is provided in Figure 1. These images come from an excavation in the north of England that exhumed the skeletons of 2,000 adults dating from between 1,000 and 1,500 C.E. (Shepstone, Rogers, Kirwan, and Silverman 1999). The plots show two-dimensional cross-sectional outlines of the intercondylar notch from the knee joint on the femur bone of three such individuals. For each joint in the sample, an indicator of osteoarthritis of the knee was recorded. For example, the first two joints here contained no evidence of osteoarthritis, whereas the third did. It has been conjectured that certain bone shapes may affect the biomechanics of the joint and lead to osteoarthritis. Hence we are interested in whether the shape of the bone can be used as a predictor of osteoarthritis, and, if so, what type of shape provides the strongest evidence.

This type of structure arises in numerous applications. Mueller and Stadtmuller (2005) provided illustrations in astronomy (Hall, Reimann, and Rice 2000), DNA expression arrays with repeated measures (Alter, Brown, and Botstein 2000) and engineering (Hall, Poskitt, and Presnell 2001). However, there has been limited methodological work in this area. Hastie and Mallows (1993), Ramsay and Silverman (1997, chap. 10), and Cardot, Ferraty, and Sarda (2003b) discussed performing linear regression where the response is a scalar and the predictors functional. Ferraty and Vieu (2002) developed a nonparametric regression procedure. James and Hastie (2001) and Ferraty and Vieu (2003) used functional linear discriminant analysis models to perform classification for categorical responses with functional predictors. Marx and Eilers (1999), James (2002), and Mueller and Stadtmuller (2005) suggested somewhat more general methods that provide extensions of generalized linear models (GLMs) (McCullagh and Nelder 1989) to functional predictors. In this article we introduce a procedure that facilitates the modeling of highly nonlinear response surfaces on general classes of response distributions using functional predictors. For standard p-dimensional predictors, nonlinearity can be achieved through the use of such procedures as generalized additive models (GAMs) (Hastie and Tibshirani 1990) or, if even more flexibility is required, through projection pursuit regression (PPR) (Friedman and Stuetzle 1981). Our approach, which we call functional adaptive model estimation (FAME), combines characteristics of PPR with generalized linear and additive models.

In Section 2 we present and motivate the FAME model for data with a single functional predictor and provide a fitting algorithm. We also develop asymptotic results under the restriction that the FAME model can be represented using a finite-dimensional basis. These results are used to provide confidence intervals and significance tests for model parameters. We provide several extensions in Section 3. We first illustrate a procedure for applying the FAME methodology where there is measurement error in the predictors and demonstrate this approach on a simulated dataset. We also provide extensions to multiple functional and finite-dimensional covariates and apply this method to the femur bone data. In Section 4 we present a simulation study that compares the performance of the FAME approach with other possible methods. Finally, in Section 5 we provide a discussion of the relationship of the FAME methodology to other finite-dimensional and functional approaches.

[FIGURE 1 OMITTED]

2. FUNCTIONAL ADAPTIVE MODEL ESTIMATION

To motivate our approach we first briefly review GLMs, GAMs, and PPR. GLMs provide a flexible framework for regressing response variables from the exponential family of distributions. One models the relationship between predictors $X = (X_1, X_2, \ldots, X_p)$ and response $Y$ using the link function $g(\mu) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$, where $\mu = E(Y|X)$. Although GLMs cover a wide class of response distributions, they still assume a linear relationship between the predictors and $g(\mu)$. This linearity assumption is relaxed with GAMs using the link $g(\mu) = \beta_0 + \sum_{j=1}^{p} f_j(X_j)$, where $f_j$ is a smooth function estimated as part of the fitting procedure. GAMs allow for nonlinear but still additive relationships between the predictors and $g(\mu)$. The additivity of GAMs has the advantage of allowing one to identify the effect of each predictor individually while holding all other predictors constant, but it significantly restricts the range of functions that can be fit.
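The difference between the GLM and GAM links can be made concrete with a small numerical sketch. The code below is only an illustration under assumed values: the logit link, the coefficients, and the two stand-in smooth functions (`np.sin` and `np.square`) are hypothetical choices, not quantities from the article, and nothing is being fitted here.

```python
import numpy as np

def inverse_logit(eta):
    """Inverse of the logit link g(mu) = log(mu / (1 - mu))."""
    return 1.0 / (1.0 + np.exp(-eta))

def glm_predictor(X, beta0, beta):
    """GLM: g(mu) = beta0 + sum_j X_j * beta_j (linear in each predictor)."""
    return beta0 + X @ beta

def gam_predictor(X, beta0, smooths):
    """GAM: g(mu) = beta0 + sum_j f_j(X_j), one smooth function per column."""
    return beta0 + sum(f(X[:, j]) for j, f in enumerate(smooths))

# Hypothetical inputs, chosen only for illustration.
X = np.array([[0.0, 1.0], [1.0, -1.0], [2.0, 0.5]])
beta0, beta = -0.5, np.array([1.0, 2.0])
smooths = [np.sin, np.square]  # stand-ins for estimated smooth functions f_1, f_2

eta_glm = glm_predictor(X, beta0, beta)    # linear predictor on the link scale
eta_gam = gam_predictor(X, beta0, smooths)  # additive, nonlinear predictor
mu_glm = inverse_logit(eta_glm)             # E(Y | X) under each model
mu_gam = inverse_logit(eta_gam)
```

In both cases the contribution of each predictor can be read off separately, which is the additivity property discussed above.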

PPR removes the additivity constraint by modeling a Gaussian response using

$$Y = \beta_0 + \sum_{k=1}^{r} f_k(X^T \beta_k) + \epsilon,$$

where both $f_k$ and $\beta_k$ are estimated in the fitting procedure and $r$ is arbitrary. PPR has several advantages over both GLM and GAM. First, it allows one to model a larger class of functions. For example, GAM cannot model the simple interaction $g(\mu) = X_1 X_2$, but PPR can. In fact, by setting $r$ large enough, one can model any continuous function. Second, by studying the $\beta_k$'s, one learns in which directions the variability of the predictors provides the most information about the response. However, because PPR does not utilize a link function, it has less flexibility in terms of response distributions that can be modeled. Roosen and Hastie (1993) and, more recently, Lingjaerde and Liestol (1998) removed this constraint by adding a link of the form

$$g(\mu) = \beta_0 + \sum_{k=1}^{r} f_k(X^T \beta_k). \tag{2}$$

This method is called generalized projection pursuit (GPP). The GLM and GAM link functions may both be considered special cases of (2).
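The earlier remark that projection pursuit can represent the interaction of two predictors can be checked numerically. The sketch below relies on the algebraic identity X1 X2 = ((X1 + X2)^2 - (X1 - X2)^2) / 4, which corresponds to r = 2 ridge terms with hand-picked directions and ridge functions. These choices are ours for illustration; a fitted PPR or GPP model would estimate its own directions and functions.

```python
import numpy as np

def ppr_predictor(X, beta0, directions, ridge_functions):
    """Evaluate beta0 + sum_k f_k(X^T beta_k) for each row of X."""
    Z = X @ directions.T  # column k holds the projection X^T beta_k
    return beta0 + sum(f(Z[:, k]) for k, f in enumerate(ridge_functions))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # simulated two-dimensional predictors

# Hand-picked directions beta_1 = (1, 1) and beta_2 = (1, -1).
directions = np.array([[1.0, 1.0], [1.0, -1.0]])

# Hand-picked ridge functions f_1(z) = z^2 / 4 and f_2(z) = -z^2 / 4.
ridge_functions = [lambda z: z**2 / 4.0, lambda z: -(z**2) / 4.0]

eta = ppr_predictor(X, 0.0, directions, ridge_functions)

# The two ridge terms reproduce the interaction up to rounding error.
max_abs_error = np.max(np.abs(eta - X[:, 0] * X[:, 1]))
```

No sum of univariate functions of X1 and X2 alone can reproduce this surface, which is why the additive structure of a GAM is restrictive here.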

2.1 The FAME Model

The aim of this article is to extend GPP to data with functional predictors using our FAME procedure. FAME can model non-Gaussian responses with the ease of GLM and GAM, it has the flexibility of PPR for fitting nonlinear response surfaces, and it can be applied to functional data. One possible approach to fitting GPP to such data would be to sample the functional predictor, $X(t)$, over a fine grid of $p$ time points to create a vector $X$, thus removing the functional aspect of the problem. However, this approach has several potential problems. First, it necessitates modeling a very high-dimensional vector of coefficients, which may lead to an extremely unstable fit. Second, in many applications, individuals may be measured at different sets of time points and/or have differing numbers of observations. For such data, it is not possible to create finite-dimensional predictors by simple discretization, and so (2) cannot be directly applied. A more successful approach is to replace the summation $X^T \beta_k$ with its functional analog, the integral

$$Z_{ik} = \int X_i(t) \beta_k(t)\,dt, \tag{3}$$

where $\beta_k(t)$ is a coefficient function giving the weighting placed on $X(t)$ at each time. This method has two advantages over the more ad hoc discretization approach. First, through the use of a smooth function to estimate $\beta(t)$, it properly uses the inherent correlation between nearby time points, effectively reducing the high-dimensional nature of the data. Second, by using smoothing techniques, the integral can be calculated even on sparsely sampled predictors where the discretization approach would fail.
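To illustrate how the integral in (3) turns curves observed on different grids into comparable scalars, the following sketch approximates it with the trapezoidal rule. This is a deliberately simple stand-in and not the smoothing-based calculation referred to above; the function name `projection_integral`, the test curve X(t) = t, and the coefficient function beta_k(t) = t are all hypothetical.

```python
import numpy as np

def projection_integral(t, x, beta):
    """Approximate Z = integral of x(t) * beta(t) dt by the trapezoidal rule.

    t    : increasing observation times for one curve (may be irregular)
    x    : values of the predictor curve at those times
    beta : callable coefficient function beta_k(t)
    """
    integrand = x * beta(t)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

def beta_k(t):
    """Hypothetical coefficient function, beta_k(t) = t."""
    return t

# The same underlying curve X(t) = t on [0, 1], observed on two different grids.
t_dense = np.linspace(0.0, 1.0, 101)                 # 101 equally spaced points
t_sparse = np.array([0.0, 0.1, 0.45, 0.7, 1.0])      # 5 irregular points

z_dense = projection_integral(t_dense, t_dense, beta_k)
z_sparse = projection_integral(t_sparse, t_sparse, beta_k)

exact = 1.0 / 3.0  # integral of t * t over [0, 1]
```

Both grids yield a single number per curve even though the curves have different numbers of observations, so (2) can be applied to the resulting values. The sparse grid gives a cruder approximation, which is the situation where smoothing the curve first would be expected to help.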

Combining (2) and (3) gives the FAME link

$$g(\mu_i) = \beta_0 + \sum_{k=1}^{r} f_k(Z_{ik}) = \beta_0 + \sum_{k=1}^{r} f_k\left(\int X_i(t) \beta_k(t)\,dt\right). \tag{4}$$
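A minimal sketch of evaluating the link in (4) for a single curve is given below. It is not the fitting algorithm of the article: the coefficient functions, ridge functions, intercept, and logit link are all hypothetical values supplied by hand, and the integral is approximated by a plain trapezoidal rule.

```python
import numpy as np

def trapezoid(t, y):
    """Trapezoidal approximation to the integral of y over the grid t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def fame_mean(t, x, beta0, betas, ridge_functions, inverse_link):
    """Evaluate mu_i from equation (4) for one curve x observed at times t."""
    # Z_ik = integral of X_i(t) * beta_k(t) dt, as in equation (3)
    z = [trapezoid(t, x * beta_k(t)) for beta_k in betas]
    # g(mu_i) = beta0 + sum_k f_k(Z_ik)
    eta = beta0 + sum(f_k(z_k) for f_k, z_k in zip(ridge_functions, z))
    return inverse_link(eta)

def inverse_logit(eta):
    """Inverse of the logit link, mapping g(mu) back to a probability."""
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical curve and model components, chosen only for illustration.
t = np.linspace(0.0, 1.0, 201)
x = np.cos(2.0 * np.pi * t)
betas = [lambda s: np.cos(2.0 * np.pi * s), lambda s: np.ones_like(s)]
ridge_functions = [np.tanh, np.square]  # stand-ins for f_1 and f_2

mu = fame_mean(t, x, -0.2, betas, ridge_functions, inverse_logit)
```

With these choices the two projections are approximately 0.5 and 0, so the second ridge term contributes almost nothing and `mu` is a probability determined mainly by the first direction.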

Equation (4) extends standard PPR in two directions by introducing a link function to allow for non-Gaussian responses and replacing the summation $X^T \beta_k$...


