Robust analysis of generalized linear mixed models.

Publication: Journal of the American Statistical Association
Publication Date: 01-JUN-04
Format: Online - approximately 8469 words

Article Excerpt
1. INTRODUCTION

Generalized linear mixed models (GLMM's) are widely used in the analysis of clustered data, including longitudinal data or repeated measurements (see, e.g., Breslow and Clayton 1993). GLMM's are useful for accommodating the overdispersion often observed among nonnormally distributed responses and for modeling the dependence among responses inherent in longitudinal or repeated measures data by incorporating random effects (Stiratelli, Laird, and Ware 1984; Zeger, Liang, and Albert 1988). It is usually assumed that the random effects have a multivariate normal distribution whose variance components are to be estimated from the data. A full maximum likelihood (ML) analysis based on the joint marginal likelihood of the responses can be used for estimating both fixed- and random-effects parameters in GLMM's, but it requires numerical integration techniques for calculating the log-likelihood, score equations, and information matrix. Consequently, its use in GLMM's is limited to relatively simple models, and it becomes intractable for more complicated problems involving irreducibly high-dimensional integrals.

To avoid such computational problems, a number of Bayesian approaches have been suggested that generate repeated samples from the posterior distributions of the random effects using Gibbs sampling techniques (Besag, York, and Mollie 1991; Zeger and Karim 1991). The approach of generalized estimating equations (GEE's) (Diggle, Liang, and Zeger 1994) is useful for the analysis of longitudinal data, but it suffers from a lack of efficiency (Crowder 1995; Fitzmaurice 1995; Sutradhar and Das 1999). In a recent work, Vonesh, Wang, Nie, and Majumdar (2002) proposed conditionally second-order generalized estimating equations (CGEE2's) for estimating the parameters in GLMM's. Under appropriate regularity conditions, the CGEE2 estimator is shown to be consistent and asymptotically efficient. McCulloch (1994) investigated a Monte Carlo EM (MCEM) approach for analyzing models with complicated fixed- and random-effects structure, but that approach is limited to binary data with a probit link. McCulloch (1997) developed a Monte Carlo Newton-Raphson (MCNR) algorithm for approximating the ML estimates in GLMM's. The MCNR estimates were compared to the exact ML estimates for simple models, and it was found that MCNR inherits the properties of the exact ML estimates. Recently, Sutradhar and Sinha (2002) developed a pseudo-likelihood approach for estimating the variance components of a binary mixed model. This approach is based on the assumption that the variance components of the random effects are small in magnitude.

These classical analyses of GLMM's can be very sensitive to outliers or departures from the underlying distributions. Such departures arise when a small proportion of the data come from an arbitrary distribution rather than the "true" distribution, which can result in outliers or influential observations in the data. The nonrobustness properties of classical ML estimators in the framework of generalized linear models (GLM's) have been studied by a number of authors (see, e.g., Pregibon 1982; Stefanski, Carroll, and Ruppert 1986; Kunsch, Stefanski, and Carroll 1989; Morgenthaler 1992; Carroll and Pederson 1993). In a recent work, Cantoni and Ronchetti (2001) studied robust estimators for GLM's based on the notion of quasi-likelihood. However, robust analysis of GLMM's has received less attention, possibly due to the increased technical problems imposed by a dependence structure in the data. Preisser and Qaqish (1999) considered robust analysis of clustered data in the framework of generalized estimating equations. They proposed a resistant GEE (REGEE) estimator for robust estimates of the model parameters. Like GEE estimators, however, REGEE estimators may also suffer from a lack of efficiency.

Here I consider robust analysis of GLMM's in the framework of maximum likelihood estimation. The proposed bounded influence (Hampel, Ronchetti, Rousseeuw, and Stahel 1986) robust estimation of the regression parameters and variance components in GLMM's, referred to as RML estimation, involves the specification of the posterior distribution of the random effects, which cannot be evaluated in closed form. However, it is possible to approximate this posterior distribution by producing random draws from it using a Metropolis algorithm (Tanner 1993), which does not require the specification of the posterior distribution. Here I develop a robust Monte Carlo Newton-Raphson (RMCNR) method of estimation, which can be considered a modification of the Monte Carlo Newton-Raphson (MCNR) method of McCulloch (1997) and which reduces to MCNR when there are no influential points in the data, so that all observations receive equal weights.
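The bounded-influence idea can be illustrated with Huber's psi function, which caps the contribution of any single standardized residual. The following is a minimal sketch only: the excerpt does not give the weight function actually used in RML estimation, so the function names `huber_psi` and `huber_weight` and the tuning constant c = 1.345 are assumptions chosen for illustration.

```python
import numpy as np

def huber_psi(r, c=1.345):
    """Huber's psi: the identity for |r| <= c, clipped to +/-c beyond that."""
    return np.clip(r, -c, c)

def huber_weight(r, c=1.345):
    """Implied weight w(r) = psi(r) / r, with w(0) = 1 by convention."""
    abs_r = np.abs(np.asarray(r, dtype=float))
    # np.maximum keeps the denominator >= c, so there is no division by zero
    return np.where(abs_r <= c, 1.0, c / np.maximum(abs_r, c))

# Hypothetical standardized residuals; the last two play the role of outliers
residuals = np.array([-0.5, 0.0, 1.0, 4.0, -10.0])
print(huber_psi(residuals))     # outlying residuals are capped at +/-c
print(huber_weight(residuals))  # outlying residuals get weight < 1
```

As c grows, every observation receives weight 1, which mirrors the remark above that the robust method reduces to its classical counterpart when all observations are weighted equally.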

The asymptotic properties of the RML estimator have been investigated in some detail. Under appropriate regularity conditions, which include uniform integrability and mixing conditions of a sequence, the RML estimator is shown to be consistent and asymptotically normally distributed with a certain mean vector and covariance matrix. Simulations were carried out to explore the small-sample behavior of the robust estimates in the presence of outliers. The results from the simulation study indicate that, unlike the classical method, the proposed robust method is useful in downweighting the influential points in the data when estimating the parameters in GLMM's.

The article is organized as follows. Section 2 defines the GLMM and introduces the proposed RML method for fitting GLMM's. To obtain a Monte Carlo version of the RML estimate, similarly to McCulloch (1997), a Metropolis algorithm is presented that calculates the RMCNR estimates by approximating the posterior distribution of the random effects. Section 3 addresses the asymptotic properties of the RML estimator. Section 4 presents some computational details for fitting two simple binary and Poisson mixed models using the stochastic RMCNR as well as the deterministic RML methods. Small simulations were carried out for investigating the behavior of the robust estimates. The simulation results are presented in Section 4 as well. In Section 5, a real dataset obtained from a clinical experiment described in a biometrical journal is analyzed, where a binary mixed model is fitted to the data using the robust method. Section 6 concludes the paper with some discussion.

2. THE MODEL AND NOTATION

Suppose, conditional on the random effects $u$, the elements of the observed data vector $y = (y_1, \ldots, y_n)^T$ are independently distributed and follow a distribution in the exponential family:

$$f_{y_i \mid u}(y_i \mid u, \beta, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\} \tag{1}$$

for some functions $a$, $b$, and $c$. Here the canonical parameter $\theta_i = x_i^T \beta + z_i^T u$, with $x_i^T$ being the $i$th row of the design matrix $X$ for the fixed effects and with $z_i^T$ being the $i$th row of the design matrix $Z$ for the random effects. We also assume that the vector of random effects $u$ follows a distribution:

$$u \sim f_u(u \mid \Sigma) \tag{2}$$

depending on parameters $\Sigma$. For (1) and (2), the classical likelihood function can be defined as

$$L(\beta, \phi, \Sigma \mid y) = \int \prod_{i=1}^{n} f_{y_i \mid u}(y_i \mid u, \beta, \phi)\, f_u(u \mid \Sigma)\, du. \tag{3}$$

For the ML estimates of the parameters $\beta$, $\phi$, and $\Sigma$, one can maximize this likelihood function by using suitable numerical techniques. A number of numerical methods are available in the literature (see, e.g., McCulloch 1997). However, it is well known that the classical likelihood estimators can be very sensitive to outliers or other departures from the underlying assumptions. Here I consider developing robust techniques for fitting GLMM's, which can downweight any unusual data points when estimating the model parameters.
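To make the integral in (3) concrete, here is a minimal sketch that evaluates the classical log-likelihood for one simple special case: a binary response with logit link and a single N(0, sigma^2) random intercept per cluster. With independent clusters the integral factorizes into one-dimensional integrals, which are approximated here by Gauss-Hermite quadrature. The model choice, the function name `marginal_loglik`, the simulated data, and the number of quadrature nodes are all assumptions made for illustration; they are not taken from the article.

```python
import numpy as np

def marginal_loglik(beta, sigma, y, X, cluster, n_nodes=30):
    """Log of likelihood (3) for a logistic model with one N(0, sigma^2)
    random intercept per cluster, via Gauss-Hermite quadrature."""
    # Nodes and weights for integrals of the form  int exp(-t^2) g(t) dt
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    u_nodes = np.sqrt(2.0) * sigma * t          # change of variable u = sqrt(2)*sigma*t
    eta_fixed = X @ beta
    total = 0.0
    for g in np.unique(cluster):
        idx = cluster == g
        # theta[j, k] = x_j^T beta + u_k for observation j and quadrature node k
        theta = eta_fixed[idx, None] + u_nodes[None, :]
        # Bernoulli case of (1): ln f = y*theta - b(theta), b(theta) = ln(1 + e^theta)
        logf = y[idx, None] * theta - np.logaddexp(0.0, theta)
        cond_lik = np.exp(logf.sum(axis=0))     # product over the cluster at each node
        total += np.log(np.dot(w, cond_lik) / np.sqrt(np.pi))
    return total

# Simulated data; every value below is a hypothetical choice
rng = np.random.default_rng(0)
n_clusters, n_per = 50, 4
cluster = np.repeat(np.arange(n_clusters), n_per)
X = np.column_stack([np.ones(cluster.size), rng.normal(size=cluster.size)])
beta_true, sigma_true = np.array([-0.5, 1.0]), 1.0
u = rng.normal(0.0, sigma_true, size=n_clusters)
p = 1.0 / (1.0 + np.exp(-(X @ beta_true + u[cluster])))
y = rng.binomial(1, p).astype(float)

ll_value = marginal_loglik(beta_true, sigma_true, y, X, cluster)
print(ll_value)
```

Quadrature of this kind works only because each cluster contributes a one-dimensional integral. With crossed or higher-dimensional random effects the number of nodes grows exponentially with the dimension, which is the intractability that motivates Monte Carlo methods.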

For simplicity, I consider $\phi = 1$. Note that when the marginal distribution of $y$ can be defined as a mixture as in (3), the classical ML estimating equations for $\beta$ and $\Sigma$ take the form

$$E\left[ \frac{\partial \ln f_{y \mid u}(y \mid U, \beta)}{\partial \beta} \,\middle|\, y \right] = 0 \tag{4}$$

and

$$E\left[ \frac{\partial \ln f_u(U \mid \Sigma)}{\partial \Sigma} \,\middle|\, y \right] = 0. \tag{5}$$
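The conditional expectations in (4) and (5) are taken over the distribution of the random effects given the data, which has no closed form. As a rough sketch of how they might be approximated, the code below draws from that distribution with a Metropolis sampler and averages the score for the fixed effects over the draws. It assumes a logistic model with one N(0, sigma^2) random intercept per cluster, and it draws candidates from the random-effects distribution itself, so the acceptance ratio reduces to a ratio of conditional likelihoods. The function names, the data, and the chain lengths are hypothetical, and this is the classical (unweighted) score, not the robust version developed in the article.

```python
import numpy as np

def cluster_loglik(u, beta, y, X, cluster):
    """ln f(y | u) summed within each cluster (Bernoulli response, logit link)."""
    theta = X @ beta + u[cluster]
    logf = y * theta - np.logaddexp(0.0, theta)
    return np.bincount(cluster, weights=logf, minlength=u.size)

def metropolis_draws(beta, sigma, y, X, cluster, n_draws=2000, burn_in=500, seed=0):
    """Draws from f(u | y), with candidates generated from f_u = N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    n_clusters = cluster.max() + 1          # cluster labels must be 0, ..., G-1
    u = np.zeros(n_clusters)
    ll = cluster_loglik(u, beta, y, X, cluster)
    draws = np.empty((n_draws, n_clusters))
    for it in range(burn_in + n_draws):
        u_star = rng.normal(0.0, sigma, size=n_clusters)
        ll_star = cluster_loglik(u_star, beta, y, X, cluster)
        # Candidates come from f_u, so the acceptance ratio is f(y|u*) / f(y|u).
        # Clusters are independent, so each intercept is accepted separately.
        accept = np.log(rng.uniform(size=n_clusters)) < ll_star - ll
        u = np.where(accept, u_star, u)
        ll = np.where(accept, ll_star, ll)
        if it >= burn_in:
            draws[it - burn_in] = u
    return draws

def mc_score_beta(draws, beta, y, X, cluster):
    """Monte Carlo estimate of the left side of (4): mean of X^T (y - mu(u))."""
    theta = (X @ beta)[None, :] + draws[:, cluster]   # shape (n_draws, n_obs)
    mu = 1.0 / (1.0 + np.exp(-theta))
    return ((y[None, :] - mu) @ X).mean(axis=0)

# Simulated data; every value below is a hypothetical choice
rng = np.random.default_rng(1)
n_clusters, n_per = 30, 10
cluster = np.repeat(np.arange(n_clusters), n_per)
X = np.column_stack([np.ones(cluster.size), rng.normal(size=cluster.size)])
beta, sigma = np.array([-0.5, 1.0]), 1.0
u_true = rng.normal(0.0, sigma, size=n_clusters)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta + u_true[cluster])))).astype(float)

draws = metropolis_draws(beta, sigma, y, X, cluster)
print(mc_score_beta(draws, beta, y, X, cluster))
```

A Newton-Raphson step would combine this averaged score with a similarly averaged Hessian to update the fixed effects, then redraw the random effects at the new parameter values.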

The ML estimates of $\beta$ and $\Sigma$ can be obtained by solving the preceding equations numerically. McCulloch (1997) developed a Monte Carlo Newton-Raphson (MCNR) algorithm for solving these estimating equations, and obtained approximate ML estimates of the...
