Home | Business News | Browse by Publication | J | Journal of the American Statistical Association

Monte Carlo state-space likelihoods by weighted posterior kernel density estimation.

Publication: Journal of the American Statistical Association
Publication Date: 01-JUN-04
Format: Online - approximately 11062 words
Delivery: Immediate Online Access

Article Excerpt
1. INTRODUCTION

A difficulty for frequentist likelihood-based analysis with mechanistic models is that the models often include unknown states or process errors, so that likelihood calculations require high-dimensional integrations. Approaches to maximum likelihood estimation using Monte Carlo (MC) integration in this setting include MC expectation maximization (MCEM) (Wei and Tanner 1990; Chan and Ledolter 1995; McCulloch 1997; Robert and Casella 1999; Booth and Hobert 1999; Hurzeler and Kunsch 2001; Levine and Casella 2001); MC likelihood ratio (MCLR) estimation via importance sampling (Geyer and Thompson 1992; Geyer 1994, 1996; McCulloch 1997); basic MC integration; more advanced MC integration, such as importance sampling (Durbin and Koopman 1997, 2000); and particle filtering (PF) (Gordon, Salmond, and Smith 1993; Kitagawa 1996, 1998; Pitt and Shephard 1999; Doucet, de Freitas, and Gordon 2001b; Hurzeler and Kunsch 2001). However, for my motivating model, a biologically stage-structured population dynamics model for a replicated experimental setting, none of these methods is very efficient.

I present a faster approach based on estimating likelihoods from weighted kernel density estimates of a Bayesian posterior, which is motivated by importance sampling the kernel smoothing integral. I consider convergence and accuracy of the method, including MC error and mode bias due to kernel smoothing. I give methods to gain accuracy by "zooming in" on the likelihood maximum using focused priors, which is equivalent to importance sampling near the maximum, and to estimate the mode bias due to kernel smoothing using posterior cumulants. I give a second example showing that even in 10-20 dimensions, these methods can provide high accuracy for chi-squared hypothesis test cutoffs. This approach may be widely and easily applicable, because it can take advantage of the boom in computational methods for Bayesian posterior sampling.

My investigation is motivated by the need to fit demographic models to time series from ecological population dynamics experiments. In population ecology there is a wide gap between plausible mechanistic models and models commonly used to analyze data. Population dynamics experiments typically produce replicated time series structured by species, life stages within species, and/or location. These common types of experiments (reviewed by Hairston 1989, Underwood 1997; Resetarits and Bernardo 1998) follow a long tradition that includes classic work by Huffaker (1958) on predatory and herbivorous spider mites and by Gause (1934) and Luckinbill (1973) on predator and prey protozoans. Population dynamics experiments are used to study direct and indirect species interactions (e.g., Wootton 1994; Rosenheim, Kaya, Ehler, Marois, and Jaffee 1995; Karban, English-Loeb, and Hougen-Eitzman 1997; Murdoch, Nisbet, McCauley, de Roos, and Gurney 1998; Ellner et al. 2001; Rosenheim 2001; Snyder and Ives 2001), population cycles (e.g., Nicholson and Bailey 1935; Gurney, Blythe, and Nisbet 1980; McCauley, Nisbet, Murdoch, de Roos, and Gurney 1999), species' roles in ecosystem functioning (e.g., Naeem, Thompson, Lawler, Lawton, and Woodfin 1994; Downing and Leibold 2002), trophic cascades (e.g., Carpenter and Kitchell 1993; Ives, Carpenter, and Dennis 1999; Strong, Whipple, Child, and Dennis 1999; Pace, Cole, Carpenter, and Kitchell 1999; Klug, Fischer, Ives, and Dennis 2000), and laboratory "model" systems (e.g., Costantino, Desharnais, Cushing, and Dennis 1997; Kaunzinger and Morin 1998; Holyoak 2000; Dennis, Desharnais, Cushing, Henson, and Costantino 2001).

I use agricultural insect ecology to illustrate the model-fitting problem for population dynamics experiments. Entomologists routinely conduct laboratory, greenhouse, and field experiments in which abundances of eggs, immatures, and adults of different species are estimated at several times under various plant conditions, predator communities, or other experimental treatments. Currently, such data are almost always analyzed with generalized linear models that do not reflect processes of reproduction, growth, mortality, and predation that produced the data. In contrast, the models used for theoretical studies of population dynamics describe these processes and often involve nonlinear predator--prey interactions, time lags arising from development, and other factors that can produce complex dynamics (e.g., Metz and Diekmann 1986; Tuljapurkar and Caswell 1997; Gurney and Nisbet 1998).

Because of their relative dimensionality of noises, states, observations, and replicates, experimental population dynamics problems have a different balance of efficiency issues than other state-space or hidden-variables problems. MC state-space research has typically considered methods for single long time series, such as for financial data (Carlin, Polson, and Stoffer 1992; Shephard and Pitt 1997; Tanizaki and Mariano 1998; Pitt and Shephard 1999; Durham and Gallant 2002) or fisheries catch and survey records (Bjornstad, Fromentin, Stenseth, and Gjosaeter 1999; Meyer and Millar 1999; Millar and Meyer 2000). At the other extreme, MC methods for experimental data have been applied to generalized linear mixed models, where data are iid and the likelihood requires a MC integration over unknown random variables (Clayton 1996; McCulloch 1997; Booth and Hobert 1999). Experimental population dynamics data mix these features in the form of short, replicated time series. In addition, realistic population dynamics models are often continuous in time or have a short time step, so that the space of random disturbances entering the process may be of much higher dimension than the observation space. Also, the aim here is to calculate approximate likelihood ratio tests, whereas most MC state-space research has addressed nonexperimental settings with more focus on filtering and smoothing than on parameter estimation and testing.

I compare five MC methods for likelihood maximization. Two of the main approaches in the literature are MCEM (Wei and Tanner 1990; Chan and Ledolter 1995; McCulloch 1997; Robert and Casella 1999; Booth and Hobert 1999) and MCLR (Geyer and Thompson 1992, Geyer 1994, 1996; McCulloch 1997), which both use alternating steps of MC sampling and local maximization (but see Levine and Casella 2001 for potential improvements). Two other natural candidates for the problem are PF (Gordon et al. 1993; Kitagawa 1996; Pitt and Shephard 1999; Doucet et al. 2001b) and basic MC integration, possibly with importance sampling, which I call MC direct (MCD). The method developed here, MC kernel likelihood (MCKL), temporarily treats parameters as random variables for purposes of Markov Chain MC (MCMC) sampling and then uses a weighted kernel density estimator to approximate a classical likelihood surface. MCKL is related to kernel density estimation of Bayesian posterior distributions (West 1993; Chen 1994; Givens and Raftery 1996; Liu and West 2001), but the approach here of using kernel density estimates to recover the classical likelihood surface for a state-space model appears to be new.

Next I introduce the MC likelihood methods in detail. For MCKL, I discuss convergence, MC error and smoothing mode bias, zooming in by resampling near the mode to reduce smoothing bias, and estimating smoothing bias from posterior cumulants. I then introduce an example population model and hypothetical experiment, and compare maximum likelihood convergence of the different methods. Finally, I use a problem of multivariate standard deviation estimation to illustrate and evaluate the zooming and cumulant-based corrections.

2. MONTE CARLO LIKELIHOOD METHODS

Suppose that there are n experimental units with data vectors [Y.sub.i], i = 1,..., n, which include both multiple times and multiple observation dimensions. To be more explicit, denote [Y.sub.i] = ([Y.sub.i](1),..., [Y.sub.i] (T)), where [Y.sub.i] (t) is a vector of observations at time t and there is a fixed set of observation times, t = 1,..., T. (For notational simplicity, I assume the same set of observation times for each replicate.) The methods here require only that the model be amenable to various MC algorithms: MCMC for the MCKL, MCEM, and MCLR methods and sequential particle filtering for the PF method. For each replicate i, let [v.sub.i] be the unknown states or process noises, with probability density Pr([v.sub.i]), and let Pr ([Y.sub.i]|[v.sub.i]) be the probability density of observations given states. Denote [v.sub.i] = ([v.sub.i](1),..., [v.sub.i](T)), where [v.sub.i](t) are the process noises from time t--1 to t, which may include many model time steps (and noise values) between observation times.

Each method aims to maximize the likelihood integral for the d-dimensional parameter vector [THETA],

L([THETA]) = [n.[product].[i=1]][integral] Pr([Y.sub.i]|[v.sub.i]) Pr ([v.sub.i]) d[v.sub.i], (1)

or l([THETA]) = log L([THETA]). Although (1) makes sense with v as either states or process noises, from here on I use it as process noises and write X (v) for the states. In other words, Pr([Y.sub.i]|[v.sub.i]) will usually involve a calculation of state dynamics from the process noises, with the observation density related to the state values. In the example that follows, v's are random environmental variations, X is a time trajectory of dynamics for an age cohort model with a 1-day time step, and Y is a time series of estimates taken every 10 days of the abundance of several life stages, each of which is a summation of multiple age cohorts. For general introduction of each method, I do not need to specify dimensions for states, noises, or observations. Treatment of initial conditions is assumed to be included in the notation. If these conditions are fixed, then the state calculations start from them; if they are random, then they are included in the process noises. Dependence of probabilities on [THETA] is suppressed in the notation. Finally, everything is written for continuous variables, but could be easily adapted to discrete variables.

Only the MCD and PF methods estimate the likelihood without introducing an unknown constant. For the other methods, after estimating the maximum likelihood estimator (MLE), it is necessary to estimate the likelihood at the MLE for purposes of likelihood ratio approximate hypothesis tests, and I assume that this is feasible. For my population dynamics example, I do this through an importance-sampled MC estimate of (1), with an estimate of each Pr([v.sub.i]|[Y.sub.i]) as the importance density for each Pr([v.sub.i]) (Shephard and Pitt 1997).

2.1 Monte Carlo Direct

The most obvious approach, MCD draws a large sample {[v.sub.i.sup.(j)]}[.sub.j=1.sup.m] from each Pr([v.sub.i]) and uses

l([THETA]) [approximately equal to] [n.summation over (j=1)] log[[1/m] [m.summation over (j=1)] Pr([Y.sub.i]|[v.sub.i.sup.(j)])]. (2)

This method suffers from the inefficiency of basic MC integration: a large variance of Pr([Y.sub.i]|[v.sub.i.sup.(j)]) (Robert and Casella 1999, sec. 3.2). However, it has the potential efficiency that the same noise sample and state trajectories can be used for each replicate within a treatment group. If the same model applies to i...

Read the FULL article now - Try Goliath Business News - FREE!   
You can view this article PLUS...

  • Over 5 million business articles
  • Hundreds of the most trusted magazines, newswires, and journals (see list)
  • Premium business information that is timely and relevant
  • Unlimited Access

Now for a Limited Time, try Goliath Business News - Free for 3 Days!
Tell Me More   Terms and Conditions

Get Goliath Business News for 1 year - Just $99 (Save 65%)
Tell Me More   Terms and Conditions

Already a subscriber? Log in to view full article



More articles from Journal of the American Statistical Association
Parameterization and Bayesian modeling., June 01, 2004
To model or not to model? Competing modes of inference for finite popu..., June 01, 2004

Looking for additional articles?
Search our database of over 3 million articles.

Looking for more in-depth information on this industry?
Search our complete database of Industry & Market reports by text, subject, publication name or publication date.

About Goliath
Whether you're looking for sales prospects, competitive information, company analysis or best practices in managing your organization, Goliath can help you meet your business needs.

Our extensive business information databases empower business professionals with both the breadth and depth of credible, authoritative information they need to support their business goals. Whether it be strategic planning, sales prospecting, company research or defining management best practices - Goliath is your leading source for accurate information.