1. INTRODUCTION
Applications in many fields, from market segmentation in business to health state modeling in medicine, involve dividing a population into contextually coherent subgroups. It is frequently desirable to understand how subjects move from one group to another over time, and in particular how transition patterns are affected by different treatments applied to members of the population. Various field-specific approaches have been developed to deal with such situations (e.g., Sugar et al. 1998 in health services research). However, these methods tend to be somewhat ad hoc and potentially can be improved using likelihood procedures based on hidden Markov models (HMMs). HMMs assume that observations are generated from a mixture of distributions among which subjects move according to a latent Markov chain. By incorporating treatment data into the procedure for estimating the transition matrices, one can obtain direct assessments of a treatment's effectiveness. This article applies HMMs to a health state modeling problem involving the comparison of two antipsychotic medications for schizophrenia and discusses the advantages and disadvantages of this methodology relative to the current medical approaches.
Clinical trials typically measure different aspects of physical and mental well-being using health status instruments or questionnaires consisting of dozens of item responses. Traditionally, such data are examined by performing univariate analyses on composite scores formed from the original responses. However, clinical trial investigators have recently turned to multivariate health state models to capture structural features in the data because the phenomena being studied are too complex to be described by univariate summaries. These models divide a population's sample space into medically coherent subgroups called health states. Clinical change is measured based on the probability of moving individuals between health states, rather than by a simple net increase or decrease in the mean of a univariate continuous scale. A treatment benefits patients in a given cluster if it has a high probability of moving them to a superior state or preventing them from moving to an inferior state. Health state models have numerous advantages. In particular, they lend themselves naturally to the assessment of long-run treatment effects via the estimation of stationary distributions, and they can be used in utility elicitation and cost-benefit analyses as the basis for making objective health policy decisions.
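To make the idea of a long-run treatment effect concrete, the following is a minimal sketch of how a stationary distribution could be computed from an estimated transition matrix. The function name `stationary_distribution` and the two matrices are hypothetical and purely illustrative; they are not estimates from the trial. The sketch assumes the chain is irreducible, so that the stationary distribution is unique.

```python
import numpy as np

def stationary_distribution(P):
    """Return pi with pi @ P = pi and sum(pi) = 1 for a row-stochastic matrix P.

    Assumes the chain is irreducible, so the solution is unique.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the constraint sum(pi) = 1
    # and solve the resulting consistent system by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical 3-state transition matrices for two treatments (made-up numbers).
# State 0 is taken to be the healthiest state, state 2 the least healthy.
P_treatment_a = np.array([[0.80, 0.15, 0.05],
                          [0.30, 0.60, 0.10],
                          [0.10, 0.30, 0.60]])
P_treatment_b = np.array([[0.70, 0.20, 0.10],
                          [0.20, 0.60, 0.20],
                          [0.05, 0.25, 0.70]])

pi_a = stationary_distribution(P_treatment_a)
pi_b = stationary_distribution(P_treatment_b)
# Comparing pi_a[0] with pi_b[0] contrasts the long-run share of patients
# in the healthiest state under each hypothetical treatment.
```

Under these assumptions, comparing the two stationary vectors state by state gives one way to summarize which treatment leaves more patients in favorable states in the long run.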
In the medical literature, the state of the art for fitting health state models uses the k-means clustering algorithm to produce hard assignments of patients to the nearest cluster center (Sugar, James, Lenert, and Rosenheck 2004). The cluster assignments are then treated as known and used to estimate matrices of transition probabilities for different medications. The clustering approach is well suited to capturing complex relationships, because it allows the data to choose the optimal locations of the health states. The clustering method, although easy to implement, has some potential limitations. The k-means algorithm implicitly assumes that the data are distributed as an equally weighted mixture of Gaussian distributions with identity covariance matrices. Thus the algorithm may perform poorly if mixtures of nonspherical or non-Gaussian distributions fit the data more naturally, or if different mixing weights are needed (see, e.g., Banfield and Raftery 1993). Furthermore, the k-means health state model is fit using a two-stage procedure: First, the cluster centers are computed assuming independent observations, then transition matrices are estimated assuming that cluster means are known and that each subject belongs to the nearest cluster with probability 1. The two-stage estimation procedure ignores potentially valuable information about a subject's cluster membership during other observation periods. It also prevents uncertainty about cluster means, cluster membership, and transition probabilities from correctly propagating through the model.
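The second stage of the two-stage procedure described above can be sketched as follows. This is an illustration only, not the implementation used in the cited work: the function names and array layout are assumptions, the cluster centers are taken as already fitted, and missing visits are ignored.

```python
import numpy as np

def assign_states(X, centers):
    """Hard-assign each observation to its nearest center (squared Euclidean distance).

    X has shape (n_subjects, n_periods, d); centers has shape (k, d).
    Returns integer states of shape (n_subjects, n_periods).
    """
    diffs = X[:, :, None, :] - centers[None, None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=-1)

def transition_matrix(states, k):
    """Row-normalized counts of moves between consecutive observation periods."""
    counts = np.zeros((k, k))
    origins = states[:, :-1].ravel()
    destinations = states[:, 1:].ravel()
    # np.add.at accumulates correctly when the same (origin, destination) pair repeats.
    np.add.at(counts, (origins, destinations), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departures are left as zeros rather than guessed.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Toy example: 2 subjects, 3 periods, 1 dimension, 2 hypothetical centers.
X = np.array([[[0.1], [0.2], [0.9]],
              [[1.1], [0.8], [0.0]]])
centers = np.array([[0.0], [1.0]])
states = assign_states(X, centers)
P_hat = transition_matrix(states, k=2)
```

Because each assignment is treated as certain, nothing in `transition_matrix` reflects how close an observation was to the boundary between two clusters, which is the loss of uncertainty the paragraph above describes.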
The preceding limitations can be addressed by modeling the data using an HMM. Because HMMs directly model the temporal aspect of the data, they can borrow strength across nearby observations when estimating model parameters and classifying observations to states. HMMs are fit using likelihood-based procedures that simultaneously estimate the transition probabilities and the parameters of the mixture components. The Bayesian methods used in this article allow arbitrary functions of HMM parameters to be estimated while automatically accounting for parameter uncertainty. Furthermore, the mixture components in an HMM belong to distributional families chosen by the modeler, so HMMs provide a very flexible way to fit the data. We model the data examined in this article using mixtures of multivariate t distributions, each with its own covariance matrix. The HMM described in this article is a strict generalization of the mixture model implicit in the k-means clustering algorithm, which we refer to as the k-means model.
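As a rough illustration of how an HMM likelihood ties the transition probabilities and mixture components together, the sketch below evaluates the log-likelihood of one subject's sequence with the standard scaled forward recursion. It is not the Bayesian MCMC procedure used in this article. The function name and argument layout are assumptions, and the emission densities are passed in precomputed so that no particular distributional family is implied.

```python
import numpy as np

def hmm_log_likelihood(init, P, emis):
    """Log-likelihood of one observation sequence under an HMM (scaled forward recursion).

    init : (k,)   initial state probabilities
    P    : (k, k) row-stochastic transition matrix
    emis : (T, k) emission densities, emis[t, s] = f_s(y_t)
    """
    init = np.asarray(init, dtype=float)
    P = np.asarray(P, dtype=float)
    emis = np.asarray(emis, dtype=float)

    # Joint weight of the first observation and each starting state.
    alpha = init * emis[0]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha = alpha / scale  # filtered state probabilities at time 0

    for t in range(1, emis.shape[0]):
        # Propagate one step through the chain, then weight by the new emission.
        alpha = (alpha @ P) * emis[t]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return loglik

# Hypothetical inputs: 2 states, 3 time points, made-up emission densities.
init = np.array([0.6, 0.4])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
emis = np.array([[0.50, 0.10],
                 [0.40, 0.20],
                 [0.05, 0.60]])
ll = hmm_log_likelihood(init, P, emis)
```

Each step mixes information from earlier periods (through `alpha @ P`) with the current observation (through `emis[t]`), which is the sense in which the model borrows strength across nearby observations.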
Both the k-means and HMM approaches assume that transitions over time are governed by a time-homogeneous Markov process, an assumption that may be violated if the effect of a treatment changes as the study progresses. To address this concern, we develop an inhomogeneous HMM in which different transition probabilities may apply for each observation period. To prevent an explosion in the number of parameters, we model the rows of each period's transition matrix as draws from a common Dirichlet distribution with parameters embedded in a Bayesian hierarchical model. The transition matrices in our inhomogeneous model benefit from Bayesian shrinkage, so that if the data show no evidence of inhomogeneity, then the inhomogeneous model collapses back to the homogeneous model. Shrinkage factors for the inhomogeneous model can be used to check whether the homogeneity assumption is reasonable.
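The shrinkage behavior can be illustrated with a simplified conjugate calculation. The sketch below assumes, unlike the actual model, that state memberships are known, so that transition counts are observed. It also assumes the common Dirichlet distribution is written as a mean row times a concentration, fixed here rather than given priors. The names `shrunken_row`, `mean_row`, and `concentration` are hypothetical.

```python
import numpy as np

def shrunken_row(counts, mean_row, concentration):
    """Posterior mean of one transition-matrix row under a Dirichlet prior.

    Prior: row ~ Dirichlet(concentration * mean_row).
    Given transition counts out of this state in one period, the posterior is
    Dirichlet(concentration * mean_row + counts). Its mean is a weighted average
    of the common row and the period's empirical row.
    """
    counts = np.asarray(counts, dtype=float)
    mean_row = np.asarray(mean_row, dtype=float)
    n = counts.sum()
    weight = concentration / (concentration + n)  # shrinkage factor toward the common row
    empirical = counts / n if n > 0 else mean_row
    return weight * mean_row + (1.0 - weight) * empirical, weight

# Hypothetical common row and one period's counts of moves out of a given state.
mean_row = np.array([0.5, 0.5])
row, weight = shrunken_row(counts=[8, 2], mean_row=mean_row, concentration=10.0)
# row is [0.65, 0.35] and weight is 0.5 for these inputs.

# Drawing period-specific rows from the common Dirichlet: a large concentration
# makes the periods nearly identical, which mimics the homogeneous model.
rng = np.random.default_rng(0)
loose_rows = rng.dirichlet(2.0 * mean_row, size=5)      # rows vary widely by period
tight_rows = rng.dirichlet(2000.0 * mean_row, size=5)   # rows cluster near mean_row
```

In this simplified setting, a weight near 1 means the period-specific row is pulled almost entirely to the common row, which is consistent with homogeneity, while a weight near 0 means the period's own data dominate.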
The purpose of this article is to demonstrate the HMM approach to health state modeling and evaluate its potential advantages and disadvantages relative to the clustering method. We have fit HMMs to data from a comprehensive double-blind trial that compared the impact of haloperidol and clozapine, two medications for treating schizophrenia, on clinical outcomes; social, vocational, and community functioning; and societal costs (Rosenheck et al. 1997). This dataset has already been studied using a cluster-based health state model, which allows us to make direct comparisons between the HMM and cluster methods. Section 2 provides a description of the data. Section 3 presents details of both homogeneous and inhomogeneous hidden Markov health state models. Section 4 gives results from the HMM fit to the schizophrenia dataset. Finally, Section 5 provides a discussion of the relative merits of the clustering and HMM approaches. Details of the MCMC algorithms used to fit the model are left to an Appendix.
2. DATA
The schizophrenia dataset contains 423 patients treated at 15 veterans health centers around the United States. The measurements consist mainly of scores on standard health status instruments measuring a broad spectrum of emotional, interpersonal, and physical functioning. Our analysis focuses on movement disorders typically induced by antipsychotic medications. We combined items from three commonly used instruments: the Abnormal Involuntary Movement Scale (AIMS), which measures tardive dyskinesia, that is, unconscious movements (Guy 1976); the Barnes Akathisia Scale (BAS), which focuses on involuntary restlessness (Barnes 1989); and the Simpson-Angus Scale (SAS), which deals with syndromes of pseudo-parkinsonism such as involuntary tremors, muscle stiffness, and salivation (Simpson and Angus 1970). All of these instruments use Likert scales to measure severity of symptoms, with higher scores indicating a greater degree of impairment. Data were collected by trained research assistants at six time points (baseline, 6 weeks, and 3, 6, 9, and 12 months). There was evidence of significant differences in ratings among the 15 study sites. To make the responses comparable, we subtracted the site effects, which were estimated by fitting mixed-effects models to each question using patient response as the dependent variable, with time, treatment, and study site as independent variables.
The side effects data were 24-dimensional. To reduce the dimension of the data and to allow comparisons with previous analyses (e.g., Sugar et al. 2004), we replaced the full dataset with its first four principal components. Principal components also smooth over roughness inherent in the Likert responses to individual items, making mixtures of continuous distributions more reasonable. The choice of four components was made on both quantitative and qualitative grounds. We opted to include all dimensions for which the proportion of variance explained was higher than the average variance per dimension. This procedure yielded a small number of easily interpretable dimensions. The components represent, in order of variance...