1. INTRODUCTION
To date, most work on simultaneous inference and multiple comparison has focused on comparing the means of $k$ ($\ge 3$) populations. For instance, Tukey (1953) considered pairwise comparison of $k$ population means, Dunnett (1955) discussed the comparison of several means with a control mean, and Scheffé (1953) derived a set of simultaneous confidence intervals for all of the contrasts among the population means. Miller (1981), Hochberg and Tamhane (1987), and Hsu (1996) have provided excellent summaries of the work in this area. The work of Spurrier (1999) seems to be the only work on the simultaneous comparison of several regression lines, in which a set of simultaneous confidence bands for all of the contrasts of several simple linear regression lines over the entire range $(-\infty, \infty)$ is constructed when the design matrices of the regression lines are the same. The restriction of equal design matrices, although inconvenient in many applications, was used to establish some rather complicated distribution theory.
The purpose of the current work is to extend Spurrier's work in several directions. First, we consider the comparison of several linear regression models that can have several explanatory variables. Second, we allow the design matrices of these linear regression models to be different as long as all of the design matrices are full column rank. Third, we allow comparison among the regression models to be either pairwise comparison or the comparison of several regression models with a "control" regression model, which may be more interesting than the "all-contrast" comparison in many applications. Finally, we allow the range of each explanatory variable to be restricted to a given interval, either finite or infinite.
Suppose that the $i$th linear regression model is
$$Y_i = X_i b_i + e_i, \quad i = 1, \ldots, k,$$
where $Y_i^T = (y_{i1}, \ldots, y_{i n_i})$, $X_i$ is an $n_i \times (p + 1)$ full column rank matrix with the first column given by $(1, \ldots, 1)^T$ and the $l$th ($l \ge 2$) column given by $(x_{1,l-1}^{i}, \ldots, x_{n_i,l-1}^{i})^T$, $b_i^T = (b_0^{i}, \ldots, b_p^{i})$, and $e_i^T = (e_{i1}, \ldots, e_{i n_i})$ with all of the $\{e_{ij},\ j = 1, \ldots, n_i,\ i = 1, \ldots, k\}$ being iid $N(0, \sigma^2)$. Because $X_i^T X_i$ is nonsingular, the least squares estimator of $b_i$ is given by $\hat{b}_i = (X_i^T X_i)^{-1} X_i^T Y_i$, $i = 1, \ldots, k$. Let $\hat{\sigma}^2$ denote the pooled error mean square with degrees of freedom $\nu = \sum_{i=1}^{k} (n_i - p - 1)$; $\hat{\sigma}^2$ is independent of $\hat{b}_i$.
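The estimators just defined can be sketched in a few lines of NumPy. This is only an illustration under the stated model; the function name `fit_models` and its list-based interface are assumptions made here, not part of the original article.

```python
import numpy as np

def fit_models(X_list, Y_list):
    """Least squares fits of k regression models plus the pooled error mean square.

    X_list[i] is the n_i x (p + 1) design matrix of model i (first column all ones);
    Y_list[i] is the matching response vector. Names are illustrative only.
    """
    b_hat, rss, nu = [], 0.0, 0
    for X, Y in zip(X_list, Y_list):
        n_i, q = X.shape                      # q = p + 1
        # Solve (X^T X) b = X^T Y; X^T X is nonsingular because X has full column rank
        b = np.linalg.solve(X.T @ X, X.T @ Y)
        b_hat.append(b)
        rss += float(np.sum((Y - X @ b) ** 2))  # residual sum of squares of model i
        nu += n_i - q                            # contributes n_i - p - 1 degrees of freedom
    sigma2_hat = rss / nu                        # pooled error mean square
    return b_hat, sigma2_hat, nu
```

The function returns the list of coefficient estimates, the pooled variance estimate, and the pooled degrees of freedom, which are the three ingredients the bands below rely on.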
Our purpose is to construct a set of simultaneous confidence bands for
$$x^T b_i - x^T b_j = (1, x_1, \ldots, x_p)\, b_i - (1, x_1, \ldots, x_p)\, b_j, \quad (i, j) \in \Lambda,$$
over a given range $x_l \in [a_l, b_l]$, $l = 1, \ldots, p$, where $\Lambda$ is an index set that determines the comparison of interest. For example, if the pairwise comparison is of interest, then $\Lambda = \{(i, j) : 1 \le i \ne j \le k\}$; if the comparisons of the second to $k$th regression models with the first regression model are of interest, then $\Lambda = \{(i, j) : 2 \le i \le k,\ j = 1\}$; and if the successive comparison of the $k$ regression models is of interest, then $\Lambda = \{(i, i + 1) : 1 \le i \le k - 1\}$. Because the variance of $x^T \hat{b}_i - x^T \hat{b}_j$ is given by $\sigma^2 x^T \Delta_{ij} x$, where $\Delta_{ij} = (X_i^T X_i)^{-1} + (X_j^T X_j)^{-1}$, we construct the following set of simultaneous confidence bands:
$$x^T b_i - x^T b_j \in x^T \hat{b}_i - x^T \hat{b}_j \pm c\, \hat{\sigma} \sqrt{x^T \Delta_{ij} x} \quad \forall\, x_l \in [a_l, b_l] \text{ for } l = 1, \ldots, p, \text{ and } \forall\, (i, j) \in \Lambda, \tag{1}$$
where $c$ is the critical constant required so that the confidence level of this set of simultaneous confidence bands is equal to $1 - \alpha$. Note that the confidence level of the bands in (1) is given by $P\{T < c\}$, where
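$$T = \sup_{(i, j) \in \Lambda} \ \sup_{x_l \in [a_l, b_l],\ l = 1, \ldots, p} \frac{\left| x^T \left[ (\hat{b}_i - b_i) - (\hat{b}_j - b_j) \right] \right|}{\hat{\sigma} \sqrt{x^T \Delta_{ij} x}}. \tag{2}$$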
So the critical value $c$ can be determined if the distribution of $T$ can be found. Because finding an explicit formula for the distribution function of $T$ is a daunting task, in this article we instead adopt the approach of simulating the random variable $T$. Simulation-based methods have been previously proposed for various simultaneous inferences and multiple test problems; for example, Edwards and Berry (1987) considered how to assess the accuracy of simulated critical values for some simultaneous inference procedures, Beran (1988, 1990) proposed some simulation methods for the construction of balanced simultaneous confidence sets, and Westfall and Young (1993) considered simulation-based multiple tests. Although $T$ in (2) is related to the maxima of several Gaussian random fields over the range $x_l \in [a_l, b_l]$, $l = 1, \ldots, p$, we show how $T$ can be simulated easily in many situations, allowing the value of $c$ to be approximated accurately.
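To make the index sets and the band in (1) concrete, the following sketch builds the three example index sets and evaluates the band for one pair of models at a single point, taking the critical constant c as a given input. The helper names `index_set` and `band_at` are hypothetical, introduced here only for illustration.

```python
import numpy as np

def index_set(k, kind="pairwise"):
    """Index set Lambda with 1-based model labels, matching the three examples in the text."""
    if kind == "pairwise":
        return [(i, j) for i in range(1, k + 1) for j in range(1, k + 1) if i != j]
    if kind == "control":        # models 2, ..., k versus model 1
        return [(i, 1) for i in range(2, k + 1)]
    if kind == "successive":     # model i versus model i + 1
        return [(i, i + 1) for i in range(1, k)]
    raise ValueError(f"unknown comparison type: {kind}")

def band_at(x_vars, b_i, b_j, X_i, X_j, sigma_hat, c):
    """Lower and upper limits of band (1) at the point x = (1, x_1, ..., x_p)."""
    x = np.concatenate(([1.0], np.atleast_1d(x_vars)))
    # Delta_ij = (X_i^T X_i)^{-1} + (X_j^T X_j)^{-1}
    delta = np.linalg.inv(X_i.T @ X_i) + np.linalg.inv(X_j.T @ X_j)
    centre = x @ (b_i - b_j)                         # x^T b_i_hat - x^T b_j_hat
    half_width = c * sigma_hat * np.sqrt(x @ delta @ x)
    return centre - half_width, centre + half_width
```

Here `b_i` and `b_j` stand for the least squares estimates and `sigma_hat` for the square root of the pooled error mean square; the band is only simultaneous if `c` has been calibrated for the chosen index set and range.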
Note that in this article we assume that there is no functional relationship among the explanatory variables $x_l$ ($1 \le l \le p$). On the other hand, if functional relationships do exist among the explanatory variables, such as in polynomial regression, then the critical value calculated using the approach of this article is conservative.
The article is organized as follows. A general representation of the random variable $T$ is provided in Section 2. This representation is then used in Section 3 to simulate $T$ when $[a_l, b_l] = (-\infty, \infty)$ for all $l = 1, \ldots, p$. It is also used in Section 4 to simulate $T$ for a given (finite) range $[a_l, b_l]$, $l = 1, \ldots, p$, when $p = 1$ and 2. In Section 4 a maximization algorithm to simulate $T$ for a given (finite) range for a general value of $p$ is also provided. An application of the proposed methodology to a problem in drug stability studies is described in Section 5, and a way to assess the accuracy of the simulated critical value is presented in Section 6. Finally, some concluding remarks are given in Section 7.
2. A REPRESENTATION OF T
There exists a $(p + 1) \times (p + 1)$ nonsingular matrix $P_{ij}$ such that
$$(X_i^T X_i)^{-1} + (X_j^T X_j)^{-1} = P_{ij}^T P_{ij} \quad \forall\, 1 \le i \ne j \le k.$$
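One way to obtain such a matrix numerically is a Cholesky factorization, sketched below. The factor is not unique, and the article may well use a different choice; the function name `factor_P` is an assumption made for this illustration.

```python
import numpy as np

def factor_P(X_i, X_j):
    """Return a nonsingular P with P^T P = (X_i^T X_i)^{-1} + (X_j^T X_j)^{-1}."""
    delta = np.linalg.inv(X_i.T @ X_i) + np.linalg.inv(X_j.T @ X_j)
    # delta is symmetric positive definite (a sum of two such matrices),
    # so it has a Cholesky factor L (lower triangular) with delta = L @ L.T
    L = np.linalg.cholesky(delta)
    return L.T   # taking P = L^T gives P^T P = L @ L.T = delta
```

Any other square root of the matrix sum (for example, a symmetric square root) would serve equally well in the representation that follows.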
Let $Z_i$, $i = 1, \ldots, k$, be independent normal random vectors independent of $\hat{\sigma}$, with distribution $Z_i \sim N(0, (X_i^T X_i)^{-1})$. Denote $Z_{ij} = (P_{ij}^T)^{-1}(Z_i - Z_j)$, $1 \le i \ne j \le k$. Then the distribution of $T$ is the same as
$$\sup_{(i, j) \in \Lambda} \frac{Q_{ij}\, \| Z_{ij} \|}{\hat{\sigma} / \sigma}, \tag{3}$$
where
$$Q_{ij} = \sup_{x_l \in [a_l, b_l],\ l = 1, \ldots, p} \frac{\left| (P_{ij} x)^T Z_{ij} \right|}{\| P_{ij} x \| \, \| Z_{ij} \|}.$$
Note that neither $Z_{ij}$ nor $\hat{\sigma} / \sigma$ depends on $x$. To simulate $T$ from (3), the key is to calculate $Q_{ij}$, which involves the maximization of a $p$-variate function over a hyperrectangular region.
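The representation can be turned into a crude simulation for the case p = 1. The sketch below replaces the exact maximization behind Q_ij with a maximum over an equally spaced grid on the interval, which is a simplification assumed here and not the method developed later in the article; a grid maximum can only underestimate the true supremum, so the resulting critical value may be slightly too small. The function names, grid size, and number of replications are illustrative choices.

```python
import numpy as np

def simulate_T(X_list, pairs, a, b, n_sim=2000, n_grid=201, seed=0):
    """Draw n_sim approximate realizations of T for p = 1 and x_1 in [a, b]."""
    rng = np.random.default_rng(seed)
    nu = sum(X.shape[0] - X.shape[1] for X in X_list)   # pooled degrees of freedom
    covs = [np.linalg.inv(X.T @ X) for X in X_list]     # (X_i^T X_i)^{-1}
    chols = [np.linalg.cholesky(C) for C in covs]       # factors used to draw Z_i
    # Each grid row is x^T = (1, x_1)
    grid = np.column_stack([np.ones(n_grid), np.linspace(a, b, n_grid)])

    setup = {}
    for (i, j) in pairs:                                # 1-based labels, as in the text
        P = np.linalg.cholesky(covs[i - 1] + covs[j - 1]).T   # P^T P = Delta_ij
        V = grid @ P.T                                  # row r holds (P x_r)^T
        U = V / np.linalg.norm(V, axis=1, keepdims=True)      # P x / ||P x||
        setup[(i, j)] = (P, U)

    T = np.empty(n_sim)
    for s in range(n_sim):
        # Z_i ~ N(0, (X_i^T X_i)^{-1}), drawn independently for each model
        Z = [L @ rng.standard_normal(L.shape[0]) for L in chols]
        # nu * sigma_hat^2 / sigma^2 is chi-square with nu degrees of freedom
        ratio = np.sqrt(rng.chisquare(nu) / nu)         # sigma_hat / sigma
        best = 0.0
        for (i, j), (P, U) in setup.items():
            Z_ij = np.linalg.solve(P.T, Z[i - 1] - Z[j - 1])
            # Grid approximation of Q_ij * ||Z_ij||
            best = max(best, float(np.max(np.abs(U @ Z_ij))))
        T[s] = best / ratio
    return T

def critical_value(T, alpha=0.05):
    """Empirical (1 - alpha) quantile of the simulated values, used as an estimate of c."""
    return float(np.quantile(T, 1.0 - alpha))
```

A call such as `critical_value(simulate_T(X_list, [(2, 1), (3, 1)], 0.0, 10.0))` would give a rough estimate of c for comparing two models with a control over the interval [0, 10]; the accuracy depends on both the number of replications and the grid resolution.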
Let $P_{ij} = (p_{ij}^{0}, p_{ij}^{1}, \ldots, p_{ij}^{p})$...