1. INTRODUCTION
In many problem settings, multiple measurements are made on each experimental unit, resulting in high-dimensional multivariate data. It is often of interest to explore potential relationships among subsets of these measurements. For example, some measurements may represent attributes of psychological characteristics, whereas others may represent attributes of physical characteristics. It may be of interest to determine whether there is a relationship between the psychological and physical characteristics. This requires a test of independence between pairs of vectors, where the vectors potentially have different measurement scales and dimensions.
Let $x_i^{T} = (x_i^{(1)T}, x_i^{(2)T})$ for $i = 1, \ldots, n$ denote a random sample of vector pairs, where $x_i^{(1)T}$ is a continuous vector of dimension $p$ and $x_i^{(2)T}$ is a continuous vector of dimension $q$. We seek to test
$H_0$: $x_i^{(1)}$ and $x_i^{(2)}$ are independent versus $H_a$: $x_i^{(1)}$ and $x_i^{(2)}$ are dependent.
The classical parametric test due to Wilks (1935) is based on
$$W = \frac{|A|}{|A_{11}|\,|A_{22}|},$$
where $A = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{T}$ is partitioned into $A_{st} = \sum_{i=1}^{n} (x_i^{(s)} - \bar{x}^{(s)})(x_i^{(t)} - \bar{x}^{(t)})^{T}$, for $s, t = 1, 2$, and $\bar{x}$ denotes the sample mean vector. Under the assumption of multivariate normality, Wilks' test is optimal, that is, the most efficient test. Under $H_0$ with finite fourth moments, $-n \log W \xrightarrow{d} \chi^2_{pq}$.
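As an illustration of the definition above, the following is a minimal sketch of how $W$ and the statistic $-n \log W$ might be computed with NumPy. The function name `wilks_statistic`, the simulated data, and the way dependence is induced are assumptions made for this example only; they are not taken from the article.

```python
import numpy as np

def wilks_statistic(x1, x2):
    """Return (W, -n log W) for samples x1 (n x p) and x2 (n x q)."""
    x = np.hstack([x1, x2])
    n, p = x1.shape
    centered = x - x.mean(axis=0)
    a = centered.T @ centered      # A = sum_i (x_i - xbar)(x_i - xbar)^T
    a11 = a[:p, :p]                # block belonging to the first vector
    a22 = a[p:, p:]                # block belonging to the second vector
    w = np.linalg.det(a) / (np.linalg.det(a11) * np.linalg.det(a22))
    return w, -n * np.log(w)

# Simulated data (hypothetical): one independent pair, one dependent pair.
rng = np.random.default_rng(0)
n, p, q = 200, 2, 3
x1 = rng.normal(size=(n, p))
x2_indep = rng.normal(size=(n, q))
x2_dep = x2_indep.copy()
x2_dep[:, 0] += 2.0 * x1[:, 0]     # make one component depend on x1

w_indep, stat_indep = wilks_statistic(x1, x2_indep)
w_dep, stat_dep = wilks_statistic(x1, x2_dep)
print("independent:", w_indep, stat_indep)
print("dependent:  ", w_dep, stat_dep)
```

Under $H_0$ the second returned value would be compared with a chi-squared distribution with $pq$ degrees of freedom; the sketch leaves that comparison to the reader's preferred statistics library.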
A nonparametric analog to the Wilks test was given by Puri and Sen (1971). These authors developed a class of tests based on componentwise ranking that uses a test statistic of the form
$$S^{J} = \frac{|T|}{|T_{11}|\,|T_{22}|}.$$
Here the elements of the $(p + q) \times (p + q)$ matrix $T$ are
$$T_{st} = \frac{1}{n} \sum_{i=1}^{n} J_s\!\left(\frac{C_{si}}{n+1}\right) J_t\!\left(\frac{C_{ti}}{n+1}\right),$$
where $C_{si}$ denotes the rank of the $s$th component of $x_i$ among the $s$th components of all $n$ vectors, $J_s$ and $J_t$ are arbitrary (standardized) score functions, and $T$ is partitioned in the same manner as in the Wilks test. Under $H_0$, $-n \log S^{J} \xrightarrow{d} \chi^2_{pq}$.
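The componentwise-rank construction can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the data are assumed to have no ties, and the linear score function $J(u) = \sqrt{12}\,(u - 1/2)$ is one assumed choice among the arbitrary standardized scores the text allows.

```python
import numpy as np

def componentwise_ranks(x):
    """Rank each column of x separately (1..n), assuming no ties."""
    return np.argsort(np.argsort(x, axis=0), axis=0) + 1

def puri_sen_statistic(x1, x2, score=lambda u: np.sqrt(12.0) * (u - 0.5)):
    """Return (S^J, -n log S^J) built from componentwise rank scores."""
    x = np.hstack([x1, x2])
    n, p = x1.shape
    scores = score(componentwise_ranks(x) / (n + 1))   # J_s(C_si / (n + 1))
    t = scores.T @ scores / n                          # matrix of the T_st
    t11 = t[:p, :p]
    t22 = t[p:, p:]
    s = np.linalg.det(t) / (np.linalg.det(t11) * np.linalg.det(t22))
    return s, -n * np.log(s)

# Simulated data (hypothetical) with dependence between the two vectors.
rng = np.random.default_rng(0)
n, p, q = 200, 2, 3
x1 = rng.normal(size=(n, p))
x2 = rng.normal(size=(n, q))
x2[:, 0] += 2.0 * x1[:, 0]

s_raw, stat_raw = puri_sen_statistic(x1, x2)
# Strictly increasing componentwise maps leave the ranks, and hence S^J, unchanged.
s_mono, stat_mono = puri_sen_statistic(np.exp(x1), x2 ** 3)
print(s_raw, stat_raw, s_mono)
```

Because the statistic depends on the data only through the componentwise ranks, the second call returns the same value as the first, even though the marginal scales have changed.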
Muirhead (1982) examined the effect of the group of transformations $\{x \to Ax + b\}$ on this problem. Here $b$ is a $(p + q)$-vector and
$$A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}$$
is an arbitrary nonsingular matrix with $p \times p$ matrix $A_1$ and $q \times q$ matrix $A_2$. The Wilks test is invariant under this group of transformations. Thus its performance does not depend on the variance-covariance structure of either $x_i^{(1)}$ or $x_i^{(2)}$. The test given by Puri and Sen is not invariant under this group of transformations. Gieser and Randles (1997) proposed a nonparametric test based on interdirection counts that generalized the univariate ($p = q = 1$) quadrant test of Blomqvist (1950) and is invariant under this transformation group. Taskinen, Kankainen, and Oja (2003a) proposed a more practical invariant extension of the quadrant test based on spatial signs that is easy to compute for data in the dimensions (say, $< 15$) commonly encountered in practice. These two invariant quadrant test extensions share certain asymptotic properties, for example, the same asymptotic efficiencies.
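The invariance of the Wilks statistic can be checked numerically. The sketch below assumes that the transformation matrix is block diagonal, with a $p \times p$ block acting on the first vector and a $q \times q$ block acting on the second; the helper `wilks_w`, the random matrices, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def wilks_w(x, p):
    """Wilks' W for an n x (p + q) data matrix whose first p columns form x^(1)."""
    centered = x - x.mean(axis=0)
    a = centered.T @ centered
    return np.linalg.det(a) / (np.linalg.det(a[:p, :p]) * np.linalg.det(a[p:, p:]))

rng = np.random.default_rng(1)
n, p, q = 100, 2, 3
x = rng.normal(size=(n, p + q))
x[:, p] += x[:, 0]                     # induce some dependence between the vectors

# Assumed block-diagonal transformation: A1 acts on x^(1), A2 on x^(2).
a1 = rng.normal(size=(p, p)) + 2.0 * np.eye(p)
a2 = rng.normal(size=(q, q)) + 2.0 * np.eye(q)
a = np.zeros((p + q, p + q))
a[:p, :p] = a1
a[p:, p:] = a2
b = rng.normal(size=p + q)

y = x @ a.T + b                        # row-wise version of x -> Ax + b
w_x = wilks_w(x, p)
w_y = wilks_w(y, p)
print(w_x, w_y)                        # the two values agree up to rounding
```

The agreement follows because the determinant of a block-diagonal matrix is the product of the determinants of its blocks, so the factors introduced by the transformation cancel between the numerator and the denominator of $W$.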
In this article we develop tests that generalize the popular univariate tests due to Kendall (1938) and Spearman (1904) to any dimensions (arbitrary p and q). Our tests have advantages over the quadrant test extensions in that they do not require centering (i.e., subtracting a location estimator) and generally have better power properties than the quadrant test extensions. Moreover, the spatial sign versions are easy to compute for data in common dimensions, thus providing intuitive, practical, robust alternatives to multivariate normal-theory methods.
The univariate tests based on Kendall's tau, Spearman's rho, and Blomqvist's quadrant statistic are described in Section 2. Their multivariate analogs based on interdirections are also sketched. In Section 3 the more practical versions of these tests based on spatial signs and ranks are described. Section 4 shows their large-sample properties under the null hypothesis as well as under a sequence of alternatives under mild assumptions (e.g., even the assumption of the existence of first moments can be avoided). Asymptotic efficiencies are given and simulations are used to compare the finite-sample powers of these tests in Section 5. The theory is illustrated by an example in Section 6, and the article concludes with some comments in Section 7.
2. ANALOGS TO KENDALL'S TAU AND SPEARMAN'S RHO
Consider the problem of measuring dependence within the components of bivariate vectors. Let $(x_i^{(1)}, x_i^{(2)})^{T}$, $i = 1, \ldots, n$, be a random sample from a bivariate, continuous population. Using the univariate sign function $S(x) = \operatorname{sign}(x) = -1, 0, 1$ as $x <, =, > 0$, the sign of the univariate $x_i^{(1)}$ is $S_i^{(1)} = S(x_i^{(1)})$ and the sign of the difference $x_i^{(1)} - x_j^{(1)}$ is $S_{ij}^{(1)} = S(x_i^{(1)} - x_j^{(1)})$. The centered rank of $x_i^{(1)}$ is described as $R_i^{(1)} = \operatorname{ave}_j S(x_i^{(1)} - x_j^{(1)})$, and the univariate median $\hat{\mu}^{(1)}$ of the $x_i^{(1)}$'s satisfies $\operatorname{ave}_j S(\hat{\mu}^{(1)} - x_j^{(1)}) = 0$. The centered sign of $x_i^{(1)}$ is then $\hat{S}_i^{(1)} = S(x_i^{(1)} - \hat{\mu}^{(1)})$. Using $x_1^{(2)}, \ldots, x_n^{(2)}$ to define $S_i^{(2)}$, $S_{ij}^{(2)}$, $R_i^{(2)}$, $\hat{\mu}^{(2)}$, and $\hat{S}_i^{(2)}$ analogously, the popular nonparametric measures of dependence are now conveniently expressed. They are Blomqvist's quadrant statistic,
$$Q = \operatorname{ave}_i \{\hat{S}_i^{(1)} \hat{S}_i^{(2)}\},$$
Kendall's tau,
$$\tau = \operatorname{ave}_{i,j} \{S_{ij}^{(1)} S_{ij}^{(2)}\},$$
and Spearman's rho,
$$\rho = \operatorname{ave}_i \{R_i^{(1)} R_i^{(2)}\} = \operatorname{ave}_{i,j,k} \{S_{ij}^{(1)} S_{ik}^{(2)}\}.$$
The test statistics are thus covariances (or correlations) between centered signs, signs of the pairwise differences, and centered ranks.
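The three sign-based expressions can be sketched directly in NumPy. The function names below are hypothetical, and the sketch assumes that every average runs over all index combinations, including $i = j$, where the sign is zero. Under that assumption the values differ from the conventionally normalized coefficients by a factor that depends on $n$.

```python
import numpy as np

def sign_matrix(x):
    """S_ij = sign(x_i - x_j) for all index pairs (zero on the diagonal)."""
    return np.sign(x[:, None] - x[None, :])

def quadrant_q(x1, x2):
    """Blomqvist's Q: average product of median-centered signs."""
    s1 = np.sign(x1 - np.median(x1))
    s2 = np.sign(x2 - np.median(x2))
    return np.mean(s1 * s2)

def kendall_tau(x1, x2):
    """Average over all (i, j) of S_ij^(1) * S_ij^(2)."""
    return np.mean(sign_matrix(x1) * sign_matrix(x2))

def spearman_rho(x1, x2):
    """Average product of centered ranks R_i = ave_j S(x_i - x_j)."""
    r1 = sign_matrix(x1).mean(axis=1)
    r2 = sign_matrix(x2).mean(axis=1)
    return np.mean(r1 * r2)

def spearman_rho_triple(x1, x2):
    """The same quantity written as an average over all triples (i, j, k)."""
    n = len(x1)
    return np.einsum("ij,ik->", sign_matrix(x1), sign_matrix(x2)) / n**3

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = x ** 3          # increasing function of x
y_down = -x            # decreasing function of x
print(quadrant_q(x, y_up), kendall_tau(x, y_up), spearman_rho(x, y_up))
print(quadrant_q(x, y_down), kendall_tau(x, y_down), spearman_rho(x, y_down))

# The two expressions for rho agree on arbitrary (here simulated) data.
rng = np.random.default_rng(0)
u, v = rng.normal(size=30), rng.normal(size=30)
print(spearman_rho(u, v), spearman_rho_triple(u, v))
```

All three statistics depend on the data only through signs, so any strictly increasing transformation of either variable leaves them unchanged, and a strictly decreasing one flips their sign.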
When correlating univariate pairs, the most interpretable feature is the magnitude of the correlation, that is, the square of the correlation coefficient. Multivariate measures of correlation provide analogs to
$$Q^2 = \operatorname{ave} \{(\hat{S}_i^{(1)} \hat{S}_{i'}^{(1)})(\hat{S}_i^{(2)} \hat{S}_{i'}^{(2)})\},$$
$$\tau^2 = \operatorname{ave} \{(S_{ij}^{(1)} S_{i'j'}^{(1)})(S_{ij}^{(2)} S_{i'j'}^{(2)})\},$$
and
$$\rho^2 = \operatorname{ave} \{(S_{ij}^{(1)} S_{i'j'}^{(1)})(S_{ik}^{(2)} S_{i'k'}^{(2)})\},$$
where the averages are computed over all possible indices. We can now interpret $\hat{S}_i^{(1)} \hat{S}_{i'}^{(1)}$ and $S_{ij}^{(1)} S_{i'j'}^{(1)}$, for example, as the cosines of the angles between $x_i^{(1)} - \hat{\mu}^{(1)}$ and...