
Evaluation of weighted scale reliability and criterion validity: a latent variable modeling approach.

Publication: Measurement and Evaluation in Counseling and Development
Publication Date: 01-APR-07

Article Excerpt
A method is outlined for evaluating the reliability and criterion validity of weighted scales based on sets of unidimensional measures. The approach is developed within the framework of latent variable modeling methodology and is useful for point and interval estimation of these measurement quality coefficients in counseling and education research.

**********

Reliability and validity of measurement are of critical relevance for counseling, education, and developmental research and have received a great deal of attention over the past century across the behavioral and social sciences. Traditionally, scholars have been concerned with reliability and validity coefficients of simple sum scores for given sets of components, also referred to as unweighted or unit-weighted composites (e.g., Crocker & Algina, 1986). This has led to the development of procedures for (a) estimating scale reliability and criterion validity as well as related indices for congeneric or noncongeneric measures (e.g., Bollen, 1989; Feldt, Woodruff, & Salih, 1987; McDonald, 1999; Raykov, 1997, 2001, in press; Raykov & Shrout, 2002), (b) examining dependent and independent group differences in composite reliability and criterion validity or in alpha coefficients (e.g., Feldt, 1969, 1980; Hakstian & Whalen, 1976; Raykov, 2002; Woodruff & Feldt, 1986), (c) testing time invariance in reliability and criterion validity (e.g., Alanen, Leskinen, & Kuusinen, 1998; Block & Saris, 1983; Raykov, 2000, 2006; Raykov & Tisak, 2004), and (d) studying change in composite reliability following component addition or deletion (Raykov & Grayson, 2003).

Unlike unit-weighted scale reliability, which has enjoyed impressive popularity among substantive researchers and methodologists alike, the issue of choosing weights so as to obtain the highest possible reliability and criterion validity (referred to as maximal reliability and maximal validity) for a linear combination of a given set of measures has received less interest. In particular, maximal reliability and maximal criterion validity have rarely been used in counseling and education, which can be viewed as a cause for potentially serious concern. Because measurement error is nearly ubiquitous in the behavioral disciplines, it is highly desirable to encourage the construction of measuring instruments that exhibit the least susceptibility to error. Given the frequent use of multicomponent instruments in counseling and education research--for example, scales, composites, test batteries, subscales, questionnaires, self-reports, or inventories--efforts should be undertaken to construct such instruments in a way that ensures the maximal possible accuracy and validity of assessment. This motivates special interest in the concepts of maximal reliability and maximal validity. Indeed, pursuing maximal reliability leads to a measuring instrument that yields the least relative error variance and thus the best differentiation among examined subjects along a studied dimension, whereas pursuing maximal criterion validity furnishes a scale that is most predictive of a given reference variable.
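The effect of weighting on reliability can be illustrated with a minimal sketch. It is not the estimation method developed in this article. It assumes a single-factor model in which each measure equals a loading times a common latent variable (variance fixed at 1) plus an uncorrelated error term, and it uses the standard result that weights proportional to loading divided by error variance maximize composite reliability under that model. The function names and the numerical loadings and error variances are hypothetical and chosen only for illustration.

```python
def composite_reliability(weights, loadings, error_vars):
    """Reliability of the weighted sum of measures X_i = b_i * eta + E_i.

    Assumes Var(eta) = 1 and mutually uncorrelated errors (illustrative
    assumptions, not taken from the article).
    """
    # True-score variance of the composite: (sum of w_i * b_i) squared.
    true_var = sum(w * b for w, b in zip(weights, loadings)) ** 2
    # Error variance of the composite: sum of w_i^2 * theta_i.
    error_var = sum(w * w * t for w, t in zip(weights, error_vars))
    return true_var / (true_var + error_var)


def maximal_reliability_weights(loadings, error_vars):
    """Weights proportional to b_i / theta_i, optimal under the assumed model."""
    return [b / t for b, t in zip(loadings, error_vars)]


# Hypothetical population values for four measures of one dimension.
loadings = [0.9, 0.7, 0.5, 0.3]
error_vars = [0.19, 0.51, 0.75, 0.91]

unit_weights = [1.0] * len(loadings)
optimal_weights = maximal_reliability_weights(loadings, error_vars)

unit_reliability = composite_reliability(unit_weights, loadings, error_vars)
max_reliability = composite_reliability(optimal_weights, loadings, error_vars)

print(f"unit-weighted reliability:    {unit_reliability:.3f}")
print(f"optimally weighted reliability: {max_reliability:.3f}")
```

Under these assumptions the optimally weighted composite can never be less reliable than the simple sum, and the gap widens as the measures differ more in precision.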

A number of instructive treatments of maximal reliability first appeared several decades ago (e.g., Green, 1952; McDonald & Burr, 1967; Thomson, 1940) and, after what seemed to be a long period of disinterest in this concept, methodologists have given it renewed attention over the past 20 years or so (e.g., Bartholomew, 1996; Bentler, 2004; Conger, 1980; Hancock & Mueller, 2001; Li, 1997; Li, Rosenthal, & Rubin, 1996; Raykov, 2004; Raykov & Hancock, 2005). More recently, the notion of maximal validity has also become of interest in quantitative research, and its relationship to maximal reliability has been highlighted (Penev & Raykov, 2006).

Despite the recent increase of interest in maximal reliability and maximal validity, neither concept has yet received the attention it deserves in counseling and education research. A reason for this omission lies, in part, in the relatively complicated methods that have been available for their evaluation, especially with regard to routine application. The purpose of the present article is to help resolve this issue. The remaining discussion deals with maximal reliability and maximal criterion validity coefficients for sets of unidimensional measures. The underlying modeling approach is widely applicable in empirical research. The outlined method of point and interval estimation of these coefficients is based on recent advances in latent variable modeling (B. O. Muthen, 2002; L. K. Muthen & Muthen, 2006; Skrondal & Rabe-Hesketh, 2004). The method provides researchers in counseling, education, and development with a readily usable means for the evaluation of maximal reliability and maximal criterion validity on a nearly routine basis.

BACKGROUND AND ASSUMPTIONS

To facilitate the presentation of the methodological framework in this section, a brief description of some related concepts seems appropriate. Because most evaluation efforts in the behavioral and social sciences are plagued with the nearly ubiquitous error of measurement, it is meaningful to consider a true score as corresponding to the unobserved value of actual interest for any given variable; the difference between the observed score and the true score is the error score (e.g., Lord & Novick, 1968). Although the error scores pertaining to different tests are always uncorrelated with their true scores, by the definition of the latter, it is possible, although not necessary, that the error terms are correlated with one another (Zimmerman, 1975).
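In symbols, the decomposition just described can be written as follows. The notation (X_i, T_i, and E_i for the observed, true, and error scores of the i-th measure) is introduced here only for illustration and is not taken from the article.

```latex
X_i = T_i + E_i, \qquad
\operatorname{Cov}(T_j, E_i) = 0 \ \text{ for all } i, j, \qquad
\operatorname{Var}(X_i) = \operatorname{Var}(T_i) + \operatorname{Var}(E_i),
```

whereas $\operatorname{Cov}(E_i, E_j)$ for $i \neq j$ is not restricted to zero by these definitions.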

In many empirical situations in counseling and education studies, researchers collect multiple measures (indicators) of an unobserved dimension of concern, such as depression, intelligence, motivation, and neuroticism. When these indicators measure a dimension with different origins of scale, different units, and different precision, they are referred to as congeneric (e.g., Joreskog, 1971). Such measures represent the most general case of indicators of a single latent dimension, and they are frequently obtained in counseling and education. For example, when interested in evaluating general mental ability, researchers may administer a test of inductive reasoning, a test of figural relations, and Raven's matrices test (e.g., Baltes, Dittmann-Kohli, & Kliegl,...
