
Computerized adaptive testing for effective and efficient measurement in counseling and education.

Publication: Measurement and Evaluation in Counseling and Development
Publication Date: 01-JUL-04

Article Excerpt
Computerized adaptive testing (CAT) is described and compared with conventional tests, and its advantages summarized. Some item response theory concepts used in CAT are summarized and illustrated. The author describes the potential usefulness of CAT in counseling and education and reviews some current issues in the implementation of CAT.

**********

Effective counseling in education frequently requires that detailed information on a variety of traits and characteristics of each student be available. In the early days of counseling, when counseling revolved around vocational issues (e.g., Brayfield, 1961; Lofquist & Dawis, 1969), student information was primarily concerned with vocationally relevant abilities and preferences. As counseling expanded beyond the vocational realm and education expanded its focus beyond just academic skills, the range of information used by counselors and educators included all aspects of personality, needs, values, attitudes, and, most recently, interpersonal and relationship variables.

In some counseling and educational situations, a skilled counselor can glean part of this information from interviews, or a teacher can glean it from observations of students' behavior. However, in many cases, counselors and educators rely on a wide variety of psychological measuring instruments--tests, questionnaires, inventories, and scales--to provide measurement data to inform counseling and educational services. Research on counseling--as well as research in education and development--relies heavily on these measuring instruments for its data.

CONVENTIONAL PSYCHOLOGICAL MEASURING INSTRUMENTS

The vast majority of psychological measurement instruments in use today are based on the paper-and-pencil conventional test. This type of test was developed initially for use in World War I to provide a quick and inexpensive method of screening large numbers of recruits (DuBois, 1970). Tests of this type (including most inventories and other instruments used to measure nonability variables) are typically designed using procedures of classical test theory (e.g., Cronbach, 1990; Gulliksen, 1950), which has its roots in procedures that developed around the same time as the paper-and-pencil test.

A conventional test is characterized by a fixed set of questions or items that are administered to each examinee. Following classical test construction procedures, the developers of conventional tests usually select these items by item analysis procedures that are designed to maximize the internal consistency reliability of the set of items that make up the instrument. The resulting set of items will have high reliability, and therefore a reduced standard error of measurement, but this level of measurement precision is assumed to be constant across the measurement scale.
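The link between reliability and the standard error of measurement can be made concrete with the usual classical test theory formula, SEM = SD * sqrt(1 - reliability). The following is a minimal sketch, not taken from the article: the function name and the score standard deviation of 15 are illustrative assumptions.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability).

    score_sd and reliability are illustrative inputs, not values from the article.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


if __name__ == "__main__":
    # Higher reliability gives a smaller SEM, but the result is one number
    # that is applied to every examinee regardless of trait level.
    for reliability in (0.70, 0.80, 0.90):
        sem = standard_error_of_measurement(15.0, reliability)
        print(f"reliability={reliability:.2f}  SEM={sem:.2f}")
```

Note that the function returns a single value for the whole test, which is the constant-precision assumption described above.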

Conventional tests, however, have a number of important limitations. When classical test theory is used to select the items for a conventional test, the items that maximize internal consistency reliability are typically those that are appropriate for the average examinee in the group--that is, items whose difficulties (proportion correct or keyed) are around p = .50. These items provide the best measurement for the average examinee, but they are too difficult for examinees who are below average on the trait being measured and too easy for examinees who are above average.
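One reason that items with difficulties near p = .50 are favored is that the score variance of a dichotomously scored item is p(1 - p), which is largest at p = .50. The short sketch below illustrates this standard result; it is not taken from the article, and the function name and the grid of difficulty values are illustrative assumptions.

```python
def item_variance(p):
    """Variance of a 0/1 scored item whose proportion correct (or keyed) is p."""
    return p * (1.0 - p)


# Hypothetical grid of item difficulties from .10 to .90.
difficulties = [i / 10 for i in range(1, 10)]
most_variable = max(difficulties, key=item_variance)

if __name__ == "__main__":
    for p in difficulties:
        print(f"p={p:.1f}  variance={item_variance(p):.2f}")
    print("largest variance at p =", most_variable)
```

Items that nearly everyone passes or nearly everyone fails have little variance, so they contribute little to distinguishing among examinees in the group.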

For example, if a test is designed to measure arithmetic ability and it is designed for fourth graders, it will be too difficult for most second graders and too easy for most sixth graders. Yet, in the fourth-grade class, there are likely some students who are functioning at or below the second-grade level and some functioning at or well above the sixth-grade level. For the students who deviate in ability/achievement from the level of the conventional test, the test will provide very little information. The students with low ability will answer almost all of the items incorrectly, and the students with high ability will answer all or most of the items correctly. The result, for these students, will be scores that provide almost no capability of differentiating among them.

This same principle applies to the measurement of all psychological variables that can be measured as continuous variables--a fixed-item conventional measuring instrument is designed to measure well for a restricted range of the trait, usually around the mean of the anticipated trait distribution. When it is used for individuals whose trait levels deviate from that range, a conventional measuring instrument provides increasingly poor measurement because the items have little relevance for those examinees; this has been recognized in the application of "out-of-level" testing in some educational environments. Furthermore, the time limits that are frequently imposed on conventional tests (usually for the test administrator's convenience) degrade the quality of measurement still more by introducing other traits (e.g., persistence, slowness) that interfere with good measurement of the trait(s) that the instrument is designed to measure.

ADAPTIVE TESTING

The basic measurement problem that characterizes conventional tests has been recognized for many years in a number of domains. In athletic competitions, for example, it would be unheard of to try to measure an athlete's hurdle-jumping ability by having her or him repeatedly jump over a succession of 2-foot hurdles. Rather, a series of hurdles of increasingly higher levels is set up, and the athlete tries to clear each until she or he is no longer able to do so. Then, to determine a more precise indication of the level that the participant can clear, a set of hurdles that vary in a relatively narrow range around the level at which the individual began to miss is constructed. In this way, the task is "adapted" to the individual's performance in order to obtain precise estimates of each athlete's ability.

This principle of adapting the test to the examinee was recognized in the very early days of psychological measurement, even before the development of the conventional paper-and-pencil test, by Alfred Binet in the development of the Binet IQ test (Binet & Simon, 1905) that later was published as the Stanford-Binet IQ Test. Binet's test comprised sets of test items normed by chronological age level. He selected items for each age level if approximately 50% of the children at that age level answered an item correctly. Thus, in the original version of the test, there were sets of items at ages 3 years through 11 years. All of these items constituted Binet's item "bank" for his adaptive test.

Binet's test administration procedure is a fully adaptive procedure:

1. It uses a precalibrated bank of test items.

2. It is individually administered by a trained psychologist and is designed to "probe" for the level of difficulty (i.e., chronological age) that is most appropriate for each examinee, much as jumping hurdles probes for the performance level of each athlete.

3. It has a variable starting option. The administrator sets the beginning level of the Binet test on the basis of her or his best guess about the examinee's likely level of ability (typically the examinee's chronological age, but the starting level can be lower or higher if there is information to inform such a starting level).

4. It uses a defined scoring method--a set of items at a given age level is administered and immediately scored by the administrator.

5. There is a "branching," or item selection rule, that determines which items will next be administered to a given examinee. In the Binet test, the next set of items to be administered is based on the examinee's performance on each previous set of items. If the examinee has answered some or most of the items at a given age level correctly, usually the items at the next higher age level are administered. If most of the items at a given age level are answered incorrectly, items at the next lower age level are typically administered.

6. There is a predefined termination rule. The Binet test is terminated when, for each examinee, both a "ceiling" and a "basal" level have been identified. The ceiling level is the age level at which the examinee incorrectly answers all items; the basal...
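The branching rule in step 5 and the ceiling definition in step 6 can be sketched in a few lines. This is only an illustration under stated assumptions, not Binet's actual procedure: the function names are hypothetical, the one-half cutoff is an assumed reading of "some or most," and the age range of 3 through 11 years follows the original version of the test described above. The basal level is not modeled because its definition is not given in full above.

```python
def next_age_level(current_level, responses, min_level=3, max_level=11):
    """Pick the age level to administer next from one scored set of items.

    responses is a list of booleans (True = correct) for the set just given.
    Assumption: "some or most correct" is read as at least half correct.
    """
    if not responses:
        raise ValueError("at least one scored response is required")
    proportion_correct = sum(responses) / len(responses)
    step = 1 if proportion_correct >= 0.5 else -1
    # Stay inside the range of age levels available in the item bank.
    return max(min_level, min(max_level, current_level + step))


def is_ceiling(responses):
    """Ceiling level: the examinee answers every item at the level incorrectly."""
    return len(responses) > 0 and not any(responses)


if __name__ == "__main__":
    print(next_age_level(7, [True, True, False, True]))     # moves up one level
    print(next_age_level(7, [False, False, False, True]))   # moves down one level
    print(is_ceiling([False, False, False, False]))
```

A full administration would repeat this step, starting from the administrator's chosen level, until both the ceiling and the basal level have been identified.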
