ABSTRACT
F. Wilfrid Lancaster has earned a reputation for greatness in the evaluation of information storage and retrieval systems. Many of his extensive contributions stem from his early experience with the National Library of Medicine (NLM) MEDLARS system. His evaluation of the MEDLARS Demand Search Service in 1966 and 1967 was an important landmark as one of the earliest evaluations of a computer-based retrieval system and as the first application of recall and precision measures in a large, operational database setting. In 1971, his evaluation of the MEDLARS AIM-TWX system was an important study of early online systems and their direct use by end users. This paper summarizes Lancaster's two major evaluations of the MEDLARS system, including the information environment at the time and their impact in the field of information science. Examples of Lancaster's other evaluation work with information retrieval systems are provided, followed by discussion of the textbooks that grew out of his evaluation experience and expertise. The article closes with comments from current and former NLM staff regarding Lancaster's time at NLM or his influence on their own careers.
INTRODUCTION
F. Wilfrid Lancaster established himself as a giant in the evaluation of information storage and retrieval systems early in his career, and his reputation for greatness in this arena stands today.
Many of Lancaster's extensive contributions stem from his experience with the National Library of Medicine (NLM) MEDLARS system. As one of the earliest evaluations of a computer-based retrieval system, his evaluation of the MEDLARS Demand Search Service in 1966 and 1967 was widely regarded as an important landmark, earning praise as the "beau ideal" in the Annual Review of Information Science and Technology (ARIST) (Brandhorst & Eckert, 1972). A few years later, his evaluation of the MEDLARS AIM-TWX system in 1971 was an important study of early online systems and their direct use by end users.
Lancaster undertook these evaluations in an environment of innovation and rapid change--in computing, in information retrieval applications, in information science research, and in information system evaluation.
In this paper, I first summarize Lancaster's two major evaluations of the MEDLARS system and discuss their impact at NLM and more generally in the field of information science. Next, I provide examples of his other evaluation work with information retrieval systems and discuss the books that grew out of his evaluation experience and expertise, books that instructed so many of us about the design, analysis, and evaluation of information systems. The article closes with comments from current and former NLM staff regarding Lancaster's time at NLM or his influence on their own careers.
EVALUATION OF THE MEDLARS DEMAND SEARCH SERVICE
Lancaster is widely associated with the MEDLARS evaluation--but what was it and why was it so important?
MEDLARS stands for MEDical Literature Analysis and Retrieval System and was developed to computerize the production of Index Medicus, a major printed index to the biomedical literature produced by NLM. The computer-based searching component was called the Demand Search Service. When MEDLARS was launched in March 1964, there was no other publicly available, fully operational electronic storage and retrieval system of its magnitude in existence (Miles, 1982).
At the time of Lancaster's evaluation, the MEDLARS database contained about 800,000 bibliographic records from January 1964 forward, growing at the rate of about 200,000 records annually. Articles were indexed from a set of 2,400 journals, using the hierarchically organized MeSH controlled vocabulary that consisted of about 7,000 "fairly conventional pre-coordinate type subject headings" (Lancaster, 1968a). This was an offline, batch search system. Search requests were submitted in writing to NLM staff, who created and entered the search strategies. The searches were then run sequentially against the database tapes.
The Information and Evaluation Environment at the Time
To appreciate the importance of the evaluation, and why it was influential for NLM and the information retrieval field, it helps to have a sense of the information environment at the time and of the visibility of NLM's initiative to provide computer access to bibliographic data.
The use of computers for bibliographic retrieval systems was in its infancy, and many of the extant systems were experimental or small in nature. In their comprehensive history of online information services, Bourne and Hahn (2003) credit MEDLARS as "one of the earliest large-scale online retrieval operations," and describe an environment of tremendous increase in medical research publications and need for more efficient methods of information retrieval. In a recent historical paper on the development of the MEDLARS system, Dee (2007) characterized the environment by saying, "NLM's accomplishments regarding MEDLARS were cutting edge, placing the library at the forefront of incorporating mechanization and technologies into medical information systems" (p. 416). Dee also noted "enthusiastic public interest" in MEDLARS, citing coverage in the Wall Street Journal and other newspapers. In his comprehensive history of the NLM, Miles (1982) summed it up by saying "On the whole the system was one of the largest and most successful library automation projects. Its success marked a milestone in the evolution of modern libraries."
The first year of MEDLARS operation was characterized by NLM's Deputy Director Scott Adams (1965) as
one of intensive trial, test, experiment, evaluation, and change. Internal and external pressures alike have been brought to bear on the system ... MEDLARS has been highly conspicuous nationally and internationally, and the variety of challenge and the Library's necessarily experimental response have made for an extremely busy year. (p. 139)
There was also high interest among the scientific community in computerized access to biomedical information, as evidenced by the publication in Science of a paper on MEDLARS. Coauthored by NLM Director Martin M. Cummings, the paper reported on the first year's experience with automated access (Karel, Austin, & Cummings, 1965). The Science paper also foreshadowed Lancaster's formal evaluation and gave some sense of the environment into which he was recruited. Describing the evaluation approach, the authors wrote:
Appreciating that there is as yet no wholly satisfactory method of objectively evaluating the effectiveness of information storage and retrieval systems, the library has relied heavily on consumer reaction and appraisal. Evaluation of critical reports indicates that the percentage of missed entries is minimal; furthermore the relevance of retrieved citations as determined by the individual requester's evaluation of demand bibliographies, appears to be satisfactory. New and more precise measurements of relevance are under study. (p. 769)
Why Lancaster?
It was in this environment that the NLM director received a visit from Cyril Cleverdon, librarian of the College of Aeronautics in Cranfield, England. Cleverdon was well known for his research on evaluating the efficiency and effectiveness of information systems by determining their recall and precision ratios. He explained his ideas to Cummings and recommended Lancaster for the job (Miles, 1982). Saul Herner, another information science pioneer, concurred in the recommendation.
Cleverdon's experience with Lancaster came from their work together in England on projects using the Cranfield collection and evaluation techniques. Lancaster served as senior research assistant on the Cranfield Project from 1962 to 1963 and published a summary of the Cranfield research in American Documentation (Lancaster & Mills, 1964). He also drew on significant prior practical experience in librarianship, classification, and indexing in conducting his evaluation research.
At the time of his recommendation to NLM, Lancaster was head of the Systems Evaluation Group at Herner & Company in Washington, DC, working on a project for the Technical Library at the U.S. Navy Bureau of Ships (Lancaster, 1964) and utilizing procedures similar to those used in the Cranfield studies and later used in the MEDLARS evaluation. The approach was described as follows. The purpose was
to evaluate and maximize the effectiveness of a computerized information retrieval system based on a specialized thesaurus used in conjunction with the Engineers Joint Council (EJC) system of role indicators and links.... The evaluation method used was that developed by Cleverdon in the ASLIB Cranfield Project.... Retrieval effectiveness was expressed in terms of relevance and recall ratios.... Reasons for search failures were analyzed in terms of indexing faults, searching faults, and system faults. (Herner, Lancaster & Johanningsmeier, 1965, p. 92)
The detailed failure analysis was important as "a basis for remedy and correction" (p. 95), which was also a key characteristic and important contribution of the MEDLARS evaluation. Lancaster's approach and attitude toward evaluation were also conveyed in the Bureau of Ships paper:
Relevance and recall ratios cannot be construed as figures of merit; they do not tell us whether we have a good or bad system in any absolute sense. What they do tell us is what kind of system we have, and it is for us to decide whether what we have meets our needs.... No evaluation technique can tell us what we want or need. These we have to decide for ourselves. (p. 95)
This early articulation of Lancaster's evaluation viewpoint is revealing of the perspective he brought to bear not only on the MEDLARS evaluation, but throughout his career in other evaluation projects and in his influential books on the subject. Evaluations provide information for making decisions within a particular context and for measuring the effects of system or operational changes.
Upon the recommendation of Cleverdon and Herner, NLM Director Cummings engaged Lancaster in December 1965 to evaluate MEDLARS. Cummings also appointed a committee of knowledgeable computer specialists, including Cyril Cleverdon and Calvin Mooers, to review the test procedures and results.
Evaluation Description
Planning of the evaluation began in December 1965, when Lancaster joined the NLM staff as Information Systems Evaluator. As a newcomer previously uninvolved in the design or operation of MEDLARS, he was able to approach the job with a spirit of impartial analysis that was maintained throughout (Lancaster, 1968a).
The one-year evaluation was launched in August 1966 and ran through July 1967. The Demand Search component of MEDLARS had been in place for nearly two years. The evaluation results were published in a 1968 report to the National Library of Medicine (Lancaster, 1968a), followed by two journal articles, one in American Documentation for the library and information science audience (Lancaster, 1969a), and the other in JAMA for the scientific and health professional user community (Lancaster, 1969b). The following description of the evaluation and its results is based primarily on these three published accounts authored by Lancaster.
The main objectives were to study the requirements of MEDLARS users, determine the effectiveness and efficiency of MEDLARS in meeting their requirements, identify factors adversely affecting performance, and suggest ways to make improvements. The evaluation was designed to provide information on MEDLARS performance relative to user requirements around several key factors of a retrieval system: coverage of the literature, recall power, precision power, response time, format of the results, and the user effort needed to achieve a satisfactory search result. The team "wanted to identify the principal causes of search failures, thus allowing corrective action to be taken to upgrade system performance" (Lancaster, 1969a, p. 120).
Lancaster summarized the evaluation as follows in the October 1966 issue of NLM's newsletter:
In an effort to refine and improve MEDLARS services to the biomedical community, the Library has initiated a new project designed to provide data on the usefulness of demand bibliographies. This project is believed to represent the first extensive study of a large-scale operating information system. The evaluation is based on two measurements: "recall," or the proportion of useful citations in MEDLARS actually retrieved; and "precision," the ability to withhold citations to non-relevant documents. To measure "recall," it is necessary to compile a list of relevant documents by some means other than MEDLARS. This is done, first, by having the recipient of a demand search provide a list of citations already known to him; and, second, by conducting a manual search of the literature, using reference tools, such as Science Citation Index, not generated by NLM. The recipient assesses the citations identified by the manual search....
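The two measurements described in the newsletter passage can be illustrated with a minimal sketch. It assumes that recall is estimated against an independently compiled list of relevant citations (the "recall base") and that precision is computed from the requester's relevance judgments of the retrieved citations. The function names, citation identifiers, and counts below are hypothetical and chosen only for illustration; they are not figures from the MEDLARS evaluation.

```python
def recall_ratio(recall_base, retrieved):
    """Estimate recall: the share of independently identified relevant
    citations (the recall base) that the search actually retrieved."""
    base = set(recall_base)
    if not base:
        raise ValueError("recall base must not be empty")
    return len(base & set(retrieved)) / len(base)


def precision_ratio(retrieved, judged_relevant):
    """Compute precision: the share of retrieved citations that the
    requester judged relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        raise ValueError("retrieved set must not be empty")
    return len(retrieved_set & set(judged_relevant)) / len(retrieved_set)


# Hypothetical data for one search request (not from the evaluation).
# Relevant citations found by means other than the system under test:
recall_base = {"c01", "c02", "c03", "c04", "c05"}
# Citations the search returned:
retrieved = {"c01", "c02", "c03", "c10", "c11", "c12", "c13", "c14"}
# Retrieved citations the requester judged relevant:
judged_relevant = {"c01", "c02", "c03", "c10"}

print(recall_ratio(recall_base, retrieved))         # 3 of 5 -> 0.6
print(precision_ratio(retrieved, judged_relevant))  # 4 of 8 -> 0.5
```

In this sketch, recall is only an estimate, because the recall base is a sample of the relevant literature rather than a complete list of it. As the Bureau of Ships paper cautioned, neither ratio is a figure of merit on its own; each describes what kind of system is in place.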