This paper is a personal take on the history of evaluation experiments in information retrieval. It describes some of the early experiments that were formative in our understanding, and goes on to discuss the current dominance of TREC (the Text REtrieval Conference) and to assess its impact.
The foundation of the Institute of Information Scientists in the UK in 1958 coincides closely with the emergence of the idea of experimental evaluation of information retrieval (IR) systems. Although there had been some earlier attempts, we usually mark the start of the tradition as the Cranfield experiments, which ran from 1958 to 1966. Information retrieval is commonly regarded as a core component of information science, and systematic empirical evaluation of IR systems probably represents the strongest claim that information science can make to being a science in any traditional sense. There is a nice irony here: the founder of the empirical tradition in IR, the Cranfield librarian Cyril Cleverdon, was not at all a supporter of the Institute. But more of this anon.
As for the present, and despite the concerns of the founders of the Institute, academic information science is now quite closely associated with the former library schools, many of which have adopted titles which include the word ‘information’. However, a lot of current work in IR, theoretical and experimental, takes place elsewhere, mainly in computer science departments, though several other academic domains are represented. It probably comes as a considerable surprise to a current PhD student, working on (say) a machine learning optimization technique applied to search engine ranking, that he or she is in thrall to an experimental tradition founded by a librarian, working with card indexes, a half-century ago.
Thus the history that is the subject of this paper is not too readily defined in terms of institutional or academic boundaries – or national ones. Despite this, it can be seen as a remarkably coherent development of a set of principles and methods. Like all academic subjects it generates argument and disagreement and heated disputation, but there remains a relatively stable common core, which has, despite its limitations (I will argue), served us well over the last 50 years. Furthermore, while its present international status developed out of a US dominance for a large part of that period, the strength of the UK contribution has been remarkable.