The ongoing review and consultation by the Higher Education Funding Council for England on the use of metrics in the REF process has reinvigorated a wider debate over research metrics more generally. A thorough critique authored by Meera Sabaratnam and Paul Kirby has set the academic blogosphere alight. While they raise important points, I would offer a modest defence of the use of metrics in research evaluation.
Sabaratnam and Kirby put forward three main arguments:
- Metrics do not measure what they purport to and artificially conflate research ‘quality’ with ‘impact’;
- Metrics systematically bias against certain groups, notably women and ethnic minorities; and,
- Metrics are easily ‘gamed’.
These are important points, but I would suggest that they largely constitute a critique of academia as a whole, not of metrics per se. We can all, I’m sure, name controversial and even bad scholars who gained career success and influence on the basis of the controversial rather than high-quality nature of their research, long before research metrics were on the scene. Sabaratnam and Kirby point to Huntington’s notorious Clash of Civilizations as an example of a much-referenced, often ridiculed work, but this conflation of quality with impact long precedes the use of metrics. The gender bias in academic appointments, promotions and salary is well established. And ‘playing the game’ in academia (getting on the right committees to win promotion, building patron-client networks with former PhD students, and just plain sucking up) is, again, hardly uncommon behaviour.
It may be that metrics will exacerbate these problems. But, conversely, I would suggest that research metrics (including citation metrics) can be useful in exposing such problems and forcing us as a community to confront them. In this sense, there is an irony in the fact that Sabaratnam and Kirby use a quantitative study based on citation metrics to substantiate their claim that women will be disadvantaged by metrics: without these metrics, that very point would be harder to make, but the bias would be no less real. Likewise, the ‘Matthew Effect’, which sees more productive scientists (in terms of quantity) receive proportionately more credit for their discoveries than less productive scientists, was first theorized by Robert Merton in 1968, but only in the past twenty years or so have citation data been robust enough to demonstrate it empirically. Larivière and Gingras offer a useful discussion of this literature, as well as their own demonstration of the impact of journal ranking on subsequent citations.
This is not to say that our existing metrics are without problems. But just as their use exposes problems and challenges within academia, so this exposure can in turn help us modify and improve indices to provide better measures of high-quality research.
Beyond Sabaratnam and Kirby’s critique, other criticisms of metrics are also well known. For instance, journal metrics may be more problematic for some disciplines within the Humanities and Social Sciences than they are for other disciplines, largely because there is little consensus over what would constitute an appropriate qualitative, rather than quantitative, ranking of journal quality. The social sciences are characterized by a much wider degree of methodological and epistemological pluralism than the STEMM subjects, and journal rankings based on citation counts tend to privilege dominant approaches.
To give one concrete example, in political science, journal impact factors tend to give the highest ranks to quantitative journals because political science in the US is overwhelmingly quantitative. Even the best qualitative research in the world may thus appear to be of lower quality than mediocre quantitative research. The risk of naïve metrics is that, for individuals and institutions alike (in terms of appointments), they create incentives to conform ever more closely to mainstream approaches rather than to engage with alternative or critical methods and epistemologies. But this, I would suggest, is an argument for improving our metrics rather than abandoning them.
As before, my point is that the trend toward methodological and theoretical uniformity in some disciplines is a long-standing phenomenon, not one created by metrics. Lee Smolin’s popular account of string theory in The Trouble with Physics contains an insightful sociological analysis of how this (in his view misguided) theory came to dominate theoretical physics in US research institutions long before the rise of metrics; the hegemony of rational choice approaches in economics is even older. While it may be too late for economics to recover its methodological pluralism, disaggregated metrics such as those available in SciVal may, for instance, help both to expose the extent of a similar drift towards methodological uniformity in political science and to enable qualitative researchers and institutions to demonstrate their impact relative to other qualitative researchers.
So how, then, would I propose research metrics be used?
We all use metrics all the time in our professional and personal lives. The following story is probably not unfamiliar to many. I have just accepted a position at the University of Western Australia. When I was first approached about the job, I did not know the institution at all, but a quick check of the QS Rankings convinced me to look at it seriously. At around the same time, I was approached by another institution; its ranking, somewhere in the doldrums where QS doesn’t even give a specific rank, prompted a quick and polite refusal. In my letter of application to UWA, I made use of my Google Scholar h-index and was able to compare it favourably with the LSE study of h-indices across the social sciences. After being offered the position, one major concern I had was schooling for my son. Again, a quick consultation of The Australian’s school league tables convinced me that there were plenty of good schools in which to indoctrinate him into the metrics of the future.
Of course, I didn’t base my assessment of UWA solely on its QS ranking, nor will I pick a school for my son purely on the basis of the metrics. Likewise, I doubt very much that my wanton waving of an h-index influenced the selection committee over and above my detailed academic resumé, job presentation and interview. But these were useful bits of information that facilitated (but didn’t constrain) comparison and aided decision-making.
This is the approach that I think we should take towards metrics in the REF and in academia more generally. Metrics allow us to display our strengths, but as part of an overall narrative of research excellence, not as a single indicator of ‘impact’. Metrics can and should be a useful part of the REF process, but they should be seen as a range of evidence bases that can be used to substantiate our overall case for research excellence, rather than as a prescribed evaluative framework. As institutions and as individuals, we should be able to choose which metrics to use and emphasize as part of that narrative.
‘My’ version of the REF does include publicly available databases of research metrics. But just as The Australian warns that its schools data is a ‘guide only and not intended for ranking schools’, so I think that research metrics should be publicly available but, in the REF and elsewhere, used as a source of data to aid in promoting research excellence, not to constrain it narrowly and programmatically.
While there are certainly potential problems and perverse incentives in the application of metrics, the debate has focused too much on these and not enough on the ways in which metrics can help expose existing problems and perverse incentives in the academy. If they are used carefully and reflexively, in tandem with a qualitative account of research excellence, I see no reason why they could not play a positive role that outweighs their drawbacks.
NOTE: This post originally appeared on the University of Bath Social and Policy Sciences blog.