This post has been written by Dr Jacqueline Thomson, Research Associate at the School of Psychological Science, University of Bristol.

Introduction

Since our previous blog post (Prediction markets: a new tool to help assess research quality), we’ve received a number of questions and comments, which we thought it would be useful to collate together with our responses.

We’ve therefore put together this FAQ-style document, which gives more detail about prediction markets in general and our project in particular.

The original blog post was meant to be a high-level introduction to the idea, so we didn’t include much detail. We are working on an academic paper with the attendant data, which we plan to share openly in full.

FAQs

Are you proposing that prediction markets should replace REF panels?

No! We are proposing that institutions could try using prediction markets as part of their REF planning, to help them decide which papers to submit by predicting what REF scores their research outputs might earn in the eventual REF. When we say that the prediction markets we are piloting assess quality, we mean “quality” strictly as the REF defines it. Whatever you think of the REF, prediction markets are completely agnostic to the philosophy behind its assessment criteria, and merely try to predict REF outcomes.

How does a prediction market work, from a participant’s point of view?

Prediction markets work like the stock market in that they reflect the overall market’s confidence in a particular outcome (or company). That aggregate measure of confidence is built up from individual trades that participants make. Each time a trader buys or sells shares in an outcome (or in a company, in the stock market analogy), it shifts the price a little in the direction of that trade. Buying positive shares signals increased confidence, and buying negative shares (equivalent to selling positive shares) signals decreased confidence. Traders are motivated by the individual gains they can make from bets they think others have overlooked; aggregated across all traders, these individual trades reveal the overall level of confidence.

In the original blog post we kept details to a minimum for the sake of brevity. You can consult the instructions on our website for the explanation we gave participants, and explanations here or here for a more detailed account of the principles from the economics literature behind the market mechanism.

Essentially, participants see a list of papers and are asked a question such as “Will this paper receive a 3* or above rating in the REF?” They are allocated points that they can use to bet “yes” or “no” on the papers, in any combination they want: they can bet all their points on one paper, or spread them across many. Importantly, participants only receive a payout for papers they bet on correctly; for instance, if they bet “yes” on a paper that ended up getting a 3* or above rating, or “no” on a paper that ended up getting a 2* or below rating. (Since we don’t have access to the actual REF ratings, we use mock REF ratings, usually from within the institution itself.) Each “yes” bet on a paper raises that paper’s price, and each “no” bet lowers it. So the “market price” for a paper reflects the aggregated confidence of all the traders that the paper will do well in the REF. Market prices are freely available to anyone taking part in the markets, although individual traders remain completely anonymous.
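To make the price mechanism more concrete, here is a minimal sketch of a market for a single paper. It assumes a logarithmic market scoring rule (LMSR), a market-maker design that is common in the prediction-market literature; the markets described here may use a different mechanism, so treat this as an illustration only. The class name `LMSRMarket`, the `liquidity` parameter and the `settle` helper are all hypothetical.

```python
import math


class LMSRMarket:
    """Hypothetical market for one paper: "yes" means a 3* or above rating.

    Assumes a logarithmic market scoring rule (LMSR); each share pays
    1 point if its outcome occurs and nothing otherwise.
    """

    def __init__(self, liquidity: float = 100.0) -> None:
        self.b = liquidity  # larger b means each trade moves the price less
        self.q_yes = 0.0    # total "yes" shares sold so far
        self.q_no = 0.0     # total "no" shares sold so far

    def _cost(self, q_yes: float, q_no: float) -> float:
        # LMSR cost function b * ln(e^(q_yes/b) + e^(q_no/b)),
        # computed with the log-sum-exp trick to avoid overflow.
        m = max(q_yes, q_no)
        return m + self.b * math.log(
            math.exp((q_yes - m) / self.b) + math.exp((q_no - m) / self.b)
        )

    def price_yes(self) -> float:
        # The current price of a "yes" share, read as the market's
        # aggregate probability that the paper gets 3* or above.
        return 1.0 / (1.0 + math.exp((self.q_no - self.q_yes) / self.b))

    def buy(self, outcome: str, shares: float) -> float:
        """Buy shares in "yes" or "no"; returns the points charged."""
        before = self._cost(self.q_yes, self.q_no)
        if outcome == "yes":
            self.q_yes += shares  # "yes" bets push the price up
        elif outcome == "no":
            self.q_no += shares   # "no" bets push the price down
        else:
            raise ValueError("outcome must be 'yes' or 'no'")
        return self._cost(self.q_yes, self.q_no) - before


def settle(yes_shares: float, no_shares: float, rated_3star_or_above: bool) -> float:
    """Points paid out once the (mock) REF rating is known: only correct bets pay."""
    return yes_shares if rated_3star_or_above else no_shares
```

With `liquidity=100`, a fresh market starts at a price of 0.5. Buying 50 “yes” shares then costs about 28 points and moves the price to about 0.62, and a matching purchase of 50 “no” shares brings it back to 0.5. The liquidity parameter controls how far any single trade can move the price.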

The markets are designed so that prices reflect the actual likelihood of events. In simplified terms, bets on unlikely events pay out more if those events do occur, but because such events occur less often, over the long run the expected payout is the same as for bets on more likely events. Whether a trader prefers long shots or safe bets therefore says more about whether that individual is risk-seeking or risk-averse than about the outcomes of the market more generally.
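The claim that long shots and favourites have the same expected payout can be checked with a little arithmetic. The sketch below assumes that each share costs its current price in points, pays 1 point if the outcome occurs, and that the trader’s own bet is small enough not to move the price; the function names are hypothetical. Exact fractions avoid floating-point rounding noise.

```python
from fractions import Fraction


def expected_net_gain(stake: Fraction, price: Fraction, true_prob: Fraction) -> Fraction:
    """Expected profit from spending `stake` points on an outcome priced at `price`.

    Assumes each share costs `price` points, pays 1 point if the outcome
    occurs, and that the bet is too small to move the price itself.
    """
    shares = stake / price
    return true_prob * shares - stake


def payout_variance(stake: Fraction, price: Fraction, true_prob: Fraction) -> Fraction:
    """Variance of the payout: same assumptions, a measure of how risky the bet is."""
    shares = stake / price
    return shares * shares * true_prob * (1 - true_prob)


stake = Fraction(10)
long_shot = Fraction(1, 10)   # paper priced at 0.1
favourite = Fraction(9, 10)   # paper priced at 0.9

# When the price matches the true probability, both bets break even on average...
gain_long = expected_net_gain(stake, long_shot, long_shot)
gain_fav = expected_net_gain(stake, favourite, favourite)

# ...but the long shot is far riskier.
var_long = payout_variance(stake, long_shot, long_shot)
var_fav = payout_variance(stake, favourite, favourite)
```

Staking 10 points on either bet gives an expected net gain of zero, but the payout variance is 900 for the long shot and about 11 for the favourite; that difference in risk is what separates risk-seeking from risk-averse traders. Gains only arise when a trader’s own estimate differs from the price: if a paper priced at 0.1 actually has a 0.2 chance of a 3* or above rating, the same 10-point stake has an expected net gain of 10 points, which is the incentive to bet on what others have overlooked.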

Why do you offer prize money? Doesn’t that trivialize the goal of assessing quality?

In terms of offering prize money, our markets operate exactly like previous markets in psychology and other academic disciplines that assessed whether traders thought papers were replicable. We offer prize money because it tends to make the markets more accurate. It incentivises participants to get the predictions right, rather than betting on the papers they would like to see do well in the REF.

Do prediction markets rely on metrics? For instance, do they just reflect journal impact factors?

Metrics can be helpful in assessing research, but it is important to realise that they certainly do not tell the whole story. We have attempted to take a moderate approach toward metrics in our study: participants have access to whatever information the eventual REF panel would see (in some cases this includes metrics such as citation counts, but never journal Impact Factor). As part of our prediction market project, we run a follow-up survey asking participants which aspects of papers influenced their betting. A notable proportion mentioned metrics such as citations, or concepts closely related to Impact Factor such as journal prestige. This is something for us to be aware of and look out for when judging whether this approach could work more broadly, beyond REF predictions.

At the same time, many of the heuristics that participants use may be reasonably accurate but not captured by any metric (e.g., a “feel” for what constitutes an “important” research topic), so we think a crowd of humans may well be better at this than metrics.

Will prediction markets exacerbate or improve existing bias in the REF process?

The aim of the REF prediction markets is simply to see whether it is possible to develop a more efficient or accurate process to predict REF outcomes, rather than to change or “fix” the underlying process of research assessment (although we hope it will also give us insights into how academics assess research, as a side benefit).

The prediction markets merely attempt to simulate (to some degree) the actual REF panels, so if they operated perfectly they would, unfortunately, preserve whatever bias already exists in the REF process. Ultimately, traders can only do well in the market by correctly predicting outcomes, so if they share the same human biases as the REF panel, this may actually improve their predictions. It is possible that prediction markets introduce new biases or attenuate existing biases of the REF, but if so, this is not by design. Traders in prediction markets may even be using decision criteria very similar to those that REF coordinators or internal panels use when choosing which papers to submit to the REF.

Do prediction markets bring any advantages over other methods of choosing papers for the REF, such as close reading?

Prediction markets in many fields (e.g., politics, sports, entertainment) have proved to be more accurate than polls or single expert predictions. We think there are several potential benefits of prediction markets:

Diversity of perspective: more people involved in the decision-making process means input from more perspectives. Prediction markets aim to reduce error by pooling many judgements. On a small panel with a limited number of readers, each reader’s particular perspective will inevitably shape the ratings. With more people (for instance, in a prediction market), there are more chances for someone in the pool to correct for perceived biases or incorrect predictions. Of course, if everyone shares the same biases this won’t help, but a larger pool makes it more likely that someone will spot a given error. Markets also have the advantage of including people who might normally be outside the decision-making process for REF submission, e.g., early career researchers.

Dynamic decision-making: participants automatically weight their judgements by their confidence, through the size of their bets. Furthermore, participants can log in many times and bet based on updated information about the aggregate crowd judgements, which may make them more accurate. For instance, if a trader believes a particular paper is low quality but sees that the group overall has rated it highly, they may wonder whether they have missed something. Conversely, if they believe they know something the group doesn’t (for example, a weakness in a highly priced paper), they are incentivised to bet against the paper, lowering its aggregate score. In this way, one well-informed person can correct the misperceptions of many.

Incentivisation: participants are incentivised by the market to get the answer right (to predict accurately how the REF panel will rate the paper), rather than to give a rating based on what they think the outcome should be. This is a subtle but important distinction: the ultimate goal of choosing papers for the REF is to predict how the REF panel will rate them, but it is easy to slip into assigning ratings based on one’s own opinions instead.

Efficiency and accuracy: we hope that the markets can be a more efficient, and perhaps more accurate, way to assess the future REF ratings of papers as part of an institution’s internal process. The correlations with the mock REF panels show that the markets are capturing at least some of the same signal, and we expect that where the markets deviate from the mock REF panels, this may (at least sometimes) reflect knowledge (e.g., expertise in a subject) that the panel lacked. Our project will examine whether the prediction market approach is actually more efficient in terms of staff time and effort. However, it will be more difficult to establish definitively whether prediction markets are more accurate than other approaches.

What insights will this give us about how academics assess research?

We have already found that the group consensus does not always match the scores of the expert close readers. Unfortunately, since there is no “ground truth” for a paper’s true rating that we can access, it is impossible to know which measurement (the prediction markets or the close readers) is more “correct.” We hope this means that prediction markets may be able to offer insights that close reading does not.

We are also looking in more depth at which aspects influenced participants’ bets, using open-ended survey questions. Participants have told us that they partly use heuristics such as journal prestige or citation counts, but they also mention less quantifiable aspects, such as a sense of whether the research is important. If blog readers have any ideas or insights, we would be very happy to hear suggestions for how we might study this!

How to get involved

We are continuing to pilot prediction markets at various institutions for REF 2021 until the summer of 2020. If you would like to try it out at your institution, take part in a market, or just find out more about how it works, please get in touch.

Any comments or suggestions for us?

If you have any comments, suggestions, or responses to the FAQs above, please post them in the comments below. We would be happy to hear from you!