Paul Stokes summarises what we have learnt from some recent workshops…

The cost of Research Data Management (RDM) has been under our spotlight recently as part of the Research at Risk initiative. Most of us take it as read that making research data available for verification and reuse is a “good thing”. However, it is very difficult to quantify what a “good thing” is worth in monetary terms, especially when putting together a business case for a sustainable RDM service. This is unsurprising given the intangible nature of such a concept. It turns out, however, that costing the more tangible components of such a service is also quite difficult.

The pain

We asked a panel of stakeholders what they found difficult about costing RDM and what particular problems, or pain points, they had encountered. These are some of the things they came up with.

Grant application process—Some researchers are unaware that they can put in a budget line for RDM when applying for funding. Even where they are aware they don’t know what is eligible and how much would be appropriate. Others leave out RDM altogether in the mistaken belief that smaller budgets make a project more likely to be funded.
Double dipping[1]—The problem here arises from identifying who paid for what. Many finance mechanisms don’t have the ability to provide such reports. As a result RDM is left out of budgets “just in case” it’s already been paid for elsewhere. This cost assignment problem has also been cited as the reason why some recharging mechanisms are thought to be unworkable.
Cost recovery—There appears to be no widely accepted cost recovery mechanism, a problem exacerbated by the monitoring problems mentioned above.
Post project funding—Data has to be kept for 10 years. Paying for things after a project has ended is a no-no. But the cost of preservation tends to continue post project. So we need to have ‘pay once, keep for ever’ solutions. They do exist (after a fashion), but they have their own set of problems. Do you trust the provider? What if they go bust? Will they keep the data usable? What happens if the data is lost? And so on.
The cost of generating/using/keeping the data—At the moment many financial systems and data management systems aren’t really set up to answer the questions of how much it has cost in the past to handle data, how much it costs now and how much it will cost in the future. Some parts of this jigsaw have been addressed, but we’re not there yet.
The value of the data—It’s difficult to put a value on data. But if we’re going to put forward a business case for RDM we need to.
Data and Disciplinary differences—Different types of data can have significantly different costs associated with them. For example, sensitive data is much more expensive to handle and keep; multiple small, heterogeneous, chaotic files are more difficult to handle than a few large, homogeneous files. Charging mechanisms based upon the cost of storage don’t take this into account.

To help better understand some of these issues and come up with some practical solutions we are undertaking some further work on the economics of research data. We have commissioned Cambridge Econometrics to investigate how best to account for the costs and benefits of research data and to come up with a framework and models to enable practitioners to understand their own value streams.

The promise

What about the benefits? If you’re planning an investment then there must be some value to justify the expenditure. What is the promise of RDM? This appears to be a simple question, but on closer inspection it’s not as simple as you might imagine. Firstly, there’s the question of just who benefits. Researchers? Institutions? Society? All of the above? Other questions include what exactly the benefit is and how important it is.

We asked the audience in the “Sustainable and efficient solutions for shared research data management” workshop at Digifest for their thoughts.

High scoring answers included:

Institutional Cost Saving—benefiting institutions
More Interdisciplinary Research Produced—benefiting researchers
Increased Research Impact—benefiting researchers
Reduced Risk of Institutional Damage—benefiting institutions
Increased Institutional Reputation—benefiting institutions

The top benefits were:

Greater Levels of Research Collaboration—benefiting researchers
Keep Research Data Safe and Avoid Loss—benefiting everyone

And top of the entire list was:

Easier Compliance with Funder Mandates—benefiting researchers and institutions

How do you put a value on these ‘Indirect Economic Determinants’[2]? It can be done. There are ways to assign worth to such things and we’re working on applying those methods to RDM. Research Consulting have begun investigating these issues for us and how they might feed into a high level business case for RDM, as outlined in a blog entry posted here previously (https://researchdata.jiscinvolve.org/wp/2016/02/15/making-case-research-data-management/).

What do you think?

Have we got it right? Do you recognise these problems? Are they ones that affect you? More importantly, have we missed something out?

Also, have we understood and prioritised the benefits correctly?

Let us know by commenting below or by sending an email to paul.stokes@jisc.ac.uk

[1] Claiming the cost of an asset/service from more than one funding source.

[2] http://4cproject.eu/d4-1-ied