On the second day at the Repository Fringe, Rachel Bruce, Linda Naughton and I ran a session on research data management and the relevant Research at Risk projects.

Rachel started the session with a few updates on the Research at Risk project. Each of us then ran a parallel breakout group focusing on a particular aspect. Linda talked to her group about the journal research data policy registry, Rachel focused on shared services and I got feedback on the costing and valuing of RDM activities.

Journal research data policy registry

A more complex project, the journal research data policy registry is currently being prototyped based on the earlier work done by Nottingham CRC on JoRD. Linda talked to the group about issues in this area and the aims of the project. The slides are available and, if you would like to follow this project, Linda has set up a blog. She will incorporate the feedback from this session together with her reflections on the project’s Expert Advisory Group meeting that took place in July.

Shared services

Rachel walked through the RD shared services, which is a strawman approach to one of the challenges raised in Research at Risk – that is, the need for easy-to-use shared services for RDM and also the RDM architecture. As this is a strawman, Rachel raised some questions to get feedback from the group:

Does the approach to offering shared services sound sensible?
Or is it something that has been dealt with already?
What are the issues and gaps you are facing?

General points around services:

There is a general need for help with services and options;
Funders are an important element: the driving force in setting up the services is to meet their mandates. Jisc is in continuous discussion with the funders; liaison between the sector and the funders could take place once the Research Data concordat is published in August, and Jisc will aim to help facilitate this interaction as part of Research at Risk.

Suggestions for Jisc to consider when developing services:

Jisc to facilitate the discussion with funders and translate their requirements for the sector via the shared services. Liaison with funders on the preservation aspects and requirements of their policies is important: it is unclear what they are asking for, so some liaison on this point would help both universities and funders to meet what is needed;
Jisc solutions to draw from the solutions and lessons learned in previous Jisc and HE work on RDM;
Jisc to consider that this is an evolving space and institutions may be cautious about procuring services and, as a result, being tied into contracts too soon.

Feedback on the architecture:

The group was positive about the development of the architecture and some said they would welcome being able to input and comment further;
The group was pleased with the focus on the UI, but suggested also taking account of programmatic interfaces, e.g. REST;
A few suggested making the relationship between the architectural components and the infrastructure for Big Data more explicit;
While the architecture looks at a single workflow around RDM, it may be useful to consider some important collaboration aspects around it;
Some of the services, such as DMP, are shown as something local when in fact a shared approach, including DMPonline, might need to be reflected;
There are functions and services around disciplinary needs that may be useful to incorporate under various models and with support from RCUK for example; many open-ended questions were raised on this point.

The session was quite short and we understand you may wish to think about the architecture and comment further. Please feel free to read through the results from the architecture workshop in Birmingham and post any comments there or send any feedback to john.kaye@jisc.ac.uk. The architecture and other related documentation are available on this padlet.

Costing and valuing RDM

I started the session with a show of hands for who has been involved in costing RDM activities. Roughly 4 out of about 20 people who joined the session raised their hands, and the conversation that followed covered positive and not so positive experiences of costing RDM. We discussed what it meant to value RDM and how institutions have attempted to do that. The key points raised were around three main areas:

Costing RDM activities:

There is a wide range of RDM services and a lot of uncertainty around these, the policy space and the disciplinary requirements – these are all impeding the processes around costing;
There was some consensus that in contrast with valuing research data, costing RDM activities is not as difficult;
The difficult thing around costing is estimating the time for preparation when researchers don’t even know how much data they will generate. They can provide a rough number of articles they intend to publish, but it is still hard to infer from that;
Figshare shared their experience around costing: storing the data is quite cheap and can reasonably be done; bandwidth is expensive, which means that downloads are expensive; Figshare has a team of finance people who constantly model and remodel to figure out the best estimates and forecasts, also based on previous experience, previous data downloads etc.
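
To make the storage versus bandwidth point a little more concrete, here is a minimal sketch of how such a cost estimate might be structured. It is not Figshare's model: the class name, the parameters and every price below are hypothetical placeholders chosen only for illustration, and a real forecast would use an institution's own figures.

```python
from dataclasses import dataclass


@dataclass
class DatasetCostModel:
    """Toy cost model for one dataset. All prices are hypothetical."""

    size_gb: float
    storage_price_per_gb_year: float  # assumed unit price, not a real quote
    egress_price_per_gb: float        # assumed unit price, not a real quote

    def storage_cost(self, years: float) -> float:
        # Storage cost grows with dataset size and retention period only.
        return self.size_gb * self.storage_price_per_gb_year * years

    def download_cost(self, downloads: int) -> float:
        # Each download transfers the whole dataset, so bandwidth cost
        # scales with the number of downloads, which is hard to predict.
        return self.size_gb * self.egress_price_per_gb * downloads

    def total_cost(self, years: float, downloads: int) -> float:
        return self.storage_cost(years) + self.download_cost(downloads)


if __name__ == "__main__":
    # Placeholder values: a 100 GB dataset kept for 5 years.
    model = DatasetCostModel(
        size_gb=100,
        storage_price_per_gb_year=0.25,
        egress_price_per_gb=0.5,
    )
    # Re-run the estimate for several download scenarios, since the
    # download count is the uncertain input.
    for downloads in (0, 10, 100):
        print(downloads, model.total_cost(years=5, downloads=downloads))
```

With placeholder prices like these, the storage term is fixed once the size and retention period are known, while the download term depends entirely on future usage. That is one way of seeing why the forecasts need to be remodelled as real download data comes in.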

Recovering RDM costs:

Prisoner’s dilemma situation: the role of research data managers is to encourage the researchers to cost out any RDM work within their grant application in order to recover some of these costs from the funders. The researchers, however, are worried that they are being put at a competitive disadvantage, because they think nobody else (or they don’t know if anyone else) is actually costing these in, hence the fear the grant would bounce because of unreasonable costs. As a result, often they underestimate the RDM costs, or don’t even include them. This is true for funders that allow for RDM cost recovery. However, there are funders that don’t cover RDM costs and have explicitly asked the researcher to remove the funds dedicated for data management.
University of Dundee has a very specific issue: when dealing with sensitive data, the university has taken every precaution and can set up safe havens for anyone that has been approved to reuse sensitive data; the dilemma is around cost recovery – would the depositor be asked to include the set up costs on their grant, or would the future users be charged for access?

Valuing research data:

There was consensus that estimating the value of RD is one of the most difficult things to achieve;
Some attempts have been made: for example, every dollar put into GenBank is said to equal $90 of value, although it is unclear how this was calculated; there is also the work done in Australia around valuing data deposit in repositories; and the King’s College value impact model;
Can RDM be valued as the cost of recollecting the data? How would you deal with instances where the data cannot be collected again because the time has passed, or for data that is useless – would that nullify the value?
Valuing data is all about trying to predict how it will be used. But you don’t know how much it will be used, and no researcher will attempt to predict that. And if you cannot estimate usage you cannot tell what value the data is going to have.
Depositing a dataset could in itself boost the value of the data because it may increase the number of citations for the paper, hence even more downloads of the data;
Some institutions have attempted to value data based on historically incurred costs – for example, how the data has been used so far and what value has come from the X amount of money spent so far. This would be especially useful if the community shared their experiences. There was a suggestion to collect and construct a database of creative ways of costing data (even if anecdotal), similar to 4C in a way.
Including risk assessment in the value estimation is another option. The Universities of Dundee and Glasgow have already been doing some work around risk: what does it mean for the institution’s reputation if data is not managed well or at all?

At the end of the session I asked everyone to share with the group their biggest pain point around costing and valuing data. Here are the results:

Inability to predict the volume or amount of usage of data – mentioned 4 times
Too many ways to measure and estimate costs across different disciplines, institutions and even projects – mentioned 2 times
Where to look and find existing examples – mentioned 2 times
Inability to persuade or catch up with researchers because they don’t have time or their projects are at different stages – mentioned 2 times
Baseline level of service, FEC aspect and what to expect to pay – mentioned 2 times
Poor community engagement – mentioned 2 times
Lack of researcher buy-in because of fear of overbudgeting
Difficulty in advising researchers on the costs that can be included and how long to keep the data
Better advice from funders
Risk in not complying
Recognition from senior managers
Would like to get involved much earlier in the process with research groups

We want to thank everyone for getting involved and for the range of feedback and suggestions for the three areas. Please do feel free to get in touch with us or post any further comments about shared services, the architecture, costing and valuing RDM activities and the journal research data policy registry. These projects are in an early phase, and feedback and input are very valuable in helping to shape them. You can see some early outputs from the architecture and shared services work – please do get in touch about that. Further ways to engage with the journal research data policy registry will be made available on its project blog. As the costing and valuing RDM work is further scoped, there will be further communications and ways to get involved in that work – we’ll certainly share that on this RDM blog.

THANK YOU!