Report for the International Digital Curation Conference (IDCC) Research at Risk workshop, 11 February, 2015

Report of workshop in February 2015 by David Kernohan

The 10th anniversary of the International Digital Curation Conference was an opportunity to reflect on a decade of sustained activity that has brought research data management from an interesting fringe concern to the centre of institutional and national conversation. As such, it was the perfect place to offer a first glimpse of our “Research at Risk” plans, and a preview of “Directions in Research Data Management”, a roadmap for the next few years of activity that was subsequently launched by Jisc working in collaboration with ARMA, SCONUL, RLUK, RUGIT and UCISA.

More than 40 delegates from around the world signed up for our session, which acted both as an introduction to Research at Risk and “Directions in Research Data Management” and as a chance for them to highlight their own key issues. There was a very high level of international expertise in the room.

We started the day with a presentation from Rachel Bruce on the emerging plans for “Research at Risk”, which encompasses a lot of new work (and development of existing work). As a co-design project, “Research at Risk” plans have been informed by discussions and consultation with key stakeholder organisations. Rachel’s post from last December offers more of the background to this, and her slides highlight progress and describe planned work in more detail.

I loved seeing tweets like this appear on the #idcc15 timeline:

https://twitter.com/jrcarlso/status/565515716556636160

After Rachel’s presentation I had the chance to briefly introduce “Directions in Research Data Management”, a sector-owned roadmap for the next few years in RDM. It has been developed through a number of consultative events (including one in Cambridge that I’ve written about on this blog).

This, however, was the first time we’d presented this work in public, and it was really useful to see that the issues and ideas it raised resonated with delegates. It will be a very useful tool for taking the arguments for RDM to funders and senior managers, and for emphasising a common understanding of the work that needs to be done. My slides [http://www.slideshare.net/JISC/directions-in-research-data-management-jisc-digital-festival-2015] offered a sneak preview of the draft text, which we went on to launch at DigiFest15.

After presentations from Rachel and myself, delegates split into four groups to discuss:

Infrastructure, storage and preservation
Metadata, discovery and citation
Institutional and Funder policy
Incentives for researchers and others

These were fast moving and vibrant discussions, with a range of perspectives both on our work and RDM more generally. We asked a member of each group to briefly summarise three (or so) key issues that arose.

Infrastructure, storage and preservation

The group emphasised the need to think “beyond archival back-ups”. There were worries about understanding what researchers are doing (or want to do) with active data; different institutions have different models for offering researchers storage, in differing amounts. There was an appetite to share ideas on this, and perhaps to share services (for example Jisc’s new data centre in Slough http://www.jisc.ac.uk/shared-data-centre).

For some disciplines (for example genomics) it would be useful to archive data on the day it is generated. These archives should not change, and should be preserved alongside software and metadata for full reproducibility.

Regarding analysis and preservation tools: for large datasets we need to take the tools to the data rather than vice versa. Ongoing preservation is one of the tools required.

It was noted that “large volumes of data can suddenly be generated by new projects and equipment”; however, sensible planning and sign-off processes should mean that “suddenly” doesn’t happen. There is a clear need for institutions to understand potential costs at an early stage, and to understand funder regulations, so that accurate costs can be presented as line items in bids. Accurate planning is essential in achieving this.

Incentives for researchers and others

The group emphasised that the benefit of committing to good RDM practice needs to be direct and immediate for the person concerned (one example given was an allowance of dedicated time to carry out RDM within a project plan). Funders should clarify that they do not regard this as a waste of activity funding.

There are already some awards for data excellence (for instance BioMedCentral), but it was felt that there needed to be more.

The research management role should be properly recognised; there are very different ideas about what the role involves. The NRC has specified common role expectations around curation work, and Jisc have published a report that covers some of these issues (Lyon: Dealing with Data: Roles, Rights, Responsibilities and Relationships Consultancy Report).

Collecting case studies was a popular idea. It was felt we need the positive equivalent of data “war stories”: a way of capturing and sharing real experiences of the benefits of sharing research data in order to encourage others.

There were conversations about the need to shift thinking from simple usage statistics to useful impact metrics. One supplementary question was whether research is more highly valued if it finds its way into other disciplines. This is a critical issue, but not one that is yet well understood.

Metadata, discovery and citation

This group were lucky enough to have members from Australia with direct experience of the Australian National Data Service (ANDS) and Research Data Australia. It was impressed on us that this is a (necessarily) basic cross-disciplinary approach, useful in that it works for everything, although greater specificity would be needed for it to be useful in deeper disciplinary contexts. It was reported as being good for discovery, but less so for reuse.

For the UK, it was suggested that the metadata implicit in institutional software such as Pure and ePrints could be taken as a starting point for developing a schema. The way metadata are documented should be streamlined in order to avoid duplication of effort. A clear need to link metadata across different services and systems within an institution was identified. A good deal of joining up was reported between the approaches of Pure and ePrints users and of N8 institutional members, and this was seen as a good starting point for coalescing on common practices.

ORCID, as a researcher identifier, is still perceived as quite a hard sell at institutional level. Critical mass is needed for traction, but such mass would only be achieved through traction with institutions. A lack of understanding of what ORCID is for at an institutional level was seen as a key stumbling block.

From the perspective of a data reuse study (the DIPIR project), users need to be able to trust and interpret data, so contextual information that supports this is an essential component of using shared data.

On the idea of a national research data registry, it was noted that aggregators show less metadata than the original source, so there is a need to link back to the source in order to assign credit, give access to further information and allow trackback.

The required metadata around licensing, rights and embargo periods was described as a tricky area that would benefit from clarification.

Institutional and Funder Policies

The group were very keen on harmonisation of funder RDM requirements, and suggested that one possible starting point would be harmonising approaches to funding RDM (data on the DCC website shows wildly different approaches). For example, with EPSRC funding, it was claimed, one can ask for funding for RDM either as a direct cost or via full economic costing (FEC) (it may be worth looking at this post from RCUK). Other funders require one or the other, and harmonising this would bring benefits for institutional planning. Within this discussion, the words of IDCC keynote speaker Melissa Terras were noted: “You cannot fund infrastructure with grant funding”.

On persuading institutional leaders to develop policy, it was reported that most are currently using the institutional risk of losing access to funding as a persuasion mechanism. There were concerns that this was not an ideal method and may not be backed up by funders. (The mandate to register outcomes on ResearchFish was given as an example where institutions were not perceived to have been penalised for failing to comply.)

Investment in RDM storage and management carries a high cost, which may not be matched by the value of potential reuse. High costs of storage (£200 per TB per annum was cited) were contrasted with the ease of reproducing data in some disciplines.

Regarding the Jisc focus on the pressing EPSRC mandate, some suggested that institutions likely to receive substantial EPSRC income would already be meeting its requirements. Others in the room disagreed, and the position of institutions that do not currently apply for EPSRC funding but may wish to in future was noted. Since the workshop, Jisc and the DCC have published guidance on the EPSRC mandate.