Research Data Spring – let your ideas bloom!

For some of us, particularly developers, hearing “spring” and “data” together most likely brings to mind the Spring team and its Spring Data project, and I am sure the phrase conjures up other ideas too. Here, the name Research Data Spring stands for new starts, shoots and growth.

We are looking to build on ideas and, through Research Data Spring, to find fresh perspectives. As an analogy, you might start with an acorn of an idea; through Research Data Spring we hope to help grow your ideas into oaks!

We want to give you the opportunity to collaborate and find the kind of resources that you’re missing in your project, from skills to money. The project is built on an iterative model, which means you will have plenty of opportunity to receive feedback as well. For example, we are hoping to run a workshop around the IDCC conference, so that you can receive input from others around the world.

If your idea is successful and receives funding, you will have three or so months to get started, and then you will come back to develop the next steps. If you are successful again, you will be funded for four more months. In a third round we will offer funding for six months. Between rounds you will return to our sandpit-style workshops, which should help you not only to develop your ideas but also to test them on the market immediately. Yes, it is a smaller sample, but the variety of participants should provide good first input to your project. We’re open to other ways of bringing in new input, but I think we’ll have to see how Research Data Spring develops in its first stage.

We have devised five priority areas. They are not necessarily mutually exclusive, but they identify some areas that have come to the fore as needing new solutions. You will notice that the shared services area can be relevant to any of the first four priority areas. We would like you always to reflect on the services you would need for depositing and reusing data in the simplest and most interoperable way. What follows describes the five priority areas in more detail, to give you a better idea of where we are coming from.

Research data deposit and sharing protocols & tools

Within the preservation, access, use and re-use of data, depositing and sharing research data play a critical role. Streamlining this process matters for better and more effective research data sharing and management. To this end, Jisc supported the development of the SWORD protocol with UKOLN and Cottage Labs. Content can be pushed from desktop applications to repositories via tools and widgets, as was demonstrated by the DepositMO project. Other examples include the DURA project, which synchronized Mendeley with institutional repositories, and Symplectic (using SWORD to transfer from a CRIS to a repository). The SWORD protocol is content agnostic, and the SWORD 4 Data project started to outline the possible workflows between systems. Note that development may still be required for client and server environments if these do not already exist. There are still challenges in this area: de Castro, Lewis and Jones mention the transfer of arbitrarily large files and asynchronous deposits, since research data often comes as multiple files, as well as support for bulk transfers and streaming content, among others.
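To make the large-file challenge a little more concrete, here is a minimal sketch of how a deposit client might split a big file into fixed-size segments, each with its own checksum, so that a failed transfer could be retried one segment at a time. It is an illustration only and not part of the SWORD specification: the `Segment` record, the `plan_segments` function and the segment size are all hypothetical names and values chosen for this example.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator


@dataclass(frozen=True)
class Segment:
    """One piece of a larger file (hypothetical record, for illustration)."""
    index: int    # position of the segment within the file, starting at 0
    offset: int   # byte offset at which the segment starts
    length: int   # number of bytes in the segment
    sha256: str   # hex digest of the segment's bytes


def plan_segments(stream: BinaryIO, segment_size: int) -> Iterator[Segment]:
    """Read a stream in fixed-size pieces and yield a checksum for each one."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    index = 0
    offset = 0
    while True:
        data = stream.read(segment_size)
        if not data:
            break  # end of stream
        yield Segment(index, offset, len(data), hashlib.sha256(data).hexdigest())
        index += 1
        offset += len(data)


if __name__ == "__main__":
    # A small in-memory stand-in for a large research data file.
    fake_file = io.BytesIO(b"x" * 2500)
    for segment in plan_segments(fake_file, segment_size=1024):
        print(segment.index, segment.offset, segment.length, segment.sha256[:12])
```

A receiving repository could, in principle, recompute each digest to confirm that a segment arrived intact before acknowledging it. How such acknowledgements would be exchanged is exactly the kind of protocol question that ideas in this area could explore.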

Research Data Spring seeks ideas that improve SWORD and its operation, as well as that of ZendTo, BitTorrent or other web applications and protocols that are widely used for file sharing, deposit and indeed notification within the research lifecycle. These should be based on research and research data curation use cases and requirements, and are likely to include solutions that advance standards and protocols. However, we anticipate that new ideas and tools that improve deposit more generally will also emerge through collaboration in Research Data Spring.

Please note that, even though the description here focuses on SWORD (there has been a lot of prior work on its potential for data deposit), projects that focus on other protocols or ways to deposit and share data ARE within scope.

Data creation, deposit and re-use by discipline

Research policies around the world recognize that research data is fast becoming the currency of research. It is no longer simply something for researchers to keep to themselves or make available just to a small group of peers, although there are important caveats that require restrictions: for example, some data is sensitive, and research data is made accessible to varying degrees at different points in the research lifecycle. In order to fulfil the research data mandates from funders and improve research transparency and innovation, there is a need to make data more easily accessible and manageable. Research Data Spring seeks ideas for experiments and prototypes that address the researcher’s experience and the research data workflow to improve the creation, management, curation and re-use of data. It might be, for example, that a prominent and transferable tool that researchers use should be enhanced with protocols that support deposit and data sharing.

Disciplines have different cultures and use different tools and different data types, and we hope that through Research Data Spring the requirements of different disciplines can be addressed. We welcome ideas where a solution might transfer from one discipline to another or to multiple disciplines. There will be many use cases and pain points that could be addressed. Some examples, drawn from earlier investigations, are set out below to give a flavour:

Life sciences: researchers would benefit from leaner integration of lab notebook material, and from more efficient searching and better discoverability of the experimental narrative.

Architecture: adding contextual knowledge to research data from repositories would allow easy reuse and application of the design project.

Engineering: researchers rely on social networks and data sharing, so standards and protocols for this type of communication, and for making data available for comment by others, might encourage more collaboration across universities.

Social studies: researchers would benefit from standards for the curation and organization of video data, which they increasingly use to study interactions.

Economics: taking the tedium out of posting the paper and data before and after acceptance for publication in peer-reviewed journals.

Humanities: developing standards for dissemination via social media, as well as improving the credibility of open online publications, can contribute to the discoverability of research data.

In particular, in Research Data Spring we are interested in ideas that make it easier to manage research data, especially from the researchers’ perspective (in addition to the protocols mentioned within the first theme); in this context we are also interested in re-use of data. In other words, we are seeking ideas that will smooth the processes of data management, deposit and re-use within the research lifecycle. This area is closely related to “data creation, deposit and re-use”, but we have split the two in order to emphasize that some ideas might focus on generic data management support and related protocols and solutions for deposit and re-use, while others would address key disciplinary and cross-disciplinary research aspects.

Research data systems integration and interoperability

Driven by research impact, reporting and collaboration, this aspect of Research Data Spring aims to encourage ideas that enhance the systems used across the research information management and research data management processes. The aim would be to develop and prototype tools and solutions that ease connections and interoperability in support of research data management. Relevant systems can include repositories for active data management, current research information management systems, and archival and preservation systems. Systems that are interoperable with publishing platforms may also be in scope. Standards and identifiers come to the fore here and should be considered, such as CERIF, ORCID, DOIs, DataCite, OAIS etc. Universities and disciplinary groups make different systems choices. Where systems don’t easily support interoperability, or where tools can enhance the flow of research data between them, we are keen for ideas that find solutions. Through Research Data Spring we aim to bring some good solutions to the forefront that can make a difference to as many universities and research stakeholders as possible.
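As a small illustration of what working with these identifiers can involve when connecting systems, the sketch below checks the structure of an ORCID iD before it is passed from one system to another. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm; the function names here are our own, and the check only confirms that an iD is well formed, not that it is registered to anyone.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if any(ch not in "0123456789" for ch in base):
        return False
    return orcid_check_character(base) == check


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_well_formed_orcid(candidate))
```

A check like this is cheap to run at the boundary between, say, a CRIS and a repository, and catches mistyped identifiers before they spread across systems.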

Research data analytics

Research data analytics is still an inchoate area within the research data lifecycle. Its applications are twofold. Firstly, there is analysis of the impact of research data, for which metrics and altmetrics could be developed. The Knowledge Exchange report “The Value of Research Data: Metrics for datasets from a cultural and technical point of view” emphasizes the virtuous circle between data metrics and data sharing, both being key areas that we are addressing here. The report finds that metrics for research data impact are still at a very early stage of development and calls for a reward system that would include data metrics. Secondly, we are aiming to address issues relating to tools for analysing large data sets. Whilst there are large-scale initiatives like Digging into Data that explore tools for big data analysis, within Research Data Spring we are interested in tools that can be used more widely and deployed across disciplines.

Within Research Data Spring we are interested in finding new uses and benefits of big data analytics for research and the research lifecycle, as well as approaches related to the impact and use of research data.

Shared research data services

This aspect of Research Data Spring is not standalone; rather, it should be considered an objective that can be achieved in ideas proposed under one or more of the first four strands. We are encouraging ideas that have robust use cases and that can be prototyped and tested for feasibility. These could range from locally shared services to ideas for nationally shared services. There are already significant national and international shared services, such as the UK Research Data Discovery project, the Dryad repository, Sherpa/Juliet, the Registry of Research Data Repositories (re3data.org), DataBib and others. We would hope to see projects seek to work with these where possible.

Before submitting, we want you to stop for a moment and think: if you had no limits in terms of money, skills and time, what would you do to make the deposit and reuse of data as seamless as possible for you? Think big! Then think about what might be feasible.