The Research Data Alliance (RDA) formed in 2012 with a vision for researchers and innovators to openly share data across technologies, disciplines, and countries to address the grand challenges of society. “Participation in RDA is open to anyone who agrees to its guiding principles of openness, consensus, balance, harmonisation, with a community driven and non-profit approach.” In its relatively short lifetime it has attracted an international, cross-disciplinary membership that comes together in plenary meetings every six months to discuss common issues and challenges and to share developments.
At each plenary meeting, members of RDA report on the progress of Working Groups (WG), explore topics in Interest Groups (IG) or run Birds of a Feather sessions (BoF) to gauge interest in a topic. The latest Plenary (#RDA6) was recently held in Paris. Each plenary has a different theme; this time it was Enterprise Engagement, with a focus on Research Data for Climate Change.
The first plenary session kicked off with a presentation by Robert-Jan Smits, the EC’s Director General of Research and Innovation, who fully supported the RDA as a positive step towards a larger systemic change in how science is done. The EU is investing 80 billion Euro into Horizon 2020 with the aim of transforming this into 80 billion Euro of data. Data management plans were singled out as an important new feature of the grant agreement. The Minister challenged the RDA’s membership to translate the efforts of the working groups into the concrete deliverables the funders need. In a video link, the European Commissioner for Digital Economy and Society, Günther Oettinger, talked about the EU Science Cloud as a way to solve the long-term storage, curation and preservation issue. Some 180 million Euro has already been invested in this initiative and a further 200 million Euro is planned for the next two years.
Axelle Lemaire, the French Minister of State for the Digital Sector, announced the launch of the world’s first data-driven bill, which will seek to ensure France’s socio-economic framework allows data to circulate in a free and open way, both for innovation and for public transparency. One of the highlights of her presentation, which was retweeted widely, was her choice of metaphor for data. She no longer uses the mechanistic and unsustainable likening of data to the oil that drives economies; instead, she sees data as ‘the light of the new paradigm’.
The keynote speech by Barbara J Ryan, Director of GEO, focused on the use of brokering for the GEODAB service, which enables public access to earth observations. The concept of brokering services is familiar to users of services like Uber and Airbnb: users and service providers are put in contact with each other without any transfer of ownership of the assets to or from the brokering company. GEODAB has access to 190 million resources without owning any of them. Dr Ryan also gave a concrete example of how making data open and free generates revenue for public agencies. After Landsat scenes were made openly available, the number of downloads soared, producing an estimated economic benefit to the USA of $1.7 billion and a global total of $2.1 billion. The message to public agencies was emphatic: selling data doesn’t work. The theme of brokering is currently being explored by an RDA working group.
Birds of a Feather: Journal Research Data Policies
This session was organised by Linda Naughton and was well attended despite a number of clashes with other related groups. There was a presentation on the progress of the Journal Research Data Policy Registry Pilot and some of the lessons learnt so far. There was a lot of discussion around how an RDA group could intervene in this space. The consensus appeared to be that work around best practice and policy expression would be most useful at this point. This was reiterated at the IG Libraries for Research Data and this may be taken forward at the next plenary in a joint session on policy terminology.
PID Interest Group – Persistent identifiers (PIDs) in research data management: People, Places, and Things
The goal of this session was to connect PID activities across RDA WGs and IGs and provide a platform to communicate and collaborate globally. There were a number of presentations in this session. The first was on the THOR project, which aims to establish seamless integration between articles, data and researchers across the research lifecycle. This involves a number of organisations including infrastructure data providers (DataCite, CERN), people IDs (ORCID), data centres (Dryad, British Library), and publishers (PLOS, Elsevier). It has a global reach, working with a number of communities, for example biomed, earth science, high energy physics and humanities. Cracking the organisational identifier problem is a challenge, and the Jisc Casrai-UK pilot was mentioned as having worked towards solving it.
The next presentation focussed on ORCID and consisted of one slide showing connections and notifications (from submitting manuscripts to publishing), with ORCID at the centre. At publication, information is pushed to ORCID, which then notifies relevant systems that this information is available. This means the researcher won’t have to remember to enter information manually, as it feeds automatically to libraries, profiles, funders, etc. Jisc has worked closely with ORCID to pilot its use in institutions and to set up a national consortium arrangement.
The highlight from the GÉANT presentation was that ORCID is soon to be a Service Provider in the SURFfederation and then later on will become available in eduGAIN. The federated identity management for research communities (FIM4R) paper highlighted the requirement for a unique and persistent ID, so there is a lot of interest in ORCID.
Data Description Registry Interoperability (DDRI) Working Group
This WG is addressing the problem of cross-platform discovery by connecting datasets together on the basis of co-authorship or other collaboration models such as joint funding and grants. In collaboration with a large range of international organisations, they have connected up datasets across multiple registries using DOIs. Although not a specific output of the WG, a visual tool, the RD-Switchboard, has been developed. OpenAIRE (Open Access Infrastructure for Research in Europe) is a partner in creating this tool and has fed data into it.
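To make the linking idea concrete, here is a minimal Python sketch of how datasets might be connected through shared authors or grants. It is not the WG's or the RD-Switchboard's implementation; the DOIs, author identifiers, grant numbers and the `link_datasets` helper are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical registry records keyed by (made-up) DOI.
records = {
    "10.9999/alpha": {"authors": {"orcid:A", "orcid:B"}, "grant": "GRANT-1"},
    "10.9999/beta": {"authors": {"orcid:B"}, "grant": "GRANT-2"},
    "10.9999/gamma": {"authors": {"orcid:C"}, "grant": "GRANT-1"},
    "10.9999/delta": {"authors": {"orcid:D"}, "grant": None},
}


def link_datasets(records):
    """Return {(doi_a, doi_b): reasons} for dataset pairs sharing an author or a grant."""
    # Index the datasets by each author and each grant they mention.
    by_key = defaultdict(set)
    for doi, record in records.items():
        for author in record.get("authors", ()):
            by_key[("author", author)].add(doi)
        if record.get("grant"):
            by_key[("grant", record["grant"])].add(doi)

    # Every pair of datasets under the same key is linked, with the reason recorded.
    links = defaultdict(set)
    for (kind, _), dois in by_key.items():
        for first, second in combinations(sorted(dois), 2):
            links[(first, second)].add(kind)
    return dict(links)


links = link_datasets(records)
for pair, reasons in sorted(links.items()):
    print(pair, sorted(reasons))
```

In this toy data, "alpha" and "beta" are linked by a shared author and "alpha" and "gamma" by a shared grant, while "delta" stays unconnected. A real service would also need to reconcile identifiers across registries before any such matching is possible.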
There was a demo of the Metadata & Object Repository (MoRe) Aggregator tool. This harvests metadata from multiple sources in multiple schemas. It then validates, cleans and normalises large amounts of content and transforms it into a common schema and publishes to multiple targets.
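The harvest, clean, validate and transform steps can be sketched roughly as below. This is an illustration only and not the MoRe Aggregator's code: the two source schemas, the field mappings, the required fields and the `normalise` helper are assumptions made up for the example.

```python
# Hypothetical mappings: source schema field -> common schema field.
MAPPINGS = {
    "dc": {"dc:title": "title", "dc:creator": "creators", "dc:identifier": "identifier"},
    "datacite": {"titles": "title", "creators": "creators", "doi": "identifier"},
}

# Fields a record must have to pass validation (an assumption for this sketch).
REQUIRED = ("title", "identifier")


def clean(value):
    """Collapse runs of whitespace in a string, or in each element of a list."""
    if isinstance(value, list):
        return [clean(item) for item in value]
    return " ".join(str(value).split())


def normalise(record, schema):
    """Map one harvested record onto the common schema; return None if it is invalid."""
    common = {}
    for source_field, target_field in MAPPINGS[schema].items():
        if source_field in record:
            common[target_field] = clean(record[source_field])

    # Normalise creators so that the common schema always holds a list.
    creators = common.get("creators", [])
    common["creators"] = [creators] if isinstance(creators, str) else creators

    # Validate: reject records missing any required field.
    if any(not common.get(field) for field in REQUIRED):
        return None
    return common


harvested = [
    ("dc", {"dc:title": "  Sea  surface temperature ", "dc:creator": "A. Author",
            "dc:identifier": "10.9999/abc"}),
    ("datacite", {"titles": "Ice core samples", "creators": ["B. Writer", "C. Editor"],
                  "doi": "10.9999/xyz"}),
    ("dc", {"dc:creator": "No Title"}),  # dropped: no title or identifier
]

normalised = [
    result
    for result in (normalise(record, schema) for schema, record in harvested)
    if result is not None
]
```

Keeping the mappings as data rather than code means a new source schema can be supported by adding one dictionary entry, which is one plausible reason aggregators favour this design.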
The final demo was from GESIS and their da|raSearchNet tool. They highlighted the challenges in creating cross-platform discovery: technical capabilities; available metadata; schemas, formats and semantics; de-duplication (they don’t have the solution right now). The tool was developed to help connect datasets of the partners involved so coverage is limited. There is currently a mix of international and German data providers in their database.
The first session of the day looked at RDA Outputs and Adopters (see the programme for the full list) and was followed by an Experimentation Showcase, where a number of start-ups and SMEs were given one minute to highlight their work. Delegates were then able to visit the organisations’ stands and vote for their three favourites. There was an interesting range of presentations and the winners were:
- Climate risk visualisation.
- Plume Labs. @Plume_Labs Solution to air quality. Analysis of big data to help reduce people’s exposure to air pollution in cities.
- Biodiversity Virtual Laboratory – a platform for integrating disparate data and predicting ecological responses to climate change.
Data Packages BoF
This BoF was run by Rufus Pollock from the Open Knowledge Foundation. Learning from the way software is packaged, this work looks to apply similar methods to data to make it easier to share, much as standard containers simplify the shipping of cargo. Further information is available at http://data.okfn.org/ and the notes from this session were shared. A few of us expressed an interest in this group and had a follow-up meeting to discuss how we can help. This may be taken forward as an Interest Group or even a Working Group.
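As a rough idea of what a packaged dataset can look like, the Python sketch below writes a CSV file together with a `datapackage.json` descriptor. The descriptor properties (`name`, `resources`, `path`, `schema`, `fields`) follow the Data Package specification as we understand it; the dataset name, the columns, the values and the `write_package` helper are invented for illustration, and this was not shown at the session.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_package(directory, name, rows, fields):
    """Write data.csv plus a datapackage.json descriptor that describes it."""
    directory = Path(directory)

    # The data itself stays in a plain, widely readable format.
    with (directory / "data.csv").open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=[field["name"] for field in fields])
        writer.writeheader()
        writer.writerows(rows)

    # The descriptor is the "label on the container": what is inside and how to read it.
    descriptor = {
        "name": name,
        "resources": [
            {"name": "data", "path": "data.csv", "schema": {"fields": fields}}
        ],
    }
    (directory / "datapackage.json").write_text(
        json.dumps(descriptor, indent=2), encoding="utf-8"
    )
    return descriptor


# Invented example data.
fields = [
    {"name": "year", "type": "integer"},
    {"name": "mean_temp_c", "type": "number"},
]
rows = [
    {"year": 2013, "mean_temp_c": 14.6},
    {"year": 2014, "mean_temp_c": 14.7},
]

with tempfile.TemporaryDirectory() as tmp:
    descriptor = write_package(tmp, "example-temperatures", rows, fields)
    loaded = json.loads((Path(tmp) / "datapackage.json").read_text(encoding="utf-8"))
    csv_header = (Path(tmp) / "data.csv").read_text(encoding="utf-8").splitlines()[0]
```

Because the descriptor travels with the data, a recipient can discover the column names and types without opening the CSV or asking the author.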
Metadata Standards Catalog Working Group
This WG looks to build on the work of the previous Metadata Standards Directory WG.
Its objectives are to:
- develop the Metadata Standards Directory into a Metadata Standards Catalogue;
- allow records to be added, searched and retrieved via an API;
- provide representations of records in machine readable form;
- develop, with the community, recommendations on which standard should be used and for what purpose;
- provide information on elements defined by each standard.
The timeframe is: first 6 months, collecting further use cases; 6-12 months, further developing the catalogue; 12-18 months, evaluating the catalogue. They are encouraging people to sign up to the WG.
Persistently Linking Physical Samples with Data and Publications – reproducible science BoF
The aims of this session are clear from the title and the rationale is to:
- Address the requirements for reproducibility of sample-based data across domains;
- Enhance discovery and access of samples across domains;
- Ensure re-usability of samples and data derived from them;
- Establish a culture that recognises sample collection and curation as scholarly contribution;
- Create policies regarding the preservation and curation of physical objects and the physical infrastructure they need.
Most of the discussion was on whether to have an IG or WG. The feeling from the group was that this should be taken forward with a decision made in 2-3 months. Comments are being collected in this Google Doc.
The final day of RDA started with a “women’s networking breakfast” (although men were allowed) with a presentation by Dame Wendy Hall. Despite the noisy marquee in which it was held, Dame Wendy gave an entertaining presentation. She talked about her background at Southampton and her use of one of the first laser disks. She ended by encouraging diversity and getting everyone involved in science. A lively Q&A session followed, which highlighted the struggles women still experience in being fairly represented on panels, on management boards and at events.
Joint meeting of IG Domain Repositories, IG Libraries for Research Data, IG Long Tail of Research Data and IG National Data Services
This session combined several Interest Groups. Unfortunately, some of the presenters didn’t attend, but it still proved to be a useful session because, rather than being bombarded with slides, we had an open discussion. There was a lot of talk about how to help researchers comply with policies. Many institutional policies are driven by the need to comply with funders’ requirements.
Lots of data disappears when a project is over. Institutional support should be developed to ensure data isn’t lost. DMPonline was mentioned as a very useful tool for helping researchers determine where they can lodge their data and the pros and cons of doing so.
Discovery of research data requires interoperability and metadata harvesting. This links nicely to the Jisc UK Research Data Discovery Project, which aims to build a registry to make research data discoverable.
Metadata is the important thing, but researchers see creating it as a burden. We therefore need to develop decent tools, and preservation should start at the beginning of research, not at the end.
Building National Scale Data Services: Joint meeting of the proposed National Data Services IG and the Domain Repositories IG
This session was particularly relevant to the work of the Jisc UK Research Data Discovery Project, and Christopher Brown summarised the project and its use cases during the discussion section. There was quite a bit of interest in the project, particularly around producing a business case for such a service. As this Interest Group may become a Working Group, it was felt that collecting business cases from organisations internationally would be a useful output of the group.
The lightning talks were on infrastructure frameworks from DANS in the Netherlands, the EUDAT initiative, the National Data Service in the United States and ANDS in Australia.
A report from this session will be produced and you can contribute to the discussion by signing up to the group.