Research data management, preservation and sustainability are frequently discussed issues and are even the theme of this blog. However, data is just one, albeit extremely important, output of research, and the software used to produce much of this data is often overlooked. There is often a dependency between the two but, even without this dependency, software is another important research output that needs to be preserved, whether for reuse or reproducibility.
To look at some of these issues around software, the Software Sustainability Workshop was organised by DANS and the SSI on 7-9 March, 2017 as a follow-up to an earlier Knowledge Exchange Research Software Sustainability workshop and report. The workshop brought together invited experts from across Europe, including some of us who’d attended the original Knowledge Exchange workshop. Jisc was represented by Christopher Brown and Matthew Dovey.
Aims of the workshop
One of the main aims of the workshop was to look at how the FAIR principles used for data can be applied to software. Also, what do we actually mean by “software sustainability” and how can we arrive at an infrastructure for sustainability for software?
The workshop consisted of a number of presentations, but most of the time was spent in breakout groups discussing the four main topics:
- How to implement the FAIR principles to software
- Involving both science and the cultural sector
- Setting up a European Software Sustainability Infrastructure
- Setting up a Software Seal of Approval
We looked at how barriers can be turned into drivers to get the message across that software is important. A key part of the workshop was to determine what we should do to achieve software sustainability and to identify the easy things we can do that would help the most people.
The slides for all the presentations will be available online soon, but this post will focus on the four main topics discussed in breakout groups.
Why is software important?
Before dealing with the four main topics we discussed the reasons why we should care about software. These include:
- Customers ask for it
- Research infrastructures demand it (e.g. CLARIN)
- Replication of results requires the data and tools that processed it
- Information about data is often encapsulated in software
- Linking data and other research resources (publications, project information (see NARCIS))
- Software as a special form of data
- Software sustainability is a required element in RDM
By preserving software we are helping to sustain knowledge. Software is fundamental to research, relevant to a wide range of disciplines and communities, and promotes trust in science – reproducibility to ensure science can be relied upon (climate science, vaccines, etc).
How to implement FAIR principles to software
We need a set of principles for open science, not just for data, but it is difficult to fit software into the FAIR principles. The Findable and Accessible parts seem more relevant to software than Interoperable and Reusable. The ‘F’ for software is about discovery and being citable. To be discovered, software needs metadata that can be crawled publicly. Software is connected to a wider infrastructure, contains intellectual and scientific contributions, is dynamic and can be dangerous. Changing the software changes the nature of the object, so a persistent identifier needs to be linked to a specific version of the software. Is a dependency between the data and the software more important than the software’s quality? If there is a dependency, the software needs to be preserved even if it’s of poor quality. One suggestion for a new acronym that encompasses the main principles for software is TRIFID – Testable, Robust, Installable, Findable, Identifiable, Documented.
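To make the point about version-specific identifiers and crawlable metadata more concrete, here is a minimal sketch in Python. It was not presented at the workshop. The field names, the `SoftwareRelease` class, the helper functions and all example values (tool name, identifiers, URL) are assumptions made for illustration and do not follow any particular metadata standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SoftwareRelease:
    """One immutable release of a piece of research software (hypothetical record)."""
    name: str
    version: str
    identifier: str  # persistent identifier minted for this version only
    repository: str
    licence: str


def to_public_metadata(release: SoftwareRelease) -> str:
    """Serialise the record as JSON that could be published for crawlers to read."""
    return json.dumps(asdict(release), indent=2, sort_keys=True)


def identifiers_are_version_specific(releases: list[SoftwareRelease]) -> bool:
    """Return True when no two releases share the same persistent identifier."""
    ids = [r.identifier for r in releases]
    return len(ids) == len(set(ids))


# Placeholder values only: these identifiers and the URL are not real.
v1 = SoftwareRelease("example-tool", "1.0.0", "doi:10.0000/example.1",
                     "https://example.org/example-tool", "MIT")
v2 = SoftwareRelease("example-tool", "1.1.0", "doi:10.0000/example.2",
                     "https://example.org/example-tool", "MIT")

print(to_public_metadata(v1))
print(identifiers_are_version_specific([v1, v2]))
```

The record is frozen because a released version should not change once it has an identifier. A new version gets a new record and a new identifier rather than an edit to the old one.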
Involving both science and the cultural sector
Prior to this discussion there were presentations on 1) the Software Heritage archive, which is preserving software source code for present and future generations; 2) the Digital Heritage Network, which is looking at software sustainability in the digital heritage sector; and 3) the Commons Conservancy, which aims to help make software retrievable and persistent across different “technical” homes.
These led neatly into the discussion on sharing between the sciences and the cultural sector. There is a need for platforms where expertise can be bundled, shared and reinforced. One group asked “how can we become better librarians for software?” To do this we need to build a consensus on a classification system for the subject area, ensure LOCKSS can be extended for software, communicate to software developers the importance of having good metadata for discoverability, agree to use software identifiers to understand “lineage” and “importance”, and understand how to have a sound legal base for preservation and future reuse of software in archives. For the librarianship of software we need to reach out to the museum, library and archive communities.
In relation to the Software Heritage archive, could crowdsourcing be used to get some of the early source code that’s not available? What components should be on top of the Software Heritage archive? What are the stimulants for doing this? The barriers to putting software into the archive include a lack of incentive, benefits that are not always clear, and uncertainty over who should actually do it. Hackathons are one way this could be done. There is an overlap between SSI initiatives and the cultural sector.
Setting up a European Software Sustainability Infrastructure
Each group tried to answer the following questions, and a summary of the answers follows each question.
- Will the value added of sharing knowledge on Software Sustainability be significant enough to justify an (EU) infrastructure?
Yes. Aim for a global SSI as the EU name might be a hindrance later on. Signals importance of software sustainability.
- What should an ESSI encompass in terms of physical and service elements?
There were lots of suggestions, including that a GitHub-like function as a versioning system is essential. Raising awareness. Provide concrete services or community-driven expertise. The idea is for a scoping study, so it is better to have more ideas than fewer. Who will be the users? It should be neutral, transparent, academic, distributed and public. Develop expertise. Many more soft service elements than hard ones. People-based services rather than physical ones. Lobbying for policy change and improvement.
- What should the relation be with EUDAT?
Services such as B2SHARE, B2SAFE and B2FIND could be extended to include software. Could recommend the use of EUDAT services for software preservation.
- What would be the best organisational structure? Frequency of meetings, formal or informal, legal entity?
The EGI model was mentioned but has problems. There was no conclusion other than to keep an open mind about different alternatives. Structures are needed to support best practice, and they should be flexible enough to enable different approaches at different points in the roadmap for best practice. There is value in having a single point of contact, combined with locally based expertise to help and guide you in the right direction.
- What type of parties should be involved?
Not excluding any organisation. Later stages could include commercial organisations.
Setting up a Software Seal of Approval
Software is a live object and changes over time, whereas data is more fixed. It evolves from prototype to production-ready, and this evolution can take a long time. Funders need to look at promising projects and not just mature software. The key message is to find a way to help funders help projects move through their different phases, and any label should be re-evaluated on a regular basis. A seal should not stifle innovation. It needs to be kept simple, especially for research funders. Levels should be spread widely to be as inclusive as possible. Pay a lot of attention to the wording to make the guidelines generic across communities, but don’t change their meaning.
There needs to be accreditation of organisations that give out seals of approval. Should the seal be for the producer or the software? One suggestion was a seal for a producer and a certificate for a software release. Team (group) identifiers are needed to make this work. Peer review would provide additional trust. Minimum guidelines for all projects could include having a DMP, a licence, a deposited copy and a persistent identifier. These are focussed on the ‘F’ and ‘A’ in FAIR.
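As a rough illustration of how such minimum guidelines might be checked, here is a short Python sketch. The four items come from the discussion above. The key names, the `missing_guidelines` function and the example project are hypothetical and do not represent an agreed scheme.

```python
# Hypothetical key names for the four minimum guidelines discussed above.
MINIMUM_GUIDELINES = ("dmp", "licence", "deposited", "persistent_identifier")


def missing_guidelines(project: dict) -> list[str]:
    """List the minimum guidelines that a project record does not yet satisfy.

    An item counts as satisfied when its value is truthy, for example
    True, a licence name, or an identifier string.
    """
    return [item for item in MINIMUM_GUIDELINES if not project.get(item)]


# Made-up project record for illustration.
project = {
    "dmp": True,
    "licence": "Apache-2.0",
    "deposited": False,
    "persistent_identifier": None,
}

print(missing_guidelines(project))  # prints ['deposited', 'persistent_identifier']
```

A check like this only covers the presence of each item. It says nothing about quality, which is where peer review would add trust.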
One group asked whether it is the software or the sustainability that is being approved. A software seal is more about maintaining trust and functionality/usability than sustainability. Guarantees of sustainability don’t exist; there are only chances of it. An SSA is more a set of good practices that help achieve sustainability. Citations to software are an indication of community use. Indicators would be how easy the software is to use and how easy it is to find. The value of software should make it more likely to receive funding. You get trust and usage through getting a badge, but will development teams bother, and who will give the badge?
As well as the main topics for discussion the workshop had an objective to come up with some clear pointers as to what should be done next. These are (ordered by priority):
- Develop a “FAIR”-like manifesto
- Community engagement for the “history of software”
- Engage museum, library and archive communities on the curation of software
- Assess seals, badges or guidelines
- “International” SSI scoping study
A follow-up workshop is planned within 6 months to 1 year and could focus on one of the above topics allowing those interested to be involved. The discussion will continue at the next RDA plenary where there is a BoF on “Software Source code: Sharing, Preservation and Reproducibility”.
Finally, a mailing list will be set up and the documentation from this workshop will be shared on Google Docs or GitHub. If you’re interested in the area of software sustainability then it’s worth keeping an eye on the outputs and future work of this group, or actively joining the discussion.