Software Reuse, Repurposing and Reproducibility in 2016: all talk… and some DataCite action

In this guest blog post, Catherine Jones talks about the Software Reuse, Repurposing and Reproducibility project and the progress she and the team have made in improving DataCite Digital Object Identifiers (DOIs) for research software. The project was funded by our research data spring initiative from April to November 2015. It was led by Ian Gent (St Andrews) and Catherine Jones (STFC), who were joined by Jonathan Tedds (University of Leicester) in the second phase.

Identifying research software, persistently

The Software Reuse, Repurposing and Reproducibility project has not changed the world, but it has changed the DataCite schema! In our Guidelines for Persistently Identifying Software, we recommended that the DescriptionType property should include a new value, “TechnicalInfo”, to encourage the inclusion of more explicit technical information about software when DataCite DOIs are generated. This change has now been adopted and will be incorporated in the forthcoming Version 4. It will aid the discovery, reuse and repurposing of software by clearly signposting the technical content.

Whilst the value of “Other” is comprehensive, it is not targeted at specific technical information. In Version 4 of the DataCite schema, one will be able to add a clearly labelled “TechnicalInfo” description. Often, simple amendments in wording can change behaviour, and in this case the name of the value makes clear what information needs to be added, which makes it more likely that the metadata record will be used effectively. This schema change should make software easier to locate and make the environment needed to run it easier to understand. This would not have been possible without support from Jisc through the research data spring initiative. I have had productive discussions with GitHub and Zenodo about using this new value instead of the current “DescriptionType” of “Other” once the schema is fully released.
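As a rough illustration of what such a record might look like, the Python sketch below builds a DataCite-style descriptions block containing one “Abstract” entry and one “TechnicalInfo” entry. It is a minimal sketch rather than a complete or validated DataCite record: the function name build_descriptions and the example text about the software are invented for illustration, and the element names and namespace are assumed to follow the Version 4 kernel schema.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the DataCite kernel-4 metadata schema.
DATACITE_NS = "http://datacite.org/schema/kernel-4"


def build_descriptions(abstract: str, technical_info: str) -> ET.Element:
    """Return a <descriptions> element with an Abstract and a TechnicalInfo entry.

    Hypothetical helper written for this example; it is not part of any
    DataCite tooling.
    """
    # Serialise the DataCite namespace as the default namespace.
    ET.register_namespace("", DATACITE_NS)
    descriptions = ET.Element(f"{{{DATACITE_NS}}}descriptions")
    entries = (("Abstract", abstract), ("TechnicalInfo", technical_info))
    for description_type, text in entries:
        description = ET.SubElement(
            descriptions,
            f"{{{DATACITE_NS}}}description",
            {"descriptionType": description_type},
        )
        description.text = text
    return descriptions


# Example values are invented for illustration only.
element = build_descriptions(
    abstract="A hypothetical analysis package.",
    technical_info="Requires Python 3 on Linux; install from the source archive.",
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The point of the separate entry is that someone searching for software can go straight to the technical requirements, rather than hunting for them inside a general-purpose “Other” description.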

Research software underpins academic research. The project’s vision was to improve the discoverability of software. To achieve this, we considered the metadata needed to identify software persistently and effectively, versioning, and linking to things like the source code, versions in a container and other research artefacts.

Stakeholders and engagement around software reuse

The final report identified three key stakeholder groups in the adoption of the persistent identification of software. Software developers, researchers and digital preservation specialists all have a part to play in ensuring software is reusable, of good quality and findable. Throughout 2016 I have been out and about talking about software identification issues and raising awareness of the potential benefits at meetings and conferences that involved representatives of all three groups.

I have spoken to software developers and researchers at the Collaborative Computational Projects Steering Panel, the Software Sustainability Institute’s Collaborations Workshop, and the Reproducibility symposium at the Alan Turing Institute. The International Digital Curation Conference, the ESIP Data Stewardship Committee (USA) and the Repository Fringe also gave our team a great opportunity to share the lessons learned and the recommendations we have made. Matthew Dovey of Jisc talked about our outcomes and recommendations at the Knowledge Exchange workshop on software sustainability, and our work is included in the workshop report.

This September, you can find me at the Research Software Engineers Conference.


Catherine Jones

About the Author: Catherine Jones is the Software Engineering Group Leader in the Scientific Computing Department at STFC. She has a Computing and Communication Systems degree and is a chartered Librarian and a member of the British Computer Society. She has wide experience in providing information systems and services to the academic community, both within and external to STFC. These have varied from small-scale, single-user information retrieval systems to serving as the STFC JiscMail Director for a service with one million users. Her current Group provides software engineering management tools to academia through the Software Engineering Support Centre, data management pipelines to STFC Facilities, and publication and data repositories to STFC staff, and contributes to EC projects in these areas. She uses her software engineering and information management expertise to deliver effective services to user communities. Her personal research interests are the digital curation of software and data, and linking research outputs (data, publications and software).