New JISCMRD Projects: Innovative Data Publications - Research infrastructure and data

As promised, here is a second post providing details of new projects in the JISC Managing Research Data Programme. Today's post covers innovative data publications as a way of increasing the recognition and reward that researchers receive for publishing and sharing data. Coming soon: projects creating discipline-focussed training materials in research data management.

As well as seeking to demonstrate ways in which research data management in UK Universities can be improved, the JISCMRD programme seeks to demonstrate the benefits of making research data as openly available as possible.

JISC Call for Projects 14/09 Strand B sought to fund projects to explore and pilot innovative technical and organizational models for enhanced research data publications. The intended outcomes of these projects, therefore, would be examples of publications that encourage the open publication of research data linked to scholarly articles, thereby stimulating better management, more open sharing and easier reuse of research data.

The Call took as its premise that in order to improve the way digital research data is managed, preserved and made available for reuse, the status of data as a valuable research output needs to be enhanced. The mechanisms for recognition and reward for research effort revolve around publications, and there is a need for data publication to be closely integrated with the existing cycle and process. The vision, therefore, is of scholarly, peer-reviewed, online publications that champion the sharing of data as a core output of research activity, by exposing or linking to reusable research data held in an associated repository or archive.

Although some technical issues are at play, it is perhaps even more important that organizational issues (e.g. partnerships between publications/journals and data archives/repositories) should be addressed. Proposals were asked to demonstrate clearly how the proposed pilot would involve appropriate partnerships likely to ensure that the pilot data publication model gains support from the research community and stakeholders targeted. Projects are also required to explore and present business models for the long-term sustainability of the data publication proposed.

Two projects were funded, representing an investment of nearly £320,000. Both projects run for a year from August 2010. The project summaries that follow are drawn from the project proposals. The contact details provided are all from the ac.uk domain.

University of Cambridge (with the International Union of Crystallography, BioMed Central, Open Knowledge Foundation)

XYZ Project

The premise of the XYZ Project is that in order for research data to receive the recognition it deserves, the data underpinning a scholarly article must also be considered and assessed in the peer-review process. The project will establish a technical mechanism and human process for publication-ready data to be submitted at the moment of peer review and to be tested alongside the conclusions presented in the associated article. The project will build on outputs from previous JISC Projects (the CLARION repository and the EmMa Embargo Management tool), and will collaborate with the International Union of Crystallography and BioMed Central to test and pilot the process for a new data publication.

In XYZ, partners at the University of Cambridge, together with two OA publishers (the IUCr and BioMed Central) and the Open Knowledge Foundation, will create a software and organisational demonstrator of a new workflow for the journal article publication process. It requires supporting information (“data”) to be validated and packaged for submission at the same time as the manuscript. This workflow will then create both a cross-subject data journal and a data-rich journal publication.

The project’s central contention is that making the deposit of supporting information a condition of acceptance (rather than publication) of journal articles is the single most effective step that can be taken to promote Open Data. Good curation, security and attractive presentation of data are also important to academics, and we will address these in the demonstrator.

By building the demonstrator in terms of roles and responsibilities rather than specific organisations, in a realistic organisational context, and by maintaining our commitment to high-quality Open Source software, the project intends to maximize the potential for sustainability of the demonstrator itself, and the likelihood that this work can be taken forward in the future.

The project will create a demonstrator of a new workflow for publishing data in support of full-text. The workflow will require supporting data to be submitted to a third-party trusted repository before the paper is submitted to the publisher. The XYZ software will manage the deposition, release to reviewers, embargo and dis-embargo. The process will be applicable to conventional publications as well as to data journals.

The project considers it evident that there is a growing demand for the early and comprehensive publication of data supporting publications in scholarly journals. The two principal drivers of this are 1) the need for experimental verification and falsifiability; and 2) the potential for reuse, recombination and large-scale analysis by other scientists. The project will therefore address two key areas of ‘resistance’ that have been identified: recognition and effort.

In terms of the extra effort required to make data ‘publishable’, the project assumes that the ‘activation energy’ for an author to release data at the time of manuscript preparation is much lower than that for finding and releasing the data after acceptance. Requiring the manuscript and the data at the same time ensures that the work involved in releasing data is done when the scientist is fully engaged with the excitement of the dataset.

The project will work with publishers who publish Open Access CC-BY material (BMC and, for Acta Cryst E, the IUCr). These publishers are eager to publish data for both reuse and validation, but do not wish to incur major new costs on technology development and ramp-up.

The project will deliver:

1) A report (and update) on requirements for the workflow and collaboration.

2) The workflow and data journal, built on existing software and infrastructure: the project will extend the CLARION repository (authentication and authorisation, use of DOIs) and the EmMa Embargo Management tool, and will extend the Crystaleye data visualisation software and make it more robust, so that it is conformant with DOIs and DataCite and can form the centrepiece of the data journal. The project will produce a demonstrator data-overlay journal using existing data from the CLARION repository.

3) Dissemination of the project's approach by means of editorials in IUCr and BMC publications, and at the IUCr spring meeting.

4) A business plan, laying out the model for long-term, sustainable hosting by the IUCr.

Project Website: http://projectxyz.wordpress.com/

Project Director: Peter Murray-Rust pm286 at cam

Project Manager: Brian Brooks bjb45 at cam

University of Oxford (with the British Library, Digital Curation Centre, Charles Beagrie Ltd)

Dryad-UK

The Dryad-UK Project will work closely with an existing US initiative that has established an open access data repository providing long-term access to research data underpinning articles in a range of scholarly journals in the field of evolutionary biology. Among the interesting features of the Dryad model is an editorial policy that currently recommends – and from January 2011 will mandate – submission of supporting data to the Dryad repository. As well as forming a mirror for the existing repository, Dryad-UK will expand the range of participating journals to include the area of epidemiology and infectious diseases. Work will also be undertaken to facilitate better citation of the datasets and to develop a robust business model for Dryad-UK.

The JISC Dryad-UK Project aims to undertake preparative work for the establishment of Dryad as a persistent international repository for bioscience research datasets, linked to the peer-reviewed journal articles they underpin. In particular, the project will work directly with the NSF-funded Dryad project that has pioneered this innovative dataset archiving approach, and the international DataCite Association that provides identifiers and services for dataset citation, to archive and publish high-value research datasets that lack natural homes in existing bioinformatics databases.

To achieve that aim, the Dryad-UK project has six immediate goals:

1) To create a UK mirror of the Dryad repository, under the aegis of the British Library (BL), to provide additional dataset security, and to yield data for the sustainability business plan.
2) To develop a sustainability business plan for the legal, organizational and financial structure of Dryad, which is currently in a critical transition phase from prototype to production service, enabling it to become a sustainable ongoing international not-for-profit organization that can ensure long-term preservation and accessibility of its data holdings.
3) To expand the range of journals submitting datasets to Dryad, particularly in infectious disease and epidemiology, working with academic publishers, journals and learned societies. We have chosen to target infectious diseases and epidemiology for expansion, since timely access to reliable research data here might bring real benefits in terms of global health.
4) To facilitate data citation, by assigning DOIs to Dryad datasets, by developing BiDO, a Biological Data Ontology, and enhanced metadata standards for describing Dryad datasets, and by publishing as Linked Open Data the citations between Dryad datasets and journal articles.
5) To evaluate the usefulness of Dryad data publication to the scientific community.
6) To show how Dryad can benefit and support HEIs and funding agencies, by allowing them automatically to harvest metadata concerning datasets published by their researchers.

The 2007 Brussels declaration on STM publishing stated ‘data that are submitted with a paper to a journal should wherever possible be made freely accessible to other scholars.’ Accordingly, data submitted to the Dryad repository are made freely available under the non-restrictive terms of the CCZero data licence.

A digital data library such as Dryad can provide numerous advantages over conventional supplementary information attached to journal articles, including economies of scale, quality curation, standardised metadata, long-term preservation, disciplinary coherence, citability, improved discoverability and standardised content-delivery mechanisms.

The project will deliver:

1a) A mirror of the Dryad digital data library;

1b) A report providing technical guidance for a distributed international network with the aim of failsafe data preservation, including an investigation of cloud hosting;

1c) Plans and requirements for permanent hosting.

2) An expanded range of journals submitting to Dryad. Dryad will work initially with a range of major academic publishers, including BioMed Central, Elsevier, Nature, OUP, PLoS, and Wiley.

3) An MoU and other agreements and contracts to define the partnerships; a sustainability business plan.

4) Interoperability with publishers and repositories by means of manuscript submission platforms, data deposition mechanisms for specialist repositories and metadata exchange with institutional repositories and agencies.

5) Metadata standards for data annotation, deposition and citation, by means of i) providing metadata and data citations as Linked Open Data; ii) use of MIIDI and ISA-Creator for annotation of Dryad datasets; iii) creation of BiDO for data description and citation.

6) Evaluation work, comprising i) a formative assessment framework for future independent evaluations; ii) publications on the value of Dryad datasets to users and the scientific community; iii) a planning report allowing Dryad to project the rate of new data submissions.

7) Dissemination by means of two workshops, hosted by the British Library and the Digital Curation Centre.

Project Director: David Shotton david.shotton at zoo.ox