Pulling it all Together: Research Data Projects at OR2012 (Part 2…)
Continuing a series of posts on research data issues at OR2012…
Another theme over the week, picked up in Peter Burnhill’s closing summary, was the importance of ‘linking research inputs and outputs’. A number of JISC Managing Research Data projects are taking a holistic view, seeking to ensure the joined-up exchange of information between research information management systems, institutional repositories and research data archives and catalogues. This was stressed in Cathy Pink’s presentation on the Research 360 project. And it was given forceful expression in Sally Rumsey’s account of Oxford’s Data Management Rollout project and the DataFinder system which will play an important role in ‘pulling it all together’.
Sally Rumsey, DaMaRO Project, University of Oxford
Sally’s presentation can be downloaded from the OR2012 website; a video of the second research data session is available on the OR2012 YouTube channel.
In her presentation on the DaMaRO project, Sally underlined the need for a cross-university, multi-stakeholder approach to addressing the RDM challenge. The DaMaRO project has four components: fostering the adoption of RDM policies and strategies; design and provision of training, support and guidance; technical development and implementation of an RDM infrastructure; and preparation of a business plan for sustainability.
Development of RDM policies, training materials and a strategy and plan for sustainability form overarching activities. The RDM infrastructure itself is to have three components, corresponding to phases in the research data lifecycle:
- Data creation and management of active data: this will be done locally in departments and research groups, taking advantage of the DataStage and VIDaaS platforms as well as other bespoke tools.
- Archival storage and curation: provided centrally, using the DataBank repository, as well as a software store, but also drawing on community, national and international infrastructure (for storage and curation) where this exists.
- Data discovery and dissemination: provided principally by the Oxford DataFinder.
Sally described DataFinder, which is being developed by the DaMaRO Project, as the keystone of Oxford’s Research Data Infrastructure. Alternatively, DataFinder could be described as the connective tissue and nerves, linking all the other elements of the RDM infrastructure together. To the researcher DataFinder will provide a catalogue of research data produced by Oxford projects, whether these are internally or externally funded, whether the data is held in the Oxford DataBank or elsewhere. It is a platform for both discovery and dissemination of research outputs and assets. DataFinder provides a mechanism for assigning DOIs to ensure proper identification and encourage appropriate citation. This service will ensure that Oxford can be seen to comply with funder requirements, by interoperating with research management systems to show what data assets have been generated from which grant and whether they are available for further analysis or reuse.
To this end, a three-tier metadata approach is envisaged, comprising:
- a minimal mandatory metadata set providing core information (this starts with the DataCite kernel but includes other fields such as location, access terms and conditions and any embargo information).
- a second mandatory layer with ‘contextual’, administrative information (ideally, much of this will be automatically harvested, or passed on by administrative systems).
- and finally optional metadata (the rich, specific and discipline related information required for reuse).
The Oxford project has currently identified 12 fields for the set of minimal mandatory metadata which will be regarded as the bare minimum to be provided in relation to any research dataset. The contextual metadata will include any information mandated by the funder, for example, the identity of the funder, the name of the project or programme and the grant number or identifier.
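To make the three tiers a little more concrete, here is a minimal sketch in Python of how a catalogue record might separate them and report which mandatory fields are still missing. It is purely illustrative and not DataFinder's actual schema: the presentation did not list the 12 core fields, so every field name below is an assumption. The first five core names follow the mandatory properties of the DataCite kernel as I understand them; the others are drawn loosely from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Tier 1 (assumed names): DataCite-style core properties plus the
# location and access terms mentioned in the talk. Not Oxford's real list.
CORE_FIELDS = [
    "identifier",
    "creator",
    "title",
    "publisher",
    "publication_year",
    "location",
    "access_terms",
]

# Tier 2 (assumed names): contextual, administrative information that
# would ideally be harvested from research management systems.
CONTEXTUAL_FIELDS = ["funder", "project_name", "grant_id"]


@dataclass
class DatasetRecord:
    """A hypothetical three-tier metadata record for one dataset."""

    core: Dict[str, str] = field(default_factory=dict)
    contextual: Dict[str, str] = field(default_factory=dict)
    # Tier 3: free-form, discipline-specific metadata; never required.
    optional: Dict[str, str] = field(default_factory=dict)

    def missing_mandatory(self) -> List[str]:
        """Return the mandatory fields that are absent or empty."""
        missing = [f"core.{n}" for n in CORE_FIELDS if not self.core.get(n)]
        missing += [
            f"contextual.{n}"
            for n in CONTEXTUAL_FIELDS
            if not self.contextual.get(n)
        ]
        return missing

    def is_complete(self) -> bool:
        """True when both mandatory tiers are fully populated."""
        return not self.missing_mandatory()


# Example record with invented values; access terms and grant id are absent.
record = DatasetRecord(
    core={
        "identifier": "doi:10.9999/example",
        "creator": "A. Researcher",
        "title": "Example survey data",
        "publisher": "Example University",
        "publication_year": "2012",
        "location": "DataBank",
    },
    contextual={"funder": "Example Funder", "project_name": "Example Project"},
    optional={"instrument": "online questionnaire"},
)
print(record.missing_mandatory())
```

Keeping the tiers as separate dictionaries reflects the point that they are expected to come from different sources: the depositor supplies the core set, administrative systems pass on the contextual layer, and the optional layer can vary freely by discipline without affecting the completeness check.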
There seems to be a growing level of consensus among JISCMRD projects and elsewhere around the broad contours of such a three-tier metadata approach. I intend to revisit this in a future post to understand how much alignment there is in the detail of the first two layers of this broadly accepted structure.
Sally repeatedly stressed the size of the task ahead. DaMaRO performs an important role by pulling together a number of previous initiatives. However, it is not realistic to think that this will be a complete RDM service. By project end, in March 2013, Oxford will have a policy and a plan for sustainability; a body of training materials for researchers and research support staff; and two core services run by the Bodleian, DataBank and DataFinder. Significant progress, to be sure, but foundations nonetheless. Sustainability and cost recovery, for example, are significant challenges. It will be necessary, of course, to recover costs against research grants – and Sally stressed the need for transparency in this regard, ideally a hypothecated line in grant proposals for RDM infrastructure.
However, it must be recognised that a significant amount of research – producing important data outputs – is conducted at Oxford that is not externally funded. Like many other institutions, Oxford currently needs to examine very carefully how an infrastructure which provides a service for non-funded projects can be sustained when only partial cost-recovery from funded projects is possible.
The issue of how to fund a research data management infrastructure on a sustainable basis while only relying partially on cost-recovery from grant funded research projects is a matter of concern for all JISCMRD projects and all institutions, including Open Exeter…