‘Data is now Mainstream’: Research Data Projects at OR2012 (Part 1…)
It has been remarked in a number of places that OR2012 saw the arrival of research data in the repository world.
In his account of his Edinburgh experience, Peter Sefton observed that we are now talking about Research Data Repositories, not just Institutional Publication Repositories. Angus Whyte has described this as bringing data into the open repository fold. It has also been remarked that research data management can make ‘a significant contribution to an institution’s research performance’, but only if services are based upon a robust understanding of researchers’ needs.
Using a wordle of #or2012 tweets in his closing summary, Peter Burnhill noted that ‘Data is the big arrival. There is a sense in which data is now mainstream.’ (See Peter’s summary on the OR2012 YouTube channel.) Peter also remarked on the presence of #jiscmrd, the tag for JISC’s Managing Research Data programme, in the word cloud! A number of the current JISCMRD projects gave presentations about their work over the week.
None of these observations should be a surprise. Research data has been climbing the agenda, pushed by a variety of well-known imperatives. So what was significant in the various presentations on research data issues at OR2012?
Cathy Pink, Research 360 Project, University of Bath
In the DCC workshop, Cathy Pink addressed the question of why universities should be engaging with research data and, in dealing with the ‘how’, considered important use cases at Bath. The drivers come in large part from funder policies, themselves reflecting public good principles about research integrity and making the most of research investment – this is well known. At Bath, as elsewhere, these policies coincide with the university’s interest in better managing its research activity and outputs more generally. For this reason, as RDM solutions are developed and implemented (the project is evaluating Sakai and DataStage during the project, and ePrints and DataBank post project), they must be integrated with the CRIS and the publications repository. These relationships are important because of the need to link research inputs to research outputs, and publications to research data. As the name suggests, the Bath project aims to ensure that various aspects of research management and research data management are joined up across the lifecycle. This is a sizeable challenge, and it is well worth taking a look at Cathy’s presentation to see the range of questions being considered.
The Research 360 Project has been directly involved in the development of Bath’s EPSRC Roadmap. Like other institutions, to meet EPSRC’s expectations, Bath is developing a catalogue of research data holdings. Cathy stressed that Bath relies particularly heavily on commercial and industrial partnerships – this focus is written into the university’s charter – and therefore the challenge of managing commercial confidentiality is pressing. It is worth stressing that it is precisely because commercial and sensitive data are concerned that the University considers it important to have in place a robust data management infrastructure. Ideally, the data catalogue will list all data assets, even where these are embargoed – but it is possible that commercial partners may require the metadata also to be restricted. Cathy also raised the interesting question of whether DOIs should be assigned to embargoed data. This would depend at least on whether the minimal metadata required by DataCite was considered sensitive or not, but there may be other considerations…
Chris Awre, History DMP Project, University of Hull
Hull has been using the institutional Hydra repository to curate and publish datasets for some time. Inter alia, this includes curating a collection of datasets for the History of Marine Animal Populations (HMAP) initiative. For an international initiative, a collaboration which may rely on a series of project grants, in a research area where there may not be an established and appropriate national or international data archive, an institutional repository thus provides an important service. Moreover, the University of Hull has particular expertise in various aspects of Maritime History, including the preparation, collation and other processing of datasets forming part of the HMAP Project. The repository has also curated data from the University’s projects around the Domesday Book. This was an AHRC-funded project and the data was deposited with the History Data Service. However, the project lead wanted the data to be available to the general public without the need for a login or registration.
With the demise of the AHDS, the History Data Service collection policy has narrowed in scope. In such cases, where international and national services (Tier One and Tier Two) do not exist, institutional repositories clearly have a role to play as sustainable, trusted repositories (see my previous post for arguments around the role of institutional repositories in the curation hierarchy). One can see the attraction for leading departments of developing research and data expertise in parallel, partnering with the institutional repository service to ensure that key data assets are published and preserved for the long term, as with these example datasets at Hull. The History DMP project built on this partnership and existing expertise to understand better the data management needs of researchers in the department of history, and to prepare a departmental data management plan and an adaptation of the DCC’s DMP checklist for the needs of historians.
Cathy and Chris were joined in the DCC workshop by Sally Rumsey from the University of Oxford. Sally gave two presentations during the week at OR2012. In the DCC workshop she asked what counts as ‘Just Enough Metadata’: what is sufficient metadata from the perspective of an institutional repository and data service? Metadata was also an important theme in her presentation in the general Research Data Management session, in which she described the JISCMRD Data Management Rollout Project… of which more soon…