Reflection on Data Policies and their Implementation - Research infrastructure and data

Steve Hitchcock (@stevehit) from Southampton’s DataPool Project has published an interesting blog post about that project’s progress in taking forward a research data management policy. DataPool’s three-pronged approach (addressing system, policy and training) was described in Steve’s first post. DataPool is pushing forward with the Southampton research data policy: if all goes well, the project hopes to have the policy approved by the University Executive Group and forwarded to Senate by March 2012. Steve shares some details and reflections about the policy, its process and its function, which are likely to be of interest to other JISCMRD projects:

The policy includes the policy document supported by a series of user guides to smooth implementation. It would be premature to describe the specifics of the policy here, although broadly it covers a researcher’s responsibilities, IPR, storage, retention, disposal and access, as well as setting out contextual issues such as purpose, objectives, and definitions. My viewpoint on reading the draft policy is to anticipate how a researcher might respond to it in terms of clarity of actions, options and consequences. In this respect it is noticeable how much the policy has improved through review and iterations. …

We do not expect the policy to be without issues when it comes to implementation, clearly, for an initiative of this scale, but the policy will give the DataPool Project the basis to investigate and resolve the issues, in terms of actions and answers. On current schedule, there should be a year for the project to work with this.

There is little prior art on institutional data policy, and one of the reasons JISC has funded DataPool is not just to help produce a data policy, but to inform other institutions on implementation. … Policy implementation, monitoring and ability to adapt are the real testing ground for this latest phase of research data management projects.

Steve also engages in an interesting discussion of what should be defined as research data. More precisely, perhaps, it is a discussion of which research data should be subject to the requirements of research data policies that they be retained and made available. To my mind there are two criteria. First, as Steve recognises, there is the need to retain those data which underpin, or provide evidence for, research findings as expressed in publications; that is, research data ‘concerned with the quality and reproducibility of results, the bedrock of scientific testability’. This is the most straightforward selection criterion.

Secondly, however, there is the criterion of ‘reusability’ or ‘re-usefulness’, as Mansur Darlington would have it. This covers data that may not necessarily underpin a publication (though it would be surprising if they didn’t) but for which there may be a value proposition in reuse in a number of ways.

There is likely to be considerable overlap between these two criteria, but we should allow, I think, a) that some data are collected as part of a corpus susceptible to multiple analyses (time-dependent observational data, archaeological digs, social surveys, etc.); and b) that many research projects will collect data that remain, somehow or other, on the cutting room floor when publications are being prepared, but may have value in the longer term.

These are very general and abstract comments. I would be very keen to understand whether JISCMRD projects are reaching a more concrete and ‘empirical’ definition of a) what researchers understand by research data, and b) which research data the project and institution conclude should be subject to retention and availability policies.