Arguments in favour of research data sharing stress the need for verification and reproducibility. It is fundamental to the scientific method and to good research practice for other researchers to be able to test the evidence underpinning the hypotheses and interpretations presented in a given scholarly publication.
In recognition of this, a number of journals have recommended or mandated that research data be deposited in appropriate data repositories prior to publication. In parallel, a growing number of initiatives either explicitly link journal articles with the underlying data or may be characterised as data journals, championing the publication of research datasets with commentary, analysis and visualisation.
Technical, procedural and cultural challenges exist around the use of identifiers, the exchange of metadata, effective linking and data citation. There is also a need to establish sustainable partnerships between journals, data centres and research organisations to underpin innovative forms of data publication.
Innovative data publications are likely to provide researchers with recognition and reward for making datasets available, encouraging data to be viewed as a first-class research output and data publication to be considered an essential part of the scholarly process. Likewise, it seems likely that, as well as making it easier for researchers to locate and access datasets, linking between publications and supporting data will provide a means for established data centres, or even institutional data repositories, to enhance and draw attention to well-curated research outputs.
For partnerships around data publication to become established, there are important questions to be considered:
- What policies are required on the part of journals’ editorial boards to achieve greater levels of data sharing, citation and linkage between publications and datasets?
- What partnerships between journals, data centres and research organisations are necessary to establish sustainable solutions, and what business models are appropriate?
- How may the costs of long-term data archiving be met and appropriately distributed in models that stress the importance of publishing data and linking datasets to published outputs?
- What characterises a suitable repository, and what criteria of quality and assurance must the data archive underpinning such collaborations meet?
- What, if any, peer review of data is appropriate before publication?
The JISC Managing Research Data Programme 2011-13 has therefore funded two projects to design and implement innovative technical models and organisational partnerships to encourage and enable the publication of research data. These projects will also explore the questions listed above and thereby shed light on solutions that will enable data publication to develop further.
PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences
PREPARDE will capture the processes and procedures required to publish a scientific dataset, ranging from ingestion into a data repository through to formal publication in a data journal. It will also address key issues arising in the data publication paradigm: how does one peer review a dataset? What criteria are needed for a repository to be considered objectively trustworthy? And how can datasets and journal publications be effectively cross-linked for the benefit of the wider research community?
Project website: http://proj.badc.rl.ac.uk/preparde
PRIME: Publisher, Repository and Institutional Metadata Exchange
PRIME will enable the automated exchange of metadata between publishers, subject-based and institutional repositories. A partnership between UCL, the Archaeology Data Service and Ubiquity Press, a campus-based open access publisher located at UCL, PRIME will ensure that each stakeholder has a record of content relevant to them, even when the data itself is held elsewhere.
As previously noted, scholarly journals are increasingly recommending or requiring, as a condition of publication, that research data be made available in an appropriate repository. A service to collate and summarise journal research data policies would provide researchers, managers of research data services and other stakeholders with an easy source of reference for understanding the requirements and recommendations made by journal editorial boards with regard to data sharing. Such a service would be a useful information and advocacy tool for a variety of stakeholders in this area (including exponents of open data, research data infrastructure providers, institutional managers with responsibilities for research data management, etc.). It is also likely to provide a helpful incentive for the increasing systematisation and codification of such policies and for their more regular review.
JISC and other stakeholders need to understand precisely what is required in such a service and what business models are available to maintain a sustainable service, including a consideration of sources of funding and cost recovery.
The third project funded by the JISC Managing Research Data Programme in the area of data publication is a feasibility study for a service to collate and summarise journal data policies, which will consider requirements and present possible business models.
JoRD: Journal Research Data Policy Bank
The Journal Research Data Policy Bank (JoRD) project will conduct a feasibility study into the scope and shape of a sustainable service to collate and summarise journal policies on Research Data. The aim of this service will be to provide researchers, managers of research data and other stakeholders with an easy source of reference to understand and comply with Research Data policies.
Project website: http://crc.nottingham.ac.uk/projects/jord.php
Continuing a series of posts on research data issues at OR2012…
Another theme over the week, picked up in Peter Burnhill’s closing summary, was the importance of ‘linking research inputs and outputs’. A number of JISC Managing Research Data projects are taking a holistic view, seeking to ensure the joined-up exchange of information between research information management systems, institutional repositories and research data archives and catalogues. This was stressed in Cathy Pink’s presentation on the Research 360 project. And it was given forceful expression in Sally Rumsey’s account of Oxford’s Data Management Rollout project and the DataFinder system which will play an important role in ‘pulling it all together’.
Sally Rumsey, DaMaRO Project, University of Oxford
Sally’s presentation can be downloaded from the OR2012 website; a video of the second research data session is available on the OR2012 YouTube channel.
In her presentation on the DaMaRO project, Sally underlined the need for a cross-university, multi-stakeholder approach to addressing the RDM challenge. The DaMaRO project has four components: fostering the adoption of RDM policies and strategies; design and provision of training, support and guidance; technical development and implementation of an RDM infrastructure; and preparation of a business plan for sustainability.
Development of RDM policies, training materials and a strategy and plan for sustainability form overarching activities. The RDM infrastructure itself is to have three components, corresponding to phases in the research data lifecycle:
- Data creation and management of active data: this will be done locally in departments and research groups, taking advantage of the DataStage and VIDaaS platforms as well as other bespoke tools.
- Archival storage and curation: provided centrally, using the DataBank repository, as well as a software store, but also drawing on community, national and international infrastructure (for storage and curation) where this exists.
- Data discovery and dissemination: provided principally by the Oxford DataFinder.
Sally described DataFinder, which is being developed by the DaMaRO Project, as the keystone of Oxford’s Research Data Infrastructure. Alternatively, DataFinder could be described as the connective tissue and nerves, linking all the other elements of the RDM infrastructure together. To the researcher, DataFinder will provide a catalogue of research data produced by Oxford projects, whether these are internally or externally funded, and whether the data is held in the Oxford DataBank or elsewhere. It is a platform for both discovery and dissemination of research outputs and assets. DataFinder provides a mechanism for assigning DOIs to ensure proper identification and encourage appropriate citation. This service will ensure that Oxford can be seen to comply with funder requirements, by interoperating with research management systems to show what data assets have been generated from which grant and whether they are available for further analysis or reuse.
To this end, a three tier metadata approach is envisaged, comprising:
- a minimal mandatory metadata set providing core information (this starts with the DataCite kernel but includes other fields such as location, access terms and conditions and any embargo information).
- a second mandatory layer with ‘contextual’, administrative information (ideally, much of this will be automatically harvested, or passed on by administrative systems).
- and finally optional metadata (the rich, specific and discipline related information required for reuse).
The Oxford project has currently identified 12 fields for the set of minimal mandatory metadata which will be regarded as the bare minimum to be provided in relation to any research dataset. The contextual metadata will include any information mandated by the funder, for example, the identity of the funder, the name of the project or programme and the grant number or identifier.
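To make the three-tier structure concrete, the sketch below shows what a minimal mandatory record and a simple completeness check might look like. It is purely illustrative: the field names approximate the DataCite kernel plus the additional fields mentioned above (location, access conditions, embargo), and are not Oxford’s actual DataFinder schema.

```python
# Illustrative only: field names approximate the DataCite kernel plus the
# extra fields described in the post; NOT Oxford's actual DataFinder schema.

MINIMAL_MANDATORY = {
    "identifier",         # e.g. a DOI
    "creator",
    "title",
    "publisher",
    "publication_year",
    "location",           # where the data is actually held
    "access_conditions",  # terms of access, including any embargo
}

# Second tier: contextual, administrative fields (ideally harvested
# automatically from research management systems).
CONTEXTUAL = {"funder", "project_name", "grant_number"}

def validate(record: dict) -> list:
    """Return the names of any missing mandatory fields."""
    return sorted(MINIMAL_MANDATORY - record.keys())

record = {
    "identifier": "doi:10.0000/example",  # hypothetical DOI
    "creator": "A. Researcher",
    "title": "Example dataset",
    "publisher": "University of Oxford",
    "publication_year": 2012,
    "location": "Oxford DataBank",
    "access_conditions": "embargoed until 2014-01-01",
}

print(validate(record))  # no missing fields -> []
```

The third tier, rich discipline-specific metadata, is deliberately left open here, since by definition it cannot be captured in a fixed schema.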
There seems to be a growing level of consensus among JISCMRD projects and elsewhere around the broad contours of such a three tier metadata approach. I intend to revisit this in a future post to understand how much alignment there is in the detail of the first two layers of this broadly accepted structure.
Sally stressed repeatedly the size of the task ahead. DaMaRO performs an important role by pulling together a number of previous initiatives. However, it is not realistic to think that this will be a complete RDM service. By project end, in March 2013, Oxford will have a policy and a plan for sustainability; a body of training materials for researchers and research support staff; and two core services run by the Bodleian, DataBank and DataFinder. Significant progress, to be sure, but foundations nonetheless. Sustainability and cost recovery, for example, are significant challenges. It will be necessary, of course, to recover costs against research grants – and Sally urged the need for transparency in this regard, ideally a hypothecated line in grant proposals for RDM infrastructure.
However, it must be recognised that a significant amount of research – producing important data outputs – is conducted at Oxford that is not externally funded. Like many other institutions, Oxford currently needs to examine very carefully how an infrastructure which provides a service for non-funded projects can be sustained when only partial cost recovery from funded projects is possible.
The issue of how to fund a research data management infrastructure on a sustainable basis while only relying partially on cost-recovery from grant funded research projects is a matter of concern for all JISCMRD projects and all institutions, including Open Exeter…
It has been remarked in a number of places that OR 2012 saw the arrival of research data in the repository world.
In his account of his Edinburgh experience, Peter Sefton observed that we are now talking about Research Data Repositories, not just Institutional Publication Repositories, and Angus Whyte has described this as bringing data into the open repository fold. It has also been remarked that research data management can make ‘a significant contribution to an institution’s research performance’, but only if services are based upon a robust understanding of researchers’ needs.
Using a wordle of #or2012 tweets in his closing summary, Peter Burnhill noted that ‘Data is the big arrival. There is a sense in which data is now mainstream.’ (See Peter’s summary on the OR2012 YouTube Channel). Peter also remarked on the presence of #jiscmrd, the tag for JISC’s Managing Research Data programme, in the word cloud! A number of the current JISCMRD projects gave presentations about their work over the week.
None of these observations should be a surprise. Research data has been climbing the agenda, pushed by a variety of imperatives that are well known. So what was significant in the various presentations on research data issues at OR2012?
Cathy Pink, Research 360 Project, University of Bath
In the DCC workshop, Cathy Pink addressed the question of why universities should be engaging with research data and, in dealing with the ‘how’, considered important use cases at Bath. The drivers come in large part from funder policies, themselves reflecting public-good principles about research integrity and making the most of research investment – this is well known. At Bath, as elsewhere, these policies coincide with the university’s interest in better managing its research activity and outputs more generally. For this reason, as RDM solutions are developed and implemented (the project is evaluating Sakai and DataStage during the project, ePrints and DataBank post-project), they must be integrated with the CRIS and publications repository. These relationships are important because of the need to link research inputs to research outputs, publications to research data. As the name suggests, the Bath project aims to ensure that the various aspects of research management and research data management are joined up across the lifecycle. This is a sizeable challenge and it is well worth taking a look at Cathy’s presentation to consider the range of questions being addressed.
The Research 360 Project has been directly involved in the development of Bath’s EPSRC Roadmap. Like other institutions, to meet EPSRC’s expectations, Bath is developing a catalogue of research data holdings. Cathy stressed that Bath relies particularly heavily on commercial and industrial partnerships – this focus is written into the university’s charter – and therefore the challenge of managing commercial confidentiality is pressing. It is worth stressing plainly: it is precisely because commercial and sensitive data are concerned that the University considers it important to have in place a robust data management infrastructure. Ideally, the data catalogue will list all data assets, even where these are embargoed – but it is possible that commercial partners may require the metadata also to be restricted. Cathy also raised the interesting question of whether DOIs should be assigned to embargoed data. This would depend at least on whether the minimal metadata required by DataCite was considered sensitive or not, but there may be other considerations…
Chris Awre, History DMP Project, University of Hull
Hull has been using the institutional Hydra repository to curate and publish datasets for some time. Inter alia, this includes curating a collection of datasets for the History of Marine Animals Populations (HMAP) initiative. For an international initiative, a collaboration which may rely on a series of project grants, in a research area where there may not be an established and appropriate national and international data archive, an institutional repository thus provides an important service. Moreover, the University of Hull has a particular expertise in various aspects of Maritime History. And this includes expertise in the preparation, collation and other processing of datasets forming part of the HMAP Project. The repository has also curated data from the University’s projects around the Domesday Book. This was an AHRC funded project and the data was deposited with the History Data Service. However, the project lead wanted the data to be available to the general public without the need for a login or registration.
With the demise of the AHDS, the History Data Service collection policy has narrowed in scope. In such cases, where international and national services (Tier One and Tier Two) do not exist, institutional repositories clearly have a role to play as sustainable, trusted repositories (see my previous post for arguments around the role of institutional repositories in the curation hierarchy). One can see the attraction for leading departments of developing research and data expertise in parallel, partnering with the institutional repository service to ensure that key data assets are published and preserved for the long term, as with these example datasets at Hull. The History DMP project built on this partnership and existing expertise to better understand the data management needs of researchers in the department of history, to prepare a departmental data management plan and to adapt the DCC’s DMP checklist to the needs of historians.
Cathy and Chris were joined in the DCC workshop by Sally Rumsey from the University of Oxford. Sally gave two presentations during the week at OR2012. In the DCC workshop she asked what ‘Just Enough Metadata’ is: what is sufficient metadata from the perspective of an institutional repository and data service? Metadata was also an important theme in her presentation in the general Research Data Management session, in which she described the JISCMRD Data Management Rollout Project… of which more soon…
Institutional Data Repositories and the Curation Hierarchy: reflections on the DCC-ICPSR workshop at OR2012 and the Royal Society’s Science as an Open Enterprise report
At Open Repositories 2012 the Digital Curation Centre and ICPSR (the Inter-university Consortium for Political and Social Research) organised a workshop to consider issues around the roles and responsibilities of institutional data repositories.
Graham Pryor opened the workshop with an overview of the DCC’s programme of institutional engagements. The workshop featured three presentations relating to projects in the JISC Managing Research Data Programme and other institutional activities (from Cathy Pink on the progress made by the University of Bath’s Research 360 Project; from Sally Rumsey on the work of Oxford’s DaMaRO Project to establish a schema for minimal mandatory metadata; and from Chris Awre on the use of Hull’s institutional repository for curating research data and the History DMP project): on these, more in a forthcoming post. There were also presentations from Ann Green and Jared Lyle of ICPSR on the results of a survey looking at how institutions might work with national data centres (and in what areas institutional repository managers would seek to develop greater expertise through such relationships) and from Gregg Gordon on SSRN, the Social Science Research Network, which has a growing interest in joining up research data assets as well as articles, pre-prints and grey literature.
Angus Whyte has already written a very useful and comprehensive overview of the workshop. I want to focus here on the relationship between emerging institutional research data services and more established national and international data archives. This theme was central to the workshop and was introduced in Graham Pryor’s presentation using the ‘Data Pyramid’ as described in the Royal Society’s Science as an Open Enterprise Report.
The ‘Data Pyramid’ suggests that currently the management and preservation of research data may be considered as happening in four tiers, forming a hierarchy of increased value and permanence.
The Data Pyramid – a hierarchy of rising value and permanence, taken from the Royal Society’s Science as an Open Enterprise report, p.60.
Tier One comprises major international resources such as the Worldwide Protein Data Bank. In Tier Two we find the national data centres such as the UK Data Archive and the British Atmospheric Data Centre. Universities’ institutional data repositories and research data services, such as those being piloted in the JISC Managing Research Data programme are found in Tier Three. And in Tier Four come the data collections of individual researchers or research groups: as likely as not these are unsystematically structured and described, reside on temporary storage, shared only with collaborators and not subject to any plan for longer term preservation. Of course, the data pyramid is a useful model, but it does not accurately describe the current state of affairs, largely because Tier Three is underdeveloped (and Tier Four forms a far broader base to the pyramid than any diagram can allow…)
Graham used the data pyramid to ask some searching questions of workshop participants:
- What responsibility should academic institutions have for supporting the data curation needs of their researchers?
- What responsibilities should academic institutions have for curating the data they produce?
- Should academic institutions engage with these questions only where there is no tier 1 or tier 2 service available?
A challenging and provocative way of putting the final question is to ask, as Angus did so neatly in his post, whether universities and other research institutions really want to be ‘lenders of last resort’, providing ‘a home for orphaned data to fill gaps left by national and international disciplinary repositories’?
These are extremely important and pertinent questions and discussions in the workshop were constructive for exploring these issues. Short of proposing precise answers to these questions, what I would like to do here is reconsider the data pyramid and note arguments raised in the Royal Society report as a way of discussing the role, responsibility and purpose of institutional research data services in relation to national and international data archives and collections.
It should be recognised from the start that the tiers of the curation hierarchy (Tiers One to Four) should not be considered impermeable or entirely independent. The challenge with which we are faced, in my view, is ensuring that the greatest amount of research data rises up the pyramid to the greatest degree possible and appropriate for the data in question. In particular, this means encouraging potentially valuable and reusable research data to be unlocked from the disaggregated storage and scarcely managed collections that characterise ‘Tier Four’. This objective points to the need for collaboration and coordination between institutional, national and international data services in a number of areas:
- to capture, preserve and then bring together dispersed datasets, adding value through discovery, curation and analysis functions when a critical mass is achieved;
- to promote a research culture that encourages the curation and preservation of research data;
- to help develop and cultivate the skills and services which enable these steps to happen.
In the OR2012 workshop, Ann Green and Jared Lyle made similar points neatly. They recalled Chris Rusbridge’s argument that digital preservation ‘is like a relay race, with different parties taking responsibility for a limited period and then “passing the baton”’, in order to show how partnerships between institutions and data centres may be helpful to universities seeking to offer services for the long-term preservation of selected research data. Ann and Jared also cited the recommendation from a 2007 report that domain-specific archives should partner with institutional repositories ‘to provide expertise, tools, guidelines, and best practices to the research communities they serve’. Data centres like the UKDA and BADC are important centres of expertise, with already impressive outreach activities. Nevertheless, anything that can be done to amplify such work and to build up specific partnerships for the propagation of expertise and skills should be encouraged.
Tier Three services in universities have an extremely important role to play in a joined-up research data ecosystem. At present the gulf is cavernous between the relatively small proportion of research data that is preserved in national and international data services (Tier One and Two) and the vast amounts of research data that are of significance and value for verification and reuse, but are effectively lost (in Tier Four).
The Royal Society report recognises this and is unequivocal in its view that institutional research data services (Tier Three) need to be developed and that this is an area of ‘particular concern … in the [curation] hierarchy’ [Science as an Open Enterprise, p.63]. The reason for this is the crucial role of research institutions in propagating the skills, culture and policies which are necessary to respond to the growing imperative to make the most of research data assets. It is by means of institutional policies and services that research data currently lost and inaccessible in the individual collections that form the base of the data pyramid can be made available and reusable.
Much important data, with considerable reuse potential, is also lost, particularly when researchers retire or move institution. This report suggests that institutions should create and implement strategies designed to avoid this loss. Ideally data that has been selected as having potential value, but for which there is no Tier 1 or Tier 2 database available, and which can no longer be maintained by scientists actively using the data, should be curated at institutional (Tier 3) level. [Science as an Open Enterprise, p.63]
Institutional data services can form an elevator by means of which important data collections may emerge from the temporary and inaccessible storage of Tier 4. The Science as an Open Enterprise report makes the point that coherent, more highly curated datasets, answering very specific research questions, will emerge from the heterogeneous collections of services like the Dryad data repository or institutional data repositories.
Data collections often migrate between the tiers particularly when data from a small community of users become of wider significance and move up the hierarchy or are absorbed by other databases. The catch-all life sciences database, Dryad, acts as a repository for data that needs to be accessible as part of journals’ data policies. This has led to large collections in research areas that have no previous repository for data. When a critical mass is reached, groups of researchers sometimes decide to spin out a more highly curated specialised database. [Science as an Open Enterprise, p.62]
The JISC Managing Research Data Programme is helping universities develop policies, strategies and curation services which will allow this role to be performed in the broader data ecology. As already noted on this blog, the Science as an Open Enterprise report recognises the importance of this activity and recommended that it ‘should be expanded beyond the pilot 17 institutions within the next five years.’ [Science as an Open Enterprise, p.73] However, if the most is to be made of such investment, lessons should be learnt from the approach taken by the Australian National Data Service. Recognising that a significant amount of research data management must happen in institutions – because it is in institutions that the systemic change must happen which will allow the capture of ‘the wide range of data produced by the majority of scientists not working in partnership with a data centre’ – ANDS have also developed an infrastructure, the Australian Research Data Commons, which allows institutional data collections to feed into national and disciplinary collections and discovery portals.
The Australian Data Commons, taken from the Royal Society’s Science as an Open Enterprise report, p.69.
Along similar lines, Gregg Gordon described the value added and connecting services which SSRN could offer as ‘glue for data repositories’. Such services can be built on the data assets curated by Tier 3 institutional data repositories.
To my mind, such arguments and examples make a strong strategic case for investment in the development of Tier 3 data services in research institutions, and show that such data repositories can and should contribute in ways that go beyond being a repository of last resort. They also point to the need to develop services which allow data to be easily aggregated, and more highly curated collections to be constructed in response, on the one hand, to the opportunity created by a critical mass of data in a given area and, on the other, to the emergence of new research activities ready to exploit this asset.
One of the casualties of my recent stay in hospital was my moustache, which – of late – had grown to impressive proportions. As the NHS does not employ barbers to keep the inmates kempt, the moustache became unruly and had to go. I do not pretend that my ‘lip-joy’ has not divided opinion – but if you can see past it, perhaps the videos (linked below) of presentations that I have recently given may be of interest.
Encouraging Data Publication
I was invited to speak at the 1 June Repositories Support Project event at RIBA entitled Scholarly Communications: New Developments in Open Access. The event was extremely well-attended and featured an insightful keynote from Martin Hall, as well as an inspiring account of progress in Open Access from Alma Swan.
The theme of the event gave me the opportunity to discuss the various drivers for data publication (including the need for research findings to be verifiable and reproducible and the benefits of data re-use, particularly for metastudies) and to relate how the JISC Managing Research Data Programme has been seeking to encourage initiatives to link publications more effectively with the data which forms the evidence for research findings. One of the biggest successes for the Managing Research Data programme in this area has been the DryadUK Project, which engaged with UK publishers and helped the Dryad Data Repository initiative become truly international. I also took the opportunity to announce new JISC projects to promote data publication, about which more in a later post.
Video of presentation on the RSP YouTube Channel: Encouraging data publication – the JISC Managing Research Data Programme
UK Universities and Big Data
On 12 May, I gave a ‘lightning talk’ as part of the Eduserv Symposium on ‘Big Data’. The message I sought to communicate here is that many universities, as they start to tackle the research data challenge, are realising just how much data is being stored by researchers in sub-optimal ways. This is a particular type of big data – it is unmanaged, vulnerable to loss and under-exploited. The challenge, as often, is to develop better individual and organisational awareness of the issues involved. Decisions which are currently often made by accident or default – what data to retain and what to discard – should be made actively. And researchers should be encouraged to make data openly available, and supported by their institutions in this. I pointed to the work of the JISC Managing Research Data programme in this regard, and also to the pioneering work of the Australian National Data Service, whose mantra of transforming data that are unmanaged, disconnected, invisible and have only a single use into structured collections that are managed, connected, findable and reusable offers a useful guide to how to deal with this type of ‘big data’.
Video of Presentation: Universities and (Big) Research Data
University Libraries, Librarians and Research Data
Back in March and April, I gave two presentations discussing the role of university libraries in responding to the research data challenge. The first was to the RLUK Members Meeting in Aberdeen and the second to an RLUK and DCC organised workshop with the aim of Clarifying the Roles of Libraries in Research Data Management. In both presentations, I discussed national and international developments and the work of projects in the JISC Managing Research Data programme, many of which are library-led. It is clear that university libraries must have a role in institutional responses to the data challenge, but it is equally clear that responses must be coordinated, involving a variety of campus agencies and stakeholders. Above all, cooperation between libraries, computing services and the research office is essential. But in all this, it is fundamental to work closely with research departments and groups to understand their requirements and perspectives on these issues.
Presentation to RLUK Members Meeting: Research Data Management, the institutional and national challenges
Presentation to RLUK Research Data Workshop: Changing research practices and the changing roles for university libraries to meet the research data challenge
I recently broke my leg: to be precise a lateral tibial plateau fracture, which required an operation, plates and pins to put back together. This means that until the end of August 2012, I shall be in a knee brace, unable to put weight on my right leg and moving around on crutches. The discomfort and lack of mobility that results from this condition means that I shall largely be working from home. One of the silver linings is that this gives me a chance to catch up with this neglected blog.
The last few months have been typically busy. The Managing Research Data Strands of the recent JISC Digital Infrastructure 01/12 Call resulted in a set of projects developing RDM training materials and others looking at data publication: more about these in a later post.
A workshop was held at the end of March for the projects examining the challenges of data management planning in various disciplines, and by and large these projects have now completed. Chris Rusbridge, formerly director of the Digital Curation Centre, is now preparing a summary and synthesis of these projects and others from the first Managing Research Data programme.
The recent Royal Society report, Science as an Open Enterprise, included under Recommendation 3 as a UK specific action:
b. JISC’s Managing Research Data programme, or a similar initiative, should be expanded beyond the pilot 17 institutions within the next five years. The aim of any initiative should be to support a coordinated national move towards institutional data management policies. (p.73)
The shining thread which runs through the report is the need – intelligently – to improve the availability of research data for verification and reuse, and the recommendation noted here is a strong endorsement of the work being undertaken within the Managing Research Data programme. The seventeen research data infrastructure projects have been making excellent progress, which can be followed in the very rich and informative blogs. In collaboration with the DCC, I shall be running a workshop in September to allow projects to examine and discuss the progress which they are making towards developing the various components of research data services.
The Managing Research Data Programme Evidence Gatherers have been industrious and are liaising with the projects to support their work and to ensure that, as a programme, we have a strong evidence base to show the benefits which derive from improved research data management. More on this to follow too.
Finally, there have been a number of important workshops and conferences in the last few months, organised variously by JISCMRD projects and by other organisations interested in this space. At some of these I have given presentations and intend to share some of these – and the arguments they seek to advance – here over the next few weeks.
On 12-13 March, in conjunction with DCC and the University of Leeds ROADMAP Project, the JISC Managing Research Data Programme held a workshop to examine challenges around developing institutional research data management policies.
This is an important area of activity for projects in the JISCMRD Programme. All 17 large infrastructure projects are developing policies – in 13 cases these are for the principal host institution. In the other four – looking at disciplinary or metadata issues across institutions – such policies will apply perhaps at the institutional level, perhaps at the departmental level, across a number of partner institutions.
Agreeing an institutional RDM policy has been viewed as an important step in establishing an effective and sustainable RDM infrastructure and support service. It sets the tone, underlining an institutional commitment and expectations. Projects report that developing a policy has been useful in getting support from key stakeholders, in clarifying priorities and in testing a project’s understanding of where roles and responsibilities lie in the provision of research data management support.
The DCC has identified three currently public RDM policies in UK universities, as well as a Statement of Commitment. A number more are currently at advanced stages in the approval process.
Such efforts have been given a shot in the arm by the EPSRC Policy Framework on Research Data, which requires that research institutions prepare an internal roadmap by 1 May 2012. The roadmap is an instrument – resulting from a gap analysis – by which an institution will ensure compliance with EPSRC’s expectations by 1 May 2015.
An institutional RDM policy is likely to be an important part of such efforts.
In parallel breakout sessions, the workshop tackled three thematic areas relating to the development of RDM policies. These were:
- What approach are institutions taking to the development of RDM policies?
- How is support and approval being gained for the ratification of RDM policies?
- How are institutions planning to support the implementation of the policies?
In a fourth – plenary – session, the workshop discussed the related issue of preparing a roadmap to meet EPSRC requirements.
DCC officers and JISCMRD Evidence Gatherers were on hand to facilitate discussion and to take notes. The notes will form the basis of outputs from the workshop. At the time of writing, these are likely to take the form of an abbreviated summary of discussions, key points and outstanding questions. This will undoubtedly also contribute towards a DCC checklist or step-by-step guide to developing an institutional RDM policy, based upon the experiences and findings of the projects and institutions involved.
My immediate ‘take homes’ were as follows:
- A checklist of policies and supporting guidance and help materials for institutions developing and implementing such policies would be very useful. This is something on which the DCC could work, in collaboration with and collating findings from the JISC Managing Research Data Programme. Relatedly, Catherine Pink of the Bath Research 360 project has posted a preliminary checklist of guidance required.
- Some projects will be developing guidance materials in the form of researcher-targeted scenarios or walk-throughs. This idea was warmly received, and it was also suggested that the process of creating such materials could provide a valuable deeper understanding of requirements for institutional infrastructure and support services.
- One of a number of recurrent themes over the two days was the question: ‘what research data should be retained?’ It was felt that policies – or the guidance materials supporting them – should do something to guide researchers and support staff in this regard. Disciplinary research considerations are likely to be the primary determinants of the potential reuse value of research data – but it is possible to adduce some general principles. It was felt that more guidance and – above all – examples in this area would be extremely useful.
Since the workshop, a number of attendees have posted blogs containing reflections and summaries:
Bill Worthington, Research Data Toolkit Hertfordshire Project: Reflections on JISCMRD-DCC Policy Workshop
Laura Molloy, University of Glasgow and JISCMRD Evidence Gatherer: Emerging Themes from the JISCMRD Institutional RDM Policy Workshop
Scott Brander, Cerif for Datasets Project: Institutional Data Management Policies and Roadmaps
Angus Whyte, DCC: Turning Roadmaps into Action
Stephen Gray, data.bris Project: The Value of Research Data
It is worth noting that the scene was set by a couple of blog posts:
Jonathan Tedds, University of Leicester and JISCMRD Evidence Gatherer: Developing Research Data Management Policy
Sarah Jones, DCC: Navigating the Potholes
Presentations from the JISC Managing Research Data Programme workshop on data management planning – held on Friday 23 March – are now available.
The event page on the JISC website provides links to presentations given by projects tasked to explore the challenges of designing and implementing data management plans for research projects or for departments in specific disciplines, and to customise and implement the DCC’s DMPonline tool for specific uses. Information about the projects can be found here and here.
David Shotton, PI of the Oxford DMPonline Project, has made the content of the analysis he presented available as a couple of blog posts, in which he categorises and aligns the questions that comprise a data management plan; and then makes comparisons of DMP question sets and draws conclusions.
At the workshop, Adrian Richardson of DCC demonstrated version 3 of the DMPonline tool and his colleague Kelly Miller ran an exercise to gather feedback and suggestions for further development. Kelly has blogged a summary of these suggestions on the DCC website.
Those with an interest in the interim outputs from the JISC-funded data management planning projects – there are more to come – may be interested in the Appendices to the Workshop Programme where outputs are listed and which may be accessed here.
A competition was held to gather participants’ opinion as to:
1) which project had produced the most reusable outputs;
2) which project had produced the most potentially significant outputs (even if they were not yet reusable); and
3) which project participants most wanted to find out more about after the workshop.
The winners in each category were awarded a copy of Managing Research Data, ed. Graham Pryor and were as follows:
3) David Shotton, for the Oxford DMPonline Project.
All the projects in this Strand of the Managing Research Data Programme are to be congratulated on their work so far. Most of these projects will be completing between now and the end of May 2012 and will be making their final deliverables available during that timeframe.
Joss Winn has recently blogged about the Orbital Project’s use of the OAIS Reference model in which he makes some very important points.
The mistake to make with OAIS is looking at the model and thinking that you have to create a system that is designed in such a way, rather than functions in such a way. The OAIS standard is a tool that allows Archivists, Designers and Developers to share a common language when discussing and planning the implementation of a digital archive and what that archive should do, not how it should be designed.
The task ahead of the Orbital Team is to consider how our on-going [system] designs relate to the OAIS functional model.
In particular, Joss points to the OAIS Function Cards produced by Cornell and Göttingen State Universities. These are allowing the Orbital project to ‘tick off’ high level and more detailed OAIS functions in their system design.
Joss ends the blog: ‘I’d be very interested to hear from other MRD projects that are looking at the OAIS standard in detail, as well as people from earlier projects that have been through this process before.’ Please get in touch with him on the Orbital Blog.
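The ‘tick off’ approach that the function cards support can be pictured as a simple coverage check: list the six OAIS functional entities, map each to the system components claimed to fulfil it, and see what remains unaddressed. The sketch below is purely illustrative – the component names are invented, and this is not how Orbital has implemented anything – but it conveys the idea:

```python
# Illustrative sketch: checking a (hypothetical) repository design's
# coverage of the six OAIS functional entities. Component names such as
# "deposit-api" are invented for illustration.

OAIS_FUNCTIONS = [
    "Ingest",
    "Archival Storage",
    "Data Management",
    "Administration",
    "Preservation Planning",
    "Access",
]

# Map each OAIS function to the system components claimed to fulfil it.
design_coverage = {
    "Ingest": ["deposit-api", "virus-check"],
    "Archival Storage": ["object-store"],
    "Data Management": ["metadata-db"],
    "Access": ["search-ui", "download-service"],
    # "Administration" and "Preservation Planning" not yet addressed.
}

def gaps(coverage, functions=OAIS_FUNCTIONS):
    """Return the OAIS functions with no component mapped to them."""
    return [f for f in functions if not coverage.get(f)]

print(gaps(design_coverage))
# → ['Administration', 'Preservation Planning']
```

The value of the exercise, as Joss suggests, is not the tooling but the shared vocabulary: the model describes what an archive should do, not how it should be built.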
The Managing Research Data – Gravitational Waves Project produced, inter alia, two key recommendations:
- Funders should simply require that a project develop a high-level DMP as a suitable profile of the OAIS specification.
- Funders should support projects in creating per-project OAIS profiles which are appropriate to the project and meet funders’ strategic priorities and responsibilities.
The MaRDI-Gross project will therefore develop an intellectual ‘toolkit’ to support project-specific DMP planning, built on the OAIS model. The project argues that ‘the data-management systems for projects of this scale are essentially always bespoke, so that there is no useful way in which software or ‘tick-list’ components can be provided for DMP planning in this space. The ‘kit’ must instead be a set of documentary resources targeted at technical managers and engineers.’
The likely structure for the toolkit will be:
- Overviews of the OAIS methodology and goals
- Indications of the state of the art in OAIS application, validation and auditing.
- (References to) Backup documentation including the OAIS specification and selected developments or applications of it.
- Case-studies of existing DMP efforts in existing projects.
- Costing models for DMP planning, to the extent that this is feasible and useful.
The project also aims to develop material tailored for use by funding organisations in the development of policy and guidelines, as well as assessment materials for funders and reviewers.
The project has suffered a few delays, but is aiming to deliver a draft or strawman version by the significant date of Tuesday 21 February. Watch this space!
Some info on metadata standards being used by JISCMRD projects is available on the commonalities spreadsheet.
There was a breakout group at the Programme Launch Workshop on metadata standards: for blogged accounts of this session, see http://researchdataessex.posterous.com/metadata-session-feedback-mrd-2011-13-program and http://sonexworkgroup.blogspot.com/2011/12/thematic-parallel-session-on-metadata.html
The Metadata Breakout Group came up with some actions, though these are reported a little differently in the two blogged accounts:
- Louise [Corti] would take a first pass at a grid template and send this round for comment
- Projects working in similar domains to consult each other about the use of metadata, early on
- Simon [Hodson] to organise a Programme meeting to be held in early Spring 12 to discuss metadata further and gain some agreement.
- Trying to locate (or otherwise collect) an already existing registry of metadata standards for different disciplines, in order to offer researchers from a given discipline an already tested metadata schema they can re-use,
- Mapping metadata standards to each other, aiming to produce a minimum-sufficient-information metadata set that may be widely applicable across disciplines,
- Taking steps towards organising a workshop in order to have metadata issues discussed among relevant stakeholders. ANDS Metadata Workshop in 2010 might be a potential source of inspiration for this with all those discipline-based approaches to metadata standards. Proposed dates for this Metadata WS were spring-summer 2012.
Louise has sent me a template, and I confess – mea culpa – that it got buried for a while. It is now available as JISCMRD Projects’ Metadata Usage. It would be good if projects working in similar domains could build on this to share information about the standards being used. We should also explore whether there is already an existing registry of metadata standards for different disciplines. Does anyone know of such a thing?
I think it is very important for the programme as a whole to develop convergence upon ‘a minimum-sufficient-information metadata set that may be widely applicable across disciplines’ which may be used by the projects. To work towards this end, there will be a programme workshop on aligning JISCMRD projects’ metadata strategies. I plan to organise this for May.
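To make the ‘minimum-sufficient-information metadata set’ idea concrete, one common approach is a crosswalk: each discipline keeps its own field names, and a mapping projects records onto a small common element set (Dublin Core-like names are used below). This is a minimal sketch only – the disciplines, field names and mappings are invented for illustration, not drawn from any JISCMRD project:

```python
# Illustrative sketch: projecting discipline-specific metadata records
# onto a minimal common set via per-discipline crosswalks.
# All field names and example values are hypothetical.

MINIMUM_SET = ["title", "creator", "date", "identifier", "description"]

# Per-discipline crosswalks: discipline-specific field -> common element.
crosswalks = {
    "archaeology": {
        "site_name": "title",
        "excavator": "creator",
        "season": "date",
        "accession_no": "identifier",
        "summary": "description",
    },
    "crystallography": {
        "compound": "title",
        "depositor": "creator",
        "collection_date": "date",
        "doi": "identifier",
        "abstract": "description",
    },
}

def to_minimum_set(record, discipline):
    """Project a discipline-specific record onto the common minimum set,
    reporting any common elements the record cannot supply."""
    walk = crosswalks[discipline]
    common = {walk[k]: v for k, v in record.items() if k in walk}
    missing = [e for e in MINIMUM_SET if e not in common]
    return common, missing

record = {"site_name": "Vindolanda Trench 5", "excavator": "A. N. Other"}
common, missing = to_minimum_set(record, "archaeology")
# common  → {'title': 'Vindolanda Trench 5', 'creator': 'A. N. Other'}
# missing → ['date', 'identifier', 'description']
```

The ‘missing’ list is the useful part for the programme discussion: it shows, per discipline, where a candidate minimum set would demand information that researchers do not currently record.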