Also posted on the Now and Future of Data Publishing symposium blog.
We have been overwhelmed by the interest in the Now and Future of Data Publishing and the venue is full! The same applies (nearly) to the ORCID Outreach meeting and the Joint ORCID – Dryad Symposium on Research Attribution.
However, there are still some places for the Dryad ‘Open’ Meeting (membership and more) on Friday 24 May. This meeting is open to Dryad members, prospective members and other interested parties and provides a wonderful opportunity to learn about recent developments, get a preview of upcoming features, have a say in the governance of the organization, exchange ideas and experience with other members and Dryad staff, and help chart the future of data archiving. For a place at the event, register here.
So in response to the demand, it will be possible to follow all four events (over three days) online! Click on the links below. The NFDP13 meeting is using Webex, provided by Dryad through Duke University. Instructions are at the foot of this post.
Webex Details for Now and Future of Data Publishing, Wed 22 May
Topic: Now and Future of Data Publication
Date: Wednesday, May 22, 2013
Time: 9:00 am, GMT Summer Time (London, GMT+01:00)
Meeting Number: 732 521 534
Meeting Password: opendata
To start or join the online meeting
Go to http://bit.ly/nowfuturedata
Audio conference information
Call-in toll number (US/Canada): 1-650-479-3207
Global call-in numbers: https://dukeuniversity.webex.com/dukeuniversity/globalcallin.php?serviceType=MC&ED=214875007&tollFree=0
Access code: 732 521 534
For assistance
1. Go to https://dukeuniversity.webex.com/dukeuniversity/mc
2. On the left navigation bar, click “Support”.
To check whether you have the appropriate players installed for UCF (Universal Communications Format) rich media files, go to https://dukeuniversity.webex.com/dukeuniversity/systemdiagnosis.php.
Also posted on the Now and Future of Data Publishing symposium blog.
The programme for the Now and Future of Data Publishing Symposium is now available! It features keynotes from Peter Fox, of Rensselaer Polytechnic Institute, an erudite and engaging thinker on matters of data publication; and from Unni Karunakara, President of Médecins Sans Frontières. The closing address will be delivered by John Wood, Secretary General, The Association of Commonwealth Universities and Research Data Alliance Council.
The bulk of the day comprises a series of panel sessions, some in plenary, some in parallel, in which a host of experts involved in various ways with data publication will address a series of key issues. This is a format that we hope will encourage discussion and allow participants to set an agenda of issues to be tackled to help shape data publishing over the next few years.
Unfortunately, the event is already full but a waitlist has been set up. That the event should have reached capacity even before the programme was finalised demonstrates how lively the interest in data publishing is across a range of communities.
Dryad Membership Meeting and other events
Should you be disappointed and not able to attend NFDP13, do not despair: there is a series of exciting events in Oxford in the same week. In particular, I would like to draw readers’ attention to the Dryad Membership Meeting, on Friday 24 May, which, it should be stressed, is not restricted to Dryad members. In fact, it is open to Dryad members, prospective members and other interested parties! The meeting provides an opportunity to learn about recent developments, get a preview of upcoming features, have a say in the governance of the organization, exchange ideas and experience with other members and Dryad staff, and help chart the future of data archiving. The programme for the Dryad Membership Meeting is now available and you can register here.
We look forward to seeing you in Oxford at any of the associated meetings: the Now and Future of Data Publishing Symposium 2013; the ORCID Outreach Meeting; the Joint Dryad-ORCID Symposium on Research Attribution; and the Dryad Membership (and more) Meeting.
Followers of this blog may be pleased to know that the training materials developed by the DMTpsych Project are now available in Jorum.
These materials were originally developed by the DMTpsych Project, part of the training strand of the first Jisc Managing Research Data programme. Richard Plant, who led this project, went on to work on the Data Management Planning and Storage for Psychology project at Sheffield.
The materials have been reused and adapted by the current TraD RDM training project at the University of East London.
Unfortunately, the server which housed these materials at the former Higher Education Academy Psychology Subject Centre at York apparently ‘exploded’ in the Autumn and these materials – valuable, reusable and the product of much hard work and investment – have not been available for the last few months.
Now – happily – the training materials have been deposited in Jorum and they are once again available for download and reuse. Would it be over-egging the pudding to suggest that an analogy can be made here with the data produced by research projects…?
Here is the Jorum description:
DMTpsych: Data Management Training for the Psychological Sciences provides resources to help postgraduates and researchers in psychology learn about data management and develop the skills to prepare data management plans. Such plans are being increasingly required by research funders. A workbook is provided that guides the reader through developing a discipline specific plan in accordance with the guidance available on digital curation provided by the Digital Curation Centre (see http://www.dcc.ac.uk/dmponline). Powerpoint slides are also available for use in training and can be adapted by lecturers for use in their own research methods training programmes.
In conjunction with a range of partners including BioSharing, DataONE, Dryad, the International Association of Scientific, Technical and Medical Publishers, Wiley-Blackwell, and others, the Jisc Managing Research Data programme is organising a one-day symposium on The Now and Future of Data Publishing. This landmark event will take place on 22 May 2013 at St. Anne’s College, Oxford, UK.
There are a growing number of initiatives around data publication and these respond in part to changing research practices but also to changing policies and advocacy of more open research. The symposium will provide an overview of the current landscape, interrogate the apparent benefits for researchers and research more generally and examine visions of the future of data publication. Above all, what sort of data publication will most engage researchers and most benefit research?
The event will be open to over 100 participants, comprising key stakeholder groups: researchers, research funders, policy makers, journal editors, publishers, data curators. The day will feature a number of provocative and inspiring keynotes. In addition, there will be panel sessions addressing key themes around data publication. Further details and a full programme will appear here soon. Registration is already open here!
Dryad and ORCID Events
‘The Now and Future of Data Publishing’ is just one of a number of exciting events taking place at St. Anne’s College, Oxford, UK from Wednesday 22 May to Friday 24 May 2013.
Also taking place are the ORCID Outreach Meeting (Thursday 23 May, am only); Getting Credit for Your Work: A Symposium on Research Attribution, jointly organised by Dryad and ORCID (Thursday 23 May, pm only); and the Dryad Membership Meeting (Friday 24 May, all day). You can register for all these events, and for The Now and Future of Data Publishing, here!
In addition to these events, ORCID will host a CodeFest May 23-24, 2013 at St. Anne’s College in Oxford, UK. The event theme is Connections: Mash ups with the ORCID API. Meet ORCID technical staff, learn about our development resources, and bring your ideas for new tools. Prizes will be awarded to top contributors. Register today!
Components of Institutional Research Data Services Workshop: introductory and concluding presentations
On 24-25 October 2012, in partnership with the Digital Curation Centre’s Institutional Engagements, the JISC Managing Research Data Programme ran a two-day workshop on the ‘Components of Institutional Research Data Services’. The workshop functioned as an opportunity for JISCMRD projects, for institutions working with the DCC and for other ‘fellow travellers’ to share progress and consider the challenges in developing research data management services.
The workshop programme is available here.
I gave presentations to introduce and conclude the workshop and I have made these available via SlideShare.
The JISC Transformations Programme aims to help institutions achieve large-scale change in a number of areas by supporting the deployment of JISC and non-JISC resources. In Strand B, which is concerned with improved efficiency and/or cost savings in meeting institutional missions, there are two projects concerned with supporting better management of research data.
The two projects, at York and at Leicester, are using outputs from the JISC Managing Research Data programme. Both projects have blogs which are linked below:
University of York, Tools for RDM Development: http://uoy-rdmproject.blogspot.co.uk/
University of Leicester, RDM Support Service: http://amburnham.jiscinvolve.org/wp/
The overarching aim of this programme area is to contribute to an increase in research data management skills in UK higher education and research organisations. This will be achieved by providing high quality training materials which will serve the needs of a variety of roles and stakeholders requiring research data management skills.
There is a recognised need to increase skills in managing research data among staff in HEIs, including researchers, librarians and research support staff. Important work was accomplished by projects in the first Managing Research Data Programme, which developed discipline-specific training materials, but this work was limited to certain research areas.
The present strand aims to build on previous achievements by addressing remaining gaps in availability of discipline-focussed training materials, targeting disciplines not covered by previous JISC projects or other work. In particular, there is a need for training for subject or research librarians whose role will increasingly include providing support for researchers in making best use of the research data infrastructure and services which may be available (inter-)nationally or at an institutional level.
The JISC Managing Research Data Programme 2011-13 has funded four projects to design, pilot and test training materials for research data management adapted for the needs of discipline-focussed post-graduate courses and for subject or discipline liaison librarians.
RDMRose, University of Sheffield
RDMRose will develop and adapt learning materials about RDM to meet the specific needs of liaison librarians in university libraries, both for practitioners’ CPD and for embedding into the postgraduate taught (PGT) curriculum. Its deliverables will include OER materials suitable for learning in multiple modes, including face to face and self-directed learning.
Project Website: http://www.shef.ac.uk/is/research/projects/rdmrose
RDMTPA: Research Data Management Training for the whole project lifecycle in Physics & Astronomy research, University of Hertfordshire
The RDMTPA project will build on existing JISCMRD work, both within and outwith the University of Hertfordshire, and will work with the Centre for Astrophysics Research (CAR) and the Centre for Atmospheric & Instrumentation Research (CAIR) to develop a short course in Research Data Management, directed at Post-Graduate and early career researchers in the physical sciences.
SoDaMaT: Sound Data Management Training
SoDaMaT will develop discipline-specific research data management training materials for postgraduate research students, researchers and academics working in the area of digital music and audio research.
Project Blog: http://rdm.c4dm.eecs.qmul.ac.uk/category/project/sodamat
TraD: Training for Data Management at UEL
TraD will embed good practice in data management (DM) at UEL by developing disciplinary training material for postgraduate curricula, adapting existing materials in the area of psychology and developing new materials for computer science. The project will provide training opportunities for research staff and a learning module for library support staff.
Project Blog: http://datamanagementuel.wordpress.com/
A fifth project – DaMSSI-ABC – will provide a support function, to assist projects in following best practice, ensure reusability, engage stakeholders and synthesise outcomes.
DaMSSI-ABC, DCC, University of Glasgow, University of Manchester, RIN and Vitae
DaMSSI will support and improve coherence in the development, dissemination and reuse of research data management training materials developed by the JISC RDMTrain projects. Specifically, DaMSSI will:
- work with relevant professional bodies / learned societies and funders, to endorse and promote good data management practice;
- classify course offerings, by ensuring that the anticipated outcomes of training interventions are clearly set out to allow participants to select the training that best meets their learning objectives;
- identify and agree benchmarks on learning outcomes and means of assessment so that courses from a range of training providers can be effectively compared.
Project Website: http://www.dcc.ac.uk/training/damssi-abc
Project Blog: http://damssiabc.jiscinvolve.org/wp/
Manage locally, discover (inter-)nationally: research data management lessons from Australia at OR2012
What to keep and why; how to support research data management through the lifecycle; and how to make the data citable, discoverable and reusable: these are core questions in research data management. They are questions with both human and technical aspects. These are the issues which Exeter is addressing through advocacy and training, its draft RDM policy and plans for a sustainable service; and which Oxford is seeking to tackle through DataFinder and ‘just enough metadata’.
With a sizeable national investment and an impressive coordinated approach, Australia – in the form of the Australian National Data Service and a host of institutional projects – is providing useful examples of how these questions may be answered.
Natasha Simons, Griffith University: Enabling data capture, management, aggregation, discovery and reuse
Natasha described the development of the Griffith University Research Hub, a metadata store solution designed as far as possible to automate the collation of new research data held in the university.
Metadata relating to research data created by Griffith researchers and largely curated in the Griffith research data repository is exposed by the Research Hub for harvesting to the ANDS Research Data Australia service: ‘a set of web pages describing data collections produced by or relevant to Australian researchers. Research Data Australia is designed to promote visibility of research data collections in search discovery engines such as Google and Yahoo, to encourage their re-use.’
Metadata is exposed using RIF-CS (Registry Interchange Format – Collections and Services), a high-level schema structured around four classes of objects: collections, parties, activities and services.
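To make the four classes of object concrete, here is a minimal sketch in Python of how records of this kind might be modelled and grouped before being exposed for harvesting. It is an illustration only: the record fields (key, name, related_keys) and the helper group_by_class are hypothetical and are not taken from the actual RIF-CS schema or from the Griffith implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ObjectClass(Enum):
    """The four classes of object named in the text."""
    COLLECTION = "collection"
    PARTY = "party"
    ACTIVITY = "activity"
    SERVICE = "service"


@dataclass
class RegistryObject:
    """Hypothetical, simplified record; not the real RIF-CS schema."""
    key: str                      # illustrative identifier
    object_class: ObjectClass     # one of the four classes above
    name: str
    related_keys: List[str] = field(default_factory=list)  # links to other records


def group_by_class(records: List[RegistryObject]) -> Dict[ObjectClass, List[RegistryObject]]:
    """Group records by object class, keeping an empty list for unused classes."""
    grouped: Dict[ObjectClass, List[RegistryObject]] = {cls: [] for cls in ObjectClass}
    for record in records:
        grouped[record.object_class].append(record)
    return grouped
```

A collection record could then point at the party (researcher) who created it through related_keys, which is the kind of linking that a discovery service relies on.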
The Griffith Research Hub metadata store is based on VIVO, a triple-store solution, and uses the ANDS-VITRO ontology for describing research activity. VIVO-VITRO is one of a number of metadata store solutions encouraged by ANDS and being implemented by ANDS funded projects. For more detail about the Griffith implementation of VITRO see the D-Lib article Wolski et al 2011, Building an Institutional Discovery Layer for Virtual Research Collections.
As well as contributing to ANDS’s broader objectives in Research Data Australia, the benefit of the Griffith Research Hub is to provide a platform of linked information about the university’s research activities – potentially a rich and valuable resource for the management of research information, grant funded projects and the development of collaborations and new initiatives.
Just as the Griffith Research Hub exposes information about researchers, projects and research data, so Research Data Australia provides a platform to discover information about research data created by Australian researchers. It remains early days – analytics do not yet exist to show to what extent this platform is assisting discovery and reuse – but the potential is clear.
Anthony Beitz, Monash University, Institutional infrastructure for research data management
Anthony described an integrated and strategic approach to supporting researchers’ eResearch and data management needs. Fundamental to the Monash approach is the recognition that researchers, for good reason, tend to prefer more bespoke, community developed solutions to blunt and generic platforms that are often the wares of centralised IT services. Anthony was unequivocal: ‘If a research community already has an RDM solution, or an emerging one, then it is this which should be adopted and supported.’
One suspects that few would disagree with this in principle… But at a time when in the UK IT support is being withdrawn from research departments, the cry from IT directors will be: ‘Great, but how is this to be resourced?’ A good and pertinent question. But equally, real attention needs to be paid to researchers’ needs. There is little point in providing generic solutions if these do not respond sufficiently to researchers’ requirements and are scarcely fit for purpose.
I took Anthony’s point to be that it is of fundamental importance to be sensitive to the objectives and requirements of specific research areas.
For a RDM platform to be effective and have high utility, it must fit in with researchers’ tools, workflows, instrumentation, methodologies, environment, and most importantly culture. As most of these features vary from discipline to discipline, it is unrealistic to believe that a singular approach to RDM will consistently meet researchers’ needs. Indeed, research institutions should expect that a range of RDM platforms will be required in order to accommodate their researchers.
Monash uses a team of developers and agile software development methodology to support this, and the emphasis is on engaging with specific research groups and communities. The Monash approach is to work along a decision tree: if possible, adopt a third party product; if necessary, adapt a product to disciplinary or local needs; and, if these options fail, develop the product locally.
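The decision tree described above can be sketched as a small function. This is only a reading of the three steps as reported in the talk, not Monash's actual procedure; the function name and the two boolean inputs are assumptions made for illustration.

```python
def choose_rdm_strategy(third_party_fits: bool, product_adaptable: bool) -> str:
    """Walk the adopt / adapt / develop decision tree in order of preference.

    Both inputs are hypothetical yes/no judgements a support team might make
    after talking to a research community.
    """
    if third_party_fits:
        # An existing third party product already meets the community's needs.
        return "adopt"
    if product_adaptable:
        # A product exists but needs tailoring to disciplinary or local needs.
        return "adapt"
    # Neither option works, so the product is built locally.
    return "develop"
```

The ordering matters: local development is the fallback, reached only when adoption and adaptation have both been ruled out.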
The focus on the requirements of research communities applies both to the support of research activity (data capture, analysis etc; the active data phase) and the curation and archiving of data which in some sense is complete (the data publication phase). For the archiving and publication phase, the Monash approach is to manage locally, and promote discovery (inter-)nationally by propagating metadata to national registries such as Research Data Australia, or such disciplinary hubs as may exist. Once again, this seems to push a great deal of responsibility for curation and archiving the way of the institution. The Monash response is to meet this challenge and ‘form a separate specialised support group for RDM infrastructure’.
A lot of institutions will find this approach daunting. But many of the arguments about utility and the need for products that are fit for purpose are fundamentally persuasive. It will be important to understand more about and to learn from the Monash model.
The issue of how to fund a research data management infrastructure on a sustainable basis while only partially relying on cost-recovery from grant funded research projects is a matter of concern for all JISCMRD projects and all institutions, including Open Exeter… In relation to this issue, and others, Open Exeter is paying particular attention to how the university can best support the RDM requirements of post-graduate students.
Hannah Lloyd-Jones, Open Exeter Project, University of Exeter: post-graduate research data, a new challenge for repositories?
Hannah gave a clear and comprehensive overview of the work of Open Exeter. The presentation is available from the first data management session on the OR2012 Conference website.
The project is divided into four areas of work:
Technical development: focussing on providing a DSpace instance for research data, with underlying storage, and ensuring integration of document and data repositories.
Creation of training materials and guidance: to support researchers and research support staff in the use of the data infrastructure. Exeter’s guidance pages are currently under construction.
Advocacy and governance: to establish the institutional policies around the management, retention and publication of research data.
The fourth strand of the project is a distinctive feature of the Open Exeter project. ‘Follow the Data’ describes the detailed work the project is doing to understand researcher requirements. This has involved research based on the DCC’s data asset framework methodology (comprising an online survey and follow-up interviews). A report summarising findings has recently been published.
Open Exeter is also working closely with a cohort of post-graduate research students: this approach has the dual benefit of helping the project understand research practice and RDM requirements, while also assisting advocacy and dissemination of project objectives.
This focus also emanates from a widespread concern – prevalent at Exeter and other institutions – with what happens to PGR research data at completion. At the moment, Exeter requires the deposit of post-graduate theses in the institutional repository, but – surprisingly – not the data substantiating the theses’ findings. This is a matter of concern – potentially of frustration and consternation – in departments where the research data may form part of the ongoing research initiatives, part of the department’s research assets, its institutional memory.
The Open Exeter project has prepared separate draft RDM policies for researchers and for PGR students. The draft policy for PGR research data notes: ‘The security of PhD students’ data is of particular importance when it is embedded in a larger research project and will need to be accessed after the completion of the students’ degree.’
To support the objectives of these draft policies, the Open Exeter project will offer an infrastructure that allows data to be deposited with the thesis through a simple deposit mechanism, with the repository assigning a persistent ID that links the data to the thesis. The project is also focussing on awareness raising and embedding cultural change in the research community through a PGR focussed support network.
The Open Exeter Summary of Findings from the Open Exeter Data Asset Framework Survey provides some interesting insights. The overwhelming message is that the university cannot just provide an RDM service for those researchers with externally funded research. In all schools and at all career stages, there is a substantial amount of research being conducted which does not have an external funder and is funded by the university itself. Non-grant funded research at Exeter includes research involving commercially or personally sensitive data, and includes some post-graduate research data also. For an institution that endorses the view that ‘good practice in research data management is a key part of research excellence’ it is scarcely conceivable that an RDM service and infrastructure could be limited to those researchers and projects with external sources of funding. The data produced by internally funded research is an institutional asset requiring careful management and, where appropriate, archiving, publication and dissemination. However, the challenging conclusion from this observation is that ‘there could only ever be partial cost recovery from grants (via direct or indirect costs) for future staffing and infrastructure for research data management.’ [p.4] Following from this, the report observes that ‘new responsibilities will need to be accepted into central and college teams’. Sustainability models for institutional RDM services ‘are likely to include recommendations for additional dedicated staffing to help manage and monitor institutional research data management policy and practice.’ [p.6]
The Exeter report provides some grounds for the view that costs of an RDM service may be offset by indirect means: avoiding the loss of research income [p.16]; reducing data loss [p.32]; cost and efficiency savings through better management and more effective data disposal [p.35]. Most importantly, the costs of the RDM service might be controlled – and good practice made more effective – by providing ‘clarification regarding when to archive and what to archive (criteria for retention or disposal)’. [p.35]
Arguments in favour of research data sharing stress the need for verification and reproducibility. It is fundamental to the scientific method and to good research practice for other researchers to be able to test the evidence underpinning the hypotheses and interpretations presented in a given scholarly publication.
In recognition of this a number of journals have recommended or mandated that research data be deposited in appropriate data repositories prior to publication. Parallel to this, there are a growing number of initiatives that explicitly link journal articles with the underlying data or that may be characterised as data journals, championing the publication of research data sets with commentary, analysis and visualisation.
Technical, procedural and cultural challenges exist around the use of identifiers, exchange of metadata, effective linking and data citation. There is also a need to establish sustainable partnerships between journals, data centres and research organisations which are necessary to underpin innovative forms of data publication.
Innovative data publications are likely to provide researchers with recognition and reward for making datasets available and thus encourage data to be viewed as a first class research output, for data publication to be considered an essential part of the scholarly process. Likewise, it seems likely that as well as making it easier for researchers to locate and access datasets, linking between publications and supporting data will provide a means for established data centres, or even institutional data repositories to enhance and draw attention to well-curated research outputs.
For partnerships around data publication to become established, there are important questions to be considered:
- What policies are required on the part of journals’ editorial boards to achieve greater levels of data sharing, citation and linkages between publications and datasets?
- What partnerships between journals, data centres and research organisations are necessary to establish sustainable solutions, and what business models are appropriate?
- How may the costs of long term data archiving be met and appropriately distributed in models that stress the importance of publishing data and linking data sets to published outputs?
- What characterises a suitable repository and what criteria of quality and assurance are necessary of the data archive underpinning such collaborations?
- What, if any, peer review of data is appropriate before publication?
The JISC Managing Research Data Programme 2011-13 has, therefore, funded two projects to design and implement innovative technical models and organisational partnerships to encourage and enable publication of research data. These projects will also explore the questions listed above and thereby shed light on solutions which will enable the greater development of data publication.
PREPARDE: Peer REview for Publication & Accreditation of Research Data in the Earth sciences
PREPARDE will capture the processes and procedures required to publish a scientific dataset, ranging from ingestion into a data repository, through to formal publication in a data journal. It will also address key issues arising in the data publication paradigm, namely, how does one peer-review a dataset, what criteria are needed for a repository to be considered objectively trustworthy, and how can datasets and journal publications be effectively cross-linked for the benefit of the wider research community.
Project website: http://proj.badc.rl.ac.uk/preparde
PRIME: Publisher, Repository and Institutional Metadata Exchange
PRIME will enable the automated exchange of metadata between publishers, subject-based and institutional repositories. A partnership between UCL, the Archaeology Data Service and Ubiquity Press, a campus-based open access publisher located at UCL, PRIME will ensure that each stakeholder has a record of content relevant to them, even when the data itself is held elsewhere.
As previously noted, scholarly journals are increasingly recommending or requiring as a condition of publication that research data should be made available in an appropriate repository. A service to collate and summarise journal research data policies would provide researchers, managers of research data services and other stakeholders with an easy source of reference to understand the requirements and recommendations made by journal editorial boards with regard to data sharing. Such a service would provide a useful information and advocacy tool for a variety of stakeholders in this area (including exponents of open data, research data infrastructure providers, institutional managers with responsibilities for research data management etc). It is also likely to provide a helpful incentive for the increasing systematisation and codification of such policies and for their more regular review.
JISC and other stakeholders need to understand precisely what is required in such a service and what business models are available to maintain a sustainable service, including a consideration of sources of funding and cost recovery.
The third project funded by the JISC Managing Research Data Programme in the area of data publication is a feasibility study for a service to collate and summarise journal data policies, which will consider requirements and present possible business models.
JoRD: Journal Research Data Policy Bank
The Journal Research Data Policy Bank (JoRD) project will conduct a feasibility study into the scope and shape of a sustainable service to collate and summarise journal policies on Research Data. The aim of this service will be to provide researchers, managers of research data and other stakeholders with an easy source of reference to understand and comply with Research Data policies.
Project website: http://crc.nottingham.ac.uk/projects/jord.php