OR2016 conference report - Research infrastructure and data

It feels very much like another world, but two weeks ago I was lucky enough to attend the OR2016 conference at Trinity College Dublin, alongside Jisc colleagues including John Kaye and Balvair Notay.

Open Repositories 2016 brought together interest groups based around the four major open repository platforms: DSpace, EPrints, Fedora and Invenio. There were interest-group sessions on each of these platforms and on related initiatives, alongside more traditional themed conference sessions. Jisc's contributions were mainly posters, covering the developing Research Data Shared Service solution and our more established work around Open Access to research.

With that description, one might be forgiven for imagining a dry, technical conference, but this could honestly not be further from the truth. The tone was set by an utterly spellbinding keynote address from Laura Czerniewicz of the University of Cape Town. Her focus on the global reality of access to research made for one of those rare keynotes that reverberates through presentations and questions across an entire conference. “Open Access” to research, as it is widely understood, addresses only a narrow slice of the much larger problems that scholars in Africa and elsewhere face around recognition, publication and citation. Even when African scholars write about their own local issues and situations, work by more “established” scholars based in the UK, US, China and Europe is cited and published in preference to theirs. I made detailed notes of this presentation, and these, along with the slides, are available here.

Petr Knoth and Drahomira Hermannova’s “Open Citation Experiment” provided another major talking point, and the poster detailing their Jisc-funded work won an award. I also saw an “applied” demonstration from Petr that combined statistics from CORE (Jisc/Open University), Mendeley and the Microsoft Academic Graph to make a multi-metric argument for Oxford’s current research “power” over Cambridge.

The same session also featured a fascinating presentation from Joseph Greene of University College Dublin on correcting repository download statistics to account for robots. This, as readers will know, is a topic close to my heart, and I was delighted to see Joseph’s efforts reach the same finding as earlier IRUS work: that 85% of all downloads come from robots. As download statistics are increasingly used as a measure of repository effectiveness, it is vital that the caveats and limitations of the numbers we have are well understood.

A series of presentations on research data workflows impressed me with the range of customisations and specialised functionality that a well-designed subject data repository can offer. I saw presentations from HEPData (high energy physics, based at Durham), the Structural Biology Data Grid and the Digital Rocks Portal (petroleum geology), each of which was an excellent example of a subject repository directly addressing disciplinary needs. One thing that struck me was how often software tools were provided alongside data for immediate analysis; the Docker-based approach to digital rendering used by Digital Rocks was a particularly smart and resource-aware implementation.

As I’m not steeped in the lore of repository development, I took the opportunity to learn more about Fedora and the platforms built on it, such as Hydra. I particularly liked the user-friendly Hydra-in-a-Box and Islandora. I would especially recommend the superb Hydra-in-a-Box design documents, which are all openly shared and invaluable for anyone in the throes of designing and developing any repository-like solution. They include a literature review of repository requirements, a competitor analysis, a community survey and details of user interviews and focus groups. My interest was piqued by the implementation of the Portland Common Data Model, which looks like an interesting way to handle relationships between files as institutional repositories expand their remit to encompass more complex structures.

One unexpected pleasure was rightsstatements.org, a Creative-Commons-like attempt to codify machine-readable statements of intellectual property rights (as opposed to licences), developed jointly by the Digital Public Library of America, Europeana and Creative Commons. The sheer complexity of the rights picture in large collections means that many opportunities for reuse are lost. This is a great attempt to bring to reuse rights the same clarity that Creative Commons brought to licences.

This is very much a personal look back at OR2016; others would no doubt have different highlights. But the attractions of the venue aside, this was an excellent event which left me with a lot to think about.

By dkernohan