A standard for Research Data Metrics?

In this blog post, David Kernohan gives an update on some work that we are developing on research use and potentially other associated metrics. He also takes the opportunity to share some information about a NISO working group on the topic. Please do share your thoughts on this issue and what you think we need to take into account. Thanks.

Any casual observer of the world of academic research would identify two key trends that dominate the scene:

  • “Openness” – funders are putting their weight behind the benefits of sharing research as widely as possible, including supporting data and methodological details.
  • “Impact” – for future REF iterations, a paper in, for example, a Nature journal is only part of the picture; a presentation of the way that research has influenced and shaped the field (or the wider world) via its underlying data is also important.

The most basic form of research data metrics demonstrates, as simply as possible, how many times the non-article results of research (be they data, methodology, software or anything else) have been downloaded. Clearly this is still only a part of the “impact” picture – what are people doing with the things they download? – but it is a useful part. Alongside other emerging methods such as data citations, and measures of social media interest, and tempered by a narrative case built around these metrics, a picture of the power of particular research outputs can be developed.

But we can and should be suspicious of simplistic metrics. Different answers to questions of inclusion and exclusion of download activity, and indeed what precisely constitutes a “download”, can mean that ostensibly similar statistics drawn from different sources are not safely comparable. How can we be sure these numbers have integrity and value?
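To make the comparability problem concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any standard or service: the log records are invented, and the 30-second "double-click" window and the function names are hypothetical choices. It simply shows that the same log yields different totals under two plausible definitions of a "download".

```python
from datetime import datetime, timedelta

# Hypothetical request log: (timestamp, user id, item id). Invented for illustration.
LOG = [
    (datetime(2014, 1, 1, 9, 0, 0), "u1", "dataset-A"),
    (datetime(2014, 1, 1, 9, 0, 10), "u1", "dataset-A"),  # repeat after 10 seconds
    (datetime(2014, 1, 1, 9, 5, 0), "u1", "dataset-A"),   # repeat after 5 minutes
    (datetime(2014, 1, 1, 9, 6, 0), "u2", "dataset-A"),
]


def count_every_request(log):
    """Rule 1: every logged request counts as one download."""
    return len(log)


def count_with_double_click_filter(log, window=timedelta(seconds=30)):
    """Rule 2: ignore a repeat request by the same user for the same item
    if it arrives within `window` of that user's previous request.
    The 30-second default is an assumption chosen for this example."""
    last_seen = {}
    total = 0
    for timestamp, user, item in sorted(log):
        key = (user, item)
        previous = last_seen.get(key)
        if previous is None or timestamp - previous > window:
            total += 1
        last_seen[key] = timestamp
    return total


print(count_every_request(LOG))             # 4
print(count_with_double_click_filter(LOG))  # 3
```

Neither total is wrong; they answer slightly different questions. Two services reporting "downloads" for the same dataset could therefore disagree unless they share the counting rule.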

Enter COUNTER.

COUNTER is a twelve-year-old standard originally designed to monitor article downloads in closed-access journals. It has been modified and extended over the years by a small but committed community of paid staff and adherents to tackle other content types, for example databases and e-books, but more work remains to be done in order to fully make use of it in a multi-format, open license, multi-site world.

In UK universities and colleges you’ve most likely come across COUNTER as a part of your library’s monitoring of e-resource use, or as part of Jisc’s Journals Usage Statistics Portal, or the Jisc repositories usage statistics service, IRUS-UK. The latter aggregates download statistics for the same item in different spaces and provides a summary that can be used to demonstrate article-level or collection-level interest, repository use or similar. IRUS-UK is based on the COUNTER specifications, which means that the statistics it reports are reliable and comparable according to the COUNTER standard. IRUS-UK was built on the work of two earlier Jisc projects, PIRUS and PIRUS2.

If you use IRUS-UK (more than half of UK higher education institutions already do) you’ll know that it gives you download statistics for anything in a repository, and that this includes data if that’s what you’ve put in there. However, it seems likely that different methodologies will need to be developed if we are to estimate research data usage accurately. This will be especially important as the data storage requirements of some funders – retention for 10 years from the date of last access – mean that such metrics are likely to be used to make decisions around storage methods and thus institutional spending.

Subject-area data repositories, managed variously by disciplinary groups, funders and journals, are another key use case for data usage statistics. In each of these situations storage is offered on behalf of the community.

I’ve recently become a member of a NISO altmetrics working group. We’ve held two meetings so far and there is a clear appetite to work closely with COUNTER in applying the standard to data metrics. In particular, there have been discussions about the way the standard would deal with “robots”, suggesting a required separation between crawlers like the “google web spider” that index content for a search engine and the smaller, personally managed ‘bots used in research. This conversation echoed the findings of a 2013 IRUS-UK/Information Power report into “identifying unusual usage”: both will feed into an ongoing COUNTER robots working group led by Paul Needham of Cranfield University.
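As a rough illustration of the kind of separation under discussion, the Python sketch below sorts requests into three groups by their user-agent string. The pattern lists, category names and function name are hypothetical and deliberately tiny; they are not drawn from COUNTER, NISO or IRUS-UK, and real robot detection would also need to look at behaviour, since a user-agent string is easy to change.

```python
import re

# Illustrative patterns only; a real list would be far longer and maintained over time.
INDEXING_CRAWLERS = re.compile(r"googlebot|bingbot", re.IGNORECASE)
SCRIPTED_CLIENTS = re.compile(r"python-requests|curl|wget", re.IGNORECASE)


def classify_user_agent(user_agent):
    """Return a coarse, hypothetical category for a request's user-agent string."""
    if INDEXING_CRAWLERS.search(user_agent):
        return "indexing-crawler"   # e.g. a search engine spider
    if SCRIPTED_CLIENTS.search(user_agent):
        return "scripted-client"    # e.g. a researcher's own harvesting script
    return "presumed-human"


for agent in [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "python-requests/2.31.0",
    "Mozilla/5.0 (Windows NT 10.0; rv:120.0) Gecko/20100101 Firefox/120.0",
]:
    print(classify_user_agent(agent))
```

Keeping the two robot categories apart matters because they may deserve different treatment: indexing traffic tells us little about research use, whereas a script pulling data for analysis arguably is research use.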

The purpose of the working group is to:

  • Develop definitions for appropriate metrics and calculation methodologies for specific output types. Research outputs that are currently underrepresented in research evaluation will be the focus of this working group. This includes research data, software, and performances, but also research outputs commonly found in the social sciences. The working group will come up with recommendations for appropriate metrics for these research outputs and will develop guidelines for their use.
  • Promote and facilitate the use of persistent identifiers in scholarly communications. Persistent identifiers are needed to clearly identify research outputs for which collection of metrics is desired, but also to describe their relationships to other research outputs, to contributors, institutions and funders. This working group will work closely with other initiatives in the space of identifiers.

As you can imagine, discussions are wide-ranging and cover all forms of metrics, including citations and social media. Members are drawn from a range of bodies around the world, including publishers, service providers and experts from within academia. We’ve touched on things like Thomson Reuters Citation Index (and will be hearing more about developments here in a future meeting), and NISO work including the next version of JATS.

There is an impact on incoming Jisc activity too. As a part of the “Research at Risk” group of projects we’ve been challenged to come up with a tool or service for research data metrics, so this international work on COUNTER, and the ongoing experience of the California Digital Library/PLOS/DataOne project “Making Data Count”, will all be valuable inputs to the activity we are currently scoping. It seems we have a great chance right now to work together in this space and make data metrics work in a way that supports quality research.

Given the NISO group and the activity on research data use and metrics we’re scoping, I’d really like to hear from you on any issues, use cases or initiatives you think need to be taken into account. If you’ve time, please comment liberally.

About the author:

David Kernohan is senior co-design manager at Jisc. He works on research data management and open education. You can follow him on Twitter @dkernohan.