Can blockchain be applied to Research Data Management?

At a number of recent events I’ve had the privilege to attend, in particular PASIG NYC and iPRES, the magical incantation “We could/should use blockchain for this” has been uttered by speakers more than once (to the delight of those who play buzzword bingo at such affairs). There was very much a feeling amongst some participants (at least those contributing to the back channels) that blockchain was being mentioned simply because it is in vogue at present, if not the flavour of the month, the implication being that it quite possibly wouldn’t be particularly useful in the scenarios identified.

Quite how the commentators arrived at this opinion wasn’t clear, as there was also a consensus on the same back channels that people didn’t really understand how blockchain worked, how it could be applied in the situations the speakers described, and hence what potential blockchain had to be a game changer (in this case in the fields of Preservation and Archiving).

It just so happens that I recently took part in an internal Jisc event where we set about answering those questions (or rather, framing the questions in more detail; the answers may take a while longer), so a short blockchain primer may help to shed some light on the problem.

Characteristics

To understand the usefulness (or otherwise) of blockchain in any given situation it’s necessary to understand just what it is. Many people I’ve spoken to tend to equate blockchain with digital crypto-currencies, which is understandable when you consider that many people first came across the term in relation to bitcoin. Well, crypto-currency is an application of blockchain, not blockchain in and of itself.

Blockchain is in effect a method of recording transactions (a ledger) in such a way that the list of historic transactions cannot be changed: an immutable ledger or global truth. The underlying technology utilises cryptography (in particular hash functions, where a small change of input produces a large change of output), distributed copies (everyone in a peer-to-peer network has a copy of the ledger), and linked blocks (each new block in a chain links back to the previous block, hence the term blockchain). The really clever bit is that it’s not just the data within a block that is hashed: each block also carries the hash of the block before it, so the links themselves are protected by hashing. So not only is it possible to see if data within a block has been changed, it’s also, for all practical purposes, impossible to insert a block anywhere on the chain apart from the end without breaking the links.
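To make the linking idea concrete, here is a minimal Python sketch of a hash chain, assuming SHA-256 as the hash function and JSON to serialise each block. It runs on a single machine only: there is no peer-to-peer network, no mining and no consensus, so it illustrates tamper evidence rather than being a working blockchain. The function names (`block_hash`, `build_chain`, `is_valid`) are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(data: str, prev_hash: str) -> str:
    """Hash a block's data together with the previous block's hash (the link)."""
    payload = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: list[str]) -> list[dict]:
    """Each block stores its data, the previous block's hash and its own hash."""
    chain = []
    prev_hash = GENESIS_PREV
    for data in records:
        h = block_hash(data, prev_hash)
        chain.append({"data": data, "prev_hash": prev_hash, "hash": h})
        prev_hash = h
    return chain


def is_valid(chain: list[dict]) -> bool:
    """Recompute every block's hash and check every link back to its predecessor."""
    prev_hash = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False  # this block no longer points at the block before it
        if block_hash(block["data"], block["prev_hash"]) != block["hash"]:
            return False  # the block's contents changed after it was added
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    chain = build_chain(["dataset v1 deposited", "dataset v1 cited", "dataset v2 deposited"])
    print(is_valid(chain))  # True
    chain[1]["data"] = "dataset v1 withdrawn"  # tamper with a historic entry
    print(is_valid(chain))  # False
```

Changing the data in a block breaks that block's own hash; recomputing the hash to cover the change breaks the `prev_hash` link in the next block instead. A forger would have to rewrite every later block, which is what the distributed copies held by everyone else on the network are there to expose.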

It’s also worth noting here that there isn’t one single blockchain technology. There are now alternatives to the computationally intensive “mined” (proof-of-work) blockchains such as the one used by bitcoin.

So what?

What can you do with this immutable ledger? Well, for a start, you can use it to prove provenance indisputably, a use with wide-ranging applications. You can also use it to dramatically lower process “cost”. If there is a single, distributed ledger holding the global truth, then situations where multiple ledgers need to be reconciled in order to complete a transaction (some of them possibly off-line or even wrong) can be avoided altogether. It also engenders trust and hence allows for process disintermediation: trusted third parties are no longer required to manage transactions.

On the other hand, there are definitely situations where, although a blockchain solution is potentially possible, it may not really be sensible. Gideon Greenspan came up with a number of tests to consider in order to avoid a “pointless blockchain project”, which are well worth keeping in mind:

is a technology for databases with multiple writers required?
is there a failure of trust between those writing to the set of databases?
is there a requirement for cross-correlation or interaction between those writing to multiple databases?
is there a requirement to exclude third parties from transactions – for disintermediation?
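These questions can be read as a simple checklist. The Python sketch below encodes the four questions as listed here, under the simplifying assumption that a blockchain is only worth considering when every answer is “yes”. The class and field names are hypothetical, and the repository scenario in the demo is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProjectAssessment:
    """Answers to the four questions above (hypothetical field names)."""
    multiple_writers: bool          # is a database with multiple writers required?
    lack_of_trust: bool             # is there a failure of trust between the writers?
    writers_interact: bool          # must the writers' transactions cross-correlate or interact?
    disintermediation_wanted: bool  # must third parties be excluded from transactions?


def failed_tests(a: ProjectAssessment) -> list[str]:
    """Return the names of any questions answered 'no', in the order listed."""
    return [name for name, answer in vars(a).items() if not answer]


def blockchain_worth_considering(a: ProjectAssessment) -> bool:
    """Assumption: only worth pursuing if every question is answered 'yes'."""
    return not failed_tests(a)


if __name__ == "__main__":
    # Hypothetical: one institutional repository whose depositors all trust the institution.
    repo = ProjectAssessment(
        multiple_writers=True,
        lack_of_trust=False,
        writers_interact=True,
        disintermediation_wanted=False,
    )
    print(failed_tests(repo))  # ['lack_of_trust', 'disintermediation_wanted']
    print(blockchain_worth_considering(repo))  # False
```

Listing which tests fail, rather than returning a bare yes/no, makes it easier to see whether an ordinary shared database would do the job instead.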

And research data?

Now this is where it gets really interesting for those of us involved in research data management. There are many potential applications and indeed a few that are even sensible.

For instance, one of the problems associated with the verification of research and research data is proving that the data linked to a paper is the same data that was published with the paper. There have been cases where unscrupulous researchers have been found to have altered the underlying data to suit changing conditions. If the data is hashed and the resulting hash is stored in a blockchain ledger, then any attempt to change the data would be immediately apparent, because the altered data would no longer produce the recorded hash.
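The hashing step itself is straightforward to sketch. The Python below, assuming SHA-256 and a dataset held in a single file, computes a fingerprint at publication time and later checks the file against it. How the fingerprint is written to and read back from a ledger is deliberately left out; `fingerprint` and `matches_published` are hypothetical names, and the sample file is invented.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks so large datasets need not fit in memory."""
    sha = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def matches_published(path: Path, recorded_hash: str) -> bool:
    """Compare a dataset with the hash recorded (hypothetically on a ledger) at publication."""
    return fingerprint(path) == recorded_hash


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data = Path(d) / "results.csv"
        data.write_text("sample,value\nA,1.0\n")
        published = fingerprint(data)  # this value is what would go on the ledger
        print(matches_published(data, published))  # True
        data.write_text("sample,value\nA,1.5\n")  # a quiet change to the data
        print(matches_published(data, published))  # False
```

Only the hash goes on the ledger, not the data itself, so the dataset can stay in its usual repository however large or sensitive it is.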

Blockchain could also be applied to data provenance and might well prove to be an elegant mechanism for providing researchers with data citations alongside their more traditional publication citations.

What else?

What else could blockchain be used for in the research data management world? This is where you come in. Let us know your ideas both here and on the Research Data Network site.

These and similar topics are regularly discussed at the Research Data Network events, the next one of which is taking place at the University of St Andrews on the 30th of November. If you are interested in attending, do register following this link.

If you’re really keen on blockchain, you might also wish to suggest a Birds of a Feather session at the upcoming event or at a future one.

By Paul Stokes