UK Data Service guest post: providing access to sensitive data – using the 5 Safes

The management of sensitive data is becoming a common concern across research disciplines and institutions within the UK. This has been clearly demonstrated by the many discussions arising on this subject, as well as by a few of the proposals submitted to research data spring (managing sensitive data, information governance for clinical research studies, AMASED: Access Methods for Analysing Sensitive Data).

As part of taking forward our research data management activities, in our case the Research at Risk co-design challenge, we need to evaluate, learn from and draw on existing leading solutions for managing sensitive data in a range of disciplines. The post below, from Louise Corti and Richard Welpton of the UK Data Service (UKDS) and the Administrative Data Research Network (ADRN), highlights the services they provide to UK researchers, particularly in the social science community.

At Jisc we are already working with some ADRN sites and initiatives through the safe share project, which is trialling higher-assurance network links between centres for sensitive data, and methods of higher-assurance access management for users. This also includes a strong theme of supporting developments in medical and biomedical research, e.g. the Farr Institute. The standards of information governance for health data are primarily set by the NHS Information Governance Toolkit (IGT) in England, with similar standards in the other home countries. Jisc is also working with the University of Leicester on the BRISSKit project, which addresses active management of personally identifiable, anonymised and pseudonymised data in the context of the IGT. Jisc has also worked with anchor tenants from the research and education sector to provide the first shared data centre for medical and academic research in the UK, and this is a platform for further developments such as eMedLab, which is to be housed there.

Given the wide interest in and concern about data security, there is certainly a need to ensure that existing solutions are built upon and applied in other contexts, and it seems that we need to do more to avoid duplicating effort in this area. In essence, the research sector needs to draw on existing best practice and solutions. We would like to consider how we can better surface and share best practice for UK universities. The issue is of course international: only this week, in discussions with the Knowledge Exchange, the topic arose and many of the partners see similar needs in their countries.

Thank you Louise and Richard for your very helpful contribution to this.

Providing access to sensitive data: using the 5 Safes

This post is in response to the thread on the data management jiscmail list regarding handling and providing access to sensitive data. While the list vigorously discussed possible solutions, including setting up a new working group under RDA, I wanted to highlight tried and tested models already in existence in the UK. In the UK, USA and Germany millions of pounds have been invested in developing and implementing a workable pathway for assessing, handling and accessing sensitive data. I would make a plea not to reinvent any wheels for data access solutions before examining whether this model is useful.

The Economic and Social Research Council, via BIS, has invested millions of pounds over the past 5 years, first in the Secure Data Service at Essex from 2010 (now part of the UK Data Service) and then, from 2013, in the Administrative Data Research Network (ADRN), coordinated by the Administrative Data Service, also based at Essex. This funding has allowed us to develop the Safe People, Safe Projects, Safe Settings, Safe Outputs and Safe Data protocols (the 5 Safes), and to test and fully implement designated Approved and Accredited Researcher pathways that both UK government and ESRC currently use for user approval and access to sensitive data.

Here is a bit more detail provided by our resident expert, Dr Richard Welpton. Please also note our forthcoming training course on September 17/18th for this community on these protocols.

The UK Data Archive has provided safe and secure access to confidential and sensitive microdata for nearly four years now. First under the Secure Data Service and, since October 2012, through the UK Data Service, we have provided secure remote access to data deemed too confidential for download. Our remote access solution, known as the Secure Lab, enables bona fide researchers who have completed the necessary steps to log in to a secure server based at the UK Data Archive at the University of Essex and access data for analysis. Once the researcher’s analyses are completed, the statistical results are screened by experienced staff to ensure that no link can be made between the results and the original data, thereby preserving the confidentiality of the data subjects. The results, when declared ‘safe’, are returned to the researcher.

This service builds upon international best practice of secure data access, established throughout the world by initiatives such as the UK Office for National Statistics Virtual Microdata Laboratory, and the NORC Data Enclave at University of Chicago.  We operate the facility on five simple protocols:

Safe People

Only ‘trusted researchers’ from UK Higher Education Institutions and other ESRC-funded research institutes may access data through the Secure Lab. These are researchers whose interest in accessing the data is purely to serve the ‘public good’. To apply, researchers must register with the UK Data Service using their institution credentials. They must then submit a full project application, describing clearly their research proposal, their data requirements, and a justification for accessing these data (explaining why less sensitive sources are not sufficient). They must also complete an ‘individual’ application, in which they demonstrate their suitability for accessing and handling such data (for example, they must have prior experience of handling such data, or be supervised by a colleague who has). In addition, they must read and sign a User Agreement, which must also be counter-signed by a legal representative of their institution. Finally, they undertake a mandatory full-day training course, which covers the following topics: the legal and ethical responsibilities of accessing confidential/sensitive data; statistical disclosure control; and using the Secure Lab.
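The applicant-side steps described above can be pictured as a simple checklist. The following is only an illustrative sketch in Python: the class name, field names and methods are hypothetical and are not part of any UK Data Service system, and the sketch covers only the steps the researcher completes, not the review carried out by the service or the data owners.

```python
from dataclasses import dataclass


@dataclass
class SecureLabApplication:
    """Hypothetical checklist of the applicant-side steps named in the post."""

    registered_with_institution_credentials: bool = False
    project_application_submitted: bool = False
    individual_application_submitted: bool = False
    user_agreement_signed: bool = False
    user_agreement_countersigned: bool = False
    training_completed: bool = False

    def outstanding_steps(self):
        # List the names of the steps that are not yet complete,
        # in the order the fields are declared.
        return [name for name, done in vars(self).items() if not done]

    def all_steps_complete(self):
        # True only when every step on the checklist has been ticked off.
        return not self.outstanding_steps()


application = SecureLabApplication(
    registered_with_institution_credentials=True,
    project_application_submitted=True,
    individual_application_submitted=True,
    user_agreement_signed=True,
)
print(application.outstanding_steps())
```

In this example the counter-signature and the training course are still outstanding, so the checklist would not yet be complete.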

Safe Projects

We only allow projects to be undertaken in the Secure Lab which ‘serve the public good’.  Projects that would try to exploit confidential information, and indeed identify and exploit data subjects, are strictly prohibited.  Researchers must explain how their research will benefit society when they apply.

Safe Settings

The UK Data Archive provides the Secure Lab, which is a secure facility for providing access to confidential/sensitive data. We use Citrix secure remote access technology, which is frequently used by the banking and military sectors and is renowned for its robustness. In addition, the UK Data Archive is accredited for the ISO 27001 Information Security standard. This means that the Archive operates an Information Security Management System, creating a culture of information security, continuous improvement and vigilance among our staff, who apply best-practice standards for handling confidential data. We also hire ‘ethical hackers’ annually to undertake penetration tests of our secure servers. The results of these tests, and the outcomes of our 6-monthly ISO surveillance audits, are made available to the dozen or so Government Departments and other agencies that regularly supply us with confidential data for researchers to use.

Safe Outputs

Only statistical outputs (results) that have been screened by staff, to ensure they cannot be used to identify the data subjects, can be released to the researcher. These typically include ‘descriptive statistics’ that have been sufficiently aggregated that identification is near enough impossible, and modelled output (regression coefficients etc.), which is inherently non-confidential.
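One common kind of check in statistical disclosure control is a minimum cell count rule for frequency tables: cells built from too few records are withheld. The sketch below illustrates that general idea only. It is not the Secure Lab's actual screening procedure, which is carried out by staff, and the threshold of 10 and the function name are assumptions chosen for the example.

```python
from collections import Counter

# Assumed threshold for illustration only; it is not taken from the post.
MIN_CELL_COUNT = 10


def screen_frequency_table(records, key, min_count=MIN_CELL_COUNT):
    """Count records per category and suppress any cell below min_count.

    Suppressed cells are returned as None so that small groups,
    which are easier to re-identify, are not released.
    """
    counts = Counter(record[key] for record in records)
    return {
        category: (n if n >= min_count else None)
        for category, n in counts.items()
    }


# Made-up example data: 12 records in region A, 3 in region B.
records = [{"region": "A"}] * 12 + [{"region": "B"}] * 3
screened = screen_frequency_table(records, "region")
print(screened)
```

Here the cell for region A would be released, while the cell for region B falls below the assumed threshold and is suppressed.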

Safe Data

This is a misnomer. Because of the high standards described above, we are able to provide access to ‘unsafe’ data. These are data in which the subjects are relatively easy to identify (such as business data), and data about individuals that include sensitive variables (e.g. the sexual preferences of young people, children’s school results etc.). Of course, direct identifiers such as names and addresses have been removed, but the data are still confidential/sensitive, and are considered ‘personal’ under the Data Protection Act.

Working with others

We work extensively with other organisations to promote and practise safe access to confidential data. We regularly meet our partners at the HMRC Datalab, the Ministry of Justice Datalab and the Office for National Statistics Virtual Microdata Laboratory to share our experiences. The addition of the Administrative Data Research Network has led to new thinking on secure data access, and we work very closely with this network, regularly sharing policies and procedures to enable a consistent and joined-up approach. For example, we have recently completed a National Researcher Accreditation training course with our partners, which will be run from April 2015. We also intend to create a working group to consider a consistent approach to Statistical Disclosure Control. And we regularly meet our counterparts in Germany and France: the recent EU-funded Data without Boundaries project has provided an opportunity to achieve consensus on secure access at a European level.

In our efforts to provide secure access to confidential data, we remain very willing to engage with new service providers.  We believe we can all benefit from sharing best practice and learning from everybody’s experiences.  Despite different data collections and contexts, there are many similarities in this field, and rather than reinventing the wheel, we encourage dialogue with existing services.

Forthcoming workshop

We are holding a two-day workshop in Manchester on 17/18th September which will explain how the 5 Safes work. Details will be circulated soon.


About the authors:

Louise Corti is an Associate Director at the UK Data Archive and heads the UK Data Service functional areas of Collections Development and Producer Relations. You can contact her via Twitter @LouiseCorti.

Richard Welpton plans, oversees and co-ordinates the work of the UK Data Service’s Secure Access team, which manages access to sensitive/restricted data sources through the Secure Data Service and the Safe Centre. You can contact him via @rwelpton.