Categories
Research at Risk

Getting Lost In Your Data?

In parallel with the research data shared service, we are running an initiative with researchers from the pilot institutions, also known as ‘The Research Data Champions’. These are all the most brave researchers that are not only taking care of their data, but are also showing others how to do it properly. Annemarie Eckes is one of them, and she wrote a blog last year on data in environmental sciences. Below, she shares her experience in running a workshop on research data and metadata for PhD students.


                          Getting Lost In Your Data?

                              Ever Lost Your Data?

Is your data getting lost in translation?

                     Feeling Lost in getting started on data management?

                                          ?

On the 4th of December, we held a two-hour hands-on workshop on data management at the Department of Geography. Attendance was low, but the PhD students and Post Docs who attended (mostly from physical geography) enjoyed it and were very engaged.

“A very good broad overview of what needs to be considered, many aspects of which I have not adequately considered” (participant feedback in follow-up survey)

The workshop

The workshop consisted of a general presentation by the Cambridge Office of Scholarly Communication, with examples and information adjusted to be more relevant to Geography. Topics covered were:

Backup and File sharing

Does this sound like a trivial point? Everyone’s already doing it, right? Imagine that you suddenly lost your computer, or it broke, could you work? or would you be forced to take a holiday? Would your work be set back much? (for example, some solutions take more frequent snap shots) What if you moved to another position and lost your access to your institution’s server, what would happen then?

At the same time, it is important to think about where you are backing your up files to – whether it is compliant with the project’s data protection policies (for example: are the servers located in the EU?/are they EU compliant? Should you share sensitive data via email?)

Nevertheless, when new to the Department, it is important to learn about access and what services are available.

It is what someone has referred to as Schonfield’s Second Law: Data does not exist, unless there are two copies of it (in different places of course.)

How to organise your data well – including sample management

Everyone wants to have a nice and tidy digital files on their computer. But, nobody (except, to my continual jealousy, from my fiancé) manages to keep things organised. But it is not as easy as tidying your room, as you are not constantly exposed to the mess. There are a few things that can help a lot in this process, like getting inspired by recommended file naming conventions and taking on board things you think are useful to you ( e.g. https://www.data.cam.ac.uk/files/gdl_tilsdocnaming_v1_20090612.pdf). I wanted to have ” the perfect” example folder structure and apply it to my PhD project. It seems to be a developing skill. I would have liked to have started with the example on this website, it would have helped me much more especially in the beginning.

Also important are little messages to your future self and others (READMEs) on what this folder contains, how the files relate to each other, even just a few links on where you got the data from that this data contains. Or did you collect it yourself using XY and Z’s protocol or your own protocol, stored in//this/folder/ .?

Personal and Sensitive Data

The Department of Geography is very good in providing information on personal data, so the mentioning of this topic was quite short. We covered the Data Protection act, and  brushed on information management, which covers the legal bits and security issues behind personal data protection (I repeat the above example: are the servers located in the EU?/are they EU compliant? Should you share sensitive data via email?)

Needless to say: If you don’t have to collect it, don’t.

Data Sharing

There are plenty of unselfish reasons for sharing your data. A lot of the scientific knowledge we trust remains unchecked (follow @PolSciReplicate to learn more). We are in a reproducibility crisis. One more reason to share all of your data with the scientific community in order to reproduce and check existing results (also quite a selfish reason one might argue). Sharing is only half the story of course. It needs to be made FAIR as well, and one important aspect of that is the proper description of the data by use of “meta” data.

However, if those don’t convince you, there are also “selfish” reasons covered in this very good article by Florian Markowetz.

Importance of Metadata:

Want to learn about metadata in a fun way? Play with LEGO! The groups built little objects with LEGO, then had to describe their LEGO object. This description was then passed on to a different group, which tried to rebuild the LEGO figure, purely based on descriptions.

You can see how successful one group was in replicating the LEGO object (see fig 1). Now imagine this is how you described your data! Other issues are the handwriting – when something is digitalised, it is much easier to write. Some people had problems with the definition of words  (e.g. “What does ‘axial’ mean?”). This can be easily described when your description is also linked with definitions, or ontology terms, so that we can be sure we agree on the definition that is used for that particular exercise.

Data Management Plans

Data Management Plans are increasingly required as part of an application by many funders. They are used to explain what data types are generated, how they are managed, and where they will be disseminated after the project. But, this is not just another document to prepare for a grant proposal. It is useful to think about one’s data even before it is generated. It helps with getting the methods and experimental setups straight and, when working collaboratively, to agree on certain project-specific standards. For the early career researcher like me, getting into the habit and learning to think about this is definitely beneficial. There is a great tool online, if you want to learn more: https://dmponline.dcc.ac.uk/

New Year’s Resolution:

We finished with a data management new year resolution exercise: Everyone wrote their own data management new year resolution on a piece of paper, to their future self for next year. This will be mailed to the participants next year.

We hope that all their RDM-resolutions will come true!

 

Figure 1

How to learn about the necessity of metadata in a fun way:

Alternative 2

1) Original object built with LEGO built by one group, 2) The group’s description of the LEGO object . 3) This description was then passed on to a different group, the replicate object is built purely based on the descriptions.

 

Research data championAbout the author: Annemarie Eckes is a PhD student in the Department of Geography at Cambridge University. She is also one of the Jisc Research Data Champions. You can follow her work and updates on twitter @AnnehEckes.