- Course: MSc Health Data Analytics
- Nationality: Italian
I came to the University of Leeds to study the MSc Health Data Analytics course because it perfectly matched my ambitions and objectives. It gave me the opportunity to become a Data Analyst and to help people through statistics, which is what fascinates me. There was a wide range of optional modules to choose from, and the compulsory modules were varied as well, designed to give you the best foundation for your future career.
I enjoyed everything about my course, from the modules (both compulsory and optional) to the time spent interacting with lecturers and course mates. It has been an intense course, but there has always been room for discussion, sharing ideas, and personal and professional development.
The highlight of the course has been the invaluable teaching on the important difference between modelling for prediction and modelling for causal inference. Not understanding this difference could lead to misleading results, which are potentially harmful for the general public. These theoretical discussions were well balanced with practical workshops that gave concrete evidence of the damage such misunderstanding could cause. The lecturers’ passion and commitment were a plus in conveying these principles.
For me, the greatest challenge has been communicating my results and methods in written reports. My undergraduate studies were purely exam-based, requiring a set of exercises and theoretical questions to be completed. Here I was asked to explain my reasoning so that my work would be reproducible. This was challenging because I was not used to it, but communication is essential for good science. It was even more difficult when the report had to be aimed at a non-technical audience, as I am a very technical person.
My research project stems from the PhysioNet Challenge 2019, an open machine learning challenge. This year’s aim is to build an algorithm to predict sepsis early enough for treatment to be effective and improve clinical outcomes. The dataset was fairly rich and “messy”, ideal for getting a taste of what real data look like. It required a lot of data cleaning, data visualisation and, of course, a lot of coding in both MATLAB and Python.
This research project has been my first original piece of work. It involved dealing with real issues that don’t usually occur when working on simulated data, as you might in a workshop. Working on real data means finding your way through it to gain the best insight and extract information to the best of your abilities, while taking all the possible limitations into account. Working independently and finding the right trade-off between the ideal and the doable have been incredibly valuable to me and will continue to be in the future.
Given that my project involves processing a large dataset and requires substantial computational power, I was given access to the ARC cluster. This is the University of Leeds supercomputer for advanced research, an incredibly powerful tool that allowed me to overcome all of my personal laptop’s technical limitations.