The changing scope of my research

I started on my Masters research project back in March, and the progress of both the project itself and how I've felt about it since that time has been somewhat interesting.

For some context, the codebase that I've been contributing to as a part of my research is approximately 50 thousand lines of code long, and the PhD thesis that much of my work is based on is 204 pages long (as I've previously mentioned), so even getting a grasp on what exactly research project entailed took quite a long time.

Whilst students do pick their own research projects (and I selected the research team that I joined), the projects themselves are selected by the supervisors and then advertised. This unfortunately meant that whilst my supervisor had somewhat of a grasp on the project that I had selected, I myself had basically no clue. I have no background in physics (the most that I've done is high school physics, and definitely nothing to do with gravitational waves), so even understanding how everything fit together to do what it was intended to do was incredibly difficult - and it was made worse by the fact that the codebase has very few comments.

As such, a large part of my time up until now was simply me trying to get up to speed. I did a complexity analysis of a part of the codebase, with the sole purpose of attempting to understand the small section which that analysis encompassed. It definitely worked, but as time has continued and I've gotten a better idea of how everything fits together, it turned out that the section of code I analysed actually has little to do with my main part of my research - so all that work was for nothing.

To me, it always felt like the scope of my project was changing, even though in actually that scope never changed. More files moved into what felt like the scope of my project at exactly the same time other files were suddenly discounted. I went from expecting my entire project to be in C and CUDA to it being in only Python, and then back again. Most of this change was because of my understanding of what exactly my project entailed didn't fully mature until I had a full understanding of how everything was put together, and that understanding was vital to figuring out how exactly I was meant to go about solving the problem I was given.

I think that this is likely the case for lots of people in more theoretical research than my more implementation-focused research. As your understanding of the thing you are researching increases, the more likely it is that things that you may have previously thought important suddenly become irrelevant, and ideas that only seemed to be tangential are suddenly vital - and this seems to be the pattern of any complex long-term project.

I think that this is part of the reason why senior (or at least experienced) software engineers are so coveted. Whilst they may not be burning many story points for their own managers to see, they have enough experience with the codebase and exposure to different ways to solve problems to be able to make the scope for other people much smaller, and their entire team more efficient because of it. It's also why there's always an "on-boarding" period for new employees, so they are able to have a rudimentary understanding of the potential scopes of any problems they encounter.

With this in mind, I think there's a few items that I think massively help with preventing the sort of scope creep that I've experienced. In many ways these are probably just some more common sense general guidelines, and can probably be found in any "programming processes" textbook, but will still be helpful to list here.

Have useful comments and documentation

The codebase I've been working on has very very few comments, and those that it does have are generally just commenting out old versions of the code. This means that almost all the sources for how things work need to come from outside the codebase itself, and are usually one of two things - people or papers (as in peer-reviewed papers).

The massive downside to this is that if you want to find out how one very specific part of the codebase works, you either need to have someone who hasn't worked on that part of the code for potentially many years sit down and work through it with you, or you need to trawl one of 50 research papers in the hope that it mentions the specific function that you're looking for (and spoiler alert, chances are that none of them do). Having no useful comments and no useful documentation means that the process of on-boarding is incredibly time consuming and takes valuable work hours away from people that know the codebase well in order to explain the minutia of the code to the newcomers.

This is something that I've attempted to rectify with my research project - every addition has been commented on in its function and sometimes how/why it works. My hope is that the next person that needs to work on the same area of code would be able to make use of the work I've already done and not need to retrace my steps, but unfortunately the entire codebase is so massive in comparison to the scope of my work that the chance of any overlap is minute.

Have useful commit messages

This is another area that definitely needed improving. Before I began on this research project, commit messages often looked like this:

postcoh.c: fix a bug for output trigger->ifos when the ifos are LV

or

generic_init.sh: change Virgo quality bits with latest suggestion

For the first one, what was the bug? How does this change fix it? Or for the second one, what's the "latest suggestion"? What problem does it solve?

The commit messages that are in the repository are as useful documentation as the comments in the code themselves in many cases. If you can clearly articulate the reasoning behind changes, the problems that they solve and perhaps possible alternatives, then it makes understanding the progression of the codebase significantly easier - and running git blame actually would return useful information.

When it comes to writing commit messages, I try to follow this excellent blog post on commit messages, which suggests that you should try to use a text body of a commit message whenever possible to explain "what" and "why" the change has been made. There's also been a push from some of the other members of the research team to follow similar guidelines. The project goes under an external code review every 1.5 years or so, and being able to clearly show the reasoning for changes is something that helps with the efficiency of that review in addition to helping new team members understand what they're looking at.

Provide early and timely feedback to direct efforts

One of the reasons why the area of code that I did a complexity analysis on ended up being irrelevant for my project is because I had absolutely no idea what I was being asked to do in my project. I'd written a project proposal, had talked to my supervisors about doing a complexity analysis, did the complexity analysis, and the only feedback that I got the entire time was "no one has done a complexity analysis for this type of project before, I look forward to the results!". Whilst this was nice to hear, and the complexity analysis was fun to do, it felt all for naught when I realised that it wasn't really under the purview of my project. The many hours that I spent pouring over that part of the code, trying to understand every part and running some of my own benchmarks and performance analysis ended up being for something entirely irrelevant.

I don't think that this is entirely the fault of my supervisors - a large part of it also falls on to me for not clarifying what exactly was being asked of me, especially as I didn't really understand how the whole project fit together at the time - but some early feedback to properly direct my efforts into something that was actually relevant would have allowed me to finish my project significantly earlier than it was, and possibly even have time to extend the project.

Conclusion

I know I've spent the last thousand or so words complaining about things in my research project, but I have genuinely enjoyed my time doing research this year. The value of good comments and documentation, commit messages and the role of feedback in directing efforts are lessons that I will take onto my future projects and work to ensure that the feeling of massively changing scope in a static project does not happen again to me, nor anyone else.