Breaking things one pointer at a time

Tracking myself

Partly inspired by Julian Lehr's quantified self, I'm planning on trying to track myself. I don't expect that I'll actually start on this for a while, as there are a number of tools I have to figure out before I can properly get into the swing of things, but I figured that it'd be worth writing down my reasons why I want to do this. In future weeks, I'll cover how I'm building the tools to let me track myself.

I'm generally a pretty privacy-focused person. Any applications that I use always have telemetry turned off, my social media is basically whisper quiet unless it's via private messages, and I even borrowed a laptop from the university so I wouldn't have to use exam software that required root privileges on my own hardware.
Tracking myself - and posting it on this blog like Julian Lehr does - seems to run pretty contrary to all sorts of privacy principles. So why on earth would I want to do it?

I've always sort of considered myself a numbers guy. In primary school (in Australia, that's ages 4-12), I was the "math guy" in the class, and had by far the best marks in maths in my school (although I don't think I even made the top quartile in high school), and have always found maths that I can apply to be fun and interesting. It's part of the reason that I got into programming in the first place.

I'm also generally well convinced by numbers that are put before me in order to inspire some continued positive action. Watching the numbers on the weights at the gym go up, feeling how good it was to run 5km slightly faster than before, and seeing my weight go down (even if only a bit) has encouraged me to regularly exercise far more than the nebulous "it's good for you" that my parents always told me throughout my teenage years.

As such, I am somewhat convinced that tracking myself - and watching trends for various metrics of how my time is spent - will help me narrow down and improve things in my life. Perhaps I can see if I've spent too long on a particular weight at the gym, and should go to the next weight up. Maybe I can spot transport routes that would be slightly more efficient than what I currently have. I could even find that I spend half my time doing nothing useful at all, and thus need to come up with strategies to improve productivity.

There's an extent to which doing this is good for my privacy as well. By tracking myself, I will be able to get a better understanding of what sort of data companies have on me. My smart-watch (a Garmin VivoActive 4) will probably be used for most of the data, and frankly the amount of data that it can/does have on me is slightly terrifying - especially now that I've gone through what I can get from Garmin's API. I'll be able to go digging to see what Amazon has on me (Kindle/Audible), what Google has figured out (Android/Google Assistant), and how Spotify could figure me out. In the process of tracking myself, I'll become more aware of how others can track me too.

In addition, as long as my data isn't too fine-grained, privacy shouldn't be too much of an issue. I don't plan on writing detailed logs of exactly where I've been - if my house shows up anywhere, then you know that my data collection has gone too far. I don't even plan on sharing any GPS-type metrics at all, save what city I'm in.

I imagine that it might take me some time to even get the right sort of data for any useful analysis, and whilst I don't really know what I might find out, I'm excited to see!


The changing scope of my research

I started on my Masters research project back in March, and the progress of both the project itself and how I've felt about it since that time has been somewhat interesting.

For some context, the codebase that I've been contributing to as a part of my research is approximately 50 thousand lines of code long, and the PhD thesis that much of my work is based on is 204 pages long (as I've previously mentioned), so even getting a grasp on what exactly my research project entailed took quite a long time.

Whilst students do pick their own research projects (and I selected the research team that I joined), the projects themselves are selected by the supervisors and then advertised. This unfortunately meant that whilst my supervisor had somewhat of a grasp on the project that I had selected, I myself had basically no clue. I have no background in physics (the most that I've done is high school physics, and definitely nothing to do with gravitational waves), so even understanding how everything fit together to do what it was intended to do was incredibly difficult - and it was made worse by the fact that the codebase has very few comments.

As such, a large part of my time up until now was simply me trying to get up to speed. I did a complexity analysis of a part of the codebase, with the sole purpose of attempting to understand the small section which that analysis encompassed. It definitely worked, but as time has continued and I've gotten a better idea of how everything fits together, it turned out that the section of code I analysed actually has little to do with the main part of my research - so all that work was for nothing.

To me, it always felt like the scope of my project was changing, even though in actuality that scope never changed. More files moved into what felt like the scope of my project at exactly the same time other files were suddenly discounted. I went from expecting my entire project to be in C and CUDA to it being in only Python, and then back again. Most of this change was because my understanding of what exactly my project entailed didn't fully mature until I had a full understanding of how everything was put together, and that understanding was vital to figuring out how exactly I was meant to go about solving the problem I was given.

I think that this is likely the case for lots of people in more theoretical research than my more implementation-focused research. As your understanding of the thing you are researching increases, it becomes more likely that things you previously thought important suddenly become irrelevant, and ideas that only seemed to be tangential are suddenly vital - and this seems to be the pattern of any complex long-term project.

I think that this is part of the reason why senior (or at least experienced) software engineers are so coveted. Whilst they may not be burning many story points for their own managers to see, they have enough experience with the codebase and exposure to different ways to solve problems to be able to make the scope for other people much smaller, and their entire team more efficient because of it. It's also why there's always an "on-boarding" period for new employees, so they are able to have a rudimentary understanding of the potential scopes of any problems they encounter.

With this in mind, there are a few items that I think massively help with preventing the sort of scope creep that I've experienced. In many ways these are probably just common sense general guidelines, and can probably be found in any "programming processes" textbook, but they will still be helpful to list here.

Have useful comments and documentation

The codebase I've been working on has very very few comments, and those that it does have are generally just commenting out old versions of the code. This means that almost all the sources for how things work need to come from outside the codebase itself, and are usually one of two things - people or papers (as in peer-reviewed papers).

The massive downside to this is that if you want to find out how one very specific part of the codebase works, you either need to have someone who hasn't worked on that part of the code for potentially many years sit down and work through it with you, or you need to trawl through one of 50 research papers in the hope that it mentions the specific function that you're looking for (and spoiler alert, chances are that none of them do). Having no useful comments and no useful documentation means that the process of on-boarding is incredibly time-consuming and takes valuable work hours away from people that know the codebase well in order to explain the minutiae of the code to the newcomers.

This is something that I've attempted to rectify with my research project - every addition has comments describing what it does and sometimes how/why it works. My hope is that the next person that needs to work on the same area of code will be able to make use of the work I've already done and not need to retrace my steps, but unfortunately the entire codebase is so massive in comparison to the scope of my work that the chance of any overlap is minute.

Have useful commit messages

This is another area that definitely needed improving. Before I began on this research project, commit messages often looked like this:

postcoh.c: fix a bug for output trigger->ifos when the ifos are LV

or

change Virgo quality bits with latest suggestion

For the first one, what was the bug? How does this change fix it? Or for the second one, what's the "latest suggestion"? What problem does it solve?

In many cases, the commit messages in the repository are as useful a form of documentation as the comments in the code itself. If you can clearly articulate the reasoning behind changes, the problems that they solve and perhaps possible alternatives, then it makes understanding the progression of the codebase significantly easier - and running git blame would actually return useful information.

When it comes to writing commit messages, I try to follow this excellent blog post on commit messages, which suggests that you should use the text body of a commit message whenever possible to explain what the change is and why it has been made. There's also been a push from some of the other members of the research team to follow similar guidelines. The project undergoes an external code review every 1.5 years or so, and being able to clearly show the reasoning for changes is something that helps with the efficiency of that review in addition to helping new team members understand what they're looking at.
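To make the "subject plus body" shape concrete, here is a minimal sketch of a check that could flag messages like the two above. It is only an illustration: the function name check_commit_message and the 50-character subject limit are my own assumptions (the limit is a common convention, not something the research team mandates).

```python
def check_commit_message(message, max_subject_len=50):
    """Return a list of problems found in a commit message (empty if none).

    Hypothetical helper: the rules and the default subject length are
    assumptions for illustration, not a team standard.
    """
    problems = []
    lines = message.strip("\n").split("\n")

    # The first line is treated as the subject.
    subject = lines[0].strip()
    if not subject:
        problems.append("missing subject line")
    elif len(subject) > max_subject_len:
        problems.append(f"subject longer than {max_subject_len} characters")

    # Everything after the subject should be a blank line, then a body.
    rest = lines[1:]
    if not any(line.strip() for line in rest):
        problems.append("no body explaining what changed and why")
    elif rest[0].strip():
        problems.append("no blank line between subject and body")

    return problems


if __name__ == "__main__":
    print(check_commit_message("change Virgo quality bits with latest suggestion"))
```

Something like this could be called from a git commit-msg hook, which is passed the path of the file holding the message, although a check like this can only enforce the shape of a message and not whether the body is actually useful.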

Provide early and timely feedback to direct efforts

One of the reasons why the area of code that I did a complexity analysis on ended up being irrelevant for my project is that I had absolutely no idea what I was being asked to do in my project. I'd written a project proposal, had talked to my supervisors about doing a complexity analysis, did the complexity analysis, and the only feedback that I got the entire time was "no one has done a complexity analysis for this type of project before, I look forward to the results!". Whilst this was nice to hear, and the complexity analysis was fun to do, it felt all for naught when I realised that it wasn't really under the purview of my project. The many hours that I spent poring over that part of the code, trying to understand every part and running some of my own benchmarks and performance analysis, ended up being for something entirely irrelevant.

I don't think that this is entirely the fault of my supervisors - a large part of it also falls on me for not clarifying what exactly was being asked of me, especially as I didn't really understand how the whole project fit together at the time - but some early feedback to properly direct my efforts into something that was actually relevant would have allowed me to finish my project significantly earlier than I did, and possibly even have time to extend the project.


I know I've spent the last thousand or so words complaining about things in my research project, but I have genuinely enjoyed my time doing research this year. The value of good comments and documentation, commit messages and the role of feedback in directing efforts are lessons that I will take into my future projects and work, to ensure that the feeling of massively changing scope in a static project does not happen again to me, nor to anyone else.


Rescheduling when I do blog posts

It's been almost a full week since my last blog post, and I should probably discuss why.

When I originally restarted this blog, I intended to write one blog post per day. Barely three weeks later and it seems like I've given up all hope of trying for that trend.

I found, not long after I started, that when I wrote about things that I'd given some thought to, I wrote a lot more words than I thought I would. Unfortunately, I am limited by my ability to type fast and get my words onto my computer. This often means that even if I have a fully-formed and planned out idea, it can easily take me 45 minutes to an hour to write.

I'm almost the entire way through my Masters, and the semester truly has gotten back into the swing of things. I have mid-semester tests coming up, and assignments are beginning to fall due. In addition, my thesis will soon need to be submitted, which means that my time is at a premium.

As such, I'm instead going to move to writing one blog post a week, starting next week on Sunday. I think this should give me the time to be able to spend on my other assignments and allow me to properly spend the time on these posts.

I should probably say that this one blog post a week isn't due to a lack of ideas - I almost have more ideas written down than I have taken days off writing! Unfortunately, I just need to put more time into my university work for the moment.


My ideal computer at the moment

Chris Siebenmann recently wrote about how his ideal machine isn't in an existing category. Interestingly, I find myself in the same sort of situation, but with an almost entirely different conclusion.

As I have previously mentioned, I am of the opinion that as the number of machines you try to do work on increases, the difficulty of synchronization between those machines increases exponentially. This means that for me, an ideal machine would be easily portable so that it can be used anywhere with minimal effort to transport. This effectively leaves me with the options of either a laptop (as in an ultrabook) or a tablet.

Tablets are in an interesting position right now where they are continuously increasing in power and ability, but still lack many of the features that I'd require in a daily driver machine, namely the ability to easily compile code locally, easy access to the terminal and a keyboard-driven UI (tiling window managers truly have gotten the better of me). Thus, I am left with the sole choice of a laptop.

Now the nitpicking starts: what sort of specs do I want in a laptop? Well, as you may have picked up from my rant about why esports will never be mainstream, I play CS:GO on a regular basis, and also enjoy casually playing a number of single-player games such as BioShock and Civilization. As such, I would quite like to be able to continue playing these games, as they are my primary source of leisure.

Say what you will about their power, but gaming laptops are (generally speaking) both incredibly heavy and immobile, and very difficult to upgrade. I have no intention of replacing my laptop every 1-2 years just so I can upgrade my GPU. So for me, the option that makes the most sense is a laptop with an external GPU enclosure, so the GPU can be upgraded independently of the rest of the machine. This would also allow me to dock my machine when I get home and get use out of the high refresh rate monitors that I own - something that would be extremely valuable for gaming. External GPU enclosures do exist on the market (although they are not widely used), however they are almost all based on Thunderbolt 3. This leads me to my next problem.

I'd much rather use an AMD-based CPU than an Intel-based one, as the current generation of Ryzen-based laptop CPUs run circles around their Intel equivalents. With their extra cores, both compiling and gaming would be significantly better on an AMD-based laptop than on an Intel-based one. This comes with the issue of USB 4 (the standard equivalent to Thunderbolt 3, which can be used for external GPU enclosures) only being supported on the next generation of AMD chips.

Thus my ideal machine doesn't yet exist, but I fully expect that in a year or two it might.


Turns out you can get in contact with me

Today's update will be a fair bit shorter than my usual fare (thank goodness!).

I mentioned in this post that there would be no way to contact me easily for feedback and comments. My belief at the time of writing was that if I wanted to receive emails from an email address related to this domain, I'd have to purchase a domain email service.

It turns out I was wrong about that. I've been able to set up an address on this domain that can receive email for me, but unfortunately there's been no way (as far as I can tell) for me to send mail back using that address unless I pay for email. From my hosting provider, it only costs about $2 AUD per month for email (compared to GSuite at $8.50 AUD per month), but I don't want to waste even $2 if it turns out that the email address isn't going to be used.

If I start getting enough email through the address above that it becomes worth it to me to purchase the domain email service, I will, but as my current audience (as far as I'm aware) is a grand total of zero people, it's definitely not worth my money right now.