Automatically updating git submodules using GitHub Actions

I know that the mono-repository is in style at the moment, but git submodules are a fantastic (and probably overcomplicated) tool for storing components of a git repository that need to be used together but kept separate. For example, you may need to track a specific upstream version of another repository that isn't controlled by you or your organization for use in your repository, or you might want to mix repository visibility or share some code between different repositories that need to be separated.

I personally use submodules in my repositories for a wide variety of purposes. For example, my research project uses a submodule to track the latest version of the codebase that I am modifying so I can write patches to match against those versions. Another example is my site, which has a bunch of submodules to track various things which get published there, from my resume and my research project, to the theme that I use for both my site and this blog.

I only recently moved to using the same codebase for the theme of my site and blog, which has both made my life easier and caused me a great deal of heartache - and almost all of the heartache came from git submodules. You see, submodules are tied to a specific commit, and running a command like git submodule update (which you'd think updates a submodule to the latest version) only checks out the commit that your repository already has recorded for that submodule. This makes life a little more difficult: how can I easily update submodules without having to specifically know the remotes, branches or locations of all of my submodules?

First, all of the submodules have their remotes and locations (and optionally branches) stored in the .gitmodules file, which can be used to iterate through the submodules and pull down the latest versions. Another option is to use the git submodule foreach command to update the submodules by fetching the remotes and then checking out the latest commit, which has the disadvantage that submodules are not (by default) checked out to branches, but to specific commits. Neither of these options is particularly ergonomic to work with, and so git introduced the --remote flag to git submodule update, which tells it to update each submodule to the tip of the branch tracked on the remote instead of the commit recorded in the local repository.
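
To make those three options concrete, here's a rough sketch of each as a shell function. The function names are mine, not git's, and the first two assume every submodule has an origin remote with an origin/HEAD ref (which a normal clone creates), so treat this as an illustration rather than a drop-in script.

```shell
# Print the path of every submodule recorded in a .gitmodules file
# (defaults to ./.gitmodules). Assumes submodule names contain no spaces.
list_submodule_paths() {
  git config -f "${1:-.gitmodules}" --get-regexp '^submodule\..*\.path$' | cut -d' ' -f2-
}

# Option 1: walk .gitmodules by hand, fetch each submodule and check out
# whatever the remote's default branch points at (leaves a detached HEAD).
update_via_gitmodules() {
  list_submodule_paths | while read -r path; do
    git -C "$path" fetch origin
    git -C "$path" checkout --detach origin/HEAD
  done
}

# Option 2: the same idea, but letting git iterate over the submodules.
update_via_foreach() {
  git submodule foreach 'git fetch origin && git checkout --detach origin/HEAD'
}

# Option 3: the --remote flag fetches and checks out in a single step.
update_via_remote_flag() {
  git submodule update --remote
}
```

Whichever one you run from the top of the repository, the parent repository is left with modified submodule pointers that still need to be committed.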

Great, so now we know how to update our submodules to the latest remote commit (huzzah!), but I want to push to one repository and watch the other repositories automatically update to track that commit (instead of requiring me to run a command). How can we do that?

GitHub Actions to the rescue! (although unfortunately it won't quite get us all the way there)

GitHub Actions allows you to trigger workflows on arbitrary custom events using the repository_dispatch event. The first step is to create a GitHub Action in our repository that updates submodules for us.

Of course, the simplest way to do this would be to recursively clone the repository using GitHub's checkout action, run git submodule update --remote and then git push the changes back into the repository. Unfortunately, this doesn't quite work out for two reasons: we won't have write access to the repository and we won't be able to clone private repositories. We can get around this by passing a personal access token to GitHub's checkout action and telling it that we need to pull submodules recursively. An example of this action can be found below, and is actually what I use in my site's repo.

name: Update module
on:
  repository_dispatch:
    types: [update]
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          token: ${{ secrets.PAT }}
          submodules: recursive
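      - name: Update module
        env:
          # Pass payload values through the environment instead of pasting them into the script
          MODULE: ${{ github.event.client_payload.module }}
          SHA: ${{ github.event.client_payload.sha }}
        run: |
          # Move the named submodule to the tip of its remote branch
          git submodule update --init --recursive --checkout -f --remote -- "$MODULE"
          git config --global user.name "GitHub Action"
          git config --global user.email "noreply@github.com"
          # Record the new submodule commit in this repository and push it
          git commit -am "deploy: $MODULE - $SHA"
          git push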

The second step is to create the trigger for the repository_dispatch in the submodule repositories themselves. This can be done using the excellent repository-dispatch action to send the event. An example of this is shown below.

name: Dispatch to repo
on: [push, workflow_dispatch]
jobs:
  dispatch:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        repo: ["owner/repo"]
    steps:
      - name: Push to repo
        uses: peter-evans/repository-dispatch@v1
        with:
          token: ${{ secrets.PAT }}
          repository: ${{ matrix.repo }}
          event-type: update
          client-payload: '{"ref": "${{ github.ref }}", "sha": "${{ github.sha }}", "module": "owner/submodule", "branch": "master"}'
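

If you want to poke the update workflow without pushing anything, the same event can be sent by hand through GitHub's REST API (a POST to the repository's /dispatches endpoint). The sketch below is only that: the function names are made up, the token is assumed to be a personal access token in GITHUB_TOKEN, and the payload builder does no JSON escaping, so it assumes the module and SHA contain no quotes.

```shell
# Build the JSON body for a repository_dispatch request.
# $1 = submodule path as the parent repo knows it, $2 = commit SHA being announced
build_dispatch_payload() {
  printf '{"event_type":"update","client_payload":{"module":"%s","sha":"%s"}}' "$1" "$2"
}

# Send the event to the repository that contains the submodule.
# $1 = parent repository as owner/repo, $2 = module, $3 = sha
send_dispatch() {
  curl --fail --silent --show-error -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "https://api.github.com/repos/$1/dispatches" \
    -d "$(build_dispatch_payload "$2" "$3")"
}
```

Something like send_dispatch owner/repo owner/submodule "$(git rev-parse HEAD)" should then kick off the update workflow, since the event_type matches the update type it listens for.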

This of course works with submodules that you control, but what about submodules that you don't control? Unfortunately there's no way to start an action based on a push in a different repository, nor is there a way to create a webhook that you could use to trigger another action from a repository that you don't control. As such you are left with two options: simply run the update action at a frequent interval, or hack together a way to do it with activity email notifications and a repository_dispatch. I know which of the two I'd rather implement in a hurry!
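
For the interval option, the step that a scheduled workflow (one triggered by GitHub Actions' schedule event rather than repository_dispatch) runs might look something like this sketch. The function name and commit message are placeholders of mine, and it assumes the repository was checked out with a token that is allowed to push, just like in the update workflow earlier.

```shell
# Update every submodule to its remote tip, then commit and push only if
# at least one submodule pointer actually moved.
update_all_submodules() {
  git submodule update --init --recursive --remote

  # git diff --quiet exits 0 when nothing changed, including submodule commits
  if git diff --quiet; then
    echo "submodules already up to date"
    return 0
  fi

  git config user.name "GitHub Action"
  git config user.email "noreply@github.com"
  git commit -am "deploy: scheduled submodule update"
  git push
}
```

The check before committing matters here because most scheduled runs will find nothing new, and git commit exits with an error when there is nothing to commit.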