Migrating from Bors-NG to Github Merge Queues

Edd Barrett, Thu 16 November 2023.

tags: continuous integration testing


Introduction

When I started in the Software Development Team, it was a transitionary phase for me in terms of software testing. People around me were beginning to up their testing game, and I was learning that testing wasn't merely "a nice thing to do", but rather it was essential if you care about the quality of your code going forward. Although, I remember being mildly annoyed at the additional effort [0] of always writing tests, over time I saw that it was worth it. Bugs could be found and fixed earlier, often before even raising a Pull Request (PR).

Since then, I've fallen (somewhat accidentally) into the role of maintaining testing infrastructure. At first we used free CI services, but with ever more demanding build jobs [1], we quickly outgrew them and turned to hosting our own continuous integration (CI) infrastructure.

At some point we stumbled on the idea of a merge queue, and for many years we used (and were content with) a combo of Bors-NG (as a merge queue) and Buildbot (as a self-hosted CI service). However, in 2023 the author of Bors-NG announced that it was to be deprecated. With Bors-NG being an open source project, we could have self-hosted our own instance of it, but with the uncertain future of the project, we decided to look elsewhere.

In this post I share my experiences with switching the team's merge queue over to Github Merge Queues (GHMQs). As is often the case with my blog, there is nothing particularly deep here. By writing this down, I'm hoping that I may at least be able to help people caught in the same predicament.

Why a Merge Queue?

Let's first look at what "regular" automated CI looks like for a Github Pull Request. In the following, I use the term feature branch to mean the branch under consideration for merging (via the PR process) into a base branch.

First a pull request is raised, at which point the automated CI kicks off immediately, running the tests on the feature branch. While the tests are running, a code review typically takes place. If the reviewer requests any changes, then the PR author pushes more commits to the feature branch and the tests start again. The review process may take several rounds of back and forth before the reviewer is happy. The reviewer then waits for the tests to come back green before clicking the "merge" button, causing the feature branch to be merged into the base branch and thus concluding the pull request [2].

This is of course fantastic. CI acts as an additional layer of protection, blocking obviously faulty changes from being merged. The problem is, it doesn't always catch every faulty change. If the feature branch is out of date with the base branch, then the outcome of running the tests on the feature branch may be different to the outcome of running the tests on the base branch with the feature branch merged in. That's bad because it's an opportunity for test failures to creep in to the base branch, and no one wants that. This problem is often referred to as "merge skew" or "semantic conflict".

One way to fix that (at least with Github) is to enable the "Require branches to be up to date before merging" setting in the branch protection rules, however there's a better way: the merge queue.

With a merge queue, the reviewer conducts their code review and when they are happy, they approve the PR, causing it to be added to the merge queue. All the while, the merge queue is being monitored. PRs are popped from the queue and their changes sent to the CI service to test what would happen if the base branch were to have the feature branch merged on top. If the tests pass, then the feature branch is automatically merged into the base branch and the corresponding PR is also automatically closed. This is in contrast to "traditional CI testing", where the reviewer must wait for the tests to pass before clicking the "merge" button on the PR.

This may seem insignificant at first, but when (like us) you have long-running test suites, it really makes a difference. If the reviewer is forced to wait for a long time for the "merge" button to appear, it invites them to context switch to another task and PRs can get forgotten. With a merge queue, they can approve the PR and walk away[3]. Further attention is only required if the tests fail.

As mentioned earlier, our merge queue implementation of choice for a long time was Bors-NG (a spiritual successor to Homu, the original Rust merge bot). Bors-NG integrates well with Github (which, at the time of our adoption, had no merge queue implementation of its own), is free, and it gave us all the good stuff I mentioned above[4]. With Bors-NG's deprecation we had to start looking elsewhere for the same functionality. The official advice was to switch to GHMQs.

As we shall see, GHMQs are a bit different, but with some fiddling, we can get them to behave just like Bors-NG.

The Problem with GHMQs

GHMQs expects two distinct phases of testing, whereas Bors-NG had only one. With GHMQs:

  • An initial round of tests is expected to run ASAP when a PR is raised. For this round of testing to pass all of the "required checks" (configured in the repo settings) must pass. This initial round of testing tests the feature branch in isolation. If new commits are added to the feature branch (or its history is re-written), then the initial round of tests must be re-run.

  • There is then a second round of testing which is triggered when the first round of tests has passed and the reviewer has clicked the "merge when ready" button in the Github UI. This second round of testing is equivalent to the one round of testing done by Bors-NG: it tests what would happen were the feature branch to be merged into the base branch. Once this second round of testing passes, GHMQs automatically merges the feature branch into the base branch and closes the PR.

And here's the catch: currently there's no way to configure Github to run a different set of required checks for these two rounds of testing. This is bad for us, as we don't need the first round of testing[5]. If you want a Bors-NG-like workflow, to run those tests would be a waste of time and computation.

What's worse is, if (like us) you ask PR authors to squash their follow-up commits before merging, then that will re-trigger the initial round of testing, even though this squashing is unlikely to have a different test outcome.

Working around the Limitation

OK, so Github requires our CI to report the same set of status checks for these two rounds of testing. That's annoying, but not insurmountable. I'm not proud of this, but I'm about to share a gross hack to give Bors-NG-like behaviour in GHMQs.

Github defines a set of events which can be sent to external applications to trigger things to happen. Raising a PR, adding new commits to an existing PR, or rewriting the history of a PR's feature branch are pull_request events. When GHMQs creates a temporary branch for its second round of testing, that's a push event. If you configure Github to notify your CI of these two kinds of event, then you can teach your CI server to instantaneously report success for the times when we don't really want to run tests, but where Github expects CI to report a test outcome. It's like the first round of testing doesn't exist!

This is (roughly) how I implemented this in our Buildbot config:

def ghmq_do_step(step):
    """Determine if a step in a github-merge-queue-managed build factory should
    run."""

    event = step.getProperty("event")
    branch = step.getProperty("branch")

    if event is None:
        # If there is no `event` property, then the build request didn't
        # originate from github. It could be a scheduled or forced build. Let
        # it run.
        return True
    else:
        # The change notification originated from github. We may want to test
        # it.
        #
        # If it's a push event to a merge queue branch (i.e. it's the merge
        # queue branch being created), then we want to run the tests for real.
        return event == "push" and branch.startswith("gh-readonly-queue/")


buildbot_sh_factory = util.BuildFactory()
buildbot_sh_factory.addStep(steps.Git(..., doStepIf=ghmq_do_step))
buildbot_sh_factory.addStep(steps.ShellCommand(command=["./test.sh"],
                                               doStepIf=ghmq_do_step))

Here doStepIf() nullifies all build steps if it's not a time we want to really run tests. A Buildbot build factory with no unskipped steps is deemed successful and in turn Github is notified of success.

Note also that you don't want to test all push events: only pushes to the specially prefixed (gh-readonly-queue/) merge queue branches.

Setting up GHMQs Check-list

With my gross hack in mind, here's how we set up Bors-NG-like GHMQs with a self-hosted Buildbot (something similar can probably be done for other CI services):

  • There needs to be an outgoing web-hook posting to your CI service. This can be done either at the repo or organisation level. For an organisation:

    • in the organisation settings, go to "webhooks" and add a web hook with a URL pointing to to the CI notification endpoint (e.g. https://my-buildbot/change_hook/github).

    • Choose "let me select individual events", and select only "Pull requests" and "Pushes".

  • Configure the CI to properly test only push events to GHMQ branches. For all other events, have the CI return success (e.g. see doStepIf above).

  • Make sure the CI reports the test outcome back to Github (for Buildbot we used GitHubStatusPush).

  • Go to the repo in question's branch settings (settings->branches) and edit (or add, if there isn't one) the base branch's protection rules. The base branch is usually main or master. Whichever branch your PRs will merge into. Ensure:

    • "Require a pull request before merging" is checked.
    • "Require status checks to pass before merging" is checked. Without this the merge queue will blindly merge everything!
    • Under there, ensure that your CI service is added to "status checks that are required". Note that only services which have reported a status to your repo within the last week will show up here. You may have to force a build in your CI to get it to appear.
    • "Require merge queue" is checked, and under there check "Only merge non-failing pull requests".
    • You can choose suitable values for the other settings.
  • In your repo, make a branch that adds a YAML workflow file (e.g. .github/workflows/ci.yml) that contains this:

on:
  pull_request:
  merge_group:

# This is required to silence emails about the workflow having no jobs. We
# simply define a dummy job that does nothing much. I hope github can allow
# workflows with no jobs in the future, as this is quite wasteful.
jobs:
  dummy:
    runs-on: ubuntu-latest
    steps:
      - run: /usr/bin/true

(Yes, another gross hack. I'm sorry)

If you now raise a PR for this new branch, it should use Bors-NG-like merge queues!

Conclusion

GHMQs aren't quite as flexible as I'd hoped, but I was (just about) able to bend them to my will.

I note that others have already complained about the inability to run different tests at different times, and Github have acknowledged this, so with luck this will be fixed and my gross hack can go away.

Discussion

Acknowledgements

Thanks to Laurence Tratt and Andrei Lascu for their comments on a draft of this post.

Footnotes

  • [0] Often writing the associated tests takes longer than implementing the feature or fixing the bug.

  • [1] Some of our test suites require building LLVM multiple times, which requires a lot of time and RAM.

  • [2] If the tests fail, then the PR author has to fix it and those changes then have to be reviewed again.

  • [3] An interesting side-effect of the "walk away" property of merge queues is that we have become way more willing to include longer-running tasks in our test suites.

  • [4] And some extra functionality, like "roll-up merges", where the queue tests the compound changes of several PRs at once in order to save time when testing. Of course this comes with some other challenges. If the tests fail, which PR caused the problem? (I have a vague memory of seeing a merge queue in the wild that would bisect the failing test branch to attempt to find out who was to blame!)

  • [5] We might consider running a reduced set of short-running tests in an initial round of testing, but Github doesn't currently provide a good way to do that either.