Edd Barrett, Thu 16 November 2023.
tags: continuous integration testing
Introduction
When I started in the Software Development Team, it was
a transitionary phase for me in terms of software testing. People around me
were beginning to up their testing game, and I was learning that testing wasn't
merely "a nice thing to do", but rather it was essential if you care about
the quality of your code going forward. Although, I remember being mildly
annoyed at the additional effort [0] of always writing tests,
over time I saw that it was worth it. Bugs could be found and fixed earlier,
often before even raising a Pull Request (PR).
Since then, I've fallen (somewhat accidentally) into the role of maintaining
testing infrastructure. At first we used free CI services, but with ever more
demanding build jobs [1], we quickly outgrew them and turned to
hosting our own continuous integration (CI) infrastructure.
At some point we stumbled on the idea of a merge queue, and for many years we
used (and were content with) a combo of Bors-NG (as a
merge queue) and Buildbot (as a self-hosted CI
service). However, in 2023 the author of Bors-NG announced that it was to be
deprecated. With Bors-NG being an open source project, we could have
self-hosted our own instance of it, but with the uncertain future of the
project, we decided to look elsewhere.
In this post I share my experiences with switching the team's merge queue over
to Github Merge Queues
(GHMQs). As is often the case with my blog, there is nothing particularly deep
here. By writing this down, I'm hoping that I may at least be able to help
people caught in the same predicament.
Why a Merge Queue?
Let's first look at what "regular" automated CI looks like for a Github Pull
Request. In the following, I use the term feature branch to mean the branch
under consideration for merging (via the PR process) into a base branch.
First a pull request is raised, at which point the automated CI kicks off
immediately, running the tests on the feature branch. While the tests are
running, a code review typically takes place. If the reviewer requests any
changes, then the PR author pushes more commits to the feature branch and the
tests start again. The review process may take several rounds of back and forth
before the reviewer is happy. The reviewer then waits for the tests to come
back green before clicking the "merge" button, causing the feature branch to be
merged into the base branch and thus concluding the pull request
[2].
This is of course fantastic. CI acts as an additional layer of protection,
blocking obviously faulty changes from being merged. The problem is, it doesn't
always catch every faulty change. If the feature branch is out of date with the
base branch, then the outcome of running the tests on the feature branch may be
different to the outcome of running the tests on the base branch with the
feature branch merged in. That's bad because it's an opportunity for test
failures to creep in to the base branch, and no one wants that. This problem is
often referred to as
"merge skew" or "semantic conflict".
One way to fix that (at least with Github) is to enable the "Require branches
to be up to date before merging" setting in the branch protection rules,
however there's a better way: the merge queue.
With a merge queue, the reviewer conducts their code review and when they are
happy, they approve the PR, causing it to be added to the merge queue. All the
while, the merge queue is being monitored. PRs are popped from the queue and
their changes sent to the CI service to test what would happen if the base
branch were to have the feature branch merged on top. If the tests pass, then
the feature branch is automatically merged into the base branch and the
corresponding PR is also automatically closed. This is in contrast to
"traditional CI testing", where the reviewer must wait for the tests to pass
before clicking the "merge" button on the PR.
This may seem insignificant at first, but when (like us) you have long-running
test suites, it really makes a difference. If the reviewer is forced to wait
for a long time for the "merge" button to appear, it invites them to context
switch to another task and PRs can get forgotten. With a merge queue, they can
approve the PR and walk away[3]. Further attention is only
required if the tests fail.
As mentioned earlier, our merge queue implementation of choice for a long time
was Bors-NG (a spiritual successor to
Homu, the original Rust merge bot).
Bors-NG integrates well with Github (which, at the time of our adoption, had no
merge queue implementation of its own), is free, and it gave us all the good
stuff I mentioned above[4]. With Bors-NG's deprecation we had to
start looking elsewhere for the same functionality. The official advice was to
switch to GHMQs.
As we shall see, GHMQs are a bit different, but with some fiddling, we can get
them to behave just like Bors-NG.
The Problem with GHMQs
GHMQs expects two distinct phases of testing, whereas Bors-NG had only one.
With GHMQs:
-
An initial round of tests is expected to run ASAP when a PR is raised. For
this round of testing to pass all of the "required checks" (configured in
the repo settings) must pass. This initial round of testing tests the
feature branch in isolation. If new commits are added to the feature
branch (or its history is re-written), then the initial round of tests
must be re-run.
-
There is then a second round of testing which is triggered when the first
round of tests has passed and the reviewer has clicked the "merge when
ready" button in the Github UI. This second round of testing is equivalent
to the one round of testing done by Bors-NG: it tests what would happen
were the feature branch to be merged into the base branch. Once this
second round of testing passes, GHMQs automatically merges the feature
branch into the base branch and closes the PR.
And here's the catch: currently there's no way to configure Github to run a
different set of required checks for these two rounds of testing. This is bad
for us, as we don't need the first round of testing[5]. If you
want a Bors-NG-like workflow, to run those tests would be a waste of time and
computation.
What's worse is, if (like us) you ask PR authors to squash their follow-up
commits before merging, then that will re-trigger the initial round of testing,
even though this squashing is unlikely to have a different test outcome.
Working around the Limitation
OK, so Github requires our CI to report the same set of status checks for these
two rounds of testing. That's annoying, but not insurmountable. I'm not proud
of this, but I'm about to share a gross hack to give Bors-NG-like behaviour
in GHMQs.
Github defines a set of events which can be sent to external applications to
trigger things to happen. Raising a PR, adding new commits to an existing PR,
or rewriting the history of a PR's feature branch are pull_request
events.
When GHMQs creates a temporary branch for its second round of testing, that's a
push
event. If you configure Github to notify your CI of these two kinds of
event, then you can teach your CI server to instantaneously report success for
the times when we don't really want to run tests, but where Github expects CI
to report a test outcome. It's like the first round of testing doesn't exist!
This is (roughly) how I implemented this in our Buildbot config:
def ghmq_do_step(step):
"""Determine if a step in a github-merge-queue-managed build factory should
run."""
event = step.getProperty("event")
branch = step.getProperty("branch")
if event is None:
# If there is no `event` property, then the build request didn't
# originate from github. It could be a scheduled or forced build. Let
# it run.
return True
else:
# The change notification originated from github. We may want to test
# it.
#
# If it's a push event to a merge queue branch (i.e. it's the merge
# queue branch being created), then we want to run the tests for real.
return event == "push" and branch.startswith("gh-readonly-queue/")
buildbot_sh_factory = util.BuildFactory()
buildbot_sh_factory.addStep(steps.Git(..., doStepIf=ghmq_do_step))
buildbot_sh_factory.addStep(steps.ShellCommand(command=["./test.sh"],
doStepIf=ghmq_do_step))
Here doStepIf()
nullifies all build steps if it's not a time we want to
really run tests. A Buildbot build factory with no unskipped steps is deemed
successful and in turn Github is notified of success.
Note also that you don't want to test all push
events: only pushes to the
specially prefixed (gh-readonly-queue/
) merge queue branches.
Setting up GHMQs Check-list
With my gross hack in mind, here's how we set up Bors-NG-like GHMQs with a
self-hosted Buildbot (something similar can probably be done for other CI
services):
-
There needs to be an outgoing web-hook posting to your CI service. This can
be done either at the repo or organisation level. For an organisation:
-
in the organisation settings, go to "webhooks" and add a web hook with
a URL pointing to to the CI notification endpoint (e.g.
https://my-buildbot/change_hook/github
).
-
Choose "let me select individual events", and select only "Pull requests"
and "Pushes".
-
Configure the CI to properly test only push
events to GHMQ branches. For
all other events, have the CI return success (e.g. see doStepIf
above).
-
Make sure the CI reports the test outcome back to Github (for Buildbot we
used
GitHubStatusPush
).
-
Go to the repo in question's branch settings (settings->branches
) and edit
(or add, if there isn't one) the base branch's protection rules. The base
branch is usually main
or master
. Whichever branch your PRs will merge
into. Ensure:
- "Require a pull request before merging" is checked.
- "Require status checks to pass before merging" is checked. Without this
the merge queue will blindly merge everything!
- Under there, ensure that your CI service is added to "status checks
that are required". Note that only services which have reported a
status to your repo within the last week will show up here. You may
have to force a build in your CI to get it to appear.
- "Require merge queue" is checked, and under there check "Only merge
non-failing pull requests".
- You can choose suitable values for the other settings.
-
In your repo, make a branch that adds a YAML workflow file (e.g.
.github/workflows/ci.yml
) that contains this:
on:
pull_request:
merge_group:
# This is required to silence emails about the workflow having no jobs. We
# simply define a dummy job that does nothing much. I hope github can allow
# workflows with no jobs in the future, as this is quite wasteful.
jobs:
dummy:
runs-on: ubuntu-latest
steps:
- run: /usr/bin/true
(Yes, another gross hack. I'm sorry)
If you now raise a PR for this new branch, it should use Bors-NG-like merge
queues!
Conclusion
GHMQs aren't quite as flexible as I'd hoped, but I was (just about) able to
bend them to my will.
I note that others have already complained about the inability to run different
tests at different times, and Github have acknowledged
this,
so with luck this will be fixed and my gross hack can go away.
Discussion
Acknowledgements
Thanks to Laurence Tratt and Andrei Lascu for their comments on a draft of this
post.
Footnotes
-
Often writing the associated tests takes longer
than implementing the feature or fixing the bug.
-
Some of our test suites require building LLVM
multiple times, which requires a lot of time and RAM.
-
If the tests fail, then the PR author has to fix
it and those changes then have to be reviewed again.
-
An interesting side-effect of the "walk away"
property of merge queues is that we have become way more willing to include
longer-running tasks in our test suites.
-
And some extra functionality, like "roll-up
merges", where the queue tests the compound changes of several PRs at
once in order to save time when testing. Of course this comes with some
other challenges. If the tests fail, which PR caused the problem? (I have a
vague memory of seeing a merge queue in the wild that would bisect the
failing test branch to attempt to find out who was to blame!)
-
We might consider running a reduced set of
short-running tests in an initial round of testing, but Github doesn't
currently provide a good way to do that either.