Edd Barrett, Tue 24 October 2023.
tags: Compilers Build-systems
I'm currently working on a project that's based on LLVM
and I have to build (and re-build) it from source quite a lot. Although the
LLVM build system is highly parallel, my development machine isn't beefy, and
doesn't have many cores. Building LLVM can take a long time.
I do, however, have SSH access to machines with lots of cores! It's just that
those are not machines I can realistically develop on.
This article discusses how to build LLVM quickly on a remote machine using two
quite old tools that you may have heard of: ccache and distcc.
I'm not discussing anything particularly profound here, but I found setting
this up (distcc in particular) quite fiddly. Since there are no guides
on how to do this specific to LLVM, I thought I'd write it down here and
hopefully save someone else (at least my future-self) some time.
Background information and goals
Let's start by briefly discussing what we are working with and what we want to achieve.
ccache is a compiler cache for C and C++. It works by creating a mapping from
hashed source files to compiled object files. When you compile a
source file, the C preprocessor is invoked and the resulting text is hashed.
The cache is then queried with this hash, at which point one of two things can happen:
A cache miss. The C/C++ compiler is invoked and the resulting object file is
stored into the cache. The compiler is (compared to the cache) slow, so we
want to avoid this case as much as we can.
A cache hit. The pre-compiled object file is immediately available in the
cache and there's no need to invoke a compiler.
After building a project from scratch for the first time, you could clean your
build directory and rebuild everything without invoking the compiler at all
(assuming your cache is large enough).
(With caveats, you could also share the cache between users
so as to not duplicate compilation effort.)
In the words of the project's website,
distcc is "a fast, free distributed
C/C++ compiler". It lets you spread your compilation jobs across many hosts,
instead of just compiling everything locally.
It has two modes:
"plain" mode: where preprocessed files are distributed and compiled remotely.
"pump" mode: where raw, unpreprocessed source files are distributed and both preprocessed and compiled remotely.
The two modes have benefits and drawbacks. Plain mode, for example, isn't as
sensitive to discrepancies between the libraries on the different systems
involved: since preprocessing is local, all compile hosts use the same library
versions (note that linking is not distributed).
Pump mode, on the other hand, is faster, since both compilation and
preprocessing can be distributed. According to the
distcc manual, pump mode
can speed things up by "up to an order of magnitude over plain distcc".
My goal is to build the LLVM code base faster using both tools together: I
want one local ccache, and to use distcc upon cache misses.
I'll use plain mode distcc because I didn't want to have to keep the
libraries in sync on all the machines (and anyway, the distcc manual page says
that distcc's pump mode is not compatible with ccache).
I also want to do all of the network communication over SSH, since the machines
are on a network with other hosts that we don't control.
How's it done?
First some terminology. Let's call the machine that will initiate the
compilation the "initiator", and the remote machine(s) that will receive jobs
over SSH "remotes".
Suppose we have two remotes: remote1 and remote2.
(For brevity, I'm assuming all of the hosts run Debian.)
Installing stuff and starting the distcc daemon
The first thing to do is install distcc on all hosts involved:
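On Debian, that's just the distcc package (the # prompt denotes a root shell):

```shell
# apt install distcc
```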
Then, on the remotes, ensure that only the localhost is allowed to start
compile jobs (we can do this because jobs coming in over SSH will be considered
local). Edit /etc/distcc/clients.allow on the remotes so that it contains only one
line:
127.0.0.1
Then start and enable distcc on the remotes:
# systemctl start distcc && systemctl enable distcc
On the initiator, install ccache and distcc:
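Again on Debian, something like:

```shell
# apt install ccache distcc
```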
Now we need to make sure that the initiator can SSH to all of the remotes
without a password. Chances are, you already know how to do this, but here's a
quick refresher.
To make a passphrase-less SSH key, do something like the following
(-t chooses an encryption algorithm -- choose one you are comfortable with):
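For example (the key file name ~/.ssh/distcc_key is a hypothetical choice of mine, not something distcc requires):

```shell
# Generate an ed25519 key pair into a dedicated file.
$ ssh-keygen -t ed25519 -f ~/.ssh/distcc_key
```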
When prompted for a passphrase, just hit enter. This ensures that we don't have
to enter a passphrase every time the initiator wants to send a compile job to a
remote.
Put the public part of the key into
~/.ssh/authorized_keys on all the
remotes, then on the initiator put something like this in ~/.ssh/config
(substitute the path for the key you just made and adjust the hostname(s) as
necessary):
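A minimal sketch of such an entry (assuming the hypothetical key path ~/.ssh/distcc_key from above):

```
Host remote1 remote2
    IdentityFile ~/.ssh/distcc_key
```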
Now check you can SSH from the initiator to all of the remotes without a
password.
To speed up SSH comms, we can multiplex connections using a master connection.
This means that a fresh SSH connection doesn't have to be established every
time the initiator wants to send a job to a remote.
To enable a master connection, you will want to expand the entry in your
~/.ssh/config to look more like:
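For example (ControlMaster, ControlPath and ControlPersist are standard OpenSSH client options; the key path is the hypothetical one from above):

```
Host remote1 remote2
    IdentityFile ~/.ssh/distcc_key
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p
    ControlPersist yes
```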
On your remotes, make sure that MaxSessions (in /etc/ssh/sshd_config) is
set high enough for the number of compile jobs you've allocated to each remote.
(If you are not planning on using an SSH master connection, instead read up on
MaxStartups.)
(Don't forget to restart sshd after changing its configuration.)
Scheduling the simplest compile job
Before we dive into LLVM, let's do something smaller first. Let's compile
nothing in a distributed fashion :)
$ cd /tmp && touch empty.c
$ DISTCC_VERBOSE=1 DISTCC_HOSTS="@remote1/128 @remote2/128" ccache distcc -c empty.c
If all goes well, you should get an
empty.o file and see output like this:
distcc exec on localhost: x86_64-linux-gnu-gcc -E /tmp/empty.c
distcc exec on @remote1/128: x86_64-linux-gnu-gcc -c -o empty.o /tmp/empty.c
You can see that preprocessing was local, but compilation was remote.
A quick primer on
DISTCC_HOSTS: it lets us choose where to schedule jobs.
It's a space-separated list and (for our purposes) each entry is of the form
[user]@hostname[/jobs] for an SSH host, or
localhost[/jobs] for the local host.
jobs is the maximum number of compilation jobs to
send to the host at once -- for a remote, I usually set this to the core count.
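For example, a value that also allows some jobs on the local host (the exact job limits here are hypothetical; tune them to your machines):

```shell
# Up to 8 jobs locally, and up to 128 on each remote via SSH.
DISTCC_HOSTS="localhost/8 @remote1/128 @remote2/128"
```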
If you don't want to have to set
DISTCC_HOSTS in the environment every time,
you can edit ~/.distcc/hosts (or the system-wide /etc/distcc/hosts) instead.
Now would be a good time to inspect what ccache has cached:
$ ccache --show-stats
cache directory /home/vext01/.cache/ccache
primary config /home/vext01/.config/ccache/ccache.conf
secondary config (readonly) /etc/ccache.conf
stats updated Thu May 4 11:56:23 2023
cache hit (direct) 0
cache hit (preprocessed) 0
cache miss 1
cache hit rate 0.00 %
cleanups performed 0
files in cache 2
cache size 8.2 kB
max cache size 5.0 GB
You can see that we had one cache miss. If you re-run the compile command then
you will see
cache hit (direct) increment while
cache miss stays at 1. You
will also notice that there is no
distcc logging output the second time
around. This is because
ccache never invoked
distcc! It used the object
file from the first time we compiled the file.
If you got this all working, it's time to get this working with the LLVM build
system.
distcc and ccache for LLVM
LLVM uses cmake as its build system, so we have to trick cmake into invoking
ccache and (where necessary)
distcc instead of using a C/C++ compiler directly.
LLVM's cmake setup already provisions for
ccache, which makes this much
easier. To use it, pass
-DLLVM_CCACHE_BUILD=On when you configure the
build. We should also probably use (roughly) the same C/C++ compilers on all
remotes, so it would be prudent to pass
-DCMAKE_C_COMPILER= and -DCMAKE_CXX_COMPILER= explicitly.
Configuring LLVM would therefore look something like this:
mkdir -p build
cd build && cmake \
    -DLLVM_CCACHE_BUILD=On \
    -DCMAKE_C_COMPILER=gcc \
    -DCMAKE_CXX_COMPILER=g++ \
    ... \
    ../llvm
(where ... are the rest of your configure arguments)
But how do we put
distcc into the mix? The easiest way I've found is to set
CCACHE_PREFIX=distcc in the environment when building.
Then there's one last consideration. By default
cmake (or rather the tool
cmake is generating for, e.g.
ninja) spawns a number of parallel
compile jobs suitable for the core count of the local host's CPU. We need to up
this so that we can make the most of the larger number of cores available to us
on the remotes. For two remotes with 128 cores each, 256 parallel jobs seems a
reasonable starting point.
So when I build LLVM (after configuring the build), it looks something like this:
DISTCC_HOSTS="@remote1/128 @remote2/128" CCACHE_PREFIX=distcc cmake --build build -j256
If you got it right, LLVM should build faster using the cores of the remotes.
You should also find that rebuilds from scratch should be even faster due to
the cache. After finishing a build, remove your entire build directory, then
reconfigure and rebuild. You should be surfing on a wave of cache hits. The
build only slows during linking and when LLVM's
tablegen stuff is run. These tasks cannot
be cached, as they are not C/C++ compilation jobs.
You may have to fiddle around with
-j a bit to find what works best --
remember that although you are compiling remotely, you are still preprocessing
locally. Can the local host handle 256 concurrent preprocessors?
I hope this helps. This setup is working well for me. In 2023, I'm using this
setup on a daily basis for development of our experimental project.
If you spot any mistakes, please email me, or message me on mastodon.
- With a few build system hacks, sharing a cache between users should be
possible. At the time of writing, LLVM's build system sets
CCACHE_HASHDIR=yes, which makes all cache lookups sensitive to the
directory in which a user is building. This means that compiling the same
file twice, but in different directories (as different users typically will),
will result in cache misses. Another issue with sharing the cache over
different build directories is that any paths that get encoded into the
resulting object files at compile-time (e.g. paths in DWARF debug sections)
could be incorrect for later consumers of the cache.