Edd Barrett, Tue 24 October 2023.
tags: Compilers Build-systems
Introduction
I'm currently working on a project that's based on LLVM, and I have to build (and re-build) it from source quite a lot. Although the LLVM build system is highly parallel, my development machine isn't beefy and doesn't have many cores, so building LLVM can take a long time.
I do, however, have SSH access to machines with lots of cores! It's just that those are not machines I can realistically develop on.
This article discusses how to build LLVM quickly on remote machines using two quite old tools that you may have heard of: ccache and distcc.
I'm not discussing anything particularly profound here, but I found setting this up (distcc in particular) quite fiddly. Since there are no guides on how to do this specifically for LLVM, I thought I'd write it down here and hopefully save someone else (or at least my future self) some time.
Background information and goals
Let's start by briefly discussing what we are working with and what we want to
achieve.
ccache
ccache is a compiler cache for C and C++. It works by creating a mapping from hashed source files to compiled object files. When you compile a source file, the C preprocessor is invoked and the resulting text is hashed. The cache is then queried with this hash, at which point one of two things can happen:
- A cache miss. The C/C++ compiler is invoked and the resulting object file is stored in the cache. The compiler is slow (compared to the cache), so we want to avoid this case as much as we can.
- A cache hit. The pre-compiled object file is immediately available in the cache and there's no need to invoke a compiler.
After building a project from scratch for the first time, you could clean your
build directory and rebuild everything without invoking the compiler at all
(assuming your cache is large enough).
(With caveats [0], you could also share the cache between users so as not to duplicate compilation effort.)
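To make that concrete, here's a minimal sketch (foo.c stands in for any C source file): you simply prefix compiler invocations with ccache:
$ ccache cc -c foo.c   # first run: cache miss, the compiler runs
$ ccache cc -c foo.c   # identical re-run: cache hit, no compiler invoked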
distcc
In the words of the project's website, distcc is "a fast, free distributed C/C++ compiler". It lets you spread your compilation jobs across many hosts, instead of just compiling things locally.
It has two modes:
- "plain" mode, where preprocessed files are distributed and compiled.
- "pump" mode, where raw, unpreprocessed source files are distributed and compiled.
The two modes have benefits and drawbacks. Plain mode, for example, isn't as sensitive to discrepancies between the libraries on the different systems involved: since preprocessing happens locally, all compile hosts see the same library versions (note that linking is not distributed).
Pump mode, on the other hand, is faster, since both compilation and preprocessing can be distributed. According to the distcc manual, pump mode can speed things up by "up to an order of magnitude over plain distcc".
Goals
My goal is to build the LLVM code base faster using ccache and distcc. I want one local ccache, and to use distcc upon cache misses.
I'll use plain mode distcc because I don't want to have to keep the libraries in sync on all the machines (and anyway, the distcc manual page says "distcc's pump mode is not compatible with ccache").
I also want to do all of the network communication over SSH, since the machines
are on a network with other hosts that we don't control.
How's it done?
First, some terminology. Let's call the machine that will initiate the compilation the "initiator", and the remote machine(s) that will receive jobs over SSH the "remotes".
Suppose we have two remotes: remote1 and remote2.
(For brevity, I'm assuming all of the hosts run Debian)
Installing stuff and starting the distcc daemon
The first thing to do is install distcc on all hosts involved. On Debian, that's something like:
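# apt install distcc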
Then, on the remotes, ensure that only the localhost is allowed to start distcc compile jobs (we can do this because jobs coming in over SSH will be considered local).
On the remotes, edit /etc/distcc/clients.allow so that it contains only one non-comment line, allowing only the local host:
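127.0.0.1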
Then start distcc on the remotes:
# systemctl start distcc && systemctl enable distcc
On the initiator, install ccache:
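# apt install ccache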
SSH communications
Now we need to make sure that the initiator can SSH to all of the remotes
without a password. Chances are, you already know how to do this, but here's a
summary:
To make a passwordless SSH key, do something like:
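$ ssh-keygen -t ed25519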
(-t chooses the key algorithm -- choose one you are comfortable with)
When prompted for a passphrase, just hit enter. This ensures that we don't have
to enter a passphrase every time the initiator wants to send a compile job to a
remote.
Put the public part of the key into ~/.ssh/authorized_keys on all the remotes, then on the initiator put something like this in ~/.ssh/config:
host remote*
IdentityFile ~/path/to/id_ed25519
(substitute the path for the key you just made and adjust the hostname(s) as
appropriate).
Now check you can SSH from the initiator to all of the remotes without a
password.
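For example, each of these should complete without prompting for anything:
$ ssh remote1 true
$ ssh remote2 true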
To speed up SSH comms, we can multiplex connections using a master connection.
This means that a fresh SSH connection doesn't have to be established every
time the initiator wants to send a job to a remote.
To enable a master connection, you will want to expand the entry in your ~/.ssh/config to look more like:
host remote*
IdentityFile ~/path/to/id_ed25519
ControlMaster auto
ControlPath ~/.ssh/master-%r@%h:%p
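Once you've SSHed in once, you can verify that the master connection is up with:
$ ssh -O check remote1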
On your remotes, make sure that MaxSessions (in /etc/ssh/sshd_config) is set high enough for the number of compile jobs you've allocated to each remote. (If you are not planning on using an SSH master connection, also read up on MaxStartups.)
(Don't forget to restart sshd.)
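For example (the number is illustrative -- it just needs to cover the job allocation), a remote serving 128 jobs might have:
MaxSessions 150
in its /etc/ssh/sshd_config, restarting afterwards with something like (on Debian the unit is named ssh):
# systemctl restart ssh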
Scheduling the simplest compile job
Before we dive into LLVM, let's do something smaller first. Let's compile
nothing in a distributed fashion :)
$ cd /tmp && touch empty.c
$ DISTCC_VERBOSE=1 DISTCC_HOSTS="@remote1/128 @remote2/128" ccache distcc -c empty.c
If all goes well, you should get an empty.o file and see output like this:
distcc[2376013] exec on localhost: x86_64-linux-gnu-gcc -E /tmp/empty.c
distcc[2376016] exec on @remote1/128: x86_64-linux-gnu-gcc -c -o empty.o /tmp/empty.c
You can see that preprocessing was local, but compilation was remote.
A quick primer on DISTCC_HOSTS: it lets us choose where to schedule jobs. It's a space-separated list where (for our purposes) each entry takes the form [user]@hostname[/jobs] for an SSH host, or localhost[/jobs] for the local machine. jobs is the maximum number of compilation jobs to send to the host at once -- for a remote, I usually set this to the core count.
If you don't want to have to set DISTCC_HOSTS in the environment every time, you can edit /etc/distcc/hosts.
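For example, an /etc/distcc/hosts equivalent to the DISTCC_HOSTS value used above would contain:
@remote1/128 @remote2/128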
Now would be a good time to inspect what ccache has cached:
$ ccache --show-stats
cache directory                /home/vext01/.cache/ccache
primary config                 /home/vext01/.config/ccache/ccache.conf
secondary config (readonly)    /etc/ccache.conf
stats updated                  Thu May 4 11:56:23 2023
cache hit (direct)             0
cache hit (preprocessed)       0
cache miss                     1
cache hit rate                 0.00 %
cleanups performed             0
files in cache                 2
cache size                     8.2 kB
max cache size                 5.0 GB
You can see that we had one cache miss. If you re-run the compile command you will see cache hit (direct) increment while cache miss stays at 1. You will also notice that there is no distcc logging output the second time around. This is because ccache never invoked distcc! It used the object file from the first time we compiled the file.
If you got this all working, it's time to get this working with the LLVM build
system.
distcc and ccache for LLVM
LLVM uses cmake as its build system, so we have to trick cmake into invoking ccache and (where necessary) distcc instead of invoking a C/C++ compiler directly.
LLVM's cmake setup already has provisions for ccache, which makes this much easier. To use ccache, pass -DLLVM_CCACHE_BUILD=On when you configure the build. We should also probably use (roughly) the same C/C++ compilers on all remotes, so it would be prudent to pass -DCMAKE_C_COMPILER and -DCMAKE_CXX_COMPILER.
Configuring LLVM would therefore look something like this:
mkdir -p build
cd build && cmake \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DLLVM_CCACHE_BUILD=On \
...
../llvm
(where ... are the rest of your configure arguments)
But how do we put distcc into the mix? The easiest way I've found is to set CCACHE_PREFIX=distcc in the environment when building.
Then there's one last consideration. By default, cmake (or rather the tool cmake generates build files for, e.g. make or ninja) spawns a number of parallel compile jobs suited to the core count of the local host's CPU. We need to raise this so that we can make the most of the larger number of cores available to us on the remotes. For two remotes with 128 cores each, 256 parallel jobs seems reasonable.
So when I build LLVM (after configuring the build), it looks something like
this:
DISTCC_HOSTS="@remote1/128 @remote2/128" CCACHE_PREFIX=distcc cmake --build build -j256
If you got it right, LLVM should build faster using the cores of the remotes.
You should also find that rebuilds from scratch are even faster due to the cache. After finishing a build, remove your entire build directory, reconfigure and rebuild. You should be surfing on a wave of cache hits. The build only slows during linking and when LLVM's tablegen stuff is run. These tasks cannot be cached, as they are not C/C++ compilation jobs.
You may have to fiddle around with -j a bit to find what works best -- remember that although you are compiling remotely, you are still preprocessing locally. Can the local host handle 256 concurrent preprocessors?
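If the initiator can't keep up, you could, for example, try a lower job count (the number is purely illustrative):
DISTCC_HOSTS="@remote1/128 @remote2/128" CCACHE_PREFIX=distcc cmake --build build -j192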
I hope this helps. This setup is working well for me: in 2023, I'm using it on a daily basis for development of our experimental JIT.
If you spot any mistakes, please email me, or message me on Mastodon.
Footnotes
[0] With a few build system hacks, sharing a cache between users should be possible. At the time of writing, LLVM's build system sets CCACHE_HASHDIR=yes, which makes all cache lookups sensitive to the directory in which a user is building. This means that compiling the same file twice, but in different directories (as different users typically will), will result in cache misses. One issue with sharing the cache over different build directories is that any paths that get encoded into the resulting object files at compile-time (e.g. paths in DWARF debug sections) could be incorrect for later consumers of the cache.