kazinator 16 days ago

I worked with the internals of this some 16 years ago, maintaining a customized version at Zeugma Systems.

Some changes of mine were reworked by someone and upstreamed:

https://github.com/ccache/ccache/commit/e8354384f67bc733bea5...

If you follow the link to the mailing list posting, you will see I implemented a feature: search through multiple caches.

Say you have a build server which populates a cache. Your developers can point their ccache installations at that. However, ccache also wants to write to the cache it reads from. The fix is a two-level hierarchy: ccache looks in the local cache first, then the one from the server. On a cache miss, the newly compiled object goes into the local cache, so the server cache isn't polluted with development objects.
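Modern ccache grew a built-in version of this idea. As a sketch (the config key below is from ccache 4.7+, which renamed secondary_storage to remote_storage; the attribute spelling may differ by version, so check your ccache's docs):

```shell
# Local cache stays writable as usual; the build server's cache is consulted
# read-only, so developer objects never pollute it.
ccache -o remote_storage="file:/mnt/buildserver/ccache|read-only=true"

# Misses fall back to compiling locally, and the result lands only in the
# local cache under CCACHE_DIR.
```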

  • speed_spread 16 days ago

    One could develop a whole package manager based on this. Add some standard build profiles to something like Gentoo's portage to maximize cache hits and you could have the best of both worlds: a fast source-based distribution. Specifying a custom build profile for a package would just mean it would take more time to install but would otherwise be transparent.

    One thing missing from this is a trust framework where cache writers would sign their compiled code. There could also be a verification layer.

    • heavenlyhash 15 days ago

      Warpforge -- project website: http://warpforge.io / https://github.com/warptools/warpforge -- is a project I work on that's heading a bit in this direction. Hashes in + build instruction = hashes out. It's really a powerful primitive indeed.

      Building big sharable package systems is indeed a second half of the problem. Figuring out how to make the system have the hashy goodness AND be usefully collaborative with async swarms of people working independently and yet sharing work: tricky. We're trying to do it, though!

      We have our first few packages now published, here: https://catalog.warpsys.org/

      You can also see _how_ we build things, because we publish the full rebuild instructions with each release: for example, here's how we packaged our bash: https://catalog.warpsys.org/warpsys.org/bash/_replays/zM5K3V...

      I'm in #warpforge on matrix with some collaborators if anyone's interested in joining us for a chat and some hacking :)

    • nextaccountic 15 days ago

NixOS uses a cache that works like this; it's source-based like Gentoo, but for packages in the cache it will download the prebuilt result rather than rebuilding. (It will rebuild if you configure the package in a way that invalidates the cache, or install a package not in the cache.)

      Besides the global cache for the whole distro, you can also set up caches for other software. For example, if you build your projects with Nix, you can have a cache for your projects (so that new contributors won't need to recompile everything from scratch). That's the premise behind https://www.cachix.org/

The only difference is that Nix caches aren't fine-grained like ccache and sccache.

    • M95D 15 days ago

      Gentoo supports building and installing binary packages from portage: You can compile once and then distribute the resulting package to other computers where they can be installed by Portage. Portage even verifies that the USE flags are the same!

    • yjftsjthsd-h 15 days ago

      Nix is also almost there - it happily uses binaries if they match the exact build inputs but compiles if no binaries match - but it's not nearly as fine grained (you'd need something like 1 package per object file).

      • ghoward 15 days ago

        And the saddest part is that Nix could be exactly there and solve another one of their problems at the same time: content-addressed outputs.

    • Cloudef 15 days ago

      It's called nix

dragoncrab 16 days ago

I've spent some time deploying a shared ccache for our Azure DevOps PR builds as well as locally. The 1-hour build time went down to 5 minutes for release, about 9 for debug. It takes another 2-3 minutes to download/upload the shared ccache, so it's still a good 6x speedup, not to mention that it uses much less CPU.

The trick is to set -fdebug-prefix-map and -fprofile-dir to proper relative paths; then, with some extra scripting, caches will be reusable across build nodes even if the workspace directory is different for each build.
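As a sketch, assuming GCC or Clang (the profile directory name here is made up, not the poster's actual setup):

```shell
# Make debug-info and profile-output paths workspace-relative so that object
# files hash identically on every build node, whatever the checkout path is.
export CFLAGS="-fdebug-prefix-map=$PWD=. -fprofile-dir=./profile-data"
export CXXFLAGS="$CFLAGS"

# GCC 8+ / Clang 10+ also offer -ffile-prefix-map, which additionally
# remaps __FILE__ expansions with a single flag.
```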

This and distcc (or IncrediBuild) are a game changer for every serious C++ workshop.

  • MuffinFlavored 15 days ago

    > The 1 hour build time went down to 5 minutes for release, about 9 for debug.

    I kind of wonder if in the future there will be a publicly trustable, free-to-use, "just plug and play, you don't need to set up your own" ccache

    Think of the energy savings implications

    Imagine every unoptimized build job for every microservice (Rust in Docker pulls the entire Cargo registry + rebuilds all dependencies every time you make a source code change of any kind if you don't go out of your way to prevent it)

    Obviously trusting a public source to just compile something for you and give you a binary-like object is... probably a malware distributor's dream.

    And I don't have data to prove the bandwidth cost would offset the energy savings of CPU cycles recalculating the same stuff over and over.

    Interesting though...

    • nerdponx 15 days ago

      You mean like using IPFS for compilation caching? Or something else secure and content-addressable.

  • zulu-inuoe 15 days ago

    If I hadn't used it at work at a large C++ shop, I'd never have stumbled upon how great Incredibuild is

jonstewart 16 days ago

There are all sorts of scenarios where ccache is a major accelerant. “make clean && make” is the easiest way to force a rebuild all, with relink, and ccache will supply the object files quite fast. You may also be switching back and forth between debug and release builds; again, ccache keeps the penalty low.

Finally if you run configure scripts, all those “checking for printf…” messages are the configure script generating and compiling tiny C programs invoking those functions to make sure the compiler can find them. ccache can therefore shave a significant percentage of time off running configure scripts, which is welcome.
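The usual wiring for all of this is just to interpose ccache in front of the compiler; a generic sketch, not tied to any particular project:

```shell
# Subsequent "make clean && make" runs then pull unchanged objects straight
# from the cache instead of recompiling them.
export CC="ccache gcc"
export CXX="ccache g++"
./configure
make -j"$(nproc)"

ccache -s   # show hit/miss statistics
```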

  • Cloudef 16 days ago

    Ah, the configure scripts and large C++ codebases, the bane of fast compile times. Configure scripts "checking useless stuff" aren't parallelized, so they take ages even on a modern beefy processor. Somebody should write a ccache-like thing just for autoconf, so it can instantly feed answers to those checks once it has run and cached the results; maybe this could even be part of ccache (I don't mean just the preprocessing part, but the whole thing).
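    Autoconf does ship a limited, single-tree version of this: configure can persist the results of its checks to a cache file and reuse them on later runs.

```shell
# -C is shorthand for --cache-file=config.cache: each "checking for ..."
# result is saved and reused instead of re-running the test program.
./configure -C

# The cache file can even be shared across trees, at the risk of stale
# answers if the environments differ:
# ./configure --cache-file=/path/to/shared.cache
```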

    • JNRowe 15 days ago

      Many years ago there was confcache¹, which for a time was integrated into portage via a FEATURES flag for Gentoo users. It wasn't particularly useful in the general case, and never really worked that well in practice. I don't recall the idea really taking off anywhere else, and I can't remember when it finally disappeared from Gentoo either (but it was a looooong time ago).

      It was strictly a cache, it didn't run parallel checks or make any other attempts to improve the run time.

      ¹ The only source I've found right now is https://github.com/fxttr/confcache

    • jonstewart 15 days ago

      It’d be great to have a notbraindeadautoconf that just deleted the bog-standard autoconf macros (checking for compilers and cstdlib and POSIX functions) and then invoked autoconf to work over the actual logic of configure.ac. It’s 2022, we don’t need to worry about SunOS and Ultrix and AIX and their broken compilers.

    • fatneckbeardz 15 days ago

      why i prefer modern languages number 800.. no autoconf. (rust, go)

quocanh 16 days ago

Also see mozilla/sccache for Rust.

https://github.com/mozilla/sccache

  • aidanhs 16 days ago

    Readers please note that sccache isn't just for Rust, and isn't just caching!

    It has a number of features, combining capabilities of ccache/distcc/icecream for C, C++ and Rust...along with some unique things that I've not seen in other tools. My comment at https://news.ycombinator.com/item?id=25604249 has a summary.

    • menaerus 16 days ago

      icecream and ccache have been a killer combo for my development routine, even if I only connect my workstation with the laptop, which I would otherwise almost never use.

      distcc, for example, didn't handle load balancing well in my experiments, whereas icecream did much better on that front, resulting in noticeably shorter build times. icecream also comes with a nice GUI (icemon) which can come in really handy; e.g. you can use it to observe the build jobs and debug when things go south.

      But I didn't know that sccache also supports distributed builds. From your comment it seems as if this was a recentish addition. I wonder how polished it is, but I will definitely give it a try at some point.

      • satvikpendem 15 days ago

        Sounds like you invented your own Turbopack (by Vercel) [0], which basically does what you're talking about, it hosts a remote cache that parallel build servers can access, cutting down compile times by quite a lot.

        [0] https://turbo.build/pack

    • JonChesterfield 16 days ago

      Documentation is encouraging. I loved icecream when it was working but it had serious reliability problems. This looks worth trying as a replacement, thanks.

  • ComputerGuru 15 days ago

    Note that sccache works slightly differently and has a different threshold for what is cacheable. I could never get it to match even 80% of ccache’s hit rate.

dmoreno 16 days ago

And don't forget distcc [1]; together they can speed up compilations and recompilations. I remember using colleagues' computers to speed up a 30 min C++ compilation to just 5 min... to just seconds on the second compilation.

[1] https://www.distcc.org/

  • ilyt 16 days ago

    When Gentoo was hip new thing we put a bunch of our office machines into distcc cluster just for that.

  • renox 16 days ago

    Distcc is great when it works, but just yesterday I had to make the admin reboot my VM three times before giving up: my VM used all its memory, which made it unusable.

BenFrantzDale 16 days ago

I use ccache and love it, but I keep wondering: wouldn’t it make more sense for compilers to do caching themselves? That would allow them to cache the results of much finer-grained operations than whole translation units. If two (or 500!) translation units generate the same template instantiations and then each optimize that code, that has to happen in each translation unit and ccache can’t help; couldn’t a compiler with internal caching do it once and be done with it forever? I’ve considered trying to add this to clang, but haven’t prioritized it.

  • bjackman 15 days ago

    I think most of the requirements of this are fully orthogonal to the job of a given compiler, so it's best solved at the level of the build system.

    Bazel does this, so instead of needing to reimplement this for each compiler it's automatically done across all your languages, and even any random one-off build rules based on a shell script.

    You can share the cache with your team/build infra too: https://bazel.build/remote/caching

    (Disclaimer: I've never used this with open-source Bazel, I work at Google and use the internal variant)

  • goombacloud 15 days ago

    I think the hard part is knowing when caching is not allowed, and for that the compiler is a good place to manage such a cache, because it could add metadata such as compiler version (maybe even patches?), compiler flags, etc., which a wrapper may miss. This separation between ccache and the compiler makes me not really trust it, because the two could easily get out of sync and you would not easily notice when/why the result is wrong.

  • fatneckbeardz 15 days ago

    the time you spend pulling your hair out figuring out why your program stopped working due to some obscure problem with the caching mechanism is a lot more than the time you save by doing the caching. at least for a lot of people. for others it might make sense

  • tarranoth 15 days ago

    I think you are basically describing why people want modules to be implemented.

  • Cloudef 16 days ago

    zig && zig cc does this + they are even planning to do incremental compilation with binary patching

    • kristoff_it 16 days ago

      Correct about the caching, but we don't plan to support in-place binary patching for C/C++ with the same level of granularity as with Zig.

      • Cloudef 15 days ago

        That's understandable

iveqy 16 days ago

If you're doing it locally only, I don't see the difference from make, which will just keep your old .o files.

If you're using it distributed between different developers, how do you make sure the cache result is secure? A shared cache where everyone can contribute is really hard (impossible?) to make secure. Someone could add malicious code to the cache that everyone will then use.

  • bastih 16 days ago

    To combat this scenario, we only had build servers populate the cache. Clients had read-only access to the cache, so they would benefit from anything that build servers had already built, which covered 95%+ of what clients usually had to rebuild.
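    With stock ccache, the read-only-client side of such a setup can be enforced with an environment variable (a sketch; the shared path is made up):

```shell
# Developer machines: consult the shared cache but never write to it.
export CCACHE_DIR=/mnt/shared/ccache
export CCACHE_READONLY=1

# Build servers use the same CCACHE_DIR without CCACHE_READONLY,
# so only CI builds populate the cache.
```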

    Also release builds were excluded from caching, to prevent any form of poisoning there.

    • iveqy 16 days ago

      How did you know that the build servers only built trusted code?

      • account42 16 days ago

        The threat is not that the cache contains builds of untrusted code but that it contains builds that do not match the code that they are associated with.

        • Tobu 15 days ago

          As far as I'm aware (ICEs…) compilers aren't hardened against untrusted code, and a sufficiently capable exploit could be used to poison the cache.

        • jonstewart 15 days ago

          ccache uses cryptographic hashing of file contents, in addition to matching compiler arguments, so you can be sure that the code matches.

          • slavik81 15 days ago

            It uses a cryptographic hash of the _inputs_ to the compiler, but there is no way to verify that the cached artifact matches the _output_ of the compiler without actually compiling it yourself.

      • williamcotton 16 days ago

        Public-key cryptography?

        • williamcotton 16 days ago

          Here, let me explain how it works!

          Let’s say you have 15 engineers and they each have their own laptop computer. Each of these engineers generates a pair of cryptographic keys, one public and one private.

          Each engineer then gives their public key to the trusted authority that operates the ccache server. Only code that is submitted and signed by a respective private key is built and then distributed to the rest of the engineers.

          • gdhdjdvr 14 days ago

            So what you are talking about is gpg signed git commits and a private ci doing the building...?

            • williamcotton 14 days ago

              That’s one way to do it!

              For a public project you would only want the builds to be propagated out to other developers once the changes had been approved and then merged into a branch that triggers the CI.

  • ecaradec 16 days ago

    Make uses timestamps, so if you check out code or switch branches, make rebuilds files even if their contents are the same. Also, when you switch branches, the ccache cache still has copies of the .o files.

hoten 16 days ago

Literally just spent yesterday implementing ccache into a project's GitHub CI. https://github.com/ArmageddonGames/ZQuestClassic/commit/641d...

Using it locally too, works great on Mac, but on Windows ccache has some problems caching debug builds. IIRC the embedded debug symbols use absolute paths, so the presence of this particular flag (/Z something...) disables cache eligibility.

  • rurban 16 days ago

    just set ccache -o hash_dir=false

    We are trying to use that with Conan, which changes the prefix dirs all the time. Without hash_dir, the full path is not stored in the command-line args.

cesaref 16 days ago

ccache is fantastic for CI systems. It's very common for commits to only affect part of a build, and ccache allows for clean builds (tear down the build and rebuild from scratch) to still take advantage of previous compilation runs.

Just looking at a jenkins machine with ccache I see a >90% hit rate for the cache, with 440k compilations returned from cache in the last 3 months (when stats were reset last).

gladiatr72 16 days ago

Used to use ccache to do stage1 gentoo rebuilds to support the labs @ the local university. 120 build nodes made short work of it (even for 2004)

  • gladiatr72 16 days ago

    Specifically, once a semester the lab systems would become the build nodes. The gfx/opengl class always used a textbook that was written around libvtk. Each new version had a new set of minimum versions for its long list of dependencies. This allowed for building the entire system to the same dependencies (vtk was very concerned with the version of libjpeg/pcx/etc, whereas the rest of the system (e.g. x11, gnome) was happy with whatever vtk dictated).

    Ccache only requires r/ssh access from the controller to any remote build nodes and the ccache program. I've never heard of anyone using this program as a shared cache source. That would be, well, kinda dumb. :/

    • jonstewart 16 days ago

      Do you mean maybe distcc instead of/in addition to ccache? They’re often paired up.

  • rjzzleep 16 days ago

    I used to use it for gentoo locally even, but I don't remember why and I don't think I only used it for stage1. I do remember that even locally it had a huge impact on build times.

    • account42 16 days ago

      I used it for normal package builds in Gentoo for a while but the hit ratio was not too great. Even small USE changes modify the command line (or a shared config header) for most compiler invocations in the package. Same for going from release version to release version. I think it makes more sense for development builds where the changes from commit to commit are smaller.

    • gladiatr72 16 days ago

      Oh yes. I meant: started a complete stage1-3 build. Building the compiler chain and the kernel was but a single can of paint to watch dry (among a display case of cans)

  • nurettin 16 days ago

    Why not debian binaries? It even had source packages back in 2004.

    • gladiatr72 16 days ago

      Hrmm.. More fun* to use a system designed to adapt itself to the specific requirements and much less frustrating. Gentoo required no hacks and (remarkably) delivered few unstable distribution builds

      *when there are 6 labs of machines to use

jgaa 16 days ago

ccache is my friend. It's really useful ;)

distcc is also nice if you have access to a k8s cluster with spare capacity. https://lastviking.eu/distcc_with_k8.html

I used distcc with k8s on a medium sized C++ project, until I got a workstation suitable for the compilations (32 core AMD thread-ripper). With the new workstation in place, I changed the build-script for the project to use ccache by default for all builds, and mapped a docker volume from the build-container to the local disk to keep the cache around.

sedeki 16 days ago

How does it compare to zapcc? My team used it a few years ago.

https://github.com/yrnkrn/zapcc

  • zeotroph 16 days ago

    That is the compiler itself sort of creating "ad hoc" modules to cache parts of the source code, ideally even inside a single compilation unit, which especially helped with template-heavy code. This way completely new code could be sped up, which with ccache would never get a cache hit.

    However that project is based on an old version of clang and the changes were never upstreamed (initially it was a commercial product), so sadly this project is practically dead.

pjmlp 16 days ago

ClearMake was the first time I used such kind of build caching (aka derived object sharing).

w-m 16 days ago

Is there a sane way of using ccache in Xcode projects generated from CMake nowadays?

For other generator targets, adding ccache was a single line in the CMake configuration, but for Xcode you had to bend over backwards. This was maybe 4 years ago.

  • hoten 16 days ago
    • w-m 16 days ago

      It's a nice tutorial on how to set this up, thanks. But you still have to create two launcher script files, and then set some undocumented internal Xcode variables. So to me this is more of a "no" to my question: `set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "${CCACHE_PROGRAM}")` still doesn't work for Xcode; you have to do some weird things.

DiabloD3 15 days ago

Man, ccache and distcc.

I haven't used either of those since 486s and P1s were popular.

mihaigalos 16 days ago

Bazel, anyone?

  • klodolph 16 days ago

    I use Bazel but it’s hard to deny that ccache is way easier to set up.

troxy1 16 days ago

I wish I could clean cache objects related to a specific C++ file. Like, a bad object got into the cache and there isn't a way to remove it unless I nuke the whole cache.

  • jrosdahl 13 days ago

    Here's how you can do that:

    1. Build.
    2. Remove the object file associated with the C++ file.
    3. Build again with CCACHE_RECACHE=1.
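    In shell terms (a sketch; build/bad.o stands in for whichever object the bad entry produced):

```shell
make                    # 1. build as usual
rm build/bad.o          # 2. remove the object that came from the bad entry
CCACHE_RECACHE=1 make   # 3. recompile it and overwrite the cache entry
```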

  • BenjiWiebe 15 days ago

    How does a bad object get in the cache?

ergonaught 16 days ago

When I did C++ development, ccache and distcc were absolutely vital to keep build times manageable. I can’t imagine not using them, in that environment.

typ 15 days ago

The shared-state cache in the Yocto project is also interesting. It seems the sstate cache is language agnostic.

jedisct1 16 days ago

Or replace your compiler with zig cc, that already includes a cache.

oau123 16 days ago

Also known as Makefile :P

  • jonstewart 16 days ago

    Tell me you’re not a C/C++ developer without telling me you’re not a C/C++ developer.

  • bastih 16 days ago

    Gotta be one hell of a makefile.

    • gladiatr72 16 days ago

      GNU Autotools: A beautiful thing

      GNU Autotools: A hideous thing

      :D

  • Jorengarenar 16 days ago

    What? Makefile doesn't cache anything

    • FrostKiwi 16 days ago

      The commenter was referring to having compilation and linking in separate stages, as is standard in Makefiles to enable multi-threaded compilation. As in

        cc -c main.c -o main.o
        cc -c init.c -o init.o
        cc init.o main.o -o final_binary
      
      In that specific setup, ccache does indeed not provide a speedup, since those .o files are kept. The Makefile simply checks whether the source file has a newer modification date than the object file, and recompiles only if it does. In that sense the Makefile does in fact cache the results. Once we go beyond a single user and on to bigger projects, ccache starts making sense.
      • slavik81 16 days ago

        All rules inherently depend on the Makefile they are defined in. The second you touch the Makefile rules, you have potentially invalidated all the object files. So, ccache is great when working on the build rules.

        Using ccache is also nice when you have generated files. If you edit the generator code but the output is identical, Make will needlessly rebuild everything and ccache will make it quick.

      • Jorengarenar 16 days ago

        Unless you `make clean` as creators of ccache notice on the linked site

    • jason0597 16 days ago

      make recompiles source code files by detecting the last modified date on a file, hence it only recompiles source files as necessary. So if you have 10 source files with 5000 lines of C in them, and you only change one of them, it will not recompile everything, it will only recompile that source file which has changed.

      Which makes me agree with the parent above; I don't see how exactly ccache is supposed to be used. Maybe for a distributed source directory with many developers working on it?

      • Jorengarenar 16 days ago

        On ccache site, there is section "Why bother?", the very first line:

        >If you ever run `make clean; make`, you can probably benefit from ccache. It is common for developers to do a clean build of a project for a whole host of reasons, and this throws away all the information from your previous compilations. By using ccache, recompilation goes much faster.

        • oau123 16 days ago

          Putting aside my tongue-in-cheek comment, honestly this argument does not convince me very much.

          What is the purpose of "make clean" other than to invalidate the whole cache so that it is cleanly recompiled? In such a situation I would want to invalidate the cache from ccache also completely.

          I'm sure there are legitimate reasons for using ccache but it is not very obvious to me what it is:

          "Only knows how to cache the compilation of a single file. Other types of compilations (multi-file compilation, linking, etc) will silently fall back to running the real compiler. "

          Well yes, traditional use of makefiles has been exactly to cache the compilation of single compilation units and trigger the compile of changed units - ccache does not help with granularity here it seems.

          Distributed development might be a good argument for this, but then what does it offer to facilitate that? It seems to suggest using NFS - which I could do with a Makefile as well. So is the advantage that it uses hashes instead of timestamps? Timestamps work quite well for me, but maybe that is a valid point.

          Another argument could be that is stores the precompiled units somewhere else and therefore doesn't clutter the file system. But is that really a good argument? Build directories exist, so even if you'd like to keep compiling several variants in parallel you could do so with a few different build directories.

          And yes, there are quite a lot of newer alternatives to Makefiles as well, so it would have to compete with those alternative build systems as well.

          • BenFrantzDale 16 days ago

            I basically never `make clean` but ccache is a boon for `git bisect`. In theory bisect takes log time; in practice, without ccache, it's slower because (handwave) build time grows with something like the log of the number of commits you jump across.
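            A sketch of that workflow (generic commands; ./run-tests and the v1.0 tag are hypothetical):

```shell
export CC="ccache gcc" CXX="ccache g++"
git bisect start
git bisect bad HEAD
git bisect good v1.0    # hypothetical known-good tag
# Each bisect step checks out a different tree; files unchanged since a
# previously built commit hit the ccache instead of recompiling.
git bisect run sh -c 'make -j"$(nproc)" && ./run-tests'
```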

            • account42 16 days ago

              It's still log(total commits) of full rebuilds. I don't think git bisect ever promised anything more.

              • BenFrantzDale 15 days ago

                If it’s my branch, I often have a build of every commit in my cache. Even if not, each jump back and forth makes a bunch of new cache entries, many of which will be reused in subsequent bisect steps.

      • hoten 16 days ago

        When switching branches, won't the modified times change and invalidate everything (with just make)?

      • inglor_cz 16 days ago

        It was my experience that building a native Android project with older NDKs benefitted hugely from introduction of ccache. Especially if you had multiple branches of the same code (release/nightly) that shared a significant code base.

        That was pre-Android Studio times. IDK what is the situation now.

        • izacus 16 days ago

          Still true now, especially if you switch branches/variants/flavors or do multiple builds.

      • actionfromafar 16 days ago

        If you have a perfect description of dependencies, and you develop alone, ccache is not useful.

        But how many times have I seen a perfect description of deps?

            gcc -o hello hello.c
        • jonstewart 16 days ago

          Or if you never switch branches in git, or never rerun configure, never switch between -O0 and -O3, never run make clean, never…