
The video game industry uses bulk builds (master files), which group many .cc files into very large single .cc files. The speedups here are at least 5-10x. These bulk files are sent to other developers' machines, with possible caching. The result is 12-minute builds instead of 6 hours.
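
The idea, as a minimal sketch (file names here are hypothetical): a single "bulk" .cc file just #includes the individual .cc files, so their shared headers are parsed once per bulk file instead of once per source file.

    // bulk_gameplay.cc -- hypothetical bulk/master file
    // Each of these .cc files still compiles on its own; the bulk file
    // simply concatenates them into one translation unit, so their
    // common headers are parsed only once.
    #include "player.cc"
    #include "inventory.cc"
    #include "ai_controller.cc"
    #include "physics_step.cc"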


Chrome supported this for a long time, and it really helped small developers outside of Google build Chrome without specialist machines.

But the Chrome team pulled that feature, with the stated justification that, since C++ guarantees different things for one file vs. multiple files (IIRC around variable scopes outside of a namespace), supporting the jumbo build option meant writing in a language that was "not C++".
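
For example (a made-up minimal case, not actual Chrome code), unnamed-namespace names that are private to each file stop being private once the files are concatenated:

    // a.cc -- fine as its own translation unit
    namespace { int limit = 10; }
    int a_limit() { return limit; }

    // b.cc -- also fine on its own, but once a.cc and b.cc are merged
    // into one jumbo file, `limit` is defined twice in the same unnamed
    // namespace and compilation fails.
    namespace { int limit = 20; }
    int b_limit() { return limit; }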

Unfortunate.


When "best" is the enemy of "good enough"


Many of them use this tool: https://www.fastbuild.org/docs/home.html

It was built, and is regularly maintained, by a principal C++ architect at Riot Games. It automatically compiles files in large "unity" chunks, distributes builds across all machines in an organization, and generates convenient Visual Studio .sln files and Xcode projects. It's also entirely command-line driven and open source.

This is industrial-strength C++ build tooling for very large, rapidly changing code bases. It works.


If you're using CMake, you can try a unity build: https://cmake.org/cmake/help/latest/prop_tgt/UNITY_BUILD.htm...

You can also set `-DCMAKE_UNITY_BUILD_BATCH_SIZE` to control how many files get grouped, so you can still get some parallelism. However, I think it would be more natural to specify the number of batches (e.g. `nproc`) rather than their size.

Code bases may need some updating to work.
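
For a whole build tree, it can also be switched on from the command line (assuming CMake 3.16 or newer and a project in the current directory), roughly like:

    cmake -S . -B build -DCMAKE_UNITY_BUILD=ON -DCMAKE_UNITY_BUILD_BATCH_SIZE=16
    cmake --build build

The per-target UNITY_BUILD property in the link above does the same thing for a single target.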


How it works with D is you can do separate compilation:

    dmd -c a.d
    dmd -c b.d
    dmd a.o b.o
or do it all in one go:

    dmd a.d b.d
Over time, the latter became the preferred method. With it, the compiler generates one large .o file for a.d and b.d, more or less creating a "pre-linked" object file. This also means lots of inlining opportunities are present without needing linker support for intermodule inlining.


SQLite uses something like this approach too, and there are additional optimisation advantages to keeping everything in a single file:

https://sqlite.org/amalgamation.html

    Over 100 separate source files are concatenated into a single large file of C-code named "sqlite3.c" and referred to as "the amalgamation". The amalgamation contains everything an application needs to embed SQLite.
    
    Combining all the code for SQLite into one big file makes SQLite easier to deploy — there is just one file to keep track of. And because all code is in a single translation unit, compilers can do better inter-procedure and inlining optimization resulting in machine code that is between 5% and 10% faster.
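
Building it is correspondingly simple; on Linux something like the following works (exact flags and libraries vary by platform and options):

    # compile the whole library as a single translation unit
    cc -O2 -c sqlite3.c

    # or build the sqlite3 command-line shell straight from the amalgamation
    cc -O2 shell.c sqlite3.c -lpthread -ldl -lm -o sqlite3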


It makes sense. As projects grow, the average header file is included O(n) times from O(n) different .cc files - leading to O(n^2) parsed header files during compilation. And thus, O(n^2) work for the compiler.

Merging everything into one big .cc file reduces the compilation job back to an O(n) task, since each header only needs to be parsed once.

It's stupid that any of this is necessary, but I suppose it's easier to hack around the problem than to fix it in the language.


Those problems are fixable with C/C++, but nobody seems to want to do it. They are fixed with dlang's ImportC. You can do things like:

    dmd a.c b.c
and it will compile and link the C files together. ImportC also supports modules (to solve the .h problems).

It's all quite doable.


I wonder if you could create a pragma in C to do almost the same.

I don't have a good name for it, but it would force the compiler to ignore previous definitions, with an 'undo' pragma as well.


Which compiler parses the same header file multiple times in the same translation unit? Compilers have been optimizing around pragma once and header guards for multiple decades.

edit: ok, you meant that each header is included once in each translation unit.


Yep. Worst case, every header is included in every translation unit. Assuming you have a similar proportion of code in your headers and source files, compilation time will land somewhere between O(n) and O(n^2), where n = the number of files. IME in large projects it's usually closer to n^2 than n.

(Technically big-O notation specifically refers to worst case performance - but that's not how most people use the notation.)


I'm starting to believe that one static/shared library should be produced by compiling exactly one cpp file. Go ahead and logically break your code into as many cpp files as you want. But there should then be a single cpp file that includes all other cpp files.

The whole C++ build model is terrible and broken. Everyone knows n^2 algorithms are bad and yet here we are.


Everyone: "O(n^2) algorithms are bad."

Also everyone: "Just do the stupidest thing in the shortest amount of time possible. We'll fix it later."


Nice, that's like doing the link phase before the compilation.



