I agree. We moved to Bazel at my last job and it took about 6 person-months. Most of that person was me, and the estimate includes other tooling not related to Bazel. Some extremely competent engineers also moved over our frontend code (including stuff like Cypress) and Python code (which needed to run against 3 different versions of Python). They had no Bazel experience beforehand, asked me maybe a couple hours' worth of questions, and just got it done. So I don't think you need to be a Bazel genius to get this done, but it helps to have someone with a Vision, which was me in this case. All in all, I'd do it again. I'm in the process of moving all my open source code to a monorepo (jrockway/monorepo, which really should have been called jrockway/jrockway) because the development experience is so much better.
The biggest reason for doing the project at work was that new employees couldn't run the code they were working on. We hired people. They tried. They weren't very productive. That's my fault, and I wanted to fix it while supporting our policy of "you can use any Linux distribution you want, or you can use an arm64 Mac". Many people suggested things like "force everyone to use NixOS", which I would have been in favor of, but it wasn't the solution that won. (I honestly prefer Debian myself and didn't think that my preference should dictate how the team works. The fact that I disagreed with the proposed solution is a good indicator that people would be unhappy with anything I declared by fiat.) Rather, using Bazel as a framework for retrieving third-party tooling and also building our code was a comfortable compromise.
A secondary goal was test caching. If you edit README.md, CI doesn't need to rerun the Cypress tests. (As a corollary, if you edit "metadata_server.go", the "pfs_server.go" tests don't need to run, as the PFS server does not depend on the Metadata server.)
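To make that concrete, here's a minimal sketch of the kind of BUILD wiring that makes this work; the target and file names are hypothetical, not our actual layout. A test's cache key is derived only from its declared srcs and deps, so edits elsewhere leave it untouched.

```
# Hypothetical BUILD file for the PFS server (names are illustrative).
load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_test")

go_library(
    name = "pfs_server",
    srcs = ["pfs_server.go"],
    importpath = "example.com/acme/pfs",
    deps = ["//src/internal/storage"],  # note: no dep on the metadata server
)

go_test(
    name = "pfs_server_test",
    srcs = ["pfs_server_test.go"],
    embed = [":pfs_server"],
)
```

Because nothing in this target's transitive inputs changes when you edit README.md or metadata_server.go, Bazel (and the remote cache) just replays the previous test result.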
The biggest source of slowness and complexity in the workflow was building our code into a container to run in k8s. We used goreleaser, and that involved building the code twice, once for each architecture, to produce an OCI image index, which was our main release artifact. The usual shell scripts for local development just reused this, and it was terribly slow. Throw in Docker to do the builds, which deletes the Go build cache after every build, and you have a recipe for not getting anything done.

Bazel is a much better way to build containers. Containers are just some JSON files and tar files. Bazel (rules_oci) just assembles your build artifacts into the necessary JSON files and tar files. To build a multi-architecture image index, you build twice and add a JSON file. Bazel handles this with platform transitions; you make a rule to build for the host architecture (technically transitioned to Linux on a macOS host), and then the image index rule builds that target with two configurations (cross-compiled, not emulated with binfmt_misc like "docker buildx") and assembles the two artifacts into the desired multi-arch image. When running locally, you skip 2 of the 3 steps and just build for the host machine.

Combined with proper build caching (thanks BuildBuddy!), this means that making an image to run in k8s takes about 10 seconds instead of 6 minutes. With the previous system you could try your code 80 times a day. With the new system, you could try it 2880 times ;) This increased productivity.
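For flavor, here's roughly what that shape looks like with rules_oci. This is a hedged sketch with made-up target names, and the platform-transition wiring that produces the two per-architecture images (which varies between rules_oci versions) is elided.

```
load("@io_bazel_rules_go//go:def.bzl", "go_binary")
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")
load("@rules_oci//oci:defs.bzl", "oci_image", "oci_image_index")

go_binary(
    name = "server",
    srcs = ["main.go"],
)

# The "layer" is literally just a tar file containing the binary.
pkg_tar(
    name = "server_layer",
    srcs = [":server"],
)

# One image: base layers plus our tar plus some config JSON.
oci_image(
    name = "image",
    base = "@distroless_base",
    entrypoint = ["/server"],
    tars = [":server_layer"],
)

# The release artifact: the same image built under two platform
# configurations (via a transition, not shown), plus the manifest-list
# JSON that ties them together.
oci_image_index(
    name = "image_index",
    images = [
        ":image_linux_amd64",
        ":image_linux_arm64",
    ],
)
```

For local development you stop at ":image" for the host platform; only release builds pay for the second architecture and the index.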
I also wrote a bunch of tools to make setting up k8s easier, which would have been perfectly possible without Bazel, but it helped. (Before, everyone pushed their built image to DockerHub and then reconfigured k8s to pull that. Now we have a local registry, and if two people do a build at the same time, you always get yours and not theirs. I did not design this previous system; I merely set out to fix it because it's Wrong.)

Bazel makes vendoring tools pretty easy. For our product we needed things like kubectl, kind, skopeo, postgres, etc. These are all in //tools/whatever and can be run for your host machine with `bazel run //tools/whatever`. So once you ran my program to create and update your environment, you automatically had the right version of the tool to interact with it. We upgraded k8s regularly and nobody noticed. They would just get the new tool the next time they tried to run it. (A centrally managed Linux distribution would do the same thing, but it couldn't revert you to an old tool when you checked out an old version to debug. A README with versions would work, but I learned that nobody really reads the READMEs until you ask them to. "How do I do X" "See this section of the README" "Oh damn I wish I thought of that" "Me too." ;)
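As a sketch of the vendored-tool idea (not the author's actual five-liner; the repository names are hypothetical http_file downloads declared elsewhere), bazel-skylib's native_binary is one way to make a prebuilt download runnable:

```
# //tools/kubectl/BUILD — rough sketch; @kubectl_linux_amd64 etc. are
# hypothetical http_file repositories pinned to a specific version + sha256.
load("@bazel_skylib//rules:native_binary.bzl", "native_binary")

native_binary(
    name = "kubectl",
    src = select({
        # Abbreviated: a real setup would also select on CPU.
        "@platforms//os:linux": "@kubectl_linux_amd64//file",
        "@platforms//os:macos": "@kubectl_darwin_arm64//file",
    }),
    out = "kubectl",
)
```

`bazel run //tools/kubectl -- get pods` then always uses the version pinned in the tree you have checked out, which is what makes the "check out an old version, get the old tool" behavior fall out for free.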
The biggest problem I had with Bazel in the past was dealing with generated files: editor support, "go get <our thing>", etc. I got by when I used Blaze at Google, but realistically, there was no editor support for Go at that time, so I didn't notice how badly it worked. There is now GOPACKAGESDRIVER, which technically helps, but it didn't work well for me and I wasn't going to inflict it on my team. I punted this time and continued to check in generated files. We have a target //:make_proto that regenerates the code from the proto files, and a test that checks that you did it. You check in the regenerated files when you change the protos. It works, and I have a general rule for all generated files like this. (We also generate a bunch of JSON from Jsonnet files; this mechanism helps with that.)
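For reference, a minimal sketch of that "generated file is up to date" check, using bazel-skylib's diff_test; the target names are made up and this may not be exactly how the author wired it:

```
# Fails if the checked-in foo.pb.go drifts from what the proto rules generate.
load("@bazel_skylib//rules:diff_test.bzl", "diff_test")

diff_test(
    name = "foo_pb_go_up_to_date_test",
    file1 = ":foo_go_proto_src",  # hypothetical: the freshly generated foo.pb.go
    file2 = "foo.pb.go",          # the copy committed to the repo
)
```

//:make_proto is then just a runnable target that copies the generated output back into the source tree, and the test keeps everyone honest.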
All in all, you can get a fresh machine, install git and Bazelisk, check out our repo, and get "bazel test ..." to succeed. That, to me, is the minimum dev experience that your employer owes you, and if you joined my team, you'd get it. That made me happy and it wouldn't be as good without Bazel. I'd do it again!
Just as an aside, after the Bazel conversion I did a really complicated change, and Bazel didn't make it any harder. We made our main product depend on pg_dump, and adjusting the container-building rules from "distroless plus a Go binary" to "Debian plus postgres plus a Go binary" was pretty easy. rules_debian is very nifty, and it gives me a sense of reproducibility that I never got from "FROM debian:stable@sha256...; RUN apt-get update && apt-get install ....". Indeed, the reproducibility is there: you can take any release tag, run "bazel build //oci:whatever", and see that the resulting multi-arch image has the same sha256 as what's on DockerHub. I couldn't have done that without Bazel, or at least not without writing a lot of code.
I don't work there anymore but I'm really happy about the project. I don't even do Build & Release stuff. I just add features to the product. But this needed to be done and I was happy to wear the hat.
As someone who is somewhat experienced with build systems in general (though not with Bazel) and has had to solve a lot of the issues you mentioned in different ways (i.e. without Bazel), I have been interested in learning Bazel for a long time, as its core principles seem very sound to me. However, the few times I looked into it I found it rather impenetrable. In particular, defining build steps "declaratively" in Starlark just seemed to me to be a slightly less bad way of writing magic incantations in YAML. In other words, you still had to understand what exactly every magic incantation did under the hood and how to configure it, and the documentation generally didn't seem great.
Is there some resource (blog/book/…) you can recommend for learning Bazel?
I feel like I got the basics from using Blaze for years at Google. Things like "oh yeah, buildifier will autoformat my BUILD files" and the basic flow of how a build system is supposed to work.
Figuring out how to complete a large project with Bazel involved a few skills that one should be ready to employ.
1) Programming. The stuff out there often can't do things exactly the way you want. I wanted to use a bunch of golangci-lint checks with "nogo", so I opened up the golangci-lint source code and copy-pasted their code into my project to adapt the checks to how nogo works (there's a rough sketch of the nogo wiring after this list). People have tried to fix this problem generically before, but their solutions didn't pan out, and there are just a bunch of half-abandoned git repositories floating around that don't work. Write it yourself. (I had to write a lot of code for this project: compiling protos the way we want, producing reproducible tar files with more complex edits than I wanted to do with mtree -> awk -> bsd tar, installing built binaries, building "integration test" Go coverage binaries, etc. Lots of code.)
2) Debugging. A lot happens behind the scenes and you always need to be situationally aware of what's being done for you. For example, I was pretty sure our containers would be "reproducible" i.e. have the same sha256 no matter the configuration of the build machine. That was ... not true. I tested it and it wasn't happening. So I had to dive into the depths of the outputs and see which bytes were "wrong" in which place, and then debug the code involved to fix the problem. (It was a success, and oddly I sent the PR to fix it about 5 seconds before someone else sent the exact same PR.)
3) Depth. There probably isn't a way to be functional where you pick something out of your search results, follow the quickstart, and then happily enjoy the results. Rather you should expect to read all of the documentation, then read most of the code, then check out the code and add a bunch of print statements... with each level of this involving some recursion to the same step for a sub-dependency. For example, I never really knew how "go build" worked, but needed to learn when I suspected linking time was too high. (Is it the same for 'go build'? Yes. Why? It's spending all of its time in 'gold'. What's gold, the go linker? No, it's the random thing Debian installed with gcc. Is there an alternative? Yes, lld and mold. Are those faster? Yes. How do I use one of those with Bazel? I'll add some print statements to rules_go and use that copy instead of the upstream one.)
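Since point 1 mentions nogo: the adapted golangci-lint analyzers are ordinary Go code you write yourself, and the Bazel side of the wiring is comparatively small. A rough, hypothetical sketch (label names are illustrative):

```
load("@io_bazel_rules_go//go:def.bzl", "nogo")

nogo(
    name = "nogo",
    deps = [
        # Stock analyzers from golang.org/x/tools...
        "@org_golang_x_tools//go/analysis/passes/nilness",
        # ...plus the analyzers copied and adapted from golangci-lint.
        "//lint/bodyclose",
        "//lint/errcheck",
    ],
    config = "nogo_config.json",
    visibility = ["//visibility:public"],
)
```

You then point rules_go at this target (go_register_toolchains(nogo = "@//:nogo") in WORKSPACE setups, or the equivalent bzlmod hook) and the analyzers run as part of every compile.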
With all that in mind, I never figured out "everything". There is a lot of stuff I took at face value, like configuration transitions for multi-arch builds. The build happens 3 times but we only build for 2 platforms (the third platform is the host machine); I don't know why, or how to prevent the host build. (I did figure out how to prevent the extra build for some platform-independent outputs, though, like generating static content with Hugo.) I also wired up a bunch of vendored tools but never used Bazel's actual toolchain machinery. I had my works-with-5-lines-of-code way of running vendored tools for the host machine and never saw the need to type in 50 lines of boilerplate to do things the "right" way. I'm sure this will burn someone someday.
In the end, I guess motivation was the key. People on my team couldn't get their work done, and CI was so slow that people spent half their day in the "I'm going to go read Reddit until CI is done" cycle. Hacks had been attempted in the past, with a lot of effort put into them, and they still didn't work. So we had to rebuild the Universe from first principles, doing things the "right" way. The results were good.
I will always prefer this approach to the simpler ones. For one thing, Bazel always gives the "right answer" when it's set up correctly. It doesn't rely on developers being experts at managing their dev machines; you include all the tools they need, you can update them whenever you want a new feature, and they get it for free. That's the big selling point for me. I also can't deal with stuff that is obviously unnecessary, like how Dockerfile-based container builds require an ARM64 emulator just to run "mkdir" in a Dockerfile. You're just generating a stack of tar files and some JSON. Let me just tell you where the tar files and the JSON are. We do not need a virtual machine here.