It was a good day when we finally removed 100% of the C and C++ code from the D compiler and all of the runtime library (including the memory manager). The assembler code uses D's inline assembler.
The test suite has C code in it, because of course D can compile C code.
Something I've always wondered about compilers written in their own language....
What is your process for compiling a new compiler? Let's say you make a code change to the compiler. You have a compiled version of the previous compiler you can run to compile the new compiler.
But, by definition, the new compiler is different from the old one. Do you re-run the compilation with the new compiler? How many times?
Stage 0: The stage0 compiler is usually the current beta rustc compiler.
Stage 1: The rustc source code is then compiled with the stage0 compiler to produce the stage1 compiler.
Stage 2: We then rebuild our stage1 compiler with itself to produce the stage2 compiler. In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. The stage2 compiler is the one distributed with rustup and all other install methods.
Stage 3: To sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.
To expand on this, the "subtle differences" are things like optimizations/tweaks introduced in the new compiler that can only be taken advantage of by building stage2 (since stage1 was compiled with the old compiler).
It's called bootstrapping. It's the hard part of any new language, especially lower-level ones. Your first iteration will be very manual/low-level, until your compiler gets sophisticated enough to compile itself, granted you trust it enough to do the right thing. It can take many iterations to get to the point where you trust your compiler works well enough for daily use to build itself, and then you can slowly start discarding the older pieces from earlier iterations.
On some level I think Rust will become a major player for building compilers (and obviously drivers), and since it produces portable executables and is safe/predictable, there is a good chance the compiler dev won't need to switch to his own language to compile itself, unless of course it's a point of pride, there's some specific functionality that Rust cannot do, or the person just doesn't like Rust.
Compiler development is a different beast altogether from most forms of programming, and I highly recommend you build a basic one as a hobby project. It will let you appreciate the shoulders of giants we are standing on. Same goes for 3D/physics engines, audio/signal processing and so on. Building a basic filesystem, or a database that supports indexes and a strict schema with some form of relational theory in it, is also highly enlightening and will dispel the magic of SQL engines (and make you appreciate those who came before you and their struggle to get where we are today).
I think where you might be getting confused is that you are assuming the new feature is some syntax they'll just automatically start using in the new compiler immediately. Typically you wait until the compiler you added that new feature to is released; then you can consider how you would refactor your compiler code, and then it should compile since you already have your new syntactic sugar or whatever.
I hope that makes more sense to you. I think what Walter Bright answered was good too, but it helps to remind oneself that just because your new compiler code implements something new doesn't mean you have to use it the second you want to compile it. So it won't matter until the new compiler is ready; then you consider using the new syntax or features in the compiler code base.
Thank you for adding this explanation! It helps catch the mistaken assumption one might make: adding X does not imply the compiler's source code also immediately starts using X.
This is great. The more self-contained (really self-hosted) a language is, the more implementation freedom and ability to evolve it gets.
Virgil versions I & II were a Virgil->C compiler written in Java. Later, I wrote an interpreter for Virgil III in Java and then began writing a compiler in Virgil III. When that compiler could compile version III (including itself), I checked in the first "stable" compiler as a jar. Then, periodically, when enough new features were added and bugs were fixed, I checked in a new stable binary (jar). Later, I developed and eventually fully switched to native backends for 32- and 64-bit x86 on macOS and Linux. Today, 5 stable binaries are checked in: jar, x86-darwin, x86-64-darwin, x86-linux, and x86-64-linux. There is also a Wasm backend, which can bootstrap the compiler too, but I did not check in a stable binary for it.
Initially I was worried that a codegen bug would prevent bootstrapping from a compiler binary and that I'd need to fall back to running on an interpreter. So far, there's never been a codegen bug bad enough to break bootstrapping, so I am not worried about this. The compiler never needs to bootstrap from an interpreter.
If after reading the post you're still unsure about why we're going through this process, I made a video that focuses more on the reasons from the perspective of a Zig contributor, showing how the bootstrapping process helps contributors in their day-to-day tasks.
Thanks, the explanation about how the Zig compiler uses compile-time code execution to implement multiplatform support in the compile-to-C backend helped me understand why WASM has better trade-offs for Zig than other bootstrapping options.
For others: the start of the video discusses bootstrapping in general and the current compiler state, and then the discussion of "Why WASM" starts at around minute seven.
> The idea here is to use a minimal wasm binary as a stage1 kernel that is committed to source control and therefore can be used to build any commit from source. We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code. The C code is then compiled and linked, again by the system C compiler, into a stage2 binary. The stage2 binary can then be used repeatedly with zig build to build from source from that point on.
1/ Wouldn't that be considered "cheating" to basically commit precompiled compiler binaries to source control ?
2/ I don't understand how that solves the "features need to be implemented twice" problem. Wouldn't you need to implement new Zig language features into that WASM kernel whenever they are used in the Zig compiler source ?
1. Yes it is cheating. That is the downside of this approach. Contributors to Zig and users of Zig don't care about such cheating, but distribution maintainers such as Debian Developers do (rightly) care. This decision is a tradeoff that favors contributors and users at the expense of system package maintainers. I am counting on a third party implementation of Zig to arise someday and solve the bootstrapping problem for system package maintainers. But in the short term, it's more important to prioritize the needs of users and contributors.
2. Whenever this happens, the contributor runs `zig build update-zig1` and commits the updated wasm kernel to the repository.
I do not think that at all. But I have witnessed tortured vivisection of codebases that were never meant to undergo that procedure, all to satisfy a fantastical taxonomy. In that regard, I am triggered, yes.
Is the argument you were alluding to that a distro wouldn't consume Zig until it can be bootstrapped by two compilers to show it hasn't been the victim of the Trusting Trust attack?
Oh sure, I won't deny that package maintenance has caused plenty of issues for upstream authors [1].
Yes that's right. More specifically they have rules about generated files. They are not allowed. Generated files such as binary blobs must be produced as part of the build process of the package.
Distros like to compile from source, but generally don't mind bootstrapping off pre-built compilers not in the source package, since any other method means basically giving up on packaging software altogether.
The only exception that I am aware of is GNU Guix and the Bootstrappable Builds project, which aims to build a full distro starting from a ~512-byte binary; they have gotten quite far already.
> The Scheme interpreter is written in ~5,000 LOC of simple C, and the C compiler written in Scheme and these are mutual self-hosting. Mes can now be bootstrapped from M2-Planet and Mescc-Tools.
This is important work these folks are doing. I'd love to see some distros pick it up and periodically show that they can start with some files on disk and a ten-key. A USB drive and a UEFI console?
They have gotten a lot further than GNU Mes now. The process starts with stage0, ~512 bytes of machine code. Their live-bootstrap project is another focus of work. Another thing being worked on is using the Fiwix kernel as a step towards getting Linux bootstrapped. Higher-level languages like OCaml are also getting bootstrapped.
It's only marginally less ugly than blessing one arch (like arm or x86) and running the bootstrapping with an emulator.
Don't get me wrong, I do like it more, but I realize it's mostly an aesthetic thing. Logically and functionally it's like if you just blessed a build using cosmopolitan libc or something like that.
Perhaps the distro devs could maintain their own golden WASM blobs that they compiled themselves and thus trust. Could be the same process as SecureBoot / package signing keys.
Can the stage1 wasm binary be reproduced by the stage2 or stage3 executable?
Aside from a "trusting trust" type of attack that seems fine, and every modern distro relies on some bootstrap binary for C compilers anyway (usually older versions of them), so it wouldn't be that much different of a bootstrapping problem than bootstrapping GCC itself.
(See the GNU Mes project which attempts to bootstrap from just a very small hex interpreter)
>and every modern distro relies on some bootstrap binary for C compilers anyway (usually older versions of them), so it wouldn't be that much different of a bootstrapping problem than bootstrapping GCC itself.
Yes, that's exactly my point. The zig-wasm-bootstrap package could just be a Build-Depends of the zig package.
It's what OpenJDK does. OpenJDK releases very quickly nowadays, and requires version N or N-1 to build version N. If that works for OpenJDK, presumably it should work for Zig.
Does it really prevent the Ken Thompson attack though? Well, it means the attacker has to be a committer to keep the attack from eventually breaking, or that the attack will eventually break.
You could use a different compiler written by someone else to increase the amount of work and coordination needed by the attacker to pull it off, but this is not reasonable to require for new (or new-ish) programming languages -- it'd more likely squelch programming language research and development than aid it.
There are multiple Java implementations, but does Debian build the OpenJDK with non-OpenJDK implementations? Would that eliminate the trusting trust problem?
> Does it really prevent the Ken Thompson attack though? Well, it means the attacker has to be a committer to keep the attack from eventually breaking, or that the attack will eventually break.
If the attack is sufficiently well squirreled away in code that rarely changes then that "eventually" could be a very long way away.
However, I imagine the risks of Trusting Trust are a tad overblown considering how much other lower-hanging fruit there usually is to attack through. For example just sneaking in subtly broken commits containing security vulnerabilities.
If Zig was at version 19, like OpenJDK, then Zig could just not commit stage0 and instead say "download stage0 from ... or install from your friendly distro pkg repos".
Eventually, presumably, Zig will get to that level of maturity. In the meantime, to me, it seems like not-a-big-deal to commit a very small stage0.
> 1/ Wouldn't that be considered "cheating" to basically commit precompiled compiler binaries to source control ?
No, absolutely not. This is how Virgil bootstrapping works by design. There are 5 pre-compiled compiler binaries in the repo. The repo is completely self-contained so that any revision at any point can compile itself from source, except the very earliest versions that needed an interpreter in another language. The stable binaries are updated infrequently, about once every 3-6 months.
This would definitely be considered "cheating" by the Bootstrappable Builds folks, who build everything from source, including generated binaries and generated code files.
They can consider it cheating, but they themselves use previously-compiled compilers to build, do they not? The only difference is that their previously-compiled compilers are not in the same source repositories as the compilers they are used to build. That is no guarantee that the Thompson attack is defeated.
The best way to defeat the Thompson attack is to insist on multiple distinct implementations -by different authors- of the implementations of each programming language, and even this only makes Thompson attacks a lot harder to pull off -but not impossible- for determined attackers. But one cannot insist on multiple distinct implementations for every new programming language, as that would simply make new programming language R&D to be prohibitively expensive.
Zig could, and arguably should switch to an OpenJDK-style bootstrapping system to please the distros. Essentially this means that using new language features in the Zig compiler has to wait until those new language features appear in a released version. Whether this is realistic, idk. In any case, Zig can also keep the stage0 in the repository for use by developers (but not distros).
They do not use previously-compiled compilers, no.
Instead they are working on building an entire distro starting with ~512 bytes of machine code plus a ton of source. They aren't there yet, but are getting closer.
The Thompson defeat you mention sounds like diverse double-compiling, by David A. Wheeler.
I don't think what OpenJDK has is something that Guix/Bootstrappable folks would like either, they had to bootstrap off the Jikes implementation in C++ instead:
The Virgil repo contains every stable compiler binary produced in an unbroken chain back to the first interpreter. You can literally check out every single one of the 2,200 commits and build the compiler from the source and binary checked into the repo. If that's not a bootstrappable build, I don't know what is.
A bootstrappable build (according to bootstrappable.org) is a bootstrap process where you don't use any binaries of the compiler itself to build the compiler. Unless the compiler already has a maintained source-only bootstrap, the usual way to do that is to write a smaller implementation in another language with just enough features that it can build the compiler. camlboot is an example of that. Another option is to build a chain of previous versions of the compiler that were written in other languages, before the compiler became self-hosting.
1. No, why? OpenJDK version N requires an OpenJDK version N or N-1 to build, and you can download and install that if you need it. What's the difference between "you can download and install stage N-1" vs "stage N-1 is committed"? If the build artifact that is committed is small, then I would argue that there is not much real difference between those two.
2. To add a language feature, you edit the Zig-coded compiler. Then you build it, test it, and you're done. If you now want to change the Zig-coded compiler to use the new feature then you have to update the committed compiled-to-wasm Zig compiler.
I have honestly been more excited about Wasm for desktop than I am for the web. And I'm really excited about it for the Web. Really cool to see this use case pop up right as I'm trying to integrate it into my stack!
Nah, this use case is why Niklaus Wirth created P-Code for Pascal, and how UCSD created a full Pascal based OS that had P-Code based binaries, and some models even had a primitive JIT/AOT compiler for it.
WASM is just another reboot of bytecode-based binaries that keeps popping up in multiple ways since at least 1961, when the Burroughs Large Systems got released.
You're right, but even so one can still be excited it's popping up again. This time with a lot of support from various parties. And it's cool that zig goes with this solution too.
I will say that I'm mildly disappointed that there is no mention of Wirth in this article, though. I guess Andrew didn't get around to reading his work yet. I'd expect him to love it; they'd probably agree on many things.
I've never really thought about wasm for the desktop (I've thought about it for server and of course browser), can you elaborate on your excitement? Is it just for this sort of bootstrapping application, or are there other benefits?
Wasm is an opportunity for all platforms to adopt a single container format for architecture-independent binary code, with uniform APIs being defined on top of that; they are already standardizing a subset of POSIX. None of the bits here are really new, but the big difference this time is that it's not owned by any one of the major players (like e.g. Java and .NET were), and the design process is truly collaborative across the entire industry. This makes it politically viable.
Neither Apple nor Microsoft are supporting WASM on the desktop. Being able to run basic command line programs on top of POSIX? Sure, that'll be possible. But that's also already possible without WASM, and yet still very rarely done.
Having actually portable APIs and abstractions such that you don't need to be concerned with differences in file paths, system behaviors, system services, audio, video, windowing, input, etc...? Yeah, WASM isn't the answer to that and it's not trying to be either.
The reason Java/.NET "failed" had nothing to do with being driven by only one player and everything to do with the abstractions being so high-level at that point as to be problematic. Which you either need to fully embrace, and end up with Electron, or you end up trying to reinvent wxWidgets or AWT, with the clunky compromises that result, where it's really just easier to build a platform-specific binary.
Also, both Google and Microsoft have had (or currently have) their own Java bytecode runtimes. So that's not even locked up by one player; just compare Android vs. desktop JVM to see the inevitable future of "portable WASM".
More like compile once and distribute one blob, run anywhere. In theory.
Write once, run anywhere is true with cross compilers and native executables without any bytecode intermediate formats. Or even things like APE executables and cosmopolitan-libc.
The hard part is finding portable libraries to actually do anything interesting. Networking, graphics, GUI, peripherals. WASM is not helping here and maybe even makes things a bit worse by introducing yet another platform to the portability matrix.
I do see the allure of using WASM for sandboxing, plugins and running untrusted code. Things where the distribution part matters.
Could someone explain to me why Zig is getting hyped so much on HN? From a quick glance it looks like Zig is memory-unsafe like C/C++. I thought the macro trend was moving onto memory-safe languages:
Rust is the only mainstream language I know of that's memory-safe without the overhead of GC or reference counting. Doing that is really hard, and it compromises the dev experience, so I'm not sure I'd call it "the macro trend" outside of the uptake of Rust itself. It has tradeoffs.
Zig and some other languages like Nim have taken a different approach: "If we don't try to fully solve memory safety, what are all the other ways we can improve on the status quo to make a decidedly modern systems language?" Modern tooling, colocating arrays with lengths, strings as first-class citizens, error types, nullable types that the compiler can reason about, better static types and inference in general, etc. There's a whole lot of room for improvement over C even if you're not going all the way to borrow-checking
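To make a couple of those concrete, here's a tiny sketch (illustrative code, not from the thread) of how slices, optionals, and error types look in Zig:

    const std = @import("std");

    // A slice ([]const u8) carries its length with it, so no separate
    // size argument can drift out of sync.
    fn firstWord(line: []const u8) ?[]const u8 {
        const end = std.mem.indexOfScalar(u8, line, ' ') orelse line.len;
        if (end == 0) return null; // optional: "no word here"
        return line[0..end];
    }

    // Error union (!u16): callers must explicitly handle or propagate
    // the failure; there's no sentinel value to forget to check.
    fn parsePort(s: []const u8) !u16 {
        return std.fmt.parseInt(u16, s, 10);
    }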
Also, much of what Zig improved over prior art and much of what Rust improved are orthogonal and can coexist. The ideal language for me would be one that takes the syntax unification, the arbitrary-width integer types, the meta-programming, and the explicit allocators from Zig, and the traits, borrow checker, scope-based destructors, and move-by-default from Rust.
D has steadily moved towards full memory safety. The one remaining thing is dealing with manual storage allocation, and D has a prototype borrow checker to address that.
What do you mean by memory safe? This line is trotted out all the time but no one defines what they actually mean. Is Go memory safe? Rust is not a "memory safe" panacea. You can write memory unsafe code in Rust.
Zig also has a much better developer experience around "memory safety" compared to C/C++. It really is an interesting alternative to writing something in C. You can compile it in debug mode and get out of bounds checks, for example.
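A tiny sketch of what that buys you (illustrative, not from the thread):

    pub fn main() void {
        const items = [_]u8{ 1, 2, 3 };
        var i: usize = 0;
        i += 5; // index is only known at runtime
        // In Debug (and ReleaseSafe) builds this panics with an
        // "index out of bounds" error; the equivalent C would
        // silently read past the end of the array.
        _ = items[i];
    }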
Rust is, in fact, a "memory safe" panacea if you're willing to put up with the syntax and complexity. Memory unsafe code in Rust is clearly marked as such, and you have to go out of your way to write it. Swift is similarly memory safe out of the box.
By contrast, it seems to be trivial to write unsafe code in Zig.
The window has closed for languages that don't take memory safety seriously. The Zig team can work on it now, or they can work on it later, but they will have to do it to get the language past a certain level of adoption in the modern world. People are starting to write real, useful Linux kernel modules in Rust.
> The Zig team can work on it now, or they can work on it later, but they will have to do it to get the language past a certain level of adoption in the modern world.
The objective of every language does not have to be world domination.
The rust community can tend towards smugness. And the evangelism on social media can be a bit much.
But I am indebted to quite a few people on the rust discord (or one of the rust discords), who have been kind enough to share their knowledge with me. Nothing but nice things to say about them.
I guess at this point if you don't like one Rust community, find another. There's enough of them now.
The zealots are always the loudest members of the communities. With Rust it's the "safety" zealots. With golang it's the "simplicity" zealots. With C++ it's the "old is gold" zealots. And so on.
Just ignore the communities. Judge the languages on what they do for you.
I am not a zealot as I do not really "love" any language. I have used many languages, and my "love" changes depending on the nature of the project / task within the project, or sometimes the client insists on a particular one. A language to me is a tool like a screwdriver, and while I do have personal preferences, as long as the language is not atrociously bad / not suitable for the task I do not really care.
I am kind of in the "old is gold" camp, but not because I do not want to learn a new language (I actually do); it's because a new language needs to offer significant benefits.
If I am to use Rust it would be for writing enterprise backends, as this is what I use this type of language for at the moment. Currently I am using C++ for this. When I look at using Rust for the same purpose, I see a less expressive language with a smaller ecosystem compared to C++, and the need to dance under a fresh moon while sacrificing a virgin to accomplish some simple things. All for the benefit of memory safety without losing performance.
Current project is already 3 years old and I've yet to discover any safety-related issues in it (leaks, range errors, use after free, etc.). Modern C++ in my opinion provides enough scaffolding to write reasonably safe programs at the language/library level, together with things like address sanitizers. So switching to Rust would not really give me any return on the investment. If a client wants it in Rust and pays for it, then sure. Otherwise I have more interesting things to learn.
I am being paid for designing and implementing products, not for being expert in particular language.
On almost any C/C++ or Zig related discussion happening on HN or Twitter you'll find some random Rust evangelist asking why people are still using a "memory unsafe" language to build things.
Implying, basically, that any non-Rust systems programming language is obsolete (and should maybe even be considered harmful).
I find that deeply annoying.
I don't have predictions about the future; the Zig community is not toxic for now.
> But you have to prove that your value proposition, doesn't lead to horrible failure down the line.
No I do not. My software is done, it has run for decades on the Internet, making me money; For whatever you could mean by "horrible failure", what I am doing either doesn't lead to it, or it's clearly not that bad.
> ... is not a valid argument
We're not having an argument; You want to ask a question, I can give you an answer, but there's nothing here to argue:
I'm using a "memory unsafe" language because I can write secure programs that run quickly with them, and I don't live in a Harrison Bergeron world.
This is about why you can't yet, and really doesn't have anything to do with me.
No, but it requires covering fixes, free of charge, in special security-critical infrastructures.
The software company bears the cost of producing faulty software.
When it touches the bank, many companies will start considering alternatives.
In fact, this is what drove Microsoft et al. to finally start embracing other stacks: the amount of money burned fixing exploits for free in OS updates.
> I'm using a "memory unsafe" language because I can write secure programs that run quickly with them, and I don't live in a Harrison Bergeron world.
Did you prove it mathematically?
Did you wrap it in Rust API and use miri to inspect for UB?
Did you run it through ASAN, MSAN, STACK, etc.?
All UB/platform-specific behavior has been accounted for?
So you ran fuzzers for days-weeks? What issues did it uncover?
Does your test coverage account for all cases and most input variations?
Or is this a case of "I think I can write secure programs"? There is a light-year-wide trench between thinking and proving you can do something. And "it hasn't broken yet" is not a guarantee.
No, it's not. That's why I am asking for clarification. By testing, proving programs, running fuzzers, running sanitizers, and various other tooling you reduce the chance of problems to an acceptable level. You can't be perfectly safe, but doing nothing isn't acceptable behavior in the face of preventable risk.
Sure, writing unsafe code is hard. And from what I see most of these issues relate to old versions of Rust having unsound issues that were fixed at later point.
> In the standard library in Rust before 1.52.0,
> In the standard library in Rust before 1.51.0,
> In the standard library in Rust before 1.19.0,
...
Basically, by upgrading your Rust version, your code becomes less and less buggy over time, which is not something C can boast about (modern C++ is safer but still a far cry from what is achievable in, let's say, Java).
> No, it's not. By testing, proving programs, running fuzzers, running sanitizers, and various other tooling you reduce the chance of problems to an acceptable level
Yes it is. If the outcome is acceptable to you it's because you think so.
The rust authors thought wrongly too: using this testing, proving, fuzzing, sanitizing and various other tooling "accepted" the code with bugs in it, but it wasn't good enough, so they fixed it. It clearly wasn't "acceptable" to them.
Meanwhile, I'm wondering what the heck kind of crazy crackhead do you gotta be to think that the "bugs" nobody has hit in ten-year-old-code that paid for my house are somehow worse than these bugs that passed testing, proving, fuzzing, sanitizing and various other tooling.
> but doing nothing isn't acceptable behavior in the face of preventable risk
Going 200kph the wrong way is absolutely worse than doing nothing.
> Basically by upgrading your Rust version, your code becomes less and less buggy over time, which is not something C can boast about
No, your code doesn't become less buggy, you just stop using other people's clearly buggy but nonetheless proven, tested, fuzzed, and sanitized code.
Your application may or may not become less buggy: If a user can't hit a bug, it isn't a bug. But if they have, you're going to be hoping those other people at least made small diffs so you have a chance of finding it.
> Going 200kph the wrong way is absolutely worse than doing nothing.
What's the alternative, have twice as many critical vulnerabilities? Using Rust in Android, Linux and for drivers has already been proven to work, and work rather well (Linus snark aside). See recent postings about Rust code in Android.
> The rust authors thought wrongly too: using this testing, proving, fuzzing, sanitizing and various other tooling "accepted" the code with bugs in it
Just because fuzzing, testing, and proving still let bugs exist doesn't mean it's pointless. Let alone doing it in a memory-unsafe language.
> Meanwhile, I'm wondering what the heck kind of crazy crackhead do you gotta be to think that the "bugs" nobody has hit in ten-year-old-code
The kind of crackhead that had to pick up the pieces after 15-year-old code that people thought was "working", but had massive oversights. I know what passes for working, and honestly it scares me. From segfaults when comments are removed, to XML parsers that don't understand namespaces, to bugs in unexercised code that fucked over entire ecosystems. Would Rust solve all of them? Probably not, but the first one most likely wouldn't happen.
That said, I'm not judging your code; it's possible to write C code without UB, but it's kind of like winning the lottery. libfyaml is one such library.
> What's the alternative, have twice as many critical vulnerabilities?
Oh don't be silly: The software with the best security track record is written in C (e.g. qmail) so there are obviously many alternatives. You could sit and think for a bit, for example.
> Just because fuzzing, testing, and proving still let bugs exist doesn't mean it's pointless. Let alone doing it in a memory-unsafe language.
I never said pointless, just that you were wrong about what they do.
> That said, I'm not judging your code
Really sounds like you are. I talk about code that's finished, and you talk about code that isn't finished.
It totally makes sense to me how someone who wants to never get finished would use rust, but friend, that isn't me.
Every language, once it grows beyond a certain point, will have its share of kooks. 1% of people are psychopaths, so in 10,000 people you have a hundred psychos.
GP is right, in a memey, stereotyped sort of way. But the Rust community generally holds people evangelizing Rust by RIIR (rewrite it in Rust) in high disdain.
That said, what C/C++ (and other memory-unsafe language) people can't seem to understand is that unlimited undefined behavior is a trainwreck, and that Rust offers much, much more than just peace of mind regarding memory safety.
And no. Just write better code doesn't work.
It feels surreal. Like imagine you are driving a car with seatbelts and airbags, and everyone is saying, "Well, I can just drive better, and the belt is annoying. Doesn't allow me to switch seats while driving. And it won't help if you fall in a river, so it's useless. Plus brakes make you go slower."
I decided that explaining and education is a lost cause. Let evolution sort them out.
Sigh. First nice strawman. Guess you are also helping up your <favorite language/niche> toxicity numbers. Har, har.
Second, it's not Rust zealotry (maybe memory-safety zealotry), but my reasons are well corroborated at this point (see Android safety issues with native code).
Use any memory-safe language (C#, Java, Ada...) that fits your needs. Just stop pretending memory-unsafe languages have no issues and that it's all the programmer's fault, i.e. just code better / don't make mistakes / don't hold it wrong.
Most people in the Rust community do not even know about Zig, while some of them are supporting it. I know it is anecdotal, but I haven't found the opposite to be true: it is always someone from the Zig community bashing Rust. So even though it is a smaller/younger community than Rust's, it can already be pretty toxic. And there are not many signs of this issue being addressed (apart from the creator of Zig, who seems to be a really nice person).
Language-level guarantees of memory safety are not critical to all low-level programmers, and sometimes this is fine!
Developers of games, compilers, digital audio workstations, video editors, and live performance software (such as openFrameworks) likely don't rank memory safety as their top concern.
Zig is already an attractive choice for those domains because it offers:
- Better tooling than C/C++. The ability to cross-compile Zig and C/C++ from one machine already lets you set up much more stable and reproducible build environments. You can clone zig-gamedev and have the demos working with just three commands on Windows/macOS/Linux, for example, and two of those three are cloning the repo and changing to the directory: https://github.com/michal-z/zig-gamedev (to build the examples you will need the latest copy of Zig from the 'master' section for your platform at https://ziglang.org/download/ )
We should all be careful about insinuating that memory unsafe languages should not exist. I see “friends don't let friends use memory-unsafe languages” on social media and feel sick. It's much healthier to embrace the melting pot of Zig, Odin, D, Beef, Vale, Hare, V, Lobster, Jai, C3, Val, Roc and all the rest and see what new ideas and trade-offs they bring.
Also worth noting that new languages tend to take time to develop their own philosophies to memory safety (Vale's approach is only just now emerging, for example: https://verdagon.dev/blog/making-regions-part-1-human-factor ). Others take years to gradually improve and develop techniques for better memory safety (like D). Zig's story might not be as good as Rust's ( https://www.scattered-thoughts.net/writing/how-safe-is-zig/ ), but then it's not Zig's priority at the moment, and Zig's full story is not yet written. Even if Zig's safety features don't improve further between now and 1.0, it already has great value as a language.
I think you're misunderstanding the value proposition of languages like Rust and Swift. It's not that they help safeguard user data or statistically reduce crash logs in analytics, although those are certainly useful properties in every domain you've named; I will stipulate for the purpose of this reply that developers in those domains don't value their users at all, although I don't believe it to be true.
The value proposition is that they eliminate entire classes of low-level bugs. Certain problems that you'd otherwise spend weeks debugging during a large project just don't happen. You can spend your time on the actual logic of your task rather than debugging all of the boilerplate around it. Developers of games, compilers, DAWs, NLEs and live performance software absolutely care about productivity.
I write Rust and enjoy spending less time on memory bugs. I am not blind to the benefits.
But I’d struggle to match your claim along the lines of, “games and DAW developers would be more productive with a memory-safe language because they wouldn’t have to debug memory safety bugs”.
Memory safety in Rust might be “zero-cost” but it isn’t free.
Languages like Zig accept that developers spend time on things outside of memory bugs, seek to improve their productivity and quality of life in those areas, and trust that devs will pick tools that reduce their largest pain points, be that Zig or Rust or Odin.
The best response we as an industry can have to this is to say, “wow, I’m glad so many hard-working people feel motivated to bring some of those bars down on the Ways Software Sucks Chart, let’s give them our money and our support!”
To me that’s healthier than assuming that everyone’s Suck Chart looks the same, tapping the memory safety bar over on the right and saying, “sheesh, anyone using a language that doesn’t fix this bar just doesn’t realise how productive they could be!”.
It also detracts from celebrating the engineering achievement here. Two people deleted their creaking C++ compiler by writing a custom interpreter in two weeks so their language can be bootstrapped using only system-installed tools. It is uncharitable to insinuate that they needn’t have bothered because if you really care about productivity you wouldn’t use languages like their one anyway.
“other than comptime” is a really big chunk of the language though. Without it you can’t even use generic types for example.
Zig’s “compile-time metaprogramming as just normal programming” philosophy is what really makes me look forward to the language. It’s an itch that no other low-level language has fully scratched yet (Rust still uses an ad-hoc mix of hygienic and procedural macros for many compile-time things, while having a complex generics system to complicate things even more. Nim has a better metaprogramming story, but it still feels less elegant. Jai’s metaprogramming facilities are a bit more ambitious than what Zig is doing, but it’s not released to the public yet, so there’s that.)
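A small sketch of that philosophy (Pair is an illustrative name): a generic type in Zig is just an ordinary function, run at compile time, that takes a type and returns a type; there is no separate template or macro sublanguage:

    fn Pair(comptime T: type) type {
        return struct {
            first: T,
            second: T,

            fn swapped(self: @This()) @This() {
                return .{ .first = self.second, .second = self.first };
            }
        };
    }

    const IntPair = Pair(i32); // "instantiation" is just a function call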
They do eliminate certain classes of low-level bugs, but we shouldn't always ignore the tradeoffs that can come with memory safety. GC and RC have a performance tradeoff, and borrow checking has complexity and developer velocity tradeoffs (and yes, I know there are some people who say they are immune to this effect).
For this reason it's important that we keep exploring alternative approaches and languages such as Zig, even if they don't have the level of memory safety one might personally deem appropriate for a certain domain.
Vale is even more memory safe than Rust, yet I don't go around saying Rust shouldn't exist ;)
>Certain problems that you'd otherwise spend weeks debugging during a large project just don't happen.
I write C++ professionally and I've never come across a problem that took weeks for me or a colleague to debug. With modern tools like Valgrind, address sanitizer and thread sanitizer it's generally possible to identify an issue within at most an hour or two. Far more time is spent debugging logic and performance issues.
Rust's value proposition is a type system that can encode rules in a way that most other languages can't, some of which eliminate entire classes of low level bugs - through the borrow checker rules, for example. The power of Rust, however, is that you can use that same type system to encode your own business rules and design APIs that are safe in their specific domain much like the borrow checker handles memory safety.
Zig is still a lot more memory-safe than C or C++, while being a much smaller and more elegant language. The stuff with alignment being part of the type system is brilliant (and pretty damn safe).
The compile-time features and the fact that you have to use allocators explicitly are interesting things. Other than that there’s nothing else there, for me. But it’s turned out to be more interesting than when I first saw the first blog post about intending to start on this language.
That isn't really an interesting statement; it's perfectly possible to make memory-safe programs in C or assembly. The question is, how easy is it to ensure a program is memory safe in Zig?
The trend toward memory safety is marked by languages and tools making it harder to inadvertently write exploitable code, and easier to verify that the program is not exploitable. My understanding of Zig is that while it does some of the same things as so-called memory safe languages, it is not a "memory safe language" in the same sense as they use the term.
Unfortunately for Zig, most of the hype (I'd say around 70 to 80%) it gets from HN comes from the “anti-Rust” crowd: C programmers and people in general who have been put off by Rust (either by the “safety” discourse around Rust, or by its functional programming background, or both most of the time).
Some people are so concerned about Rust that they need a language to champion their resistance against it [1]. After Rust hit 1.0, Nim had a surge of popularity on HN for this exact reason.
That's very unfortunate for Zig as a language. Because it distracts from the main points of the language (which is IMHO a very cool/powerful C toolchain + an interesting experiment in “what if we went all-in on compile-time evaluation”), and because it artificially inflates the “community” with people who don't genuinely care about it and will leave whenever another anti-Rust champion ends up in a better position (it could be Carbon, or Jai, or anything).
Personally, I don't think Zig has much chance of becoming mainstream or overthrowing Rust as “the future of systems programming”, because it (IMHO) doesn't add enough business value[2], but from a programming language perspective it is indeed very interesting.
Maybe if it could take just enough Rust concepts to make it as memory-safe, then maybe, but that would also mean being more “compromise to achieve mass adoption” and less idealistic about its design, which would likely make it less interesting from a PL perspective.
[1]: there's for instance a quite famous Java guy here who's been spending a significant amount of time on HN explaining “why we cannot be sure that Rust doesn't add more bugs than it removes” and other bullshit. And when Zig came out, he suddenly became a huge fan of it…
[2]: C++ took the lion's share against C because it solved the organizational problem of how you deal with big code-bases worked on by big teams. And now, for the “systems” world, Rust is slowly but steadily creeping in, because the stability+security it provides were unheard of in non-managed languages until now.
> because it (IMHO) doesn't add enough business value
I dunno about this. We're seeing Zig being used as the compiler toolchain in pre-existing C and C++ codebases here and there, and it is used by at least one big tech company for this very reason[0], and once you're already using the build toolchain, there's less of a barrier to then using the language to extend your code.
As far as I can tell, long-term, that's probably how Zig is going to work its way into the space, being the all-in-one toolchain for managing existing code, and then having a programming language that's as low-level as C, but without the complexity of C++/Rust, that just happens to come with said toolchain. From some of Andrew's comments, it also seems that the planned package manager will also be meant for C & C++ projects.
None of these features are new individually; they all exist in other tools, but as far as I know (which admittedly isn't a lot) there aren't any that bring it all together in a single convenient package with sane defaults that works out of the box.
> I dunno about this. We're seeing Zig being used as the compiler toolchain in pre-existing C and C++ codebases here and there, and it is used by at least one big tech company for this very reason[0], and once you're already using the build toolchain, there's less of a barrier to then using the language to extend your code.
You're right that the Zig toolchain actually provides significant value (I was talking about Zig-the-language, not Zig-the-toolchain here), but I think most of the barrier to add a new language is still there (building expertise in the company on a new language is costly so you better expect a nice pay-off).
The reason why I don't think Zig will ever really become mainstream is that it's targeting C developers, yet people still using C in 2022 are also the most conservative you'll ever find in the industry. Either because they work in domains where you cannot afford to change anything, and they even stick with ANSI C and a 15-year-old compiler that was qualified some time ago after a lengthy process (think embedded). Or they're simply keeping a big existing code-base alive with barely enough resources to keep things running (think about the entire open-source stack from the 90s and before that keeps the internet running); in these circles adopting a new language requires heroic effort that I doubt anyone would pay for.
Other than that, most people doing low-level/high-performance stuff have been using C++ for a while, so IMHO the need for «a better C, not a better C++» is pretty low.
You're using zig-the-toolchain (which is great, I've used it too for the same reason when Andy posted about it on Twitter) not Zig-the-language-that-I-think-won't-go-mainstream.
No matter what you think about Rust, if you think Zig is a “serious replacement for C right now” despite being in development and still very unstable, I don't think you're doing C for a living.
I very much do C for a living (as well as a bunch of other languages). While I do not think Zig is a ready C replacement as of now, it's the only one that's aiming to replace it and looks promising. Rust is not a C replacement, but rather a C++ replacement.
> One big downside is losing the ability to build any commit from source without meta-complexity creeping in. For example, let’s say that you are trying to do git bisect. At some point, git checks out an older commit, but the script fails to build from source because the binary that is being used to build the compiler is now the wrong version. Sure, this can be addressed, but this introduces unwanted complexity that contributors would rather not deal with.
If it's the main concern of using a prior build of the compiler, an alternative solution is to develop a tool for contributors to automate and ease the process. For example, Rust has this: https://github.com/rust-lang/cargo-bisect-rustc
If you’re interested in trying Zig out and want an easy way to update/use multiple versions I’ve been working on a Zig Version Manager for the past few weeks.
It works on Windows, Mac, Linux, a smattering of BSDs, and Plan 9. ARM and x86.
> We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code. The C code is then compiled and linked, again by the system C compiler, into a stage2 binary. The stage2 binary can then be used repeatedly with zig build to build from source from that point on.
Nope, no matter how many times I read this, I’m still lost.
But then I never needed to care about VMs, compilers and bootstraps.
Okay so we start from some C source code, and we build an interpreter for WASM + WASI from that (WASM that has access to system calls)
C source -> WASM interpreter w/ system access
Now we can take the Zig self-hosted compiler (the one in .zig), which has been compiled to .wasm/.wasi files. Since we have an interpreter for those now, we can do this:
Zig compiler as .wasi instead of .exe --> WASM interpreter --> Zig's "translate-to-c" function, for the .zig file sources of the Zig compiler
E.G.
$ run-webassembly "zig-compiler-as-wasm.wasi" --translate-c <source code to zig compiler>
At this point, we have the Zig compiler as .c files. Now you can use GCC/clang or whatnot, to build a regular binary for the compiler
Output of Zig's "translate-to-c" from previous step --> GCC/clang --> Zig compiler but NOT AS WASM, as a regular binary
They wanted you to only have to use what's in the git repo itself, plus whatever C compiler is on your system. Thus, a basic WASM→C converter was written, and kept minimal to save on space, because it's only meant to work on this one binary.
I'm in the same boat. This seems like an enormous amount of work to avoid archiving compiler binaries for a baseline architecture, and supporting cross-compilation.
Weird take. This wasn't done to avoid supporting cross-compilation; Zig can already cross-compile, even without the LLVM backend. This was to avoid having to provide a binary for every individual OS and architecture combination that Zig supports.
Using a VM that is agnostic to the OS or architecture it's running on means that you only need to provide a single binary, and in this case WASM+WASI was chosen.
I'm surprised that compiling a partial Zig backend to WASM and then compressing that ends up meaningfully smaller than compiling to C and compressing the C, when you also include the partial WASM implementation in C and the zstd decoder. This sounds kind of like a general strategy for compressing C code, which I would not have expected to work well, but cool that it does!
If AndyKelley ends up reading this: did you end up doing a direct comparison of the "zig1.c.zstd + zstd.c" size vs the "zig1.wasm.zstd + zstd.c + wasm.c.zstd" set that you ended up with? If so, how did it turn out?
I think it boils down to how bloated the C code generated by the C backend is, which to some degree it has to be, since it's generated programmatically.
My understanding is that what ends up happening is that the wasm step acts as a form of semantic compression that brings its own benefits over zstd (and which can still be combined with zstd by compressing the wasm file).
> We provide a minimal WASI interpreter implementation that is built from C source, and then used to translate the Zig self-hosted compiler source code into C code.
What do you use to compile the Zig source into C code? Wouldn't you need a Zig Compiler?
I would have expected this?
> We provide a minimal WASI interpreter implementation that is built from C source (i.e. so we don't need a Web Browser), then used to translate Zig Self-Hosted Compiler WASM code into C code. The Zig Self-Hosted Compiler WASM code is committed to the code base each time it changes, so when building a commit you already have the WASM source to a Zig compiler right there.
> Of course, in the context of bootstrapping, this Zig Self-Hosted Compiler WASM source needed to be generated the first time at some point. For that first time, we used the C++ compiler to compile the Zig Self-Hosted Compiler from Zig into WASM.
This is a method of ensuring that future builds do not depend on the existence of a Zig compiler; it's not a way to go from 0 to Zig without a Zig compiler ever having existed. Technically this already existed in the form of a Zig compiler written in C++; the point of this exercise was to stop using C++.
So:
Presume the existence of a compiler that can compile Zig. Use that compiler to compile the written-in-Zig Zig compiler to WASM. Now you have a big chunk of WASM, so you also need a WASI interpreter. Write that in 4,000 lines of highly portable C. Then use that WASI interpreter to run your big chunk of WASM code and give it your written-in-Zig Zig compiler, and tell it to output C. Then compile that C code with your system compiler, and then use that native executable to recompile the written-in-Zig Zig compiler. At this point you should be at a fixed point and further recompilations of the Zig compiler will yield the same binary.
Thank you for the summary. It's very helpful for people like me whose only experience with compilers is a class in college.
I'm slightly confused by these two statements together: "This is a method of ensuring that future builds do not depend on the existence of a Zig compiler" and "Presume the existence of a compiler that can compile Zig." Does that mean that future builds do not depend on having a Zig compiler at hand, rather than on the existence of one?
I'm having trouble not visualizing this as basically a subcategory of the second solution explored in the solution-space section, "Use a prior build of the compiler". Isn't that, effectively, what this ends up being, except with an added VM abstraction encapsulating the platform-specific portions of the compiler? I don't quite get why the same thing can't be abstracted in a library; if there's no existing compiler for riscv64 (like in the example), you'd write another implementation of that library, or you'd update your wasm2c / WASI interpreter to support riscv64.
Is there value in additional stages where the output zig3 compiler is used to compile the written-in-Zig Zig compiler to WASM again (since previously it was obtained in binary form from the repo), and follow the steps again? Presumably you'd end up with a zig binary that is bit-for-bit identical to zig3, correct? But does it help strengthen the chain of trust in the committed wasm binary?
> Presume the existence of a compiler that can compile Zig.
I just mean that the original WASM blob needed a working Zig compiler to produce it. But you can just assume it fell from the heavens. Now that it exists, you no longer need a Zig compiler binary in order to build a Zig compiler.
One way to think of it is that this allows the automatic creation of "stage 1" compilers from Zig compiler source. "Stage 1" was Zig's term for the original written-in-C++ Zig compiler, but I'm just using it to mean "A Zig compiler that does not need Zig".
To give a concrete example - say you want to add a new keyword gofast that makes all your loops go twice as fast. The first step would be to implement that in the written-in-Zig Zig compiler, without using the gofast keyword in your source. Once that's done, you compile your compiler. Then you update your compiler source code to use the new gofast keyword and recompile it again. But now you have a problem - how is someone going to build the new compiler from source if their build of the Zig compiler doesn't already support gofast? They would have to check out the commit (that their existing compiler supports) which adds gofast support but doesn't use it. Build that, then update and build again.
So, the alternative to that is that you update the "stage1" compiler which does not depend on Zig. But no one wants to update a crusty old C++ codebase. So instead, someone who has a gofast-enabled compiler compiles that source code down to WASI, and then anyone can bootstrap from there following the procedure we already discussed.
> Then you update your compiler source code to use the new gofast keyword and recompile it again. But now you have a problem - how is someone going to build the new compiler from source if their build of the Zig compiler doesn't already support gofast?
Thank you, this is what made me understand the core issue at hand here. I was confused as to why they needed to keep updating the "old" compiler once they had written the self-hosted one.
What's the practical benefit? If there's already a self-hosted Zig-to-binary compiler, why not delete the C++ code and call it a day? If someone suddenly needs to bootstrap Zig, they check out the old commit, and done.
I assume the Zig-to-WASM compiler and WASI were already a thing anyway, right? So instead of safekeeping the old commit, now they can opt to keep this generated WASM code?
WASM is platform agnostic, so it is one of the things you start with, along with the compiler source code. It is built on a different computer before the bootstrapping process begins.
Naive question about Zig: is there any tooling for embedding it within larger Python codebases akin to Maturin (for Rust) or Nimporter (for Nim)?
I have seen examples where the Zig code imports Python.h and uses low-level Python C API calls but I want something very lightweight for accelerating computational bottlenecks without worrying about unwrapping/wrapping data.
Not sure exactly what you need, but since Zig is C compatible, it's easy to build a Zig library and import it from Python using ctypes. I guess if you need something more sophisticated you could use cffi (haven't tried it).
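As a sketch of that ctypes route (the file and function names here are just illustrative): export a C-ABI function from Zig, build it as a shared library, and call it from Python:

    // add.zig -- `export` gives the function the C ABI and a
    // linker-visible name so ctypes can find it.
    export fn add(a: i32, b: i32) i32 {
        return a + b;
    }

Something like `zig build-lib add.zig -dynamic` should produce a shared library (e.g. libadd.so on Linux), after which `ctypes.CDLL("./libadd.so").add(2, 3)` works from Python. You still write the type conversions yourself, though, which is exactly the boilerplate that Maturin-style tools hide.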
I'm looking for something that fits into a setup.py file (or, like Maturin, creates a multi-language package config) which (1) automatically compiles Zig source into a Python extension module for me, so that I can (2) just import Zig code into Python and call it without writing any type conversion logic.
From the perspective of a package maintainer (I don't deal with the core infrastructure of our packaging system, I just package and patch things):
While this unusual bootstrap, with a WASM stage and a C WASI interpreter, doesn't satisfy "everything from source", it's so much better than sitting on a non-Intel/ARM or non-Windows/Mac/Linux machine and having no other option but to maintain 5 different ancient versions of a compiler for a bootstrap sequence, or worse, being required to cross-compile from another host.
Forgive me if I'm being a bit presumptuous here, but it feels like "C and C++ have diverged so saying 'C/C++' is now wrong" is now one of those viral sentiments that people just fling around. Like "ah! he said C/C++! gotta call him out". Perhaps that's not what you're doing here, but I think it's completely fair to say that C and C++ have language footguns, even if they have diverged. Objective-C/Swift probably also have footguns ;)
If the specific statement meaningfully applies to both C and C++, as it does here, I think it's completely valid to write C/C++. And I say that as a fairly advanced C++ boutique templates author. https://github.com/Cons-Cat/libCat/blob/main/src/libraries/a...
There was a time when C++ was merely "C with Objects." Those days are long gone. Those have been the two primary languages of my long career. The modern C++ I write today bears some similarities to C, but it's like saying Latin/English.
For me those days are only gone on my hobby coding and conference talks.
Have some fun reading the code of Windows SDK C++ libraries, Android, or plenty of C++ libraries used in enterprise shops, plugged into managed languages.
Not freeing memory is a fairly common approach in compilers and command line tools. If an AST hangs around until the end of the program anyway, why not let the OS take care of blasting it away? free() is expensive, after all.
Functions like Type::getInt32() or whatever it's called return the same heap-allocated pointer each time, so pointer equality can be used for value equality. That's a nice trick that only really works optimally if you leak the pointer, or at least don't free it until very near the end.
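A hedged Zig sketch of that interning idea (all names invented for illustration; this is not LLVM's actual code):

    const std = @import("std");

    const Type = struct {
        bits: u16,

        const Self = @This();

        // One global instance, never freed.
        var int32_singleton = Self{ .bits = 32 };

        // Every call returns the same pointer, so comparing pointers
        // is a valid (and very cheap) equality test for the values.
        pub fn getInt32() *Self {
            return &int32_singleton;
        }
    };

    pub fn main() void {
        std.debug.assert(Type.getInt32() == Type.getInt32());
    }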
I've seen cleaning up memory at the end of a program take 20% of the run time and that was indeed patched to just exit & leak as a result. With a flag to clean up so we could still run valgrind on it usefully.
Yes, any compiler-as-a-library software can't do this.
And compilers-as-libraries are becoming more popular these days since being a library makes it easier to integrate the compiler with tooling, like linters and IDEs / language servers.
You say "before" as if it's a thing of the past :) gcc is still not written as a library. clang became famous partly for providing a C compiler as a library.
Though gcc's case is specifically because of ideology. Even ignoring ideology, there used to be no need to write compilers as libraries, because the only binary that would use such a library was the compiler itself. The only purpose of the compiler was to be invoked from a shell / build script to compile code. That is changing with things like IDEs that need "compiler services" becoming more mainstream, as I said.
There are whole allocation strategies built around this idea. One of the simpler, more charming ones is a 'bump' allocator. The implementation of malloc bumps an offset in a contiguous blob of bytes, and free does nothing at all. malloc is very cheap; the OS takes care of providing the contiguous blob of bytes. Bump past the end of the blob and your program crashes.
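Zig's standard library ships one of these in the box; a minimal sketch, assuming a reasonably recent Zig, using std.heap.FixedBufferAllocator (alloc bumps an offset, individual frees are mostly no-ops):

    const std = @import("std");

    pub fn main() !void {
        var buffer: [64 * 1024]u8 = undefined;
        var fba = std.heap.FixedBufferAllocator.init(&buffer);
        const a = fba.allocator();

        const xs = try a.alloc(u32, 100); // just bumps the offset
        xs[0] = 42;
        // No per-item free: the whole buffer is reclaimed at once when
        // it goes out of scope (or when the short-lived process exits).
        std.debug.print("{}\n", .{xs[0]});
    }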
It makes a lot of sense for short-lived command line tools not to free memory, since any allocated items will usually be needed for the entire invocation of the tool.
It just so happens WASM is the one VM target that LLVM supports. Sure, it's a nice VM that can be implemented without too much fuss, and ditto for WASI, but that's it. It's just the most convenient VM to target for us.
One reason (among many) to do this is because compilers require complex and demanding source code across a wide-range of theory and algorithms.
And so make good tests for both the source language (is it sufficiently expressive to do this cleanly?) and for the various analysis, optimization, and code-generation passes.
Given that one of the most often repeated complaints is the lack of operator overloading, which makes any kind of vector math (so, all graphics programming, and many other things) very ugly, would you reconsider adding it to the language?
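For illustration, roughly what 2-D vector math looks like today without overloading (the type and helper names here are mine, not from any particular library):

    const Vec2 = struct {
        x: f32,
        y: f32,

        fn add(a: Vec2, b: Vec2) Vec2 {
            return .{ .x = a.x + b.x, .y = a.y + b.y };
        }
        fn scale(v: Vec2, s: f32) Vec2 {
            return .{ .x = v.x * s, .y = v.y * s };
        }
    };

    // With operator overloading this would read `p + v * dt`:
    fn step(p: Vec2, v: Vec2, dt: f32) Vec2 {
        return Vec2.add(p, Vec2.scale(v, dt));
    }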
Where would I read more about Zig? Not that I have cycles to spare, but I think a fair bit about moving from C. For example, I maintain an ASN.1 compiler written in C, and I hate C, so I made it emit a JSON AST of ASN.1 modules, and now a friend of mine just wrote a backend in Swift that takes that JSON AST output and produces Swift code / templates. Leaving C behind requires a good path for porting legacy C to the new thing, or else tons of time and mindshare to get new things built. So D and Zig are very appealing.
I wouldn't be surprised if it had some more advanced optimizations or similar things which don't affect compatibility but also note there's one trailing clause in the description: “plus sharing Zig code with the new one”. I'd be curious exactly how much code could be reused across the two like that — it doesn't seem like it should be _that_ much because they were trying to do this to avoid commonly needing to implement things in two places.
> Now, there is this WebAssembly binary, which is not source code, but is in fact a build artifact. Some people, rightly, take these things very seriously [...].
Regarding this concern, well, you have to commit that build artifact because you're moving fast, but eventually you could do what the OpenJDK does: to build OpenJDK version N you need OpenJDK versions N-1 or N, and you can get OpenJDK version N-1 from your distro or from any number of places (like AdoptOpenJDK). You're essentially doing that now, but with unnamed versions -- you have to know which commits are like JDK version boundaries, and the clue is that the commit updates that one build artifact.
> It is then further optimized with wasm-opt -Oz --enable-bulk-memory bringing the total down to 2.4 MiB. Finally, it is compressed with zstd, bringing the total down to 637 KB. This is offset by the size of the zstd decoder implementation in C, however it is worth it because the zstd implementation will change rarely if ever, saving a total of 1.8 MiB every time the wasm binary is updated.
Is the goal here to save space in the Git repo, by compressing before committing?
I wouldn't assume using zstd is necessarily worth the complication. It could even make things worse.
As I understand it, Git stores objects in packfiles[1], and these are both delta-fied and compressed with zlib.
Your zstd reduces the 2.4 MiB .wasm file to 637K. But Git's zlib should reduce the same 2.4 MiB to about 800K (according to a quick test I just did). So at best, you only save 163K, not 1.8 MiB.
But if Git's delta-fication works, you may actually use more space.
Git should try to use its binary diff algorithm[2] to compare your various committed versions of zig1.wasm. If that algorithm is effective against Wasm files (my guess is yes), it will be able to store one version as a full copy and other versions as (somewhat? much?) smaller deltas against the full one.
If you store .wasm.zst files, since compression tends to obscure commonalities, my guess is Git won't be able to do deltas and will have to store full copies of every version.
On a side note, Git is said to be bad at handling binaries, and that's somewhat true, but there's some nuance. Binary files get in the way of easy branching and merging because Git can't merge them. So Git is bad at binary files in that way, but that's not relevant here. Also a lot of binary formats (like JPEG) are very much not amenable to binary diff, but my bet is that's not relevant here either.
Actually zstd makes that worse too, somewhat paradoxically. At least in this case, because Zig uses xz for their tarballs. (If they used gzip, it would be the other way around.)
The reason is that compression algorithms usually can't make further reductions when re-compressing already-compressed files. And xz has a higher compression ratio than zstd, so when you stick zig1.wasm.zst into a tar.xz file, xz is deprived of the opportunity to work its more powerful magic.
As a test, I got zig-0.11.0-dev.638+5c67f9ce7.tar.xz from https://ziglang.org/download/ , extracted it, and rebuilt the tar.xz myself. Then I replaced stage1/zig1.wasm.zst with stage1/zig1.wasm and rebuilt the tar.xz again.
So, zig.orig.tar is the uncompressed tarball that contains zig1.wasm.zst, and it is indeed smaller than zig.new.tar. But the .tar.xz files are the other way around.
Not using zstd saves 68K.
=-=-=
Also, in the process, I accidentally discovered something else that makes a bigger difference.
Since I knew the order of files within a tar archive can affect the compression ratio (due to data locality), while doing my test, I used "tar tf" to list my tar file's contents and compare it with what I downloaded. It didn't match, so I knew I wasn't doing an apples to apples comparison.
So I added "--sort=name" to my tar commands. And both of my tar files ended up smaller than the one I downloaded:
$ du -sk zig-0.11.0-dev.638+5c67f9ce7.tar.xz
15152 zig-0.11.0-dev.638+5c67f9ce7.tar.xz
Just adding the "--sort=name" option to tar saves 584K! That's around 4% of the entire tar file. Locality matters more than I thought.
This is clever. There's one thing that I do not understand:
Why does the step 3 compiler have only the C backend enabled? In theory, couldn't one enable all the backends and skip to step 6? The step 5 comment says something about 'correct final logic', but I'm unsure what that means.
That's the Zig compiler that is implemented inside the wasm blob. Since we commit that blob to the repository, we want to keep it as small as possible, which is why it only contains the C backend and nothing else.
I have decided to stick with mainstream languages after playing with various new languages in the past, including ziglang; it's fun, but in the end more of a waste of time.
In practice a language is really an ecosystem: compiler, tools, editors, libraries, field testing... if you want to get things done, you just have to use the mainstream ones.
This is my personal experience, ymmv, and maybe somebody needed coffee or something, but! I've found the zig community more friendly and open-minded than the Nim community.
Again, one person, one experience, I like them both, yada yada.
I suggest you have a look at both of them, and decide. The Nim book is very good.
I would NOT call using zig or nim a waste of time; yes, the ecosystem matters. But decent languages + good libraries for what you need to do = Total Win. IMHO, this is why Python wins so bigly in the (increasing) influence it has.
C wins, and will always win, I think, because it's the closest we have to a portable assembly language. We'll see, re: wasm. Maybe wasm will be the 'pdp-11' of computing for the 21st century.
from only a glance and without having ever used nim, it seems to be more abstracted from the machine, whereas zig is closer. nim code should be shorter, nicer, closer to ideal pseudocode. safer too, i imagine. zig code should be more explicit, and if you do it right, more efficient in time and memory.
zig also has a philosophy of, quoted from https://ziglang.org: "No hidden control flow. No hidden memory allocations. No preprocessor, no macros.". this also should make zig code more explicit, but probably more verbose too.
i could also be completely wrong. like i said, i know nothing of nim other than what the homepage says. don't listen to me.
In the early days, when CS and languages were new things, we needed to evolve faster. Now that things are kind of settling down, we need to avoid NIH. Times are different.
We don't invent new languages to compete against English, Chinese, Spanish, etc. nowadays; I'm sure that was different in the early days, when humans were figuring out how to communicate and how to create the languages they needed.
Nim is garbage collected / reference counted by default (but there's a way to turn it off; there are lots of GC options). It also ships with a much more batteries-included standard library. Nim has operator overloading, dynamic dispatch (although I think this feature is being deprecated), various types of macros, generics, and a whole lot of other features. Nim has everything but the kitchen sink, which can be good or bad, depending on your perspective. Nim compiles down to C code, and that lets you interface with C and C++ libraries in a very native way. If someone likes programming in C but likes the syntax of Python, they'll love Nim (imo). Nim also lets you write Nim code but transpile it to Javascript, so it's an alternative to Typescript in some ways. Like I said, everything but the kitchen sink.
Nim's LSP is great and editor tooling is good. The testing framework is only so-so. The package manager in Nim leaves a lot to be desired. The Nim community is well established and big, but without hard data, I wouldn't say it is growing all that much. It's pretty much the same community members from 2-3 years ago all doing amazing work, with the addition of a few folk.
Zig is more barebones. It uses LLVM to generate machine code, but a couple of self-hosted backends are in the works as I understand it. It has compile-time execution instead of macros, and generics are just a compile-time feature. Zig is a lot like C in that it is simple in its feature set. For example, there's no operator overloading, which means when you read Zig, you know pretty much exactly what the program is going to do. It also means code can be very verbose (especially math-y stuff). Try doing complex number arithmetic or 2-D vector calculations and the code is as verbose and ugly as C (imo). Some people will say that this code shows exactly what is going on, but (again imo) it is unnecessarily verbose. If people could opt in to operator overloading somehow, it would make Zig really neat for math. I can see Zig being used for web servers, although if it segfaults because of the manual memory management it could be bad. But really the use case for Zig is bare metal work, or maybe software that needs to perform a bunch of work on data. Zig has a unique way of transforming an array of structs into a struct of arrays, so you get lots of speed improvements while still writing your code in an ergonomic fashion. And, in a rather unique twist, Zig is a better C / C++ compiler than GCC or Clang. So if you are interested in compiling a C program, you can use Zig to do that. I think Zig is a better alternative to CMake than anything else out there.
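To make the struct-of-arrays point concrete: that's std.MultiArrayList. A minimal sketch, assuming a reasonably recent Zig -- you declare what looks like a list of structs, but each field gets its own contiguous array:

    const std = @import("std");

    const Monster = struct { hp: u32, x: f32, y: f32 };

    pub fn main() !void {
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        defer _ = gpa.deinit();
        const a = gpa.allocator();

        var monsters = std.MultiArrayList(Monster){};
        defer monsters.deinit(a);
        try monsters.append(a, .{ .hp = 10, .x = 0, .y = 0 });
        try monsters.append(a, .{ .hp = 20, .x = 1, .y = 1 });

        // All hp values are contiguous in memory -- cache friendly.
        for (monsters.items(.hp)) |hp| {
            std.debug.print("hp={}\n", .{hp});
        }
    }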
I can't speak to testing in Zig, and I don't believe there's even a package manager at this point. There are very few libraries for doing stuff in Zig, but the ecosystem is growing.
I think a good way to get a sense of the community is to look for conference talks on YouTube or on HackerNews for a language. Nim has about 10 talks a year. Rust will have 30 talks roughly. Zig usually is like 5 talks, and one of them is almost always the creator of the language. Take that for what you like.
Both are great languages and I've had fun trying them out! They unfortunately don't fit my work requirements and are not personally interesting to me.
Nim really should be more interesting for folks who use Python for high-level work and C for low-level work. I was very interested in it, as I do both Python and C, but somehow it just never got that popular; at least my boss will never buy the idea of using it in production.
I really like how when Andrew makes a decision about something related to Zig, he outlines how other programming languages do it and gives his thoughts.
My question is: I feel like Zig is trying to do a lot of the things Go set out to do, reducing a lot of the complexity of programs by removing hidden control flow, macros, etc. But how will Zig keep itself from repeating the mistakes Go made that make people dislike it?
Zig doesn't offer garbage collection, and there's nothing like Rust's complex memory tracking either. So it doesn't really free you from memory-related bugs, just like good old C. But it's a "better C" IMHO.
They're saying the lack of GC is an advantage over golang, not that it's a deficiency.
Or if you're suggesting a GC to solve the memory unsafety, a GC only solves leaks and doesn't do anything for use-after-free or simultaneous unexpected mutations.
I haven't used Zig at all, but doesn't the fact that Zig supports from-what-I-understand pretty powerful metaprogramming facilities already put it way ahead of Go in that regard?
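For reference, the headline facility usually meant here is comptime: types are ordinary values at compile time, so generics fall out without macros or a separate template language. A trivial sketch:

    const std = @import("std");

    // T is an ordinary parameter that happens to be a type,
    // evaluated at compile time.
    fn max(comptime T: type, a: T, b: T) T {
        return if (a > b) a else b;
    }

    pub fn main() void {
        std.debug.print("{}\n", .{max(i32, 3, 5)});
        std.debug.print("{d}\n", .{max(f64, 0.5, 1.5)});
    }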
I have some interesting news for you... Go is a smashing success, wildly popular, and eating Java's lunch. It is an objectively incorrect generalization to say that people dislike Go.
Whoa, slow down there :) Go is certainly a success in absolute terms, making it to the top ten in some rankings, but at age 13 -- an age by which virtually all languages have either reached or neared their all-time peak with only a single exception I can think of -- it's only ~1/10th as popular as Java [1][2], and not eating nearly as much of its lunch as PHP or Ruby did back in the day.
Agreed. Go has kept me liking programming because, for me, it is the most reliable tool for making trustworthy software. All langs have tradeoffs and areas they excel in, but Go brings me the most joy/motivation.
It's nice to see a language with similar philosophies tackle the space where Go isn't as good: when you don't want a GC and you need to interface with C. For many that is Rust, and I respect it, but I think Rust values concurrency "safety" too highly and makes too many compromises in language design to achieve it. Memory safety is a BIG deal to me; without it, sufficiently complex software has never-ending CVEs. But concurrency bugs cause nowhere near the same number of security problems (orders of magnitude fewer), crashing programs is fine in most applications, and I find concurrency bugs are usually easy to fix early in an application's lifecycle.
Nonsense. I love it and I've been programming since the 80's.
I watched (and used) C++ as it grew into the monstrosity it is today. I've written and maintained production code in F#, C#, Python, Ruby, Perl, Java, JavaScript, Go, PHP, Lua, VBScript, Visual Basic, C, and C++ and every variant of shell scripting imaginable.
I've spent time working with Erlang, Haskell, Rust and a variety of other exotic languages because I found it interesting. I created a port of Clojure's Transducers to C# because I could.
I am not afraid of abstractions, functional programming, or complicated CompSci concepts. And yet I keep going back to Go.
"You idiot! I hate icecream and think everyone who likes icecream is a raging moron who deserves having their teeth rot out of their head! I wish their teeth would rot out tomorrow because I am so sick of waiting for it!"
I like Go as a language in the abstract sense, but the experience of writing Go code is so tedious to me. Refusing to compile if there's an unused variable or unused import is too harsh - at least give me the option to make these things warnings.
This is true, I don't really like Go. After years of writing code in Assembly, C, C++, Perl, Basic, C#, Java, Python, Rust, JavaScript, TypeScript, Lua, Zig, and Go I find that what I really like is good tooling, and code that is easy to read and reason about. Go the language and toolset happen to do this really rather well at the moment though.
Assuming you are genuinely not aware of language implementation processes, this is called bootstrapping. You may want to implement a language A in A (a very common goal!) but this is generally impossible when you don't have a compiler for A, so you first write an implementation of A in an already implemented language B, and then use that implementation to write an A implementation in A, abandoning the initial implementation at the end. Zig just did this.
Cross-compilers are about target platforms (e.g. producing a macOS executable from Windows). We are talking about implementing a language in a different language. (The word "bootstrap" is very general and can be used for both cases.)
The main use case here is not so much building Zig for a new architecture as it is letting contributors build the latest Zig trivially.
Without a bootstrap process like this one, it could happen that you run git pull and then can't build the compiler because it's using a new feature that the version of the compiler you have doesn't support yet.
The wasm process ensures that you can always build the latest commit.
If you want to have a Zig compiler written in Zig, you need to bootstrap it once initially. Cross compilation makes it so you don't need to do that again when you want the compiler to work on a different architecture. Of course, there's the question of why you want to self-host your compiler (instead of keeping the C++ one):
- dogfooding identifies problems & helps prioritizing
- it demonstrates that the language is suitable for serious projects
- most importantly, Zig developers prefer writing Zig over having to use C++
gcc bootstraps using itself. Rust bootstraps using itself. Go bootstraps using itself. D bootstraps using itself. Zig bootstraps using itself. It seems pretty common :)
The need for bootstrapping in this context comes from the lack of a compiler for the language you want to implement, as in before the Zig compiler in C++ there was no Zig compiler.
I think you need to be a bit more humble on a topic that's clearly going over your head.
Imagine a scenario where you are testing out a brand new RISC-V development board. The vendor has only provided a C compiler for this board, as is often the case. You want to be able to use the Zig language to write programs for your new development board, but no Zig compiler exists for this board yet. That means you need to compile the Zig compiler from source. The latest version of the Zig compiler is written in Zig. Again, you don't have a Zig compiler, so how will you compile the Zig compiler from source? You need a way to go from C compiler to Zig compiler. That's what this is describing. It does not make sense to maintain two completely separate versions of the compiler: the "real" one written in Zig and a "bootstrap" one for compiling from C. So the Zig source is compiled into WASM and stored in the repo. On a system with only a C compiler, this WASM can be run instead of a native Zig binary. The WASM version can then be used to compile an actual native Zig binary.
They are compiling the current compiler to wasm and then using that compiler to build future versions of their compiler.
In other words, they are doing the Rust approach described above, but using a platform-agnostic target instead of a native binary. That allows them to build on any platform that has a C compiler, and to use current language features without needing to manually backport them.
They could directly target C or C++ but that runs a greater risk of accidentally generating UB. Targeting a bytecode decreases that risk.
That's not "converting back", because by going through WebAssembly you have restricted the language. As long as you have a correct wasm-to-C implementation this is a valid strategy to finish the bootstrap---you no longer depend on C, just WebAssembly.
I'm not the parent commenter, but I will be honest -- I fail to understand the purpose of bootstrapping low-level languages. Like, if you were just given the task of writing a compiler, would you honestly choose Zig? No. Then why not have the compiler written in Haskell or whatever high-level language where writing code is actually productive and not error prone, since it is not a performance-critical application?
The Zig developers would strongly disagree that a compiler is not a performance-critical application, and they would also probably disagree that Zig doesn't bring anything to the table when it comes to writing compilers.
As a generalization, the people who are motivated enough to work on a language, want to use that language. It's only natural that they would want to write their compiler in that language too, if practical. Contributors to the Zig project would on average probably be more proficient and productive in Zig than they would be in a language they don't care about so much.
It's also just helpful to have the people who are designing the language working in that language regularly in the context of a sizable and nontrivial project.
> The Zig developers would strongly disagree that a compiler is not a performance-critical application,
Have they written an incremental compiler then? Or just an old-fashioned slow batch one? Compiler architecture matters much more than whether its runtime has a GC or not.
I understood your original comment to imply that you disagree with their choice of self-hosting the Zig compiler because they should instead have focused on architectural improvements, in which case I disagree because the two things are completely orthogonal in my mind—the benefits of self-hosting have little to do with performance, and certainly don't come at the cost of it. I apologize if that's not what you were implying.
> This is a thinly veiled ad hominem.
I never personally attacked you, so I'm not sure why you think this is ad hominem. That's a very uncharitable interpretation of my comment.
> Like, if you were just given the task of writing a compiler, would you honestly choose Zig? No.
Actually, yes.
A compiler is not just a dumb filter that eats text and spits out machine code. It can provide infrastructure to other tools (like linters, analyzers, LSP servers...), and even allow importing and using parts of it in user programs.
Some of this can't be done in a different language (like Haskell) without some very crazy FFI, and might add an extra runtime dependency for those tools, which might not always be desirable.
Thank you for this comment, I only wish it was at a higher level up in this thread.
The number of ignorant comments in this thread is astounding. All this criticism feels like it was written by back-seat drivers who have no clue about the complexities of language design or compiler implementation.
Think of porting to a new CPU architecture. If your compiler is written in its own language, then when you add support for the new CPU target you can compile your compiler using your compiler, targeting the new CPU. Now you have a compiler that not only produces code for the new CPU, but also runs on that CPU.
The alternative would be to port your Haskell compiler to the new CPU too, in order to set up a self-hosting toolchain. That's much more work, because you not only have to be proficient in Haskell, you need Haskell compiler implementation skills on top of knowing your own compiler.
Well, this might be a valid reason, especially given that embedded is important for Zig's target domain. (Though, are there really that many new architectures nowadays?)
As far as I can tell, the main reason we all spend so much time waiting for compilers is that compilers aren't considered as performance-critical as they should be.
My full-time job is making a compiler for a high-level language, and I only considered systems languages (e.g. Zig, Rust) as contenders for what to write it in - solely because compiler performance is so critical to the experience of using the compiler.
In our case, since the compiler is for a high-level language, we plan never to self-host, because that would slow it down.
To me, it seems clear that taking performance very seriously, including language choice, is the best path to delivering the fastest feedback loop for a compiler's users.
If I were to give a bad faith argument, I would say that Rust’s compiler, while being written in Rust is not famous for its speed (but I do know that it is due to it doing more work).
I honestly fail to see why a lower-level language would be faster, especially since compilers are notorious for their non-standard allocation and life cycle patterns, so a GC might actually be faster here.
> I honestly fail to see why a lower-level language would be faster, especially since compilers are notorious for their non-standard allocation and life cycle patterns, so a GC might actually be faster here.
The nonstandard allocation and lifecycle patterns are a major part of the reason I want a systems language and not a GC - it means I have strictly more control over when allocations happen, I can do cheap phase-oriented allocations and deallocations with arenas, etc.
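A minimal Zig sketch of that phase-oriented pattern (the phase and its data are hypothetical):

    const std = @import("std");

    fn runPhase(gpa: std.mem.Allocator) !void {
        // Everything this phase allocates comes from one arena...
        var arena_state = std.heap.ArenaAllocator.init(gpa);
        // ...and is freed in a single cheap operation at phase end.
        defer arena_state.deinit();
        const arena = arena_state.allocator();

        const scratch = try arena.alloc(u8, 4096);
        _ = scratch;
        // ... do the phase's work; no individual frees anywhere ...
    }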
Rust's compiler is an interesting example. It was originally implemented in OCaml (which has a reputation for being a GC'd language with good runtime performance), and then rewritten in Rust in order to self-host -- and it got faster. In contrast, the Go team rewrote their compiler from a systems language to Go (which also has a good reputation for runtime performance), again in order to self-host, and it got slower.
Rewrites are a different beast; I doubt they are fairly comparable. By then they had probably realized some better abstractions that ease implementation and may thus also boost performance. Also, Go's GC has never been considered "good". And OCaml only just recently got multicore support, didn't it?
Nonstandard lifetimes are not really helped by arena allocators though, and not everything is needed in each phase -- or there may not be such cleanly divided phases at all. But you may be right; I honestly can't tell with certainty.
The slow part of Rust compiles is LLVM, though some of that may be due to bloated IR input, which is a frontend concern. There's an alternate Cranelift-based backend that's usable already if runtime efficiency is not your priority.
LLVM's fast path (used by clang -O0) is fast. Rust's primary problem is that it can't use LLVM's fast path (because it implements only a subset of LLVM IR) and LLVM upstream is uninterested in extending it because it will slowdown clang -O0.
Yes. I guess FastISel still isn't fast enough to overcome larger compilation units, but isn't it a substantial improvement over the default code generator?
I'm sorry, I haven't taken any measurements to find out the answer to this question. I would be curious to hear about how much it affected Rust builds if you explore this. I remember I spent an afternoon trying to enable FastISel in order to speed up LLVM, only to realize that we had been using it all along.
> Like, if you were just given the task of writing a compiler, would you honestly choose Zig? No.
Why wouldn't you? I understand why not for high level languages like Python or Ruby (since they're interpreted) but not for low level ones. Rust for example is also bootstrapped.
Compilation can be very intensive, and it's detrimental to a developer's workflow if they must wait for long recompiles.
Rust was originally written in OCaml before being self-hosted, and it wouldn't be as fast (or would be even slower ;) ) today if it was still OCaml.
And remember, low-level =/= poor abstractions. I think there are several novel abstractions available in Zig which the compiler devs probably want to make use of themselves.
They might have good language abstractions, but manual memory management is simply an implementation detail orthogonal to solving the problem; dealing with it is just more work and more leaky abstractions.
[ as someone who does not work in language design ] - it does feel sometimes like this achievement is more a source of pride than a hard requirement. A sort of symbolic (no pun) closing of the circle.
Is there a reason why keeping a compiler in, say, C would be a bad idea long-term?
I'm of course not Andrew Kelley ;-), but I think it's strategic. C++ would be indeed less productive than Haskell when you write a language implementation, but if the target audience---at least initially---already knows C++ then writing a compiler in suboptimal languages may help. I know this is not universally applicable; for example Rust bootstrapped from OCaml, but it has achieved self-hosting very early in its life (and back when the goal of the language was not yet certain), so that might have been also strategic.
Writing the initial compiler in C++ is a rather surprising choice. My time would be too precious for that. OCaml was a good choice from the Rust devs and had a nice influence on the language. Rust has proper algebraic data types and pattern matching.
They wrote it in C++, because llvm is in C++. Currently the only critical parts of the Zig compiler that are in C++ are the bindings to llvm and I think stuff for linking clang.
Compilers are performance critical in the sense that you wouldn't want to wait 15 minutes to an hour for your code to build. Consider the layers of caching applied to compilation pipelines, along with distributed builds, used to speed things up in places with millions of lines of code (e.g. game studios), along with turning off linker features that slow things down.
You'd want faster compilation so that you can test your changes without breaking your flow; having to wait 1-5 minutes means you'll end up reading HN or checking chat for 10 or so minutes. That's also why there's interest in hot-code reloading and incremental linking: they further reduce compilation to just the changes you've made and nothing more.
There is no significant difference between managed languages and something like Zig in this category of programs (a compiler is not a video codec), so I stand by my "not performance-sensitive" claim. And especially because algorithms matter much, much more, there is a good chance that a faster one can be implemented when you don't have to care about whether a given memory location is still alive or not.
After on-disk caching, the biggest performance improvements I've seen to compilers have been from changing how data structures are allocated/deallocated and laid out in memory.
I haven't seen much opportunity to improve algorithm performance because which algorithms are applicable is heavily constrained by language design.
In my experience the difference between something like Zig and a language with managed memory is large.
Dogfooding is not always related to bootstrapping. In fact, this is a reasonably common pitfall for language designers, because language implementations are very specific and if you only tune your language for them your language will of course need ADTs and generics and value semantics and GCs. Not to say that those are great, but you may have missed a whole slew of possibilities by doing so! And that's why you can do dogfooding without bootstrapping---you can instead have other big enough software written in your language and coevolve with it. For example Rust did this with Servo.
Writing the compiler in a different language limits the language users' ability to contribute - they'll need to learn another language - and makes porting more complex since you'll need to port Haskell or whatever to the new platform. Dogfooding can also be an advantage.
Today's managed languages are very fast. For example, if Java is not fast enough for your HFT algorithm, then neither is C++, or generic CPUs at all -- you have to go the custom chip route then. Where there is a significant difference between these categories is memory usage and predictability of performance. (In other applications, e.g. video codecs, you have to write the hot loops in assembly by hand, since there even low-level languages are not low level enough.) Since these concerns don't apply to compilers, I don't think a significant performance difference would be observable between, say, a Java and a Zig implementation of a given compiler.
WASM allows them to bootstrap Zig compiler on completely foreign systems that only have a C compiler. This solves the chicken-and-egg dilemma that occurs when one wants to use a language under development to develop the language itself.
It's not about code size or efficiency. They are talking about the ability to use the latest versions of Zig for further development of the Zig compiler. This improves and speeds up the compiler development cycle by removing the limitations imposed by the bootstrap process.
Once, good compilers were rare and expensive, and coding a compiler in your new language proved to people that it was adequate to implement a compiler.
Nowadays good compilers for everything are common and free. Coding a compiler for your language in your language demonstrates only that you could afford to waste the time, because literally every language suffices to code a compiler in.
> Coding a compiler for your language in your language demonstrates only that you could afford to waste the time
You seem to be presupposing this is true, but this is a pretty controversial claim. Do you have any justification for it?
As other people in this thread have mentioned, there are other benefits to programming your compiler using your own programming language:
- You learn your language (and its rough edges) first hand
- You can try out & experiment with new language features
- The authors of Zig clearly think zig is better to use than C++. (If not, why would they bother inventing zig?). It makes sense that if you think Zig is a better language than C++, you'd want to write your compiler in Zig and not C++.
Lots of compiler authors (you know, the people who get a vote) disagree with you on this: Clang, VC++, etc are written in C++. Rustc is written in rust. The main zig compiler is written in zig. DMD is written in D. And so on.
And anyway, whats so great about C++ that it would make a good language to write compilers in in the first place? Why shouldn't I write my compiler in plain C, or Rust, or Zig, or Go, or whatever strikes my fancy?
You seem to miss, again, that there was already a complete Zig compiler coded in C++. Coding another one, in Zig, was strictly extra work.
People can write whatever they like in whatever language they like. We need not invent spurious reasons. Surely the creator of Zig already knows Zig; coding a compiler in it teaches him nothing new. He could experiment with draft language features in any program at all: nothing about the task of compiling Zig code is better for that purpose than myriad others.
Obviously the second compiler for Zig was done because the author wanted to. But having done so demonstrates nothing meaningful about Zig. It only tells us its author took time to code what we know, with utter certainty, was a strictly unnecessary program, instead of doing something actually useful.
Doing stuff that is not useful is allowed. We need not pretend otherwise, or make up excuses.
The new compiler was not a direct port of the old one. It explored fundamentally different design decisions. The new compiler is a better piece of engineering than the old one. It wasn't strictly extra work. In order to achieve our current feature set and performance characteristics, such changes would have needed to be done either on the old codebase or the new one. It's a pretty easy argument to make that the increased developer velocity from using Zig landed us further along, by this time today, than we would be if we had invested in the C++ codebase instead. You can argue against that particular claim, but what you can't meaningfully do is call the efforts useless.
You found a way to make coding a second compiler useful.
But it remains manifestly unnecessary, because everything done in the "second system" (cf.) could have been done with less effort by altering the first. Achieving it demonstrated nothing more about Zig than that there were people willing to take time to do it.
Arguments from "velocity" have, in particular, always been unconvincing. People code faster in a language they like, and credit the language, but other people are equally quick in other languages.
"Bother"? Motivation is interesting. Getting all excited over what language a compiler is coded in, far beyond any actually, independently useful program, is objectively weird.
C got its legs from a whole OS coded in it. The compiler was very incidental. I don't have any programs in Rust on my SSD. I have one Haskell program: pandoc. I have no Java, C#, Scala, Clojure, Kotlin, or Erlang programs. I have one in Dart I don't use anymore.
A substantial, useful program in Zig that would be too much work to re-implement in a mature language would do orders of magnitude more for Zig than a second compiler. Maybe a collection of pipewire modules, or matrix gateways?
Perceptions of velocity by language advocates have very, very little to do with actual velocity in industrial settings. The latter is what matters, but is actual work to measure. Anybody measuring seems not to be publishing.
We’re actually looking to support Zig as an authoring language for user generated content for Third Room - the spatial collaboration platform for Matrix. Not sure this would be a killer app like a Matrix server or gateway or something, but could help a bunch. https://thenewstack.io/third-room-teases-user-generated-cont... has more details.
Doesn't seem like it needs much support, as such, being able to use ".h" files directly.
You would not want anything that must be depended upon coded in it, in case it fizzles like almost all languages do, but that leaves huge scope for valued additions on top.
> You seem to miss, again, that there was already a complete Zig compiler coded in C++.
No, there wasn't. They had a half-implemented compiler in C++ which implemented just enough features to compile the proper zig compiler. It didn't have feature parity.
> Obviously the second compiler for Zig was done because the author wanted to. It only tells us its author took time to code ... a strictly unnecessary program, instead of doing something actually useful.
You think dogfooding isn't useful? Wow, that's a really bold claim. I know legions of product engineers who would line up to disagree with you. Dogfooding is tremendously useful.
Failing to dogfood creates an empathy gap with your audience. (You don't understand your own work as well!). If you don't immerse yourself in your own creative output (eg, rust developers writing rust code, musicians listening to their own music, etc) then your output is usually lower quality because you aren't training your taste. How do you know what design decisions are good and bad, or what to focus on, if you aren't using your own output?
And not dogfooding is a motivation killer on passion projects. Why build something for fun when you don't even get to enjoy the fruits of your own labor?
I suppose Andrew Kelley could have a second medium-to-large project that he uses Zig for in order to understand his users. And he could go back and forth between working on the Zig compiler in C++ and coding that other project in Zig. (I think Jonathan Blow does that with Jai.) But that sounds like a huge waste of effort when he could just be spending all his time writing the Zig compiler in Zig, which is what he wants to be doing anyway.
I'm not making a technical argument. V8 seems to work fine despite being written entirely in C++. But V8 is just programming to a spec someone else wrote. Programming language specs don't come out fully formed. They need to be thought through, experimented with and obsessively argued over.
How would Zig become a good language if the team that invents it doesn't even program in it themselves?
This is about the language the compiler itself is coded in. It would be absurd to try to run a compiler on your ESP microcontroller just because that is where the program you are compiling is supposed to run. (That does not stop Forth fans, of course.)
People do often run compilers on RaspPis and the like (though rarely for any good reason), but those compilers are most often coded in C++, whatever the language it is compiling, just like Zig's original compiler.
Most of the Rust compiler is in C++, and they have no plan to change that.
> Most of the Rust compiler is in C++, and they have no plan to change that.
What? I just went and searched the Rust compiler, and the only C++ code to speak of was LLVM itself, and a thin wrapper around LLVM to provide a C API. Which is pretty much the same state that Zig is in right now. (Although in Zig's case, LLVM is optional)
Yeah, it's a lot of C++ that the Rust compiler relies on, but I don't think most people would really compare things like that.
RasPis and the like are literally more powerful than your typical desktop PC 20 years ago. They can run all kinds of compilers, interpreters etc just fine, including those that are bootstrapped.
Real embedded is more challenging, but even then... DOS ran with 640 KB of RAM for userspace, and it had C and even C++ (albeit pre-ISO) compilers, and many embedded platforms these days come with something comparable. Forth is really only necessary if you want to bootstrap on really low-memory stuff.
That's exactly what I said: they regret writing their compiler in C++ and are too lazy to do a proper, optimized C rewrite, hence they go through WASM to then generate C code.
And the reason it still runs poorly is that it's the same implementation; it just went through optimized code-gen steps, making the source non-human-readable.
Just so you know, the reason your comment isn't productive (especially with the culture of this site) is that you are creating a projection of the developers' minds and motivations that most likely isn't true (I doubt they are lazy, and I also doubt there are any regrets there).
It has generated a healthy discussion that helped me get more insight into the matter, and it will help future readers get a detailed insight too.
Not everyone is born with knowledge, we learn by igniting discussions
The issue is mostly maintaining two diverging compilers, as every feature needs to be implemented twice. By removing the C++ compiler, the project only needs to implement new features once, which lowers the maintenance burden of keeping the two in sync.
It's not that they're too lazy to rewrite this in C but rather that it's impractical when there are still breaking features to be implemented.
Also, the process is `zig -> wasm blob -> c -> zig`; they're not committing C source code to the repository at all, it's just used as a stepping stone to compile the real compiler from the bootstrap compiler.
I don't follow the zig community particularly closely, but from what I can tell 'too lazy' is not an accurate description of many people that are part of it, and certainly not the compiler author(s).
> There is exactly one VM target available to Zig that is both OS-agnostic and subject to LLVM’s state-of-the-art optimization passes, and that is WebAssembly.
Honestly, sounds like old Java would also fit their requirements.
There was a time when multiple languages competed on multi-platform support, a need that's eased nowadays by containers and remote developer environments. So if their main concern is being multi-platform, it feels like they should look at the technologies developed back then.
> Honestly, sounds like old Java would also fit their requirements.
Yes, it would, as would any other VM target. That said, WASM is extremely convenient because it's a target that LLVM supports and because writing a VM for it (or something that compiles it to C) is easy. Java from this perspective seems far less convenient, as it would require us first to build a JVM backend for Zig, and then to implement our own Java interpreter / AOT compiler / ... for it.
From my understanding, the JVM is much more opinionated than webassembly (it has built-in GC, and I've heard it even has a notion of classes and related concepts at the bytecode level). Particularly for a low-level C-like language like Zig, it seems like a pretty bad match.
We're talking about a compiler, right? Sorry, I don't see why classes would be a bad match for compiling Zig code to LLVM representation. As long as we don't do a lot of number crunching (where Fortran, R, Julia, MATLAB shine), structs/classes are OK.
Zig operates at a low level where it cares about things like manual memory management. Compiling it to target the JVM instead of webassembly (assuming that's what you're suggesting) would be a really rough fit, because the JVM is higher-level. Webassembly is designed to accommodate lower-level languages adjacent to C that manage their own memory, etc.
And that's not even mentioning the fact that (it sounds like) Zig's compiler already has an LLVM back-end, which means they get wasm support "for free"
Why not provide both, hex editor and wasm, as options? Let the user choose. The former is not trendy, but it's time-tested and has no ties to a commercial entity or the online advertising "business". Whereas the latter has only been around since 2015 and was introduced by a company that subsists off an agreement with a deviant online advertising company. Not to mention it targets "the web", which is only one use for computer programming, and one that is overwhelmingly under the control of a handful of large corporations.
The main users of this bootstrapping process are core contributors; normal users are still supposed to download prebuilt executables from the official website.
Distro maintainers also are not the target user of this bootstrapping process, since it involves a binary blob provided by us.
The real users of this procedure are Zig contributors, so that they can trivially build latest zig always, and without the annoyance of having to keep a C++ version of the compiler in sync with the main one. That's it.
> Whereas the latter has only been around since 2015 and was created by a company that subsists off an agreement with a deviant online advertising company.
Mozilla created a precursor technology, but I thought Wasm was developed via the W3C standards process from the start. From the notes of the first meeting, you can see attendees from Adobe, Apple, Arm, Autodesk, Google, Intel, Mozilla, Stanford, and more.