Having used Zig a bit as a hobby, I don't see why it's more ergonomic. Using await vs passing a token has similar ergonomics to me. The one thing you could say is that using some kind of token makes it dead simple to have different tokens. But that's really not something I run into often at all when using async.
> The one thing you could say is that using some kind of token makes it dead simple to have different tokens. But that's really not something I run into often at all when using async.
It's valuable to library authors who can now write code that's agnostic of the users' choice of runtime, while still being able to express that asynchronicity is possible for certain code paths.
But that can already be done using async/await. If you write an async function in Rust, for example, you are free to call it with any async runtime you want.
Let me rephrase: you can't call it like any other function.
In Zig, a function that does IO can be called the same way whether or not it performs async operations. And if those async operations don't need concurrency (which Zig expresses separately from asynchronicity), then they'll run equally well on a sync Io runtime.
You will need to pass that Io parameter for synchronous IO as well. All IO in the standard library is moving to the Io interface, sync and async.
If I want to call a function that does asynchronous IO, I'll use:
    foo(io, ...);
If I want to call one that does synchronous IO, I'll write:
    foo(io, ...);
If I want to express that either one of the above can be run asynchronously if possible, I'll write:
    io.async(foo, .{ io, ... });
If I want to express that it must be run concurrently, then I'll write:
    try io.concurrent(foo, .{ io, ... });
Nowhere in the above do I distinguish whether `foo` does synchronous or asynchronous IO. I only mark that it does IO, by passing in a parameter of type `std.Io`.
What about it? It gets called without an Io parameter. Same way that a function that doesn't allocate doesn't get an allocator.
I feel like you're trying to set me up for a gotcha: "see, Zig does color functions, because it distinguishes functions that do IO from those that don't!"
And yes, that's true. Zig, at least Zig code using std, will mark functions that do IO with an Io parameter. But surely you can see how that will lead to less of a split in the ecosystem compared to sync and async Rust?
This creates the prop-drilling issue we see with React, where we have to pass objects down the call chain just so that somewhere down the line we can use them.
React gets around this with the context hook, which you can access implicitly if it has been injected at a higher level.
Do you know if Zig supports something of the sort?
I think (and I’m not a Zig user at anything above a hobbyist level) based on what the developers have discussed publicly:
React has a ‘roughly’ functional slant to the way it does things and so needs to provide a special-case ‘hook’ for a certain type of context object. Zig, however, is an imperative language that allows for global state (and mutable global state, for that matter), which means that there is always a way to access a global variable, no hook required. On the other hand, I am relatively certain (almost 100%, to be honest) that you cannot pass a context/Io, or any data/variable, into a function higher up the call stack and have it propagate to the lower levels implicitly.
I think the view that it’s a non-issue comes down to familiarity via language usage. I am on the ‘everything explicit all the time’ team and see no issues with Allocator, or the proposed IO mechanism. But programmers coming from other languages, particularly those with an expectation of implicitness being a semantic and syntactic feature, cannot envision programming without all of the alleged time-saving/ergonomic ‘benefits’.
I have had multiple ‘arguments’ about the reasoning advantages, complete lack of time loss (over any useful timeframe for comparison), and long-term maintenance benefits of explicitness in language design. I have never convinced a single ‘implicit team’ dev that I’m right. Oh well, I will keep doing what I do and be fine, and will support in whatever ways I can languages and language development that prioritize explicitness.
Well it's not a "problem" in the sense that it's a blocker. But it's also not an improvement over standard async await in other languages. Which is not bad, don't get me wrong.
> This creates the prop-drilling issue we see with React, where we have to pass objects down the call chain just so that somewhere down the line we can use them.
Oh dear God. That's hell.
Refactoring and plumbing code to change where io happens is going to be a nightmare.
Cannot start a runtime from within a runtime. This happens because a function (like `block_on`) attempted to block the current thread while the thread is being used to drive asynchronous tasks.
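To be concrete, the shape of code that triggers it is roughly this (a minimal sketch, assuming Tokio's Runtime API, not taken from any particular codebase):

    fn main() {
        let outer = tokio::runtime::Runtime::new().unwrap();
        outer.block_on(async {
            // This thread is now driving async tasks. Blocking it again to
            // drive another runtime would deadlock, so Tokio panics with the
            // message quoted above instead.
            let inner = tokio::runtime::Runtime::new().unwrap();
            inner.block_on(async { /* ... */ });
        });
    }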
Right, because this would deadlock. But it seems like Zig would have the same issue. If I am running something in an evented IO system and then I try to do some blocking IO inside it, I will get a deadlock. The idea that you can write libraries that are agnostic to the asynchronous runtime seems fanciful to me beyond trivial examples.
But that's the thing: idiomatic Rust sync code almost never passes around handles, even when it needs to do I/O.
You might be different, and you might start doing that in your code, but neither std nor almost any 3rd-party library will cooperate with you.
The difference with Zig is not in its capabilities, but rather in how the ecosystem around its stdlib is built.
The equivalent in Rust would be if almost all I/O functions in std were async; granted, that would be far too expensive and disruptive given how async works.
But they use I/O inside, and we arrive at this issue:
I'm writing async, and I need to call std::fs::read. I can't, because it blocks the thread; I could use spawn_blocking, but that defeats the purpose of async. So instead I have to go looking for a similar function of the other color, probably from tokio.
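Concretely, the fork looks something like this (a sketch; Tokio assumed as the runtime, function names invented):

    use std::path::Path;

    // Sync color: fine anywhere except on an async runtime's worker thread.
    fn load_sync(path: &Path) -> std::io::Result<Vec<u8>> {
        std::fs::read(path)
    }

    // Async color: same operation, but now I reach for the tokio variant...
    async fn load_async(path: &Path) -> std::io::Result<Vec<u8>> {
        tokio::fs::read(path).await
    }

    // ...or wrap the sync call in spawn_blocking and pay for a thread-pool hop.
    async fn load_async_wrapped(path: &Path) -> std::io::Result<Vec<u8>> {
        let path = path.to_owned();
        tokio::task::spawn_blocking(move || std::fs::read(path))
            .await
            .expect("blocking task panicked")
    }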
In Zig, if you're writing sync, you call the standard library function for reading files. If you're writing async, you call the same library function for reading files. Then, the creator of the `io` object decides whether the whole thing will be sync or async.
Making it dead simple to have different tokens is exactly the goal. A smattering of examples recently on my mind:
As background, you might ask why you would ever need different runtimes at all. Why not just make everything async and be done with it, especially if the language is able to hide that complexity?
1. In the context of a systems language that's not an option. You might be writing an OS, embedded code, a game with atypical performance demands requiring more care with the IO, some kernel-bypass shenanigan, etc. Even just selecting between a few builtin choices (like single-threaded async vs multi-threaded async vs single-threaded sync) doesn't provide enough flexibility for the range of programs you're trying to allow a user to write.
2. Similarly, even initializing a truly arbitrary IO effect once at compile-time doesn't always suffice. Maybe you normally want a multi-threaded solution but need more care with respect to concurrency in some critical section and need to swap in a different IO. Maybe you normally get to interact with the normal internet but have a mode/section/interface/etc where you need to send messages through stranger networking conditions (20s ping, 99% packet loss, 0.1kbps upload on the far side, custom hardware, etc). Maybe some part of your application needs bounded latency and is fine dropping packets but some other part needs high throughput and no dropped packets at any latency cost. Maybe your disk hardware is such that it makes sense for networking to be async and disk to be sync. And so on. You can potentially work around that in a world with a single IO implementation if you can hack around it with different compilation units or something, but it gets complicated.
Part of the answer then is that you need (or really want) something equivalent to different IO runtimes, hot-swappable for each function call. I gave some high-level ideas as to why that might be the case, but high-level observations often don't resonate, so let's look at a concrete case where `await` is less ergonomic:
1. Take something like TLS as an example (stdlib or 3rd-party, doesn't really matter). The handshake code is complicated, so a normal implementation calls into an IO abstraction layer and physically does reads and writes (as opposed to, e.g., a pure state-machine implementation which returns some metadata about which action to perform next -- I hacked together a terrible version of that at one point [0] if you want to see what I mean; there's a rough sketch of the shape just below). What if you want to run it on an embedded device? If it were written with async it would likely have enough other baggage that it wouldn't fit or otherwise wouldn't work. What if you want to hide your transmission in other data to sneak it past prying eyes (steganography, which nowadays is relatively easy to do via LLMs interestingly enough, since you can embed arbitrary data in messages which are human-readable and purport to discuss entirely different things, without exposing hi/lo-bit patterns or other such things that normally break steganography)? Then the kernel socket abstraction doesn't work at all, and "just using await" doesn't fix the problem. Basically, any place you want to use that library (and, arguably, that's the sort of code where you should absolutely use a library rather than rolling it yourself), if the implementer had a "just use await" mentality then you're SOL if you need to use it in literally any other context.
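To give a feel for that state-machine style, a very rough sketch (all names invented, nothing like real TLS): the handshake never touches a socket, it only tells the caller what it needs next, so the caller can satisfy that over blocking sockets, an async runtime, a bare-metal radio, or a steganographic channel.

    enum State {
        SendHello,
        WaitPeerHello,
        Done,
    }

    enum Action<'a> {
        Transmit(&'a [u8]), // caller sends these bytes however it likes
        NeedData,           // caller feeds received bytes to `on_data`
        Finished,
    }

    struct Handshake {
        state: State,
        out: Vec<u8>,
    }

    impl Handshake {
        fn next_action(&mut self) -> Action<'_> {
            match self.state {
                State::SendHello => {
                    self.out = b"hello".to_vec(); // placeholder payload
                    self.state = State::WaitPeerHello;
                    Action::Transmit(&self.out)
                }
                State::WaitPeerHello => Action::NeedData,
                State::Done => Action::Finished,
            }
        }

        fn on_data(&mut self, _bytes: &[u8]) {
            // A real implementation parses and validates the peer's message here.
            self.state = State::Done;
        }
    }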
I was going to write more concrete cases, but this comment is getting to be too long. The general observation is that "just use await" hinders code re-use. If you're writing code for your own consumption and also never need those other uses then it's a non-issue, but with a clever choice of abstraction it _might_ be possible (old Zig had a solution that didn't quite hit the mark IMO, and time will tell if this one is good enough, but I'm optimistic) to enable the IO code people naturally write to be appropriately generic by default and thus empower future developers via a more composable set of primitives.
They really nailed that with the allocator interface, and if this works then my only real concern is a generic "what next" -- it's pushing toward an effect system, but integrating those with a systems language is mostly an unsolved problem, and adding a 3rd, 4th, etc explicit parameter to nearly every function is going to get unwieldy in a hurry (back-of-the-envelope idea I've had stewing if I ever write a whole "major" language is to basically do what Zig currently does and pack all those "effects" into a single effect parameter that you pass into each function, still allowing you to customize each function call, still allowing you to inspect which functions require allocators or whatever, but making the experience more pleasant if you have a little syntactic sugar around sub-effects and if the parent type class is comptime-known).
The case I'm making is not about whether different Io contexts are good. The point I'm making is that mixing them is almost never what is needed. I have seen valid cases that do it, but it's not in the "used all the time" path. So I'm more than happy with the better ergonomics of traditional async/await in the style of Rust, which sacrifices super easy runtime switching, because the former is used thousands of times more.
If I'm understanding correctly (that most code and/or most code you personally write doesn't need that flexibility) then that's a valid use case.
In practice it should just be a po-tay-to/po-tah-to scenario, swapping around a few symbols and keywords vs calls to functions with names similar to those keywords. If that's all you're doing then passing around something like IO (or, depending on your app, just storing one once globally and not bothering to adhere to the convention of passing it around) is not actually more ergonomic than the alternative. It's not worse (give or take a bunch of bike-shedding on a few characters here and there), but it's not better either.
Things get more intriguing when you consider that most nontrivial projects have _something_ interesting going on. As soon as your language/framework/runtime/etc makes one-way-door assumptions about your use case, you're definitionally unable to handle those interesting things within the confines of the walls you've built.
Maybe .NET Framework has an unavoidable memory leak under certain usage patterns, forcing you to completely circumvent its dependency-injection code in your app. Maybe your GraphQL library has constrained socket assumptions, forcing you to rewrite a thousand lines of entrypoint code into the library (or, worse, rewrite the entire library). Maybe the stdlib doesn't have enough flexibility to accommodate your atypical IO use case.
In any one app you're perhaps not incredibly likely to see that with IO in particular (an off-the-cuff guesstimate says that for apps needing _something_ interesting you'll need IO to be more flexible 30% of the time). However, when working in a language/framework/runtime/etc which makes one-way-door assumptions frequently, you _are_ very likely to find yourself having to hack around deficiencies of some form. Making IO more robust is just one of many choices enabling people to write the software they want to write. When asking why an argument-based IO is more ergonomic, it's precisely because it satisfies those sorts of use cases. If you literally never need them (even transitively) then maybe actually you don't care, but a lot of people do still want that, and even more people want a language which "just works" in any scenario they might find themselves in, including when handling those sorts of issues.
===
Rust async rant starts here
===
You also called out Rust's async/await as having good ergonomics as a contrast against TFA, and ... I think it's worth making this comment much longer to talk about that?
(1) Suppose your goal is to write a vanilla application doing IO stuff. You're forced to use Tokio and learn more than you want about the impact of static lifetimes and other Rust shenanigans, else you're forced to ignore most of the ecosystem (function coloring, yada yada). Those are workable constraints, but they're not exactly a paragon of a good developer experience. You're either forced to learn stuff you don't care about, or you're forced to write stuff you don't think you should have to write. The lack of composability of async Rust as it's usually practiced is common knowledge and one of the most popularly talked about pain points of the language.
(2) Suppose your goal is to write a vanilla _async_ application doing IO stuff. At least now something like Tokio makes sense in your vision, but it's still not exactly easy. The particular implementation of async used by Tokio forces a litany of undesirable traits and lifetime issues into your application code. That code is hard to write. Moreover, the issues aren't really Rust-specific. Rust surfaces those issues early in the development cycle, but the problem is that Tokio has a lot of assumptions about your code which must be satisfied for it to work correctly, and equivalent libraries (and ecosystem problems) in other languages will make those same assumptions and require the same kinds of code modifications from you, the end user. Contrasted with, e.g., Python's model of single-threaded async "just working" (or C#'s or something if you prefer multi-threaded stuff and ignore the syntactic sharp edges), a Tokio-style development process is brutally difficult and arguably not worth the squeeze if you also don't have the flexibility to do the async things your application actually demands. Just write golang greenthreads and move on with your life.
(3) Suppose your goal is something more complicated. You're totally fucked. That capability isn't exposed to you (it's exposed a little, but you have to write every fucking thing yourself, removing one of the major appeals of choosing a popular language).
I get that Zig is verbose and doesn't appeal to everyone, and I really don't want to turn this into Rust vs Zig, but Rust's async is one of the worst parts of the language and one of the worst async implementations I've ever seen anywhere. I don't have a lot of comment on TFA's implementation (seems reasonable, but I might change my mind after I try using it for a while), but I'm shocked reading that Rust has a good async model. What am I missing?
In context (embedded programming, which in retrospect is still too big of a field for this comment to make sense by itself; what I meant was embedded programming on devices with very limited RAM or other such significant restrictions), "baggage" is the fact that you don't have many options when converting async high-level code into low-level machine code. The two normal things people write into their languages/compilers/whatever (the first being much more popular, and there do exist more than just these two options) are:
1. Your async/await syntax desugars to a state machine. The set of possible states might only be runtime-known (JS, Python), or it might be comptime-known (Rust, old-Zig, arguably new-Zig if you squint a bit). The concrete value representing the current state of that state machine is only runtime-known, and you have some sort of driver (often called an "event loop", but there are other abstractions) managing state transitions.
2. You restrict the capabilities of async/await to just those which you're able to statically (compile-time) analyze, and you require the driver (the "event loop") to be compile-time known so that you're able to desugar what looks like an async program to the programmer into a completely static, synchronous program.
On sufficiently resource-constrained devices, both of those are unworkable.
In the case of (1) (by far the most common approach, and the thing I had in mind when arguing that async has potential issues for embedded programming), you waste RAM/ROM on a more complicated program involving state machines, you waste RAM/ROM on the driver code, you waste RAM on the runtime-known states in those state machines, and you waste RAM on the runtime-known boxing of events you intend to run later. The same program (especially in an embedded context where programs tend to be simpler) can easily be written by a skilled developer in a way which avoids that overhead, but reaching for async/await from the start can prevent you from reaching your goals for the project. It's that RAM/ROM/CPU overhead that I'm talking about in the word "baggage."
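To make "state machine" concrete, here's roughly what a single-suspension async function lowers to, hand-written and heavily simplified (a sketch; real compiler output also captures live locals per state and deals with pinning, which is where much of the size comes from):

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // An enum whose *current variant is runtime data*, plus a poll() that
    // advances it. This is the per-future cost that exists whether or not
    // you ever wanted a full-blown event loop.
    enum TwoSteps {
        Start,
        DidFirstHalf,
        Done,
    }

    impl Future for TwoSteps {
        type Output = u32;

        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
            match *self {
                TwoSteps::Start => {
                    // ...work before the await point...
                    *self = TwoSteps::DidFirstHalf;
                    cx.waker().wake_by_ref(); // pretend the resource isn't ready yet
                    Poll::Pending
                }
                TwoSteps::DidFirstHalf => {
                    // ...work after the await point...
                    *self = TwoSteps::Done;
                    Poll::Ready(42)
                }
                TwoSteps::Done => panic!("polled after completion"),
            }
        }
    }

Something still has to drive that poll() in a loop, which is the driver/runtime half of the cost.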
In the case of (2), there are a couple potential flaws. One is just that not all reasonable programs can be represented that way (it's the same flaw with pure, non-unsafe Rust and with attempts to create languages which are known to terminate), so the technique might literally not work for your project. A second is that the compiler's interpretation of the particular control flow and jumps you want to execute will often differ from the high-level plan you had in mind, potentially creating more physical bytecode or other issues. Details matter in constrained environments.
That makes sense. I don't know anything about embedded programming really, but I thought that it fundamentally requires async (in the conceptual sense), so you have to structure your program as an event loop no matter what. Wasn't the alleged goal of Rust async to be zero-cost, in the sense that the program transformation of a future ends up being roughly what you would write by hand if you had to hand-roll a state machine? Of course driving the futures requires a runtime, and I get why something like Tokio would be a non-starter in embedded environments, but you can still hand-roll the core runtime and structure the rest of the code with async/await, right? Or are you saying that the generated code, even without the runtime, is too heavy for an embedded environment?
> fundamentally requires async (in the conceptual sense)
Sometimes, kind of. For some counter-examples, consider a security camera or a thermostat. In the former you run in a hot loop because it's more efficient when you constantly have stuff to do, and in the latter you run in a hot loop (details apply for power-efficiency reasons, but none which are substantially improved by async) since the timing constraints are loose enough that you have no benefit from async. One might argue that those are still "conceptually" async, but I think that misses the mark. For the camera, for example, a mental model of "process all the frames, maybe pausing for a bit if you must" is going to give you much better results when modeling that domain and figuring out how to add in other features (between those two choices of code models, the async one buys you less "optionality" and is more likely to hamstring your business).
> zero-cost
IMO this is a big misnomer, especially when applied to abstractions like async. I'll defer async till a later bullet point, looking instead at simpler abstractions.
The "big" observation is that optimization is hard, especially as information gets stripped away. Doing it perfectly seemingly has an exponential cost (active research problem to reduce those bounds, or even to reduce constant factors). Doing it approximately isn't "zero"-cost.
With perfect optimization being impossible for all intents and purposes, you're left with a world where equivalent units of code don't have the same generated instructions. I.e., the initial flavor of your code biases the generated instructions one way or another. One way of writing high-performance code then is to choose initial representations which are closer to what the optimizer will want to work with (basically, you're doing some of the optimization yourself and relying on the compiler to not screw it up too much -- which it mostly won't (there be dragons here, but as an approximate rule of thumb) because it can't search too far from the initial state you present to it).
Another framing of that is that if you start with one of many possible representations of the code you want to write, it has a low probability of giving the compiler the information it needs to actually optimize it.
Let's look at iterators for a second. The thing that's being eliminated with "zero-cost" iterators is logical instructions. Suppose you're applying a set of maps to an initial sequence. A purely runtime solution (if "greedy" and not using any sort of builder pattern) like you would normally see in JS or Python would have explicit "end of data" checks for every single map you're applying, increasing the runtime with all the extra operations existing to support the iterator API for each of those maps.
Contrast that with Rust's implementation (or similar in many other languages, including Zig -- "zero-cost" iterators are a fun thing that a lot of programmers like to write even when not provided natively by the language). Rust recognizes at compile-time that applying a set of maps to a sequence can be re-written as `for x in input: f0(f1(f2(...(x))))`. The `for x in input` thing is the only part which actually handles bounds-checking/termination-checking/etc. From there all the maps are inlined and just create optimal assembly. The overhead from iteration is removed, so the abstraction of iteration is zero-cost.
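As a sketch of the two shapes being compared (placeholder stages, nothing special about them):

    // The chained form:
    fn maps_chained(input: &[u32]) -> Vec<u32> {
        input.iter().copied().map(f0).map(f1).map(f2).collect()
    }

    // What it effectively fuses into: a single termination check per element,
    // with the map bodies inlined into the loop.
    fn maps_fused(input: &[u32]) -> Vec<u32> {
        let mut out = Vec::with_capacity(input.len());
        for &x in input {
            out.push(f2(f1(f0(x))));
        }
        out
    }

    // Placeholder stages; any cheap pure functions would do.
    fn f0(x: u32) -> u32 { x.wrapping_mul(3) }
    fn f1(x: u32) -> u32 { x ^ 0x9E37_79B9 }
    fn f2(x: u32) -> u32 { x.rotate_left(7) }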
Except it's not, at least not for a definition of "zero-cost" the programmer likely cares about (I have similar qualms about safe Rust being "free of data-races", but those are more esoteric and less likely to come up in your normal day-to-day). It's almost always strictly better than nested, dynamic "end of iterator" checks, but it's not actually zero-cost.
Taking as an example something that came up somewhat recently for me, math over fields like GF(2^16) can be ... interesting. It's not that complicated, but it takes a reasonable number of instructions (and/or memory accesses). I understand that's not an every-day concern for most people, but the result will illustrate a more general point which does apply. Your CPU's resources (execution units, instruction cache, branch-prediction cache (at several hierarchical layers), etc) are bounded. Details vary, but when iterating over an array of data and applying a bunch of functions, even when none of that is vectorizable, you very often don't want codegen with that shape. You instead want to pop a few elements, apply the first function to those elements, apply the second function to those results, etc, and then proceed with the next batch once you've finished the first. The problems you're avoiding include data dependencies (it's common for throughput for an instruction to be 1-2/cycle but for latency to be 2-4 cycles, meaning that if one instruction depends on another's output it'll have to wait 2-4 cycles when it could in theory otherwise process that data in 0.5-1 cycles) and bursting your pipeline depth (your CPU can automagically resolve those data dependencies if you don't have too many instructions per loop iteration, but writing out the code explicitly guarantees that the CPU will _always_ be happy).
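A sketch of that layout, continuing the placeholder stages from the earlier snippet (with stages this trivial the compiler would cope just fine; the point is the shape once each stage is heavy, like the field math above):

    // Run each stage over a small block before moving on, so several
    // independent results are in flight at once instead of a serial
    // f2(f1(f0(x))) dependency chain per element.
    const BATCH: usize = 8;

    fn maps_batched(input: &[u32]) -> Vec<u32> {
        let mut out = Vec::with_capacity(input.len());
        for chunk in input.chunks(BATCH) {
            let mut tmp = [0u32; BATCH];
            let n = chunk.len();
            for i in 0..n { tmp[i] = f0(chunk[i]); } // stage 1 over the block
            for i in 0..n { tmp[i] = f1(tmp[i]); }   // stage 2
            for i in 0..n { tmp[i] = f2(tmp[i]); }   // stage 3
            out.extend_from_slice(&tmp[..n]);
        }
        out
    }

    // Same placeholder stages as before, repeated so this stands alone.
    fn f0(x: u32) -> u32 { x.wrapping_mul(3) }
    fn f1(x: u32) -> u32 { x ^ 0x9E37_79B9 }
    fn f2(x: u32) -> u32 { x.rotate_left(7) }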
BUT, your compiler often won't do that sort of analysis and fix your code's shortcomings. If that approximate layout of instructions doesn't exist in your code explicitly then the optimizer won't solve for it. The difference in performance is absolutely massive when those scenarios crop up (often 4-8x). The "zero-cost" iterator API won't yield that better codegen, since it has an output that the optimizer can't effectively turn into that better solution (yet -- polyhedral models solve some similar problems, and that might be something that gets incorporated in modern optimizers eventually -- but it doesn't exist yet, it's very hard, and it's illustrative of the idea that optimizers can't solve all your woes; when that one is fixed there will still exist plenty more).
> zero-cost async
Another pitfall of "zero-cost" is that all it promises is that the generated code is the same as what you would have written by hand. We saw in the iterator model that "would have written" doesn't quite align between the programmer and the compiler, but it's more obvious in their async abstraction. Internally, Rust models async with state machines. More importantly, those all have runtime-known states.
You asked about hand-rolling the runtime to avoid Tokio in an embedded environment. That's a good start, but it's not enough (it _might_ be; "embedded" nowadays includes machines faster than some desktops from the 90s; but let's assume we're working in one of the more resource-constrained subsets of "embedded" programming). The problem is that the abstraction the compiler assumes we're going to need is much more complicated than an optimal solution given the requirements we actually have. Moreover, the compiler doesn't know those requirements and almost certainly couldn't codegen its assumptions into our optimal solution even if it had them. If you use Rust async/await, with very few exceptions, you're going to end up with both a nontrivial runtime (might be very light, but still nontrivial in an embedded sense), and also a huge amount of bloat on all your async definitions (along with runtime bloat (RAM+CPU) as you navigate that unnecessary abstraction layer).
The compiler definitely can't strip away the runtime completely, at least for nontrivial programs. For sufficiently simple programs it does a pretty good job (you still might not be able to afford supporting the explicit state machines it leaves behind, but whatever, most machines aren't _that_ small), but past a certain complexity level we're back to the idea of zero-cost abstractions not being real because of optimization impossibility: when you use most of the features you might want to use with async/await, you find that the compiler can't fully desugar even very simple programs, and fully dynamic async (by definition) obviously can't exist without a runtime.
So, answering your question a bit more directly, my answer is that you usually can't fix the issue by hand-rolling the core runtime since it won't be abstracted away (resulting in high RAM/ROM/CPU costs), and even in sufficiently carefully constructed and simple code that it will be abstracted away you're still left with full runtime state machines, which themselves are overkill for most simple async problems. The space and time those take up can be prohibitive.
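For completeness, this is roughly the smallest driver you can hand-roll (a sketch; `Waker::noop()` needs a fairly recent Rust, older code builds the same thing from `RawWaker`). It replaces the scheduler, but every compiler-generated state machine behind the future is still there at runtime:

    use std::future::Future;
    use std::pin::pin;
    use std::task::{Context, Poll, Waker};

    // A busy poll loop with a no-op waker: the bare minimum needed to turn a
    // Future into a finished value.
    fn block_on<F: Future>(fut: F) -> F::Output {
        let mut fut = pin!(fut);
        let mut cx = Context::from_waker(Waker::noop());
        loop {
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(v) => return v,
                Poll::Pending => std::hint::spin_loop(), // real code parks or waits for an interrupt here
            }
        }
    }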