What is the difference between code that blocks waiting for I/O and code that performs a lengthy computation? To the runtime or scheduler, these are very different. But to the caller, maybe it does not matter why the code takes a long time to return, only that it does.
Async only solves one of these two cases.
I’d like to draw an analogy here to ⊥ “bottom” in Haskell. It’s used to represent a computation that does not return a value. Why doesn’t it return a value? Maybe because it throws an exception (and bubbles up the stack), or maybe because it’s in an infinite loop, or maybe it’s just in a very long computation that doesn’t terminate by the time the user gets frustrated and interrupts the program. From a certain perspective, sometimes you don’t care why ⊥ doesn’t return, you just care that it doesn’t return.
Same is often true for blocking calls. You often don’t care whether a call is slow because of I/O or whether it is slow because of a long-running computation. Often, you just care whether it is slow or how slow it is.
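A minimal Python sketch of this point (both functions are hypothetical stand-ins): one is slow because it waits, the other because it computes, but the caller can observe only the result and the elapsed time, not the reason.

```python
import time

def slow_io() -> int:
    """Stands in for a blocking I/O call (hypothetical)."""
    time.sleep(0.2)           # waiting on the scheduler, not burning CPU
    return 42

def slow_compute() -> int:
    """Stands in for a lengthy computation (hypothetical)."""
    total = 0
    for i in range(200_000):  # busy loop: burning CPU the whole time
        total += i
    return 42

def timed(fn):
    """Everything the caller can observe: the result and the elapsed time."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# From the caller's seat both are simply "slow"; the *why* is invisible.
print(timed(slow_io), timed(slow_compute))
```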
(And obviously, sometimes you do care about the difference. I just think that the "blocking code is a leaky abstraction" argument is irreparably faulty.)
> To the caller of that specific function, nothing.
And that's what makes async code, not the blocking code, a leaky abstraction. Because abstraction, after all, is about distracting oneself from the irrelevant details.
That isn't my understanding of a leaky abstraction. An abstraction leaks when it's supposed to always behave in some certain, predictable way, but in practice, sometimes it doesn't. When does an async function not behave the way it's supposed to?
My understanding of a leaky abstraction is that the abstraction itself leaks out details of its design or underlying implementation and requires you to understand them. What you seem to describe is a bug, edge case, or maybe undefined behaviour?
For example, an ORM is a leaky abstraction over an RDBMS and SQL because you inevitably have to know details about your RDBMS and specific SQL dialect to work around shortcomings in the ORM, and also to understand how a query might perform (e.g. will it be a join or an N+1?).
I don't really think that the async or blockingness is the leak, but that the time taken to process is not defined in either case, and you can leak failure criteria either way by not holding to that same time.
People can build against your async process finishing in 10 ms, but if it suddenly takes 1 s, things fail.
It's better to think of "async" as indicating that some code will do something that blocks, and that we're letting our process manage the blocking (via Futures) instead of the operating system (via a context switch mid-thread).
I would argue a few things:
First: You need to be aware, in your program, of when you need to get data outside of your process. This is, fundamentally, a blocking operation. If your minor refactor / bugfix means that you need to add "async" a long way up the stack, does this mean that you goofed on assuming that some kind of routine could work only with data in RAM?
Instead: A non-async function should be something that you are confident will only work with the data that you have in RAM, or only perform CPU-bound operations. Any time you're writing a function that could get data from out of process, make it async.
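A sketch of that rule in Python (function names invented for illustration): parsing stays synchronous because it only touches data in RAM, while loading is async because it reaches outside the process.

```python
import asyncio

def parse_config(text: str) -> dict:
    """In-RAM, CPU-only work: stays synchronous."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

async def load_config(path: str) -> dict:
    """Reaches outside the process (disk), so it is async; the blocking
    read is pushed off the event loop with asyncio.to_thread."""
    def read() -> str:
        with open(path) as f:
            return f.read()
    return parse_config(await asyncio.to_thread(read))
```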
I want to restate what you’ve said from a different perspective:
If you write a lot of pure functions, in a Functional Core, Imperative Shell manner, then only the imperative part has to deal with any of the async parts. Yes it’s the topmost part of the code, but there is nothing to “infect” with it which isn’t already destined to be.
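Sketching this in Python, with a hypothetical fetch_orders standing in for a real network call: the pure core never sees a Future, and only the thin shell at the top is async.

```python
import asyncio

# Functional core: pure, synchronous, trivially testable.
def summarize(orders: list) -> dict:
    total = sum(o["amount"] for o in orders)
    return {"count": len(orders), "total": total}

# Imperative shell: the only async layer, because it is the only layer
# that touches the outside world. fetch_orders is a made-up stand-in.
async def fetch_orders() -> list:
    await asyncio.sleep(0)            # pretend this awaited real I/O
    return [{"amount": 10}, {"amount": 32}]

async def report() -> dict:
    orders = await fetch_orders()     # async stops here, at the shell
    return summarize(orders)          # the core never sees a Future
```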
It’s when you try to write imperative code like there are no consequences for doing so that the consequences show up en masse and can be confused as symptoms of entirely different problems.
In many application use cases, that is an implementation detail that should not be a concern of higher levels. The ramifications may not even be known by the people at higher levels.
Take something very common like cryptographic hashing: if you use something like Node.js, you really don't want to block the main thread calculating an expensive bcrypt hash. It also meets all of your requirements: the data doesn't come from outside RAM, and it is very CPU-bound.
Obviously, if you are directly calling this hashing algorithm you should know; however, the introduction of a need to hash is completely unpredictable.
> Take something very common like cryptographic hashing: if you use something like Node.js, you really don't want to block the main thread calculating an expensive bcrypt hash. It also meets all of your requirements: the data doesn't come from outside RAM, and it is very CPU-bound.
I guess I implied quick operations when I said "CPU bound" (i.e., calculating a square root, string manipulation, (de)serialization...).
I haven't done hashing in Node.js. I assume that the bcrypt API is async and calls into a native library?
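For illustration, a Python analogue of the bcrypt situation (using the stdlib's PBKDF2 as a stand-in for bcrypt; names are made up): the CPU-bound hash is pushed off the event loop's thread, which is roughly what an async native bcrypt binding does with its thread pool.

```python
import asyncio
import hashlib
import os

def hash_password(password: bytes, salt: bytes) -> bytes:
    """CPU-bound key stretching; stdlib PBKDF2 stands in for bcrypt."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

async def register(password: bytes) -> bytes:
    """Run the heavy hash in a worker thread so the event loop stays
    responsive (subject to the GIL; a process pool is the safer route
    for pure-Python CPU work)."""
    salt = os.urandom(16)
    return await asyncio.to_thread(hash_password, password, salt)
```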
You could make the proposition that sequential code is inherently asynchronous in modern operating systems, because the kernel inherently abstracts the handling of blocking/unblocking your process.
Yes and no. From a practical point of view, considering sync code as async is useless: sync and async code have very different usability characteristics, and it is useful to have different names for the two domains.
On the other hand, from a more abstract, theoretical level, it is important to know that there is almost always an event loop somewhere deep in the stack; sync code is just sugar over async, and you can always transform one into the other.
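A small Python illustration of that transformation running in both directions (fetch and compute are placeholders): an async call can be hidden behind a blocking wrapper, and a blocking call can be wrapped into something awaitable.

```python
import asyncio

async def fetch() -> str:
    await asyncio.sleep(0)       # stand-in for real async I/O
    return "payload"

def fetch_sync() -> str:
    """Async -> sync: hide the event loop behind an ordinary blocking call."""
    return asyncio.run(fetch())

def compute() -> int:
    return 21 * 2                # stand-in for blocking/CPU-bound work

async def compute_async() -> int:
    """Sync -> async: push the blocking work onto a thread the loop can await."""
    return await asyncio.to_thread(compute)
```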
Yes and no. The concept being communicated is that your program inherently is asynchronous; however, the context switching and blocking/unblocking are abstracted away from you so you can focus on what is important to you, which is solving your problem. Once you communicate that concept, it becomes a lot easier for people to form new mental models that better reflect reality. Then they can structure their programs to take advantage of it.
I would say that's a leaky abstraction. (The OS hiding the details of async by blocking your threads and context switching, although this is how many programs operate.)
The problem comes when you have a critical section where you need to hold a mutex (lock): if you're working with "proper" async code, you can safely call any non-async code from within the lock. (C# enforces this, because a lock statement cannot contain the await keyword.)
When any method you call can block on I/O, synchronous programming (i.e., using the OS to provide the asynchrony) can leak and make you hold your mutex longer than you should.
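A Python sketch of that hazard (names invented; Python, unlike C#, enforces nothing here): a synchronous call that secretly blocks stretches the critical section and stalls the whole event loop for its duration.

```python
import asyncio
import time

log: list = []

def quick_update() -> None:
    log.append("updated")          # in-RAM work: fine under the lock

def sneaky_blocking_call() -> None:
    time.sleep(0.05)               # looks synchronous, secretly blocks on "I/O"
    log.append("slow update")

async def good(lock: asyncio.Lock) -> None:
    async with lock:
        quick_update()             # lock held only for the CPU work

async def leaky(lock: asyncio.Lock) -> None:
    async with lock:
        sneaky_blocking_call()     # lock *and* the whole event loop are
                                   # held for the full blocking call

async def main() -> None:
    lock = asyncio.Lock()
    await good(lock)
    await leaky(lock)

asyncio.run(main())
```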
---
That being said, many of us build our careers programming threaded code that relies on the OS to block. So at this point we're splitting hairs.
When most people talk about async code, it is in the context of single-threaded applications (web browsers, Node.js, etc.), hence the need not to block the event executor. On the other end of the spectrum, when people talk about mutexes (locks), they are talking about multiple threads working in parallel, where you inherently need to consider whether the code you are calling will directly or indirectly acquire the same lock, or, as you pointed out, call an API that blocks. In a single-threaded environment, the only cognitive load you need to worry about is one or two yield points; in a parallel system, your main concerns are mutexes, race conditions, and deadlocks.