> A queue can handle transient peaks of load much better than direct synchronous code.
Whether a workload is being managed upon creation using a work queue within the backend, has nothing to do with the semantics of the communications protocol used to talk about the state of said workload. You can arbitrarily combine these — for example, DBMSes have the unusual combination of having a stateful connection-oriented protocol for scheduling blocking workloads, but also having the ability to introspect the state of those ongoing workloads with queries on other connections.
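As a minimal illustration of that DBMS combination (a sketch assuming PostgreSQL, its JDBC driver on the classpath, and made-up connection details): while one connection is blocked on a long-running workload, a completely separate connection can introspect it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class WorkloadIntrospection {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; any PostgreSQL instance will do.
        String url = "jdbc:postgresql://localhost:5432/appdb";

        // Imagine connection 1 (elsewhere) is blocked on a long query -- the
        // stateful, connection-oriented "scheduling" part. This is connection 2:
        // it introspects the state of ongoing workloads without touching connection 1.
        try (Connection introspect = DriverManager.getConnection(url, "app", "secret");
             Statement st = introspect.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT pid, state, query_start, query " +
                     "FROM pg_stat_activity WHERE state <> 'idle'")) {
            while (rs.next()) {
                System.out.printf("pid=%d state=%s since=%s query=%s%n",
                        rs.getInt("pid"), rs.getString("state"),
                        rs.getTimestamp("query_start"), rs.getString("query"));
            }
        }
    }
}
```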
My point is that clients in a distributed system can literally never do "fire and forget" messaging anyway — which is the supposed advantage of an "asynchronous message-oriented communications" protocol over a REST-like one. Any client built to do "fire and forget" messaging, when used at scale, always, always ends up needing some sort of outbox-queue abstraction, where the outbox controller is internally doing synchronous blocking retries of RPC calls to get an acknowledgement that a message got safely pushed into the queue and can be locally forgotten.
And that "outbox" is a leaky abstraction, because in trying to expose "fire and forget" semantics to its caller, it has no way of imposing backpressure on its caller. So the client's outbox overflows. Every time.
This is why Google famously switched every internal protocol they use away from using message queues/busses with asynchronous "fire and forget" messaging, toward synchronous blocking RPC calls between services. With an explicitly-synchronous workload-submission protocol (which may as well just be over a request-oriented protocol like HTTP, as gRPC is), all operational errors and backpressure get bubbled back up from the workload-submission client library to its caller, where the caller can then have logic to decide the business-logic-level response that is most appropriate, for each particular fault, in each particular calling context.
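A rough sketch of what that looks like from the caller's side, using plain java.net.http against a hypothetical submission endpoint; the only point is that every fault and every 429 surfaces right where business logic can decide what to do about it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class WorkloadSubmission {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    /** Submits a workload synchronously; faults and backpressure surface to the caller. */
    static String submit(String payloadJson) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://workload-service.internal/jobs")) // hypothetical URL
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payloadJson))
                .build();

        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());

        switch (response.statusCode()) {
            case 201: return response.body();   // accepted: job handle returned
            case 429:                           // backend says: slow down
                throw new BackpressureException("backend at capacity; caller decides what to do");
            default:
                throw new IllegalStateException("submission failed: " + response.statusCode());
        }
    }

    /** Hypothetical exception type; each caller applies whatever policy fits its context. */
    static class BackpressureException extends RuntimeException {
        BackpressureException(String msg) { super(msg); }
    }
}
```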
Message queues are the quintessential "smart pipe", trying to make the network handle all problems itself, so that the nodes (clients and backends) connected via such a network can be naive to some operational concerns. But this will never truly solve the problems it sets out to solve, as the policy knowledge to properly drive the decision-making for the mechanism that handles operational exigencies in message-handling, isn't available "within the network"; it lives only at the edges, in the client and backend application code of each service. Those exigencies — those failures and edge-case states — must be pushed out to the client or backend, so that policy can be applied. And if you're doing that, you may as well move the mechanism to enforce the policy there, too. At which point you're back to a dumb pipe, with smart nodes.
These concepts have worked surprisingly well for us for nearly a decade. We're not Google-sized, but this architecture should work well for a few more orders of magnitude of traffic.
Also, you can mix and match. If some parts of your system have absolutely massive traffic, then simply don't use this there.
Note that we very seldom use "fire and forget" (aka "send(..)"). We use the request-replyTo paradigm much more, which is basically the premise of Mats, as an abstraction over pure "forward-only" messaging.
RPC works at both non-Google and Google scale. This is one of those cases where, IMHO, you can skip the middle section: novices resort to RPC, Google resorts to RPC, and it's only in the mid tier that messaging can step in.
Why not skip it? Use RPC like a novice. If it becomes problematic, start putting in compensating measures.
Internal Mailbox/Queue on the receiving side. The caller gets back a transaction id that they can query for progress, or the caller provides a webhook for confirmation/failure. If the mailbox is full, the callee immediately responds with 429.
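Something like this, as a rough sketch (queue size, status codes and names are just placeholders):

```java
import java.util.UUID;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch of the receiving-side mailbox described above. */
public class Mailbox {
    // Bound chosen to match what this instance can actually work through.
    private final BlockingQueue<Job> mailbox = new ArrayBlockingQueue<>(1_000);

    /** Called by the request handler: returns a transaction id, or an immediate 429. */
    public SubmitResult submit(String payload) {
        String transactionId = UUID.randomUUID().toString();
        boolean accepted = mailbox.offer(new Job(transactionId, payload));
        if (!accepted) {
            return SubmitResult.rejected(429);        // mailbox full: explicit backpressure
        }
        return SubmitResult.accepted(transactionId);  // caller polls progress / registers webhook
    }

    record Job(String transactionId, String payload) { }

    record SubmitResult(boolean ok, String transactionId, int statusCode) {
        static SubmitResult accepted(String txId) { return new SubmitResult(true, txId, 202); }
        static SubmitResult rejected(int status) { return new SubmitResult(false, null, status); }
    }
}
```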
> Note that we very seldom use "fire and forget" (aka "send(..)"). We use the request-replyTo paradigm much more, which is basically the premise of Mats, as an abstraction over pure "forward-only" messaging.
That doesn't help one bit. You're still firing-and-forgetting the request itself. The reply (presumably with a timeout) ensures that the client doesn't sit around forever waiting for a lost message; but it does nothing to prevent badly-written request logic from overloading your backend (or overloading the queue, or "bunging up" the queue such that it'll be ~forever before your backend finishes handling the request spike and gets back to processing normal workloads).
> If some parts of your system have absolutely massive traffic, then simply don't use this there.
I'm not talking about massive intended traffic. These problems come from the architecture of the system failing to inherently bound requests to the current scale of the system (where autoscaling changes the "current scale of the system" before such limits kick in).
So, for example, there might be an endpoint in your system that allows the caller to trigger logic that does O(MN) work (the controller for that endpoint calls service X O(M) times, and then for each response from X, calls service Y O(N) times); where it's fully expected that this endpoint takes 60+ seconds to return a response. The endpoint was designed to serve the need of some existing internal team, who calls it for reporting once per day, with a batch-size N=2. But, unexpectedly, a new team, building a new component, with a new use-case for the same endpoint, writes logic that begins calling the endpoint once every 20 seconds, with a batch-size of 20. Now the queues for the services X and Y called by this endpoint are filling faster than they're emptying.
No DDoS is happening; the requests are quite small, and in networking terms, quite sparse. Everything is working as intended — and yet it'll all fall over, because you've opted into a protocol where there's no inherent, by-default mechanism for "the backend is overloaded" to apply backpressure that makes new requests from the frontend stop coming (as there is in a synchronous RPC protocol, where 1. you can't submit a request on an open socket while it's in the "waiting for reply" state; and 2. you can't get a new open socket if the backend isn't calling accept(2)); and you didn't think that this endpoint would be one that gets called much, so you didn't bother to explicitly implement such a mechanism.
Relying on e.g. the Servlet Container not being able to handle requests seems rather bad to me. That is very rough error handling.
We seem to have come to the exact opposite conclusions wrt. this. Your explanations are entirely in line with mine, but I found this "messy" error handling to be exactly what I wanted to avoid.
There is one particular point where we might not be in line: I made Mats first and foremost not for the synchronous situation, where there is a user waiting. This is the "bonus" part, where you can actually do that with the MatsFuturizer, or the MatsSocket.
I first and foremost made it for internal, batch-like processes like "we got a new price (NAV) for this fund, we now need to settle these 5000 waiting orders". In that case, the work is bounded, and an error situation with not-enough-threads would be extremely messy. Queues solve this 100%.
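For the sake of illustration, a plain-JMS sketch of that fan-out (this is not the Mats API; the queue name and property names are made up): one NAV event becomes one message per waiting order, and the settlements then drain at whatever rate the consumers sustain, with no thread exhaustion to worry about.

```java
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.Queue;
import java.util.List;

/** Plain-JMS sketch (not the Mats API) of fanning one NAV event out to per-order messages. */
public class NavSettlementFanOut {
    private final ConnectionFactory connectionFactory; // injected; broker-specific

    public NavSettlementFanOut(ConnectionFactory connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    public void onNewNav(String fundId, double nav, List<String> waitingOrderIds) {
        try (JMSContext context = connectionFactory.createContext()) {
            Queue settleQueue = context.createQueue("orders.settle"); // hypothetical queue name
            // One message per order: the 5000 settlements now sit as queue depth,
            // not as 5000 threads, and are worked through as fast as possible.
            for (String orderId : waitingOrderIds) {
                context.createProducer()
                       .setProperty("fundId", fundId)
                       .setProperty("nav", nav)
                       .send(settleQueue, orderId);
            }
        }
    }
}
```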
> Relying on e.g. the Servlet Container not being able to handle requests seems rather bad to me. That is very rough error handling.
It's one of those situations where the simplest "what you get by accident with a single-threaded non-evented server" solution, and the most fancy-and-complex solution, actually look alike from a client's perspective.
What you actually want is that each of your backends monitors its own resource usage, and flags itself as unhealthy in its readiness-check endpoint when it's approaching its known per-backend maximum resource capacity along any particular dimension — threads, memory usage, DB pool checked-out connections, etc. (Which can be measured quite predictably, because you're very likely running these backends in containers or VMs that enforce bounds on these resources, and then scaling the resulting predictable-consumption workload-runners horizontally.) This readiness-check failure then causes the backend to be removed from consideration as an upstream for your load-balancer / routing target for your k8s Service / etc; but existing connected flows continue to flow, gradually draining the resource consumption on that backend, until it's low enough that the backend begins reporting itself as healthy again.
Meanwhile, if the load-balancer gets a request and finds that it currently has no ready upstreams it can route to (because they're all unhealthy, because they're all at capacity) — then it responds with a 503. Just as if all those upstreams had crashed.
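For the backend side of that, a bare-bones sketch using the JDK's built-in com.sun.net.httpserver and a worker-pool metric as the only health signal (the port, the 0.85 threshold, and checking only threads are all simplifications; a real check would also look at memory, DB pool, etc.):

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadPoolExecutor;

/** Sketch of a self-monitoring readiness endpoint; thresholds are illustrative. */
public class ReadinessEndpoint {

    public static void start(ThreadPoolExecutor workerPool) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/ready", exchange -> {
            // Flag unhealthy when approaching known per-backend capacity.
            double threadUsage = (double) workerPool.getActiveCount()
                    / workerPool.getMaximumPoolSize();
            boolean ready = threadUsage < 0.85;

            byte[] body = (ready ? "ok" : "draining").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(ready ? 200 : 503, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        // The load balancer / k8s readinessProbe polls /ready; a 503 removes this
        // backend from rotation while in-flight work drains, then it rejoins.
    }
}
```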
> Your explanations are entirely in line with mine, but I found this "messy" error handling to be exactly what I wanted to avoid.
Well, yes, but that's my point made above: this error handling is "messy" precisely because it's an encoding of user intent. It's irreducible complexity, because it's something where you want to make the decision of what to do differently in each case — e.g. a call from A to X might consider the X response critical (and so failures should be backoff-retried, and if retries exceeded, the whole job failed and rescheduled for later); while a call from B to X might consider the X response only a nice-to-have optimization over calculating the same data itself, and so it can try once, give up, and keep going.
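Sketched out, with a hypothetical ServiceXClient interface standing in for the real client library, the two call sites end up encoding two different policies around the very same call:

```java
import java.time.Duration;

/** Sketch: the same Service X call, wrapped in two different caller-side policies. */
public class ServiceXCallers {

    interface ServiceXClient {              // hypothetical client library interface
        Data fetch(String key) throws Exception;
    }
    record Data(String value) { }

    /** Service A: the X response is critical -- backoff-retry, then fail the whole job. */
    static Data callFromA(ServiceXClient x, String key) throws JobFailedException {
        Duration backoff = Duration.ofMillis(200);
        for (int attempt = 1; attempt <= 5; attempt++) {
            try {
                return x.fetch(key);
            } catch (Exception e) {
                try { Thread.sleep(backoff.toMillis()); }
                catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
                backoff = backoff.multipliedBy(2);
            }
        }
        throw new JobFailedException("X unavailable; reschedule the whole job for later");
    }

    /** Service B: the X response is just an optimization -- try once, then fall back. */
    static Data callFromB(ServiceXClient x, String key) {
        try {
            return x.fetch(key);
        } catch (Exception e) {
            return computeLocally(key);     // give up on X and do the work itself
        }
    }

    static Data computeLocally(String key) { return new Data("computed:" + key); }

    static class JobFailedException extends Exception {
        JobFailedException(String msg) { super(msg); }
    }
}
```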
> I made Mats first and foremost not for the synchronous situation, where there is a user waiting.
I said nothing about users-as-in-humans. We're presumably both talking about a Service-Oriented Architecture here; perhaps even a microservice-oriented architecture. The "users" of Service X, above, are Service A and Service B. There's a Service X client library, that both Service A and Service B import, and make calls to Service X through. But these are still, necessarily, synchronous requests, since the further computations of Services A and B are dependent on the response from Service X.
Sure, you can queue the requests to Services A and B as long as you like; but once they're running, they're going to sit around waiting on the response from Service X (because they have nothing better to be doing while the Service X response-promise resolves.) Whether or not the Service X request is synchronous or asynchronous doesn't matter to them; they have a synchronous (though not timely) need for the data, within their own asynchronous execution.
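In code terms (with a hypothetical async client interface), even a fully asynchronous handler in Service A is still just a continuation parked on X's answer:

```java
import java.util.concurrent.CompletableFuture;

/** Sketch: Service A's handler may itself be queued/asynchronous, but its further
 *  computation still has a synchronous (if not timely) dependency on Service X. */
public class ServiceAHandler {

    interface ServiceXClient {                                  // hypothetical client
        CompletableFuture<String> enrichAsync(String orderId);
    }

    static CompletableFuture<String> handle(ServiceXClient x, String orderId) {
        // Nothing useful can happen until X answers; whether the request travels over
        // a queue or a socket, A is logically waiting on this value.
        return x.enrichAsync(orderId)
                .thenApply(enrichment -> "processed " + orderId + " with " + enrichment);
    }
}
```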
Is this not the common pattern you see for inter-service requests within your own architecture? If not, then what is?
If what you're really talking about here is forward-only propagation of values — i.e. never needing a response (timely or not) from most of the messages you send in the first place — then you're not really talking about a messaging protocol. You're talking about a dataflow programming model, and/or a distributed CQRS/ES event store — both of which can and often are implemented on top of message queues to great effect, and neither of which purport to be sensible to use to build RPC request-response code on top of.
Unless you are super careful, taking services out of rotation as they become overloaded very often results in cascading failures, as each server in turn falls over because an increasing amount of traffic is directed at the remaining healthy ones.
You need to be super careful that a service never allows more concurrency than it can handle, and fast-fails requests when it is overloaded. Otherwise, TCP queues back up and the server ends up only working on requests that are too old to be useful.
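For example, a hard concurrency cap with immediate fast-fail can be as small as a semaphore (the limit and response codes here are illustrative):

```java
import java.util.concurrent.Semaphore;

/** Sketch of a hard concurrency cap with immediate fast-fail. */
public class ConcurrencyLimiter {
    // Never allow more in-flight work than this instance can actually handle.
    private final Semaphore permits = new Semaphore(64);

    public Response handle(Request request) {
        if (!permits.tryAcquire()) {
            // Fail fast: better an immediate 503 than queuing in TCP buffers and
            // eventually serving requests that are too old to be useful.
            return Response.of(503);
        }
        try {
            return doWork(request);
        } finally {
            permits.release();
        }
    }

    Response doWork(Request request) { return Response.of(200); }

    record Request(String body) { }
    record Response(int status) {
        static Response of(int code) { return new Response(code); }
    }
}
```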
To your latter part: this is exactly the point: using messages and queues lets the flows take whatever time they take. Settling of the mentioned orders is not time critical - well, at least not in the way a request from a user sitting on his phone logging in to see his holdings is time critical. So whether it takes 1 second or 1 hour doesn't matter all that much.
The big point is that none of the flows will fail. They will all pass through as fast as possible, literally, and will never experience any failure mode resulting from randomly exhausted resources. You do not need to take any precautions for this - backpressure, failure handling, retries - as it is inherent in how a messaging-based system works.
Also, if a user logs into the system and one of the login-flows needs the same service as the settling flows, then that flow will "cut the line", since it is marked "interactive".
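As a rough analogue outside of Mats itself, broker-level message priority achieves a similar "cut the line" effect (a plain-JMS sketch, not Mats' actual "interactive" mechanism; the queue name is made up):

```java
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.Queue;

/** Plain-JMS sketch of "cutting the line" via message priority. */
public class PrioritizedSend {
    public static void send(ConnectionFactory cf, String payload, boolean interactive) {
        try (JMSContext context = cf.createContext()) {
            Queue queue = context.createQueue("fundservice.requests"); // hypothetical queue
            context.createProducer()
                   // Interactive (user-facing) flows get high priority; batch settling
                   // keeps the default, so a login request overtakes queued settlements.
                   .setPriority(interactive ? 9 : 4)
                   .send(queue, payload);
        }
    }
}
```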
> You do not need to take any precautions for this - backpressure, failure handling, retries - as it is inherent in how a messaging-based system works.
In reality, there are time limits on fulfillment of requests. Even in your previous example of an order execution / fulfillment system, the orders must execute while the exchange is still open. I can’t think of a system that is truly unbounded in time, but maybe one exists.
This would not work for an exchange. What I work with is a UCITS mutual funds system. Once per day, we get a new price for each fund (the NAV, Net Asset Value). We then need to settle all orders (subscriptions and redemptions) waiting for that NAV. This is of course time critical, but not in the millisecond sense: as long as it is done within an hour or two, it is all ok.
I believe this holds for very many business processes. If you get a new shipment of widgets, and you can now fulfill your orders waiting for those widgets, it does not really matter if it takes 1 second or, on a very bad outlier day, occasionally 2 hours.
Realize that the point is that this settling, or order fulfillment, will go as fast as possible - usually within seconds, or maybe minutes. However, if you suddenly get a large influx, or the database goes down for a few minutes, this will only lead to a delay - there is nothing extra you need to code up to handle such problems. Also, you can scale this very simply, based on what holds you back (services, or database, or other external systems, or IO). It will not be the messaging by itself!