I can't shake the feeling that the otters could have used a bulletin board - on ...

LotToLearn · on June 17, 2021

Where would you say it makes more sense, design wise? I feel like its hard to find perfect use cases for Kafta.

yongjik · on June 17, 2021

Sorry, I'm not qualified to answer the question. You will have to ask someone who has seen Kafka being used in a place where it made sense...

andreskytt · on June 18, 2021

I have not used Kafka, but what the book describes works with most queue systems. And those are great for localizing dynamic complexity. One part of your system behaving weirdly (spiking, dying, fluctuating etc. ) can’t affect other parts because they sit at the other end of a queue happily consuming the messages at their own pace. This means tour complex system becomes less complex in terms of behavior and is less likely to end up in some weird metstable state

nvarsj · on June 18, 2021

This is sort of funny to me, since ActiveMQ had this exact problem. A single slow client could break the entire system because the broker would slow down producers to prevent exhausting the queue space. Which was exactly some "weird" state because it wasn't obvious what happened until you spent a lot of time debugging.

rising-sky · on June 18, 2021

this made me chuckle

Joeri · on June 18, 2021

For me a perfect use case is sensor data processing. I've been involved with two independent sensor data platforms that used kafka as the backbone. Sensor data is persisted unprocessed to raw kafka topics, and then gradually deduplicated, refined, enriched and aggregated across multiple stages until finally a normalized, functionally sharded and aggregated view of the data is stored in databases for dashboarding.

It is easy to scale horizontally to massive volumes of data, and any issues in the processing pipeline can be fixed without losing any raw sensor data (restarting the consumers from the last known valid point).

lmm · on June 18, 2021

I'd recommend Kafka for any case where you need to act on data that changes over time - almost any software system. In almost every system you want to be able to reliably record events, and you want to be able to partially replay your state in a deterministic way (e.g. after you fix a bug); both those things should be table stakes for a serious datastore, but AFAIK Kafka is practically the only one that offers them.

Kafka itself kind of only solves half the problem, because it doesn't offer indexes as a builtin, so you have to build your indexed state yourself, or maybe even use a "traditional" datastore as a de facto cache. But since you've moved all the hard distributed part of the problem into kafka, that part is not so bad.

guhcampos · on June 18, 2021

Kafka is a streaming log service. People who use it just for passing messages are hugely overengineering something that would fit just fine for something like RabbitMQ, Redis or other tools.

An RSS feed must be powered by something underneath, and any of those tools can do the job. In most situations it would be extremely impractical to use a relational database for this kind of thing. If you are getting thousands of messages in per second, which is not uncommon, no transactional database will give you enough write performance and won't be able to handle too many queries per second for clients polling for updates, like an RSS feed would require. Note that caching queries is almost useless here because the latest content is updated every few milisseconds.

Kafka, as pretty much any other queueing datastore, is optimized for append-only writes with no deletes or updates. Reading from the end of the queue is extremely fast and sequential reads down the stream are quite fast too. Random reads such as the ones commonly handled by SQL databases are either not available or are less efficient than with SQL.

That said, Kafka can be used to pass any kind of message between applications: from simple text messages and small JSON data to vídeo frames, protobuf messages and other types of chunks of serialized data.

It is also a very durable data store for immutable, time-ordered data, and is widely used in the financial world to store transaction logs.

swagasaurus-rex · on June 18, 2021

When you have many servers who all need to see the same chronological data stream (including messages they might miss during network downtime), and see new events in real time.

If you set "log.retention.hours = -1" and "log.retention.bytes = -1" in kafka config, kafka stores messages forever.

In a game for example, user-inputs and other events can be produced in Kafka, then reconstruct the entire game-state by reading and processing the kafka-stream from start to finish. It has an advantage over most DBs because it's real-time.

You can use also chronological data streams to represent data structures more complicated than a simple array. For example, a tree can be represented while preserving chronology. This is far from the ideal use case however.

spockz · on June 18, 2021

> In a game for example, user-inputs and other events can be produced in Kafka, then reconstruct the entire game-state by reading and processing the kafka-stream from start to finish.

This is called event sourcing.

mdtusz · on June 17, 2021

Wherever you have a firehose of data that needs to be processed.

I've heard of it being used as a sort of message queue for application level events before, but that sounds like a nightmare of trying to reinvent the actor model with 1000x the complexity.

halfmatthalfcat · on June 17, 2021

Bah god, that's Akka's music!

oneplane · on June 18, 2021

Akka, and a lot of actor model services break microservice availability, durability and general reliability because nuking a random node messes with Akka and now whatever actor happened to be on that node is now stalled until it's transferred.

Just like SOA and ESB, the concept isn't the problem, is the technical constraints of the design at the time. Decoupling and messaging isn't bad, but having a legacy message queue on physical hardware doesn't really hold up. Any derived architecture faces the same problem.

Then again, Kafka isn't an actor model implementation, and Akka isn't a partitioned redundant stream processing system, they don't have all that much overlap ;-)

halfmatthalfcat · on June 18, 2021

If you shard your Akka actors, the messages are buffered and passed to the actor when it's initialized on the new node. You get even more stability if you persist your actors backed by a DB or some other persistent store.

Not saying Akka can replace Kafka but many of the issues around availability, durability and reliability have been attempted to be solved in Akka.

oneplane · on June 18, 2021

Yeah, that's true. I think the main issue in most cases where I see problems are the ones that only use in-memory stores.

ndrfrhlch · on June 18, 2021

your bulletin board could be an RSS feed. most often you do not need kafka (or a service bus) for distributed systems.

RSS feeds are great as they are easy to implement and debug.