I’m confused by your JVM heap concerns. Not trying to call you out, just want to understand.
You should never need to provision more than 6GB of heap for a Kafka JVM. The leftover OS memory (often ~90% of the box) goes to file handles and the page cache, which the kernel manages for you; you don’t need to tune it. The ZooKeeper JVM you might need to worry about, but I’m running 5k partitions with hundreds of consumers just fine on a 6GB heap there as well, and actual usage averages around 400MB. It’s not like running Elasticsearch.
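(For reference, a minimal sketch of how that heap cap is usually set. The stock kafka-server-start.sh honors the KAFKA_HEAP_OPTS environment variable; the 6g figure here is just my number from above, not a default:)

    # Pin the broker JVM to a fixed 6GB heap; everything else is left
    # to the OS page cache on purpose. Assumes the stock start script,
    # which reads KAFKA_HEAP_OPTS.
    export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
    bin/kafka-server-start.sh config/server.properties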
Resource consumption matters a lot more in a small, self-funded company that can't just throw boxes at a problem.
My experience with non-JVM apps, Go apps especially, is that we can cram a whole lot of them into small 16GB VMs via Kubernetes. With JVM apps, we typically have to dedicate larger 24-32GB boxes to one or two apps because they absolutely need the legroom. Since RAM isn’t “elastic” the way CPU is, big pieces give you less scheduling flexibility than small ones, especially big pieces that must stay up: we have to preallocate a fixed amount and watch our limits.
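(Side note for anyone else bin-packing JVMs on Kubernetes: newer JVMs, JDK 10+ and 8u191+, can derive their heap cap from the container’s cgroup memory limit, which at least makes the preallocation explicit. A sketch; the 75% figure is illustrative, not a recommendation:)

    # Let the JVM size its max heap from the pod's cgroup memory limit
    # rather than from host RAM. UseContainerSupport is the default on
    # JDK 10+ / 8u191+, shown here only to be explicit.
    java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar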
I've never actually run ZK or Kafka in production, so this is pure speculation, but I didn't get any fuzzy feelings from playing around with a test cluster recently. Combining ZooKeeper with Kafka probably means 3+ dedicated boxes (ZooKeeper alone wants a three-node quorum), because neither fits into our existing ones.
These things add up, and right now I'm trying to reduce our monthly GCP bill, not add to it!
(We do run Elasticsearch. Its memory use does seem particularly egregious. I'm sure these other things are lighter-weight.)