
Using ASGs with ZooKeeper is kind of a pain, but also not really necessary. Production ZooKeeper clusters are almost always 3, 5, or 7 nodes, and it's very rare to change that once a cluster is in production. Given that this is all very static, the easiest thing is to just use Terraform or similar to create the EC2 instances. If a node dies, you re-run Terraform to create a replacement.

Alternatively, you can create an ASG-per-node so that you get auto-replacement.

In my experience, ZK is one of the easiest and most reliable distributed systems to operate. I've only seen issues when it's used as a database instead of a distributed coordination service.
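For anyone wondering why it's always 3, 5, or 7: ZooKeeper needs a strict majority of the ensemble up to make progress, so an even-sized cluster tolerates no more failures than the odd size just below it. Here's a rough sketch of that arithmetic. The class and method names are made up for illustration; nothing here comes from ZooKeeper itself.

```java
// Illustrative only: majority-quorum arithmetic for an ensemble of n voters.
// QuorumMath, quorumSize and toleratedFailures are hypothetical names.
public class QuorumMath {

    // Smallest number of nodes that forms a strict majority of n.
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    // How many nodes can be lost while a majority is still reachable.
    static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5, 6, 7}) {
            System.out.printf("ensemble=%d quorum=%d tolerated_failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

Going from 3 to 4 nodes (or 5 to 6) raises the quorum size without raising the number of failures you can survive, which is why the even sizes are rarely worth running.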



We ran into additional trouble from ZK client bugs in the version that Kafka used: it would only ever resolve the ZK hostname once, at startup. Our workaround required some additional work to allocate an EIP for each host (3 AZs -> 3 EIPs) and then have each ZK host grab its EIP on startup.

Even though Netflix hasn't updated it in a while, Exhibitor was helpful as well, in that it allowed ZK to bootstrap nodes off of state stored in S3. That did come at the cost of an extra 2-3 minutes per node on initial quorum startup.


Might just need to set networkaddress.cache.ttl=60 in the JVM's java.security file.
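If editing the java.security file isn't convenient, the same security property can be set from code with `Security.setProperty`. A minimal sketch, assuming you control the process's entry point; the class and method names are hypothetical. It has to run before the JVM does its first hostname lookup, because the cache policy is read once. Note that this only tunes the JVM's own DNS cache: if the client resolves the name once and holds on to the resulting address, as described above, a shorter TTL probably won't help by itself.

```java
import java.security.Security;

// Illustrative only: DnsCacheTtl and setPositiveLookupTtl are hypothetical names.
public class DnsCacheTtl {

    // Standard JVM security property for caching successful DNS lookups.
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    // Cache successful lookups for the given number of seconds.
    // Call this early, before any InetAddress resolution happens.
    static void setPositiveLookupTtl(int seconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
    }

    public static void main(String[] args) {
        setPositiveLookupTtl(60);
        System.out.println(TTL_PROPERTY + "=" + Security.getProperty(TTL_PROPERTY));
    }
}
```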



