
> The key requirement for ultra-low-latency software is minimising/eliminating dynamic memory allocation

First, that is only true in languages -- like C++ or Rust -- where dynamic memory allocation (and deallocation) is relatively costly. In a language like Java, the cost of heap allocation is comparable to stack allocation (it's a pointer bump).

Second, in the most common case of writing high throughput servers, the performance comes from Little's law and depends on having a large number of threads/coroutines. That means that all the data required for the concurrent tasks cannot fit in the CPU cache, and so switching in a task incurs a cache-miss, and so cannot be too low-latency.

The only use-cases where avoiding memory allocation is useful and very low latency is achievable are those where the number of threads/coroutines is very small, e.g. generators.

The questions, then, are which use-case you pick to guide the design, servers or generators, and what the costs of memory management are in your language.



>Second, in the most common case of writing high throughput servers,

High-throughput servers are not ultra-low-latency software; they prioritise throughput over latency. Ultra-low-latency software is stuff like audio processing, microcontrollers and HFT. There's a trade-off between throughput and latency.


Not here. You don't trade off latency, because you cannot reduce it below a cache-miss per context-switch anyway if your working set is not tiny. The point is that if you have lots of tasks, then your latency has a lower bound (due to hardware limitations) regardless of the design.

In other words, if your server serves some amount of data that is larger than the CPU cache and can be accessed at random, there is some latency you have to pay, and many micro-optimisations are simply ineffective even if you want the lowest latency possible. Incurring a cache miss, allocating memory (if your allocator is really fast), and even copying some data around isn't significantly slower than incurring the cache miss alone. Those other costs matter only when you don't incur a cache miss, and that happens when you have a very small number of tasks whose data fits in the cache (i.e. the generator use-case, not so much the server use-case).

Put yet another way, some considerations only matter when the workload doesn't involve many cache misses, but a server workload virtually always incurs a cache miss when serving a new request, even in servers that care mostly about latency. In servers you're working in the microsecond range anyway, so optimisations that operate in the nanosecond range are not useful.



