That's pretty much what I'm hinting at at the end when I mention minor GC.
I don't think doing it after each request would be sensible, but counter-intuitively, the time it takes to run GC isn't proportional to the amount of garbage to collect, but to the number of live objects left (ignoring some minor things like finalizers).

So on paper at least we could run a minor GC very cheaply after each request, but there are likely better heuristics: the median request currently spends less than 1ms in GC, so running one after every request might be overdoing it.

Also, even if we did that, many requests would still have to run GC because they allocate more than the memory available, so they need to clean up their own garbage to continue; you can't delay GC indefinitely.
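Just for illustration, here's a minimal sketch (not the actual implementation) of what an out-of-band minor GC could look like as a Rack middleware; the class name and the `every:` threshold are made up, and `GC.start(full_mark: false)` is the standard way to ask CRuby for a minor GC:

    # Hypothetical Rack middleware: trigger a minor GC out of band every
    # `every` requests instead of after every single one.
    class OutOfBandMinorGC
      def initialize(app, every: 5)
        @app = app
        @every = every
        @count = 0
      end

      def call(env)
        response = @app.call(env)
        @count += 1
        # full_mark: false requests a minor GC; immediate_sweep: true finishes
        # sweeping right away instead of lazily.
        GC.start(full_mark: false, immediate_sweep: true) if (@count % @every).zero?
        response
      end
    end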
But at least now, endpoints that spend too much time in GC are responsible for their own demise, so the engineers responsible for a given endpoint's performance have a clear signal that they should allocate less, whereas before it could easily be discounted as being caused by garbage left over by another co-located endpoint.
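If you want that per-endpoint signal, one rough way to attribute GC time to individual requests (assuming Ruby 3.1+ and a Rack app; the middleware and env key names below are made up) is to diff `GC.total_time` around each request:

    # Sketch of per-request GC time attribution.
    GC.measure_total_time = true # cumulative GC time becomes available via GC.total_time

    class GCTimePerRequest
      def initialize(app)
        @app = app
      end

      def call(env)
        before = GC.total_time # nanoseconds spent in GC so far
        response = @app.call(env)
        env["gc.time_ns"] = GC.total_time - before # this request's share of GC time
        response
      end
    end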
Would it be possible for the allocator/GC to know which allocations are made within a request and create a generation specifically for them? Allocations too big to fit would be made as usual.
Since objects cannot be promoted to the old generation inside the request cycle, objects in the new gen are request-allocated objects.

So if we were to eagerly trigger a minor GC after a request, we'd have very few objects to scan and would only need to sweep garbage, which accounts for only a small fraction of the time spent in GC.
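A quick way to see what such an eager minor GC actually does is to diff standard `GC.stat` counters around a forced minor collection (key names are CRuby's `GC.stat` keys; treat the output as a rough illustration):

    before = GC.stat
    GC.start(full_mark: false, immediate_sweep: true)
    after = GC.stat

    puts "minor GCs run: #{after[:minor_gc_count] - before[:minor_gc_count]}"
    puts "slots freed:   #{after[:heap_free_slots] - before[:heap_free_slots]}"
    puts "old objects:   #{after[:old_objects]} (not rescanned by a minor GC)"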