Our AI overlords did a pretty good job of implementing this on the minischeme interpreter I like to poke at from time to time.
Though, apparently, a SECD machine isn't as amenable to this as I originally thought. My intuition was you just pass the different SECD registers as arguments between the action routines and let the compiler do its magic, but they were like "oh no, silly human, it doesn't work like that."
It turns out doing it their way allowed Scheme's mandatory tail-call optimization (TCO) to fall out of the implementation naturally, which I thought was kind of special. Kudos, robots...
Haven't looked at how Python does its thing in quite a long time, but I believe it doesn't have TCO, so maybe this is a step in that direction?
It's unrelated to TCO in Python itself. It's only about new optimizations made possible by clang's musttail attribute (added in clang 13) in the inner VM loop: the usual while (op) { switch (op) { case bla: ... } } becomes a chain of guaranteed tail calls between the op handlers, which reduces register pressure in the ops. The same is possible for every other dynamic language VM (those that didn't resort to an assembly VM loop like LuaJIT's).
A compiler's register allocation can never get better while it cannot prove that the call will be a tail call. With musttail it can skip that proof.
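For anyone who hasn't seen the pattern, here is a minimal sketch of tail-call dispatch in C. It is not CPython's code: the three opcodes, the handler signature, and the names run, dispatch_table and DISPATCH are all made up for illustration. It assumes a compiler that reports the musttail attribute through __has_attribute; on anything else the macro expands to nothing and the tail call is only a hope, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Use musttail where the compiler advertises it. Otherwise fall back to a
   plain return, which becomes a tail call only if the optimizer decides so. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical toy instruction set, not any real VM's. */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

/* Every handler shares one signature, so the VM state (pc, sp) travels in
   argument registers from one handler to the next. */
typedef int64_t handler_fn(const uint8_t *pc, int64_t *sp);

static handler_fn op_push, op_add, op_halt;

static handler_fn *const dispatch_table[OP_COUNT] = { op_push, op_add, op_halt };

/* Jump to the handler for the opcode at pc instead of returning to a loop. */
#define DISPATCH(pc, sp)                                        \
    do {                                                        \
        assert(*(pc) < OP_COUNT);                               \
        MUSTTAIL return dispatch_table[*(pc)]((pc), (sp));      \
    } while (0)

/* OP_PUSH imm8: push the one-byte immediate that follows the opcode. */
static int64_t op_push(const uint8_t *pc, int64_t *sp) {
    *sp++ = (int64_t)pc[1];
    pc += 2;
    DISPATCH(pc, sp);
}

/* OP_ADD: pop two values and push their sum. */
static int64_t op_add(const uint8_t *pc, int64_t *sp) {
    sp[-2] += sp[-1];
    sp -= 1;
    pc += 1;
    DISPATCH(pc, sp);
}

/* OP_HALT: stop and return the top of the stack. */
static int64_t op_halt(const uint8_t *pc, int64_t *sp) {
    (void)pc;
    return sp[-1];
}

/* Entry point: an ordinary call into the first handler. The stack lives in
   this frame, which stays alive for the whole chain of tail calls. */
int64_t run(const uint8_t *program) {
    int64_t stack[16]; /* no overflow check in this sketch */
    return dispatch_table[*program](program, stack);
}
```

Running the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT } through run() leaves 5 on top of the stack. The point of the shape is that each handler is a small function with its own register allocation, rather than one case in a giant switch where the compiler has to juggle every op's live values at once.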
On the one hand, I find this fascinating, and I am excited to see new CPython performance improvements here, with the experimental JIT work, and more.
On the other hand, sometimes, I feel like we might have missed something important. For most of its history, an otherwise skilled and experienced programmer with no specialized knowledge could read the CPython codebase front to back, and it would make sense. Then, at some point along the way, we lost that, and I think it's sad that we lost that.