I disagree that it is always trivial to work around. Trampolines are the usual answer, but they carry a large efficiency hit, up to 1000 times slower. That is not a problem in many scenarios, but it is clearly not good for e.g. a concurrency monad.
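To make the cost concrete, here is a minimal trampoline sketch in Python (a language without tail call elimination). The names `trampoline` and `count_down` are made up for illustration. The point is only that every "tail call" becomes a freshly allocated thunk plus a trip through the driver loop, which is where the overhead comes from.

```python
def trampoline(step):
    """Drive a computation that returns thunks instead of making tail calls."""
    # Keep invoking the returned thunk until a non-callable value comes back.
    while callable(step):
        step = step()
    return step


def count_down(n, acc=0):
    """Sum the integers 1..n in tail-recursive style (hypothetical example)."""
    if n == 0:
        return acc
    # Instead of calling count_down directly, hand a thunk back to the driver.
    # This closure allocation per step is the price of the technique.
    return lambda: count_down(n - 1, acc + n)


# Far deeper than the default recursion limit, yet the stack stays flat.
result = trampoline(count_down(100_000))
```

Note that this simple driver cannot return a callable as a final result; a real implementation would wrap the steps in an explicit "more/done" type.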
For your pipeline example, you gave a trivial, statically determined one. Try an example where each function decides the transition itself (i.e. tail-calls the next one). This is, of course, in essence what my Monad and CPS examples do.
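A rough sketch of the kind of pipeline I mean, in Python: a two-state word counter where each state function inspects the input and decides which function to tail-call next. The state names are invented for this example. The control flow cannot be flattened into a fixed composition ahead of time, because the next stage depends on the data.

```python
def in_space(text, i, words):
    """State: between words. Decides the next state from the current char."""
    if i == len(text):
        return words
    if text[i].isspace():
        return in_space(text, i + 1, words)   # stay in this state
    return in_word(text, i + 1, words + 1)    # a new word starts here


def in_word(text, i, words):
    """State: inside a word. Decides the next state from the current char."""
    if i == len(text):
        return words
    if text[i].isspace():
        return in_space(text, i + 1, words)   # the word just ended
    return in_word(text, i + 1, words)        # still inside the same word


def count_words(text):
    # Every call above is in tail position, but without tail call
    # elimination each one still consumes a stack frame.
    return in_space(text, 0, 0)


small = count_words("tail calls matter")
```

On CPython with the default recursion limit this works for short strings and raises `RecursionError` once the input is a few thousand characters long, which is exactly the problem under discussion.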
Regarding your comments on replacing function composition with data structures: in essence, this is what trampolining does. But it has a performance cost and is not appropriate in all cases. IIRC the performance of "Extensible Effects" is still an outstanding issue.
Incidentally, the primary author of that work, Oleg K, has published a lot of work in which he seeks to avoid intermediate data with "finally tagless".
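For readers who have not seen the distinction, here is a loose, untyped Python analogue of the two styles. It is only a sketch: the real technique is formulated in typed languages such as Haskell and OCaml, and all class and function names here are invented. The first half builds an intermediate tree and then interprets it; the second half writes the term against an interpreter interface, so no tree is ever allocated.

```python
from dataclasses import dataclass


# Data-structure style: build a tree first, interpret it afterwards.
@dataclass
class Lit:
    value: int


@dataclass
class Add:
    left: object
    right: object


def eval_tree(expr):
    """Walk the intermediate tree to compute its value."""
    if isinstance(expr, Lit):
        return expr.value
    return eval_tree(expr.left) + eval_tree(expr.right)


# "Final" style: the term is a function of its interpreter.
class Eval:
    def lit(self, n):
        return n

    def add(self, a, b):
        return a + b


class Show:
    def lit(self, n):
        return str(n)

    def add(self, a, b):
        return f"({a} + {b})"


def term(interp):
    """The expression 1 + (2 + 3), with no tree built in between."""
    return interp.add(interp.lit(1), interp.add(interp.lit(2), interp.lit(3)))


via_tree = eval_tree(Add(Lit(1), Add(Lit(2), Lit(3))))
via_final = term(Eval())
as_text = term(Show())
```

Both routes compute the same value; the second simply never materialises the `Lit`/`Add` nodes, and a new interpretation is added by writing another interpreter class rather than another tree walker.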