I am skeptical about this. Optimizers can also specialize functions, and programmers can do so too. The excessive specialization you get with templates always looks beautiful in microbenchmarks but may not be ideal at a larger scale. There was a recent report analyzing the performance of Rust drivers vs C drivers, and code bloat caused by monomorphization was an issue with the Rust drivers; in my experience (although I do not have a reference) it is the same in C++.
> Optimizers can also specialize functions, and programmers can do so too
Yes, but not if you pass in a void *. For libraries this matters. If you're writing both the producer and the consumer then sure, you can do it manually.
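To make the void * point concrete, here is a rough sketch of the two kinds of library interface. The function names (sort_erased, sort_generic, compare_int_erased) are made up for illustration; only std::qsort and std::sort are real library calls.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Type-erased, C-style interface. The sorting routine only sees void * and a
// function pointer, so the element type is invisible to it. The optimizer can
// specialize this only if it sees both sides and inlines through the pointer.
// (Hypothetical helper name, for illustration only.)
inline int compare_int_erased(const void *a, const void *b) {
    const int lhs = *static_cast<const int *>(a);
    const int rhs = *static_cast<const int *>(b);
    return (lhs > rhs) - (lhs < rhs);
}

inline void sort_erased(std::vector<int> &values) {
    std::qsort(values.data(), values.size(), sizeof(int), compare_int_erased);
}

// Templated interface. Each distinct T gets its own instantiation, so the
// comparison is a direct, inlinable operation, at the cost of one copy of the
// sorting code per element type.
template <typename T>
void sort_generic(std::vector<T> &values) {
    std::sort(values.begin(), values.end());
}
```

Both produce the same ordering for a std::vector<int>; the difference is in what the compiler is able to see when the sort lives in a separately compiled library.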
> code bloat caused by monomorphization
This is true and a real problem, but I would argue that in most scenarios the extra codegen will be more performant than the alternative, dynamic allocation + indirection, which is how Swift, C#, or Java do it.
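A minimal sketch of that trade-off, written in C++ so both sides are comparable. The types Number and BoxedInt and the functions sum_boxed and sum_mono are hypothetical, and this only approximates what a boxing runtime does; it is not how any particular language implements generics.

```cpp
#include <memory>
#include <vector>

// "Boxed" style: every element is a separate heap allocation reached through
// a pointer, and reading it costs a virtual call.
struct Number {
    virtual ~Number() = default;
    virtual long value() const = 0;
};

struct BoxedInt final : Number {
    long v;
    explicit BoxedInt(long x) : v(x) {}
    long value() const override { return v; }
};

inline long sum_boxed(const std::vector<std::unique_ptr<Number>> &xs) {
    long total = 0;
    for (const auto &x : xs) {
        total += x->value();  // pointer chase + indirect call per element
    }
    return total;
}

// Monomorphized style: one instantiation per T, elements stored inline in the
// vector, no allocation per element and no indirect call.
template <typename T>
T sum_mono(const std::vector<T> &xs) {
    T total{};
    for (const T &x : xs) {
        total += x;
    }
    return total;
}
```

sum_mono<long> and sum_mono<double> are two separate functions in the binary, which is exactly the code bloat being discussed; sum_boxed is a single function but pays per element.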
Java does not monomorphize; it has no true generics, it's objects all the way down. It does, however, perform guarded devirtualization, since all methods are virtual by default. So performance lives and dies by OpenJDK HotSpot emitting guards for fast dispatch, often to multiple targets, as well as optimizing "megamorphic" call sites with vtable-ish dispatch (which is about the default cost of interface dispatch in .NET, somewhat slower than virtual dispatch).
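For anyone unfamiliar with the term, guarded devirtualization can be approximated by hand in C++ roughly like this. It is only an analogy: a JIT would pick the guarded type from runtime profile data and compare class pointers, whereas here the "hot" type (Triangle) is hard-coded and the check uses typeid. All names are hypothetical.

```cpp
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

struct Triangle final : Shape {
    int sides() const override { return 3; }
};

struct Square final : Shape {
    int sides() const override { return 4; }
};

inline int sides_guarded(const Shape &s) {
    // Guard: is the dynamic type the one we assume is most common here?
    if (typeid(s) == typeid(Triangle)) {
        // Fast path: the qualified call is bound statically, so the compiler
        // can inline it instead of going through the vtable.
        return static_cast<const Triangle &>(s).Triangle::sides();
    }
    // Slow path: ordinary virtual dispatch for every other type.
    return s.sides();
}
```

A call site that sees two or three hot types would get a short chain of such guards; once too many types show up (the "megamorphic" case), only the slow path is left.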