I feel like superoptimizing LLVM code isn't the best idea.
I feel like the way other superoptimizers get good results is by taking advantage of really strange interactions of some of the less commonly understood opcodes. LLVM doesn't have those funky opportunities.
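For a concrete flavor of those funky interactions (my own illustrative sketch, not something from the article): on most machines an arithmetic right shift by 31 smears the sign bit into a 0/-1 mask, which doubles as both a conditional select and a +1 correction. That lets a superoptimizer find a branch-free abs():

```python
def branchless_abs(x):
    """abs() for a 32-bit two's-complement integer, no branches.

    Assumes x is in [-2**31, 2**31); Python's >> on negative ints is
    arithmetic, so it models the machine shift here.
    """
    m = x >> 31          # 0 if x >= 0, -1 (all ones) if x < 0
    return (x ^ m) - m   # x unchanged, or ~x + 1 == -x
```

The `(x ^ m) - m` sequence is exactly the kind of thing that falls out of exhaustive search over machine instructions but that few people would write by hand.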
Here he talks about the end results of Souper and Alive: they should be presented in a more user-friendly and generalized form, with some help from a solver.
Encoding every single edge-case into the optimizer is not as good as finding better generalizations, automatically.
Another idea I want to see tried is a source-code annotation that tells an LLVM plugin to run machine-specific superoptimization on a specific basic block and cache the result. It might be an interesting way to speed up tight loops.
A third idea I had would be to use superoptimization as a way of automatically generating instruction-selection (isel) patterns from only the lowest-level descriptions of what the IR and the target opcodes do. That is: harvest programs for IR sequences, automatically find good mappings to assembly constructs, and automatically emit a backend.
My advisor once said, "Optimization means: sometimes, it is not getting worse," and that's exactly what happens if you perform any optimization at the IR level. You have no guarantee that your result is optimal (with respect to whatever metric). That's exactly why Massalin introduced the term superoptimization: at the machine level you can provide such guarantees for the considered instruction sequence.
Of course there are cases where skipping an IR optimization can enable a more powerful one later, so the resulting code gets better. In general, though, I would say it's the other way around: performing the IR superoptimization leads to better code quality.
I'd guess either approach will get stuck in local minima. Ideally, you'd want to check all permutations systematically. Of course, there is probably no polynomial-time algorithm to find the optimum, but adding some basic depth bounds should give a practical implementation that can explore at least part of the space, instead of relying on gut instinct about which approach may or may not be better.
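To make the depth-bound idea concrete, here is a toy sketch (my own, not how Souper or Massalin's GSO is actually implemented): iterative-deepening enumeration over a tiny 8-bit single-accumulator instruction set, keeping the shortest program that matches the target function on all test inputs. A real superoptimizer would verify candidates with a solver rather than just filtering by testing.

```python
from itertools import product

# Toy instruction set: each op transforms an 8-bit accumulator.
OPS = {
    "not":  lambda a: ~a & 0xFF,
    "inc":  lambda a: (a + 1) & 0xFF,
    "dec":  lambda a: (a - 1) & 0xFF,
    "shl1": lambda a: (a << 1) & 0xFF,
    "shr1": lambda a: a >> 1,
}

def run(prog, x):
    acc = x
    for op in prog:
        acc = OPS[op](acc)
    return acc

def superopt(target, max_len, tests=range(256)):
    """Shortest op sequence matching `target` on all test inputs, or None."""
    for length in range(1, max_len + 1):          # iterative deepening
        for prog in product(OPS, repeat=length):  # exhaustive at each depth
            if all(run(prog, x) == target(x) for x in tests):
                return list(prog)
    return None

# Example: 8-bit negation, with no neg instruction available; the search
# rediscovers the two's-complement identity -x == ~x + 1.
print(superopt(lambda x: -x & 0xFF, max_len=3))   # -> ['not', 'inc']
```

Even this toy version makes the cost explicit: each extra depth level multiplies the search space by the instruction count, which is why real tools bound the length aggressively and prune with solvers.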
My intuition is that stacking general superoptimizers will be at best as effective as just running a machine-specific one for twice as long.
The advantage of performing superoptimization at the IR level is that you have explicit data dependencies (thanks to SSA form), so there are no unrelated instructions in between. Figure 1 of the cited Optgen paper http://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwa... shows an example that can be optimized with IR-based superoptimization but not with a machine-specific one.
In summary, neither approach subsumes the other, so it is worthwhile to have both.
I believe the intention here is to use the super-optimiser to derive specific generally-applicable optimisations that can be contributed back to LLVM as peephole optimisations, rather than running the super-optimiser as part of the usual compilation process.
That being the case, it seems like it would be advantageous to create optimisations that work at both stages of compilation - not least because the IR-level ones will be more widely applicable.