A Few Synthesizing Superoptimizer Results

andrewchambers · on July 26, 2015

I feel like super optimizing llvm code isn't the best idea.

I feel like the way other superoptimizers get good results is by taking advantage of really strange interactions of some of the less commonly understood opcodes. LLVM doesn't have those funky opportunities.

rurban · on July 27, 2015

You didn't understand how Souper works. They already found a lot of very good optimizations, which were included into LLVM proper. Check out http://blog.regehr.org/archives/1146 and http://blog.regehr.org/archives/1192

Here he talks about the end-results of Souper and Alive, which should be presented more user-friendly and generalized, with some help from a solver. Encoding every single edge-case into the optimizer is not as good as finding better generalizations, automatically.

andrewchambers · on July 28, 2015

I understand that is the approach.

Another idea I want to see tried is a source code annotation which indicates to an llvm plugin to run machine specific super optimization on a specific basic block and cache the result. Might be an interesting way to enhance tight loops.

A third idea I had would be to use super optimization as a way of automatically generating isel selectors with only the lowest level descriptions of what the ir and the opcodes do. That is harvest a program for ir sequences, automatically find good mappings to assembly constructs, automatically emit a backend.

caf · on July 27, 2015

It's not either/or, adding superoptimiser-derived optimisations at the IR level doesn't mean you can't also add different ones at the backend.

userbinator · on July 27, 2015

The problem is that superoptimised IR may obscure more useful optimisations lower down.

caf · on July 27, 2015

Is there any evidence for this?

jsprogrammer · on July 27, 2015

Is there any proof that they wouldn't?

UniQP · on July 27, 2015

My advisor once said "Optimization means: sometimes, it is not getting worse", and that's excactly what happens if you perform any optimization on IR level. You don't have any guarantee that your result is optimal (regarding whatever). That's exactly the reason why Massalin introduced the term superoptimization, because on machine level you can provide such guarantees for the considered instruction sequence.

Of course there a cases where preventing an IR optimization can trigger a more powerful one and the resulting code gets better. However, in general I would say it's the other way around: Performing the IR superoptimization leads to better code quality.

jsprogrammer · on July 27, 2015

I'd guess either approach will become stuck in local minima. Ideally, you'd want to check all permutations systematically. Of course, there is probably no polynomial-time algorithm to find the optimal, but adding some basic depth bounds should give a practical implementation that can explore at least some of the space, instead of relying on gut instinct about which approach may or may not be better.

UniQP · on July 27, 2015

The permutation problem is not related to IR superoptimization itself.

The goal of an optimization pass is to improve the performance of an average, and that's exactly what the IR superoptimization does.

andrewchambers · on July 27, 2015

My intuition is that stacking general super optimizers will be at best equally or less effective than just running a machine specific one for twice as long.

UniQP · on July 27, 2015

The advantage of performing superoptimization on IR level is that you have explicit data dependencies (due to SSA form). Thus you don't have unrelated instructions in between. Figure 1 of the cited Optgen paper http://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwa... shows an example that can be optimized with IR-based superoptimization but not with a machine-specific one.

In summary, no approache subsumes the other one. Thus, it is worthwhile to have them both.

caf · on July 27, 2015

I believe the intention here is to use the super-optimiser to derive specific generally-applicable optimisations that can be contributed back to LLVM as peephole optimisations, rather than running the super-optimiser as part of the usual compilation process.

That being the case, it seems like it would be advantageous to create optimisations that work at both stages of compilation - not least because the IR-level ones will be more widely applicable.