My advisor once said, "Optimization means: sometimes, it is not getting worse", and that's exactly what happens when you perform any optimization at the IR level: you have no guarantee that your result is optimal (with respect to whatever metric you care about). That's precisely why Massalin introduced the term superoptimization: at the machine level, you can provide such guarantees for the instruction sequence under consideration.
Of course, there are cases where preventing an IR optimization can trigger a more powerful one, so the resulting code gets better. In general, however, I would say it's the other way around: performing the IR superoptimization leads to better code quality.
I'd guess either approach will get stuck in local minima. Ideally, you'd want to check all permutations systematically. There is probably no polynomial-time algorithm for finding the optimum, but adding some basic depth bounds should give a practical implementation that can explore at least some of the space, instead of relying on gut instinct about which approach may or may not be better.
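To make the depth-bound idea concrete, here is a minimal sketch of a brute-force, depth-bounded superoptimizer over a hypothetical toy ISA. Everything in it (INSTRS, TARGET, TEST_INPUTS, MAX_DEPTH) is made up for illustration; a real superoptimizer would additionally verify candidate equivalence rather than just testing it on a few inputs.

```python
from itertools import product

# Hypothetical toy instruction set: each instruction maps a value to a value.
INSTRS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}

TARGET = lambda x: 2 * x + 2   # behavior we want to synthesize
TEST_INPUTS = range(-4, 5)     # test vectors that filter candidates (not a proof!)
MAX_DEPTH = 4                  # depth bound keeps the search tractable

def run(seq, x):
    """Execute an instruction sequence on a single input value."""
    for op in seq:
        x = INSTRS[op](x)
    return x

def superoptimize():
    # Enumerate sequences shortest-first, so the first match is the
    # shortest sequence that works, within the depth bound.
    for depth in range(1, MAX_DEPTH + 1):
        for seq in product(INSTRS, repeat=depth):
            if all(run(seq, x) == TARGET(x) for x in TEST_INPUTS):
                return seq
    return None  # nothing found within the depth bound

print(superoptimize())  # prints ('inc', 'dbl'): (x + 1) * 2 == 2x + 2
```

The point is not the toy ISA but the search structure: because candidates are enumerated shortest-first, a hit is guaranteed optimal in length within the bound, which is exactly the kind of guarantee a local rewrite pass cannot give. The test vectors only filter; matching them does not prove equivalence in general.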