You may be significantly underestimating your ability to get phenomenal gains from algorithmic improvements. I think DJB would argue that the compiler optimizations are insignificant in comparison to those improvements for a very large portion of hot code.

Of course, even if I'm correct you could turn the optimizations back on and be faster still. One reason why you might not want to do that is the cost of increased miscompilation.



" I think DJB would argue that the compiler optimizations are insignificant in comparison to those improvements for a very large portion of hot code."

Except he has no data to show this, and every piece of data i have (and my colleagues have), says that he is wrong.

Additionally, the compilers can and do replace algorithms if users let them. Most users don't want it, even if it makes code faster.

Remember that the vast majority of people want software that works and is reliable, or has fast development cycles, or a million other things, and happily choose that over software that is fast. Compilers changing algorithms on you isn't highly compatible with this, especially when, for example, most programming languages don't even give you a good way to express the invariants you are trying to maintain (except good ol' Ada).

However, besides that, certainly you can see that in the end, the people lose. It's like arguing that a computer would never win at Go.

If i really really really cared about some piece of hot code, i wouldn't pay an expert to optimize it, i'd throw the cycles of 100k spare cpus at optimizing it.

That is also the other strange part of his argument to me: he assumes the capabilities of compilers based on compilers he sees that are meant to operate pretty much single-threaded on single machines, and complains "they will never beat people". It's like looking at a machine learning algorithm that takes months or years or roughly forever to train on a single cpu and saying "it'll never do a good job". The world is not that small anymore.

It's blindingly obvious that optimizing compilers win if we want them to.

In any case, this particular debate will restart again as gpus and accelerators follow the same old compiler cycle.


> Remember that the vast majority of people want software that works and is reliable, or has fast development cycles, or a million other things, and happily choose that over software that is fast.

People do want software that is fast: slow (and bloated) software frustrates normal people, they just can't articulate it or identify the correct source of their frustrations. As I said in another post just yesterday, you'll often hear people say everything from "god I hate Microsoft" to "I think I need a new machine" to "I might have a virus". Once you dig into the problem it's often just a particular app (or bloated website) that is slow, or something else on the machine eating up way too much RAM and causing everything to swap.

I'd love to run an experiment where group A has native apps written in a low level language and group B has entirely electron based ones. I'd bet my house that group B has a hell of a lot more general complaints.


> Additionally, the compilers can and do replace algorithms if users let them. Most users don't want it, even if it makes code faster.

What compilers do this? With what algorithms?


If you look at modern JIT compilers such as TruffleRuby, then yes, it will change algorithms, e.g. changing sort algorithms depending on data size: if the data is small enough it will do a bubble sort on registers, and when it's big it will do a TimSort on heap data, etc. [1]

[1]: http://chrisseaton.com/rubytruffle/small-data-structures/
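
The same size-based switch shows up in ordinary library code, too; the JIT's advantage is that it can make the decision per call site after seeing real data. A rough C sketch of the idea (hand-written, not TruffleRuby's actual heuristic, and the 16-element cutoff is made up):

    #include <stddef.h>
    #include <stdlib.h>

    /* Insertion sort: wins on tiny inputs because of its low constant overhead. */
    static void insertion_sort(int *a, size_t n) {
        for (size_t i = 1; i < n; i++) {
            int key = a[i];
            size_t j = i;
            while (j > 0 && a[j - 1] > key) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }

    static int cmp_int(const void *x, const void *y) {
        int a = *(const int *)x, b = *(const int *)y;
        return (a > b) - (a < b);
    }

    /* Pick the algorithm by input size; a JIT can make the same decision
       per call site once it has observed the actual data. */
    void hybrid_sort(int *a, size_t n) {
        if (n <= 16)
            insertion_sort(a, n);
        else
            qsort(a, n, sizeof *a, cmp_int);
    }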


This example is about the compiler changing between (existing) internal versions of sorting algorithms for language data structures.

Not about changing what algorithms the user coded.


Exactly, that's two completely different things. It's akin to a compiler using the famous "switch statement becomes if-else chain if less than 7 items, otherwise lookup table".
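
For the micro-level case, that looks something like this (a sketch; the exact case-count and density thresholds vary by compiler and target):

    /* With a handful of dense cases like this, compilers typically emit
       either a short compare-and-branch chain or a small lookup/jump
       table; which one you get depends on the case count and density. */
    int priority(int level) {
        switch (level) {
            case 0: return 10;
            case 1: return 20;
            case 2: return 25;
            case 3: return 40;
            case 4: return 80;
            default: return 0;
        }
    }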

The parent posts are referring to macro-level optimisations, where large changes to how a system works are implemented, in order to get massive gains in performance. Picking out the essence of the larger implementation would be a difficult task for a compiler, but more importantly it wouldn't have enough information about the context of the application to make optimisation decisions. Things like: do we need to read from that file each time we need this piece of information, or is it a once-off that can be cached? Unless this contextual information is expressed somehow (and it basically never is), only the programmer will be able to make the change.


    def clamp(num, min, max)
      [min, num, max].sort[1]
    end
being turned into

    def clamp(num, min, max)
      if num < min
        if min < max
          return min
        elsif num < max
          return max
        else
          return num
        end
      end

      if num < max
        return num
      elsif min < max
        return max
      else
        return min
      end
    end
Seems to me to be algorithm rewriting. As are superword optimization and partial evaluation. All things that TruffleRuby + Graal do today.


Sure, and at this level, I would also argue that compilers are as good as humans at optimisation (likely better than almost all, actually). These arguments were constantly going on in the 80s and 90s, where (assembly-enthusiast) people vehemently argued that machines (C compilers) will never optimise things as well as humans. While the argument still exists in some form today, it's certainly died down a lot.

That said, in my own limited experience, everyday optimisation problems tend to exist at a much higher level, on the high-level approach or "what are we doing" level. Perhaps these are "obvious" to some, or below the level of discussion here. But essentially, a compiler can optimise away endlessly at a piece of code, but will never beat code that shouldn't exist in the first place. My comment above was that a compiler has insufficient information to make decisions about what's truly wasteful or useless.

As an example, no compiler today will come up with a Courgette update [1] by itself. And the day it does, I think we can pack up our bags and go home (hopefully in a nice, comfy retirement kind of way).

[1] https://www.chromium.org/developers/design-documents/softwar...


> These arguments were constantly going on in the 80s and 90s, where (assembly-enthusiast) people vehemently argued that machines (C compilers) will never optimise things as well as humans.

I still argue this way. But these kinds of optimizations are tedious and time-intensive (thus costly). Also dependent on the architecture, i.e. when some extensions (say new SIMD instructions) are added to the instruction set, the compiler sometimes is able to use them automatically (though typically in a very sub-optimal way), while hand-optimized assembly code has to be rewritten (costly). Also if you port your program to a new CPU architecture (say Intel -> ARM or ARM-A32 -> ARM-A64) hardly anything can be reused. Finally tight hand-optimized code tends to be much harder to add features to than more high-level code (say, C code).

So I believe these vehemently arguing people are right (and have always been). But this does not contradict the fact that in many cases the rather suboptimal code generated by a (say, C) compiler is fast enough, and C is much more economical to use.


I agree with this; reading it, I feel once again the limitations of written communication.

However, the top-level discussion is about whether it is worth using an optimizing compiler, or, in my example, whether it is worth using TruffleRuby instead of MRI. I would argue yes, because that optimizing compiler gets you 10x more perf per CPU.


I mean that's basically constant folding on steroids, in this case taking advantage of the fact it knows the array has three elements, and that we only care about entry 1. Which is impressive, really impressive, but it is fundamentally limited. This approach won't ever be able to take your bubble sort implementation and turn it into a quick sort or merge sort. In general I don't think this sort of optimization will lead to asymptotic improvements in runtime, excluding certain edge cases.

Which isn't to say these optimizations aren't important and can lead to very real improvements in speed, I just wouldn't consider it algorithm rewriting in the strictest sense.


The only way to automatically replace nontrivial algorithms safely is to provide a formal proof of each one and then do a library match.


https://books.google.com/books?id=u38h_fV3UqgC&pg=PA173&lpg=...

See also the citations to it, http://dl.acm.org/citation.cfm?id=340757

See also https://www.researchgate.net/publication/221200232_Algorithm...

http://ieeexplore.ieee.org/abstract/document/5447280/

http://perso.ens-lyon.fr/christophe.alias/pub/alias_thesis.p...

https://www.ideals.illinois.edu/handle/2142/45394

http://gac.udc.es/~juan/papers/toplas.pdf

and ...

As the last paper says: "The development of techniques for automatic recognition of computational kernels such as inductions, reductions and array recurrences has been an intensive research area in the scope of compiler technology during the 90’s."

If you want a simpler example, all of your compilers are usually replacing your divisions by a different algorithm :) They replace your sum reductions, recurrences, etc.
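
The clearest concrete instance is division by a constant: instead of a divide instruction you get a multiply by a precomputed fixed-point reciprocal plus a shift. A sketch (the magic constant below is the standard one for unsigned division by 10; the compiler derives it for you):

    #include <stdint.h>

    /* What you write: */
    uint32_t div10(uint32_t x) {
        return x / 10;
    }

    /* Roughly what an optimizing compiler emits instead of a divide:
       multiply by a fixed-point reciprocal of 10 and shift right. */
    uint32_t div10_by_hand(uint32_t x) {
        return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
    }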

Seriously, these are really really old compiler problems. Compilers do the parts people care about, and will often replace the algorithms that are usually "slow", like math, or have well defined semantics that won't cause api breakage for you. Compilers replace memcpy, printf, etc, all the time.
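
The memcpy/printf point in code form; both rewrites are common behavior in GCC and Clang at higher optimization levels (a sketch, with restrict so the no-overlap requirement of memcpy is actually met):

    #include <stddef.h>
    #include <stdio.h>

    /* Typically recognized as a copy idiom and replaced with a call to
       the library memcpy. */
    void copy_bytes(char *restrict dst, const char *restrict src, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }

    void greet(void) {
        /* Commonly rewritten to puts("hello"): no format parsing at runtime. */
        printf("hello\n");
    }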

If you want a compiler that replaces your much higher level algorithms, as the papers show, they have existed over the years. When the research papers are scanned copies, it's old tech :P

But the truth is that, for example, a lot of common languages (C++, etc) are not good languages for that. It's not about compiler technology. I can teach a compiler to recognize your sorts, for example, pretty easily. But proving i can legally change it and not destroy the program's semantics is often literally not possible. You'd have to tell me "it's okay". It's only for stuff that is well specified enough (like standard library functions) that i could do it.

Think about trying to swap STL data structures for some code. We can recognize when it is possible. We can even determine when it is profitable, and do it. Usually, the problem is it destroys the API you've built to do so (IE you pass it around as an argument), etc. and in the end, it's usually not worth it for those languages to do it in the compiler. It's infinitely more effective to do something like http://lists.llvm.org/pipermail/llvm-dev/2016-April/098355.h... and tell people "you should switch this datastructure". At least, for these languages.

But hey, if you want the compiler to duplicate and inline all the functions and swap out the algorithms, it could!

It's just that, for example, C and C++ are not languages that lend themselves to compilers doing it; they lend themselves to telling humans to do it (or at least, asking them if they want it done and then doing it).

This is obviously all easier in functional languages, and they actually have a fairly rich history of doing algorithm replacement :)


> replacing your divisions by a different algorithm

No they're not. I don't actually specify the algorithm to be used for division, I just write "/" (usually), and so the compiler (or runtime or whoever) is certainly free to implement that "high level" specification (I want these two quantities divided) in any way it sees fit.


I'm really unsure why you want to be this pedantic, but okay. You are making a lot of assumptions here. My point is that a bunch of them will idiom-recognize division algorithms and change them. So yes, yes they do!


> pedantic

I am being precise where you are incredibly sloppy.

> You are making a lot of assumptions here.

Really? Which assumptions am I making? Try to be precise.

Considering that you are calling everyone else idiots who don't have any data, the fact that what you write (a) has no data to back it up and (b) is usually trivially, obviously and comically wrong doesn't exactly help your argument.

> ...will idiom recognize division algorithms and change them

Really? Which ones? How is this useful? How many real-world programs actually code a division "algorithm"? I don't think I've seen that once in the last 20+ years or so, but of course YMMV.

UPDATE: Last I remember, "idiom recognition" was invented for the sole purpose of cheating on industry standard benchmarks, for example replacing the Dhrystone with a computed result. Has this changed?


As a very simple example see if you can spot the loop in the optimised version of a sum to n function[0] (hint: you can't). I don't know to what extent this scales up to more complex programs, but it is at least possible in principle.

[0]: https://godbolt.org/g/dRUYA3
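
I can't reproduce the exact source behind the link, but it is presumably something like the first function below; the optimised output has no loop because the compiler reduces it to Gauss's closed form, roughly the second function:

    /* Presumably roughly the source behind the godbolt link. */
    unsigned sum_to_n(unsigned n) {   /* assumes n < UINT_MAX */
        unsigned total = 0;
        for (unsigned i = 1; i <= n; i++)
            total += i;
        return total;
    }

    /* What the optimizer effectively turns it into: n*(n+1)/2, written
       so the division by 2 always lands on the even factor. */
    unsigned sum_to_n_closed(unsigned n) {
        return (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
    }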


It does not work very well:

> https://godbolt.org/g/qmQhLK

Exercise: Write an assembly function that is faster than "-O3" and loop-free. Should be easy.


UPDATE: For those who don't want to do the exercise: The Intel compiler (icc) optimizes it in about the way that I would if I were to write it in assembly:

> https://godbolt.org/g/LntTnY


Which is slower on non-Intel and older CPUs. Add -march=native next time. Nets you a vector version. If you disable sse it generates essentially what icc does with less reliance on ucode.

See, rep is a workaround for the branch predictor on older AMD CPUs. In addition, gcc does not use the complex, microcoded variant of lea with offsets because it is slow on older Intel.

The GCC version also makes for much better speculative execution due to the use of the adder.


> Which is slower on non-Intel and older CPUs. Add -march=native next time. Nets you a vector version.

Accepted (though the central advantage of SHLX (with -march=native) and SHL (no -march=native) for this example lies in the greater flexibility of register parameters). In my defense, I have only had a computer whose processor supports BMI2 for a few months now, so I could not play around with BMI2 before. Otherwise I am sure I would have known such tricks.

> See, rep is a workaround for older AMD CPU branch predictor. In addition gcc does not use the highly opcoded variant of lea with offsets because it is slow on older Intel.

I know that. My personal code philosophy is to avoid such hacks for circumventing performance bugs in outdated processors.


By outdated you mean 5 year old? This is not for ancient Athlons. Those were 64 bit chips already.

Likewise, preferring microcoded lea shafts everything older than Haswell on the Intel side. Not to mention modern Atom.


Yes, but the point was that compilers do change algorithms.


> the compilers can and do replace algorithms if users let them

I think idiom recognizers are worthy of mention. That said, being manual pattern recognition, they will probably eventually lose their edge to things like LLVM's Souper or other superoptimizers (exhaustive or predictive).


> most programming languages don't even give you a good way to express the invariants you are trying to maintain (except good ol' Ada).

What does Ada do that helps with this in ways C and other languages don't?


Ada has a rich set of language features to express preconditions, postconditions, and invariants directly in the code, and then have those logical statements available to the compiler so that it can assume certain facts about the program's execution when optimizing:

https://en.wikibooks.org/wiki/Ada_Programming/Contract_Based...

Also, the type system lets you specify legal values with more precision than the word-based types that most languages inherited from C. For example, you can have range types that may hold integers of only a certain range, and the compiler will enforce that when assigning to variables of the type, or throw an exception if an arithmetic operation results in a value outside the range (you also have modulus types that explicitly give the wraparound behavior familiar in C). Containers allow generic constraints, and you can explicitly specify the dimensions of array types.

https://en.wikibooks.org/wiki/Ada_Programming/Type_System


>> compiler optimizations are insignificant in comparison to those improvements

> he has no data to show this

Yes he does. And so do countless other people. For example, I just gave a talk where I describe an afternoon optimization session giving me a 1000x and an 8000x improvement for a data-conversion and a query task, respectively.

https://www.youtube.com/watch?v=kHG_zw75SjE

Was the compiler helpful? Probably, I didn't measure with and without. But the vast majority of the improvement was not the compiler. And I don't see how the compiler could have come up with the changes (moving from a layered architecture of Model Objects -> ORM -> SQLite to a hexagonal architecture with an optimized data representation and completely different access procedure for the one data type that mattered).

And this is what I find almost everywhere I go. For example the BBC system I made somewhere between a hundred and a thousand times faster: https://www.slideshare.net/MarcelWeiher/in-processrest , https://link.springer.com/chapter/10.1007/978-1-4614-9299-3_...

And so on and so on. It is also interesting that in none of these cases did I revert to assembler, so I think that part of DJB's thesis is not entirely correct, but only in being too specific: hand-coding assembler is just one part of hand-optimizing, and very often it is not the, or even a, crucial part (though in security algorithms it might very well be).

But the idea that optimizing compilers are getting squeezed at both ends is, IMHO, 100% correct. On the one hand, so much code is so incredibly cold, see Ruby on Rails web sites or most UI programming. RoR can do maybe 10-100 requests/s. It's so slow it's comical. However, most sites would be happy to receive 100 hits per day, so "comically slow" is fast enough. Or when I set the end-parameters for an animation that is going to take half a second, it wouldn't matter if the code doing that were run on a LISP interpreter badly written in Ruby. Or why hasn't PyPy completely replaced Python? Why was TCL adequate for steering supercomputer workloads?

On the other hand, the performance critical parts have gotten more critical. That means that the "make everything a bit faster, but without any guarantees" that optimizing compilers are great at providing is simply not that useful any more for a lot of workloads.

And yes, there are workloads where it is useful, and it seems that you have some of these workloads. That doesn't change the broader picture.


I usually express this as "No compiler will turn your DFT into an FFT" but of all the examples I could pick, that's the one I'd expect someone to prove me wrong with.


:-)

And of course it's both: it probably won't do it right where you need it, and it shouldn't even try where you don't (especially if that means messing up the code, see http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_201... )


Yeah they actually do. Math makes for bad examples, idiom recognition of all types of complex math operations is something that is really widespread.


"Yes he does. And so do countless of other people. For example, I just gave a talk where I describe an afternoon optimization session giving me a 1000x and and 8000x improvement for a data-conversion and a query task, respectively. "

Please be concrete about what exact data you think he does and countless others have.

"Was the compiler helpful? Probably, I didn't measure with and without. But the vast majority of the improvement was not the compiler. And I don't see how the compiler could have come up with the changes (moving from a layered architecture of Model Objects -> ORM -> SQLite to a hexagonal architecture with an optimized data representation and completely different access procedure for the one data type that mattered)."

First, you've actually said "i have no idea if the compiler would have fixed this, because i didn't run it". That means what you have is not actually data. It's an assumption. If you want to prove it out, that's data. You actually have to do the science if you want to say you've done the science.

Second, Compilers do actually come up with such optimized data representations, memory layouts, loop walking, you name it (see, for example, https://hal.inria.fr/hal-01425546)

Had you written it in a language that allowed for semantic changes of the kind you are talking about, we could make an optimizing compiler do it.

15 years ago I saw a compiler that did automatic splitting and distribution of COM objects to split up a game to run half on a server and half on a client.

We can do this stuff. You'd just have to use something other than C/C++.

Which is why more and more people are moving towards those kinds of languages every day, and why more and more invasive "extensions" to C/C++ get made every day.

Your particular example used to be the domain of dataflow, ETL, and stream compilers, all of which, yes, could reorganize your entire hierarchy to be better if you let them.

As mentioned, this isn't about the capabilities of optimizing compilers, but about "what tradeoffs am i making in choosing my tools and languages".

I don't think it's reasonable to, for example, program in qbasic and then say "optimizing compilers suck at reorganizing my data layout". Well, yeah.

"On the other hand, the performance critical parts have gotten more critical. That means that the "make everything a bit faster, but without any guarantees" that optimizing compilers are great at providing is simply not that useful any more for a lot of workloads."

Again, you strongly are lacking in actual data.


These are super interesting and I would love it if you submitted to HN a cool video, slide deck, blog post or paper about systems that optimize a large task across machine boundaries!

But normal people consider the output of a compiler to be a single binary (or bytecode meant to run in a single possibly-forking process), and the optimizations the compiler is allowed to perform under the "as-if" rule do not change the output of strace on the resulting binary.

I would be very surprised if a compiler ever changed the externally-visible IO operations my program performed, even if for the better. What I call a compiler cannot change the file format or the serialization format or the indexing or the way data is written to a socket or things like that. It can only change computations and memory accesses between syscalls as long as the same syscalls are performed.

For systems where the task is defined at such high level that choosing the file format is up to the "compiler", I would want to use a different name than "compiler".


> First, you've actually said "i have no idea if the compiler would have fixed this, because i didn't run it"

No I did not. Please don't distort what I wrote (which appears to be the only way you have of 'making' your point). What I did write was "I didn't measure with and without", meaning I didn't do the cross check of what the total improvement would have been without the normal compiler optimizations, because both the original and the final were run with standard -Os optimizations, and because I have a fairly robust intuition in terms of the bounds of the effect.

For shits and giggles, I now did the cross check. The difference between -O0 and -Ofast was in the noise. Sometimes the -O0 was faster, sometimes the -Ofast was faster, slightly skewing towards -Ofast and with times varying by around 10%.

So let's be generous and assume that it wasn't noise, that it was 10% in the direction of the -Ofast option. Heck, let's double it to 20% just to be on the safe side. That's 20% vs. 1000x. Compared to the total, the contribution of the compiler optimizations vs. the manual optimizations is 2:10000. Or 99.98% vs 0.02%. Compare Proebsting's Law.

The rest of your post is full of "had you" and "might have" and "could have". And you berate others for not delivering data. Hmm... And then you go off and cite research that did something that seems vaguely keyword-related to the things I described, but is actually nothing at all like it.

Again, my example is

    CSV  ->  ( Object Model : ORM (CoreData) : SQL Database (SQLite) )  
Transformed to

    CSV  ->  ( Optimized binary data model ) -> fwrite()
And now tell me what concrete, existing optimizing compiler (production preferred, research might be OKish) is going to do that transformation and get me my 1000:1 performance improvement. No "there was a research compiler that did some data layout transformations somewhere", but concretely, a compiler that will take that first solution and automatically generate the second solution with the 1000:1 perf. improvement.

Please also take into consideration that the CoreData ORM and SQLite DB are not available in source, but only via shared library / API.

The original is described here: https://www.objc.io/issues/4-core-data/importing-large-data-...

My modified version is described in the talk: https://www.youtube.com/watch?v=kHG_zw75SjE

Code: https://github.com/mpw/issue-4-importing-and-fetching

Sample data sets: https://vbb-gtfs.jannisr.de/2017-05-10

> Had you written it in a language ...

If my grandmother had wheels, she'd be a bus. Starting an argument (or a logical deduction) with a false premise is not exactly useful. Sorry, if your idea of the power of optimizing compilers is "you need to first rewrite the code", then I can just rewrite the code to be fast and leave out the middle-man. Your argument here is ridiculous beyond the pale.

> 15 years ago I saw a compiler...

And I saw a UFO. This is your strong "actual data"? Yes, research compilers can sometimes do funky stuff in research settings. Same for JITs. I very much remember that special issue of the IBM Systems Journal on Java performance. It featured a bunch of reports from the researchers touting performance numbers, and then one real-world report from the San Francisco project that reported their performance "challenges". See Jitterdämmerung: http://blog.metaobject.com/2015/10/jitterdammerung.html

> ...that did automatic splitting and distribution of COM objects to split up a game to run half on a server and half on a client.

And how is this relevant to the example at hand, except in "proving" (which it doesn't, by the way) that compilers can do crazy transformations (which so isn't the point)? Was this a production compiler? Was it general purpose? Is it still around today?

The point is not what compilers can do. It is how useful that is in practice. And yes, there exists code outside of Google data centers.

> ...qbasic and then say "optimizing compilers suck at reorganizing my data layout"

Nobody said that, except you. You need to stop making up straw-man arguments that you can then demolish, but rather deal with actual facts.


It seems awfully weird to rebut someone who says they're doing large-scale optimization on specialized cluster compilers with better languages than C/C++ by compiling something first with -O0 and then with -Ofast.


Who said I am rebutting whatever he says he is doing?!?

I am rebutting his absurd claim that compilers can and more specifically do transform the code I hand-optimized to the tune of 1000x / 8000x by doing a top-level architectural transform. They can't and don't. In this particular case, the optimizer's contribution was trivial (actually less than I would have expected). And of course DJB gave a bunch of other examples as well, and both he and I (not that I am putting myself on the same level, not by a long shot) and many others have seen many other cases.

Jeez.


He keeps saying they do. He doesn't say your compiler does. He says there exist production compilers that do. Which, again, makes your attempt to rebut him with gcc a little weird.


> He keeps saying they do.

And he is wrong. He alludes (at most) to compilers that do things that are at best superficially-keyword related to what I've described. And that is being very, very generous.

I have given the example and made available the source code. If you have a compiler that can do this transformation, show me.

And to reiterate, the transformation is to take a system that uses an SQL database via an ORM, both not present as source code but via an OS-level ABI, and convert that to not use either the SQL database or the ORM, and instead use a packed binary representation that takes domain knowledge not available to the compiler (certain fields in the input CSV are hours and minutes, but the hours go to 30 because of midnight wraparound), and use that knowledge to create an indexing structure based on double array lookup into buckets, and a bucket sort to create those buckets. And use fwrite() for persistence (instead of SQLite) and mmap() to read the data in (instead of SQLite) because the structure was purposely created in such a way as to (a) be contiguous and (b) not contain any pointers.
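
To spell out the shape of the target representation, here is a minimal, hypothetical C sketch (made-up field names, not the actual Objective-C code in the linked repo):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* One fixed-width, pointer-free record.  Contiguous records mean the
       whole array is one fwrite() to persist and one mmap() to load,
       with no parsing and no pointer fix-ups. */
    typedef struct {
        uint32_t trip_id;
        uint16_t stop_id;
        uint8_t  hour;    /* 0..30: hours past 24 encode after-midnight trips */
        uint8_t  minute;
    } StopTime;

    static void save(const StopTime *recs, size_t n, const char *path) {
        FILE *f = fopen(path, "wb");
        if (!f) { perror("fopen"); exit(1); }
        fwrite(recs, sizeof recs[0], n, f);   /* persistence instead of SQLite */
        fclose(f);
    }

    static const StopTime *load(const char *path, size_t n) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); exit(1); }
        void *p = mmap(NULL, n * sizeof(StopTime), PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }
        return (const StopTime *)p;   /* usable in place, no deserialization */
    }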

Go. I'll wait.

And yes, the burden of proof is on the person who says compilers do this. And yes, the task is not to transform the program the original authors (not me) should have written into the optimized version, but the program they did write.

Oh, another example is the BBC one, where the start was a distributed system with around a dozen or two machines running around 100 processes, some talking to each other via HTTP, or shared folders, or via a shared Oracle DB. Or some combination of these. The replacement was a single-process event-poster using in-process REST that was between 100-1000x faster.[1][2] Again, show me the compiler that will do this.

Go. I'll wait.

> He doesn't say your compiler does.

So "no true scotsman", ey? Anyway, I was always talking about real world examples, and real world examples are a combination of a real-world problem with a real-world compiler. Just as DJB's argument (and my similar argument for JITs, see "Jitterdämmerung"[3]) is at least in part about the real world not matching the expectations of the JIT/optimizer engineers.

In theory, all this works super swimmingly and is totally amazing, and yes you can construct benchmarks where it does. In practice it is less useful, at least for the vast majority of users, because either they don't need it or they need more. And this is the crux of the misunderstanding: the point here is not that compilers can't do certain things (they can), but that those things are, for the most part (not universally) either: (a) not needed (RoR, ...) or (b) not enough (my examples).

Which is why the counter of "they do" and giving various examples of compilers doing amazing things is completely pointless, as it only show you're completely missing the point.

Now there obviously are specialized use-cases where this is not true, where all of this actually works and those capabilities match the requirements. While I have no insight into Google's operations, my mental model is that theirs is one of those specialized use cases, which is why DannyBee is so adamant that this all works perfectly. In his world, it does. And it's super-sophisticated, super-cool rocket science and he is a super-smart engineer.

However, not everyone is Google. In fact, I contend that Google is very much an outlier in terms of requirements and capabilities.

> He says there exist production compilers that do.

And he hasn't shown a single one that does, partly because he so far hasn't even grasped what "do" in this case is (and continuously transformed my statements into straw-men), which is why he has thrown out this barrage of unnamed compilers ("I hear things", "I see things") that do random stuff that is at best marginally related to what I've described.

Put up or shut up.

Marcel

[1] https://link.springer.com/chapter/10.1007/978-1-4614-9299-3_...

[2] https://www.slideshare.net/MarcelWeiher/in-processrest

[3] http://blog.metaobject.com/2015/10/jitterdammerung.html


This is a lot of words repeating the same things you've said previously. There's no Scotsman in these arguments. He runs compiler infrastructure at Google and says he's seen production compilers doing stuff you say is impossible. You say, essentially, he's lying. Do you have anything else to say?


> There's no Scotsman in these arguments.

Yes there is: "ohh, gcc, that's not a true optimizing compiler". And of course so far refusing to show any other real compiler that does all these magical things.

> He runs compiler infrastructure at Google

Yes, I know. So? Argument from perceived authority much?

> says he's seen production compilers doing stuff you say is impossible.

Actually, I haven't seen him claim that. He has claimed all sorts of things that he has seen optimizing compilers do that have very, very tangential (at best) similarities with some of the keywords I mentioned, and has claimed that as a rebuttal to my claim, which it is not.

> You say, essentially, he's lying.

If he actually claims what you say he's claiming then: yes, he's lying. I haven't seen him claim that, but I have seen him use these fairly random claims to rebut some very specific claims I have made.

And of course, if there are compilers that do exactly what I've described (transform the two systems I mentioned in the way I described), then I am absolutely happy to eat crow. While anything is possible, in theory, this would be so far beyond the current state of the art, that we could probably retire >90% of programmers right now (assuming that a small fraction of that capability were also applied to other fields of programming). And seeing the consumer-visible parts of Google, those compilers would be so super-secret that they are not used on any of that stuff.

The code is out there, somebody run it through the compiler and prove what a dunce I am.

Put up or shut up.

Still waiting.


Now you're transparently making my point for me. He's not saying gcc does any of this. You're beating up a straw man.

It's OK not to know everything about your field, or even to be wrong sometimes. But find a less obnoxious way to handle those situations.


> Yes there is: "ohh, gcc, that's not a true optimizing compiler".

I searched this whole thread for mentions of GCC, and I didn't find anything like this. Are you referring to the part about having to use a different language than C or C++ if you want this kind of optimization?

(I agree with you that "look, there is an old paper describing a research compiler in a research setting on self-chosen benchmarks" is not the same as production compilers doing the same thing in practice.)


>Except he has no data to show this, and every piece of data i have (and my colleagues have), says that he is wrong.

That's because those data only represent quantitative statistics on code, not whether said code would get great benefits if the programmer actually bothered to make algorithmic adjustments.


Why do you assume we haven't measured what optimal looks like?


I think one of the issues here is what compiler optimizations encompass. In a language like C or C++, you may be able to write optimal algorithms but end up with poor performance because of memory access. An unoptimized compile puts almost everything on the stack - a = b + c involves two loads, one add, and one store in an unoptimized compile. All of that extra memory access is going to kill performance; even if you have optimal data structures and algorithms.
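
A tiny example of what that means in practice (same algorithm either way; the only question is whether the accumulator stays in a register):

    /* At -O0 'sum' and 'i' live on the stack, so each iteration is a
       load, add, and store, plus the load of a[i].  At -O2 the same
       source keeps them in registers and may unroll or vectorize the
       loop, even though the algorithm is already as good as it gets. */
    long sum_array(const long *a, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }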

I see "optimizations are bad/unnecessary/problematic/whatever" in my job writing a compiler. The underlying issue is almost never "I don't want optimizations" - even when someone thinks that's what they want; it's "I don't want optimizations that produce unexpected behavior".


> You may be significantly underestimating your ability to get phenomenal gains from algorithmic improvements.

Implementation differences in the exact same, say, simple linear algorithm can still add up to a tenfold difference. Compiler optimisations can take it further, though some are not generally applicable. E.g. I have had good experiences with PGO (careful analysis and/or knowing your workload is needed), but it's obviously not an option for the typical open source software distribution model.

Obviously, combining a better algorithm with a better implementation that is optimized will be fastest.


> You may be significantly underestimating your ability to get phenomenal gains from algorithmic improvements.

My HPC lecturer used to say that the speed difference between bad Matlab code and good Matlab code will often be greater than the speed difference between good Matlab code and good Fortran code.

Tweaking the hell out of your code at the assembly level is a waste of time if you're using a naive O(2^N) algorithm.
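
The toy version of that argument, in C rather than Matlab: no amount of compiler or assembly-level tuning of the first function will ever catch the second, because the gap is exponential versus linear work.

    /* Exponential: recomputes the same subproblems an exponential
       number of times.  Polishing the codegen here is wasted effort. */
    unsigned long fib_naive(unsigned n) {
        return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
    }

    /* Linear: the algorithmic change dwarfs anything an optimizer can
       do to the version above. */
    unsigned long fib_iter(unsigned n) {
        unsigned long a = 0, b = 1;
        for (unsigned i = 0; i < n; i++) {
            unsigned long t = a + b;
            a = b;
            b = t;
        }
        return a;
    }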


You mean O(n^2). 2^n looks like more than 'naive' for most cases.


... unless N is small.


> You may be significantly underestimating your ability to get phenomenal gains from algorithmic improvements.

Absolutely, but often it's very hard to make those changes, or those changes lead to a loss of accuracy, whereas compiling with -O3 is very easy and leads to no loss in accuracy (in the common case).



