t-tests run afoul of the “no Gaussian assumptions” claim, though. Distributions arising from benchmarking frequently have various forms of skew, which breaks the assumptions behind t-tests and gives artificially narrow confidence intervals.
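(To see the effect concretely, here's a quick simulation I'd reach for, not anything from the tool itself: draw small samples from a heavily right-skewed distribution, as benchmark timings often are, and check how often the nominal 95% t-interval actually covers the true mean. The coverage typically comes out noticeably below 95%.)

```python
# Rough sketch: coverage of a nominal 95% t-interval for the mean
# when the data are log-normal (right-skewed), as timings often are.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials, sigma = 10, 20_000, 1.0
true_mean = np.exp(sigma**2 / 2)   # mean of lognormal(mu=0, sigma)

hits = 0
for _ in range(trials):
    x = rng.lognormal(mean=0.0, sigma=sigma, size=n)
    lo, hi = stats.t.interval(0.95, df=n - 1,
                              loc=x.mean(), scale=stats.sem(x))
    hits += lo <= true_mean <= hi

print(f"empirical coverage: {hits / trials:.3f}  (nominal 0.95)")
```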
(I'll gladly give you credit for your outlier detection, though!)
>> Automatic isolation to the greatest extent possible (given appropriate permissions)
> This sounds interesting. Please feel free to open a ticket if you have any ideas.
Off the top of my head, some options that would:
* Bind to isolated CPUs, if the system was booted with them (isolcpus=)
* Bind to a consistent set of cores/hyperthreads (the scheduler frequently sabotages benchmarking, especially if your cores have very different maximum frequencies)
* Warn if thermal throttling is detected during the run
* Warn if an inappropriate CPU governor is enabled
* Lock the program into RAM (probably hard to do without some sort of help from the program)
* Enable realtime priority if available (e.g., if isolcpus= is not enabled, or you're not on Linux)
Of course, sometimes you would _want_ to benchmark some of these effects, and that's fine. But most people probably won't, and won't know that they exist. I may easily have forgotten some.
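For concreteness, here's a rough Linux-only sketch of what a few of the items above could look like (my own illustration, not an existing feature; the CPU numbers and warning text are arbitrary):

```python
# Sketch of a few benchmark-isolation steps on Linux:
#   - pin the process to a fixed set of CPUs
#   - warn if the CPU frequency governor isn't "performance"
#   - lock current/future pages into RAM
#   - try to switch to realtime (SCHED_FIFO) priority
import ctypes
import os
from pathlib import Path

def pin_to_cpus(cpus):
    # Bind this process (and any children started later) to the given CPUs.
    os.sched_setaffinity(0, set(cpus))

def check_governor():
    gov_file = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")
    if gov_file.exists():
        gov = gov_file.read_text().strip()
        if gov != "performance":
            print(f"warning: CPU governor is '{gov}', results may be noisy")

def lock_memory():
    # MCL_CURRENT | MCL_FUTURE, using the usual Linux constants.
    MCL_CURRENT, MCL_FUTURE = 1, 2
    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        print("warning: mlockall failed (needs CAP_IPC_LOCK or a higher RLIMIT_MEMLOCK)")

def enable_realtime():
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(1))
    except PermissionError:
        print("warning: could not enable SCHED_FIFO (needs root or CAP_SYS_NICE)")

if __name__ == "__main__":
    pin_to_cpus([2, 3])   # CPUs 2 and 3 are just an example choice
    check_governor()
    lock_memory()
    enable_realtime()
```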
On the flip side (making things more random as opposed to less), something that randomizes the initial stack pointer would be nice, as I've sometimes seen this go really, really wrong (renaming a binary from foo to foo_new made it run >1% slower!).
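(One cheap way to fake this, again just my own illustration of the idea rather than anything that exists: the kernel copies argv/environ onto the new process's initial stack, so padding the environment by a random amount shifts the starting stack pointer between runs and averages away alignment luck like the foo → foo_new effect. The variable name and padding range here are arbitrary.)

```python
# Sketch: vary the initial stack offset of a benchmarked command by
# padding its environment with a random-length dummy variable.
import os
import random
import subprocess
import sys

def run_with_random_stack_offset(cmd):
    env = dict(os.environ)
    # Random padding between 0 and 4 KiB; "STACK_PAD" is a made-up name.
    env["STACK_PAD"] = "x" * random.randrange(0, 4096)
    return subprocess.run(cmd, env=env).returncode

if __name__ == "__main__":
    sys.exit(run_with_random_stack_offset(sys.argv[1:]))
```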