I was searching for information on hardware performance counters and came across this PDF. It's written with a decent bit of humor, but it also covers a lot of ground on the lowest of low-level programming.
If you ever hit the absolute need to write as efficiently as possible, it becomes important to start counting the precise number of cache hits, cache misses, and other such hardware events in your program. This PDF is a survey of what modern CPUs offer in the way of hardware counters.
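For a rough idea of what counting cache misses looks like in practice, here is a minimal sketch. It assumes Linux and uses the `perf_event_open` system call, which is not necessarily the interface the PDF focuses on. The helper names (`perf_open`, `open_cache_miss_counter`, `strided_sum`, `count_misses`) are made up for illustration, and what the generic "cache misses" event actually maps to depends on the CPU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper: glibc has no perf_event_open() function, so go through syscall(). */
static int perf_open(struct perf_event_attr *attr)
{
    /* pid = 0, cpu = -1: count for the calling thread on whichever CPU it runs.
     * group_fd = -1, flags = 0: a standalone counter. */
    return (int)syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);
}

/* Open a disabled counter for the generic "cache misses" hardware event.
 * Returns a file descriptor, or -1 if the kernel refuses (for example no
 * PMU exposed inside a VM, or a restrictive perf_event_paranoid setting). */
static int open_cache_miss_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* user-space events only */
    attr.exclude_hv = 1;
    return perf_open(&attr);
}

/* Hypothetical workload: adds up every element of buf, visiting them in
 * strided order. The result is the same for any stride >= 1; only the
 * memory access pattern changes. */
static uint64_t strided_sum(const uint32_t *buf, size_t n, size_t stride)
{
    uint64_t sum = 0;
    for (size_t start = 0; start < stride; start++)
        for (size_t i = start; i < n; i += stride)
            sum += buf[i];
    return sum;
}

/* Run the workload with the counter enabled.
 * Returns 0 and fills *misses and *sum on success, -1 if the counter
 * could not be opened or read. */
static int count_misses(const uint32_t *buf, size_t n, size_t stride,
                        uint64_t *misses, uint64_t *sum)
{
    int fd = open_cache_miss_counter();
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    *sum = strided_sum(buf, n, stride);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* With the default read_format, read() yields one 64-bit count. */
    ssize_t got = read(fd, misses, sizeof *misses);
    close(fd);
    return got == (ssize_t)sizeof *misses ? 0 : -1;
}
```

To try it, call `count_misses` from your own `main` on a buffer much larger than the last-level cache, once with a stride of 1 and once with a stride of 16 or more, and compare the two counts. The numbers vary from run to run and from machine to machine, so treat them as relative rather than exact. If you just want totals for a whole program, `perf stat -e cache-misses ./your_program` reads the same event without any code changes.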
I hadn't seen this before (thanks) and haven't been through it, but it's doubtless up to INRIA's usual standards. However, I think "absolute need" is underselling the importance. Cache performance can account for factors of several in computational work if you get affinity wrong, and relevant counters are usually in the default initial set for profiling. It's just that what's available is highly hardware-dependent, and finding out what some counters -- especially SIMD ones -- really measure can be a challenge :-(.