On a 64-core machine, Python code that uses all the cores will be modestly faster than single-threaded C, even if all the inner loops are in Python. If you can move the inner loops to C, for example with NumPy, you can do much better still. (Python is still harder to get right than something like C or OCaml, of course, especially for larger programs, but the smaller amount of code and the quicker feedback loop can often compensate for that.)
I strongly doubt this claim. In most numeric tasks Python is more than 64x slower than C even before any synchronization overhead; once you add the synchronization overhead between those processes, it should be much worse.
Python is so much slower than any native or JIT-compiled language that it's the reason things like NumPy exist in the first place.
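If anyone wants to measure the gap on their own machine rather than argue about it, here's a rough sketch. It times the same sum-of-squares computation with the inner loop in the interpreter and with the inner loop inside NumPy. The function names, the array size and the repeat count are all made up for illustration, and the ratio you see will depend heavily on your hardware and workload, so treat it as a starting point rather than a benchmark.

```python
import timeit

import numpy as np


def sum_squares_python(values):
    """Sum of squares with the inner loop executed by the Python interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_squares_numpy(array):
    """Same computation, but the inner loop runs in NumPy's compiled code."""
    return float(np.dot(array, array))


# Illustrative problem size (an assumption, not taken from the thread).
n = 200_000
data = np.arange(n, dtype=np.float64)
data_list = data.tolist()  # plain Python floats for the interpreted version

py_result = sum_squares_python(data_list)
np_result = sum_squares_numpy(data)

# Time a few repetitions of each; absolute numbers vary by machine.
py_time = timeit.timeit(lambda: sum_squares_python(data_list), number=5)
np_time = timeit.timeit(lambda: sum_squares_numpy(data), number=5)

print(f"pure Python: {py_time:.4f}s  NumPy: {np_time:.4f}s")
```

To test the 64-core claim itself you'd need to split the pure-Python version across processes (e.g. with `multiprocessing.Pool`) and compare that against a single-threaded C build of the same loop, which this sketch doesn't attempt.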