Elixir, Erlang, and Julia make using your system's SMP easy and natural. Go works too (though you have to set up a bunch of things like channels and maybe worry about cleaning them up), and Java if you don't mind a bit of a struggle.
If you want low level, you can use C or C++ (both dangerous), Rust, or Zig (I just did some multithreaded stuff in Zig and it's fantastically easy).
1) A bit about what I'm doing with Zig. I am working on an FFI interface between Elixir and Zig, with the intent of letting you write Zig code inline in Elixir and have it work correctly (it does): https://github.com/ityonemo/zigler/. Arguably, with Zigler it's currently easier to FFI a C library than it is from C itself (I'm planning on making it even easier; see the example in the readme).
2) The specific not-yet-in-master feature I'm working on now is running your Zig code in a branched-off thread. Fun fact about the Erlang VM: if you run native code, it can throw the scheduler out of whack if the code runs too long. You can run it as a "dirty NIF", but the system restricts how many of these can run at any given time. A better choice is to spawn a new OS thread, but that takes a lot of boilerplate and is probably easy to get wrong. Making it a first-class part of Erlang's monitoring and resource-safety system is also challenging, so there's a lot to do to keep it in line with Zigler's philosophy of making correctness simple.
3) Zig does have its own, opinionated way of doing concurrency. I honestly find it a bit confusing, but it's new (about six months old) and not well documented. I believe its design constraints are "no red/blue functions" and "being able to write concurrent library code that is safe to run on non-threaded systems".
4) The native Zig way of doing concurrency is incompatible with exporting to a C ABI (without a shim layer), so I prefer not to use it anyway.
5) Zig ships with std.thread. I believe it's in the stdlib rather than the language because some systems don't support threading. But since I'm writing something intended to bind into the Erlang VM (the BEAM), it's almost certainly running on a system that supports threading. Also, I believe std.thread will seamlessly pick pthreads or a non-pthreads implementation based on the build target, which makes cross-compiling easy.
6) So yes, figuring all this out is not easy (Zig is young, the docs are not mature), but once you figure out what you're supposed to do, the actual code is a breeze. This is the code I use to pack the information connecting the BEAM to a Linux thread and launch it: https://github.com/ityonemo/zigler/blob/async/lib/zigler/lon.... I really hope the docs get guides that make this easy in the near future.
I'm relearning C++ right now because I'm building a poker solver as a toy project.
You're right, btw: if 32-core CPUs become common because of a race to the bottom in prices, I imagine there will be a massive increase in demand for programmers with experience programming massively parallel systems.
I'm enjoying the Rust ecosystem. Everyone writes programs with multi-threading in mind, because the language requires everything to be thread-safe anyway.
It only requires thread safety for data that's actually shared across threads. That's even more important, since it means you're not paying for thread safety with reduced performance where it isn't needed.
Yep, and with the ease of concurrency in Go and Rust, it'll be freakin' awesome. And hopefully, we'll get some novel security research in areas dealing with attacks against concurrency and parallel execution.
...but without a sentence or three about the reasons you think Rust is more suited for parallel workloads you are just giving Rust users a bad name again (see Rust Evangelism Strikeforce).
That's completely fair. Rather than argue about why it handles parallel workloads better, I tend to take a different approach with Rust and parallelism: I enjoy the compiler, and on parallel workloads, the elimination (well, reduction, let's face facts) of safety concerns and data races. It isn't that the same things can't be done, and done as well, in other languages, but with Rust the built-in toolchain does it without extensions. Now, there is a learning curve and a different programming paradigm (though not as radically different as I was warned), which I happen to enjoy. It won't be for everyone; no language is. The resources are out there and free (online), and I do encourage people with some spare time to give it a spin, but I don't think it's the end of the world if people don't want to =)
That being said, I like it, but I tend to use Python more. I wish I had more of a chance to use Rust in my daily life, but I don't use it at work =/
Python is just about the worst major programming language to write high-performance code in. The only slower major language is Ruby.
If you're using Python to invoke highly-optimised native-code, then your performance will be excellent (as shown by the various Python numerical libraries), but performance-sensitive code shouldn't run in the Python interpreter.
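A toy illustration of that split, using nothing outside the standard library: the C-implemented `sum` builtin reliably beats the equivalent loop that executes statement-by-statement in the interpreter (a sketch, not a rigorous benchmark):

```python
import timeit

n = 1_000_000

# Pure-Python loop: every iteration is executed by the interpreter.
loop_time = timeit.timeit(
    "total = 0\nfor i in range(n):\n    total += i",
    globals={"n": n},
    number=10,
)

# Builtin sum over a range: the hot loop itself runs in optimised C.
builtin_time = timeit.timeit("sum(range(n))", globals={"n": n}, number=10)

print(f"interpreter loop: {loop_time:.3f}s  builtin sum: {builtin_time:.3f}s")
```

The same effect, scaled up, is why the numerical libraries perform well: the hot loop never touches the interpreter.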
As others have said, Python also lacks true multithreading (its threads are capable of concurrency but not parallelism, on account of the GIL), but you do have the option of just running a bunch of Python processes in parallel. I imagine that's a workable solution at least some of the time, but I've never explored this, so I don't know how good the library support is.
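For the multi-process route, the standard library's `multiprocessing` covers the basics without third-party support. A minimal sketch, with a made-up stand-in for the CPU-bound work:

```python
from multiprocessing import Pool

def cpu_bound(n):
    # Stand-in for real CPU-bound work. Under threads, a pure-Python
    # loop like this can't run in parallel because of the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # Each input is handled by a separate worker process, so the four
    # calls can genuinely run on different cores at once.
    with Pool(processes=4) as pool:
        results = pool.map(cpu_bound, [100_000] * 4)
    print(results)
```

The catch, as noted below, is that inputs and results are pickled and shipped between processes, which is fine for small arguments but costly for large ones.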
Edit: Someone else mentioned 'mpi4py' which seems to be a Python library for multi-process work.
Or python with mpi4py. MPI is the perfect multiprocessing paradigm for parallel python code, since you avoid the GIL. You can easily use MPI on a single workstation, or scale it to run on a supercomputer.
Python with asyncio and multithreading doesn't take good advantage of multiple cores due to the global interpreter lock. With multiprocessing, one pays big costs in IPC.
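One way to soften the IPC cost, at least for large flat buffers, is `multiprocessing.shared_memory` (in the standard library since Python 3.8): processes write into the same block of memory instead of pickling data across a pipe. A rough sketch, with made-up names:

```python
from multiprocessing import Process, shared_memory

def fill(name, value, count):
    # Attach to an existing block by name and write into it directly;
    # nothing is pickled or copied back to the parent.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:count] = bytes([value]) * count
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=1024)
    try:
        child = Process(target=fill, args=(shm.name, 7, 1024))
        child.start()
        child.join()
        print(shm.buf[0], shm.buf[1023])  # values written by the child
    finally:
        shm.close()
        shm.unlink()
```

The tradeoff is that you're back to raw bytes and hand-rolled synchronisation, which is exactly the kind of thing the GIL normally spares you.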
Python is not at all suitable today for parallelism. Which is one reason why languages like Go and Elixir are gaining so much traction.
Subinterpreters share a common GIL. Subinterpreters also don't share Python objects, which means, among other things, that all modules are imported separately in each subinterpreter. That increases startup time and memory usage and reduces cache effectiveness.
It's a band-aid. If you want to run Python code in parallel, without large overhead, then CPython is simply not your environment to do so, and Python is not a good choice overall in that kind of endeavour.
Elixir uses a slow interpreted VM, so you can't put it in the same high-performance category. And Go, Java, C++, and Rust can't do well on massively multicore systems on their own: you'd have to fight them, ditch idiomatic patterns, and effectively build your own runtime with your own concurrency model (not shared-memory multithreading!). So they're like every other language that sits somewhat close to the primitives the OS and hardware provide, not actually well suited to the job itself.
Elixir is compiled. And as someone who has worked in HPC, I really wouldn't call Go high-performance (by the criteria people in HPC use for "high performance").
It's not like they're intentionally broken; they just have whatever bugs come with being a first revision. These are the units they send out to journalists and big companies to run benchmarks, test software against, plan large-scale rollouts, etc. Having them work the same as the retail units is pretty essential for that purpose.
That said, one of my units has a clock speed that doesn't match any of the retail models (I guess they didn't end up selling that model?), and another doesn't seem to work with SMT (AMD's name for simultaneous multithreading) enabled. But that's a small price to pay for the money saved.
What computer language will we be using to take advantage of that?