
If you have variable-length SIMD, you can always treat it as fixed-size SIMD.

New x86 processors don't execute 128-bit SIMD on 128-bit-wide units; the vector ALUs are all wider now, and 128-bit and 256-bit instructions have the same throughput and latency.

Also, do you have an example for such "opportunistic" usages?

I suppose mainly things the SLP vectorizer can usually do already (in compiled languages; I'm not sure how good the JITs are these days).

I worry that we may now end up in a world where "hand-optimized SIMD" in WASM is slower than autovectorization, because you can't use the wider SIMD instructions and so leave 2x (Zen 4) to 4x (Zen 5) of the performance on the table.
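To make the concern concrete, here is a rough sketch (not a benchmark). `add_fixed128` is hard-coded to four float lanes per step, the way code written against a fixed 128-bit SIMD type would be; `add_plain` is an ordinary loop that an autovectorizer may widen to whatever the target supports. The function names and the `f32x4` typedef are made up for illustration, and the typedef relies on the GCC/Clang `vector_size` extension as a stand-in for a WASM `v128`. Whether the plain loop actually gets vectorized, and how wide, depends on compiler, flags and target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four floats in one 128-bit value.
   Used here only as a stand-in for a fixed 128-bit SIMD type (assumption). */
typedef float f32x4 __attribute__((vector_size(16)));

/* "Hand-optimized" version: always processes exactly 4 lanes per step,
   no matter how wide the hardware vector units are. */
void add_fixed128(float *dst, const float *a, const float *b, size_t n) {
    assert(n % 4 == 0); /* sketch only: no scalar tail handling */
    for (size_t i = 0; i < n; i += 4) {
        f32x4 va, vb, vc;
        memcpy(&va, a + i, sizeof va); /* unaligned-safe load */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                  /* lane-wise add */
        memcpy(dst + i, &vc, sizeof vc);
    }
}

/* Plain loop: the compiler is free to pick any vector width it likes,
   or none at all. */
void add_plain(float *restrict dst, const float *restrict a,
               const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result; the difference is only that the first one bakes the 128-bit width into the source.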



> Also, do you have an example for such "opportunistic" usages?

The simplest example would be copying a small number of bytes (like copying structs). Vector instructions generally have a higher setup cost, like setting […], so they can't really be used for this purpose. Maybe future vector instructions will have no such caveats and can be used like SIMD, but AFAIK that's not yet the case, even for RISC-V's V extension.
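A minimal sketch of the struct-copy case, assuming a hypothetical 16-byte `struct point4` and a made-up helper `copy_point`. With a size known at compile time, optimizing compilers for targets with fixed-width SIMD typically lower this to a single 128-bit load and store, though that depends on the compiler, flags and target.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 16-byte struct, sized to fit one 128-bit register. */
struct point4 {
    float x, y, z, w;
};

static_assert(sizeof(struct point4) == 16, "sketch assumes a 16-byte struct");

/* Fixed-size copy: the size is a compile-time constant, so the compiler
   can use one fixed-width load/store pair without any per-copy setup. */
void copy_point(struct point4 *dst, const struct point4 *src) {
    memcpy(dst, src, sizeof *dst);
}
```

This is the kind of "opportunistic" use meant above: the code never asks for SIMD, the compiler just reaches for it because it is cheap.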



