
Yes, but a direct `exp` implementation is only like 10-20 FMAs depending on how much accuracy you want. No gathering or permuting will really compete with straight math.


With AVX-512 one can have a 128-byte table with one vector of lookups produced each cycle :)


Yes, I have an AVX-512 double-precision exp implementation that does this thanks to vpermi2pd. The Intel optimization manual, a great resource, also recommends this approach.

I just went with straight math for single-precision, though.



