A LUT could be a significant performance penalty would it not? Instead of a float8 (potentially multiple in simd case) in a register, you’re now having to head out to at least L1 cache to dereference the value in the LUT.
Plain uint8 wouldn’t allow for the same accuracy range as float8 and it’s the accuracy not the precision (which uint would win for the largest values it can represent) that counts most.
Oh oh was just gonna comment as well, but saw this! I think x86 has like pshufb for LUTs (used them like ages ago, but forgot now :() I think also some game (was it Spiderman) used loads of lookup tables.
The issue with LUTs is don't you have to update the LUT itself? You can select which memory address to load up, but the LUT itself has to be differentiable maybe? TBH I'm not an expert on LUTs.
On fixed point - similarly ye you have to fix the precision ranges as well, so again I'm unsure on how one changes the fixed point numbers over time. I'll have to read more on fixed point.
Maybe 1.58bit using (-1, 0, 1) which gets rid of multiplications and just additions might be more useful, although you'll only get a 2x FLOP boost since you still need fp8 or fp16 addition.
Plain uint8 wouldn’t allow for the same accuracy range as float8 and it’s the accuracy not the precision (which uint would win for the largest values it can represent) that counts most.