So, in this case (on a tiny microcontroller, not precision equipment), my main optimization goal was speed above everything else. I don't think this is a universal answer, and as other commenters have mentioned, there are other ways to approach this problem!
As to a couple of common questions:
* Why didn't I just use a 1/4-wave sine LUT? Because the full sine LUT was only 512 bytes and I had that to spare, and using the full table saves me some math per cycle. A more precise (larger) LUT would also have hit diminishing returns.
* Why didn't I use (some other method)? This one was "good enough": it took only 22 CPU cycles per iteration, and the linear interpolation was only about 5 of those 22. The other answer: I'm bad at complex math! But I've worked with fixed-point algos before, so I wrote what I knew.
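For anyone curious what "full-cycle LUT plus fixed-point linear interpolation" looks like, here is a minimal sketch. It is not my actual code. It assumes a 256-entry `int16_t` table (256 × 2 bytes = 512 bytes) and a 16-bit phase accumulator where the top 8 bits pick the table entry and the low 8 bits are the interpolation fraction. The names `sine_lut`, `build_sine_lut`, and `sine_interp` are hypothetical. On a real MCU the table would be a precomputed `const` array in flash. Here it is generated at startup with a trig recurrence so the example doesn't need libm.

```c
#include <stdint.h>

#define LUT_BITS 8
#define LUT_SIZE (1u << LUT_BITS)  /* 256 entries * 2 bytes = 512 bytes (assumed layout) */

/* Full-cycle sine table, scaled to Q15 (+/-32767). */
static int16_t sine_lut[LUT_SIZE];

/* Stand-in for a const table in flash: fills the table with
   sin(2*pi*i/256) using the recurrence s[n+1] = 2*cos(d)*s[n] - s[n-1]. */
static void build_sine_lut(void)
{
    const double c  = 0.99969881869620422;  /* cos(2*pi/256) */
    const double s1 = 0.024541228522912288; /* sin(2*pi/256) */
    double prev = 0.0, cur = s1;

    sine_lut[0] = 0;
    for (unsigned i = 1; i < LUT_SIZE; i++) {
        double v = cur * 32767.0;
        /* round half away from zero (the cast truncates toward zero) */
        sine_lut[i] = (int16_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
        double next = 2.0 * c * cur - prev;
        prev = cur;
        cur = next;
    }
}

/* phase: 16-bit accumulator, 65536 == one full cycle (assumed format).
   Because the table covers the whole cycle, there is no quadrant
   folding or sign flipping: just index, fetch two entries, and blend. */
static int16_t sine_interp(uint16_t phase)
{
    uint8_t idx  = (uint8_t)(phase >> 8);    /* table index */
    uint8_t frac = (uint8_t)(phase & 0xFF);  /* position between entries, /256 */
    int32_t a = sine_lut[idx];
    int32_t b = sine_lut[(uint8_t)(idx + 1)];  /* 255 wraps to 0 */

    /* Linear interpolation in fixed point: a + (b - a) * frac / 256.
       Right-shifting a negative value is an arithmetic shift on common
       compilers (implementation-defined in C); it rounds toward -inf. */
    return (int16_t)(a + (((b - a) * frac) >> 8));
}
```

The per-sample cost is one shift/mask to split the phase, two table reads, and one multiply-shift-add for the interpolation. A 1/4-wave table would need extra branches or arithmetic on top of that to mirror the index and flip the sign for the other three quadrants. That extra work is what the full 512-byte table avoids.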
Happy to answer any other questions!