A little fixed point math for embedded audio (jamesmunns.com)
69 points by MrBuddyCasino on April 29, 2022 | hide | past | favorite | 26 comments


Oh hey! Author of the blog post here.

So, in this case (on a tiny microcontroller, not precision equipment), my main optimization factor was speed over everything else. I don't think this is a universal answer, and as other commenters have mentioned, there are other ways to approach this problem!

As to a couple of common questions:

    * Why didn't I just use a 1/4 sine LUT? Because the full sine LUT was only 512 bytes, and I had that to spare, which saves me some math per cycle. I would also hit diminishing returns with a more precise LUT.
    * Why didn't I use (some other method)? This one was "good enough", took only 22 CPU cycles per iteration, and linear interpolation was only 5 or so of those 22. Alternatively: I am bad at complex math! But I have worked with fixed point algorithms before, so I wrote what I knew.
Happy to answer any other questions!
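To make the LUT-plus-lerp idea concrete, here is a rough Rust sketch of a 256-entry full-period table with linear interpolation. The names, the u16 phase convention, and the scaling are illustrative assumptions, not the post's actual code:

```rust
// Sketch of a 256-entry full-period sine LUT with linear interpolation.
// A u16 phase spans one full period; assumptions, not the post's code.

fn build_lut() -> [i16; 256] {
    let mut lut = [0i16; 256];
    for (i, e) in lut.iter_mut().enumerate() {
        let x = (i as f32) / 256.0 * core::f32::consts::TAU;
        *e = (x.sin() * 32767.0) as i16;
    }
    lut
}

fn lut_sin(phase: u16, lut: &[i16; 256]) -> i16 {
    let idx = (phase >> 8) as usize;   // top 8 bits: table index
    let frac = (phase & 0xFF) as i32;  // low 8 bits: Q8 fraction
    let a = lut[idx] as i32;
    let b = lut[(idx + 1) % 256] as i32;
    // x + (y - x) * t, with t as a Q8 fixed point fraction
    (a + (((b - a) * frac) >> 8)) as i16
}

fn main() {
    let lut = build_lut();
    // brute-force the error against the floating point sine
    let mut max_diff = 0i32;
    for phase in 0..=u16::MAX {
        let x = (phase as f32) / 65536.0 * core::f32::consts::TAU;
        let ideal = (x.sin() * 32767.0) as i32;
        let diff = (lut_sin(phase, &lut) as i32 - ideal).abs();
        max_diff = max_diff.max(diff);
    }
    println!("max error: {max_diff} LSB");
}
```

The lerp costs one multiply, one shift, and two adds per sample, which lines up with "5 or so" of the cycles mentioned above.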


The symmetry of the waveform can be exploited for additional precision for a similar memory and computational footprint. Make the 256-entry LUT of half of a sine lobe rather than a full period then do some bookkeeping to track where in the period you are for reflections of LUT index and y value.

Example (1-indexed):

   f    = 1000;
   fs   = 48000;
   tPer = 1 / f;
   ts   = 1 / fs;
   % 257 entries so both the zero crossing and the peak land in the table;
   % 2^15 - 1 avoids saturating int16 at the peak
   LUT  = int16( (2^15 - 1) * sinpi( (0:256) / 512 ) );
   
   t = 0;
   while true
       t = mod( t + ts, tPer );
       i = mod( floor( t * f * 1024 ), 1024 );   % phase index, 0..1023
       
       if i <= 256                % first quarter: rising
           y = LUT(i + 1);
       elseif i <= 512            % second quarter: mirrored
           y = LUT(512 - i + 1);
       elseif i <= 768            % third quarter: negated
           y = -LUT(i - 512 + 1);
       else                       % fourth quarter: mirrored and negated
           y = -LUT(1024 - i + 1);
       end
   end
I left out interpolation to keep the concept highlighted.

Unsurprisingly: it's faster and more accurate to skip the LUT when using MATLAB. It's faster still to use vectors and batched function calls.


Mentioned more details about this in a top level comment - but the total LUT is small (256 * 2 = 512 bytes), so I didn't quarter it, because that would introduce more per-cycle cost (to do the reflection bookkeeping) just to save a few hundred bytes. This was an okay tradeoff for me, but it might be a bigger deal for larger LUTs, or for values more precise than 16-bit PCM samples.

Additionally, my maximum error was already low enough that 4x the LUT precision would have been diminishing returns for my application.


> I also checked my approximation against the "real" floating point sine operation, and found a maximum error of 0.012% for any i16 value, which is more than close enough for my ears!

0.012% error ≈ -78 dB, to put it in perspective with other audio figures, which in most cases should be more than low enough. For reference, the ideal dynamic range of 16-bit audio is about 96 dB.
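The conversion is just 20·log10 of the amplitude ratio. A quick sketch to check the numbers (assumed inputs: the post's 4/32768 maximum error and an ideal, undithered 16-bit range):

```rust
// Convert an amplitude ratio to decibels.
fn to_db(ratio: f64) -> f64 {
    20.0 * ratio.log10()
}

fn main() {
    let err_db = to_db(4.0 / 32768.0); // the post's max LUT error
    let range_db = to_db(65536.0);     // ideal 16-bit dynamic range
    println!("error ≈ {err_db:.1} dB, 16-bit range ≈ {range_db:.1} dB");
}
```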


Thanks! That's cool to know.

I did my error calculations by brute forcing the whole 16-bit range, basically:

    let mut max_diff = 0;
    for i in i16::MIN..=i16::MAX {
        // reference value: f32 sine of the same phase, scaled back to i16
        let a = float_sin(i);
        let b = lut_sin(i);
        max_diff = max_diff.max(abs_diff(a, b));
    }
Which found the largest difference to be 4, and I got 0.012% from (4/32768).


I'm not near a computer to check, but a 256-entry lookup table would have about 1.4 degrees of resolution. Using that, I believe the difference between sin of 0 and sin of 1.4 degrees is about 2%, so the error would be roughly +/- 1%. Using a quarter-wave table reduces the error by approximately 4x, but that still means about 0.25% error. Or am I way off?


The code isn't particularly clear (nothing written in Rust is) but it looks like the output is using linear interpolation to compute the values in between steps.

If you use a lookup table then you're approximating a sine as a bunch of piecewise linear segments with surprisingly little error, and linear interpolation works well on those. You don't have any harmonics to worry about and the curve is incredibly smooth, so there are no odd effects to take into account.

A while ago I wrote some Python code to demonstrate this effect in calculating frequency from pitch in synthesizers by interpolating a table of semitone steps. The error was negligible up to around C6, which is the top key of most 5-octave keyboards, and only a handful of Hz at C8 which is higher than most folk need. I can't find the code now but when I do I'll post it.
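Since the original Python is lost, here is a hypothetical reconstruction of the idea in Rust: linearly interpolate a 13-entry table of semitone ratios within one octave, then scale by the octave. The A4 = 440 Hz / MIDI note convention is my assumption:

```rust
// Hypothetical sketch: frequency from a fractional MIDI note number by
// linearly interpolating a table of semitone ratios within one octave.
fn semitone_table() -> [f32; 13] {
    let mut t = [0.0f32; 13];
    for (k, e) in t.iter_mut().enumerate() {
        *e = 2.0f32.powf(k as f32 / 12.0); // 2^(k/12)
    }
    t
}

fn note_to_freq(note: f32, table: &[f32; 13]) -> f32 {
    let n = note - 69.0;       // semitones relative to A4 = 440 Hz
    let oct = (n / 12.0).floor();
    let frac = n - oct * 12.0; // position within the octave, 0..12
    let i = frac.floor() as usize;
    let t = frac - i as f32;
    // x + (y - x) * t between adjacent semitone ratios
    let ratio = table[i] + (table[i + 1] - table[i]) * t;
    440.0 * 2.0f32.powi(oct as i32) * ratio
}

fn main() {
    let t = semitone_table();
    println!("A4 = {:.2} Hz", note_to_freq(69.0, &t));
    println!("C8 = {:.2} Hz", note_to_freq(108.0, &t));
}
```

Integer notes hit table entries exactly; only fractional notes (pitch bend, glide) see interpolation error, which stays under a couple of Hz even near C8.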


I sort of disagree that this isn't clear, but maybe Rust has warped my brain! I did try and comment my intent along the way, but if anything was unclear, I'd be happy to explain.


Thanks, didn't see that, but it makes sense.


This is a worst case estimate. Quantization error can be reduced with higher sample rates and better output filters. Not to zero, of course, and more bits is better, but there are tradeoffs.


In the code you'll see there is also linear interpolation being performed between the LUT entries, hence the reduced error compared to just using the 256 LUT entry values directly.


Thanks, I missed the linear interp part. I might look into this more for DDS; it would be interesting to see how small a LUT could be for 90 dB performance.


Found an interesting post that implements a computed approximation; apparently it was faster than an interpolated LUT in his case.

https://hackaday.io/project/28597-the-delta-flyer/log/145736...


Wouldn't cubic interpolation have less error at the cost of performance?


Likely! But as mentioned in my top level comment, performance was the key metric I was going for here :)


I once met someone who was consistently, 100% of the time, able to pick the correct answer in a blind test comparing Pure Data's and Max/MSP's sine wave generators (IIRC one uses a wavetable and the other uses math.h sin); it would be interesting to see how low the difference is.


For constant frequency oscillators in code I find myself doing this: use the lowest acceptable precision, then compute:

   t = mod( t + ts, tPeriod );
   x = sinpi( 2 * f * t );
Maybe it's not as fast as a LUT, but I've never needed high performance and generally work with higher precision data where a LUT would be unreasonable.

MATLAB example to be more explicit:

   f = double(1000);
   tPeriod = 1 / f;
   ts = double(1 / 48000);
   t = double(0);
   while true
       t = mod( t + ts, tPeriod );
       x = sinpi( 2 * f * t );
   end


"These approaches are widely used in high performance audio equipment, but I hadn't found too many clear examples to use when writing my code." - nice to see an example in Rust!


Thanks! I appreciate the kind words.


A commonly used way to create a discrete sine wave efficiently is to use an IIR oscillator. If it needs to be long running, use a real sin() to correct for rounding errors every N samples. If you're using something like Q30 fixed point, that correction doesn't have to happen very often.

https://dspguru.com/dsp/tricks/sine_tone_generator/
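Assuming the usual two-tap form of this trick, the whole oscillator is the recurrence y[n] = 2·cos(ω)·y[n−1] − y[n−2]. A floating point sketch (a Q30 fixed point version follows the same shape, with the coefficient and state held as Q30 integers):

```rust
// IIR sine oscillator: y[n] = 2*cos(w)*y[n-1] - y[n-2],
// seeded so that y[n] = sin(n * w). Illustrative sketch.
fn iir_sine(f: f32, fs: f32, len: usize) -> Vec<f32> {
    let w = core::f32::consts::TAU * f / fs; // phase step per sample
    let k = 2.0 * w.cos();                   // the only per-sample coefficient
    let mut y1 = (-w).sin();                 // y[n-1]
    let mut y2 = (-2.0 * w).sin();           // y[n-2]
    let mut out = Vec::with_capacity(len);
    for _ in 0..len {
        let y = k * y1 - y2;
        y2 = y1;
        y1 = y;
        out.push(y);
    }
    out
}

fn main() {
    let s = iir_sine(1000.0, 48000.0, 8);
    println!("{s:?}");
}
```

Per sample this is one multiply and one subtract, which is why it shows up so often in DSP code.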


The IIR oscillators quickly deteriorate, though. A more accurate way is to do something CORDIC-inspired: start with the 2D point (1, 0). Every iteration, rotate it by 2*pi*f*T radians (where f is the frequency you want, and T is 1/48000 or whatever). This requires precomputed sin and cos of that angle for the rotation matrix, but that can be done once, up front, as long as you don't need to change the frequency. If you do this for every sample, x will trace out your cos() and y will trace out your sin().

Accuracy errors are easy to account for in this scheme; just renormalize the vector so that the length is 1. It's cheaper and less code than doing a sin().
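A sketch of that rotation scheme, assuming a constant frequency; the occasional renormalization is the cheap length correction described above:

```rust
// Rotating-phasor oscillator: out[n] = (cos(n*w), sin(n*w)).
// Illustrative sketch; interval between renormalizations is arbitrary.
fn rotate_sine(f: f32, fs: f32, len: usize) -> Vec<(f32, f32)> {
    let w = core::f32::consts::TAU * f / fs;
    let (s, c) = w.sin_cos();             // rotation matrix entries, once
    let (mut x, mut y) = (1.0_f32, 0.0_f32);
    let mut out = Vec::with_capacity(len);
    for n in 0..len {
        out.push((x, y));
        // rotate the point by w radians
        let (nx, ny) = (x * c - y * s, x * s + y * c);
        x = nx;
        y = ny;
        // cheap drift correction: pull the vector back to unit length
        if n % 256 == 255 {
            let norm = (x * x + y * y).sqrt();
            x /= norm;
            y /= norm;
        }
    }
    out
}

fn main() {
    let v = rotate_sine(1000.0, 48000.0, 4);
    println!("{v:?}");
}
```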

By the way, for linear interpolation (if you wish to keep the table), usually x + (y-x)*t is faster than x*(1-t) + y*t.


Depending on the frequency you want, perhaps an option is to solve an oscillating ODE? Each solver step gives you an audio sample, so instead of advancing an accumulator you calculate the next sine value directly from the previous one (and its derivative).


Does anyone know of a good fixed point library, preferably for C++? Fixed point seems straightforward enough that people tend to just roll their own, but it would be good to have a proper API to make the resulting code easier to read.
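For a sense of why people keep rolling their own: a minimal Q16.16 type is only a few lines. This is an illustrative Rust sketch of the kind of API such a library wraps, not a recommendation over a real one:

```rust
// Minimal hand-rolled Q16.16 fixed point type (illustrative only).
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Q16_16(i32);

impl Q16_16 {
    fn from_f32(x: f32) -> Self {
        Q16_16((x * 65536.0) as i32)
    }
    fn to_f32(self) -> f32 {
        self.0 as f32 / 65536.0
    }
    fn add(self, rhs: Self) -> Self {
        Q16_16(self.0.wrapping_add(rhs.0))
    }
    fn mul(self, rhs: Self) -> Self {
        // widen to i64 so the intermediate product cannot overflow
        Q16_16(((self.0 as i64 * rhs.0 as i64) >> 16) as i32)
    }
}

fn main() {
    let a = Q16_16::from_f32(1.5);
    let b = Q16_16::from_f32(0.25);
    println!("1.5 * 0.25 = {}", a.mul(b).to_f32());
}
```

A real library mostly adds what this sketch omits: operator overloading, conversions between formats, and saturating/checked arithmetic.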


At least in Rust, the fixed crate[0] is probably the standard for this. Personally I chose not to use it, because const generics in Rust are not quite powerful enough to express this without falling back on type-system hacks, at least on the stable compiler.

However, in C++ you might be able to mimic the API more cleanly, as templates can do integer-parameter things that Rust can't yet easily do today.

[0]: https://docs.rs/fixed/latest/fixed/


The look-up table can also be generated on the device itself using CORDIC.


That's totally fair! In fact, this device DOES have a floating point unit, but for some reason, even lossy approximations of trig functions were much slower than I wanted.

Since the LUT was constant, doing it quickly on my PC made sense, but for a more dynamic approach, that would make sense too.



