Optimizing the quantization of MDCT coefficients is kinda hard to get right, but it's something I keep coming back to because my lossy audio codec depends on it.
One of the approaches my codec takes is band scaling: it splits bins 0-63 into 4 bands and normalizes each band to a fixed range.
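Here's a minimal sketch of the idea, assuming per-band peak normalization; the band edges, the `band_scale` name, and the target range are placeholders, since the codec's actual values aren't spelled out here:

```python
import numpy as np

def band_scale(coeffs: np.ndarray, edges=(0, 16, 32, 48, 64)):
    """Split MDCT bins 0-63 into 4 bands and normalize each band by its peak.

    `edges` and the peak-normalized range are placeholders, not the
    codec's real parameters.
    """
    scaled = np.empty(64, dtype=float)
    scales = []                               # per-band scale factors to transmit
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = coeffs[lo:hi]
        peak = float(np.max(np.abs(band)))
        if peak == 0.0:
            peak = 1.0                        # avoid division by zero on silent bands
        scaled[lo:hi] = band / peak           # every bin in the band now lies in [-1, 1]
        scales.append(peak)
    return scaled, scales
```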
For today's experiment, I wanted to figure out where those bands should be split, and here's what I learned.
To answer that, we first have to know what we're optimizing in the first place.
I prepared a bunch of 30-second audio tracks from lossless sources and converted them to lossless 16-bit mono audio at a 32 kHz sample rate. I ran them through the codec's typical 256-sample sine window and its 128-bin MDCT, then averaged every frame across every file. The average of their magnitudes is shown below.
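In case it helps, here's a rough sketch of that analysis loop. It assumes the tracks are already loaded as float mono arrays and uses the textbook sine-windowed MDCT with 50% overlap; the codec's exact MDCT convention and framing may differ.

```python
import numpy as np

N = 128                                            # MDCT bins per 256-sample window
n = np.arange(2 * N)
window = np.sin(np.pi / (2 * N) * (n + 0.5))       # sine window
k = np.arange(N)
# Direct O(N^2) MDCT basis -- slow, but fine for an offline experiment.
basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))

def mdct_frames(samples: np.ndarray) -> np.ndarray:
    """50%-overlapped, sine-windowed MDCT of a mono float signal."""
    hops = (len(samples) - 2 * N) // N + 1
    frames = np.stack([samples[i * N : i * N + 2 * N] * window for i in range(hops)])
    return frames @ basis.T                        # shape: (frames, 128)

def average_magnitude(tracks: list[np.ndarray]) -> np.ndarray:
    """Average |MDCT coefficient| over every frame of every track."""
    mags = [np.abs(mdct_frames(x)) for x in tracks]
    return np.concatenate(mags).mean(axis=0)
```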
At first glance that's easy: since the codec stores bins 64-127 differently, the actual bands only cover bins 0 to 63, so just split them into four bands of 16 bins each.
Unfortunately, that's not right. The DC bin (bin 0) is extremely large. If we placed bins 0 to 15 into band 1 and normalized it, bins 1 to 15 would become extremely small and would almost certainly get quantized to zero, which we do not want. So finding the best positions to split the bands is not that easy.
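To make that concrete, here's a toy example with made-up magnitudes and a 4-bit uniform quantizer chosen purely for illustration:

```python
import numpy as np

# Toy magnitudes for band 1 (bins 0-15): a huge DC bin next to much smaller bins.
band = np.array([500.0] + [2.0] * 15)      # made-up values, for illustration only
normalized = band / band.max()             # bins 1-15 end up at 0.004
steps = 15                                 # e.g. a 4-bit uniform quantizer
quantized = np.round(normalized * steps) / steps
print(quantized[1:])                       # all zeros: the small bins are wiped out
```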
First, I tried to simply "minimize the dynamic range", but that resulted in band boundaries of 11, 29, and 48. Having the first band occupy bins 0-10 is not ideal: since bin 0 is huge, it makes bins ~3-9 too small, which again gets them quantized to zero.
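For reference, a brute-force search over the three interior boundaries could look like the sketch below. The objective (sum of per-band dynamic ranges in dB) is my guess, so it won't necessarily reproduce the 11/29/48 result; `avg_mag` is the averaged magnitude curve from earlier.

```python
import numpy as np
from itertools import combinations

def band_range_db(mags: np.ndarray) -> float:
    """Dynamic range of one band in dB (largest magnitude over smallest)."""
    mags = np.maximum(mags, 1e-12)          # guard against log(0)
    return 20.0 * np.log10(mags.max() / mags.min())

def best_boundaries(avg_mag: np.ndarray):
    """Brute-force the three interior band boundaries over bins 0-63."""
    best, best_cost = None, np.inf
    for a, b, c in combinations(range(1, 64), 3):
        edges = (0, a, b, c, 64)
        cost = sum(band_range_db(avg_mag[lo:hi])
                   for lo, hi in zip(edges[:-1], edges[1:]))
        if cost < best_cost:
            best, best_cost = (a, b, c), cost
    return best
```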
So my next approach was to add A-weighting to make the split more perceptual. This resulted in noticeably better sound quality, since fewer bins get quantized to zero.
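The standard A-weighting curve (IEC 61672) is easy to compute at the MDCT bin centre frequencies. How exactly the weighting gets folded into the boundary search is a design choice; the sketch below just scales the averaged magnitudes before re-running the search.

```python
import numpy as np

def a_weight_db(f: np.ndarray) -> np.ndarray:
    """Standard A-weighting curve in dB (IEC 61672)."""
    f2 = f ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * np.log10(ra) + 2.00

fs, n_bins = 32000, 128
centres = (np.arange(64) + 0.5) * fs / (2 * n_bins)   # bin centres, 125 Hz apart
weights = 10.0 ** (a_weight_db(centres) / 20.0)
# weighted = avg_mag[:64] * weights   # then re-run the boundary search on this
```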
Now, after normalizing, the bins end up farther from zero, which makes them slightly more robust to quantization.
There's a catch, though: the codec isn't tied to 32 kHz and can run at any arbitrary sample rate. As the sample rate increases, the frequency range each band covers shifts upward, completely ruining the psychoacoustic adjustments.
I will switch to the ERB (equivalent rectangular bandwidth) scale later on and use anchored cutoffs instead.
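As a rough sketch of where that could go, here's the Glasberg & Moore ERB-rate scale with band edges spaced evenly up to an absolute cutoff frequency in Hz. "Anchored cutoff" here is only my reading (edges fixed in Hz so they stay put across sample rates), and the `top_hz` value is a placeholder, not the codec's actual plan.

```python
import numpy as np

def erb_rate(f_hz):
    """ERB-rate (Glasberg & Moore): perceptual position of a frequency in Cams."""
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_inv(cams):
    return (10.0 ** (cams / 21.4) - 1.0) / 4.37 * 1000.0

def band_edges(fs: float, n_bins: int = 128, top_hz: float = 8000.0):
    """Place 4 band edges evenly on the ERB scale, anchored at `top_hz`.

    Because the edges are defined in Hz (and Cams), they land at the same
    perceptual positions no matter the sample rate; only the bin indices change.
    """
    edges_hz = erb_rate_inv(np.linspace(erb_rate(0.0), erb_rate(top_hz), 5))
    bin_width = fs / (2 * n_bins)
    return np.clip(np.round(edges_hz / bin_width), 0, n_bins // 2).astype(int)

print(band_edges(32000))   # [ 0  3  9 25 64]
print(band_edges(48000))   # same Hz anchors, different bin indices
```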