Very nice! I wasn't aware of this algorithm at all, but it seems to be an early application of the lifting scheme to the DCT, eight years before lifting was introduced in the more general context of perfect-reconstruction filters.
Interestingly enough, Sweldens does not seem to cite Loeffler's paper, so it's likely that they came up with the same method independently of one another.
Doing the DCT the naive way, the formula is as easy as the DFT's. But computing the DCT quickly (in O(n log n) time) is much harder to understand than the FFT.
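For reference, here is what the naive approach looks like: a direct O(n^2) evaluation of the DCT-II sum. Normalization conventions vary between libraries, so this sketch uses the unnormalized form.

```python
import math

def naive_dct2(x):
    """Unnormalized DCT-II: X[k] = sum_j x[j] * cos(pi/n * (j + 0.5) * k). O(n^2)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi / n * (j + 0.5) * k) for j in range(n))
            for k in range(n)]

# A constant input concentrates all energy in the DC (k = 0) coefficient:
print(naive_dct2([1.0, 1.0, 1.0, 1.0]))  # ~[4.0, 0.0, 0.0, 0.0]
```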
I read a couple of papers and implemented some of the fast DCT algorithms. The Arai–Agui–Nakajima 8-point DCT uses 13 multiplications. Lee's DCT algorithm is recursive and works on any power-of-2 length. https://www.nayuki.io/page/fast-discrete-cosine-transform-al...
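To give a flavor of the recursive splitting, here is a sketch of a Lee-style power-of-two DCT in the unnormalized DCT-II convention. The structure follows the standard description of Lee's decomposition; the exact scaling convention is an assumption here, chosen to match the plain DCT-II sum.

```python
import math

def lee_dct(x):
    """Recursive unnormalized DCT-II for power-of-two lengths, O(n log n)."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly: mirrored sums feed the even-index outputs,
    # scaled mirrored differences feed the odd-index outputs.
    alpha = [x[i] + x[n - 1 - i] for i in range(half)]
    beta = [(x[i] - x[n - 1 - i]) / (2.0 * math.cos((i + 0.5) * math.pi / n))
            for i in range(half)]
    alpha = lee_dct(alpha)
    beta = lee_dct(beta)
    # Interleave: even outputs come from alpha, odd outputs from adjacent beta sums.
    result = []
    for i in range(half - 1):
        result.append(alpha[i])
        result.append(beta[i] + beta[i + 1])
    result.append(alpha[half - 1])
    result.append(beta[half - 1])
    return result
```

Cross-checking the output against the direct O(n^2) sum on a length-8 vector is a quick way to convince yourself the recursion is right.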
I'm always sad that scipy and numpy do not use good FFT implementations due to ridiculous political concerns. However, in this case, there was a very happy outcome with this beautiful article!
I Googled around a bit and didn't find anything on political concerns other than the GPL thing (which would cover linking, not a reimplementation of the math). Is that the political concern in question, or is there also something else?
Yes, I was referring to that. They refuse to use the standard and state-of-the-art FFTW3 library because it's distributed under the GPL. (Actually, it's dual-licensed GPL/commercial).
There's a great expository paper about these kinds of filter implementations by Daubechies and Sweldens in the general case: https://9p.io/who/wim/papers/factor/factor.pdf
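The key fact underlying that line of work, for anyone curious: every 2x2 plane rotation (the butterflies in a DCT flowgraph) factors into three shear ("lifting") steps, each trivially invertible even under rounding, which is what enables integer-to-integer transforms. A quick numerical check of that factorization (my own illustration, not code from the paper):

```python
import math

def rotate(theta, x, y):
    """Direct plane rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y

def lifted_rotate(theta, x, y):
    """Same rotation as three lifting (shear) steps:
    [[c,-s],[s,c]] = [[1,(c-1)/s],[0,1]] [[1,0],[s,1]] [[1,(c-1)/s],[0,1]]."""
    c, s = math.cos(theta), math.sin(theta)
    t = (c - 1.0) / s  # shear coefficient (assumes sin(theta) != 0)
    x = x + t * y      # first lifting step
    y = y + s * x      # second lifting step
    x = x + t * y      # third lifting step
    return x, y

print(rotate(0.7, 2.0, 3.0))
print(lifted_rotate(0.7, 2.0, 3.0))  # agrees to floating-point precision
```

In an integer transform, each lifting step's multiply-and-add is rounded before the addition; since each step only adds a function of one variable to the other, the rounding can be undone exactly on inversion.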