Bad, vile and meaningless: Pitch Shifter from Alan's clob

This is, like, heavy, man

I found the Debian package tap-plugins, which contains Tom Szilagyi's audio plugins for various purposes, and one of them happens to be a pitch shifter effect. A pitch shifter's purpose is to change the frequency of a sound without changing its duration.

Mr. Szilagyi's version is based on the concept of short audio granules which are taken from the original sound data and mixed on top of each other at the desired rate. There are 3 such granules at all times, each reading sample data at regular intervals from recent audio history.

For instance, to resample audio data at 50 % of the regular frequency, that is, one octave down, you'd insert 3 readers spaced apart, and then each of them would read half a sample in each cycle of output (linear interpolation is used to come up with samples at fractional time delays).
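The fractional read mentioned above can be sketched roughly as follows. This is only an illustration, not Tom's code: the function name read_fractional and the flat array layout (newest sample at index 0, older samples at higher indices) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical history layout: history[0] is the newest sample,
 * history[len - 1] the oldest. `delay` is measured in samples. */
static double read_fractional(const double *history, size_t len, double delay)
{
    assert(len > 0 && delay >= 0.0 && delay <= (double)(len - 1));

    size_t i = (size_t)delay;          /* whole-sample part of the delay */
    double frac = delay - (double)i;   /* fractional part, 0 <= frac < 1 */

    if (i + 1 >= len)                  /* no older neighbour to blend with */
        return history[len - 1];

    /* linear interpolation between the two neighbouring samples */
    return history[i] * (1.0 - frac) + history[i + 1] * frac;
}
```

A reader that moves its delay by 0.5 per output cycle would then alternate between returning stored samples and the midpoints between them.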

Because the edges of the granules are almost never matched, each granule needs to employ a windowing function. Tom uses a squared 1-cosine window, which has the following properties:

f(0) = 0
f(1/2) = 1
f(1/4) = 1/2
f(x) = f(1-x),         0 <= x <= 1/2
f(x) = 1 - f(1/2 + x), 0 <= x <= 1/2

All in all, the whole pitch shifter core is the following piece of code:

while (samples_to_go--) {
        /* store a sample into history buffer */
        params->history->add(params->history, *insamples++);

        /* mix 3 granules spaced 1/3 of a cycle apart */
        tmp = 0;
        phase_tmp = params->phase;
        for (i = 0; i < 3; i += 1) {
                /* window weight times the delay term for this granule */
                tmp += sin(2 * phase_tmp * M_PI) *
                            (depth * (dir ? phase_tmp : 1 - phase_tmp));
                phase_tmp += 1.0 / 3;
                if (phase_tmp >= 1.0)
                        phase_tmp -= 1.0;
        }
        *outsamples++ = tmp * some_scaling_factors;

        /* advance the shared phase, wrapping at 1.0 */
        params->phase += phase_inc;
        if (params->phase >= 1.0)
                params->phase -= 1.0;
}

The "amount" of shifting is fully contained in the "depth" and "dir" variables. The dir variable determines whether the shift goes up or down. The idea is that to shift down, we need to reuse earlier pieces of data in the history buffer, and to go down one octave we ask for history at delays such as 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, each read reaching further back into the history. However, it's worth remembering that the history itself moves forward 1 sample every cycle due to the add() call, so in reality we are progressing forward in time at half the speed. Similarly, when resampling an octave up, the sequence becomes 1, 2, 3, 4, 5, 6, but due to the additional rate caused by add() the sequence is in reality double that. Also, depth = 0 gives regular playing speed, and in fact the output becomes identical to the dry signal.
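
The interplay between the moving write position and the changing delay is easy to get backwards, so here is a toy model of it. It is a sketch under my own assumptions: the names read_position, start_delay and delay_step are made up, and the octave-up case is modelled as a delay that shrinks by one sample per cycle.

```c
#include <assert.h>

/* Absolute input position read at output cycle n.
 * The write head sits at position n (one add() per cycle), and the
 * granule looks `delay` samples back from it. Hypothetical model. */
static double read_position(long n, double start_delay, double delay_step)
{
    double delay = start_delay + delay_step * (double)n;
    assert(delay >= 0.0);   /* cannot read samples that do not exist yet */
    return (double)n - delay;
}
```

With delay_step = 0.5 the read position advances by half a sample per cycle (one octave down); with delay_step = -1.0 it advances by two samples per cycle (one octave up); with delay_step = 0 it simply tracks the input.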

Finally, the output is slowly modulated by a static phase variable that cycles from 0 to 1 at 6 Hz.
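
For a phase that wraps around once per cycle, the per-sample increment is just the cycle rate divided by the sampling rate. A minimal sketch, assuming the 6 Hz figure above; the helper names phase_increment and advance_phase are hypothetical and not from the plugin:

```c
#include <assert.h>

/* Per-sample phase step for a 0..1 phase cycling at cycle_hz. */
static double phase_increment(double cycle_hz, double sample_rate_hz)
{
    assert(cycle_hz > 0.0 && sample_rate_hz > cycle_hz);
    return cycle_hz / sample_rate_hz;
}

/* Advance the phase by one sample, wrapping back into [0, 1). */
static double advance_phase(double phase, double inc)
{
    phase += inc;
    if (phase >= 1.0)
        phase -= 1.0;
    return phase;
}
```

At a 48000 Hz sampling rate this gives an increment of 0.000125, so one full cycle spans 8000 samples.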

Problems of this approach and possible improvements

I determined that 3 granules per sample is too many. The coloration of the sound is immense: there are phase cancellation effects which, at select sampling frequencies and shifting rates, fully eliminate frequency components of the input, much like a static phaser or comb filter. Ouch. Perhaps even worse, at extreme upsampling there's a nasty "echo" effect because 3 granules spaced widely apart all read the same sample data and mix it into the output. However, 2 granules per sample is not enough: now the problem is that the phase effects cause an oscillating tremolo, and it's even worse.

Finally, Tom claims that his effect could be used to apply slight fine-tuning corrections, but this is definitely not the case. The phase cancellation effects are simply overpowering at small fine-tuning settings! It might pass inaudibly for human vocals that are mostly silent, but for a steady guitar tone it's just way too apparent. (Again, it's a bit like a phaser or flanger.)

I believe I will be able to improve this effect in the weeks to come, although I'll be randomly twiddling it unless I get some new idea of a better approach.