Formant-Shifting: How It Works with Pitch-Shifting
I have heard about formants and formant-shifting from time to time over the past few years, but I never really thought it was a big deal until last week.
I pitched up some vocals by a few semitones in Ableton Live, and they sounded quite squeaky and unnatural.
I read that warping the audio in Ableton using the Complex Pro algorithm can help to make a pitched-up vocal sound more natural. I tried that, and it really did the trick.
The Complex Pro warping algorithm has a control to adjust the formants, and it was this that made the vocal sound (a lot more) like a real person singing at the new pitch.
So after several years of idle curiosity, I decided to learn more about formants, formant-shifting, and how this works with pitch-shifting.
What Are Formants?
Formants are the resonant frequencies of a person’s vocal tract that give their voice its unique timbre.
Your spoken and sung voice is generated by vibrations in your vocal cords (sometimes called vocal folds). Your vocal cords are located inside your larynx, or voice-box.
Within the sounds produced by your vocal cords, certain frequencies are amplified and emphasised by the shape of your vocal tract.
The length and width of your throat and mouth (your vocal tract) cause these specific frequencies to resonate, which contributes to the timbre of your voice, providing its unique characteristics.
The formants are also responsible for producing our vowel sounds.
Here’s a really interesting thing about formants. Since it’s your vocal tract that causes these frequencies to resonate, the formants are more or less the same regardless of the note-pitch you are singing.
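To check I understood this, I put together a minimal Python sketch of the idea. It rests on big simplifying assumptions: the vocal cords produce harmonics at whole-number multiples of the note being sung, and the vocal tract is a fixed curve with a few bell-shaped peaks. The formant frequencies, widths and levels in FORMANTS are made-up illustrative values, not measurements, and the function names are my own.

```python
# Illustrative formants as (centre Hz, bandwidth Hz, relative level).
# These are assumed values for the sketch, not measurements of a real voice.
FORMANTS = [(700.0, 80.0, 1.0), (1200.0, 100.0, 0.5), (2600.0, 150.0, 0.25)]


def tract_gain(freq_hz):
    """Gain of a simplified 'vocal tract' at freq_hz: one bell-shaped peak per formant."""
    gain = 0.01  # small floor so frequencies between the peaks are not silent
    for centre, width, level in FORMANTS:
        gain += level / (1.0 + ((freq_hz - centre) / width) ** 2)
    return gain


def voice_spectrum(f0_hz, max_hz=4000.0):
    """Harmonics of a sung note (whole-number multiples of f0), each scaled by the tract gain."""
    harmonics = []
    n = 1
    while n * f0_hz <= max_hz:
        freq = n * f0_hz
        harmonics.append((freq, tract_gain(freq)))
        n += 1
    return harmonics


def loudest_harmonic(f0_hz):
    """Frequency of the strongest harmonic for a note with fundamental f0_hz."""
    return max(voice_spectrum(f0_hz), key=lambda h: h[1])[0]


# The same 'vocal tract' singing three different notes
for f0 in (110.0, 175.0, 220.0):
    print(f"note at {f0:.0f} Hz -> strongest harmonic at {loudest_harmonic(f0):.0f} Hz")
```

With these made-up numbers the strongest harmonic comes out at 660 Hz, 700 Hz and 660 Hz for the three notes: whichever note is sung, the loudest harmonic is the one sitting closest to the 700 Hz formant.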
Formants Are Overtones
The frequencies we are talking about here are harmonics, or overtones, which are higher in frequency than the fundamental pitch being sung or spoken.
Since your voice is made up of a number of different frequencies (fundamental, overtones, noise, etc.), it also contains a large number of formants.
However, only a small number of the formants really matter, since the higher ones fall where the voice has relatively little energy and contribute much less to how it sounds.
The most important formants for the sound of your voice are the two with the lowest frequencies.
These first two formants (F1 and F2) are responsible for the vowel sounds, and the higher formants are responsible for the overall tone, or timbre, of your voice.
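Because F1 and F2 largely decide the vowel, you can make a rough guess at a vowel from just those two numbers. This Python sketch picks the nearest match from a tiny table. The F1/F2 values are approximate figures of the kind quoted for an adult male voice and should be treated as assumptions, since real values vary a lot between speakers; comparing on a log scale is also my own choice, not a standard rule.

```python
import math

# Rough F1/F2 values (Hz) for three vowels from an adult male voice.
# Approximate, illustrative figures (assumed); real values vary between speakers.
VOWELS = {
    "ee": (270.0, 2290.0),  # as in "heed"
    "ah": (730.0, 1090.0),  # as in "father"
    "oo": (300.0, 870.0),   # as in "who'd"
}


def guess_vowel(f1_hz, f2_hz):
    """Return the vowel whose (F1, F2) pair is closest to the measured pair.

    Distances are measured on a log-frequency scale, a rough nod to the way
    we hear frequency ratios rather than absolute differences.
    """
    def distance(ref):
        ref_f1, ref_f2 = ref
        return math.hypot(math.log(f1_hz / ref_f1), math.log(f2_hz / ref_f2))

    return min(VOWELS, key=lambda name: distance(VOWELS[name]))


print(guess_vowel(280.0, 2200.0))  # ee
```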
The image below from Wikimedia shows the fundamental frequency of a person’s voice, with the important first and second formant frequencies above it.
The image represents the way you would see frequencies in the display of an EQ plugin.
The best resource I found online is the excellent video below, which I encourage you to watch if you want to learn more about formants.
In audio production, formants and formant-shifting become important when vocals are being pitch-shifted.
Vocal recordings shifted up or down in pitch can sound strange and unnatural unless the formants are also modified.
Sometimes this unnatural effect is what you want to achieve, but other times you are looking for a more natural vocal sound.
How Does Pitch-Shifting Work?
First, a little bit of background on pitch-shifting.
One way to pitch-shift audio is to change the playback speed of the recording.
When you play a recording at a slower speed, the pitch of the audio will become lower, and when you play it at a faster speed the pitch will become higher.
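The arithmetic behind this is simple: playing back k times faster multiplies every frequency by k and divides the length by k, and each doubling of frequency is one octave (12 semitones). A small Python sketch, with function names of my own:

```python
import math


def tape_speed_change(speed_factor, duration_s):
    """Pitch and length after playing a recording back `speed_factor` times faster.

    Every frequency (fundamental, overtones and formants alike) is multiplied by
    speed_factor, and one octave (a doubling of frequency) is 12 semitones.
    Returns (pitch change in semitones, new duration in seconds).
    """
    semitones = 12.0 * math.log2(speed_factor)
    new_duration_s = duration_s / speed_factor
    return semitones, new_duration_s


def speed_for_semitones(semitones):
    """Playback speed needed for a given pitch change (equal temperament)."""
    return 2.0 ** (semitones / 12.0)


print(tape_speed_change(2.0, 10.0))  # (12.0, 5.0): up an octave, half the length
print(tape_speed_change(0.5, 10.0))  # (-12.0, 20.0): down an octave, twice as long
print(round(speed_for_semitones(3), 3))  # 1.189: three semitones is roughly 19% faster
```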
However, changing the playback speed will also change the tempo of the recording, which isn’t what we are thinking about here.
We are focusing on pitch-shifting as the process of changing the pitch of an audio signal without changing its tempo or duration.
This can be done in a variety of ways, but the most common method is to use digital signal processing algorithms to manipulate the audio waveform.
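I don't know exactly what Ableton's algorithms do internally, but one common textbook approach is to combine two steps: time-stretch the audio by the pitch ratio without changing its pitch, then resample it by the same ratio, which raises (or lowers) the pitch and brings the length back to where it started. This Python sketch only tracks duration and pitch through those two steps, and the names are hypothetical; the hard part, a good-sounding time-stretch, is left out.

```python
def shift_ratio(semitones):
    """Frequency ratio for an equal-tempered shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


def stretch_then_resample(duration_s, f0_hz, semitones):
    """Track duration and pitch through a time-stretch followed by a resample.

    Step 1 (time-stretch): change the length by `ratio` WITHOUT changing pitch.
    Step 2 (resample): play the stretched audio `ratio` times faster, which scales
    every frequency by `ratio` and restores the original length.
    Returns (final duration in seconds, final fundamental in Hz).
    """
    ratio = shift_ratio(semitones)
    stretched_s = duration_s * ratio  # step 1: longer for upward shifts, same pitch
    final_s = stretched_s / ratio     # step 2: back to the original length...
    final_f0 = f0_hz * ratio          # ...but at the new pitch
    return final_s, final_f0


print(stretch_then_resample(10.0, 220.0, 12))  # (10.0, 440.0)
```

Because step 2 is effectively the speed-change trick, every frequency in the sound moves by the same ratio, formants included.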
When you change the pitch of an audio recording, the formants will also shift, which can make the audio sound unnatural.
Formant-preserving pitch-shifting attempts to preserve the original formants of the audio signal while changing the pitch.
As I mentioned earlier, normally the formants (resonant frequencies) are the same regardless of the note pitch the person is singing (or speaking). So when the pitch is changed you have to adjust the formants to regain the natural sound.
How Does Formant-Shifting Work With Pitch-Shifting?
Formant-shifting alters the timbre or color of the sound by adjusting the frequencies of the formants. In this case the sound being adjusted has also been pitch-shifted, either up or down.
Formant-shifting adjusts the formant frequencies up or down so that they fit the pitch of the pitch-shifted audio.
For example, if you shift the pitch of a vocal recording upwards then all of the frequencies within the sound are raised.
The sound of this upward pitch-shifted vocal is sometimes described as sounding like a chipmunk (after the cartoon characters with pitch-shifted voices).
To get the vocal sounding natural again you need to adjust (or shift) the formant frequencies so that they fit with a voice singing (or speaking) normally at this new pitch.
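In frequency terms, the fix is to undo the formant movement while keeping the new pitch. A minimal Python sketch with illustrative (assumed) formant values: a plain pitch-shift of +4 semitones scales the formants by 2^(4/12), roughly 1.26, and a formant shift of -4 semitones scales them back.

```python
def shift_ratio(semitones):
    """Frequency ratio for an equal-tempered shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


def shift_frequencies(freqs_hz, semitones):
    """Scale a list of frequencies by the ratio for `semitones`."""
    ratio = shift_ratio(semitones)
    return [f * ratio for f in freqs_hz]


# Illustrative formant frequencies of the original vocal (assumed values)
original_formants = [700.0, 1200.0, 2600.0]

# Plain pitch-shift up 4 semitones: the formants ride up with the pitch (~26% higher)
chipmunk_formants = shift_frequencies(original_formants, 4)

# A formant shift of -4 semitones on top puts the formants back where they were,
# while the notes themselves stay 4 semitones higher
corrected_formants = shift_frequencies(chipmunk_formants, -4)

for before, up, fixed in zip(original_formants, chipmunk_formants, corrected_formants):
    print(f"{before:7.1f} Hz -> {up:7.1f} Hz after pitch-shift -> {fixed:7.1f} Hz after formant-shift")
```

Shifting the formants back by exactly the same amount is just the simplest target; since formants only stay more or less constant across a singer's range, in practice you would set the amount by ear.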
The formant-shifting software is able to identify the formant frequencies so that they can be adjusted to taste.
As I understand it, the ability to focus on the formants was originally intended for formant-preservation while shifting pitch.
For example, pitch-correction software like Auto-Tune or Melodyne from Celemony aims to correct the pitch of a note while leaving the overall sound quality unchanged, for natural-sounding tuning.
You can read more about Auto-Tune pitch-correction and how it works in another article on the website.
How Does Formant-Shifting (or Preservation) Work?
Finding an explanation of the formant-shifting (or preservation) process that I can understand has been difficult.
I found one explanation of formant-shifting on Reddit, and I have picked out some key points here.
Formants are not really sounds in themselves, just resonances – sections along the frequency spectrum that get boosted more than others.
When a pitch shift is applied, the fundamental and the overtones (including the formants) are shifted too.
There are different methods for implementing formant-shifting, but they all seem to involve analyzing the audio signal to identify the formants.
To correct formants after pitch-shifting, you have to reshape the overtones of the pitch-shifted audio, including the formants, so that their overall shape (often called the spectral envelope) matches that of the original signal. This helps to make it sound natural again.
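Here is my attempt at a minimal Python sketch of that idea, under one big assumption: the original spectral envelope is already known (here it's a made-up curve built from illustrative formant values). Real tools have to estimate the envelope from the recording itself, commonly with techniques such as linear prediction or cepstral smoothing, and that is the hard part this sketch skips. The naive shift moves each harmonic up and keeps its old level; the formant-preserving shift moves each harmonic up but re-reads its level from the original envelope at the new frequency.

```python
# Illustrative formants (centre Hz, bandwidth Hz, relative level); assumed values
FORMANTS = [(700.0, 80.0, 1.0), (1200.0, 100.0, 0.5), (2600.0, 150.0, 0.25)]


def envelope(freq_hz):
    """Spectral envelope: the smooth resonance curve traced out by the formants."""
    gain = 0.01
    for centre, width, level in FORMANTS:
        gain += level / (1.0 + ((freq_hz - centre) / width) ** 2)
    return gain


def harmonics(f0_hz, max_hz=4000.0):
    """(frequency, amplitude) pairs for a note, amplitudes shaped by the envelope."""
    count = int(max_hz // f0_hz)
    return [(n * f0_hz, envelope(n * f0_hz)) for n in range(1, count + 1)]


def naive_shift(partials, ratio):
    """Plain pitch-shift: partials move up and keep their old amplitudes,
    so the resonance peaks (formants) move up too."""
    return [(f * ratio, a) for f, a in partials]


def preserving_shift(partials, ratio):
    """Formant-preserving shift: partials move up, but each is re-weighted by the
    ORIGINAL envelope at its new frequency, so the peaks stay where they were."""
    return [(f * ratio, envelope(f * ratio)) for f, _ in partials]


def strongest(partials):
    """Frequency of the loudest partial."""
    return max(partials, key=lambda p: p[1])[0]


ratio = 2.0 ** (4 / 12)  # up 4 semitones
voice = harmonics(175.0)
print(f"original: strongest partial at {strongest(voice):.0f} Hz")
print(f"naive shift: {strongest(naive_shift(voice, ratio)):.0f} Hz")
print(f"formant-preserving shift: {strongest(preserving_shift(voice, ratio)):.0f} Hz")
```

For this 175 Hz note shifted up 4 semitones, the strongest harmonic moves from 700 Hz to about 882 Hz with the naive shift, but stays close to the 700 Hz formant (about 661 Hz) with the formant-preserving version.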
Although we were thinking about making pitch-shifted audio sound natural, the ability to manipulate formants opens up some exciting sound manipulation possibilities.
When you alter the formants of a sound, you change its timbre, making it sound brighter, darker, more nasal, or more throaty.
By adjusting the formants, you can change the perceived gender, age, or size of a voice, or the type of instrument being played. This can be useful in creating harmonies, or adding depth to a mix, by duplicating the lead vocal several times and altering the pitches.
You can also make a male voice sound like a female voice, or one musical instrument sound like another.
In the video above there is a demonstration of making a basic synth patch sound as if it’s making human vowel sounds. This is done by manipulating the formants in the sound.
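I don't know exactly how the demo in the video was built, but a common way to do this is a "formant filter": pass a harmonically rich synth waveform through a few resonant peaks placed at a vowel's formant frequencies. This frequency-domain Python sketch uses a sawtooth (harmonic n has amplitude 1/n) and rough, assumed F1/F2 values for "ah" and "ee"; the bandwidth and floor values are also my own choices.

```python
# Rough F1/F2 targets (Hz) for two vowels; assumed, illustrative values
VOWEL_FORMANTS = {"ah": (730.0, 1090.0), "ee": (270.0, 2290.0)}
BANDWIDTH_HZ = 100.0  # assumed width of each resonance peak


def formant_gain(freq_hz, vowel):
    """Gain of a simple formant filter: a small floor plus one peak per formant."""
    gain = 0.05
    for centre in VOWEL_FORMANTS[vowel]:
        gain += 1.0 / (1.0 + ((freq_hz - centre) / BANDWIDTH_HZ) ** 2)
    return gain


def sawtooth_through_vowel(f0_hz, vowel, n_harmonics=36):
    """Harmonics of a sawtooth (amplitude 1/n) after passing through the formant filter."""
    return [
        (n * f0_hz, (1.0 / n) * formant_gain(n * f0_hz, vowel))
        for n in range(1, n_harmonics + 1)
    ]


ah = sawtooth_through_vowel(110.0, "ah")
ee = sawtooth_through_vowel(110.0, "ee")
# Compare the 7th harmonic (770 Hz, near the 'ah' F1) and the 21st (2310 Hz, near the 'ee' F2)
for index in (6, 20):
    freq = ah[index][0]
    print(f"{freq:.0f} Hz: ah={ah[index][1]:.3f}  ee={ee[index][1]:.3f}")
```

Switching the vowel moves the boost from around 770 Hz ("ah") up to around 2.3 kHz ("ee"), which is the kind of change the ear hears as a different vowel.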
One of the most popular pitch-shifting audio effects plugins is Little AlterBoy from Soundtoys.
You can see how the plugin allows you to link pitch adjustments with matching formant adjustments to help keep the pitch-shifted audio sounding natural.
You can disable the link feature if you want less natural, but possibly more interesting, sounds.
A Lot to Learn About Pitch and Formant-Shifting
There seems to be a lot to learn about pitch (and time) shifting of audio, together with the effect this can have on formants within the sound.
Learning more about formant manipulation has made me really interested in using this to change the quality of vocal performances, rather than as part of pitch-shifting.
While reading up on this, I noticed that Melodyne lets you adjust formants on a note-by-note basis, and this looks really interesting. So that’s where I’m heading first.