Exploring the new audio features in Godot 4.0

Godot 4.0 was recently released with a massive number of new features and improvements. Even though there were not many significant changes or groundbreaking additions on the audio side, there are still some new things worth talking about.
Let's dive into them one by one:

Cleaner sound

With this release, a significant chunk of the audio processing logic has been moved to the AudioServer.
According to the developers, this will pave the way for future improvements that make Godot's audio system more flexible and feature-rich.

This release also includes improvements to audio resampling, resulting in fewer popping issues, artifacts, and race conditions.

Polyphony

Godot now supports polyphony, which allows the same sound to be stacked and repeated on top of itself multiple times using a single AudioStreamPlayer node. Instead of restarting from the beginning when it is triggered again while still playing, the sound spawns additional copies of itself and mixes them together. This can make some sound effects, such as gunfire or footsteps, more satisfying.

This new option, named “Max Polyphony”, can be found in the Inspector when an AudioStreamPlayer node is selected:

Image: the new Max Polyphony setting in Godot 4.0

By default, Max Polyphony is set to 1, which means that only one instance of the sound can play at a time. This emulates the retrigger behaviour of previous Godot versions. Setting a higher value allows the sound to be played multiple times simultaneously, up to the defined maximum number of polyphony voices. Note, though, that this stacking behaviour (polyphony) only happens if the sound is triggered again before it has reached its end.
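The same setting is exposed to scripts as the max_polyphony property. Below is a minimal sketch, assuming a short sound effect has already been assigned to the node's Stream property in the Inspector; the limit of 4 voices and the use of the “ui_accept” action as a trigger are arbitrary choices for illustration:

```gdscript
extends AudioStreamPlayer

# Assumption: a short sound effect (e.g. a gunshot) is assigned to this
# node's Stream property in the Inspector.

func _ready() -> void:
	# Allow up to 4 overlapping copies of the sound. With the default of 1,
	# each new play() call restarts the sound from the beginning instead.
	max_polyphony = 4

func _process(_delta: float) -> void:
	# Each press of "ui_accept" (Enter/Space by default) triggers the sound.
	# If the previous copy is still playing, the new one is layered on top
	# of it instead of cutting it off.
	if Input.is_action_just_pressed("ui_accept"):
		play()
```

Pressing the key repeatedly while the sound is still playing should stack new copies on top of each other; with max_polyphony left at 1, each press restarts the sound instead.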

Note: Do not set this value higher than needed, as it can impact performance. This will most likely only happen when multiple AudioStreamPlayer nodes with high polyphony values play at the same time, but it is better to exercise some caution.

The AudioStreamPlayer node now also shows polyphony at work in the Inspector, representing each trigger as a vertical bar. (I used an AnimationPlayer node to trigger the audio sample.)

New audio stream importer window

Godot 4.0 has a new window titled "Audio Stream Importer". This window displays the selected audio file as a visual waveform.

The playhead can be scrubbed back and forth to audition any part of the file, and the waveform can be zoomed in and out.
The window also provides a checkbox to enable looping, an offset setting, and music playback settings: BPM (beats per minute) and time signature.

The music playback settings don't seem to do much for now. Apparently, they were added in this release in preparation for future interactive music support.

Image: the new Audio Stream Importer window in Godot 4.0

For now, the only relevant settings in this window are the loop option and the offset. The latter sets the position, measured from the start of the file, where playback resumes each time the audio loops. This option only appears to work when looping is enabled, and it seems to achieve the same result as the "Loop Offset" setting in the Import dock. It can be used to fix a bad loop point in a music track or to limit the loop to a smaller portion of the audio file.
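If you prefer to configure looping at runtime, the imported stream exposes the same options as the loop and loop_offset properties. The sketch below assumes an Ogg Vorbis file at the hypothetical path res://music/track.ogg and an arbitrary 4-second offset; AudioStreamMP3 exposes the same two properties:

```gdscript
extends AudioStreamPlayer

# Hypothetical path: replace it with a music file from your own project.
const TRACK_PATH := "res://music/track.ogg"

func _ready() -> void:
	var track := load(TRACK_PATH) as AudioStreamOggVorbis
	if track == null:
		push_error("Could not load an Ogg Vorbis stream from %s" % TRACK_PATH)
		return
	track.loop = true        # same as the loop checkbox in the importer
	track.loop_offset = 4.0  # after each loop, resume 4 seconds into the file
	stream = track
	play()
```

Because load() returns the cached resource, these changes affect every player that uses the same file; calling duplicate() on the stream first would keep the settings local to this node.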

I'll add my opinion here: this option would be more useful if an “end” offset were provided as well, and if it also worked for non-looping files. However, it appears that something related to the audio formats themselves is preventing the addition of such options.

As a side note: You won't need to use any of those offset options when using a Blips.fm music pack :)

This window is only available for the compressed file formats (Ogg Vorbis & MP3) and can be accessed by double-clicking one of these files or by choosing “Advanced…” in the Import dock after selecting a file.

Text-To-Speech

The new text-to-speech functionality lets you… you guessed it! Turn text into speech using speech synthesis. This is a great addition that can make your project more accessible, and it can also be an easy way to narrate events in your game.

The text-to-speech functionality is accessed through code, via methods of the DisplayServer singleton. Below is a very simple script I wrote in GDScript to test it:

GDScript

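extends Node

# Voices installed on the system (the array is empty if none are available)
var voices: Array[Dictionary] = DisplayServer.tts_get_voices()
var voice_id: String = ""
var volume: int = 100  # 0 to 100
var pitch: float = 1.0  # 0.0 to 2.0
var speech_rate: float = 1.0  # 0.1 to 10.0 (1.0 is a normal speaking rate)

func _ready() -> void:
	# Guard against systems with no voices instead of crashing on voices[0]
	if voices.is_empty():
		push_warning("No text-to-speech voices available on this system.")
	else:
		voice_id = voices[0]["id"]

func speak() -> void:
	if voice_id.is_empty():
		return
	var message := "Hello, World!"
	DisplayServer.tts_speak(message, voice_id, volume, pitch, speech_rate)

func stop() -> void:
	DisplayServer.tts_stop()

func _process(_delta: float) -> void:
	# "ui_accept" starts the speech, "ui_cancel" stops it
	if Input.is_action_just_pressed("ui_accept"):
		speak()
	if Input.is_action_just_pressed("ui_cancel"):
		stop()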

You can add this code snippet to a new script and attach it to any node in your scene to test this feature for yourself.

In this script, I used the “ui_accept” action from the Input Map to start the speech (the Enter and Space keys by default) and “ui_cancel” to stop it (the Escape key by default).
The volume variable sets the speech volume from 0 to 100.
The pitch variable sets the pitch of the voice between 0.0 and 2.0.
The speech_rate variable sets the rate of the speech with values between 0.1 and 10.0 (1.0 being a normal speaking rate).
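Instead of always taking the first voice in the list, a script can look for a voice that matches a given language with tts_get_voices_for_language(). The sketch below uses a hypothetical helper named pick_voice() and arbitrary volume, pitch, and rate values; the final true argument asks tts_speak() to interrupt any speech already in progress:

```gdscript
extends Node

# Hypothetical helper: returns the first voice matching a language code
# such as "en", falling back to the first installed voice otherwise.
func pick_voice(language: String) -> String:
	var matching: PackedStringArray = DisplayServer.tts_get_voices_for_language(language)
	if not matching.is_empty():
		return matching[0]
	var all_voices: Array[Dictionary] = DisplayServer.tts_get_voices()
	if not all_voices.is_empty():
		return all_voices[0]["id"]
	return ""  # no voices installed at all

func _ready() -> void:
	var voice_id := pick_voice("en")
	if voice_id.is_empty():
		push_warning("No text-to-speech voices available.")
		return
	# Arguments: text, voice, volume (0-100), pitch (0.0-2.0),
	# rate (0.1-10.0), utterance_id, interrupt
	DisplayServer.tts_speak("Welcome back!", voice_id, 80, 1.0, 1.2, 0, true)
```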

The documentation mentions that Godot relies on system libraries for the text-to-speech functionality to work. For Windows and macOS, the documentation states that these libraries are installed by default. However, I was only able to make this script work out of the box on a Windows machine.

On macOS, no voices appeared to be available when inspecting the result of the tts_get_voices() method. I also tried installing additional and optimized voices through the macOS System Preferences, to no avail. This could be due to an incompatibility with my current macOS version, since I'm not running one of the latest releases. Unfortunately, I wasn't able to confirm this, as documentation for these new features is still a bit scarce at the moment.

I advise you to test this functionality on your system before relying on it in your project. Linux users, on the other hand, will probably need to install text-to-speech libraries manually.
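A quick way to run that test is a throwaway script that lists every voice the operating system exposes to Godot. This is a minimal diagnostic sketch; it relies only on tts_get_voices(), whose dictionaries contain "name", "id", and "language" entries:

```gdscript
extends Node

# Prints every text-to-speech voice visible to Godot on this machine,
# so you can check availability before building features on top of it.
func _ready() -> void:
	var voices: Array[Dictionary] = DisplayServer.tts_get_voices()
	print("OS: %s, voices found: %d" % [OS.get_name(), voices.size()])
	for voice in voices:
		# Each dictionary describes one voice
		print("- %s (%s) id=%s" % [voice.get("name", "?"), voice.get("language", "?"), voice.get("id", "?")])
	if voices.is_empty():
		push_warning("No voices available: check the system's text-to-speech libraries.")
```

An empty list, as in my macOS test, means speech calls will have no voice to use on that machine.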

As a final note, when using text-to-speech for accessibility purposes, Godot recommends the following best practices:

Develop the game with text-to-speech enabled and ensure that everything sounds correct.
Allow players to control which voice to use and save/persist that selection across game sessions.
Allow players to control the speech rate and save/persist that selection across game sessions.

This provides blind players with the most flexibility and comfort available when not using a screen reader, and minimizes the chance of frustrating and alienating them.
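The second and third recommendations can be covered with Godot's ConfigFile class. Below is a minimal sketch, assuming a hypothetical settings file at user://tts_settings.cfg and hypothetical set_voice() and set_speech_rate() functions that an options menu would call; it is one possible approach, not the one prescribed by the documentation:

```gdscript
extends Node

# Hypothetical settings file inside the user:// data folder
const SETTINGS_PATH := "user://tts_settings.cfg"

var voice_id: String = ""
var speech_rate: float = 1.0

func _ready() -> void:
	load_tts_settings()

func load_tts_settings() -> void:
	var config := ConfigFile.new()
	if config.load(SETTINGS_PATH) != OK:
		# First launch (or unreadable file): fall back to the first voice
		var voices: Array[Dictionary] = DisplayServer.tts_get_voices()
		if not voices.is_empty():
			voice_id = voices[0]["id"]
		return
	voice_id = config.get_value("tts", "voice_id", "")
	speech_rate = clampf(config.get_value("tts", "speech_rate", 1.0), 0.1, 10.0)

func save_tts_settings() -> void:
	var config := ConfigFile.new()
	config.set_value("tts", "voice_id", voice_id)
	config.set_value("tts", "speech_rate", speech_rate)
	var err := config.save(SETTINGS_PATH)
	if err != OK:
		push_error("Could not save TTS settings (error %d)" % err)

# Hypothetical entry points for an options menu
func set_voice(new_voice_id: String) -> void:
	voice_id = new_voice_id
	save_tts_settings()

func set_speech_rate(new_rate: float) -> void:
	speech_rate = clampf(new_rate, 0.1, 10.0)
	save_tts_settings()
```

A saved voice ID may not exist on another machine, so a real game would likely check it against tts_get_voices() after loading.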

For more information about these new audio features, refer to the Godot Engine documentation.

Jack Type

Blips founder, video game music composer & technical sound designer

As a child, I never missed an opportunity to tear apart a toy with some electronics in it and use the parts to create something new. I ended up taking an electronics/programming course in my teens while also developing a deep passion for music. Owing much of this passion to a family-owned nightclub and a brief stint as a DJ, I embarked on a music production journey, and being an avid gamer sparked the desire to be involved in creating the music and sound for games. While continuing to grow both my technical and creative skill sets, I found that video game development fits me like a glove. It allows me to fully apply those skills to an endless number of possibilities.