Setting up Wwise for a spatial live electronic performance.

A short jam with the system running. The audio in the videos has been decoded from ambisonics to binaural. The use of headphones is highly encouraged.

In this blog entry I will explain how I tried to apply the Ambisonic audio workflow with Unreal Engine for a school project, how I failed, and how I used that failure as inspiration for trying out something entirely different.

My original -and later discarded- concept: Using Unreal Engine

I originally envisioned a generative sound piece that would control the movement of visual objects in Unreal Engine, all in real-time. These visual objects would move in a 3D scene and emit audio. The audio would then be encoded to ambisonics and then decoded using IEM plugins in Reaper, to then finally reach the output through the multi-speaker setup at the Audiovisual Media laboratory (AVM) at my University. My main intention was to get familiar with both Unreal Engine 5 in general, its new DSP audio rendering graph called Metasounds, and the possibilities of spatial audio for live performances of experimental music.

The setbacks: Unreal Engine’s limited documentation

Very soon I encountered roadblocks. Playback of ambisonic recordings in Unreal is pretty straightforward, but it seems that the playback of audio objects encoded into ambisonics outside of the engine is not. I read that UE possesses a type of Blueprint called “Soundfield Submix” that is meant for encoding audio objects into different types of soundfield representations, including Oculus and Resonance Binaural, and ambisonics. Also, one of the lead audio programmers at Epic exposes in this talk the concept of “Soundfield Submix Endpoints”, which in conjunction with Soundfield Submixes are able to send ambisonic audio streams to custom hardware devices. 

While this evidence showcases that the backbone of the technology seems to be in place, the documentation is scarce. Not in the official documentation nor the talk a concrete example of use is shown. Additionally to this, every question in regards to the topic at UE forums (like this one) has so far ended up unanswered. Speculation online (from user forums, not from any official account at Epic) seems to suggest that this and other parts of UE5’s audio engine are still buggy and in development, and therefore the team has put the soundfield features aside momentarily while they fix some other parts of the engine that they consider more urgent.

After spending several days researching and asking in numerous forums without an answer, I decided to implement Wwise into the project and try something slightly different. I’ve used Wwise and ambisonics successfully in the past, but I didn’t want to rely on it because then I would need to use samples for my composition, while what I really wanted was to try out Metasound’s synthesis capabilities. Nevertheless, Wwise offers a wide range of possibilities for open-ended composition and is capable of managing ambisonics up to 5th order. My first tests were successful. Thanks to Reaper’s ReaRoute and its ability to transmit up to 16 channels of audio between applications, I was able to transmit a third-order ambisonic stream for decoding: 

3rd order Ambisonic Bus in Wwise

Using ReaRoute as Output

Receiving the Ambisonic stream inside Reaper and distributing it for decoding.

I used IEM’s AIIRA Decoder for outputting ambisonics into the AVM’s custom speaker setup..

Unfortunately, a new roadblock appeared, and once again it related to UE. While Wwise works fine when transmitting spatial audio to external applications by itself, the situation changes when the engine works alongside UE5. Every time I tried to test the audio with Unreal, my Master bus in Wwise stopped recognizing ReaRoute and reverted to stereo.

Audio reverting to stereo when using Wwise alongside Unreal, and ReaRoute disappearing as an endpoint option.

After searching once again through forums and QA posts, I found this thread that appeared to have similar issues to mine. The thread was never officially answered, but the author followed up with a solution that involved engine code modifications in Wwise. I am not a programmer, but I tried to follow anyway, unsurprisingly without success. I also sent a message to Michael G. Wagner, Professor and Department Head of Digital Media at the Antoinette Westphal College of Media Arts & Design, who I follow because of his very useful Spatial Audio tutorials on YouTube. He replied that is possible to write your own Audio Device and make it work with Wwise by creating a script. I realized that the difficulty and scope of the project were way over my time constraints, so I abandoned it.

Exploring Wwise as a tool for live electronic performance

After these setbacks, I reconsidered my approach. I was not gonna be able to complete the project if I kept trying to make unfamiliar tools work the way I imagined to. At the heart of my goal was the exploration of immersive audio and interactivity. I decided to maintain that core while reevaluating the execution and think more about the possibilities of the tools I already had working (Wwise, Reaper, and IEM plugins). 

Wwise works by deciding when and how sound events should be playback by grouping sound files into containers that have certain specific characteristics (e.g. a “Random Container” will choose a sound or container from all the sounds that are contained in it at random, according to certain rules that the user decides; a “blend container” will crossfade between the sounds or containers as specified by an external control parameter, etc). These containers can then be grouped into “Events” that define the final logic and behavior of when and how these containers are chosen to be played by the system. Usually, the game’s code decides when to fire these events, and an analogy to live performance can be easily drew: if the game’s code is the performer, Wwise is the launchpad.

Fortunately for us, Wwise offers the possibility of mapping MIDI messages to events inside of the authoring tool. Unfortunately, the mapping method is very cumbersome when compared to what you can find in other tools that have been designed with live performance in mind (e.g. Ableton Live). Wwise’s MIDI mapping is meant for testing, so the sound designer can easily and freely fire many events at a time without having to execute game code first. This works well when launching simple one-shot events, but doesn’t let us map (or at least I haven’t found a way) other important features that Wwise offer like switch and state changes, or real-time 3D positioning and attenuation simulation. Another disadvantage in comparison with traditional music performance software is the lack of a global clock. BPM can be taken into consideration while designing sound behaviors inside of what is called “Interactive Music Hierarchy”. While I didn’t actually try to map MIDI to BPM in the Music Hierarchy, I always considered the Music Hierarchy limiting, since the tools there focus on the treatment of long pre-rendered audio, and I like the more granular approach of the Actor-Mixer Hierarchy. Clock, positioning and switches were all fundamental operations, so I had to find workarounds.

Sending MIDI

A custom configuration of my MIDI device

After selecting, editing, and implementing my samples in Wwise, I started creating events without logic, and I populated my containers. The behavior of these was in constant iteration and required fine-tuning of settings until the very end of the process. Creating the events first allowed me to start mapping them to MIDI messages from the beginning and do fast testing and iteration while trying out my container behaviors.

Some of the structures of the sample instruments I created. They ended up using several levels of inheritance.

The next step was to hook up my MIDI controller and configure it so it could work as intended with my behavior structure. I chose to work with my Akai MPK Mini for two reasons: first, it offers the ability to customize the entire layout to choose what messages each knob and pad sends individually; and second, it has its own clock that allows for a variety of different arpeggiator modes and speeds. The fact that I was able to tap tempo and change subdivisions on the fly was very important to give my sequencer musicality and variability. 

MIDI mapping in Wwise

After configuring my controller, I started mapping the events I created to different MIDI notes as I found it more comfortable to play.

Selecting musical scales

The next step was to work on the behavior of my pitch content. Originally I planned to make everything chromatic but then I considered the possibility of having two modes: one chromatic and one diatonic, for more variation. This would require the programming of two different states, which as mentioned before were not accessible for MIDI mapping. The solution was to use the Blend Container and crossfade between two Random Containers of notes (one containing all 12 tones of the scale, one containing only those of the C major key), and a Real-Time Parameter Control to go from one container to the other. Since RTPCs do offer the option to be mapped to a specific CC value range, I was able to control this feature easily with a knob.

In this example, the Bass voice of the choir can switch between scale modes thanks to a Blend Container.

The challenge of spatialization

Despite its powerful spatialization engine, Wwise is dependent on a game engine to receive coordinates about the location of sound emitters in space. When no game engine is connected to Wwise, the user can still simulate positioning by using 3D panners or attenuation previews. Although useful for testing and debugging, these tools can only be manipulated via mouse, and as far as I know, no MIDI mapping is offered. This limitation makes real-time spatialization for my specific purposes really challenging, so I needed to find workarounds.

Unfortunately, I didn’t find a way of mapping panning previews to MIDI CC inside of Wwise authoring.

My initial approach was to set my sources to specific points in space as starting points and then program certain default movements. In Wwise, this can be achieved by making use of the “3D Position” section in the “Positioning” tab of a sound or container. From there, when “Emitter with Automation” is selected, a new window can be opened. This window offers the possibility of predefining positions in a simulated 3D space and by defining points in space and time (in a similar fashion as how keyframes work in animation) it is possible to draw a path for the sound to navigate through in a defined timeframe.

The “Positioning” tab in Wwise

The initial position in space of my sources. I organized them with the layout of the AVM in mind.

A spatial automation path for a sound in Wwise

This method offered me a way of spatializing sounds in a more creative way, but I still didn’t like the lack of control. I expected that having my sources moving all the time in predefined paths would end up cheapening the whole experience very fast. To try to counteract this problem I decided to use the same blend container method as before and created another RTPC that I called “Spatial Mode”. This parameter acted as a sort of “on/off” switch for the moving paths, so I could at least decide when to start and stop the movement of sources through space. I achieved this by populating the container with two versions of each sub-container: one that only had initial positioning data (indicated by the suffix “Sectional”) and another that contained movement automation paths (suffix “spread”).

My second RTPC blend control for turning on or off

Exploring spatial granularity

Building upon the work done until then,  I adventured a bit further and tried to figure out less obvious solutions for spatialization. The path automation window lets you store up to 64 paths in a list. Every time an event calls the object that contains the automation, a new single path in the list is chosen, and the user can decide if the event cycles through the list sequentially or at random. This gave me the idea to explore the possibilities of spatial “granularity”. I wondered if an illusion of fluent movement in space could be made out of many static positions being fired rapidly. I decided to test out this with percussion sounds because of their brevity. I filled up the list of some of my percussion samples with static positions that together made for individual steps of simple shapes like circles and 8-figures, or that moved in parables through the X and Y axis. The clock of my MIDI controller is capable of firing up to 1/32T notes at 280 BPM. The results were positive and I was able to control the speed of the movement by changing the clock of my arpeggiator or by manually firing MIDI notes. This capability gave my set more unpredictability and dynamism, and although admittedly limited, it allowed me to have some agency over the spatial component of the performance. I would certainly like to make more experiments with this and other concepts in the future since I believe I only touched the surface of what Wwise could offer for performative spatialization.

These three pictures show three different static path points on my kick sample, which moves only circa 6º at a time in a circle each time the “playOneShot_drum01” event is fired by MIDI.

Effect controls 

After deciding on the spatial behavior, I started working on effect groups. Since I was running out of time I didn’t go too deep, but I manage to map two effects of my drum group to knobs in my MIDI controller: Low-pass filtering and Pitch transposition. I was especially surprised about how well Wwise manages pitch shifting in real-time,  in a range of +/- 24 semitones. I believe both parameters gave my sequencer an extra layer of expression, but I would like to explore more in the future.

The four parameters I mapped to my MIDI controller.

Conclusions

This project taught me many lessons about spatial audio, but also live electronic performance. It was a bit disappointing to see how badly documented Unreal Engine is in the soundfield department and I hope the team at Epic can work on use examples in the near future. 

Ambisonic audio in Wwise worked very well for this specific setup but I think it would be nice to see the possibility of using VST plugins in audio buses in the future. I would prefer to decode the Ambisonic stream into a custom speaker setup directly inside of Wwise, instead of sending that stream to Reaper to be then decoded with IEM or Sparta plugins. As much as I love Reaper, adding extra software only for playback adds a level of complexity that could easily be avoided if Wwise worked with third-party plugins as a regular DAW. This is especially annoying in Windows, where specialized (and often temperamental) third-party software is required to stream audio from application to application. 

On the other hand, I was pleasantly surprised with Wwise and how it works with MIDI input in its authoring environment and I see great potential for creative uses that go beyond game audio implementation. That being said, I believe this potential is cut short by design. I had experience with Wwise before, mainly working on small game projects, and this surely helped me to find alternative solutions to many cumbersome problems. As the popularity of the tool (and the relevancy of interactive audio in general) increases, I hope the team at Audiokinetic realizes the potential and starts to offer more functionalities and better workflows for other types of interactive audio experiences. This exercise showed me that there is a multitude of things that Wwise does much better than established tools like Ableton Live or Logic Pro, but until better workflows are not implemented, that potential will never be realized.

Note: The reader can watch and listen to more examples of the sequencer in action by following this link. The audio in the videos has been decoded from ambisonics to binaural. The use of headphones is highly encouraged.

Previous
Previous

Para Ernesto (original poem in Spanish)

Next
Next

Synthpaint - Devlog 02 - Origins and First prototype