REAL TIME PROCEDURAL WIND SOUNDSCAPE
The effect of procedural wind soundscape on navigation in virtual 3D space
Bachelor Degree Project in MEB: Media Arts, Aesthetics and Narration
30 ECTS
Spring term 2014
Jóhannes Gunnar Þorsteinsson
Examiner: Anders Sjölin / Lars Bröndum
Abstract
Sound design using procedurally generated sound in video games has seen a rise in the last few years, as the method gives greater freedom in how sound reacts in real time to the game and the player. This research looks into whether there is any difference in how procedural sound, in this case procedurally generated wind, affects the navigation of players in a three dimensional world, as opposed to static sample based sound design.
Keywords: procedural sound, navigation
Table of Contents
1 Introduction
2 Background
2.1 Procedural Audio
2.2 Sample Based Audio
2.3 Pure Data
2.4 Open Sound Control
2.5 Game Examples
2.6 Wind
3 Problem
3.1 Method
3.1.1 Template Level
3.1.2 Level A, Sample based wind
3.1.3 Level B, Procedural Wind Soundscape
3.1.4 Gathering Data
3.1.5 Processing data
4 Implementation
4.1 Progression
4.1.1 The Sound Engine
4.1.2 First Level Design, The Natural
4.1.3 Second Level Design, The Urban
4.1.4 Third Level Design, Natural but orderly
4.2 Pilot Study
4.2.1 Results from pilot study
5 Evaluation
5.1 The Study
5.2 Analysis
5.3 Conclusions
6 Concluding Remarks
6.1 Summary
6.2 Discussion
6.3 Future Work
References
1 Introduction
This research explores the possible practical applications of procedurally generated sound for wind soundscapes, along with its effect on in-game navigation for the end user as opposed to sample based sound design. What we normally hear when we play video games or watch movies is sample based sound: the playback of never changing, pre-recorded (and pre-designed) sound effects. Game design inherited this from cinema, where it works quite nicely, as sounds do not need to be generated in real time and never need to vary, there being no interactivity. Because of the nature of the medium, movies remain virtually the same each time you watch them; interactive art like games, on the other hand, does not.
Sample based sounds in games have worked out quite well in the past, but we are starting to hit a wall concerning the processing and storage cost of those samples as games grow. Sample based sound banks do not scale very gracefully: each sample has a fixed cost, always taking the same amount of memory and processing power. This works well for products with small sound banks, but when the number of sound effects grows into the hundreds, thousands, or more, things get hard to manage. Procedural sound, on the other hand, has a variable cost per sound and therefore scales upwards much more gracefully.
To implement a procedural wind soundscape in a video game, this research uses a visual programming language called Pure Data (Puckette et al, 2013). It allows you to program sounds, music, video or even games, to name a few examples, although its greatest strength lies in audio and digital signal processing. Pure Data is just one part of the puzzle, though: to connect a game engine and a Pure Data sound engine together, Open Sound Control (Freed & Wright, 2009), more commonly known as OSC, is used to handle communication between the two engines.
In essence, wind is at its core simply noise, but the main sonic character of wind comes from vortices that are created when wind hits obstacles in its path, which can create either cavity tones or aeolian tones. Cavity tones are generated in hollows and cavities, while the opposite accounts for aeolian tones, for example when a stick is swung around (Dobashi et al, 2003).
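As a rough illustration of the aeolian case, the frequency of the tone shed by wind passing a cylinder (a stick, a wire) can be estimated from the Strouhal relation f ≈ St · v / d, with St ≈ 0.2 holding over a wide range of conditions. The sketch below is ours, not taken from the cited papers; the example figures are illustrative:

```python
def aeolian_tone_hz(wind_speed_m_s, diameter_m, strouhal=0.2):
    """Estimate the aeolian tone frequency for wind passing a cylinder.

    Uses the Strouhal relation f = St * v / d; St ~= 0.2 is a common
    approximation across a wide range of Reynolds numbers. The example
    values below are illustrative assumptions.
    """
    return strouhal * wind_speed_m_s / diameter_m

# a 10 m/s wind past a 2 mm wire "sings" at roughly 1 kHz:
f = aeolian_tone_hz(10.0, 0.002)
```

Doubling the wind speed doubles the pitch, which is why wires and branches audibly rise in tone as a gust passes.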
Does procedural wind soundscape affect the way people navigate a 3D game world, and to what extent? This research subjects players to a video game controlled from a first person perspective, where the objective is to collect a number of objects in a nearly open world. In the background, the game collects location data about the player in the 3D world for the research.
Half of the participants played a version of the level where a sample based wind sound is used for the main soundscape, while the other half played a level with procedurally generated wind sounds. As an example, in the latter level the soundscape changes dramatically as a low pass filter is applied to the master output when a player finds shelter from the wind. Wind does not directly affect gameplay or game mechanics at all. The underlying wind sound engine was developed in Pure Data (Puckette et al, 2013) and implemented in Unity (Unity Technologies, 2005) via Open Sound Control messages (Freed & Wright, 2009).
2 Background
2.1 Procedural Audio
Procedural audio is the act of creating and manipulating sound in real time with the help of algorithms and related tools. Sample based audio differs in that the sound is simply played back, like a track on a CD. Andy Farnell covers synthetic audio and sample based audio quite thoroughly in his book Designing Sound (2010). Those who were raised in the 80's and 90's (and perhaps even earlier) remember the sounds that the old video game consoles and computers created. Each console had a different feel to its audio because, instead of playing back samples, these consoles had their own synthesiser chips built in that produced sound effects and music in real time. When the progress of synthetic audio hit a wall in the 90's, sample based technology took a leap and won the race in interactive media because of its perceived realism; after that, synthesised sound was simply discarded (Farnell, 2010). Even modern games made in the style of games from the late 80's most often emulate the synthetic sounds with sample recordings rather than making use of real time sound synthesis.
2.2 Sample Based Audio
“A recording captures the digital signal of a single instance of a sound, but not its behaviour”.
(Farnell, 2010)
This is why real time generated synthetic audio can be such an interesting fit for game audio.
Games are built on interactivity. Game characters, objects, and even visuals in general are affected by player behaviour; sample based sound, on the other hand, is simply triggered.
Farnell (2010) compares sample based sound to the series of static 3D renders used to construct the game Myst (Cyan, 1993). In Myst, a player navigates a 3D world in first person view via pre-rendered static frames. By clicking on locations in view, the game loads a new frame rendered from the new location, giving the impression that the player has travelled the corresponding distance. Various filters and effects can be added to sample based sounds, but that only mitigates the lack of interactivity; the problem of the space needed for the samples still remains.
Andy Farnell (2007) mentions in his paper Synthetic game audio with Pure Data that one of the main reasons a switch to procedural/synthetic audio might be a good idea is the sheer number of samples needed to create realistic audio in modern games. Consider footsteps alone: a good number of footstep sounds is needed for each character, for every single surface the character walks on, and sometimes also varying with the weight of gear or the style of running or walking (Farnell, 2007).
When it comes to sampled wind we are left with several sound sources, maybe 10-30 seconds long each, scattered over the virtual game area, with big volume attenuation zones telling the game where they can and cannot be heard. They are then blended with other sound sources of various lengths to make it less obvious to the user that they are looped. For extra detail, a couple of sound sources for smaller sounds, like rustling leaves or trash being dragged along the scene by the wind, are added, but they are usually not connected to any actual objects in the game. When you hear the wind pick up speed you do not see any difference in the game world; those two elements are completely disconnected. The virtual visual world and the audible world are two separate worlds, not really communicating with each other.
You could in theory implement wind sound with samples that react to changes in the game world by using clever combinations of samples, along with the various filters and reverb effects that many game engines have built in. But a single waveform loop of wind sound in our example can take between 500 and 1000 kilobytes; multiply that by the number of samples needed and it becomes clear how expensive it would be. With procedural audio, on the other hand, the wind synth created for this research will most likely remain around 500 kilobytes. This number does not multiply with the number of sounds played, as they are all generated in real time by the same 500 kilobytes of code. Procedural audio therefore has a relatively high start cost, but barely increases as it scales, unlike sample based audio.
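The scaling argument above can be made concrete with a little arithmetic. The per-loop figure of 750 kB below is simply the midpoint of the 500-1000 kB range mentioned in the text, and the function names are our own:

```python
# Rough cost comparison using the figures from the text (illustrative only):
# each sampled wind loop assumed ~750 kB; the procedural synth a fixed
# ~500 kB of code regardless of how many sound variants it produces.

def sample_cost_kb(n_sounds, kb_per_loop=750):
    # sample banks grow linearly: every distinct sound is stored
    return n_sounds * kb_per_loop

def procedural_cost_kb(n_sounds, engine_kb=500):
    # fixed cost: variants are generated at runtime, not stored
    return engine_kb

for n in (1, 10, 100):
    print(n, sample_cost_kb(n), procedural_cost_kb(n))
```

At one sound the sample bank is cheaper; at a hundred sounds it costs 75 MB against the synth's unchanged 500 kB.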
2.3 Pure Data
Pure Data (Puckette et al, 2013), often known as Pd, is an open source visual programming language used by musicians, visual artists, performers, researchers and developers to create software graphically in real time without writing textual code. Pd was started by Miller Puckette, who also developed the similar (but commercial) software Max (Cycling '74, 2014) prior to starting work on Pd. Johannes Kreidler describes Pure Data in precise terms as a
“real-time graphical programming environment for audio processing” (Kreidler, 2009).
Figure 1 Example of a part of a wind synth patch in Pure Data
Although Pure Data (Puckette et al, 2013) and Max (Cycling '74, 2014) are relatively similar products, the main difference lies in the fact that Max is commercial and proprietary while Pure Data is open source, and therefore a lot easier to customize for each use case, for example by embedding it into a game engine, something that is theoretically possible with Max but in practice too much hassle both technically and legally. Because of this difference in licensing, this research uses Pure Data rather than Max.
2.4 Open Sound Control
The simplest method of implementing audio in the Unity engine (Unity Technologies, 2005) would be to use Open Sound Control (Freed & Wright, 2009), or OSC for short. Using a library called libpd (Pure Data community & libpd contributors, 2010), which allows Pure Data to be embedded directly into games and other applications, is also a possibility; embedding is favourable for publishing, but trickier to implement. OSC simplifies the process considerably at the cost of that embedding option: it simply acts as a method for sending messages between the Pure Data (Puckette et al, 2013) patch and the game engine. The Pure Data patch has to be launched separately, which makes the approach unsuitable for published games but perfect for audio prototyping, and a decent method for this specific research.
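To give a feel for what travels over the wire, the sketch below hand-encodes a minimal OSC message in Python. The binary layout (null-padded strings, a comma-prefixed type tag string, big-endian 32-bit floats) follows the OSC 1.0 specification; the address name /player/position is a hypothetical example of ours, not the address used in the study:

```python
import struct

def _pad(b: bytes) -> bytes:
    # OSC strings are null-terminated and padded to a multiple of 4 bytes
    b += b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address: str, *args: float) -> bytes:
    """Encode a minimal OSC message carrying float32 arguments only."""
    msg = _pad(address.encode("ascii"))
    msg += _pad(("," + "f" * len(args)).encode("ascii"))  # type tag string
    for a in args:
        msg += struct.pack(">f", a)  # big-endian 32-bit float
    return msg

# e.g. the game engine could send the player position every frame;
# the packet would then go out over UDP to the Pure Data patch:
packet = osc_message("/player/position", 12.5, 3.0)
```

In practice a library such as libpd or an existing OSC implementation would handle this encoding, but the format is simple enough that a game-side sender fits in a few lines.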
2.5 Game Examples
A good example of using Pure Data (Puckette et al, 2013) in games is Maxis' work on embedding Pure Data in their game Spore (Maxis, 2008) as an audio engine to drive audio patches (Brinkmann & Kirn, 2011). This allowed them to generate procedural audio for the monsters, for example, and also allowed Brian Eno and others to easily write procedurally generated music for the game (Kosak, 2008). According to Andy Farnell (2010), Sony might even embed Pd in their future game console designs. Another example is the game Fract OSC by Phosfiend Systems (2014), which is built with the Unity engine (Unity Technologies, 2005). Fract OSC is a first person game set in a synthesizer world where the player needs to solve puzzles and affect the game world by operating large machine synths and sequencers. That kind of sound design would be extremely difficult and expensive to create with the sample based method.
2.6 Wind
The main reason for choosing wind sound for this experiment, as opposed to any other sound, was the simplicity of the core sound behind wind. At its core, wind sound is simply noise that is then shaped and adjusted until favourable results are achieved.
In van den Doel's experiment in real time rendering of aerodynamic sound, it is mentioned that their audio algorithms are not derived from first principles of physical law; the laws governing the sound of wind are so complex that such a simulation would simply be impossible (Doel et al, 2001). Therefore this wind sound generator is based on the work showcased in Farnell's book Designing Sound (2010), which breaks the sound up into easy to understand components. At the core there is the wind speed and the generic wind sound, created with a simple white noise generator plus various filters. Other sounds to focus on are the gusts and the squall, along with the high frequency whistling. Then there are the howling of pipes and other cavity tones, along with the rustling of leaves in the trees.
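The core voice described above, white noise through a speed-controlled filter, can be sketched in a few lines. This is a minimal illustration of the technique, not Farnell's actual Pd patch; the mapping from wind speed to cutoff frequency (200-2000 Hz) is our own illustrative assumption:

```python
import math
import random

def wind_block(n_samples, wind_speed, sample_rate=44100, seed=None):
    """Generate one block of the generic wind voice: white noise fed
    through a one-pole low pass filter whose cutoff tracks wind speed
    (0.0 = calm, 1.0 = storm). Sketch only; the 200-2000 Hz cutoff
    range is an assumption, not from the cited sources.
    """
    rng = random.Random(seed)
    cutoff = 200.0 + 1800.0 * max(0.0, min(1.0, wind_speed))  # Hz
    # one-pole low pass coefficient for the given cutoff frequency
    a = math.exp(-2.0 * math.pi * cutoff / sample_rate)
    out, y = [], 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)   # white noise source
        y = (1.0 - a) * x + a * y    # smooth it: higher speed, brighter sound
        out.append(y)
    return out

block = wind_block(1024, wind_speed=0.5, seed=1)
```

In the actual engine the same idea is realised as a noise~ object feeding filter objects in Pure Data, with the gusts, squalls and whistles layered as further shaped copies of the same noise source.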
Another thing to keep in mind is that wind actually travels from one place to another over a certain period of time, so if the wind is travelling from left to right, your left ear will pick up the rise in wind intensity before your right ear. To achieve this effect, the location and direction data from the game can be passed to the sound engine, which decides when and how much to filter the left or right channel (Farnell, 2007).
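A rough sketch of that inter-ear timing: given the direction the wind comes from and the direction the listener faces, the downwind ear hears a rise in intensity slightly later. The head width and wind speed figures, and the sign convention, are illustrative assumptions of ours:

```python
import math

def channel_lag_seconds(wind_from_deg, facing_deg,
                        head_width_m=0.2, wind_speed_m_s=8.0):
    """Return (left_lag, right_lag) in seconds; the leading ear has lag 0.

    Sketch only: head width and wind speed are assumed figures, and we
    assume a positive relative angle means the wind arrives from the
    listener's right, so the left channel lags.
    """
    # angle of the wind source relative to where the listener is facing
    rel = math.radians(wind_from_deg - facing_deg)
    # time for the wind front to cross the inter-ear distance
    lag = abs(head_width_m * math.sin(rel)) / wind_speed_m_s
    if math.sin(rel) > 0:
        return (lag, 0.0)   # wind from the right: left ear lags
    return (0.0, lag)       # wind from the left (or dead ahead): right ear lags
```

The lags involved are tens of milliseconds, which is why the effect is implemented as a slow intensity envelope offset between channels rather than a per-sample delay.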
To connect this all together with the game world, various filters along with volume controls are connected to each node of the synth, allowing the game engine to feed the position of the player into it. When the player is in cover, for example, the sound gets dampened.
3 Problem
The main hypothesis is the following: with the help of a procedural wind soundscape, players will traverse the 3D world differently, seeking cover in areas with less wind even if that wind has no direct effect or consequences on gameplay or game mechanics. From this hypothesis we can construct the main research question:
Does procedural wind soundscape affect the way people navigate a 3D world, and to what extent?
Procedurally generated and computed audio is slowly gaining traction in the sound design field, and a good deal of research has appeared looking into and analysing the strengths and weaknesses of this method of sound creation; see Synthetic Game Audio with Pure Data (Farnell, 2007) and Foley Automatic (Doel et al, 2001), to name a few. There has also been research on the generation of ambience sounds, like Sounding Liquids: Automatic Sound Synthesis from Fluid Simulation (Pamplona et al, 2009), which makes a good foundation for looking into the effect such sound has on players. Only a limited amount of research has looked into the effect of generated sound on the player, which makes this a good starting point for that kind of research.
3.1 Method
Sample collection used purposive sampling to make sure that no one group is over represented, for example avoiding excessive numbers of males in the age range of 20-30. With this, the research hopes to fill certain quotas of age groups, genders and so on, while inside each group participants are selected randomly with probability sampling (Dawson, 2007).
Given that technology and equipment are an important factor in this research, a large sample size of players is not really an option. One way to solve that would be to spread the research game over the internet to a lot of people and gather the results that way, but discrepancies in gear could contaminate the results: some people only have speakers that cannot reproduce the sounds properly, while others might experience lag, low frame rates or other technical difficulties. It is therefore important to run the test on the same equipment for each participant. This limited the sample size considerably, along with the fact that the population is extremely scarce where the research was conducted.
Participants in the experiment were subjected to playing a short and basic open world video game where they needed to collect all objects to finish the level. Two versions of this level were made and participants were split into two groups where one group played level A while the other played level B. Level A features sample based wind sounds while level B uses procedural wind sounds. Navigational data is collected from the playthroughs and the objective of this experiment is to compare the differences in data between those two groups.
Along with the collection of player movement coordinates, each play session was recorded using standard screen recording software. The focus of this research is on the quantitative data, but it is kept in mind that not everything can be planned for and expected. Therefore, grounded theory methodology (Dawson, 2007) is used in combination with the quantitative method. The qualitative screen recording data is thought of as extra data, in case something unexpected is discovered during the course of the research. Given that this is unknown territory, there is no set rule on how, or if, this data will be processed or analysed; it was simply collected from all participants in the experiment.
Participants were able to quit at any time, since this research is not looking into how well people navigate but rather how they navigate; quitting early therefore does not invalidate the data points collected.
3.1.1 Template Level
A template level was constructed as the foundation for levels A and B (covered in detail below). The level is built with Unity Technologies' Unity engine (2005) as a square landscape featuring open areas and valleys for the player to navigate through. Three objects are placed irregularly throughout the level; the objective for the player is to collect all of them, after which the game ends. To encourage exploration, and to keep players from learning the simple scene by heart and therefore taking the shortest paths, fog is used to partially obstruct the view.
The number of objects was chosen simply to allow enough diverse navigational data to be collected while keeping the play session's length to a minimum. One object would make for rather linear gameplay or, in the worst case, a long walk around the game world without any perceivable progression for the player. Two objects would only allow for two possible routes, but with three there are six possible routes to be taken. The size of the level was decided by the rule that each playthrough should take approximately 5 minutes on average, to avoid boring the players while still collecting a sufficient amount of data.
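The route counts above are simply the permutations of the visiting order, n! for n objects, which is easy to verify:

```python
from itertools import permutations

# With three collectable objects A, B, C there are 3! = 6 possible
# orders in which a player can visit them:
routes = list(permutations(["A", "B", "C"]))
for r in routes:
    print("-".join(r))
# With only two objects there would be just 2! = 2 orders.
```

Adding a fourth object would jump the count to 24 routes, at the cost of longer play sessions, which is why three was a reasonable middle ground.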
Given that this research looks into how people navigate rather than how well they navigate, the objects were added simply as a goal to drive the player to progress through the game session. The number of objects collected, and the time each participant takes, are of no interest to this research, as the wind sounds are not designed to lead the player towards these goals. A non-linear, square, open map was also chosen to allow more freedom of choice in navigation.
3.1.2 Level A, Sample based wind
The wind soundscape used in this scene is constructed out of several samples of real and pre-designed wind recordings, placed accordingly around the level. The sounds do not react to changes in the scene; this version focuses on fidelity rather than interactivity.
3.1.3 Level B, Procedural Wind Soundscape
In level B, the Pure Data (Puckette et al, 2013) wind sound engine comes into play, using OSC (Freed & Wright, 2009) to let it talk to the game engine, and vice versa.
Figure 2 An illustration showing the flow of data between the game engine in light gray and the audio engine in black.
In areas where the player has cover from the wind, according to the current wind direction, a low pass filter is applied at the end of the sound engine pipeline. This silences the higher frequencies, softening and damping the wind sound considerably and pushing it more into the background.
3.1.4 Gathering Data
Quantitative data was gathered at a timed interval of one second: the X and Y coordinates of the player in the game world. Each dataset also features an identifier indicating whether the data comes from the sample based experiment level or the reactive wind one.
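A once-per-second log of this kind might look like the following sketch. The field names and CSV layout are illustrative assumptions of ours, not the exact format used by the study's metrics tool:

```python
import csv
import io

def log_sample(writer, t, x, y, level_id):
    """Append one once-per-second sample: elapsed time, player X/Y in
    the game world, and which level variant the data came from
    (A = sample based, B = procedural). Field names are assumptions.
    """
    writer.writerow([t, round(x, 2), round(y, 2), level_id])

buf = io.StringIO()  # stands in for the log file on disk
w = csv.writer(buf)
w.writerow(["t_seconds", "x", "y", "level"])
log_sample(w, 0, 12.0, 48.5, "B")
log_sample(w, 1, 12.4, 47.9, "B")
```

One row per second keeps a five-minute session to roughly 300 rows per participant, small enough to plot every playthrough as a path over the level map during analysis.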
Biometrics were not measured, simply because that is not within a realistic scope for this research. That said, comparing biometric data between the two scenarios could give some interesting results and is a worthy idea for future research in the field.
A metrics tool of this sort only gives us very raw numbers from the game, nothing about the person, their feelings, social factors and such. That is not within the scope of the project, although, to be on the safe side, subjects were asked a short close-ended questionnaire with the usual questions of age, gender and perceived time spent on games each week, to roughly estimate their gaming experience (Tychsen, 2008).
This is the quantitative data, but the research also gathered some qualitative data, mixing grounded theory methodology in with the chosen quantitative methodology. This data was not planned to be of any particular use, but was collected in case a discovery made halfway through the research should require it. This allowed for the