Artificial Reverb vs. Real Recorded Reverb in the Back Channels in a 5.1 Surround Setup

Adrian Emilsson

Audio Technology, bachelor's level 2018

Luleå University of Technology

Department of Arts, Communication and Education

Abstract

When recording music for surround, audio engineers sometimes face limitations such as a lack of time, unavailable ideal microphone positions or a noisy audience. If these cannot be dealt with at the location, artificial reverbs are often used in the mix to “fill in the blanks”. In this study, three instruments were recorded separately with two 5.1 surround microphone setups. Two audio engineering students created artificial reverbs that replaced the back channels of each microphone setup. A listening test was conducted in which test subjects compared the real 5.1 recording to the two other stimuli with artificial back channels in terms of realism, envelopment and preference. The results showed that the real recording and the artificial back channels were interchangeable, but that the artificial back channels pointed towards more envelopment and the real recording pointed towards more realism.

1. Introduction
1.1 History of surround sound
1.2 Recording for 5.1
1.3 Reverberation
1.4 Realism in artificial reverberation
1.5 Envelopment in the surround sound format
1.6 Applications of artificial reverbs in the surround sound format
1.7 Research question
1.8 Purpose
2. Method
2.1 Method overview
2.2 Attributes and preference rating
2.3 Stimuli
2.3.1 Choosing instruments and type of music
2.3.2 Recording stimuli
2.3.3 Creating the artificial back channels
2.3.4 Normalizing the back channels
2.3.5 Back-channel bias
2.3.6 Finalizing the stimuli
2.4 Listening test
2.5 Test subjects
3. Results
3.1 Average test scores, T-tests and p-values
3.2 Comments by the test subjects about the test
3.2.1 Comments regarding the instruments
3.2.2 Comments regarding the attributes
4. Analysis
4.1 T-test instrument analysis
4.2 T-test attribute analysis
4.3 Test subject comment analysis
4.4 Artificial vs real recorded reverberation analysis
5. Discussions and Conclusions
6. Future applications and further studies
7. Bibliography

1. Introduction

In the evaluation of surround sound, the primary interest is usually the sound source and how it fills the space. Berg and Rumsey (2003) found attributes that are usable when evaluating surround sound. For example, naturalness, presence, ensemble width, localization, source distance and envelopment could all be defined and understood in the surround sound realm. The majority of these attributes evaluate the audio in the front channels, neglecting what is in the back. This could mean that the back-channel audio is considered less relevant than the front-channel audio. This pattern can be found in other audio fields too. A recording engineer might, when faced with time, venue or equipment constraints, not record back-channel audio, presuming that it can be created artificially later with the same result. Would there be a notable difference between the recorded and the artificial audio? In a study by King, Leonard, Howie and Kelly (2017), realism and immersion in surround were investigated using both real recorded reverberation and artificial reverberation. King et al. used a 9.1 surround setup, that is, a 5.1 surround setup with four additional height channels, and only switched between the real and artificial audio in the height channels. They showed that there was no perceived difference between real and artificial height channels in terms of realism or immersion. The study by King et al. can be simplified by using a 5.1 surround system, with the benefit of a potentially greater perceivable difference between the real and artificial audio. Two of the validated attributes in the study by Berg and Rumsey (2003) that include the back channels to a greater extent, envelopment and naturalness, almost match the attributes in the study by King et al. (2017). In this study, however, the attributes in focus are realism and envelopment.

Background

1.1 History of surround sound

“Fantasound” was the forerunner of all modern surround sound systems and was used to accompany the movie “Fantasia”, created in 1938 by Disney. The system used a surround sound setup with two front speakers and only one surround speaker in the back. During the 1950s, different types of surround setups emerged on the market, using different numbers of speakers. Problems with perceived location made some setups, like the quadraphonic, an unsatisfying choice for surround. This led to the modern 5.1 speaker setup, which arose during the early 1990s. It consists of a Left and a Right speaker forming an equilateral triangle with the listener, a Center speaker between them at the same distance from the listener, two Surround speakers placed at ±110° (±10°) from the center speaker and a subwoofer somewhere in the listening environment (figure 1) (International Telecommunications Union, 2013).

Figure 1. The modern surround sound setup, without the subwoofer.
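
The speaker geometry described above can be sketched in a few lines of Python. This is only an illustration: the ±30° azimuth for the Left and Right speakers follows from the equilateral triangle with the listener, while the 2 m radius, the equal distance for the surround speakers and all function names are assumptions made for the example.

```python
import math

# Nominal azimuths in degrees (0 = straight ahead, positive = to the right).
# L/R at +-30 degrees follows from the equilateral triangle with the listener;
# the surrounds use the nominal +-110 degrees given in the text.
SPEAKER_AZIMUTHS = {"L": -30.0, "C": 0.0, "R": 30.0, "Ls": -110.0, "Rs": 110.0}


def speaker_position(azimuth_deg, radius_m=2.0):
    """Return (x, y) in metres; x points right, y points forward.

    radius_m is a hypothetical listening distance, assumed equal for all speakers.
    """
    a = math.radians(azimuth_deg)
    return (radius_m * math.sin(a), radius_m * math.cos(a))


def layout(radius_m=2.0):
    """Positions of the five full-range speakers (the subwoofer is left out)."""
    return {name: speaker_position(az, radius_m)
            for name, az in SPEAKER_AZIMUTHS.items()}


def surround_within_tolerance(azimuth_deg, nominal=110.0, tolerance=10.0):
    """Check a surround speaker azimuth against the +-110 (+-10) degree range."""
    return abs(abs(azimuth_deg) - nominal) <= tolerance
```

With these assumptions, the distance between the Left and Right speakers equals the listening distance, which is what the equilateral triangle requires.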

In smaller surround systems, the position of the surround speakers becomes a compromise between envelopment and rearward image. In practice, this can only be solved by using more speakers, which reduces the tradeoff. Height speakers can be added to complement the speakers in the horizontal plane, which will better simulate a real-world venue.

However, the ability to hear errors in elevation is generally three times less accurate than hearing errors in the horizontal plane, when listening to sounds in the front. This ability is further reduced when listening from the side or back (Holman, 2008). That might indicate that more speakers are not necessary, but a study by Hamasaki, Nishiguchi, Hiyama & Okumura (2006) showed that added speakers with different heights can produce a significant increase in direction as well as envelopment.

1.2 Recording for 5.1

There are many different recording techniques one can use when recording for surround. Every technique has a different character and is better suited for certain types of recordings. How much a piece of music is preferred can even vary depending solely on the way it was recorded (Atsushi, Francisco, Kim, Martens, Walker, 2006). As the venue itself affects the possibilities of how to record, and also plays a major part in the sound of the music, established surround microphone setups could (or should) be considered as suggestions. These established setups are often chosen with particular attributes of the music in mind. If desired, the recordist could capture the music with, for example, more directional properties or a very reverberant sound. The only techniques discussed in this chapter are the Fukada Tree, the OCT Surround and the Hamasaki Square.

In stereo recordings there is generally a difference in either intensity or time that creates the stereo effect. Two well-known recording techniques are XY-stereo, which uses two cardioid microphones in the same spot at a 90° angle, and AB-stereo, which uses two omnidirectional microphones with a distance between them. XY-stereo is known to have good directional properties but to sound narrow. AB-stereo is known to have a full sound but to lack directional properties. These two can be tweaked or combined, for example in ORTF-stereo, where there is a small distance between two cardioid microphones at a 110° angle. When recording in surround, these basic techniques can be configured in many different combinations in order to capture a sound in an environment in the best way (Holman, 2008).
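
To make the distinction between time and intensity differences concrete, the following sketch estimates the arrival-time difference for a spaced (AB) pair and the level difference for a coincident (XY) cardioid pair. It assumes a distant source (plane wave), ideal cardioid patterns and a speed of sound of 343 m/s; the function names are hypothetical and not taken from any cited source.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ab_time_difference(spacing_m, source_angle_deg):
    """Arrival-time difference (s) between two spaced omnis for a distant source.

    source_angle_deg is measured from straight ahead, positive to the right.
    """
    return spacing_m * math.sin(math.radians(source_angle_deg)) / SPEED_OF_SOUND


def cardioid_gain(angle_off_axis_deg):
    """Ideal cardioid sensitivity: 0.5 * (1 + cos(theta))."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_off_axis_deg)))


def xy_level_difference_db(source_angle_deg, included_angle_deg=90.0):
    """Level difference (dB, right minus left) for a coincident cardioid pair.

    Intended for sources in front of the pair (roughly -90 to +90 degrees),
    where neither microphone sits exactly in its null.
    """
    half = included_angle_deg / 2.0
    left = cardioid_gain(source_angle_deg + half)   # left mic aimed at -half
    right = cardioid_gain(source_angle_deg - half)  # right mic aimed at +half
    return 20.0 * math.log10(right / left)
```

A coincident pair produces no time difference at all, so the AB function applies only to spaced microphones, while the XY function captures the purely level-based cue.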

The Fukada tree (figure 2) consists of five cardioid microphones: four of the microphones are placed in a square with 2 m sides, facing outwards from the center of the square, and one center microphone is placed 1 m closer to the sound source. This setup has good directional properties, since it consists of only directional microphones. The Fukada tree is often used with two additional omnidirectional microphones (in grey in figure 2), typically placed one meter outside the left and right microphones respectively. This can cause unwanted phase issues, and depending on the recording situation their positions may be changed. Blending the cardioid and omnidirectional signals for the left and right channels respectively can create a fuller sound while retaining the beneficial directional properties (DPA Microphones, 2017).

Figure 2. The Fukada Tree.

The OCT Surround (figure 3) array consists of three cardioid microphones and two supercardioid microphones. Compared to the Fukada tree, this rig is half the size in width and one sixth of the size in length. The left and right microphones are supercardioid. The spacing between the left and the right side can be adjusted to suit different sizes of ensembles. The main idea with this setup is to produce better phantom images between the front speakers. In the Fukada tree, for example, there is a small overlap between the front microphones, which can make the half-images “mushy” (Holman, 2008).

Figure 3. The OCT Surround.

The Hamasaki square (figure 4) is not a complete surround microphone setup in itself but can complement spot microphones or a main microphone array. It is mainly used to capture ambience rather than the sound source(s) directly. The Hamasaki square consists of four bidirectional microphones positioned in a square with 2 m sides, with the null of each microphone pointing towards the source. The square itself need not be positioned close to the main array, and several squares can be used at the same time in different parts of the venue (DPA Microphones, 2017).

Figure 4. The Hamasaki square, with the arrow pointing towards the sound source.

Depending on the situation, combinations of these methods and traditional stereo methods must sometimes be used to create a satisfying result. When recording a symphony orchestra, there is often a main array, usually positioned above the conductor. Spot microphones for the different instrumental groups are often used to highlight solos during the performance. The spot microphones are panned according to the main array. Hamasaki squares can be used to capture the space of the venue. These combinations of techniques can apply to smaller ensembles as well (Holman, 2008).

When recording for even bigger formats than 5.1 surround, additional microphones can be used to capture height channel information, or side channel information. In the study by King et al. (2017) four microphones were used to capture height information, in addition to the main 5.1 array. The height channels in that recording were mainly used for ambience enhancement. King et al. used four omnidirectional microphones with diffraction attachments and they were positioned one meter above the left, right and surround microphones pointing upwards, based on a 5.1-Fukada tree setup. These types of bigger microphone setups often mirror the speaker setups later used when listening to the reproduced recording. In that way the sound reaching a certain spot in the recording environment will originate in the same spot in the speaker environment, making it sound more natural.

As many setups, including the Fukada Tree and the OCT Surround, tend to resemble or mirror the speaker positions of the reproducing system, one could conclude that this creates the most natural sound. This, however, is not always the most desired sound when recording, since all venues have different aural qualities. The microphones are there to capture the sound and the venue, rather than to be positioned in a certain way to satisfy a specific microphone array. In the long run, this means that the venue and the sound produced in it set the terms for how to record. It also means that sound engineers can capture certain attributes depending on how they make their recording. Sometimes the sought-after attributes may not have been captured at the venue. This could lead the mixing engineer to use artificially created sounds to achieve the desired result (Shriram, 2011).

1.3 Reverberation

Shriram (2011) describes reverberation in a room as “…a natural phenomenon that occurs in an enclosed space due to the sound reflecting off the different boundaries of that space”. Shriram then decomposes sound in enclosed spaces into three parts: direct sound, early reflections and reverberant sound. Direct sound is the sound coming straight from the source to the listener, unaffected by any boundaries in the room. Early reflections are described as reflections via surfaces in the room, separated from the direct sound in both time and direction. Reflections heard 5 ms to 50 ms after the direct sound are considered early reflections. The reverberant sound is a denser set of reflections heard after 50 ms; they come from all directions and have been reflected many times. Every enclosed listening environment will have a different type of reverberation, but all can be described using these three parts.
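
The three-part decomposition can be expressed as a simple classification by arrival delay. The sketch below uses the 5 ms and 50 ms limits from the description above; grouping arrivals earlier than 5 ms with the direct sound is an assumption made here for completeness, and the function name is hypothetical.

```python
def classify_arrival(delay_ms):
    """Label a sound arrival by its delay (ms) relative to the direct sound.

    Limits follow the text: 5-50 ms is an early reflection, later is reverberant.
    Arrivals before 5 ms are grouped with the direct sound (an assumption).
    """
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if delay_ms < 5.0:
        return "direct"
    if delay_ms <= 50.0:
        return "early reflection"
    return "reverberant"
```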

When creating reverbs artificially, there are two main types: algorithmic and convolution. The algorithmic reverb relies on calculated echoes and feedback loops. More advanced algorithmic reverbs take both the time and the frequency domain into account, simulating specific rooms with their specific early reflections, how the sound is absorbed over time, general reverberation time etc. (Everest & Pohlmann, 2017). Convolution reverbs are also calculated but rely on an impulse response, which is the recorded reverberation from a specific venue. The impulse response is convolved with the dry signal so that it sounds as if it was recorded in that space. The advantage of algorithmic reverbs is the possibility to change the parameter settings, but this also requires knowledge of what the different parameters do. The advantage of convolution reverbs is the more immediately natural-sounding reverberation, since it always builds on existing venues. That, however, also implies that a lot of different impulse responses are needed in order to create different types of reverberant venues (Shriram, 2011).
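
The difference between the two approaches can be illustrated with a minimal sketch. The feedback comb filter below stands in for the "calculated echoes and feedback loops" of an algorithmic reverb, and the second function applies an impulse response by direct convolution. Both are deliberately simplified: real reverb units combine many such building blocks, and convolution reverbs typically use faster FFT-based methods. The function names and parameters are assumptions for illustration only.

```python
def feedback_comb(dry, delay_samples, feedback):
    """Algorithmic building block: y[n] = x[n] + feedback * y[n - delay].

    Each input sample is echoed every delay_samples samples, scaled by
    feedback each time (keep abs(feedback) < 1 so the echoes decay).
    """
    out = []
    for n, x in enumerate(dry):
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * delayed)
    return out


def convolve(dry, impulse_response):
    """Direct-form convolution: out[n] = sum over k of dry[k] * ir[n - k]."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out
```

Feeding a single unit impulse through `convolve` returns the impulse response itself, which is why a recorded impulse response is enough to characterize the venue in this model.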

1.4 Realism in artificial reverberation

Studies of surround formats bigger than 5.1 have concluded that additional height channels are important when creating envelopment and a sense of a realistic space (Hamasaki et al., 2006). The same study proposed a surround system using 22.2 channels. This number of channels will in turn create a demand for more discrete channels during the recording phase. This could possibly be worked around by using reverb to fill in the blanks (King et al., 2017). King et al. used a 9.1 setup consisting of a 5.1 setup and four additional speakers positioned above the right, left, right surround and left surround speakers. The height channels contained either location-recorded information or artificially produced information derived from the main 5.1 recording. They showed that there was no preference for either the recorded or the artificially produced material when used together with the 5.1 recording. They also showed that both the recorded and the artificially produced material, in the 5.1 context, had the same realism rating. Their test subjects could, however, easily differentiate between the two types of height source material when listening to the height channels alone.

When real reverberation was compared with two convolution reverbs modeled after the same venue, preferences differed between instruments with different timbre and spectral composition (Shriram, 2011). The test also included a rating of paired attributes for the stimuli. The test subjects rated naturalness (artificial – natural), spaciousness (small – big), ambience (dry – wet), distance (near – far), roughness (smooth – rough) and density (scattered – compact) on a scale from 1 to 7. The natural recording of the venue proved to be the most natural sounding of the three, but was not perceived as the biggest or the most ambient, scoring in the middle of the scale. Here the convolution reverbs both sounded bigger and more ambient. This shows how different types of reverbs can suit different purposes.

1.5 Envelopment in the surround sound format

Envelopment in audio is the sense of being surrounded by sound, and it is most commonly experienced in big reverberant halls or with a surround sound system. There are of course other enveloping audio experiences, for example when reverberant listening rooms are used and all speaker-produced sounds are reflected off the back and side walls. Envelopment can roughly be seen as one of two parts of spatial impression, the other being apparent source width (Soulodre, Lavoie, Norcross, 2003). Soulodre et al. found that the apparent source width is determined by the energy within the first 105 ms after the arrival of the direct sound. Envelopment, or listener envelopment, is determined by the energy arriving later than 105 ms after the direct sound. What makes a sound more or less enveloping is the level and spatial distribution of that late energy. Soulodre et al. (2003) also found that an overall higher level and a longer reverberation time were perceived as more enveloping. The attribute immersion was used by King et al. (2017) to evaluate artificial reverbs. Immersion is similar to envelopment, but does not need a surround system to be apparent.
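
As a rough illustration of the 105 ms boundary, the sketch below splits the energy of an impulse response into an early part (associated above with apparent source width) and a late part (associated with listener envelopment). It assumes the impulse response starts at the arrival of the direct sound and ignores spatial distribution entirely; the function name and signature are hypothetical and do not reproduce the measures used by Soulodre et al.

```python
def split_energy(impulse_response, sample_rate_hz, boundary_ms=105.0):
    """Return (early_energy, late_energy) around boundary_ms.

    Energy is the sum of squared samples. The impulse response is assumed
    to begin at the arrival of the direct sound.
    """
    boundary = int(round(sample_rate_hz * boundary_ms / 1000.0))
    early = sum(s * s for s in impulse_response[:boundary])
    late = sum(s * s for s in impulse_response[boundary:])
    return early, late
```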

1.6 Applications of artificial reverbs in the surround sound format

With the price of hard drive space still declining every year, we are increasingly capable of recording more music. Today there is rarely a limit to how many channels one can record simultaneously. This advance in technology makes it easier to record more channels, as long as there are enough microphones. Big microphone setups can take time to rig, and when recording live concerts there is always a possibility that the recording venue is noisy or filled with a loud audience. King et al. (2017) showed that it is possible to replace recorded audio with reverb in the height channels and still have a preferable, realistic and immersive result. This could lead to fewer microphones being needed, while still maintaining the audio qualities of the real recording.

The possibility to create more natural sounding environments increases with the number of channels, which in turn puts pressure on the mixing engineer to know how to utilize the surround sound format to the fullest, and how effects like reverbs behave when used in surround. One might argue that audio engineers are capable of creating suitable reverberation that can compete with or exceed the recorded material in terms of realism or envelopment.

1.7 Research question

Can real recorded reverb be replaced with artificial reverb in the back channels of a 5.1 surround sound setup with the same perceived realism and/or envelopment?

1.8 Purpose

This study could potentially lead to a better understanding of artificial reverb with regard to the attributes envelopment and realism. The results could serve as guidelines when creating artificial reverb, so that the audio engineer knows what to use when a more enveloping or realistic sound is needed.

2. Method

2.1 Method overview

In order to find out whether realism or envelopment could be preserved when replacing real audio with artificial audio, a listening test was conducted. The listening test required stimuli recorded in a way that suited this type of experiment. King et al. (2017) had done a similar comparison using classical music recorded in a large hall. The study by Shriram (2011) showed that different types of reverbs are better suited for different types of instruments, and Atsushi et al. (2006) showed that music could be more or less preferable depending only on the microphone setup used in the recording. This suggested that at least two different artificial reverbs should be compared to the real recording and that at least two microphone setups should be used.

In accordance with prior research (King et al., 2017) the ambition was to make artificial reverbs that would be easy to recreate, for future purposes. The stimuli were then compared against each other in a listening test in terms of realism, envelopment and preference.

2.2 Attributes and preference rating

The attributes chosen for this test were realism and envelopment. The only difference between the stimuli was in the back-channel audio. Therefore, all attributes associated with perceived direction, instrument width, or front-channel information in general were unnecessary. The attributes were chosen from those that Berg and Rumsey (2003) proposed for evaluating surround sound audio. The attribute “Naturalness”, which Berg and Rumsey described as “How similar to a natural (i.e. not reproduced through e g loudspeakers) listening experience the sound as a whole sounds.”, combined with the attribute “Presence”, described as “The experience of being in the same acoustical environment as the sound source, e g to be in the same room.”, could roughly be compared to the attribute “Realism” that King et al. (2017) used in a study similar to this one. In the same study, King et al. also used the attribute “Immersiveness”. What makes a sound immersive might not depend on how it was recorded or how many speakers it is played through. It could be as simple as the sound itself drawing the listener into the virtual world of that sound. This also means that a disconnecting sound could have the opposite effect, even if the recording and speaker conditions were right. Instead, the attribute “Room Envelopment” proposed by Berg and Rumsey (2003) was chosen. Berg and Rumsey described it as “The extent to which the sound coming from the sound source’s reflections in the room (the reverberation) envelops/surrounds/exists around you – i e not the sound source itself. The feeling of being surrounded by the reflected sound.”, which comes closer to what this study focuses on. In this study, the direct sound of the sound source is not as interesting as what has happened to the indirect sound.

In addition to the attribute ratings, a preference rating was made, also based on the study by King et al. (2017), which used both attribute and preference ratings. They found that the attribute ratings did not necessarily mirror the preference ratings. Berg and Rumsey (2003) describe preference as “If the sound as a whole pleases you. If you think the sound as a whole sounds good. Try to disregard the content of the programme, i e do not assess genre of music or content of speech.”. If a realistic and enveloping sound is not preferred, other unknown factors determine the outcome, which makes the preference rating just as important as the attribute ratings.

2.3 Stimuli

Each stimulus was either a multichannel recording in all five full-range channels, or a front-channel recording of three channels with an artificial reverb in the two back channels.

All music was recorded with a 5.1 surround sound speaker setup in mind, but none of the microphone techniques includes an LFE microphone. No mixing was involved other than normalizing the artificial back channels. This meant that the LFE channel was not used, neither during recording nor during the test.

2.3.1 Choosing instruments and type of music

Three recordings were the foundation of the stimuli used in this experiment. The recordings were short classical music excerpts played on clarinet, piano and snare drum. These instruments differ in frequency content and timbre, and they differ a lot in their tonal width. Shriram (2011) showed that different types of reverbs were preferred on different types of instruments (piano, oboe and cello). The instruments in this study were chosen with that in mind. All three musicians who were recorded were asked to play a piece of music that included big dynamic differences and, if possible, also differences in frequency content. It was important that they were familiar with the piece and could really express the differences within it, while still producing a musically satisfying recording. Larger dynamic differences tend to excite both natural and artificial reverberation in a way that could make any differences more obvious. The pieces that the musicians chose to play were “Opus 10: Etude No. 3 in E major”, by Frédéric Chopin, on piano, “Monolog” – first mov., by Erland von Koch, on clarinet and “For what four?”, by Lalo Davilaon, on snare drum. All musicians were students at the School of Music at Luleå University of Technology.

2.3.2 Recording stimuli

For every performed musical piece, a number of simultaneous recordings were made. The surround microphone setups Fukada Tree and OCT Surround were used in this experiment. Two additional microphone setups were also used: a Hamasaki Square and a stereo pair of close-up microphones, each pointing about 10° away from center. All stimuli were recorded in the large concert room in Studio Acusticum, since the long reverberation of the hall is suitable for this type of experiment. The adjustable ceiling in the hall was at its highest level, which gave the hall a reverberation time of 2.5 s. The choice of microphone setups was partly based on the study by Atsushi et al. (2006), where the Fukada Tree showed the best preference ratings in comparison with three other recording setups. Theile (2001) proposed a then new type of surround microphone setup, the OCT Surround, and compared it to the Fukada Tree. Both the Fukada Tree and the OCT Surround use cardioid or super-cardioid microphones, which makes them easy to compare. This led to the conclusion that the same type of microphone could be used at all recording positions. This posed a potential problem, since the microphones in these two setups do not all have exactly the same characteristics. As seen in Figure 3, the left and right microphones of the OCT Surround are super-cardioids rather than cardioids. Since this experiment focuses on differences in the back-channel information, microphone similarity in the front channels was more important than getting all microphone characteristics right. The Neumann KM184 (Georg Neumann GmbH, 2018) was used at all recording positions. This microphone was the only condenser microphone available in sufficient numbers, meaning that its only directional pattern, cardioid, had to do. To avoid confusion, the modified OCT Surround that was used is from here on called OCTmod. All microphones were recorded using RME Micstasy preamps at the same gain and a Sequoia interface. No mixing was applied to the recordings, which means that there is only one microphone per channel.

The Hamasaki square that was set up during recording was never used in the experiment, due to the way it is supposed to be mixed with the rest of the channels. The idea is to add the two front microphones to the front left and right channels, while the two back microphones are added to the left and right surround channels. If the reason for replacing back-channel audio is noise, it would seem strange if only the back microphones of the Hamasaki square were affected. In order to keep the number of parameters low, this recording technique was scrapped, even though it is often used in this type of recording. The close-up microphones were not used directly in this test either, but were fed through some of the artificial reverbs.

Figure 5. Microphone positions viewed from above with the instrument at the X to the right. The blue dots represent the Fukada Tree and the red dots represent the OCTmod. The same spot was used for their individual center microphone. All surround setup microphones were placed 2.5 m above the stage floor.

2.3.3 Creating the artificial back channels

The artificial reverb in the back channels was created to suit the front channels of each microphone setup. The reverb was created by two audio engineering students at the School of Music at Luleå University of Technology. They were both provided with the three front channels for both microphone setups and were then asked to create back-channel audio that suited the three channels in the front. They were also told where the recording was made and were instructed to, if possible, make the reverb sound enveloping and realistic. They were provided with two types of audio when creating their back channels. One student made his back-channel audio for the Fukada Tree front channels from the close-up microphones, and his back-channel audio for the OCTmod front channels from the three OCTmod front channels. The other student was given the three Fukada Tree front channels when creating his back-channel audio for the Fukada Tree, and the close-up microphones for the OCTmod front channels. This is shown in figure 6, where the audio used to create the back channels is circled for each student and technique.

Student 1 → Reverb 1a (Fukada), Reverb 1b (OCTmod)
Student 2 → Reverb 2a (Fukada), Reverb 2b (OCTmod)

Figure 6. Visualization of the audio used for each reverb.

The three front channels of each microphone setup had a more reverberant sound than the very dry close-up microphones. The two students used a Lexicon 960 to create the back channels. They were both familiar with how the Lexicon 960 works and with the sound of the big concert room in Studio Acusticum. They did not get to listen to the real back-channel recording and were never able to compare their artificial audio to the real recording. The two reverb settings each student made were supposed to suit all three instruments.

One commonly used technique for artificial reverbs is to use really “dry” input audio in an attempt to blend the spot microphone audio with ambience audio. It is, however, also common to use the main recording pair or array as the audio that is fed through the reverb, which makes the added artificial reverb correlate more with the main pair, often creating a more coherent whole. Those reverbs are blended into an already decent mix, which differs from this study, where the reverbs are used as individual channels. The two students who created the reverbs had to make one reverb of each kind, in case one of the audio types turned out to be better suited for creating a reverb with certain characteristics.

2.3.4 Normalizing the back channels

When creating their back channels, both students increased the input level of the Lexicon by 6 dB when using the close-up microphones, and one of the students did not use his center channel when running the three front channels through the Lexicon. Since all artificial reverbs were created without knowledge of the real back-channel audio, this led to different audio levels among the back-channel pairs. A higher level in the back channels would be perceived as more enveloping (Soulodre et al., 2003). This was solved by normalizing all back channels to the same loudness level. The artificially created back-channel audio was a compromise between the three different instruments, which meant that levels could differ a lot between transient-rich sounds and sounds without transients. Thus, the configuration of the test determined how the audio was normalized. The test subjects were to grade three different surround stimuli against each other, one real recording and two with artificial back channels, which meant that the two stimuli with artificial back channels had to be normalized to the same loudness level as the real recorded back channels. The loudness measurement was done in accordance with EBU R128, and the levels were normalized to within a tenth of a decibel (European Broadcast Union, 2014).
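
The normalization step can be sketched as follows. The example assumes that the loudness of each back-channel pair has already been measured with an external EBU R128 meter; it does not implement the measurement itself. The function names and the example loudness values are hypothetical and are not the values from this study.

```python
def normalization_gain_db(measured_lufs, target_lufs):
    """Gain (dB) that moves the measured loudness to the target loudness."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Scale audio samples by a gain given in dB."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


def within_tolerance(measured_lufs, target_lufs, tolerance_lu=0.1):
    """True if the loudness is within a tenth of a decibel of the target."""
    return abs(measured_lufs - target_lufs) <= tolerance_lu


# Hypothetical example: the real back channels measure -30.0 LUFS and an
# artificial pair measures -27.4 LUFS, so the artificial pair is lowered.
gain = normalization_gain_db(measured_lufs=-27.4, target_lufs=-30.0)
```

In this sketch the real recorded back channels act as the target, so only the artificial back channels are scaled. After applying the gain, the loudness would be measured again to confirm that it falls within the tolerance.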

2.3.5 Back-channel bias

The artificial back channels could have been made by the recordist, but the awareness of how the recorded audio actually sounded could have influenced the way the artificial reverb was made. That could have directed the different stimuli towards being almost inseparable, which was not the intention of this study. By using unbiased students to create the artificial reverbs, the recorded and the artificial audio would not be pushed towards sounding alike. Instead, the students projected their own subjective view of realism and envelopment onto the audio they created. That is also why more than one student was asked to make an artificial reverb, so that there would be a larger representation of what engineers think of as “real” or “enveloping” when creating reverbs. Two unbiased versions are a small number, and not in any scientific way representative, but still more representative than one biased version.

2.3.6 Finalizing the stimuli

The recorded pieces were, except for the snare drum piece, too long for the intended purpose of the music and were therefore shortened. As the intention when choosing pieces of music was to have a great dynamic range within the piece, it was of great importance that the shortened versions of the music also contained that dynamic range.

The shortening was also done in a way to keep whole phrases of the piece intact. This gave test versions of the pieces a length of about one minute. Shorter parts than one minute would not have displayed all different qualities of the recordings.

2.4 Listening test

The listening test took place at the School of Music at Luleå University of Technology in room L151, an acoustically treated mixing and control room for 5.1 surround. The volume of the test was set by two audio engineers, each at a level they individually felt was comfortable. The average of those two levels was used during the test, and the test subjects were not allowed to change the volume. All visible meters in the listening room that could indicate something about the sound were hidden, so that the test subjects could rely only on their hearing. The subjects were asked to sit in the sweet spot, at the same distance from all speakers. This ensured that all subjects had the same listening conditions throughout the test and that no sound levels were harmful.

The test interface was a customized version of MUSHRA, without a reference stimulus, that presented three stimuli in each trial. The trials were presented in random order, and the order of the stimuli within each trial was also randomized. This ensured that the order of the stimuli could not determine the end result. The top left corner of each test page indicated which attribute/preference the test subject was to rate. Each test page also had three buttons, one for each of the three simultaneous stimuli, and above the buttons were faders, which were used for rating each stimulus on a scale from 0 to 100. A higher number was associated with a higher perception of the attribute, i.e. more enveloping, realistic or preferable. When one of the buttons was pushed, the stimulus associated with that button was played. Every page had a section dedicated to playback, including the position in the stimulus, start and stop positions, and play, pause and loop buttons.

Figure 7. The MUSHRA test page that was used during the test. In the actual test only three stimuli were compared at the same time, with no reference track.

Before the real test, each test subject underwent a brief training session on how to use the interface, which mainly consisted of learning the playback section. They were also given a short explanation of each attribute and of what type of stimuli they were supposed to rate, and were trained in how to rate the stimuli. No information about the actual difference between the stimuli was given to the test subjects.

During the test the test subjects compared three stimuli against each other and rated them according to what they thought sounded most realistic/enveloping/preferable.

Since there were three instruments recorded with two microphone setups and three different types of ratings, the test subjects had to do 18 (= 3 ∙ 2 ∙ 3) different trials. This means that there were only six different sets of stimuli, each containing three versions with the same front channels. The two microphone setups were never compared against each other, as this study focuses on differences between the real recording and the artificial reverbs.
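The trial structure and the two levels of randomization can be sketched as follows. This is an illustrative sketch and not the actual test software: the function name, the dictionary keys and the use of a seeded random generator are assumptions.

```python
import itertools
import random

INSTRUMENTS = ["piano", "clarinet", "snare"]
MIC_SETUPS = ["Fukada", "OCTmod"]
RATINGS = ["realism", "envelopment", "preference"]
VERSIONS = ["real recording", "reverb 1", "reverb 2"]


def build_trials(seed=None):
    """Build the 18 trials with random trial order and random button order."""
    rng = random.Random(seed)
    trials = []
    for instrument, setup, rating in itertools.product(INSTRUMENTS, MIC_SETUPS, RATINGS):
        buttons = VERSIONS[:]      # the three versions share the same front channels
        rng.shuffle(buttons)       # randomize which button plays which version
        trials.append(
            {"instrument": instrument, "setup": setup, "rating": rating, "buttons": buttons}
        )
    rng.shuffle(trials)            # randomize the order of the trials
    return trials


trials = build_trials(seed=1)
stimulus_sets = {(t["instrument"], t["setup"]) for t in trials}
print(len(trials), "trials,", len(stimulus_sets), "sets of stimuli")
```

Each set of stimuli (instrument and microphone setup) appears three times, once per rating type, which is why 18 trials draw on only six sets.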

After the test each test subject was asked to write down their general thoughts about the test, for example if they found differences in how they perceived different attributes, instruments etc.

2.5 Test subjects

In total 17 test subjects, ranging from 21 to 29 in age, did the test. All test subjects were studying at the School of Music at Luleå University of Technology and all had experience of listening to live classical music and/or surround sound music. All subjects reported having normal hearing.

It was important that the test subjects did not know beforehand what the differences between the stimuli were. If they had known that the differences were in the back channels, their listening focus could have been directed towards the back channels instead of the surround stimuli as a whole. This is also why some students at the school were disqualified from taking part in the test, despite being good listeners.

3. Results

The raw data from the test is presented in the appendix, including the distribution represented in histograms. The written comments each test subject made after the test are also in the appendix. The comments are reproduced in their original language, in order not to lose vital information in translation. Histograms for the t-tests with significant p-values can be found in the appendix.

3.1 Average test score, T-tests and p-values

The t-tests in this study were paired and two-tailed, since the compared scores were given by the same test subjects and differences were possible in both directions. The confidence level of these t-tests was set to 95 %, i.e. a significance level of 0.05.
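A paired, two-tailed t-test of this kind can be sketched with the Python standard library alone. The sketch below is not the tool used in the study. The p-value is obtained by numerically integrating the Student t density, and the example scores are hypothetical rather than taken from the appendix.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson integration of the density over [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def paired_t_test(x, y):
    """Return (t, p) for paired samples x and y; the differences must not all be equal."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, two_tailed_p(t, n - 1)


# Hypothetical ratings (0 to 100) by the same six subjects for two versions.
real_scores = [86, 90, 84, 76, 46, 84]
reverb_scores = [80, 85, 70, 77, 40, 71]
t_stat, p_value = paired_t_test(real_scores, reverb_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

With 17 test subjects, as in this study, each test has 16 degrees of freedom, for which |t| has to exceed roughly 2.12 to give a two-tailed p-value below 0.05.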

Table 1. Average test scores and p-values for the attribute Realism. P-values lower than 0.05 (highlighted) are marked with *.

REALISM            Average test score               P-value
Stimuli            Real Rec.  Reverb 1  Reverb 2    Real vs Rev1  Real vs Rev2
Piano, Fukada      82.1       81.0      82.3        0.8396        0.9626
Piano, OCTmod      89.2       80.5      80.8        0.0401*       0.0403*
Clarinet, Fukada   84.1       81.1      82.7        0.6509        0.7637
Clarinet, OCTmod   87.4       62.5      81.0        0.0028*       0.2077
Snare, Fukada      84.5       74.8      70.7        0.1010        0.0165*
Snare, OCTmod      77.2       71.4      73.2        0.3923        0.5365

Table 2. Average test scores and p-values for the attribute Envelopment. P-values lower than 0.05 (highlighted) are marked with *.

ENVELOPMENT        Average test score               P-value
Stimuli            Real Rec.  Reverb 1  Reverb 2    Real vs Rev1  Real vs Rev2
Piano, Fukada      68.6       84.2      76.6        0.0068*       0.2560
Piano, OCTmod      73.0       71.2      70.1        0.7810        0.6563
Clarinet, Fukada   61.3       81.2      70.2        0.0130*       0.1264
Clarinet, OCTmod   71.6       83.9      66.1        0.1119        0.3192
Snare, Fukada      64.8       77.9      81.9        0.0157*       0.0055*
Snare, OCTmod      83.1       75.1      65.5        0.1824        0.0349*

Table 3. Average test scores and p-values for the attribute Preference. P-values lower than 0.05 (highlighted) are marked with *.

PREFERENCE         Average test score               P-value
Stimuli            Real Rec.  Reverb 1  Reverb 2    Real vs Rev1  Real vs Rev2
Piano, Fukada      84.1       78.9      80.2        0.5063        0.5742
Piano, OCTmod      80.9       89.2      79.0        0.2072        0.7706
Clarinet, Fukada   84.0       84.9      79.1        0.8776        0.3989
Clarinet, OCTmod   85.2       79.6      81.8        0.3611        0.5834
Snare, Fukada      88.8       78.3      70.9        0.1391        0.0103*
Snare, OCTmod      85.1       76.2      78.8        0.2708        0.2897

3.2 Comments by the test subjects about the test

All test subjects found the test to be hard, and no one was really sure about all their answers, saying that some stimuli sounded too alike. The predicted test time was around 20 minutes, while the actual test time ranged from 25 minutes to one hour with an average test time around 35 minutes.

3.2.1 Comments regarding the instruments

The most common comments made by the test subjects regarded the snare drum. Several of them thought that the snare drum sound was the easiest to evaluate because its stimuli differed the most. Some of the test subjects, however, gave the same reason for the opposite: they found it harder to evaluate the snare drum stimuli because of the bigger differences.

The comments regarding the piano centered on it being hard to evaluate. One test subject said that this was due to the front channels being much stronger than the back channels. Others thought that the piano was the easiest one to evaluate.

The comments regarding the clarinet were, similar to the snare drum comments, about some finding it to be the easiest one to evaluate while others found it to be the other way around. Some pointed out that the clarinet stimuli contained more instrument/instrumentalist noise than the other stimuli.

3.2.2 Comments regarding the attributes

The perceived realism in the test was hard to evaluate for the majority of the test subjects, due to them not knowing which reference frame for realism to use. Some test subjects differed in their opinion about realism with certain instruments, but the snare drum was almost always involved.

The comments about perceived envelopment varied from test subject to test subject, with some finding it very easy to evaluate, while others were struggling with certain instruments. The majority seemed to think that it was easier to evaluate envelopment than realism.

Very few test subjects made any comments about the preference rating, but when they commented on preference they all spoke of it in comparison to the other attributes. Large perceived differences between the stimuli tended to give clearer preference scores.

Some of the test subjects perceived a difference in the stereo width when switching between the three stimuli, despite there being no difference in the front channels.

4. Analysis

4.1 T-test instrument analysis

Two of the significant p-values regarding the piano were connected to realism, where the real recording was perceived as significantly more realistic than both reverbs. The third significant p-value was related to envelopment, where Reverb 1 was significantly more enveloping than the real recording.

The clarinet had two significant p-values, one regarding realism where the real recording was rated significantly higher, and one regarding envelopment where Reverb 1 was rated significantly higher.

The snare drum had five significant p-values: three regarding envelopment, one regarding realism and one regarding preference, the only significant p-value for preference in the whole test. Three of these came from the same comparison (the Fukada real recording against Reverb 2), which probably means that the difference was big enough to be easy to evaluate. One of the significant p-values for envelopment showed, unlike the other significant p-values for envelopment, that the real recording was rated higher than Reverb 2.

When comparing the different average values of the instruments there are bigger differences between the average snare drum values and smaller between the average piano values. This is probably why five of the ten significant p-values are located among the snare drum stimuli. The spread of the data, however, was more or less the same for all tests, regardless of instrument.

4.2 T-test attribute analysis

Tables 1, 2 and 3 show that it was easier to evaluate the attributes than preference. Four out of ten significant p-values were connected to realism, five to envelopment and only one to preference. Two of the significant p-values connected to realism were comparisons involving the same reverb (Reverb 1) and the same microphone technique, and the average test scores in the third comparison lean in the same direction. Three of the significant p-values connected to envelopment were comparisons involving the same reverb (Reverb 1) and the same microphone technique. Three out of the four snare drum p-values connected to envelopment were significant, indicating that the difference between the back channels was easy to evaluate. The only significant p-value regarding preference was a snare drum comparison, in which the real recording was preferred.

Generalized, the average test scores for the attributes and preference of each back-channel version can be displayed as:

Table 4. Average test scores for the attributes and preference of each back-channel version when using the Fukada data, the OCTmod data and the average of all the data. P-values below 0.05 (highlighted) are marked with *; p-values below 0.0125 (bold) are marked with **.

Data      Attribute     Real rec  Rev 1  Rev 2   Real vs Rev1  Real vs Rev2
Fukada    Realism       83.5      79.0   78.6    0.1689        0.0905
Fukada    Envelopment   64.9      81.1   76.3    9.34E-06**    0.0016**
Fukada    Preference    85.6      80.7   76.7    0.2160        0.0176*
OCTmod    Realism       84.6      71.5   78.3    0.0006**      0.0342*
OCTmod    Envelopment   75.9      76.7   67.2    0.8317        0.0273*
OCTmod    Preference    83.7      81.7   79.8    0.6086        0.2675
All data  Realism       84.1      75.2   78.5    0.0005**      0.0066**
All data  Envelopment   70.4      78.9   71.7    0.0016**      0.6296
All data  Preference    84.7      81.2   78.3    0.2135        0.0123**

This shows that the real recording significantly sounded more realistic than both of the artificial reverbs. Reverb 1 on the other hand generally sounded more enveloping than the other two, but only significantly more enveloping when compared with the Fukada back channels. This should be compared with Reverb 2 which shows significant p-values for both microphone techniques, but differs in that it is evaluated more enveloping than the Fukada recording and worse than the OCTmod recording. The real recording is generally considered more preferable, but only significant when combining all preference data.

The lowered significance level is due to the data being compared more than once. According to the Holm-Bonferroni method, the starting level of significance should be the normal significance level (in this case 5 %) divided by the number of times the data has been compared. In this case, where the data has been compared four times, the significance level should be $\frac{0.05}{4} = 0.0125$ (Statistics how to, 2018). Instead of using the same significance level for all p-values, a small correction to the significance level is made at each step. The smallest p-value has to be lower than 0.0125, but the second smallest p-value only has to be lower than $\frac{0.05}{3} \approx 0.0167$, and so on. If one of the p-values is higher than its significance level, that one and all following p-values are deemed too high. Each row is its own comparison, as shown in Table 4, where only the p-values in bold are below the corrected significance level.
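The step-down procedure described above can be sketched in a few lines. The function name is an assumption, and the four p-values in the example are illustrative inputs only, not a grouping taken from this study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which p-values pass the Holm-Bonferroni test.

    The p-values are visited from smallest to largest. The k-th smallest has to be
    lower than alpha / (m - k + 1); at the first failure, that p-value and all
    larger ones are deemed too high.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)
        if p_values[idx] >= threshold:
            break
        significant[idx] = True
    return significant


# Illustrative p-values (hypothetical grouping of four comparisons).
example = [0.0342, 0.0005, 0.0905, 0.0066]
print(holm_bonferroni(example))  # [False, True, False, True]
```

In the example, 0.0005 is below 0.05/4 = 0.0125 and 0.0066 is below 0.05/3 ≈ 0.0167, but 0.0342 is not below 0.05/2 = 0.025, so it and the remaining larger p-value are rejected as not significant.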

4.3 Test subject comments analysis

Most test subjects found it easy to hear differences in the snare drum stimuli, and their answers in the test also partly show this. Even if the more interesting results are located among the snare drum stimuli, there is no real overall consistency between the results. The test subjects who thought they could better hear the differences between the other instruments might have done so, but the data shows that this was not the overall case.

The test subjects’ individual interpretation of the different attributes might have changed the result, as one test subject didn’t know what reference to use when comparing the acoustics of the recording. A better description of the attributes might have given clearer results.

4.4 Artificial vs real recorded reverberation analysis

The biggest difference between the stimuli versions is the amount of direct sound. The away-facing microphones still capture some of the direct sound from the sound source, and this was not accounted for by the engineers who made the artificial reverbs. As they were never able to hear the real recording, some other properties also differed, for example reverberation time. The big hall in Studio Acusticum has a reverberation time (RT60) of 2.5 s (Studio Acusticum, 2018). The recorded files show that the room has an RT60 of 2.33 s, while the artificial reverbs range from 2.5 s to 3 s. Soulodre et al. (2003) pointed to reverberation time as one of the reasons that some artificial reverbs sounded more enveloping. This is, however, hardly the case here, since the artificial reverb with the shortest RT60 (2.5 s) was the one that had p-values below or close to 0.01 when compared to the real recording for every envelopment rating. The structure within the artificial reverb is the more probable cause of the increased envelopment. In the real recording there is a smoother spacing of the early reflections and a smoother reverberant sound, while the artificial reverbs are rougher in the early reflections and not as smooth in the reverberant sound. The real recording more or less fades from a strong initial peak, while the artificial reverbs swell and then fade. This is more in line with the study by Soulodre et al. (2003), who argued that the late energy (105 ms after the arrival of the direct sound) is the energy responsible for envelopment. The initial peak is probably direct sound captured by the away-facing microphones. The real recording also has more treble than the artificial reverbs.

Figure 8. The OCTmod back channel (left, grey), Reverb 1b (left, blue), Reverb 2b (left, red), Fukada Tree back channel (right, grey), Reverb 1a (right, blue) and Reverb 2a (right, red). A hit on the rim of the snare drum was used to evaluate the reverberation.
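One common way to estimate an RT60 from a recorded decay, such as the rim hit mentioned above, is Schroeder backward integration followed by a line fit to the decay curve. The study does not state how its RT60 values were obtained, so the sketch below is an assumption and not the method used. The function names, the fit range of -5 dB to -35 dB and the synthetic test signal are all illustrative.

```python
import math


def schroeder_edc_db(impulse_response):
    """Energy decay curve in dB (relative to total energy) via backward integration."""
    energy = [s * s for s in impulse_response]
    edc = [0.0] * len(energy)
    running = 0.0
    for i in range(len(energy) - 1, -1, -1):
        running += energy[i]
        edc[i] = running
    # Trailing zeros (silent tail) are dropped to avoid log10(0).
    return [10.0 * math.log10(e / edc[0]) for e in edc if e > 0.0]


def rt60_from_edc(edc_db, fs, start_db=-5.0, stop_db=-35.0):
    """Fit a line to the decay between start_db and stop_db and extrapolate to -60 dB."""
    pts = [(i / fs, level) for i, level in enumerate(edc_db) if stop_db <= level <= start_db]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(level for _, level in pts) / n
    slope = sum((t - mean_t) * (level - mean_l) for t, level in pts) / sum(
        (t - mean_t) ** 2 for t, _ in pts
    )  # decay rate in dB per second (negative)
    return -60.0 / slope


# Synthetic decay envelope with a known RT60, standing in for a recorded back channel.
fs = 1000                      # sample rate in Hz, kept low to make the sketch fast
true_rt60 = 2.33               # seconds
n_samples = int(2 * true_rt60 * fs)
decay = [10.0 ** (-3.0 * (i / fs) / true_rt60) for i in range(n_samples)]

estimate = rt60_from_edc(schroeder_edc_db(decay), fs)
print(f"Estimated RT60: {estimate:.2f} s")
```

A real back-channel recording would additionally contain the initial direct-sound peak and a noise floor, both of which affect the fit and are ignored in this idealized example.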

Even if different types of input signal were used when creating the artificial reverbs, there seems to be no big difference in how the two types of reverbs turned out. There is roughly the same number of low p-values for each type of input signal. What really stands out is the Fukada Tree recording technique, which seems to have affected the reverberation the most. This could be due to the larger distances between the microphones in the Fukada Tree compared to the OCT Surround. It means that there could be less correlated audio between the five microphones in the Fukada Tree, making the task of creating a suitable pair of back channels harder than with the OCT Surround, where the microphones are much closer together.

5. Discussions and Conclusions

Out of the 36 main t-tests, ten had p-values below 0.05. This indicates that real recorded reverb can be replaced with artificial reverb in the back channels of a 5.1 surround setup while maintaining realism and envelopment. For some types of sounds the artificial reverb is triggered in a way that does not simulate the real world well enough to be convincing. The great dynamic differences of the snare drum, with its short percussive sound, turned out to be the easiest to detect and evaluate. The results show, however, that it was rather a combination of instrument, recording technique and type of reverb that caused the more detectable differences. When creating artificial reverbs for this type of venue, smoother early reflections and a smoother beginning of the reverberant sound could make the reverb more like the real world, perhaps increasing the perceived realism. For the reverb to sound more enveloping, one could try to increase the volume of the earlier part of the reverberation, like a delayed, smeared echo.

Many of the test subjects thought the test was hard and that some of the stimuli started to blend together after a while, so that they no longer really knew what was realistic or enveloping. The long test time could have been cut in half by using only one of the recording techniques, thus letting the test subjects evaluate with fresher ears. In hindsight, knowing that one of the recording techniques had all but one of the more significant t-test scores, both techniques might have been used anyway.

The study by King et al. (2017) showed that there was no preference for artificial or real recorded reverb, and the results of this study, with fewer discrete channels, generally show the same. King et al. found that the real recorded reverb was generally rated higher in realism, and the artificial reverb higher in envelopment. This is also shown in this study, but can only be viewed as significant for a few of the comparisons. Shriram (2011) showed that different types of reverbs were preferred for different types of instruments. This study cannot confirm the preference ratings, but there is clear evidence that some types of instruments were easier to evaluate with some types of reverberation.

The validity of this study can be found in the answers from the test subjects and their test results. A majority of the test subjects clearly stated that some attributes/instruments were easier to evaluate, and the majority of the significant test results also showed this. All test subjects were instructed in the same way, using the attribute definitions by Berg and Rumsey (2003). These attribute definitions are used many times in these types of studies.

The reliability of this study could be compared to the study by King et al. (2017), who did a similar test with similar findings. If both tests got the same results with a slight difference in method, the results of the studies should be considered reliable.

6. Future applications and further studies

As mentioned at the end of the background section, one of the main uses of artificial reverbs is to create spaces that could not be captured during the recording, due to microphone limitations, noise or time. This study, like the studies by King et al. (2017) and Shriram (2011), confirms that it is possible to maintain or increase realism or envelopment with an artificially created reverb. In the study by King et al. (2017) they chose a standard setting on the Lexicon 960 that sounded close to the real recording and did not change any parameters. In this study the artificial reverbs were created to suit the front channels, which led the artificial reverb creators to change parameters until they got the desired sound. Further studies could refine the actual parameters responsible for differences in envelopment or realism.

One possible use of the findings of this study, although out of scope, is the up-mixing possibility they entail. This could also be derived from the study by King et al. (2017). If the artificial reverb creator knows how to create reverbs suitable for a stereo to 5.1 surround up-mix, they should also be able to up-mix from 5.1 to 7.1 or 9.1 surround.

7. Bibliography

Atsushi, M., Francisco, M., Kim, S., Martens, W. L., & Walker, K. (2006, June). An Examination of the Influence of Musical Selection on Listener Preferences for Multichannel Microphone Technique. Presented at the 28th AES International Conference, Piteå, Sweden.

Berg, J., & Rumsey, F. (2003, June). Systematic Evaluation of Perceived Spatial Quality. Presented at the 24th AES International Conference on Multichannel Audio, Banff, Canada.

DPA Microphones. (2017). Fukada tree. Retrieved 2017-12-05 from http://www.dpamicrophones.com/mic-university/fukada-tree

DPA Microphones. (2017). Hamasaki Square. Retrieved 2017-12-06 from http://www.dpamicrophones.com/mic-university/hamasaki-square

European Broadcast Union. (2014). R 128 Loudness normalization and permitted maximum level of audio signals. Retrieved 2018-04-18 from https://tech.ebu.ch/docs/r/r128.pdf

Everest, F. A., & Pohlmann, K. C. (2009). Master Handbook of Acoustics (fifth edition). New York: McGraw-Hill Companies.

Georg Neumann GmbH. (2018). Miniature Microphone series 180. Retrieved 2018-03-19 from http://www.neumann.com/?lang=en&id=current_microphones&cid=km180_description

Hamasaki, K., Hiyama, K., Nishiguchi, T., & Okumura, R. (2006, May). Effectiveness of Height Information for Reproducing the Presence and Reality in Multichannel Audio System. Presented at the 120th AES Convention, Paris, France.

Holman, T. (2008). Surround Sound Up and Running (second edition). Burlington, Massachusetts: Elsevier.

International Telecommunications Union. (2013). Multichannel sound technology in home and broadcast applications. Retrieved 2018-04-18 from https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2159-6-2013-PDF-E.pdf

King, R., Leonard, B., Howie, W., & Kelly, J. (2017, May). Real or Illusion? A comparative study of captured ambiance vs. artificial reverberation in immersive audio applications. Presented at the 142nd AES Convention, Berlin, Germany.

Schoeps Mikrofone. (2017). Surround system OCT surround set. Retrieved 2017-12-06 from http://www.schoeps.de/en/products/oct-surround-set

Shriram, A. (2011). Perceived Differences between Natural and Convolution Reverberation in 5.0 Surround Sound. (Master’s thesis, University of Jyväskylä, Jyväskylä). Retrieved from https://jyx.jyu.fi/dspace/bitstream/handle/123456789/27082/URN:NBN:fi:jyu-2011052610924.pdf;sequence=1

Soulodre, G. A., Lavoie, M. C., & Norcross, S. G. (2003). Objective Measures of Listener Envelopment in Multichannel Surround Systems. Presented at the 24th AES International Conference on Multichannel Audio, Banff, Canada.

Statistics how to. (2018). What is the Holm-Bonferroni method?. Retrieved 2018-05-23 from http://www.statisticshowto.com/holm-bonferroni-method/

Studio Acusticum. (2018). Tekniska Specifikationer. Retrieved 2018-04-18 from http://www.studioacusticum.se/en/lokaler/stora-salen/tekniska-specifikationer/

Theile, G. (2001, June). Natural 5.1 Music Recording Based on Psychoacoustic Principles. Presented at the 19th AES International Conference, Schloss Elmau, Germany.

Appendix 1. Test scores for the real recording (piano-blue, clarinet-green, snare-yellow):

Test sub. Stim 1 Stim 2 Stim 3 Stim 4 Stim 5 Stim 6 Stim 7 Stim 8 Stim 9

1 86 100 92 100 100 97 97 79 91

2 90 70 90 90 51 80 81 21 81

3 84 100 63 100 85 73 91 71 100

4 76 52 64 70 62 84 50 81 82

5 46 77 78 61 57 100 81 59 72

6 84 72 86 86 61 73 64 45 76

7 100 41 29 100 77 66 100 47 71

8 72 75 86 84 77 100 89 71 74

9 100 84 100 100 35 58 86 100 100

10 94 100 96 100 100 100 100 98 84

11 87 61 95 76 39 100 59 32 100

12 74 68 79 90 92 100 100 87 77

13 100 33 100 94 80 41 100 54 100

14 86 90 100 95 60 90 95 40 100

15 92 75 100 100 100 100 86 62 90

16 70 31 71 100 93 67 76 42 50

17 54 38 100 71 72 46 74 53 80

Test sub. Stim 10 Stim 11 Stim 12 Stim 13 Stim 14 Stim 15 Stim 16 Stim 17 Stim 18

1 100 100 82 80 94 74 96 90 100

2 90 50 89 75 74 81 40 100 70

3 100 67 100 83 93 100 100 85 84

4 100 92 100 97 56 100 64 68 100

5 76 71 79 39 46 100 44 75 100

6 67 50 87 84 84 71 82 93 82

7 100 68 44 100 29 100 54 100 100

8 100 94 100 100 65 73 31 100 58

9 100 33 87 100 71 100 100 100 100

10 100 92 100 100 89 84 100 77 100

11 86 90 79 85 41 100 72 48 50

12 77 84 74 82 66 100 91 72 54

13 64 72 100 61 36 81 81 89 100

14 95 70 100 90 80 98 90 90 100

15 67 84 81 87 77 82 100 92 82

16 93 55 47 100 46 65 67 34 66

17 70 46 100 74 54 100 100 100 100

Test scores for the artificial reverb 1 (piano-blue, clarinet-green, snare-yellow):

Test sub. Stim 1 Stim 2 Stim 3 Stim 4 Stim 5 Stim 6 Stim 7 Stim 8 Stim 9

1 86 100 94 94 100 100 97 100 100

2 91 90 88 90 50 85 100 60 91

3 91 76 100 90 76 100 82 100 71

4 92 77 89 50 100 100 100 54 53

5 61 100 100 76 49 68 61 72 100

6 91 85 70 81 82 90 87 92 76

7 22 78 100 54 100 100 39 100 100

8 82 86 77 70 59 77 100 87 100

9 91 100 52 86 69 100 53 46 58

10 100 91 86 100 74 84 89 79 97

11 85 73 40 100 30 89 27 60 87

12 82 73 61 74 76 75 100 87 100

13 100 100 86 73 100 70 88 92 56

14 100 100 95 80 80 100 100 80 90

15 86 91 72 74 76 79 100 100 100

16 70 31 71 100 29 100 100 72 80

17 47 81 60 76 61 100 56 100 85

Test sub. Stim 10 Stim 11 Stim 12 Stim 13 Stim 14 Stim 15 Stim 16 Stim 17 Stim 18

1 100 100 82 100 100 100 100 100 86

2 35 81 84 100 100 63 80 49 90

3 64 100 92 100 80 82 76 100 89

4 25 76 52 67 71 53 37 91 55

5 59 61 84 47 65 64 34 52 81

6 76 76 79 75 69 70 87 89 71

7 55 100 76 69 58 30 100 61 37

8 57 100 84 86 100 100 46 92 99

9 42 100 100 40 100 63 51 85 50

10 78 100 92 84 79 91 86 91 83

11 32 20 45 85 64 95 84 71 40

12 59 94 100 72 62 73 72 53 100

13 28 92 49 39 87 100 53 57 84

14 98 80 95 95 60 95 100 65 95

15 100 73 100 92 86 100 75 75 100

16 100 84 69 54 61 78 79 65 89

17 55 90 71 66 82 74 54 80 46

Test scores for the artificial reverb 2 (piano-blue, clarinet-green, snare-yellow):

Test sub. Stim 1 Stim 2 Stim 3 Stim 4 Stim 5 Stim 6 Stim 7 Stim 8 Stim 9

1 100 87 100 89 100 97 100 58 90

2 90 49 87 90 51 81 89 31 100

3 100 87 82 81 100 87 100 83 87

4 62 98 100 87 82 54 74 98 100

5 77 83 82 61 54 83 55 66 72

6 79 54 100 83 46 93 87 87 89

7 58 100 65 78 57 39 76 61 41

8 90 51 100 76 87 61 77 76 47

9 58 56 45 63 100 87 100 61 81

10 92 75 100 78 94 77 94 71 100

11 66 82 65 100 64 86 91 85 46

12 91 94 86 67 67 79 100 87 73

13 100 80 78 95 30 100 79 73 69

14 95 95 90 95 70 95 90 60 95

15 100 100 71 77 86 79 72 80 69

16 70 50 72 100 63 67 76 42 85

17 71 62 41 53 40 78 46 74 100

Test sub. Stim 10 Stim 11 Stim 12 Stim 13 Stim 14 Stim 15 Stim 16 Stim 17 Stim 18

1 100 83 100 89 92 82 96 85 86

2 100 42 81 90 76 41 100 30 84

3 82 86 84 69 100 68 88 69 100

4 48 55 79 51 95 80 90 33 84

5 67 66 100 51 57 65 54 59 77

6 52 79 86 64 86 64 54 61 71

7 81 37 100 31 100 16 68 38 67

8 76 76 88 77 88 49 38 77 78

9 84 53 50 80 89 100 60 51 70

10 87 79 75 96 100 100 84 100 93

11 66 40 100 80 87 95 47 88 85

12 69 77 82 69 83 76 79 87 94

13 100 57 51 50 82 74 70 70 71

14 100 75 90 100 70 100 95 80 81

15 85 86 72 67 64 66 91 66 71

16 100 55 69 91 60 78 79 66 66

17 80 77 83 47 64 52 52 54 61

Histograms over the spread of each stimuli

The histograms show the distance between the individual compared data, and only the data that gave significant results are shown in these histograms. The average distances are also shown, and a positive average points towards the real recording, and a negative points towards the artificial reverb.

Histograms showing specific stimuli results (figure panels not reproduced). Averages shown in the panels, in order:

-15.6, 8.8, -19.9, 24.8, -13.1, 8.5, 13.8, -17.2, 17.8, 17.6

Histograms showing overall results (figure panels not reproduced). Averages shown in the panels, in order:

-16.2, -11.4, 8.9, 13.1, 6.3, 8.7, 8.9, 5.6, -8.5, 6.4

Test subject comments:

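S1

Upplevde främst skillnad i ljudbild med virveltrumman. (English: Mainly perceived a difference in the sound image with the snare drum.)

S2

Virveltrumman var den absolut enklaste att urskilja, då det endast var transienter. Klarinetten var hyfsat svår, men gick ändå helt ok tack vare andningarna och blåsljud. Pianot var däremot väldigt svårt, väldigt starkt rakt framifrån. Det blev svårt att uppleva de ljud som fanns där bak. (English: The snare drum was by far the easiest to distinguish, since it consisted only of transients. The clarinet was fairly hard, but still went quite well thanks to the breaths and blowing noises. The piano, on the other hand, was very hard, very strong straight from the front. It became hard to perceive the sounds that were in the back.)

S3

Easier to evaluate snare drum sounds. Hardest to evaluate the piano.

S4

Det var lättast att höra skillnad på pianot. Det var ibland svårt att avgöra hur omsluten jag känner mig av ljudet. (English: It was easiest to hear a difference with the piano. It was sometimes hard to determine how enveloped I feel by the sound.)

S5

Ljuden kändes verklighetstrogna men var svåra att urskilja. Virveln kändes kanske minst verklighetstrogen men var lättare att urskilja på de olika exemplen. (English: The sounds felt realistic but were hard to tell apart. The snare felt perhaps the least realistic but was easier to tell apart across the different examples.)

S6

Det var svårt att hitta skillnader hos de olika instrumenten. Ibland upplevde jag att vissa var mer omslutande än andra, men oftast hade jag det svårt att höra nån skillnad. Det var i vissa enstaka fall som jag kunde höra den skillnaden. (English: It was hard to find differences for the different instruments. Sometimes I felt that some were more enveloping than others, but most often I had difficulty hearing any difference. It was only in a few isolated cases that I could hear that difference.)

S7

Det var lätt att känna sig omsluten av ljuden. Svårt att höra vilka som var mer verklighetstrogna än de andra. Klarinett var lättare än slagverk. Piano var däremellan. (English: It was easy to feel enveloped by the sounds. Hard to hear which ones were more realistic than the others. Clarinet was easier than percussion. Piano was in between.)

S8

Hög kvalité på samtliga ljud. Det kändes nog enklast att höra skillnad på hur omsluten jag kände mig. Jag tyckte det var små nyanser som skilde ljuden åt så jag hade svårt att höra skillnad ibland. När det var fråga om verklighetstrogenhet så visste jag inte vilken akustik jag skulle utgå ifrån. (English: High quality on all sounds. It probably felt easiest to hear differences in how enveloped I felt. I thought there were small nuances separating the sounds, so I sometimes had difficulty hearing a difference. When it came to realism, I did not know which acoustics to use as a starting point.)

S9

Generellt sett upplevde jag att de ljud som lät ”sämst” i de olika kategorierna oftast verkade ha ljudet bakifrån mer fördröjt än de ”bra” ljuden. (English: Generally, I felt that the sounds that sounded “worst” in the different categories most often seemed to have the sound from behind more delayed than the “good” sounds.)

S10

Jättefina inspelningar. Omslutenheten och min bedömning av ljuden var lättare att höra skillnad på än att avgöra om det var verklighetstroget. Personligen tyckte jag det var lättats att höra klangskillnader på klarinett. Men det var lättare att höra skillnad i reverb och rum med virveltrumman. Pianot var alltid svårast att höra skillnad på. (English: Very fine recordings. Envelopment and my own assessment of the sounds were easier to hear differences in than determining whether it was realistic. Personally I thought it was easiest to hear timbral differences on the clarinet. But it was easier to hear differences in reverb and room with the snare drum. The piano was always the hardest to hear differences in.)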

S11
