
The impact of onset transient duration on perceived transient loudness: Could transient level reduction be compensated by increasing transient duration?


Academic year: 2022



Jakob Erlandsson

Audio Technology, bachelor's level 2020

Luleå University of Technology

Department of Arts, Communication and Education


Abstract

When mixing and mastering modern music, many engineers strive to make the end product sound as loud as possible without sacrificing audio quality. Achieving this often involves reducing the dynamic range of a track through peak limiting. By reducing the level of the loudest transients, the additional headroom can be used to raise the overall level of the track. This method of maximizing loudness through transient suppression has, arguably, made it more relevant to understand the human perception of transients. By further understanding the psychoacoustical factors that play a role in how transient loudness is perceived, engineers could hopefully achieve greater audio quality while maximizing loudness, if desirable. This paper focuses on how the signal duration of a transient affects its perceived loudness and whether it can compensate for level changes within transients. Prior research has repeatedly shown that sound signals of longer duration are perceived as louder than signals of shorter duration. This effect is tested again in this research by letting participants match the loudness of several short pink noise bursts of varying durations. The noise bursts are designed to mimic the envelope of a typical snare drum transient, which distinguishes them from the stimuli used in prior experiments testing the same effect.

Based on the results from this experiment, each transient is normalized to be perceived as equally loud. Then, a stationary component is added to every transient to make each stimulus mimic a full, typical snare drum. In a second experiment, the stimuli are compared against each other in an ABX test to see if listeners can perceive the differences.

The results from the first experiment showed that transients were perceived to be approximately 0.3 dB louder per 5 ms increase in duration. In the second experiment, listeners failed to hear the differences between stimuli when transient duration differed by less than 5 ms. For differences in duration longer than this, listeners correctly identified the differences.


Table of Contents

1 Introduction
2 Fundamental concepts
2.1 Defining a transient
2.2 Loudness
2.2.1 Sound Duration
2.2.2 Just Noticeable Differences in Level
2.2.3 Masking
2.2.4 Primacy effect
2.3 Effects of peak limiting
2.4 Critique against traditional psychoacoustics
3 Aim and purpose
4 Method
4.1 Summary of method
4.2 Stimuli
4.2.1 Preparation
4.2.2 Deciding on relevant transient durations
4.2.3 Creating the transients
4.2.4 Creating the stationary component
4.2.5 Separating the transient
4.2.6 Target population
4.3 Experiment 1
4.3.1 The experiment
4.3.2 Analysis and results
4.4 Experiment 2
4.4.1 Preparation
4.4.2 The experiment
4.4.3 Method for analysis
5 Results and analysis
6 Discussion
6.1 Ecological validity
6.2 Conclusion
7 References


1 Introduction

In modern music production, it is common practice to use dynamic range compression and peak limiting to raise the perceived loudness of a track. By reducing the loudest peaks, the overall level of a track can be raised accordingly. For this reason, many of the onset transients present in a recording are subjected to a level reduction during the mixing and mastering process. This has led to some debate about how the reduction of transient level affects the quality of a music track. Prior research has tried to measure audibility and preference regarding peak limiting on music tracks. However, differences in songwriting, arrangement, recording techniques and mixing approaches from one music production to another make it hard to generalize the findings of these experiments. Therefore, it might be difficult to pinpoint specific sound attributes that are affecting the results when using music tracks as stimuli. If raising the loudness of a track by reducing transient level is desirable, sound engineers would have a lot to gain from extending their knowledge of how different sound attributes affect the way transients are perceived by listeners.

To gain greater knowledge in this research area, a more controlled experimental approach might be suitable, although it might compromise the ecological validity of the experiment. Traditional psychoacoustic research uses far more controlled stimuli and offers insight into how auditory factors affect perceived loudness and level differences of sound signals. These findings can be used to some degree when trying to understand how transient loudness is perceived. This kind of research, however, often uses simple stimuli like sine waves and noises. While offering greater experimental control, these types of stimuli arguably reduce the ecological validity of the experiment since they bear little resemblance to the harmonically and dynamically complex instruments, effects and techniques used in music production. Though traditional psychoacoustics offers an important understanding of human hearing in laboratory environments, it fails to fully explain the human auditory system in the sonic environment of everyday life (Neuhoff, 2004). Future research should test factors that traditional psychoacoustics has shown to affect perceived transient loudness, using stimuli that bear greater resemblance to a musical instrument. However, the stimuli need to offer greater control than a full music track to allow for one specific sound attribute to be isolated for investigation.

So, there is a trade-off regarding the stimuli. Moreover, it remains to be investigated which sound attributes contribute most to how a transient is perceived. While many different sound attributes would be relevant to test, this research focuses solely on how perceived transient loudness is affected by sound duration, more specifically the duration of the transient.


2 Fundamental concepts

2.1 Defining a transient

Traditionally, a transient has been defined as the initial peak of an impulsive sound, like a drum hit or a plucked guitar string. When experimenting on the human perception of transients, the transient component often needs to be separated from the rest of the sound so that it can be manipulated independently. In traditional psychoacoustics, the separation has mostly been done by inserting a cut edit after the transient, separating it in time from the rest of the sound. As modern technology has allowed for more detailed analysis and sound manipulation techniques, some concern has been raised about this method, with claims that it gives a false representation of the transient component.

Problematizing this practice, Siedenburg (2019) claims that the transition between the transient and the rest of the sound (referred to by Siedenburg as the stationary component) is gradual, meaning that the stationary component will, most often, be building up while the transient fades out. If the components are separated only through a cut edit, the transient will either be partially cut off or include parts of the stationary component. Siedenburg defines a transient as an impulsive, short-lived, chaotic burst of acoustical energy. The stationary component, on the other hand, is defined by Siedenburg as the part where the sound has reached a more “steady state”. Based on this definition, the components will differ in frequency as well as in time.

Siedenburg and Doclo (2017) proposed a DSP algorithm that separates the transient from the stationary component, taking both the time and frequency information of the components into consideration. Based on the premise that the stationary component is persistent over time and sparse in frequency, whereas the transient is sparse in time and persistent across frequency, the algorithm could successfully separate the components into two different signals despite them overlapping in time. Using the same algorithm to separate the components of several tonal instrument samples, Siedenburg (2019) found that the transient, for every instrument, was lower in level than the stationary component. The transient from a vibraphone hit, for instance, was 18 dB lower in level than the stationary component. For a majority of instruments, the overall level peaked later in time than the level of the transient component alone.

It should be stated, however, that Siedenburg (2019) only used the algorithm on tonal instruments. In a music production context, the transients most heavily impacted by peak limiting will most often come from percussive instruments, like the kick or snare drum. With tonal instruments, the transient and the stationary component would, arguably, be easier to separate than in the case of a snare drum. In that case, the rattling of the snares and the inharmonic resonances of the snare drum give the stationary component a noise-like characteristic. Therefore, it might prove challenging to identify where the impulsive behavior of the transient ends and the steady state of the stationary component begins. Addressing this problem, Siedenburg and Doclo state that:

…transients are notoriously hard to define in semantic terms, because defining features such as short-livedness and stochastic nature can be easily contested in the context of complex audio mixtures. Consequently, ground truth for the stationary/ transient separation task is only available for synthesized test signals.

(Siedenburg and Doclo, 2017, p. 1)

To simplify the language in this report, a signal containing both a transient and stationary component is from now on referred to as a compound signal.

2.2 Loudness

Since this research aims to identify variables in a transient that affect the loudness at which the transient is perceived, a brief description of loudness is necessary. As described by Fastl and Zwicker (2007), loudness belongs to the category of intensity sensation. More specifically, loudness refers to how loud a sound is perceived to be by the human ear. Research has shown this phenomenon to be highly complex and dependent on several different factors. Therefore, the perceived loudness and the measured sound pressure level of a sound may differ considerably. Prior research has identified many factors suggested to affect human loudness perception. While not directly related to transient loudness, some of these factors might play a role in how transients are perceived and can be used as a basis for designing the experiment for this research. Some of the factors are presented and discussed below.

2.2.1 Sound Duration

As has been established, transients are very short in time compared to the stationary component. According to Fastl and Zwicker (2007), the duration of a sound affects the loudness at which that sound is perceived in several ways. Research measuring the threshold of audibility has shown that longer sounds are audible at lower levels than sounds of shorter duration. The threshold of audibility decreases as the duration of the sound increases until the sound reaches a length of 200 ms. For sounds longer than 200 ms, the threshold remains unchanged. When measuring how perceived loudness is affected by sound duration at higher sound pressure levels, a similar effect can be observed. In this case, perceived loudness increases together with sound duration until reaching a length of 100 ms, after which the loudness remains unchanged.

While loudness, in this case, was measured using test tones presented at different durations at a sound pressure level of 57 dB (unfortunately, it’s not specified whether the SPL scale was weighted in any way), the threshold of audibility was measured at significantly lower levels. A comparison between these findings indicates that the effect of sound duration on perceived loudness depends on playback level. Unfortunately, the authors don’t offer a precise definition of how these factors correlate. This should, however, be taken into consideration in the experiment design for this research since it suggests that the choice of playback level might affect the results.

In an experiment also measuring loudness as a function of signal duration, Small, Brandt and Cox (1961) had participants listen to several noise bursts of varying durations and match their loudness to a 500 ms noise by adjusting a fader. The level changes made by the participants were documented and analyzed to find how much the perceived level varied depending on signal duration. The authors then compared their results to results from similar experiments conducted by Miller (1948) and Graner (1947). Although a slight variation could be observed in the steepness of the loudness function curves derived from the different experiments, the perceived loudness increased in a linear fashion as sound duration increased in all three cases. The same is true of the data presented by Fastl and Zwicker (2007).

2.2.2 Just Noticeable Differences in Level

Moreover, Fastl and Zwicker (2007) present data suggesting that the duration of a sound also affects how sensitive listeners are to level changes between sounds. This is generally referred to as the just noticeable difference in level, or JNDL: how great the difference in level between two sounds has to be in order for the difference to be audible. The data suggest that an increase in sound duration reduces the JNDL, meaning that smaller level changes are audible at longer sound durations. This has been measured using a 1 kHz tone presented at durations between 2 ms and 0.5 s. When the duration is reduced from 200 ms to 2 ms, the JNDL increases by a factor of 5. This corresponds to a slope of about -6 dB per decade within this range. For durations above 200 ms, the JNDL does not decrease much.

Similarly, the authors present data indicating that the JNDL depends on the sound pressure level at which sounds are presented. The data show the JNDL for a 1 kHz tone presented at several sound pressure levels between 10 and 100 dB. According to the data, the lower the sound pressure level, the greater the level difference has to be in order to be audible. The JNDL increases rapidly as the sound pressure level drops below 20 dB towards zero. More than 1 dB is added to the JNDL at 10 dB compared to 20 dB sound pressure level. As the sound pressure level increases above 20 dB, a slower decrease in the JNDL can be observed. From 40 dB to 100 dB sound pressure level, the JNDL decreases by about 0.2 dB.

2.2.3 Masking

An additional factor that one needs to be aware of when researching loudness is the phenomenon of masking. Masking is an effect present in most sounds experienced by humans. As explained by Fastl and Zwicker (2007), masking happens when one sound prevents another sound from being heard or reduces its perceived loudness. The effect is, according to the authors, both time and frequency dependent. A closer look at the temporal aspects of the masking effect can prove useful for understanding how transient loudness is perceived.

Fastl and Zwicker (2007) claim that the masking effect can be divided into three categories: simultaneous masking, postmasking and premasking. Simultaneous masking occurs when a sound is masked by another sound playing at the same time. Postmasking occurs when a sound is masked by a masker that stops before the start of the masked sound. Conversely, premasking occurs when the masker starts after the end of the masked sound. Premasking might occur within a period of 20 ms or less, meaning that the part of the masked sound that is presented less than 20 ms before the start of the masker risks being masked. In the case of postmasking, the masking effect can be observed for up to 200 ms after the end of the masker. The sound level that the masked sound has to reach in order to be audible while being subjected to masking is referred to as the masked threshold. This threshold will, in the case of postmasking, increase as the duration of the masker increases. This effect can be observed until the masker reaches a duration of 200 ms, where the masked threshold reaches a fixed position. The authors state that it has, so far, been impossible to decide whether premasking depends on masker duration.

Fastl and Zwicker (2007) further report that the perceived loudness of the masked sound is reduced even when it exceeds the masked threshold. Keeping this in mind, it’s reasonable to assume that some sort of masking occurs between the transient and the stationary component of a sound. With a drum hit, for example, the transient component could be assumed to affect how the loudness of the stationary component is perceived, through masking, and vice versa.

2.2.4 Primacy effect

Though not directly related to masking, some evidence suggests that the transient would, in fact, affect the loudness at which the stationary component is perceived. As described by Oberfeld, Jung, Verhey and Hots (2018), the onset of a sound receives a higher weight than later portions when its overall loudness is estimated. This is called the primacy effect and has been observed for sound durations up to 10 s. This would suggest that transients play a large role in determining the overall loudness of a musical sound. Since transients of longer duration are suggested to be perceived as louder, this might mean that a longer transient will also make the stationary component be perceived as louder.

2.3 Effects of peak limiting

Previous research has tried to find a boundary for when dynamic range compression is audible. Stärnman (2014) conducted a study in which participants were asked to differentiate between music tracks processed with varying amounts of peak limiting. In this experiment, the peak limiting could only be perceived when more than 8 dB of gain reduction was applied.


In a similar experiment conducted by Hjortkaer and Walther-Hansen (2014), listener preference towards dynamic range compression was measured in a comparison between original and remastered versions of popular music tracks. The researchers specifically chose music tracks whose remasters had been debated for their impact on sound quality and had been used to demonstrate the negative effects of the loudness war.

Results from the test showed that a preference for the original master could be found only when the remaster showed a reduction in peak-to-average ratio (PAR) of more than 8 dB relative to the original master. With a smaller PAR reduction, no significant correlation with listener preference could be found.
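To make the PAR figure more concrete, the following minimal Python sketch computes a peak-to-average ratio in dB, assuming that "average" means RMS level (the crest factor). The source does not state how PAR was measured in the cited study, so the function name `par_db`, the 48 kHz sample rate and the hard clipping used as a stand-in for peak limiting are illustrative assumptions only.

```python
import numpy as np

SAMPLE_RATE = 48_000  # assumed sample rate, not taken from the source


def par_db(signal: np.ndarray) -> float:
    """Peak-to-average ratio in dB, assuming 'average' means RMS level."""
    peak = np.max(np.abs(signal))
    rms = np.sqrt(np.mean(signal ** 2))
    return float(20.0 * np.log10(peak / rms))


# One second of a 1 kHz sine as a simple test signal.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
sine = np.sin(2.0 * np.pi * 1000.0 * t)

# Hard clipping as a crude stand-in for peak limiting (an assumption;
# a real limiter applies smoothed gain reduction rather than clipping).
clipped = np.clip(sine, -0.5, 0.5)

par_reduction = par_db(sine) - par_db(clipped)
print(f"PAR of sine:    {par_db(sine):.2f} dB")
print(f"PAR of clipped: {par_db(clipped):.2f} dB")
print(f"PAR reduction:  {par_reduction:.2f} dB")
```

Comparing `par_db` before and after processing gives the kind of PAR reduction figure discussed above, although real music and real limiters will behave far less predictably than this test tone.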

The experiments by Hjortkaer and Walther-Hansen (2014) and Stärnman (2014) give useful insight into how dynamic range compression is perceived by the listener. However, they provide little insight into which psychoacoustical factors affect listener perception. In both cases, the authors had little control over how decisions made in the production stage of the music tracks might have impacted listener perception. To gain further insight into when peak limiting is or isn’t audible, stimuli need to be less complex and offer further control over attributes that might affect the audibility of peak limiting. It should be noted, though, that in both cases a dynamic range reduction of 8 dB was needed in order to find any trends amongst listeners.

2.4 Critique against traditional psychoacoustics

While research using very complex stimuli, as in the cases of Stärnman (2014) and Hjortkaer and Walther-Hansen (2014), arguably lacks stimulus control, findings from traditional psychoacoustic literature have been criticized for lacking ecological validity (Neuhoff, 2004). Traditional psychoacoustic research is often conducted in laboratory environments using simple stimuli, like sine waves or white noise. In many cases, these stimuli are presented with unmodulated pitch, amplitude and frequency content. Keeping this in mind, the previously discussed data from Fastl and Zwicker (2007) might prove insufficient for understanding loudness perception in a complex, musical context. Still, this data provides a foundation for future research striving for a more ecologically valid understanding of psychoacoustics.


3 Aim and purpose

Previous research tells us that longer sound signals are perceived as louder within certain limits. While it is not the main research question, this research aims to test that very effect again, using transient signals that differ slightly from signals tested in this way before. This is expressed in the first research question below.

With the relationship between sound duration and perceived loudness in mind, it is hypothesized that transients can be lowered in level without it being audible to the human ear, if the transient level reduction is properly compensated by an increase in transient duration. If the hypothesis is correct, there must be an upper limit beyond which the decrease in level is too severe to be compensated by increasing duration. That limit will likely depend on how small a duration difference the auditory system is able to identify in a sound signal. A threshold for how large an increase in signal duration has to be in order to be audible would be relatively easy to find for a signal played back in isolation. However, this threshold is hypothesized to be affected if the transient is followed by a second sound component (a stationary component), which will usually be the case in a real-life context. Several psychoacoustic effects discussed in this report, like the primacy effect, temporal masking and the very effect of sound duration, suggest that each of the two sound components will influence how the other is perceived.

With that in mind, this research also aims to investigate if, and to what extent, changes in transient amplitude can be compensated by changes in transient duration when followed by a stationary component. This aim is expressed in the second research question.

So, following up on the discussion above, this research proposes two questions:

- Will the duration of a transient affect the perceived loudness of that transient?

- Could transient amplitude changes be compensated by changing transient duration without introducing any audible differences in a compound signal context?

While trying to maintain control over the psychoacoustical factors that might impact the result, this research will use stimuli of greater complexity than what are referred to as simple stimuli. Greater knowledge in this area could possibly help engineers to produce less audible artifacts while striving for a perceptually loud output in music production. It could also help to understand the limitations or compromises that have to be taken into consideration in a mixing and mastering context.


4 Method

4.1 Summary of method

In order to answer both whether sound duration truly changes the perceived loudness of a transient, and whether a longer sound duration can, in fact, be used to compensate for a reduced transient amplitude, the research consisted of two experiments. In experiment 1, participants were asked to loudness match six unique transients of varying duration, created from pink noise.

In experiment 2, four of the six transients were merged with a stationary component, also made from pink noise, to create four unique compound signal stimuli. The four transients were loudness normalized according to the data from experiment 1. While the transients differed in duration and level for every stimulus in experiment 2, the stationary component was mixed to the same level in every case. Since the transient duration differed, the duration of the stationary component was altered for each stimulus so that the overall stimulus length would always be the same. In that way, the independent variables tested would mainly be the duration and level of the transient, but also the duration of the stationary component as a necessary by-product. The four different stimuli were then tested against each other in an ABX test to see if listeners could tell the difference between any of the four stimuli.
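The loudness normalization step can be illustrated with a small sketch. It assumes, as a simplification, that the duration effect is linear at the roughly 0.3 dB per 5 ms reported in the abstract, and that the 15 ms transient serves as the reference. The thesis normalizes from the measured experiment 1 data, so `compensation_db`, `db_to_gain` and the constants below are hypothetical names and values used only for illustration.

```python
# Approximate slope reported in the abstract: 0.3 dB louder per 5 ms.
DB_PER_MS = 0.3 / 5.0
REFERENCE_MS = 15.0  # assumed reference duration (not stated in the source)


def compensation_db(duration_ms: float, reference_ms: float = REFERENCE_MS) -> float:
    """Level offset in dB that would match a transient to the reference loudness.

    Longer transients are perceived as louder, so they get a negative offset.
    """
    return (reference_ms - duration_ms) * DB_PER_MS


def db_to_gain(level_db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


for duration_ms in (15.0, 20.0, 25.0, 30.0):
    offset = compensation_db(duration_ms)
    print(f"{duration_ms:4.0f} ms: {offset:+.2f} dB, gain {db_to_gain(offset):.4f}")
```

Under these assumptions the longest transient is attenuated by a little under 1 dB relative to the shortest, which shows how small the level changes under test are.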

While aiming for a highly controlled experiment that would, at the same time, be slightly more ecologically valid than traditional psychoacoustic experiments, some trade-offs had to be made regarding the stimuli. A detailed description of the stimulus creation process follows below.

4.2 Stimuli

4.2.1 Preparation

It was reasoned that the instruments most likely to be subjected to a transient level reduction from the usage of peak limiting in a music production would be the kick and snare drum. The snare drum seemed to be the easiest one to replicate of the two, only using pink noise and without having to add additional signal processing that would potentially reduce experiment control. Therefore, a decision was made to mimic the envelope of a typical snare drum as much as possible when creating the transients, stationary components and eventually the compound signal.

So, the end goal was to have four monaural, pink noise transients of different durations that, together with a pink noise stationary component, were supposed to somewhat mimic the envelope of a snare drum to make the stimuli bear a resemblance to a sound that could be heard in a typical music production.

For creating the stimuli, five recorded snare drum hits were used as a reference. Snare drums 1 and 2 were recordings of the same drum and drummer, hitting the drum with two different velocities: a hard hit in the first case and a medium hit in the second. In both cases, the sample consisted of a mixture of close mikes, overheads and room mikes with some basic EQ and compression applied. The original samples were stereophonic, but for analysis only the left channel was used. Snare drums 3 and 4 were recordings of two different drummers using two different snare drums. In both of these cases, only a close mike was used. Snare drum 5 was a snare drum sample from the stock sample library in Logic Pro X (Apple, 2020). In this sample, it’s reasonable to assume that some processing in the form of EQ and compression has been applied. While the sample was stereophonic, only the left channel was used for analysis.

In the hope of finding a range of typical transient durations, all of the reference snare drums were analyzed using Izotope RX7 Advanced (Izotope, 2019). Using this application, a spectrogram was displayed, showing the energy across the full frequency spectrum at any given point in time. Figure 1 shows one of the snare drums being analyzed using this interface. This proved useful when trying to distinguish the transient based on Siedenburg’s (2019) definition. Unfortunately, no clear indicators could be found as to where the transient ended and the stationary component began in any of the reference snare drums. The spectrogram showed a distinct fundamental frequency followed by overtones in the bass and lower part of the mid-region, both at the onset and in the following part of the sound. In the higher parts of the spectrum, the reference snare drums showed a more noise-like characteristic, presumably coming from the rattling of the snares. Differences between the onset and later portions of the sound were, however, too small to reliably separate a transient from the rest of the sound.

Figure 1 One of the reference snare drums being analyzed using Izotope RX7

4.2.2 Deciding on relevant transient durations

To solve the problem of finding relevant transient durations to test, a group of four sound engineering students, including the author, gathered to listen to the reference recordings and to pink noise transients of different durations, and in that way decide which transient durations would be relevant to use for the stimuli. All of the participants had prior experience in recording and mixing music and were therefore familiar with the terminology. In preparation for the meeting, nine pink noise transients were prepared, ranging in duration from 2.5 ms to 40 ms. Except for the step between the first and second transients (2.5 ms and 5 ms), the duration increased by 5 ms from one transient to the next. The transient durations used for the listening session are shown in Table 1.

Table 1 Transients used in the meeting with sound engineers

Transient (n)          1    2    3    4    5    6    7    8    9
Signal duration (ms)   2.5  5    10   15   20   25   30   35   40

The transients were created by digitally recording pink noise in Avid Pro Tools and using the same part of that recording for every transient (Avid, 2019). In this way, every transient would have an identical starting point. The duration was changed for each transient by including more or less of the pink noise after the starting point. A linear, four-sample fade in and fade out were added to each transient to avoid unwanted clicks.
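The transient construction described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the source used a pink noise recording in Pro Tools and does not state the sample rate, so the 48 kHz rate, the spectrally shaped noise generator and the function names `pink_noise` and `cut_transient` are all hypothetical.

```python
import numpy as np

SAMPLE_RATE = 48_000  # assumed; the source does not state the sample rate


def pink_noise(n_samples: int, rng: np.random.Generator) -> np.ndarray:
    """Approximate pink noise by giving white noise a 1/sqrt(f) amplitude spectrum."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid division by zero at DC
    shaped = np.fft.irfft(spectrum / np.sqrt(freqs), n=n_samples)
    return shaped / np.max(np.abs(shaped))  # normalise to a peak of 1.0


def cut_transient(noise: np.ndarray, duration_ms: float,
                  fade_samples: int = 4) -> np.ndarray:
    """Take the first `duration_ms` of the noise and add linear fades at both ends."""
    n = int(round(duration_ms * SAMPLE_RATE / 1000.0))
    clip = noise[:n].copy()  # same starting point for every duration
    ramp = np.linspace(0.0, 1.0, fade_samples, endpoint=False)
    clip[:fade_samples] *= ramp         # four-sample linear fade in
    clip[-fade_samples:] *= ramp[::-1]  # four-sample linear fade out
    return clip


rng = np.random.default_rng(0)
noise = pink_noise(SAMPLE_RATE // 10, rng)  # 100 ms of noise to cut from
durations_ms = (2.5, 5, 10, 15, 20, 25, 30, 35, 40)  # durations from Table 1
transients = {d: cut_transient(noise, d) for d in durations_ms}
```

Because every transient is cut from the same starting point, a shorter transient is identical to the beginning of a longer one apart from the fade out, which keeps duration as the only intended difference between them.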

To give the group a sense of what the end result would sound like in a compound signal context, an alternative version of each transient was created in which a temporary pink noise stationary component was inserted after each transient, ending 294 ms after the transient start. As with the transients, the same starting point was used for each stationary component. The part of the noise recording used for the stationary components was, however, different from the part used for the transients. The duration of the stationary component was altered between stimuli to make the overall stimulus duration the same. The stationary component was lowered 11 dB in level relative to the transient, and a linear crossfade, corresponding to half the duration of the transient, was added between each transient and stationary component. Lastly, a logarithmic fade out, starting immediately after the crossfade, was added to each stationary component. The stationary components were strictly temporary, so that the group could hear the transients in context, and the properties of the stationary component would be changed before creating the final stimuli.

After listening, experimenting and discussing, the group decided on a span of four transient durations that seemed to do the best job of mimicking a snare drum transient: 15, 20, 25 and 30 ms. The group reasoned that transients shorter than 15 ms sounded like transients but were too fast to mimic a snare drum transient. On the other hand, when exceeding 30 ms, the group thought that the transients lost their transient properties and sounded more like full percussive elements in themselves. Some of the participants drew similarities between electronic claps and the transients with durations longer than 30 ms.

4.2.3 Creating the transients

To avoid the risk of the span of transient durations being too small to introduce any perceived loudness differences, two extra transients were added so that a larger span of durations would be tested. In this way, two extreme values were introduced: 10 and 35 ms. However, if experiment 1 showed that listeners perceived loudness differences between the original four transients (15, 20, 25 and 30 ms), the plan was to use those four for the compound signals in experiment 2.

Even though the same portion of the pink noise recording was used for every transient, longer transients would include parts of the noise recording that the shorter transients would not. Therefore, the noise recording used for the transients was adjusted so that the noise would never exceed the peak level measured from the shortest transient. Peaks that exceeded this level were simply lowered in level using the clip gain line.

4.2.4 Creating the stationary component

While the transient durations had been decided, there was still the problem of deciding the level, duration and envelope of the stationary component that was to be added before the second experiment. To solve this problem, a group of 4 sound engineers, including the author, was once again gathered to listen to and discuss the properties of the stationary component.

After trying several different approaches and comparing them to the reference snare drums, the group agreed on the following properties for the stationary component:

- A duration of 460 ms subtracted by the duration of the transient (so that the overall stimuli duration would be 460 ms, independent of the transient duration)

- A level of -11.5 dB relative to the transient

- An equal power crossfade between the transient and the stationary component corresponding to half the duration of the transient

- A logarithmic fadeout starting immediately after the crossfade covering the full length of the stationary component
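The properties listed above can be sketched in code. This is a minimal illustration only, not the Pro Tools session used in this research, and it rests on several assumptions: a 48 kHz sample rate, white noise standing in for the pink noise recording, a crossfade placed from 0.5 to 1.5 times the transient duration (inferred from the 20 ms example in section 4.2.5), and a fade-out that is linear in dB down to an arbitrary -60 dB floor. The function names `ms_to_samples`, `gain_curves` and `build_stimulus` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 48_000     # assumed; the thesis does not state a sample rate
TOTAL_MS = 460.0         # overall stimulus duration
STATIONARY_DB = -11.5    # stationary level relative to the transient
FADE_FLOOR_DB = -60.0    # assumed end level of the "logarithmic" fade-out


def ms_to_samples(ms):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * SAMPLE_RATE / 1000.0))


def gain_curves(transient_ms):
    """Return (transient_gain, stationary_gain) as linear gain arrays."""
    total = ms_to_samples(TOTAL_MS)
    # Assumed crossfade placement: centred on the nominal transient end,
    # i.e. from 0.5 * d to 1.5 * d (10 to 30 ms for the 20 ms transient).
    start = ms_to_samples(0.5 * transient_ms)
    end = ms_to_samples(1.5 * transient_ms)
    phase = np.linspace(0.0, np.pi / 2.0, end - start)

    transient_gain = np.zeros(total)
    transient_gain[:start] = 1.0
    transient_gain[start:end] = np.cos(phase)   # equal-power fade-out

    stationary_gain = np.zeros(total)
    stationary_gain[start:end] = np.sin(phase)  # equal-power fade-in
    # Fade-out that is linear in dB, from 0 dB down to FADE_FLOOR_DB.
    fade_db = np.linspace(0.0, FADE_FLOOR_DB, total - end)
    stationary_gain[end:] = 10.0 ** (fade_db / 20.0)
    # Apply the static level offset of the stationary component.
    stationary_gain *= 10.0 ** (STATIONARY_DB / 20.0)
    return transient_gain, stationary_gain


def build_stimulus(transient_ms, rng):
    """Mix two independent noise sources through the gain curves."""
    transient_gain, stationary_gain = gain_curves(transient_ms)
    n = transient_gain.size
    # White noise stands in for the pink noise recording used in the thesis.
    transient_noise = rng.uniform(-1.0, 1.0, n)
    stationary_noise = rng.uniform(-1.0, 1.0, n)
    return transient_noise * transient_gain + stationary_noise * stationary_gain


stimulus = build_stimulus(20.0, np.random.default_rng(0))
```

Keeping the gain curves separate from the noise sources makes it possible to inspect the envelope on its own, for example to check that the squared gains sum to a constant during the crossfade.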

The 20 ms transient together with its stationary component can be seen from the creation process in Avid Pro Tools in Figure 2. The clip at the top is the transient, the next clip downwards is the stationary component, and the clip at the bottom shows both of them put together with fades applied.


Figure 2 The two sound components, used for the 20 ms transient stimuli, in separation and put together as a compound signal

4.2.5 Separating the transient

To make sure that participants would be listening to the same transients in experiment 1 as they would in experiment 2, the transients were separated from the stationary components after crafting the compound signal, so that the crossfade could be accounted for after separation. This was done by copying only the transient to a new track and applying a fade-out of the same duration, and with the same start and stop points, as the crossfade in the full stimulus for that transient. The separated transients would then fade out in the same way as they did in the full stimuli. However, this also meant that the separated transients would actually have a longer duration. The 20 ms transient, for example, would have a 20 ms fade-out, starting 10 ms after the onset and ending 30 ms after the onset.

In Figure 3, the original 20 ms transient can be seen, followed by the compound signal for that same transient with the added stationary component and lastly the final transient component with a fade out.


Figure 3 The 20 ms transient component before and after adding a fade out

4.2.6 Target population

Since the effects researched in this study are intended to be used primarily by mixing and mastering engineers, it was logical to use listeners with some sort of training in the experiments. In that way, the listeners would hopefully apply the same sort of critical listening as an engineer would in a real-world context. For that reason, only university-level sound engineering students and sound engineering teachers were used as test subjects.

To collect more information about the subject pool, a questionnaire for the participants to fill in was created. The questionnaire primarily asked participants about their age, gender and previous sound engineering experience.

4.3 Experiment 1

4.3.1 The experiment

A test was conducted where participants were asked to loudness match the six transients of different durations. The test was done using a custom-built application (see Figure 4), created using the audio programming software Reaktor 6 by Native Instruments (2019). The interface of the application contained six blocks that each controlled one of the six transients. Each block contained one fader, one loop button and one play button. The fader was used to adjust the playback level of the transient connected to that specific block. Each fader ranged from +10 to -10 dB and had a resolution of 0.1 dB. The amount of level adjustment applied on each fader was stored and could be accessed by the author after the tests. The loop and play buttons would either start a loop or play a one-shot of the transient connected to that block. Allowing the participants to loop the stimuli was necessary, since short sounds can be very hard to evaluate without repetition, but the loop duration had to be longer than 200 ms to account for the loudness effect provided by repetition rate. To stay well above this threshold, a loop duration of 250 ms was used. It was reasoned that listeners in a real-world context won’t all listen to audio at the same playback level; therefore, the most ecologically valid choice seemed to be to let listeners choose their own playback level. For that reason, a master fader that could be used to adjust the playback level of the system was added to the interface. Audio was played back using Sennheiser HD650 headphones connected to an RME Babyface interface.

Figure 4 Interface of loudness matching application

Twelve sound engineering students each conducted the test twice. More information on the test subjects regarding demographics and prior sound engineering experience is presented in Tables 2 and 3. In Table 2, the age and gender distribution within the subject pool is displayed.

Table 3 shows the number of listeners who are studying the first or second year of their current bachelor program. It also displays how many listeners have previously studied sound engineering or have working experience in the sound engineering field.

Table 2 Demographics within subject pool for experiment 1

Age Gender

19-24 25-29 30-35 35+ Male Female

Number of listeners 10 1 1 10 2

Table 3 Sound engineering experience within subject pool for experiment 1

                     First year  Second year  Prior study  Working experience
Number of listeners  7           5            8            4

Each test subject was asked to read the instructions provided on a separate paper, telling them how the interface worked and what their assignment was. The instructions stated that the subjects were supposed to match the signals to the exact same loudness by adjusting the faders. They also stated that the subjects were allowed to adjust the playback level of the system but were asked to do this by adjusting the master fader in the application. In this way, the selected output level could be monitored. However, it was observed during the experiment that several test subjects changed the playback level several times during the test. Unfortunately, the application only saved the master level that was last used by the subject when finishing the test, and therefore it couldn’t reliably represent the level at which the test was played back.

After finishing the first test, each subject filled in a form and then immediately did the test a second time. For each separate test, the order in which the transients were distributed to the six blocks of the interface was randomized.

4.3.2 Analysis and results

The level changes applied on each fader were arranged into one separate data set for each test person. The reference point that the stimuli were loudness matched to differed between test subjects. This was partially because the presentation order of the stimuli was randomized, and subjects tended to use the perceived loudness of the transient placed on the leftmost block as a reference for the rest. To enable an accurate comparison between tests, the results had to be normalized to the same reference level. This was achieved by first averaging the results of each test. The amount by which each average diverged from zero was then subtracted from each separate value for that individual test. In this way, all of the test results were normalized to an average of zero.
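The normalization described above amounts to subtracting each test's mean from its own fader values. A minimal sketch follows; the function name `normalize_test` and the example fader settings are hypothetical and are not data from this research.

```python
def normalize_test(fader_db):
    """Shift one test's fader settings (dB) so that they average to zero."""
    mean = sum(fader_db) / len(fader_db)
    return [value - mean for value in fader_db]


# Hypothetical fader settings (dB) for the durations 10, 15, 20, 25, 30, 35 ms.
example_test = [2.0, 1.5, 1.0, 0.5, 0.5, 0.0]
normalized = normalize_test(example_test)
```

Because the same offset is removed from every value, the level differences between durations within a test are left unchanged; only the arbitrary reference level is removed.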

To account for the errors present in each person’s loudness estimation, fader values from each subject’s first and second test were averaged into one fader value per test subject. The values were then plotted in a graph (see Figure 5) portraying the average fader levels applied by each test person to each transient duration after normalization.

The average fader level for each duration was calculated. In similar experiments discussed earlier in this report, loudness as a function of signal duration shows a linear slope. Therefore, a linear approximation was calculated from the averaged data points. The averaged data points are portrayed in Figure 6 together with the linear function.

Figure 5 Each test subject’s average loudness correction after normalization (chart title: Loudness matching test; x-axis: Duration (ms), 10 to 35; y-axis: Fader level (dB), -3 to 3)

Figure 6 Average loudness correction with linear function y = -0.0586x + 1.3189 (chart title: Average loudness correction; x-axis: Signal duration (ms), 10 to 35; y-axis: Fader level (dB), -1 to 1)

4.4 Experiment 2

4.4.1 Preparation

Four of the transients from experiment one, ranging from 15 to 30 ms, were used in experiment two. The level of the transients was normalized using the linear function from Figure 6. Due to limitations of the tools available during the stimulus creation process, the level of the transients could only be altered in steps of 0.1 dB. Therefore, values from the loudness function were rounded to the nearest tenth of a dB. This resulted in the transients having the following levels relative to the 15 ms transient (see Table 4).

Table 4 Level correction applied to each transient before experiment 2

Signal duration (ms)   15   20     25     30
Level correction (dB)  0    -0.3   -0.6   -0.9

The stationary component was added to each transient following the same principle as described in section 4.2.4, in terms of level, duration, fade and crossfade types. The same questionnaire that was used in experiment 1 was used in experiment 2, with the addition of one question: whether the listener had taken part in the first experiment in this study.
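The level corrections relative to the 15 ms transient follow from the linear function in Figure 6, y = -0.0586x + 1.3189, evaluated at each duration and rounded to 0.1 dB. The short sketch below reproduces that arithmetic; the function names are hypothetical.

```python
SLOPE_DB_PER_MS = -0.0586   # slope of the linear function in Figure 6
INTERCEPT_DB = 1.3189       # intercept of the linear function in Figure 6


def fitted_fader_level(duration_ms):
    """Fader level (dB) predicted by the linear function for a duration."""
    return SLOPE_DB_PER_MS * duration_ms + INTERCEPT_DB


def level_correction(duration_ms, reference_ms=15):
    """Correction (dB) relative to the reference transient, rounded to 0.1 dB."""
    diff = fitted_fader_level(duration_ms) - fitted_fader_level(reference_ms)
    return round(diff, 1)


corrections = {d: level_correction(d) for d in (15, 20, 25, 30)}
# corrections == {15: 0.0, 20: -0.3, 25: -0.6, 30: -0.9}
```

The intercept cancels in the subtraction, so each 5 ms step contributes 5 × (-0.0586) ≈ -0.29 dB before rounding.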

4.4.2 The experiment

To find out if an increased transient duration could truly compensate for a reduced transient level without introducing any audible effects, the transients were compared to each other in an ABX test. In this kind of test, the listeners are presented with three signal options: A, B and X. A and B are two unique signals and X is an exact copy of either A or B. The listener is asked to identify which of A and B is identical to X. In this way, the listeners’ choices will be based only on whether they can tell a difference between the stimuli, and any influence of preference will be excluded.
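The logic of a single ABX trial can be sketched as follows. This is an illustration of the procedure only, not the STEP software used in the experiment; the function names and the simulated guessing listener are hypothetical.

```python
import random


def draw_x_identity(rng):
    """Randomly decide whether X is a copy of A or of B."""
    return rng.choice(["A", "B"])


def score_trial(x_identity, answer):
    """Return 1 if the listener named the signal that X copies, else 0."""
    return 1 if answer == x_identity else 0


def guessing_listener(rng):
    """Simulated listener who cannot hear a difference and guesses."""
    return rng.choice(["A", "B"])


rng = random.Random(0)
scores = [score_trial(draw_x_identity(rng), guessing_listener(rng))
          for _ in range(1000)]
```

A listener who only guesses is correct in roughly half of the trials, which is the baseline any real listener's score has to be compared against.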

The test was conducted using the STEP software for ABX tests. The interface allowed participants to play and loop the audio. A 900 ms loop length was chosen to make the repetition rate seem somewhat closer to typical snare drum repetition rate from a musical point of view. Even though the STEP interface allowed for changing playback position and loop length, participants were instructed not to change these parameters.

For the test, a group of 19 trained listeners was used. 10 of them had participated in the first experiment and 9 were new to the study. Every subject conducted the test two times and filled in a form between the tests. In this way, listeners had an obligatory pause between the tests, which would help them avoid listening fatigue. The demographics and work experience within the subject pool are displayed in Tables 5 and 6. Audio was played back through Sennheiser HD650 headphones connected to an M-box mini audio interface.

Table 5 Demographics within subject pool for experiment 2

                     Age                         Gender
                     19-24  25-29  30-35  35+    Male  Female
Number of listeners  13     3      2      1      15    4

Table 6 Sound engineering experience within subject pool for experiment 2

First year Second year Teacher Prior study Working experience

Number of listeners 12 6 13 6

4.4.3 Method for analysis

The data collected from experiment two were structured into six data sets, one for each comparison from the ABX test. In each individual test there were two possible outcomes: either the reference was correctly identified (1), or it was not (0). Since each person did the test twice, each person’s two results could be combined for each comparison to yield three different outcomes. Each of the following three alternatives was considered a unique outcome:

1. The test subject correctly identified the reference in both their trials (1 + 1); this was considered a right answer in the analysis.

2. The test subject incorrectly identified the reference in both their trials (0 + 0); this was considered a wrong answer in the analysis.

3. The test subject correctly identified the reference in one of their trials (1 + 0 or 0 + 1); this was considered an unsure answer in the analysis.
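The three outcomes above can be expressed as a small classification step followed by a tally per comparison. The sketch below is illustrative; the function name `classify` and the example trial pairs are hypothetical and are not data from this research.

```python
from collections import Counter


def classify(first_trial, second_trial):
    """Combine two ABX results (1 = correct, 0 = incorrect) into one outcome."""
    total = first_trial + second_trial
    if total == 2:
        return "right"
    if total == 0:
        return "wrong"
    return "unsure"


# Hypothetical (first, second) results from five listeners for one comparison.
example_pairs = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
tally = Counter(classify(a, b) for a, b in example_pairs)
# tally == Counter({"right": 2, "unsure": 2, "wrong": 1})
```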

The numbers of right, wrong and unsure answers were counted for each comparison, and their distribution was tested using Fisher’s exact test (Lowry, 2020) to see if the distribution was statistically significant. The test requires the distribution to be compared to an expected distribution. To calculate an expected distribution, a null hypothesis was stated. The null hypothesis reads: “there’s no perceived difference between compound signals with varying transient duration after transients are loudness normalized”.

If the null hypothesis were true, the distribution of right, wrong and unsure answers would be close to random. The probability for a test person to get a right answer (1 + 1) by chance was calculated to be 25 %. Similarly, the probability of getting a wrong answer (0 + 0) by chance was calculated to be 25 %. Since there were two ways of getting an unsure answer (1 + 0 or 0 + 1), this answer had a 50 % probability of occurring by chance.


The random distribution probabilities were applied to the number of combined outcomes per comparison in experiment two (n = 19, one per test subject) to find the numbers of right, wrong and unsure answers that would be expected in a random distribution for this test. The probabilities yielded 4.75 right, 4.75 wrong and 9.5 unsure answers. For the null hypothesis to be rejected, a significance level of 𝛼 = 0.05 was chosen.
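The expected counts can be checked by enumerating the four equally likely result pairs of a guessing listener and scaling the resulting probabilities by n = 19. This is a small arithmetic sketch; the names used are hypothetical.

```python
from fractions import Fraction
from itertools import product

N_LISTENERS = 19  # combined outcomes per comparison in experiment two


def chance_probabilities():
    """Probability of each combined outcome when both trials are pure guesses."""
    probs = {"right": Fraction(0), "wrong": Fraction(0), "unsure": Fraction(0)}
    for first, second in product((0, 1), repeat=2):
        total = first + second
        label = "right" if total == 2 else "wrong" if total == 0 else "unsure"
        probs[label] += Fraction(1, 4)  # each of the four pairs is equally likely
    return probs


expected = {label: float(p * N_LISTENERS)
            for label, p in chance_probabilities().items()}
# expected == {"right": 4.75, "wrong": 4.75, "unsure": 9.5}
```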


5 Results and analysis

The results from experiment two are presented in a bar chart below. The bar chart displays the number of right, wrong and unsure answers for each comparison.

Figure 7 Results from experiment two

The distribution of right, wrong and unsure answers for each comparison, along with the expected random distribution, is shown in Table 7. For each comparison, the same values were entered into an online calculator from Vassarstats.net (Lowry, 2020), where they were compared with the random distribution using Fisher’s exact test. The calculated P-value for each comparison is displayed (rounded to four decimals) in the last row of the table.

Comparisons that showed a P-value lower than the critical P-value (𝛼 = 0.05) are highlighted as green in the table.

The P-values indicate that listeners could successfully identify the references in comparisons B, C and E, and the null hypothesis was rejected in these cases. In the comparisons where the null hypothesis couldn’t be rejected (A, D and F), the differences in transient duration between stimuli didn’t exceed 5 ms. In comparisons B, C and E, the differences in transient duration between stimuli were always greater than 5 ms.

[Bar chart “ABX-test results”: number of right, wrong and unsure answers (scale 0 to 20) per comparison; A = 15 | 20 ms, B = 15 | 25 ms, C = 15 | 30 ms, D = 20 | 25 ms, E = 20 | 30 ms, F = 25 | 30 ms]

Table 7

Comparison   A        B        C        D        E        F        Random
             15 | 20  15 | 25  15 | 30  20 | 25  20 | 30  25 | 30  distribution
Right Ans.   9        18       16       8        12       8        4.75
Wrong Ans.   1        0        1        3        5        5        4.75
Unsure Ans.  9        1        2        8        2        6        9.5
P-value      0.3383   0.0000   0.0008   1        0.0394   0.9153


6 Discussion

Figure 6 displays the average level change applied to transients depending on their duration in experiment one. The fact that most people perceived transients of longer duration as louder is well in line with findings from previous research, where similar trends have been observed despite experimenting on somewhat different signal types (Fastl & Zwicker, 2007; Small, Brandt, & Cox, 1961). From this, a conclusion answering the first research question was drawn: the duration of a transient affects the loudness with which that transient is perceived. However, transients were played back in isolation from other sound components in this experiment, and the results therefore cannot answer how transient duration affects transient loudness in a more complex context.

Results from the second experiment can be seen in Figure 7. In comparisons where the transient duration differed by more than 5 ms between stimuli, the null hypothesis was rejected at a significance level of 𝛼 = 0.05, meaning that the differences were audible in these cases. From this, a conclusion was drawn that transient duration changes were only audible, after loudness normalization, when the difference in transient duration exceeded 5 ms. This held for every comparison in which the transient durations differed by only 5 ms (comparisons A, D and F). In the case of comparison F, for instance, the overall duration of the transients is 10 ms greater than in the case of comparison A. Therefore, the effect can be assumed to be independent of the overall transient duration and to depend only on the difference in duration between the compared stimuli.

It should be noted, though, that Table 7 shows a notably higher P-value for comparison E than for the other two comparisons for which the null hypothesis was rejected. This means that the results for comparison E have a higher probability of having occurred by chance than those for comparisons B and C. Comparison E compared transients of 20 and 30 ms, which are the longest durations compared with a difference of 10 ms. This might be evidence that longer durations allow for greater differences in duration before the differences become audible. However, to clarify how this effect depends on the overall duration of the compared stimuli, a greater span of durations should be tested.

While the results from experiment 2, to some extent, answer the second research question (whether transient amplitude changes could be compensated by changing transient duration without introducing any audible differences), more information is needed to fully answer the question. For instance, the loudness normalization applied to the transients before experiment 2 was based on loudness estimates measured on transients played back in isolation. In the second experiment, however, a stationary component was added to create a compound signal.

This research doesn’t fully answer how transient loudness as a function of transient duration is affected by the addition of a stationary component. Future research could provide a more nuanced answer to the question by doing a loudness matching test, similar to experiment one in this research, but with a stationary component following each transient. In that case, the level changes applied by participants should only affect the transient level, and the stationary component should be static.

Moreover, this research tells us that audible differences introduced by changing transient duration could no longer be compensated by changing transient level when the difference in transient duration between stimuli exceeded 5 ms. The next step in duration difference compared was 10 ms, and in this case the differences were heard by participants. To get a fuller understanding of the duration span in which this effect is useful, more comparisons should be made in a similar test with stimuli of smaller differences in duration.

It should also be mentioned that the results from this experiment are based entirely on whether or not listeners were able to detect the differences between the stimuli. The next step would be to have listeners evaluate how the audio quality is affected by the differences that are audible. The fact that the difference between a 15 ms and a 25 ms transient at a lower level is audible, for example, doesn’t necessarily mean that it affects the signal in a negative way. Future research should try to find the limit at which listeners perceive the act of compensating level reduction with signal duration as negative. This kind of research should be conducted using stimuli that bear even greater resemblance to instruments found in a music production, since estimating the audio quality of noise bursts might prove to be an unreasonable or pointless task.

In an experiment conducted by Stärnman (2014), peak limiting was not audible to listeners until a gain reduction of more than 8 dB was applied. Quite similarly, Hjortkaer and Walther-Hansen (2014) found that listeners didn’t prefer any specific master of a music track until the dynamic range of the track was reduced by more than 8 dB, at which point listeners preferred the version with less dynamic range reduction. In this experiment, differences were audible when the transient level was reduced by as little as 0.6 dB while the transient duration was increased by 10 ms.

This could be evidence that the increase in duration played a larger role than the decrease in level when listeners successfully identified the differences between the stimuli. It could also mean that when transients are played back in a more complex context, it becomes harder for listeners to identify differences in transient level. However, the differences between said experiments and this research are too great to draw that kind of conclusion. As briefly mentioned in section 2.3, too little is known about how the stimuli were processed in Stärnman’s (2014) and Hjortkaer and Walther-Hansen’s (2014) experiments to make a reliable comparison to this research.

6.1 Ecological validity

The stimuli used in this research were created using pink noise and modified to mimic the envelope of the transient and stationary component of a typical snare drum. Pure, full-range pink noise will seldom be used in a music production, and the ecological validity of this experiment might arguably be reduced for that reason. However, the envelope of the stimuli does bear some resemblance to the reference snare drums used when creating the stimuli. Therefore, it’s reasonable to assume that results from this experiment might be somewhat similar if the experiment were to be replicated with an electronic snare drum, for instance, where the transient duration can be controlled.

This experiment is also insufficient for explaining how transient loudness would be perceived in a music production context where several other instruments and sound effects are heard simultaneously with the transients. A context like that might completely change how loudness is perceived, which makes it hard to apply results from this experiment directly to a music production. On the other hand, using more complex stimuli would have made it impossible to identify exactly which experimental factors had an impact on the results.

While more research is needed to bridge findings from this research to a real-world context, this research can be used as a foundation to base further, more complex experimentation on.

6.2 Conclusion

This research provides evidence that a reduction in transient level can be compensated by increasing transient duration. While longer transients are perceived as louder, at least within the span of durations tested in this experiment, the compensation becomes audible to listeners when the transient duration is increased by more than 5 ms. According to the data from this research, a 5 ms increase in duration corresponds to approximately a 0.3 dB increase in perceived loudness. Although it might not be possible for mixing and mastering engineers to control the exact durations of the highest-level transients in a musical track, this research suggests that striving for longer transient durations might prove efficient when trying to maximize the perceived loudness of a track, since this would allow for reducing the transient level while maintaining a higher perceived transient loudness. However, this would be easier to apply to electronic music, where synthetic sounds are common and can offer greater control over details like transient duration.


7 References

Apple. (2020). Logic Pro X (10.4.7) [Digital Audio Workstation]. Retrieved from https://www.apple.com/logic-pro/

Avid. (2019). Pro Tools 12 (2019.10.0) [Digital Audio Workstation]. Retrieved from https://www.avid.com/pro-tools

Fastl, H., & Zwicker, E. (2007). Psychoacoustics: Facts and models (3. ed.). Berlin: Springer-Verlag.

Garner, W. R. (1947). The effect of frequency spectrum on temporal integration of energy in the ear. The Journal of the Acoustical Society of America, 19(5), 808-815.

Hjortkaer, J., & Walther-Hansen, M. (2014). Perceptual effects of dynamic range compression in popular music recordings. Journal of the Audio Engineering Society, 62, 37-41.

Izotope. (2019). RX7 Advanced (7.01) [Audio Application]. Retrieved from https://www.izotope.com/en/products/rx.html

Lowry, R. (2020). Fishers Exact Probability Test: 2x3. Retrieved from Vassarstats: http://vassarstats.net/fisher2x3.html

Miller, G. A. (1948). The perception of short bursts of noise. The Journal of the Acoustical Society of America, 20(2), 160-170.

Native Instruments. (2019). Reaktor 6 (6.3.1) [Audio Programming Application]. Retrieved from https://www.native-instruments.com/en/products/komplete/synths/reaktor-6/

Neuhoff, J. G. (2004). Ecological Psychoacoustics. San Diego: Elsevier Academic press.

Oberfeld, D., Jung, L., Verhey, J. L., & Hots, J. (2018). Evaluation of a model of temporal weights in loudness judgements. The Journal of the Acoustical Society of America, 144(2), 119-124.

Siedenburg, K. (2019). Specifying the perceptual relevance of onset transients for musical instrument identification. The Journal of the Acoustical Society of America, 145(2), 1078-1087.

Siedenburg, K., & Doclo, S. (2017). Iterative structured shrinkage algorithms for stationary/transient audio separation. 20th International Conference on Digital Audio Effects. Edinburgh.

Small, A. M., Brandt, J. F., & Cox, P. G. (1961). Loudness as a function of signal duration. The Journal of the Acoustical Society of America, 34(4), 513-514.

Stärnman, A. (2014). Perception in the use of limiting in popular music when loudness normalized according to EBU R-128. Bachelor thesis. Piteå: Luleå Tekniska Universitet.
