
Experienced Listeners Preference between 44.1 kHz and 96 kHz


Academic year: 2021


Experienced Listeners Preference

between 44.1 kHz and 96 kHz

Jens Benner

2016

Bachelor of Arts Audio Engineering

Luleå University of Technology


benner.jens@gmail.com

Luleå University of Technology

Department of Arts, Communication and Education

Supervisor: Jonas Ekeroot

Abstract

Music and other audio have been released at higher sample rates than the standard CD-quality 44.1 kHz. Prior research has given differing results as to whether there are any perceivable differences between sample rates. This study investigates whether experienced listeners prefer 96 kHz or 44.1 kHz, using double-blind forced-choice paired-comparison listening tests. The results show no significant preference for either sample rate. This can be explained in several ways: there may be no perceivable subjective difference between the audio qualities, the preference may not be shared among the listeners, or something in the test may have distorted the results.



Table of contents

1. Introduction

1.1.1 Sampling

1.1.2 Analog to digital conversion

1.1.3 Conversion from digital to analog

1.1.5 DVD-A and SACD

1.1.6 Theoretical benefits and drawbacks of higher resolutions than CD

1.2 Previous listening tests

1.2.1 Meyer and Moran

1.2.2 Kanetada et al.

1.2.3 Mizumachi et al.

1.2.4 Summary

1.3 Research question

2. Method

2.1 Preparation of stimuli

2.1.1 Recording

2.1.2 Editing

2.1.3 Randomization and blinding

2.2 Listening tests

2.3 Measurements

2.3 Statistical analysis

3. Results

3.1 Measurements

3.2 Listening tests

4. Discussion

5. References


Since the introduction of the CD, the digital resolution of uncompressed audio for the end product has settled at a bit depth of 16 bits and a sample rate of 44.1 kHz for music and 48 kHz for movies. End products with a higher resolution have been introduced, mainly targeted at the hi-fi and audiophile market. Music has been released on physical formats such as the Super Audio Compact Disc (SACD), based on a DSD stream (Aldrich, 2004), and on Blu-ray or DVD-Audio, based on PCM (Sveriges Radio, 2004). More recently, several web-based services, such as HDtracks (https://www.hdtracks.com) and iTrax (http://www.itrax.com), have been introduced which provide music with a resolution higher than the CD standard. The scientific studies that exist seem to come to different conclusions about whether there is a benefit in increasing the resolution above the CD standard.

Sampling

To describe a waveform digitally a technique called sampling is utilized. For CD-quality audio, the sampling frequency is 44 100 samples per second, generally written as 44.1 kHz. For every sample, the amplitude is measured and captured. The amplitude is described with bits. The CD has a bit depth of 16 bits. For PCM with binary coding, that means

$$2^{16} = 65\,536 \tag{1.1}$$

65 536 quantization levels (Aldrich, 2004). Every bit brings 6 dB of dynamic range. For 16 bits, that means

$$16 \cdot 6 = 96 \tag{1.2}$$

96 dB of dynamic range. The only thing that needs to be measured and captured from every sample is the amplitude. With that information, the waveform can be correctly reproduced in terms of frequency, phase and amplitude (Aldrich, 2004).
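The arithmetic in equations 1.1 and 1.2 can be checked with a short Python sketch. The function names are illustrative and do not come from the thesis. The sketch uses 20·log10(2), about 6.02 dB per bit, where the text rounds to 6 dB, so the results come out slightly above 96 dB and 144 dB.

```python
import math

def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude levels for linear PCM at the given bit depth."""
    return 2 ** bits

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

print(quantization_levels(16))         # 65536
print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(24), 1))  # 144.5
```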

Nyquist

Nyquist’s theorem states that the sample rate must be at least twice the highest frequency present. The Nyquist frequency is thus half of the sample rate, and no frequency content above the Nyquist frequency can be tolerated. Therefore, so-called anti-aliasing filters are used for the AD-conversion and reconstruction filters for the DA-conversion, both intended to remove anything above the Nyquist frequency (Aldrich, 2004).

Humans can generally hear frequencies from 20 Hz-20 kHz. In order to reproduce frequencies in that range, a sample rate of at least 40 kHz must be used. For the CD, the sample rate is set to 44.1 kHz. That is so that filters with sufficient steepness can be implemented approximately between 20 kHz and the Nyquist frequency 22.05 kHz (Aldrich, 2004).

Analog to digital conversion

An analog-to-digital converter, or ADC, is used to sample the analog signal and produce a digital output. Before the AD-conversion an analog anti-aliasing filter must be used so that no information above the Nyquist frequency enters the converter. If the signal goes over half of the sample rate in the AD-process, the result will be a distortion called aliasing. What happens is that the frequency that is over the Nyquist frequency folds down to the specified range (Aldrich, 2004). If the Nyquist frequency for example is 10 kHz and a 12 kHz frequency is sampled, the result will be an 8 kHz frequency. If a 22 kHz frequency is sampled, the end frequency will be 2 kHz.
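The fold-down described above can be reproduced with a few lines of Python. This is a minimal sketch for a single sine component sampled without any anti-aliasing filter; the function name is hypothetical.

```python
def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sine of f_hz appears after sampling at sample_rate_hz."""
    folded = f_hz % sample_rate_hz   # the sampled spectrum repeats every sample_rate_hz
    nyquist = sample_rate_hz / 2
    # Anything in the upper half of the band mirrors back below the Nyquist frequency.
    return folded if folded <= nyquist else sample_rate_hz - folded

# A Nyquist frequency of 10 kHz corresponds to a sample rate of 20 kHz.
print(alias_frequency(12_000, 20_000))  # 8000
print(alias_frequency(22_000, 20_000))  # 2000
```

Both values agree with the 8 kHz and 2 kHz figures given in the example.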

However, it is very difficult to construct an analog anti-alias filter with sufficient steepness. In addition, the steeper the filter is, the more it distorts the phase of the signal.

According to Aldrich, the most common way of making an AD-conversion today is with a multi-bit modulator, a technique introduced in 1999. One of its benefits is how it solves the issues with anti-alias filters. It does this by upsampling the signal and thus working at a much higher sample rate. Therefore, a much less steep analog filter can be implemented before the AD-converter, with much less phase distortion and fewer other artifacts. A digital anti-alias filter, with much better characteristics than its analog counterpart, can then be applied before the signal is downsampled. Conventionally, this converter only uses between 2 and 5 bits. Earlier converter designs used more bits, but this resulted in distortions due to the nonlinearities of analog components (Aldrich, 2004).

One bit is actually enough to describe a waveform completely. One bit gives 6 dB of headroom. The problem that must be overcome is quantization distortion, which occurs when the sampled signal falls between two quantization steps. It is actually white noise, but since it is related to the signal, it is perceived as distortion. Since it is white noise, it is louder in the higher octaves than in the lower ones. For every doubling of the sample rate, the noise in the audible range falls by 3 dB. Theoretically, with a high enough sample rate, audio could be sampled using only one bit and still have low noise in the audible range (Aldrich, 2004).
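The 3 dB per doubling figure follows from spreading a fixed amount of white noise over a wider band. The sketch below assumes exactly that: white quantization noise of constant total power and no noise shaping. The function name is illustrative.

```python
import math

def inband_noise_change_db(oversampling_ratio: float) -> float:
    """Change in audible-band noise level when white noise of fixed total power
    is spread over a band that is oversampling_ratio times wider."""
    return -10 * math.log10(oversampling_ratio)

for ratio in (2, 4, 64):
    print(ratio, round(inband_noise_change_db(ratio), 1))
```

Under these assumptions even a 64 times higher sample rate only lowers the audible noise by about 18 dB, which illustrates why raising the sample rate alone is an expensive way to gain dynamic range.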

Humans are generally regarded as having a maximum dynamic range of 120 dB, so that range is what converter designers typically aim for. To get 120 dB of dynamic range with only one bit would require a sample rate that is not practically achievable at the moment. Instead, a filtering technique called noise shaping is used, which redistributes the noise. Most of the noise then ends up in the high frequencies above the audible range, which allows the sample rate to be lowered (Aldrich, 2004).

The multi-bit modulator, however, uses more than one bit. This is because another problem with one-bit converters is that the quantization error is correlated with the signal, which is perceived as distortion. The multi-bit modulator uses 2-5 bits that are randomly varied between different dynamic ranges, which solves the issue of correlated quantization error and makes the converter more linear (Aldrich, 2004).

This type of converter conventionally upsamples the signal 64 or 128 times. For 44.1 kHz, that means that the converter is working at a sample rate of approximately 2.82 MHz or 5.64 MHz. Note that, as mentioned before, an analog anti-alias filter must still be implemented before the AD-conversion. After the conversion, the digital output can be coded to, for example, 44.1 kHz/24 bits. This type of converter should be able to provide 120 dB of dynamic range (Aldrich, 2004).
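The internal rates quoted above are plain multiples of the base rate, which a short sketch can confirm. The helper name is hypothetical.

```python
def oversampled_rate_hz(base_rate_hz: int, factor: int) -> int:
    """Internal converter rate for a given base sample rate and oversampling factor."""
    return base_rate_hz * factor

for factor in (64, 128):
    rate_hz = oversampled_rate_hz(44_100, factor)
    print(f"{factor}x: {rate_hz} Hz = {rate_hz / 1e6:.4f} MHz")
```

The 64 times case gives 2.8224 MHz, the same rate that the text gives for DSD on SACD.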

Conversion from digital to analog

The DA-process in a multi-bit modulator works in a similar way to the ADC. For a 4-bit converter the process will look something like this: To convert the 44.1 kHz/24 bit signal to four bits, the signal is upsampled and filtered. Unlike the AD-conversion, this whole process is done digitally. After that, Aldrich explains, the digital multi-bit modulator releases a 16 bit unary code which controls 31 voltage elements. The sum of the voltage yields the proper analog value for each sample (Aldrich, 2004).

DVD-A and SACD

DVD-Audio was introduced around 2000 as a replacement for the CD. It can hold audio with bit depths from 16 to 24 bits and sample rates from 44.1 to 192 kHz, according to Sveriges Radio (2004).

The Super Audio Compact Disc, which had the same ambition, was introduced by Philips and Sony in 1999. The disc holds DSD-coded audio sampled with 1 bit at 2.8224 MHz (Aldrich, 2004).

Theoretical benefits and drawbacks of higher resolutions than CD

As stated above, the CD is specified at 44.1 kHz and 16 bits. There is theoretical evidence that a bit depth higher than this may lead to higher audio quality. Humans have a dynamic range of 120 dB, while the CD's 16 bits only deliver 96 dB. While this is more than adequate in most cases, a higher bit depth may be beneficial in special circumstances with very loud playback and dynamic audio material. A bit depth of 24 bits has been proposed, delivering 144 dB of dynamic range.

One benefit of higher sample rates would be improved digital anti-alias and reconstruction filters. Craven (2004) argues that digital filter design may benefit from higher sample rates. Many filter designs are a compromise between the drawbacks of different solutions in terms of aliasing and of frequency, phase and transient response. A minimum-phase filter suffers from a differing group delay between high and low frequencies, as well as ringing after transients, known as post-ringing. A linear-phase filter has constant group delay, but has pre-ringing as a consequence of the FIR filter design. This could be described as a short echo before a transient. Craven also shows how filters of his own design benefit from higher sample rates in these regards. However, he states that there is a lack of studies on how listeners are affected by pre- and post-ringing.

Another benefit of higher sample rates is proposed by Oohashi et al. (2000). They showed that listeners reacted to recordings containing audio material above 20 kHz, something they termed the hypersonic effect.

Aldrich (2004), however, argues that there is no inherent benefit to higher sample rates, provided the equipment is designed properly.

The drawbacks of higher-resolution audio are the increase in bit rate and, consequently, in file size. For a doubling of the sample rate the file size would double, while the dynamic range would increase by 3 dB. If the bit depth were increased from 16 bits to 24 bits, the dynamic range would increase by 48 dB, and the file size would increase by 50%. So if the goal is to increase the dynamic range, this is done much more efficiently with an increase in bit depth (Aldrich, 2004).
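The trade-off between file size and dynamic range can be made concrete with a small calculation. This sketch assumes uncompressed stereo PCM and the 6 dB per bit rule used earlier in the text; the function name is illustrative.

```python
def bit_rate_bps(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Bit rate of uncompressed linear PCM in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd = bit_rate_bps(44_100, 16)
print(cd)                             # 1411200
print(bit_rate_bps(88_200, 16) / cd)  # 2.0, doubling the sample rate doubles the size
print(bit_rate_bps(44_100, 24) / cd)  # 1.5, going from 16 to 24 bits adds 50%
print((24 - 16) * 6)                  # 48 dB more dynamic range from 8 extra bits
```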

Previous listening tests

Meyer and Moran

Meyer and Moran (2007) conducted a test to see if trained listeners could hear the difference between CD-quality audio and high-resolution audio from DVD-A and SACD. The test was set up in the following way: the stereo signal from a high-quality DVD/SACD player was sent both directly to an ABX relay module and to an A/D/A converter. The A/D/A converter converted the audio to CD quality, 16 bits at 44.1 kHz, and then back to analog before passing it to the ABX relay module.

It was a double-blind test performed on about 60 members of the Boston Audio Society and other volunteers, and it lasted approximately a year. Different sets of speakers and control rooms were used. The authors do not specify the brands or models of any of the equipment, but give assurance that everything used was of very high quality and expensive. The material covered different musical genres such as classical, pop, rock and jazz, as well as speech. To summarize the results, the report did not find any credible evidence that high-resolution audio can be distinguished from CD-quality audio. To quote directly: “The test results for the detectability of the 16/44.1 loop on SACD/DVD-A playback were the same as chance: 49.82%. There were 554 trials and 276 correct answers.” (Meyer & Moran, 2007, p. 777)
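The quoted figures can be checked against pure guessing. The sketch below is not part of Meyer and Moran's analysis; it assumes independent trials with a 50% chance of a correct answer and uses the normal approximation to express how far the observed count lies from the expected one.

```python
import math

def chance_summary(correct: int, trials: int) -> tuple[float, float]:
    """Return the proportion correct and its distance from chance in standard deviations."""
    proportion = correct / trials
    expected = trials * 0.5
    std_dev = math.sqrt(trials * 0.5 * 0.5)
    return proportion, (correct - expected) / std_dev

proportion, z = chance_summary(276, 554)
print(f"{proportion:.2%}")  # 49.82%
print(round(z, 2))          # -0.08
```

A result less than a tenth of a standard deviation from the expected 277 correct answers is fully consistent with guessing.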


On one occasion the speaker level was raised by 14 dB. This test was performed on two participants, and both of them then identified the CD loop with 100% accuracy. However, this was done without any input signal, so what the two persons heard was only the difference in the noise floor. With an input signal, the authors found the playback level painfully loud.

The test and its results are interesting. They seem to support Aldrich's thesis that there is no inherent benefit to audio at higher sample rates. However, there is a lack of information in the report: none of the equipment is specified, which makes the test impossible to reproduce. Also, the resolution of the DVD-A material is not presented. DVD-A can theoretically have the same resolution as a CD (Sveriges Radio, 2004).

Kanetada et al.

Kanetada, Mizumachi and Yamamoto (2013) studied whether a test group could discriminate between different audio qualities. These included high resolution audio (HRA) at 192 kHz/24 bits, CD-quality audio at 48 kHz/16 bits and compressed audio at different bit rates. For the CD-quality audio file, the HRA file was converted to 48 kHz/16 bits using MATLAB. The authors regard 48 kHz/16 bits as CD quality since its cutoff frequency begins at 20 kHz. This sample rate was chosen over the actual CD standard of 44.1 kHz because 48 kHz has a simpler mathematical relationship with the original sample rate (192 kHz is exactly four times 48 kHz), which reduces the error in downsampling. An anti-alias filter was designed in MATLAB as a 20th order Butterworth filter.

To choose what sound stimulus to use, an informal listening test was set up. The researchers found that percussive jazz music was the material that made it easiest to discriminate between the different formats, and it was therefore chosen.

27 subjects participated in the listening test, of whom 11 were female and 16 male. The age of the participants is said to range from teens to seventies, and their interest in music and audio varied widely.

For the listening test the method of paired comparison was used. The test subject was exposed to two successive stimuli and selected the one the subject thought was of the higher quality.

The length of one stimulus was 120 s. To motivate this, the authors refer to other studies suggesting that a longer stimulus makes it easier to discriminate higher sample rates. Between the two stimuli in a pair there was a 30 s pause of silence, and between each pair there was a pause of one minute. Each subject evaluated the sound quality of twelve pairs, thus reviewing each pair twice. One whole session lasted 2-2.5 hours for each subject, including breaks. The breaks were inserted out of consideration for the subject’s fatigue, and followed a planned schedule.

In a separate study in the same paper, 28 subjects took a similar paired comparison test, this time focusing only on bit depth. The same stimulus was used and converted to 48 kHz. It was then converted to 16 bits. The subjects then had to discriminate between 16 bits and 24 bits.

For the first test, the results showed that 57.4% correctly identified HRA over the CD-quality audio. In the second test, 60.3% correctly identified 24 bits over 16 bits.

The results point towards a small advantage in audio quality. However, an increase of 7-10 percentage points above chance is not enough to draw the conclusion that the higher resolution audio actually sounds better. Moreover, even supposing the results are correct, there are a few issues with the study. There is no information about whether dither was used for the bit reduction, something that is essential for the resulting audio quality. Such information must be presented, along with the type of dither if any was used. The authors also mention that they constructed an anti-alias filter, a 20th order Butterworth filter. However, they do not explain why this type of filter was used, nor do they present its properties in terms of phase response and pre- and post-ringing. So, supposing the results are correct, the study is possibly not answering the question of whether high resolution audio sounds better or can be discriminated from CD-quality audio. More exactly, it answers the questions of whether listeners can hear the difference when a certain type of filter is applied, and whether they can detect that an unknown method of bit reduction has been applied.

Mizumachi et al.

In a study by Mizumachi, Niyada and Yamamoto (2015), much stronger evidence is found for the possible discrimination of HRA. The study focuses on the discrimination of audio at different resolutions in a car listening environment. This was done with a paired comparison test in which the subject had to choose which of two recordings was of the higher quality. For the assessment of the results, the subjects were divided into different groups depending on how experienced they were as listeners. The authors found that the group of experienced listeners could discriminate between a recording at 192 kHz/24 bits and one at 48 kHz/16 bits with a correct rate of around 75%. The less experienced listeners scored around 50%, the same as chance, which means they could not discriminate between the resolutions.

This study gets very different results from the other studies mentioned. Here, the experienced listeners have a correct discrimination rate of 75%, considerably higher than in the two other studies. However, there are problems with this study as well. The authors are quite vague about how the audio files were prepared. The lower resolution file was converted from the higher resolution file, but it is not very clear how this was done. They only state that “the data conversion is carried out using the resample function with a carefully-designed lowpass filter in MATLAB.” It would be interesting to see what properties the filter has in terms of phase response and pre- and post-ringing. It may actually be a poorly designed filter that the experienced listeners react to. Once again, it remains unclear whether dither was used.

Summary

The three studies present quite different results for the discrimination rate of HRA: 49.82%, 57.4%, 60.3%, 75% and 50% respectively. All three studies have problems, because some amount of technical information is missing from each report. It is hard to draw any conclusions from the results about the benefits of high resolution audio. What is of special interest is what the listeners actually reacted to. Since all studies focused on HRA in terms of both higher bit depth and higher sample rate, it is unclear whether it was the bit depth, the sample rate or the two combined that the listeners reacted to. The second part of the Kanetada study points to a discrimination rate of 60.3% for the recording with higher bit depth. That is actually a higher percentage than in the first part, where the HRA recording had both higher bit depth and higher sample rate. So in the first part, it is fully possible that the small preference for the HRA recording was due to the higher bit depth.

Further research should therefore be more methodical and study the difference in audio quality for a higher bit depth and for a higher sample rate separately. This is an important point, since higher resolution results in higher bit rates and larger files. An increase in sample rate yields a much larger increase in bit rate than an increase in bit depth. So if the resolution is to be increased above CD quality, it is worth investigating whether that should mean an increase in sample rate, in bit depth, or in both.


Research question

For the above stated reasons, this thesis will focus on the perceived benefits of higher sample rates than the CD-standard 44.1 kHz. Do experienced listeners prefer 96 kHz or 44.1 kHz?

Method

Listening tests were conducted on 16 audio technology students at Luleå University of Technology, School of Music in Piteå. The test was a double-blind forced-choice paired comparison. Each participant listened to 15 excerpts and was able to switch between two versions labelled A and B. One of them was at 44.1 kHz and the other at 96 kHz, and for each excerpt the participant was asked to choose which one he or she preferred.

Throughout the method, efforts have been made to ensure that the frequency response extends as high as possible. This is partly due to the “hypersonic effect” discussed by Oohashi et al. (2000), although this study has not managed to obtain a similarly high frequency extension. The Oohashi study used stimuli and speakers with a frequency response reaching about 100 kHz, whereas the equipment in this study only delivers up to slightly above 20 kHz. Another reason to include high-frequency content is to be able to trigger the potential filter anomalies of the lower sample rate, as discussed by Craven (2004). This has influenced the choice of microphones, speakers and instruments.

In similar studies, the stimulus is often a high sample rate recording which is then converted to a lower sample rate. It can be argued that the difference between the stimuli is then not the sample rate but a sample rate conversion, which is not the same thing. This study therefore takes a different approach. Instead of using sample rate conversion, two independent recorders record the same signal at different sample rates.

ABX listening tests are common in other fields within audio engineering. These typically use a computer interface which lets the listener switch back and forth between A, B and X and also repeat or loop the stimulus a number of times. However, no such application seems to exist that can switch seamlessly between different sample rates without using up- or downsampling. As discussed before, this study attempts to avoid that solution. The playback of the stimuli has been solved in different ways in previous studies. Meyer and Moran used an analog ABX unit which let the listener switch back and forth between the versions. Kanetada et al., on the other hand, used a paired comparison listening test where the listener was forced to listen through the whole piece, without being able to switch back and forth. The researchers' argument for this is that the benefits of HRA are mostly perceived as a long-term effect. This claim needs further study to be verified, and this study therefore focuses only on the short-term audible differences.

The playback in this study was solved by using two independent playback devices set at different sample rates. The playback devices were connected to a monitor controller which enabled the listener to switch back and forth between the two versions. It was initially intended to set up an ABX listening test. However, the problem arose that no two identical devices were available which supported both MMC and MTC/SMPTE. This was required in order to have identical start and stop times for the two machines. The Sound Devices 702T was chosen as both recorder and playback device due to its good characteristics, its portability and the fact that two identical machines were available. The device also facilitates randomization, since it can change sample rate quite quickly between different recordings. One problem with the devices was that it consistently took a bit longer to start the recording with the higher sample rate. This made an ABX listening test unsuitable: supposing recorder A was connected to buttons A and X, and recorder B to button B, switching from X to A would result in a smooth transition, while switching from X to B would make the recording jump in time. This would probably make it obvious to the listener that X and A were the same. For ABX tests, it is consequently very important to have identical start and stop times. Instead, the method of paired comparison was chosen, where the listener was asked which of two versions he or she preferred. With this method, the time differences would not affect the results as much. The playback was controlled by the test operator in a nearby room, so that the listener would not be distracted.

Preparation of stimuli

Recording

Stimuli were recorded prior to the listening tests. Stereo recordings were made of drums, grand piano, harmonica and acoustic guitar, each recorded separately. These stimuli were chosen since they are rich in high-frequency content and cover a broad range of instruments and genres. Since it is debated whether listeners can discriminate high sample rates, there is no reliable research on what stimuli could be termed “critical”. This is the term used in ITU-R BS 1116 to define material which stresses the system under test.

The signal from the stereo microphone pair was sent to the active split, which in turn sent the signal to two identical recorders. One recorder was set at 44.1 kHz 24 bits and the other was set at 96 kHz 24 bits. In case there would be any small differences between the recorders, it was randomized so that each recorder was set at one sample rate as often as the other. See figure 2.1 for a block diagram.

Editing

The standard ITU-R BS 1116 recommends excerpts in the range of 10-25 s. This is a very short time in which to make a decision, so it is common for software such as ARL STEP to let the user loop or repeat an excerpt manually. That was not possible here, since the playback was controlled by the test operator. Instead, excerpts within the specified time frame were used, and identical loops were created for the two sample rates.

Table 2.1 Recording equipment

Device                               Frequency response
2 Sound Devices 702T                 10 Hz-40 kHz (+0.1, -0.5 dB)
Schoeps CMC 5-U with MK 2 capsule    20 Hz-20 kHz
DPA 4015A pair                       40 Hz-20 kHz (±2 dB)
DPA 4006A pair                       10 Hz-20 kHz (±2 dB)
Klark Teknik Square One              20 Hz-20 kHz (±0.5 dB)


This was done by importing the 44.1 kHz recordings into a Pro Tools session set at 44.1 kHz. A loop was created with suitable crossfades. The gain was adjusted so that the loop showed -23 LUFS according to the EBU R128 integrated loudness meter, and then the loop was bounced. The track was then exported using the feature “Export track as new session”. In a separate Pro Tools session set at 96 kHz, the same recording at 96 kHz was imported. The feature “Import session data” was used so that the 44.1 kHz version could serve as a template. The same excerpt was located in the 96 kHz version and aligned as accurately as possible to the template. An identical loop with identical crossfades was created, see figure 3.2. After the gain had been adjusted so that the loop showed -23 LUFS, the 96 kHz version was bounced. Note that no signal processing took place except for gain adjustments and editing. Even the loudness meter was removed from the track in case it would alter the audio in any way.

With this process 15 loops were created, not counting one loop reserved for a training round.

Randomization and blinding

Table 2.2. Software

Pro Tools 12.4

Hofa 4U meter, Fade & M/S-pan

It was decided that the recorders used for playback should not be set constantly at the same sample rate. Had they been, it would be uncertain whether a preference was a result of the sample rate or of some difference between the recorders. Therefore, the excerpts were divided so that each recorder would play each sample rate as often as the other. This division was carried out using true randomness provided by random.org.

In listening tests it is regarded as important that the order of the stimuli is randomized between subjects. This is because one stimulus can interfere with the following one, and randomization is one way of overcoming that. A unique list with the order of the stimuli was created for each subject. The order was also generated using true randomness provided by random.org.
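The two randomization steps could be sketched as follows. This is only an illustration: the thesis drew its random numbers from random.org, whereas the sketch substitutes Python's `random.SystemRandom`, and the function names are hypothetical. With 15 excerpts an exactly even split is not possible, so the sketch assumes a 7/8 division between the recorders.

```python
import random

def balanced_assignment(n_excerpts: int, rng: random.Random) -> list[str]:
    """For each excerpt, choose which recorder ('A' or 'B') plays the 96 kHz version,
    keeping the two recorders as balanced as the excerpt count allows."""
    half = n_excerpts // 2
    labels = ["A"] * half + ["B"] * (n_excerpts - half)
    rng.shuffle(labels)
    return labels

def presentation_order(n_excerpts: int, rng: random.Random) -> list[int]:
    """Random order of excerpt numbers (1-based) for one subject."""
    order = list(range(1, n_excerpts + 1))
    rng.shuffle(order)
    return order

rng = random.SystemRandom()  # operating-system entropy, standing in for random.org
assignment = balanced_assignment(15, rng)
orders = {subject: presentation_order(15, rng) for subject in range(1, 17)}
print(assignment)
print(orders[1])
```

Keeping the assignment and the per-subject orders in separate structures mirrors the separation between the playback lists and the answer key.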

Note that these lists did not provide any information about which sample rate belonged to which recorder. That information was kept in a separate document so that the answers could be identified later. The playback devices show the current sample rate in a small section of the display. This section was covered with tape so that the test operator would not be aware of which recorder delivered the higher sample rate. In that respect this test is regarded as a double-blind study.

Listening tests

The listening tests took place in a control room named K2 at the School of Music in Piteå. The recorders, which were now used as playback devices, were controlled by the test operator in a separate room in front of the listener, as is seen in figure 2.4. The recorders were connected to a monitor controller with an analog balanced connection. The monitor controller enabled the listener to switch between the two recorders with two buttons labelled “A” and “B”.

Figure 2.3. The control room of the listening tests.

16 audio technology students aged 20-35 participated. The listeners are regarded as experienced by virtue of their education program. Of them, 14 were male and 2 female. The participants are unlikely to have any hearing impairments, since the program requires a certificate of normal hearing prior to admission. Each listener was instructed individually to switch between the two versions A and B, and then fill in a form stating which version he or she preferred. See Appendix. As each stimulus was a loop, and the recorders were controlled by the test operator in the adjacent room, the listener was instructed to give a thumbs up to the test operator, who was visible through the window, when he or she had answered and was ready to go on to the next excerpt. The listeners controlled the volume themselves. Each listening test took around 15-25 minutes. Note that the participants were fully aware of what was being studied, including the sample rates that were being used. The reason for this is that the previous studies indicate a very small audible difference between the audio qualities. If the subjects are aware of what is being studied, they should be able to focus more easily on small details, and it also gives them an opportunity to draw on previous experience of the matter.


Fig. 2.4. Sound Devices 702T, used for both recording and playback.

Measurements

Measurements were made to ensure that the signal path did not degrade the signal in any significant way.

The frequency response, THD and self noise were measured for the analog signal chain, i.e. the balanced analog connection from the patchbay and through the monitor controller. The level from the recorders through the signal chain was also measured, to ensure that no level differences between the recorders existed. This was done by sending a sine wave to the inputs of each recorder and measuring the voltage after the monitor controller. The process was repeated for every input at both sample rates.

The self noise of the room was measured to show the listening conditions of the control room.

The recorders seemingly produce identical output voltages through the signal path for all outputs at both 96 kHz and 44.1 kHz, as is seen in table 2.5.

Table 2.3. Playback equipment

Device                    Frequency response
2 Sound Devices 702T      10 Hz-40 kHz (+0.1, -0.5 dB)
Crane Song Avocet         20 Hz-30 kHz (±0.05 dB)
Klein + Hummel O 870      n/a
Klein + Hummel O 410      30 Hz-24 kHz (±3 dB)

Table 2.4. Measurement equipment

Neutrik Audio Transmission Test Set TT402A
Norsonic Nor 131


The remaining measurements are presented in table 2.6. No measurements were performed on the microphones or speakers, since no appropriate equipment was available. Instead the frequency responses specified by the manufacturers are presented in table 2.1 and table 2.3.


Statistical analysis

The results were analyzed with the significance level 𝛼 set to 0.05. The null hypothesis is that the two sample rates were equally preferred. The alternative hypothesis is that either 44.1 kHz or 96 kHz was preferred. The p-value was found with a two-tailed binomial test calculated with R. A p-value less than 0.05 is viewed as significant.

One problem with paired-comparison listening tests is that listeners may have different preferences. It is also possible that one sample rate would be preferred for a certain instrument, but not for another. To see if there was any such pattern, the two-tailed binomial test was performed both for each listener and for each excerpt.
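To illustrate the test described above, the following Python sketch computes an exact two-tailed binomial p-value under the null hypothesis of equal preference (probability 0.5 per trial). It is not the R code used in the study. The function name `binom_two_sided_p` is hypothetical, and the rule of summing all outcomes that are no more likely than the observed one is assumed to correspond to what R's `binom.test` does in this symmetric case.

```python
from math import comb


def binom_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-tailed binomial test against a success probability of 0.5.

    Under the null hypothesis every outcome k has probability
    C(trials, k) / 2**trials, so the binomial coefficients can be
    compared directly as integers.
    """
    observed = comb(trials, successes)
    # Add up every outcome that is at most as likely as the observed one.
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if comb(trials, k) <= observed
    )
    return extreme / 2 ** trials


# Summed trials: 121 preferences for 96 kHz out of 240 trials.
p_total = binom_two_sided_p(121, 240)

# One listener: 12 preferences for 96 kHz out of 15 excerpts.
p_listener = binom_two_sided_p(12, 15)

print(f"summed trials: p = {p_total:.4f}")
print(f"single listener: p = {p_listener:.4f}")
```

The same function can be applied per listener (15 trials each) and per excerpt (16 trials each), which is how the individual analyses are structured.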

Table 2.6. Miscellaneous measurements

Room self noise                                     Leq: 21.2 dBA
Signal chain self noise (CCIR468-3)                 -85 dBQ
Frequency response, signal chain, 20 Hz - 30 kHz    +/- 0.05 dB
Frequency response, signal chain, 20 Hz - 300 kHz   +/- 0.65 dB
THD+N, signal chain (1 kHz)                         0.045 %

Table 2.5. Output voltage

Device and channel        Voltage (dBu)
702 A ch 1, 44.1 kHz      −9.2
702 A ch 2, 44.1 kHz      −9.2
702 A ch 1, 96 kHz        −9.2
702 A ch 2, 96 kHz        −9.2
702 B ch 1, 44.1 kHz      −9.2
702 B ch 2, 44.1 kHz      −9.2
702 B ch 1, 96 kHz        −9.2
702 B ch 2, 96 kHz        −9.2
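As a rough illustration of how the level-matching check can be read, the Python sketch below converts the dBu readings to RMS volts and reports the largest difference between any two readings. It relies on the standard definition of 0 dBu as about 0.7746 V RMS; the function and variable names are hypothetical and not part of the measurement procedure used in the study.

```python
import math

# 0 dBu is defined as the voltage giving 1 mW into 600 ohms: sqrt(0.6) V RMS.
DBU_REFERENCE_V = math.sqrt(0.6)


def dbu_to_volts(level_dbu: float) -> float:
    """Convert a level in dBu to RMS volts."""
    return DBU_REFERENCE_V * 10 ** (level_dbu / 20)


# Readings from table 2.5: (device, channel, sample rate in kHz) -> dBu.
readings_dbu = {
    (device, channel, rate): -9.2
    for device in ("702 A", "702 B")
    for channel in (1, 2)
    for rate in (44.1, 96)
}

# Largest level difference between any two outputs, in dB.
spread_db = max(readings_dbu.values()) - min(readings_dbu.values())

for key, level in readings_dbu.items():
    print(key, f"{level} dBu = {dbu_to_volts(level):.3f} V RMS")
print(f"largest level difference: {spread_db} dB")
```

A spread of 0 dB at the resolution of the meter is what the text means by identical output voltages.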


Results

Listening tests

Out of a total of 240 trials, the 96 kHz version was preferred 121 times. This gives a p-value of approximately 0.9486. Consequently, this study fails to reject the null hypothesis.

In table 3.2 the preferences of the listeners are presented and analysed individually. Overall, no significant preference exists. The exception is listener 10, who significantly preferred 96 kHz over 44.1 kHz.

As seen in table 3.3, there is no significant preference for either sample rate when the excerpts are analysed individually.

Table 3.3. Binomial analysis of the preference of individual excerpts

Excerpt code    No. of successes    P-value
T01             8                   1
T02             8                   1
T03             7                   0.8036
T04             9                   0.8036
T05             8                   1
T06             5                   0.2101
T07             9                   0.8036
T08             9                   0.8036
T09             7                   0.8036
T10             8                   1
T11             6                   0.4545
T12             8                   1
T13             8                   1
T14             9                   0.8036
T15             11                  0.2101

Table 3.2. Binomial analysis of listeners' individual preferences

Listener no.    No. of successes    P-value
1               6                   0.6072
2               5                   0.3018
3               8                   1
4               8                   1
5               8                   1
6               9                   0.6072
7               6                   0.6072
8               8                   1
9               10                  0.3018
10              12                  0.0352
11              4                   0.1185
12              9                   0.6072
13              8                   1
14              5                   0.3018
15              8                   1
16              7                   1

Table 3.1. Results of summed trials

Trials    Successes    P-value
240       121          0.9486


Discussion

The results of this study indicate that the listeners did not prefer 96 kHz over 44.1 kHz. No significant preference was found for the individual excerpts either. Among the individual listeners, only one significantly preferred 96 kHz over 44.1 kHz. The results thus provide no evidence supporting the claimed audible benefits of a sample rate higher than 44.1 kHz.

However, the accuracy of the results can be discussed. One thing that may have distorted the results to some degree is the delay that occurred between the two sample rates. The 96 kHz version of each excerpt consistently started after the 44.1 kHz version, which constitutes a systematic error. This could, for example, lead listeners to consistently prefer the earlier version of each excerpt, although the results do not point to this. For further studies, if the approach with two playback devices is used again, the problem of delay between the devices should be considered. One solution could be to use devices that support MMC and SMPTE or MTC, so that they can follow a master's timecode and start/stop commands. Another solution, if such devices are not available, would be to randomize the delay. This could be done in the editing process by randomizing the time before each excerpt starts. This would turn the systematic error that occurred in this study into a random error and thereby minimize its effect on the results.
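The randomization suggested above could be prepared along the following lines. This Python sketch only draws a random length of leading silence for each version of each excerpt; the silence would then have to be added to the files in the editing process. The function name `random_pre_roll`, the maximum delay of 0.5 s and the fixed seed are assumptions made for illustration, not values taken from this study.

```python
import random


def random_pre_roll(excerpts, max_delay_s=0.5, seed=None):
    """Draw an independent random pre-roll (seconds of leading silence)
    for the 44.1 kHz and the 96 kHz version of every excerpt."""
    rng = random.Random(seed)
    plan = {}
    for code in excerpts:
        plan[code] = {
            "44.1 kHz": round(rng.uniform(0.0, max_delay_s), 3),
            "96 kHz": round(rng.uniform(0.0, max_delay_s), 3),
        }
    return plan


# Excerpt codes T01..T15, as used in table 3.3.
excerpts = [f"T{i:02d}" for i in range(1, 16)]

# The seed is fixed here only so that the plan can be reproduced.
plan = random_pre_roll(excerpts, max_delay_s=0.5, seed=1)

for code, delays in plan.items():
    first = min(delays, key=delays.get)
    print(code, delays, "starts first:", first)
```

Because both pre-rolls are drawn independently, neither sample rate is expected to start first more often than the other over many excerpts.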

These results can be compared with the previous studies by Mizumachi et al. (2015) and Kanetada et al. (2013). Those studies found some evidence that listeners preferred the stimulus with the higher sample rate. However, some major differences compared to this study exist and should be pointed out.

First, the sample rate: this study uses 96 kHz as the higher-resolution version, while those studies use 192 kHz. Second, the stimulus length: this study focuses on the short-term audible differences between two sample rates and uses stimuli ranging from 10 to 25 seconds, whereas Kanetada et al. and Mizumachi et al. researched the long-term listening effects of audio at different resolutions, using stimuli around 120 seconds long. The speakers used in those studies may also have a higher frequency extension than the ones used in this thesis. These aspects could explain the differences in results. Yet another, perhaps even more important, difference exists: in those two studies the sample rate was not the only difference between the stimuli. There was a difference in bit depth as well, as they compared 192 kHz at 24 bits with 48 kHz at 16 bits.

The results fall more in line with the study by Meyer and Moran (2007). In contrast to this study and the studies discussed above, Meyer and Moran conducted an ABX test rather than a paired-comparison test. Thus, they did not study preference, only the discrimination of audio at different resolutions. As discussed previously, their listeners could not discriminate between the audio qualities with any significance. One similarity is that the listeners in their test were also able to switch back and forth manually, in their case between A, B and X.

Further research on the topic is needed. Besides researching the audible effects of different sample rates to obtain stronger evidence in either direction, the audible effects of higher bit depths should be investigated. Another related topic that could be investigated by means of listening tests is the audible effect of different anti-alias and reconstruction filter designs. This could perhaps more accurately investigate the benefits of higher sample rates that Craven (2004) discusses. A limitation of this study is that the types of filters implemented in the converters used were neither known nor controlled.



References

Aldrich, N. (2005). Digital audio explained. Fort Wayne: Sweetwater Sound.

Oohashi et al. (2000). Inaudible High-Frequency Sounds Affect Brain Activity: Hypersonic Effect. Journal of Neurophysiology, 83, 3548-3558.

Craven, P. (2004). Antialias Filters and System Transient Response at High Sample Rates. Journal of the AES, 52, 216-242.

Kanetada, N., Mizumachi, M., & Yamamoto, R. (2013). Evaluation of Sound Quality of High Resolution Audio. Proceedings of the 1st IEEE/IIAE International Conference on Intelligent Systems and Image Processing, 51-56.

Meyer, E., & Moran, D. (2007). Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback. Journal of the AES, 55, 775-779.

Mizumachi, M., Niyada, K., & Yamamoto, R. (2015). Subjective Evaluation of High Resolution Audio Under In-car Listening Environments. In AES 138th Convention (pp. 1-5). Warsaw, Poland.

Recommendation ITU-R BS.1116-3. Methods for subjective assessment of small impairment in audio systems. February 2015.

Sveriges Radio (2004). Digital ljudlagring på CD och DVD. Stockholm: Sveriges Radio.

Additional resources

Avid Technology Inc. (2015). Pro Tools (Version 12.4.0) [Software]. Available from http://www.avid.com/en/pro-tools

Crane Song Ltd. (2009). Avocet Operator’s manual.

DPA Microphones. d:dicate™ 4015A Wide Cardioid Microphone Quick Guide.

DPA Microphones. d:dicate™ 4006A Omnidirectional Microphone Quick Guide.

Klein + Hummel (2008). Operating manual O 410.

https://www.hdtracks.com

Hofa-plugins. Hofa 4U meter, Fade & M/S-pan (Version 2.0.2) [Software]. Available from https://hofa-plugins.de/en/plugins/4u/

http://www.itrax.com

Klark Teknik. (2008). Square One Splitter Operator Manual.

https://www.random.org/lists/

R (Version 3.2.4) [Software]. Available from http://ftp.acc.umu.se/mirror/CRAN/index.html

Schoeps Mikrofone. Colette Modular System User Guide.

Sound Devices 702T. High Resolution Digital Audio Recorder with Time Code User Guide and Technical Information.


Appendix

Listening test

Listener no.:

Gender:

Age:

This listening test investigates listeners' preferences between different sample rates.

Indicate which of A or B you find that you prefer. Circle your answer.

Training:  A  B

1.   A  B
2.   A  B
3.   A  B
4.   A  B
5.   A  B
6.   A  B
7.   A  B
8.   A  B
9.   A  B
10.  A  B
11.  A  B
12.  A  B
13.  A  B
14.  A  B
15.  A  B
