2008:139 CIV
MASTER'S THESIS
Commercial Detection Based on Audio Repetition
Joakim Wedin
Luleå University of Technology
MSc Programmes in Engineering
Arena, Media, Music and Technology
Department of Computer Science and Electrical Engineering
Division of Signal Processing
Abstract
Television today is a natural part of many people's homes. There are technical products that automatically record your favorite shows and programs. The end user's ability to choose is a highly valued sales argument in television consumption products. One thing that the user still cannot control, though, is the presence of commercial breaks.
Popcatcher is a company that has developed a product that automatically records music from radio stations. What is special about it is its ability to distinguish songs from commercials and radio talk. This is done with the help of the fact that popular music is repeated in radio broadcasts. Popcatcher now wants to investigate whether the same model can be used to detect commercials in television broadcasts.
The goal of this thesis is to build a model and show how well commercial detection can perform based on the fact that commercials are also repeated. Different modifications were made to the original algorithm so that it eventually formed a commercial scan. One of these modifications was the extended search logistics that determine how the algorithm should react when a potential commercial is found. The model also has a function that can partly make up for a failed detection and fill in empty gaps in the detection to some extent.
Based on results from a commercial scan made on Swedish TV6, a commercial hit percentage of above 80 percent was achieved when using a seven hour long audio source material. Strong indications show that increasing the source length would most likely improve this percentage further.
The commercial scan model presented in this thesis works more like a commercial block detector than an individual commercial detector. One of the drawbacks of the model is the learning time of the system. It has to accumulate and store several hours of input data before giving reasonably fair results. An advantage is the fact that it only analyses audio input, in contrast to the often more demanding video analysis other methods use. This thesis shows that commercial detection based on repetition can in fact be done. There are, however, many improvements left to be made, and the work done in this project should be seen as guidance for further development.
Preface
I first and foremost want to thank the Berg brothers at Popcatcher for giving me the opportunity to carry out this project. It has been an incredibly rewarding period, and working alongside encouraging colleagues has been a pleasure.
I also want to give credit to all friends and family from dear Örnsköldsvik who have supported me in the various troublesome times. Friends from MMT, you know what I am talking about. Special thanks to Tomas for lending me his apartment during the project.
To all of you who are about to read this thesis, I honor you. If the text for some inexplicable reason gets boring, there are still a lot of beautiful illustrations to look at.
Contents
1 Introduction
1.1 Background
1.2 Problem Formulation
1.3 Scope
2 Theory
2.1 Discrete Time Fourier Transform
2.2 Windowing Functions
2.2.1 Rectangular Window
2.2.2 Hamming Window
2.3 Window Size Properties
2.4 Frequency Spectrum Analysis
2.5 Cancellation
2.6 Downsampling
2.6.1 Filtering
2.6.2 Decimation
2.7 The Repetition Search Algorithm
2.7.1 Creation of Search Track
2.7.2 Repetition Search Process
3 Algorithm Modifications
3.1 Silence Detection
3.2 Close Hits Compensation
3.3 Quality Hit Length Sorting
3.4 History Track
3.5 Extended Search Logistics
3.6 Hit Merging
3.7 Search Modes
3.7.1 Thorough Mode
3.7.2 One Hit Abort Mode
4 Commercial Scan Results
4.1 TV3
4.2 TV6
4.3 TV4
4.4 SVT1
5 Discussion
5.1 Search Track Length
5.2 History Track Length
5.3 Merge Variable
5.4 False Hits
5.5 Potential Commercial Lengths
5.6 Commercial Scan Step Size
5.7 Real Time Model
5.7.1 Worst Case Scenario Example
5.7.2 Possible Hit Limits
6 Conclusion
7 Further Studies
7.1 To Investigate
7.2 Model Improvements
1 Introduction
1.1 Background
PopCatcher is a company that has developed a radio that can distinguish music from radio talkers, news broadcasts and commercials. The music is then stored as separate songs on an internal memory. The Popcatcher algorithm is based on the fact that popular music is played repeatedly. By comparing current audio segments with past ones, repetitions can be found.
Television consumption can nowadays be done in many different ways. There are TV sets with built-in hard drives, automatic movie recording and the ability to time shift live TV. All these are features that extend the concept of watching TV and give the end user more control over his or her own viewing.
1.2 Problem Formulation
One of the things the user cannot control is the presence of commercial breaks.
Commercials hold the same properties of repetition as popular music. A commercial spot is rarely aired only once [12]. Therefore the PopCatcher technology is already able to detect commercials in broadcasts. However, in the present algorithm commercials are considered garbage due to their relatively short length compared to a song. Thus, the algorithm does not handle the detection in any useful manner, since no further action is taken upon the finding of a short repetition.
This thesis investigates how the PopCatcher algorithm can be modified to focus on locating commercials in television broadcasts. The goal is also to evaluate the results of the modifications and suggest further development.
1.3 Scope
There are other existing methods of performing commercial detection, which are often based on image processing. Black frames between breaks [10], presence of channel logos [1] and high rates of color change [3] are some of the parameters authors use in their detection schemes. However, due to time constraints, no thorough research into these other methods of commercial detection has been made in this project.
The project starts with a study of the Popcatcher algorithm and the basic theory behind it. Next is an introduction to the modifications made to the algorithm, along with descriptions of problems along the way and solutions to them. After that the final detection scheme is presented and explained. Furthermore, results of how different parameters affect the outcome are presented and evaluated. All testing and modification of the algorithm has been done in a Matlab programming environment. The Matlab source code for the original algorithm was provided by Popcatcher.
2 Theory
The theory section reviews the basics of the different components used in the Popcatcher algorithm. It also explains how these components are put together to achieve an efficient search for repetition in media channels. More information about the basic theory can be found in [11] and [8].
One way to compare signals is to look at the frequency spectrum at different time instances.
2.1 Discrete Time Fourier Transform
In order to understand how to analyze the frequency content of a signal, one has to become acquainted with the concept of the Fourier transform. By performing this transform, a signal can be viewed in a frequency domain representation instead of the ordinary time domain representation. When considering a discrete time signal x[n], the Discrete Time Fourier Transform (DTFT) of x[n] is described as
F{x[n]} = sum_{n=-∞}^{∞} x[n] e^{-iωn}    (1)
The F here denotes the Fourier operator. The input x[n] is transformed into what can be written as X(ω). Hence, F{x[n]} = X(ω). The variable ω denotes frequency in radians and the function X(ω) is periodic with period 2π. The periodicity can be expressed as X(ω + 2πk) = X(ω), where k is an integer.
If ω_s is the frequency that x[n] was sampled at, then due to the sampling theorem [11] the highest represented frequency component of X(ω) is ω_s/2. The output X(ω) consists of complex values that give amplitude and phase information about the different frequencies.
2.2 Windowing Functions
When using the DTFT in a practical way to view spectrum changes in a signal over time, a limited amount of data has to be considered. A window function is a function that is zero valued outside a certain time interval. By multiplying an input signal with a window function, a small part of the input can be extracted.
These functions can be constructed in different ways; two commonly used functions are the rectangular and the Hamming window.
2.2.1 Rectangular Window
The simplest window function is the rectangular window, since it has the value one within a specified range. If the rectangular window is called w_r[n], the function can be described as

w_r[n] = { 1,  0 ≤ n ≤ L - 1
         { 0,  else              (2)
where L is the window length. By multiplying w_r[n] with the input x[n], an approximation of the DTFT can be formed as

X̂(ω) = sum_{n=-∞}^{∞} w_r[n] x[n] e^{-iωn}.    (3)
2 THEORY 2.3 Window Size Properties
This means that the Discrete Fourier Transform (DFT) generates uniformly sampled values of X̂(ω) according to equation 4.

X[k] = X̂(ω)|_{ω = 2πk/N},  k = 0, 1, ..., N - 1    (4)

2.2.2 Hamming Window
Using a rectangular window is convenient, but it has the disadvantage of inducing spectral leakage, which can be read about in [11]. In short, the leakage can be explained as a smearing effect on the estimated frequency spectrum. This effect can be reduced by weighting the samples within the window with a non-constant function. The Hamming window w_h[n] is one example of such a function. The mathematical expression for w_h[n] is shown in equation 5.

w_h[n] = { 0.54 - 0.46 cos(2πn / (L - 1)),  0 ≤ n ≤ L - 1
         { 0,                               else              (5)
As can be seen in figure 1, the samples in the middle of the time window are considered most important and samples at the edges are suppressed. This introduces a certain loss of information, and as compensation something called overlapping can be used. When analyzing the frequency spectrum of a signal over time, consecutive windows of data are extracted from the signal. These windows are set up to overlap each other in time. This way the loss of data is compensated.
Figure 1: Illustration of the Hamming window function.
2.3 Window Size Properties
When extracting a part of a signal as explained in sections 2.2.1 and 2.2.2, the window size L turns out to have a large impact on the approximation of the DTFT. When analyzing a signal over time, a crucial decision about the window size L has to be made. A large L value will give a good frequency resolution but a poor time resolution. In practice this means that distinguishing single tonal components is done better with a large L, but rapid changes in the frequency spectrum are then hard to discover. Choosing a small L leads to the opposite effect, that is, a poor frequency resolution but a good time resolution.
The size of L is equal to the number of frequency bins the approximated DTFT will consist of. If L = 256 and the input sample rate is ω_s, the spacing between the bins is thus (1/256)ω_s.
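The resolution trade-off above can be demonstrated numerically. The following sketch (in Python with NumPy rather than the Matlab used in the project; the sample rate and tone frequencies are chosen purely for illustration) builds a signal from two closely spaced tones and counts the distinct spectral peaks found with a large and a small window size L.

```python
import numpy as np

fs = 8000.0                   # assumed sample rate for this example
f1, f2 = 1000.0, 1062.5       # two tones, two bins apart when L = 256
n = np.arange(4096)
x = np.sin(2 * np.pi * f1 / fs * n) + np.sin(2 * np.pi * f2 / fs * n)

def peak_count(signal, L):
    """Count distinct spectral peaks in an L-point FFT of the first L samples."""
    X = np.abs(np.fft.rfft(signal[:L]))
    thresh = 0.5 * X.max()
    # strict local maxima above half the global maximum
    peaks = [k for k in range(1, len(X) - 1)
             if X[k] > X[k - 1] and X[k] > X[k + 1] and X[k] > thresh]
    return len(peaks)

# bin spacing is fs/L: 31.25 Hz at L = 256, but 250 Hz at L = 32
large_L = peak_count(x, 256)   # tones land in separate bins
small_L = peak_count(x, 32)    # tones merge into a single lobe
print(large_L, small_L)
```

With the large window the two tones are resolved as separate peaks; with the small window they collapse into one, exactly the poor frequency resolution described above.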
2 THEORY 2.4 Frequency Spectrum Analysis
2.4 Frequency Spectrum Analysis
Consider the DTFT estimation of a signal x[n] at two different time instances, X̂_n(ω) and X̂_{n+k}(ω), where k denotes a time lag. The frequency spectrum at those time instances is then obtained by calculating the magnitude of the transforms, hence |X̂_n(ω)| and |X̂_{n+k}(ω)|. A thorough mathematical description of how the magnitude calculation is done can be found in [11], and different methods to compare the spectra are found in [6].

An example of |X̂_n(ω)| using different window functions is shown in figures 2 and 3, where the input x[n] consists of a single sinusoid of frequency ω_s/5. The true spectrum of x[n] would then equal δ(ω - ω_s/5) + δ(ω + ω_s/5).

Comparing figure 2 with figure 3 reveals the spectral leakage of the rectangular windowing function as somewhat more side clutter surrounding the peak in figure 2.
Figure 2: Example magnitude plot of X̂(ω) when using a rectangular window function with L = 128.

Figure 3: Example magnitude plot of X̂(ω) when using a Hamming window function with L = 128.
A common way to approximate the DTFT for signals of finite length is by doing a Fast Fourier Transform (FFT). The FFT is a more computationally efficient way of evaluating the transform. Through a series of numerical operations, the number of multiplications needed to evaluate equation 3 is lowered from being proportional to L^2 to L log_2 L. This is often desirable in the field of digital signal processing.
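The leakage difference illustrated in figures 2 and 3 can be reproduced with a few lines of code. This Python sketch (illustrative only; the thesis work was done in Matlab) transforms a tone at ω_s/5 with both window functions and compares the spectral level well away from the main lobe, where only leakage remains.

```python
import numpy as np

L = 128
n = np.arange(L)
x = np.sin(2 * np.pi * 0.2 * n)          # single sinusoid at omega_s / 5

def spectrum_db(signal, window):
    """Magnitude spectrum in dB relative to the spectral peak."""
    X = np.abs(np.fft.rfft(signal * window))
    return 20 * np.log10(X / X.max() + 1e-12)

rect = spectrum_db(x, np.ones(L))        # rectangular window
hamm = spectrum_db(x, np.hamming(L))     # Hamming window

# leakage level well away from the main lobe (the tone sits near bin 25.6)
rect_leak = rect[40:60].max()
hamm_leak = hamm[40:60].max()
print(rect_leak, hamm_leak)
```

The Hamming window suppresses the far-from-peak leakage by tens of decibels compared to the rectangular window, at the cost of a slightly wider main lobe.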
2.5 Cancellation
Cancellation is a fairly straightforward mathematical way to calculate similarities in signals. This method can be used in different ways and below is a description of how it was performed in this project.
Let us say we have a discrete time signal x that contains non-zero information. Let n and k be integers that denote time instances, and let m be an integer that determines a summation size. By subtracting the value of x at time instance n from the value of x at time instance n + k, a sample by sample error that compares x at different times is calculated. The absolute value of this error is then summed with the m - 1 following values. This way a vector e_c[k], as seen in equation 6, can be constructed that gives information about how different sections of x relate to each other at time lag k. If two sections of x are identical, this gives a cancellation error e_c equal to zero. In practice, similar sections will yield a very low error.

e_c[k] = sum_{n=0}^{m-1} |x[n] - x[n + k]|    (6)
Figure 4: Example of two similar sections in a signal x at a fixed k value.
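Equation 6 translates directly into code. The sketch below (Python instead of the project's Matlab, and with an added start index n for generality) evaluates the cancellation error for an identical and a non-identical section of a toy signal.

```python
import numpy as np

def cancellation_error(x, n, k, m):
    """Cancellation error of equation 6, comparing x starting at n
    with x starting at n + k over m samples."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x[n:n + m] - x[n + k:n + k + m]))

# toy signal where samples 0..4 repeat at lag 8
x = [1, -2, 3, 0, 2, 9, 9, 9, 1, -2, 3, 0, 2, 5, 5, 5]
identical = cancellation_error(x, 0, 8, 5)   # repeated section: error is zero
different = cancellation_error(x, 0, 3, 5)   # unrelated section: large error
print(identical, different)
```

As stated in the text, an identical repetition yields exactly zero, while an unrelated section yields a large error; in practice a small threshold separates the two cases.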
2.6 Downsampling
Computational complexity is a limiting factor in many real-time signal processing applications. There are however several different ways to reduce the amount of data in a signal. This reduction can be a useful tool to minimize computational load when analyzing large sets of data. One quite straightforward way to reduce data is to only keep every Kth value in the signal. This is done in two steps: first a filtering and then a decimation of the input.
Figure 5: Signal flow diagram of a downsampling process: input → filtering → decimation → output.
2.6.1 Filtering
Consider a signal x[n] with sampling frequency ω_sx that has a non-zero frequency spectrum up to ω_max = π. In order to decimate that signal without the appearance of aliasing effects, and still maintain the sampling theorem criterion, the bandwidth of x must first be reduced so that ω_max = π/K. More information about the aliasing effect and the sampling theorem can be found in [11]. The bandwidth reduction can be made by passing x[n] through a low pass filter characterized by the impulse response h[n]. Ideally this low pass filter would have the frequency response

H(ω) = { 1,  |ω| < π/K
       { 0,  else .          (7)
Figure 6: Frequency response of the ideal filter h[n].
The filtered output of x[n] would then be described as

v[n] = sum_{k=0}^{∞} h[k] x[n - k].    (8)
In practice, filters of the character described in equation 7 are not realizable. This is due to the fact that the impulse response h[n] would then be a sinc function of infinite length. Most commonly used filters have a roll-off after their cut-off frequency, as can be seen in figure 7.
2.6.2 Decimation
The more straightforward part of the downsampling process is the decimation.
Here every Kth sample of the filtered input is picked out to form the final output according to the equation
z[n] = v[nK]. (9)
When this process is done, the signal z[n] has the sample rate ω_sz = ω_sx/K.
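The whole chain of sections 2.6.1-2.6.2 can be sketched as follows. The windowed-sinc filter here is an assumed stand-in for whatever realizable low pass filter an implementation would use (the thesis does not specify one), and the tap count and signal are arbitrary.

```python
import numpy as np

def downsample(x, K, taps=63):
    """Low pass filter to pi/K, then keep every Kth sample (equations 7-9)."""
    # Hamming-windowed sinc: a realizable approximation of equation 7
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(n / K) / K * np.hamming(taps)
    v = np.convolve(x, h, mode="same")   # filtering, equation 8
    return v[::K]                        # decimation, equation 9

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t)          # 100 Hz tone, well inside the new band
z = downsample(x, K=4)
print(len(z))                            # one quarter of the input length
```

The output has sample rate fs/K, and a tone well below the new Nyquist frequency passes through with roughly unit gain.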
Figure 7: Frequency response of a realistic low pass filter.

2.7 The Repetition Search Algorithm

Popcatcher has developed an algorithm that takes advantage of repetitions in media streams to capture popular music. By using analysis methods such as the DTFT and cancellation, as seen in sections 2.1 and 2.5, the position and length of a repetition can be found. In short, there are two processes running simultaneously in the Popcatcher real time application:
• Creation of Search Track
• Repetition Search Process
These processes are further explained below.
2.7.1 Creation of Search Track
In order to make the repetition search process efficient, the audio data fed to the algorithm has to be processed and divided in a certain manner. Figure 8 shows a flow chart of the whole procedure. In this process there are two different audio tracks involved:
• Full Quality Track: An audio file containing a small part of the input media stream as an uncompressed PCM coded wave file.
• Search Track: A downsampled version of the full quality track that is continuously filled with up to F days of data. It is named s[n].
Figure 8: Signal flow chart of search track creation: audio input stream → full quality track → downsampling → search track.
2.7.2 Repetition Search Process
The search process is divided into a number of steps that in this section are explained in the order they are performed.
Fingerprint Extraction In order to speed up the search process, the full quality audio track is not used when performing any of the needed calculations.
First, a number of consecutive samples are extracted from a desired start point in the search track until they form a vector of length P. This forms what is hereby referred to as a fingerprint.
Coarse Comparison Test The fingerprint is then compared to the complete search track at different time lags. A number of comparison values are calculated using methods mentioned in [6] and are compared to predetermined threshold values.
The larger the fingerprint used, the better the accuracy in finding repetitions. However, the size of the fingerprint becomes a compromise between computational load and good hit accuracy. The coarse comparison test will hence generate a number of possible hit locations, as seen in figure 9, where the number of false hits is connected to the fingerprint size. A small P value will in general yield more false hits than a large P value.
Figure 9: Illustration of the coarse comparison test.
Fine Comparison Test The possible hits from the coarse comparison test are then run through a more thorough comparison test where a larger fingerprint is used. In combination with a short cancellation test, the possible hits are evaluated. All hits that do not match the predetermined demands of this test are disregarded. A possible hit that passes the fine comparison testing is transferred to the next tryout. These hits are referred to as quality hits.
Endpoint Search Starting from where the quality hit was located in the search track, an iterative comparison process is started, running first forwards and then backwards, to calculate an error vector e[k]. This process gets a new fingerprint out of the search track k samples next to the original fingerprint location. The new fingerprints are matched with data picked out k samples next to the quality hit location, as illustrated in figure 10. This data is extracted from the search track in the same way as the fingerprint and therefore also forms a vector of length P.
Figure 10: Illustration of the endpoint search.
This endpoint process continues until e has exceeded a threshold value in both directions. When this has occurred, the endpoints of a repetition have been located. If the repetition length matches some predetermined demands, the stated start and stop locations are stored. In this project these start and stop locations together form what is chosen to be called a potential commercial.
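The endpoint logic can be sketched as below. This is a hypothetical Python reconstruction, not the Popcatcher implementation: the step size (one fingerprint length per iteration), the cancellation-style error measure, the threshold and all names are assumptions made for illustration.

```python
import numpy as np

def endpoint_search(s, pos, hit, P, threshold):
    """Grow a repetition match forwards and backwards from a quality hit.

    s: search track, pos: fingerprint location, hit: quality hit location,
    P: fingerprint length. Returns (start, stop) offsets relative to pos.
    """
    def err(k):
        # cancellation-style error between the two P-long sections at lag k
        if pos + k < 0 or hit + k < 0:
            return np.inf
        a = s[pos + k : pos + k + P]
        b = s[hit + k : hit + k + P]
        if len(a) < P or len(b) < P:
            return np.inf
        return np.sum(np.abs(a - b))

    stop = 0
    while err(stop + P) < threshold:     # step forwards, fingerprint by fingerprint
        stop += P
    start = 0
    while err(start - P) < threshold:    # then backwards
        start -= P
    return start, stop + P

# toy track: a 40-sample section repeated at lag 100
rng = np.random.default_rng(0)
s = rng.normal(size=300)
s[100:140] = s[0:40]
start, stop = endpoint_search(s, pos=10, hit=110, P=10, threshold=1.0)
print(start, stop)
```

Starting from a fingerprint at sample 10 and a quality hit at sample 110, the search expands until the error exceeds the threshold in both directions, recovering the full 40-sample repetition around the hit.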
3 Algorithm Modifications
The modifications presented here have been made to a Matlab v.6.5.0.180913a demonstration version of the Popcatcher algorithm. Unedited, it was set up to perform a single repetition scan through a prerecorded audio file of commercial radio. Since this is not a real time model, the search tracks had to be precalculated to be able to run the demo. Initially the demo worked in this manner:
• Through a command prompt the user is asked to select a position in the full quality audio file.
• The repetition search process is run as explained in section 2.7.2.
To get the algorithm to search for commercials instead of songs, the whole search was modified to be executed repeatedly. In practice this means that the user states a section of the search track that is to be analyzed for repetitions.
Between the start and the end of the section, a number of fingerprints are extracted and compared to the full search track. The distance between the extractions was set to a constant called the commercial scan step size, c_step. The resolution of the scan was thus determined by c_step, as visualized in figure 11. All individual fingerprint hit data was stored in a matrix. With this modification done, a graph showing repetition information in a media stream could be drawn. From now on this procedure is referred to as doing a commercial scan.
A library of useful search tracks was established through recording audio from different Swedish commercial TV stations with an analog TV tuner.
Figure 11: Example plot of commercial scan data.
3.1 Silence Detection
The first commercial scan performed on a search track created from Swedish TV6 gave many false repetition hits. When investigating a number of these hits more closely, a clear connection could be seen between low audio levels in the fingerprints and the false repetitions. Therefore a function was created that measures what percentage of the samples in the area around the fingerprint extraction location lies below a set threshold value s_th. If too high a percentage of the samples is below the threshold, a new fingerprint is extracted one second ahead of the old one. This function removed almost all false hits related to low audio level.
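The silence check can be sketched as follows. The window radius, threshold and allowed fraction are assumed values for illustration; the thesis does not state the constants used.

```python
import numpy as np

def is_silent(x, center, radius, s_th, max_fraction=0.9):
    """Return True if too many samples around `center` lie below s_th.

    Hypothetical version of the check in section 3.1: the fraction of
    samples in the window whose magnitude is below s_th is compared
    against an allowed maximum fraction.
    """
    window = np.asarray(x[max(0, center - radius):center + radius])
    fraction_low = np.mean(np.abs(window) < s_th)
    return fraction_low > max_fraction

rng = np.random.default_rng(1)
loud = rng.uniform(-1, 1, 1000)           # normal programme audio
quiet = 0.001 * rng.uniform(-1, 1, 1000)  # near-silence
a = is_silent(loud, 500, 200, s_th=0.01)
b = is_silent(quiet, 500, 200, s_th=0.01)
print(a, b)
```

A fingerprint drawn from the quiet region would be rejected and re-extracted one second later, while normal programme audio passes the check.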
3.2 Close Hits Compensation
After the fine comparison test in each individual repetition search performed in the commercial scan, as explained in section 2.7.2, a number of so called quality hits are left. It turns out that these quality hits often appear in blocks. For example, one repeated commercial can result in several hits within the repeated section. This phenomenon could often be connected to short repeated audio loops running in the background of a commercial. Two different methods have been used to address the problem of blocks of close hits:
1. The quality hit with lowest cancellation value is the one kept.
2. All hits are analyzed with the endpoint search method, see section 2.7.2, and the hit that resulted in the longest repetition is the one kept.
Method number 2 turned out to give the best results, so that one was used. The allowed distance between the first and the last quality hit is set by a constant. This constant was determined through empirical testing to be 32 seconds.
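Method 2 can be sketched as a small selection function. The hit positions, the length lookup and the way the block is delimited are illustrative assumptions; only the 32 second window comes from the text.

```python
def keep_longest_close_hit(quality_hits, endpoint_length, window=32.0):
    """From a block of close quality hits, keep the one whose endpoint
    search gives the longest repetition (method 2 in section 3.2).

    quality_hits: sorted hit positions in seconds; endpoint_length maps a
    hit position to its repetition length in seconds.
    """
    block = [h for h in quality_hits if h - quality_hits[0] <= window]
    return max(block, key=endpoint_length)

# toy block: three close hits caused by one repeated commercial
lengths = {100.0: 8.0, 112.0: 25.0, 120.0: 12.0}
best = keep_longest_close_hit([100.0, 112.0, 120.0], lengths.get)
print(best)
```

Of the three close hits, the one at 112 s survives, since its endpoint search yields the longest repetition.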
3.3 Quality Hit Length Sorting
The endpoint search calculates a length for each quality hit. A way to sort out unwanted material is to set up limits regarding how long a hit is allowed to be.
These limits would preferably be related to minimum and maximum standard lengths of commercials. In the demo used in this project these limits were set to 62 seconds as maximum and 3 seconds as minimum, according to data found in [12]. Anything outside of that range was considered irrelevant. The true maximum and minimum lengths are 60 and 5 seconds, but using a safety margin of a few seconds turned out to be necessary, because the endpoint scan sometimes states potential commercials as a second too long or too short.
Having limits like this would in an ideal case remove the issue that re-aired episodes create, since they by definition are repetitions with lengths longer than 62 seconds in most cases. See section 7 for further coverage of this matter.
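The length sorting amounts to a simple filter over the stated repetitions. The hit representation as (start, stop) pairs in seconds is an assumption; the 3 and 62 second limits come from section 3.3.

```python
def length_filter(potential_commercials, min_len=3.0, max_len=62.0):
    """Keep only hits whose repetition length could be a commercial."""
    return [(start, stop) for start, stop in potential_commercials
            if min_len <= stop - start <= max_len]

hits = [(10.0, 40.0),    # 30 s: plausible commercial
        (100.0, 101.0),  # 1 s: too short, probably a false hit
        (200.0, 290.0)]  # 90 s: too long, probably a re-aired episode
print(length_filter(hits))
```

Only the 30 second repetition survives; the too-short and too-long repetitions are discarded as irrelevant.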
3.4 History Track
During the progress of the commercial scan, an increasing number of repetitions will be stated. By storing all of these in a separate file, a new track called the history track is established. It is stored in the same format as the search track. This track would ideally be continuously filled with commercials. As a way to increase the possibility of finding a quality hit faster, a new modified version of the repetition search process was added. It does the same thing as the search mentioned in section 2.7.2, with the only modification of replacing the search track with the history track. If no quality hit is found in the history track, the search continues in the regular search track. A further discussion about the different properties of this modification is found in section 7.
3.5 Extended Search Logistics
When a potential commercial is found, one can assume that there is interesting data in its neighborhood, since commercials often are broadcast in blocks. One way to take advantage of this fact is to perform an extended search logistic when being in an interesting area of that type.
As can be seen in figure 12, if only c_step were used, many commercials would be missed if the search moved forward after finding one potential commercial. Therefore a new, smaller step variable c_step2 was formed that replaces c_step when being in an interesting area. This variable makes sure that the area surrounding the first potential commercial is also scanned. By applying this extended search logistic, the number of found potential commercials was substantially increased.
Figure 12: Illustration of the problem with using only the c_step variable.
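The step-size switching can be sketched as a position generator. The concrete values of c_step, c_step2 and the duration an area stays "interesting" are assumptions for illustration; the thesis does not fix them here.

```python
def scan_positions(section_start, section_end, is_commercial,
                   c_step=30.0, c_step2=5.0, interest=60.0):
    """Generate fingerprint positions for a commercial scan (section 3.5).

    Normally the scan advances by c_step; after a potential commercial is
    found it advances by the smaller c_step2 for `interest` seconds, so the
    surrounding area is also scanned.
    """
    positions = []
    pos = section_start
    interesting_until = -1.0
    while pos < section_end:
        positions.append(pos)
        if is_commercial(pos):                 # a potential commercial was found here
            interesting_until = pos + interest
        pos += c_step2 if pos < interesting_until else c_step
    return positions

# toy ground truth: a commercial block between 100 s and 160 s
pos = scan_positions(0.0, 300.0, lambda t: 100.0 <= t <= 160.0)
coarse = scan_positions(0.0, 300.0, lambda t: False)   # no extended logistics
fine = [p for p in pos if 100.0 <= p <= 160.0]
print(len(coarse), len(pos), len(fine))
```

With the extended logistics the scan places many more fingerprints inside the commercial block than the plain c_step grid would, which is exactly why the number of found potential commercials increased.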
3.6 Hit Merging
The repetition search case shown in figure 12 is a theoretical model. In practice, the stated length of each potential commercial might not correspond to the actual length of the commercial, as seen in figure 13. This can lead to spacing between individual potential commercials. To counteract this spacing, potential commercials that are closer than m_len seconds to each other are merged. The default value of m_len was set to 5 seconds.
Figure 13: Example of a search result in practice.
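The merging step is a standard interval merge. The (start, stop) representation in seconds is an assumption; the m_len threshold comes from section 3.6.

```python
def merge_hits(hits, m_len=5.0):
    """Merge potential commercials closer than m_len seconds (section 3.6).

    hits: time-sorted (start, stop) pairs in seconds.
    """
    merged = [list(hits[0])]
    for start, stop in hits[1:]:
        if start - merged[-1][1] < m_len:
            merged[-1][1] = max(merged[-1][1], stop)  # extend the current block
        else:
            merged.append([start, stop])
    return [tuple(block) for block in merged]

hits = [(0.0, 20.0), (23.0, 45.0), (60.0, 75.0)]
print(merge_hits(hits))
```

The first two hits, separated by only a 3 second gap, merge into one potential commercial block, while the third stays separate.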
3.7 Search Modes
A way to reduce the amount of necessary calculations for each repetition search in a commercial scan is to abort the search when a potential commercial is found. Thus there are two modes in which the commercial scan can be made: thorough mode and one hit abort mode.
3.7.1 Thorough Mode
In thorough mode all quality hits are collected from the fine comparison part of the repetition search, and the endpoint search function is then run on all of them. This is done to see which quality hit yields the longest potential commercial. In this mode, information can be extracted about how often a repetition occurs in the whole search track. Although this mode gives a lot of information, it has the disadvantage, in terms of computational load, of first finding all quality hits and then evaluating them all before the search is done.
3.7.2 One Hit Abort Mode
This mode is the one more suitable for real time applications due to its higher computational efficiency. When a quality hit is found that through endpoint testing yields an approved potential commercial, the search is ended. This means that many fine comparison tests run on possible hits are skipped. Moreover, in the optimal case the endpoint search is run only once, if the first quality hit found yields a potential commercial.

The drawback of this mode is its harsh way of ending the search process. The first quality hit found might not be the one that generates a potential commercial length matching the full length of the real commercial. More on this is discussed in section 7.
4 Commercial Scan Results
In this section, results are presented that show how well the commercial detection performs on real life signals. The outcome of the algorithm, when in one hit abort mode, can be displayed as a curve showing the value one if a potential commercial is present and zero if not. If the scan is set to thorough mode, the height of the curve represents how many times every fingerprint area is repeated through the entire search track. This also means that when doing a commercial scan, quality hits can be found in future data. In a real time model this is not possible, so a real time model would probably give a slightly different result.
A database containing information about actual commercial locations was set up through manual listening analysis of the search tracks. In addition to pure commercials there are a few more cases that are considered to be a part of a commercial block in this project:
• TV show advertisement
• Short channel specific jingles
When using a search track size of up to 21 hours, recordings from three consecutive days were used, where each day contributed seven hours. These seven hour recordings were made starting from 19:00 and onward. In the scans made with extended search track length, the only allowed area to pick fingerprints from was the last day. The commercial scan was thus performed only on the last day, but with the possibility of finding quality hits anywhere in the extended search track.
If nothing else is mentioned, all scans are made in one hit abort mode. A potential commercial is considered correctly stated if the endpoints are not more than one second off in time in relation to the real commercial. Any larger length error generates a false hit. Some further statistical parameters also have to be explained in order to understand the commercial scan results:
• Hit Percentage: How many percent of the real commercials that the commercial scan has stated as potential commercials. False hits are not included in this calculation.
• False Hit Percentage: Total length of the false potential commercials in relation to the total length of all potential commercials.
• Number of False Hits: How many potential commercials were not real commercials.
• Missed Blocks: If no potential commercial at all is found in a real commercial block, this number grows by one. A block in this project is considered to be two or more connected commercials.
• Total Blocks: Total number of real commercial blocks.
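The statistics above can be sketched for interval data as follows. This is an illustrative reconstruction, not the thesis' evaluation code: the matching rule here counts a potential commercial as a hit if both endpoints fall within the one second tolerance of some real commercial, and the hit percentage is computed per real commercial.

```python
def scan_statistics(potential, real, tolerance=1.0):
    """Compute hit and false hit statistics for (start, stop) lists in seconds.

    A potential commercial counts as a hit if both endpoints are within
    `tolerance` seconds of a real commercial; otherwise it is a false hit.
    """
    def matches(p, r):
        return abs(p[0] - r[0]) <= tolerance and abs(p[1] - r[1]) <= tolerance

    def total_length(intervals):
        return sum(stop - start for start, stop in intervals)

    hits = [p for p in potential if any(matches(p, r) for r in real)]
    false = [p for p in potential if p not in hits]
    hit_pct = 100.0 * len(hits) / len(real)
    false_pct = 100.0 * total_length(false) / total_length(potential)
    return hit_pct, false_pct, len(false)

real = [(0.0, 30.0), (100.0, 120.0), (200.0, 215.0), (300.0, 330.0)]
potential = [(0.5, 29.6), (100.0, 120.8), (400.0, 410.0)]
stats = scan_statistics(potential, real)
print(stats)
```

Two of the four real commercials are correctly stated within tolerance, and the one unmatched potential commercial contributes its full length to the false hit percentage.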
Below follows a selection of results from commercial scans made on Swedish TV channels.
4.1 TV3
The commercial scan using the 21 hour search track has one false hit. This false hit is an introduction jingle broadcast at the beginning of a TV show. The same jingle is then also played at the end of the show, and the jingle is therefore caught by the scan. The scan made in thorough mode was mostly made to get an understanding of how often media is repeated in a broadcast.
Setup                          Hit Percentage   False Hit Percentage   Number of False Hits   Missed Blocks / Total Blocks
7h search track, c_step=50s    39.96            0                      0                      2/22
7h search track, c_step=30s    47.58            0                      0                      1/22
14h search track, c_step=30s   63.48            0                      0                      1/22
21h search track, c_step=30s   75.02            1.02                   1                      0/22

Table 1: Commercial scan statistics on TV3.
[Figure omitted: stacked panels sharing a time axis in seconds (0 to about 2.5·10^4) — repetitions found in thorough mode (7 hour search track, c_step = 50 s), the real commercial blocks, the potential commercials from the 21 h/c_step = 30 s, 14 h/c_step = 30 s, 7 h/c_step = 50 s and 7 h/c_step = 30 s scans, and the TV3 input waveform.]
Figure 14: Results from commercial scans made on TV3.
4.2 TV6
Setup                                           Hit %    False Hit %    False Hits    Missed / Total Blocks
7 h search track, c_step = 30 s                 56.94    4.70           5             0/22
7 h search track, c_step = 30 s, m_len = 60 s   83.25    3.78           5             0/22

Table 2: Commercial scan statistics on TV6.
[Figure omitted: stacked panels sharing a time axis in seconds (0 to about 2·10^4) — the real commercial blocks, the potential commercials from the 7 hour search track scans with c_step = 30 s, m_len = 60 s and with c_step = 30 s alone, and the TV6 input waveform.]
Figure 15: Results from commercial scans made on TV6.
It can be seen that increasing the merge parameter m_len from the standard five seconds to 60 seconds gives a clear improvement in the hit percentage. Worth noticing is that the m_len increase did not noticeably worsen the false hit percentage. Four of the false hits are jingles for the TV show The Simpsons: two episodes appear in the recording, and the jingle is broadcast once at the beginning and once at the end of each episode.
The fifth false hit is harder to explain. It is a four second long clip with noise-like background sound and occasional speech. When analyzing the audio at the quality hit location related to this false hit, as seen in figure 16, one can see visible similarities exactly where the fingerprint and the quality hit were found. Both areas have similar background noise, which probably has an effect. Most likely this false hit is a result of a low input audio level and a not perfectly adjusted silence detection. Even though the signals are quite different when looking at the full four second areas, the cancellation values in the endpoint scan simply get too small and do not reach above the set cancellation threshold.
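The cancellation check involved here can be sketched roughly as follows, assuming the cancellation value is the mean absolute sample difference between the fingerprint segment and the candidate repetition; the exact measure used in the thesis may differ, and all names are illustrative.

```python
# Sketch of a cancellation measure: mean |s[n+i] - s[n+k+i]| over the
# compared segment. A small value suggests a repetition.
import random

def cancellation_error(s, n, k, length):
    """Mean absolute difference between s[n:n+length] and s[n+k:n+k+length]."""
    return sum(abs(s[n + i] - s[n + k + i]) for i in range(length)) / length

random.seed(0)
segment = [random.uniform(-1, 1) for _ in range(8000)]
track = segment + segment                         # an exact repetition at lag 8000
print(cancellation_error(track, 0, 8000, 8000))   # 0.0

# Two *different* quiet clips: a low input level keeps the error small, so a
# fixed cancellation threshold can be slipped under, as with the TV6 false hit.
quiet = [0.01 * random.uniform(-1, 1) for _ in range(16000)]
print(cancellation_error(quiet, 0, 8000, 8000) < 0.05)   # True
```

The second case illustrates the failure mode described above: the two quiet clips are genuinely different, but their absolute difference is small simply because both are near silence.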
As seen in table 2, the false hit percentage has dropped when using a larger m_len parameter even though the number of false hits is the same. This is explained by how the false hit percentage is calculated: since the total length of all potential commercials has increased while the total length of all false hits is unchanged, the false hit percentage decreases.

[Figure omitted: the fingerprint signal s[n] and the candidate signal s[n+k] plotted over time, together with the cancellation error e_c[k] and the threshold th.]
Figure 16: Cancellation error on false hit from TV6.
4.3 TV4
A commercial scan on Swedish TV4 was also made. The results of this scan were however quite poor due to reruns, that is, identical TV shows being broadcast twice or more. In this particular case it was an episode of the series Heroes, broadcast once early in the evening and once late at night. This should normally not cause any trouble, since repetitions longer than 62 seconds are not considered potential commercials. For some unknown reason the rerun did not produce one long repetition but instead many short ones. The cancellation value, which theoretically should be low throughout the whole episode, occasionally varied a lot, as can be seen in figure 17. Perhaps the problem is due to a varying broadcast playback speed.
[Figure omitted: the signals s[n] and s[n+k] plotted over time, together with the cancellation error e_c[k] and the threshold th.]
Figure 17: Cancellation error around false hit location from TV4.
4.4 SVT1
As an interesting test, a commercial scan was also made on the non-commercial TV station SVT1. It turned out that what the algorithm considered to be potential commercials were of two different types:
• Trailers for TV shows
• Short news segments
This reveals another weakness in the commercial scan: news segments. Their lengths often fit inside the range of 3 to 62 seconds, and they are often aired more than once.
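The duration constraint mentioned above can be sketched as a simple filter; the interval representation and function name are illustrative assumptions.

```python
# Sketch of the length constraint: only repetitions whose duration falls in
# the 3-62 second window are kept as potential commercials.

MIN_LEN, MAX_LEN = 3.0, 62.0  # seconds

def plausible_commercials(repetitions):
    """Keep only (start, end) repetitions with a commercial-like duration."""
    return [(s, e) for s, e in repetitions if MIN_LEN <= e - s <= MAX_LEN]

reps = [(0, 2), (10, 40), (100, 500)]   # a glitch, a commercial, a rerun
print(plausible_commercials(reps))      # [(10, 40)]
```

This also makes the weakness concrete: a repeated 20 second news segment passes the filter just as easily as a real commercial does.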
[Figure omitted: the potential commercials from a 7 hour search track scan with c_step = 30 s, and the SVT1 input waveform, over a time axis in seconds (0 to about 2.5·10^4).]
Figure 18: Results from commercial scan made on SVT1.
5 Discussion
5.1 Search Track Length
Making a fair prediction of an optimum search track length is hard without a large amount of commercial scan data from several different TV stations. How long an optimal search track should be would probably vary among the stations, but such data would still give a rough feel for a suitable length. Figure 19 shows the hit percentage plotted against search track length for the TV3 analysis.

[Figure omitted: hit percentage (0-100) plotted against search track length in hours (0-35) for the TV3 scans.]
Figure 19: Relation between hit percentage and search track length.

One could perform a regression analysis on this data, but in order to get a reliable model, more search track lengths have to be tested. Nevertheless, the figure gives a hint of a connection. How the TV3 recordings were made also has to be taken into consideration: recording a 21 hour search track from a single day would maybe produce a different looking graph than the one shown in figure 19.
In the end, the search track length will be a compromise between hit percentage and computational load. Having a search track length of one month would probably yield a very high hit percentage, but it is hardly a realizable model due to the far too large computational cost.
5.2 History Track Length
To see what an optimal length of the history track would be, one would again have to look at commercial broadcasting statistics from several different TV stations. This could give an idea of how often new commercials appear and how long the history track should be to contain the currently most popular commercials.

Another idea would be to build up a large history track and then rely almost entirely on it instead of the search track, but then new commercials would be hard to detect. There has to be a balance between the two tracks. More testing, focused on the history track alone, has to be made before a conclusion can be drawn about what maximum length would be sufficient.
5.3 Merge Variable
As can be seen in table 2, the merge variable m_len has a great impact on the hit percentage. Further testing has to be made to find out how far this variable can be increased without causing negative side effects. One of these side effects reveals itself when a person giving short information about a program just after a commercial block is stated as a potential commercial. This happens because there is sometimes a short commercial after the talking, as shown in figure 20.
[Figure omitted: a commercial block, a short talk segment, and a trailing commercial; with a large m_len the talk between them is merged into the potential commercial block.]
Figure 20: Illustration of problem situation with large m_len.

Several more side effects will appear in connection with the problems that reruns and news segments bring. In the worst case scenario, all these false hits can be merged into large blocks of potential commercials.
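The merge step itself can be sketched as follows. This is a minimal sketch assuming hits are (start, end) tuples in seconds; the actual model is implemented in Matlab and its data layout is not given here.

```python
# Sketch of the m_len merge: potential commercials separated by gaps no
# larger than m_len seconds are joined into one block.

def merge_hits(hits, m_len):
    """Merge time-sorted (start, end) hits whose gap is at most m_len."""
    merged = []
    for start, end in sorted(hits):
        if merged and start - merged[-1][1] <= m_len:
            # Gap is small enough: extend the previous block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

hits = [(0, 20), (28, 50), (200, 230)]
print(merge_hits(hits, m_len=60))   # [(0, 50), (200, 230)]
print(merge_hits(hits, m_len=5))    # [(0, 20), (28, 50), (200, 230)]
```

The example also shows the side effect discussed above: with m_len = 60 the 8 second gap (which could be talk rather than a missed commercial) is swallowed into the block, while m_len = 5 keeps it separate.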
5.4 False Hits
As mentioned in sections 4.3 and 4.4, reruns and news segments are two of the most common problems for the algorithm. One way to get around the news segment problem would be to set a time limit D, in hours, on how close a quality hit can be to the original fingerprint location. Since the same news segment is rarely broadcast two days in a row, this could be an effective measure. It would however mean that no searching could be done in the latest D hours of the search track, since the search track has to build up some length.
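The proposed time limit D could be sketched like this; the function name, the time representation in seconds, and the 24 hour default are all illustrative assumptions, not values from the thesis.

```python
# Sketch of the proposed rerun/news guard: a quality hit only counts if it
# lies at least D hours away from the fingerprint location in the track.

def valid_quality_hit(fingerprint_t, hit_t, d_hours=24):
    """Reject hits closer than d_hours to the fingerprint position (times in s)."""
    return abs(fingerprint_t - hit_t) >= d_hours * 3600

print(valid_quality_hit(80000, 79000))    # False: same evening's news repeat
print(valid_quality_hit(170000, 79000))   # True: more than a day apart
```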
The rerun problem seems to be more rooted in the fundamental properties of the endpoint scan. Depending only on a perfect result from the endpoint process is risky: an ongoing disturbance in the signal can appear at any time and create what is interpreted as many short repetitions instead of one long one.
5.5 Potential Commercial Lengths
In general there are two cases where potential commercials are too short compared to the real commercial: either some disturbing noise is present that prohibits the endpoint scan, or the commercial has a special design. This design places a so called main product theme in the first part of the commercial, followed by a special offer. In these cases the commercial scan often finds the main product theme but fails to find the special offer, since the offers vary more. In general this is not a major problem, since the merge variable takes care of small glitches between commercials. However, if the commercial is located at the end of a block, the potential commercial will be too short.
Getting statistics about how often the potential commercial lengths are too long or too short is not possible in the model made in this project. Since close hits are merged, information about their original lengths is lost. Making sure this information is not lost during the scan would however not be a difficult task for further analysis purposes.
5.6 Commercial Scan Step Size
Not enough testing was made on this parameter to get a feel for how much of a loss a change leads to. As seen in table 1 it obviously has a decreasing effect on the hit percentage, but at some point a compromise has to be made to bring down the computational load. If the step size is made too large, commercial blocks could be missed entirely; in theory, that happens when the step size is longer than the shortest commercial block. In practice, the step size is not the only thing affecting the hit percentage.
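The miss condition can be illustrated with a toy calculation; all numbers and names are illustrative, not taken from the thesis.

```python
# Toy illustration: if the scan step is longer than the shortest commercial
# block, a block can fall entirely between two scan points.

def scan_points(total_len, c_step):
    """Positions (in seconds) where a repetition scan is started."""
    return list(range(0, total_len, c_step))

def block_is_sampled(block, points):
    """True if at least one scan point lands inside the block."""
    start, end = block
    return any(start <= p < end for p in points)

block = (65, 110)                                       # a 45 s commercial block
print(block_is_sampled(block, scan_points(300, 30)))    # True: point 90 lands inside
print(block_is_sampled(block, scan_points(300, 60)))    # False: 60 and 120 both miss
```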
With a small step size one gets several chances to find a potential commercial in each block. This reveals another thing about the fundamental behavior of the commercial scan: if the single attempt to find a potential commercial in a block fails, the scan simply continues, even if a repetition scan made just a second away from that attempt would have found a quality hit. The extended search logistics mentioned in section 3.5, which can give valuable extra hits, will not start unless a potential commercial is found in the first place.
5.7 Real Time Model
When building a real time version of the commercial scan, trade-offs are bound to be made. The commercial scan step size sets a limit on how much time each repetition scan can take, including the time a possible extended search logistics run would take. If the search track is scanned for potential commercials at a slower rate than new information is added to the search track, the commercial scan process will fall behind.
Another thing to consider is that many of the small embedded microprocessors that could be used in a real time application do not handle floating point calculations. If forced to use a fixed point number representation, further precision is lost compared to the Matlab model, which presumably would lead to a worse hit percentage. Information about number representation can be found in [9]. More discussion of ways to improve the algorithm and adapt it to a real time model is found in section 7.
All in all, the three variables that make up the most important computational parameters for the commercial scan are:
• Length of history and search track
• Commercial scan step size
• Number of allowed possible hits

5.7.1 Worst Case Scenario Example
The commercial scan step size c_step determines how the other parameters are to be controlled and determined. The worst case scenario with respect to computational complexity is when the scan ends up in a long commercial block, since the extended search logistics goes on until no potential hits are found anymore. Suppose every commercial in that area is five seconds long; that would lead to a maximum of six repetition scans per c_step jump. As seen in figure 21, the scan actually continues forward as well, where the smaller grey arrows represent repetition scans made within the extended search logistics. The scans made forward can however be credited to the next c_step point scan.
In the real time 400 MHz DSP implementation of the repetition scan that Popcatcher has made, one hour of the search track takes 0.95 seconds to scan for quality hits. This does not include the further testing made on the quality hits, but obtaining the quality hits is by far the most significant computational load. With this in mind, if c_step = 30 s, every individual repetition scan gets 30/6 = 5 seconds of processing time. This yields the possibility of going through 5/0.95 ≈ 5.3 hours of history and search track in any desired combination. If a faster processor implies linearly faster calculations, it is fairly simple to predict how many more search track hours every individual repetition scan can analyze with a faster processor. This is illustrated in figure 22.

[Figure omitted: a commercial block followed by non-commercial content, with the c_step jumps, quality hits and no-hit repetition scans marked by arrows.]
Figure 21: Illustration of worst case scenario computational wise.
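The processing-budget arithmetic above can be restated as a short calculation. The 0.95 s per hour figure and the factor of six come from the text; linear scaling with clock speed is the stated assumption, and the function name is illustrative.

```python
# The worst-case processing budget per repetition scan, and how many track
# hours it buys at a given clock speed (assuming linear scaling).

C_STEP = 30.0      # seconds between commercial scan points
MAX_SCANS = 6      # worst-case repetition scans per c_step jump
SCAN_COST = 0.95   # seconds to scan one hour of track at 400 MHz

budget = C_STEP / MAX_SCANS       # 5.0 s of processing time per repetition scan
hours_400 = budget / SCAN_COST    # ~5.26 hours of history + search track

def track_hours(mhz):
    """Scannable track hours per repetition scan at a given clock speed."""
    return budget / (SCAN_COST * 400.0 / mhz)

print(round(hours_400, 2))          # 5.26
print(round(track_hours(800), 2))   # 10.53
```

Doubling the clock speed to 800 MHz thus doubles the scannable track length, which is the relation plotted in figure 22.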
[Figure 22 omitted: search track hours per repetition scan (5-30) plotted against processor speed in MHz (400-2000).]