
2008:139 CIV

MASTER'S THESIS

Commercial Detection Based on Audio Repetition

Joakim Wedin

Luleå University of Technology
MSc Programmes in Engineering Arena, Media, Music and Technology

Department of Computer Science and Electrical Engineering

Division of Signal Processing


Abstract

Television is today a natural part of many people's homes. There are technical products that automatically record your favorite shows and programs. The end user's freedom of choice is a highly valued selling point for television consumption products. One thing the user still cannot control, though, is the presence of commercial breaks.

Popcatcher is a company that has developed a product that automatically records music from radio stations. What makes it special is its ability to distinguish songs from commercials and radio talk. This is done using the fact that popular music is repeated in radio broadcasts. Popcatcher now wants to investigate whether the same model can be used to detect commercials in television broadcasts.

The goal of this thesis is to build a model and show how good commercial detection can become based on the fact that commercials are also repeated. Different modifications were made to the original algorithm until it eventually formed a commercial scan. One of these modifications was the extended search logistics, which determines how the algorithm should react when a potential commercial is found. The model also has a function that can partly make up for a failed detection and fill in empty gaps in the detection to some extent.

Based on results from a commercial scan made on Swedish TV6, a commercial hit percentage above 80 percent was achieved when using a seven-hour-long audio source material. Strong indications show that increasing the source length would most likely improve this percentage further.

The commercial scan model presented in this thesis works more like a commercial block detector than an individual commercial detector. One of the drawbacks of the model is the learning time of the system. It has to accumulate and store several hours of input data before giving reasonably fair results. An advantage is the fact that it only analyses audio input, in contrast to the often more demanding video analysis other methods use. This thesis shows that commercial detection based on repetition can in fact be done. There are, however, many improvements left to be made, and the work done in this project should be seen as guidance for further development.


Preface

I first and foremost want to thank the Berg brothers at Popcatcher for giving me the opportunity to carry out this project. It has been an incredibly instructive period, and working alongside encouraging colleagues has been a pleasure.

I also want to give credit to all friends and family from dear Örnsköldsvik who have supported me in the various troublesome times. Friends from MMT, you know what I am talking about. Special thanks to Tomas for lending me his apartment during the project time.

To all of you who are about to read this thesis, I honor you. If the text for some inexplicable reason gets boring, there are still a lot of beautiful illustrations to watch.


Contents

1 Introduction
  1.1 Background
  1.2 Problem Formulation
  1.3 Scope
2 Theory
  2.1 Discrete Time Fourier Transform
  2.2 Windowing Functions
    2.2.1 Rectangular Window
    2.2.2 Hamming Window
  2.3 Window Size Properties
  2.4 Frequency Spectrum Analysis
  2.5 Cancellation
  2.6 Downsampling
    2.6.1 Filtering
    2.6.2 Decimation
  2.7 The Repetition Search Algorithm
    2.7.1 Creation of Search Track
    2.7.2 Repetition Search Process
3 Algorithm Modifications
  3.1 Silence Detection
  3.2 Close Hits Compensation
  3.3 Quality Hit Length Sorting
  3.4 History Track
  3.5 Extended Search Logistics
  3.6 Hit Merging
  3.7 Search Modes
    3.7.1 Thorough Mode
    3.7.2 One Hit Abort Mode
4 Commercial Scan Results
  4.1 TV3
  4.2 TV6
  4.3 TV4
  4.4 SVT1
5 Discussion
  5.1 Search Track Length
  5.2 History Track Length
  5.3 Merge Variable
  5.4 False Hits
  5.5 Potential Commercial Lengths
  5.6 Commercial Scan Step Size
  5.7 Real Time Model
    5.7.1 Worst Case Scenario Example
    5.7.2 Possible Hit Limits
6 Conclusion
7 Further Studies
  7.1 To Investigate
  7.2 Model Improvements


1 Introduction

1.1 Background

PopCatcher is a company that has developed a radio that can distinguish music from radio talk, news broadcasts and commercials. The music is then stored as separate songs on an internal memory. The PopCatcher algorithm is based on the fact that popular music is played repeatedly. By comparing current audio segments with past ones, repetitions can be found.

Television consumption can nowadays take many different forms. There are TV sets with built-in hard drives, automatic movie recording and the ability to time-shift live TV. All these are features that extend the concept of watching TV and give end users more control over their own viewing.

1.2 Problem Formulation

One of the things the user can not control is the presence of commercial breaks.

Commercials hold the same repetition properties as popular music. A commercial spot is rarely aired only once [12]. Therefore the PopCatcher technology is already able to detect commercials in broadcasts; however, in the present algorithm a commercial is considered garbage due to its relatively short length compared to a song. Thus, the algorithm does not handle the detection in any useful manner, since no further action is taken upon the finding of a short repetition.

This thesis investigates how the PopCatcher algorithm can be modified to focus on locating commercials in television broadcasts. The goal is also to evaluate the results of the modifications and suggest further development.

1.3 Scope

There are other existing methods of performing commercial detection, which are often based on image processing. Black frames between breaks [10], presence of channel logos [1] and high rates of color changes [3] are some of the parameters authors use in their detection schemes. However, due to time constraints, no thorough research into these other methods of commercial detection has been made in this project.

The project starts with a study of the Popcatcher algorithm and the basic theory behind it. Next is an introduction to the modifications made to the algorithm, along with descriptions of problems along the way and solutions to them. After that, the final detection scheme is presented and explained. Furthermore, results showing how different parameters affect the outcome are presented and evaluated. All testing and modifications of the algorithm have been done in a Matlab programming environment. The Matlab source code for the original algorithm was provided by Popcatcher.


2 Theory

The theory section reviews the basics of the different components used in the Popcatcher algorithm. It also explains how these components are put together to achieve an efficient search for repetition in media channels. More information about the basic theory can be found in [11] and [8].

One way to compare signals is to look at the frequency spectrum at different time instances.

2.1 Discrete Time Fourier Transform

In order to understand how to analyze the frequency content of a signal, one must become acquainted with the concept of the Fourier transform. By performing this transform, a signal can be viewed in a frequency domain representation instead of the ordinary time domain representation. When considering a discrete time signal x[n], the Discrete Time Fourier Transform (DTFT) of x[n] is then described as

F{x[n]} = Σ_{n=−∞}^{∞} x[n] e^{−iωn}                  (1)

The F here denotes the Fourier operator. The input x[n] is transformed into what can be written as X(ω); hence F{x[n]} = X(ω). The variable ω denotes frequency in radians, and the function X(ω) is periodic with period 2π. The periodicity can be expressed as X(ω + 2πk) = X(ω), where k is an integer.

If ω_s is the frequency at which x[n] was sampled, then due to the sampling theorem [11] the highest represented frequency component of X(ω) is ω_s/2. The output X(ω) consists of complex exponentials that can give amplitude and phase information about different frequencies.
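As an aside, the sampled DTFT can be computed numerically. The sketch below (not part of the thesis, assuming NumPy; the test signal and sample rate are illustrative) approximates X(ω) of a sinusoid via the FFT and confirms that the spectral peak lies below ω_s/2:

```python
import numpy as np

# Sample a 1 kHz sinusoid at fs = 8 kHz and approximate its DTFT
# on a uniform frequency grid via the FFT (compare equation 1).
fs = 8000.0
n = np.arange(256)
x = np.sin(2 * np.pi * 1000.0 * n / fs)

X = np.fft.fft(x)                              # samples of X(omega)
freqs = np.fft.fftfreq(len(x), d=1.0 / fs)     # bin center frequencies in Hz

# The magnitude spectrum peaks at 1 kHz, well below fs/2 = 4 kHz.
peak_freq = abs(freqs[np.argmax(np.abs(X))])
print(peak_freq)   # 1000.0
```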

2.2 Windowing Functions

When using the DTFT in a practical way to view spectrum changes in a signal over time, a limited amount of data has to be considered. A window function is a function that is zero valued outside a certain time interval. By multiplying an input signal with a window function, a small part of the input can be extracted.

These functions can be constructed in different ways; two commonly used windows are the rectangular and the Hamming window.

2.2.1 Rectangular Window

The simplest window function is the rectangular window, since it has the value one within a specified range. If the rectangular window is called w_r[n], the function can be described as

w_r[n] = { 1, 0 ≤ n ≤ L − 1
         { 0, else                                    (2)

where L is the window length. By multiplying w_r[n] with the input x[n], an approximation of the DTFT can be formed as

X̂(ω) = Σ_{n=−∞}^{∞} w_r[n] x[n] e^{−iωn}.            (3)


This means that an N-point DFT generates uniformly sampled values of X̂(ω) according to equation 4.

DFT[k] = X̂(ω)|_{ω=2πk/N}                             (4)

2.2.2 Hamming Window

Using a rectangular window is convenient, but it has a disadvantage in that it induces spectral leakage, which can be read about in [11]. In short, the leakage can be explained as a smearing effect on the estimated frequency spectrum. This effect can be reduced by weighting the samples within the window with a non-constant function. The Hamming window w_h[n] is one example of such a function. The mathematical expression of w_h[n] is shown in equation 5.

w_h[n] = { 0.54 − 0.46 cos(2πn/(L − 1)), 0 ≤ n ≤ L − 1
         { 0, else                                    (5)

As can be seen in figure 1, the samples in the middle of the time window are considered most important and the samples at the edges are suppressed. This introduces a certain loss of information, and as compensation something called overlapping can be used. When analyzing the frequency spectrum of a signal over time, consecutive windows of data are extracted from the signal. These windows are set up to overlap each other in time; this way the loss of data is compensated.
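The windowing-with-overlap procedure can be sketched as follows (a hypothetical helper, not part of the Popcatcher code; the frame length and hop size are illustrative):

```python
import numpy as np

def hamming(L):
    # Hamming window of equation 5: 0.54 - 0.46*cos(2*pi*n/(L-1)).
    n = np.arange(L)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (L - 1))

def frames(x, L, hop):
    # Extract overlapping, Hamming-weighted frames; choosing hop < L
    # gives the overlap that compensates for the suppressed frame edges.
    w = hamming(L)
    return [w * x[i:i + L] for i in range(0, len(x) - L + 1, hop)]

x = np.random.randn(1024)
f = frames(x, L=256, hop=128)   # 50 % overlap
print(len(f))                   # 7 frames
```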

Figure 1: Illustration of the Hamming window function.

2.3 Window Size Properties

When extracting a part of a signal as explained in sections 2.2.1 and 2.2.2, the window size L turns out to have a large impact on the approximation of the DTFT, so when analyzing a signal over time, a crucial decision about L has to be made. A large L value gives good frequency resolution but poor time resolution. In practice this means that distinguishing single tonal components is done better with a large L, but rapid changes in the frequency spectrum are then hard to discover. Choosing a small L leads to the opposite effect: poor frequency resolution but good time resolution.

The size of L is equal to the number of frequency bins the approximated DTFT will consist of. If L = 256 and the input sample rate is ω_s, the spacing between the bins is thus (1/256)ω_s.


2.4 Frequency Spectrum Analysis

Consider the DTFT estimation of a signal x[n] at two different time instances, X̂_n(ω) and X̂_{n+k}(ω), where k denotes a time lag. The frequency spectra of those time instances are then obtained by calculating the magnitudes of the transforms, hence |X̂_n(ω)| and |X̂_{n+k}(ω)|. A thorough mathematical description of how the magnitude calculation is done can be found in [11], and different methods to compare the spectra are found in [6].

An example of |X̂_n(ω)| using different window functions is shown in figures 2 and 3, where the input x[n] consists of a single sinusoid of frequency ω_s/5. The true spectrum of x[n] would then be equal to δ(ω − ω_s/5) + δ(ω + ω_s/5).

Comparing figure 2 with figure 3 reveals the spectral leakage of the rectangular window function as somewhat more side clutter surrounding the peak in figure 2.

Figure 2: Example magnitude plot of X̂(ω) when using a rectangular window function with L = 128.

Figure 3: Example magnitude plot of X̂(ω) when using a Hamming window function with L = 128.

A common way to approximate the DTFT for signals of finite length is by using the Fast Fourier Transform (FFT). The FFT is a modified version of the DTFT that is more computationally efficient. Through a series of numerical operations, the number of multiplications needed to evaluate equation 3 is lowered from being proportional to L² to L log₂ L. This is often desirable in the field of digital signal processing.

2.5 Cancellation

Cancellation is a fairly straightforward mathematical way to calculate similarities in signals. This method can be used in different ways and below is a description of how it was performed in this project.

Let us say we have a discrete time signal x that contains non-zero information. Let n and k be integers that denote time instances, and let m be an integer that determines a summation size. By subtracting the value of x at time instance n from the value of x at time instance n + k, a sample-by-sample error that compares x at different times is calculated. The absolute value of this error is then summed with the m − 1 following values. This way a vector e_c[k], as seen in equation 6, can be constructed that gives information regarding how different sections of x relate to itself at time lag k. If two sections of x are identical, this gives a cancellation error e_c equal to zero. In practice, similar sections will yield a very low error.

e_c[k] = Σ_{n=0}^{m−1} |x[n] − x[n + k]|              (6)

Figure 4: Example of two similar sections in a signal x at a fixed k value.
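Equation 6 translates directly into code. The sketch below (assuming NumPy; the signal values are illustrative) reproduces the zero-error property for identical sections:

```python
import numpy as np

def cancellation_error(x, k, m):
    # Equation 6: sum of absolute sample-by-sample differences between
    # x and a copy of itself shifted by k samples, over m samples.
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x[:m] - x[k:k + m])))

# A signal whose first 4 samples repeat exactly 8 samples later
# gives zero cancellation error at lag k = 8.
x = np.array([1.0, -1.0, 1.0, -1.0, 0.3, 0.1, 0.0, 0.2,
              1.0, -1.0, 1.0, -1.0, 0.5, 0.4, 0.0, 0.1])
print(cancellation_error(x, k=8, m=4))   # 0.0
```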

2.6 Downsampling

Computational complexity is a limiting factor in many real-time signal processing applications. There are, however, several ways to reduce the amount of data in a signal. This reduction can be a useful tool to minimize computational load when analyzing large sets of data. One quite straightforward way to reduce data is to only keep every Kth value in the signal. It is done in two steps: first a filtering and then a decimation of the input.

Figure 5: Signal flow diagram of a downsampling process (input → filtering → decimation → output).


2.6.1 Filtering

Consider a signal x[n] with sampling frequency ω_sx that has a non-zero frequency spectrum up to ω_max = π. In order to decimate that signal without the appearance of aliasing effects, while still satisfying the sampling theorem criterion, the bandwidth of x must first be reduced so that ω_max = π/K. More information about the aliasing effect and the sampling theorem can be found in [11]. The bandwidth reduction can be made by passing x[n] through a low pass filter characterized by the impulse response h[n]. Ideally this low pass filter would have the frequency response

H(ω) = { 1, |ω| < π/K
       { 0, else.                                     (7)

Figure 6: Frequency response of h[n].

The filtered output of x[n] is then described as

v[n] = Σ_{k=0}^{∞} h[k] x[n − k].                     (8)

In practice, filters of the character described in equation 7 are not realizable, because the impulse response h[n] would then be a sinc function of infinite length. Most commonly used filters have a roll-off after their cut-off frequency, as can be seen in figure 7.

2.6.2 Decimation

The more straightforward part of the downsampling process is the decimation. Here every Kth sample of the filtered input is picked out to form the final output according to the equation

z[n] = v[nK]. (9)

When this process is done, the signal z[n] has the sample rate ω_sz = ω_sx/K.
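The two-step downsampling process can be sketched as follows. This is a minimal illustration, not the filter used in the project: a truncated windowed-sinc FIR approximates the ideal response of equation 7, and the filter length and decimation factor are illustrative:

```python
import numpy as np

def downsample(x, K, taps=63):
    # Low pass filter with cutoff pi/K (windowed-sinc FIR approximating
    # equation 7), then keep every Kth sample (equation 9).
    n = np.arange(taps) - (taps - 1) / 2.0
    h = np.sinc(n / K) / K              # truncated ideal impulse response
    h *= np.hamming(taps)               # taper to reduce ripple
    v = np.convolve(x, h, mode="same")  # filtering step (equation 8)
    return v[::K]                       # decimation step

x = np.random.randn(1000)
z = downsample(x, K=4)
print(len(z))   # 250, i.e. the sample rate is reduced by a factor K
```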

2.7 The Repetition Search Algorithm

Figure 7: Frequency response of a realistic low pass filter.

Popcatcher has developed an algorithm that takes advantage of repetitions in media streams to capture popular music. By using analysis methods such as the DTFT and cancellation, as seen in sections 2.1 and 2.5, the position and length of a repetition can be found. In short, there are two processes running simultaneously in the Popcatcher real time application:

• Creation of Search Track

• Repetition Search Process

These processes are further explained below.

2.7.1 Creation of Search Track

In order to make the repetition search process efficient, the audio data fed to the algorithm has to be processed and divided in a certain manner. Figure 8 shows a flow chart of the whole procedure. In this process there are two different audio tracks involved:

• Full Quality Track: An audio file containing a small part of the input media stream as an uncompressed PCM-coded WAV file.

• Search Track: A downsampled version of the full quality track that is continuously filled with up to F days of data. It is denoted s[n].

Figure 8: Signal flow chart of search track creation (audio input stream → full quality track → downsampling → search track).

2.7.2 Repetition Search Process

The search process is divided into a number of steps that in this section are explained in the order they are performed.

Fingerprint Extraction In order to speed up the search process, the full quality audio track is not used when performing any of the needed calculations. First, a number of consecutive samples are extracted from a desired start point in the search track until they form a vector of length P. This forms what is hereafter referred to as a fingerprint.


Coarse Comparison Test The fingerprint is then compared to the complete search track at different time lags. A number of comparison values are calculated using methods mentioned in [6] and are compared to predetermined threshold values.

The larger the fingerprint used, the better the accuracy in finding repetitions. However, the size of the fingerprint is a compromise between computational load and good hit accuracy. The coarse comparison test will hence generate a number of possible hit locations, as seen in figure 9, where the number of false hits is connected to the fingerprint size. A small P value will in general yield more false hits than a large P value.


Figure 9: Illustration of comparison test.
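A toy version of the coarse comparison could look like the following. This is hypothetical: the real algorithm uses the spectral comparison values of [6], while this sketch reuses the cancellation error of equation 6 as the score; the track, fingerprint size and threshold are illustrative:

```python
import numpy as np

def coarse_search(search_track, fp_start, P, threshold):
    # Extract a P-sample fingerprint and compare it against the search
    # track at every lag. Lags scoring below the threshold become
    # possible hits for the later fine comparison test.
    x = np.asarray(search_track, dtype=float)
    fp = x[fp_start:fp_start + P]
    hits = []
    for k in range(len(x) - P + 1):
        if k == fp_start:
            continue                      # skip the trivial self-match
        err = np.sum(np.abs(fp - x[k:k + P]))
        if err < threshold:
            hits.append(k)
    return hits

# A track where samples 0..3 are repeated at position 10.
track = np.array([4.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                  4.0, 3.0, 2.0, 1.0, 0.0, 0.0])
print(coarse_search(track, fp_start=0, P=4, threshold=0.5))   # [10]
```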

Fine Comparison Test The possible hits from the coarse comparison test are then run through a more thorough comparison test where a larger fingerprint is used. In combination with a short cancellation test, the possible hits are evaluated. All hits that do not live up to the predetermined demands of this test are disregarded. A possible hit that passes the fine comparison testing is transferred to the next stage. These hits are referred to as quality hits.

Endpoint Search Starting from where the quality hit was located in the search track, an iterative comparison process is run first forwards and then backwards to calculate an error vector e[k]. This process extracts a new fingerprint from the search track k samples away from the original fingerprint location. The new fingerprints are matched against data picked out k samples away from the quality hit location, as illustrated in figure 10. This data is extracted from the search track in the same way as the fingerprint and therefore also forms a vector of length P.


Figure 10: Illustration of endpoint search.

This endpoint process continues until e has exceeded a threshold value in both directions. When this has occurred, the endpoints of a repetition have been located. If the repetition length matches some predetermined demands, the stated start and stop locations are stored. In this project these start and stop locations together form what is chosen to be called a potential commercial.
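The endpoint expansion can be sketched roughly like this. It is an illustrative simplification: it steps in whole fingerprints of P samples and stops at the first error above the threshold, whereas the real implementation accumulates an error vector e[k]; the test track is constructed, not real audio:

```python
import numpy as np

def endpoint_search(track, fp_pos, hit_pos, P, threshold):
    # Starting from a quality hit, step P samples at a time forwards and
    # then backwards, comparing new fingerprints next to fp_pos against
    # data next to hit_pos, until the error exceeds the threshold in
    # each direction. The surviving span is the located repetition.
    x = np.asarray(track, dtype=float)

    def err(a, b):
        return np.sum(np.abs(x[a:a + P] - x[b:b + P]))

    end = 0
    while (fp_pos + end + 2 * P <= len(x)
           and hit_pos + end + 2 * P <= len(x)
           and err(fp_pos + end + P, hit_pos + end + P) < threshold):
        end += P
    start = 0
    while (fp_pos - start - P >= 0 and hit_pos - start - P >= 0
           and err(fp_pos - start - P, hit_pos - start - P) < threshold):
        start += P
    # The repetition spans [hit_pos - start, hit_pos + end + P).
    return hit_pos - start, hit_pos + end + P

# A 12-sample pattern A repeated at positions 0 and 20, surrounded by
# dissimilar data; the quality hit sits 4 samples into the second copy.
A = list(range(1, 13))
track = A + [100.0] * 8 + A + [-100.0] * 8
print(endpoint_search(track, fp_pos=4, hit_pos=24, P=4, threshold=1.0))  # (20, 32)
```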


3 Algorithm Modifications

The presented modifications to the Popcatcher algorithm have been made on a Matlab v.6.5.0.180913a demonstration version of the algorithm. Unedited, it was set up to perform a single repetition scan through a prerecorded audio file of commercial radio. Since this is not a real time model, the search tracks had to be precalculated to be able to run the demo. Initially the demo worked in this manner:

• Through a command prompt the user is asked to select a position in the full quality audio file.

• The repetition search process is run as explained in section 2.7.2.

To get the algorithm to search for commercials instead of songs, the whole search was modified to be executed repeatedly. In practice this means that the user states a section in the search track that is to be analyzed for repetitions. Between the start and the end of the section, a number of fingerprints are extracted and compared to the full search track. The distance between the extractions was set to a constant called the commercial scan step size, c_step. The resolution of the scan was thus determined by c_step, as visualized in figure 11. All individual fingerprint hit data was stored in a matrix. With this modification done, a graph showing repetition information in a media stream could be drawn. From now on, this procedure is referred to as doing a commercial scan.

A library of useful search tracks was established through recording audio from different Swedish commercial TV stations with an analog TV tuner.


Figure 11: Example plot of commercial scan data.

3.1 Silence Detection

The first commercial scan performed on a search track created from Swedish TV6 gave many false repetition hits. When investigating a number of these hits closer, a clear connection could be seen between low audio level in the fingerprints and found false repetitions. Therefore a function was created that weighs what percentage of the samples in the area around the fingerprint extraction location lies below a set threshold value s_th. If too high a percentage of the samples lies below the threshold, a new fingerprint is extracted one second ahead of the old one. This function removed almost all false hits related to low audio level.
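The silence test can be expressed compactly. The sketch below is hypothetical; the threshold s_th and the allowed percentage are illustrative values, not the ones tuned in the project:

```python
import numpy as np

def is_silent(x, center, span, s_th=0.02, max_fraction=0.8):
    # Fraction of samples around the fingerprint extraction point whose
    # magnitude lies below the threshold s_th. If too many do, the area
    # is considered silent and the fingerprint should be moved forward.
    region = np.asarray(x[max(0, center - span):center + span], dtype=float)
    fraction = np.mean(np.abs(region) < s_th)
    return fraction > max_fraction

loud = 0.5 * np.ones(1000)
quiet = 0.001 * np.ones(1000)
print(is_silent(loud, 500, 200), is_silent(quiet, 500, 200))   # False True
```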

3.2 Close Hits Compensation

After the fine comparison test in each individual repetition search performed in the commercial scan, as explained in section 2.7.2, a number of so-called quality hits remain. It turns out that these quality hits often appear in blocks. For example, one repeated commercial can result in several hits within the repeated section. This phenomenon could often be connected to short repeated audio loops running in the background within a commercial. Two different methods have been used to address the problem of blocks of close hits:

1. The quality hit with lowest cancellation value is the one kept.

2. All hits are analyzed with the endpoint search method, see section 2.7.2, and the hit that resulted in the longest repetition is the one kept.

Method number 2 turned out to give the best results, so that one was used. The allowed distance between the first and the last quality hit is set by a constant. This constant was determined through empirical testing to be 32 seconds.

3.3 Quality Hit Length Sorting

The endpoint search calculates a length for each quality hit. A way to sort out unwanted material is to set up limits regarding how long a hit is allowed to be. These limits would preferably be related to minimum and maximum standard lengths of commercials. In the demo used in this project these limits were set to 62 seconds as maximum and 3 seconds as minimum, according to data found in [12]. Anything outside of that range was considered irrelevant. The true maximum and minimum lengths are 60 and 5 seconds, but using a safety margin of a few seconds turned out to be necessary, because the endpoint scan sometimes states potential commercials as a second too long or too short.

Having limits like this would in an ideal case remove the issue created by re-aired media episodes, since they by definition are repetitions with lengths longer than 62 seconds in most cases. See section 7 for further coverage of this matter.

3.4 History Track

During the progress of the commercial scan, an increasing number of repetitions will be stated. By storing all these in a separate file, a new track called the history track is established. It is stored in the same format as the search track. This track would ideally be continuously filled with commercials. As a way to increase the possibility of finding a quality hit faster, a new modified version of the repetition search process was added. It does the same thing as the search mentioned in section 2.7.2, with the only modification of replacing the search track with the history track. If no quality hit is found in the history track, the search continues in the regular search track. A further discussion of the different properties of this modification is found in section 7.


3.5 Extended Search Logistics

When a potential commercial is found, one can assume that there is interesting data in its neighborhood, since commercials often are broadcast in blocks. One way to take advantage of this fact is to perform an extended search logistic when in an interesting area of that type.

As can be seen in figure 12, if only c_step was used, many commercials would be missed if the search moved forward directly after finding one potential commercial. Therefore a new, smaller step variable c_step2 was formed that replaces c_step when in an interesting area. This variable makes sure that the area surrounding the first potential commercial is also scanned. By applying this extended search logistic, the number of found potential commercials was substantially increased.

Figure 12: Illustration of the problem with using only the c_step variable.

3.6 Hit Merging

The repetition search case shown in figure 12 is a theoretical model. In practice, the stated length of each potential commercial might not correspond to the actual length of the commercial, as seen in figure 13. This can lead to spacing between individual potential commercials. To counteract this spacing, potential commercials that are closer than m_len seconds to each other are merged. The default value of m_len was set to 5 seconds.
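The merging rule can be sketched as a simple interval merge over (start, stop) pairs in seconds (a minimal illustration, not the Matlab implementation):

```python
def merge_hits(hits, m_len=5.0):
    # Merge potential commercials (start, stop), in seconds, that lie
    # closer than m_len seconds to each other, filling the spacing
    # between individually detected repetitions.
    merged = []
    for start, stop in sorted(hits):
        if merged and start - merged[-1][1] < m_len:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged

hits = [(0.0, 20.0), (23.0, 45.0), (60.0, 80.0)]
print(merge_hits(hits))   # [(0.0, 45.0), (60.0, 80.0)]
```

The 3-second gap between the first two hits is below the default m_len and is bridged, while the 15-second gap before the last hit is kept.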

Figure 13: Example of a search result in practice.

3.7 Search Modes

A way to reduce the number of necessary calculations for each repetition search in a commercial scan is to abort the search when a potential commercial is found. Thus there are two modes in which the commercial scan can be run: thorough mode and one hit abort mode.

3.7.1 Thorough Mode

In thorough mode all quality hits are collected from the fine comparison part of the repetition search, and the endpoint search function is then run on all of them. This is done to see which quality hit yields the longest potential commercial. In this mode, information can be extracted about how often a repetition occurs in the whole search track. Although this mode gives a lot of information, it has the disadvantage, in terms of computational load, of first finding all quality hits and then evaluating them all before the search is done.

3.7.2 One Hit Abort Mode

This mode is the one more suitable for real time applications due to its higher computational efficiency. When a quality hit is found that through endpoint testing yields an approved potential commercial, the search is ended. This means that many fine comparison tests run on possible hits are skipped. Moreover, in the optimal case the endpoint search is run only once, if the first quality hit found yields a potential commercial.

The drawback of this mode is its abrupt way of ending the search process. The first quality hit found might not be the one that generates a potential commercial length matching the full length of the real commercial. More on this is discussed in section 7.


4 Commercial Scan Results

In this section, results are presented that show how well the commercial detection performs on real-life signals. The outcome of the algorithm, when in one hit abort mode, can be displayed as a curve showing the value one if a potential commercial is present and zero if not. If the scan is set to thorough mode, the height of the curve represents how many times every fingerprint area is repeated through the entire search track. This also shows that when doing a commercial scan, quality hits can be found in future data. In a real time model this is not possible, so it would probably give a slightly different result.

A database containing information about actual commercial locations was set up through manual listening analysis of the search tracks. In addition to pure commercials there are a few more cases that are considered to be a part of a commercial block in this project:

• TV show advertisement

• Short channel specific jingles

When using a search track size of up to 21 hours, recordings from three consecutive days were used, where each day contributed seven hours. These seven-hour recordings were made starting at 19:00. In the scans made with extended search track length, the only allowed area to pick fingerprints from was the last day. The commercial scan was thus performed only on the last day, but with the possibility of finding quality hits anywhere in the extended search track.

If nothing else is mentioned, all scans are made in one hit abort mode. A potential commercial is considered correctly stated if the endpoints are not more than one second off in time in relation to the real commercial. Any larger length error generates a false hit. Some further statistical parameters also have to be explained in order to understand the commercial scan results:

• Hit Percentage: How many percent of the real commercials that the commercial scan has stated as potential commercials. False hits are not included in this calculation.

• False Hit percentage: Total length of the false potential commercials in relation to the total length of all potential commercials.

• Number of False Hits: How many potential commercials that not was real commercials.

• Missed Blocks: If there is no potential commercial found at all in a real commercial block, this number grows by one. A block in this project is considered to be two or more connected commercials.

• Total Blocks: Total number of real commercial blocks.
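As a concrete illustration, the statistics above can be computed from two lists of (start, end) intervals in seconds. The sketch below is an assumed piece of bookkeeping, not the evaluation code actually used in this project; it measures the hit percentage by overlapped length and counts a potential commercial as false when it overlaps no real commercial at all.

```python
def overlap(iv, others):
    """Total length of interval 'iv' covered by any interval in 'others'."""
    s, e = iv
    return sum(max(0.0, min(e, oe) - max(s, os_)) for os_, oe in others)

def scan_statistics(real, potential):
    """Hit percentage, false hit percentage and number of false hits for
    lists of (start, end) intervals in seconds (hypothetical helper)."""
    total_real = sum(e - s for s, e in real)
    total_pot = sum(e - s for s, e in potential)
    covered = sum(overlap(r, potential) for r in real)
    false_len = total_pot - sum(overlap(p, real) for p in potential)
    hit_pct = 100.0 * covered / total_real if total_real else 0.0
    false_pct = 100.0 * false_len / total_pot if total_pot else 0.0
    n_false = sum(1 for p in potential if overlap(p, real) == 0.0)
    return hit_pct, false_pct, n_false
```

Note that this simplified sketch does not apply the one second endpoint tolerance described above.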

Below follows a selection of results from commercial scans made on Swedish TV channels.

4.1 TV3

The commercial scan using the 21 hour search track has one false hit. This false hit is an introduction jingle broadcast at the beginning of a TV show. The same jingle is then also played at the end of the show, and the jingle is therefore caught by the scan. The scan made in thorough mode was mostly made to get an understanding of how often media is repeated in a broadcast.

Setup                         Hit Percentage  False Hit Percentage  Number of False Hits  Missed Blocks / Total Blocks
7h search track, c_step=50s   39.96           0                     0                     2/22
7h search track, c_step=30s   47.58           0                     0                     1/22
14h search track, c_step=30s  63.48           0                     0                     1/22
21h search track, c_step=30s  75.02           1.02                  1                     0/22

Table 1: Commercial scan statistics on TV3.

[Figure 14: Results from commercial scans made on TV3 — the TV3 input waveform, the potential commercials for the four setups in table 1, the real commercial blocks, and the repetition counts from a 7 hour c_step=50s scan in thorough mode, all plotted against time in seconds.]


4.2 TV6

Setup                                   Hit Percentage  False Hit Percentage  Number of False Hits  Missed Blocks / Total Blocks
7h search track, c_step=30s             56.94           4.70                  5                     0/22
7h search track, c_step=30s, m_len=60s  83.25           3.78                  5                     0/22

Table 2: Commercial scan statistics on TV6.

[Figure 15: Results from commercial scans made on TV6 — the TV6 input waveform, the potential commercials for the two setups in table 2, and the real commercial blocks, plotted against time in seconds.]

It can be seen that increasing the merge parameter m_len from the standard five seconds to 60 seconds gives a clear improvement in the hit percentage. Worth noticing is that the m_len increase did not noticeably worsen the false hit percentage. Four of the false hits are jingles for the TV show The Simpsons. Two episodes of The Simpsons appear in the recording, and the jingle is broadcast once at the beginning and once at the end of each episode.

The fifth false hit is harder to explain. It is a four second clip with noise-like background sound and occasional speech. When analyzing the audio at the quality hit location related to this false hit, as seen in figure 16, one can see visible similarities exactly where the fingerprint and the quality hit are found. Both areas have similar background noise, which probably has an effect. Most likely this false hit is a result of a low input audio level and a not perfectly adjusted silence detection. Even though the signals are quite different over the full four second areas, the cancellation values in the endpoint scan simply get too small and do not reach above the set cancellation threshold.
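The cancellation measure itself is defined earlier in the thesis; as a rough sketch of the idea (an assumed normalized residual-energy form, not the exact formula used in the project), subtracting a candidate repetition s[n+k] from s[n] should leave almost no energy when the two windows really are the same material:

```python
import numpy as np

def cancellation_error(s, n0, k, win):
    """Energy left after subtracting the shifted window s[n+k] from s[n],
    normalised by the window energy. Values near zero indicate that the
    windows cancel, i.e. a likely repetition (assumed formulation)."""
    a = np.asarray(s[n0:n0 + win], dtype=float)
    b = np.asarray(s[n0 + k:n0 + k + win], dtype=float)
    return float(np.sum((a - b) ** 2) / (np.sum(a ** 2) + 1e-12))
```

For uncorrelated windows this ratio hovers around two, so a threshold well below one separates true repetitions. With near-silent windows both numerator and denominator shrink toward the noise floor, which is one way a low input level can fool such a threshold test, as in the TV6 false hit.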

As seen in table 2, the false hit percentage has dropped when using a larger m_len parameter even though the number of false hits is the same. This is explained by how the false hit percentage is calculated: since the total length of all potential commercials has increased while the total length of all false hits is the same, the false hit percentage decreases.

[Figure 16: Cancellation error on the false hit from TV6 — the two signal windows s[n] and s[n+k] and the cancellation error e_c[k], with the threshold s_th marked.]

4.3 TV4

A commercial scan on Swedish TV4 was also made. The results of this scan were however quite poor due to reruns, that is, identical TV shows being broadcast twice or more. In this particular case it was an episode of the series Heroes, broadcast once early in the evening and once late at night. This should normally not cause any trouble, since repetitions longer than 62 seconds are not considered potential commercials. For some unknown reason the rerun did not cause one long repetition but instead many short ones. The cancellation value, which theoretically should stay low through the whole episode, occasionally varied a lot, as can be seen in figure 17. Perhaps the problem is due to a varying broadcast playback speed.

[Figure 17: Cancellation error around the false hit location from TV4 — the signal windows s[n] and s[n+k] and the cancellation error e_c[k], showing how e_c[k] occasionally rises above the threshold s_th during the rerun.]


4.4 SVT1

As an interesting test a commercial scan on the non commercial TV station SVT1 was also made. It turned out that what was considered as potential commercials by the algorithm were of two different types:

• Trailers for TV shows

• Short news segments

This reveals another weakness in the commercial scan: news segments. Their length often fits inside the range of 3 to 62 seconds, and they are often aired more than once.

[Figure 18: Results from the commercial scan made on SVT1 — the SVT1 input waveform and the potential commercials found with a 7h search track and c_step=30s, plotted against time in seconds.]


5 Discussion

5.1 Search Track Length

Making a fair prediction of an optimal search track length is hard without a large amount of commercial scan data from several different TV stations. The optimal length would probably vary among the stations, but such data would give a rough feel for a suitable length. Figure 19 shows the hit percentage plotted against search track length for the TV3 analysis. One could perform a regression analysis on this data, but in order to get a reliable model more search track lengths have to be tested. Nevertheless, the figure gives a hint of a connection. How the TV3 recordings were made also has to be taken into consideration; recording a 21 hour search track from a single day would maybe produce a different looking graph than the one shown in figure 19.

[Figure 19: Relation between hit percentage and search track length — hit percentage plotted against search track length in hours.]
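As an illustration of such a regression, the three c_step=30s points from table 1 can be fitted with a simple logarithmic model. The model choice is an assumption made here for illustration only; three points are far too few for a reliable fit.

```python
import numpy as np

# Hit percentage versus search track length for c_step = 30 s (table 1).
hours = np.array([7.0, 14.0, 21.0])
hit_pct = np.array([47.58, 63.48, 75.02])

# Least-squares fit of the assumed model: hit ≈ a*ln(hours) + b.
a, b = np.polyfit(np.log(hours), hit_pct, 1)

def predicted_hit(h):
    """Extrapolated hit percentage for a search track of h hours."""
    return a * np.log(h) + b
```

A logarithmic model captures the diminishing returns hinted at by the data: each doubling of the track length adds a roughly constant number of percentage points.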

In the end, the search track length will be a compromise between hit percentage and computational load. A search track length of one month would probably yield a very high hit percentage, but it is hardly a realizable model due to the far too large computational load.

5.2 History Track Length

To see what an optimal length of the history track would be, one would again have to look at commercial broadcasting statistics from several different TV stations. It could give an idea of how often new commercials appear and how long the history tracks should be to contain the currently most popular commercials.

Another idea would be to build up a large history track and then use almost only that instead of the search track, but then new commercials would be hard to detect. There has to be a balance between these two tracks. More testing focused on the history track alone is needed to draw a conclusion about what maximum length would be sufficient.

5.3 Merge Variable

As can be seen in table 2, the merge variable m_len has a great impact on the hit percentage. Further testing has to be made to find out how far this variable can be increased without causing negative effects. One such side effect reveals itself when a person giving short information about a program just after a commercial block is stated to be a potential commercial, because there is sometimes a short commercial after the talking, as shown in figure 20.

[Figure 20: Illustration of a problem situation with a large m_len — non-commercial talk between a commercial block and a following short commercial is merged into the potential commercial block.]

Several more side effects will appear in connection with the problems that reruns and news bring. In a worst case scenario, all these false hits can be merged into large blocks of potential commercials.
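The merging step itself is simple interval bookkeeping. A minimal sketch, assuming (start, end) hits in seconds (a hypothetical helper, not the Matlab code):

```python
def merge_hits(hits, m_len=5.0):
    """Merge potential commercials whose gap is at most m_len seconds."""
    merged = []
    for start, end in sorted(hits):
        if merged and start - merged[-1][1] <= m_len:
            # Gap is short enough: extend the previous potential commercial.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With m_len = 60 a one-minute gap is swallowed into the block, which is both the source of the TV6 improvement in table 2 and the risk illustrated in figure 20.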

5.4 False Hits

As mentioned in sections 4.3 and 4.4, reruns and news are two of the most common problems with the algorithm. One way to get around the news segment problem would be to set a time limit D in hours on how close a quality hit can be to the original fingerprint location. Since the same news segment is rarely broadcast two days in a row, this could be an effective measure. It would however mean that no searching could be done in the latest D hours of the search track, because the search track has to build up some length.
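Such a limit is easy to sketch: given quality hit times on the search track timeline, hits closer than D hours to the fingerprint location are simply discarded (a hypothetical helper, with times in seconds):

```python
def reject_near_hits(hit_times, fingerprint_time, d_hours):
    """Keep only quality hits at least d_hours away from the fingerprint
    location, so same-day repeats such as news are not counted."""
    d = d_hours * 3600.0
    return [t for t in hit_times if abs(t - fingerprint_time) >= d]
```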

The rerun problem seems to be more rooted in the fundamental properties of the endpoint scan. Depending only on a perfect result from the endpoint process is risky: an ongoing disturbance in the signal can appear at any time and create what is interpreted as many short repetitions instead of one long one.

5.5 Potential Commercial Lengths

In general there are two cases where potential commercials are too short compared to the real commercial: either some disturbing noise is present that prohibits the endpoint scan, or the commercial has a special design. In the latter case the first part of the commercial contains a so called main product theme, which is followed by a special offer. The commercial scan often finds the main product theme but fails to find the special offer, since the offers vary more between airings. In general this is not a major problem, since the merge variable takes care of small glitches between commercials. However, if the commercial is located at the end of a block, the potential commercial will be too short.

Getting statistics about how often the potential commercial lengths are too long or too short is not possible in the model made in this project. Since close hits are merged, information about their original lengths is lost. Making sure this information is not lost during the scan is however not a difficult fix for further analysis purposes.


5.6 Commercial Scan Step Size

Not enough testing was made on this parameter to get a feel for how much of a loss changing it leads to. As seen in table 1 it obviously has a decreasing effect on the hit percentage, but at some point a compromise has to be made to bring down the computational load. If this step size is enlarged too much, commercial blocks could be entirely missed; in theory, that happens if the step size is longer than the shortest commercial block. In practice, the step size is not the only thing affecting the hit percentage.

With a small step size one gets several chances to find a potential commercial in one block. This reveals another thing about the fundamental behavior of the commercial scan: if the single attempt to find a potential commercial in a block fails, the scan simply continues, even though a repetition scan made just a second next to that attempt might have found a quality hit. The extended search logistics mentioned in section 3.5, which can give valuable extra hits, will not start unless a potential commercial is found in the first place.

5.7 Real Time Model

When building a real time version of the commercial scan, trade-offs are bound to be made. The commercial scan step size sets a limit on how much time each repetition scan can take, including the time a possible extended search logistics run would take. If the search track is scanned for potential commercials at a slower rate than new information is added to it, the commercial scan process will fall behind.

Another thing to consider is that many of the small embedded microprocessors that could be used in a real time application do not handle floating point calculations. If fixed point number representation has to be used, further precision is lost compared to the Matlab model, which presumably would lead to a worse hit percentage. Information about number representation can be found in [9]. More discussion of ways to improve the algorithm and adapt it to a real time model is found in section 7.
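The precision loss can be illustrated by quantising samples to 16-bit Q15, a common fixed point format on DSPs without a floating point unit. The actual target format is not specified in this project, so Q15 is an assumption for illustration:

```python
import numpy as np

def to_q15(x):
    """Round samples in [-1, 1) to Q15: 16-bit integers scaled by 2^15."""
    q = np.clip(np.round(np.asarray(x) * 32768.0), -32768, 32767)
    return q / 32768.0

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, 10000)
worst_error = float(np.max(np.abs(samples - to_q15(samples))))
```

The worst-case per-sample rounding error is on the order of 1.5e-5; harmless in isolation, but it accumulates through the cancellation sums and could blur the threshold comparisons.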

All in all, the three variables that make up the most important computational parameters for the commercial scan are:

• Length of history and search track

• Commercial scan step size

• Number of allowed possible hits

5.7.1 Worst Case Scenario Example

The commercial scan step size c_step determines how the other parameters are to be controlled. The worst case scenario with respect to computational complexity is when the scan ends up in a long commercial block, since the extended search logistics goes on until no further potential hits are found. Suppose every commercial in that area is five seconds long; that would lead to a maximum of six repetition scans per c_step jump. As seen in figure 21 the scan actually continues forward as well, where the smaller grey arrows represent repetition scans made within the extended search logistics. The scans made forward can however be credited to the next c_step point scan.

[Figure 21: Illustration of the computationally worst case scenario — a c_step jump lands in a commercial block and repetition scans (quality hits and no hits) continue backward and forward through the extended search logistics.]

In the real time 400 MHz DSP implementation of the repetition scan that Popcatcher has made, one hour of the search track takes 0.95 seconds to scan for quality hits. This does not include the further testing made with the quality hits, but the computational load of acquiring the quality hits is by far the most significant. With this in mind, if c_step = 30 s, every individual repetition scan gets 30/6 = 5 seconds of processing time. This would allow going through 5/0.95 ≈ 5.3 hours of history and search track in any desired combination. If a faster processor implies linearly faster calculations, it is fairly simple to predict how many more search track hours every individual repetition scan can analyze, as illustrated in figure 22.
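The arithmetic above generalizes into a small budget calculation. The sketch below simply restates the figures from the text (0.95 s per track hour at 400 MHz, six scans per c_step jump); linear scaling with clock speed is the stated assumption:

```python
def scan_budget_hours(c_step=30.0, scans_per_jump=6,
                      sec_per_track_hour=0.95, mhz=400.0):
    """Track hours one repetition scan can cover: c_step seconds of real
    time shared by scans_per_jump scans, assuming scan speed scales
    linearly with processor clock relative to the 400 MHz reference."""
    per_scan = c_step / scans_per_jump        # seconds available per scan
    rate = sec_per_track_hour * 400.0 / mhz   # scaled cost per track hour
    return per_scan / rate
```

With the default values this reproduces the 5/0.95 ≈ 5.3 hours from the text, and doubling the clock speed doubles the reachable track length.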

[Figure 22: Allowed maximum individual repetition scan length versus processing power — search track hours plotted against processor speed in MHz.]

5.7.2 Possible Hit Limits

In the Matlab model there is no upper limit on how many possible hits are allowed to be analyzed. The number of possible hits can vary from hundreds to hundreds of thousands, so huge variations in computational time can occur from one repetition scan to the next. These large variations should preferably be avoided, and the easiest way to do that is to limit the number of allowed possible hits. Doing the same in a future real time model of the commercial scan would probably be necessary. How large a negative impact this would have on the hit percentage is unknown, but it could be tested with some small modifications to the Matlab model.
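Capping the list is straightforward if each possible hit carries a quality measure. A sketch keeping the strongest candidates, assumed here to mean the lowest cancellation errors:

```python
import heapq

def limit_possible_hits(possible_hits, max_hits):
    """Bound the per-scan workload by keeping only the max_hits possible
    hits with the lowest cancellation error ((error, position) pairs)."""
    return heapq.nsmallest(max_hits, possible_hits)
```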


6 Conclusion

The goal of this thesis was to investigate whether and how the patented Popcatcher repetition search could be modified to search for commercials instead of music. In summary, several results in this project show that commercials are repeated to such an extent that commercial detection based on this feature is possible. Using only a seven hour search track, a hit percentage above 80% was achieved when performing a commercial scan on TV6.

The results however vary from station to station, and there are still issues to be resolved, such as the problems that reruns and news bring; both can result in many false hits. That set aside, the method rarely fails to detect the presence of a commercial block.

Another drawback is the learning time of the system. The detection will not work well at all until the search and history tracks have been filled with at least several hours of data. How much data this corresponds to depends on the demands on algorithm performance and on which TV station is being monitored. In a future application there could be a level meter that shows how good the current commercial detection is, based on how long the history and search tracks have become.

The detection also depends on the commercial in fact being repeated; if it is not repeated within the search track length, it cannot be found. However, as long as such commercials are not located at the start or end of a commercial block, the merging function can take care of the issue. The negative effects of a large merge variable have however not been investigated enough to conclude that it will fix all such cases. Nevertheless, at just a brief glance of the results, increasing this variable from 5 to 60 seconds raised the hit percentage of the commercial scan on TV6 by 26.31 percentage points.

One can say that the model works more like a commercial block locator than an individual commercial detector. Another benefit compared to other detection algorithms is that it only analyzes audio streams, which in general are easier to handle than video streams.


7 Further Studies

Along with the work done in this project, some questions were answered but many new arose about how to improve the commercial scan results. Due to time constraints, far from all of these new questions were answered. All of them were however noted and divided into two categories:

• To investigate: Theoretical ideas that need to be investigated.

• Model improvements: Hands-on improvements to the Matlab model.

7.1 To Investigate

• See if a repetition search can be made to search wider, that is, search for two or more potential commercials in parallel. Then there is a bigger chance of finding a hit in every c_step scan, counteracting the problem of the search not finding a hit when it could have found one just next to the start point. The fingerprints could be picked out of the search track with a spacing of 10 to 15 seconds. One of the fingerprints could be used in the ordinary fashion while the others are only matched against the history track, in order to keep the calculations at a minimum.

• Record additional material to perform commercial scans on to build up a reference database that could give a more fair judgement of the model.

• Run more tests with different c_step sizes to get a further understanding of what an optimum value could be.

• In the final Matlab model presented in this project, the extended search variable c_step2 is set to jump three seconds past a potential commercial. This value was not based on any extensive research, and other values should be tested to see if a better hit percentage can be achieved.

• All repetition scans in the search track are now performed from the beginning to the end. Quite possibly a smarter way to scan would be to start 23.5 to 24.5 hours back in the search track and expand from there, assuming the search track holds 24 hours or more. This takes advantage of the assumption that commercial blocks often appear at the same time every day.

• Potential commercials are now stored in the history track as they are found. If a potential commercial is stored too short, the same commercial will also be stated too short when it is next found in the history track. One solution would be to store an extra two seconds from before the beginning and after the end of the potential commercial just to be safe. One could also consider waiting to store the potential commercials in the history track until a whole block of commercials has passed; with help of the merge variable, only potential commercials at the beginning and end of the block could then be stated too short. This model would however have the side effect of duplicate commercials in the history track. Perhaps this is acceptable to counteract the problem mentioned in section 5.5.

• Maybe storing the potential commercials in the history track in a first in, first out manner is not the best way. Additional information about the start and stop times of the potential commercials could also be stored along with the history track, and commercials that are repeated more often could be placed early in the history track.

• When a c_step scan has found a potential commercial, the extended search logistics is started. It is aborted if no further potential commercials are found next to the original hit. Instead of aborting the search and jumping forward when in such an interesting area, a different approach could be tried. As an attempt to gain better hit resolution without losing too much processing power, the repetition search could continue regardless of whether a quality hit is found. A c_step2 of 5 seconds could be used, and the process would run in an area of about 20 seconds before and after the potential commercial location. Only the history track would be scanned in these extra scans.

• Try using the commercial scan with a video stream as input instead of audio. Perhaps this could give better end results.

• Combine the commercial scan with other detection methods. For example, black frame detection in the video signal could perhaps enhance the endpoint precision.

• The audio signal in commercial blocks is often highly compressed compared to audio with non commercial content. This could be a useful audio feature when performing commercial block detection.

• Further investigate the risks of using a large merge variable value.

• See if better results can be gained by changing the allowed lengths of a potential commercial. Perhaps lengths between 5 and 40 seconds would be better limits.

• Test the model in real time. Using fixed point calculations might severely worsen the results.

• Investigate whether most commercial lengths are even multiples of 5 seconds, as seen in [12]. If that holds for most TV channels, perhaps it can be utilized in some way.

• Find the reason why reruns are not stated as too long potential commercials.

• Find a way to work around the news problem.

• The algorithm works poorly when analyzing silent areas. See if there are ways to handle this problem.

• Make a comparison between one hit abort mode and thorough mode to see how often the thorough mode finds better hits.

7.2 Model Improvements

• When running in one hit abort mode, the scan first produces possible hits by going through the entire search track. Another way would be to scan one hour at a time. This way the processing time of acquiring quality hits could be shortened, since the possible hits are then produced in blocks.


• As the commercial scan is set up now, when a too long potential commercial is found the scan goes back to the starting point and then moves c_step seconds forward. The scan should instead jump c_step seconds forward starting from the stop point of the stated too long potential commercial.


References

[1] A. Amin et al., "Real-time Detection of Semi-transparent Watermarks in Decompressed Video," in Proceedings of the Eighth IEEE Workshop on Applications of Computer Vision, February 2007, p. 49.

[2] E. R. Dougherty, Random Processes for Image and Signal Processing. SPIE and IEEE Press, 1999.

[3] X. S. Hua et al., "Robust Learning-based TV Commercial Detection," in Proceedings of the 2005 IEEE International Conference on Multimedia and Expo, July 2005, pp. 149-152.

[4] D. Jang et al., "Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting," in Proceedings of the 29th Audio Engineering Society Conference, September 2006, pp. 38-43.

[5] N. Kalouptsidis, Signal Processing Systems: Theory and Design. John Wiley & Sons, 1997.

[6] A. Ramalingam and S. Krishnan, "Gaussian Mixture Modeling of Short-Time Fourier Transform Features for Audio Fingerprinting," in IEEE International Conference on Multimedia and Expo, July 2005, pp. 1146-1149.

[7] B. P. Lathi, Signal Processing and Linear Systems. Berkeley-Cambridge Press, 1998.

[8] L. C. Ludeman, Random Processes: Filtering, Estimation, and Detection. John Wiley and Sons, 2003.

[9] D. G. Manolakis and J. G. Proakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. Prentice-Hall, 1996.

[10] S. Marlow et al., "Automatic TV Advertisement Detection from MPEG Bitstream," in Proceedings of the 1st International Workshop on Pattern Recognition in Information Systems, vol. 35, no. 12, December 2002, pp. 2719-2726.

[11] J. H. McClellan, R. W. Schafer and M. A. Yoder, Signal Processing First, International ed. Pearson Education, 2003.

[12] Starcom Sweden, Television Broadcasting Data, 2007.

[13] L. Zhang et al., "Robust Commercial Detection System," in IEEE International Conference on Multimedia and Expo, July 2007, pp. 587-590.
