Methods of Text Information Extraction in Digital Videos


Master Thesis

Software Engineering

Thesis no: MSE-2012-93

May 2012


Anna Tarczyńska

Faculty of Computer Science and Management Wroclaw University of Technology

PL-50 360 Wroclaw

School of Computing


The thesis is carried out within the Double Diploma program, in collaboration between Blekinge Institute of Technology in Sweden and Wroclaw University of Technology in Poland.

Contact information:

Author: Anna Tarczyńska
E-mail: Anna.tarczynska@gmail.com
Phone: +48 792 909 762

University Supervisors:

Dr. Darja Šmite, Ph.D.
E-mail: Darja.smite@bth.se
School of Computing, BTH

Dr. inż. Kazimierz Choroś
E-mail: Choros@pwr.wroc.pl
Faculty of Computer Science and Management, PWr

School of Computing

Blekinge Institute of Technology SE-371 79 Karlskrona

Sweden

Faculty of Computer Science and Management Wroclaw University of Technology


ABSTRACT

Context The huge number of existing digital video files must be indexed to make them easier for customers to search. Such indexing can be provided by text information extraction. In this thesis we have analyzed and compared methods of text information extraction in digital videos. Furthermore, we have evaluated them in a new context that we propose, namely their usefulness for sports news indexing and information retrieval.

Objectives The objectives of this thesis are as follows: providing a better understanding of the nature of text extraction; performing a systematic literature review on various methods of text information extraction in digital videos of TV sports news; designing and executing an experiment in the testing environment; evaluating available and promising methods of text information extraction from digital video files in the proposed context associated with video sports news indexing and retrieval; providing an adequate solution in the proposed context described above.

Methods This thesis uses three research methods: Systematic Literature Review, Video Content Analysis with a checklist, and Experiment. The Systematic Literature Review has been used to study the nature of text information extraction, to establish the methods and challenges, and to specify an effective way of conducting the experiment. The Video Content Analysis has been used to establish the context for the experiment. Finally, the Experiment has been conducted to answer the main research question: How useful are the methods of text information extraction for indexation of video sports news and information retrieval?

Results Through the Systematic Literature Review we identified 29 challenges of text information extraction methods and 10 chains between them. We extracted 21 tools and 105 different methods, and analyzed the relations between them. Through Video Content Analysis we specified three groups of probability of text extraction from video, and 14 categories for video sports news indexation, organized in a taxonomy hierarchy. We conducted the Experiment on three video files, with 127 frames, 8970 characters, and 1814 words, using MoCA, the only available tool. As a result, we reported 10 errors and proposed a recommendation for each of them. We evaluated the tool according to the categories mentioned above and identified four advantages and nine disadvantages of the tool.

Conclusions It is hard to compare the methods described in the literature, because the tools are not available for testing and have not been compared with each other. Furthermore, the values of the recall and precision measures depend heavily on the quality of the text contained in the video. Therefore, experiments must be performed on the same indexed database. However, text information extraction is time consuming (because of the huge number of frames in a video), and even a high character recognition rate yields a low word recognition rate. Therefore, the usefulness of text information extraction for video indexation is still low. Because most of the text information contained in video news is inserted in post-processing, the text could be captured at the source: during the processing of the original video by the broadcasting company (e.g. by automatically saving the inserted text in a separate file). Text information extraction would then not be necessary for managing new video files.

Keywords: information retrieval, text extraction


STRESZCZENIE (SUMMARY)

Context: The huge number of digital video files stored in digital libraries requires indexing in order to be accessible to users. This indexing can be carried out using data extraction methods. This thesis analyzes and compares methods of data extraction from digital video files in the proposed context of usefulness for indexing and searching sports news.

Objectives: The objectives of the thesis are: to provide a better understanding of the process of data extraction from digital video files; to conduct a systematic literature review of methods of information extraction from digital video files; to plan and conduct an experiment in a test environment; to evaluate the available methods of text data extraction from digital video files in the context of indexing and information retrieval; to deliver a solution in the context described above.

Methods: Three research methods were applied in the thesis: a systematic literature review, video content analysis, and an experiment. The literature review was conducted in order to explore the process of information extraction from digital video files, to identify the methods and the challenges they entail, and to determine how to conduct the experiment. Next, a video content analysis based on a checklist was performed to establish the context of the experiment (usefulness for indexing sports news). The last research method used is the experiment, which verified the usefulness of text data extraction for indexing video files, using television sports news as an example.

Results: The systematic literature review identified 29 problems and challenges posed by text data extraction from video files and 10 dependencies between them. 21 tools and 99 related methods were found. The video content analysis established three groups of probability of text data extraction from video files and 14 content categories, with a defined hierarchy, for indexing sports news. The experiment was conducted on three video files containing 127 frames and 1918 words, using the available MoCA software. Based on the results obtained, a subjective and an objective assessment of the tool were carried out in the groups mentioned above, which yielded 10 errors with recommendations, as well as four advantages and nine disadvantages of the software used.

Conclusions: The results of text information extraction methods depend on the quality of the files used for testing; therefore, the results described in the literature are not comparable, because the authors carried out their studies on different source data. Moreover, the software is not available for testing, so a comparative analysis was not performed. It is therefore necessary to conduct experiments on the same source files, using an indexed database. However, extracting text data from a large number of video files is time consuming because of the sequential structure of these files. Hence, increasing the computing power requires the use of a network of computers. Moreover, despite a satisfactory character recognition accuracy, the percentage of distinct words recognized is still low (50%). Therefore, text data extraction methods need to achieve a higher character recognition rate and to compare the recognized words against a dictionary.

Keywords: information retrieval,


ACKNOWLEDGMENTS

The thesis would not exist without the invaluable help of the following people:

My supervisors, Dr. Darja Šmite from Blekinge Institute of Technology and Dr. Kazimierz Choroś from Wroclaw University of Technology, for their constructive criticism and the wealth of knowledge that helped me improve the work.

Andres Estrella and Magdalena Tomalska, for patiently correcting my grammar mistakes.

My sister Monika Tarczyńska and my best friend Magdalena Kasprzycka, for always being with me, whatever happened.


DECLARATION OF ORIGINALITY

I declare that I composed this master thesis single-handedly, supported only by the declared resources.

Tarczyńska, Poland, 2012-05-22


2.2.1 Selected articles

2.3 Reporting Result

2.3.1 Article selection

2.3.2 Database

2.3.3 Publication year

2.3.4 Research methodology

2.3.5 Quality assessment

2.3.6 Challenges identification (RQ1, RQ2)

2.3.7 Directions of development (RQ2)

2.3.8 Method comparison (RQ2)

2.3.9 Experimental data used for evaluation (RQ3)

3 TEXT INFORMATION EXTRACTION IN DIGITAL VIDEOS (RQ1-RQ3)

3.1 Related work (RQ1)

3.1.1 Text detection and localization

3.1.2 Text tracking

3.1.3 Text binarization

3.1.4 Text recognition

3.2 Methods of text information extraction (RQ2)

3.2.1 List of methods

3.2.2 Directions of development

3.2.3 Relations between each others

3.3 Evaluation of the methods (RQ3)

3.3.1 Experimental data used for evaluation

3.3.2 Metrics used for evaluation

3.3.3 Results

3.4 Conclusion

4 DIGITAL VIDEO CONTENT ANALYSIS (RQ1)

4.1 Text information in digital video sport news

4.2 Video Sports News Content Analysis

4.2.1 Proposal of video sports news content categories

4.2.2 Proposal of text information uses

5 EXPERIMENT (RQ4)

5.1 Planning the experiment

5.1.1 Hypotheses

5.1.2 Independent variables

5.1.3 Dependent variables

5.1.4 Tool selection

5.2 Conducting the experiment

5.2.1 Tool installation

5.2.2 Population of experimental data

5.2.3 Evaluation methodology

5.3 Reporting results

5.3.1 Subjective assessment

5.3.1.1 The errors report

5.3.2 Objective Assessment

5.3.2.1 Computational Time

5.3.2.2 Text quality (recall, precision, and harmonic mean)

5.3.2.3 Sports news indexing and information retrieval (Different Word Recognition Rate)

5.4 Conclusion

6 DISCUSSION

6.1 Summary

6.2 Discussion of the principal findings

7 VALIDITY THREATS

7.1 Systematic Literature Review

7.1.1 Single researcher’s bias

7.1.2 Publication bias

7.1.3 Literature coverage

7.1.4 Missing information

7.1.5 Misinterpretation

7.1.6 Generalization

7.2 Video Content Analysis

7.2.1 Checklist design

7.2.2 Text properties evaluation

7.3 Experiment

7.3.1 Experimental design

7.3.2 Tool selection

7.3.3 Experimental data limitation (absent randomization)

7.3.4 Results evaluation (instrument decay)

8 CONCLUSIONS

8.1 Future work


List of Tables

TABLE 1 – RESEARCH QUESTIONS RELATED TO THE RESEARCH METHODOLOGY

TABLE 2 – CHECKLIST FOR VIDEO CONTENT ANALYSIS

TABLE 3 – KEY TERMS EXPLANATION

TABLE 4 – STRATEGY OF SYSTEMATIC LITERATURE REVIEW

TABLE 5 – KEYWORD FOR SEARCH STRING

TABLE 6 – SOURCES FOR SYSTEMATIC LITERATURE REVIEW

TABLE 7 – INCLUSION AND EXCLUSION CRITERIA

TABLE 8 – STUDY PROCEDURE

TABLE 9 – QUALITY CRITERIA

TABLE 10 – DATA EXTRACTION CRITERIA

TABLE 11 – ARTICLES SELECTED FROM DATABASES AND BY MANUAL SEARCH

TABLE 12 – SELECTED ARTICLES FOR PRIMARY STUDIES FROM KITCHENHAM’S GUIDELINES AND MANUAL SEARCH

TABLE 13 – SOURCE ARTICLES SELECTED FOR CONDUCTING SNOWBALL ON REFERENCES

TABLE 14 – NEW ARTICLES SELECTED FROM REFERENCES

TABLE 15 – SELECTED ARTICLES FOR PRIMARY STUDIES FROM SNOWBALL PROCEDURE

TABLE 16 – QUALITY ASSERTION

TABLE 17 – QUALITY ASSERTION RESULTS

TABLE 18 – CHALLENGES IDENTIFICATION

TABLE 19 – CHAINS BETWEEN CHALLENGES

TABLE 20 – METHODS PROPOSED BY DIFFERENT AUTHORS

TABLE 21 – LIST OF METHODS ACCORDING TO THE FEATURES

TABLE 22 – LIST OF METHODS ACCORDING TO THE APPROACHES

TABLE 23 – LIST OF METHODS ACCORDING TO THE TEXT EXTRACTION STAGES

TABLE 24 – METHODS CONNECTED WITH ABOVE ONES

TABLE 25 – DIRECTIONS OF DEVELOPMENT

TABLE 26 – COMPARED WITH METHODS

TABLE 27 – EXPERIMENTAL DATA DEFINITION IN PRIMARY STUDIES

TABLE 28 – SIZE OF DATASET USED FOR EVALUATION

TABLE 29 – METRICS USED FOR EVALUATION IN PRIMARY STUDIES

TABLE 30 – METRICS DESCRIPTION

TABLE 31 – METHODS FROM PRIMARY PAPERS EVALUATION

TABLE 32 – OTHER METRICS USED FOR COMPARISON

TABLE 33 – METHODS EVALUATION

TABLE 34 – VIDEO CONTENT ANALYSIS - PROPERTIES DESCRIPTION

TABLE 35 – VIDEO CONTENT ANALYSIS RESULTS

TABLE 36 – PERCENTAGE RATIO FROM VIDEO CONTENT ANALYSIS

TABLE 37 – CATEGORY PROPERTIES FROM VIDEO CONTENT ANALYSIS

TABLE 38 – GROUPS OF TEXT QUALITY

TABLE 39 – CONTENT CATEGORIES FOR PROVIDING VIDEO SPORTS NEWS INDEXING

TABLE 40 – RESOURCES USED IN THE EXPERIMENT WITH MOCA

TABLE 41 – SUPPORT FOR THE EXPERIMENT

TABLE 42 – THE ERRORS REPORT - PROPERTIES DESCRIPTION

TABLE 43 – ERRORS REPORTED DURING CONDUCTING EXPERIMENT

TABLE 44 – COMPUTATIONAL TIME OF THE MOCA SEGMENTATION AND RECOGNITION

TABLE 45 – RECALL OF THE MOCA TOOL (SEGMENTATION WITH RECOGNITION)

TABLE 46 – PRECISION OF THE MOCA TOOL (SEGMENTATION WITH RECOGNITION)

TABLE 47 – HARMONIC MEAN OF RECALL AND PRECISION

TABLE 48 – EVALUATION IN CONTEXT OF SPORT NEWS INDEXATION

TABLE 49 – DIFFERENT WORD RECOGNITION RATE ACCORDING TO THE CATEGORIES OF TEXT INDEXATION

TABLE 50 – NUMBER OF DIFFERENT WORDS EXTRACTED PER ONE VIDEO


List of Figures

FIGURE 1 – RESEARCH METHODOLOGY

FIGURE 2 – QUALITY DATA ANALYSIS MODEL [45]

FIGURE 3 – THESIS STRUCTURE

FIGURE 4 – STRATEGY OF SYSTEMATIC LITERATURE REVIEW

FIGURE 5 – SYSTEMATIC LITERATURE REVIEW RESULTS

FIGURE 6 – SYSTEMATIC LITERATURE REVIEW RESULTS OF SNOWBALL

FIGURE 7 – NUMBER OF PRIMARY STUDIES ACCORDING TO THE SEARCH TECHNIQUE

FIGURE 8 – NUMBER OF PRIMARY STUDIES ACCORDING TO THE DATABASE NAME

FIGURE 9 – NUMBER OF PRIMARY STUDIES ACCORDING TO THE PUBLICATION YEAR

FIGURE 10 – NUMBER OF THE PRIMARY STUDIES ACCORDING TO THE RESEARCH METHODOLOGY

FIGURE 11 – NUMBER OF THE PRIMARY STUDIES ACCORDING TO DISCUSSING THE DIRECTIONS OF DEVELOPMENT

FIGURE 12 – NUMBER OF THE PRIMARY STUDIES WHICH COMPARED THE PRESENTED METHOD WITH ANOTHER ONE

FIGURE 13 – DATA DEFINITION IN METHOD EVALUATION

FIGURE 14 – NUMBER OF METRICS USED FOR THE METHOD EVALUATION

FIGURE 15 – TEXT INFORMATION EXTRACTION STAGES

FIGURE 16 – EXPERIMENTAL COMPARISONS

FIGURE 17 – RELATIONS BETWEEN EACH OTHERS

FIGURE 18 – AMOUNT OF DATA USED FOR EVALUATION (PART 1)

FIGURE 19 – AMOUNT OF DATA USED FOR EVALUATION (PART 2)

FIGURE 20 – PRECISION RATE OF EXTRACTED METHODS

FIGURE 21 – RECALL RATE OF EXTRACTED METHODS

FIGURE 22 – COMPUTATIONAL TIME OF EXTRACTED METHODS

FIGURE 23 – TAXONOMIC CLASSIFICATION OF INFORMATION CONTAINED IN SPORTS NEWS

FIGURE 24 – HIERARCHY OF SPORTS NEWS CATEGORIES

FIGURE 25 – MOCA SEGMENTATION FROM HIGH QUALITY TEXT IN FRAME

FIGURE 26 – MOCA SEGMENTATION FROM AVERAGE QUALITY TEXT IN FRAME

FIGURE 27 – MOCA SEGMENTATION FROM LOW QUALITY TEXT IN FRAME

FIGURE 28 – MOCA RECOGNITION RESULTS OBTAINED FROM BINARIZED FILE

FIGURE 29 – RESULTS WITH RESPECT TO THE RESEARCH METHODOLOGY


1 Introduction

Throughout the history of civilization, preserving knowledge in libraries has always been important. The first text libraries appeared very early, around 280 BC (e.g. the Library of Alexandria). The amount of knowledge stored in such libraries has increased steadily over the centuries. Nowadays, a huge number of digital documents are stored on the World Wide Web, in digital libraries, or on the intranets of private companies [1].

1.1 Background

The first digital video format was created by Sony and appeared commercially in 1986 [56], replacing the analog format. This format makes video data easier for users to store. Recently, digital video has become one of the most popular types of media on the Internet, in TV broadcasting, and in digital libraries. However, video files without appropriate content indexing are not easily searchable. Content indexing is central to the efficient management of visual data. Nowadays, the customers of video indexing and retrieval technologies are broadcasting companies. The need to build large digital video news archives and make them available online to the public has drawn researchers’ attention to improving digital video news indexing techniques [2–4].

Most research papers describe image data indexing techniques such as manual indexing, face recognition, color representation, texture, sketches, object shape, and spatial relationships [5]. The digital video format has a very specific structure: it is a sequence of images organized hierarchically, from single frames through shots, scenes, and episodes to acts [6]. Most image indexing techniques can be adapted to video files. However, because of the huge number of frames in video files, reducing the processing time requires several further techniques, such as scene detection or key frame extraction [5], [7], [8]. Indexing is also challenging because of the low resolution of video files (in comparison with document images), the loss of contrast caused by compression (e.g. MPEG), and the complex background [7], [8].


localization, and tracking) and segmentation (binarization) before using OCR systems [8], [12–14].

1.2 Problem description

Text appearing in video files is divided into artificial text (also called superimposed text or caption text) and scene text (graphic text). Scene text is captured by the camera and appears naturally in the shot, for example trademarks on advertisements or names on T-shirts. In contrast, artificial text is added during the post-processing stage (video editing); it should be easy to read and is usually more closely related to the video content [15].

Video news presents summarized information about daily events. The most important piece of information is usually superimposed (post-processed, computer-made) as text blocks in special areas with a contrasting background, mostly at the top or bottom of the screen. These text blocks carry more meaningful information than the text in other types of video files, such as places, names of public figures, and other information related to the most important current event. This text information can be used for video indexing and information retrieval by applying text information extraction algorithms [6], [10], [16].

A special type of news is sports news, which usually appears at the end of the daily news, before the weather forecast, or separately on sports channels in the form of a daily summary of sports events. Sports news contains a large amount of concentrated text information, which can be divided into specific categories such as group, player’s name, score, etc. This information can be used for content indexing of digital video sports news. Some text information also appears in the background (not as a computer-made description), such as numbers on T-shirts, which can be used to identify players. In contrast to daily news, sports news is full of advertisements, which appear on T-shirts, in the background, or on the playing field. In the thesis we discuss text information extraction from digital video in the context of sports news content indexing, which is detailed in Section 3.

In theory, the text information extraction process in videos is divided into the sub-stages listed below; some stage names are used interchangeably in the literature (detection with localization, binarization with segmentation). In the thesis we use the division proposed in the literature surveys from 2004 [17] and 2008 [11]:

1. Text detection: looking for video frame regions including text.

2. Text localization: connecting text regions and setting bounding boxes.

3. Text tracking: specifying the location and the time at which a moving text caption occurs, and verifying the localization results [18].

4. Text binarization: assigning the bounded text regions and the background to different binary levels.

5. Text recognition: using OCR (Optical Character Recognition) technique with binarized bounded text regions.
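The five stages listed above can be illustrated with a minimal Python sketch. This is only an illustration under strong simplifying assumptions, not a method from the surveyed literature: a frame is a small grayscale grid, detection is a fixed brightness threshold, each frame holds at most one text box, tracking treats identical boxes in consecutive frames as one caption, and the `recognize` argument is a hypothetical stand-in for an OCR engine. All function names are invented for the example.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]          # grayscale rows, pixel values 0..255
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def detect(frame: Frame, threshold: int = 128) -> List[Tuple[int, int]]:
    """Stage 1, detection: collect pixels bright enough to be candidate text."""
    return [(r, c) for r, row in enumerate(frame)
            for c, value in enumerate(row) if value >= threshold]


def localize(pixels: List[Tuple[int, int]]) -> Optional[Box]:
    """Stage 2, localization: merge candidate pixels into one bounding box."""
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows), max(cols))


def track(boxes: List[Optional[Box]]) -> List[Tuple[Box, int, int]]:
    """Stage 3, tracking: group consecutive frames that share the same box.

    Returns (box, first_frame_index, last_frame_index) for each caption.
    """
    runs: List[Tuple[Box, int, int]] = []
    for index, box in enumerate(boxes):
        if box is None:
            continue
        if runs and runs[-1][0] == box and runs[-1][2] == index - 1:
            runs[-1] = (box, runs[-1][1], index)  # extend the current run
        else:
            runs.append((box, index, index))      # start a new run
    return runs


def binarize(frame: Frame, box: Box, threshold: int = 128) -> List[List[int]]:
    """Stage 4, binarization: crop the box and map pixels to 1 (text) or 0."""
    top, left, bottom, right = box
    return [[1 if value >= threshold else 0 for value in row[left:right + 1]]
            for row in frame[top:bottom + 1]]


def extract_text(frames: List[Frame],
                 recognize: Callable[[List[List[int]]], str]
                 ) -> List[Tuple[int, int, str]]:
    """Run all stages; `recognize` (stage 5) is a placeholder for OCR."""
    boxes = [localize(detect(frame)) for frame in frames]
    results = []
    for box, first, last in track(boxes):
        # Recognize each tracked caption once, on its first frame only.
        results.append((first, last, recognize(binarize(frames[first], box))))
    return results
```

Because tracking groups frames that show the same caption, recognition runs once per caption instead of once per frame, which is the kind of frame reduction that makes tracking worthwhile.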


text in a given scene and given shots is also important for the reduction of the number of frames analyzed.

Several algorithms have been proposed to make the outcome of the detection and localization stage more relevant. They can be divided into the following categories [11], [19]:

1. Region-based approach (e.g. Hessian matrix, YCbCr, Gaussian Mixture Model),

2. Texture-based approach (e.g. Fuzzy Clustering Ensemble, Fuzzy C Mean, Support Vector Machine, shift algorithm),

3. Other approaches (e.g. Graphical model-based, Hidden Markov Model, Discriminative Random Field).

In addition, specific algorithms for speeding up the tracking stage have been proposed: motion vector, track rigid, text macroblock [11].

In two literature surveys from 2004 [17] and 2008 [11], the text information extraction process for digital videos is described according to the extraction stages (detection, localization, tracking, binarization, and recognition), together with methods and algorithms. The pros and cons of the text extraction methods are discussed theoretically in plain text, with the methods described one after another. The text is divided only by headings into sections, each describing a single text extraction stage or approach. The authors do not compare the methods using, e.g., visual aids, so it is difficult to grasp the relations between them. The researchers in [20–22] proposed specific methods and algorithms for text information extraction from digital videos (single tools that contain a few text information extraction algorithms) and evaluated them on experimental results using recall and precision metrics, as in [7], [18], [20], [23–35]. However, only a few research papers report the false positive (false alarm) rate, as in [36–39]. Furthermore, the data source used for the experiment is often not described in enough detail for the results to serve as a basis for comparing different techniques: some authors report the number of text blocks, some the number of characters, and others the number of words, and they often do not specify the format and resolution of the videos used in the experiment [26], [40]. The authors also do not specify evaluation categories for video content indexing (such as Names, Places, or Advertisements). No article analyzes the “directions of evolution” of the algorithms or their “relations to each other from empirical comparison”.
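For reference, the recall and precision metrics mentioned above, together with their harmonic mean, can be computed as in the following Python sketch. It uses the standard definitions and assumes that the counted unit (text blocks, characters, or words) is the same for the detected items and for the ground truth; the function and parameter names are chosen for this illustration only.

```python
def precision(correct: int, detected: int) -> float:
    """Share of the detected items that are correct."""
    return correct / detected if detected else 0.0


def recall(correct: int, ground_truth: int) -> float:
    """Share of the ground-truth items that were correctly detected."""
    return correct / ground_truth if ground_truth else 0.0


def harmonic_mean(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Example: 100 words reported by a tool, 80 of them correct,
# 160 words actually present in the video.
p = precision(80, 100)       # 0.8
r = recall(80, 160)          # 0.5
f = harmonic_mean(p, r)      # about 0.615
```

The example also shows why the counted unit matters: the same tool output gives different values when scored per character, per word, or per text block, so figures from different papers are comparable only if the unit and the dataset match.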

There are also several commercial and freeware tools that support the digital video text information extraction process. However, most of them are either commercial (so they are not available for testing) or not suitable for extracting text from broadcast news:


• ConTEXTract™ [60]: a commercial tool for detecting, tracking, and recognizing text; not available for testing.

• MoCA (Movie Content Analysis Project) [57]: released under the GNU General Public License; it implements a Video OCR module covering the text detection and tracking stages [41].

Our contribution has been as follows:

1. We have performed video content analysis (supported with a self-designed checklist) to specify content categories of text information contained in digital videos with TV sports news.

2. We have performed a systematic literature review to establish the state of the art in the topic area since 1996 (when the topic appeared). We have divided and described the methods and algorithms and specified their “direction of evolution”. We have identified relationships among these methods and algorithms on the basis of empirical comparison, similar to what is done in [42]. We have identified the challenges in the text information extraction process, together with recommendations, and specified the chains between them. We have also presented the recent progress in the area since 2008 (the last paper with an evaluation of the algorithms was published in 2008).

3. On the basis of the systematic literature review results, we have also compared the existing algorithms using one criterion: suitability for digital video news indexing.

4. We have performed an experiment on selected methods, based on the available tools, using short digital videos with TV sports news. We specified the context of the experiment ourselves. We have investigated the impact of text quality on the performance of text information extraction. We have also examined the usefulness of text information extraction methods for video indexing, using digital sports news as an example.

The difference mentioned above between the literature surveys from 2004 [17] and 2008 [11] and our contribution is described in detail in Section 3.1, entitled “Related work”.

1.3 Thesis overview

1.3.1 Aim and Objectives

The purpose of this thesis is to analyze and compare methods for text information extraction in digital videos and evaluate the influence of text quality on their usefulness for sports news indexing and information retrieval.

Therefore, the specific objectives are:

1. To provide a better understanding of the nature of text information extraction.

2. To perform a systematic literature review on various methods of text information extraction in digital videos with TV sports news.

3. To design and execute an experiment in a testing environment.


1.3.2 Research Questions

The thesis addresses the following research questions:

RQ1. What is the nature of text information extraction from digital videos with TV sports news video indexing?

• What are the types of text information contained in digital video news?

• Where do they appear, are they legible?

• How can the text information contained in digital videos be used?

• What are the stages of information extraction process?

RQ2. What is the state of art for text information extraction methods used in digital video Sport and Broadcasting News Indexing approach?

• Which methods and algorithms are being used for text extraction?

• How can the text extraction methods and algorithms be divided in relation to the stages of the extraction process?

• Which algorithms have been developed and compared since 1996?

• What is the recent progress in the area (since 2008)?

• Which methods and tools are the most promising for digital video indexing in sports news?

• Which methods and tools are available for testing?

RQ3. How are the methods evaluated in the experiments described in research papers?

RQ4. What are the advantages and disadvantages of selected text information extraction tools used in video indexing in a given context?

1.3.3 Achieved outcomes

RQ1 – AO1 Description of the types, properties, and potential use of text information in digital videos for video indexing.

RQ1 – AO2 Description of the text information extraction process in digital videos indexing.

RQ2 – AO3 Information about the properties, evaluation directions, and mutual relations of text information extraction methods and algorithms, obtained by comparing them empirically.

RQ3 – AO4 Creating a plan for conducting the experiment.

RQ3 – AO5 List of the tools available for the experiment and of the methods that can be used.

RQ3 – AO6 Results of the experiment.

RQ4 – AO7 Comparison report on the selected tool for text information extraction, containing at least:

• Recall, precision rates in a given context,

• Analysis of the differences between outcomes.

RQ4 – AO8 Implications for future research and use of text information extraction.

1.4 Research methodology


used different metrics for evaluation and did not identify the experimental data properly (e.g. by specifying the quality of text). Afterwards, an analysis of text information content in digital video sports news was performed. The purpose of the video content analysis was to find out how useful the text information appearing in digital videos can be for sports news indexing. The analysis was based on a checklist designed for this purpose (discussed later). As a result of the analysis, a context for conducting the experiment was introduced. We specified the different qualities of text that appear in videos. We also set the success probability of text information extraction and divided the text information into content categories such as Names or Places. Finally, the available tools that could be used for testing were investigated, and an experiment in a testing environment was performed. The outcome of the experiment provided a direct answer to the research question concerned with evaluating text information extraction tools for digital videos in the context we proposed. We investigated the influence of text quality on the results of text information extraction in digital video sports news. Our hypothesis was confirmed: the quality of text has a high impact on the results (precision and recall of the found text, broken down by text quality). We also investigated the usefulness of text information extraction for video sports news indexing (different word recognition rate, broken down by content category), using the MoCA tool. We concluded that methods of text information extraction based on geometry and contrast analysis are more effective for video indexing in categories such as Discipline, Author, and Channel than in Advertisement or Names. The obtained results confirmed the probabilities of text extraction that we set during the video content analysis. The overall conclusion is that the results reported in research papers are not comparable, because text quality influences the results to a large extent. The whole process is shown and described below in Table 1 and Figure 1.

Table 1 – Research questions related to the Research methodology

Research Question | Research Methodology

RQ1 What is the nature of text information extraction from digital videos with TV sports news video indexing? | Systematic Literature Review* and video content analysis with the checklist.

RQ2 What is the state of art for text information extraction methods used in digital video Sport and Broadcasting News Indexing approach? | Systematic Literature Review*.

RQ3 How are the methods evaluated in the experiments described in research papers? | Systematic Literature Review*.

RQ4 What are the advantages and disadvantages of selected text information extraction tools used in video indexing in a given context? | Execution and evaluation of the available methods (tool) in a given context by conducting an experiment.


Figure 1 – Research methodology.

1.4.1 Literature Review: SLR and Snowballing (RQ1-RQ3)

The initial stage consisted of studying the available literature on the chosen topic, including books and research papers, in order to establish the nature of text information extraction from digital videos, to establish the state of the art of text information extraction methods used in digital sports news and broadcast news, and to find out how to evaluate the algorithms.

The systematic literature review technique has been performed using guidelines presented by Kitchenham [43] supported by Snowball Searching Technique [44].


The details of planning and conducting the Systematic Literature Review are described in Section 2.1 and Section 2.2. The results are discussed in Section 2.3.

1.4.2 Digital Video Content Analysis (RQ1)

Text information appearing in sports news often has similar properties within a given category. For instance, advertisements (logos) often appear in the background and are less legible than the subtitled group names; another example is the name of a country, which is often abbreviated to two or three capital letters. Consequently, some methods could work better for one category, e.g. hardly legible logos, while others could be more suitable for another, e.g. abbreviated country names. A video content analysis has been performed to establish the categories for conducting the evaluation study of digital video text extraction methods. The analysis was conducted as an observation study based on the checklist designed by us (Table 2). The checklist has been thoroughly planned to identify the properties of text information appearing in digital video sports news. We have selected categories of text information according to the video indexing categorization, such as Names, Places, Score, Advertisements, etc. Next, we have specified the properties of the text information appearing in each category, such as visibility, time of appearance, place of appearance, etc. Finally, we classified the text appearing in the videos into 3 groups of text quality, on the basis of which we set the success probability of text information extraction for each category. The material for the analysis has been collected from the CNN and BBC sports news channels during one week of observation. The content has been evaluated manually. The details of the checklist are described in Section 4.2.

Table 2 – Checklist for video content analysis

Visibility Location Displaying time Moving Categories No. of


1.4.3 Evaluation Study (RQ4)

The evaluation study has been based on the results of the conducted experiment. The context of evaluation has been specified during the video content analysis. The tool and metrics used in the experiment have been selected as a result of Systematic Literature Review.

The scope of the experiment was to analyze the influence of text quality on the success of text information extraction; and to evaluate the available tools in a new context proposed by us, namely the usefulness for providing video indexing and information retrieval. We consider success of the text information extraction as the recall and precision rates in groups of text quality (related to the probability of text extraction). We consider the usefulness for providing video indexing as the different word recognition rate in the content categories such as names, places, logos, etc. The experiment has been conducted on video sports news by applying selected tools to the studied text qualities and content categories. The results have been compared by using recall, precision, harmonic mean, and different word recognition rates. The tool selection criteria were availability and suitability for providing text information extraction from broadcasting news. Through Systematic Literature Review we found only one suitable and available tool: MoCA (detection, localization, tracking, binarization, and recognition) supported with DJpeg (decompression) and Vista (creating the binarized file) which we used for conducting the experiment.

The experimental data was chosen randomly; however, it was limited to short digital videos (less than 10 minutes) containing sports news from the CNN and BBC channels. The number of words and characters extracted from the videos has been compared with the manually counted content of the videos in the three groups of text quality and in 14 content categories. The reported false alarms, recall, precision, harmonic mean, and different word recognition rate have been analyzed to find correlations. Finally, the influence of text quality on the performance of the methods has been specified and the effectiveness of text information extraction for video sports news indexing has been reported.

The details about the experiment such as hypothesis formulations, dependent and independent variables definitions, tool selection and validity threats are described in Section 7.3.
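The metrics named above can be made concrete with a minimal sketch. It assumes the usual definitions (recall = correct / ground truth, precision = correct / detected, and the harmonic mean of the two); the class `Counts`, the function names, and all numbers are hypothetical and are not results from the experiment. The different word recognition rate is left out because its definition is not given in this section.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Word (or character) counts for one text quality group or content category."""
    ground_truth: int  # items counted manually in the video
    detected: int      # items reported by the tool, including false alarms
    correct: int       # reported items that match the manual count

    @property
    def false_alarms(self) -> int:
        return self.detected - self.correct


def recall(c: Counts) -> float:
    """Share of the manually counted items that the tool found."""
    return c.correct / c.ground_truth if c.ground_truth else 0.0


def precision(c: Counts) -> float:
    """Share of the reported items that are correct."""
    return c.correct / c.detected if c.detected else 0.0


def harmonic_mean(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts per text quality group, for illustration only.
groups = {
    "quality 1": Counts(ground_truth=100, detected=80, correct=60),
    "quality 2": Counts(ground_truth=50, detected=40, correct=10),
}

for name, c in groups.items():
    p, r = precision(c), recall(c)
    print(f"{name}: precision={p:.2f} recall={r:.2f} "
          f"harmonic mean={harmonic_mean(p, r):.2f} false alarms={c.false_alarms}")
```

Keeping the three counts per group separate makes it possible to report the same metrics for words and for characters simply by filling a second set of `Counts`.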

1.4.4 Data synthesis (RQ1-RQ3)

The data has been synthesized using qualitative and quantitative assessment through literature review, video content analysis, and evaluation study.


also identified the data used in the experiment (size, type, quality), the metrics used by the authors, and we compared the results as well.

We chose the Qualitative Data Analysis model [45] to identify all relevant challenges of the text information extraction process mentioned in the selected primary studies. Therefore, we identified the categories during the collection of data instead of using a predefined form such as the data extraction criteria.

The Qualitative Data Analysis model is based on non-linear, repeatable steps: Notice Things, Collect Things, and Think About Things, so the context can be easily identified (Figure 2). During the analysis of the full text of the primary studies, the challenges have been noticed, collected, and sorted into five main categories and 29 subcategories (Table 18) by use of comments and highlights in Mandalay software. We gathered and described all solutions mentioned by the authors for each challenge. The chains between the groups have been identified as a conclusion of the thinking step (Table 19). Finally, the challenges have been described according to the text extraction stages in Section 3.1.

Figure 2 – Qualitative Data Analysis Model [45]


In the experiment we used subjective and objective evaluation. We chose 127 video frames from three BBC sport news video files for testing. To evaluate the results subjectively, we used the 3 qualities of occurring text (probability of extraction) specified in the Video Content Analysis. Next, we visually compared the outcome of the algorithm with the real content of the frame in each text quality category. We noted the occurring problems and presented them visually (Figure 25, Figure 26, Figure 27, Figure 28) and in the form of an error report (Table 43). To evaluate the algorithm objectively, we first measured the recall and precision (of words and characters)


1.5 Thesis Outline

Structure of the thesis with respect to the research questions is presented in Figure 3:


1.6 Key terms

According to the topic of the thesis, key terms have been explained in Table 3.

Table 3 – Key terms explanation

Serial Number | Key Term | Description

Digital Videos
1 | Digital Video | Video files in digital format.
2 | Digital Video News | Digital Videos containing broadcast news.
3 | Digital Video Sports News | Digital Videos containing broadcast Sports News.

Information Extraction
4 | Information Extraction | Automated extraction of structured information contained in documents
5 | Text Information Extraction | Information Extraction limited to text information
6 | Text Information Extraction Process | Text Information Extraction Execution. In this paper we will focus on Process applied to Digital Videos.
7 | Text Information Extraction Steps | Division of Text Information Extraction Process into specific stages. In this study we followed the division proposed in two literature surveys [17], [11] such as localisation, detection, tracking, binarization, and recognition.

Methods
8 | Text Information Extraction Methods | Proposed Method for conducting Text Information Extraction Process. In this paper we discussed the methods proposed for text information extraction from Digital Video News.
9 | Method Features | Video attributes, used in the Text Information Extraction Method. In this study we followed the division proposed in two literature surveys [47][11] such as Spatial, Temporal, and Both.
10 | Method Approach | Text attributes, used in the Text Information Extraction Method. In this study we followed the division proposed in two literature surveys [47][11] such as Region-based, Texture-Based, and the Other Approach.

Other
11 | Text Information Extraction Algorithms | Algorithms proposed for Text Information Extraction Methods implementation
12 | Text Information Extraction Tools | Program or application designed for Text


2 Systematic Literature Review (RQ1 – RQ3)

The Systematic Literature Review is divided into several stages [48]:

1. Planning Systematic Literature Review – identifying strategy (Table 4), keywords (Table 5), search string, database sources (Table 6), the inclusion and exclusion criteria (Table 7), analyzing strategy (Table 8), quality criteria (Table 9), and extraction criteria (Table 10).

2. Conducting Systematic Literature Review – strategy (Figure 4, Figure 5) and execution of review (Figure 6, Figure 7), data collecting and synthesizing (Table 11 – Table 15).

3. Results Report – presenting the results as a research report.

2.1 Planning Systematic Literature Review

The Systematic Literature Review has been conducted to identify text extraction methods in digital videos used for data indexing in news, as well as the relationships between these methods, the level of their empirical validation, and their appropriateness for Sports News in particular. Further, the findings indicate the current gaps and the development directions (visualized using a technique development graph).

2.1.1 Strategy

The strategy of Systematic Literature Review has been defined by the following steps (Table 4):

Table 4 – Strategy of Systematic Literature Review

Serial Number | Step
1. | Identifying search areas according to the research questions.
2. | Finding synonymous and alternative words appropriate for identified search areas.
3. | Creating search strings using Boolean ‘OR’ with synonymous and alternative keywords, and Boolean ‘AND’ with major keywords (group of synonyms).
4. | Supporting results using manual search (articles suggested by supervisor).


2.1.2 Keywords

The keywords have been defined using six categories (Table 5).

Table 5 – Keyword for Search String

Major keyword | Synonymous or alternative keywords
Text information | Text data, Text information in digital videos, Text information extraction
Text extraction | Text extraction, Text recognition, Character recognition, Text localization, Text tracking, Information extraction, Text information extraction, Data extraction, OCR
Methods | Method, Approach, Approaches, Strategy, Framework, Frameworks, Way, Ways, Technique, Techniques, Algorithm, Algorithms
Evaluation | Efficiency, Performance, Validation, Verification, Comparison, Progress, Experiment
Digital Video | Video, Videos
Sports News | News, TV News, Broadcast News

Using the Boolean operators AND, OR, the string of keywords has been defined as:

(“text information” OR “text data” OR “text information extraction” OR “text information in digital videos”) AND (“extraction” OR “text extraction” OR “character recognition” OR “text recognition” OR “text localization” OR “text tracking” OR “information extraction” OR “text information extraction” OR “data extraction” OR “OCR”) AND (“methods” OR “method” OR “approach” OR “approaches” OR “algorithm” OR “algorithms” OR “technique” OR “techniques” OR “strategy” OR “framework” OR “frameworks” OR “way” OR “ways”) AND (“evaluation” OR “efficiency” OR “performance” OR “comparison” OR “experiment” OR “validation” OR “verification” OR “progress”) AND (“digital video” OR “digital videos” OR “video” OR “videos”) AND (“sports news” OR “news” OR “TV news” OR “broadcast news”)
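The construction rule from Table 4 (OR within a group of synonyms, AND between the groups) can be sketched in a few lines of Python. This is only an illustration: the function name `build_search_string` and the use of straight double quotes are assumptions of the sketch, and each database may require its own query syntax.

```python
from typing import List

# Keyword groups as they appear in the search string above.
KEYWORD_GROUPS: List[List[str]] = [
    ["text information", "text data", "text information extraction",
     "text information in digital videos"],
    ["extraction", "text extraction", "character recognition", "text recognition",
     "text localization", "text tracking", "information extraction",
     "text information extraction", "data extraction", "OCR"],
    ["methods", "method", "approach", "approaches", "algorithm", "algorithms",
     "technique", "techniques", "strategy", "framework", "frameworks", "way", "ways"],
    ["evaluation", "efficiency", "performance", "comparison", "experiment",
     "validation", "verification", "progress"],
    ["digital video", "digital videos", "video", "videos"],
    ["sports news", "news", "TV news", "broadcast news"],
]


def build_search_string(groups: List[List[str]]) -> str:
    """Join synonyms with OR inside parentheses and the groups with AND."""
    clauses = []
    for group in groups:
        quoted = " OR ".join(f'"{keyword}"' for keyword in group)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


search_string = build_search_string(KEYWORD_GROUPS)
print(search_string)
```

Keeping the groups as data makes it easy to refine the string (for example, dropping a synonym that returns too much noise) without editing the query by hand.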

2.1.3 Sources

Three databases have been selected as sources for the Systematic Literature Review from the list of appropriate data sources for software engineers proposed by Brereton in [49]. The data sources have been limited to IEEE Xplore, ACM Digital Library, and Science Direct because of the limited scope of an individual thesis (Table 6).

Table 6 – Sources for Systematic Literature Review

Serial Number | Database
1 | IEEE Xplore


2.1.4 Selection criteria

The inclusion and exclusion criteria have been defined as (Table 7):

Table 7 – Inclusion and Exclusion Criteria

Serial Number | Inclusion Criteria
1 | The language of the article is English.
2 | The article is available in full text.
3 | The articles should be peer reviewed.
4 | The text is comprehensible.
5 | The described method is associated with digital video.
6 | The method refers to the text extraction stages (detection, localization, tracking, binarization or recognition).
7 | The method is designed for extraction of Latin letters.
8 | The method is evaluated according to experimental results.
9 | The method is tested on Broadcast News.
10 | The method is designed for Text Information Extraction from Digital Videos News, or is compared with other method.
11 | Due to experimental results it is proved that the method is suitable for the given context (e.g. fast, suitable for caption text).

Serial Number | Exclusion Criteria
12 | Articles which do not meet the inclusion criteria.

The articles will be selected by analyzing text areas in the proposed order (Table 8).

Table 8 – Study Procedure

Serial Number | Analyzed area


2.1.5 Quality Criteria

The quality criteria have been defined according to three categories: use of the uniform division proposed in the literature surveys from 2009 [11] and 2004 [47], detailed definition of the dataset chosen for the experiment, and details of the conducted evaluation (Table 9). Each question has been answered Yes or No (no partial answers). Articles which met between 80% and 100% of the criteria were rated as High quality. Articles which fulfilled at least 50% but less than 80% were rated as Average quality. Articles which met less than 50% of the criteria were rated as Low quality. We discarded all studies which met less than 40% of the quality criteria.

Table 9 – Quality Criteria

Serial Number | Quality Criteria

The paper follows the uniform division proposed in literature surveys from 2009 [11] and 2004 [47]
Q1 | Are the steps of text extraction defined as detection, localization, tracking, binarization, and / or recognition? Is the approach: region-based, texture-based, or other? Are the features: spatial, temporal, or both?

Detailed definition of dataset chosen for experiment
Q2 | Is the number of frames used for evaluation specified?
Q3 | Is the number of text items (blocks, words, or characters) specified?
Q4 | Is the type of text (e.g. Chinese) specified?
Q5 | Is the format of video (e.g. MPEG) specified?
Q6 | Is the resolution of video files specified?

The method is well evaluated
Q7 | Is the method compared with another?
Q8 | Does the method outperform the methods which it was compared with?
Q9 | Are there directions of development described?
Q10 | Are the metrics recall and precision used for evaluation?

The quality assessment for the selected Primary Studies is provided in Section 2.3.5 (Table 16 and Table 17).

2.1.6 Data extraction criteria


from one to all five extraction stages. A method can be connected with many other methods and can have many directions of development. One method can be evaluated using many metrics, types of videos, types of text, or resolutions.

Table 10 – Data Extraction Criteria

Serial Number | Extracted Data | Example

Paper Details
1 | Serial number | 10.1109/DAS.2008.49
2 | Paper name | Extraction of Text Objects in Video Documents: Recent Progress
3 | Year of Publication | 2010, 2003, 2000, (>1996)
4 | Publication database | ACM, IEEE, Science Direct, Springer Link

Proposed Method Details
5 | Name of the Method | Dubey’s Method.
6 | Method feature | Temporal, Spatial, Both
7 | Method approaches | Texture based, Region based
8 | Extraction Stages | Localization, Tracking, Binarization
9 | Method is suitable for | Caption text, complex background, not available

Connections between other Methods
10 | Directions of Development | Not available, Liu Method
11 | Compared with | Fuzzy C Means, Chen’s Method

Experiment Details
12 | Metrics used for evaluation | Recall, precision, false alarms rate
13 | Result in % | 56, 76, 86
14 | Types of video used for evaluation | Video news, documentaries
15 | Number of video files (sequences) | 2, 3, not available
16 | Number of video frames | 3609, 1000, not available
17 | Resolution of dataset | 325x240, not available
18 | Time | 2h20s, not available
19 | Types of text | English, Chinese
20 | Number of text blocks | 150, 100, not available


2.2 Conducting Systematic Literature Review

The Systematic Literature Review has been conducted according to the Kitchenham’s guidelines procedure and Snowball Technique (Figure 4).

Figure 4 – Strategy of Systematic Literature Review

Kitchenham’s guidelines procedure

The initial keyword string was entered into the three databases selected as data sources (IEEE Xplore, ACM Digital Library, and Science Direct). The keyword string was then refined to get the most relevant output. The final outcome was analyzed using the inclusion and exclusion criteria to select articles for the primary studies. From the selected articles, the data was extracted using the extraction criteria for reporting the results.

Snowball technique

The results found through the systematic search according to Kitchenham’s guidelines have been extended with the help of the Snowball technique. Two literature surveys on text information extraction techniques from 2009 [11] and 2004 [47] have been chosen as Snowballing data sources. The titles from their references have been entered into the selected databases in the following order: IEEE Xplore first; if the article was not found there, ACM Digital Library; and lastly Science Direct. The procedure has been limited to the analysis of the root references because the literature surveys cover the relevant research papers in the area (the references of the resulting articles have not been analyzed).
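The lookup order used for each reference title can be sketched as a simple fallback chain. The catalogues below are hypothetical in-memory sets standing in for real database searches, and `locate` is a name chosen for this sketch only.

```python
from typing import Dict, Optional, Set

# Order in which each reference title is looked up.
SEARCH_ORDER = ["IEEE Xplore", "ACM Digital Library", "Science Direct"]


def locate(title: str, catalogues: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first database (in SEARCH_ORDER) that holds the title, else None."""
    wanted = title.strip().lower()
    for database in SEARCH_ORDER:
        if wanted in catalogues.get(database, set()):
            return database
    return None


# Hypothetical catalogues (lower-cased titles), for illustration only.
catalogues = {
    "IEEE Xplore": {"paper a"},
    "ACM Digital Library": {"paper a", "paper b"},
    "Science Direct": {"paper c"},
}

for title in ["Paper A", "Paper B", "Paper C", "Paper D"]:
    print(title, "->", locate(title, catalogues))
```

Because the search stops at the first hit, a title indexed in several databases is attributed to the earliest one in the order, which matters when counting primary studies per database.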


Manual Search

The results from both techniques have been supported by using manual search (articles suggested by the supervisor).

2.2.1 Selected articles

Kitchenham’s guidelines supported by manual search

After entering the keywords, 33 articles were found in IEEE Xplore, 153 in ACM Digital Library, and 114 in Science Direct, which amounts to 293 studies after removing 12 duplicates (Figure 5). In the next step, the number of studies was reduced to 80 by applying the inclusion and exclusion criteria to titles, keywords, and abstracts. Then, they were limited to 12 articles by applying the quality criteria to conclusions, results, and, if necessary, full texts. In addition, one article recommended by the supervisor was added (manual search), which gives a total of 13 articles used as primary studies (Table 11, Table 12).

Figure 5 – Systematic Literature Review Results

Table 11 – Articles Selected from Databases and by Manual Search

Data Source | Number of Articles | Selected Articles for Primary Studies
IEEE Xplore | 5 | [S1-S5]
ACM Digital Library | 4 | [S6-S9]
Science Direct | 3 | [S10-S12]
Manual Search | 1 | [S13]


Table 12 – Selected Articles for Primary Studies from Kitchenham’s guidelines and Manual Search

No | Selected Article Name
S1 | Y. Song and W. Wang, “Text Localization and Detection for News Video,” 2009 Second International Conference on Information and Computing Science, no. 1, pp. 98-101, 2009
S2 | X. Huang, H. Ma, and H. Zhang, “A new video text extraction approach,” Beijing University of Posts and Telecommunications, Beijing 100876, China, pp. 650-653, 2009
S3 | D. Zhang et al., “Accurate overlay text extraction for digital video analysis,” pp. 233-237, 2003
S4 | Y. Su, Y. X. Ji, X. Song, and R. Hua, “Caption Text Location with Combined Features for News Videos,” 2008 International Workshop on Education Technology and Training & 2008 International Workshop on Geoscience and Remote Sensing, pp. 714-718, 2008
S5 | F. Xiaoling, “Gray-based news video text extraction approach,” 5th International Conference on Computer Sciences and Convergence Information Technology, pp. 208-211, 2010
S6 | J. Yuan, W. Lu, and L. Wang, “A New Video Text Detection Method,” Science And Technology, pp. 359-362, 2011
S7 | D. S. Guru, S. Manjunath, P. Shivakumara, and C. L. Tan, “An Eigen Value Based Approach for Text Detection in Video,” Proceedings of the 8th IAPR International Workshop on Document Analysis Systems - DAS ’10, pp. 501-506, 2010
S8 | X. Liu and W. Wang, “Extracting Captions from Videos Using Temporal Feature,” Framework, pp. 843-846, 2010
S9 | P. Shivakumara, A. Dutta, C. L. Tan, and U. Pal, “A New Wavelet-Median-Moment based Method for Multi-Oriented Video Text Detection,” Pattern Recognition, pp. 279-286, 2010
S10 | M. Anthimopoulos, B. Gatos, and I. Pratikakis, “A two-stage scheme for text detection in video images,” Image and Vision Computing, vol. 28, no. 9, pp. 1413-1426, Sep. 2010
S11 | X. Qian, G. Liu, H. Wang, and R. Su, “Text detection, localization, and tracking in compressed video,” Signal Processing: Image Communication, vol. 22, no. 9, pp. 752-768, Oct. 2007
S12 | D. Chen and J. Odobez, “Video text recognition using sequential Monte Carlo and error voting methods,” Pattern Recognition Letters, vol. 26, no. 9, pp. 1386-1403, Jul. 2005


Snowball technique

Two literature surveys have been selected for conducting Snowball limited to the analysis of root references only (Table 13).

Table 13 – Source articles selected for conducting Snowball on references.

No | Selected Articles for conducting Snowball
A1 | K. Jung, “Text information extraction in images and video: a survey,” Pattern Recognition, vol. 37, no. 5, pp. 977-997, May 2004.
A2 | J. Zhang and R. Kasturi, “Extraction of Text Objects in Video Documents: Recent Progress,” 2008 The Eighth IAPR International Workshop on Document Analysis Systems, pp. 5-17, Sep. 2008.


Table 14 – New Articles Selected from references

Data Source | Number of Articles | Selected Articles for Primary Studies
A1 | 5 | [S14-S18]
A2 | 3 | [S19-S21]
Σ | 8 | [S14-S21]

Table 15 – Selected Articles for Primary Studies from Snowball Procedure

No | Selected Article Name
S14 | D. Chen and J.-Philippe Thiran, “Text Identification in Complex Background Using SVM,” Signal Processing, pp. 621-626, 2001
S15 | J. Gllavata, E. Qeli, and B. Freisleben, “Detecting Text in Videos Using Fuzzy Clustering Ensembles,” Symposium A Quarterly Journal In Modern Foreign Literatures, 2006.
S16 | C.-C. Lee, Y.-C. Chiang, C.-Y. Shih, and H.-M. Huang, “Caption Localization and Detection for News Videos Using Frequency Analysis and Wavelet Features,” 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), pp. 539-539, Oct. 2007.
S17 | Y. Liu, H. Lu, X. Xue, and Y.-P. Tan, “Effective Video Text Detection Using Line Features,” C. Science, Image (Rochester, N.Y.), no. December, pp. 6-9, 2004.
S18 | R. Lienhart and W. Effelsberg, “Automatic text segmentation and text recognition for video indexing,” Multimedia Systems, vol. 8, no. 1, pp. 69-81, Jan. 2000
S19 | T. Saeo and M. A. Smith, “Video OCR for Digital News Archive,” World Wide Web Internet And Web Information Systems, pp. 52-60, 1997.
S20 | D. Chen, J.-Marc Odobez, and H. Bourlard, “Text Segmentation and Recognition in Complex Background Based on Markov Random Field,” Complexity, pp. 227-230, 2002.
S21 | Y.-Kyu Lim, S.-Ha Choi, and S.-Whan Lee, “Text Extraction in MPEG Compressed Video for Content-Based Indexing,” Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, vol. 4, pp.


2.3 Reporting Result

2.3.1 Article selection

From 21 selected primary studies, 12 were found by using Kitchenham’s guidelines procedure (57%), eight by Snowball technique (38%), and one by manual search (5%) (Figure 7).

Figure 7 – Number of primary studies according to the search technique

2.3.2 Database

Twelve of the primary studies were found in IEEE Xplore (57%), four in ACM Digital Library (19%), three in Science Direct (14%), and two using other sources (9%) (Figure 8).

Figure 8 – Number of primary studies according to the database name


2.3.3 Publication year

The topic of text information extraction first appeared in the literature in 1996. We chose articles related to video news indexing; 72% of the selected primary studies were published after 2003, of which 69% were published after the last literature survey from 2008. It seems that researchers have recently paid much attention to this area: nine of the selected primary studies (50%) were published between 2009 and 2011 (Figure 9).

Figure 9 – Number of primary studies according to the publication year

2.3.4 Research methodology

All of the 21 selected primary studies were experiments (Figure 10).


Figure 10 – Number of the primary studies according to the research methodology


2.3.5 Quality assessment

According to the Quality Criteria (Table 9), we assessed the quality of selected articles. The articles which meet between 80% and 100% of the criteria were set up as High quality. The articles which fulfilled at least 50% were set up as Average quality. The articles which do not meet 50% of criteria were set up as Low quality (Table 16).

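Table 16 – Quality assessment

Primary Study | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Score | Quality
S1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 4 | 40%
S2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 7 | 70%
S3 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 | 90%
S4 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 7 | 70%
S5 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 4 | 40%
S6 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 5 | 50%
S7 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 6 | 60%
S8 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 5 | 50%
S9 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 5 | 50%
S10 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 9 | 90%
S11 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 10 | 100%
S12 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 9 | 90%
S13 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 8 | 80%
S14 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 7 | 70%
S15 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 9 | 90%
S16 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 6 | 60%
S17 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 6 | 60%
S18 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 6 | 60%
S19 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 7 | 70%
S20 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 6 | 60%
S21 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 7 | 70%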

That gives six articles of high quality, 13 of average quality, and two of low quality (Table 17). However, we decided to include the two articles of low quality in this thesis as well, because they meet all inclusion criteria and are strongly connected with video news.

Table 17 – Quality assessment results

Quality | Number of articles | Selected primary studies
High | 6 | [S3], [S10], [S11], [S12], [S13], [S15]
Average | 13 | [S2], [S4], [S6], [S7], [S8], [S9], [S14], [S16], [S17], [S18], [S19], [S20], [S21]
Low | 2 | [S1], [S5]
Σ | 21 | [S1], [S2], … [S21]
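The classification rule can be checked against the scores in Table 16 with a short sketch. The thresholds are the ones stated above (at least 80% High, at least 50% Average, otherwise Low); the names `SCORES` and `classify` are chosen for this illustration only.

```python
from collections import Counter

# Number of quality criteria met (out of 10) per primary study, taken from Table 16.
SCORES = {
    "S1": 4, "S2": 7, "S3": 9, "S4": 7, "S5": 4, "S6": 5, "S7": 6,
    "S8": 5, "S9": 5, "S10": 9, "S11": 10, "S12": 9, "S13": 8, "S14": 7,
    "S15": 9, "S16": 6, "S17": 6, "S18": 6, "S19": 7, "S20": 6, "S21": 7,
}


def classify(score: int, total: int = 10) -> str:
    """Map the share of fulfilled criteria to a quality level."""
    percent = 100 * score / total
    if percent >= 80:
        return "High"
    if percent >= 50:
        return "Average"
    return "Low"


levels = {study: classify(score) for study, score in SCORES.items()}
summary = Counter(levels.values())
print(dict(summary))
print("Low quality:", [s for s, level in levels.items() if level == "Low"])
```

Run on the Table 16 scores, the tally reproduces the distribution reported in Table 17.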


2.3.6 Challenges identification (RQ1, RQ2)

The following challenges in text information extraction process have been identified while conducting Systematic Literature Review (Table 18). The details of data analysis are presented in Section 1.4.4.

Table 18 – Challenges identification

S.N. | Name | Primary Study | Description

Videos file properties
C1 | Complex Background | [S1], [S2], [S3], [S4], [S6], [S7], [S9], [S10], [S12], [S14], [S16], [S17], [S19], [S20] | The background in the video is more complex than in images [S12]. That is why it is necessary to provide text detection and localization before using OCR [S12]. However, region based methods [S14] such as edge-based [S6], [S8], [S9], corner-based [S8], connected components [S1], [S7], [S10], [S17], or hybrid methods [S10] are not efficient. Threshold-based, simple smoothing, and M-estimation methods failed because they are processing gray level images [S2], [S20]. Texture-based methods are designed to solve the problem, but they are time consuming [S7], [S8]. To make it more robust, researchers are using texture-based methods supported with edge-based methods [S4]. In that case, the temporal features can be also used like fuzzy clustering Ensemble in [S15] or multi-frame integration in [S19].
C2 | Compression | [S11], [S13], [S15], [S19], [S21] | The challenge is the variety of compression video formats (MPEG1, MPEG2, MPEG4, MPEG7, Real Video etc.) [S13]. Some of the methods are providing the decompression before text extraction [S21]. However, DCT Coefficient [S11] is trying to face the problem with macroblock information, but minimal decompression is still needed [S21].
C3 | Low contrast | [S6], [S7], [S9], [S10], [S11], [S15] | The low contrast is usually an effect of compression [S6], [S9]. The methods such as texture-based [S7], region-based, and edge-based [S7], heuristic techniques [S10] and DCT coefficient [S11] are not effective. In [S15] researchers propose fuzzy clustering ensembles to tackle this problem.
C4 | Noise | [S12], [S14], [S20] | Noise is often an effect of compression [S14]. Standard image segmentation methods and color clustering are not effective because of the noise in video files [S12], [S20].

Text (fonts) properties
C5 | Various style of fonts | [S2], [S4], [S7], [S8], [S11] |
