
Information State Based Speech Recognition





Rebecca Jonson

Department of Philosophy, Linguistics and Theory of Science

Academic dissertation in linguistics, to be publicly defended, by due permission of the Faculty of Humanities at University of Gothenburg, on May 22, at 1 p.m., in T307, Olof Wijksgatan 6 (Gamla Hovrätten), Humanisten, Göteborg.

Abstract

Ph.D. dissertation in Linguistics at University of Gothenburg, Sweden, 2010

Title: Information State Based Speech Recognition

Author: Rebecca Jonson

Language: English

Department: Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg, Box 200, SE-405-30 Göteborg

Series: Gothenburg Monographs in Linguistics 41

Published at: http://hdl.handle.net/2077/22169

One of the pitfalls in spoken dialogue systems is the brittleness of automatic speech recognition (ASR). ASR systems often misrecognize user input, and they are unreliable when it comes to judging their own performance. Recognition failures and deficient confidence estimation affect the performance of a dialogue system as a whole and the impression it makes on a user. Humans outperform ASR systems on most tasks related to speech understanding. One of the reasons is that humans make use of much more knowledge. For example, humans appear to take a variety of knowledge-based aspects of the current dialogue into account when processing speech. The main purpose of this thesis is to investigate whether speech recognition can also benefit from the use of higher level knowledge sources and dialogue context when used in spoken dialogue systems.

One of the major contributions of this thesis is to provide more insight into what type of knowledge sources in spoken dialogue systems would be potential contributors to the task of ASR and how such knowledge can be represented computationally. In the framework of information state based dialogue management we have an important source of semantic and pragmatic knowledge represented in the information state. We will investigate whether the knowledge in the information state can help to alleviate the search problem and improve reliability estimation in speech recognition.

We call this knowledge and context aware approach to speech recognition information state based speech recognition.

The first part of this thesis investigates approaches to obtaining better initial language models more rapidly for spoken dialogue systems and ways of dynamically selecting the most appropriate models based on the dialogue context.

The second part of this thesis concerns the use of the speech recognition output and investigates how additional knowledge sources can enhance a dialogue system’s decision-making on how to proceed and make use of speech recognition hypotheses.

The thesis presents several experimental studies addressing the issues described above and proposes an integration of the explored techniques into the GoDiS dialogue system.

Keywords: dialogue systems, speech recognition, language modelling, dialogue move, dialogue context, ASR, higher level knowledge, linguistic knowledge, N-Best re-ranking, confidence scoring, confidence annotation, information state, ISU approach.
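To make the N-Best re-ranking idea named in the abstract and keywords more concrete, the following is a minimal sketch and not the method evaluated in the thesis. It assumes that the dialogue context can be summarized as a table of expected dialogue moves with weights, and that each recognition hypothesis comes with a recognizer score and a dialogue move label. The names `Hypothesis`, `rerank`, `expected_moves`, and `context_weight`, as well as all numbers, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str           # recognized word string
    asr_score: float    # assumed recognizer score in [0, 1]
    dialogue_move: str  # assumed dialogue move label for this hypothesis


def rerank(nbest, expected_moves, context_weight=0.5):
    """Sort hypotheses by a linear mix of recognizer score and context fit.

    expected_moves maps a dialogue move label to an assumed expectation
    weight derived from the dialogue context; unlisted moves count as 0.
    """
    def combined(h):
        context_score = expected_moves.get(h.dialogue_move, 0.0)
        return (1 - context_weight) * h.asr_score + context_weight * context_score

    return sorted(nbest, key=combined, reverse=True)


# Illustrative N-best list (invented data, not from the thesis).
nbest = [
    Hypothesis("play the next son", 0.60, "unknown"),
    Hypothesis("play the next song", 0.55, "request"),
    Hypothesis("pay the next song", 0.40, "unknown"),
]

# Assumed expectations: the system has just asked what the user wants to do.
expected_moves = {"request": 0.8, "answer": 0.2}

best = rerank(nbest, expected_moves)[0]
print(best.text)
```

With `context_weight=0.0` the ranking falls back to the recognizer scores alone, which makes it easy to compare the context-aware ordering against the baseline.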
