
http://www.diva-portal.org

This is the published version of a paper presented at IS ADEPT.

Citation for the original published paper:

Cabral, J P., Kane, M., Ahmed, Z., Székely, É., Zahra, A. et al. (2012)

Using the Wizard-of-Oz Framework in a Pronunciation Training System for Providing User Feedback and Instructions.

In:

N.B. When citing this work, cite the original published paper.

Permanent link to this version:

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185531


Using the Wizard-of-Oz Framework in a Pronunciation Training System for Providing User Feedback and Instructions

João P. Cabral, Mark Kane, Zeeshan Ahmed, Éva Székely, Amalia Zahra, Kalu U. Ogbureke, Peter Cahill, Julie Carson-Berndsen and Stephan Schlögl

School of Computer Science and Informatics, University College Dublin, Ireland Email: see http://muster.ucd.ie

School of Computer Science and Statistics, Trinity College Dublin, Ireland Email: schlogls@tcd.ie

Index Terms—DEMO, MySpeech, Wizard-of-Oz, Pronunciation training

I. INTRODUCTION

A prototype of a computer-assisted pronunciation training system called MySpeech is showcased in this demo. The web-based interface of the MySpeech system enables users to select a difficulty level and a sentence from different domains, such as greetings, and provides both recording and playback functionality. The interface is also used to provide instructions and feedback messages based on the pronunciation errors that the system detects in the user's recorded speech.

MySpeech detects mispronunciations in the user's recorded speech with an automatic speech recognition (ASR) method similar to that of [1]. The method was adapted to introduce difficulty levels into the pronunciation training of MySpeech, as proposed in [2]; that work indicates that broad phonetic/phonological groups are suitable for tackling mispronunciations by non-native speakers. Recent developments include a spoken term detection front-end that explicitly uses underspecification to tackle pronunciation variation, resulting in an increase in performance [3].
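One way to picture how broad phonetic groups could give rise to difficulty levels is sketched below in Python. This is only an illustration under stated assumptions, not the method of [1] or [2]: the `PHONE_FEATURES` table, the mapping of the three levels to strictness, and the function `is_mispronounced` are all hypothetical, and the recognised phone is assumed to come from an ASR back end that is not shown.

```python
# Hypothetical phone inventory: phone -> (manner group, voicing).
# Illustrative only; not the grouping used by MySpeech.
PHONE_FEATURES = {
    "p": ("plosive", "voiceless"), "b": ("plosive", "voiced"),
    "t": ("plosive", "voiceless"), "d": ("plosive", "voiced"),
    "f": ("fricative", "voiceless"), "v": ("fricative", "voiced"),
    "s": ("fricative", "voiceless"), "z": ("fricative", "voiced"),
}


def is_mispronounced(expected: str, recognised: str, level: str) -> bool:
    """Decide whether a recognised phone counts as an error at a given level.

    Assumed behaviour (not taken from the paper):
      easy   - only the broad manner group must match
      medium - manner group and voicing must match
      hard   - the exact phone must match
    """
    if level not in ("easy", "medium", "hard"):
        raise ValueError(f"unknown level: {level}")
    if expected not in PHONE_FEATURES or recognised not in PHONE_FEATURES:
        # Phones outside the toy inventory are compared exactly.
        return expected != recognised
    manner_e, voicing_e = PHONE_FEATURES[expected]
    manner_r, voicing_r = PHONE_FEATURES[recognised]
    if level == "easy":
        return manner_e != manner_r
    if level == "medium":
        return (manner_e, voicing_e) != (manner_r, voicing_r)
    return expected != recognised
```

Under these assumptions, substituting "z" for "s" would be accepted at the "easy" level but flagged at "medium" and "hard", so the same recording is judged more strictly as the level rises.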

Both the pronunciation analysis component and the web interface are connected to a database. The database contains the audio and text data for the pronunciation practice exercise. It is also used to store data obtained from the interaction of each student with the system. The aim of collecting the user's data is to build a personalised student model that can be used to adapt the system to the user and to develop a pedagogical model. For example, the analysis of the pronunciation errors stored for a student could be used to automatically predict the appropriate difficulty level for that student. It could also be used to detect the most frequent types of pronunciation errors, in order to automatically suggest words containing those sounds for the student to practice. Other types of student data could also be used for adaptation, such as the recorded speech to adapt the acoustic models of ASR to the speaker.
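The two adaptation ideas mentioned above (predicting a suitable difficulty level and finding the most frequent error types) can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the paper describes these as possible uses of the stored data, not as implemented features, and the function names, the input formats and the thresholds `promote_at` and `demote_at` are hypothetical.

```python
from collections import Counter

LEVELS = ["easy", "medium", "hard"]  # level names as shown in the web interface


def most_frequent_errors(mispronounced_phones, top_n=3):
    """Return the phones a student mispronounced most often.

    `mispronounced_phones` is an assumed log format: one entry per detected error.
    """
    return [phone for phone, _ in Counter(mispronounced_phones).most_common(top_n)]


def suggest_level(current, recent_results, promote_at=0.8, demote_at=0.4):
    """Suggest a difficulty level from recent attempts.

    `recent_results` holds one boolean per utterance (True = no error detected).
    The thresholds are illustrative values, not taken from the paper.
    """
    if not recent_results:
        return current
    success_rate = sum(recent_results) / len(recent_results)
    index = LEVELS.index(current)
    if success_rate >= promote_at:
        index = min(index + 1, len(LEVELS) - 1)
    elif success_rate <= demote_at:
        index = max(index - 1, 0)
    return LEVELS[index]
```

The phones returned by `most_frequent_errors` could then be used to pick practice words containing those sounds from the database.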

One current limitation of the MySpeech system is that the feedback and instructions given to a user are not generated automatically. The WebWOZ Wizard-of-Oz platform (http://www.webwoz.com) was therefore integrated into the MySpeech system to enable a human, acting as a wizard, to give feedback and instructions to the practising user without the user being aware that another person is involved in the communication. The Wizard-of-Oz (WOZ) method has been used before in language learning applications. For example, it was used to study a dialogue strategy in [4]. It was also employed to test and refine the human-computer interface and feedback display of a computer-based speech training aid called ARTUR [5]. In this demo, WOZ is used in a different context, namely to enable semi-automatic operation of the MySpeech system, in which the wizard has access to the pronunciation analysis results computed by the system and provides feedback to the user based on those results.

Another function of the wizard is to guide the student through the selection of sentences and to control the student's progression as their skills develop. For example, a student starts at the “easy” level and, after practising for some time at this level, is asked by the wizard to progress to the next level. The data collected from the wizard will also be used to further improve the system.

II. DEMONSTRATION

A. Web Interface

Participants in the demo will use the MySpeech system to train their English pronunciation. Figure 1 shows a screenshot of the MySpeech web interface, which consists of several numbered panels. In panel 1 the user can select the language. Panel 2 allows the user to set the difficulty level (“easy”, “medium”, or “hard”). Panel 3 is a category panel; for example, the category “greetings” is associated with several phrases related to this domain. The sentence to practise is then chosen in panel 4. The audio players embedded in the interface are used to listen to the selected sentence spoken by a native speaker (panel 5) and to record the user's own version of the same sentence (panel 6). The user can then submit the recording for the system to evaluate the pronunciation. In the fully automatic operation mode (without the WOZ platform), the feedback panel (panel 7) shows the mispronunciations detected in a submitted utterance using darker colours. In the example shown, the submitted utterance corresponds to the sentence “See you in the morning”. The other operation mode, which requires the wizard's interaction, is explained in the next section.
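The idea of marking detected errors with darker colours can be illustrated with a small Python sketch. The paper does not describe the colour scheme, so everything here is an assumption: the per-word severity score in [0, 1], the grey-scale mapping, and the function names `shade_for` and `colour_words` are hypothetical.

```python
def shade_for(severity: float) -> str:
    """Map a severity in [0, 1] to a grey background colour (hex).

    Higher severity gives a darker shade. The severity scale is an assumption.
    """
    severity = min(max(severity, 0.0), 1.0)  # clamp out-of-range scores
    value = round(255 * (1.0 - severity))
    return f"#{value:02x}{value:02x}{value:02x}"


def colour_words(sentence: str, severities: dict) -> list:
    """Pair each word with a background shade; words without a score stay white."""
    return [
        (word, shade_for(severities.get(word.lower(), 0.0)))
        for word in sentence.split()
    ]
```

For the example sentence, `colour_words("See you in the morning", {"morning": 1.0})` would leave the first four words white and give "morning" the darkest shade.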

Fig. 1. Screenshot of the MySpeech web interface.

B. The Wizard-of-Oz Setup

A voice-over-IP system is used to give the wizard a real-time visualisation of the user's screen and to transmit everything the user says (the user is not aware of this transmission). Furthermore, the wizard has access to the pronunciation analysis results computed by the system (displayed on a second screen). The wizard's task is to interpret a result and transform it into appropriate textual feedback to be sent to the user. A screenshot of the WOZ interface is shown in Figure 2. The interface allows the wizard to select predefined sentences for instructions as well as feedback. Choosing from predefined sentences, as opposed to typing feedback in real time, allows for a quicker response.

To decrease the time a wizard spends searching for an appropriate response, sentences are grouped into different panels. For example, the “difficulty” panel contains sentences that prompt the user to switch to a different difficulty level, whereas the “phrases” panel contains sentences that prompt her to select a different phrase. The panel called “feedback” contains sentences for indicating where the individual mispronunciation errors are within the sentence or word. There is also a panel with encouragement messages (called “positive”), which offers a way of motivating the user and indicating that the pronunciation assessment was positive. Table I shows examples of the corrective and positive feedback sentences. Finally, the panel called “free text” allows a wizard to input any text, or to edit a predefined sentence from one of the other panels, before sending it to the user.
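The panel organisation described above can be pictured as a small lookup structure. The following Python sketch is an illustration only and does not use the WebWOZ API: the `PANELS` dictionary and the function `compose_message` are hypothetical, the “feedback” and “positive” entries are the examples from Table I, and the wording of the “difficulty” and “phrases” entries is invented for the example.

```python
# Predefined sentences grouped by panel (hypothetical structure).
PANELS = {
    "difficulty": ["Please switch to the next difficulty level"],  # invented wording
    "phrases": ["Please select a different phrase"],               # invented wording
    "feedback": [
        "You mispronounced the last part of the word",
        "Please try to emphasize",
        "You mispronounced the word",
    ],
    "positive": [
        "Perfect, you pronounced the phrase correctly",
        "You are showing some improvement",
        "You are almost there",
    ],
}


def compose_message(panel, index=0, edited_text=None):
    """Return the text the wizard sends to the user.

    A predefined sentence is selected by panel and index. If `edited_text`
    is given, it replaces the predefined sentence, which models the
    "free text" panel (typing new text or editing a predefined sentence).
    """
    if panel == "free text":
        if not edited_text:
            raise ValueError("free text requires some text")
        return edited_text
    sentence = PANELS[panel][index]
    return edited_text if edited_text is not None else sentence
```

For instance, `compose_message("positive", 2)` returns "You are almost there", while passing `edited_text` lets the wizard append the affected word to a corrective sentence before sending it.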

During the demo, one of the authors assumes the role of the wizard, having been trained beforehand to follow a simple interaction model that guides the user through the learning exercise. However, the wizard has the freedom to alter the model as she wishes (exploring the interaction space).

Fig. 2. Screenshot of the WOZ web interface.

TABLE I
EXAMPLES OF MESSAGES FROM THE FEEDBACK AND POSITIVE PANELS OF THE WOZ INTERFACE.

Corrective feedback messages
    You mispronounced the last part of the word
    Please try to emphasize
    You mispronounced the word

Positive feedback messages
    Perfect, you pronounced the phrase correctly
    You are showing some improvement
    You are almost there

ACKNOWLEDGMENT

This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie).

REFERENCES

[1] Witt, S. M. and Young, S. J., “Phone-level pronunciation scoring and assessment for interactive language learning”, Speech Communication, Vol. 30, pp. 95–108, 2000.

[2] Kane, M., Cabral, J. P., Zahra, A. and Carson-Berndsen, J., “Introducing Difficulty-Levels in Pronunciation Learning”, Proc. of SLaTE, Italy, 2011.

[3] Kane, M., Ahmed, Z. and Carson-Berndsen, J., “Underspecification in Pronunciation Variation”, Proc. of International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, 2012.

[4] Ehsani, F., Bernstein, J. and Najmi, A., “An interactive dialog system for learning Japanese”, Speech Communication, 30(2-3), pp. 167–178, 2000.

[5] Bälter, O., Engwall, O., Öster, A. and Kjellström, H., “Wizard-of-Oz test of ARTUR: a computer-based speech training system with articulation correction”, Proc. of ASSETS, pp. 36–43, 2005.
