Customizing Interaction for Natural Language Interfaces

Vol. 1(1996): nr 1

Linköping University Electronic Press

Linköping, Sweden

http://www.ep.liu.se/ea/cis/1996/001/

Customizing Interaction for Natural Language Interfaces

Lars Ahrenberg

Nils Dahlbäck

Arne Jönsson

Åke Thurée

Department of Computer and Information Science

Linköping University

581 83 Linköping, Sweden

Linköping Electronic Articles in Computer and Information Science

ISSN 1401-9841. Series editor: Erik Sandewall

© 1996 Lars Ahrenberg, Nils Dahlbäck, Arne Jönsson, Åke Thurée

Typeset by the authors using TeX. Formatted using étendu style

Recommended citation:

<Authors>. <Title>. Linköping electronic articles in computer and information science, Vol. 1(1996): nr 1. http://www.ep.liu.se/ea/cis/1996/001/. October 1, 1996.

This URL will also contain a link to the authors' home pages.

The publishers will keep this article on-line on the Internet (or its possible replacement network in the future) for a period of 25 years from the date of publication, barring exceptional circumstances as described separately.

The on-line availability of the article implies a permanent permission for anyone to read the article on-line, to print out single copies of it, and to use it unchanged for any non-commercial research and educational purpose, including making copies for classroom use. This permission can not be revoked by subsequent transfers of copyright. All other uses of the article are conditional on the consent of the copyright owners.

The publication of the article on the date stated above included also the production of a limited number of copies on paper, which were archived in Swedish university libraries like all other written works published in Sweden. The publisher has taken technical and administrative measures to assure that the on-line version of the article will be permanently accessible using the URL stated above, unchanged, and permanently equal to the archived printed copies at least until the expiration of the publication period.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

Habitability and robustness have been noted as important qualities of natural-language interfaces. In this paper we discuss how these requirements can be met, in particular as regards the system's ability to support a coherent and smooth dialogue. The discussion is based on current work on customizing a dialogue system for three different applications.

We adopt a sublanguage approach to the problem and propose a method for customization combining bottom-up use of empirical data with a global pragmatic analysis of a given application. Finally, we suggest three design principles that have emerged from our work, called the sublanguage principle, the asymmetry principle and the quantity principle.

This is an extended version of a paper with the same title presented at the Workshop on Pragmatics in Dialogue, The XIV:th Scandinavian Conference of Linguistics and the VIII:th Conference of Nordic and General Linguistics, Göteborg, Sweden, August 16-21,

1 Introduction

Research on computational models of discourse can be motivated from two different standpoints. One approach is to develop general models and theories that apply to all kinds of agents and situations. The other is to develop accounts of specific discourse genres (Dahlbäck & Jönsson, 1992). It is not obvious that the two approaches should produce similar computational theories of discourse and we believe it is important to distinguish the two tasks from each other. Moreover, in the case of dialogues for natural-language interfaces (NLIs), which is our prime concern in this paper, there is not merely the question of modelling some external linguistic reality but also an important element of design, linguistic as well as otherwise.

The following requirements are widely recognized as being important for NLIs.

habitability: the user should conveniently be able to express the commands and requests that the background system can deal with, without transgressing the linguistic capabilities of the interface (Watt, 1968)

efficiency: the NLI should not slow down the interaction with the background system noticeably

robustness: the system should be able to react sensibly to all input (cf. Hayes & Reddy, 1983)

transparency: the system's capabilities and limitations should be evident to the user from experience

In this paper we discuss ways of making teletype natural-language interfaces satisfy these requirements in the context of applications which belong to the domain that Hayes and Reddy (1983) call simple service systems, i.e. systems that "require in essence only that the customer or client identify certain entities to the person providing the service; these entities are parameters of the service, and once they are identified the service can be provided" (ibid. p. 252). These systems exhibit the kind of dialogue that Van Loo and Bego (1993) term parameter dialogue.

The requirement that puts the highest demand on linguistic competence is the one concerning habitability. However, habitability does not necessarily imply that the system must understand any relevant request. Where extended linguistic coverage comes in conflict with either robustness, transparency or efficiency, it may be compromised (though cf. Ogden, 1988). This trade-off between requirements reconfirms the need for good design. The importance of habitability, however, suggests that a dialogue system must handle those phenomena that occur frequently in typed human-computer interaction correctly and efficiently, so that the user does not feel constrained or restricted when using the interface. The trade-off implies that the interface should not waste effort on complex computations in order to handle irrelevant or rare phenomena. For instance, the system need not be able to handle features such as jokes or surprise, when these are not demanded by the purposes of the system.

On these grounds we have adopted a sublanguage approach (Grishman & Kittredge, 1986) to dialogue systems of this kind. All aspects of linguistic communication, including interaction patterns and the use of indexical language, are assumed to depend on the application and domain. For this reason the interface system must be designed to facilitate customization to meet the needs of different applications. Moreover, we must find methods that allow us to determine what the needs of a given application are. Such a method, based on Wizard of Oz simulations, will be outlined below.

The rest of this paper is organized as follows. The next section briefly describes our system and applications, and the ways in which the system can be customized. The following section presents ways of meeting the requirements listed above as they apply to dialogue behaviour. In the third and final section we discuss our solutions from a more general perspective and propose three design principles relating to the notion of sublanguage, the quantity of information, and asymmetries between users and systems.

2 The linlin model

The natural language interface linlin (Ahrenberg, Jönsson, & Dahlbäck, 1990; Jönsson, 1991, 1993a) is designed to facilitate customization to various applications. Dialogue in linlin is modelled using dialogue objects which represent speech acts and speech act sequences. The dialogue objects are structured in terms of parameters that represent their properties and relations. A dialogue manager, which is the major controlling module of the system, records instances of dialogue objects as nodes of a dialogue tree as the interaction proceeds. The dialogue tree constitutes the global context of the dialogue.

The dialogue objects are divided into three main classes on the basis of structural complexity. There is one class corresponding to the size of a dialogue and another corresponding to the size of a discourse segment (cf. Grosz & Sidner, 1986). An initiative-response (IR) structure is assumed (cf. adjacency pairs; Schegloff & Sacks, 1973) where an initiative opens a segment by introducing a new goal and the response closes the segment (Dahlbäck, 1991). The third class corresponds to the size of a single speech act, or dialogue move. Thus, a dialogue is structured in terms of discourse segments, and a discourse segment in terms of moves and embedded segments. Utterances are not analysed as dialogue objects, but as linguistic objects which function as vehicles of one or more moves.[1]
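The three-level structure can be pictured with a small data model. The following is only a sketch under our own assumptions: the class names `Move`, `IRSegment` and `Dialogue`, their fields, and the rule that a new initiative is attached to the innermost open segment are illustrative and are not taken from the linlin implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Move:
    """A single speech act (dialogue move). Field names are hypothetical."""
    speaker: str  # e.g. "user" or "system"
    text: str


@dataclass
class IRSegment:
    """A discourse segment: an initiative opens it, a response closes it."""
    initiative: Move
    embedded: List["IRSegment"] = field(default_factory=list)
    response: Optional[Move] = None

    def is_closed(self) -> bool:
        return self.response is not None


@dataclass
class Dialogue:
    """Root of the dialogue tree; its children are the top-level segments."""
    segments: List[IRSegment] = field(default_factory=list)

    def innermost_open(self) -> Optional[IRSegment]:
        """Return the most deeply embedded segment that is still open."""
        if not self.segments or self.segments[-1].is_closed():
            return None
        segment = self.segments[-1]
        while segment.embedded and not segment.embedded[-1].is_closed():
            segment = segment.embedded[-1]
        return segment

    def open_segment(self, initiative: Move) -> IRSegment:
        """Record an initiative, embedding it if another segment is open."""
        segment = IRSegment(initiative)
        parent = self.innermost_open()
        if parent is None:
            self.segments.append(segment)
        else:
            parent.embedded.append(segment)
        return segment

    def close_segment(self, response: Move) -> None:
        """Record a response as closing the innermost open segment."""
        segment = self.innermost_open()
        if segment is None:
            raise ValueError("no open segment to close")
        segment.response = response
```

In this sketch a clarification exchange that occurs before the answer to a task question ends up as an embedded segment of the task segment, which mirrors the description of segments consisting of moves and embedded segments.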

[1] The use of three levels for the hierarchical structuring of the dialogue is

2.1 Customization

Dialogue objects have been determined for three different applications on the basis of a corpus of 30 dialogues collected in Wizard of Oz experiments (cf. Dahlbäck, Jönsson, & Ahrenberg, 1993; Fraser & Gilbert, 1991). Ten dialogues were used for each application. In one application, bildata, the system is a database providing information on properties of second-hand cars. The other two applications are both concerned with the travel domain. In one of them, travel1, users can only gather information on charter trips to the Greek Archipelago, while in the other, travel2, they can also order such a charter trip.

For the purpose of customization, two kinds of information can be obtained from a corpus:

First, it can be used as a source of phenomena which the designer of the natural language interface was not aware of from the beginning, e.g. in the travel1 information system some users unexpectedly tried to make orders in spite of the fact that the system does not support it.

Second, it can be used to rule out phenomena that do not occur in the corpus, especially those requiring sophisticated reasoning to be handled correctly.

Another matter is whether the corpus should serve as the only resource for determining what to include in the system, or whether the corpus data needs to be augmented somehow. The first, rather extreme stand is taken by Kelley (1983), who proposes a method, the User-Derived Interface (UDI), for acquiring the lexical and grammatical knowledge of a natural language interface in six steps. The first two steps are mainly concerned with determining and implementing essential features of the application. In the third step, known as the first Wizard of Oz step, the subject interacts with what they believe is a natural language interface but which in fact is a human simulating such an interface. This provides data that are used to build a first version of the interface (step four). Kelley starts without grammar or lexicon. The rules and lexical entries are those used by the users during the simulation. In step five, Kelley improves his interface by conducting new Wizard of Oz simulations, this time with the interface running. However, when the user/subject enters a query that the system cannot handle, the wizard takes over and produces an appropriate response. The advantage is that the user's interaction is not interrupted and a more realistic dialogue is thus obtained. This interaction is logged and in step six the system is updated to be able to handle the situations where the wizard responded.

linlin is customized to a specific application using a process inspired by the method of User-Derived Interfaces. However, there is a drawback to Kelley's method, as a very large corpus is needed for coverage of the possible actions taken by a potential user (cf. Ogden, 1988).

Our approach is thus less extreme. If a phenomenon is present in the corpus then it should be included. If it is not present, but occurs in other studies using similar background systems and scenarios and implementation is straightforward, the system should be customized to deal with it. Otherwise, if it is not present and it would increase the complexity of the system, it is not included. Knowledge from other sources is also employed (cf. Grishman, Hirschman, & Nhan, 1986). In the customization of linlin for the bildata and travel systems, knowledge of how the database is organised and of how users retrieve information from databases is needed.
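The inclusion rule stated above can be written down as a small decision function. This is a sketch only: the function name and its three boolean parameters are our own, and the text leaves one case open (a phenomenon absent from both the corpus and other studies but easy to implement), which the sketch assumes is excluded.

```python
def include_phenomenon(in_corpus: bool,
                       in_similar_studies: bool,
                       straightforward: bool) -> bool:
    """Decide whether the customized system should handle a phenomenon.

    Hypothetical helper; parameter names are illustrative.
    """
    # Anything observed in the application's own corpus is included.
    if in_corpus:
        return True
    # Otherwise include it only if it is attested elsewhere for similar
    # background systems and scenarios AND is cheap to implement.
    if in_similar_studies and straightforward:
        return True
    # Assumption: every remaining case is left out.
    return False
```

For example, `include_phenomenon(False, True, False)` returns `False`: a phenomenon known only from other studies is dropped when handling it would increase the complexity of the system.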

3 Meeting requirements

In this section we discuss how the requirements listed in the introduction (habitability, efficiency, robustness and transparency) can be satisfied, in particular as they apply to the dialogue behaviour of the system.

3.1 Habitability

A dialogue object consists of parameters that specify different kinds of information. The parameters and their values can be modified for each new application, although some parameters are likely to be needed often. Customization of the Dialogue Manager to meet requirements on habitability involves two major tasks:

Defining focal parameters of the dialogue objects and customizing heuristic principles for changing the values of these parameters.

Constructing a dialogue grammar for controlling the dialogue, i.e. specifying parameters that determine what actions to take in different situations.

Two focal parameters, used in all three applications, are Objects and Properties. They are focal in the sense that they can be in focus over a sequence of segments. Their basic function is to represent the information structure of a move. Objects identify, via description or enumeration, a set of primary referents, and Properties identify a complex predicate ascribed to this set (cf. Ahrenberg, 1987).

Two principles for maintaining the focus structure are utilized. A general heuristic principle is that everything not changed in an utterance is copied from one IR-node in the dialogue tree to the newly created IR-node. Another principle is that the value for Objects will be updated with the value from the module accessing the database, if provided.
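The two principles can be illustrated as follows. This is a minimal sketch under our own assumptions: the `Focus` record, the function `next_focus` and its arguments are hypothetical names, and objects and properties are represented as plain strings for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Focus:
    """Focal parameters stored on one IR-node (field names are illustrative)."""
    objects: FrozenSet[str]
    properties: FrozenSet[str]


def next_focus(previous: Focus,
               objects: Optional[Iterable[str]] = None,
               properties: Optional[Iterable[str]] = None,
               db_objects: Optional[Iterable[str]] = None) -> Focus:
    """Compute the focus of a newly created IR-node.

    `objects` and `properties` are what the new utterance supplies (None if
    it does not mention them); `db_objects` is the value returned by the
    module accessing the database, if any.
    """
    # Copy principle: whatever the utterance leaves unchanged is inherited.
    new_objects = previous.objects if objects is None else frozenset(objects)
    new_properties = (previous.properties if properties is None
                      else frozenset(properties))
    # Database principle: a value provided by the database module
    # replaces the value of Objects.
    if db_objects is not None:
        new_objects = frozenset(db_objects)
    return Focus(new_objects, new_properties)
```

With a previous focus on one car and its price, a follow-up that only names a new property keeps the car in focus, so an elliptical question can be interpreted from the immediately preceding IR-node.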

Primary parameters for defining the dialogue grammar are Type and Topic. Type represents the illocutionary force of a move. Hayes and Reddy (1983, p. 266) identify two sub-goals in simple service systems: 1) "specify a parameter to the system" and 2) "obtain the specification of a parameter". Initiatives are categorized accordingly as being of two different types: 1) update, U, where users provide information to the system, and 2) question, Q, where users obtain information from the system. Responses are categorized as answer, A, for database answers from the system or answers to clarification requests. Other Type categories are Greeting, Farewell and Discourse Continuation (DC) (Dahlbäck, 1991). The latter type is used for system utterances that signal to the user that it is her turn.

Topic describes which knowledge source to consult. In our database applications three different topics are used: the background system for solving a task (T), the database model for queries about system properties (S) and, finally, the dialogue tree for clarifications relating to the interpretation of moves (D).

A number of other parameters describing speaker, hearer, utterance content and so on, are also used. Although they provide additional information for the dialogue manager, the structure of the dialogue is largely captured through the parameters Type and Topic.

3.1.1 The focus structure

In the bildata application, task-related questions are about cars. Thus, the Objects parameter holds descriptions of, or explicit lists of, cars, while the Properties parameter holds a set of car properties. In the travel systems, on the other hand, users switch their attention between objects of different kinds: hotels, resorts and trips. This requires a more complex behaviour of the Objects parameter and the use of domain knowledge for focus tracking (Jönsson, 1993b).

The general focusing principles need to be slightly modified to apply to the bildata and travel applications. For the bildata application the heuristic principles apply well to the Objects parameter. An intensionally specified object description provided in a user initiative will be replaced by the extensional specification provided by the module accessing the database. For the travel applications the principles for providing information to the Objects parameter are modified to allow hotels to be added if the resort remains the same.

For the bildata application the heuristic principles for the Properties parameter need to be modified. The modification is that if the user does not add new cars to Objects, then the attributes provided in the new user initiative are added to the old set of attributes. This is based on the observation that users often start with a rather large set of cars and compare them by gradually adding restrictions (cf. Kaplan, 1983), for instance using utterances like Remove all small-sized cars. We call such responses cumulative as they summarize all information obtained in a sequence of questions. For the travel applications the copy principle holds without exception.

Statistics on focusing heuristics

                                      bildata  travel1  travel2
Fully specified initiatives             52%      48%      47%
Initiatives requiring local context     40%      42%      45%
Initiatives requiring global context     4%       2%       5%

Table 1: User-initiatives classified according to context-dependence.

The results from the customizations showed that the heuristic principles worked well, minimizing the need to search the global context for referents of indexical constructions. See Table 1.
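The bildata modification of the Properties heuristic, which produces cumulative responses, can be sketched as below. The function name and the boolean flag `adds_new_cars` are our own, and the behaviour when new cars are introduced (fall back to the general copy principle) is an assumption drawn from the general principles rather than something stated for this case.

```python
from typing import Set


def bildata_properties(old: Set[str], new: Set[str], adds_new_cars: bool) -> Set[str]:
    """Return the Properties value for the new IR-node (illustrative only)."""
    if adds_new_cars:
        # Assumed fallback to the general principle: properties given in the
        # utterance replace the old ones; if none are given, copy the old set.
        return set(new) if new else set(old)
    # bildata modification: the same cars stay in focus, so the requested
    # attributes accumulate, giving one table with all information so far.
    return set(old) | set(new)
```

Asking first about price and then about rust for the same cars thus yields a response table with both columns.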

In the travel2 system there is one more object: the order form. A holiday trip is not fully defined by specifying a hotel at a resort. It also requires information concerning the actual trip: length of stay, departure date and so on. The order form is filled with user information during a system-controlled phase of the dialogue.

3.1.2 The dialogue structure

The dialogue structure parameters Type and Topic also require customization. In the bildata system the users never update the database with new information, but in the travel2 system, where ordering is allowed, the users update the order form. Here another Type is needed, CONF, which is used to close an ordering session by summarizing the order and implicitly prompting for confirmation. For the ordering phase the Topic parameter O, for order, is added, which means that the utterance affects the order form.

The resulting grammars from the customizations of all systems are quite simple. The most common segment consists of a task-related initiative followed by an answer from the database, Q_T/A_T,[2] sometimes with an embedded clarification sequence, Q_D/A_D. In bildata 60% of the initiatives start segments of this type. For travel, 83% of the initiatives in the non-ordering dialogues and 70% of the ordering dialogues are of this type. Other task-related initiatives result in a response providing system information, Q_T/A_S, or a response stating that the initiative was too vague, Q_T/A_D. There are also a number of explicit calls for system information, Q_S/A_S. See Table 2 for a summary of the statistics on dialogue structure.

[2] For brevity, when presenting the dialogue grammar, the Topic of a move is indicated as a subscript to the Type. Labels of IR-segments have the form of a pair of move labels separated by a slash (/).

Statistics on dialogue structure

                                    bildata  travel1  travel2
Number of rules                        15       12       14
Q_T/A_T                                60%      83%      70%
Q_D/A_D                                12%       2%       2%
Q_S/A_S                                 9%       2%       2%
Q_T/A_D                                 7%       2%       3%
Q_T/A_S                                 5%       6%       4%
Ordering rules                           -       1%      17%
Others, e.g. greetings, farewells       7%       4%       2%

Table 2: Types of dialogue segments and their relative frequency in three different applications.

In the work by Kelley on lexical and grammatical acquisition, the customization process was saturated after a certain number of dialogues. The results presented here indicate that this is the case also for the dialogue structure of our applications. From a rather limited number of dialogues, a context-free grammar can be constructed which, with a few generalizations, will cover the interaction patterns occurring in the actual application (Jönsson, 1993a).
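A grammar of this kind can be illustrated with a tiny recognizer over sequences of move labels. This is a sketch, not the grammar actually used: the label spelling (`Q_T`, `A_T`, ...), the set `ALLOWED_PAIRS` (read off the segment types in Table 2, leaving out ordering rules, greetings and farewells) and the decision to let any allowed segment be embedded are all our own simplifying assumptions.

```python
from typing import List, Optional

# Initiative/response pairs corresponding to the segment types in Table 2.
ALLOWED_PAIRS = {
    ("Q_T", "A_T"),
    ("Q_D", "A_D"),
    ("Q_S", "A_S"),
    ("Q_T", "A_D"),
    ("Q_T", "A_S"),
}


def parse_segment(moves: List[str], pos: int = 0) -> Optional[int]:
    """Parse one segment starting at `pos`.

    Grammar (hypothetical): Segment -> Initiative Segment* Response.
    Returns the index just past the segment, or None if no segment fits.
    """
    if pos >= len(moves) or not moves[pos].startswith("Q_"):
        return None
    initiative = moves[pos]
    pos += 1
    # Zero or more embedded segments, e.g. a clarification sub-dialogue.
    while pos < len(moves) and moves[pos].startswith("Q_"):
        end = parse_segment(moves, pos)
        if end is None:
            return None
        pos = end
    # The response must form an allowed pair with the opening initiative.
    if pos < len(moves) and (initiative, moves[pos]) in ALLOWED_PAIRS:
        return pos + 1
    return None


def is_well_formed(moves: List[str]) -> bool:
    """True if the whole move sequence is a series of complete segments."""
    pos = 0
    while pos < len(moves):
        end = parse_segment(moves, pos)
        if end is None:
            return False
        pos = end
    return True
```

A move that cannot be attached by such a recognizer corresponds to what the dialogue manager would treat as a move that is out of place.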

The IR-sequences found in the analysis of dialogue structure have a natural explanation if we consider the purpose of the system. Although the dialogue objects do not represent information on users' goals, it turns out that the user utterances can be classified into a few classes in goal-related terms. The segments can basically be divided into four classes, taking the user's initiative as the basis for the classification: (i) "proper" information requests that are satisfied by an answer with information from the database, (ii) successful queries about system properties, (iii) successful moves satisfying subordinate goals, such as greetings or discourse continuations, and (iv) initiatives that transgress the system's knowledge and which require robust error handling.

Note that we can view the taxonomy as a preference order. There is a basic division between the simple, normal segment (Q_T/A_T with two moves) and the other segments, which in one way or another indicate that we have left the normal smooth interaction and are doing something subordinate to the main purpose. From the point of view of efficiency the interaction is optimal when it consists of a sequence of Q_T/A_T-segments. If the user resorts to a Q_S, there is something about the system that she isn't aware of, but which she could be aware of. The third type is also of a subordinate, instrumental character, but really unnecessary for an experienced user. Uninterpretable moves are obviously the least preferred ones.

3.2 Optimal responses in normal mode

If the system determines that a certain user initiative is a complete and successful Q_T, the only remaining task is to derive an appropriate SQL query and respond with the tuples that are returned from the database. One problem that the system can encounter in this process is ambiguity: a certain word or expression may be translated to the query language in more than one way. For instance, in the bildata database there is information on acceleration and top speed, so it is not clear what a user has in mind if he asks whether a certain car make is fast (Sw. snabb), or requests to see a list of fast cars. Similarly, the adjective rymlig (Eng. spacious) may pertain to the boot or the space inside the car. In this case, one option is to enter into a clarification sub-dialog. However, this takes time and it may be quicker to provide information on all aspects that are potentially relevant. The user may easily ignore information if he is not interested.
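The strategy of answering with every potentially relevant aspect, rather than asking for clarification, might look as follows. Everything here is illustrative: the mapping `TERM_TO_COLUMNS`, the column names, the table name `cars` and the key column `make` are hypothetical and do not reflect the actual bildata schema.

```python
from typing import Dict, List

# Hypothetical mapping from an ambiguous word to every database column
# it could plausibly refer to.
TERM_TO_COLUMNS: Dict[str, List[str]] = {
    "snabb": ["acceleration", "top_speed"],        # "fast"
    "rymlig": ["boot_space", "interior_space"],    # "spacious"
}


def build_query(term: str, table: str = "cars", key: str = "make") -> str:
    """Build one SQL query that selects all readings of an ambiguous term."""
    if term not in TERM_TO_COLUMNS:
        # Only terms from the fixed mapping are turned into column names.
        raise ValueError(f"no column mapping for term: {term!r}")
    columns = [key] + TERM_TO_COLUMNS[term]
    return "SELECT " + ", ".join(columns) + " FROM " + table
```

Here `build_query("snabb")` produces a single query over both the acceleration and the top-speed column, so the user receives one table and can ignore the column that was not intended.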

A similar strategy is used if a question is about the same set of cars as a previous question (cf. 3.1.1). Then the information requested in the second question is added to that of the first. Some subjects actually mentioned this as a good feature of the system in their comments, as it facilitates comparisons and evaluations to have all relevant information in a single table.

In both of these cases the system provides more information to the user than she has actually asked for. Some other cases where this happens are described in the following section.

3.3 Communication of meta-knowledge

As regards the second type of user initiative, the system queries, there is first the problem of distinguishing them from the task-oriented queries. Descriptively, these queries have a different topic. The topic is recognized from the major constituents of the query, in particular main verbs and their complements.

Frequently the user starts a session by asking a question such as Vilka bilar finns? (What makes are there?) or Finns det priser på både nya och begagnade bilar? (Can you provide prices on both new and second-hand cars?) These questions are currently all answered by a pre-stored message giving a brief description of the contents of the database. This text usually gives more information than the user has asked for, and, as with the second example above, provides an answer only implicitly. However, it would easily be possible to make the responses more fine-grained given a sufficient empirical basis. Another option would be to display this text in a separate window on the screen so that the question need not arise as part of the dialogue.

The most common type of system query in the corpus concerns how information should be interpreted, e.g. Förklara siffrorna för rostbenägenhet. (Explain the numbers indicating susceptibility to rust). These are recognized on the basis of the occurrence of a predicate that indicates that the interpretation of a certain value, or type of value, is at stake. The answers are then not obtained from the bildata database, but from a special file forming a part of the database model. Thus, there is one text informing on how rust is evaluated and what the various numbers in the rust column mean, which is used as a response to all questions of type S relating to rust. It should be noted that the strategy of answering system queries by pre-stored standard answers has the effect that the analysis of a system question need not be as detailed as the analysis of a task-oriented question, thus contributing to efficiency.

3.4 Robustness

Although the model offers a general way to support robustness, i.e. that of engaging in a clarification sub-dialogue, this way is not always the best one. Many problems cannot be diagnosed correctly by the system, and other problems, such as mis-spellings, can be regarded as too small to warrant a clarification sub-dialog. Moreover, the model is restricted to information signalled verbally as part of an NL-dialog. But the system may communicate with the user by other means, e.g. in another window (pop-up window) or by non-verbal signals in the dialogue window. The purpose of this extra channel would be to give instructions and hints that enable the user to stay within the bounds of what the system allows with as little disturbance and time loss as possible. That is, it can be argued that it is preferable to solve problems by giving signals to the user that he need not respond to verbally, only interpret.

Below we discuss several types of transgressions that the user can make and discuss means to handle them. They relate to the lexicon, the sentence grammar, the dialogue grammar and the domain model, respectively.

It is common that the user enters a word that has no analysis in the lexicon. Considering the requirements on efficiency it is important that the transgression is detected as soon as possible. If lexical processing does not start until the user enters a carriage return, a lot of time may be wasted. The best one can achieve in this regard is to detect the mistake as soon as the user has pressed a character key that results in a substring with no continuation in the lexicon. As our current lexicon does not support error detection at this level, we go for the second best alternative, i.e. to detect errors when a word separator has been encountered. The error is presently signalled to the user by changing the font of the echoed input. There is no diagnosis performed by the system; this is a task given over to the user on the assumption that he knows better than the system what could be the trouble.[3] We believe that a non-verbal signal is more adequate than a verbal message. The important thing is to make the user aware of the lexical transgression while he is still engaged in writing.

[3] Of course, this need not always be the case and we are currently looking into ... However, under current circumstances, the user is better equipped to handle lexical transgressions.

Grammatical transgressions are handled similarly to lexical transgressions, paying due regard to early detection. How soon you can detect a grammatical error depends on the parsing technique you use. To increase efficiency we are using a left-to-right, incremental chart-parser, which means that parsing starts as soon as the user enters some input (Wirén, 1992; Wirén & Rönnquist, 1993). The time interval between the user's pressing the carriage return key and the moment when parsing is finished is thus reduced noticeably. For an input to receive a grammatical analysis it is necessary that every word of the input receives an analysis as part of some phrase, possibly as a direct descendant of the category spanning the whole input. Thus, if a word is entered that cannot combine with active edges to its left, nor yield a phrase that can combine with those edges (top-down filtering is used), then the system knows that no complete parse can be obtained. Thus, many grammatical transgressions can be detected fairly quickly. Currently there is no diagnosis but the transgression is signalled to the user as soon as it is detected by inverting the dialogue window. This shows the user that the input cannot be interpreted, although it does not tell him what is at fault. We are contemplating giving information of the following kinds: information about expressions that could follow strings that are left unextended at the relevant vertex, and lists of example sentences that are somehow similar to what the user has written, e.g. that contain the same initial words.

Dialogue grammar transgressions occur when the user enters something which the system can only interpret as a move which is out of place. For example, the user may interpret a system message as a question and enter "Yes", while the system intended it as a discourse continuation and expected a question. The transgression is detected when the dialogue manager tries to connect the user move with the current dialogue tree. It is signalled to the user by an explicit error message, which entails a superficial analysis of the error and brings recovery with it. The system takes no notice of the error, however, and does not enter the segment in the dialogue tree. This means that the segment is treated as if it had never occurred for the purpose of focus tracking and expectations, i.e. in the same fashion as a lexical or grammatical transgression. The dialogue continues in the same state as before the transgressive move.

Another type of transgression is when the user requests information that is not in the database, although the user seems to believe that it is. Some examples in the bildata domain concern the colour of cars, or variation in the number of doors. This type of error is detected by using information in the database model, which also should represent common knowledge of the domain, e.g. properties of cars that people like to inquire about. These properties can be added to the database model as they are discovered and be marked there as not corresponding to anything in the database. Thus, these transgressions are easily detected and diagnosed. The response is an A_S-move and is taken to be part of the dialogue, as the user is likely to refer to objects and properties mentioned in his request afterwards.
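Returning to lexical transgressions, the difference between the two detection points discussed earlier in this section (at the first keystroke with no continuation in the lexicon, versus at the next word separator) can be made concrete. This is a sketch under our own assumptions: the `Lexicon` class, its prefix-set representation and the two function names are hypothetical and say nothing about how the linlin lexicon is organised.

```python
from typing import Iterable, Optional


class Lexicon:
    """Word list plus the set of all word prefixes (illustrative only)."""

    def __init__(self, words: Iterable[str]) -> None:
        self.words = set(words)
        self.prefixes = {w[:i] for w in self.words for i in range(1, len(w) + 1)}

    def has_continuation(self, prefix: str) -> bool:
        """True if some lexicon word starts with `prefix`."""
        return prefix in self.prefixes


def detect_per_keystroke(lexicon: Lexicon, text: str) -> Optional[int]:
    """Index of the keystroke at which a transgression becomes detectable
    when every character is checked; None if the input is fine so far."""
    word = ""
    for i, ch in enumerate(text):
        if ch.isspace():
            # A finished word must be a complete lexicon entry.
            if word and word not in lexicon.words:
                return i
            word = ""
        else:
            word += ch
            if not lexicon.has_continuation(word):
                return i
    return None


def detect_at_separator(lexicon: Lexicon, text: str) -> Optional[int]:
    """Index of the word separator at which an unknown word is detected
    when words are only checked once they are complete."""
    word = ""
    for i, ch in enumerate(text):
        if ch.isspace():
            if word and word not in lexicon.words:
                return i
            word = ""
        else:
            word += ch
    return None
```

For a lexicon containing visa, volvo and bilar and the input "visa vplvo bilar", the per-keystroke check reports the error at the second character of the misspelt word, while the separator-based check reports it only at the following space.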

A question may refer to a set of properties, or to several objects, only one of which is not in the database. The general strategy is then to filter out those properties and objects that there is no information on, but supply an answer for those that are correct. In addition to the retrieved table, an error message is included informing about the transgression. This is in line with the preference ordering of IR-segments: we prefer database information to be exchanged as much as possible.
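The filtering strategy can be sketched with a database model that marks which properties have a counterpart in the database. The dictionary `DB_MODEL`, its entries, the function name and the wording of the message are all hypothetical; treating a property that is entirely unknown to the model in the same way as one marked absent is a further simplifying assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical database model: True means the property is stored in the
# database, False means people ask about it but the database does not hold it.
DB_MODEL: Dict[str, bool] = {
    "price": True,
    "rust": True,
    "top_speed": True,
    "colour": False,
    "doors": False,
}


def split_request(properties: List[str]) -> Tuple[List[str], Optional[str]]:
    """Separate answerable properties from those the database lacks.

    Returns the properties to query and an error message (or None) that
    accompanies the retrieved table.
    """
    answerable = [p for p in properties if DB_MODEL.get(p, False)]
    missing = [p for p in properties if not DB_MODEL.get(p, False)]
    message = None
    if missing:
        message = "The database has no information on: " + ", ".join(missing)
    return answerable, message
```

A request for price and colour would then be answered with the price column together with a message about colour, so that as much database information as possible is still exchanged.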

4 A set of principles for design

In this section we ask whether the solutions to the specific problems that we have discussed conform to some general design principles. Indeed, we believe that they do. However, allowing users to participate in a coherent dialogue does not in itself guarantee that the requirements discussed above on habitability, robustness, transparency and efficiency are properly addressed. To do this we need to consider the overall design of the system.

4.1 The Sublanguage Principle

The language used in an interface dialogue is affected by several factors, e.g. that one participant is a human and the other a computer system, that the human partner is involved in a specific (work) task and that this task is related to a particular knowledge domain.

With reference to the well-known work on sublanguages (cf. Grishman & Kittredge, 1986) we refer to this guideline as

The Sublanguage Principle:

Restrict and adapt the linguistic and general knowledge of the system to that which is needed to support the users' tasks.

Apart from the general conclusion that we have a more restricted development task to cope with than full NL-understanding and -interaction, we also see it as a virtue to restrict the linguistic and conceptual knowledge of the system to that which is needed for the task at hand, since the interpretations that the system makes of the user's utterances can be fewer (less ambiguity) and more specific (domain-specific meanings and categorizations can be used).

The Sublanguage Principle raises the question of what the necessary requirements are. All aspects of linguistic communication, including interaction patterns and the use of indexical language, are assumed to depend on the application and domain. For this reason the interface system must be designed to facilitate customization to meet the needs of different applications. Moreover, we must find methods that allow us to determine what the needs of a given application are. One such method is Wizard of Oz simulation. Our results so far indicate that for simple service systems the answers can be obtained with reasonable effort (Jönsson, 1993a).

We also take The Sublanguage Principle to imply that the semantics of the language used is jointly provided by the background system and the dialogue acts that are meaningful to perform in the context of the application. Different applications put different demands on the dialogue acts. For instance, the travel2 application needs to handle orders, which normally is not required of applications where ordering is not allowed. However, the simulated dialogues showed that some users in the travel1 application also tried to place an order, although the background system did not support that. Consequently, the interface needed to be equipped with mechanisms for (i) recognizing orders, and (ii) explaining the limits of this application to the subjects as a response. Thus, customization and empirical investigations are necessary to make The Sublanguage Principle support habitability in practice.

Similarly, the requirements on focus structure can vary from one application to another. In the cars dialogues we have found that virtually all elided plural NPs and plural third person pronouns and demonstratives such as "them" and "those" refer to a set of cars focused in the previous segment. This is a fact that we have exploited in the interpreter of the cars system.
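A rule of this kind is simple to state in code. The following is only a sketch of the idea, assuming that the interpreter has already extracted the cars named explicitly in the utterance; the function name `focused_cars` and its arguments are our own invention and do not describe the actual interpreter.

```python
def focused_cars(mentioned_cars, previous_focus):
    """Return the set of cars that the current user move is about.

    mentioned_cars: cars named explicitly in the utterance (may be empty).
    previous_focus: cars focused in the previous dialogue segment.
    """
    if mentioned_cars:
        # Explicit mention establishes a new focus.
        return set(mentioned_cars)
    # Elided plural NP or plural anaphor such as "them" or "those":
    # inherit the focus of the previous segment.
    return set(previous_focus)
```

For a follow-up such as "How fast are they?", no cars are mentioned, so the cars of the preceding segment are used.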

4.2 The Asymmetry Principle

An obvious, but nevertheless important observation on NLIs and current NLP technology, is that the user and system have different abilities and knowledge. We note especially the following interesting asymmetries between the two participants of an NLI dialogue (cf. Dahlbäck, 1987; Dahlbäck & Hägglund, 1987; Jönsson & Dahlbäck, 1988):

The user reads and understands natural language at good speed; moreover, she is able to select information on the basis of relevance. In contrast, understanding is a hard task for the computer, not to mention the task of distinguishing important from irrelevant information.

The user is often slow at writing on the terminal. In contrast, the interface can display a lot of information on the screen in virtually no time at all.

The system's knowledge of the standard language and its common sense knowledge of the world is only partial, while the user's knowledge is much richer. This would be true as far as language is concerned, even if we did not adhere to the sublanguage principle. Moreover, it is difficult to make the system adapt to users with different linguistic and knowledge skills, while users generally have this capacity to adapt to the competence of others.

The guideline that can be derived from these observations is simply that you should look for solutions that take the strong and weak points of the two sides into consideration. More formally stated, we have

The Asymmetry Principle:

Design the interface in a way that exploits the relative strengths and circumvents the relative weaknesses of the two participants.

This principle must be used with some care, of course, and we must recognize that technical developments may make NLIs more powerful than they are today. For the present we generally want to minimize the inference tasks that the system has to cope with, such as diagnosing users' errors or inferring accurate representations of the user's goals and beliefs, and instead rely on simpler means that clearly reveal the capabilities and limitations of the system, e.g. by providing examples of what the system can understand or by displaying help messages.

Another important consequence of this principle is that if there is a choice, prefer solutions that make the user learn from system contributions to solutions that require the system to learn from the user's contributions.

Answers are pre-stored and retrieved on the basis of the semantic representation of the query. The answer usually gives more information than the user has asked for (cf. The Quantity Principle below). Another option would be to display this information in a separate window on the screen so that the question need not arise as part of the dialogue.

As previously mentioned, the most common type of query relating to system properties in the corpus concerns how displayed information should be interpreted, e.g.:

User: Explain the numbers indicating susceptibility to rust.

System: The evaluation of rust susceptibility is based on information given by car owners as regards the corrosion of their cars. The ranking has been based on the following classes:

1. Much worse than the average car
2. Worse than the average car
3. Average
4. Better than the average car
5. Much better than the average car

The pre-stored messages presented from the system are often more complicated than one would expect that the parsing modules are able to interpret. The design recommendations for natural language interfaces presented by Ogden (1988) suggest that the feedback provided by the system must be allowable as input expressions, as there are studies showing that users often echo the form of the feedback (cf. Zoltan-Ford, 1984, 1991). However, the system performs a different communicative act when informing the user about properties of the system than when paraphrasing a user question, a kind of act that the user herself never performs. Consequently, users do not try to mimic the system's rather verbose messages explaining properties of the system when retrieving information from the background system. This observation is not only true for this kind of system message, but seems to generalize to other kinds of pre-stored answers that stylistically deviate from the rest of the dialogue. We have for instance found that the users of the travel systems treat the long stretches of canned text with travel agency resort information differently from other system utterances, by rarely referring to contents of parts of the text using pronouns or other indexical expressions. They are, in a sense, encapsulated within the dialogue (Dahlbäck, 1991).

Another application of The Asymmetry Principle is the use of system initiatives to support efficiency. If the system does not only respond to users' inputs, but also occasionally takes the initiative, the interaction can be brought to an end more quickly. Also, the interpretation of user utterances becomes simpler. Users often respond to a system initiative with a sentence fragment; for instance, if the system asks "How long do you want to stay?", a typical user answer could be the fragment "Two weeks". The interpretation of such a fragment is computationally easy to handle, as the system knows what type of user input to expect. System initiatives also support transparency, as they tell the user what information is considered important in a particular application.
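Why a fragment is easy to interpret after a system initiative can be shown with a short sketch. It assumes that each system question is tagged with the type of answer it expects; the tables and function names (`NUMBER_WORDS`, `UNIT_DAYS`, `parse_duration`, `interpret_reply`) are hypothetical and cover only a handful of words.

```python
import re

# Small, hypothetical lexicons; a real system would need far more entries.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}
UNIT_DAYS = {"day": 1, "days": 1, "week": 7, "weeks": 7}


def parse_duration(fragment):
    """Interpret a fragment as a number of days, or return None."""
    words = re.findall(r"[a-z0-9]+", fragment.lower())
    if len(words) != 2 or words[1] not in UNIT_DAYS:
        return None
    count = NUMBER_WORDS.get(words[0])
    if count is None and words[0].isdigit():
        count = int(words[0])
    if count is None:
        return None
    return count * UNIT_DAYS[words[1]]


# One parser per expected answer type; the system initiative selects it.
EXPECTATION_PARSERS = {"duration": parse_duration}


def interpret_reply(expected_type, fragment):
    """Interpret a user fragment in the light of what the system asked for."""
    return EXPECTATION_PARSERS[expected_type](fragment)
```

Since the question "How long do you want to stay?" fixes the expected type to a duration, the reply "Two weeks" needs no full parse; only the small duration parser is consulted.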

System responses and initiatives need not be restricted to the verbal channel. The system may communicate with the user by other means, e.g. in another window (pop-up window) or by non-verbal signals in the dialogue window. For example, in the cars system, misspellings are signalled to the user by echoing input in a different font than normally. Generally speaking, the purpose of the extra channel is to give instructions and hints that enable the user to stay within the bounds of what the system allows with as little disturbance and time loss as possible. That is, it can be argued that it is preferable to solve problems by giving signals to the user that he need not respond to verbally, only interpret.

The Asymmetry Principle also implies that the way of expressing messages need not be the same for both participants. Answering a query with tables instead of with written text is simple to achieve for a computer, and is often preferred to a textual description of the contents of the table. The users of our cars system, where such an interaction was used, gave a very positive evaluation of this way of presenting the information, presumably since it is easier to read and simplifies the user's decision, e.g. selecting the optimal solution based on the integration of the values on a number of parameters.

4.3 The Quantity Principle

A frequent interpretation problem in natural-language interfaces is ambiguity: a certain word or expression may be translated to the query language in more than one way. For instance, in the cars database there is information on acceleration and top speed, so it is not clear what a user has in mind if she asks whether a certain car make is fast, or requests to see a list of fast cars. Similarly, the adjective spacious may pertain to the boot or the space inside the car. In this case, one option is to enter into a clarification sub-dialogue. However, this takes time and it may be quicker to provide information on all aspects that are potentially relevant. The user may easily ignore information, if she is not interested.

A similar strategy is used if a question is about the same set of cars as a previous question. Then the information requested in the second question is added to that of the first. Although this obviously deviates from how a question is normally answered in a human conversation, some subjects actually mentioned this as a good feature of the system in their comments, as it facilitates comparisons and evaluations to have all relevant information in a single table, instead of having to integrate information from two displays, or having to memorize the contents of the previous answer.
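Both strategies, answering an ambiguous term with all candidate properties and extending the previous table when the set of cars is unchanged, can be sketched together. The mapping `AMBIGUOUS`, the column names and the function `next_table` are assumptions made for the sake of the example.

```python
# Hypothetical mapping from an ambiguous term to every database property
# it might refer to; unambiguous terms map to themselves.
AMBIGUOUS = {"fast": ["acceleration", "top_speed"]}


def columns_for(term):
    """Return all columns that are potentially relevant for a term."""
    return AMBIGUOUS.get(term, [term])


def next_table(previous, cars, term):
    """Decide the cars and columns of the next answer table.

    previous: (cars, columns) of the last answer, or None.
    """
    new_columns = columns_for(term)
    if previous is not None and set(previous[0]) == set(cars):
        # Same set of cars as before: add the new columns to the old ones.
        old_columns = list(previous[1])
        extra = [c for c in new_columns if c not in old_columns]
        return list(cars), old_columns + extra
    # Different cars: start a fresh table.
    return list(cars), new_columns
```

A question about "fast" cars thus yields both acceleration and top speed, and a subsequent question about the price of the same cars gives a single table with all three columns.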

These particular solutions are motivated by another design principle, which we term

The Quantity Principle:

The system may give more information to the user than has actually been requested provided it is potentially relevant and does not overload the user.

Another case where the principle was applied was in the design of answers to queries about system properties, as illustrated in the previous section. The principle can be motivated from what we know about the average user: she has the ability to select (within limits, of course) what is relevant for her. The situation is similar to the one for information in tabular form: selection does not require excessive reading.

The Quantity Principle generally increases speed and supports robustness, while there may be a conflict with transparency, if the principle is not applied with care. But, as mentioned previously, users seem to be able to distinguish between different kinds of system utterances, and do not expect the system to accept as input everything it produces as output. We also note that it seemingly contradicts Grice's (1975) second maxim of quantity: "Do not make your contribution more informative than is required". But then, Grice did not have human-computer dialogue in mind when he stated it, and, in fact, suggested that it could be subordinated to the maxim of relevance.

example where strategies are used that conform to The Quantity Principle. genie analyses user goals not only to produce accurate answers to a user query, but also to introduce new information related to the answer. For instance, to the query "How can I send a message to someone?" the system does not only answer with information on how to send e-mail to one person but also adds "Also, you can send mail to a group. To specify a group, separate the addresses with commas." (ibid. p. 168)

5 Generalization to other kinds of interfaces?

All the examples in the paper have been taken from one kind of interface and two kinds of system, teletype NLIs in conjunction with DB information retrieval or information retrieval and ordering. The obvious question is to what extent the suggested principles apply to other kinds of man-machine interaction.

Looking first at other kinds of systems using typed interaction, we believe that the principles apply here too, though their realization will take different forms. The interface for the sherlock tutoring system (Lemaire & Moore, 1994) provides an illustration of this. They found in their analysis of human student-tutor interactions that tutors frequently refer to what they have said in a previous explanation. Another observation was that students frequently ask questions that refer to the tutor's previous explanations, mainly to request comparisons between the current and previous situation. Deciding not to mimic the human-human interaction, but instead exploit the unique attributes of the computer to make use of the dialogue history, the system can not only refer to previous explanations but also allow the user to visualize what has been said. We take this as an excellent application of The Asymmetry Principle in a domain different from ours.

Concerning spoken instead of written interaction, we believe that The Sublanguage Principle and The Asymmetry Principle are as valid for speech interfaces as they are for teletype interfaces. However, for speech output The Quantity Principle cannot be applied, for the simple reason that the user cannot receive and filter information with the same ease in the auditory channel as in the visual channel. However, when the screen is used to output text or graphics to the user, the principle applies with the same force.

Multi-modal interfaces provide means to utilize the Asymmetry and Quantity principles further, for instance using pop-up windows to communicate system information and hints, spoken output of error messages, text or graphics output combined with speech input etc. Examples of systems which use a variety of modalities for both interpretation and generation include AlFresco (Stock, 1991), XTRA (Wahlster, 1991), Voyager (Zue, 1994) and cubricon (Neal & Shapiro, 1991).

conventional natural language interfaces is their ability to use a combination of input and output modalities such as speech, graphics, pointing and video output. Thus, more advanced interpretation and generation modules are required and principles for determining which media to utilize are needed (Arens, Hovy, & Vossers, 1993).

However, the dialogue and focus structures need not necessarily be more complicated. For instance, Voyager successfully utilizes the local context approach presented here (Seneff, 1992). Bilange (1991) presents a dialogue grammar for dialogue management for telephone interaction with an information retrieval application. Sitter and Stein (1992) present a model based on possible sequences of dialogue acts which are modeled in a transition network. In Stein and Maier (1993) and Stein and Thiel (1993) that model is extended to handle multi-modal interaction as utilized in the MERIT system (Stein, Thiel, & Tißen, 1992). Traum and Hinkelman (1992) present a finite state machine for recognizing discourse units, i.e. units comprising more than a single utterance but conveying one mutually understood intent. The conversation acts proposed by Traum and Hinkelman (1992) cover discourse obligations and are to be used in conjunction with beliefs and desires (cf. Traum & Allen, 1994) in a task-oriented spoken dialogue system. Another system is the Vehicle Navigation System (Novick & Sutton, 1994) where users can receive driving directions a step at a time by cellular telephone. This type of interaction requires a system capable of recognizing various acknowledgment acts and determining their applicability. A grammar based model for this is defined where exchanges are interpreted in terms of speech acts.

Thus, it seems that for simple service systems, the dialogue model presented here will be sufficient. However, for task-oriented dialogues, where the user's task directs the dialogue (Loo & Bego, 1993), a model of this and the user's goals need to be consulted in order to provide user-friendly interaction (cf. Burger & Marshall, 1993). This does not imply the necessity of a sophisticated model based on the user's intentions. A hierarchical structure of plans based on the various tasks possible to carry out in the domain might do just as well (cf. Wahlster, André, Finkler, Profitlich, & Rist, 1993).

6 Conclusion

The proposed method of customization and the design principles should be regarded as tentative as no real users have actually put their hands on our system. Yet, taken together, the pragmatic observations on NLIs and the data from NLI dialogues that we have analysed suggest to us that many features, e.g. user models and general text inference components, that have been argued are important for good functionality of NLIs are not generally required for simple service applications. Instead our principles take us in another direction. Rather than modelling the user we prefer the system to present a good model of itself (The Asymmetry Principle) and rather than inferring interpretations we rely on application-specific meanings (The Sublanguage Principle). To put it in a slogan-like phrase: if there is a choice, prefer global pragmatics at design time to local pragmatics at run-time.

Acknowledgements

This work results from a project on Dynamic Natural-Language Understanding supported by The Swedish Council of Research in the Humanities and Social Sciences (HSFR) and The Swedish National Board for Industrial and Technical Development (NUTEK) in their joint Research Program for Language Technology. We are indebted to the other members of NLPLAB for many valuable discussions on the topics of this paper.

References

Ahrenberg, L. (1987). Interrogative Structures of Swedish. Aspects of the Relation between grammar and speech acts. Ph.D. thesis, Uppsala University.

Ahrenberg, L., Jönsson, A., & Dahlbäck, N. (1990). Discourse representation and discourse management for natural language interfaces. In Proceedings of the Second Nordic Conference on Text Comprehension in Man and Machine, Täby, Sweden.

Arens, Y., Hovy, E., & Vossers, M. (1993). On the knowledge underlying multimedia presentations. In Maybury, M. T. (Ed.), Intelligent Multimedia Interfaces, pp. 280–306. MIT Press.

Bilange, E. (1991). A task independent oral dialogue model. In Proceedings of the Fifth Conference of the European Chapter of the Association for Computational Linguistics, Berlin.

Burger, J. D. & Marshall, R. J. (1993). The application of natural language models to intelligent multimedia. In Maybury, M. T. (Ed.), Intelligent Multimedia Interfaces, pp. 174–196. MIT Press.

Dahlbäck, N. & Jönsson, A. (1992). An empirically based computationally tractable dialogue model. In Proceedings of the Fourteenth Annual Meeting of The Cognitive Science Society, Bloomington, Indiana.

Dahlbäck, N. (1987). Kommunikation med datorer i naturligt språk - vad är det och vem behöver det?. Tech. rep. LITH-IDA-R-87-15, Department of Computer and Information Science, Linköping University.

Dahlbäck, N. (1991). Representations of Discourse, Cognitive and Computational Aspects. Ph.D. thesis, Linköping University.

Dahlbäck, N. & Hägglund, S. (1987). Människa och datorsystem i samverkan. Tech. rep. MDA-rapport 1987:16, Arbetsmiljöfonden and Styrelsen för teknisk utveckling.

Dahlbäck, N., Jönsson, A., & Ahrenberg, L. (1993). Wizard of Oz studies – why and how. Knowledge-Based Systems, 6(4), 258–266.

Fraser, N. & Gilbert, N. S. (1991). Simulating speech systems. Computer Speech and Language, 5, 81–99.

Grice, P. H. (1975). Logic and conversation. In Cole, P. & Morgan, J. L. (Eds.), Syntax and Semantics (vol. 3) Speech Acts. Academic Press.

Grishman, R. & Kittredge, R. I. (1986). Analysing language in restricted domains. Lawrence Erlbaum.

Grishman, R., Hirschman, L., & Nhan, N. T. (1986). Discovery procedures for sublanguage selectional patterns: initial experiments. Computational Linguistics, 12(3), 205–215.

Grosz, B. J. & Sidner, C. L. (1986). Attention, intention and the structure of discourse. Computational Linguistics, 12(3), 175–204.

Hayes, P. J. & Reddy, D. R. (1983). Steps toward graceful interaction in spoken and written man-machine communication. International Journal of Man-Machine Studies, 19, 231–284.

Ingels, P. (1993). Robust parsing with charts and relaxation. In NODALIDA '93: Proceedings from 9:e Nordiska Datalingvistikdagarna, Stockholm, June 1993.

Jönsson, A. (1991). A dialogue manager using initiative-response units and distributed control. In Proceedings of the Fifth Conference of the European Chapter of the Association for Computational Linguistics, Berlin.

Jönsson, A. (1993a). Dialogue Management for Natural Language Interfaces – An Empirical Approach. Ph.D. thesis, Linköping University.

Jönsson, A. (1993b). A method for development of dialogue managers for natural language interfaces. In Proceedings of the Eleventh National Conference of Artificial Intelligence, Washington DC, pp. 190–195.

Jönsson, A. & Dahlbäck, N. (1988). Talking to a computer is not like talking to your best friend. In Proceedings of the First Scandinavian Conference on Artificial Intelligence, Tromsø.

Kaplan, S. J. (1983). Cooperative responses from a portable natural language database query system. In Computational Models of Discourse, pp. 167–208. MIT Press.

Kelley, J. F. (1983). An empirical methodology for writing user-friendly natural language computer applications. In Proceedings of CHI'83, pp. 193–196.

Lemaire, B. & Moore, J. (1994). An improved interface for tutorial dialogues: browsing a visual dialogue history. In Proceedings of CHI'94, Boston.

Loo, W. V. & Bego, H. (1993). Agent tasks and dialogue management. In Workshop on Pragmatics in Dialogue, The XIV:th Scandinavian Conference of Linguistics and the VIII:th Conference of Nordic and General Linguistics, Göteborg, Sweden.

Neal, J. G. & Shapiro, S. C. (1991). Intelligent multi-media interface technology. In Sullivan, J. W. & Tyler, S. W. (Eds.), Intelligent User Interfaces. ACM Press, Addison-Wesley.

Novick, D. G. & Sutton, S. (1994). An empirical model of acknowledgement for spoken-language systems. In Proceedings of the 32nd Conference of the Association for Computational Linguistics, New Mexico.

Ogden, W. C. (1988). Using natural language interfaces. In Helander, M. (Ed.), Handbook of Human-Computer Interaction. Elsevier Science Publishers B. V. (North Holland).

Schegloff, E. A. & Sacks, H. (1973). Opening up closings. Semiotica, 7, 289–327.

Seneff, S. (1992). A relaxation method for understanding spontaneous speech utterances. In Paper presented at the Fifth DARPA Workshop on Speech and Natural Language.

Sitter, S. & Stein, A. (1992). Modeling the illocutionary aspects of information-seeking dialogues. Information Processing & Management, 28(2), 165–180.

Stein, A. & Maier, E. (1993). Modeling and guiding cooperative multimodal dialogues. In Proceedings of the AAAI Fall Symposium '93 on Human-Computer Collaboration: Reconciling Theory, Synthesizing Practice, Raleigh, NC, U.S.A.

Stein, A. & Thiel, U. (1993). A conversational model of multi-modal interaction in information systems. In Proceedings of the Eleventh National Conference of Artificial Intelligence, Washington DC, pp. 283–288.

Stein, A., Thiel, U., & Tißen, A. (1992). Knowledge based control of visual dialogues in information systems. In Proceedings of the 1st International Workshop on Advanced Visual Interfaces, Rome, Italy.

Stock, O. (1991). Natural language exploration of an information space: the alfresco interactive system. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, Sydney, Australia, pp. 972–978.

Traum, D. R. & Allen, J. F. (1994). Discourse obligations in dialogue processing. In Proceedings of the 32nd Conference of the Association for Computational Linguistics, New Mexico, pp. 1–8.

Traum, D. R. & Hinkelman, E. A. (1992). Conversation acts in task-oriented spoken dialogue. Computational Intelligence, 8(3), 575–599.

Wahlster, W. (1991). User and discourse models for multimodal communication. In Sullivan, J. W. & Tyler, S. W. (Eds.), Intelligent User Interfaces. ACM Press, Addison-Wesley.

Wahlster, W., André, E., Finkler, W., Profitlich, H.-J., & Rist, T. (1993). Plan-based integration of natural language and graphics generation. Artificial Intelligence, 63, 387–427.

Watt, W. C. (1968). Habitability. American Documentation, July, 338–351.

Wirén, M. (1992). Studies in Incremental Natural Language Analysis. Ph.D. thesis, Linköping University.

Wirén, M. & Rönnquist, R. (1993). Fully incremental chart-parsing. In Third International Workshop on Parsing Technologies, Tilburg, The Netherlands and Durbuy, Belgium.

Wolz, U. (1993). Providing opportunistic enrichment in customized on-line assistance. In Proceedings from the 1993 International Workshop on Intelligent User Interfaces, Orlando, Florida.

Zoltan-Ford, E. (1984). Reducing variability in natural-language interactions with computers. In Proceedings of the Human Factors Society 28th Annual Meeting, Santa Monica, CA, pp. 768–772.

Zoltan-Ford, E. (1991). How to get people to say and type what computers can understand. International Journal of Man-Machine Studies, 34, 527–547.

Zue, V. W. (1994). Toward systems that understand spoken language. IEEE Expert, 9, 51–59.

Table 1: User-initiatives classified according to context-dependence.
Table 2: Types of dialogue segments and their relative frequency in three different applications.
