Dialogue model specification and
interpretation for interacting
with service robots
Luis Pineda
and
The Golem group
[email protected]
http://leibniz.iimas.unam.mx/~luis
IIMAS, UNAM, México
September, 2009
The Golem project team
(current)
Luis A. Pineda
Héctor Avilés
Paty Pérez Pavón
Ivan Meza
Wendy Aguilar
Montserrat Alvarado
External collaborators
Students
Dr. Luis Pineda, IIMAS, UNAM, 2009
Content
Introduction
A multimodal architecture for service robots
Dialogue models
Implementation
Applications:
– Tell me about the magazine!
– Guess the card!
– A guided tour with pointing acts
Conclusion and future work
Main assumptions…
The interaction cycle must be tight!
– The agent needs to keep in touch with the world!
– Only reactive behavior?
Human-level communication (e.g. coordinated language and vision) needs to rely on symbolic representations
The objects of representation are interpretations (not the world directly!)
Perception is guided by intentions and memory
Interpretation depends on the context
The Context
Every interpretation act is performed in context:
– A spatial and time situation (indexical)
– A set of agents (I, you…also indices)
– A discourse or interaction history (anaphoric)
– A conceptual domain
– A set of potential intention and action types that
can be expressed by the agents during the
interaction
The context is a big thing!
Can it be modelled explicitly in a simple way?
Agent’s expectations
Linguistic expectations (communicative)
– What are the speech acts that can be
expressed by interlocutors in specific
conversational situations?
Environmental expectations (without intent)
– What events are expected to appear in the
situation that are perceived by the different
senses?
How to deal with the unexpected?
A three-layer architecture
[Figure: three stacked layers labelled Representation, Interpretation and Recognition; a second version of the slide adds the label Perception]
Contextual knowledge (modality independent) + Inference
Modality specific, or a combination of specific modalities
Modality specific images…
Expectations driven interpretation
[Figure labels: Expectations (the context), Memory, Interpretation, Recognition, Representation]
An interpretation act…
[Figure: the message/stimulus enters Recognition, which outputs an uninterpreted image! Other labels: Contextual expectations, Memory, Interpretation]
Memory is indexed by expectations!
Memory retrieval
A qualitative match
[Figure labels: Contextual expectations (indices), Memory, Interpretation, Recognition, Uninterpreted image!]
The elements of interpretation…
1) The stimulus
2) The expectations
3) Memory
Interpretation relative to the context
Interpreted message/expectation!
[Figure labels: Expectations, Memory, Recognition, Interpretation, Representation (selected intention/expectation)]
The objects of representation are interpretations!
Interpretation Examples:
Linguistic & visual
[Figure labels: Expectations, Interpretation, Recognition; Interpretations; Representations of the world (aspects of it)]
Linguistic Interpretation
Spoken recognition…
The text is an uninterpreted image!
[Figure: Recognition maps the spoken expression to recognized text. Other labels: Expected intentions, Memory, Interpretation]
The memory objects…
[Figure labels: Expected intentions (indices), Memory, Interpretation, Recognition, Text]
Expected intentions index regular expressions:
Expected Int_i --> Reg. Exp_i
The qualitative match…
Selected intentions are those whose associated Reg. Exp. is satisfied by the text (i.e. grep)
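The selection rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Golem implementation: the intention names ok and no appear in the dialogue models of these slides, but the MEMORY table, the patterns and the function select_intentions are hypothetical and invented for the example.

```python
import re

# Hypothetical memory: each expected intention indexes one regular expression.
# The patterns are made up for illustration only.
MEMORY = {
    "ok": re.compile(r"\b(yes|ok|okay|sure)\b", re.IGNORECASE),
    "no": re.compile(r"\b(no|nope)\b", re.IGNORECASE),
}


def select_intentions(text, expected):
    """Return the expected intentions whose regular expression matches the text.

    Only intentions listed in `expected` are tried, so the current
    situation restricts which memory entries are consulted at all.
    """
    return [name for name in expected
            if name in MEMORY and MEMORY[name].search(text)]


if __name__ == "__main__":
    print(select_intentions("No, thank you", ["ok", "no"]))
```

An empty result means that none of the expected intentions was recognized, which is the case the slides later call "the unexpected".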
The output of the interpretation…
The intended speech act!
Interpreted spoken input!
[Figure labels: Expectations, Memory, Recognition, Interpretation, Representation in relation to the context: selected intention (grounded)]
Memory may not be required!
[Figure: the expectation name-of-visitor(X) and the spoken input “Hum, my name… I’m Peter” go through a semantic parser, giving name-of-visitor(Peter)]
Visual Interpretation
[Figure labels: Visual expectation, Memory, Interpretation, Recognition, Concrete visual image]
Spoken recognition…
The memory objects…
[Figure labels for both slides: Visual expectations (indices), Memory, Interpretation, Recognition, SIFT features]
The qualitative match…
Expectations index images
[Figure labels: Expectations, Memory, Interpretation, Recognition, SIFT vector, Selected visual expectation]
Selected expectations are those whose associated vector is the closest to the vector of the external image
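The closest-vector rule can be sketched as a nearest-neighbour search restricted to the current expectations. This is a simplified illustration: it follows the slide in using one vector per expected image, whereas SIFT normally yields many descriptors per image. The names memory, distance and select_visual_expectation are hypothetical, and the vectors are toy values.

```python
import math


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_visual_expectation(observed, expected, memory):
    """Return the expected image whose stored vector is closest to `observed`.

    Only the images named in `expected` are compared, mirroring the idea
    that memory is indexed by the expectations of the current situation.
    """
    candidates = [name for name in expected if name in memory]
    if not candidates:
        return None
    return min(candidates, key=lambda name: distance(memory[name], observed))


# Toy two-dimensional vectors standing in for real feature vectors.
memory = {
    "poster_a": [0.0, 1.0],
    "poster_b": [1.0, 0.0],
    "poster_c": [1.0, 1.0],
}

if __name__ == "__main__":
    print(select_visual_expectation([0.9, 0.1],
                                    ["poster_a", "poster_b", "poster_c"],
                                    memory))
```

A practical system would probably also apply a distance threshold, so that an image far from every expectation is treated as unexpected instead of being forced onto the nearest candidate.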
Interpretation of the visual act
[Figure labels: Memory, Recognition, Interpretation, Representation in relation to the context: interpreted image (symbolic rep.)]
The full architecture: Interpretation and Action
The full architecture
[Figure labels: Representation, Interpretation, Memory, Recognition, Specification, Realization, Output actions, Modality specific]
Reactive behavior
Schematic behavior…
[Figure labels for both slides: Representation, Interpretation, Memory, Recognition, Specification, Realization]
Our current working setting…
[Figure labels: Representation, Interpretation, Memory, Recognition, Specification, Realization]
Dialogue Models
[Figure labels: Representation, Interpretation, Specification, Memory, Recognition, Realization]
Interactive situation
The conversational situation
– Information state
– Modality oriented
– Speech acts and expectations
– Rhetorical Acts: actions performed by the agent (spoken, motor, etc.)
Relative to a context!
The formalism:
Functional Recursive Transition Networks (F-RTN)
Conversational Situation
[Figure: chain of situations si-1, si, si+1 joined by arcs labelled SpAct:RhtAct; the label Input SpAct marks the incoming speech act]
Only a specific set of intentions is meaningful
Intention types
Listening
Visual
Telling
Recursive
Error
Final
Conversational Situation
[Figure: situations si-1, si, si+1 joined by arcs labelled SpAct:RhtAct; the label RhtAct also appears on its own]
Rhetorical acts (Actions)
Structure:
– Lists of basic acts
– Basic Acts: Modality specific actions
Function:
– Direct external behavior
– Can direct internal behavior (embedded in the interpretation state): reasoning, planning, theorem-proving, problem-solving
The Model
Conversational model:
– Set of dialogue models
Dialogue model:
– Set of situations
– Set of rhetorical acts
Declarative specification from a task analysis
A Dialogue Model
[Figure: transition network over the situations is, ls 1, ls 2, ls 3, rs 1, rs 2, rs 3 and fs. Arc labels: ε:ra1(tour), ok:ra2([ai, pr, pl]), ai:ra4(ai), pr:ra4(pr), pl:ra4(pl), ε:ra5(ai), ε:ra5(pr), ε:ra5(pl), ok:w, no:ra3(tour)]
A Recursive Situation
[Figure: the same network, with the same situations and arc labels]
Modality independent specification
A subordinated DM
[Figure: transition network over the situations is, ls1, ls2, ts 1, ts 2, ts 3, ms 1, rshelp and fs. Arc labels: ε:ma(ai), ok:ra1([per, area, proy]), per:ra2(per), area:ra2(area), proy:ra2(proy), ε:ra3(per), ε:ra3(area), ε:ra3(proy), no:ra4(ai), no:ra2(error), ok:w, ε]
Declarative specification of situations
[id ==> ls_2,
 type ==> listening,
 in_pairs ==> [ls_1 => ok:ra_2([ai,pr,pl]),
               ls_3 => ok:ra_2([ai,pr]),
               ls_3 => ok:ra_2([ai,pl]),
               ls_3 => ok:ra_2([pr,pl])],
 out_pairs ==> [ai:ra_4(ai) => rs_1,
                pr:ra_4(pr) => rs_2,
                pl:ra_4(pl) => rs_3,
                no:ra_3(tour) => fs],
 expected_intentions ==> [ai, pr, pl, no]
],
Basic expressive power:
Recursive Transition Networks
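To show how an interpreter might consume such a specification, the listening situation ls_2 above is transcribed here into a Python dictionary, with one transition step over it. This is only a sketch: the dictionary layout and the function step are assumptions made for illustration, the in_pairs are omitted, and the slides state that the actual interpreter is written in Prolog.

```python
# Hypothetical Python transcription of the situation ls_2.
# Each out pair maps a speech act to (rhetorical act, next situation).
SITUATIONS = {
    "ls_2": {
        "type": "listening",
        "expected_intentions": ["ai", "pr", "pl", "no"],
        "out_pairs": {
            "ai": ("ra_4(ai)", "rs_1"),
            "pr": ("ra_4(pr)", "rs_2"),
            "pl": ("ra_4(pl)", "rs_3"),
            "no": ("ra_3(tour)", "fs"),
        },
    },
}


def step(situation_id, speech_act):
    """Follow one arc of the network.

    Returns (rhetorical_act, next_situation) for an expected speech act,
    or None when the speech act is not expected in this situation.
    """
    situation = SITUATIONS[situation_id]
    if speech_act not in situation["expected_intentions"]:
        return None
    return situation["out_pairs"][speech_act]


if __name__ == "__main__":
    print(step("ls_2", "pr"))
```

Returning None for an unexpected speech act leaves room for a separate recovery step, so the network itself only describes the expected paths.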
Dialogue Manager = Interpreter program
[Figure labels: Dialogue Models (Recursive Transition Network), Speech Act Interpreter, Rhetorical Act, Rhetorical Act Interpreter, Basic Rhetorical Acts (Modality Specific), Spec. of Mod. Spec. Action]
Generation (NL & motor action)
Representational objects
(Speech acts, Rhetorical acts and Situations)
Constants
Grounded predicates
Variables
Predicates with free variables
Functions
Specification of actions
[Figure labels: Representation, Interpretation, Memory, Specification, Recognition, Realization]
Constants
si => a:b => si+1
Definite intentions and actions
Grounded Predicates
si => p(a,b):q(c,d) => si+1
Definite intentions and actions
Variables
(The indexical context)
si => x:p(x) => si+1
Abstract intentions and actions
Predicates with free variables
si => p(a,x):q(x) => si+1
Abstract intentions and actions
Collecting arguments from perception
Passing values from input to output
[Figure: the expectation p(a,x) is interpreted against the recognized input as p(a,b), and the value collected for x is passed on to the specified action q(b). Labels: Expectations, Memory, Recognition, Interpretation, Representation, Specification, Realization]
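The passing of values from input to output can be sketched as a small pattern match followed by a substitution. This is an illustrative simplification: terms are flat tuples, variables are strings starting with "?", and the helper names match and instantiate are hypothetical. A Prolog implementation would get this behavior from unification.

```python
def match(pattern, term):
    """Match a grounded term against a pattern with free variables.

    Terms are tuples such as ("p", "a", "b"); variables are strings that
    start with "?". Returns a dict of bindings, or None on mismatch.
    """
    if len(pattern) != len(term) or pattern[0] != term[0]:
        return None
    bindings = {}
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        if p_arg.startswith("?"):
            # A repeated variable must be bound to the same value each time.
            if bindings.setdefault(p_arg, t_arg) != t_arg:
                return None
        elif p_arg != t_arg:
            return None
    return bindings


def instantiate(pattern, bindings):
    """Replace the variables of a pattern by their bound values."""
    return tuple(bindings.get(arg, arg) for arg in pattern)


if __name__ == "__main__":
    # Expectation p(a,x) meets the interpreted input p(a,b) ...
    found = match(("p", "a", "?x"), ("p", "a", "b"))
    # ... and the value of x is passed on to the action q(x).
    print(instantiate(("q", "?x"), found))
```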
Situations can have parameters
si => p(a,x):q(x) => sj(x)
Abstract intentions, actions and situations
The interaction history
(anaphoric context)
si => p(a,b):q(c,d) => si+1
The full path is recorded once an arc has been traveled (i.e. grounded), for all arcs traveled:
dm1: si => p(a,b):q(c,d) => sj
The interaction history is a list of these terms
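A minimal sketch of such a history, assuming each traveled arc is stored as a record of five fields. The class InteractionHistory, the record type Arc and their method names are hypothetical; only the printed term format follows the slide.

```python
from collections import namedtuple

# One grounded arc: dialogue model, source situation, speech act,
# rhetorical act and target situation.
Arc = namedtuple("Arc", "model source speech_act rhetorical_act target")


class InteractionHistory:
    """An append-only list of the arcs traveled so far."""

    def __init__(self):
        self.arcs = []

    def record(self, model, source, speech_act, rhetorical_act, target):
        """Store an arc once it has been traveled (i.e. grounded)."""
        self.arcs.append(Arc(model, source, speech_act, rhetorical_act, target))

    def speech_acts(self):
        """All speech acts interpreted so far, in order."""
        return [arc.speech_act for arc in self.arcs]

    def render(self):
        """The history as terms in the notation used on the slide."""
        return [
            f"{a.model}: {a.source} => {a.speech_act}:{a.rhetorical_act} => {a.target}"
            for a in self.arcs
        ]


history = InteractionHistory()
history.record("dm1", "si", "p(a,b)", "q(c,d)", "sj")

if __name__ == "__main__":
    print(history.render())
```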
Functions
si => λX.f1(X):λY.f2(Y) => λZ.f3(Z)
Functional definition of intentions, actions and next situations: X, Y and Z depend on the interaction history, the current DM and the current situation.
Functional Actions
Suppose a scenario in which the
robot is giving a tour to a visitor
and there is a situation where it
has to make an offer, but taking
into account the offers it has
made before…
Functional Expression of Actions
si => _:λY.f2(Y) => si+1
Evaluation…
si => _:λY.f2(Y) = p(a,b,c) => si+1
Where p = do you want me to do …
Robot: Do you want me to do a or b or c?
User: Please, do b
But if we come again to this situation…
si => _:λY.f2(Y) = p(a,c) => si+1
Where p = do you want me to do …
Robot: Do you want me to do a or c?
User: Please, do a
Functional Definition of Expectations
Suppose the robot is expecting to see one among a number of posters at a specific situation during the interaction (at the same position)
Functional Definition of Expectations
si => λX.f1(X):_ => si+1
Evaluation…
si => λX.f1(X) = p(a, b, c):_ => si+1
Robot expects to see a poster a or b or c at the current situation
Robot recognizes poster a
But if we come to this situation again…
si => λX.f1(X) = p(b, c):_ => si+1
Robot focuses on recognizing b or c
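Both functional definitions above can be read as filtering a fixed set of options against the interaction history: the offer p(a,b,c) shrinks to p(a,c), and the expected posters p(a,b,c) shrink to p(b,c). The sketch below illustrates that reading. The function names remaining, offer and expected_posters are hypothetical, and the history is reduced to a plain list of items already chosen or seen.

```python
def remaining(options, history):
    """Options that do not yet appear in the interaction history."""
    return [option for option in options if option not in history]


def offer(options, history):
    """Plays the role of f2: the rhetorical act offering what is left."""
    left = remaining(options, history)
    if not left:
        return None  # nothing left to offer
    return "Do you want me to do " + " or ".join(left) + "?"


def expected_posters(posters, history):
    """Plays the role of f1: the posters the robot still expects to see."""
    return remaining(posters, history)


if __name__ == "__main__":
    print(offer(["a", "b", "c"], []))
    print(offer(["a", "b", "c"], ["b"]))
    print(expected_posters(["a", "b", "c"], ["a"]))
```

Keeping the option set fixed and computing the remainder on each visit means the situation itself never has to be rewritten as the dialogue progresses.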
The unexpected…
Every situation can be followed by a recursive situation embedding a recovery dialogue model
The information gathered is passed as a parameter to the recovery model
The recovery model may use the interaction history…
– You already saw this poster… do you want to see it again?
Functional Definition of Next Situations
Suppose the robot responds to up to four questions from the user for each poster… then he or she might get bored!
Functional Definition of Next Situations
si => _:_ => λZ.f3(Z)
Functional definition of next situations
Providing information…
si => _:_ => λZ.f3(Z) = si
Where si = si+1
Where the system offers to explain a topic of the current poster (not explained before!)
But after the fourth time…
si => _:_ => λZ.f3(Z) = sj
Where sj is a “continuation dialogue”:
Do you want to see another poster?
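The choice between staying in si and moving to the continuation dialogue sj can be sketched as a function of how many questions have been answered for the current poster. The name next_situation, its parameters and the way the count is obtained are assumptions for illustration. The limit of four comes from the scenario above.

```python
def next_situation(current, continuation, questions_answered, limit=4):
    """Plays the role of f3: choose the next situation from the history.

    `questions_answered` counts the questions answered so far for the
    current poster, including the one just answered.
    """
    if questions_answered < limit:
        return current       # keep providing information (si = si+1)
    return continuation      # hand over to the continuation dialogue sj


if __name__ == "__main__":
    for count in range(1, 6):
        print(count, next_situation("si", "sj", count))
```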
Declarative specification
[id ==> ls_2,
 type ==> listening,
 in_pairs ==> [ls_1 => λx.f(x):λy.g(y),
               ls_3 => ok:λx.h(x)],
 out_pairs ==> [λx.f(a,x):λy.h(y) => λz.rs(z),
                no:ra_3(tour) => fs],
],
Possible value types of functions…
Constants
Grounded predicates
Variables
Predicates with free variables
Functions
The full expressive power:
Functional Recursive Transition Networks (F-RTN)
Implementation
Agents environment:
– Open Agent Architecture (linux)
Dialogue models spec. & interpretation:
– Prolog
Speech recognition:
– Sphinx
– Acoustic models: The Corpus DIMEx100
Speech synthesis:
– Mbrola & Festival
Robot navigation:
– Player/Stage
Vision:
– OpenCV
– SIFT
Robot:
Magellan Pro (RWI)
Fixed Applications with PCs
A Bayesian Architecture
(with symbolic knowledge)
[Figure labels: Expectations p(E), Interpretation p(E/O), Recognition p(O/E), Memory]
Future work
Speech and language processing:
– More robust speech recognition (Language Models)
– More general NLP capabilities (e.g. semantic parsing)
– Richer linguistic explanations
Future work
Dialogue Models
– A new DM interpreter (more general)
– Introduce multimodal situations: listening and seeing
– Extend the range of speech acts that can be understood (grounding strategies, turn-taking strategies)
– More general recovery strategies in the DM
– Introduce a stochastic component in the F-RTN
– Introduce internal actions (i.e. “thinking”)
Future work
Vision and navigation
– More intelligent navigation (e.g. metric maps, qualitative plans and interaction cycles of move, see, ask)
– More varied recognition and action devices
– More control input devices handled from DM directly (contact, laser, ultrasonic, infrared, etc.)
Future work
Architecture
– Integration of reactive behavior
– Integration of schematic behavior
– Use a multimodal database to implement the memory component
Construct diverse applications with the same platform!
Many thanks!