Dialogue model specification and interpretation for interacting with service robots

Luis Pineda and The Golem group
[email protected]
http://leibniz.iimas.unam.mx/~luis
IIMAS, UNAM, México
September, 2009

The Golem project team (current)
– Luis A. Pineda
– Héctor Avilés
– Paty Pérez Pavón
– Ivan Meza
– Wendy Aguilar
– Montserrat Alvarado
– External collaborators
– Students

Dr. Luis Pineda, IIMAS, UNAM, 2009

Content
– Introduction
– A multimodal architecture for service robots
– Dialogue models
– Implementation
– Applications:
  – Tell me about the magazine!
  – Guess the card!
  – A guided tour with pointing acts
– Conclusion and future work

The Context

Main assumptions…
– The interaction cycle must be tight!
  – The agent needs to keep in touch with the world!
  – Only reactive behavior?
– Human-level communication (e.g. coordinated language and vision) needs to rely on symbolic representations
– The objects of representation are interpretations (not the world directly!)
– Perception is guided by intentions and memory
– Interpretation depends on the context

Every interpretation act is performed in context:
– A spatial and temporal situation (indexical)
– A set of agents (I, you… also indices)
– A discourse or interaction history (anaphoric)
– A conceptual domain
– A set of potential intention and action types that can be expressed by the agents during the interaction

The context is a big thing! Can it be modelled explicitly in a simple way?

Agent's expectations
– Linguistic expectations (communicative)
  – What are the speech acts that can be expressed by interlocutors in specific conversational situations?
– Environmental expectations (without intent)
  – What events are expected to appear in the situation that are perceived by the different senses?
– How to deal with the unexpected?

A three-layer architecture
[Diagram labels: Representation; Interpretation; Recognition]

Perception
[Diagram labels: Representation; Interpretation; Recognition; Perception]
– Contextual knowledge (modality independent) + inference
– Modality specific, or a combination of specific modalities
– Modality-specific images…

Expectations-driven interpretation
[Diagram labels: Expectations; Memory; Interpretation; Recognition; the context]

An interpretation act…
[Diagram labels: Expectations; contextual expectations; Interpretation; Memory; Recognition; message/stimulus; uninterpreted image!]

Memory is indexed by expectations!

Memory retrieval
[Diagram labels: Expectations; contextual expectations (indices); Interpretation; Memory; Recognition; uninterpreted image!; a qualitative match]

The elements of interpretation…
1) The stimulus
2) The expectations
3) Memory

Interpretation relative to the context
[Diagram labels: Expectations; Representation; selected intention/expectation; Memory; Interpretation; Recognition]
– Interpreted message/expectation!
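The interpretation act outlined above (a stimulus, a set of expectations, and a memory indexed by those expectations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `interpret`, the toy memory contents and the `shares_a_word` match are hypothetical names invented here, not part of the Golem implementation.

```python
from typing import Callable, Dict, List, TypeVar

M = TypeVar("M")  # type of the memory objects
S = TypeVar("S")  # type of the recognized (still uninterpreted) stimulus


def interpret(
    expectations: List[str],
    memory: Dict[str, M],
    stimulus: S,
    matches: Callable[[M, S], bool],
) -> List[str]:
    """Return the expectations whose memory object matches the stimulus."""
    selected = []
    for expectation in expectations:
        # Memory is indexed by expectations: only expected entries are retrieved.
        memory_object = memory.get(expectation)
        if memory_object is not None and matches(memory_object, stimulus):
            selected.append(expectation)
    return selected


# Hypothetical toy data: memory objects are keyword sets, the stimulus is a word list.
memory = {"greet": {"hello", "hi"}, "leave": {"bye"}, "thank": {"thanks"}}
expected = ["greet", "leave"]  # "thank" is not expected in this situation


def shares_a_word(keywords, words):
    """Hypothetical qualitative match: any keyword occurs in the stimulus."""
    return bool(keywords & set(words))


print(interpret(expected, memory, ["hi", "robot"], shares_a_word))
print(interpret(expected, memory, ["thanks"], shares_a_word))
```

In this sketch an input that matches no current expectation yields an empty selection, which is one simple way to detect "the unexpected".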
The objects of representation are interpretations!

Interpretation
[Diagram labels: Expectations; Interpretation; Recognition]
– Examples: linguistic & visual interpretations
– Representations of the world (aspects of it)

Linguistic Interpretation

Spoken recognition…
[Diagram labels: Expectations; expected intentions; Memory; Interpretation; Recognition; spoken expression; recognized text]
– The text is an uninterpreted image!

The memory objects…
[Diagram labels: Expectations; expected intentions (indices); Memory; Interpretation; Recognition; text]
– Expected intentions index regular expressions: Expected Int_i --> Reg. Exp_i

The qualitative match…
– Selected intentions are those whose associated regular expression is satisfied by the text (i.e. grep)

The output of the interpretation…
[Diagram labels: Representation, in relation to the context; Expectations; Memory; Interpretation; Recognition; selected intention (grounded); interpreted spoken input!]
– The intended speech act!

Memory may not be required!
[Diagram labels: Expectations: name-of-visitor(X); semantic parser; Interpretation: name-of-visitor(Peter); Recognition: "Hum, my name… I'm Peter"]

Visual Interpretation
[Diagram labels: Expectations; visual expectation; Memory; Interpretation; Recognition; concrete visual image]

The memory objects…
[Diagram labels: Expectations; visual expectations (indices); Memory: SIFT features; Interpretation; Recognition]
The qualitative match…
[Diagram labels: Expectations index images; Memory; Interpretation; Recognition; SIFT vector; selected visual expectation]
– Selected expectations are those whose associated vector is the closest to the vector of the external image

Interpretation of the visual act
[Diagram labels: Representation, in relation to the context; Interpretation; Memory; Recognition; interpreted image (symbolic rep.)]

The full architecture: Interpretation and Action
[Diagram labels: Representation; Interpretation; Memory; Specification; Recognition and Realization (modality specific); output actions]

Reactive behavior
[Diagram labels: Representation; Interpretation; Memory; Recognition; Specification; Realization]

Schematic behavior…
[Diagram labels: Representation; Interpretation; Memory; Recognition; Realization]

Our current working setting…
[Diagram labels: Recognition; Memory; Representation; Interpretation; Specification; Realization]

Dialogue Models
[Diagram labels: Representation; Interpretation; Specification; Memory; Recognition; Realization]
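The two qualitative matches from the architecture part of the talk (regular expressions against recognized text, and the closest stored vector against an observed image vector) could look roughly as follows. This is a simplified sketch, not the system's code: the function names, the patterns and the three-dimensional vectors are hypothetical, and a real SIFT pipeline produces many 128-dimensional descriptors per image rather than one short vector.

```python
import math
import re
from typing import Dict, List, Sequence


def select_intentions(expected: Dict[str, str], text: str) -> List[str]:
    """Expected intentions whose regular expression is satisfied by the text."""
    return [
        intention
        for intention, pattern in expected.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    ]


def select_visual_expectation(
    expected: Dict[str, Sequence[float]], observed: Sequence[float]
) -> str:
    """Expected image whose stored vector is closest to the observed vector."""
    return min(expected, key=lambda name: math.dist(expected[name], observed))


# Hypothetical patterns for two expected intentions.
intentions = {"ok": r"\b(yes|ok|sure)\b", "no": r"\bno\b"}
print(select_intentions(intentions, "Yes, please"))

# Hypothetical one-vector-per-poster memory (a stand-in for SIFT features).
posters = {"poster_a": [0.9, 0.1, 0.0], "poster_b": [0.1, 0.8, 0.3]}
print(select_visual_expectation(posters, [0.2, 0.7, 0.2]))
```

Both functions only ever consider what is currently expected, which keeps the search space small at each point of the interaction.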
Interactive situation

The formalism: Functional Recursive Transition Networks (F-RTN)

The conversational situation
– Information state
– Modality oriented
– Speech acts and expectations
– Rhetorical acts: actions performed by the agent (spoken, motor, etc.)
Relative to a context!

Conversational Situation
  s_{i-1} --SpAct:RhtAct--> s_i --SpAct:RhtAct--> s_{i+1}
– SpAct is the input speech act; RhtAct is the rhetorical act
– Only a specific set of intentions is meaningful

Intention types
  s_{i-1} --SpAct:RhtAct--> s_i --SpAct:RhtAct--> s_{i+1}   [diagram also labels RhtAct on its own]

Conversational Situation
– Listening
– Visual
– Telling
– Recursive
– Error
– Final

Rhetorical acts (Actions)
Structure:
– Lists of basic acts
– Basic acts: modality-specific actions
Function:
– Direct external behavior
– Can direct internal behavior (embedded in the interpretation state): reasoning, planning, theorem-proving, problem-solving

The Model
Conversational model:
– Set of dialogue models
Dialogue model:
– Set of situations
– Set of rhetorical acts
Declarative specification from a task analysis

A Dialogue Model
[Diagram: transition graph with situations is, ls 1, ls 2, ls 3, rs 1, rs 2, rs 3 and fs; arcs labelled ε:ra1(tour), ok:ra2([ai, pr, pl]), ai:ra4(ai), pr:ra4(pr), pl:ra4(pl), ε:ra5(ai), ε:ra5(pr), ε:ra5(pl), ok:w and no:ra3(tour)]

A Recursive Situation
[Diagram: the same transition graph, highlighting a recursive situation]
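As a rough illustration of "The Model" (a conversational model is a set of dialogue models, and a dialogue model is a set of situations whose arcs pair an expected speech act with a rhetorical act), here is a minimal Python sketch. The class names, the `step` method and the "initial" situation kind are assumptions made for this example; the toy graph is only loosely modelled on the labels in the diagram above, and recursive situations are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Arc:
    speech_act: str       # expected input (SpAct)
    rhetorical_act: str   # action performed by the agent (RhtAct)
    next_situation: str


@dataclass
class Situation:
    ident: str
    kind: str             # e.g. "listening", "recursive", "final"
    arcs: List[Arc] = field(default_factory=list)

    def expected_intentions(self) -> List[str]:
        """Only the speech acts on outgoing arcs are meaningful here."""
        return [arc.speech_act for arc in self.arcs]


@dataclass
class DialogueModel:
    initial: str
    situations: Dict[str, Situation]

    def step(self, current: str, speech_act: str) -> Optional[Tuple[str, str]]:
        """Follow the arc for speech_act; return (rhetorical act, next situation)."""
        for arc in self.situations[current].arcs:
            if arc.speech_act == speech_act:
                return arc.rhetorical_act, arc.next_situation
        return None  # unexpected input: no arc applies


# Hypothetical fragment of a tour dialogue model.
tour = DialogueModel(
    initial="is",
    situations={
        "is": Situation("is", "initial", [Arc("ε", "ra_1(tour)", "ls_1")]),
        "ls_1": Situation("ls_1", "listening", [
            Arc("ok", "ra_2([ai,pr,pl])", "ls_2"),
            Arc("no", "ra_3(tour)", "fs"),
        ]),
        "ls_2": Situation("ls_2", "listening"),
        "fs": Situation("fs", "final"),
    },
)
conversational_model = {"tour": tour}  # a set of dialogue models, keyed by name

print(tour.situations["ls_1"].expected_intentions())
print(tour.step("ls_1", "ok"))
print(tour.step("ls_1", "maybe"))
```

Returning `None` for an input without a matching arc leaves room for a recovery strategy to be attached later.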
Modality independent specification
[Diagram: transition graph with situations is, ls1, ls2, ts 1, ts 2, ts 3, ms 1, rshelp and fs; arcs labelled ε:ma(ai), per:ra2(per), area:ra2(area), proy:ra2(proy), ok:ra1([per, area, proy]), ε:ra3(per), ε:ra3(area), ε:ra3(proy), no:ra4(ai), no:ra2(error), ok:w and ε]

A subordinated DM
[Diagram: the same transition graph, highlighting the subordinated dialogue model]

Declarative specification of situations

  [id ==> ls_2,
   type ==> listening,
   in_pairs ==> [ls_1 => ok:ra_2([ai,pr,pl]),
                 ls_3 => ok:ra_2([ai,pr]),
                 ls_3 => ok:ra_2([ai,pl]),
                 ls_3 => ok:ra_2([pr,pl])],
   out_pairs ==> [ai:ra_4(ai) => rs_1,
                  pr:ra_4(pr) => rs_2,
                  pl:ra_4(pl) => rs_3,
                  no:ra_3(tour) => fs],
   expected_intentions ==> [ai, pr, pl, no]
  ],

Basic expressive power: Recursive Transition Networks

Dialogue Manager = Interpreter program
[Diagram labels: Dialogue Models; Basic Rhetorical Acts (modality specific); Speech Act; Interpreter (Recursive Transition Network); Rhetorical Act; Rhetorical Act Interpreter; Spec. of Mod. Spec. Action; Generation (NL & motor action)]

Representational objects (speech acts, rhetorical acts and situations)
– Constants
– Grounded predicates
– Variables
– Predicates with free variables
– Functions

Specification of actions
[Diagram labels: Representation; Interpretation; Memory; Specification; Recognition; Realization]

Constants
  s_i --a:b--> s_{i+1}
– Definite intentions and actions

Grounded Predicates
  s_i --p(a,b):q(c,d)--> s_{i+1}
– Definite intentions and actions

Variables
  s_i --x:p(x)--> s_{i+1}
– Abstract intentions and actions

Predicates with free variables (The indexical context)
  s_i --p(a,x):q(x)--> s_{i+1}
– Abstract intentions and actions

Collecting arguments from perception
[Diagram labels: Expectations p(a,x); Interpretation p(a,b); Memory; Recognition]

Passing values from input to output
[Diagram labels: Representation; p(a,x); p(a,b); q(b); Interpretation; Memory; Specification; Recognition; Realization]

The interaction history (anaphoric context)
  s_i --p(a,b):q(c,d)--> s_{i+1}
– The full path is recorded once an arc has been traveled (i.e. grounded), for all arcs traveled: dm1: s_i => p(a,b):q(c,d) => s_j
– The interaction history is a list of these terms

Situations can have parameters
  s_i --p(a,x):q(x)--> s_j(x)
– Abstract intentions, actions and situations

Functions
  s_i --λX.f1(X):λY.f2(Y)--> λZ.f3(Z)
– Functional definition of intentions, actions and next situations: X, Y and Z depend on the interaction history, the current DM and the current situation.

Functional Actions
– Suppose a scenario in which the robot is giving a tour to a visitor and there is a situation where it has to make an offer, but taking into account the offers it has made before…

Functional Expression of Actions
  s_i --_:λY.f2(Y)--> s_{i+1}

Evaluation…
  s_i --_:λY.f2(Y) = p(a,b,c)--> s_{i+1}
  where p = "do you want me to do …"
– Robot: Do you want me to do a or b or c?
– User: Please, do b

But if we come again to this situation…
  s_i --_:λY.f2(Y) = p(a,c)--> s_{i+1}
  where p = "do you want me to do …"
– Robot: Do you want me to do a or c?
– User: Please, do a

Functional Definition of Expectations
– Suppose the robot is expecting to see one among a number of posters at a specific situation during the interaction (at the same position)
Functional Definition of Expectations
  s_i --λX.f1(X):_--> s_{i+1}

Evaluation…
  s_i --λX.f1(X) = p(a, b, c):_--> s_{i+1}
– Robot expects to see a poster a or b or c at the current situation
– Robot recognizes poster a

But if we come to this situation again…
  s_i --λX.f1(X) = p(b, c):_--> s_{i+1}
– Robot focuses on recognizing b or c

The unexpected…
– Every situation can be followed by a recursive situation embedding a recovery dialogue model
– The information gathered is passed as a parameter to the recovery model
– The recovery model may use the interaction history…
  – You already saw this poster… do you want to see it again?

Functional Definition of Next Situations
– Suppose the robot responds to up to four questions from the user for each poster… then he or she might get bored!
  s_i --_:_--> λZ.f3(Z)
– Functional definition of next situations

Providing information…
  s_i --_:_--> λZ.f3(Z) = s_i
– Where s_i = s_{i+1}: the system offers to explain a topic of the current poster (not explained before!)

But after the fourth time…
  s_i --_:_--> λZ.f3(Z) = s_j
– Where s_j is a "continuation dialogue": Do you want to see another poster?

Declarative specification

  [id ==> ls_2,
   type ==> listening,
   in_pairs ==> [ls_1 => λx.f(x):λx.g(y),
                 ls_3 => ok:λx.h(x)],
   out_pairs ==> [λx.f(a,x):λx.h(y) => λz.rs(z),
                  no:ra_3(tour) => fs]
  ],

Possible value types of functions…
– Constants
– Grounded predicates
– Variables
– Predicates with free variables
– Functions
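The three functional slots of an F-RTN arc (expectations f1, actions f2 and next situations f3, each evaluated against the interaction history) can be sketched as plain Python functions. This is a minimal sketch under assumptions: the history format, the function names and the `limit` parameter are hypothetical choices made for this illustration, and the examples simply replay the offer, poster and four-question scenarios from the slides.

```python
from typing import List, Tuple

# One term per traveled arc: (situation, input, action, next situation).
History = List[Tuple[str, str, str, str]]


def inputs_at(situation: str, history: History) -> List[str]:
    """Inputs already grounded on arcs that left `situation`."""
    return [inp for sit, inp, _, _ in history if sit == situation]


def f1_expectations(candidates: List[str], situation: str,
                    history: History) -> List[str]:
    """Expectation function: only candidates not yet recognized here."""
    seen = inputs_at(situation, history)
    return [c for c in candidates if c not in seen]


def f2_offer(options: List[str], situation: str, history: History) -> str:
    """Action function: offer only what has not been chosen before."""
    remaining = f1_expectations(options, situation, history)
    return "Do you want me to do " + " or ".join(remaining) + "?"


def f3_next(situation: str, continuation: str, history: History,
            limit: int = 4) -> str:
    """Next-situation function: loop until `limit` arcs were traveled here."""
    return situation if len(inputs_at(situation, history)) < limit else continuation


history: History = []
print(f2_offer(["a", "b", "c"], "s_i", history))
history.append(("s_i", "b", "do(b)", "s_i"))  # the user chose b
print(f2_offer(["a", "b", "c"], "s_i", history))
print(f3_next("s_i", "s_j", history))
```

Keeping the history as an explicit argument makes each function depend only on its inputs, which mirrors the idea that X, Y and Z are determined by the interaction history and the current situation.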
The full expressive power: Functional Recursive Transition Networks (F-RTN)

Implementation
– Implementation environment: Open Agent Architecture (Linux)
– Dialogue models spec. & interpretation: Prolog
– Speech recognition: Sphinx
  – Acoustic models: the DIMEx100 corpus
– Speech synthesis: Mbrola & Festival

Agents
– Robot navigation: Player/Stage
– Vision: OpenCV, SIFT
– Robot: Magellan Pro (RWI)
– Fixed applications with PCs

A Bayesian Architecture (with symbolic knowledge)
[Diagram labels: Expectations p(E); Interpretation p(E/O); Memory; Recognition p(O/E)]

Future work

Speech and language processing:
– More robust speech recognition (language models)
– More general NLP capabilities (e.g. semantic parsing)
– Richer linguistic explanations

Dialogue models
– A new DM interpreter (more general)
– Introduce multimodal situations: listening and seeing
– Extend the range of speech acts that can be understood (grounding strategies, turn-taking strategies)
– More general recovery strategies in the DM
– Introduce a stochastic component in the F-RTN
– Introduce internal actions (i.e. "thinking")

Vision and navigation
– More intelligent navigation (e.g. metric maps, qualitative plans and interaction cycles of move, see, ask)
– More varied recognition and action devices
– More control input devices handled from the DM directly (contact, laser, ultrasonic, infrared, etc.)

Architecture
– Integration of reactive behavior
– Integration of schematic behavior
– Use a multimodal database to implement the memory component

Construct diverse applications with the same platform!

Many thanks!