Topic: Implicit intention recognition by integrating speech and planning (ESR 4)
Supervisors: Prof. Thomas Hellström, Dr. Ola Ringdahl
I’m a PhD student at Umeå University. I received my Master’s in Computer Science and Engineering from Politecnico di Milano, Italy. I’m currently studying Natural Language Processing in the context of Human-Robot Interaction; in particular, I’m interested in semantics and its role in robotic systems. Is the system shaped by its semantics, or is it the semantics that are shaped by the system? Being from Italy, I also have some nice skills in cooking pasta and pizza. I like board games and playing strategy games on the PC; I’m also interested in psychology, especially in the interpretation of dreams.
Summary of research
My research mainly involves the investigation and implementation of intent recognition algorithms that merge task planning and speech. Intent recognition is the task of inferring an agent’s intentions, and it can be cast as a plan recognition problem. Following this approach, we developed a speech component that grounds the speaker’s utterances into custom planning domains. Once an utterance is grounded, that is, once an appropriate representation at the task level has been found, we can use the information it contains to infer the most likely intention, under the assumption that an intention is present. This assumption is reasonable in the context of a collaboration between a human and a robot, where both parties state what they are going to do.
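The inference step can be sketched in miniature as follows. This is an illustrative toy, not the actual system: the candidate plans, the prior, and the overlap-based likelihood are all assumptions made for the example. Each candidate intention is associated with a plan, and an intention is scored by how well the actions grounded from speech fit that plan.

```python
# Toy sketch of intent recognition cast as plan recognition.
# The plan library and scoring rule below are illustrative assumptions.

# Each candidate intention maps to a plan: a sequence of task-level actions.
PLANS = {
    "make-coffee": ["grasp-cup", "fill-cup", "press-button"],
    "clean-table": ["grasp-cloth", "wipe-table"],
}

def infer_intention(grounded_actions, plans, prior=None):
    """Return the most likely intention given actions grounded from speech.

    The likelihood of an intention is the fraction of observed actions
    appearing in its plan, combined with a uniform prior unless one is given.
    """
    prior = prior or {goal: 1.0 / len(plans) for goal in plans}
    scores = {}
    for goal, plan in plans.items():
        matches = sum(1 for a in grounded_actions if a in plan)
        likelihood = matches / len(grounded_actions) if grounded_actions else 0.0
        scores[goal] = prior[goal] * likelihood
    total = sum(scores.values())
    if total > 0:
        # Normalize to a posterior over intentions.
        scores = {goal: s / total for goal, s in scores.items()}
    return max(scores, key=scores.get), scores

# "I'll grab the cup and fill it" grounds to two actions of the coffee plan.
best, posterior = infer_intention(["grasp-cup", "fill-cup"], PLANS)
```

Here the posterior concentrates on "make-coffee" because both grounded actions belong to its plan; the same schema extends to probabilistic plan libraries.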
In parallel, we also investigated how to best define planning domains that can be shared between a human and a robot. This led us to explore planning domains that leverage the notion of affordance, which captures both how an environment invites behaviors and the corresponding agents’ abilities to perceive and enact those behaviors. As previous research has also shown, this type of formalization allows the creation of planning domains suitable for intent recognition. Nevertheless, a domain’s affordances must first either be gathered from the environment or specified by hand; we picked the former option and defined an algorithm that leverages natural language processing techniques to gather and generate possible names of actions associated with objects, with the goal of then linking this generation process to actual physical objects and actions, thus supporting intent recognition through e.g. computer vision.
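The gathering step can be illustrated with a deliberately naive version of the idea: mine verbs that co-occur with an object noun in text and treat their counts as candidate affordances. The tiny corpus and the "verb the object" pattern are assumptions for this sketch; the method in the NoDaLiDa paper is unsupervised and considerably more sophisticated.

```python
import re
from collections import Counter

# Naive illustration of mining object affordances (action verbs) from text.
# The corpus and the "<verb> the <object>" pattern are toy assumptions.
corpus = [
    "You can open the door and then close the door.",
    "She decided to open the window.",
    "Please lock the door before leaving.",
]

def mine_affordances(sentences, obj):
    """Count verbs occurring in the pattern '<verb> the <obj>'."""
    pattern = re.compile(r"\b([a-z]+) the " + re.escape(obj) + r"\b")
    counts = Counter()
    for sentence in sentences:
        for verb in pattern.findall(sentence.lower()):
            counts[verb] += 1
    return counts

# Candidate action names for "door": open, close, lock.
door_affordances = mine_affordances(corpus, "door")
```

The resulting verb distribution gives candidate action names that can later be linked to perceived objects and physical actions.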
Finally, during my SOCRATES secondment at Fraunhofer IPA in Stuttgart, Germany, I participated in the creation of a service robot dedicated to serving drinks. The robot, named the Traveling Drinksman, detects persons’ positions in the room from the positions of their heads and then generates a route to serve all of the detected guests in the most efficient way. For the moment, only seated persons can be served, but future research could remove this constraint. Furthermore, the robot could be extended toward more social behaviors beyond serving drinks.
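The route generation over detected guests can be sketched with a simple nearest-neighbor heuristic over 2D positions. This is only an illustration; the guest coordinates and the greedy heuristic are assumptions, not the robot’s actual planner.

```python
import math

# Toy nearest-neighbor serving route over detected guest positions.
# Illustrative sketch only; not the Traveling Drinksman's actual planner.

def nearest_neighbor_route(start, guests):
    """Greedily order guests by distance, starting from the robot's position."""
    remaining = dict(guests)  # name -> (x, y) position in the room
    route, pos = [], start
    while remaining:
        # Serve the closest unserved guest next.
        name = min(remaining, key=lambda n: math.dist(pos, remaining[n]))
        route.append(name)
        pos = remaining.pop(name)
    return route

guests = {"alice": (1.0, 0.0), "bob": (5.0, 0.0), "carol": (2.0, 0.0)}
route = nearest_neighbor_route((0.0, 0.0), guests)
```

Starting at the origin, the robot serves the nearest guest first and repeats, visiting alice, then carol, then bob; a real deployment would solve the ordering with a proper routing algorithm and account for obstacles.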
Following is a video introducing intentions:
Following is a video of a working application:
List of publications:
Michele Persiani, Thomas Hellstrom. Intent Recognition From Speech and Plan Recognition. Accepted at the 18th edition of Practical Applications of Agents and Multi-Agent Systems (PAAMS), 2020.
Michele Persiani, Thomas Hellstrom. Unsupervised Inference of Object Affordance from Text Corpora. Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), 2019.
Michele Persiani, Cagatay Odabasi, Florenz Graf, Mohit Kalra, Thomas Hellstrom, Birgit Graf. TravelingDrinksman—A Mobile Service Robot for People in Care-Homes. Accepted at the International Symposium of Robotics (ISR), 2020.
Michele Persiani, Maitreyee Tewari. Mediating Joint Intentions with a Dialogue Management System. Submitted to the workshop "Mental Models of Robots" at HRI 2020, 2020.