Designing an Artificial Attention System for Social Robots


The following video summarises the overall outcomes of the project. In addition to the solutions detailed below, it presents the two top-level components that regulate the top-down modulation of attention (together forming the "top-down controller"):

  1. The attention schema module, which infers and predicts, for each individual, the attentional state of "self" (i.e. the robot) and of "other" (i.e. any human interlocutor). It does so by combining a simplified model of attention that can be applied to any chosen individual, a set of preprocessed cues (e.g. gaze direction), and the current belief about the attentional goal of both "self" (absolute belief) and "other" (prediction).
  2. The joint attention finite-state machine (JASM) module, which models the attentional process as a recognisable interaction (i.e. a "joint action") whenever an interlocutor is present. This gives a more sophisticated account of the attentional process by fully acknowledging its social nature.
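To illustrate what modelling attention as a finite-state "joint action" can look like, here is a minimal sketch. The states (`IDLE`, `MUTUAL_GAZE`, `GAZE_FOLLOWING`, `JOINT_ATTENTION`) and the event strings are hypothetical and chosen for illustration only; they are not the states of the actual JASM module.

```python
from enum import Enum, auto


class JAState(Enum):
    # Hypothetical joint-attention states (not the actual JASM states)
    IDLE = auto()             # no interlocutor present
    MUTUAL_GAZE = auto()      # robot and human look at each other
    GAZE_FOLLOWING = auto()   # robot follows the human's gaze towards a target
    JOINT_ATTENTION = auto()  # both agents attend to the same target


# Hypothetical transition table: (current state, observed event) -> next state
TRANSITIONS = {
    (JAState.IDLE, "human_detected"): JAState.MUTUAL_GAZE,
    (JAState.MUTUAL_GAZE, "human_looks_away"): JAState.GAZE_FOLLOWING,
    (JAState.GAZE_FOLLOWING, "target_found"): JAState.JOINT_ATTENTION,
    (JAState.JOINT_ATTENTION, "human_looks_at_robot"): JAState.MUTUAL_GAZE,
}


def step(state, event):
    """Return the next state; losing the interlocutor resets from any state."""
    if event == "human_lost":
        return JAState.IDLE
    # Events that are meaningless in the current state leave it unchanged
    return TRANSITIONS.get((state, event), state)


def run(events, state=JAState.IDLE):
    """Feed a sequence of observed events through the machine."""
    for event in events:
        state = step(state, event)
    return state
```

For example, `run(["human_detected", "human_looks_away", "target_found"])` ends in `JAState.JOINT_ATTENTION`: the robot only reaches the triadic state after first engaging with the interlocutor, which is the kind of socially structured sequence the JASM is meant to capture.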



The full technical description, together with the overall results, was presented at IROS (see Publications). The final technical report documenting the full software framework is presented below.



Lanillos, P., and Ferreira, J.F. The CASIR-IMPEP Attention Framework for Social Interaction with Robots. Technical Report, MRL-CASIR-2015-07-TR004, 2015.

A Bayesian Hierarchy for Robust Gaze Estimation in Human-Robot Interaction


Gaze estimation, i.e. continuously assessing the gaze direction of an interlocutor so as to determine his/her focus of visual attention, is important in several computer vision applications, such as the development of non-intrusive gaze-tracking equipment for psychophysical experiments in neuroscience, specialised telecommunication devices, video surveillance, human-computer interfaces (HCI) and artificial cognitive systems for human-robot interaction (HRI), which is our application of interest.


We have developed a robust solution based on a probabilistic approach that inherently deals with the uncertainty of sensor models and, in particular, with the uncertainty arising from distance, incomplete data and scene dynamics. This solution uses a hierarchical formulation in the form of a mixture model that loosely follows how the human perceptual system is believed to use the geometrical cues provided by facial features for gaze estimation.
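The idea of a mixture model whose components are weighted by cue reliability can be sketched in one dimension (gaze yaw only). This is a simplified illustration under stated assumptions, not the hierarchy described in the report: the two Gaussian components (head-pose cue and eye cue), the exponential decay of eye-cue reliability with distance, and the parameters `d0`, `sigma_head` and `sigma_eye` are all hypothetical.

```python
import math


def gaussian(x, mu, sigma):
    """Normal density N(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def eye_weight(distance_m, eyes_detected, d0=1.5):
    """Hypothetical reliability of the eye cue: zero if the eyes were not
    detected (incomplete data), decaying with distance otherwise."""
    if not eyes_detected:
        return 0.0
    return math.exp(-distance_m / d0)


def gaze_yaw_density(theta, head_yaw, eye_yaw, distance_m, eyes_detected,
                     sigma_head=15.0, sigma_eye=5.0):
    """Two-component mixture over gaze yaw (degrees): a broad head-pose
    component and a sharper eye-based component."""
    w_eye = eye_weight(distance_m, eyes_detected)
    w_head = 1.0 - w_eye
    return (w_head * gaussian(theta, head_yaw, sigma_head)
            + w_eye * gaussian(theta, eye_yaw, sigma_eye))


def map_gaze_yaw(head_yaw, eye_yaw, distance_m, eyes_detected, step=0.5):
    """Maximum a posteriori yaw found by exhaustive search on [-90, 90] degrees."""
    grid = [-90.0 + i * step for i in range(int(180 / step) + 1)]
    return max(grid, key=lambda t: gaze_yaw_density(
        t, head_yaw, eye_yaw, distance_m, eyes_detected))
```

Because the estimate is a mixture rather than a single fused Gaussian, the MAP estimate tends to switch between the eye-based and head-based modes as the interlocutor moves away or the eyes are lost, instead of averaging two possibly inconsistent cues into a compromise direction.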



We have also developed a ray-tracing solution that determines which cells of the BVM-3D saliency framework (see below) are traversed by the gaze director line, thus providing the dyadic-to-triadic transformation (i.e. from the robot-human pair to the robot, the human and the object being looked at) needed for attention in a social context (see above).
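Finding the cells crossed by a gaze line is a standard voxel-traversal problem. The sketch below uses the well-known Amanatides-Woo incremental traversal on a regular Cartesian grid; this is an assumption made for simplicity, since the BVM is a log-spherical grid and the project's own ray-tracing method may differ. The function name and its arguments are hypothetical.

```python
import math


def traverse_voxels(origin, direction, voxel_size, grid_shape, max_steps=10000):
    """Return the (i, j, k) indices of the cells crossed by the ray
    origin + t * direction (t >= 0), in visiting order, for a regular grid
    whose corner lies at (0, 0, 0). The origin is assumed to be inside the grid."""
    if all(d == 0 for d in direction):
        raise ValueError("direction must be a non-zero vector")
    idx = [int(math.floor(o / voxel_size)) for o in origin]  # starting cell
    step, t_max, t_delta = [], [], []
    for axis in range(3):
        d = direction[axis]
        if d > 0:
            step.append(1)
            # Ray parameter at which the next cell boundary on this axis is hit
            t_max.append(((idx[axis] + 1) * voxel_size - origin[axis]) / d)
            t_delta.append(voxel_size / d)
        elif d < 0:
            step.append(-1)
            t_max.append((idx[axis] * voxel_size - origin[axis]) / d)
            t_delta.append(-voxel_size / d)
        else:
            step.append(0)
            t_max.append(math.inf)  # never crosses a boundary on this axis
            t_delta.append(math.inf)
    visited = []
    for _ in range(max_steps):
        if not all(0 <= idx[a] < grid_shape[a] for a in range(3)):
            break  # the ray has left the grid
        visited.append(tuple(idx))
        # Advance across whichever cell boundary the ray reaches first
        axis = t_max.index(min(t_max))
        idx[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return visited
```

In use, the origin would be the estimated eye position of the interlocutor and the direction the estimated gaze vector; the returned cells can then be looked up in the saliency layer to find what the human is most likely attending to.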

Multisensory 3D Saliency for Artificial Attention Systems


As one of the main outcomes of this project, the CASIR team developed a short-term 3D memory for artificial attention systems, loosely inspired by perceptual processes believed to be implemented in the human brain. Our solution supports the implementation of multisensory perception and stimulus-driven processes of attention.


For this purpose, it provides:

  1. Knowledge persistence with temporal coherence, which accounts for potentially salient regions outside the field of view, via a panoramic, log-spherical inference grid.
  2. Prediction, by using estimates of local 3D velocity to anticipate the effect of scene dynamics.
  3. Spatial correspondence between volumetric cells potentially occupied by proto-objects and their corresponding multisensory saliency scores.
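A log-spherical grid indexes space by log-distance, azimuth and elevation, so cells are small near the robot and grow with distance. The sketch below shows a possible cell-indexing function together with a constant-velocity prediction step in the spirit of item 2. The coordinate convention (x forward, y left, z up), the depth range and the angular resolutions are hypothetical defaults, not the BVM's actual configuration.

```python
import math


def log_spherical_index(point, rho_min=0.5, rho_max=10.0, n_rho=20,
                        d_theta_deg=5.0, d_phi_deg=5.0):
    """Map an egocentric 3D point (x forward, y left, z up; metres) to a
    hypothetical (log-distance, azimuth, elevation) cell index.
    Returns None if the point falls outside the grid's depth range."""
    x, y, z = point
    rho = math.sqrt(x * x + y * y + z * z)
    if not (rho_min <= rho < rho_max):
        return None
    # Equal steps in log(rho): finer depth resolution close to the robot
    frac = (math.log(rho) - math.log(rho_min)) / (math.log(rho_max) - math.log(rho_min))
    r = int(frac * n_rho)
    theta = math.degrees(math.atan2(y, x))  # azimuth, panoramic (-180, 180]
    phi = math.degrees(math.asin(z / rho))  # elevation [-90, 90]
    return (r, int(math.floor(theta / d_theta_deg)), int(math.floor(phi / d_phi_deg)))


def predict_point(point, velocity, dt):
    """Constant-velocity prediction of a point's position after dt seconds,
    used to anticipate which cell its content will move into."""
    return tuple(p + v * dt for p, v in zip(point, velocity))
```

For instance, a proto-object at 1 m straight ahead moving away at 0.5 m/s is predicted to lie at 2 m after two seconds, which moves it to a farther log-distance bin while keeping the same azimuth and elevation bins.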


Visual and auditory signals are processed to extract features, which are then filtered by a proto-object segmentation module that uses colour and depth as discriminative traits. Besides the commonly used colour and intensity contrast, the features we consider are colour bias, the presence of faces, scene dynamics and loud auditory sources. Combining the conspicuity maps derived from these features yields a 2D saliency map, which is then processed using the probability of occupancy in the scene to construct the final 3D saliency map as an additional layer of the Bayesian Volumetric Map (BVM) inference grid [Ferreira et al. 2013].
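The pipeline in the paragraph above (conspicuity maps, then a 2D saliency map, then a 3D saliency layer informed by occupancy) can be sketched with plain lists. The min-max normalisation, the weighted average, and the multiplication of pixel saliency by the cell's occupancy probability are assumptions for illustration; the framework's actual combination rules may differ. The projection of each cell onto a pixel `(u, v)` is taken as given, and the function names are hypothetical.

```python
def normalise(m):
    """Min-max normalise a 2D map (list of rows) to [0, 1]."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(x - lo) / (hi - lo) for x in row] for row in m]


def saliency_2d(conspicuity, weights):
    """Weighted average of normalised conspicuity maps.
    conspicuity: {feature_name: 2D map}; weights: {feature_name: weight}."""
    total = sum(weights[name] for name in conspicuity)  # must be non-zero
    norm = {name: normalise(m) for name, m in conspicuity.items()}
    first = next(iter(norm.values()))
    rows, cols = len(first), len(first[0])
    return [[sum(weights[name] * norm[name][r][c] for name in norm) / total
             for c in range(cols)] for r in range(rows)]


def saliency_3d(cells, saliency):
    """cells: iterable of (cell_id, (u, v), p_occupied), where (u, v) is the
    pixel (column, row) the cell projects to. Empty space receives little
    saliency even if it projects onto a salient pixel."""
    return {cell_id: saliency[v][u] * p_occ for cell_id, (u, v), p_occ in cells}
```

Weighting by occupancy is what turns an image-plane map into a volumetric one: two cells along the same line of sight share a pixel's saliency, but only the one likely to contain a proto-object keeps a high score.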

The inherent properties of our solution were presented at REACTS 2015 (see Publications), and a summary video showcasing these properties is presented next.



Apart from the generic properties of the 3D saliency framework, this solution also provides two important sets of features:

  1. Sophisticated inherent bottom-up characteristics, such as the influence of environmental context, implemented via gist, and exploration vs exploitation attentional behaviour, promoted via a balance between feature-based saliency and entropy-based saliency (introducing effects approximating "inhibition of return").
  2. Top-down modulation, via an attentional set of weights that are modifiable on-the-fly by top-level, abstractly- and cognitively-driven modules.
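The balance between feature-based and entropy-based saliency, together with a top-down set of feature weights, can be sketched as a scoring rule for choosing the next fixation. The linear blend controlled by a hypothetical `alpha`, the use of the binary entropy of each cell's occupancy probability, and the feature names are illustrative assumptions rather than the framework's exact formulation.

```python
import math


def binary_entropy(p):
    """Entropy (bits) of a cell's occupancy belief; maximal when p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def next_fixation(cells, weights, alpha=0.7):
    """Pick the cell with the highest combined score.

    cells:   {cell_id: ({feature_name: saliency in [0, 1]}, p_occupied)}
    weights: top-down attentional set {feature_name: weight}, which a
             higher-level module could change on the fly
    alpha:   1.0 = pure exploitation (features), 0.0 = pure exploration (entropy)
    """
    total = sum(weights.values())

    def score(cell_id):
        features, p_occ = cells[cell_id]
        feature_saliency = sum(w * features.get(f, 0.0) for f, w in weights.items()) / total
        return alpha * feature_saliency + (1 - alpha) * binary_entropy(p_occ)

    return max(cells, key=score)
```

An inhibition-of-return-like effect emerges from the entropy term: once a region has been fixated, its occupancy belief becomes more certain, its entropy drops, and unexplored regions become comparatively more attractive. Changing `weights` (e.g. setting the weight of a "faces" feature to zero) redirects attention without touching the bottom-up machinery.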


Some of these properties are shown in the video presented below.




[Ferreira et al. 2013] J. F. Ferreira, J. Lobo, P. Bessière, M. Castelo-Branco, J. Dias, "A Bayesian Framework for Active Artificial Perception", IEEE Transactions on Cybernetics (Systems, Man, and Cybernetics, Part B), vol. 43, no. 2, pp. 699-711, April 2013.

Demonstration of current implementation


The CASIR team has developed a demo showcasing the most recent stable features of the CASIR-IMPEP attentional framework, namely how it replicates human-like overt attention processes using a sophisticated probabilistic multisensory perceptual framework, while also providing a means to validate current implementations of low-level supporting algorithms. The Integrated Multimodal Perception Platform (IMPEP v2.0), an active robotic head designed and developed at the Faculty of Science and Technology of the University of Coimbra (FCT-UC), is to be used as the prototypical robotic agent throughout the project. The mounting hardware and motors were designed by the Perception on Purpose team (POP, EC project number FP6-IST-2004-027268) of the Institute of Systems and Robotics/FCT-UC, and the sensor systems were mounted at the Mobile Robotics Laboratory of the same institute, within the scope of the Bayesian Approach to Cognitive Systems project (BACS, EC project number FP6-IST-027140).


The current version of this demonstrator showcases basic active perception capabilities using OpenCV functionality, which provides a set of feature images and 3D point clouds that the CASIR attentional framework uses to drive gaze shifts (i.e. active head reorientations) towards the most behaviourally relevant features in the image. The demonstrator is deliberately run in this fashion as an infinite loop.
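One step of such a loop is converting the most salient image location into a head reorientation. The sketch below assumes an ideal pinhole camera whose optical centre lies on the pan and tilt axes, with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`; a real head like IMPEP would also need its calibrated camera-to-motor transform, which is not modelled here.

```python
import math


def most_salient_pixel(saliency):
    """Return (u, v) = (column, row) of the maximum of a 2D saliency map."""
    _, u, v = max((s, u, v) for v, row in enumerate(saliency) for u, s in enumerate(row))
    return u, v


def pixel_to_gaze_shift(u, v, fx, fy, cx, cy):
    """Pan/tilt increments (degrees) that would centre pixel (u, v),
    assuming an ideal pinhole camera rotating about its optical centre."""
    pan = math.degrees(math.atan2(u - cx, fx))    # positive: turn right
    tilt = -math.degrees(math.atan2(v - cy, fy))  # positive: look up (image v grows downwards)
    return pan, tilt
```

In an attention loop, each iteration would compute the saliency map, call `most_salient_pixel`, convert the result with `pixel_to_gaze_shift`, and send the increments to the pan/tilt motors before acquiring the next frame.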


A clip showing a demonstration of a few of the system's capabilities, together with a Technical Report detailing the system, are presented below.




Lanillos, P., Oliveira, J., and Ferreira, J.F. Experimental Setup and Configuration for Joint Attention in CASIR. Technical Report, MRL-CASIR-2013-11-TR001, 2013.