Bayesian deep learning for dynamic interaction applied to visually assistive devices
Summary
The population with visual impairments, such as age-related macular degeneration, is continuously growing worldwide. The most promising treatment is the use of visual prostheses or other visual assistive devices, which use an external camera to feed information to the prosthetic device. However, the information that can be transmitted through the prosthetic device is very limited. This project continues a line of research started in a previous project to create an intelligent visual assistant that extracts and distills the most relevant information to overcome the limitations of the device. In the previous project, we obtained very promising results, improving object and scene recognition using machine learning and computer vision. We ran experiments on a prototype of simulated prosthetic vision, which allows people with normal vision to serve as subjects. However, those results were mostly based on static scenes and passive scenarios where the user does not interact. In this project, we aim to go beyond this and work on dynamic, interactive problems where the system must process dynamic information and anticipate the user's needs to provide timely information and visual cues. We will improve our prototype of simulated prosthetic vision with a fully immersive virtual reality system fed live by external cameras, enabling experiments in fully interactive environments and complex sequences of tasks, such as cooking a meal.
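For readers unfamiliar with simulated prosthetic vision: current implants convey only a coarse pattern of bright dots (phosphenes), so a simulator renders the camera feed as such a pattern for sighted subjects. Below is a minimal, hypothetical sketch of one such renderer; the grid resolution, Gaussian phosphene shape, and parameter values are illustrative assumptions, not the settings of our prototype.

```python
import numpy as np

def render_phosphenes(image, grid=(16, 16), sigma=2.0):
    """Render a grayscale image as a coarse grid of Gaussian phosphenes.

    image: 2D float array in [0, 1] (the camera frame).
    grid:  number of phosphenes per axis (illustrative assumption).
    Returns an image of the same shape simulating the prosthetic percept.
    """
    h, w = image.shape
    gh, gw = grid
    ys = np.linspace(0, h, gh + 1, dtype=int)
    xs = np.linspace(0, w, gw + 1, dtype=int)
    yy, xx = np.mgrid[0:h, 0:w]
    out = np.zeros_like(image)
    for i in range(gh):
        for j in range(gw):
            # Average brightness inside each cell drives phosphene intensity.
            cell = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            intensity = cell.mean()
            cy = 0.5 * (ys[i] + ys[i + 1])
            cx = 0.5 * (xs[j] + xs[j + 1])
            # Each phosphene is drawn as an isotropic Gaussian blob.
            blob = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
            out += intensity * blob
    return np.clip(out, 0.0, 1.0)

frame = np.random.rand(120, 160)   # stand-in for a camera frame
percept = render_phosphenes(frame)
```

Real simulators additionally model electrode dropout, spatial distortion, and temporal effects; this sketch keeps only the core idea of intensity-driven phosphenes.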
The scientific contribution rests on three pillars for dealing with dynamic scenarios and predicting user actions. First, interactive scenarios require deeper knowledge of the environment. We will build on the theory of affordances, which defines action possibilities and intentions and allows the use of topological maps of activity-centric regions, reducing the complexity of the search in the action space. Second, dynamic scenarios and future predictions require careful quantification of uncertainty. Previous results were based on a combination of geometric and deep learning methods that lack uncertainty quantification. In this project, we will use Bayesian deep learning methods to estimate both the aleatoric and epistemic sources of uncertainty. Third, having quantified the uncertainty, the assistive device must anticipate the user and generate the corresponding visual cue or indication. The reinforcement learning framework is a popular choice for optimal action selection under uncertainty. Because we want to limit the amount of learning carried out with the user in the loop, we will use model-based reinforcement learning and Bayesian optimization for their excellent sample efficiency. Model-based methods learn a dynamics model of the user, which may differ considerably from one user to the next; we therefore intend to personalize the experience by learning user-specific models.
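As an illustration of the second pillar, a standard recipe for separating the two sources of uncertainty is Monte Carlo dropout: a network that predicts its own noise variance captures aleatoric uncertainty, while the spread of predictions across stochastic forward passes estimates the epistemic part. The PyTorch sketch below shows the general technique only; the architecture, layer sizes, and sample count are illustrative assumptions, not the project's actual models.

```python
import torch
import torch.nn as nn

class HeteroscedasticMLP(nn.Module):
    """Small regression net with dropout; predicts a mean and a log-variance."""
    def __init__(self, in_dim, hidden=64, p=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
        )
        self.mean_head = nn.Linear(hidden, 1)
        self.logvar_head = nn.Linear(hidden, 1)  # predicted aleatoric noise

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=50):
    """Decompose predictive uncertainty with Monte Carlo dropout.

    Epistemic variance = variance of predicted means across passes.
    Aleatoric variance = average of the predicted noise variances.
    """
    model.train()  # keep dropout active at test time
    means, variances = [], []
    for _ in range(n_samples):
        mu, logvar = model(x)
        means.append(mu)
        variances.append(logvar.exp())
    means = torch.stack(means)        # (n_samples, batch, 1)
    variances = torch.stack(variances)
    epistemic = means.var(dim=0)
    aleatoric = variances.mean(dim=0)
    return means.mean(dim=0), epistemic, aleatoric

model = HeteroscedasticMLP(in_dim=10)
mu, epistemic, aleatoric = mc_dropout_predict(model, torch.randn(8, 10))
```

In the assistive setting, high epistemic variance can indicate that the model has seen too little data for the current scene and should act conservatively, whereas high aleatoric variance flags intrinsically ambiguous input that more data will not resolve; the cue-generation policy can react differently to each.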
Our previous experience in robotics provides an excellent opportunity to design a fully immersive virtual reality system based either on a real environment captured by egocentric cameras or on a fully virtual environment built from 3D models. We intend to release the code of the simulator to speed up the development of prosthetic hardware and software. The simulator can also be used at science fairs and other events to demonstrate the limitations that people with visual impairments or prosthetic vision face in their daily lives and to raise awareness in society.
Principal investigators
- José J. Guerrero
- Ruben Martinez-Cantin
Team
- Jesus Bermudez-Cameo
- Alejandro Perez-Yus
- Bruno Berenguel
- Javier Garcia-Barcos
- Miguel Marcos
- Lorenzo Mur
- Carlos Plou
- Julia Tomas
- Maria Santos
Papers
- Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Josechu Guerrero, Giovanni Maria Farinella and Antonino Furnari (2024) AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation. In Proc. of the European Conference on Computer Vision (ECCV). (PDF) (BibTeX)
- Carlos Plou, Pablo Pueyo, Ruben Martinez-Cantin, Mac Schwager, Ana C. Murillo and Eduardo Montijano (2024) Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones. In Proc. of the European Conference on Computer Vision (ECCV) workshops, Workshop on Cooperative Intelligence for Embodied AI. (BibTeX)
- Carlos Plou, Nerea Gallego, Alberto Sabater, Eduardo Montijano, Pablo Urcola, Luis Montesano, Ruben Martinez-Cantin and Ana C. Murillo (2024) EventSleep: Sleep Activity Recognition with Event Cameras. In Proc. of the European Conference on Computer Vision (ECCV) workshops, Workshop on Neuromorphic Vision: Advantages and Applications of Event Cameras. (PDF) (Project) (BibTeX)
- Carlos Plou, Ana C. Murillo and Ruben Martinez-Cantin (2024) Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation. Technical report, arXiv:2404.01867. (PDF) (BibTeX)
- Lorenzo Mur-Labadia, José J. Guerrero and Ruben Martinez-Cantin (2023) Multi-label affordance mapping from egocentric vision. In Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV). (PDF) (Project) (BibTeX)
- Javier Rodriguez-Puigvert, Víctor M. Batlle, J. M. M. Montiel, Ruben Martinez-Cantin, Pascal Fua, Juan D. Tardos and Javier Civera (2023) LightDepth: Single-View Depth Self-Supervision from Illumination Decline. In Proc. of the IEEE/CVF International Conference on Computer Vision (ICCV). (PDF) (Project) (BibTeX)
- David Morilla-Cabello*, Lorenzo Mur-Labadia*, Ruben Martinez-Cantin and Eduardo Montijano (2023) Robust Fusion for Bayesian Semantic Mapping. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). (PDF) (BibTeX)
- Lorenzo Mur-Labadia, Ruben Martinez-Cantin and Jose J. Guerrero (2023) Bayesian deep learning for affordance segmentation in images. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA). (PDF) (BibTeX)
- Francisco Merino-Casallo, Maria Jose Gomez-Benito, Ruben Martinez-Cantin and Jose Manuel Garcia-Aznar (2022) A mechanistic protrusive-based model for 3D cell migration. European Journal of Cell Biology, 101(3):151255. (PDF) (BibTeX)
- Melani Sanchez-Garcia, Roberto Morollon-Ruiz, Ruben Martinez-Cantin, Jose J. Guerrero and Eduardo Fernandez-Jover (2022) Assessing visual acuity in visual prostheses through a virtual-reality system. Technical report, arXiv:2205.10395. (PDF) (BibTeX)
- Javier Rodríguez-Puigvert, David Recasens, Javier Civera and Rubén Martinez-Cantín (2022) On the Uncertain Single-View Depths in Colonoscopies. In Proc. of the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). (PDF) (Project) (BibTeX)
- Javier Rodriguez-Puigvert, Ruben Martinez-Cantin and Javier Civera (2022) Bayesian Deep Neural Networks for Supervised Learning of Single-View Depth. IEEE Robotics and Automation Letters, 7(2):2565-2572. (PDF) (BibTeX)
- Lorenzo Mur-Labadia and Ruben Martinez-Cantin (2021) Bayesian prediction of affordances from images. In IROS 2021 Workshop on Egocentric vision for interactive perception, learning, and control. (PDF) (BibTeX)
- Melani Sanchez-Garcia, Alejandro Perez-Yus, Ruben Martinez-Cantin and Jose J. Guerrero (2021) Augmented reality navigation system for visual prosthesis. Technical report, arXiv:2109.14957. (PDF) (BibTeX)