A Bayesian Exploration-Exploitation Approach for Optimal Online Sensing and Planning with a Visually Guided Mobile Robot

Abstract: We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment given time constraints. We use a POMDP with a utility function that depends on the belief state to model the finite-horizon planning problem. We replan as the robot progresses through the environment. The POMDP is high-dimensional, continuous, non-differentiable, nonlinear, non-Gaussian and must be solved in real time. Most existing techniques for stochastic planning and reinforcement learning are therefore inapplicable. To solve this extremely complex problem, we propose a Bayesian optimization method that dynamically trades off exploration (minimizing uncertainty in unknown parts of the policy space) and exploitation (capitalizing on the current best solution). We demonstrate our approach with a visually guided mobile robot. The solution proposed here is also applicable to other closely related domains, including active vision, sequential experimental design, dynamic sensing and calibration with mobile sensors.

Keywords: Bayesian Optimization, Online Path Planning, Sequential Experimental Design, Attention and Gaze Planning, Active Vision, Dynamic Sensor Networks, Active Learning, Policy Search, Active SLAM, Model Predictive Control, Reinforcement Learning
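
To make the exploration-exploitation trade-off described in the abstract concrete, the following is a minimal sketch of a Bayesian optimization loop, not the implementation from the papers. It assumes a one-dimensional policy parameter, a hypothetical cost function `toy_cost`, a zero-mean Gaussian process with a squared-exponential kernel, and a lower-confidence-bound selection rule used here as a simple stand-in for whatever acquisition function the papers use. All names and parameter values (`kappa`, `length_scale`, the grid of candidates) are illustrative assumptions.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = sq_exp_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = sq_exp_kernel(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def toy_cost(theta):
    """Hypothetical stand-in for the expected cost of a policy parameter."""
    return (theta - 0.7) ** 2 + 0.1 * np.sin(15.0 * theta)


def bayes_opt(cost, n_iter=15, kappa=2.0):
    """Minimize `cost` on [0, 1] with a GP surrogate and a confidence-bound rule."""
    grid = np.linspace(0.0, 1.0, 201)      # candidate policy parameters
    x_obs = np.array([0.0, 0.5, 1.0])      # initial evaluations
    y_obs = cost(x_obs)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        # A low predicted cost favors exploitation; a high predictive
        # standard deviation favors exploration. kappa balances the two.
        acquisition = mean - kappa * std
        theta_next = grid[np.argmin(acquisition)]
        x_obs = np.append(x_obs, theta_next)
        y_obs = np.append(y_obs, cost(theta_next))
    best = np.argmin(y_obs)
    return x_obs[best], y_obs[best], x_obs


best_theta, best_cost, evaluated = bayes_opt(toy_cost)
print(best_theta, best_cost)
```

In this sketch, a larger `kappa` sends more evaluations to regions the surrogate is unsure about, while a smaller `kappa` concentrates them around the current best estimate. In the robot setting each evaluation of the cost would be an expensive simulation of a candidate path, which is why the number of evaluations is kept small.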

Papers

Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, Jose Castellanos and Arnaud Doucet (2009) A Bayesian Exploration-Exploitation Approach for Optimal Online Sensing and Planning with a Visually Guided Mobile Robot. Autonomous Robots - Special Issue on Robot Learning, Part B, 27(3):93-103. (Project site) (PDF) (BibTeX)

Ruben Martinez-Cantin, Nando de Freitas, Jose Castellanos and Arnaud Doucet (2007) Active Policy Learning for Robot Planning and Exploration under Uncertainty. In Proc. of Robotics: Science and Systems. (Project site) (PDF) (BibTeX)

Videos

The video shows a mobile robot selecting, in real time, the most informative trajectory to improve its knowledge of its own location and of the map, which consists of ARToolKit landmarks. The robot's odometry and a cheap webcam are its only inputs.

Vimeo link
YouTube link
Original video (WMV)