A Bayesian Exploration-Exploitation Approach for Optimal Online Sensing and Planning with a Visually Guided Mobile Robot
Abstract We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment given time constraints. We use a POMDP with a utility function that depends on the belief state to model the finite horizon planning problem. We replan as the robot progresses throughout the environment. The POMDP is high-dimensional, continuous, non-differentiable, nonlinear, non-Gaussian and must be solved in real-time. Most existing techniques for stochastic planning and reinforcement learning are therefore inapplicable. To solve this extremely complex problem, we propose a Bayesian optimization method that dynamically trades off exploration (minimizing uncertainty in unknown parts of the policy space) and exploitation (capitalizing on the current best solution). We demonstrate our approach with a visually-guided mobile robot. The solution proposed here is also applicable to other closely-related domains, including active vision, sequential experimental design, dynamic sensing and calibration with mobile sensors.
Keywords Bayesian Optimization · Online Path Planning · Sequential Experimental Design · Attention and gaze planning · Active Vision · Dynamic Sensor Networks · Active Learning · Policy Search · Active SLAM · Model Predictive Control · Reinforcement Learning
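The exploration-exploitation trade-off described in the abstract can be illustrated with a minimal sketch, which is not the implementation from the paper. It assumes a one-dimensional policy parameter, a Gaussian process surrogate with a squared-exponential kernel, and expected improvement as the acquisition function. The function `toy_cost` is a hypothetical stand-in for the simulated expected cost of a policy, and all names and constants are chosen for illustration only.

```python
import math

def kernel(a, b, length=0.3):
    """Squared-exponential covariance between two scalar policy parameters."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-4):
    """Factor the kernel matrix once per iteration (zero-mean GP prior assumed)."""
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_chol(L, ys)

def gp_predict(xs, L, alpha, x_new):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    k_star = [kernel(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_chol(L, k_star)
    var = kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement for minimization: large where the predicted cost
    is low (exploitation) or the uncertainty is high (exploration)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def toy_cost(theta):
    """Hypothetical cost surface standing in for a simulated policy cost."""
    return (theta - 0.7) ** 2 + 0.1 * math.sin(12.0 * theta)

def bayes_opt(cost, n_iter=10):
    """Evaluate the cost where expected improvement is highest, then refit."""
    grid = [i / 100.0 for i in range(101)]
    xs = [0.0, 0.5, 1.0]  # initial policy parameters (assumed)
    ys = [cost(x) for x in xs]
    for _ in range(n_iter):
        L, alpha = gp_fit(xs, ys)
        best = min(ys)
        candidates = [g for g in grid if g not in xs]
        x_next = max(candidates, key=lambda g: expected_improvement(
            *gp_predict(xs, L, alpha, g), best))
        xs.append(x_next)
        ys.append(cost(x_next))
    i = ys.index(min(ys))
    return xs[i], ys[i]

best_theta, best_cost = bayes_opt(toy_cost)
print(best_theta, best_cost)
```

Each iteration spends one cost evaluation at the candidate with the highest expected improvement, which is the kind of budget-limited search that suits a planner needing answers in real time. In a setting like the one in the abstract, the cost evaluation would be a simulation of the robot following a candidate path rather than a closed-form function.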
Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, Jose Castellanos and Arnaud Doucet (2009) A Bayesian Exploration-Exploitation Approach for Optimal Online Sensing and Planning with a Visually Guided Mobile Robot. Autonomous Robots - Special Issue on Robot Learning, Part B, 27(3):93-103. (Project site) (PDF) (BibTeX)
Ruben Martinez-Cantin, Nando de Freitas, Jose Castellanos and Arnaud Doucet (2007) Active Policy Learning for Robot Planning and Exploration under Uncertainty. In Proc. of Robotics: Science and Systems. (Project site) (PDF) (BibTeX)
The video shows a mobile robot selecting, in real-time, the most informative trajectory to improve its knowledge of its own location and of the map, which consists of ARToolKit landmarks. The robot's odometry and a cheap webcam are its only inputs.