Semantic Visual SLAM in Populated Environments

Abstract

We propose a visual SLAM (Simultaneous Localization and Mapping) system able to perform robustly in populated environments. The image stream from a moving RGB-D camera is the only input to the system. The map, computed in real time, is composed of two layers: 1) an unpopulated geometrical layer, which describes the geometry of the bare scene as an occupancy grid from which the information corresponding to people has been removed; and 2) a semantic human activity layer, which describes the trajectory of each person with respect to the unpopulated map, labelling each area as traversable or occupied. Our proposal is to embed a real-time human tracker into the system. The purpose is twofold. First, to mask out of the rigid SLAM pipeline the image regions occupied by people, which boosts the robustness, the relocation, the accuracy and the reusability of the geometrical map in populated scenes. Second, to estimate the full trajectory of each detected person with respect to the scene map, irrespective of the location of the moving camera when the person was imaged. The proposal is tested with two popular visual SLAM systems, C2TAM and ORBSLAM2, demonstrating its generality. The experiments process a benchmark of RGB-D sequences from a camera onboard a mobile robot, and show the robustness, accuracy and reuse capabilities of the two-layer map in populated scenes.
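To make the two core ideas concrete, below is a minimal Python sketch (not the authors' code) of the mechanism the abstract describes: invalidating the image regions occupied by tracked people before the frame reaches the rigid SLAM front end, and expressing a person's position in the fixed map frame using the camera pose estimated by SLAM. The tracker interface, function names, and bounding-box format are hypothetical assumptions for illustration only.

```python
import numpy as np

def mask_people(rgb, depth, person_boxes):
    """Return copies of the RGB and depth images in which every pixel
    inside a tracked person's bounding box is invalidated, so the SLAM
    pipeline extracts features only from the static scene."""
    rgb_masked = rgb.copy()
    depth_masked = depth.copy()
    for (u0, v0, u1, v1) in person_boxes:  # pixel coordinates per person
        rgb_masked[v0:v1, u0:u1] = 0       # remove texture -> no features
        depth_masked[v0:v1, u0:u1] = 0.0   # depth 0 = invalid measurement
    return rgb_masked, depth_masked

def person_in_map(T_wc, p_c):
    """Express a person's 3D position, measured in the current camera
    frame, in the fixed map frame, using the world-from-camera pose
    T_wc (4x4 homogeneous transform) estimated by the SLAM system."""
    p_h = np.append(p_c, 1.0)              # homogeneous coordinates
    return (T_wc @ p_h)[:3]

# Toy usage: one 640x480 frame with a single tracked person.
rgb = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
depth = (np.random.rand(480, 640) * 5.0).astype(np.float32)
rgb_m, depth_m = mask_people(rgb, depth, [(200, 100, 320, 400)])
p_map = person_in_map(np.eye(4), np.array([0.1, 0.0, 2.5]))
```

In a pipeline like the one described, the masked images would feed the geometrical layer (e.g. the RGB-D tracking of C2TAM or ORBSLAM2), while the per-frame person positions, mapped through the current camera pose, would accumulate into the semantic human activity layer.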

Publication
European Conference on Mobile Robots (ECMR), 2017
Luis Riazuelo