Time-of-flight (ToF) imaging has become a widespread technique for depth estimation, allowing affordable off-the-shelf cameras to provide depth maps in real time. However, multipath interference (MPI) resulting from indirect illumination significantly degrades the captured depth. Most previous works have tried to solve this problem by means of complex hardware modifications or costly computations. In this work we avoid these approaches and propose a new technique that corrects MPI-induced depth errors, requires no camera modifications, and takes just 10 milliseconds per frame. By observing that most MPI information can be expressed as a function of the captured depth, we pose MPI removal as a convolutional approach, and model it using a convolutional neural network. In particular, given that the input and output data present similar structure, we base our network on an autoencoder, which we train in two stages: first, we use the encoder (convolution filters) to learn a suitable basis to represent corrupted range images; then, we train the decoder (deconvolution filters) to correct depth from the learned basis, using synthetically generated scenes. This approach allows us to tackle the lack of reference data by using a large-scale captured training set with corrupted depth to train the encoder, and a smaller synthetic training set with ground truth depth to train the corrector stage of the network. We generate the synthetic set using physically-based, time-resolved rendering. We demonstrate and validate our method on both synthetic and real complex scenarios, using an off-the-shelf ToF camera, and with only the captured incorrect depth as input.


Dataset Contents

The dataset is provided in HDF5 format, and contains labeled Time-of-Flight depth simulations for a set of 1050 viewpoint and albedo combinations (augmented to a total of 8400) in different diffuse scenes. In particular, we provide the ToF depth and amplitude images (with MPI errors) and their respective reference depth images, all at a resolution of 256x256 pixels. The ToF depths and amplitudes were obtained by simulating the ToF imaging model with a sinusoidal modulation frequency of 20 MHz. The time-resolved simulations were obtained with the physically-based transient rendering framework by Jarabo and colleagues [2014]. If you plan to use this dataset, please cite both works appropriately using the provided BibTeX snippets.
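To illustrate how a sinusoidal ToF model turns a time-resolved response into a depth and an amplitude, here is a minimal sketch. It assumes an ideal amplitude-modulated sensor with co-located emitter and detector, and it is not necessarily the exact pipeline used to build the dataset. The function names and the toy two-path scene are hypothetical; the 20 MHz frequency and the 16 ps bin width are the values quoted on this page.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in vacuum, m/s
F_MOD = 20e6       # modulation frequency quoted for the dataset, Hz


def tof_depth_and_amplitude(transient, bin_width_s, f_mod=F_MOD):
    """Correlate a time-resolved pixel response with a sinusoid at f_mod.

    transient[k] is the radiance arriving with round-trip time
    (k + 0.5) * bin_width_s. Returns (depth in metres, amplitude).
    Assumption: ideal sensor, no noise, no harmonics.
    """
    phasor = 0j
    for k, radiance in enumerate(transient):
        t = (k + 0.5) * bin_width_s
        phasor += radiance * cmath.exp(1j * 2.0 * math.pi * f_mod * t)
    # Phase is only known modulo 2*pi, so depth wraps every C / (2 * f_mod).
    phase = cmath.phase(phasor) % (2.0 * math.pi)
    depth = C * phase / (4.0 * math.pi * f_mod)
    return depth, abs(phasor)


def impulse_transient(paths, bin_width_s, n_bins):
    """Hypothetical helper: toy transient from (distance_m, radiance) pairs."""
    transient = [0.0] * n_bins
    for distance_m, radiance in paths:
        k = int((2.0 * distance_m / C) / bin_width_s)
        transient[k] += radiance
    return transient


BIN_WIDTH_S = 16e-12  # temporal resolution quoted for the transient renders
N_BINS = 4062         # roughly 65 ns / 16 ps

# Direct light only: the recovered depth matches the surface at 2 m.
direct = impulse_transient([(2.0, 1.0)], BIN_WIDTH_S, N_BINS)
depth_direct, amp_direct = tof_depth_and_amplitude(direct, BIN_WIDTH_S)

# Direct light plus one weaker, longer indirect path: the depth is biased.
mixed = impulse_transient([(2.0, 1.0), (3.0, 0.5)], BIN_WIDTH_S, N_BINS)
depth_mpi, amp_mpi = tof_depth_and_amplitude(mixed, BIN_WIDTH_S)

print(f"direct only: {depth_direct:.3f} m, with indirect path: {depth_mpi:.3f} m")
```

In this toy case the indirect contribution pulls the measured depth beyond the true 2 m surface, which is the kind of error the reference depth images in the dataset make it possible to quantify.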

Additionally, we can provide the 1050 time-resolved physically-based simulations produced by the transient renderer. These simulations have a temporal resolution of 16 picoseconds, with a maximum time of flight of 65 nanoseconds (about 20 meters of path length in vacuum). Since these time-resolved renders are quite heavy (well over a few GB), if you want them please drop us an email and we will find an appropriate way to share them.
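As a quick sanity check on these figures, the following sketch converts the stated temporal resolution and maximum time of flight into a bin count and path lengths. The variable names are illustrative, and the bin count is derived here rather than taken from the dataset itself.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

bin_width_s = 16e-12  # temporal resolution stated above
max_tof_s = 65e-9     # maximum time of flight stated above

# Number of whole temporal bins that fit in the recorded time window.
n_bins = int(max_tof_s / bin_width_s)

# Optical path length covered by the full window and by a single bin.
max_path_m = C * max_tof_s
path_per_bin_mm = C * bin_width_s * 1e3

print(n_bins, round(max_path_m, 2), round(path_per_bin_mm, 2))
```

The full window covers roughly 19.5 m of path length, consistent with the rounded figure of 20 meters, and each bin corresponds to a few millimeters of light travel.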

Supplemental Video


@article{MarcoSIGA2017DeepToF,
  author  = {Marco, Julio and Hernandez, Quercus and Mu\~{n}oz, Adolfo and Dong, Yue and Jarabo, Adrian and Kim, Min and Tong, Xin and Gutierrez, Diego},
  title   = {DeepToF: Off-the-Shelf Real-Time Correction of Multipath Interference in Time-of-Flight Imaging},
  journal = {ACM Transactions on Graphics (SIGGRAPH Asia 2017)},
  volume  = {36},
  number  = {6},
  year    = {2017}
}

Related Bibtex


We want to thank the anonymous reviewers for their insightful comments, Belen Masia for proofreading the manuscript, and the members of the Graphics & Imaging Lab for helpful discussions. We also thank David Jimenez for providing the code for our comparisons. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (CHAMELEON project, grant agreement No 682080), DARPA (project REVEAL), and the Spanish Ministerio de Economía y Competitividad (projects TIN2016-78753-P and TIN2014-61696-EXP). Min H. Kim acknowledges Korea NRF grants (2016R1A2B2013031, 2013M3A6A6073718), Giga KOREA Project (GK17P0200) and KOCCA in MCST of Korea. Julio Marco was additionally funded by a grant from the Gobierno de Aragón.