CVPR 2026: Dense Metric Depth Completion from Sparse Direct Time-of-Flight Sensors




Computer Vision and Pattern Recognition (CVPR 2026)

Dense Metric Depth Completion from Sparse Direct Time-of-Flight Sensors

	Hakyeong Kim	Ruicheng Wang	Chengtang Yao	Jiaolong Yang	Min H. Kim
	KAIST	USTC	Microsoft Research Asia		KAIST


	Zero-shot generalization of our model across different dToF sensing conditions. Top: sparse depth inputs from three representative settings: autonomous-driving LiDAR (rotating dToF), lightweight mobile dToF, and extremely sparse random sampling. Bottom: our predicted dense metric depth for each case, demonstrating strong robustness to diverse sparsity levels, noise, and sensor patterns. Right: accuracy versus inference time (bubble size indicates the number of model parameters). Our method achieves the most favorable balance of low error and fast runtime, outperforming state-of-the-art depth completion and enhancement approaches.
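	The page does not specify how the "extremely sparse random sampling" inputs are generated. As an illustration only, the following sketch shows one simple way to derive such an input from a dense depth map: keep each valid pixel with a fixed probability and perturb the kept values with additive Gaussian range noise. The helper `sample_sparse_depth`, its parameters, and the Gaussian noise model are hypothetical assumptions; the authors' simulation pipeline additionally models flash, sub-VGA flash, and rotating sensors and their hardware-induced degradation.

```python
import random


def sample_sparse_depth(dense, keep_ratio, noise_std, seed=0):
    """Hypothetical helper: simulate a randomly sampled, noisy sparse depth map.

    dense:      2D list of depth values in meters (0.0 marks missing ground truth).
    keep_ratio: probability that each valid pixel is kept as a measurement.
    noise_std:  standard deviation of additive Gaussian range noise, in meters.
    Returns (sparse, mask): dropped pixels are 0.0 in `sparse` and 0 in `mask`.
    """
    rng = random.Random(seed)  # seeded so the sampling pattern is reproducible
    sparse, mask = [], []
    for row in dense:
        sparse_row, mask_row = [], []
        for depth in row:
            if depth > 0.0 and rng.random() < keep_ratio:
                # Assumed noise model: zero-mean Gaussian, clamped so a kept
                # measurement never collapses to the "missing" value 0.0.
                noisy = depth + rng.gauss(0.0, noise_std)
                sparse_row.append(max(noisy, 1e-3))
                mask_row.append(1)
            else:
                sparse_row.append(0.0)
                mask_row.append(0)
        sparse.append(sparse_row)
        mask.append(mask_row)
    return sparse, mask


if __name__ == "__main__":
    dense = [[2.0] * 100 for _ in range(100)]  # flat wall 2 m away
    sparse, mask = sample_sparse_depth(dense, keep_ratio=0.05, noise_std=0.01)
    kept = sum(map(sum, mask))
    print(f"kept {kept} of {100 * 100} pixels")
```

	Keeping the validity mask separate from the depth values mirrors how sparse inputs are usually fed to completion networks, since a zero depth alone cannot distinguish "no return" from a genuine measurement.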


	Abstract

	Direct Time-of-Flight (dToF) sensors provide highly accurate metric depth and are more robust than indirect ToF systems in challenging real-world conditions. However, their high manufacturing cost and limited photodiode array size produce depth maps that are extremely sparse, low-resolution, and noisy, making them unsuitable for VR/XR, robotics, and 3D perception tasks that require dense metric depth. Existing monocular depth estimation and depth completion methods struggle with the unique sampling patterns and hardware artifacts of dToF devices, and their performance often deteriorates significantly under severe sparsity or noise. We present a generalizable framework for dense metric depth completion from sparse dToF measurements, capable of operating across diverse sensor types, sparsity levels, and noise conditions. Our model employs a depth-guided dual-branch Vision Transformer encoder that processes RGB images and sparse dToF measurements separately, while a masked joint attention module allows depth tokens to reliably guide image features without being overwritten by them. A lightweight decoder reconstructs dense metric depth efficiently, without diffusion-based or refinement-heavy post-processing. To address the scarcity of paired training data, we introduce a comprehensive dToF simulation pipeline that reproduces the characteristics of flash, sub-VGA flash, and rotating sensors, including hardware-induced degradation, irregular sparsity, and realistic noise distributions. Trained entirely on synthetic data, our model achieves strong zero-shot generalization across six datasets and three real dToF devices, outperforming state-of-the-art approaches in both accuracy and computational efficiency. This establishes a robust and practical solution for dense metric depth completion from sparse direct ToF sensors.
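	The abstract states only that a masked joint attention module lets depth tokens guide image features without being overwritten by them. One plausible reading, sketched here purely as an assumption, is a single self-attention over the concatenated image and depth tokens with an asymmetric mask: image queries may attend to valid depth tokens, but depth queries never attend to image tokens. The helpers `build_joint_mask` and `masked_attention`, the handling of invalid (empty) depth tokens, and the omission of learned projections, multiple heads, and MLPs are illustrative choices, not the authors' architecture.

```python
import math


def build_joint_mask(n_img, depth_valid):
    """Hypothetical attention mask over the concatenated sequence [image | depth].

    allowed[q][k] is True when query token q may attend to key token k.
    - Image queries attend to every image token and to *valid* depth tokens,
      so sparse measurements can guide image features.
    - Depth queries attend only to valid depth tokens (and themselves), so
      image features never flow into, or overwrite, the depth stream.
    """
    n = n_img + len(depth_valid)
    allowed = []
    for q in range(n):
        row = []
        for k in range(n):
            k_is_img = k < n_img
            k_usable = k_is_img or depth_valid[k - n_img]
            if q < n_img:
                ok = k_usable
            else:
                ok = (not k_is_img) and k_usable
            row.append(ok or q == k)  # always allow self so no row is empty
        allowed.append(row)
    return allowed


def masked_attention(tokens, allowed):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens.

    Learned projections, multiple heads and MLPs are omitted for clarity.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q, mask_row in enumerate(allowed):
        scores = [
            scale * sum(a * b for a, b in zip(tokens[q], tokens[k]))
            if mask_row[k] else -math.inf
            for k in range(len(tokens))
        ]
        peak = max(scores)  # finite, because self-attention is always allowed
        weights = [math.exp(s - peak) if mask_row[k] else 0.0
                   for k, s in enumerate(scores)]
        total = sum(weights)
        out.append([sum(w * tok[j] for w, tok in zip(weights, tokens)) / total
                    for j in range(dim)])
    return out


if __name__ == "__main__":
    image_tokens = [[1.0, 0.0], [0.0, 1.0]]
    depth_tokens = [[5.0, 5.0], [7.0, 7.0]]
    mask = build_joint_mask(len(image_tokens), depth_valid=[True, False])
    fused = masked_attention(image_tokens + depth_tokens, mask)
    for row in fused:
        print([round(v, 3) for v in row])
```

	Because depth rows place zero weight on every image key, changing the image tokens leaves the depth outputs exactly unchanged, which is the "not overwritten" property in its simplest form; image rows, in contrast, are pulled toward the valid depth tokens they attend to.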


	BibTeX
	@InProceedings{Kim_2026_CVPR,
	  author    = {Hakyeong Kim and Ruicheng Wang and Chengtang Yao and Jiaolong Yang and Min H. Kim},
	  title     = {Dense Metric Depth Completion from Sparse Direct Time-of-Flight Sensors},
	  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
	  month     = {June},
	  year      = {2026}
	}



	Preprint paper: PDF (1.6MB)
	Supplemental material #1: PDF (6.5MB)
	GitHub code (TBA)

Hosted by Visual Computing Laboratory, School of Computing, KAIST.