Computer Vision and Pattern Recognition (CVPR 2026)

 
Dense Metric Depth Completion from Sparse Direct Time-of-Flight Sensors
 
  Hakyeong Kim, Ruicheng Wang, Chengtang Yao, Jiaolong Yang, Min H. Kim
  KAIST, USTC, Microsoft Research Asia, KAIST
 
 
  Zero-shot generalization of our model across different dToF sensing conditions. Top: sparse depth inputs from three representative settings—autonomous driving LiDAR (rotating dToF), lightweight mobile dToF, and extremely sparse random sampling. Bottom: our predicted dense metric depth for each case, demonstrating strong robustness under diverse sparsity, noise, and sensor patterns. Right: comparison of accuracy versus inference time (bubble size indicates model parameters). Our method achieves the most favorable balance of low error and fast runtime, outperforming state-of-the-art depth completion and enhancement approaches.  
   
  Abstract
   
 

Direct Time-of-Flight (dToF) sensors provide highly accurate metric depth and are more robust than indirect ToF systems in challenging real-world conditions. However, their high manufacturing cost and limited photodiode array size produce depth maps that are extremely sparse, low-resolution, and noisy, making them unsuitable for VR/XR, robotics, and 3D perception tasks that require dense metric depth. Existing monocular and depth completion methods struggle to handle the unique sampling patterns and hardware artifacts of dToF devices, and their performance often deteriorates significantly under severe sparsity or noise. We present a generalizable framework for dense metric depth completion from sparse dToF measurements, capable of operating across diverse sensor types, sparsity levels, and noise conditions. Our model employs a depth-guided dual-branch Vision Transformer encoder that processes RGB images and sparse dToF measurements separately, while a masked joint attention module allows depth tokens to reliably guide image features without being overwritten by them. A lightweight decoder reconstructs dense metric depth efficiently, without diffusion-based or refinement-heavy post-processing. To address the scarcity of paired training data, we introduce a comprehensive dToF simulation pipeline that reproduces the characteristics of flash, sub-VGA flash, and rotating sensors, including hardware-induced degradation, irregular sparsity, and realistic noise distributions. Trained entirely on synthetic data, our model achieves strong zero-shot generalization across 6 datasets and 3 real dToF devices, outperforming state-of-the-art approaches in both accuracy and computational efficiency. This establishes a robust and practical solution for dense metric depth completion from sparse direct ToF sensors.
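
The masked joint attention idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `masked_joint_attention`, the `depth_valid` flag, the single head with identity query/key/value projections, and the specific masking rule are all assumptions made for illustration: image queries may attend to image tokens and to valid depth tokens, while depth queries attend only to depth tokens, so image features cannot flow back into the depth branch.

```python
import numpy as np


def masked_joint_attention(img_tokens, depth_tokens, depth_valid):
    """Single-head attention over concatenated image and depth tokens.

    Illustrative assumptions (not taken from the paper):
      - queries, keys, and values are the raw tokens (no learned projections);
      - image queries may attend to image keys and to valid depth keys;
      - depth queries may attend only to valid depth keys and to themselves,
        so image features never overwrite depth tokens.
    """
    n_img, dim = img_tokens.shape
    n_dep = depth_tokens.shape[0]
    tokens = np.concatenate([img_tokens, depth_tokens], axis=0)

    # allowed[i, j] is True when query i may attend to key j.
    allowed = np.zeros((n_img + n_dep, n_img + n_dep), dtype=bool)
    allowed[:n_img, :n_img] = True              # image -> image
    allowed[:, n_img:] = depth_valid[None, :]   # any query -> valid depth
    # The depth -> image block is left False on purpose.

    # Every depth token may attend to itself, so no row is fully masked.
    idx = np.arange(n_img, n_img + n_dep)
    allowed[idx, idx] = True

    # Scaled dot-product scores with disallowed pairs set to -inf.
    scores = tokens @ tokens.T / np.sqrt(dim)
    scores = np.where(allowed, scores, -np.inf)

    # Numerically stable softmax over keys.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    out = weights @ tokens
    return out[:n_img], out[n_img:], weights


# Example: 4 image tokens, 3 depth tokens, the middle depth token is empty.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
dep = rng.normal(size=(3, 8))
valid = np.array([True, False, True])
img_out, dep_out, attn = masked_joint_attention(img, dep, valid)
```

Under this assumed mask, the depth outputs depend only on the depth tokens, so changing the image tokens leaves `dep_out` unchanged, while `img_out` mixes in information from the valid depth measurements.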

     
   
  BibTeX
 
@InProceedings{Kim_2026_CVPR,
   author = {Hakyeong Kim and Ruicheng Wang and 
   Chengtang Yao and Jiaolong Yang and Min H. Kim},
   title = {Dense Metric Depth Completion from Sparse Direct Time-of-Flight Sensors},
   booktitle = {IEEE Conference on Computer Vision and 
      Pattern Recognition (CVPR)},
   month = {June},
   year = {2026}
} 
   
   
Preprint paper: PDF (1.6MB)
Supplemental material #1: PDF (6.5MB)
GitHub code (TBA)
 

Hosted by Visual Computing Laboratory, School of Computing, KAIST.
