[1] ZHENG You, WANG Lei*, YANG Ziwen. Monocular depth estimation based on adaptive fusion of multi-scale depth maps[J]. Journal of Wuhan Institute of Technology, 2024, 46(01): 85-90. [doi:10.19843/j.cnki.CN42-1779/TQ.202306025]





Monocular depth estimation based on adaptive fusion of multi-scale depth maps
Article ID: 1674-2869(2024)01-0085-06
ZHENG You, WANG Lei*, YANG Ziwen
School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Keywords: monocular depth estimation; attention mechanism; multi-scale feature fusion network; multi-scale depth adaptive fusion network
Depth estimation networks usually have many layers, and a large amount of image feature information is lost during encoding and decoding, so the predicted depth maps lack object structure details and have blurred edge contours. This paper proposes a monocular depth estimation method based on the adaptive fusion of multi-scale depth maps, which effectively preserves object details and geometric contours. First, a squeeze-and-excitation residual network (SE-ResNet) is introduced, using an attention mechanism to encode the features of different channels and thereby preserve more detail in distant planar regions of the depth map. Second, a multi-scale feature fusion network fuses feature maps of different scales, producing feature maps rich in geometric features and semantic information. Finally, a multi-scale adaptive depth fusion network attaches learnable weight parameters to the depth maps generated from the feature maps of different scales and fuses these depth maps adaptively, which increases the object information in the predicted depth map. Experiments on the NYU Depth V2 dataset show that the depth maps predicted by the proposed method have higher accuracy and richer object information: the absolute relative error is 0.115, the root mean square error is 0.525, and the accuracy reaches up to 99.3%.
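The adaptive fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' network: the function names `upsample_nearest` and `fuse_depth_maps`, the softmax normalisation of the weights, and the nearest-neighbour upsampling are all assumptions made for illustration, and in a trained model the per-scale logits would be learnable parameters rather than fixed numbers.

```python
import numpy as np


def upsample_nearest(depth, factor):
    """Upsample an (H, W) depth map by an integer factor (nearest neighbour)."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)


def fuse_depth_maps(depth_maps, logits):
    """Fuse multi-scale depth maps with one normalised weight per scale.

    depth_maps: list of (H_i, W_i) arrays. The first is assumed to be at full
                resolution, and its size is an integer multiple of the others.
    logits:     one scalar per scale (hypothetical stand-ins for the learnable
                weight parameters mentioned in the abstract).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Softmax keeps the weights positive and summing to 1 (an assumption here).
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    target_h, target_w = depth_maps[0].shape
    fused = np.zeros((target_h, target_w), dtype=np.float64)
    for weight, depth in zip(weights, depth_maps):
        factor = target_h // depth.shape[0]
        fused += weight * upsample_nearest(depth, factor)
    return fused, weights


# Two toy depth maps: full resolution (4x4) and half resolution (2x2).
full_res = np.full((4, 4), 2.0)
half_res = np.full((2, 2), 4.0)

# Equal logits give equal weights, so every fused pixel is 0.5*2 + 0.5*4 = 3.0.
fused, weights = fuse_depth_maps([full_res, half_res], logits=[0.0, 0.0])
```

Normalising the weights keeps the fused depth within the range spanned by the per-scale predictions; other schemes, such as per-pixel weight maps, would fit the abstract's description equally well.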


