With the rapid advancement of networks, graphics hardware, and computing techniques, 3D information has been widely applied in various domains, such as virtual reality and the medical industry. The proliferation of such applications has produced an explosively growing volume of 3D multimedia data, which creates a need for large-scale 3D data analysis. Traditional 3D feature representation and learning algorithms may not work well for large-scale tasks such as 3D retrieval and recognition, since the computational cost in these cases is much higher. Designing efficient and effective 3D feature representation and learning techniques for 3D big data is therefore both desirable and meaningful. This special issue targets the most recent technical progress on the analysis and applications of large-scale 3D multimedia.

Submissions came from an open call for papers and were evaluated with the assistance of professional referees. Eight papers were finally selected from a total of 24 submissions after two rounds of rigorous review. The accepted papers cover several popular topics in large-scale 3D multimedia analysis and applications, including 3D recognition, retrieval, compression, and preprocessing. We summarize these papers as follows:

In “3D skeleton based action recognition by video-domain translation-scale invariant mapping and multi-scale dilated CNN” [4], Li et al. propose an action recognition approach for 3D skeleton videos. Through a video-domain translation-scale invariant image mapping, the 3D skeleton videos are transformed into skeleton color images, which are fed to a multi-scale dilated CNN for action recognition. In addition, different data augmentation strategies are designed to improve the generalization and robustness of the method. Experimental results on several popular benchmark datasets demonstrate the superiority of the approach, with significant performance improvements.

The paper “Dense matching for multi-scale images by propagation” [3] presents a new dense matching algorithm for two non-stereoscopic images. It places no specific restrictions on the capture conditions. Starting from the identified points of interest, the propagation process is guided by geometric constraints. The matching algorithm is verified on different tasks, such as 3D reconstruction, and can also be useful for 3D retrieval. Instead of identifying points of interest first, Nie et al. propose an end-to-end multi-scale CNN (MSCNN) for 3D model retrieval in the paper “Multi-scale CNNs for 3D model retrieval” [6]. Multiple rendered images are extracted from a 3D object and combined into one representative view, which is input to the network. The multi-layer design effectively preserves both local and global information for each 3D model. The Euclidean metric is employed to compute the similarity between two 3D models for retrieval. The retrieval and classification results on the NTU dataset demonstrate the superiority of the method.

It is a challenging task for network service providers to deliver multi-view plus depth (MVD) 3D videos over the Internet with the best user Quality of Experience under dynamic network conditions. In “User-perceived quality aware adaptive streaming of 3D multi-view video plus depth over the internet” [2], Karn et al. comprehensively analyze the dynamic network environment for streaming 3D MVD video over the Internet. They then design a new multi-view video streaming method whose decision strategy adapts in response to user-perceived quality. The simulation results show significant quality improvements under challenging network conditions. Another way to transmit and store 3D video efficiently is compression. In “Perceptual rate distortion optimization of 3D-HEVC using PSNR-HVS” [8], Valizadeh et al. propose integrating a perceptual video quality metric into the rate distortion optimization process of 3D-HEVC. Based on the characteristics of the human visual system, PSNR-HVS is used as the distortion measure in the coding unit mode selection process. The proposed method improves 3D-HEVC compression efficiency by 2.78% on different standard multiview video sequences.

When dealing with 3D data of complex scenes, some preprocessing techniques are required. One typical preprocessing step is denoising. The paper “Feature-preserving mesh denoising based on guided normal filtering” [5] proposes a feature-preserving mesh denoising algorithm based on face classification. During denoising, the sharp features, which play a key role in 3D models, are kept unchanged. Multi-scale tensor voting is used to classify the faces into feature faces and non-feature faces. Based on the sub-neighborhood of the feature faces, a second-order joint bilateral filter is employed to update the face normal field and the vertex positions. Experiments on various synthetic datasets demonstrate the effectiveness of the proposed method. Another typical preprocessing step is background subtraction, which extracts the foreground objects. In “End-to-end video background subtraction with 3d convolutional neural networks” [7], Sakkos et al. propose an end-to-end temporal-aware background subtraction approach with 3D convolutional neural networks. By performing 3D convolutions on the 10 most recent frames of the video, the method tracks changes in the spatial and temporal dimensions simultaneously. In this way, the model can effectively track the movement of the foreground and the relations between neighboring pixels using multi-modal features. The experiments show a significant improvement compared with several baseline methods.

To better understand the pathophysiological processes of ischemic heart diseases, Gao et al. present a robust motion estimation framework with an adaptive biomechanical model constraint using dual H∞ criteria in the paper “Robust recovery of myocardial kinematics using dual H∞ criteria” [1]. The design of two iterative H∞ filters, i.e., the kinematics estimation filter and the elasticity estimation filter, ensures robustness to varying noise and the flexibility to incorporate new algorithms. The results on motion parameter estimation from cardiac MRI images demonstrate its effectiveness.

To conclude, the eight papers in this special issue cover several emerging topics in large-scale 3D multimedia analysis and applications. We sincerely hope that these papers provide interesting insights and inspiration for researchers and engineers in related fields. We would like to thank Prof. Borko Furht for giving us the opportunity to organize this special issue. We also thank the reviewers for their generous efforts in reviewing the papers, which guaranteed the high quality of the accepted papers. Finally, we thank all the authors who contributed to this special issue, and everyone who helped make it a success.