PCDNF: Revisiting Learning-based Point Cloud Denoising via Joint Normal Filtering

Zheng Liu, Sijing Zhan, Yaowu Zhao, Yuanyuan Liu, Renjie Chen, Ying He

Z. Liu, S. Zhan, Y. Zhao, and Y. Liu are with the School of Computer Science, China University of Geosciences (Wuhan). R. Chen is with the School of Mathematical Sciences, University of Science and Technology of China. Y. He is with the School of Computer Science and Engineering, Nanyang Technological University. Corresponding author. E-mail: renjiec@ustc.edu.cn.
Abstract

Recovering high-quality surfaces from noisy point clouds, known as point cloud denoising, is a fundamental yet challenging problem in geometry processing. Most existing methods either denoise the noisy input directly or filter raw normals and then update point positions. Motivated by the essential interplay between point cloud denoising and normal filtering, we revisit point cloud denoising from a multitask perspective and propose an end-to-end network, named PCDNF, that denoises point clouds via joint normal filtering. In particular, we introduce an auxiliary normal filtering task to help the overall network remove noise more effectively while preserving geometric features more accurately. In addition to the overall architecture, our network has two novel modules. First, to improve noise removal performance, we design a shape-aware selector that constructs the latent tangent space representation of a specific point by jointly considering the learned point and normal features and geometric priors. Second, point features are better suited to describing geometric details, while normal features are more conducive to representing geometric structures (e.g., sharp edges and corners); combining the two lets each compensate for the weaknesses of the other. We therefore design a feature refinement module that fuses point and normal features to better recover geometric information. Extensive evaluations, comparisons, and ablation studies demonstrate that the proposed method outperforms state-of-the-art methods for both point cloud denoising and normal filtering.

Point cloud denoising, normal filtering, 3D deep learning, point cloud processing

1 Introduction

Point clouds are widely used in various fields, such as computer graphics, 3D computer vision, photogrammetry, autonomous driving, and simultaneous localization and mapping (SLAM). Recently, with the rapid development of modern 3D acquisition devices such as LiDAR and depth cameras, more and more 3D models are routinely obtained and stored as point clouds in shape repositories. Due to physical measurement and reconstruction errors, the acquired point clouds are inevitably corrupted by noise [44]. Noise not only degrades the visual quality of 3D models but also causes problems in downstream applications [10, 46]. Therefore, point cloud denoising is highly desirable and is often the first step in digital geometry processing. Since noise and geometric features both carry high-frequency information, it is challenging to distinguish and recover features while removing noise. Point cloud denoising has been studied extensively over the past two decades. Although considerable progress has been made, traditional denoising methods [29, 24, 9, 8, 23, 21] generally involve many parameters and tedious parameter tuning, which is time-consuming yet crucial for producing good results.

More recently, the success of deep neural networks in image processing has motivated data-driven approaches to various point cloud processing tasks, including denoising. Deep learning methods are automatic and avoid parameter tuning, and can therefore handle a wider range of 3D models than traditional methods. Existing deep denoising methods can generally be classified into one-stage and two-stage methods. One-stage methods, such as PointCleanNet [34], Pointfilter [46], and RePCD-Net [10], typically use point feature representations to regress a displacement vector per noisy point that moves it directly toward the ground truth. Because they do not consider normal information, PointCleanNet [34] and RePCD-Net [10] may blur sharp features, especially under heavy noise. Pointfilter [46], which incorporates normal information into its loss function, better preserves sharp features but may oversmooth geometric details. Two-stage methods, which filter point normals and then update point positions, have received widespread attention given their ability to incorporate local geometric information [22, 44]. These methods differ mainly in their deep normal filtering networks. As with Pointfilter [46], small-scale geometric features, including fine details, may be blurred when only the learned normal-based features are used to update point positions. To recover detailed geometric features accurately, the method in [22] relies on an additional feature detection network, while GeoDualCNN [44] requires the guidance of geometric expertise and a feature-preserving position updating algorithm. In short, the aforementioned learning-based methods either directly denoise noisy points or filter raw normals and then update point positions, and therefore cannot perform denoising and normal filtering jointly.
Yet denoising and normal filtering are inseparably intertwined and can benefit each other: if better normals can be estimated from the noisy point cloud, denoising performance can be improved significantly, and vice versa.

This observation motivates us to develop a point-normal feature interaction network in a multitask paradigm. Unlike previous work, we present a unified architecture for jointly learning point cloud denoising and normal filtering. The proposed network has two branches, one for point cloud denoising and the other for normal filtering, that benefit from each other, so that the combined network can exploit the strengths of each task while mitigating their weaknesses. Specifically, the proposed network consists of four modules: the multiscale feature extractor, the shape-aware selector, the feature refinement module, and the decoder. The feature extractor uses DGCNN [43] as the backbone to learn point and normal feature representations in a multiscale manner. Then, we feed the learned point and normal features, together with geometric priors, to the shape-aware selector to construct the latent tangent space of the specific point, which reduces the negative impact of points outside this tangent space. After that, a feature refinement module with two units (feature augmentation and feature fusion) helps the network recover geometric features more accurately. The feature augmentation unit aggregates neighboring features to obtain richer point and normal feature representations, and the feature fusion unit integrates the augmented point and normal features to better preserve both structural and detailed geometric features. Finally, we feed the integrated features to the decoder to predict the denoised point coordinate and the filtered normal. To summarize, the main contributions of this work are as follows:

  • We propose a novel architecture for jointly learning point cloud denoising and normal filtering. To the best of our knowledge, this is the first end-to-end framework for point cloud denoising from a multitask perspective.

  • To improve denoising performance, we design a shape-aware selector that reduces the negative impact of neighboring points outside the latent tangent space of the specific point. This module represents the tangent space by leveraging the learned point and normal features together with geometric priors.

  • We design a feature refinement module that helps the network preserve geometric features. This module first augments the learned features by enlarging the receptive fields, and then integrates the learned point and normal features so that they complement each other in recovering different types of geometric features (e.g., structural and detailed features).

  • Qualitative and quantitative experiments on synthetic and scanned data demonstrate that our network performs favorably against the state-of-the-art methods for both denoising and normal filtering.

The remainder of this paper is organized as follows. Section 2 briefly reviews the literature on point cloud denoising and normal filtering. The proposed method is elaborated in Section 3, and the experimental results are demonstrated in Section 4. Finally, Section 5 concludes with remarks and discusses directions for future work.

2 Related work

2.1 Point Cloud Denoising

As a fundamental geometry processing problem, point cloud denoising has drawn great attention in the past decades. Due to the abundance of literature on denoising techniques, it is beyond our scope to review all existing work. We refer the interested reader to [52] for a comprehensive review. Here, we first mention relevant traditional methods, and then concentrate on the recent learning-based techniques.

Traditional methods. MLS-based (Moving Least Squares) methods [1, 2, 12, 29], originally designed for reconstructing noise-free surfaces, iteratively project the input point set onto an approximated underlying surface. However, these classical methods assume piecewise smooth priors for the underlying surface, which inevitably smooths geometric features. Later, Lipman et al. [20] proposed the pioneering LOP (Locally Optimal Projection) method, which has proven successful for point cloud consolidation. LOP and its variants [17, 18, 31, 24] aim to produce a point set that describes the underlying surface while enforcing a uniform distribution. Although LOP-related methods can robustly remove noise and yield uniformly sampled results, they are unable to preserve geometric features in the presence of large noise. Optimization-based methods formulate denoising as an optimization problem with appropriate priors. Among them, sparse optimization methods, such as [3, 41, 27, 21], are more effective at preserving geometric features (especially sharp features), based on the prior that geometric features are sparse over the underlying surface. Recently, low-rank and dictionary learning techniques [11, 9, 23, 42, 40], which exploit the self-similarity of underlying surfaces, have received attention for preserving structural repetition. Although optimization-based methods have advantages in preserving certain kinds of geometric features, their geometric priors may degrade performance on other kinds. In general, traditional methods must tackle complex computation or optimization problems and require a tedious trial-and-error process to produce satisfactory results.

Learning-based methods. Recently, with the development of neural networks [32, 33, 43], deep learning techniques have been extensively introduced into point cloud denoising and have achieved impressive results. Roveri et al. [35] proposed PointProNet, a fully differentiable denoising architecture based on 2D CNNs, which converts unordered points into regularly sampled height maps. EC-Net [45] and DMRDenoise [25] mainly focus on upsampling and consolidating point clouds. EC-Net designs an edge-aware consolidation network to denoise point clouds; it preserves sharp geometric features but retains some noise, especially under large noise. Luo and Hu [25] proposed a downsample-upsample architecture, DMRDenoise, that reconstructs noise-free point clouds by learning the manifold structure of the underlying surface. Although DMRDenoise removes noise successfully during downsampling, it may blur geometric features. TotalDenoising, developed by Hermosilla et al. [15], introduces a spatial prior that steers convergence toward the underlying surface without supervision. However, as an unsupervised method, TotalDenoising is sensitive to large noise and may suffer from shrinkage artifacts. GPDNet [30] uses a graph-convolutional neural network for denoising. Luo and Hu [26] proposed a novel denoising paradigm that exploits the distribution model of noisy point clouds. Most of the above learning-based methods cannot effectively preserve geometric features, especially sharp edges and corners. Moreover, their noise removal performance drops noticeably as the noise level increases.

More recently, some two-stage techniques [22, 44], i.e., normal filtering followed by updating point coordinates, have been developed for feature-preserving denoising. Lu et al. [22] classified the noisy input into feature and non-feature points and then predicted multiple normals at the feature points to preserve sharp features. Wei et al. [44] proposed a geometry-supporting dual convolutional neural network (GeoDualCNN) to filter normals, and then updated point coordinates to match the filtered normals. Although GeoDualCNN can produce promising results, the need to compute an extra homogeneous neighborhood for each point limits its end-to-end applicability. An alternative to these two-stage methods is to develop networks that directly predict displacements of the noisy points and then apply the predicted displacements to reposition them. Several methods follow this paradigm [34, 46, 10]. PointCleanNet [34] first removes outliers and then predicts displacement vectors for the remaining noisy points. This method may leave residual noise in the denoised results and tends to smooth sharp features to varying degrees. To address this issue, Pointfilter [46] introduces a feature-preserving loss function within an encoder-decoder framework. Although Pointfilter can preserve sharp features, it cannot recover small-scale geometric features well. Later, Chen et al. [10] proposed a feature-aware recurrent architecture that learns more representative features to recover multiscale geometric features effectively. However, under large noise, their method has difficulty balancing noise removal against the recovery of geometric features.

2.2 Point Cloud Normal Filtering

Point normals indicate the orientation of the scanned surface and have been widely used in various practical problems (e.g., surface reconstruction [19, 16], 3D descriptors [36], and registration [48]). However, accurately estimating point normals is challenging, since captured point clouds are inevitably corrupted by noise and outliers. To address this issue, normal filtering has been studied extensively over the past decades. Given the considerable literature on normal filtering, we only review the learning-based methods related to our work. For a more comprehensive review of normal filtering, readers are referred to [28, 49].

Since the pioneering work of Qi et al. [32], many works have extended and applied the PointNet architecture to point cloud processing problems. PCPNet [13] was the first to apply the PointNet architecture to estimating normals from noisy point clouds (the normal filtering task). Zhou et al. [50] proposed a plane constraint mechanism that divides neighborhood points into main-plane points and error points, and then used only the learned features of the main-plane points to regress normals; their method is robust to noise and neighborhood scale. To overcome oversmoothing artifacts, Nesti-Net [5] introduces a mixture of experts to predict the optimal neighborhood scale instead of simply concatenating multiple scales. This multiscale strategy improves performance effectively but noticeably increases running time. Zhou et al. [51] proposed a normal filter based on multipatch stitching; thanks to its patch-level architecture, the method reduces computational cost and improves robustness to noise. Instead of directly predicting normals from the learned features, some methods [4, 54, 47] fit a local surface through the neighbors of a specific point and compute its normal from the fitted surface. These methods are based on weighted least-squares surface fitting of the local neighborhood, which can improve the generalization ability of their networks on real scanned data [54]. Cao et al. [6] learned a latent tangent space representation with a lightweight network and then used a differentiable RANSAC to estimate normals of the underlying surface. Zhou et al. [49] deployed a multi-feature scheme to capture geometric information from multiple feature representations and then updated normals in a refinement system. Unlike these existing methods, we present a unified architecture for jointly learning point cloud denoising and normal filtering.
By combining the two tasks, our network exploits the strengths of each while mitigating their weaknesses. As a result, it preserves geometric features and removes noise well, while avoiding artifacts in the results.
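The weighted least-squares fitting mentioned above can be illustrated, in its simplest form, by fitting a plane (a first-order surface) to a neighborhood and taking the direction of least weighted variance as the normal. The sketch below is a classical, non-learned illustration under that assumption: the function name `weighted_plane_normal` is hypothetical, the weights are plain inputs (in learning-based variants they would be produced by a network), and the cited methods may fit higher-order surfaces instead of planes.

```python
import numpy as np

def weighted_plane_normal(neighbors, weights=None):
    """Fit a plane to `neighbors` (k x 3) by weighted least squares and
    return its unit normal (the sign/orientation is arbitrary)."""
    neighbors = np.asarray(neighbors, dtype=float)
    w = np.ones(len(neighbors)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize the weights
    centroid = w @ neighbors                          # weighted centroid
    centered = neighbors - centroid
    cov = (centered * w[:, None]).T @ centered        # weighted 3 x 3 covariance
    _, eigvecs = np.linalg.eigh(cov)                  # eigenvalues in ascending order
    return eigvecs[:, 0]                              # direction of least variance

# Noisy samples of the plane z = 0.
rng = np.random.default_rng(0)
pts = np.c_[rng.uniform(-1, 1, (50, 2)), 0.01 * rng.standard_normal(50)]
normal = weighted_plane_normal(pts)
```

Giving an outlier a zero weight removes it from both the centroid and the covariance, which is why per-point weights make such fitting robust.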

3 Method

This section starts with an overview of our framework of point cloud denoising with joint normal filtering. Then, we present our network architecture, followed by elaboration on each module of the network. Finally, an end-to-end joint loss function is introduced.

Fig. 1: An overview of our joint point cloud denoising and normal filtering network, coined as PCDNF. PCDNF consists of four main modules: the multiscale feature extractor, shape-aware selector, feature refinement (including feature augmentation and fusion units), and the decoder. Given a noisy patch of a point (with the corresponding raw normals of the patch), PCDNF can predict the coordinate and filtered normal of the specific point simultaneously.

3.1 Problem Statement

Point cloud denoising is nontrivial due to its ill-posed nature. Many learning-based methods [34, 46, 10] model the noise as pointwise residuals (i.e., a displacement vector per input point) and predict these residual vectors to smooth the input point cloud. However, given only point positions, it is still challenging to achieve satisfactory denoising results while preserving geometric features. Note that high-quality normals can improve denoising performance; conversely, accurate normals can be computed from high-quality point clouds. Thus, instead of directly predicting displacement vectors from the noisy input, we propose to perform point cloud denoising by combining position correction and normal filtering, stated as follows:

$$\hat{P} = P + \Delta P, \qquad (\Delta P, \hat{N}) = f(P, N), \tag{1}$$

where $P$ and $\hat{P}$ are the noisy point cloud and its corresponding denoised point cloud, $\Delta P$ denotes the predicted displacement vectors, $N$ denotes the raw normals of the noisy input, and $\hat{N}$ denotes the corresponding predicted normals. Our method aims to learn a mapping $f$ that predicts the displacement vectors and filtered normals simultaneously; the denoised point cloud then follows directly from (1).
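As a minimal sketch of how Eq. (1) is applied once the mapping is available, the code below treats the learned mapping as an opaque callable: it returns per-point displacements and filtered normals, and the denoised cloud is the noisy cloud plus the displacements. The names `apply_eq1` and `toy_f` are hypothetical; `toy_f` is a hand-written stand-in for the trained network that only makes sense for the plane z = 0, and the final re-normalization of the normals is an assumption of this sketch.

```python
import numpy as np

def apply_eq1(noisy_points, raw_normals, f):
    """Eq. (1): f maps (P, N) to (Delta P, N_hat); the denoised cloud is P + Delta P.
    `f` is a hypothetical placeholder for the trained network."""
    delta_p, n_hat = f(noisy_points, raw_normals)
    denoised = noisy_points + delta_p
    n_hat = n_hat / np.linalg.norm(n_hat, axis=1, keepdims=True)  # unit normals (assumption)
    return denoised, n_hat

def toy_f(points, normals):
    """Toy stand-in for f on the plane z = 0: move each point onto the plane
    and return an (unnormalized) plane normal for every point."""
    delta_p = np.zeros_like(points)
    delta_p[:, 2] = -points[:, 2]
    return delta_p, np.tile([0.0, 0.0, 2.0], (len(points), 1))

rng = np.random.default_rng(1)
P = np.c_[rng.uniform(-1, 1, (100, 2)), 0.05 * rng.standard_normal(100)]
N = rng.standard_normal((100, 3))
P_hat, N_hat = apply_eq1(P, N, toy_f)
```

Because the network only predicts residuals, an all-zero prediction leaves the input unchanged. This is the usual motivation for residual formulations in denoising.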

Fig. 5: Illustration of (a) feature embedding and decoder consisting of (b) coordinate regression and (c) normal regression.

3.2 Network Architecture

Based on the problem statement (1), we design a multitask network, dubbed PCDNF, for joint point cloud denoising and normal filtering. Normal filtering can be seen as an auxiliary task that helps our network better recover noise-free point clouds. Fig. 1 shows the architecture of PCDNF. Our network consists of four modules: the multiscale feature extractor, shape-aware selector, feature refinement, and the decoder.

Specifically, given a noisy patch and its corresponding raw normals, the multiscale feature extractor embeds the inputs into coarse point and normal feature representations. Then, the shape-aware selector selects the points most related to the specific point in terms of geometric information and the coarse representations. These selected points form a latent tangent space of the specific point, making noise removal more effective. The feature refinement module first encodes local spatial information to augment the similarity representations and then fuses the augmented representations to better preserve geometric features. Finally, the coordinate and normal regressors of the decoder predict the coordinate and the filtered normal of the specific point.

Note that most previous methods either directly predict point displacements for denoising, or split denoising into two separate steps, normal filtering and point updating. In contrast, we combine the highly related denoising and normal filtering tasks so that they benefit each other.

3.2.1 Multiscale feature extractor

Given a point $p_i$ of the noisy point cloud $P$, a patch centered at $p_i$ is defined as

$$\mathcal{P}_i = \{\, p_j \mid p_j \in P,\ \|p_j - p_i\|_2 < r \,\}, \qquad |\mathcal{P}_i| = M, \tag{2}$$

where $M$ is the number of points within the patch and $r$ is the patch radius. The corresponding raw normals of $\mathcal{P}_i$ are denoted as $\mathcal{N}_i = \{ n_j \}_{j=1}^{M}$. Features learned from a single-scale receptive field cannot faithfully describe the local shape of the underlying surface. To address this issue, we propose a multiscale feature extractor built from a series of EdgeConv operations [43], which learns multiscale discriminative representations in both the Euclidean space and the feature space.
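The patch of Eq. (2) is a ball query around $p_i$. A minimal brute-force sketch follows; the function name `extract_patch` is hypothetical and the linear scan over all points is chosen for clarity, not efficiency.

```python
import numpy as np

def extract_patch(points, normals, i, r):
    """Gather the patch P_i = {p_j : ||p_j - p_i||_2 < r} of Eq. (2) and the raw
    normals of its points, using a brute-force radius query."""
    dist = np.linalg.norm(points - points[i], axis=1)
    idx = np.flatnonzero(dist < r)
    return idx, points[idx], normals[idx]

pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.5, 0.0, 0.0], [2.0, 0.0, 0.0]])
nrm = np.tile([0.0, 0.0, 1.0], (4, 1))
idx, patch, patch_normals = extract_patch(pts, nrm, i=0, r=0.6)
```

For large clouds, a spatial index such as SciPy's `cKDTree.query_ball_point` would typically replace the linear scan. Patches would also usually be resampled to a fixed size $M$ for batching, although that detail is an assumption here.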

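Given an input patch with coordinates $\mathcal{P}_i$ and normals $\mathcal{N}_i$, our multiscale feature extractor learns a coarse point feature $F_p$ from $\mathcal{P}_i$ and a coarse normal feature $F_n$ from $\mathcal{N}_i$. Fig. 5(a) lists the details of our feature embeddings. We describe the extraction of the coarse point feature $F_p$; the normal feature embedding is done similarly. For any point $p_j \in \mathcal{P}_i$, to extract its pointwise feature $f_j$, we first construct two k-nearest-neighbor (kNN) graphs of different sizes $k_1$ and $k_2$ to capture multiscale geometric information around $p_j$. Then, the features of $p_j$ with graph sizes $k_1$ and $k_2$ in the $(l+1)$-st layer are computed with the EdgeConv operation [43] as

$$f_j^{(l+1),k_1} = \max_{p_m \in \mathcal{K}_{k_1}(p_j)} h_{\Theta}\big(f_j^{(l)},\ f_m^{(l)} - f_j^{(l)}\big), \qquad f_j^{(l+1),k_2} = \max_{p_m \in \mathcal{K}_{k_2}(p_j)} h_{\Theta}\big(f_j^{(l)},\ f_m^{(l)} - f_j^{(l)}\big),$$

where $\mathcal{K}_{k_1}(p_j)$ and $\mathcal{K}_{k_2}(p_j)$ denote the neighbors of point $p_j$ in the graphs with sizes $k_1$ and $k_2$, $h_{\Theta}$ denotes the multi-layer perceptron (MLP) parameterized by $\Theta$, and $\max$ denotes the max pooling operation. To make the pointwise feature more discriminative, we concatenate $f_j^{(l+1),k_1}$ and $f_j^{(l+1),k_2}$ and apply an MLP:

$$f_j^{(l+1)} = \mathrm{MLP}\big(f_j^{(l+1),k_1} \oplus f_j^{(l+1),k_2}\big),$$

where $\oplus$ is the concatenation operation. To further enlarge the receptive fields and capture relations among points in the feature space, we perform EdgeConv ($\mathcal{E}$) in the feature space followed by an MLP:

$$f_j = \mathrm{MLP}\big(\mathcal{E}(f_j^{(l+1)})\big),$$

where $f_j$ is the pointwise feature of $p_j$. The coarse point feature of patch $\mathcal{P}_i$ is then $F_p = \{ f_j \}_{j=1}^{M}$. Analogously, we learn the pointwise normal feature $g_j$ and obtain the coarse normal feature $F_n = \{ g_j \}_{j=1}^{M}$ of the patch.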
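The multiscale EdgeConv step can be sketched in NumPy as follows. For each point, edge features $(f_j, f_m - f_j)$ to its $k$ nearest neighbors are passed through a shared function and max-pooled, once for each of two graph sizes. The two results are then concatenated and mixed. Several details are simplifying assumptions of this sketch, not the paper's exact layers: a single linear layer with ReLU stands in for each MLP, random matrices stand in for learned weights, the values `k1=8` and `k2=16` are hypothetical, and both graph sizes share the same weights. Because `edge_conv` builds its kNN graph on whatever features it receives, calling it on coordinates gives a Euclidean graph, while calling it on learned features gives the feature-space graph used by $\mathcal{E}$.

```python
import numpy as np

def knn_indices(x, k):
    """Indices of the k nearest neighbours of each row of x, excluding the row itself."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def edge_conv(feat, k, w):
    """One EdgeConv step: max over neighbours m of h(f_j, f_m - f_j), where h is a
    single linear layer + ReLU standing in for the MLP h_Theta."""
    nbr = knn_indices(feat, k)                                     # (M, k)
    center = np.repeat(feat[:, None, :], k, axis=1)                # (M, k, C)
    edges = np.concatenate([center, feat[nbr] - center], axis=-1)  # (M, k, 2C)
    return np.maximum(edges @ w, 0.0).max(axis=1)                  # (M, C_out)

def multiscale_layer(feat, k1, k2, w_edge, w_mix):
    """EdgeConv with two graph sizes (shared weights here), then concatenate + MLP."""
    f_k1 = edge_conv(feat, k1, w_edge)
    f_k2 = edge_conv(feat, k2, w_edge)
    return np.maximum(np.concatenate([f_k1, f_k2], axis=1) @ w_mix, 0.0)

rng = np.random.default_rng(0)
patch = rng.standard_normal((64, 3))                 # M = 64 points, xyz input
out = multiscale_layer(patch, k1=8, k2=16,
                       w_edge=rng.standard_normal((6, 32)),
                       w_mix=rng.standard_normal((64, 64)))
print(out.shape)  # (64, 64)
```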

Fig. 8: Illustration for selecting points similar to the specific point $p_i$ (plotted in red): (a) a patch containing outliers and heavy-noise points; (b) a patch containing sharp features. The selected similar points in the same latent tangent space as $p_i$ are plotted in green, and the negative points are plotted in blue. The solid line is the latent tangent space of $p_i$. The dashed line is the patch radius.
Fig. 9: Illustration of the shape-aware selector.

3.2.2 Shape-aware selector

The extracted coarse representations $F_p$ and $F_n$ for point $p_i$ can roughly describe the overall shape of patch $\mathcal{P}_i$. However, the noisy patch may contain outliers, noise of different scales and types, and anisotropic sampling. These defects inevitably hurt network performance and lead to unsatisfactory denoising results. For example, suppose the patch of $p_i$ contains outliers and heavy-noise points; see Fig. 8(a). In this case, these points negatively influence the representations of the patch, which may degrade the denoising results through residual noise or shape collapse. Furthermore, if the patch contains sharp features, the neighbors outside the tangent space of $p_i$ negatively influence the feature representations of the patch, which may blur geometric features in the denoising results; see Fig. 8(b).

To address these problems, we design a shape-aware selector that chooses points similar to the specific point within the patch, which greatly reduces the influence of negative points. Fig. 9 shows the detailed structure of the shape-aware selector. The selector is guided by four types of information: the coarse point and normal features learned by our feature extractor, and two types of geometric information (distance and angle). Specifically, we first apply fully connected layers to extract features from each of the four types of information. Then, we calculate the similarity score between points $p_i$ and $p_j$ using the following score function:

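$$s_{ij} = \phi\big(\phi(f_j) \oplus \phi(g_j) \oplus \phi(\alpha_{ij}) \oplus \phi(d_{ij})\big), \tag{3}$$

where $\phi(\cdot)$ denotes fully connected layers, and $\alpha_{ij}$ and $d_{ij}$ represent the angle and distance information of $p_j$, defined as

$$\alpha_{ij} = \arccos\big(\langle n_i, n_j \rangle\big), \qquad d_{ij} = \|p_j - p_i\|_2.$$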

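The implementation of score function (3) is shown in Fig. 9. It produces a similarity vector $s_i = (s_{i1}, \dots, s_{iM})$ that records the degree of similarity between point $p_i$ and each point $p_j$ of the patch. Then, the points within the patch with the top-$K$ scores are retained, while the others are discarded:

$$\mathcal{S}_i = \{\, p_j \in \mathcal{P}_i \mid j \in \mathrm{topK}(s_i, K) \,\}, \tag{4}$$

where $\mathrm{topK}(\cdot, K)$ extracts the indices of the $K$ largest elements of the given input, and $\mathcal{S}_i$ is the set of points similar to the specific point. As a result, we obtain the similarity representations of the point and normal features from the similarity point set within the patch as

$$F_p^{s} = \{\, f_j \mid p_j \in \mathcal{S}_i \,\}, \qquad F_n^{s} = \{\, g_j \mid p_j \in \mathcal{S}_i \,\}, \tag{5}$$

where $f_j$ and $g_j$ are the pointwise coordinate and normal features of point $p_j$.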
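The selector can be sketched as three steps: score every patch point from four cues, keep the top-$K$ indices (Eq. (4)), and gather their features (Eq. (5)). The exact layer layout is given in Fig. 9, so the following choices are all assumptions of this sketch: one linear layer plus ReLU per cue, concatenation, a final linear layer squashed by a sigmoid, random weights in the hypothetical dictionary `W`, and the sizes `M`, `C`, `H`, `K`. The angle cue is taken as the angle between raw normals.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def similarity_scores(f, g, pts, nrm, i, W):
    """Score each patch point p_j against the centre p_i from four cues: point
    feature f_j, normal feature g_j, normal angle alpha_ij and distance d_ij."""
    alpha = np.arccos(np.clip(nrm @ nrm[i], -1.0, 1.0))[:, None]   # (M, 1)
    dist = np.linalg.norm(pts - pts[i], axis=1, keepdims=True)      # (M, 1)
    h = np.concatenate([relu(f @ W["f"]), relu(g @ W["g"]),
                        relu(alpha @ W["a"]), relu(dist @ W["d"])], axis=1)
    logits = (h @ W["out"]).ravel()                                 # (M,)
    return 1.0 / (1.0 + np.exp(-logits))                           # sigmoid (assumed)

def top_k(scores, K):
    """Eq. (4): indices of the K largest scores, in descending order."""
    return np.argsort(-scores)[:K]

rng = np.random.default_rng(0)
M, C, H, K = 32, 16, 8, 12
pts = rng.standard_normal((M, 3))
nrm = rng.standard_normal((M, 3))
nrm /= np.linalg.norm(nrm, axis=1, keepdims=True)
f, g = rng.standard_normal((M, C)), rng.standard_normal((M, C))
W = {"f": rng.standard_normal((C, H)), "g": rng.standard_normal((C, H)),
     "a": rng.standard_normal((1, H)), "d": rng.standard_normal((1, H)),
     "out": rng.standard_normal((4 * H, 1))}
s = similarity_scores(f, g, pts, nrm, 0, W)
sel = top_k(s, K)
F_p_s, F_n_s = f[sel], g[sel]   # Eq. (5): features of the selected similar points
```

Hard top-$K$ selection keeps the number of similar points fixed, so downstream tensors keep a static shape.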

Fig. 10: Illustration of similar points selected from three different shaped patches. From top to bottom: patches corrupted with 0%, 0.25%, 0.5% noise, respectively. From left to right: for each patch, we show the front and side views of the patch to demonstrate the selected points (plotted in dark blue) similar to the specific point (plotted in red). The negative points to the specific point are plotted in light blue.

Fig. 10 shows examples of similar points selected by our shape-aware selector from patches of different shapes. Consistent with our intuition, the selector is robust to noise and chooses similar points lying in the tangent space of the specific point.

Fig. 11: Illustration of the feature refinement module, consisting of feature augmentation and fusion units, using the point cloud denoising branch as an example.

3.2.3 Feature refinement

Due to the unavoidable trade-off between noise removal and feature preservation, the previous two modules (the multiscale feature extractor and the shape-aware selector) remove noise effectively but inevitably blur some small-scale geometric features and details. To address this issue, we propose a feature refinement module that improves the feature-preserving ability of the overall network. As shown in Fig. 11, the module consists of two units, feature augmentation and feature fusion, which are detailed below.

Fig. 14: (a) Illustration of the feature augmentation unit, which enlarges receptive fields (dotted circles) of similar points plotted in green. Thus, the specific point (plotted in red) can obtain a more representative feature from those augmented features of similar points, illustrated in (b).
Fig. 18: (a) Noisy input. (b) Denoising result without the feature augmentation unit. (c) Denoising result with the feature augmentation unit.

Feature augmentation. After similar point selection, we augment the learned features of the similar points by enlarging their receptive fields to capture more local geometric details; see Fig. 14 for an illustration. To learn more representative features, for each point $p_j \in \mathcal{S}_i$ (i.e., each point similar to the specific point within the patch), we first search its kNN neighbors and gather the features of these neighbors according to their similarity scores. The augmented point and normal features of each similar point are then computed as

$$\tilde{f}_j = \frac{1}{k} \sum_{p_m \in \mathcal{K}_k(p_j)} s_{im}\, f_m, \qquad \tilde{g}_j = \frac{1}{k} \sum_{p_m \in \mathcal{K}_k(p_j)} s_{im}\, g_m,$$

where $k$ is the number of neighbors of $p_j$, $\mathcal{K}_k(p_j)$ is its neighbor set, and $s_{im}$ is the similarity score obtained with (3). Fig. 18 demonstrates the capability of the feature augmentation unit. As Fig. 18(c) shows, the proposed unit plays a key role in recovering local geometric features; without it, some geometric details and shallow structures are blurred, as seen in Fig. 18(b).
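A minimal sketch of score-weighted neighbor aggregation for the augmentation unit follows. Several details are assumptions, not confirmed by the text: neighbors are searched among the patch points, each point counts among its own neighbors, and aggregation is a score-weighted average. The function name `augment` is hypothetical. The same routine applies unchanged to normal features.

```python
import numpy as np

def augment(sel, pts, feats, scores, k):
    """For each selected point p_j, average the features of its k nearest patch
    points (p_j itself included) weighted by their similarity scores s_im."""
    out = np.empty((len(sel), feats.shape[1]))
    for row, j in enumerate(sel):
        nbr = np.argsort(np.linalg.norm(pts - pts[j], axis=1))[:k]  # kNN of p_j
        out[row] = (scores[nbr, None] * feats[nbr]).sum(axis=0) / k
    return out

rng = np.random.default_rng(0)
pts = rng.standard_normal((20, 3))
feats = rng.standard_normal((20, 4))
scores = rng.uniform(size=20)
aug_f = augment([0, 5, 7], pts, feats, scores, k=6)
```

Weighting by the selector's scores means that neighbors judged dissimilar to $p_i$ contribute little, even when they fall inside the enlarged receptive field.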

Fig. 22: Illustration of the feature fusion unit. (a) Noisy input. (b) Denoising and normal filtering results without the feature fusion unit. (c) Denoising and normal filtering results with the feature fusion unit. The second row visualizes the normal error maps, measured as the angular differences between the filtered normals and the ground truth.

Feature fusion. The feature fusion unit is motivated by the following observation: point features are more instrumental in recovering local geometric details, while normal features are more suitable for recovering sharp edges, corners, and smooth transition regions. Thus, fusing point and normal features helps recover various types of geometric features. In addition, the global feature better describes the overall shape of the patch. Specifically, for each similar point within the patch, we concatenate three types of features (point features, normal features, and the global feature) to obtain the fused features as

where , are the refined point and normal features of the similar point, and and are global features of the patch, obtained by applying max-pooling to the coarse features and , respectively. We validate the effectiveness of the feature fusion unit in Fig. 22. As Figs. 22(b) and 22(c) show, incorporating normal features into point features allows the denoising task to better preserve sharp geometric features and smooth regions, while incorporating point features into normal features yields a more satisfactory normal filtering result; see the normal error maps in the second row of Figs. 22(b) and 22(c).
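A minimal sketch of the fusion step: max-pool the coarse features of the patch into global vectors and concatenate them with each point's refined features. Which global feature goes with which branch, and the concatenation order, are assumptions here (the original equation was lost), and the helper names are hypothetical.

```python
from typing import List

Feature = List[float]


def max_pool(features: List[Feature]) -> Feature:
    """Channel-wise maximum over all points of a patch, giving one global vector."""
    return [max(channel) for channel in zip(*features)]


def fuse(point_feats, normal_feats, coarse_point_feats, coarse_normal_feats):
    """Concatenate per-point refined features with max-pooled global features.

    ASSUMPTION (hypothetical layout): the point branch uses [f_p, f_n, g_p]
    and the normal branch uses [f_n, f_p, g_n].
    """
    g_p = max_pool(coarse_point_feats)
    g_n = max_pool(coarse_normal_feats)
    fused_point = [fp + fn + g_p for fp, fn in zip(point_feats, normal_feats)]
    fused_normal = [fn + fp + g_n for fp, fn in zip(point_feats, normal_feats)]
    return fused_point, fused_normal
```

Max-pooling is order-invariant, so the global vectors do not depend on how the points of the patch are ordered.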

3.2.4 Decoder

Given a point , the feature refinement module generates refined point and normal features of the patch , and , which are inputs to our decoder. Our decoder includes two regressors: coordinate and normal regression. For coordinate regression, we apply MLP and FC layers to predict the displacement vector from the refined point feature , and then obtain the denoised point by adding the predicted displacement vector to the original coordinate; see Fig. (b)b. For normal regression, we use ResNet-like operations [14] to obtain the filtered normal from the refined normal feature ; see the details in Fig. (c)c.
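The residual form of the coordinate regressor can be illustrated as follows. Here `displacement` and `raw_normal` stand in for the regressor outputs (the MLP, FC, and ResNet-like layers are not modeled), and rescaling the normal to unit length is our assumption rather than something the text states.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def decode(point: Vec3, displacement: Vec3, raw_normal: Vec3, eps: float = 1e-12):
    """Turn regressor outputs into a denoised point and a filtered normal.

    Denoised point = original coordinate + predicted displacement.
    ASSUMPTION: the normal output is normalized to unit length.
    """
    denoised = tuple(p + d for p, d in zip(point, displacement))
    length = math.sqrt(sum(c * c for c in raw_normal))
    normal = tuple(c / max(length, eps) for c in raw_normal)  # eps guards a zero vector
    return denoised, normal
```

Predicting a displacement rather than absolute coordinates keeps the regression target small and centered near zero.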

3.3 Training Losses

To train our network in an end-to-end manner, we design three types of losses as our optimization objectives: a point-denoise loss, a normal-filter loss and an orthogonality loss. The joint loss function is formulated as

(6)

where , , and are three hyperparameters that balance the importance of each term. We empirically set , and for training.
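Written out, (6) presumably takes the form of a weighted sum of the three losses. The symbols below (α, β, γ and the loss subscripts) are our own notation, introduced because the original symbols and values were lost in extraction:

```latex
\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{pd}} + \beta\,\mathcal{L}_{\mathrm{nf}} + \gamma\,\mathcal{L}_{\mathrm{orth}}
```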

Point-denoise loss. To ensure that the denoising result approximates the underlying surface while preserving sharp features, we apply the bilateral mechanism proposed in [46] to compute the projection distance between the denoised point and its neighboring points within the ground-truth patch. To further improve the distribution of the denoising result, we follow [10, 20, 46] and adopt a repulsion term that penalizes points that are too close to each other. Thus, our two-term point-denoise loss is defined as follows:

where is a parameter that balances the denoising and uniform distribution terms, is the ground-truth patch centered at the denoised point , and are the ground-truth normals of and . and are two monotonically decreasing functions of the distance and the normal deviation, respectively. is the Gaussian function, and is given as .
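The following sketch shows the bilateral projection term as we understand it from Pointfilter [46]: each ground-truth neighbor contributes its point-to-plane distance, weighted by a spatial Gaussian θ and a normal-similarity weight φ. The exact φ and the parameter values were not recoverable from this text, so the form of φ and the default σ values are assumptions, and the repulsion term is omitted.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bilateral_projection_loss(p_denoised: Vec3, n_center: Vec3,
                              gt_points: Sequence[Vec3], gt_normals: Sequence[Vec3],
                              sigma_p: float = 0.05, sigma_n_deg: float = 15.0) -> float:
    """Bilateral-weighted point-to-plane distance from a denoised point to a ground-truth patch.

    theta: Gaussian in the spatial distance ||p_denoised - q||.
    phi:   decreasing in the deviation between n_center and the neighbor normal.
    ASSUMPTION: phi(n_i, n_j) = exp(-(1 - n_i . n_j) / (1 - cos(sigma_n)));
    the sigma defaults are illustrative only.
    """
    denom_n = 1.0 - math.cos(math.radians(sigma_n_deg))
    num = den = 0.0
    for q, n_q in zip(gt_points, gt_normals):
        diff = [a - b for a, b in zip(p_denoised, q)]
        proj = abs(sum(c * n for c, n in zip(diff, n_q)))  # distance to the tangent plane at q
        theta = math.exp(-sum(c * c for c in diff) / sigma_p ** 2)
        phi = math.exp(-(1.0 - sum(a * b for a, b in zip(n_center, n_q))) / denom_n)
        num += proj * theta * phi
        den += theta * phi
    return num / den if den > 0.0 else 0.0
```

Normalizing by the sum of weights makes the loss a weighted average of plane distances, so neighbors across a sharp edge (with very different normals) contribute little.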

Normal-filter loss. For the denoised point , we simply use the Euclidean distance between its filtered normal and the ground truth as our normal-filter loss:
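For a batch of denoised points, the normal-filter loss can be sketched as the mean of per-point Euclidean distances; averaging over the batch is an assumption.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normal_filter_loss(filtered: Sequence[Vec3], ground_truth: Sequence[Vec3]) -> float:
    """Mean Euclidean distance between filtered and ground-truth normals over a batch."""
    dists = [math.dist(n, g) for n, g in zip(filtered, ground_truth)]
    return sum(dists) / len(dists)
```

Because the Euclidean distance is not invariant to sign flips, a normal n and its flip -n are 2 apart, so this loss treats normals as oriented.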

Orthogonality loss. To encourage the denoised point to move in the direction of its filtered normal while ensuring the orthogonality between the filtered normal and the edges connecting the denoised point and its neighboring points, we propose the following orthogonality loss that can constrain the denoising and normal filtering branches at the same time:
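The orthogonality equation did not survive extraction. The sketch below is a hypothetical loss with the two behaviors the text describes, not the paper's formula: one term penalizes the part of the displacement that is perpendicular to the filtered normal, and the other penalizes the normal's component along the edges from the denoised point to its neighbors.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def orthogonality_loss(noisy: Vec3, denoised: Vec3, normal: Vec3,
                       neighbors: Sequence[Vec3]) -> float:
    """HYPOTHETICAL orthogonality loss (not the paper's exact equation).

    Term 1: |(p_hat - p) x n|, zero when the point moves along its filtered normal.
    Term 2: mean |n . (p_hat - q_j)|, zero when the edges to the neighbors q_j
            lie in the tangent plane defined by n.
    """
    displacement = tuple(d - p for d, p in zip(denoised, noisy))
    off_normal = math.sqrt(sum(c * c for c in _cross(displacement, normal)))
    edges = [tuple(d - q for d, q in zip(denoised, nb)) for nb in neighbors]
    in_plane = sum(abs(_dot(normal, e)) for e in edges) / len(edges)
    return off_normal + in_plane
```

Because both terms depend on the filtered normal and the denoised point, gradients from this loss reach both branches, which matches the text's point that it constrains the two tasks at once.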

4 Experiments and Discussions

We evaluate our method for denoising and normal filtering tasks visually and numerically, and show its superiority compared to the state-of-the-art denoising and normal filtering methods. We further modify our method and conduct ablation studies to validate the effectiveness of each design choice made in our method.

4.1 Experimental Settings

Dataset. To train our end-to-end network for the denoising and normal filtering tasks, we adopt the dataset provided by [46]. The training set contains 11 CAD and 11 non-CAD models (clean data without noise). The ground-truth point clouds with normal information are obtained by randomly sampling the clean models, with 100K sample points for each model. We corrupt each ground-truth point cloud with Gaussian noise whose standard deviation is 0.25%, 0.5%, 1%, 1.5%, or 2.5% of the diagonal length of its bounding box. Thus, the final training set contains 110 noisy point clouds (with normals estimated by PCA) and the corresponding 22 ground-truth point clouds (with normals).
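The noise corruption described above can be sketched as follows, with the standard deviation expressed as a fraction of the bounding-box diagonal. The seeding and function names are illustrative, and the PCA normal estimation step is not shown.

```python
import math
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def bbox_diagonal(points: List[Vec3]) -> float:
    """Length of the diagonal of the axis-aligned bounding box."""
    mins = [min(c) for c in zip(*points)]
    maxs = [max(c) for c in zip(*points)]
    return math.dist(mins, maxs)


def add_gaussian_noise(points: List[Vec3], level: float, seed: int = 0) -> List[Vec3]:
    """Add i.i.d. Gaussian noise with sigma = level * bbox diagonal (level=0.005 means 0.5%)."""
    rng = random.Random(seed)
    sigma = level * bbox_diagonal(points)
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]
```

Applying the five training levels (0.0025, 0.005, 0.01, 0.015, 0.025) to each of the 22 clean models gives the 22 × 5 = 110 noisy training clouds.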

To compare both tasks more effectively, we construct a test set consisting of synthetic point clouds and raw scanned data. For synthetic data, we rely on the synthetic dataset released in [46], which includes three categories (simple, medium, and complicated) containing 7, 6, and 7 clean models, respectively. The raw scanned data are introduced in subsection 4.2. Again, each model is sampled with 100K points. Each clean point cloud is perturbed by Gaussian noise with standard deviations of 0.25%, 0.5%, 1%, and 1.5% of the diagonal length of the bounding box.

Network inference. Our multitask network produces feature-preserving denoising results and satisfactory normal filtering results simultaneously. When the noise level is high, our method can be applied iteratively: the denoised points and filtered normals produced by the previous iteration serve as the inputs to the next iteration for further refinement. Note that this iterative process is performed only during inference.
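The iterative inference loop amounts to feeding each pass's outputs back in. In this sketch `step` is a hypothetical stand-in for one forward pass of the trained network, and the default number of iterations is an assumption, since the text does not specify it.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Step = Callable[[List[Vec3], List[Vec3]], Tuple[List[Vec3], List[Vec3]]]


def iterative_inference(points: List[Vec3], normals: List[Vec3],
                        step: Step, num_iters: int = 2):
    """Feed each pass's denoised points and filtered normals back in as the next input."""
    for _ in range(num_iters):
        points, normals = step(points, normals)
    return points, normals
```

For example, a toy `step` that halves each point's z coordinate moves (0, 0, 1) to (0, 0, 0.125) after three iterations.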

Implementation details. We implement our network in PyTorch and train it on a single NVIDIA GeForce RTX 2080 Ti GPU for 45 epochs with the SGD optimizer. The learning rate is decreased from 1e-4 to 1e-8.
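The text gives only the start and end learning rates. One plausible schedule, assumed here, is geometric decay over the 45 epochs:

```python
def lr_at_epoch(epoch: int, num_epochs: int = 45,
                lr_start: float = 1e-4, lr_end: float = 1e-8) -> float:
    """Geometric decay from lr_start at epoch 0 to lr_end at the last epoch.

    ASSUMPTION: only the endpoints come from the text; the decay shape is our choice.
    """
    t = epoch / (num_epochs - 1)
    return lr_start * (lr_end / lr_start) ** t
```

In PyTorch, the same curve results from `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=(1e-8 / 1e-4) ** (1 / 44))` stepped once per epoch; whether the authors used this scheduler is not stated.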

Fig. 31: Denoising results of synthetic data with CAD models. From left to right: (a) noisy input and results produced by (b) WLOP, (c) RIMLS, (d) EC-Net, (e) DMR, (f) PCN, (g) PF, and (h) our method. The first and third rows show the denoised results with 1% noise and their zoomed views. The second and fourth rows show the corresponding surface reconstruction results. Our method better removes noise and preserves sharp features.

4.2 Experiments for Point Cloud Denoising

We present visual and numerical comparisons between our method and state-of-the-art denoising methods, including WLOP [17], RIMLS [29], EC-Net [45], DMRDenoise (DMR) [25], PointCleanNet (PCN) [34], and Pointfilter (PF) [46]. For WLOP and RIMLS, we run the code provided by their authors and carefully tune the parameters to produce the denoised results. For DMR and PCN, we use the code released by their authors to retrain the models on our training set. For EC-Net and Pointfilter, we use the pretrained models provided by the authors. For better visualization, we reconstruct surfaces from some of the denoised results using the feature-preserving RIMLS implementation provided in MeshLab.

Synthetic data. Fig. 31 compares results on CAD surfaces containing sharp features (sharp edges and corners) and smooth regions, corrupted by a considerable amount of noise. As shown in Fig. 31, WLOP and EC-Net may leave excessive noise in the results when the noise level is high. Although DMR removes the noise, it severely distorts the overall shapes of the surfaces; see Fig. 31(e). RIMLS and PCN recover smooth regions well but blur sharp features to varying degrees; see Figs. 31(c) and 31(f). In addition, at this noise level, PCN may introduce some artifacts in smooth regions. In contrast, our method and PF effectively preserve sharp features, and our method recovers them more accurately than PF; see Figs. 31(g) and 31(h). Thus, compared to the other methods, our method is superior in preserving sharp features and smooth regions simultaneously.

Fig. 40: Denoising results of a synthetic non-CAD model with 1% noise. From left to right: (a) noisy input and results produced by (b) WLOP, (c) RIMLS, (d) EC-Net, (e) DMR, (f) PCN, (g) PF, and (h) our method. The zoomed views in the first row highlight that our method better preserves detailed features.
Fig. 49: Denoising results of a synthetic non-CAD model with 0.5% noise. From left to right: (a) noisy input and results produced by (b) WLOP, (c) RIMLS, (d) EC-Net, (e) DMR, (f) PCN, (g) PF, and (h) our method. The zoomed views in the first row highlight that our method better preserves multiscale features.

Fig. 40 compares results on a non-CAD surface corrupted by considerable noise. WLOP and EC-Net fail to remove the noise completely in this example. Although DMR removes noise well, it distorts the shape and introduces many bumps; see Fig. 40(e). RIMLS and PCN seem to have difficulty balancing noise removal and feature recovery at this high noise level; see Figs. 40(c) and 40(f). PF and our method generate visually better results than the other tested methods. However, the zoomed views in Figs. 40(g) and 40(h) reveal that the trunk of the elephant is swollen in the result produced by PF, whereas our result does not exhibit this artifact. Therefore, our method visually yields the best result in faithfully preserving geometric features.

Fig. 49 compares results on a non-CAD surface with multiscale geometric features, corrupted by moderate noise. Except for WLOP, all the methods remove noise effectively. DMR flattens medium- and small-scale features and causes distortion and swelling in the ear regions of the bunny; see Fig. 49(e). RIMLS and PF oversmooth small-scale features and fine details to varying degrees, with PF being worse; see the reconstruction results in Figs. 49(c) and 49(g). Furthermore, although EC-Net and PCN better preserve different levels of geometric features, they introduce some artifacts in the ear regions. These artifacts degrade the visual quality of the denoised results and further affect the reconstruction results; see the zoomed views and reconstruction results in Figs. 49(d) and 49(f). In contrast, our method outperforms the other methods in recovering most geometric features without introducing distracting artifacts.

Fig. 58: Denoising results of data captured by Kinect. From left to right: (a) noisy input and results produced by (b) WLOP, (c) RIMLS, (d) EC-Net, (e) DMR, (f) PCN, (g) PF, and (h) our method.
Fig. 67: Denoising results of the laser-scanned point cloud. From left to right: (a) noisy input and results produced by (b) WLOP, (c) RIMLS, (d) EC-Net, (e) DMR, (f) PCN, (g) PF, and (h) our method. The zoomed views highlight that our method better preserves geometric features.
Fig. 77: Denoising results of a real scanned point cloud scene. (a) Real scanned outdoor scene. (b) Noisy input and results produced by (c) WLOP, (d) RIMLS, (e) EC-Net, (f) DMR, (g) PCN, (h) PF, and (i) our method. The zoomed views highlight that our method better removes heavy noise and avoids introducing additional artifacts.

Scanned data. To further verify the effectiveness of our method, we test it on real scanned data without retraining it on any scanned data. Fig. 58 shows the results for Kinect scans. For the CAD surfaces in the first and second rows, all the other methods leave visible bumps in the denoised results. In contrast, our method effectively avoids visible artifacts and shows superior performance in preserving geometric features and recovering smooth regions. For the non-CAD surface in the third row of Fig. 58, RIMLS, PF, and our method generate more natural results than the other methods; see Figs. 58(c), 58(g), and 58(h). Although these three results look similar, the CD errors in Table I for this surface (David) show that our method achieves a lower error than RIMLS and PF, which indicates that it handles scanned surfaces with complex characteristics better.

We then evaluate our method on laser-scanned data. As shown in Fig. 67, although all the tested methods remove noise and preserve geometric features to some extent, our method yields more appealing results, with clean structural features and smooth regions; see the zoomed views in Fig. 67. Furthermore, we test our method on raw outdoor scenes from the Paris-rue-Madame Database [38]. As shown in Fig. 77, compared with the other methods, our method removes the heavy noise of the outdoor scene more effectively and leaves fewer outliers.

Fig. 86: Denoising results of irregularly sampled data. From left to right: (a) noisy input and results produced by (b) WLOP, (c) RIMLS, (d) EC-Net, (e) DMR, (f) PCN, (g) PF, and (h) our method.
Fig. 91: Denoising results of Cylinder corrupted by different levels of noise: (a) 1%, (b) 2%, (c) 3%, and (d) 4%. The first row shows the noisy point clouds, and the second row shows the corresponding denoising results produced by our method.
Fig. 94: Denoising result of Trim_star with outliers. (a) Noisy input. (b) Corresponding denoising result.

Robustness tests. To further verify the robustness of our method, we evaluate its performance under irregular sampling, different noise levels, and outliers. Fig. 86 shows the effectiveness of our method on nonuniformly sampled data. Although the noisy surface suffers from irregular sampling, our method produces a compelling result that preserves sharp edges and corners and is noticeably better than those of all the other compared methods. Fig. 91 shows the robustness of our method to different noise levels. As Figs. 91(a), 91(b), and 91(c) show, our method removes noise while preserving sharp features well as the noise level increases. However, when the noise level exceeds the size of the geometric features, our method may blur some features and fails to produce satisfactory results; see Fig. 91(d). Since our method accounts for outliers, it can also effectively handle surfaces corrupted by a large number of outliers; see Fig. 94.

Model | WLOP | RIMLS | EC-Net | DMR | PCN | PF | Ours
PartLp | 18.09; 89.04 | 2.78; 708.21 | 4.15; 51.14 | 3.32; 14.30 | 1.84; 704.00 | 1.02; 152.08 | 0.93; 311.08
Boxunion | 4.80; 94.70 | 2.57; 669.28 | 4.29; 49.82 | 2.98; 20.60 | 1.98; 726.03 | 1.29; 170.20 | 1.21; 303.54
Elephant | 35.10; 107.25 | 11.42; 551.68 | 9.83; 52.19 | 7.01; 14.70 | 3.65; 726.54 | 4.48; 163.87 | 2.87; 310.92
Bunny_Hi | 1.29; 56.18 | 1.06; 187.73 | 0.90; 52.18 | 1.87; 15.20 | 1.03; 736.06 | 0.81; 155.43 | 0.70; 311.30
Cone | 20.97; 10.73 | 18.04; 27.23 | 19.74; 6.67 | 28.19; 2.40 | 19.47; 47.92 | 17.54; 29.50 | 17.08; 34.08
Pyramid | 20.18; 9.06 | 17.26; 21.77 | 19.86; 8.56 | 22.21; 2.80 | 20.16; 56.28 | 18.20; 20.41 | 17.17; 45.07
David | 15.78; 13.80 | 19.03; 40.02 | 15.27; 9.72 | 26.47; 2.90 | 15.30; 67.91 | 16.37; 26.23 | 15.01; 49.13
TABLE I: Quantitative evaluation of the results in Figs. 31, 40, 49, and 58. For each result, we list the CD error and the running time (in seconds), separated by a semicolon.
Fig. 96: Quantitative comparisons of different denoising methods on the synthetic dataset with different noise levels.

Quantitative evaluation. The qualitative comparisons above show that our method generates visually better denoised results than the competing methods. Here, we compare them numerically. We use the Chamfer Distance (CD) to measure the fidelity of a denoised result to the ground-truth point cloud. The CD metric is widely used in prior work [10, 34, 46], and a lower CD value indicates a better denoised result. We first evaluate the examples in Figs. 31, 40, 49, and 58 and list the results in Table I. Our method achieves the lowest CD error on all seven examples, although the margin is small for Cone, Pyramid, and David. Then, we compare our method with the deep learning methods (EC-Net, DMR, PCN, and PF) on the test set introduced in subsection 4.1; the results are shown in Fig. 96. At low, moderate, and considerable noise levels (0.25%, 0.5%, and 1.0%), our method clearly outperforms the other learning-based methods in terms of average CD error. At the highest noise level (1.5%), PF and our method both achieve lower errors than the other competing methods. Overall, the quantitative evaluations are consistent with the visual comparisons, and both show the superiority of our method.
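For reference, a brute-force Chamfer distance is sketched below. CD conventions vary (squared or unsquared distances, sums or means), so using the mean squared nearest-neighbor distance in both directions is an assumption; absolute values are only comparable under a fixed convention.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance, a->b plus b->a.

    ASSUMPTION: squared distances and per-direction means.
    """
    def one_way(src: List[Point], dst: List[Point]) -> float:
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)
```

The brute-force search is quadratic in the number of points; for 100K-point clouds a KD-tree (for example, `scipy.spatial.cKDTree(dst).query(src)`) is the usual replacement.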

Computational time. We also record the running times of all the tested methods in Table I. DMR is the fastest method in all examples, while PCN is the slowest in most examples. Our method ranks fourth in speed among the five learning-based methods. Because the traditional methods (WLOP and RIMLS) require trial-and-error parameter tuning to produce satisfactory results in practice, we only discuss the inference time of the deep learning methods (EC-Net, DMR, PCN, PF, and ours). DMR and EC-Net are faster than the other deep learning methods (PCN, PF, and ours) because they rely on a similar upsampling process and take only a few patches of the noisy point cloud as input. In contrast, PCN, PF, and our method process the noisy input point by point, so their inference times are higher. Owing to differences in network complexity, our method is faster than PCN but slower than PF. In summary, although our method is relatively computationally expensive at inference time, it generates more appealing results in terms of both visual quality and CD error in most examples.

Fig. 103: Visualization of normal errors for synthetic data with 0.5% noise. From left to right: (a) Jet, (b) PCPNet, (c) DeepFit, (d) AdaFit, and (e) our method. The point clouds are color-coded by angular difference according to the color bar on the right. The numbers denote RMSE (lower is better).
Fig. 110: Visual comparison of normal estimation results on scanned point clouds from the NYU Depth V2 dataset [39]. From left to right: (a) RGB image (noisy input), (b) Jet, (c) PCPNet, (d) DeepFit, (e) AdaFit, and (f) our method.

4.3 Experiments for Normal Filtering

The normal filtering task also plays an important role in our method. We compare our method with state-of-the-art normal filtering methods, including the traditional method of Jet [7] and the deep learning methods of PCPNet [13], DeepFit [4] and AdaFit [54].

Qualitative comparisons. Fig. 103 visualizes the normal error maps, where the error is defined as the angular deviation between the filtered normals and the ground-truth normals. As we can see, Jet and PCPNet produce higher errors in geometric feature regions. DeepFit handles smoothly curved regions well but blurs small-scale features. AdaFit effectively recovers small-scale features and fine details but slightly oversmooths sharp features. In contrast, our method preserves both sharp and detailed features. To test the generalization capability of our method, Fig. 110 compares our method with the four other methods on real indoor scenes acquired by Kinect sensors (NYU Depth V2 dataset [39]). Note that none of the tested methods is trained on scanned scenes. For a fair comparison, we use only the normals filtered by our method and do not use its denoised point coordinates. As Fig. 110 shows, Jet retains considerable noise. PCPNet smooths noisy surfaces effectively but flattens medium- and small-scale geometric features. DeepFit and AdaFit preserve geometric features well, even small-scale and detailed ones, although they introduce some bumps. A possible reason for these bumpy artifacts is that both noise and geometric details are high-frequency information, so DeepFit and AdaFit may mistakenly retain some noise as geometric features. Compared to DeepFit and AdaFit, our method better preserves structural features, although some geometric details are slightly flattened. Moreover, our method tends to produce visually cleaner results without noticeable artifacts.

Fig. 112: Comparison of RMSE for normal estimation between classical geometric and learning-based methods.

Quantitative comparisons. To further compare the quality of the normal filtering results, we adopt the root mean squared error (RMSE) of the angle difference as the evaluation metric, as suggested in previous work [13, 54]. Lower RMSE values indicate better filtered normals. As Fig. 103 shows, our method achieves the lowest RMSE on all the tested examples. We also compare our method with the four other methods on the test set introduced in subsection 4.1. As Fig. 112 shows, at low and moderate noise levels (0.25% and 0.5%), our results have the lowest average RMSE. At higher noise levels (1.0% and 1.5%), AdaFit achieves the best results, and our method is a close second.
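The RMSE metric can be sketched as follows. Whether orientation is ignored is an assumption here: we take the unoriented angle via |n · g|, as is common in PCPNet-style evaluations, and assume unit-length inputs.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def angle_rmse_deg(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """RMSE (in degrees) of the angle between predicted and ground-truth unit normals.

    ASSUMPTION: orientation is ignored (|dot|) and inputs are unit length.
    """
    squared = []
    for n, g in zip(pred, gt):
        cos = min(1.0, abs(sum(a * b for a, b in zip(n, g))))  # clamp for acos
        squared.append(math.degrees(math.acos(cos)) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

Squaring before averaging makes the metric sensitive to a few large errors, which typically occur near sharp edges and corners.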

Noise level | V1 | V2 | V3 | Full
0.25% | 0.64 | 0.62 | 0.63 | 0.61
0.5% | 0.87 | 0.81 | 0.82 | 0.80
1% | 1.52 | 1.33 | 1.39 | 1.31
1.5% | 2.99 | 2.88 | 2.80 | 2.76
TABLE II: Ablation analysis: quantitative comparisons (CD errors) of different network versions.

4.4 Ablation Studies

We verify the individual contributions of the major modules in our network by conducting the following three ablation studies on our test dataset:

  • Removing the feature selection and refinement modules (V1).

  • Removing the feature refinement module (V2).

  • Removing the normal filtering branch (V3).

Table II lists the denoising results of our full method and the three ablated variants, from which we make the following observations. All three variants perform worse than our full pipeline, indicating that each module is necessary for high-quality denoising. More specifically, comparing V1 with V2 shows that the feature selection module helps remove noise by selecting points with characteristics similar to the denoised point. Comparing V2 with our full pipeline shows that the feature refinement module improves denoising accuracy. The feature refinement module consists of two units (feature augmentation and feature fusion), whose roles are demonstrated in Figs. 18 and 22 and explained in subsection 3.2.3. We use a two-branch network structure for the point cloud denoising and normal filtering tasks. To demonstrate the positive interaction between these two tasks, we design a variant (V3) that performs only the denoising task. Comparing V3 with our full pipeline, we observe that the normal filtering branch helps our network generate better denoised results through the mutual promotion of the two tasks, which supports our hypothesis.

Fig. 118: From left to right: (a) noisy point cloud, and denoising and normal filtering results produced with initial normals estimated by (b) PCA, (c) PCPNet, and (d) AdaFit. The top row shows surfaces reconstructed from the denoising results. The bottom row visualizes normal error maps, and the numbers denote RMSE.
Fig. 120: The screened Poisson surface reconstruction (sPSR) algorithm balances smoothness and accuracy via an interpolation weight. The top row shows the surfaces produced by applying sPSR directly to the noisy input with different interpolation weights. The second row shows the surfaces reconstructed by applying sPSR with the same weights to the denoised input. Our reconstructed surfaces are insensitive to the choice of the weight.
Fig. 125: (a) Clean input with 100K points. The inputs in (b), (c), and (d) contain 30K, 20K, and 5K points, i.e., sampling ratios of 0.3, 0.2, and 0.05, respectively. The top row shows noisy point clouds, and the bottom row shows the corresponding denoising results.
Fig. 129: Application of point cloud registration. (a) Noisy input. (b) Registration result of (a). (c) Registration result after applying our denoising method to (a).
Fig. 133: Application of RANSAC plane fitting. (a) Clean point cloud. (b) The corresponding noisy point cloud. (c) Plane segmentation result produced after applying our method to (b).

4.5 Limitations

Although our method produces high-quality results in terms of removing noise and recovering geometric structures and detailed features, it has two limitations.

The results of our method depend to some extent on the quality of the initial input normals. To examine this influence, we show our denoising and normal filtering results obtained with different initial normals as inputs. As Fig. 118 shows, with better initial normals (produced by PCPNet and AdaFit), we obtain more faithful denoising and normal filtering results simultaneously. However, obtaining a good normal estimate from a noisy point cloud is a chicken-and-egg problem. Thus, to keep our method simple and general, we use the normals estimated by PCA as input.

Our method may generate unsatisfactory results when the input point cloud is extremely sparse. As shown in Figs. 125(b) and 125(c), when the sampling ratios of the input point cloud are 0.3 and 0.2, our method still produces reliable results, showing its robustness at these sampling rates. However, our method suffers from shrinkage artifacts when the sampling ratio decreases to 0.05; see Fig. 125(d). One possible reason is that, when the sampling rate is very low, our network cannot learn an effective representation from so few points to characterize the shape of the patch.

4.6 Applications

To demonstrate the effectiveness of our method in various applications, we use our denoised results as inputs for surface reconstruction [19], point cloud registration [53], and plane fitting [37]. As Fig. 120 shows, the surfaces reconstructed from the noisy point cloud suffer from severe artifacts, whereas those reconstructed from our denoised input are of high quality and feature-preserving. Fig. 129 shows that our denoising method benefits point cloud registration: the result obtained after applying our method is noticeably more accurate than that obtained from the raw input. We also perform plane fitting [37] on an indoor scene point cloud. As Fig. 133 shows, with our denoised result as input, the RANSAC algorithm [37] produces a more reliable plane segmentation that is closer to the segmentation obtained from the corresponding clean point cloud.
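To make concrete how a downstream plane-fitting step consumes the denoised points, here is a minimal single-plane RANSAC sketch. It is not the implementation of [37]; the inlier threshold, iteration count, and helper names are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[Vec3, float]  # unit normal n and offset d such that n . x + d = 0


def plane_through(a: Vec3, b: Vec3, c: Vec3) -> Optional[Plane]:
    """Plane through three points, or None if they are (nearly) collinear."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in n))
    if length < 1e-12:
        return None
    n = tuple(x / length for x in n)
    return n, -sum(x * y for x, y in zip(n, a))


def ransac_plane(points: Sequence[Vec3], threshold: float,
                 iterations: int = 200, seed: int = 0):
    """Return the plane with the most inliers (|n . p + d| <= threshold) and their indices."""
    rng = random.Random(seed)
    pts: List[Vec3] = list(points)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_through(*rng.sample(pts, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [i for i, p in enumerate(pts)
                   if abs(sum(x * y for x, y in zip(n, p)) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

Multi-plane segmentation typically repeats this search after removing the inliers of each detected plane; residual noise inflates the point-to-plane distances, which is why a cleaner input yields more reliable segments.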

5 Conclusion

In this work, we propose a deep learning method for point cloud denoising via joint normal filtering, based on the insight that the denoising and normal filtering tasks are closely intertwined. Our method takes the noisy point cloud and the corresponding initial normals as inputs and simultaneously predicts denoised points and filtered normals in an end-to-end manner. Our method is composed of two novel modules: the shape-aware selector and the feature refinement module. The shape-aware selector reduces the negative influence of noise and outliers on feature learning, thus improving denoising and filtering performance. The feature refinement module, consisting of feature augmentation and fusion units, has significant advantages in recovering structural and detailed features. Extensive experiments demonstrate the benefit of incorporating normal filtering into denoising. Specifically, for the denoising task, our method achieves a new state of the art with significant improvements in terms of visual quality and quantitative evaluation. Although our method is not primarily designed for normal filtering, it also performs favorably against state-of-the-art normal filtering methods in most cases.

To our knowledge, this is the first work that couples the two interdependent tasks of point cloud denoising and normal filtering within a single deep neural network. We believe there are rich opportunities for future work. We can further improve our normal filtering branch to make it independent of the initial normals and to address the normal orientation ambiguity problem. As a pointwise denoising approach, our method has a high computational cost for training and inference. In the future, we will develop a patchwise framework to improve runtime performance.

References

  • [1] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C. T. Silva (2003) Computing and rendering point set surfaces. IEEE Trans. Vis. Comput. Graph. 9 (1), pp. 3–15. Cited by: §2.1.
  • [2] N. Amenta and Y. J. Kil (2004) Defining point-set surfaces. ACM Trans. Graph. 23 (3), pp. 264–270. Cited by: §2.1.
  • [3] H. Avron, A. Sharf, C. Greif, and D. Cohen-Or (2010) ℓ1-sparse reconstruction of sharp point set surfaces. ACM Trans. Graph. 29 (5), pp. 135:1–12. Cited by: §2.1.
  • [4] Y. Ben-Shabat and S. Gould (2020) DeepFit: 3D surface fitting via neural network weighted least squares. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 20–34. Cited by: §2.2, §4.3.
  • [5] Y. Ben-Shabat, M. Lindenbaum, and A. Fischer (2019) Nesti-Net: normal estimation for unstructured 3D point clouds using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10112–10120. Cited by: §2.2.
  • [6] J. Cao, H. Zhu, Y. Bai, J. Zhou, J. Pan, and Z. Su (2021) Latent tangent space representation for normal estimation. IEEE Trans. Ind. Electron. 69 (1), pp. 921–929. Cited by: §2.2.
  • [7] F. Cazals and M. Pouget (2005) Estimating differential quantities using polynomial fitting of osculating jets. Comput. Aided Geom. Des. 22 (2), pp. 121–146. Cited by: §4.3.
  • [8] H. Chen, J. Huang, O. Remil, H. Xie, J. Qin, Y. Guo, M. Wei, and J. Wang (2019) Structure-guided shape-preserving mesh texture smoothing via joint low-rank matrix recovery. Comput.-Aided Des. 115, pp. 122–134. Cited by: §1.
  • [9] H. Chen, M. Wei, Y. Sun, X. Xie, and J. Wang (2019) Multi-patch collaborative point cloud denoising via low-rank recovery with graph constraint. IEEE Trans. Vis. Comput. Graph. 26 (11), pp. 3255–3270. Cited by: §1, §2.1.
  • [10] H. Chen, Z. Wei, X. Li, Y. Xu, M. Wei, and J. Wang (2022) RePCD-Net: feature-aware recurrent point cloud denoising network. Int. J. Comput. Vision 130 (3), pp. 615–629. Cited by: §1, §1, §2.1, §3.1, §3.3, §4.2.
  • [11] J. Digne, S. Valette, and R. Chaine (2017) Sparse geometric representation through local shape probing. IEEE Trans. Vis. Comput. Graph. 24 (7), pp. 2238–2250. Cited by: §2.1.
  • [12] S. Fleishman, D. Cohen-Or, and C. T. Silva (2005) Robust moving least-squares fitting with sharp features. ACM Trans. Graph. 24 (3), pp. 544–552. Cited by: §2.1.
  • [13] P. Guerrero, Y. Kleiman, M. Ovsjanikov, and N. J. Mitra (2018) PCPNet: learning local shape properties from raw point clouds. Comput. Graph. Forum 37 (2), pp. 75–85. Cited by: §2.2, §4.3, §4.3.
  • [14] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. Cited by: §3.2.4.
  • [15] P. Hermosilla, T. Ritschel, and T. Ropinski (2019) Total denoising: unsupervised learning of 3D point cloud cleaning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 52–60. Cited by: §2.1.
  • [16] F. Hou, C. Wang, W. Wang, H. Qin, C. Qian, and Y. He (2022) Iterative Poisson surface reconstruction (IPSR) for unoriented points. ACM Trans. Graph. 41 (4), pp. 128:1–13. Cited by: §2.2.
  • [17] H. Huang, D. Li, H. Zhang, U. Ascher, and D. Cohen-Or (2009) Consolidation of unorganized point clouds for surface reconstruction. ACM Trans. Graph. 28 (5), pp. 176:1–7. Cited by: §2.1, §4.2.
  • [18] H. Huang, S. Wu, M. Gong, D. Cohen-Or, U. Ascher, and H. R. Zhang (2013) Edge-aware point set resampling. ACM Trans. Graph. 32 (1), pp. 9:1–9:12. Cited by: §2.1.
  • [19] M. Kazhdan and H. Hoppe (2013) Screened Poisson surface reconstruction. ACM Trans. Graph. 32 (3), pp. 1–13. Cited by: §2.2, §4.6.
  • [20] Y. Lipman, D. Cohen-Or, D. Levin, and H. Tal-Ezer (2007) Parameterization-free projection for geometry reconstruction. ACM Trans. Graph. 26 (3), pp. 22. Cited by: §2.1, §3.3.
  • [21] Z. Liu, X. Xiao, S. Zhong, W. Wang, Y. Li, L. Zhang, and Z. Xie (2020) A feature-preserving framework for point cloud denoising. Comput.-Aided Des. 127, pp. 102857. Cited by: §1, §2.1.
  • [22] D. Lu, X. Lu, Y. Sun, and J. Wang (2020) Deep feature-preserving normal estimation for point cloud filtering. Comput.-Aided Des. 125, pp. 102860. Cited by: §1, §2.1.
  • [23] X. Lu, S. Schaefer, J. Luo, L. Ma, and Y. He (2020) Low rank matrix approximation for 3D geometry filtering. IEEE Trans. Vis. Comput. Graph. 28 (4), pp. 1835–1847. Cited by: §1, §2.1.
  • [24] X. Lu, S. Wu, H. Chen, S. Yeung, W. Chen, and M. Zwicker (2017) GPF: GMM-inspired feature-preserving point set filtering. IEEE Trans. Vis. Comput. Graph. 24 (8), pp. 2315–2326. Cited by: §1, §2.1.
  • [25] S. Luo and W. Hu (2020) Differentiable manifold reconstruction for point cloud denoising. In Proceedings of the ACM International Conference on Multimedia (ACM MM), pp. 1330–1338. Cited by: §2.1, §4.2.
  • [26] S. Luo and W. Hu (2021) Score-based point cloud denoising. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4583–4592. Cited by: §2.1.
  • [27] E. Mattei and A. Castrodad (2017) Point cloud denoising via moving RPCA. Comput. Graph. Forum 36 (8), pp. 123–137. Cited by: §2.1.
  • [28] G. Metzer, R. Hanocka, D. Zorin, R. Giryes, D. Panozzo, and D. Cohen-Or (2021) Orienting point clouds with dipole propagation. ACM Trans. Graph. 40 (4), pp. 1–14. Cited by: §2.2.
  • [29] A. C. Öztireli, G. Guennebaud, and M. Gross (2009) Feature preserving point set surfaces based on non-linear kernel regression. Comput. Graph. Forum 28 (2), pp. 493–501. Cited by: §1, §2.1, §4.2.
  • [30] F. Pistilli, G. Fracastoro, D. Valsesia, and E. Magli (2020) Learning graph-convolutional representations for point cloud denoising. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 103–118. Cited by: §2.1.
  • [31] R. Preiner, O. Mattausch, M. Arikan, R. Pajarola, and M. Wimmer (2014) Continuous projection for fast L1 reconstruction. ACM Trans. Graph. 33 (4), pp. 47–1. Cited by: §2.1.
  • [32] C. R. Qi, H. Su, K. Mo, and L. J. Guibas (2017) PointNet: deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 652–660. Cited by: §2.1, §2.2.
  • [33] C. R. Qi, L. Yi, H. Su, and L. J. Guibas (2017) PointNet++: deep hierarchical feature learning on point sets in a metric space. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pp. 5100–5109. Cited by: §2.1.
  • [34] M. Rakotosaona, V. La Barbera, P. Guerrero, N. J. Mitra, and M. Ovsjanikov (2020) PointCleanNet: learning to denoise and remove outliers from dense point clouds. Comput. Graph. Forum 39 (1), pp. 185–203. Cited by: §1, §2.1, §3.1, §4.2, §4.2.
  • [35] R. Roveri, A. C. Öztireli, I. Pandele, and M. Gross (2018) PointProNets: consolidation of point clouds with convolutional neural networks. Comput. Graph. Forum 37 (2), pp. 87–99. Cited by: §2.1.
  • [36] R. B. Rusu, N. Blodow, and M. Beetz (2009) Fast point feature histograms (FPFH) for 3D registration. In IEEE International Conference on Robotics and Automation (ICRA), pp. 3212–3217. Cited by: §2.2.
  • [37] R. Schnabel, R. Wahl, and R. Klein (2007) Efficient RANSAC for point-cloud shape detection. Comput. Graph. Forum 26 (2), pp. 214–226. Cited by: §4.6.
  • [38] A. Serna, B. Marcotegui, F. Goulette, and J. Deschaud (2014) Paris-rue-Madame database: a 3D mobile laser scanner dataset for benchmarking urban detection, segmentation and classification methods. In International Conference on Pattern Recognition Applications and Methods (ICPRAM), Cited by: §4.2.
  • [39] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus (2012) Indoor segmentation and support inference from rgbd images. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 746–760. Cited by: Fig. 110, §4.3.
  • [40] G. Sun, C. Chu, J. Mei, W. Li, and Z. Su (2022) Structure-aware denoising for real-world noisy point clouds with complex structures. Comput.-Aided Des. 149, pp. 103275. Cited by: §2.1.
  • [41] Y. Sun, S. Schaefer, and W. Wang (2015) Denoising point sets via L0 minimization. Comput. Aided Geom. Des. 35, pp. 2–15. Cited by: §2.1.
  • [42] J. Wang, J. Jiang, X. Lu, and M. Wang (2022) Rethinking point cloud filtering: a non-local position based approach. Comput.-Aided Des. 144, pp. 103162. Cited by: §2.1.
  • [43] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon (2019) Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 38 (5), pp. 1–12. Cited by: §1, §2.1, §3.2.1, §3.2.1.
  • [44] M. Wei, H. Chen, Y. Zhang, H. Xie, Y. Guo, and J. Wang (2021) GeoDualCNN: geometry-supporting dual convolutional neural network for noisy point clouds. IEEE Trans. Vis. Comput. Graph. Cited by: §1, §1, §2.1.
  • [45] L. Yu, X. Li, C. Fu, D. Cohen-Or, and P. Heng (2018) EC-Net: an edge-aware point set consolidation network. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 386–402. Cited by: §2.1, §4.2.
  • [46] D. Zhang, X. Lu, H. Qin, and Y. He (2021) Pointfilter: point cloud filtering via encoder-decoder modeling. IEEE Trans. Vis. Comput. Graph. 27 (3), pp. 2015–2027. Cited by: §1, §1, §2.1, §3.1, §3.3, §4.1, §4.1, §4.2, §4.2.
  • [47] J. Zhang, J. Cao, H. Zhu, D. Yan, and X. Liu (2022) Geometry guided deep surface normal estimation. Comput.-Aided Des. 142, pp. 103119. Cited by: §2.2.
  • [48] J. Zhang, Y. Yao, and B. Deng (2022) Fast and robust iterative closest point. IEEE Trans. Pattern Anal. Mach. Intell. 44 (7). Cited by: §2.2.
  • [49] H. Zhou, H. Chen, Y. Zhang, M. Wei, H. Xie, J. Wang, T. Lu, J. Qin, and X. Zhang (2022) Refine-Net: normal refinement neural network for noisy point clouds. IEEE Trans. Pattern Anal. Mach. Intell. Cited by: §2.2, §2.2.
  • [50] J. Zhou, H. Huang, B. Liu, and X. Liu (2020) Normal estimation for 3D point clouds via local plane constraint and multi-scale selection. Comput.-Aided Des. 129, pp. 102916. Cited by: §2.2.
  • [51] J. Zhou, W. Jin, M. Wang, X. Liu, Z. Li, and Z. Liu (2022) Fast and accurate normal estimation for point clouds via patch stitching. Comput.-Aided Des. 142, pp. 103121. Cited by: §2.2.
  • [52] L. Zhou, G. Sun, Y. Li, W. Li, and Z. Su (2022) Point cloud denoising review: from classical to deep learning-based approaches. Graphical Models 121, pp. 101140. Cited by: §2.1.
  • [53] Q. Zhou, J. Park, and V. Koltun (2016) Fast global registration. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 766–782. Cited by: §4.6.
  • [54] R. Zhu, Y. Liu, Z. Dong, Y. Wang, T. Jiang, W. Wang, and B. Yang (2021) AdaFit: rethinking learning-based normal estimation on point clouds. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 6118–6127. Cited by: §2.2, §4.3, §4.3.