Image noise can often be accurately fitted by a Poisson-Gaussian distribution. However, estimating the distribution parameters from a noisy image alone is a challenging task. Here, we study the case where paired noisy and noise-free samples are accessible. No method is currently available that exploits the noise-free information, which may help achieve more accurate estimates. To fill this gap, we derive a novel, cumulant-based approach for Poisson-Gaussian noise modeling from paired image samples. We show its improved performance over different baselines, with special emphasis on MSE, the effect of outliers, image dependence, and bias. We additionally derive the log-likelihood function for further insights and discuss real-world applicability.
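A minimal sketch of what paired-sample Poisson-Gaussian estimation can look like (this is a plain moment-matching fit, not the paper's cumulant-based estimator): under the common model where the conditional variance of the noisy pixel is affine in the clean intensity, the squared residual regressed on the clean image recovers the two noise parameters. The model form and parameter names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed Poisson-Gaussian model: y = a * Poisson(x / a) + N(0, b),
# so that E[y | x] = x and Var[y | x] = a * x + b.
a_true, b_true = 0.5, 4.0
x = rng.uniform(10.0, 200.0, size=200_000)   # paired noise-free intensities
y = a_true * rng.poisson(x / a_true) + rng.normal(0.0, np.sqrt(b_true), x.shape)

# With paired samples, (y - x)^2 is an unbiased estimate of the conditional
# variance a*x + b, so (a, b) follow from linear least squares.
r2 = (y - x) ** 2
A = np.stack([x, np.ones_like(x)], axis=1)
(a_hat, b_hat), *_ = np.linalg.lstsq(A, r2, rcond=None)

print(round(a_hat, 2), round(b_hat, 2))
```

This simple regression is unbiased but sensitive to outliers in `r2`; that sensitivity is one of the axes (outliers, bias) on which the abstract compares estimators.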
Despite significant progress in recent years, single-image super-resolution methods are developed with several limitations. Specifically, they are trained on fixed content domains with certain degradations, whether synthetic or real. The priors they learn are prone to overfitting the training configuration. It is therefore unclear how well they generalize to novel domains, such as drone top-view data, and across altitudes. Nonetheless, pairing drones with proper image super-resolution is of great value: it would enable drones to fly higher while covering a larger area and maintaining high image quality. To answer these questions and pave the way for drone image super-resolution, we explore this application with a particular focus on the single-image case. We propose a novel drone image dataset with scenes captured at low and high resolutions, and across a span of altitudes. Our results show that off-the-shelf state-of-the-art networks suffer a performance drop on this different domain. We also show that simple fine-tuning, as well as incorporating altitude awareness into the network architecture, can both improve reconstruction performance.
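One way to make a network "altitude aware", sketched here as an assumption rather than the paper's actual architecture, is FiLM-style conditioning: a scalar altitude is embedded and mapped to per-channel scale and shift parameters that modulate intermediate feature maps. All names and the 2-dimensional embedding below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def film_modulate(features, altitude, w_gamma, w_beta):
    """FiLM-style modulation: features (C, H, W), altitude a scalar in metres."""
    alt = np.array([altitude / 100.0, 1.0])   # toy 2-dim altitude embedding
    gamma = w_gamma @ alt                      # (C,) per-channel scale
    beta = w_beta @ alt                        # (C,) per-channel shift
    return features * gamma[:, None, None] + beta[:, None, None]

C, H, W = 4, 8, 8
feats = rng.normal(size=(C, H, W))
w_gamma = rng.normal(size=(C, 2))             # learned in a real network
w_beta = rng.normal(size=(C, 2))
out = film_modulate(feats, altitude=50.0, w_gamma=w_gamma, w_beta=w_beta)
print(out.shape)
```

The appeal of this design is that a single backbone can adapt its reconstruction behaviour continuously across the altitude range instead of requiring one model per altitude.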
Deep image denoisers achieve state-of-the-art results, but at a hidden cost. As observed in recent literature, these deep networks can overfit their training distribution, adding inaccurate hallucinations to the output and generalizing poorly to different data. For better control and interpretability, we propose a novel framework that exploits a denoising network. We call it controllable confidence-based image denoising (CCID). In this framework, we exploit the output of a deep denoising network alongside an image convolved with a reliable filter. Such a filter can be a simple convolution kernel, which does not risk adding hallucinated information. We propose to fuse the two components with a frequency-domain approach that takes into account the reliability of the deep network output. With our framework, the user can control the fusion of the two components in the frequency domain. We also provide a user-friendly map that estimates, spatially, the confidence that the output may contain network hallucination. Results show that our CCID not only provides more interpretability and control, but can even outperform both the quantitative performance of the deep denoiser and that of the reliable filter, especially when the test data diverge from the training data.
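A minimal sketch of frequency-domain fusion, assuming (this is not the paper's exact weighting scheme) a hard radial cutoff: low frequencies are taken from the deep denoiser's output and high frequencies, where hallucination is more likely, from the reliably filtered image. CCID's actual fusion is confidence-driven and user-controllable; the function below only illustrates the mechanism.

```python
import numpy as np

def fuse_frequency(deep_out, reliable_out, cutoff=0.25):
    """Blend two denoised estimates in the frequency domain.

    Frequencies with radius <= cutoff come from the deep network output;
    the rest come from the reliable-filter output (hypothetical scheme).
    """
    Fd = np.fft.fft2(deep_out)
    Fr = np.fft.fft2(reliable_out)
    fy = np.fft.fftfreq(deep_out.shape[0])[:, None]
    fx = np.fft.fftfreq(deep_out.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    w = (radius <= cutoff).astype(float)   # 1 -> trust the deep output
    fused = w * Fd + (1.0 - w) * Fr
    return np.real(np.fft.ifft2(fused))

rng = np.random.default_rng(0)
img_deep = rng.normal(size=(32, 32))       # stand-in for a network output
img_reliable = rng.normal(size=(32, 32))   # stand-in for a filtered image
fused = fuse_frequency(img_deep, img_reliable)
print(fused.shape)
```

Exposing `cutoff` (or, more generally, a per-frequency weight) is what gives the user the control knob the abstract describes.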
Classic image-restoration algorithms use a variety of priors, either implicitly or explicitly. Their priors are hand-designed and their corresponding weights are heuristically assigned. Hence, deep learning methods often produce superior restoration quality. Deep networks are, however, capable of inducing strong and hardly predictable hallucinations. While learning an image prior, networks implicitly learn to be jointly faithful to the observed data; separating the original data from the hallucinated data downstream is then impossible. This limits their widespread adoption in image restoration. Furthermore, the hallucinated part is often a victim of degradation-model overfitting. We present an approach that decouples the network-prior hallucination from the data fidelity. We refer to our framework as the Bayesian integration of a generative prior (BigPrior). Our method is rooted in a Bayesian framework and tightly connected to classic restoration methods; in fact, it can be viewed as a generalization of a large family of classic restoration algorithms. We use network inversion to extract image-prior information from a generative network. We show that, on image colorization, inpainting, and denoising, our framework consistently improves over the inversion results. Our method, though partly reliant on the quality of the generative-network inversion, is competitive with supervised and task-specific restoration methods. It also provides an additional metric that sets forth the degree of per-pixel reliance on the prior relative to the data fidelity.
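The decoupling idea can be caricatured as a per-pixel convex combination of a prior-driven estimate and a data-fidelity estimate, with the blending map doubling as the per-pixel prior-reliance metric the abstract mentions. This is an illustrative reading, not the paper's exact Bayesian formulation.

```python
import numpy as np

def prior_data_fusion(x_prior, x_data, phi):
    """Convex per-pixel blend: phi = 1 trusts the prior, phi = 0 the data.

    phi itself is interpretable as a map of per-pixel prior reliance.
    """
    phi = np.clip(phi, 0.0, 1.0)
    return phi * x_prior + (1.0 - phi) * x_data

rng = np.random.default_rng(0)
x_prior = rng.uniform(size=(16, 16))   # e.g. from generative-network inversion
x_data = rng.uniform(size=(16, 16))    # e.g. a data-fidelity estimate
phi = rng.uniform(size=(16, 16))       # hypothetical per-pixel prior weight
x_hat = prior_data_fusion(x_prior, x_data, phi)
print(x_hat.shape)
```

Because `phi` is explicit, the hallucinated (prior-driven) contribution is separable by construction, unlike in an end-to-end network where the two are entangled.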
Non-linear state-space models, also known as general hidden Markov models, are ubiquitous in statistical machine learning, being the most classical generative models for serial data and sequences in general. The particle-based, rapid incremental smoother PaRIS is a sequential Monte Carlo (SMC) technique allowing for efficient online approximation of expectations of additive functionals under the smoothing distribution in these models. Such expectations appear naturally in several learning contexts, such as maximum-likelihood estimation (MLE) and Markov score climbing (MSC). PaRIS has linear computational complexity and limited memory requirements, and comes with non-asymptotic bounds, convergence results, and stability guarantees. Still, being based on self-normalised importance sampling, the PaRIS estimator is biased. Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs (PPG) sampler, which can be viewed as a PaRIS algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. We substantiate the PPG algorithm with theoretical results, including new bounds on bias and variance as well as deviation inequalities. Our second contribution is to apply PPG in a learning framework, covering MLE and MSC as special examples. In this context, we establish, under standard assumptions, non-asymptotic bounds highlighting the value of bias reduction and the implicit Rao--Blackwellization of PPG. These are the first non-asymptotic results of this kind in this setting. We illustrate our theoretical results with numerical experiments supporting our claims.
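For readers unfamiliar with the SMC substrate PaRIS builds on, here is a minimal bootstrap particle filter for a linear-Gaussian state-space model. This is emphatically not PaRIS or PPG; it only shows the propagate / self-normalised-weight / resample cycle whose self-normalisation is the source of the bias discussed above. All model parameters are toy choices.

```python
import numpy as np

def bootstrap_filter(y, n_particles=1000, a=0.9, sx=1.0, sy=1.0, seed=0):
    """Bootstrap particle filter for x_t = a x_{t-1} + N(0, sx^2), y_t = x_t + N(0, sy^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sx, n_particles)                 # initial particles
    means = []
    for yt in y:
        x = a * x + rng.normal(0.0, sx, n_particles)     # propagate
        logw = -0.5 * ((yt - x) / sy) ** 2               # Gaussian log-likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()                                     # self-normalise (biased!)
        means.append(np.sum(w * x))                      # filtered posterior mean
        x = x[rng.choice(n_particles, n_particles, p=w)] # multinomial resampling
    return np.array(means)

# Simulate data from the same toy model, then filter it.
rng = np.random.default_rng(1)
T, a = 50, 0.9
x_true, y = np.zeros(T), np.zeros(T)
x_prev = 0.0
for t in range(T):
    x_prev = a * x_prev + rng.normal()
    x_true[t] = x_prev
    y[t] = x_prev + rng.normal()

m = bootstrap_filter(y)
print(m.shape)
```

PaRIS augments this loop with online backward statistics for additive functionals, and PPG wraps it in conditional SMC moves to reduce the self-normalisation bias.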
Current neural network models of dialogue generation (chatbots) show great promise for generating answers for chatty agents, but they are short-sighted: they predict utterances one at a time while disregarding their impact on future outcomes. Modelling a dialogue's future direction is critical for generating coherent, interesting dialogues, a need that has led traditional NLP dialogue models to rely on reinforcement learning. In this article, we explain how to combine these objectives by using deep reinforcement learning to predict future rewards in chatbot dialogue. The model simulates conversations between two virtual agents, with policy-gradient methods used to reward sequences that exhibit three useful conversational characteristics: informativity, coherence, and simplicity of response (related to the forward-looking function). We assess our model with regard to its diversity, length, and complexity relative to human dialogue. In dialogue simulation, evaluations demonstrate that the proposed model generates more interactive responses and encourages more sustained, successful conversations. This work marks a preliminary step toward developing a neural conversational model based on the long-term success of dialogues.
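The policy-gradient machinery can be illustrated on a deliberately tiny problem. The sketch below runs expected policy-gradient ascent on a 3-action toy task (the sampled REINFORCE estimator used in practice averages to exactly this update); the paper applies the same principle to a sequence-to-sequence dialogue model with conversational rewards, not a bandit, and the reward values here are invented.

```python
import numpy as np

rewards = np.array([0.2, 0.9, 0.4])   # hypothetical per-action rewards
logits = np.zeros(3)                  # tabular softmax policy parameters
lr = 0.5

for _ in range(500):
    p = np.exp(logits - logits.max())
    p /= p.sum()                                   # softmax policy
    # Gradient of E[r] w.r.t. logit i is p_i * (r_i - E[r]).
    logits += lr * p * (rewards - p @ rewards)     # gradient ascent on E[r]

p = np.exp(logits - logits.max())
p /= p.sum()
print(int(np.argmax(p)))
```

The policy concentrates on the highest-reward action; in the dialogue setting the "actions" are generated utterances and the rewards are the three conversational characteristics above, estimated from simulated agent-agent conversations.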
Three-dimensional (3D) technologies have developed rapidly in recent years and have influenced industrial, medical, cultural, and many other fields. In this paper, we introduce an automatic 3D human head scanning-printing system, which provides a complete pipeline to scan, reconstruct, select, and finally print out physical 3D human heads. To enhance the accuracy of our system, we developed a consumer-grade composite sensor (including a gyroscope, an accelerometer, a digital compass, and a Kinect v2 depth sensor) as our sensing device. This sensing device is mounted on a robot, which automatically rotates around the human subject at an approximately 1-meter radius to capture full-view information. The data streams are further processed and fused into a 3D model of the subject using a tablet located on the robot. In addition, an automatic selection method, based on our specific system configuration, is proposed to select the head portion. We evaluated the accuracy of the proposed system by comparing our generated 3D head models, from both a standard human head model and real human subjects, with those reconstructed by the FastSCAN and Cyberware commercial laser scanning systems, computing and visualizing Hausdorff distances. The computational cost is also reported to further assess our proposed system.
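The evaluation metric named above, the symmetric Hausdorff distance between two reconstructions, is simple to state: the largest distance from any point in one set to its nearest neighbour in the other, taken in both directions. A brute-force sketch (real pipelines would use a KD-tree for large meshes):

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A (N, d) and B (M, d)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # pairwise distances
    return max(d.min(axis=1).max(),   # directed A -> B
               d.min(axis=0).max())   # directed B -> A

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 1.0]])
print(hausdorff(A, B))  # -> 1.0
```

Because it is a worst-case (max-min) measure, it is sensitive to even a single badly reconstructed region, which makes it a strict metric for comparing head scans against the laser-scanned references.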
We propose a 6D RGB-D odometry approach that finds the relative camera pose between consecutive RGB-D frames through keypoint extraction and feature matching on both the RGB and depth image planes. Furthermore, we feed the estimated pose to the highly accurate KinectFusion algorithm, which uses fast ICP (Iterative Closest Point) to fine-tune the frame-to-frame relative pose and fuse the depth data into a global implicit surface. We evaluate our method on a publicly available RGB-D SLAM benchmark dataset by Sturm et al. The experimental results show that our proposed reconstruction method, based solely on visual odometry and KinectFusion, outperforms the state-of-the-art RGB-D SLAM system in accuracy. Moreover, our algorithm outputs a ready-to-use polygon mesh (highly suitable for creating 3D virtual worlds) without any postprocessing steps.
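At the core of pose estimation from matched keypoints lies rigid alignment: given 3D correspondences between two frames, recover the rotation and translation. A standard Kabsch/Umeyama SVD sketch of that step is shown below; the full pipeline described above additionally involves feature matching, outlier rejection, and ICP refinement, none of which appear here.

```python
import numpy as np

def rigid_transform(P, Q):
    """Kabsch: find R, t such that Q ~= P @ R.T + t, for matched points (N, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))                  # keypoints in frame k
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Q = P @ R_true.T + t_true                     # same keypoints in frame k+1
R_est, t_est = rigid_transform(P, Q)
print(np.allclose(R_est, R_true))
```

The recovered `(R_est, t_est)` is exactly the kind of frame-to-frame pose that would then seed ICP-based refinement in KinectFusion.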
In this paper, a Kinect-based distributed and real-time motion capture system is developed. A trigonometric method is applied to calculate the relative positions of the Kinect v2 sensors with a calibration wand and to register the sensors' positions automatically. By combining results from multiple sensors with a nonlinear least-squares method, the accuracy of the motion capture is optimized. Moreover, to exclude inaccurate results from sensors, a computational-geometry method is applied in the occlusion approach, which detects occluded joint data. The synchronization approach is based on the NTP protocol, which dynamically synchronizes the clocks of the server and clients, ensuring that the proposed system operates in real time. Experiments validating the proposed system are conducted from the perspectives of calibration, occlusion, accuracy, and efficiency. Furthermore, to demonstrate the practical performance of our system, previously developed motion capture systems (the linear trilateration approach and the geometric trilateration approach) are compared against the benchmark OptiTrack system, showing that the accuracy of our proposed system is 38.3% and 24.1% better than the two aforementioned trilateration systems, respectively.
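To illustrate the nonlinear least-squares fusion step, here is a Gauss-Newton trilateration sketch: a 3D point is recovered from its distances to several calibrated sensors. The sensor layout, starting point, and solver details are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def trilaterate(sensors, dists, x0, iters=20):
    """Gauss-Newton solve for x minimising sum_i (||x - s_i|| - d_i)^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        diff = x - sensors                             # (N, 3)
        r = np.linalg.norm(diff, axis=1)               # predicted distances
        J = diff / r[:, None]                          # Jacobian of residuals
        res = r - dists
        x -= np.linalg.lstsq(J, res, rcond=None)[0]    # Gauss-Newton step
    return x

# Four hypothetical non-coplanar sensor positions and exact distances.
sensors = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]])
target = np.array([0.7, 0.4, 0.3])
dists = np.linalg.norm(sensors - target, axis=1)
x_hat = trilaterate(sensors, dists, x0=[1.0, 1.0, 1.0])
print(np.round(x_hat, 3))
```

With noisy per-sensor distances, the same least-squares machinery averages out individual sensor errors, which is where the multi-Kinect fusion gains its accuracy.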
With the increase in health consciousness, noninvasive body monitoring has aroused interest among researchers. As heart rate (HR) is one of the most important pieces of physiological information, researchers have sought in recent years to estimate it remotely from facial videos. Although progress has been made over the past few years, some limitations remain, such as processing time that grows with accuracy and the lack of comprehensive and challenging datasets for use and comparison. Recently, it was shown that HR information can be extracted from facial videos by spatial decomposition and temporal filtering. Inspired by this, a new framework is introduced in this paper to remotely estimate the HR under realistic conditions by combining spatial and temporal filtering with a convolutional neural network. Our proposed approach shows better performance than the benchmark on the MMSE-HR dataset in terms of both average HR estimation and short-time HR estimation. High consistency in short-time HR estimation is observed between our method and the ground truth.
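The temporal-filtering stage can be sketched on a synthetic skin-colour trace: restrict the spectrum to the physiologically plausible HR band and read the HR off the spectral peak. The signal, frame rate, and band limits below are illustrative; the paper's full pipeline adds spatial decomposition and a CNN on top of this.

```python
import numpy as np

fs = 30.0                                # typical camera frame rate (Hz)
t = np.arange(0, 20.0, 1.0 / fs)
hr_true = 72.0                           # beats per minute (synthetic)
pulse = np.sin(2 * np.pi * (hr_true / 60.0) * t)

rng = np.random.default_rng(0)
trace = pulse + 0.5 * rng.normal(size=t.shape)   # noisy colour trace

freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
spectrum = np.abs(np.fft.rfft(trace))
band = (freqs >= 0.7) & (freqs <= 4.0)           # ~42-240 bpm bandpass
hr_est = 60.0 * freqs[band][np.argmax(spectrum[band])]
print(round(hr_est, 1))
```

The 20-second window gives 0.05 Hz (3 bpm) frequency resolution, which is why short-time HR estimation, using much shorter windows, is the harder regime the abstract highlights.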