与传统的基于模型的故障检测和分类(FDC)方法相比,深神经网络(DNN)被证明对航空航天传感器FDC问题有效。但是,在训练中消耗的时间是DNN的过度,而FDC神经网络的解释性分析仍然令人难以置信。近年来,已经研究了一个称为基于图像缺陷的智能FDC的概念。这个概念主张将传感器测量数据堆叠到图像格式中,然后将传感器FDC问题转换为堆叠图像上的异常区域检测问题,这很可能很可能借用了机器视觉领域的最新进展。尽管在基于图像缺陷的智能FDC研究中声称有希望的结果,但由于堆叠图像的尺寸较低,使用了小的卷积核和浅DNN层,这阻碍了FDC性能。在本文中,我们首先提出了一种数据增强方法,该方法将堆叠的图像膨胀到更大的尺寸(与机器视觉领域中开发的VGG16网的通讯)。然后,通过直接对VGG16进行微调训练FDC神经网络。为了截断和压缩FDC净大小(因此其运行时间),我们在微调网上进行修剪。还采用了类激活映射(CAM)方法,以解释FDC NET的解释性分析以验证其内部操作。通过数据增强,VGG16的微调以及模型修剪,本文开发的FDC网络声称,在5个飞行条件下(运行时间26 ms),在4架飞机上,FDC精度为98.90%。 CAM结果还验证FDC Net W.R.T.它的内部操作。
translated by 谷歌翻译
产生相同解剖结构的多对比度/模态MRI丰富了诊断信息,但由于数据获取时间过多而在实践中受到限制。在本文中,我们提出了一种新的深入学习模型,用于使用几种源模态的不完整的k空间数据作为输入,用于联合重建和合成多模式MRI。我们模型的输出包括源模式的重建图像和目标模式中合成的高质量图像。我们提出的模型被公式化为一个变异问题,该问题利用了几个可学习的特定特征提取器和多模式合成模块。我们提出了一种可学习的优化算法来求解该模型,该算法可以使用多模式MRI数据训练其参数的多相网络。此外,采用了一个二线优化框架进行鲁棒参数训练。我们使用广泛的数值实验证明了方法的有效性。
translated by 谷歌翻译
我们提出了一个基于一般学习的框架,用于解决非平滑和非凸图像重建问题。我们将正则函数建模为$ l_ {2,1} $ norm的组成,并将平滑但非convex功能映射参数化为深卷积神经网络。我们通过利用Nesterov的平滑技术和残留学习的概念来开发一种可证明的趋同的下降型算法来解决非平滑非概念最小化问题,并学习网络参数,以使算法的输出与培训数据中的参考匹配。我们的方法用途广泛,因为人们可以将各种现代网络结构用于正规化,而所得网络继承了算法的保证收敛性。我们还表明,所提出的网络是参数有效的,其性能与实践中各种图像重建问题中的最新方法相比有利。
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译
For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. The existing empirical or physical models can reveal important information regarding the degradation dynamics. However, there is no general and flexible methods to fuse the information represented by those models. Physics-Informed Neural Network (PINN) is an efficient tool to fuse empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion-batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of Li-ion Phosphate (LFP)/graphite batteries.
translated by 谷歌翻译
Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, a lack of research on explicitly quantifying the data quality of each view when fusing them renders these models inexplicable, performing unsatisfactorily and inflexible in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value which describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
translated by 谷歌翻译