Long-form numerical reasoning in financial analysis aims to generate a reasoning program to calculate the correct answer for a given question. Previous work followed a retriever-generator framework, where the retriever selects key facts from a long-form document, and the generator generates a reasoning program based on retrieved facts. However, they treated all facts equally without considering the different contributions of facts with and without numbers. Meanwhile, the program consistency were ignored under supervised training, resulting in lower training accuracy and diversity. To solve these problems, we proposed APOLLO to improve the long-form numerical reasoning framework. For the retriever, we adopt a number-aware negative sampling strategy to enable the retriever to be more discriminative on key numerical facts. For the generator, we design consistency-based reinforcement learning and target program augmentation strategy based on the consistency of program execution results. Experimental results on the FinQA and ConvFinQA leaderboard verify the effectiveness of our proposed method, achieving the new state-of-the-art.
translated by 谷歌翻译
Recent researches show that the deep learning based object detection is vulnerable to adversarial examples. Generally, the adversarial attack for object detection contains targeted attack and untargeted attack. According to our detailed investigations, the research on the former is relatively fewer than the latter and all the existing methods for the targeted attack follow the same mode, i.e., the object-mislabeling mode that misleads detectors to mislabel the detected object as a specific wrong label. However, this mode has limited attack success rate, universal and generalization performances. In this paper, we propose a new object-fabrication targeted attack mode which can mislead detectors to `fabricate' extra false objects with specific target labels. Furthermore, we design a dual attention based targeted feature space attack method to implement the proposed targeted attack mode. The attack performances of the proposed mode and method are evaluated on MS COCO and BDD100K datasets using FasterRCNN and YOLOv5. Evaluation results demonstrate that, the proposed object-fabrication targeted attack mode and the corresponding targeted feature space attack method show significant improvements in terms of image-specific attack, universal performance and generalization capability, compared with the previous targeted attack for object detection. Code will be made available.
translated by 谷歌翻译
Previous studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions. However, most of them deform or generate the whole facial area, leading to non-realistic results. In this work, we delve into the formulation of altering only the mouth shapes of the target person. This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames. To this end, we propose the Audio-Visual Context-Aware Transformer (AV-CAT) framework, which produces accurate lip-sync with photo-realistic quality by predicting the masked mouth shapes. Our key insight is to exploit desired contextual information provided in audio and visual modalities thoroughly with delicately designed Transformers. Specifically, we propose a convolution-Transformer hybrid backbone and design an attention-based fusion strategy for filling the masked parts. It uniformly attends to the textural information on the unmasked regions and the reference frame. Then the semantic audio information is involved in enhancing the self-attention computation. Additionally, a refinement network with audio injection improves both image and lip-sync quality. Extensive experiments validate that our model can generate high-fidelity lip-synced results for arbitrary subjects.
translated by 谷歌翻译
Due to the ambiguity of homophones, Chinese Spell Checking (CSC) has widespread applications. Existing systems typically utilize BERT for text encoding. However, CSC requires the model to account for both phonetic and graphemic information. To adapt BERT to the CSC task, we propose a token-level self-distillation contrastive learning method. We employ BERT to encode both the corrupted and corresponding correct sentence. Then, we use contrastive learning loss to regularize corrupted tokens' hidden states to be closer to counterparts in the correct sentence. On three CSC datasets, we confirmed our method provides a significant improvement above baselines.
translated by 谷歌翻译
流行的图神经网络模型在图表学习方面取得了重大进展。但是,在本文中,我们发现了一个不断被忽视的现象:用完整图测试的预训练的图表学习模型的表现不佳,该模型用良好的图表测试。该观察结果表明,图中存在混杂因素,这可能会干扰模型学习语义信息,而当前的图表表示方法并未消除其影响。为了解决这个问题,我们建议强大的因果图表示学习(RCGRL)学习可靠的图形表示,以防止混杂效应。 RCGRL引入了一种主动方法,可以在无条件的力矩限制下生成仪器变量,该方法使图表学习模型能够消除混杂因素,从而捕获与下游预测有因果关系的歧视性信息。我们提供定理和证明,以保证拟议方法的理论有效性。从经验上讲,我们对合成数据集和多个基准数据集进行了广泛的实验。结果表明,与最先进的方法相比,RCGRL实现了更好的预测性能和泛化能力。
translated by 谷歌翻译
命名实体识别(NER)是检测和对实体跨越文本的跨度的任务。当实体跨越彼此之间的重叠时,此问题被称为嵌套NER。基于跨度的方法已被广泛用于应对嵌套的NER。这些方法中的大多数都会获得分数$ n \ times n $矩阵,其中$ n $表示句子的长度,每个条目对应于跨度。但是,先前的工作忽略了分数矩阵中的空间关系。在本文中,我们建议使用卷积神经网络(CNN)对分数矩阵中的这些空间关系进行建模。尽管很简单,但在三个常用的嵌套NER数据集中进行的实验表明,我们的模型超过了几种具有相同预训练的编码器的最近提出的方法。进一步的分析表明,使用CNN可以帮助模型更准确地找到嵌套实体。此外,我们发现不同的论文对三个嵌套的NER数据集使用了不同的句子引导,这将影响比较。因此,我们发布了一个预处理脚本,以促进将来的比较。
translated by 谷歌翻译
预测道路代理的未来行为是自动驾驶的关键任务。尽管现有模型在预测边际代理的未来行为方面取得了巨大的成功,但有效预测多种代理的一致的关节行为仍然是一个挑战。最近,提出了占用场的占用场表示,以通过占用网格和流量的结合来代表公路代理的联合未来状态,从而支持有效且一致的关节预测。在这项工作中,我们提出了一个新颖的占用流场预测因子,以产生准确的占用和流动预测,通过结合图像编码器的功能,该图像编码器从栅格化的流量图像中学习特征和矢量编码器,以捕获连续代理轨迹和地图状态的信息。在生成最终预测之前,这两个编码的功能由多个注意模块融合。我们的简单但有效的模型排在Waymo Open数据集占用和流预测挑战中,并在封闭的占用和流动预测任务中取得了最佳性能。
translated by 谷歌翻译
最近的作品以自我监督的方式探索学习图表表示。在图形对比学习中,基准方法应用各种图形增强方法。但是,大多数增强方法都是不可学习的,这导致发出不束缚的增强图。这种增强可以缩短曲线图对比学学习方法的表现能力。因此,我们激励我们的方法通过可学习的图形增强器来生成增强图,称为元图形增强器(Mega)。然后,我们阐明了“良好”的图形增强必须在特征级别的实例级别和信息性上具有均匀性。为此,我们提出了一种新颖的方法来学习图形增强者,可以以统一和信息性产生增强。图表增强器的目的是促进我们的特征提取网络,以学习更辨别的特征表示,这激励我们提出元学范式。经验上,多个基准数据集的实验表明,Mega优于图形自我监督学习任务中的最先进的方法。进一步的实验研究证明了巨型术语的有效性。
translated by 谷歌翻译
自动化医疗编码,医疗保健操作和交付的基本任务,通过从临床文献预测医学代码来实现非结构化数据。自然语言处理中深入学习模型的最新进展已被广泛应用于此任务。然而,它缺乏对医学编码的神经网络架构设计的统一视图。本综述提出了一个统一的框架,为医疗编码模型的构建块提供了一般性的理解,并概述了近期框架下的最新模型。我们的统一框架将医疗编码分解为四个主要组件,即文本特征提取的编码器模块,为构建深编码器架构的机制,解码器模块,用于将隐藏的表示转换为医学代码,以及辅助信息的使用。最后,我们讨论了关键的研究挑战和未来方向。
translated by 谷歌翻译
目前全面监督的面部地标检测方法迅速进行,实现了显着性能。然而,当在大型姿势和重闭合的面孔和重闭合时仍然遭受痛苦,以进行不准确的面部形状约束,并且标记的训练样本不足。在本文中,我们提出了一个半监督框架,即自我校准的姿势注意网络(SCPAN),以实现更具挑战性的情景中的更强大和精确的面部地标检测。具体地,建议通过定影边界和地标强度场信息来模拟更有效的面部形状约束的边界意识的地标强度(BALI)字段。此外,设计了一种自我校准的姿势注意力(SCPA)模型,用于提供自学习的目标函数,该功能通过引入自校准机制和姿势注意掩模而无需标签信息而无需标签信息。我们认为,通过将巴厘岛领域和SCPA模型集成到新颖的自我校准的姿势网络中,可以了解更多的面部现有知识,并且我们的面孔方法的检测精度和稳健性得到了改善。获得具有挑战性的基准数据集获得的实验结果表明,我们的方法优于文献中最先进的方法。
translated by 谷歌翻译