Current math word problem (MWP) solvers are usually Seq2Seq models trained on (one-problem; one-solution) pairs, each consisting of a problem description and a solution that shows the reasoning flow leading to the correct answer. However, one MWP naturally has multiple solution equations. Training an MWP solver on (one-problem; one-solution) pairs excludes the other correct solutions and thus limits the generalizability of the solver. One feasible remedy is to augment a given problem with multiple solutions. However, it is difficult to collect diverse and accurate augmented solutions through human effort. In this paper, we design a new training framework for an MWP solver by introducing a solution buffer and a solution discriminator. The buffer holds solutions generated by an MWP solver to encourage diversity in the training data. The discriminator controls the quality of the buffered solutions that participate in training. Our framework is flexibly applicable to fully, semi-weakly, and weakly supervised training for all Seq2Seq MWP solvers. We conduct extensive experiments on the benchmark dataset Math23k and a new dataset named Weak12k, and show that our framework improves the performance of various MWP solvers under different settings by generating correct and diverse solutions.
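To make the claim that one problem admits several solution equations concrete, here is a minimal sketch of a solution buffer. It is not the framework's implementation: the toy problem, the function names `evaluate` and `update_buffer`, and above all the acceptance rule (a plain check that a candidate evaluates to the known answer, used here as a stand-in for the learned discriminator) are assumptions made for illustration.

```python
import ast
import operator

# Binary operators supported by this tiny, safe arithmetic evaluator.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Evaluate an expression made only of numbers, parentheses and + - * /."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def update_buffer(buffer, candidates, answer, tol=1e-6):
    """Append each new candidate whose value matches the known answer.

    Assumption: answer matching replaces the learned discriminator.
    """
    for cand in candidates:
        try:
            value = evaluate(cand)
        except (ValueError, ZeroDivisionError, SyntaxError):
            continue  # malformed candidates are simply skipped
        if abs(value - answer) <= tol and cand not in buffer:
            buffer.append(cand)
    return buffer


# Hypothetical problem: 3 boxes, each holding 4 red and 2 blue balls.
candidates = ["3 * (4 + 2)", "3 * 4 + 3 * 2", "3 * 4 + 2", "(4 + 2) * 3"]
buffer = update_buffer([], candidates, answer=18.0)
print(buffer)
```

Three of the four candidates reach the answer 18 through different equations, so a solver trained on only one of them would never see the other two as valid targets.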
Math word problem (MWP) solving is an important task in question answering which requires human-like reasoning ability. Analogical reasoning has long been used in mathematical education, as it enables students to apply common relational structures of mathematical situations to solve new problems. In this paper, we propose to build a novel MWP solver by leveraging analogical MWPs, which advance the solver's generalization ability across different kinds of MWPs. The key idea, named analogy identification, is to associate the analogical MWP pairs in a latent space, i.e., encoding an MWP close to another analogical MWP, while moving away from the non-analogical ones. Moreover, a solution discriminator is integrated into the MWP solver to enhance the association between the representations of MWPs and their true solutions. The evaluation results verify that our proposed analogical learning strategy promotes the performance of MWP-BERT on Math23k over the state-of-the-art model Generate2Rank, with 5 times fewer parameters in the encoder. We also find that our model has a stronger generalization ability in solving difficult MWPs due to the analogical learning from easy MWPs.
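The idea of pulling an MWP towards an analogical one and away from non-analogical ones in a latent space can be illustrated with a standard triplet margin loss. The abstract does not state which objective the authors use, so the triplet loss, the Euclidean distance, the margin value, and the two-dimensional toy vectors below are all assumptions.

```python
import math


def distance(u, v):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, analogical, non_analogical, margin=1.0):
    """Zero once the analogical problem is closer than the other by `margin`."""
    gap = distance(anchor, analogical) - distance(anchor, non_analogical)
    return max(0.0, gap + margin)


# Toy encodings (hypothetical): the analogical problem lies near the anchor.
anchor = [0.0, 0.0]
analogical = [0.0, 1.0]       # distance 1 from the anchor
non_analogical = [3.0, 4.0]   # distance 5 from the anchor

loss_good = triplet_loss(anchor, analogical, non_analogical)
loss_bad = triplet_loss(anchor, non_analogical, analogical)
print(loss_good, loss_bad)
```

With the roles assigned correctly the loss is zero; swapping them produces a positive loss, which is the signal that would move the encodings during training.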
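Solving math word problems requires deductive reasoning over the quantities in the text. Various recent research efforts have mostly relied on sequence-to-sequence or sequence-to-tree models to generate mathematical expressions without explicitly performing relational reasoning between the quantities in the given context. While empirically effective, such approaches typically do not provide explanations for the generated expressions. In this work, we view the task as a complex relation extraction problem and propose a novel approach that presents explainable deductive reasoning steps to iteratively construct the target expression, where each step involves a primitive operation over two quantities that defines their relation. Through extensive experiments on four benchmark datasets, we show that the proposed model significantly outperforms existing strong baselines. We further demonstrate that the deductive procedure not only presents more explainable steps but also enables more accurate predictions on questions that require more complex reasoning.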
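The step-by-step construction described above can be sketched as follows. This only illustrates the data flow (each step applies one primitive operation to two available quantities, and its result becomes available to later steps); the `Step` type, the `run_steps` function, and the toy quantities are hypothetical, and the model that would choose the steps is not shown.

```python
from dataclasses import dataclass

# Primitive operations allowed in a single deductive step.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


@dataclass(frozen=True)
class Step:
    left: int   # index of the first operand among the available quantities
    op: str     # one of the keys of OPS
    right: int  # index of the second operand


def run_steps(quantities, steps):
    """Apply the steps in order; each result is appended as a new quantity."""
    values = list(quantities)
    exprs = [str(q) for q in quantities]
    for step in steps:
        values.append(OPS[step.op](values[step.left], values[step.right]))
        exprs.append(f"({exprs[step.left]} {step.op} {exprs[step.right]})")
    return values[-1], exprs[-1]


# Quantities 3, 4, 2 taken from a toy problem text.
# Step 1 combines 4 and 2; step 2 combines 3 with the result of step 1.
value, expr = run_steps([3, 4, 2], [Step(1, "+", 2), Step(0, "*", 3)])
print(expr, "=", value)
```

Because every intermediate result is named and reusable, the sequence of steps doubles as an explanation of how the final expression was reached.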
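To solve math word problems, human students leverage diverse reasoning logic that reaches different possible equation solutions. However, the mainstream sequence-to-sequence approach of automatic solvers aims to decode a fixed solution equation supervised by human annotation. In this paper, we propose a controlled equation generation solver that leverages a set of control codes to guide the model to consider certain reasoning logic and decode the corresponding equation expressions transformed from the human reference. Empirical results show that our method universally improves performance on single-unknown (Math23K) and multiple-unknown (DRAW1K, HMWP) benchmarks, with gains of up to 13.2% accuracy on the challenging multiple-unknown datasets.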
Solving math word problems is a task that analyses the relations between quantities and requires an accurate understanding of contextual natural language information. Recent studies show that current models rely on shallow heuristics to predict solutions and can be easily misled by small textual perturbations. To address this problem, we propose a Textual Enhanced Contrastive Learning framework, which forces the models to distinguish examples that are semantically similar but follow different mathematical logic. We adopt a self-supervised strategy to enrich the examples with subtle textual variance through textual reordering or problem re-construction. We then retrieve the samples that are hardest to differentiate from both the equation and the textual perspectives and guide the model to learn their representations. Experimental results show that our method achieves state-of-the-art results on both widely used benchmark datasets and carefully designed challenge datasets in English and Chinese. Our code and data are available at https://github.com/yiyunya/Textual_CL_MWP.
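Compositional generalization refers to a model's capability to generalize to newly composed input data based on the data components observed during training. It has triggered a series of compositional generalization analyses on different tasks, as generalization is an important aspect of language and problem-solving skills. However, similar discussion of math word problems (MWPs) has been limited. In this manuscript, we study compositional generalization in MWP solving. Specifically, we first introduce a data splitting method to create compositional splits from existing MWP datasets. Meanwhile, we synthesize data to isolate the effect of compositions. To improve compositional generalization in MWP solving, we propose an iterative data augmentation method that includes diverse compositional variation in the training data and can work together with MWP methods. During evaluation, we examine a set of methods and find that all of them suffer severe performance loss on the evaluated datasets. We also find that our data augmentation method can significantly improve the compositional generalization of general MWP methods. Code is available at https://github.com/demoleiwang/cgmwp.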
Given that rich information is hidden behind ubiquitous numbers in text, numerical reasoning over text should be an essential skill of AI systems. To derive precise equations to solve numerical reasoning problems, previous work focused on modeling the structures of equations, and has proposed various structured decoders. Though structure modeling proves to be effective, these structured decoders construct a single equation in a pre-defined autoregressive order, potentially placing an unnecessary restriction on how a model should grasp the reasoning process. Intuitively, humans may have numerous pieces of thoughts popping up in no pre-defined order; thoughts are not limited to the problem at hand, and can even be concerned with other related problems. By comparing diverse thoughts and chaining relevant pieces, humans are less prone to errors. In this paper, we take this inspiration and propose CANTOR, a numerical reasoner that models reasoning steps using a directed acyclic graph where we produce diverse reasoning steps simultaneously without pre-defined decoding dependencies, and compare and chain relevant ones to reach a solution. Extensive experiments demonstrated the effectiveness of CANTOR under both fully-supervised and weakly-supervised settings.
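Automatically solving math word problems is a critical task in the field of natural language processing. Recent models have reached a performance bottleneck and require more high-quality training data. We propose a novel data augmentation method that reverses the mathematical logic of math word problems to produce new high-quality math problems and to introduce new knowledge points that benefit the learning of mathematical reasoning logic. We apply the augmented data to two SOTA math word problem solving models and compare our results with a strong data augmentation baseline. Experimental results show the effectiveness of our approach. We release our code and data at https://github.com/yiyunya/roda.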
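Variable names are critical for conveying intended program behavior. Machine learning-based program analysis methods use variable name representations for a wide range of tasks, such as suggesting new variable names and bug detection. Ideally, such methods would capture semantic relationships between names beyond syntactic similarity, e.g., the fact that the names average and mean are similar. Unfortunately, previous work has found that even the best prior representation approaches primarily capture relatedness (whether two variables are linked at all) rather than similarity (whether they actually have the same meaning). We propose VarCLR, a new approach for learning semantic representations of variable names that effectively captures variable similarity in this stricter sense. We observe that this problem is an excellent fit for contrastive learning, which aims to minimize the distance between explicitly similar inputs while maximizing the distance between dissimilar inputs. This requires labeled training data, so we construct a novel, weakly supervised variable renaming dataset mined from GitHub edits. We show that VarCLR enables the effective application of sophisticated, general-purpose language models such as BERT to variable name representation, and thus also to related downstream tasks such as variable name similarity search or spelling correction. VarCLR produces models that significantly outperform the state of the art on IdBench, an existing benchmark that explicitly captures variable similarity (as distinct from relatedness). Finally, we contribute a release of all data, code, and pre-trained models, aiming to provide a drop-in replacement for the variable representations used in existing or future program analyses.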
Recent studies have shown the impressive efficacy of counterfactually augmented data (CAD) for reducing NLU models' reliance on spurious features and improving their generalizability. However, current methods still rely heavily on human effort or task-specific designs to generate counterfactuals, thereby impeding CAD's applicability to a broad range of NLU tasks. In this paper, we present AutoCAD, a fully automatic and task-agnostic CAD generation framework. AutoCAD first leverages a classifier to identify rationales, without supervision, as the spans to be intervened on, which disentangles spurious and causal features. Then, AutoCAD performs controllable generation enhanced by unlikelihood training to produce diverse counterfactuals. Extensive evaluations on multiple out-of-domain and challenge benchmarks demonstrate that AutoCAD consistently and significantly boosts the out-of-distribution performance of powerful pre-trained models across different NLU tasks, with results comparable to or even better than those of previous state-of-the-art human-in-the-loop or task-specific CAD methods. The code is publicly available at https://github.com/thu-coai/AutoCAD.
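Conventional saliency prediction models typically learn a deterministic mapping from an image to its saliency map, and thus fail to explain the subjective nature of human attention. In this paper, to model the uncertainty of visual saliency, we study the saliency prediction problem by learning a conditional probability distribution over the saliency map given an input image, and treat saliency prediction as sampling from the learned distribution. Specifically, we propose a generative cooperative saliency prediction framework, in which a conditional latent variable model (LVM) and a conditional energy-based model (EBM) are jointly trained to predict salient objects in a cooperative manner. The LVM serves as a fast but coarse predictor that efficiently produces an initial saliency map, which is then refined by the iterative Langevin revision of the EBM, which serves as a slow but fine predictor. Such a coarse-to-fine cooperative saliency prediction strategy offers the best of both worlds. Moreover, we propose a "cooperative learning while recovering" strategy and apply it to weakly supervised saliency prediction, where the saliency annotations of the training images are only partially observed. Lastly, we find that the energy function learned in the EBM can serve as a refinement module that refines the results of other pre-trained saliency prediction models. Experimental results show that our model can produce a set of diverse and plausible saliency maps for an image, and obtains state-of-the-art performance in both fully supervised and weakly supervised saliency prediction tasks.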
Step-by-step reasoning approaches like chain-of-thought (CoT) have proved to be a very effective technique to induce reasoning capabilities in large language models. However, the success of the CoT approach depends primarily on model size, and often billion parameter-scale models are needed to get CoT to work. In this paper, we propose a knowledge distillation approach, that leverages the step-by-step CoT reasoning capabilities of larger models and distils these reasoning abilities into smaller models. Our approach Decompositional Distillation learns a semantic decomposition of the original problem into a sequence of subproblems and uses it to train two models: a) a problem decomposer that learns to decompose the complex reasoning problem into a sequence of simpler sub-problems and b) a problem solver that uses the intermediate subproblems to solve the overall problem. On a multi-step math word problem dataset (GSM8K), we boost the performance of GPT-2 variants up to 35% when distilled with our approach compared to CoT. We show that using our approach, it is possible to train a GPT-2-large model (775M) that can outperform a 10X larger GPT-3 (6B) model trained using CoT reasoning. Finally, we also demonstrate that our approach of problem decomposition can also be used as an alternative to CoT prompting, which boosts the GPT-3 performance by 40% compared to CoT prompts.
Change detection (CD) aims to decouple object changes (i.e., objects missing or appearing) from background changes (i.e., environment variations such as light and season variations) in two images of the same scene captured over a long time span, with critical applications in disaster management, urban development, etc. In particular, the endless patterns of background changes require detectors to generalize well to unseen environment variations, making this task significantly challenging. Recent deep learning-based methods develop novel network architectures or optimization strategies with paired training examples, which do not handle the generalization issue explicitly and require huge manual pixel-level annotation efforts. In this work, as a first attempt in the CD community, we study the generalization issue of CD from the perspective of data augmentation and develop a novel weakly supervised training algorithm that only needs image-level labels. Unlike general augmentation techniques for classification, we propose a background-mixed augmentation that is specifically designed for change detection: it augments examples under the guidance of a set of background-changing images and lets deep CD models see diverse environment variations. Moreover, we propose an augmented & real data consistency loss that significantly improves generalization. Our method, as a general framework, can enhance a wide range of existing deep learning-based detectors. We conduct extensive experiments on two public datasets and enhance four state-of-the-art methods, demonstrating the advantages of our method. We release the code at https://github.com/tsingqguo/bgmix.
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, machine translation, and visual-language generation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
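Handwritten Chinese text recognition (HCTR) has been an active research topic for decades. However, most previous studies focus solely on the recognition of cropped text line images, ignoring the errors caused by text line detection in real-world applications. Although some approaches aimed at page-level text recognition have been proposed in recent years, they are either limited to simple layouts or require very detailed annotations, including expensive line-level and even character-level bounding boxes. To this end, we propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them, which is more robust and flexible when dealing with complex layouts, including multi-directional and curved text lines. Using the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data. However, it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes for characters and text lines. Extensive experiments on five datasets demonstrate the superiority of PageNet over existing weakly supervised and fully supervised page-level methods. These experimental results may spark further research beyond the realm of existing methods based on connectionist temporal classification or attention. The source code is available at https://github.com/shannanyinxiang/pagenet.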
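Automatic math problem solving has recently attracted increasing attention as a long-standing AI benchmark. In this paper, we focus on solving geometric problems, which requires a comprehensive understanding of textual descriptions, visual diagrams, and theorem knowledge. However, existing methods depend heavily on handcrafted rules and have only been evaluated on small-scale datasets. Therefore, we propose a geometric question answering dataset, GeoQA, containing 4,998 geometric problems with corresponding annotated programs that illustrate the solving process of the given problems. Compared with another publicly available dataset, GeoS, GeoQA is 25 times larger, and its program annotations can provide a practical testbed for future research on explicit and explainable numerical reasoning. Moreover, we introduce a Neural Geometric Solver (NGS) that addresses geometric problems by comprehensively parsing multimodal information and generating interpretable programs. We further add multiple self-supervised auxiliary tasks to NGS to enhance the cross-modal semantic representation. Extensive experiments on GeoQA validate the effectiveness of the proposed NGS and auxiliary tasks. However, the results are still significantly lower than human performance, which leaves large room for future research. Our benchmark and code are released at https://github.com/chen-judge/geoqa.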
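Video moment retrieval aims at finding the start and end timestamps of a moment (part of a video) described by a given natural language query. Fully supervised methods need complete temporal boundary annotations to achieve promising results, which is costly because the annotator needs to watch the whole moment. Weakly supervised methods rely only on paired videos and queries, but their performance is relatively poor. In this paper, we look closer at the annotation process and propose a new paradigm called "glance annotation". This paradigm requires the timestamp of only one single random frame, which we refer to as a "glance", within the temporal boundary of the fully supervised counterpart. We argue that this is beneficial because, compared with weak supervision, it adds trivial cost yet offers greater potential in performance. Under the glance annotation setting, we propose a method based on contrastive learning named Video moment retrieval via Glance Annotation (ViGA). ViGA cuts the input video into clips and contrasts clips with queries, where glance-guided Gaussian-distributed weights are assigned to all clips. Our extensive experiments indicate that ViGA achieves better results than state-of-the-art weakly supervised methods by a large margin, and is even comparable to fully supervised methods in some cases.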
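The glance-guided Gaussian weighting of clips can be sketched as below. This is a guess at the general shape only: the abstract does not give the formula, so the unnormalised Gaussian centred on the clip containing the glance, the normalisation to sum to one, the `sigma` parameter, and the function name `glance_weights` are all assumptions.

```python
import math


def glance_weights(num_clips, glance_index, sigma=1.0):
    """Weight each clip by a Gaussian centred on the clip holding the glance.

    Assumption: weights are normalised to sum to 1.
    """
    raw = [
        math.exp(-((i - glance_index) ** 2) / (2 * sigma ** 2))
        for i in range(num_clips)
    ]
    total = sum(raw)
    return [w / total for w in raw]


# A video cut into 7 clips, with the annotated glance falling in clip 3.
weights = glance_weights(num_clips=7, glance_index=3, sigma=1.5)
print([round(w, 3) for w in weights])
```

Clips near the glance receive the largest weights and the weights decay symmetrically with distance, so a single annotated frame still tells the contrastive objective roughly where the moment lies.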
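Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring the fact that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input because of their invariance to permutations and their awareness of input sparsity. This paper presents a conceptual review of recent key advancements in this emerging field, aimed at researchers in both optimization and machine learning.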
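Dense anticipation aims to forecast future actions and their durations. Existing approaches rely on fully labelled data, i.e., sequences labelled with all future actions and their durations. We present a (semi-) weakly supervised method that uses only a small number of fully labelled sequences and predominantly sequences in which only the upcoming action is labelled. To this end, we propose a framework that generates pseudo-labels for future actions and their durations and adaptively refines them through a refinement module. Given only the upcoming action label as input, these pseudo-labels guide the prediction of future actions and durations. We further design an attention mechanism to predict context-aware durations. Experiments on the Breakfast and 50Salads benchmarks verify the effectiveness of our method; we are competitive even when compared with fully supervised state-of-the-art models. We will make our code available at: https://github.com/zhanghaotong1/wslvideodenseantication.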
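Cross-lingual machine reading comprehension (XMRC) is challenging because of the lack of training data in low-resource languages. Recent approaches use training data only in a resource-rich language such as English to fine-tune large-scale cross-lingual pre-trained language models. Because of the large differences between languages, a model fine-tuned only on the source language may not perform well on target languages. Interestingly, we observe that while the top-1 results predicted by previous approaches often fail to hit the ground-truth answers, the correct answers are often contained in the top-k predicted results. Based on this observation, we develop a two-stage approach to enhance model performance. The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer. The second stage focuses on precision: an answer-aware contrastive learning (AA-CL) mechanism is developed to learn the fine differences between the accurate answer and the other candidates. Our extensive experiments show that our model significantly outperforms a series of strong baselines on two cross-lingual MRC benchmark datasets.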
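The observation that motivates the two stages, namely that the correct answer is often inside the top-k list even when the top-1 prediction misses, can be measured with a simple hit rate. The sketch below uses invented toy predictions and a hypothetical helper `hit_rate_at_k`; it does not reproduce the HL or AA-CL components.

```python
def hit_rate_at_k(ranked_predictions, gold_answers, k):
    """Fraction of questions whose gold answer is among the top-k predictions."""
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if gold in preds[:k]
    )
    return hits / len(gold_answers)


# Toy ranked answer candidates for four questions (illustrative data only).
ranked = [
    ["Paris", "Lyon", "Nice"],
    ["1990", "1989", "1991"],
    ["blue", "red", "green"],
    ["Nile", "Amazon", "Congo"],
]
gold = ["Paris", "1989", "green", "Danube"]

top1 = hit_rate_at_k(ranked, gold, k=1)
top3 = hit_rate_at_k(ranked, gold, k=3)
print(top1, top3)
```

The gap between the two numbers is the headroom that a second, precision-oriented stage can try to recover by re-ranking the candidates.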