由于人类参与者的参与,收集培训对话系统的数据可能非常昂贵,并且需要广泛的注释。特别是在文档接地的对话系统中,人类专家需要仔细阅读非结构化文件以回答用户的问题。结果,现有的文档接地对话对话数据集相对较小,并且妨碍了对话系统的有效培训。在本文中,我们提出了一种通过生成对话模型在文档上接地的自动数据增强技术。对话模型由用户BOT和代理机器人组成,可以在给定输入文档的情况下合成不同的对话,然后用于训练下游模型。在补充原始数据集时,我们的方法可以实现对传统数据增强方法的显着改进。我们还在低资源环境中实现了良好的性能。
translated by 谷歌翻译
在典型的客户服务聊天方案中,客户联系支持中心以便帮助或提高投诉,人类代理商试图解决这些问题。在大多数情况下,在谈话结束时,要求代理人写一份简短的总结强调问题和建议的解决方案,通常是为了使其他可能需要处理同一客户或问题的其他代理商的利益。本文的目标是推进此任务的自动化。我们介绍了第一个大规模,高质量的客户服务对话框摘要数据集,接近6500人的注释摘要。数据基于现实世界的客户支持对话框,包括提取和抽象摘要。我们还介绍了一种特定于对话框的新无监督的提取摘要方法。
translated by 谷歌翻译
我们在面向任务为导向的对话框(TOD)的端到端学习中提出了一种新问题,其中对话系统模仿故障排除代理,该故障排除代理通过诊断其问题(例如,汽车而未启动)帮助用户。这些对话框基于特定于域的流程图,该代理在对话期间应该遵循代理。我们的任务暴露了神经TOD的新颖技术挑战,例如在没有显式注释的情况下对流程图的话语接地,当用户询问澄清问题时,提及额外的手动页面,以及在测试时间遵循看不见的流程图。我们释放由2,738个对话框组成的数据集(浮雕),该对话框为12个不同的故障排除流程图。我们还设计了一个神经模型,扑腾,它使用检索增强的生成架构来训练对话框。我们的实验发现,Flonet可以对未来的流程图进行零射流传输,并为未来的研究设定强大的基线。
translated by 谷歌翻译
The United States coastline spans 95,471 miles; a distance that cannot be effectively patrolled or secured by manual human effort alone. Unmanned Aerial Vehicles (UAVs) equipped with infrared cameras and deep-learning based algorithms represent a more efficient alternative for identifying and segmenting objects of interest - namely, ships. However, standard approaches to training these algorithms require large-scale datasets of densely labeled infrared maritime images. Such datasets are not publicly available and manually annotating every pixel in a large-scale dataset would have an extreme labor cost. In this work we demonstrate that, in the context of segmenting ships in infrared imagery, weakly-supervising an algorithm with sparsely labeled data can drastically reduce data labeling costs with minimal impact on system performance. We apply weakly-supervised learning to an unlabeled dataset of 7055 infrared images sourced from the Naval Air Warfare Center Aircraft Division (NAWCAD). We find that by sparsely labeling only 32 points per image, weakly-supervised segmentation models can still effectively detect and segment ships, with a Jaccard score of up to 0.756.
translated by 谷歌翻译
The paper presents a cross-domain review analysis on four popular review datasets: Amazon, Yelp, Steam, IMDb. The analysis is performed using Hadoop and Spark, which allows for efficient and scalable processing of large datasets. By examining close to 12 million reviews from these four online forums, we hope to uncover interesting trends in sales and customer sentiment over the years. Our analysis will include a study of the number of reviews and their distribution over time, as well as an examination of the relationship between various review attributes such as upvotes, creation time, rating, and sentiment. By comparing the reviews across different domains, we hope to gain insight into the factors that drive customer satisfaction and engagement in different product categories.
translated by 谷歌翻译
Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than >28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.
translated by 谷歌翻译
Automated offensive language detection is essential in combating the spread of hate speech, particularly in social media. This paper describes our work on Offensive Language Identification in low resource Indic language Marathi. The problem is formulated as a text classification task to identify a tweet as offensive or non-offensive. We evaluate different mono-lingual and multi-lingual BERT models on this classification task, focusing on BERT models pre-trained with social media datasets. We compare the performance of MuRIL, MahaTweetBERT, MahaTweetBERT-Hateful, and MahaBERT on the HASOC 2022 test set. We also explore external data augmentation from other existing Marathi hate speech corpus HASOC 2021 and L3Cube-MahaHate. The MahaTweetBERT, a BERT model, pre-trained on Marathi tweets when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate), outperforms all models with an F1 score of 98.43 on the HASOC 2022 test set. With this, we also provide a new state-of-the-art result on HASOC 2022 / MOLD v2 test set.
translated by 谷歌翻译
Free-text rationales (FTRs) follow how humans communicate by explaining reasoning processes via natural language. A number of recent works have studied how to improve language model (LM) generalization by using FTRs to teach LMs the correct reasoning processes behind correct task outputs. These prior works aim to learn from FTRs by appending them to the LM input or target output, but this may introduce an input distribution shift or conflict with the task objective, respectively. We propose KNIFE, which distills FTR knowledge from an FTR-augmented teacher LM (takes both task input and FTR) to a student LM (takes only task input), which is used for inference. Crucially, the teacher LM's forward computation has a bottleneck stage in which all of its FTR states are masked out, which pushes knowledge from the FTR states into the task input/output states. Then, FTR knowledge is distilled to the student LM by training its task input/output states to align with the teacher LM's. On two question answering datasets, we show that KNIFE significantly outperforms existing FTR learning methods, in both fully-supervised and low-resource settings.
translated by 谷歌翻译
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译