We introduce \textsc{PoliteRewrite} -- a dataset for polite language rewrite, a novel sentence rewrite task. In contrast to previous text style transfer tasks, which can mostly be addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits of an offensive and impolite sentence to deliver the same message euphemistically and politely. This makes the task more challenging, not only for NLP models but also for human annotators. To reduce human effort and enable efficient annotation, we propose a novel annotation paradigm in which human annotators and GPT-3.5 collaborate to annotate \textsc{PoliteRewrite}. The released dataset contains 10K polite sentence rewrites annotated collaboratively by GPT-3.5 and human annotators, which can serve as a gold standard for training, validation, and test, and 100K high-quality polite sentence rewrites generated by GPT-3.5 without human review. We hope this work (the 10K+100K dataset will be released soon) contributes to research on more challenging sentence rewriting and stimulates further thought on resource annotation paradigms assisted by large-scale pretrained models.
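The annotation pipeline itself is not specified in the abstract; the following is a minimal sketch, assuming a simple loop in which the LLM drafts a polite rewrite and a human annotator optionally corrects it (for the 10K gold split) or the draft is kept as-is (for the 100K split). The function `llm_polite_rewrite` is a hypothetical placeholder, not the authors' actual tooling or prompt.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    source: str      # original impolite sentence
    draft: str       # polite rewrite proposed by the LLM
    final: str       # rewrite after optional human review
    reviewed: bool   # True for the human-reviewed gold split

def llm_polite_rewrite(sentence: str) -> str:
    """Hypothetical call asking GPT-3.5 to rewrite `sentence` politely while
    preserving its meaning; the real prompt/API is not specified here."""
    raise NotImplementedError

def annotate(sentences, human_review=True):
    """Collaborative annotation: the LLM drafts, a human optionally corrects."""
    records = []
    for s in sentences:
        draft = llm_polite_rewrite(s)
        if human_review:
            edited = input(f"LLM draft: {draft}\nEnter to accept, or type a fix: ")
            final = edited.strip() or draft
        else:
            final = draft
        records.append(Annotation(s, draft, final, human_review))
    return records
```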
We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL). X-Prompt instructs an LLM not only with NL but also with an extensible vocabulary of imaginary words that are introduced to help represent what NL words can hardly describe, allowing a prompt to be more descriptive. Like NL prompts, X-Prompt is out-of-distribution (OOD) robust; to this end, we propose context-guided learning with prompt augmentation to learn the imaginary words for general usability, enabling them to be used in different prompt contexts for fine-grained specifications. The promising results of X-Prompt demonstrate its potential for advancing interaction between humans and LLMs and bridging their communication gap.
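The abstract describes imaginary words only at a high level; one common way to realize the idea is to append trainable embeddings to a frozen LM vocabulary and splice them into the prompt's token embeddings, as in the hedged PyTorch sketch below. The class name and initialization are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ImaginaryVocab(nn.Module):
    """Trainable 'imaginary word' embeddings appended to a frozen LM vocabulary."""
    def __init__(self, base_embedding: nn.Embedding, num_imaginary: int):
        super().__init__()
        self.base = base_embedding                 # frozen NL word embeddings
        self.base.weight.requires_grad_(False)
        self.extra = nn.Parameter(                 # only these are learned
            torch.randn(num_imaginary, base_embedding.embedding_dim) * 0.02)
        self.vocab_size = base_embedding.num_embeddings

    def forward(self, token_ids: torch.LongTensor) -> torch.Tensor:
        """token_ids may mix NL ids (< vocab_size) and imaginary ids (>= vocab_size)."""
        is_extra = token_ids >= self.vocab_size
        nl_emb = self.base(token_ids.clamp(max=self.vocab_size - 1))
        imag_emb = self.extra[(token_ids - self.vocab_size).clamp(min=0)]
        return torch.where(is_extra.unsqueeze(-1), imag_emb, nl_emb)
```

A prompt such as "a photo in the style of <w1>" would map <w1> to an id at or above `vocab_size`, so gradients flow only into the imaginary-word embeddings while the NL vocabulary stays fixed.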
The role of mobile cameras has increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline that can replace standard mobile ISPs and run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format Fujifilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon 8 Gen 1 GPU, which provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs and are able to process Full HD photos in less than 20-50 milliseconds while achieving high-fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
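The challenge report contains the actual model descriptions; purely as an illustration of the TensorFlow Lite deployment path mentioned above, the snippet below converts a toy end-to-end ISP-like network to a float16 TFLite model, the usual route to smartphone GPU delegates. The toy architecture and input size are placeholders, not any of the submitted solutions.

```python
import tensorflow as tf

# Toy stand-in for a learned ISP: packed 4-channel Bayer RAW -> full-resolution RGB.
def build_toy_isp(h=1080, w=1920):
    inp = tf.keras.Input(shape=(h // 2, w // 2, 4))
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(12, 3, padding="same")(x)
    out = tf.nn.depth_to_space(x, 2)       # 12 channels / 4 -> 3-channel RGB at full res
    return tf.keras.Model(inp, out)

model = build_toy_isp()

# Convert to TensorFlow Lite with float16 weights for mobile GPU execution.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
open("toy_isp_fp16.tflite", "wb").write(converter.convert())
```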
Temporal modeling is crucial for various video learning tasks. Most recent approaches employ either factorized (2D+1D) or joint (3D) spatial-temporal operations to extract temporal contexts from the input frames. While the former is more efficient in computation, the latter often obtains better performance. In this paper, we attribute this to a dilemma between the sufficiency and the efficiency of interactions among various positions in different frames. These interactions affect the extraction of task-relevant information shared among frames. To resolve this issue, we prove that frame-by-frame alignments have the potential to increase the mutual information between frame representations, thereby including more task-relevant information to boost effectiveness. We then propose Alignment-guided Temporal Attention (ATA) to extend one-dimensional temporal attention with parameter-free, patch-level alignments between neighboring frames. It can act as a general plug-in for image backbones to perform action recognition without any model-specific design. Extensive experiments on multiple benchmarks demonstrate the superiority and generality of our module.
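The abstract gives only the idea of parameter-free patch-level alignment; a minimal sketch of that step, assuming patch features of shape (T, N, C), is shown below. It reorders each frame's patches to best match the previous frame by cosine similarity, an illustration of the general mechanism rather than the exact ATA module.

```python
import torch
import torch.nn.functional as F

def align_to_previous_frame(x: torch.Tensor) -> torch.Tensor:
    """x: (T, N, C) patch features for T frames with N patches each.
    Reorders frame t's patches to match frame t-1 via parameter-free
    nearest-neighbor matching under cosine similarity."""
    T, N, C = x.shape
    aligned = [x[0]]
    for t in range(1, T):
        prev = F.normalize(aligned[-1], dim=-1)   # (N, C)
        cur = F.normalize(x[t], dim=-1)           # (N, C)
        sim = prev @ cur.t()                      # (N, N) patch-to-patch similarity
        idx = sim.argmax(dim=-1)                  # best current patch for each previous patch
        aligned.append(x[t][idx])                 # reorder current frame's patches
    return torch.stack(aligned)                   # (T, N, C)

# One-dimensional temporal attention can then be applied along T at each aligned
# patch position, e.g. with nn.MultiheadAttention over a length-T sequence per patch.
```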
Speed-control prediction is a challenging problem in driver behavior analysis, aiming to predict the driver's future actions in controlling vehicle speed, such as braking or acceleration. In this paper, we attempt to address this challenge using egocentric video data only, in contrast to most works in the literature that use third-person-view data or additional vehicle sensor data such as GPS, or both. To this end, we propose a novel graph convolutional network (GCN) based architecture, namely EgoSpeed-Net. Our motivation is that the position changes of objects over time can provide very useful cues for forecasting future speed changes. We first model the spatial relations among the objects of each class using fully connected graphs and apply GCNs on them for feature extraction. We then leverage a long short-term memory network to fuse such features per class over time into a vector, concatenate these vectors, and predict a speed-control action using a multilayer perceptron classifier. We conduct extensive experiments on the Honda Research Institute Driving Dataset and demonstrate the superior performance of EgoSpeed-Net.
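The translated abstract outlines the architecture only verbally; below is a schematic PyTorch sketch of that pipeline (per-class fully connected graph, GCN feature extraction, LSTM fusion over time, concatenation, MLP classifier). The dimensions and the simple mean-aggregation GCN layer are assumptions for illustration, not the released EgoSpeed-Net code.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One GCN-style layer on a fully connected graph: mean-aggregate, then project."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, nodes):                      # nodes: (num_objects, in_dim)
        agg = nodes.mean(dim=0, keepdim=True).expand_as(nodes)
        return torch.relu(self.proj(nodes + agg))

class EgoSpeedNetSketch(nn.Module):
    def __init__(self, num_obj_classes=3, node_dim=4, hid=64, num_actions=3):
        super().__init__()
        self.gcn = SimpleGCNLayer(node_dim, hid)
        self.lstms = nn.ModuleList(nn.LSTM(hid, hid, batch_first=True)
                                   for _ in range(num_obj_classes))
        self.head = nn.Sequential(nn.Linear(hid * num_obj_classes, hid),
                                  nn.ReLU(), nn.Linear(hid, num_actions))

    def forward(self, tracks):
        # tracks: list over object classes; each entry is a list over T frames of
        # (num_objects, node_dim) tensors holding, e.g., bounding-box positions.
        per_class = []
        for lstm, frames in zip(self.lstms, tracks):
            feats = torch.stack([self.gcn(f).mean(dim=0) for f in frames])  # (T, hid)
            _, (h, _) = lstm(feats.unsqueeze(0))      # fuse per-class features over time
            per_class.append(h[-1].squeeze(0))        # (hid,)
        return self.head(torch.cat(per_class))        # logits over speed-control actions
```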
Magnetic resonance imaging (MRI) with high resolution (HR) provides more detailed information for accurate diagnosis and quantitative image analysis. Despite significant progress, most existing medical image reconstruction networks have two flaws: 1) they are all designed in a black-box manner, thus lacking sufficient interpretability, which further limits their practical application. Interpretable neural network models are of significant interest because they enhance the trustworthiness required in clinical practice when dealing with medical images. 2) Most existing SR reconstruction approaches use only a single contrast or a simple multi-contrast fusion mechanism, neglecting the complex relations among different contrasts that are critical for SR improvement. To address these issues, this paper proposes a novel model-guided interpretable deep unfolding network (MGDUN) for medical image SR reconstruction. The model-guided image SR reconstruction approach solves a manually designed objective function to reconstruct HR MRI. We unfold the iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix and an explicit multi-contrast relation matrix into account during end-to-end optimization. Extensive experiments on the multi-contrast IXI dataset and the BraTS 2019 dataset demonstrate the superiority of our proposed model.
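Deep unfolding is only named in the translated abstract; the hedged sketch below shows the generic pattern such networks follow, alternating a data-consistency gradient step with a small learned prior network for a fixed number of stages. The bicubic degradation operator and the CNN prior are placeholders, not the paper's MGDUN formulation with its MRI observation and multi-contrast relation matrices.

```python
import torch
import torch.nn as nn

class UnfoldingStage(nn.Module):
    """One stage: gradient step on ||A x - y||^2, then a learned refinement."""
    def __init__(self, channels=1):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(0.1))     # learnable step size
        self.prior = nn.Sequential(                     # small CNN prior
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1))

    def forward(self, x, y, A, At):
        x = x - self.step * At(A(x) - y)                # data consistency
        return x + self.prior(x)                        # learned residual refinement

class UnfoldedSRNet(nn.Module):
    def __init__(self, num_stages=4, scale=2):
        super().__init__()
        self.scale = scale
        self.stages = nn.ModuleList(UnfoldingStage() for _ in range(num_stages))

    def forward(self, y):
        # y: low-resolution observation (B, 1, h, w); A is bicubic downsampling here.
        A = lambda x: nn.functional.interpolate(x, scale_factor=1 / self.scale, mode="bicubic")
        At = lambda r: nn.functional.interpolate(r, scale_factor=self.scale, mode="bicubic")
        x = At(y)                                       # initial HR estimate
        for stage in self.stages:
            x = stage(x, y, A, At)
        return x
```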
Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning-oriented datasets such as MSVD, MSR-VTT, and VATEX. A key property of these datasets is that videos are assumed to be temporally pre-trimmed with a short duration, while the provided captions well describe the gist of the video content. Consequently, for a given paired video and caption, the video is supposed to be fully relevant to the caption. In reality, however, as queries are not known in advance, pre-trimmed video clips may not contain sufficient content to fully meet a query. This suggests a gap between the literature and the real world. To fill the gap, we propose in this paper a novel T2VR subtask termed partially relevant video retrieval (PRVR). An untrimmed video is considered partially relevant w.r.t. a given textual query if it contains a moment relevant to the query. PRVR aims to retrieve such partially relevant videos from a large collection of untrimmed videos. PRVR differs from single video moment retrieval and video corpus moment retrieval, as the latter two retrieve moments rather than untrimmed videos. We formulate PRVR as a multiple instance learning (MIL) problem, in which a video is simultaneously viewed as a bag of video clips and a bag of video frames. Clips and frames represent video content at different temporal scales. We propose a multi-scale similarity learning (MS-SL) network that jointly learns clip-scale and frame-scale similarities for PRVR. Extensive experiments on three datasets (TVR, ActivityNet Captions, and Charades-STA) demonstrate the viability of the proposed method. We also show that our method can be used to improve video corpus moment retrieval.
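The multiple-instance-learning view in the translated abstract can be made concrete with a small sketch: a text query is scored against an untrimmed video by combining its best-matching clip (clip scale) with a similarity-weighted aggregation over frames (frame scale). The fusion weight `alpha` and the assumption of pre-encoded features are illustrative simplifications, not the MS-SL implementation.

```python
import torch
import torch.nn.functional as F

def prvr_score(query: torch.Tensor,
               clip_feats: torch.Tensor,
               frame_feats: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    """MIL-style partial-relevance score of one untrimmed video for one query.
    query: (D,), clip_feats: (Nc, D), frame_feats: (Nf, D); all pre-encoded."""
    q = F.normalize(query, dim=-1)
    clips = F.normalize(clip_feats, dim=-1)
    frames = F.normalize(frame_feats, dim=-1)

    clip_sims = clips @ q                        # (Nc,) cosine similarity per clip
    clip_score = clip_sims.max()                 # best-matching clip (key instance)

    frame_sims = frames @ q                      # (Nf,)
    weights = torch.softmax(frame_sims, dim=0)   # focus on query-relevant frames
    frame_score = (weights * frame_sims).sum()   # frame-scale aggregated similarity

    return alpha * clip_score + (1 - alpha) * frame_score
```

Ranking a collection of untrimmed videos by this score retrieves those containing a query-relevant moment even when most of each video is irrelevant to the query.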
Housing quality is an important proxy for regional wealth, security, and health. Understanding the distribution of housing quality is essential for revealing rural development status and providing policy advice. However, current rural housing quality data rely heavily on top-down, time-consuming surveys at the national or provincial level, and fail to unpack housing quality at the village level. To fill the gap between the need to accurately describe rural housing quality conditions and the lack of data, we collect a large number of rural images and invite users to assess their housing quality at scale. Furthermore, a deep learning framework is proposed to automatically and efficiently predict housing quality based on the crowdsourced rural images.
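The translated abstract does not detail the prediction framework; a minimal baseline sketch, fine-tuning an ImageNet-pretrained backbone to regress a crowdsourced quality score per image, is given below. The backbone choice, score range, and training details are assumptions (and the `weights="DEFAULT"` argument assumes a recent torchvision).

```python
import torch
import torch.nn as nn
from torchvision import models

# Baseline sketch: regress a housing-quality score (e.g., 1-5 from crowdsourcing)
# from a single rural image with a fine-tuned ResNet-18 backbone.
backbone = models.resnet18(weights="DEFAULT")           # ImageNet-pretrained features
backbone.fc = nn.Linear(backbone.fc.in_features, 1)     # single quality-score output

optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4)
loss_fn = nn.SmoothL1Loss()

def train_step(images: torch.Tensor, scores: torch.Tensor) -> float:
    """images: (B, 3, 224, 224) normalized crowd photos; scores: (B,) quality labels."""
    optimizer.zero_grad()
    pred = backbone(images).squeeze(-1)
    loss = loss_fn(pred, scores)
    loss.backward()
    optimizer.step()
    return loss.item()
```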
This paper reports on progress towards building an online language learning tool that provides learners with conversational experience by using dialogue systems as conversation practice partners. Our system can adapt to the user's language proficiency on the fly. We also provide automatic grammar error feedback to help users learn from their mistakes. According to our first adopters, our system is entertaining and useful. Furthermore, we will provide the learning technology community with a large-scale dialogue dataset on language learning and grammar correction. Our next step is to make our system more adaptive to user profiles by using reinforcement learning algorithms.
We endeavor on a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize objects with the following characteristics: (1) amorphous shape with indistinct boundaries; (2) similarity to surroundings; (3) absence of color. Accordingly, it is far more challenging to distinguish insubstantial objects in a single static frame, and the collaborative representation of spatial and temporal information is crucial. We therefore construct an IOD-Video dataset comprised of 600 videos (141,017 frames) covering various distances, sizes, visibility, and scenes captured by different spectral ranges. In addition, we develop a spatio-temporal aggregation framework for IOD, in which different backbones are deployed and a spatio-temporal aggregation loss (STAloss) is elaborately designed to exploit the consistency along the time axis. Experiments conducted on the IOD-Video dataset demonstrate that spatio-temporal aggregation can significantly improve the performance of IOD. We hope our work will attract further research into this valuable yet challenging task. The code will be available at: \url{https://github.com/calayzhou/iod-video}.
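STAloss is only characterized above as exploiting consistency along the time axis, and its exact form is defined in the released code; the snippet below is therefore just a generic temporal-consistency penalty on per-frame detection heatmaps, given to illustrate the idea rather than to reproduce STAloss.

```python
import torch

def temporal_consistency_loss(heatmaps: torch.Tensor) -> torch.Tensor:
    """heatmaps: (T, H, W) per-frame detection responses for one video clip.
    Penalizes abrupt changes between adjacent frames, encouraging predictions
    that stay stable along the time axis."""
    diffs = heatmaps[1:] - heatmaps[:-1]     # frame-to-frame differences
    return diffs.pow(2).mean()
```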