Smart Paper Notes

Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion

Weng Fei Low, Gim Hee Lee

Categories: Computer Vision | Artificial Intelligence

2022-07-29

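Explicit neural surface representations allow exact and efficient extraction of the encoded surface at arbitrary precision, as well as analytic derivation of differential geometric properties such as surface normals and curvature. These desirable properties are absent in their implicit counterparts, which makes explicit representations well suited to applications in computer vision, graphics, and robotics. However, state-of-the-art (SOTA) works are limited in the topology they can effectively describe, in the distortion they introduce when reconstructing complex surfaces, and in model efficiency. In this work, we present Minimal Neural Atlas, a novel atlas-based explicit neural surface representation. At its core is a fully learnable parametric domain, given by an implicit probabilistic occupancy field defined on an open square of the parametric space. In contrast, prior works generally predefine the parametric domain. The added flexibility enables charts to admit arbitrary topology and boundary. Our representation can therefore learn a minimal atlas of 3 charts with distortion-minimal parameterization for surfaces of arbitrary topology, including closed and open surfaces with arbitrary connected components. Our experiments support these hypotheses and show that our reconstructions are more accurate in terms of overall geometry, owing to the separation of concerns between topology and geometry.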

Translated by Google Translate

SimCLAD: A Simple Framework for Contrastive Learning of Acronym Disambiguation

Bin Li, Fei Xia, Yixuan Weng, Xiusheng Huang, Bin Sun, Shutao Li

Categories: Natural Language Processing | Artificial Intelligence

2021-11-29

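Acronym disambiguation means finding the correct meaning of an ambiguous acronym from a dictionary, and it is one of the key tasks in scientific document understanding (SDU@AAAI-22). Recently, many attempts have tried to solve this problem by fine-tuning pre-trained masked language models (MLMs) to obtain a better acronym representation. However, the meaning of an acronym varies across contexts, and the corresponding sentence representations follow an anisotropic distribution that occupies only a narrow subset of the entire representation space. Such representations from pre-trained MLMs are not ideal for disambiguating acronyms against a given dictionary. In this paper, we propose a Simple framework for Contrastive Learning of Acronym Disambiguation (SimCLAD) to better capture acronym meanings. Specifically, we design a novel continual contrastive pre-training method that enhances the generalization ability of the pre-trained model by learning an isotropic and discriminative distribution of acronym sentence representations. Results on acronym disambiguation in the English scientific domain show that the proposed method outperforms all other competitive state-of-the-art (SOTA) methods.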
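
The abstract does not state which contrastive objective is used. As an illustration only, the sketch below assumes an InfoNCE-style loss with in-batch negatives, a common choice for contrastive sentence representation learning; the function names and the temperature value are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch (assumed objective, not the paper's).

    positives[i] is the positive view of anchors[i]; every other
    positives[j] acts as an in-batch negative for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_denom = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        total += log_denom - logits[i]
    return total / len(anchors)
```

The loss is small when each anchor is closest to its own positive and large when it is closer to another sample's positive, which pushes representations of different meanings apart.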

PSG: Prompt-based Sequence Generation for Acronym Extraction

Bin Li, Fei Xia, Yixuan Weng, Xiusheng Huang, Bin Sun, Shutao Li

Categories: Natural Language Processing | Artificial Intelligence

2021-11-29

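Acronym extraction aims to find acronyms (i.e., short forms) and their meanings (i.e., long forms) in documents, which is important for scientific document understanding (SDU@AAAI-22) tasks. Previous works model this task as a paragraph-level sequence labeling problem. However, this formulation makes little effective use of external knowledge, especially when the datasets are in a low-resource setting. Recently, prompt-based methods built on large pre-trained language models have significantly improved performance on low-resource downstream tasks. In this paper, we propose a Prompt-based Sequence Generation (PSG) method for the acronym extraction task. Specifically, we design a template that prompts the model to generate the extracted acronym texts auto-regressively. A position extraction algorithm is then designed to locate the generated answers in the text. Results on acronym extraction for Vietnamese and Persian in a low-resource setting show that the proposed method outperforms all other competitive state-of-the-art (SOTA) methods.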
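
The abstract does not describe the position extraction algorithm itself. The sketch below is only one plausible reading, assuming the generated answers are exact substrings of the input; the function name `find_spans` is hypothetical.

```python
def find_spans(text, answers):
    """Map each generated answer to the (start, end) character offsets
    of its non-overlapping occurrences in text.

    Assumes answers are exact substrings; a real system would probably
    need fuzzy matching for generations that deviate from the source.
    """
    spans = {}
    for answer in answers:
        if not answer:
            continue  # an empty string would match everywhere
        found = []
        start = text.find(answer)
        while start != -1:
            end = start + len(answer)
            found.append((start, end))
            start = text.find(answer, end)
        spans[answer] = found
    return spans
```

Each returned pair can be used directly as a slice, so `text[start:end]` reproduces the generated answer.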

Vision-Based Environmental Perception for Autonomous Driving

Fei Liu, Zihao Lu, Xianke Lin

Categories: Computer Vision

2022-12-22

Visual perception plays an important role in autonomous driving. One of its primary tasks is object detection and identification. Because vision sensors capture rich color and texture information, they can identify various road information quickly and accurately. The commonly used technique is based on extracting and calculating various image features. Recently developed deep learning-based methods offer better reliability and processing speed and have a greater advantage in recognizing complex elements. For depth estimation, vision sensors are also used for ranging because of their small size and low cost. A monocular camera uses image data from a single viewpoint as input to estimate object depth. In contrast, stereo vision relies on parallax and on matching feature points across different views, and the application of deep learning further improves its accuracy. In addition, Simultaneous Localization and Mapping (SLAM) can build a model of the road environment, helping the vehicle perceive its surroundings and complete its tasks. In this paper, we introduce and compare various methods of object detection and identification, then explain the development of depth estimation and compare various methods based on monocular, stereo, and RGB-D sensors, next review and compare various SLAM methods, and finally summarize the current problems and present the future development trends of vision technologies.
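
The parallax principle behind stereo ranging can be made concrete with the standard relation for a rectified camera pair, Z = f * B / d. The sketch below assumes rectified images, a focal length in pixels, and a baseline in metres; the function name and the example values are illustrative, not from the paper.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a matched point in a rectified stereo pair.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two camera centres in metres
    disparity_px: horizontal shift of the matched point in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Illustrative values: 700 px focal length, 0.5 m baseline, 35 px disparity.
example_depth = depth_from_disparity(700.0, 0.5, 35.0)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for near ones.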

DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue

William Held, Christopher Hidey, Fei Liu, Eric Zhu, Rahul Goel, Diyi Yang, Rushin Shah

Categories: Natural Language Processing | Machine Learning

2022-12-15

Modern virtual assistants use internal semantic parsing engines to convert user utterances to actionable commands. However, prior work has demonstrated that semantic parsing is a difficult multilingual transfer task with low transfer efficiency compared to other tasks. In global markets such as India and Latin America, this is a critical issue because switching between languages is prevalent among bilingual users. In this work, we dramatically improve the zero-shot performance of a multilingual and code-switched semantic parsing system using two stages of multilingual alignment. First, we show that contrastive alignment pretraining improves both English performance and transfer efficiency. We then introduce a constrained optimization approach for hyperparameter-free adversarial alignment during finetuning. Our Doubly Aligned Multilingual Parser (DAMP) improves mBERT transfer performance by 3x, 6x, and 81x on the Spanglish, Hinglish, and Multilingual Task Oriented Parsing benchmarks respectively, and outperforms XLM-R and mT5-Large using 3.2x fewer parameters.

Concealed Object Detection for Passive Millimeter-Wave Security Imaging Based on Task-Aligned Detection Transformer

Cheng Guo, Fei Hu, Yan Hu

Categories: Computer Vision

2022-12-01

Passive millimeter-wave (PMMW) imaging is a technique with significant potential for human security screening. Several popular object detection networks have been applied to PMMW images. However, because PMMW images have low resolution and high noise, deep learning-based detection of hidden objects in them usually suffers from low accuracy and low classification confidence. To tackle these problems, this paper proposes a Task-Aligned Detection Transformer network, named PMMW-DETR. In the first stage, a Denoising Coarse-to-Fine Transformer (DCFT) backbone is designed to extract long- and short-range features at different scales. In the second stage, we propose the Query Selection module to introduce learned spatial features into the network as prior knowledge, which enhances the semantic perception capability of the network. In the third stage, aiming to improve classification performance, we employ a Task-Aligned Dual-Head block to decouple the classification and regression tasks. On our self-developed PMMW security screening dataset, experimental results, including comparisons with state-of-the-art (SOTA) methods and an ablation study, demonstrate that PMMW-DETR obtains higher accuracy and classification confidence than previous works and is robust to low-quality PMMW images.

MetaNetwork: A Task-agnostic Network Parameters Generation Framework for Improving Device Model Generalization

Zheqi Lv, Feng Wang, Kun Kuang, Yongwei Wang, Zhengyu Chen, Tao Shen, Hongxia Yang, Fei Wu

Categories: Artificial Intelligence | Computer Vision

2022-09-12

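Deploying machine learning models on mobile devices has gained increasing attention. To address the model generalization problem under the limited hardware resources of the device, the device model needs to be made lightweight through techniques such as model compression from the cloud model. However, the major obstacle to improving device model generalization is the distribution shift between the cloud data and the device data, because the data distribution on the device often changes over time (e.g., users may have different preferences in a recommendation system). Although real-time fine-tuning and distillation methods take this situation into account, they require on-device training, which is practically infeasible because of the low computational power of the device and the lack of real-time labeled samples on it. In this paper, we propose a novel task-agnostic framework, named MetaNetwork, for generating adaptive device model parameters from the cloud without on-device training. Specifically, our MetaNetwork is deployed on the cloud and consists of MetaGenerator and MetaStabilizer modules. The MetaGenerator is designed to learn a mapping function from samples to model parameters, and it can generate adaptive parameters and deliver them to the device based on samples uploaded from the device to the cloud. The MetaStabilizer aims to reduce the oscillation of the MetaGenerator, accelerate convergence, and improve model performance during both training and inference. We evaluate our method on two tasks with three datasets. Extensive experiments show that MetaNetwork achieves competitive performance across different modalities.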

Personalizing Intervened Network for Long-tailed Sequential User Behavior Modeling

Zheqi Lv, Feng Wang, Shengyu Zhang, Kun Kuang, Hongxia Yang, Fei Wu

Categories: Artificial Intelligence

2022-08-19

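In an era of information explosion, recommendation systems play an important role in people's daily lives by facilitating content exploration. It is well known that user activeness, i.e., the number of behaviors, tends to follow a long-tail distribution in which most users have low activeness. In practice, we observe that after joint training, tail users receive significantly lower-quality recommendations than head users. We further find that a model trained separately on tail users still achieves inferior results because of the limited data. Although long-tail distributions are ubiquitous in recommendation systems, improving recommendation performance for tail users remains a challenge in both research and industry. Directly applying existing methods for long-tail distributions risks hurting the experience of head users, which is hard to afford because a small portion of highly active head users contributes a considerable portion of platform revenue. In this paper, we propose a novel approach that significantly improves recommendation performance for tail users while achieving at least comparable performance for head users relative to the base model. The essence of this approach is a novel gradient aggregation technique that learns the common knowledge shared by all users into a backbone model, followed by separate plugin prediction networks that personalize for head users and tail users. For common knowledge learning, we leverage the backward adjustment from causality theory to deconfound the gradient estimation and thereby shield the backbone training from the confounder, i.e., user activeness. We conduct extensive experiments on two public recommendation benchmark datasets and a large-scale industrial dataset collected from the Alipay platform. Empirical studies validate the rationality and effectiveness of our approach.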

A Linear and Exact Algorithm for Whole-Body Collision Evaluation via Scale Optimization

Qianhao Wang, Zhepei Wang, Fei Gao

Categories: Robotics

2022-08-12

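Collision evaluation is of vital importance in various applications. However, existing methods are either cumbersome to compute or leave a gap from the actual value. In this paper, we propose a zero-gap whole-body collision evaluation that can be formulated as a low-dimensional linear program. This evaluation can be solved analytically in O(m) computational time, where m is the total number of linear inequalities in the linear program. Moreover, the proposed method obtains its gradient efficiently, so it can easily be used in optimization-based applications.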
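
The abstract does not spell out the linear program. As a much-simplified illustration of the scale idea, and not the paper's formulation, the sketch below assumes a convex robot body {x : a_i . x <= b_i} that strictly contains the origin and a single obstacle point. The smallest scale s at which the scaled body reaches the point is a one-variable linear program whose solution is a maximum over the m inequalities, so it costs O(m). All names are hypothetical.

```python
def min_contact_scale(normals, offsets, point):
    """Smallest s >= 0 such that point lies in {x : a_i . x <= s * b_i}.

    normals: list of m normal vectors a_i
    offsets: list of m offsets b_i, all > 0 (origin strictly inside)
    point:   obstacle point in the body frame

    s < 1 means the point is inside the unscaled body (collision);
    s > 1 means the body would have to grow to touch the point.
    """
    scale = 0.0
    for a, b in zip(normals, offsets):
        if b <= 0:
            raise ValueError("origin must lie strictly inside the body")
        # Constraint a . p <= s * b gives the lower bound s >= (a . p) / b.
        scale = max(scale, sum(ai * pi for ai, pi in zip(a, point)) / b)
    return scale


# Illustrative body: the square |x| <= 1, |y| <= 1.
square_normals = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
square_offsets = [1.0, 1.0, 1.0, 1.0]
```

For the square above, a point at (2.0, 0.5) yields a scale of 2.0 (collision-free), while a point at (0.5, 0.25) yields 0.5 (in collision).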

MILAN: Masked Image Pretraining on Language Assisted Representation

Zejiang Hou, Fei Sun, Yen-Kuang Chen, Yuan Xie, Sun-Yuan Kung

Categories: Computer Vision | Natural Language Processing | Machine Learning

2022-08-11

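Self-attention-based transformer models have dominated many computer vision tasks in the past few years. Their superb model quality depends heavily on excessively large labeled image datasets. To reduce the reliance on large labeled datasets, reconstruction-based masked autoencoders, which learn high-quality transferable representations from unlabeled images, are gaining popularity. For the same purpose, recent weakly supervised image pretraining methods explore language supervision from the text captions accompanying images. In this work, we propose masked image pretraining on language assisted representation, dubbed MILAN. Instead of predicting raw pixels or low-level features, our pretraining objective is to reconstruct image features that carry substantial semantic signals obtained using caption supervision. Moreover, to accommodate our reconstruction target, we propose a more efficient prompting decoder architecture and a semantic-aware mask sampling mechanism, which further advance the transfer performance of the pretrained model. Experimental results demonstrate that MILAN delivers higher accuracy than previous works. When the masked autoencoder is pretrained and finetuned on the ImageNet-1K dataset at an input resolution of 224x224, MILAN achieves a top-1 accuracy of 85.4% on ViT-B/16, surpassing the previous state of the art by 1%. On the downstream semantic segmentation task, MILAN achieves 52.7 mIoU using a ViT-B/16 backbone on the ADE20K dataset, outperforming previous masked pretraining results by 4 points.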
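
For readers unfamiliar with the segmentation metric quoted above, the sketch below shows how mean intersection-over-union (mIoU) is commonly computed from per-pixel class labels. It is a generic illustration, not the evaluation code used in the paper; the convention of skipping classes absent from both prediction and ground truth is an assumption, and benchmark toolkits may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equal-length label sequences.

    Returns a fraction in [0, 1]; reported scores such as 52.7 are
    this value multiplied by 100.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes that appear in neither sequence
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class appears in pred or target")
    return sum(ious) / len(ious)
```

Averaging per class rather than per pixel means that rare classes weigh as much as common ones in the final score.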
