Smart Paper Notes

Model Predictive Spherical Image-Based Visual Servoing On $SO(3)$ for Aggressive Aerial Tracking

Chao Qin , Qiuyu Yu , Hugh H. T. Liu

Category: Robotics

2022-12-19

This paper presents an image-based visual servo control (IBVS) method for a first-person-view (FPV) quadrotor to conduct aggressive aerial tracking. There are three major challenges to maneuvering an underactuated vehicle using IBVS: (i) finding a visual feature representation that is robust to large rotations and is suited to be an optimization variable; (ii) keeping the target visible without sacrificing the robot's agility; and (iii) compensating for the rotational effects in the detected features. We propose a complete design framework to address these problems. First, we employ a rotation on $SO(3)$ to represent a spherical image feature on $S^{2}$ to gain singularity-free and second-order differentiable properties. To ensure target visibility, we formulate the IBVS as a nonlinear model predictive control (NMPC) problem with three constraints taken into account: the robot's physical limits, target visibility, and time-to-collision (TTC). Furthermore, we propose a novel attitude-compensation scheme that allows the visibility constraint to be formulated in the actual image plane instead of a virtual fixed-orientation image plane. This guarantees that the visibility constraint is valid under large rotations. Extensive experimental results show that our method can track a fast-moving target stably and aggressively without the aid of a localization system.

Translated by Google Translate

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

Yifan Gong , Zheng Zhan , Pu Zhao , Yushu Wu , Chao Wu , Caiwen Ding , Weiwen Jiang , Minghai Qin , Yanzhi Wang

Category: Machine Learning | Artificial Intelligence | Computer Vision

2022-12-09

During the deployment of deep neural networks (DNNs) on edge devices, many research efforts are devoted to coping with limited hardware resources. However, little attention is paid to the influence of dynamic power management. As edge devices typically have only a limited battery energy budget (rather than the almost unlimited energy supply of servers or workstations), their dynamic power management often changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We first identify this problem and then propose All-in-One, a highly representative pruning framework that works with dynamic power management using DVFS. The framework can use only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By re-configuring the model to the corresponding pruning ratio for a specific execution frequency (and voltage), we are able to achieve stable inference speed, i.e., keeping the difference in speed performance under various execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces their variance of inference latency for various frequencies, with the minimal memory consumption of only one model and one soft mask.
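
The re-configuration step described above can be illustrated with a toy calculation. This is a minimal sketch, not the authors' method: it assumes latency scales as the remaining (unpruned) work divided by the clock frequency, which the abstract does not state, and the function name, the supported ratios, and the cycle count are all hypothetical.

```python
def pick_pruning_ratio(freq_mhz, ratios, dense_mcycles, target_ms):
    """Choose the supported ratio whose predicted latency is nearest target.

    Assumed toy latency model: remaining work divided by clock frequency.
    """
    def latency_ms(ratio):
        # Mcycles / MHz gives seconds; scale to milliseconds.
        return (1.0 - ratio) * dense_mcycles / freq_mhz * 1000.0

    return min(ratios, key=lambda r: abs(latency_ms(r) - target_ms))


# Hypothetical device: the dense model costs 60 Mcycles, target is 50 ms.
supported_ratios = [0.0, 0.25, 0.5, 0.75]
schedule = {
    freq: pick_pruning_ratio(freq, supported_ratios, 60.0, 50.0)
    for freq in (1200, 600, 400)
}
```

Under this assumed model, lower frequencies map to higher pruning ratios, so the predicted latency stays near the target as DVFS changes the clock.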


FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

Yulei Qin , Xingyu Chen , Chao Chen , Yunhang Shen , Bo Ren , Yun Gu , Jie Yang , Chunhua Shen

Category: Computer Vision

2022-12-01

Recently, webly supervised learning (WSL) has been studied to leverage numerous and accessible data from the Internet. Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between the web domain and the real-world domain. However, only by tackling this performance gap can we fully exploit the practical value of web datasets. To this end, we propose a Few-shot guided Prototypical (FoPro) representation learning method, which needs only a few labeled real-world examples and can significantly improve performance in the real-world domain. Specifically, we initialize each class center with few-shot real-world data as the "realistic" prototype. Then, the intra-class distance between web instances and "realistic" prototypes is narrowed by contrastive learning. Finally, we measure image-prototype distance with a learnable metric. Prototypes are polished by adjacent high-quality web images and are used to remove distant out-of-distribution samples. In experiments, FoPro is trained on web datasets under the guidance of a few real-world examples and evaluated on real-world datasets. Our method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets. Compared with existing WSL methods under the same few-shot settings, FoPro still excels in real-world generalization. Code is available at https://github.com/yuleiqin/fopro.
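
The prototype initialization and out-of-distribution filtering described above can be sketched in a few lines. This is a simplified illustration only: it swaps the paper's learnable metric and contrastive training for plain Euclidean distance and a fixed threshold, and the function names, embeddings, and threshold are hypothetical. The authors' actual implementation is in the repository linked above.

```python
import math


def class_prototypes(few_shot):
    """Average the few-shot embeddings of each class into one prototype."""
    protos = {}
    for label, vectors in few_shot.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return protos


def filter_web_samples(samples, protos, max_dist):
    """Keep web samples whose embedding lies near their label's prototype."""
    kept = []
    for label, vec in samples:
        # Assumption: Euclidean distance stands in for the learnable metric.
        if math.dist(vec, protos[label]) <= max_dist:
            kept.append((label, vec))
    return kept


# Hypothetical 2-D embeddings: two real-world shots per class.
few_shot = {"cat": [[1.0, 0.0], [0.8, 0.2]], "dog": [[0.0, 1.0], [0.2, 0.8]]}
web = [("cat", [0.9, 0.2]), ("cat", [0.1, 0.9]), ("dog", [0.0, 0.9])]

protos = class_prototypes(few_shot)
clean = filter_web_samples(web, protos, max_dist=0.5)
```

Here the second web sample is labeled "cat" but sits next to the "dog" prototype, so the distance check discards it as a likely noisy label.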


Modeling Perceptual Loudness of Piano Tone: Theory and Applications

Yang Qu , Yutian Qin , Lecheng Chao , Hangkai Qian , Ziyu Wang , Gus Xia

Category: Machine Learning

2022-09-21

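The relationship between perceptual loudness and the physical attributes of sound is an important subject in both computer music and psychoacoustics. Early studies of the "equal-loudness contour" date back to the 1920s, and the measured loudness with respect to intensity and frequency has been revised many times since then. However, most studies focus only on synthesized sound, and theories for natural tones have rarely been justified. To this end, we investigate both the theory and the applications of natural-tone loudness perception by modeling piano tone. The theory part contains: 1) an accurate measurement of the piano-tone equal-loudness contour across pitches, and 2) a machine-learning model capable of inferring loudness purely from spectral features, trained on human-subject measurements. As for the application, we apply our theory to piano control transfer, in which we adjust the MIDI velocities on two different player pianos (in different acoustic environments) to achieve the same perceptual effect. Experiments show that both our theoretical loudness modeling and the corresponding performance control transfer algorithm significantly outperform their baselines.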


UniFormer: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View

Zequn Qin , Jingyu Chen , Chao Chen , Xiaozhi Chen , Xi Li

Category: Computer Vision

2022-07-18

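Bird's-eye-view (BEV) representation is a new perception formulation for autonomous driving that is based on spatial fusion. Temporal fusion has also been introduced into BEV representation with great success. In this work, we propose a new method that unifies spatial and temporal fusion and merges them into a single mathematical formulation. The unified fusion not only provides a new perspective on BEV fusion but also brings new capabilities. With the proposed unified spatial-temporal fusion, our method can support long-range fusion, which is hard to achieve in conventional BEV methods. Moreover, the BEV fusion in our work is temporal-adaptive, and the weights of temporal fusion are learnable. In contrast, conventional methods mainly use fixed and equal weights for temporal fusion. In addition, the proposed unified fusion avoids the information loss of conventional BEV fusion methods and makes full use of the features. Extensive experiments and ablation studies on the NuScenes dataset show the effectiveness of the proposed method, which achieves state-of-the-art performance on the map segmentation task.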


Contextual Information-Directed Sampling

Botao Hao , Tor Lattimore , Chao Qin

Category: Machine Learning | Machine Learning (Statistics)

2022-05-22

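Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm. However, it is still unclear what the right form of the information ratio to optimize is when contextual information is available. We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits. We provably demonstrate the advantage of contextual IDS over conditional IDS and emphasize the importance of considering the context distribution. The main message is that an intelligent agent should invest more in actions that are beneficial for future unseen contexts, whereas conditional IDS can be myopic. We further propose a computationally efficient version of contextual IDS based on Actor-Critic and evaluate it empirically on a neural network contextual bandit.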


FlowFormer: A Transformer Architecture for Optical Flow

Zhaoyang Huang , Xiaoyu Shi , Chao Zhang , Qiang Wang , Ka Chun Cheung , Hongwei Qin , Jifeng Dai , Hongsheng Li

Category: Computer Vision

2022-03-30

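We introduce the optical flow transformer, dubbed FlowFormer, a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves 1.144 and 2.183 average end-point error (AEPE) on the clean and final passes, a 17.6% and 11.6% error reduction from the best published results (1.388 and 2.47). FlowFormer also achieves strong generalization performance. Without being trained on Sintel, FlowFormer achieves 0.95 AEPE on the clean pass of the Sintel training set, outperforming the best published result (1.29) by 26.9%.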
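
As a reading aid for the metric quoted above, the following sketch assumes the conventional definition of average end-point error (the mean Euclidean distance between predicted and ground-truth flow vectors), which the abstract itself does not spell out. It also reproduces the relative-reduction arithmetic for the two benchmark figures. The function names and the toy flow field are hypothetical.

```python
import math


def aepe(pred, gt):
    """Average end-point error: mean L2 distance between flow vectors.

    pred and gt are equal-length lists of (u, v) flow vectors.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("flow fields must be non-empty and the same size")
    total = sum(
        math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)
    )
    return total / len(pred)


def relative_reduction(baseline, new):
    """Fractional error reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# Toy flow field (hypothetical values, not taken from the paper).
pred = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
gt = [(1.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
toy_aepe = aepe(pred, gt)  # per-pixel errors are 0, 2 and 5

# Sintel clean and final pass figures quoted in the abstract.
clean_gain = relative_reduction(1.388, 1.144)
final_gain = relative_reduction(2.47, 2.183)
```

Rounded to one decimal place, `clean_gain` and `final_gain` correspond to the 17.6% and 11.6% reductions stated in the abstract.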


Adaptivity and Confounding in Multi-Armed Bandit Experiments

Chao Qin , Daniel Russo

Category: Machine Learning | Machine Learning (Statistics)

2022-02-18

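We explore a new model of bandit experiments in which a potentially nonstationary sequence of contexts influences the arms' performance. Context-unaware algorithms risk confounding, while those that perform correct inference face information delays. Our main insight is that an algorithm we call deconfounded Thompson sampling strikes a delicate balance between adaptivity and robustness. Its adaptivity leads to optimal efficiency in easy stationary instances, yet it displays surprising resilience in hard nonstationary instances that cause other adaptive algorithms to fail.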


Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances

Masahiro Kato , Kaito Ariu , Masaaki Imaizumi , Masatoshi Uehara , Masahiro Nomura , Chao Qin

Category: Machine Learning (Statistics) | Machine Learning

2022-01-12

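We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances. The tightest lower bound on the complexity, and an algorithm whose performance guarantee matches that lower bound, have long been open problems when the variances are unknown and the algorithm is agnostic to the optimal proportion of arm draws. In this paper, we propose a strategy comprising a sampling rule with randomized sampling (RS) that follows the estimated target allocation probabilities of arm draws, and a recommendation rule using the augmented inverse probability weighting (AIPW) estimator, which is often used in the causal inference literature. We refer to our strategy as the RS-AIPW strategy. In the theoretical analysis, we first derive a large deviation principle for martingales, which can be used when the second moment converges in mean, and apply it to our proposed strategy. We then show that the proposed strategy is asymptotically optimal in the sense that the probability of misidentification achieves the lower bound of Kaufmann et al. (2016) when the sample size becomes infinitely large and the gap between the two arms goes to zero.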
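
The general shape of a randomized-sampling rule paired with an AIPW estimate can be sketched as a small simulation. This is an illustrative toy, not the authors' algorithm: it assumes the target allocation is proportional to each arm's estimated standard deviation and clips it to [0.1, 0.9], neither of which is stated in the abstract, and the function name, defaults, and parameters are hypothetical.

```python
import math
import random


def rs_aipw(means, stds, horizon, seed=0):
    """Toy two-armed run: random sampling plus an AIPW estimate per arm.

    Assumption: the target allocation is proportional to each arm's
    estimated standard deviation, clipped to [0.1, 0.9].
    """
    rng = random.Random(seed)
    count = [0, 0]
    total = [0.0, 0.0]     # running sum of rewards per arm
    total_sq = [0.0, 0.0]  # running sum of squared rewards per arm
    aipw_sum = [0.0, 0.0]

    for _ in range(horizon):
        # Plug-in estimates use only data from earlier rounds.
        mu_hat = [total[a] / count[a] if count[a] else 0.0 for a in (0, 1)]
        sd_hat = []
        for a in (0, 1):
            if count[a] >= 2:
                var = total_sq[a] / count[a] - mu_hat[a] ** 2
                sd_hat.append(math.sqrt(max(var, 1e-12)))
            else:
                sd_hat.append(1.0)  # uninformed default before two pulls
        p0 = min(0.9, max(0.1, sd_hat[0] / (sd_hat[0] + sd_hat[1])))
        probs = [p0, 1.0 - p0]

        arm = 0 if rng.random() < p0 else 1
        reward = rng.gauss(means[arm], stds[arm])

        # AIPW score: inverse-probability-weighted residual plus plug-in mean.
        for a in (0, 1):
            residual = (reward - mu_hat[a]) / probs[a] if a == arm else 0.0
            aipw_sum[a] += residual + mu_hat[a]

        count[arm] += 1
        total[arm] += reward
        total_sq[arm] += reward * reward

    estimates = [s / horizon for s in aipw_sum]
    best = 0 if estimates[0] >= estimates[1] else 1
    return estimates, best


estimates, best = rs_aipw(means=(1.0, 0.5), stds=(1.0, 2.0), horizon=20000)
```

Because the plug-in mean and the sampling probability in each round depend only on earlier rounds, the weighted residual has zero conditional mean, which is the martingale structure the abstract's analysis relies on.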


Optimal Simple Regret in Bayesian Best Arm Identification

Junpei Komiyama , Kaito Ariu , Masahiro Kato , Chao Qin

Category: Machine Learning | Machine Learning (Statistics)

2021-11-18

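We consider Bayesian best arm identification in the multi-armed bandit problem. Assuming certain continuity conditions on the prior, we characterize the rate of the Bayesian simple regret. Unlike Bayesian regret minimization (Lai, 1987), the leading factor in the Bayesian simple regret comes from the region where the gap between the optimal and suboptimal arms is smaller than $\sqrt{\frac{\log T}{T}}$. We propose a simple and easy-to-compute algorithm whose leading factor matches the lower bound up to a constant factor; simulation results support our theoretical findings.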
