Binaural rendering of ambisonic signals is of broad interest in virtual reality and immersive media. Conventional methods often require manually measured Head-Related Transfer Functions (HRTFs). To avoid this requirement, we collect a paired ambisonic-binaural dataset and propose an end-to-end deep learning framework. Experimental results show that neural networks outperform the conventional method on objective metrics and achieve comparable subjective scores. To validate the proposed framework, we experimentally explore different settings of input features, model structures, output features, and loss functions. Our proposed system achieves an SDR of 7.32 and MOS values of 3.83, 3.58, 3.87, and 3.58 in the quality, timbre, localization, and immersion dimensions.
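The SDR figure above is a log-ratio of reference energy to residual energy. As a minimal NumPy sketch (assuming the plain, non-scale-invariant definition, which the abstract does not specify):

```python
import numpy as np

def sdr_db(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    signal_energy = float(np.sum(reference ** 2))
    error_energy = float(np.sum((reference - estimate) ** 2))
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))

# A 10% amplitude error leaves 1% of the energy as distortion -> 20 dB.
ref = np.ones(1000)
est = 0.9 * np.ones(1000)
print(round(sdr_db(ref, est), 1))  # 20.0
```

Higher is better; a perfect estimate drives the residual energy toward zero and the SDR toward infinity (capped here by `eps`).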
Despite remarkable success in sentiment analysis, existing neural models struggle with implicit sentiment analysis. This may be because they tend to latch onto spurious correlations ("shortcuts", e.g., focusing only on explicit sentiment words), which undermines the effectiveness and robustness of the learned model. In this work, we propose a causal intervention model for Implicit Sentiment Analysis using Instrumental Variables (ISAIV). We first review sentiment analysis from a causal perspective and analyze the confounders present in this task. Then, we introduce an instrumental variable to eliminate the confounding causal effects, thus extracting the pure causal effect between sentence and sentiment. We compare the proposed ISAIV model with several strong baselines on both general implicit sentiment analysis and aspect-based implicit sentiment analysis tasks. The results demonstrate the clear advantage of our model and the efficacy of implicit sentiment reasoning.
Low-light environments pose a formidable challenge to robust unmanned aerial vehicle (UAV) tracking, even with state-of-the-art (SOTA) trackers, since potential image features are difficult to extract under adverse light conditions. Besides, due to low visibility, accurate online selection of the target by a human monitor at the ground control station is also extremely hard when initializing UAV tracking. To solve these problems, this work proposes a novel enhancer, HighlightNet, to light up potential objects for both human operators and UAV trackers. By employing Transformers, HighlightNet can adjust its enhancement parameters according to global features and is thus adaptive to illumination variation. Pixel-level range masks are introduced so that HighlightNet focuses enhancement on the tracked object and on regions without light sources. Moreover, a soft-truncation mechanism is built to prevent background noise from being mistaken for crucial features. Evaluations on image enhancement benchmarks demonstrate HighlightNet's advantages in facilitating human perception. Experiments on the public UAVDark135 benchmark show that HighlightNet is more suitable for UAV tracking tasks than other SOTA low-light enhancers. In addition, real-world tests on a typical UAV platform verify the practicability and efficiency of HighlightNet in nighttime aerial tracking applications. The code and demo videos are available at https://github.com/vision4robotics/highlightnet.
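The soft-truncation mechanism is not specified in detail in the abstract; as a purely hypothetical sketch of the general idea (smoothly saturating extreme values instead of hard-clipping them, so isolated background noise cannot dominate the enhanced output), one might write:

```python
import numpy as np

def soft_truncate(x: np.ndarray, low: float = 0.0, high: float = 1.0) -> np.ndarray:
    """Smoothly squash values into (low, high) with a tanh, so outliers
    saturate gradually rather than being hard-clipped. This is an
    illustrative sketch, not the HighlightNet formulation."""
    mid = 0.5 * (low + high)
    half_span = 0.5 * (high - low)
    return mid + half_span * np.tanh((x - mid) / half_span)

# Values near mid-range pass through almost unchanged; extremes saturate.
x = np.array([-5.0, 0.2, 0.5, 0.8, 5.0])
y = soft_truncate(x)
```

Unlike `np.clip`, the mapping stays differentiable everywhere, which matters when the truncation sits inside a trained network.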
Transformer-based visual object tracking has been widely adopted. However, the Transformer structure lacks sufficient inductive bias. In addition, focusing only on encoding global features impairs the modeling of local details, which limits tracking capability in aerial robotics. Specifically, with a local-modeling-to-global-search mechanism, the proposed tracker replaces the global encoder with a novel local-recognition encoder. In this encoder, local-recognition attention and a local element correction network are carefully designed to reduce interference from globally redundant information and to increase local inductive bias. Meanwhile, the latter can accurately model local object details under aerial views through a detail network. The proposed method achieves competitive accuracy and robustness on several authoritative aerial benchmarks comprising 316 sequences in total. The practicability and efficiency of the proposed tracker have been validated by real-world tests.
Using only a well-trained classifier, model-inversion (MI) attacks can recover the data used to train the classifier, leading to privacy leakage of the training data. To defend against MI attacks, previous work utilizes a unilateral dependency optimization strategy, i.e., minimizing the dependency between inputs (i.e., features) and outputs (i.e., labels) while training the classifier. However, such a minimization process conflicts with minimizing the supervised loss, which aims to maximize the dependency between inputs and outputs, resulting in an explicit trade-off between model robustness against MI attacks and model utility on the classification task. In this paper, we aim to minimize the dependency between latent representations and inputs while maximizing the dependency between latent representations and outputs, named the bilateral dependency optimization (BiDO) strategy. In particular, we use the dependency constraints as a universally applicable regularizer in addition to the commonly used losses for deep neural networks (e.g., cross-entropy), which can be instantiated with appropriate dependency criteria according to different tasks. To verify the efficacy of our strategy, we propose two implementations of BiDO using two different dependency measures: BiDO with constrained covariance (BiDO-COCO) and BiDO with the Hilbert-Schmidt Independence Criterion (BiDO-HSIC). Experiments show that BiDO paves a promising way for defending against MI attacks.
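The HSIC measure behind BiDO-HSIC can be estimated empirically from kernel Gram matrices. A minimal NumPy sketch of the standard biased estimator HSIC = tr(KHLH)/(n-1)^2, with Gaussian kernels (the kernel choice and bandwidth here are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def gaussian_gram(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gram matrix of an RBF kernel over the rows of x."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, where H centers the
    Gram matrices. Larger values indicate stronger statistical dependence."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    k, l = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2
```

Used as a regularizer, a term like this is subtracted for (representation, label) pairs and added for (representation, input) pairs, matching the bilateral objective described above.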
Unmanned aerial vehicle (UAV)-based visual object tracking has enabled a wide range of applications and has attracted increasing attention in the field of intelligent transportation systems due to its versatility and effectiveness. As an emerging force in the revolutionary trend of deep learning, Siamese networks shine in UAV-based object tracking with their promising balance of accuracy, robustness, and speed. Thanks to the development of embedded processors and the step-by-step optimization of deep neural networks, Siamese trackers have received extensive research attention and achieved a preliminary combination with UAVs. However, due to the limited onboard computing resources of UAVs and complex real-world conditions, aerial tracking with Siamese networks still faces severe obstacles in many aspects. To further explore the deployment of Siamese networks in UAV-based tracking, this work presents a comprehensive review of leading-edge Siamese trackers, together with an exhaustive UAV-oriented analysis evaluated on a typical UAV onboard processor. Then, onboard tests are conducted to validate the feasibility and efficacy of representative Siamese trackers in real-world UAV deployment. Furthermore, to better promote the development of the tracking community, this work analyzes the limitations of existing Siamese trackers and conducts additional experiments represented by low-illumination evaluation. Finally, the prospects of Siamese tracking for UAV-based intelligent transportation systems are discussed in depth. The unified framework of leading Siamese trackers, i.e., the code library, and the results of their experimental evaluations are available at https://github.com/vision4robotics/siamesetracking4uav.
Autonomous driving perceives its surroundings for decision making, which is one of the most complex scenarios in visual perception. The success of paradigm innovation in solving the 2D object detection task inspires us to seek an elegant, feasible, and scalable paradigm for fundamentally pushing the performance boundary in this area. To this end, we contribute the BEVDet paradigm in this paper. BEVDet performs 3D object detection in bird's-eye view (BEV), where most target values are defined and route planning can be handily performed. We merely reuse existing modules to build its framework but substantially develop its performance by constructing an exclusive data augmentation strategy and upgrading the non-maximum suppression strategy. In the experiments, BEVDet offers an excellent trade-off between accuracy and time efficiency. As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set. It is comparable with FCOS3D but requires just 11% of the computational budget at 215.3 GFLOPS and runs 9.2 times faster at 15.6 FPS. Another high-precision version, dubbed BEVDet-Base, scores 39.3% mAP and 47.2% NDS, significantly exceeding all published results. With comparable inference speed, it surpasses FCOS3D by large margins of +9.8% mAP and +10.0% NDS. The source code is publicly available for further research at https://github.com/huangjunjie2017/bevdet.
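The NDS figures quoted above combine mAP with the benchmark's five true-positive error metrics. As a sketch of the nuScenes Detection Score aggregation (written from the public metric definition; the variable names are illustrative):

```python
def nds(map_score: float, tp_errors: list) -> float:
    """nuScenes Detection Score:
    NDS = (5 * mAP + sum over the 5 TP metrics of (1 - min(1, err))) / 10,
    where tp_errors holds the mean translation, scale, orientation,
    velocity, and attribute errors (ATE, ASE, AOE, AVE, AAE)."""
    assert len(tp_errors) == 5
    return (5.0 * map_score + sum(1.0 - min(1.0, e) for e in tp_errors)) / 10.0

# Half of the score weight comes from mAP, half from the TP error terms.
print(nds(0.5, [0.0, 0.0, 0.0, 0.0, 0.0]))  # 0.75
```

This explains why a detector's NDS can exceed its mAP, as in the BEVDet-Tiny numbers above, when its localization-quality errors are low.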
As a crucial robotic perception capability, visual tracking has been intensively studied recently. In the real-world scenarios, the onboard processing time of the image streams inevitably leads to a discrepancy between the tracking results and the real-world states. However, existing visual tracking benchmarks commonly run the trackers offline and ignore such latency in the evaluation. In this work, we aim to deal with a more realistic problem of latency-aware tracking. The state-of-the-art trackers are evaluated in the aerial scenarios with new metrics jointly assessing the tracking accuracy and efficiency. Moreover, a new predictive visual tracking baseline is developed to compensate for the latency stemming from the onboard computation. Our latency-aware benchmark can provide a more realistic evaluation of the trackers for the robotic applications. Besides, exhaustive experiments have proven the effectiveness of the proposed predictive visual tracking baseline approach.
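The predictive baseline described above compensates for the delay between frame capture and tracker output. As a hedged sketch of the underlying idea (the paper's actual predictor is not specified here), a constant-velocity extrapolation of the bounding box over a known latency might look like:

```python
def predict_box(prev_box: tuple, curr_box: tuple, latency_frames: int = 1) -> tuple:
    """Extrapolate a (cx, cy, w, h) box forward by `latency_frames`,
    assuming constant velocity between the last two tracker outputs,
    to offset onboard processing delay. Illustrative sketch only."""
    return tuple(c + latency_frames * (c - p) for p, c in zip(prev_box, curr_box))

# Target moved +2 px/frame in x and +1 px/frame in y; predict 2 frames ahead.
print(predict_box((0.0, 0.0, 10.0, 10.0), (2.0, 1.0, 10.0, 10.0), latency_frames=2))
# (6.0, 3.0, 10.0, 10.0)
```

Latency-aware evaluation then scores the predicted state against the frame that is current when the result becomes available, rather than the frame the tracker actually processed.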
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Accurate determination of a small molecule candidate (ligand) binding pose in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the pocket flexibility of protein, while the more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening, requiring neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software, and more importantly achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. 3T structure transformation is decoupled from the system physics, making future usage in other computational scientific domains possible.