Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods -- including $k$-means and hierarchical agglomerative clustering -- underperform supervised, deep, inductive methods. While the reported improvements are indeed impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative or well-separated by class (Recall@1 above 90% and often nearing ceiling), and the experimental methodology seemingly favors the deep methods. We conduct a large-scale empirical study of 17 clustering methods across three datasets and obtain several robust findings. Notably, deep methods are surprisingly fragile for embeddings with more uncertainty, where they match or even perform worse than shallow, heuristic-based methods. When embeddings are highly discriminative, deep methods do outperform the baselines, consistent with past results, but the margin between methods is much smaller than previously reported. We believe our benchmarks broaden the scope of supervised clustering methods beyond the face domain and can serve as a foundation on which these methods could be improved. To enable reproducibility, we include all necessary details in the appendices, and plan to release the code.
translated by Google Translate
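Deep neural networks perform well on prediction and classification tasks in the canonical setting where data streams are i.i.d., labeled data is abundant, and class labels are balanced. Challenges emerge with distribution shifts, including non-stationary or imbalanced data streams. One powerful approach that has addressed this challenge is self-supervised pretraining of large encoders on large volumes of unlabeled data, followed by task-specific tuning. Given a new task, updating the weights of these encoders is challenging because a large number of weights must be fine-tuned, and as a result the encoders forget information about previous tasks. In the present work, we propose a model architecture to address this issue, building on a discrete bottleneck that contains pairs of separate and learnable (key, value) codes. In this setup, we follow an encode, process the representation via a discrete bottleneck, and decode paradigm: the input is fed to the pretrained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and reuse a limited number of these (key, value) pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the proposed model to minimize the effect of distribution shifts and show that such a discrete bottleneck with (key, value) pairs reduces the complexity of the hypothesis class. We empirically verify the benefits of the proposed method under challenging distribution-shift scenarios across various benchmark datasets and show that the proposed model reduces the common vulnerability to non-i.i.d. and non-stationary training distributions compared to various other baselines.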
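The nearest-key selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `nearest_key_lookup`, the toy codebook, and the use of Euclidean distance are all assumptions made for illustration, and the encoder output is replaced by a hand-written vector.

```python
import math

def nearest_key_lookup(z, keys, values):
    """Return (index, value) for the key closest to z in Euclidean distance.

    Hypothetical helper: the abstract only says the encoder output selects
    the nearest keys; the distance metric here is an assumption.
    """
    distances = [math.dist(z, k) for k in keys]
    idx = min(range(len(keys)), key=distances.__getitem__)
    return idx, values[idx]

# Hypothetical toy codebook: three 2-D keys, each paired with a value code.
keys = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
values = [(10.0,), (20.0,), (30.0,)]

# Stand-in for the output of a frozen, pretrained encoder.
z = (0.9, 1.2)
idx, value = nearest_key_lookup(z, keys, values)
# `value` is what would be passed on to the decoder in this sketch.
```

Because only the selected pair is read for a given input, an update to one value code would leave the outputs for inputs that map to other keys unchanged, which is one way to read the "localized" updates mentioned in the abstract.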
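The visual world can be parsimoniously characterized in terms of distinct entities with sparse interactions. Discovering this compositional structure in dynamic visual scenes has proven challenging for end-to-end computer vision approaches unless explicit instance-level supervision is provided. Slot-based models leveraging motion cues have recently shown great promise in learning to represent, segment, and track objects without direct supervision, but they still fail to scale to complex real-world multi-object videos. To bridge this gap, we take inspiration from human development and hypothesize that information about scene geometry in the form of depth signals can facilitate object-centric learning. We introduce SAVi++, an object-centric video model that is trained to predict depth signals from a slot-based video representation. By further leveraging best practices for model scaling, we are able to train SAVi++ to segment complex dynamic scenes recorded with moving cameras, containing both static and moving objects of diverse appearance on naturalistic backgrounds, without the need for segmentation supervision. Finally, we demonstrate that by using sparse depth signals obtained from LiDAR, SAVi++ is able to learn emergent object segmentation and tracking from videos in the real-world Waymo Open Dataset.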
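Transfer learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method, fine-tuning all parameters of the source model to the target domain, possibly because fine-tuning allows the model to leverage useful information from intermediate layers that is otherwise discarded by the later pretrained layers. We explore the hypothesis that these intermediate layers might be directly exploited. We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target domain. In evaluations on VTAB-1k, Head2Toe matches the performance obtained with fine-tuning on average while reducing training and storage costs a hundredfold or more; critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning.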
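Optimal decision making requires that classifiers produce uncertainty estimates consistent with their empirical accuracy. However, deep neural networks are often under- or over-confident in their predictions. Consequently, methods have been developed to improve the calibration of their predictive uncertainty, both during training and post hoc. In this work, we propose differentiable losses to improve calibration, based on a soft (continuous) version of the binning operation underlying popular calibration-error estimators. When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets with less than a 1% decrease in accuracy. For instance, we observe an 82% reduction in ECE (70% relative to the post-hoc rescaled ECE) in exchange for a 0.7% relative decrease in accuracy compared to the cross-entropy baseline on CIFAR-100. When incorporated post-training, the soft-binning-based calibration error objective improves upon temperature scaling, a popular recalibration method. Overall, experiments across losses and datasets demonstrate that using calibration-sensitive procedures yields better uncertainty estimates under dataset shift than the standard practice of using a cross-entropy loss and post-hoc recalibration methods.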
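For readers unfamiliar with the metric, the sketch below computes the standard hard-binned expected calibration error (ECE) that the abstract refers to. It is not the soft, differentiable loss the authors propose: the function name, the equal-width binning, and the toy inputs are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Hard-binned ECE: weighted mean of |accuracy - confidence| over bins.

    confidences: predicted top-class probabilities in [0, 1].
    correct:     1 if the corresponding prediction was right, else 0.
    Equal-width bins are an assumption of this sketch.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    count = [0] * n_bins
    for c, y in zip(confidences, correct):
        # Hard assignment of each prediction to exactly one bin.
        b = min(int(c * n_bins), n_bins - 1)
        conf_sum[b] += c
        acc_sum[b] += y
        count[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        avg_conf = conf_sum[b] / count[b]
        avg_acc = acc_sum[b] / count[b]
        ece += (count[b] / n) * abs(avg_acc - avg_conf)
    return ece

# Toy example: four predictions, three of them correct.
ece = expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1], n_bins=2)
```

The hard bin assignment in the loop is the non-differentiable step; a soft version, as described in the abstract, would replace it with a continuous membership over bins so that the quantity can be used as a training loss.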
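Recent work has argued that classification losses based on softmax cross-entropy are superior not only for fixed-set classification tasks but also for open-set tasks, where they outperform losses developed specifically for such tasks, including few-shot learning and retrieval. Softmax classifiers have been studied using different embedding geometries (Euclidean, hyperbolic, and spherical), and claims have been made about the superiority of one or another, but these geometries have not been systematically compared with careful controls. We conduct an empirical investigation of embedding geometry on softmax losses for a variety of fixed-set classification and image retrieval tasks. An interesting property observed for the spherical losses leads us to propose a probabilistic classifier based on the von Mises-Fisher distribution, and we show that it is competitive with state-of-the-art methods while producing improved out-of-the-box calibration. We provide guidance regarding the trade-offs between losses and how to choose among them.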
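Although deep feedforward neural networks share some characteristics with the primate visual system, a key distinction is their dynamics. Deep nets typically operate in serial stages, wherein each layer completes its computation before processing begins in subsequent layers. In contrast, biological systems have cascaded dynamics: information propagates from neurons at all layers in parallel, but transmission occurs gradually over time, leading to speed-accuracy trade-offs even in feedforward architectures. We explore the consequences of biologically inspired parallel hardware by constructing cascaded ResNets in which each residual block has propagation delays but all blocks update in parallel in a stateful manner. Because information transmitted through skip connections avoids delays, the functional depth of the architecture increases over time, yielding anytime predictions that improve with internal processing time. We introduce a temporal-difference training loss that achieves a strictly superior speed-accuracy profile over standard losses and enables the cascaded architecture to outperform state-of-the-art anytime-prediction methods. The cascaded architecture has intriguing properties, including: it classifies typical instances more rapidly than atypical instances; it is more robust to both persistent and transient noise than a conventional ResNet; and its time-varying output trace provides a signal that can be exploited to improve information processing and inference.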
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave, time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotor craft has a three-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and reconfigurable communications.