Vision Transformers (ViTs) have gained significant popularity in recent years and have proliferated into many applications. However, it is not well explored how varied their behavior is under different learning paradigms. We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance. We also discover ViT behaviors that are consistent across supervision, including the emergence of Offset Local Attention Heads. These are self-attention heads that attend to a token adjacent to the current token with a fixed directional offset, a phenomenon that to the best of our knowledge has not been highlighted in any prior work. Our analysis shows that ViTs are highly flexible and learn to process local and global information in different orders depending on their training method. We find that contrastive self-supervised methods learn features that are competitive with explicitly supervised features, and they can even be superior for part-level tasks. We also find that the representations of reconstruction-based models show non-trivial similarity to contrastive self-supervised models. Finally, we show how the "best" layer for a given task varies by both supervision method and task, further demonstrating the differing order of information processing in ViTs.
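An Offset Local Attention Head, as defined above, attends to an adjacent token at a fixed directional offset. The NumPy sketch below shows one way to probe a single head for that pattern. It assumes the head's attention covers an H×W grid of patch tokens in row-major order, with any CLS token already removed. The function names and the offset-averaging heuristic are illustrative assumptions, not the authors' analysis procedure.

```python
import numpy as np


def offset_attention_profile(attn, grid_h, grid_w, radius=1):
    """Mean attention that each query patch gives to the patch at offset (dy, dx).

    attn: (N, N) attention matrix for one head, N = grid_h * grid_w patch tokens
    in row-major order (assumption: any CLS token has already been removed).
    Returns a dict mapping (dy, dx) -> mean attention weight over valid queries.
    Keep radius smaller than the grid so every offset has at least one valid query.
    """
    n = grid_h * grid_w
    assert attn.shape == (n, n)
    profile = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            weights = []
            for r in range(grid_h):
                for c in range(grid_w):
                    rr, cc = r + dy, c + dx
                    # Skip queries whose offset neighbour falls outside the grid.
                    if 0 <= rr < grid_h and 0 <= cc < grid_w:
                        weights.append(attn[r * grid_w + c, rr * grid_w + cc])
            profile[(dy, dx)] = float(np.mean(weights))
    return profile


def dominant_offset(attn, grid_h, grid_w, radius=1):
    """Return the offset that receives the most average attention, and that weight."""
    profile = offset_attention_profile(attn, grid_h, grid_w, radius)
    best = max(profile, key=profile.get)
    return best, profile[best]


# Synthetic head that always looks one patch to the right (clamped at the right edge).
h, w = 4, 4
attn = np.zeros((h * w, h * w))
for r in range(h):
    for c in range(w):
        attn[r * w + c, r * w + min(c + 1, w - 1)] = 1.0

offset, weight = dominant_offset(attn, h, w)
print(offset, weight)  # (0, 1) 1.0
```

A head whose mass concentrates on one nonzero offset, as in this synthetic case, fits the described pattern. A purely local head that mostly attends to itself would peak at (0, 0) instead.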
Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) in studying security properties in cryptography, some authors describe privacy inference risks in machine learning in a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to another, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning.
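The Python sketch below makes the game-based style concrete. It implements one common formulation, a membership inference experiment: the challenger trains a model, flips a secret bit b, and presents either a training member (b = 1) or a non-member (b = 0), and the adversary must guess b. Several details are assumptions for illustration: the function names, the uniform choice of challenge point, and estimating advantage as 2 * Pr[win] - 1. The paper's framework defines its own games.

```python
import random


def membership_inference_game(population, n_train, train, adversary, rng):
    """One run of a membership inference experiment; returns True if the adversary wins."""
    # Challenger: sample a training set from the population and train a model on it.
    data = rng.sample(population, n_train)
    model = train(data)
    members = set(data)
    # Challenger flips a secret bit b and picks a member (b=1) or a non-member (b=0).
    b = rng.randint(0, 1)
    if b == 1:
        z = rng.choice(data)
    else:
        z = rng.choice([x for x in population if x not in members])
    # Adversary sees the trained model and the challenge point, and guesses b.
    return adversary(model, z) == b


def estimate_advantage(trials, **game_kwargs):
    """Advantage = 2 * Pr[win] - 1, estimated over repeated independent games."""
    wins = sum(membership_inference_game(**game_kwargs) for _ in range(trials))
    return 2 * wins / trials - 1


def memorise(data):
    # Toy "training": the model is simply the set of training points.
    return frozenset(data)


def lookup_attack(model, z):
    # Toy adversary: guess "member" exactly when z is stored in the model.
    return int(z in model)


rng = random.Random(0)
adv = estimate_advantage(
    200, population=list(range(1000)), n_train=500,
    train=memorise, adversary=lookup_attack, rng=rng,
)
print(adv)  # 1.0
```

A model that memorises its training set gives this adversary a perfect advantage of 1.0. An adversary that guesses at random would have an expected advantage of 0.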
A distribution inference attack aims to infer statistical properties of the data used to train a machine learning model. These attacks are sometimes surprisingly potent, but the factors that affect distribution inference risk are not well understood, and demonstrated attacks often rely on strong and unrealistic assumptions, such as full knowledge of the training environment even in supposedly black-box threat scenarios. To improve understanding of distribution inference risks, we develop a new black-box attack that outperforms even the best known white-box attack in most settings. Using this new attack, we evaluate distribution inference risk while relaxing a variety of assumptions about the adversary's knowledge under black-box access, such as known model architectures and label-only access. Finally, we evaluate the effectiveness of previously proposed defenses and introduce new defenses. We find that although noise-based defenses appear to be ineffective, a simple re-sampling defense can be highly effective. Code is available at
Object-goal navigation (Object-nav) entails searching for, recognizing, and navigating to a target object. Object-nav has been studied extensively by the Embodied-AI community, but most solutions consider only static objects (e.g., televisions, fridges). We propose a modular framework for Object-nav that can efficiently search indoor environments not just for static objects but also for movable objects (e.g., fruits, glasses, phones) that frequently change position due to human intervention. Our contextual-bandit agent explores the environment efficiently by acting on the principle of optimism in the face of uncertainty, and it learns a model of the likelihood of spotting each object from each navigable location. These likelihoods are used as rewards in a weighted minimum latency solver to compute a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, demonstrating high sample efficiency and reliability.
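The framework above pairs optimistic likelihood estimates with a weighted minimum latency solver. The Python sketch below illustrates both ideas under simplifying assumptions. It uses one Bernoulli bandit per target object with a UCB1-style exploration bonus in place of the paper's contextual model. It also uses a brute-force solver that is only practical for a handful of locations. The class and function names, the exploration constant `c`, and the distance matrix are hypothetical.

```python
import math
from itertools import permutations


class LocationBandit:
    """Bernoulli estimate, per navigable location, of spotting one target object.

    Simplification (assumption): one bandit per object class stands in for the
    paper's contextual model, and exploration uses a UCB1-style bonus.
    """

    def __init__(self, n_locations, c=1.0):
        self.counts = [0] * n_locations
        self.successes = [0] * n_locations
        self.c = c  # exploration constant (hypothetical default)
        self.t = 0  # total observations so far

    def update(self, loc, spotted):
        self.t += 1
        self.counts[loc] += 1
        self.successes[loc] += int(spotted)

    def ucb(self, loc):
        """Optimistic likelihood: empirical rate plus an uncertainty bonus, capped at 1."""
        n = self.counts[loc]
        if n == 0:
            return 1.0  # never observed: assume the best case
        mean = self.successes[loc] / n
        bonus = self.c * math.sqrt(math.log(max(self.t, 1)) / n)
        return min(1.0, mean + bonus)


def weighted_min_latency_order(dist, start, weights):
    """Visit order minimising sum(weights[i] * arrival_time[i]) by brute force.

    dist is a square travel-time matrix; only feasible for a handful of locations.
    """
    others = [i for i in range(len(weights)) if i != start]
    best_order, best_cost = None, math.inf
    for order in permutations(others):
        time, cost, here = 0.0, 0.0, start
        for loc in order:
            time += dist[here][loc]
            cost += weights[loc] * time
            here = loc
        if cost < best_cost:
            best_order, best_cost = list(order), cost
    return best_order, best_cost


# Start (0) lies between location 1 (1 unit away) and location 2 (2 units away).
dist = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
order, cost = weighted_min_latency_order(dist, start=0, weights=[0.0, 0.1, 0.9])
print(order)  # [2, 1]
```

In this toy map, the solver visits the farther location first because its weight (likelihood) is much higher. In a full system, the weights would come from `ucb` values rather than being fixed by hand.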
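Deep learning research has attracted widespread interest, leading to a wide variety of technological innovations and applications. Because a large share of deep learning research focuses on vision-based applications, there is potential to use some of these techniques to build low-power, portable healthcare diagnostic support solutions. In this paper, we propose an embedded-hardware implementation of a microscopy diagnostic support system for POC case studies on: (a) malaria in thick blood smears, (b) tuberculosis in sputum samples, and (c) intestinal parasite infection in stool samples. We use a SqueezeNet-based model to reduce network size and computation time. We also use trained quantization to further reduce the memory footprint of the learned models. This enables microscopy-based pathogen detection that classifies with laboratory-expert-level accuracy on a standalone embedded hardware platform. Compared with a conventional CPU-based implementation, the proposed implementation is 6x more power-efficient and has an inference time of $\sim$3 ms per sample.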
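Quantization reduces the memory footprint of a trained model by storing each weight as a short index into a small codebook. The NumPy sketch below assumes "trained quantization" refers to k-means weight sharing, as in Deep Compression. It covers only the clustering step and omits the subsequent fine-tuning of the centroids. The function names and the 4-bit setting are illustrative and do not come from the paper.

```python
import numpy as np


def kmeans_quantize(weights, bits=4, iters=20):
    """Cluster weights into 2**bits shared values (1-D k-means).

    Returns per-weight codebook indices (uint8, so bits <= 8) and the codebook.
    """
    assert 1 <= bits <= 8
    k = 2 ** bits
    flat = weights.ravel()
    # Linear initialisation: centroids spread evenly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Move each non-empty centroid to the mean of its assigned weights.
        for j in range(k):
            members = flat[idx == j]
            if members.size:
                centroids[j] = members.mean()
    idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return idx.reshape(weights.shape).astype(np.uint8), centroids


def dequantize(idx, centroids):
    """Rebuild an approximate weight tensor by looking indices up in the codebook."""
    return centroids[idx]


def storage_bits(n_weights, bits, float_bits=32):
    """Theoretical storage: packed b-bit indices plus a float codebook of 2**bits entries."""
    return n_weights * bits + (2 ** bits) * float_bits


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
idx, centroids = kmeans_quantize(w, bits=4)
w_hat = dequantize(idx, centroids)
print(w.size * 32 / storage_bits(w.size, 4))  # compression ratio vs. float32
```

The `uint8` index array here still uses a full byte per weight. The ratio printed above assumes the 4-bit indices are bit-packed on the target device.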
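One of the most pressing societal issues is the fight against fake news. False claims are difficult to expose and cause a great deal of damage. To address this problem, fact verification has become crucial and is therefore a topic of interest across diverse research communities. Using only the textual form of the data, we propose a solution to the problem and achieve results competitive with other approaches. We present solutions based on two approaches: a PLM (pre-trained language model) based method and a prompt-based method. The PLM-based method uses traditional supervised learning, in which the model is trained to take x as input and output a prediction y as P(y|x). Prompt-based learning, in contrast, reflects the idea of designing the input to fit the model, so that the original objective can be reframed as a (masked) language modeling problem. We can further elicit the rich knowledge in PLMs to better serve downstream tasks by fine-tuning them with additional prompts. Our experiments show that the proposed method performs better than fine-tuning PLMs alone. We achieved an F1 score of 0.6946 on the Trancify dataset and placed seventh on the competition leaderboard.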
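Prompt-based learning reframes classification as masked language modeling. The input is wrapped in a template containing a mask token, and each label is mapped to a word (a "verbalizer") that the model could place at the mask. The Python sketch below illustrates that idea and is entirely hypothetical. The template wording, the label names and label words, and the `mask_logits` callable are assumptions. `mask_logits` stands in for a masked language model's scores at the mask position. A real system would obtain those scores from a PLM and fine-tune it with the prompt.

```python
import math
from typing import Callable, Dict

MASK = "[MASK]"


def build_prompt(claim: str, evidence: str) -> str:
    # Hypothetical template: the label is read off the word that fills the mask.
    return f"Claim: {claim} Evidence: {evidence} The claim is {MASK}."


def prompt_classify(
    claim: str,
    evidence: str,
    mask_logits: Callable[[str], Dict[str, float]],
    verbalizer: Dict[str, str],
) -> Dict[str, float]:
    """P(y|x) as a softmax over the mask-position logits of each label's word."""
    logits = mask_logits(build_prompt(claim, evidence))
    scores = {label: logits[word] for label, word in verbalizer.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exp.values())
    return {label: e / total for label, e in exp.items()}


def toy_mask_logits(prompt: str) -> Dict[str, float]:
    # Stand-in for a masked LM: fixed scores for a few words at the mask position.
    assert MASK in prompt
    return {"true": 2.0, "false": 0.0, "unclear": -1.0}


verbalizer = {"supported": "true", "refuted": "false"}
probs = prompt_classify("A claim.", "Some evidence.", toy_mask_logits, verbalizer)
print(max(probs, key=probs.get))  # supported
```

Restricting the softmax to the verbalizer words is one common choice. Words outside the label set, such as "unclear" here, do not affect the prediction.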
