In this paper, we study the effect of a novel regularization scheme on contrastive language-image pre-trained (CLIP) models. Our approach is based on the observation that, in many domains, text tokens should only describe a small number of image regions and, likewise, each image region should correspond to only a few text tokens. In CLIP-style models, this implies that text-token embeddings should have high similarity to only a small number of image-patch embeddings for a given image-text pair. We formalize this observation using a novel regularization scheme that penalizes the entropy of the text-token to image-patch similarity scores. We qualitatively and quantitatively demonstrate that the proposed regularization scheme shrinks the text-token and image-patch similarity scores towards zero, thus achieving the desired effect. We demonstrate the promise of our approach in an important medical context where this underlying hypothesis naturally arises. Using our proposed approach, we achieve state-of-the-art (SOTA) zero-shot performance on all tasks from the CheXpert chest x-ray dataset, outperforming an unregularized version of the model and several recently published self-supervised models.
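The abstract describes penalizing the entropy of each text token's similarity scores over image patches. A minimal sketch of such a penalty follows; the function name, the softmax normalization, and the temperature value are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def entropy_penalty(text_tokens, image_patches, temperature=0.07):
    """Mean entropy of text-token -> image-patch similarity distributions.

    text_tokens:   (T, d) array of text-token embeddings
    image_patches: (P, d) array of image-patch embeddings

    Adding this term (scaled by a coefficient) to the training loss
    encourages each token's similarity mass to concentrate on few patches.
    """
    # Cosine similarities between every token and every patch.
    t = text_tokens / np.linalg.norm(text_tokens, axis=1, keepdims=True)
    p = image_patches / np.linalg.norm(image_patches, axis=1, keepdims=True)
    sim = t @ p.T / temperature                      # (T, P) similarity logits
    # Softmax over patches, computed stably.
    probs = np.exp(sim - sim.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return entropy.mean()
```

Minimizing this quantity drives each row of the similarity matrix toward a peaked distribution, matching the stated goal of sparse token-to-patch correspondence.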
Deep learning models trained in a fully supervised manner have been shown to rely on so-called "shortcut" features. Shortcut features are inputs that are correlated with the outcome of interest in the training data, but are no longer correlated or no longer present in testing or deployment settings. Here we provide experiments showing that recent state-of-the-art self-supervised models trained on images and text provide more robust image representations and reduce the model's reliance on visual shortcut features in a realistic medical imaging example. Additionally, we find that these self-supervised models "forget" shortcut features faster than fully supervised models when fine-tuned on labeled data. Though not a complete solution, our experiments provide compelling evidence that self-supervised models trained on images and text provide resilience to visual shortcut features.
The assumption of no unmeasured confounding is widely used to identify causal effects in observational studies. Recent work on proximal inference provides alternative identification results that succeed even in the presence of unobserved confounders, provided one measures a sufficiently rich set of proxy variables satisfying specific structural conditions. However, proximal inference requires solving an ill-posed integral equation. Previous approaches have used a variety of machine learning techniques to estimate a solution to this integral equation, commonly referred to as the bridge function. However, prior work has often been limited by relying on pre-specified kernel functions, which are not data-adaptive and struggle to scale to large datasets. In this work, we introduce a flexible and scalable method based on deep neural networks for estimating causal effects in the presence of unmeasured confounding using proximal inference. Our method achieves state-of-the-art performance on two well-established proximal inference benchmarks. Finally, we provide theoretical consistency guarantees for our method.
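For context, the ill-posed integral equation mentioned here is the outcome-bridge condition standard in the proximal inference literature; a common formulation (notation is illustrative and not reproduced from this paper) is:

```latex
% Outcome bridge function h: for treatment A, outcome Y, covariates X,
% treatment-side proxy Z, and outcome-side proxy W, h solves
\mathbb{E}\!\left[\, Y - h(W, A, X) \;\middle|\; Z, A, X \,\right] = 0 .
% Under the proximal identification conditions, the counterfactual mean
% for treatment level a is then recovered as
\mathbb{E}\!\left[ Y(a) \right] = \mathbb{E}\!\left[\, h(W, a, X) \,\right] .
```

The work summarized above replaces pre-specified kernel estimators of \(h\) with a deep neural network parameterization.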
This paper considers a combination of actuation tendons and measurement strings to achieve accurate shape sensing and direct kinematics of continuum robots. Assuming general string routing, a methodical Lie group formulation for the shape sensing of these robots is presented. The shape kinematics is expressed using arc-length-dependent curvature distributions parameterized by modal functions, and the Magnus expansion for Lie group integration is used to express the shape as a product of exponentials. The tendon and string length kinematic constraints are solved for the modal coefficients and the configuration space and body Jacobian are derived. The noise amplification index for the shape reconstruction problem is defined and used for optimizing the string/tendon routing paths, and a planar simulation study shows the minimal number of strings/tendons needed for accurate shape reconstruction. A torsionally stiff continuum segment is used for experimental evaluation, demonstrating mean (maximal) end-effector absolute position error of less than 2% (5%) of total length. Finally, a simulation study of a torsionally compliant segment demonstrates the approach for general deflections and string routings. We believe that the methods of this paper can benefit the design process, sensing and control of continuum and soft robots.
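The paper's Magnus-expansion formulation over a general Lie group is beyond an abstract-sized sketch, but the product-of-exponentials idea can be illustrated in the plane. Here a modal (polynomial) curvature distribution is integrated step by step in SE(2); the function names and the polynomial basis are illustrative assumptions:

```python
import numpy as np

def se2_exp(kappa, ds):
    """Closed-form exponential of the planar twist (unit tangent, curvature
    kappa) over an arc-length step ds; returns a (3, 3) pose in SE(2)."""
    th = kappa * ds
    if abs(kappa) < 1e-9:
        px, py = ds, 0.0                 # straight segment
    else:
        px, py = np.sin(th) / kappa, (1.0 - np.cos(th)) / kappa
    return np.array([[np.cos(th), -np.sin(th), px],
                     [np.sin(th),  np.cos(th), py],
                     [0.0, 0.0, 1.0]])

def planar_shape(coeffs, length=1.0, n_steps=200):
    """Integrate a modal curvature kappa(s) = sum_i coeffs[i] * (s/L)**i
    as a product of exponentials (first-order stand-in for the paper's
    Magnus-expansion integration). Returns the homogeneous end pose."""
    ds = length / n_steps
    T = np.eye(3)
    for k in range(n_steps):
        s = (k + 0.5) * ds               # midpoint of the step
        kappa = sum(c * (s / length) ** i for i, c in enumerate(coeffs))
        T = T @ se2_exp(kappa, ds)
    return T
```

For zero curvature this reproduces a straight segment, and for constant curvature it reproduces a circular arc exactly, since every step shares the same twist.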
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This challenge is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries on how well neural networks can solve previously unseen problems, but their complexity and lack of clarity about the relevant content in their training data obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small scale transformers trained with examples from a known distribution. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on a complex problem if the training set includes examples sampled from the whole distribution of simpler component tasks. Successful generalization depends on carefully managing positional alignment when absolute position encoding is used, but we find that suppressing sensitivity to absolute positions overcomes this limitation. Taken together our results represent a small step toward understanding and promoting systematic generalization in transformers.
Large language models have recently shown promising progress in mathematical reasoning when fine-tuned with human-generated sequences walking through a sequence of solution steps. However, the solution sequences are not formally structured and the resulting model-generated sequences may not reflect the kind of systematic reasoning we might expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy, and models that are trained to produce such sequences solve problems better than those that are trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.
It is often claimed that legged robots made of soft materials exhibit safer and more robust environmental interactions than their rigid counterparts. However, this motivating feature of soft robots requires more rigorous development for comparison to rigid locomotion. This paper introduces a soft legged robot platform, Horton, and a feedback control system with safety guarantees on certain aspects of its operation. The robot is constructed from a series of soft limbs actuated by shape-memory-alloy (SMA) wire muscles, with sensors for its positions and actuator temperatures. A supervisory control scheme maintains safe actuator states while a separate controller operates on the robot's pose. Experiments demonstrate that Horton can lift its leg and maintain a balancing stance, a precursor to locomotion. During balancing, the supervisor is verified in hardware via a human-interaction test, keeping all SMA muscles below a temperature threshold. This work represents the first demonstration of a safety-verified feedback system on any soft legged robot.
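The supervisory scheme described above keeps every SMA muscle below a temperature threshold regardless of what the pose controller commands. A minimal sketch of that override logic follows; the function name, the PWM-duty interface, and the threshold value are illustrative assumptions, not details from the paper:

```python
def supervise_pwm(pwm_commands, temperatures, t_max=80.0):
    """Supervisory override of a pose controller's actuator commands.

    pwm_commands: per-muscle duty cycles requested by the pose controller
    temperatures: per-muscle SMA temperatures (same units as t_max)

    Any muscle at or above the threshold is de-energized (duty set to 0)
    until it cools; all other commands pass through unchanged.
    """
    return [0.0 if temp >= t_max else duty
            for duty, temp in zip(pwm_commands, temperatures)]
```

Running this check between the controller and the actuators each cycle is what lets the supervisor enforce the thermal safety constraint independently of the balancing controller's behavior.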
We argue that intelligence, conceived as a disposition to perform tasks successfully, is a property of systems composed of agents and their contexts. This is the thesis of extended intelligence. We argue that the performance of an agent will generally not be preserved if its context is allowed to vary. Hence, this disposition is not possessed by an agent alone, but rather by the system composed of the agent and its context, which we dub an agent-plus-context system. An agent's context may include an environment, other agents, cultural artifacts (e.g., language, technology), or all of these, as is typically the case for humans and artificial intelligence systems, as well as many non-human animals. According to the thesis of extended intelligence, we contend that intelligence is context-bound, task-local, and incommensurable among agents. Our thesis carries significant implications for how intelligence is analyzed in the context of psychology and artificial intelligence.
In this paper, we apply preprocessing techniques to multi-channel time series data with varying lengths, which we refer to as the alignment problem, for downstream machine learning. Misalignment of multi-channel time series data can occur for a variety of reasons, such as missing data, varying sampling rates, or inconsistent collection times. We consider multi-channel time series data collected from the MIT SuperCloud High Performance Computing (HPC) Center, where different job start times and varying run times of HPC jobs result in misaligned data. This misalignment makes it challenging to build AI/ML approaches for tasks such as compute-workload classification. Building on previous supervised classification work with the MIT SuperCloud dataset, we address the alignment problem via three broad, low-overhead approaches: sampling a fixed subset from the full time series, performing summary statistics on the full time series, and sampling a subset of coefficients from the time series mapped to the frequency domain. Our best-performing models achieve a classification accuracy greater than 95%, outperforming previous approaches to multi-channel time series classification on the MIT SuperCloud dataset by 5%. These results indicate that our low-overhead approaches, used in conjunction with standard machine learning techniques, are able to achieve high levels of classification accuracy, and can serve as a baseline for future approaches to the alignment problem, such as kernel methods.
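The three low-overhead alignment approaches named above all map a variable-length channel to a fixed-length feature vector. A minimal sketch of each (function names, feature choices, and default sizes are illustrative assumptions, not the paper's exact pipeline):

```python
import numpy as np

def align_fixed_subset(series, n_samples=64):
    """Sample a fixed-size subset of a 1-D series at uniformly spaced indices."""
    idx = np.linspace(0, len(series) - 1, n_samples).astype(int)
    return np.asarray(series, dtype=float)[idx]

def align_summary_stats(series):
    """Summarize a series of any length by a fixed set of statistics."""
    s = np.asarray(series, dtype=float)
    return np.array([s.mean(), s.std(), s.min(), s.max(), np.median(s)])

def align_fourier_coeffs(series, n_coeffs=16):
    """Keep the magnitudes of the lowest-frequency FFT coefficients,
    zero-padding when the series is too short to supply them all."""
    spec = np.abs(np.fft.rfft(np.asarray(series, dtype=float)))
    out = np.zeros(n_coeffs)
    out[:min(n_coeffs, len(spec))] = spec[:n_coeffs]
    return out
```

Because every channel comes out with the same shape regardless of its original length, the results can be concatenated into a single feature matrix for standard classifiers.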