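This paper analyzes design choices in face detection architectures that improve the trade-off between computational cost and accuracy. Specifically, we re-examine the effectiveness of the standard convolutional block as a lightweight backbone for face detection. Unlike the current trend in lightweight architecture design, which relies heavily on depthwise separable convolution layers, we show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed at a similar parameter size. This observation is supported by analyses of the characteristics of the target data domain, faces. Based on this observation, we propose to use ResNet with highly reduced channels, which is highly efficient compared to other mobile-friendly networks (e.g., MobileNet-V1, -V2, -V3). Through extensive experiments, we show that the proposed backbone can replace the backbone of a state-of-the-art face detector while providing faster inference. In addition, we propose a new feature aggregation method that maximizes detection performance. Our proposed detector, EResFD, obtains 80.4% mAP on the WIDER FACE Hard subset while taking only 37.7 ms for VGA image inference on a CPU. Code will be available at https://github.com/clovaai/eresfd.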
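To make the "similar parameter size" comparison concrete, the sketch below counts weights in a standard convolution and in a depthwise separable one, then finds the channel width at which a standard layer fits the same budget. It is only an illustration: the function names, the 3x3 kernel, and the 64-channel example are assumptions, not values taken from the paper.

```python
from math import isqrt


def standard_conv_params(c_in, c_out, k=3):
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def matched_standard_width(c, k=3):
    """Largest width w such that a w -> w standard conv fits within the
    parameter budget of a c -> c depthwise separable conv."""
    budget = separable_conv_params(c, c, k)
    return isqrt(budget // (k * k))


if __name__ == "__main__":
    c = 64  # hypothetical channel width, chosen only for illustration
    w = matched_standard_width(c)
    print("separable, 64 channels:", separable_conv_params(c, c))
    print("standard, 64 channels:", standard_conv_params(c, c))
    print("standard width at equal budget:", w, standard_conv_params(w, w))
```

Under these assumptions, a 64-channel separable layer has roughly the same number of weights as a 22-channel standard layer, which is the kind of channel reduction the abstract refers to.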
translated by 谷歌翻译
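Weakly supervised instance segmentation (WSIS) is considered a more challenging task than weakly supervised semantic segmentation (WSSS). Compared to WSSS, WSIS requires instance-wise localization, which is difficult to extract from image-level labels. To tackle this problem, most WSIS approaches use off-the-shelf proposal techniques that require pre-training with instance-level or object-level labels, which deviates from the basic definition of the fully image-level supervised setting. In this paper, we propose a novel approach with two innovative components. First, we propose a semantic knowledge transfer that obtains pseudo instance labels by transferring the knowledge of WSSS to WSIS, while eliminating the need for off-the-shelf proposals. Second, we propose a self-refinement method that refines the pseudo instance labels in a self-supervised scheme and uses the refined labels for training in an online manner. Here, we discover an erroneous phenomenon in which instances missing from the pseudo instance labels are categorized as the background class. This causes confusion between background and instances during training and consequently degrades segmentation performance. We term this the semantic drift problem and show that our proposed self-refinement method eliminates it. Extensive experiments on PASCAL VOC 2012 and COCO demonstrate the effectiveness of our approach, which achieves considerable performance without off-the-shelf proposal techniques. Code will be available soon.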
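This paper introduces a solid baseline for the class-incremental semantic segmentation (CISS) problem. While recent CISS algorithms use variants of the knowledge distillation (KD) technique to tackle the problem, they fail to fully address the critical challenges in CISS that cause catastrophic forgetting: the semantic drift of the background class and the multi-label prediction issue. To better address these challenges, we propose a new method, dubbed SSUL-M (Semantic Segmentation with Unknown Label with Memory), by carefully combining techniques tailored for semantic segmentation. Specifically, we claim three main contributions: (1) defining unknown classes within the background class to help learn future classes (helping plasticity), (2) freezing the backbone network and past classifiers, with a binary cross-entropy loss and pseudo-labeling, to overcome catastrophic forgetting (helping stability), and (3) using a tiny exemplar memory for the first time in CISS to improve both plasticity and stability. Extensive experiments show the effectiveness of our method, which performs significantly better than recent state-of-the-art baselines on standard benchmark datasets. Furthermore, we justify our contributions with thorough ablation analyses and discuss how the nature of the CISS problem differs from that of traditional class-incremental learning for classification. The official code is available at https://github.com/clovaai/ssul.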
3D-aware image synthesis focuses on preserving spatial consistency in addition to generating high-resolution images with fine details. Recently, Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and show remarkable results, they cannot handle conditional and continuous feature manipulation in the generation procedure. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated, photorealistic, 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets: AFHQ, CelebA, and Cars. Our model shows strong 3D consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF achieves a Fr\'echet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at a $\text{128}^{2}$ resolution. Additionally, because $\text{C}^{3}$G-NeRF can synthesize class-conditional images, we report FIDs of the generated 3D-aware images for each class of the datasets.
In both terrestrial and marine ecology, physical tagging is a frequently used method to study population dynamics and behavior. However, such tagging techniques are increasingly being replaced by individual re-identification using image analysis. This paper introduces a contrastive learning-based model for identifying individuals. The model uses the first parts of the Inception v3 network, supported by a projection head, and we use contrastive learning to find similar or dissimilar image pairs in a collection of uniform photographs. We apply this technique to corkwing wrasse, Symphodus melops, an ecologically and commercially important fish species. Photos are taken during repeated catches of the same individuals from a wild population, where the intervals between individual sightings may range from a few days to several years. On our dataset, the model achieves a one-shot accuracy of 0.35, a 5-shot accuracy of 0.56, and a 100-shot accuracy of 0.88.
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
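The greedy rule described above can be sketched for a small discrete problem where the joint distribution is known exactly, which is the oracle setting the abstract says is unavailable in practice. This is not the amortized learning method proposed in the paper. The function names, the table layout (one axis per feature plus a final label axis), and the toy distribution are all assumptions made for illustration, and the observed values are assumed to have nonzero probability.

```python
import numpy as np


def conditional_mi(joint, observed, i):
    """I(Y; X_i | X_S = x_S) in nats, from an exact discrete joint table.

    joint: array with one axis per feature and a final axis for the label y.
    observed: dict mapping feature index -> observed value (the set S).
    """
    d = joint.ndim - 1
    # Fix the observed features at their values; keep all other axes.
    index = tuple(observed.get(j, slice(None)) for j in range(d))
    sub = joint[index + (slice(None),)]
    remaining = [j for j in range(d) if j not in observed]
    pos = remaining.index(i)
    # Marginalise every unobserved feature except X_i.
    other_axes = tuple(a for a in range(len(remaining)) if a != pos)
    p_xy = sub.sum(axis=other_axes)
    p_xy = p_xy / p_xy.sum()  # condition on the observed values
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    ratio = p_xy[mask] / (p_x * p_y)[mask]
    return float((p_xy[mask] * np.log(ratio)).sum())


def greedy_next_feature(joint, observed):
    """Pick the unobserved feature with the largest conditional MI."""
    d = joint.ndim - 1
    candidates = [j for j in range(d) if j not in observed]
    return max(candidates, key=lambda j: conditional_mi(joint, observed, j))


def toy_joint():
    """Toy table: x0 copies y, x1 is pure noise, x2 matches y 80% of the time."""
    joint = np.zeros((2, 2, 2, 2))
    for y in range(2):
        for x1 in range(2):
            for x2 in range(2):
                joint[y, x1, x2, y] = 0.5 * 0.5 * (0.8 if x2 == y else 0.2)
    return joint


if __name__ == "__main__":
    joint = toy_joint()
    print("first query:", greedy_next_feature(joint, {}))
    print("I(Y; X_2):", conditional_mi(joint, {}, 2))
```

In this toy table the policy queries feature 0 first, since it determines the label. Once feature 0 is observed, the conditional mutual information of the remaining features drops to zero, so further queries add nothing.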
The purpose of this work was to tackle practical issues that arise when using a tendon-driven robotic manipulator with a long, passive, flexible proximal section in medical applications. A separable robot that overcomes difficulties in actuation and sterilization is introduced, in which the body containing the electronics is reusable and the remainder is disposable. A control input that resolves the redundancy in the kinematics and a physical interpretation of this redundancy are provided. The effect of a static change in the proximal section angle on bending angle error was explored under four testing conditions for a sinusoidal input. Bending angle error increased with proximal section angle under all testing conditions, with average error reductions of 41.48% for re-tension compensation, 4.28% for hysteresis compensation, and 52.35% for re-tension + hysteresis compensation relative to the baseline case. Two major sources of error in tracking the bending angle were identified: time delay from hysteresis and DC offset from the proximal section angle. Examination of these error sources revealed that the simple hysteresis compensation was most effective at removing time delay, and re-tension compensation at removing DC offset, which was the primary source of increasing error. The re-tension compensation was also tested for dynamic changes in the proximal section and reduced error in the final configuration of the tip by 89.14% relative to the baseline case.
With the rapid development of drone technologies, drones are widely used in many applications, including military domains. In this paper, a novel situation-aware, DRL-based autonomous nonlinear drone mobility control algorithm is proposed for cyber-physical loitering munition applications. On the battlefield, designing a DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not possible. Therefore, the approach in this paper is to construct a cyber-physical virtual environment with Unity. Based on virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, real-world battlefield scenarios contain many obstacles, which are harmful to linear trajectory control. Thus, our proposed autonomous nonlinear drone mobility control algorithm uses situation-aware components that are implemented with a Raycast function in Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight. This approach is therefore beneficial for avoiding obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm is superior to the other, linear mobility control algorithms.
In the robotics and computer vision communities, extensive studies have been conducted on surveillance tasks, including human detection, tracking, and motion recognition with a camera. Additionally, deep learning algorithms are widely used in these tasks, as in other computer vision tasks. Existing public datasets are insufficient for developing learning-based methods that handle surveillance in outdoor and extreme situations such as harsh weather and low illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), containing more than 500,000 image pairs and first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g., an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study is available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.
Springs are efficient in storing and returning elastic potential energy but are unable to hold the energy they store in the absence of an external load. Lockable springs use clutches to hold elastic potential energy in the absence of an external load but have not yet been widely adopted in applications, partly because clutches introduce design complexity, reduce energy efficiency, and typically do not afford high-fidelity control over the energy stored by the spring. Here, we present the design of a novel lockable compression spring that uses a small capstan clutch to passively lock a mechanical spring. The capstan clutch can lock up to 1000 N force at any arbitrary deflection, unlock the spring in less than 10 ms with a control force less than 1 % of the maximal spring force, and provide an 80 % energy storage and return efficiency (comparable to a highly efficient electric motor operated at constant nominal speed). By retaining the form factor of a regular spring while providing high-fidelity locking capability even under large spring forces, the proposed design could facilitate the development of energy-efficient spring-based actuators and robots.