In this paper, we aim to address the large domain gap between high-resolution face images, e.g., from professional portrait photography, and low-quality surveillance images, e.g., from security cameras. Establishing an identity match between such disparate sources is a classical surveillance face identification scenario, and it remains a challenging problem for modern face recognition techniques. To that end, we propose a method that combines face super-resolution, resolution matching, and multi-scale template accumulation to reliably recognize faces from long-range surveillance footage, including footage from low-quality sources. The proposed approach does not require training or fine-tuning on the target dataset of real surveillance images. Extensive experiments show that our method outperforms even existing methods fine-tuned to the SCFace dataset.
Object detectors are conventionally trained with a weighted sum of classification and localization losses. Recent studies (e.g., predicting IoU with an auxiliary head, Generalized Focal Loss, Rank & Sort Loss) have shown that forcing these two loss terms to interact with each other in non-conventional ways creates a useful inductive bias and improves performance. Inspired by these works, we focus on the correlation between classification and localization and make two main contributions: (i) We provide an analysis of the effects of correlation between classification and localization tasks in object detectors. We identify why correlation affects the performance of various NMS-based and NMS-free detectors, and we devise measures to evaluate the effect of correlation and use them to analyze common detectors. (ii) Motivated by our observations, e.g., that NMS-free detectors can also benefit from correlation, we propose Correlation Loss, a novel plug-in loss function that improves the performance of various object detectors by directly optimizing correlation coefficients. For example, Correlation Loss on Sparse R-CNN, an NMS-free method, yields a 1.6 AP gain on COCO and a 1.8 AP gain on the Cityscapes dataset. Our best model on Sparse R-CNN reaches 51.0 AP without test-time augmentation on COCO test-dev, reaching state-of-the-art. Code is available at https://github.com/fehmikahraman/CorrLoss
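As a rough illustration of what "directly optimizing correlation coefficients" could look like, the sketch below computes a loss of the form 1 minus the Pearson correlation between classification scores and the IoUs of the matching boxes. The choice of the Pearson coefficient, the function names `pearson_correlation` and `correlation_loss`, and the handling of constant inputs are all assumptions made for illustration; the abstract does not specify which coefficient or formulation the authors use, so this is not their implementation.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for constant input; returning 0 is an
        # assumption made here to keep the sketch total.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_loss(scores, ious):
    """Hypothetical loss: 0 when scores and IoUs are perfectly correlated,
    2 when they are perfectly anti-correlated."""
    return 1.0 - pearson_correlation(scores, ious)


if __name__ == "__main__":
    # Classification scores that rise with localization quality give a low loss.
    print(correlation_loss([0.1, 0.5, 0.9], [0.2, 0.6, 1.0]))
    # Scores that fall as localization quality rises give a high loss.
    print(correlation_loss([0.9, 0.5, 0.1], [0.2, 0.6, 1.0]))
```

In a real detector such a term would be computed on differentiable tensors over the positive samples of a batch and added to the usual classification and localization losses with a weight; those details are outside what the abstract states.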
Wind power forecasting supports power system planning by increasing the level of certainty in decision-making. Due to the randomness inherent to meteorological events (e.g., wind speeds), making highly accurate long-term predictions for wind power can be extremely difficult. One approach to remedy this challenge is to utilize weather information from multiple points across a geographical grid to obtain a holistic view of the wind patterns, along with temporal information from the previous power outputs of the wind farms. Our proposed CNN-RNN architecture combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract spatial and temporal information from multi-dimensional input data to make day-ahead predictions. In this regard, our method incorporates an ultra-wide learning view, combining data from multiple numerical weather prediction models, wind farms, and geographical locations. Additionally, we experiment with global forecasting approaches to understand the impact of training the same model over the datasets obtained from multiple different wind farms, and we employ a method where spatial information extracted from convolutional layers is passed to a tree ensemble (e.g., Light Gradient Boosting Machine (LGBM)) instead of fully connected layers. The results show that our proposed CNN-RNN architecture outperforms other models such as LGBM, Extra Tree regressor and linear regression when trained globally, but fails to replicate such performance when trained individually on each farm. We also observe that passing the spatial information from the CNN to LGBM improves its performance, providing further evidence of the CNN's spatial feature extraction capabilities.
The development of guidance, navigation and control frameworks/algorithms for swarms has attracted significant attention in recent years. That being said, algorithms for planning swarm allocations/trajectories for engaging with enemy swarms remain largely understudied. Although small-scale scenarios can be addressed with tools from differential game theory, existing approaches fail to scale to large-scale multi-agent pursuit-evasion (PE) scenarios. In this work, we propose a reinforcement learning (RL) based framework to decompose large-scale swarm engagement problems into a number of independent multi-agent pursuit-evasion games. We simulate a variety of multi-agent PE scenarios, where finite-time capture is guaranteed under certain conditions. The calculated PE statistics are provided as a reward signal to the high-level allocation layer, which uses an RL algorithm to allocate controlled swarm units to eliminate enemy swarm units with maximum efficiency. We verify our approach in large-scale swarm-to-swarm engagement simulations.
The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and that have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and they have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 in such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and to recent solutions that mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is provided. Finally, to help advance the field further, a discussion of the main open challenges and future research directions is given.
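Collision avoidance in the presence of dynamic obstacles in unknown environments is one of the most critical challenges for unmanned systems. In this paper, we present a method that identifies obstacles as ellipsoids in order to estimate their linear and angular velocities. Our proposed method is based on the idea that any object can be approximately represented by ellipsoids. To achieve this, we propose a method based on variational Bayesian estimation of a Gaussian mixture model, the Khachiyan algorithm, and a refinement algorithm. Unlike existing optimization-based methods, our proposed method does not require knowledge of the number of clusters and can operate in real time. In addition, we define an ellipsoid-based feature vector to match obstacles across two point frames that are close in time. Our method can be applied to any environment with static and dynamic obstacles, including environments with rotating obstacles. We compare our algorithm with other clustering methods and show that, when coupled with a trajectory planner, the overall system can efficiently traverse unknown environments in the presence of dynamic obstacles.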
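This work summarizes the IJCB Occluded Face Recognition Competition 2022 (IJCB-OCFR-2022), held at the 2022 International Joint Conference on Biometrics (IJCB 2022). OCFR-2022 attracted a total of 3 participating teams from academia. In the end, six valid submissions were made and then evaluated by the organizers. The competition was held to address the challenge of face recognition in the presence of severe face occlusions. Participants were free to use any training data, and the test data were built by occluding parts of face images from a well-known dataset. The submitted solutions presented innovations and outperformed the considered baseline. The main output of this competition is a challenging, realistic, diverse, and publicly available occluded face recognition benchmark with a well-defined evaluation protocol.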
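Drilling a hole on a curved surface is prone to failure when done manually, due to the difficulty of aligning the drill and the inherent instability of the task, and it can cause injury and fatigue to workers. On the other hand, fully automating such a task in real manufacturing environments may be impractical, because the parts arriving at an assembly line can have a variety of complex shapes on which the drill locations are not easily accessible, making automated path planning difficult. In this work, an adaptive admittance controller with 6 degrees of freedom is developed and deployed on a KUKA LBR iiwa 7 robot, so that the operator can comfortably manipulate a drill mounted on the robot with one hand and open holes on a curved surface with haptic guidance from the robot and visual guidance provided through an AR interface. Real-time adaptation of the admittance damping provides more transparency when the robot is driven in free space while ensuring stability during drilling. After the user brings the drill sufficiently close to the drill target and roughly aligns it with the desired drilling angle, the haptic guidance module first fine-tunes the alignment and then constrains the user's motion to the drilling axis only, after which the operator simply pushes the drill into the workpiece with minimal effort. Two sets of experiments were conducted to investigate the potential benefits of the haptic guidance module quantitatively (Experiment I) and the practical value of the proposed pHRI system for real manufacturing environments based on the subjective opinions of the participants (Experiment II).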
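The trade-off between transparency and stability that motivates adapting the admittance damping can be seen in a one-degree-of-freedom sketch. It assumes the standard admittance model m·dv/dt + b·v = F, integrated with explicit Euler steps; the function name `simulate_admittance`, the parameter values, and the use of two fixed damping values in place of a real adaptation law are all hypothetical, since the abstract does not describe the controller's equations.

```python
def simulate_admittance(force, mass, damping, dt, steps):
    """Integrate m * dv/dt + b * v = F with explicit Euler steps.

    Returns the final position and velocity of the virtual mass.
    """
    position = 0.0
    velocity = 0.0
    for _ in range(steps):
        acceleration = (force - damping * velocity) / mass
        velocity += acceleration * dt
        position += velocity * dt
    return position, velocity


if __name__ == "__main__":
    push = 10.0  # constant force applied by the operator, in newtons (illustrative)

    # Low damping: the robot yields easily, which feels transparent in free space.
    _, v_free = simulate_admittance(push, mass=2.0, damping=5.0, dt=0.001, steps=5000)

    # High damping: the same push produces slower motion, which favors
    # stability while the drill is in contact with the workpiece.
    _, v_drill = simulate_admittance(push, mass=2.0, damping=50.0, dt=0.001, steps=5000)

    print(f"steady-state velocity, low damping:  {v_free:.3f} m/s")
    print(f"steady-state velocity, high damping: {v_drill:.3f} m/s")
```

In both cases the velocity settles near F/b, so lowering the damping makes the same push move the tool faster, while raising it makes the response more sluggish and less prone to oscillation on contact.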
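Deep learning has attracted tremendous interest in medical imaging, particularly in the use of convolutional neural networks (CNNs) to develop automated diagnostic tools. The ease of its non-invasive acquisition makes retinal fundus imaging well suited to such automated approaches. Recent work on analyzing fundus images with CNNs relies on access to massive amounts of data for training and validation, on the order of many thousands of images. However, data residency and data privacy restrictions hinder the applicability of this approach in medical settings where patient confidentiality is a mandate. Here, we present results on the performance of DL on small datasets for classifying patient sex from fundus images, a trait that until recently was not thought to be present or quantifiable in fundus images. We fine-tune a ResNet-152 model whose last layer has been modified for binary classification. In several experiments, we evaluate performance in the small-dataset context using one private (DOV) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieve AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This is a decrease in performance of only 25%, despite a nearly 1000-fold reduction in dataset size compared to prior work in the literature. Even for a task as hard as sex classification from retinal images, we find that classification is possible with very small datasets. In addition, we perform domain adaptation experiments between DOV and ODIR, explore the effect of data curation on training and generalizability, and investigate model ensembling to maximize CNN classifier performance in the context of small development datasets.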
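For readers unfamiliar with how a figure such as "AUC 0.72 (95% CI: [0.67, 0.77])" can be produced, the sketch below computes the AUC as the fraction of positive/negative pairs that are ranked correctly and attaches a percentile bootstrap interval. The abstract does not say how its confidence interval was obtained, so the bootstrap procedure, the function names `roc_auc` and `bootstrap_auc_ci`, and the toy labels and scores are assumptions for illustration only.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (an assumed procedure)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # skip resamples that contain a single class
        aucs.append(roc_auc(sample_labels, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no valid bootstrap resamples")
    aucs.sort()
    lower = aucs[int((alpha / 2) * (len(aucs) - 1))]
    upper = aucs[int((1 - alpha / 2) * (len(aucs) - 1))]
    return lower, upper


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    y_true = [0, 0, 1, 1, 0, 1, 1, 0]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.3, 0.7, 0.45, 0.5]
    print(roc_auc(y_true, y_score))
    print(bootstrap_auc_ci(y_true, y_score))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for datasets of a few thousand images; a rank-based formula would be preferable at larger scale.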
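In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). Since then, the problem of compressed sensing with GNNs has been analyzed extensively for the case where the measurement matrix and/or the network weights follow a subgaussian distribution. We move beyond the subgaussian assumption to measurement matrices obtained by sampling uniformly at random the rows of a unitary matrix (including subsampled Fourier measurements as a special case). Specifically, we prove the first known restricted isometry guarantee for generative compressed sensing with subsampled isometries and provide recovery bounds with nearly order-optimal sample complexity, addressing an open problem of Scarlett et al. (2022, p. 10). Recovery efficacy is characterized by the coherence, a new parameter that measures the interplay between the range of the network and the measurement matrix. Our approach relies on subspace counting arguments and ideas central to high-dimensional probability. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low-coherence networks that require fewer measurements for signal recovery. Together with our theoretical results, this supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries.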
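The measurement model described here, rows of a unitary matrix sampled uniformly at random, can be sketched for the subsampled Fourier special case. The sketch assumes sampling without replacement and a rescaling by sqrt(n/m) so that the squared norm is preserved in expectation; both choices, and the function names `dft_row` and `subsampled_fourier_measure`, are assumptions for illustration and may differ from the paper's exact setup.

```python
import cmath
import math
import random


def dft_row(k, n):
    """Row k of the unitary n x n discrete Fourier transform matrix."""
    return [cmath.exp(-2j * cmath.pi * k * t / n) / math.sqrt(n) for t in range(n)]


def subsampled_fourier_measure(x, m, rng):
    """Measure x with m rows of the unitary DFT matrix chosen uniformly at random.

    Rows are drawn without replacement and rescaled by sqrt(n / m); both are
    assumptions of this sketch. Returns the chosen row indices and measurements.
    """
    n = len(x)
    rows = rng.sample(range(n), m)
    scale = math.sqrt(n / m)
    measurements = [
        scale * sum(a * b for a, b in zip(dft_row(k, n), x)) for k in rows
    ]
    return rows, measurements


if __name__ == "__main__":
    rng = random.Random(0)
    signal = [rng.gauss(0.0, 1.0) for _ in range(16)]
    rows, y = subsampled_fourier_measure(signal, m=6, rng=rng)
    print("sampled rows:", sorted(rows))
    print("signal energy:     ", sum(v * v for v in signal))
    print("measurement energy:", sum(abs(v) ** 2 for v in y))
```

With m equal to n every row is used and the measurement energy equals the signal energy exactly, since the full DFT matrix is unitary; for m smaller than n the two agree only on average over the random choice of rows.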