尽管在零射门学习(ZSL)方面取得了巨大进展,但大多数现有方法仍然依赖于人类通知的属性,这些属性很难注释和扩展。一个无监督的替代方法是使用与其语义类名称相关的单词嵌入来表示每个类。但是,从预训练的语言模型中提取的单词嵌入不一定会捕获视觉相似性,从而导致零拍的性能差。在这项工作中,我们认为在线文本文档,例如Wikipedia,包含有关对象类的丰富视觉描述,因此可以用作ZSL的强大无监督的侧面信息。为此,我们提出了I2Dformer,这是一种基于变压器的新型ZSL框架,共同学会通过在共享嵌入空间中对齐两个方式来编码图像和文档。为了从嘈杂的文档中提取歧视性的视觉单词,我们介绍了一个新的跨模式注意模块,该模块可以学习图像补丁和文档单词之间的细粒度相互作用。因此,我们的i2dformer不仅学习了捕获视觉相似性的高度歧视文档的嵌入,而且还获得了将视觉相关单词定位在图像区域中的能力。定量地,我们证明我们的i2形式在三个公共数据集上的零照片和广义零局学习设置下都显着优于先前无监督的语义嵌入。定性地,我们表明我们的方法会导致高度可解释的结果,其中文档单词可以基于图像区域。
translated by 谷歌翻译
零件代表不同对象的几何和语义相似性的基本单位。我们争辩说,部分知识应与观察到的对象课程中有款组合。对此,我们将3D组成零射击学习作为从看作识的零件泛化的问题,从而看成了语义分割。我们通过将任务与所提出的组成部分数据集进行基准测试,提供结构化研究。该数据集是通过处理原始PartNet来创建的,以最大化不同对象的部分重叠。现有点云部分段方法未能在此设置中概括到未遵守的对象类。作为解决方案,我们提出了分解共识,其将零件分割网络与部分评分网络相结合。我们方法的关键直觉是某些部件的分割掩码应该具有与其部分分数分开的零件分数的共识。在生成最合适的分割掩模之前在每个对象部分中定义的不同部分组合的两个网络原因。我们展示了我们的方法允许组成零射分段和广义零拍分类,并在两个任务中建立最先进的状态。
translated by 谷歌翻译
Spatial perception is a key task in several robotics applications. In general, it involves the nonlinear estimation of hidden variables that represent the state of the robot/environment. However, in the presence of outliers the standard nonlinear least squared formulation results in poor estimates. Several methods have been considered in the literature to improve the reliability of the estimation process. Most methods are based on heuristics since guaranteed global robust estimation is not generally practical due to high computational costs. Recently general purpose robust estimation heuristics have been proposed that leverage existing non-minimal solvers available for the outlier-free formulations without the need for an initial guess. In this work, we propose two similar heuristics backed by Bayesian theory. We evaluate these heuristics in practical scenarios to demonstrate their merits in different applications including 3D point cloud registration, mesh registration and pose graph optimization.
translated by 谷歌翻译
Entity matching in Customer 360 is the task of determining if multiple records represent the same real world entity. Entities are typically people, organizations, locations, and events represented as attributed nodes in a graph, though they can also be represented as records in relational data. While probabilistic matching engines and artificial neural network models exist for this task, explaining entity matching has received less attention. In this demo, we present our Explainable Entity Matching (xEM) system and discuss the different AI/ML considerations that went into its implementation.
translated by 谷歌翻译
Unmanned air vehicles (UAVs) popularity is on the rise as it enables the services like traffic monitoring, emergency communications, deliveries, and surveillance. However, the unauthorized usage of UAVs (a.k.a drone) may violate security and privacy protocols for security-sensitive national and international institutions. The presented challenges require fast, efficient, and precise detection of UAVs irrespective of harsh weather conditions, the presence of different objects, and their size to enable SafeSpace. Recently, there has been significant progress in using the latest deep learning models, but those models have shortcomings in terms of computational complexity, precision, and non-scalability. To overcome these limitations, we propose a precise and efficient multiscale and multifeature UAV detection network for SafeSpace, i.e., \textit{MultiFeatureNet} (\textit{MFNet}), an improved version of the popular object detection algorithm YOLOv5s. In \textit{MFNet}, we perform multiple changes in the backbone and neck of the YOLOv5s network to focus on the various small and ignored features required for accurate and fast UAV detection. To further improve the accuracy and focus on the specific situation and multiscale UAVs, we classify the \textit{MFNet} into small (S), medium (M), and large (L): these are the combinations of various size filters in the convolution and the bottleneckCSP layers, reside in the backbone and neck of the architecture. This classification helps to overcome the computational cost by training the model on a specific feature map rather than all the features. The dataset and code are available as an open source: github.com/ZeeshanKaleem/MultiFeatureNet.
translated by 谷歌翻译
A major challenge in machine learning is resilience to out-of-distribution data, that is data that exists outside of the distribution of a model's training data. Training is often performed using limited, carefully curated datasets and so when a model is deployed there is often a significant distribution shift as edge cases and anomalies not included in the training data are encountered. To address this, we propose the Input Optimisation Network, an image preprocessing model that learns to optimise input data for a specific target vision model. In this work we investigate several out-of-distribution scenarios in the context of semantic segmentation for autonomous vehicles, comparing an Input Optimisation based solution to existing approaches of finetuning the target model with augmented training data and an adversarially trained preprocessing model. We demonstrate that our approach can enable performance on such data comparable to that of a finetuned model, and subsequently that a combined approach, whereby an input optimization network is optimised to target a finetuned model, delivers superior performance to either method in isolation. Finally, we propose a joint optimisation approach, in which input optimization network and target model are trained simultaneously, which we demonstrate achieves significant further performance gains, particularly in challenging edge-case scenarios. We also demonstrate that our architecture can be reduced to a relatively compact size without a significant performance impact, potentially facilitating real time embedded applications.
translated by 谷歌翻译
In this paper, deep-learning-based approaches namely fine-tuning of pretrained convolutional neural networks (VGG16 and VGG19), and end-to-end training of a developed CNN model, have been used in order to classify X-Ray images into four different classes that include COVID-19, normal, opacity and pneumonia cases. A dataset containing more than 20,000 X-ray scans was retrieved from Kaggle and used in this experiment. A two-stage classification approach was implemented to be compared to the one-shot classification approach. Our hypothesis was that a two-stage model will be able to achieve better performance than a one-shot model. Our results show otherwise as VGG16 achieved 95% accuracy using one-shot approach over 5-fold of training. Future work will focus on a more robust implementation of the two-stage classification model Covid-TSC. The main improvement will be allowing data to flow from the output of stage-1 to the input of stage-2, where stage-1 and stage-2 models are VGG16 models fine-tuned on the Covid-19 dataset.
translated by 谷歌翻译
Social media platforms allow users to freely share their opinions about issues or anything they feel like. However, they also make it easier to spread hate and abusive content. The Fulani ethnic group has been the victim of this unfortunate phenomenon. This paper introduces the HERDPhobia - the first annotated hate speech dataset on Fulani herders in Nigeria - in three languages: English, Nigerian-Pidgin, and Hausa. We present a benchmark experiment using pre-trained languages models to classify the tweets as either hateful or non-hateful. Our experiment shows that the XML-T model provides better performance with 99.83% weighted F1. We released the dataset at https://github.com/hausanlp/HERDPhobia for further research.
translated by 谷歌翻译
Despite the immense success of neural networks in modeling system dynamics from data, they often remain physics-agnostic black boxes. In the particular case of physical systems, they might consequently make physically inconsistent predictions, which makes them unreliable in practice. In this paper, we leverage the framework of Irreversible port-Hamiltonian Systems (IPHS), which can describe most multi-physics systems, and rely on Neural Ordinary Differential Equations (NODEs) to learn their parameters from data. Since IPHS models are consistent with the first and second principles of thermodynamics by design, so are the proposed Physically Consistent NODEs (PC-NODEs). Furthermore, the NODE training procedure allows us to seamlessly incorporate prior knowledge of the system properties in the learned dynamics. We demonstrate the effectiveness of the proposed method by learning the thermodynamics of a building from the real-world measurements and the dynamics of a simulated gas-piston system. Thanks to the modularity and flexibility of the IPHS framework, PC-NODEs can be extended to learn physically consistent models of multi-physics distributed systems.
translated by 谷歌翻译
Keyless entry systems in cars are adopting neural networks for localizing its operators. Using test-time adversarial defences equip such systems with the ability to defend against adversarial attacks without prior training on adversarial samples. We propose a test-time adversarial example detector which detects the input adversarial example through quantifying the localized intermediate responses of a pre-trained neural network and confidence scores of an auxiliary softmax layer. Furthermore, in order to make the network robust, we extenuate the non-relevant features by non-iterative input sample clipping. Using our approach, mean performance over 15 levels of adversarial perturbations is increased by 55.33% for the fast gradient sign method (FGSM) and 6.3% for both the basic iterative method (BIM) and the projected gradient method (PGD).
translated by 谷歌翻译