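We study binary activated neural networks in the context of regression tasks, provide guarantees on the performance of these particular networks, and propose a greedy algorithm for building such networks. Aiming for predictors with small resource needs, the greedy approach does not require fixing the network architecture in advance: the network is built one layer at a time, one neuron at a time, leading to predictors that are not needlessly wide or deep for the given task. Similarly to boosting algorithms, our approach guarantees that the training loss decreases every time a neuron is added to a layer. This differs greatly from most training schemes for binary activated networks, which rely on stochastic gradient descent and circumvent the zero-derivative problem of the binary activation function (its derivative is zero almost everywhere) by using surrogates such as the straight-through estimator or continuous binarization. We show that our method provides compact and sparse predictors while achieving performance similar to that of state-of-the-art methods for training binary activated networks.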
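The loss-reduction guarantee follows from least squares when each new neuron's output weight is refit on the current residual. The sketch below is a hypothetical, simplified illustration and not the authors' algorithm. It builds a single hidden layer in pure Python, and `greedy_binary_layer`, the number of random candidates, and the Gaussian sampling of candidate weights are all assumptions. For a sign neuron with outputs h in {-1, +1}, the optimal output weight is a = <h, r>/n, so the squared error on the residual r drops by <h, r>²/n. Because a = 0 is always available, adding a neuron can never increase the training loss.

```python
import random


def sign_neuron(w, b, x):
    """Binary activation: +1 if w.x + b >= 0, else -1."""
    return 1.0 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1.0


def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


def greedy_binary_layer(X, y, n_neurons=10, n_candidates=50, seed=0):
    """Hypothetical sketch: grow one layer of sign neurons, one neuron at a time.

    Returns (offset, neurons, losses): neurons is a list of (w, b, a) and
    losses[k] is the training MSE after k neurons have been added.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    offset = sum(y) / n                      # start from the constant predictor
    pred = [offset] * n
    neurons, losses = [], [mse(y, pred)]
    for _ in range(n_neurons):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        best = None                          # (|<h, r>|, w, b, h, <h, r>)
        for _ in range(n_candidates):
            w = [rng.gauss(0.0, 1.0) for _ in range(d)]
            b = rng.gauss(0.0, 1.0)
            h = [sign_neuron(w, b, x) for x in X]
            corr = sum(hi * ri for hi, ri in zip(h, resid))
            if best is None or abs(corr) > best[0]:
                best = (abs(corr), w, b, h, corr)
        _, w, b, h, corr = best
        a = corr / n                         # least-squares output weight, since h.h == n
        pred = [pi + a * hi for pi, hi in zip(pred, h)]
        neurons.append((w, b, a))
        losses.append(mse(y, pred))
    return offset, neurons, losses


def predict(offset, neurons, x):
    return offset + sum(a * sign_neuron(w, b, x) for w, b, a in neurons)
```

For example, fitting a step function on `X = [[i / 10] for i in range(-10, 11)]` yields a `losses` list that never increases. Extending this to several layers, and choosing neurons more cleverly than by random search, is where the actual method would differ.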
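In many low- and middle-income countries (LMICs), ultrasound is used to assess pleural effusion. Typically, the extent of the effusion is measured manually by a sonographer, leading to significant intra- and inter-observer variability. In this work, we investigate the use of deep learning (DL) to automate the segmentation of pleural effusion in ultrasound images. On two datasets acquired in an LMIC setting, we obtain median Dice similarity coefficients (DSCs) of 0.82 and 0.74, respectively, using the nnU-Net DL model. We also investigate the use of coordinate convolutions in the DL model. This results in a statistically significant improvement of the median DSC on the first dataset, to 0.85, and no significant change on the second dataset. This work shows for the first time the potential of DL to automate ultrasound-based effusion assessment in LMIC settings, where experienced radiologists to perform such tasks are often lacking.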
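Both technical ingredients of this abstract can be made concrete. The hypothetical sketch below uses pure Python with illustrative function names. It computes the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), between a predicted and a reference binary mask. It also builds the extra input channels used by coordinate convolutions (CoordConv), which append normalized x and y coordinates to the image so that convolutions can condition on position. Scaling the coordinates to [-1, 1] follows the common CoordConv convention; that it matches the authors' nnU-Net configuration is an assumption.

```python
def dice(pred, target):
    """Dice similarity coefficient between two binary masks (lists of 0/1 rows)."""
    inter = sum(p * t for rp, rt in zip(pred, target) for p, t in zip(rp, rt))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 if total == 0 else 2.0 * inter / total  # both empty: perfect agreement


def _scale(k, size):
    """Map index k in [0, size - 1] linearly onto [-1, 1]."""
    return -1.0 + 2.0 * k / (size - 1) if size > 1 else 0.0


def add_coord_channels(image):
    """CoordConv-style input: append x and y coordinate channels in [-1, 1].

    `image` is an H x W list of rows; returns the channels [image, xx, yy].
    """
    h, w = len(image), len(image[0])
    xx = [[_scale(j, w) for j in range(w)] for _ in range(h)]  # varies along columns
    yy = [[_scale(i, h) for _ in range(w)] for i in range(h)]  # varies along rows
    return [image, xx, yy]
```

For example, a prediction covering two pixels that overlaps a one-pixel reference in one pixel gives DSC = 2·1/(2 + 1) ≈ 0.67.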
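In the cultural heritage sector, growing efforts have been made to consider critical sociotechnical perspectives when applying machine learning techniques to digital collections. The cultural heritage community has collectively developed a large body of work on the responsible operation of machine learning in libraries and other cultural heritage institutions at the organizational level. However, little guidance exists specifically for practitioners embarking on machine learning projects. The manifold stakes and sensitivities involved in applying machine learning to cultural heritage underscore the importance of developing such guidelines. This paper contributes to this need by formulating a detailed checklist of guiding questions and practices for developing machine learning projects that use cultural heritage data. I call the resulting checklist the "Collections as ML Data" checklist. Once completed, it can be published along with the project's deliverables. By surveying existing projects, including my own project, Newspaper Navigator, I justify the "Collections as ML Data" checklist and demonstrate how its guiding questions can be adopted and operationalized.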
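The stomatopod (mantis shrimp) visual system has recently provided a blueprint for the design of paradigm-shifting polarization and multispectral imaging sensors, enabling solutions to challenging medical and remote sensing problems. However, these bioinspired sensors lack the high dynamic range (HDR) and asynchronous polarization vision capabilities of the stomatopod visual system, which limits their temporal resolution to ~12 ms and their dynamic range to ~72 dB. Here we present a novel stomatopod-inspired camera that mimics the sustained and transient biological visual pathways to save power and sample data beyond the maximum Nyquist frame rate. This bioinspired sensor simultaneously captures synchronous intensity frames and asynchronous polarization brightness-change information with sub-millisecond latencies over a million-fold range of illumination. Our PDAVIS camera comprises 346x260 pixels, organized in 2x2 macropixels that filter the incoming light with four linear polarization filters offset by 45 degrees. Polarization information is reconstructed using both low-cost, low-latency event-based algorithms and more accurate deep neural networks. Our sensor is used to observe the dynamic properties of single collagen fibers in bovine tendon under rapid cyclical loads.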
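A 2x2 macropixel with linear polarizers at 0°, 45°, 90°, and 135° is classically decoded through the linear Stokes parameters: S0 = (I0 + I45 + I90 + I135)/2, S1 = I0 − I90, S2 = I45 − I135. From these, the degree of linear polarization is DoLP = √(S1² + S2²)/S0 and the angle is AoLP = ½·atan2(S2, S1). The sketch below shows this standard frame-based computation. It is not the paper's event-based or neural reconstruction. The in-macropixel filter arrangement `LAYOUT` and the function names are hypothetical.

```python
import math

# Hypothetical 2x2 filter arrangement (row-major, degrees); the real PDAVIS layout may differ.
LAYOUT = ((0, 45), (135, 90))


def stokes_from_macropixel(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-filtered intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # preference for 0 deg over 90 deg
    s2 = i45 - i135                     # preference for 45 deg over 135 deg
    return s0, s1, s2


def dolp_aolp(i0, i45, i90, i135):
    """Degree (0..1) and angle (radians, in [-pi/2, pi/2]) of linear polarization."""
    s0, s1, s2 = stokes_from_macropixel(i0, i45, i90, i135)
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp


def demosaic(mosaic):
    """Turn an H x W intensity mosaic (H, W even) into a grid of (dolp, aolp) per macropixel."""
    out = []
    for r in range(0, len(mosaic), 2):
        row = []
        for c in range(0, len(mosaic[0]), 2):
            vals = {LAYOUT[dr][dc]: mosaic[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)}
            row.append(dolp_aolp(vals[0], vals[45], vals[90], vals[135]))
        out.append(row)
    return out
```

For the 346x260 sensor, `demosaic` would return a 130-row by 173-column grid of macropixel estimates.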
Time Series Classification (TSC) is an important and challenging problem in data mining. With the increasing availability of time series data, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising, as deep learning has seen very successful applications in recent years. DNNs have indeed revolutionized the field of computer vision, especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide the TSC community with an open-source deep learning framework in which we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we present the most exhaustive study of DNNs for TSC to date.
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we make the following contributions. First, we design a meta-architecture that decouples part features from thing/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term this model Panoptic-PartFormer. Second, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Third, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++. It uses a new part-whole cross-attention scheme, implemented with masked cross-attention, to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with Panoptic-PartFormer, Panoptic-PartFormer++ improves PartPQ by 2% and PWQ by 3% on the Cityscapes PPS dataset, and PartPQ by 5% on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost reduction of 70% in GFLOPs and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
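PartPQ and PWQ are both discussed relative to panoptic quality (PQ). As a reference point, the hypothetical sketch below computes PQ for a single category: PQ = Σ_TP IoU / (|TP| + ½|FP| + ½|FN|), where a predicted segment and a ground-truth segment match when their IoU exceeds 0.5. Segments are represented as sets of pixel indices, and segments within each list are assumed not to overlap, which makes IoU > 0.5 matches unique. The sketch does not implement PartPQ or PWQ, whose exact definitions are not given in the abstract.

```python
def iou(a, b):
    """Intersection over union of two pixel-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred_segments, gt_segments):
    """Single-category PQ; returns None if the category is absent from both lists."""
    matched_pred, tp, iou_sum = set(), 0, 0.0
    for g in gt_segments:
        for pi, p in enumerate(pred_segments):
            if pi in matched_pred:
                continue
            v = iou(g, p)
            if v > 0.5:  # with non-overlapping segments, such a match is unique
                matched_pred.add(pi)
                tp += 1
                iou_sum += v
                break
    fp = len(pred_segments) - tp
    fn = len(gt_segments) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else None  # absent categories are usually skipped
```

With ground truth `[{0, 1, 2, 3}, {10, 11}]` and prediction `[{0, 1, 2}, {20}]`, there is one match (IoU 0.75), one false positive and one false negative, so PQ = 0.75 / 2 = 0.375.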
Vision transformers have emerged as powerful tools for many computer vision tasks. It has been shown that their features and class tokens can be used for salient object segmentation. However, the properties of segmentation transformers remain largely unstudied. In this work, we conduct an in-depth study of the spatial attentions of different backbone layers of semantic segmentation transformers and uncover interesting properties. The spatial attentions of a patch that intersects an object tend to concentrate within the object, whereas the attentions of larger, more uniform image areas instead follow a diffusive behavior. This indicates that vision transformers trained to segment a fixed set of object classes generalize to objects well beyond this set. We exploit this by extracting heatmaps that can be used to segment unknown objects within diverse backgrounds, such as obstacles in traffic scenes. Our method is training-free, and its computational overhead is negligible. We use off-the-shelf transformers trained for street-scene segmentation to process other scene types.
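The contrast between concentrated and diffusive attention can be quantified for a single query patch. The hypothetical sketch below computes one row of scaled dot-product attention and two simple summaries of it. The first is a normalized entropy, which is near 0 when the attention is concentrated on a few patches and near 1 when it is diffuse. The second is the fraction of attention mass that falls inside an object mask. These summaries and all function names are illustrative assumptions, not the heatmap extraction used in the paper.

```python
import math


def attention_row(query, keys):
    """Scaled dot-product attention weights of one query patch over all key patches."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def normalized_entropy(weights):
    """0 = all mass on one patch (concentrated), 1 = uniform (diffuse)."""
    n = len(weights)
    if n < 2:
        return 0.0
    h = -sum(w * math.log(w) for w in weights if w > 0)
    return h / math.log(n)


def mass_inside(weights, mask):
    """Fraction of attention mass on patches flagged True in `mask`."""
    return sum(w for w, inside in zip(weights, mask) if inside)
```

In this framing, a query patch on an object would show low entropy and high `mass_inside` for that object's mask. A patch in a large uniform region would show entropy close to 1.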
Vision transformers (ViTs), which encode an image as a sequence of patches, bring new paradigms to semantic segmentation. We present an efficient framework for representation separation at the local-patch level and the global-region level for semantic segmentation with ViTs. It targets the peculiar over-smoothness of ViTs in semantic segmentation. It therefore differs from currently popular context-modeling paradigms and from most existing related methods, which reinforce the advantage of attention. We first present a decoupled two-pathway network in which an additional pathway enhances and passes down local-patch discrepancy, complementing the global representations of transformers. We then propose a spatially adaptive separation module that yields more separated deep representations. We also propose a discriminative cross-attention that yields more discriminative region representations through novel auxiliary supervision. The proposed methods achieve impressive results: 1) combined with large-scale plain ViTs, our methods achieve new state-of-the-art performance on five widely used benchmarks; 2) using masked pre-trained plain ViTs, we achieve 68.9% mIoU on Pascal Context, setting a new record; 3) pyramid ViTs integrated with the decoupled two-pathway network even surpass well-designed high-resolution ViTs on Cityscapes; 4) the representations improved by our framework transfer favorably to images with natural corruptions. The code will be released publicly.
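The headline numbers are reported as mean intersection-over-union (mIoU). The hypothetical sketch below computes mIoU from flat per-pixel label lists. For each class, IoU = TP / (TP + FP + FN), and the per-class IoUs are averaged over the classes that occur. Benchmark tools typically accumulate these counts over the whole dataset before averaging. The `ignore_index=255` default is a common convention and an assumption here.

```python
def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over classes from flat label lists; pixels with gt == ignore_index are skipped.

    Predictions are assumed to lie in range(num_classes).
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1   # predicted class p where it is absent
            fn[g] += 1   # missed the true class g
    # Classes absent from both prediction and ground truth are left out of the mean.
    ious = [tp[c] / (tp[c] + fp[c] + fn[c])
            for c in range(num_classes) if tp[c] + fp[c] + fn[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2)` averages IoU 1/2 for class 0 and 2/3 for class 1, giving 7/12 ≈ 0.583.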
The following article presents a memetic algorithm that applies deep reinforcement learning (DRL) to solve practically oriented dual-resource-constrained flexible job shop scheduling problems (DRC-FJSSP). In recent years, there has been extensive research on DRL techniques, but without considering realistic, flexible, and human-centered shop floors. A research gap can be identified in make-to-order discontinuous manufacturing, as is often found in medium-sized companies with high service levels. From practical industry projects in this domain, we identify requirements to model flexible machines, human workers and their capabilities, setup and processing operations, material arrival times, complex job paths with parallel tasks for bill of materials (BOM) manufacturing, sequence-dependent setup times, and (partially) automated tasks. On the other hand, intensive research has been done on metaheuristics for the DRC-FJSSP. However, there is a lack of suitable and generic scheduling methods that can be applied holistically to sociotechnical production and assembly processes. In this paper, we first formulate an extended DRC-FJSSP induced by the practical requirements mentioned above. We then present our proposed hybrid framework with parallel computing for multicriteria optimization. Through numerical experiments with real-world data, we confirm that the framework generates feasible schedules efficiently and reliably. Using DRL instead of random operations leads to better results and outperforms traditional approaches.
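Several of the listed requirements, namely workers as a second resource, material arrival times, and sequence-dependent setup times, show up directly when an operation sequence is turned into a timed schedule. The sketch below is a hypothetical, heavily simplified decoder of the kind a memetic or DRL-based search could call to evaluate a candidate sequence. The `Op` fields and the `decode` function are illustrative. So is the assumption that a worker is occupied during both setup and processing. Parallel BOM tasks and automated operations are omitted; none of these choices are taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Op:
    job: str
    machine: str
    worker: str
    kind: str              # product family; drives sequence-dependent setups
    proc: float            # processing time
    release: float = 0.0   # material arrival time


def decode(sequence, setup):
    """Turn an operation sequence into start/end times (append-only, no gap filling).

    `sequence` must list each job's operations in precedence order.
    `setup[(prev_kind, kind)]` gives setup times; missing pairs mean no setup.
    Returns (schedule, makespan) with schedule entries (op, start, end).
    """
    job_ready, mach_free, mach_last, worker_free = {}, {}, {}, {}
    schedule = []
    for op in sequence:
        prev = mach_last.get(op.machine)
        s = setup.get((prev, op.kind), 0.0) if prev is not None else 0.0
        # An operation waits for its job, its material, its machine and its worker.
        start = max(job_ready.get(op.job, 0.0), op.release,
                    mach_free.get(op.machine, 0.0), worker_free.get(op.worker, 0.0))
        end = start + s + op.proc  # assumption: the worker attends setup and processing
        job_ready[op.job] = mach_free[op.machine] = worker_free[op.worker] = end
        mach_last[op.machine] = op.kind
        schedule.append((op, start, end))
    makespan = max((end for _, _, end in schedule), default=0.0)
    return schedule, makespan
```

The sequence order matters even on this toy scale. With a setup of 1 time unit from kind `x` to kind `y`, two orderings of the same three operations can yield makespans of 7 and 9. A search method would explore such orderings and resource assignments under its objective.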
Active learning as a paradigm in deep learning is especially important in applications involving intricate perception tasks, such as object detection, where labels are difficult and expensive to acquire. Developing active learning methods in such fields is computationally very expensive and time-consuming, which obstructs the progress of research and leads to a lack of comparability between methods. In this work, we propose and investigate a sandbox setup for rapid development and transparent evaluation of active learning in deep object detection. Our experiments with commonly used configurations of datasets and detection architectures found in the literature show that results obtained in our sandbox environment are representative of results on standard configurations. The total compute time needed to obtain results and assess the learning behavior can thereby be reduced by factors of up to 14 compared with Pascal VOC and up to 32 compared with BDD100k. This allows data acquisition and labeling strategies to be tested and evaluated in under half a day, and contributes to the transparency and development speed of the field of active learning for object detection.
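The sandbox is meant for quickly testing data acquisition strategies, and one widely used family of such strategies is uncertainty sampling. The hypothetical sketch below scores each unlabeled image by the entropy of its most uncertain detection and selects the top-k images for labeling. The data layout (a list of class-probability vectors per image), the max aggregation, and the function names are assumptions. They are not the strategies evaluated in the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of one detection's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def image_score(detections, aggregate=max):
    """Uncertainty of an image; `detections` is a list of class-probability vectors."""
    if not detections:
        return 0.0
    return aggregate(entropy(p) for p in detections)


def select_batch(pool, k):
    """Return the ids of the k most uncertain images in `pool` (dict: id -> detections)."""
    ranked = sorted(pool, key=lambda image_id: image_score(pool[image_id]), reverse=True)
    return ranked[:k]
```

Swapping `aggregate=max` for a mean, or replacing entropy with another uncertainty measure, gives the kind of strategy variants a fast sandbox makes cheap to compare.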