The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
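GitHub is the largest host of open source software on the Internet. This large, freely accessible database has attracted the attention of practitioners and researchers alike. However, as GitHub continues to grow, it is becoming increasingly difficult to navigate the large number of repositories, which span a wide range of domains. Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository and reasoning about project quality. In this work, we build on a previously annotated dataset of 5,000 GitHub repositories to design an automated classifier that categorizes repositories by their application domain. The classifier uses state-of-the-art natural language processing techniques and machine learning to learn from multiple data sources and catalogue repositories according to five application domains. We contribute (1) an automated classifier that can assign popular repositories to each application domain with at least 70% precision, (2) an investigation of the approach's performance on less popular repositories, and (3) a practical application of this approach to answer how the adoption of software engineering practices differs across application domains. Our work aims to help the GitHub community identify repositories of interest and opens promising avenues for future work investigating differences between repositories from different application domains.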
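FIB/SEM tomography is an indispensable tool for the characterization of three-dimensional nanostructures in battery research and many other fields. In many cases, however, contrast and 3D classification/reconstruction problems occur that strongly limit the applicability of the technique, especially on porous materials such as those used for electrode materials in batteries or fuel cells. Distinguishing the different components, such as active Li storage particles and carbon/binder materials, is difficult and often prevents a reliable quantitative analysis of the image data, or may even lead to wrong conclusions about structure-property relationships. In this contribution, we present a novel approach for the classification of three-dimensional image data obtained by FIB/SEM tomography and its application to NMC battery electrode materials. We use two different image signals, namely the signal of the angled SE2 chamber detector and the InLens detector signal, combine both signals, and train a random forest, i.e., a particular machine learning algorithm. We demonstrate that this approach can overcome current limitations of existing techniques suitable for multi-phase measurements, and that it allows for quantitative data reconstruction even where current state-of-the-art techniques fail or demand large training sets. This approach may serve as a guideline for future research using FIB/SEM tomography.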
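Stockpiles are essential in the mining value chain, assisting in maximizing value and production. Quality control of the minerals taken from stockpiles is a major concern for stockpile managers, and failure to meet some requirements can lead to financial losses. This problem was recently investigated using a single reclaimer and basic assumptions. This study extends the approach to consider multiple reclaimers preparing short-term and long-term deliveries. The involvement of multiple reclaimers complicates the problem in terms of their interaction when preparing a delivery simultaneously and the safety distance between reclaimers. We also consider more realistic settings, such as handling different minerals with different types of reclaimers. We propose methods that construct a solution step by step so as to meet the precedence constraints of all reclaimers in the stockyard. We study various instances of the problem using greedy algorithms and Ant Colony Optimization (ACO), and propose an integrated local search method for determining an efficient schedule. We fine-tune and compare the algorithms and show that ACO combined with local search can yield efficient solutions.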
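Truss optimization can be formulated as a combinatorial and multi-modal problem in which locating distinct optimal designs allows practitioners to choose the best design based on their preferences. Bilevel optimization has been successfully applied to truss optimization, with topology considered at the upper level and sizing at the lower level. We introduce exact enumeration to rigorously analyze the topology search space and remove randomness for small problems. We also propose novelty-driven binary particle swarm optimization to discover new designs at the upper level by maximizing novelty. For the lower level, we employ a reliable evolutionary optimizer to tackle the layout configuration aspect of the problem. We consider truss optimization problem instances in which designers need to select bar sizes from a discrete set subject to practice code constraints. Our experimental study shows that our approach outperforms current state-of-the-art methods and obtains multiple high-quality solutions.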
Machine learning models are typically evaluated by computing their similarity with reference annotations and trained by maximizing that similarity. Especially in the biomedical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
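As a rough illustration of the inter-rater reliability idea mentioned in this abstract, the sketch below computes the mean pairwise Dice score between several raters' binary masks. It is not the paper's procedure: the choice of Dice as the similarity measure, the flat 0/1 mask representation, and the function names `dice` and `mean_pairwise_dice` are all assumptions made for this example.

```python
from itertools import combinations


def dice(a, b):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_pairwise_dice(annotations):
    """Average Dice over all rater pairs (hypothetical reliability estimate)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotations")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Toy example: three raters annotating the same six-pixel image.
rater_a = [1, 1, 1, 0, 0, 0]
rater_b = [1, 1, 0, 0, 0, 0]
rater_c = [1, 1, 1, 1, 0, 0]
reliability = mean_pairwise_dice([rater_a, rater_b, rater_c])
print(f"mean pairwise Dice: {reliability:.3f}")
```

Under the abstract's framing, a model whose similarity to one reference annotation clearly exceeds such an inter-rater score may be fitting that rater's idiosyncrasies, so the extra similarity need not reflect better real-world performance.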
In recent years, several metrics have been developed for evaluating group fairness of rankings. Given that these metrics were developed with different application contexts and ranking algorithms in mind, it is not obvious which metric to choose for a given scenario. In this paper, we perform a comprehensive comparative analysis of existing group fairness metrics developed in the context of fair ranking. Because of their diverse application contexts, we argue that such a comparative analysis is not straightforward. Hence, we take an axiomatic approach whereby we design a set of thirteen properties for group fairness metrics that consider different ranking settings. A metric can then be selected depending on whether it satisfies all or a subset of these properties. We apply these properties to eleven existing group fairness metrics, and through both empirical and theoretical results we demonstrate that most of these metrics satisfy only a small subset of the proposed properties. These findings highlight limitations of existing metrics and provide insights into how to evaluate and interpret different fairness metrics in practical deployment. The proposed properties can also assist practitioners in selecting appropriate metrics for evaluating fairness in a specific application.
Partial differential equations (PDEs) are important tools to model physical systems, and incorporating them into machine learning models is an important way of including physical knowledge. Given any system of linear PDEs with constant coefficients, we propose a family of Gaussian process (GP) priors, which we call EPGP, such that all realizations are exact solutions of this system. We apply the Ehrenpreis-Palamodov fundamental principle, which works like a non-linear Fourier transform, to construct GP kernels mirroring standard spectral methods for GPs. Our approach can infer probable solutions of linear PDE systems from any data, such as noisy measurements or initial and boundary conditions. Constructing EPGP priors is algorithmic, generally applicable, and comes with a sparse version (S-EPGP) that learns the relevant spectral frequencies and works better for big data sets. We demonstrate our approach on three families of systems of PDEs (the heat equation, the wave equation, and Maxwell's equations), where we improve upon the state of the art in computation time and precision, in some experiments by several orders of magnitude.
Classically, the development of humanoid robots has been sequential and iterative. Such bottom-up design procedures rely heavily on intuition and are often biased by the designer's experience. Exploiting the non-linear coupled design space of robots is non-trivial and requires a systematic procedure for exploration. We adopt the top-down design strategy, the V-model, used in the automotive and aerospace industries. Our co-design approach identifies non-intuitive designs from within the design space and obtains the maximum permissible range of the design variables as a solution space, to physically realise the obtained design. We show that by constructing the solution space, one can (1) decompose higher-level requirements into sub-system-level requirements with tolerance, alleviating the "chicken-or-egg" problem during the design process, (2) decouple the robot's morphology from its controller, enabling greater design flexibility, and (3) obtain independent sub-system-level requirements, reducing the development time by parallelising the development process.
Recent diffusion-based AI art platforms are able to create impressive images from simple text descriptions. This makes them powerful tools for concept design in any discipline that requires creativity in visual design tasks. This is also true for the early stages of architectural design, which involve multiple stages of ideation, sketching and modelling. In this paper, we investigate how applicable diffusion-based models already are to these tasks. We examine the applicability of the platforms Midjourney, DALL-E 2 and StableDiffusion to a series of common use cases in architectural design to determine which are already solvable or might soon be. We also study how the platforms are already being used by analyzing a data set of 40 million Midjourney queries with NLP methods to extract common usage patterns. With these insights, we derive a workflow for interior and exterior design that combines the strengths of the individual platforms.