在这项工作中,我们提出了用于商业产品分类的多模式模型,该模型结合了使用简单的融合技术从Textual(Camembert和Flaubert)和视觉数据(SE-Resnext-50)中提取的功能。所提出的方法显着优于单峰模型的性能以及在我们的特定任务上报告的类似模型的报告。我们进行了多种融合技术的实验,并发现,结合单峰网络的单个嵌入的最佳性能技术是基于结合串联和平均特征向量的方法。每种模式都补充了其他方式的缺点,表明增加模态的数量可能是改善多标签和多模式分类问题的有效方法。
translated by 谷歌翻译
我们使用改进的最小路径Eikonal方程向3D图像引入一种新的对象分割方法。该方法利用隐式约束 - 对eikonal的非均匀最小路径的二阶校正 - 防止相邻的最小路径轨迹无法控制地分歧。所提出的修改大大减少了通过最小路径揭示的表面积,允许使用计算的最小路径设置为近似表面的参数线。它还具有与也推导出真正的最小表面eikonal方程的松散连接。
translated by 谷歌翻译
众所周知,客户师沟通可能是联邦学习中的主要瓶颈。在这项工作中,我们通过一种新颖的客户端采样方案解决了这个问题,我们将允许的客户数量限制为将其更新传达给主节点的数量。在每个通信回合中,所有参与的客户都会计算他们的更新,但只有具有“重要”更新的客户可以与主人通信。我们表明,可以仅使用更新的规范来衡量重要性,并提供一个公式以最佳客户参与。此公式将所有客户参与的完整更新与我们有限的更新(参与客户数量受到限制)之间的距离最小化。此外,我们提供了一种简单的算法,该算法近似于客户参与的最佳公式,该公式仅需要安全的聚合,因此不会损害客户的隐私。我们在理论上和经验上都表明,对于分布式SGD(DSGD)和联合平均(FedAvg),我们的方法的性能可以接近完全参与,并且优于基线,在参与客户均匀地采样的基线。此外,我们的方法与现有的减少通信开销(例如本地方法和通信压缩方法)的现有方法兼容。
translated by 谷歌翻译
现代深度学习模型通常在分布式机器集合中并行培训,以减少训练时间。在这种情况下,机器之间模型更新的通信变成了一个重要的性能瓶颈,并且已经提出了各种有损的压缩技术来减轻此问题。在这项工作中,我们介绍了一种新的,简单但理论上和实践上有效的压缩技术:自然压缩(NC)。我们的技术分别应用于要进行压缩的更新向量的所有条目,并通过随机舍入到两个的(负或正)两种功能,可以通过忽略Mantissa来以“自然”方式计算。我们表明,与没有压缩相比,NC将压缩向量的第二刻增加不超过微小因子$ \ frac {9} {8} $,这意味着NC对流行训练算法的收敛速度的影响,例如分布式SGD,可以忽略不计。但是,NC启用的通信节省是可观的,导致$ 3 $ - $ 4 \ times $ $改善整体理论运行时间。对于需要更具侵略性压缩的应用,我们将NC推广到自然抖动,我们证明这比常见的随机抖动技术要好得多。我们的压缩操作员可以自行使用,也可以与现有操作员结合使用,从而产生更具侵略性的结合效果,并在理论和实践中提供新的最先进。
translated by 谷歌翻译
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
translated by 谷歌翻译
This paper presents a machine learning approach to multidimensional item response theory (MIRT), a class of latent factor models that can be used to model and predict student performance from observed assessment data. Inspired by collaborative filtering, we define a general class of models that includes many MIRT models. We discuss the use of penalized joint maximum likelihood (JML) to estimate individual models and cross-validation to select the best performing model. This model evaluation process can be optimized using batching techniques, such that even sparse large-scale data can be analyzed efficiently. We illustrate our approach with simulated and real data, including an example from a massive open online course (MOOC). The high-dimensional model fit to this large and sparse dataset does not lend itself well to traditional methods of factor interpretation. By analogy to recommender-system applications, we propose an alternative "validation" of the factor model, using auxiliary information about the popularity of items consulted during an open-book exam in the course.
translated by 谷歌翻译
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
translated by 谷歌翻译
The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT). While the first two are reasonably well understood, the third component, whose role is to reduce the number of communication rounds needed to train the model, resisted all attempts at a satisfactory theoretical explanation. Malinovsky et al. (2022) identified four distinct generations of LT methods based on the quality of the provided theoretical communication complexity guarantees. Despite a lot of progress in this area, none of the existing works were able to show that it is theoretically better to employ multiple local gradient-type steps (i.e., to engage in LT) than to rely on a single local gradient-type step only in the important heterogeneous data regime. In a recent breakthrough embodied in their ProxSkip method and its theoretical analysis, Mishchenko et al. (2022) showed that LT indeed leads to provable communication acceleration for arbitrarily heterogeneous data, thus jump-starting the $5^{\rm th}$ generation of LT methods. However, while these latest generation LT methods are compatible with DS, none of them support CS. We resolve this open problem in the affirmative. In order to do so, we had to base our algorithmic development on new algorithmic and theoretical foundations.
translated by 谷歌翻译
Graph clustering is a fundamental problem in unsupervised learning, with numerous applications in computer science and in analysing real-world data. In many real-world applications, we find that the clusters have a significant high-level structure. This is often overlooked in the design and analysis of graph clustering algorithms which make strong simplifying assumptions about the structure of the graph. This thesis addresses the natural question of whether the structure of clusters can be learned efficiently and describes four new algorithmic results for learning such structure in graphs and hypergraphs. All of the presented theoretical results are extensively evaluated on both synthetic and real-word datasets of different domains, including image classification and segmentation, migration networks, co-authorship networks, and natural language processing. These experimental results demonstrate that the newly developed algorithms are practical, effective, and immediately applicable for learning the structure of clusters in real-world data.
translated by 谷歌翻译
Selecting the number of topics in LDA models is considered to be a difficult task, for which alternative approaches have been proposed. The performance of the recently developed singular Bayesian information criterion (sBIC) is evaluated and compared to the performance of alternative model selection criteria. The sBIC is a generalization of the standard BIC that can be implemented to singular statistical models. The comparison is based on Monte Carlo simulations and carried out for several alternative settings, varying with respect to the number of topics, the number of documents and the size of documents in the corpora. Performance is measured using different criteria which take into account the correct number of topics, but also whether the relevant topics from the DGPs are identified. Practical recommendations for LDA model selection in applications are derived.
translated by 谷歌翻译