近年来,Experts(MOE)的混合物已成为一种有前途的深度学习技术,可以将模型能力扩展为万亿多个参数,同时通过稀疏计算降低计算成本。虽然MoE开设了一个非常大的模型的新领域,但由于MOE的动态性质与系统的静态平行性/管道层之间的不匹配,因此其数以千计的GPU的实现受到限制。我们提出了Tutel,这是一种具有动态自适应并行性和管道的高度可扩展的堆栈设计和实现。 TUTEL在运行时提供自适应并行性切换和自适应管道,分别达到1.74倍和2.00倍的单MOE层加速度。我们还提出了一种用于MOE通信速度的新颖的二维层次结构算法,该算法的表现超过了2,048 GPU的先前最先前的最新时间。 Tutel汇总了所有技术,最终在16 GPU和2,048 GPU上分别提供了4.96倍和5.75倍的加速度,分别通过Fairseq:Meta的Facebook AI AI研究序列到序列工具Kit(Tutel(Tutel)(Tutel)(Tutel)(现在由Fairseq部分采用)。 Tutel源代码可在公共场所获得:https://github.com/microsoft/tutel。我们的评估表明,Tutel有效,有效地运行了一个基于现实的MOE模型,名为Swinv2-Moe,建立在Swin Transformer V2上,这是一种最先进的计算机视觉体系结构。在效率方面,Tutel加速了Swinv2-MoE,在FairSeq的训练和推理中分别达到1.55倍和2.11倍的速度。关于有效性,SWINV2-MOE模型在预训练和下游计算机视觉任务(例如可可对象检测)方面都比对应的密度密度模型都达到了卓越的精度,这表明Tutel准备对端到端现实世界模型训练的准备就绪和推理。 Swinv2-Moe在https://github.com/microsoft/swin-transformer中开放。
translated by 谷歌翻译
迭代线性二次调节器(ILQR)在解决非线性系统模型的轨迹优化问题方面已广泛普及。但是,作为一种基于模型的拍摄方法,它在很大程度上依赖于准确的系统模型来更新最佳控制动作和通过正向集成确定的轨迹,从而变得容易受到不可避免的模型的影响。最近,针对最佳控制问题的基于学习的方法进行的大量研究工作在解决未知系统模型方面已经取得了显着发展,尤其是当系统与环境具有复杂的相互作用时。然而,通常需要一个深层的神经网络来拟合大量的采样数据。在这项工作中,我们提出了神经-ILQR,这是一种在不受约束的控制空间上进行学习的拍摄方法,其中使用简单结构的神经网络代表局部系统模型。在此框架中,通过同时完善最佳策略和神经网络迭代,可以实现轨迹优化任务,而无需依靠系统模型的先验知识。通过对两项说明性控制任务的全面评估,在系统模型中存在不准确性的情况下,提出的方法显示出胜过常规ILQR。
translated by 谷歌翻译
在自主驾驶的背景下,已知迭代线性二次调节器(ILQR)是在运动计划问题中处理非线性车辆模型的有效方法。特别是,受约束的ILQR算法在不同类型的一般限制下实现运动计划任务方面表现出了值得注意的计算效率结果。但是,受约束的ILQR方法需要在使用对数屏障函数时在第一次迭代时作为先决条件进行可行的轨迹。同样,该方法为纳入快速,高效和有效的优化方法开辟了可能性,以进一步加快优化过程,从而可以成功地满足实时实施的要求。在本文中,定义明确的运动计划问题是在非线性车辆动力学和各种约束下提出的,并利用了乘数的交替方向方法来确定利用ILQR的最佳控制动作。该方法能够在第一次迭代时规避轨迹的可行性要求。然后研究了自动驾驶汽车运动计划的说明性示例。拟议的开发实现了高度计算效率的值得注意的成就。与基于对数屏障函数的约束ILQR算法进行比较,我们提出的方法在三种驾驶场景中,平均计算时间降低了31.93%,38.52%和44.57%;与优化求解器IPOPT相比,我们提出的方法将平均计算时间降低了46.02%,53.26%和88.43%。结果,可以通过我们提出的框架实现实时计算和实施,因此它为公路驾驶任务提供了额外的安全性。
translated by 谷歌翻译
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
translated by 谷歌翻译
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection on a single ranking within a set of ranking or pairwise comparisons of a ranking under the $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm to generate synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in the downstream tasks, including the inference attack and the personalized ranking tasks, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
translated by 谷歌翻译
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
translated by 谷歌翻译
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model. To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
translated by 谷歌翻译
Brain midline shift (MLS) is one of the most critical factors to be considered for clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods on MLS quantification not only require intensive labeling in millimeter-level measurement but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate the MLS measurement task as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a great number of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how the image is different from a non-MLS image and regularization serves an important role in the sparse-to-dense refinement of the deformation field. Our experiment on a real clinical brain hemorrhage dataset has achieved state-of-the-art performance and can generate interpretable deformation fields.
translated by 谷歌翻译
It is crucial to evaluate the quality and determine the optimal number of clusters in cluster analysis. In this paper, the multi-granularity characterization of the data set is carried out to obtain the hyper-balls. The cluster internal evaluation index based on hyper-balls(HCVI) is defined. Moreover, a general method for determining the optimal number of clusters based on HCVI is proposed. The proposed methods can evaluate the clustering results produced by the several classic methods and determine the optimal cluster number for data sets containing noises and clusters with arbitrary shapes. The experimental results on synthetic and real data sets indicate that the new index outperforms existing ones.
translated by 谷歌翻译
Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderleherer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks and avoids inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamic of gradient flow, the proposed model naturally stacks residual network blocks one-by-one, reducing the memory load and difficulty of performing end-to-end training of deep flow networks. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with existing flow and diffusion models at a significantly reduced computational and memory cost.
translated by 谷歌翻译