Data heterogeneity across clients in federated learning (FL) settings is a widely acknowledged challenge. In response, personalized federated learning (PFL) emerged as a framework to curate local models for clients' tasks. In PFL, a common strategy is to develop local and global models jointly: the global model (for generalization) informs the local models, and the local models (for personalization) are aggregated to update the global model. A key observation is that if we can improve the generalization ability of local models, then we can improve the generalization of the global model, which in turn builds better personalized models. In this work, we consider class imbalance, an overlooked type of data heterogeneity, in the classification setting. We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes. FedNH initially distributes class prototypes uniformly in the latent space and then smoothly infuses class semantics into the prototypes. We show that imposing uniformity helps to combat prototype collapse, while infusing class semantics improves local models. Extensive experiments were conducted on popular classification datasets under the cross-device setting. Our results demonstrate the effectiveness and stability of our method compared with recent works.
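The two ingredients named in the abstract above (uniformly spread class prototypes, then a smooth infusion of class semantics) can be illustrated with a minimal sketch. This is not the FedNH implementation: the simplex construction, the moving-average update, and the names `uniform_prototypes`, `smooth_update`, and `rho` are all assumptions made for illustration.

```python
import numpy as np


def uniform_prototypes(num_classes, dim, seed=0):
    """Return (num_classes, dim) unit vectors with equal pairwise cosine.

    Assumption: a simplex-style construction stands in for whatever
    uniformity objective the paper actually optimizes.
    """
    if num_classes < 2 or dim < num_classes:
        raise ValueError("this sketch assumes 2 <= num_classes <= dim")
    rng = np.random.default_rng(seed)
    # Orthonormal columns from the reduced QR of a random Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    # Remove the mean direction so the vectors spread out evenly.
    centering = np.eye(num_classes) - np.ones((num_classes, num_classes)) / num_classes
    protos = (q @ centering).T
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)


def smooth_update(protos, class_means, rho=0.9):
    """Blend aggregated class means into the prototypes, then renormalize.

    Assumption: an exponential moving average with weight `rho` models the
    "smooth infusion" of class semantics.
    """
    mixed = rho * protos + (1.0 - rho) * class_means
    return mixed / np.linalg.norm(mixed, axis=1, keepdims=True)
```

With `rho` close to 1 the prototypes stay near their uniform starting positions; smaller values let the aggregated class means move them faster.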
Translated by Google Translate
Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Since this task takes place in 3D space, a 3D-aware agent can advance its ObjectNav capability by learning from fine-grained spatial information. However, leveraging 3D scene representations can be prohibitively impractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, run simultaneously and use online-fused 3D points as observations. Through extensive experiments, we show that this framework can dramatically improve ObjectNav performance by learning from 3D scene representations. Our framework achieves the best performance among all modular methods on the Matterport3D and Gibson datasets, while requiring up to 30x less computational cost for training.
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still at an early stage. This work presents a new large-scale CNN-based foundation model, termed InternImage, which, like ViTs, benefits from increasing parameters and training data. Unlike recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as its core operator, so that the model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has adaptive spatial aggregation conditioned on input and task information. As a result, InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data, as ViTs do. The effectiveness of our model is demonstrated on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieved a new record of 65.4 mAP on COCO test-dev. The code will be released at https://github.com/OpenGVLab/InternImage.
Breast cancer is one of the common cancers that endanger the health of women globally. Accurate target lesion segmentation is essential for early clinical intervention and postoperative follow-up. Recently, many convolutional neural networks (CNNs) have been proposed to segment breast tumors from ultrasound images. However, the complex ultrasound patterns and the variable shape and size of tumors make accurate segmentation of breast lesions challenging. Motivated by the selective kernel convolution, we introduce an enhanced selective kernel convolution for breast tumor segmentation, which integrates multiple feature map region representations and adaptively recalibrates the weights of these feature map regions along the channel and spatial dimensions. This region recalibration strategy enables the network to focus more on high-contributing region features and mitigates the perturbation from less useful regions. Finally, the enhanced selective kernel convolution is integrated into U-Net with deep supervision constraints to adaptively capture a robust representation of breast tumors. Extensive comparisons with twelve state-of-the-art deep learning segmentation methods on three public breast ultrasound datasets demonstrate that our method achieves more competitive segmentation performance on breast ultrasound images.
Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog. However, they sometimes generate unsupported or misleading content. A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence. To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible. When applied to the output of several state-of-the-art LMs on a diverse set of generation tasks, we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models. Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.
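Graph neural networks (GNNs) have gained momentum in graph representation learning and across a variety of fields, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed the graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of the input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. Within each category, we further divide the applications according to a set of vision tasks. This task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Building on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of representative approaches, and discussions of insights, limitations, and future directions.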
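Increasing the number of layers of on-chip photonic neural networks (PNNs) is essential for improving their model performance. However, successively cascading hidden layers results in a larger integrated photonic chip area. To address this issue, we propose the optical neural ordinary differential equation (ON-ODE) architecture, which parameterizes the continuous dynamics of the hidden layers with an optical ODE solver. The ON-ODE comprises PNNs followed by a photonic integrator and an optical feedback loop, and it can be configured to represent residual neural networks (ResNets) and recurrent neural networks while effectively reducing chip area occupancy. For the interference-based optoelectronic nonlinear hidden layer, numerical experiments show that a single-hidden-layer ON-ODE performs approximately the same as a two-layer optical ResNet in image classification tasks. In addition, the ON-ODE improves the classification accuracy of models with a diffraction-based all-optical linear hidden layer. The time-dependent dynamics of the ON-ODE are further applied to high-accuracy trajectory prediction.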
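The link between an ODE solver with a feedback loop and a ResNet can be seen in a small numerical sketch: a forward-Euler step of size 1 on dh/dt = f(h) is exactly one residual block, and feeding the output back through the same layer repeats the block without adding hardware. The `tanh` layer, the function names, and the choice of Euler integration are illustrative assumptions, not the photonic design described in the abstract.

```python
import numpy as np


def hidden_layer(h, weight):
    # Hypothetical stand-in for the photonic hidden layer f(h).
    return np.tanh(weight @ h)


def euler_integrate(h0, weight, num_steps, step_size):
    """Integrate dh/dt = f(h) with forward Euler, reusing one layer."""
    h = np.asarray(h0, dtype=float)
    for _ in range(num_steps):
        # Feedback loop: the current state is fed back through the same layer.
        h = h + step_size * hidden_layer(h, weight)
    return h


def residual_block(h, weight):
    """A standard residual update: identity plus the layer output."""
    return h + hidden_layer(h, weight)
```

Note that the integrator shares one set of weights across all steps, whereas a conventional multi-layer ResNet would learn separate weights per block; the sketch only shows the structural correspondence.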
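In this paper, we demonstrate that deep-learning-based methods can be used to fuse multi-object densities. Given a scenario with several sensors that may have different fields of view, tracking is performed locally at each sensor by a tracker that produces random finite set multi-object densities. To fuse the outputs of the different trackers, we adapt a recently proposed transformer-based multi-object tracker, where the fusion result is a global multi-object density describing all alive objects at the current time. We compare the performance of the transformer-based fusion method with that of a model-based Bayesian fusion method in several simulated scenarios with different parameter settings, using synthetic data. The simulation results show that the transformer-based fusion method outperforms the model-based Bayesian method in our experimental scenarios.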
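This study proposes a deep-learning-based tracking method for ultrasound (US) image-guided radiation therapy. The proposed cascade deep learning model is composed of an attention network, a mask region-based convolutional neural network (Mask R-CNN), and a long short-term memory (LSTM) network. The attention network maps a US image to a suspected region of landmark motion in order to reduce the search region. The Mask R-CNN then produces multiple region-of-interest (ROI) proposals in the reduced region and identifies the proposed landmark via three network heads: bounding box regression, proposal classification, and landmark segmentation. The LSTM network models the temporal relationship between successive image frames for bounding box regression and proposal classification. To consolidate the final proposal, a selection method is designed based on the similarity between sequential frames. The method was tested on the liver US tracking dataset from the Medical Image Computing and Computer Assisted Interventions (MICCAI) 2015 challenge, in which the landmarks were annotated by three experienced observers to obtain their mean positions. On the 24 given sequences with ground truth, the mean tracking error over all landmarks is 0.65 +/- 0.56 mm, and the errors of all landmarks are within 2 mm. We further tested the proposed model on 69 landmarks from the test dataset, which has image patterns similar to those of the training data, resulting in a mean tracking error of 0.94 +/- 0.83 mm. Our experimental results demonstrate the feasibility and accuracy of the proposed method for tracking liver anatomical landmarks in US images, providing a potential solution for active motion management during radiation therapy.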
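Figures of the form "0.65 +/- 0.56 mm" are typically a mean and standard deviation of per-landmark distances between predicted and annotated positions. The sketch below shows one way such a summary could be computed; it assumes Euclidean distance, coordinates already in millimetres, and a population standard deviation, none of which is confirmed by the abstract, and `tracking_error_mm` is a hypothetical helper name.

```python
import numpy as np


def tracking_error_mm(predicted, annotated):
    """Return (mean, std, max) of per-landmark Euclidean errors.

    Assumption: both inputs are (num_landmarks, num_dims) arrays in mm.
    """
    predicted = np.asarray(predicted, dtype=float)
    annotated = np.asarray(annotated, dtype=float)
    # Distance between each predicted landmark and its annotation.
    errors = np.linalg.norm(predicted - annotated, axis=-1)
    return errors.mean(), errors.std(), errors.max()
```

The maximum is returned alongside the mean and standard deviation because the abstract also reports a bound on the worst-case error.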
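Text-based games (TBGs) are complex environments that allow users or computer agents to interact through text and achieve game goals. Building goal-oriented computer agents for text-based games is challenging, especially when step-wise feedback is the only text input to the model. Moreover, it is hard for agents to handle inputs of flexible length and form when evaluating over a larger text input space. In this paper, we provide an extensive analysis of deep learning methods applied to the field of text-based games.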