Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models via conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud rendering and a novel 2D-to-3D label lifting algorithm. We also utilize multi-view 3D priors and few-shot prompt tuning to boost performance significantly. Extensive evaluation on PartNet and PartNet-Mobility datasets shows that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to iPhone-scanned point clouds without significant domain gaps.
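The abstract does not spell out the lifting step; as a rough, hedged illustration of what multi-view 2D-to-3D label lifting can look like in general (not the paper's exact algorithm), the sketch below projects each point into every rendered view with known camera parameters and lets the 2D part boxes vote for a per-point label. All function names, data layouts, and the plain box-voting rule are assumptions.

```python
import numpy as np

def lift_2d_boxes_to_points(points, views, num_labels):
    """Assign a part label to each 3D point by voting over multi-view 2D detections.

    points : (N, 3) array of point-cloud coordinates.
    views  : list of dicts with keys
             'K'     : (3, 3) camera intrinsics,
             'T'     : (4, 4) world-to-camera extrinsics,
             'boxes' : list of (label, xmin, ymin, xmax, ymax) 2D detections.
    Returns an (N,) array of label ids (-1 where no detection covered the point).
    """
    votes = np.zeros((points.shape[0], num_labels))
    homog = np.hstack([points, np.ones((points.shape[0], 1))])       # (N, 4)

    for view in views:
        cam = (view['T'] @ homog.T).T[:, :3]                         # points in camera frame
        in_front = cam[:, 2] > 1e-6                                  # ignore points behind the camera
        pix = (view['K'] @ cam.T).T
        pix = pix[:, :2] / np.clip(pix[:, 2:3], 1e-6, None)          # perspective divide -> (u, v)

        for label, xmin, ymin, xmax, ymax in view['boxes']:
            inside = (in_front
                      & (pix[:, 0] >= xmin) & (pix[:, 0] <= xmax)
                      & (pix[:, 1] >= ymin) & (pix[:, 1] <= ymax))
            votes[inside, label] += 1.0                              # one vote per covering box

    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1                              # no evidence from any view
    return labels
```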
Fairness testing aims to mitigate unintended discrimination in the decision-making process of data-driven AI systems. Individual discrimination may occur when an AI model makes different decisions for two distinct individuals who differ only in protected attributes such as age and race. Such instances reveal biased AI behavior and are known as Individual Discriminatory Instances (IDIs). In this paper, we propose an approach for selecting the initial seeds used to generate IDIs for fairness testing. Prior studies mainly used random initial seeds for this purpose. However, this stage is crucial, as the seeds form the basis of subsequent IDI generation. We call our proposed seed selection approach I&D. It produces a large number of initial IDIs exhibiting great diversity, with the aim of improving the overall performance of fairness testing. Our empirical study shows that I&D produces more IDIs than four state-of-the-art seed generation approaches, generating 1.68x more IDIs on average. Moreover, we compare the use of I&D in training machine learning models and find that, compared to the state of the art, using I&D reduces the number of remaining IDIs by 29%, indicating that I&D effectively improves model fairness.
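The abstract leaves the IDI definition implicit; the minimal sketch below spells out the standard check for a tabular classifier: two inputs that differ only in a protected attribute yet receive different predictions form one such instance. The model interface (`predict`) and attribute indexing are assumptions, not details of I&D.

```python
def is_idi(model, x, protected_idx, protected_values):
    """Return True if perturbing only the protected attribute of `x` flips the prediction.

    model            : any object with a `predict(list_of_rows) -> list_of_labels` method.
    x                : one input row (list of feature values).
    protected_idx    : column index of the protected attribute (e.g., race or age group).
    protected_values : all admissible values of that attribute.
    """
    base = model.predict([x])[0]
    for value in protected_values:
        if value == x[protected_idx]:
            continue
        twin = list(x)
        twin[protected_idx] = value          # identical individual, different protected value
        if model.predict([twin])[0] != base:
            return True                      # (x, twin) is an individual discriminatory instance
    return False
```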
The existence of multiple load-solution mappings of the non-convex AC-OPF problem poses a fundamental challenge to deep neural network (DNN) schemes. Since the training dataset may contain a mixture of data points corresponding to different load-solution mappings, the DNN may fail to learn a legitimate mapping and generate inferior solutions. We propose DeepOPF-AL as an augmented-learning approach to tackle this issue. The idea is to train a DNN to learn a unique mapping from an augmented input, i.e., (load, initial point), to the solution generated by an iterative OPF solver with the load and initial point as intake. We then apply the learned augmented mapping to solve AC-OPF problems much faster than conventional solvers. Simulation results over IEEE test cases show that, compared with a recent DNN scheme, DeepOPF-AL achieves noticeably better optimality and similar feasibility and speedup performance, at the cost of higher training complexity for the same DNN size.
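To make the augmented mapping concrete, here is a minimal sketch that trains a generic MLP on (load, initial point) → solver-solution pairs; the network size, data shapes, and solver interface are placeholders rather than details from DeepOPF-AL.

```python
import torch
import torch.nn as nn

class AugmentedOPFNet(nn.Module):
    """MLP that maps an augmented input (load, initial point) to an OPF solution."""

    def __init__(self, load_dim, init_dim, sol_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(load_dim + init_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, sol_dim),
        )

    def forward(self, load, init_point):
        # Concatenating the initial point disambiguates which of the multiple
        # load-solution mappings the iterative solver would have converged to.
        return self.net(torch.cat([load, init_point], dim=-1))

def train_step(model, optimizer, load, init_point, solver_solution):
    """One supervised step against solutions produced by an iterative OPF solver."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(load, init_point), solver_solution)
    loss.backward()
    optimizer.step()
    return loss.item()
```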
To achieve the goal of powerful artificial intelligence that can mimic human intelligence, AI systems should be able to adapt to ever-changing scenarios and learn new knowledge continuously without forgetting previously acquired knowledge. When a machine learning model is trained sequentially on multiple tasks, its performance on previously learned tasks may drop dramatically while it learns newly seen tasks. To avoid this phenomenon, known as catastrophic forgetting, continual learning (also called lifelong learning) has been proposed and has become one of the most active research areas in machine learning. In recent years, with the blooming of quantum machine learning, it is of interest to develop quantum continual learning. This paper focuses on the case of quantum models for quantum data, where both the computational model and the data to be processed are quantum. The gradient episodic memory method is incorporated to design a quantum continual learning scheme that overcomes catastrophic forgetting and achieves backward knowledge transfer. Specifically, a sequence of quantum state classification tasks is continually learned by a variational quantum classifier whose parameters are optimized by a classical gradient-based optimizer. The gradient of the current task is projected onto the closest gradient that avoids increasing the loss on previous tasks but allows it to decrease. Numerical simulation results show that our scheme not only overcomes catastrophic forgetting but also achieves backward knowledge transfer, meaning that the classifier's performance on previous tasks is enhanced rather than compromised when learning new tasks.
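The projection step described above follows the gradient episodic memory idea; a simplified classical sketch with a single stored task is shown below (the full scheme enforces the non-increase condition for every previous task, typically via a small quadratic program). The single-memory simplification and variable names are assumptions.

```python
import numpy as np

def project_gradient(g, g_prev):
    """Project the current-task gradient so it does not increase the loss
    of a previous task (single-memory simplification of gradient episodic memory).

    g      : flattened gradient of the loss on the current task.
    g_prev : flattened gradient of the loss on stored data from the previous task.
    """
    dot = float(np.dot(g, g_prev))
    if dot >= 0.0:
        return g                      # already non-conflicting: keep the update as-is
    # Remove the conflicting component so that <g_proj, g_prev> = 0.
    return g - (dot / float(np.dot(g_prev, g_prev))) * g_prev
```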
Ensuring solution feasibility is a key challenge in developing deep neural network (DNN) schemes for solving constrained optimization problems, owing to inherent DNN prediction errors. In this paper, we propose a "preventive learning" framework to systematically guarantee DNN solution feasibility for problems with convex constraints and general objective functions. We first apply a predict-and-reconstruct design that not only guarantees the equality constraints but also exploits them to reduce the number of variables the DNN needs to predict. Then, as a key methodological contribution, we systematically calibrate the inequality constraints used in DNN training, thereby anticipating prediction errors and ensuring that the resulting solutions remain feasible. We characterize the calibration magnitude and a DNN size sufficient to ensure universal feasibility. We propose a new adversarial-sample-aware training algorithm to improve the DNN's optimality performance without sacrificing the feasibility guarantee. Overall, the framework provides two DNNs: the first, with the characterized sufficient size, guarantees universal feasibility, while the other, obtained from the proposed training algorithm, further improves optimality while maintaining the DNN's universal feasibility. We apply the preventive learning framework to develop DeepOPF+ for solving the essential DC optimal power flow problem in grid operation. It improves over existing DNN-based schemes in ensuring feasibility and attaining consistent desirable speedup performance in both light-load and heavy-load regimes. Simulation results over the IEEE Case-30/118/300 test cases show that DeepOPF+ generates 100% feasible solutions with less than 0.5% optimality loss and up to two orders of magnitude computational speedup, compared with state-of-the-art iterative solvers.
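One way to read the inequality-constraint calibration is as tightening each bound by a margin before it is used in training, so that a bounded prediction error still leaves the original constraint satisfied. The sketch below illustrates this reading for simple upper bounds; the margin choice and penalty form are assumptions, not the paper's exact calibration procedure.

```python
import numpy as np

def calibrate_bounds(upper_bounds, margins):
    """Tighten inequality upper bounds g(x) <= b into g(x) <= b - margin, so that
    a DNN prediction error of up to `margin` still keeps g(x) <= b satisfied."""
    return np.asarray(upper_bounds, dtype=float) - np.asarray(margins, dtype=float)

def violation_penalty(constraint_values, calibrated_bounds):
    """Penalty term added to the training loss: total violation of the tightened bounds."""
    return float(np.sum(np.maximum(constraint_values - calibrated_bounds, 0.0)))
```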
High penetration of renewable energy generation poses significant uncertainty to power systems. It requires grid operators to solve alternating-current optimal power flow (AC-OPF) problems more frequently for economical and reliable operation of transmission and distribution grids. In this paper, we develop a deep neural network (DNN) approach, called DeepOPF, for solving AC-OPF problems in a fraction of the time used by conventional solvers. A key difficulty in applying machine learning techniques to AC-OPF problems lies in ensuring that the obtained solutions respect the equality and inequality physical and operational constraints. Generalizing the 2-stage procedure in [1], [2], DeepOPF first trains a DNN model to predict a set of independent operating variables and then directly computes the remaining dependent variables by solving the power flow equations. This approach not only preserves the power-balance equality constraints but also reduces the number of variables the DNN needs to predict, cutting down the number of neurons and training samples required. DeepOPF then employs a penalty approach with a zeroth-order gradient estimation technique in the training process to preserve the remaining inequality constraints. As another contribution, we derive a condition for tuning the size of the DNN according to the desired approximation accuracy, which measures the generalization capability of the DNN. It provides theoretical justification for using DNNs to solve AC-OPF problems. Simulation results for IEEE 30/118/300-bus and a synthetic 2000-bus test case show that DeepOPF speeds up computation time by up to two orders of magnitude compared with a state-of-the-art solver, at the expense of less than 0.1% cost difference.
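A minimal sketch of the two-stage predict-and-reconstruct idea follows: the DNN predicts only the independent operating variables from the load, and the dependent variables are recovered by a conventional power-flow solve. The layer sizes are arbitrary and `power_flow_solver` is a placeholder for an external numerical routine, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DeepOPFStyleNet(nn.Module):
    """Predict only the independent operating variables from the load profile."""

    def __init__(self, load_dim, indep_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(load_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, indep_dim), nn.Sigmoid(),   # later rescaled to operating ranges
        )

    def forward(self, load):
        return self.net(load)

def reconstruct_dependent(indep_vars, load, power_flow_solver):
    """Recover the remaining (dependent) variables by solving the power-flow equations.
    `power_flow_solver` stands in for a conventional numerical routine."""
    return power_flow_solver(indep_vars, load)
```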
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
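The abstract only states that alignment happens implicitly by encoding 3D points into the tokens; the hedged sketch below shows one generic way such an encoding could work, adding a shared coordinate-based positional embedding to both image and point tokens before a transformer consumes them. The encoder architecture and names are assumptions, not CMT's actual design.

```python
import torch
import torch.nn as nn

class CoordPositionEncoder(nn.Module):
    """Map 3D coordinates to token-sized embeddings shared by both modalities."""

    def __init__(self, embed_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, embed_dim), nn.ReLU(),
                                 nn.Linear(embed_dim, embed_dim))

    def forward(self, coords):              # coords: (num_tokens, 3)
        return self.mlp(coords)

def align_tokens(image_tokens, image_coords, point_tokens, point_coords, encoder):
    """Add coordinate-derived embeddings so a transformer can attend across modalities
    without any explicit view transformation."""
    image_tokens = image_tokens + encoder(image_coords)   # e.g., pixel rays lifted to 3D
    point_tokens = point_tokens + encoder(point_coords)
    return torch.cat([image_tokens, point_tokens], dim=0)
```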
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
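For concreteness, a hedged sketch of the simpler NAIVEATTACK-style poisoning, stamping a fixed trigger patch onto a fraction of the raw images and relabeling them to the target class before distillation runs, is given below. The patch size, poison rate, and array layout are assumptions, not the paper's configuration.

```python
import numpy as np

def stamp_trigger(images, labels, target_label, poison_rate=0.05, patch_size=4, rng=None):
    """Add a white square trigger to a random subset of images and relabel them.

    images : (N, H, W, C) float array in [0, 1];  labels : (N,) int array.
    Returns poisoned copies; distillation is then run on the poisoned dataset.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
    images[idx, -patch_size:, -patch_size:, :] = 1.0     # bottom-right white patch as the trigger
    labels[idx] = target_label                           # backdoor target class
    return images, labels
```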
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., boosting nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
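A hedged sketch of the first ingredient, mask-based dynamic class centers used to re-weight query features: masked average pooling over support features yields one center per support image, and each query location is modulated by its similarity to those centers. Shapes, the cosine-similarity weighting, and names are illustrative, not the paper's code.

```python
import torch.nn.functional as F

def masked_class_centers(support_feats, support_masks):
    """support_feats: (K, C, H, W); support_masks: (K, H, W) binary masks.
    Returns one center per support image via masked average pooling -> (K, C)."""
    masks = support_masks.unsqueeze(1).float()                       # (K, 1, H, W)
    summed = (support_feats * masks).sum(dim=(2, 3))                 # (K, C)
    return summed / masks.sum(dim=(2, 3)).clamp(min=1e-6)

def reweight_query(query_feats, centers):
    """query_feats: (C, H, W). Scale each location by its best cosine similarity to a center."""
    C, H, W = query_feats.shape
    q = F.normalize(query_feats.view(C, -1), dim=0)                  # (C, H*W)
    c = F.normalize(centers, dim=1)                                  # (K, C)
    sim = (c @ q).max(dim=0).values.view(1, H, W)                    # (1, H, W), values in [-1, 1]
    return query_feats * (1.0 + sim)                                 # soft re-weighting
```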
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when instantiations share the same framework. Motivated by this observation, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN- and Transformer-based models, while trading off model accuracy and efficiency well.
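A hedged PyTorch sketch of what an inverted-residual block mixing a depthwise convolution (short-distance modeling) with lightweight self-attention (long-distance modeling) can look like; the actual iRMB layout may differ, and the expansion ratio and head count here are assumptions.

```python
import torch
import torch.nn as nn

class IRMBSketch(nn.Module):
    """Inverted residual block mixing depthwise conv (local) and self-attention (global)."""

    def __init__(self, dim, expand=4, heads=4):
        super().__init__()
        hidden = dim * expand
        self.expand = nn.Conv2d(dim, hidden, 1)                                  # 1x1 expansion
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)     # depthwise 3x3
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.project = nn.Conv2d(hidden, dim, 1)                                 # 1x1 projection back
        self.act = nn.GELU()

    def forward(self, x):                                  # x: (B, dim, H, W)
        b, _, h, w = x.shape
        y = self.act(self.expand(x))
        y = self.act(self.dwconv(y))                       # short-distance dependency
        tokens = y.flatten(2).transpose(1, 2)              # (B, H*W, hidden)
        tokens, _ = self.attn(tokens, tokens, tokens)      # long-distance interaction
        y = tokens.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.project(y)                         # residual connection
```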