飞机行业不断努力在人类的努力,计算时间和资源消耗方面寻求更有效的设计优化方法。当替代模型和最终过渡到HF模型的开关机制均被正确校准时,混合替代物优化保持了高效果,同时提供快速的设计评估。前馈神经网络(FNN)可以捕获高度非线性输入输出映射,从而为飞机绩效因素提供有效的替代物。但是,FNN通常无法概括分布(OOD)样本,这阻碍了它们在关键飞机设计优化中的采用。通过Smood,我们基于平滑度的分布检测方法,我们建议用优化的FNN替代物来编码一个依赖模型的OOD指标,以产生具有选择性但可信度的预测的值得信赖的替代模型。与常规的不确定性接地方法不同,Smood利用了HF模拟的固有平滑性特性,可以通过揭示其可疑敏感性有效地暴露OOD,从而避免对OOD样品的过度自信不确定性估计。通过使用SMOOD,仅将高风险的OOD输入转发到HF模型以进行重新评估,从而以低开销成本获得更准确的结果。研究了三个飞机性能模型。结果表明,基于FNN的代理在预测性能方面优于其高斯过程。此外,在所有研究案例中,Smood的确覆盖了85%的实际OOD。当Smood Plus FNN替代物被部署在混合替代优化设置中时,它们的错误率分别降低了34.65%和计算速度的降低率分别为58.36次。
translated by 谷歌翻译
在飞机系统绩效评估的背景下,深度学习技术可以快速从实验测量中推断模型,其详细的系统知识比基于物理的建模通常所需的详细知识。但是,这种廉价的模型开发也带来了有关模型可信度的新挑战。这项工作提出了一种新颖的方法,即物理学引导的对抗机学习(ML),从而提高了对模型物理一致性的信心。首先,该方法执行了物理引导的对抗测试阶段,以搜索测试输入,以显示行为系统不一致,同时仍落在可预见的操作条件范围内。然后,它进行了物理知识的对抗训练,以通过迭代降低先前未经证实的反描述的不需要的输出偏差来教授与系统相关的物理领域的模型。对两个飞机系统绩效模型的经验评估显示了我们对抗性ML方法在暴露两种模型的身体不一致方面的有效性,并提高其与物理领域知识一致的倾向。
translated by 谷歌翻译
量化是在嵌入式系统或手机上部署训练有素的DNN模型时,是最应用的深神经网络(DNN)压缩策略之一。这是由于其对广泛的应用和情况的简单性和适应性,而不是特定的人工智能(AI)加速器和编译器,这些加速器和编译器通常仅用于某些特定的硬件(例如Google Coral Edge TPU)。随着对量化的需求不断增长,确保该策略的可靠性成为一个关键挑战。传统的测试方法收集越来越多的真实数据以进行更好的评估,通常是不切实际的,因为输入空间的尺寸很大,并且原始DNN及其量化的对应物之间的相似性很高。结果,高级评估策略已变得至关重要。在本文中,我们提出了Diverget,这是一个基于搜索的测试框架,用于量化评估。 Diverget定义了变质关系的空间,该空间模拟了输入上的自然扭曲。然后,它最佳地探索了这些关系,以揭示不同算术精度的DNN之间的分歧。我们评估了应用于高光谱遥感图像的最先进的DNN上的Diverget的性能。我们选择了遥感DNN,因为它们越来越多地部署在诸如气候变化研究和天文学之类的关键领域中的边缘(例如,高级无人机)。我们的结果表明,Diverget成功地挑战了已建立的量化技术的鲁棒性,以防止自然变化的数据,并胜过其最新的并发,Diffchaser,其成功率(平均)是四倍。
translated by 谷歌翻译
在各个领域采用深度学习(DL)的行业和学术界都有日益增长的需求,以解决现实世界的问题。深度加强学习(DRL)是DL在加固学习领域(RL)的应用。与任何软件系统一样,由于其程序中的故障,DRL应用程序可能会失败。在本文中,我们介绍了第一次尝试在DRL程序中分类故障。我们手动分析了使用众所周知的DRL框架(Openai健身房,多巴胺,Keras-RL,TensoRForce)和开发人员/用户报告的错误开发的DRL程序的761个文物(来自Stack Overflow帖子和GitHub问题)。我们通过几轮讨论标记和分类为已识别的故障。使用与19名开发人员/研究人员的在线调查验证了产生的分类法。为了允许在DRL程序中自动检测故障,我们已经确定了DRL程序的元模型,并开发了DRLINTER,一种利用静态分析和图形转换的基于模型的故障检测方法。 DRLINTINT的执行流程在于解析DRL程序,以生成符合我们元模型的模型,并在模型上应用检测规则以识别故障出现。使用15种合成DRLPRAGIONS评估DRLINTER的有效性,其中我们在分析的分析伪影中观察到的故障。结果表明,Drlinter可以在所有合成错误程序中成功检测故障。
translated by 谷歌翻译
Extracting complex structures from grid-based data is a common key step in automated medical image analysis. The conventional solution to recovering tree-structured geometries typically involves computing the minimal cost path through intermediate representations derived from segmentation masks. However, this methodology has significant limitations in the context of projective imaging of tree-structured 3D anatomical data such as coronary arteries, since there are often overlapping branches in the 2D projection. In this work, we propose a novel approach to predicting tree connectivity structure which reformulates the task as an optimization problem over individual steps of a recursive process. We design and train a two-stage model which leverages the UNet and Transformer architectures and introduces an image-based prompting technique. Our proposed method achieves compelling results on a pair of synthetic datasets, and outperforms a shortest-path baseline.
translated by 谷歌翻译
Curriculum learning and self-paced learning are the training strategies that gradually feed the samples from easy to more complex. They have captivated increasing attention due to their excellent performance in robotic vision. Most recent works focus on designing curricula based on difficulty levels in input samples or smoothing the feature maps. However, smoothing labels to control the learning utility in a curriculum manner is still unexplored. In this work, we design a paced curriculum by label smoothing (P-CBLS) using paced learning with uniform label smoothing (ULS) for classification tasks and fuse uniform and spatially varying label smoothing (SVLS) for semantic segmentation tasks in a curriculum manner. In ULS and SVLS, a bigger smoothing factor value enforces a heavy smoothing penalty in the true label and limits learning less information. Therefore, we design the curriculum by label smoothing (CBLS). We set a bigger smoothing value at the beginning of training and gradually decreased it to zero to control the model learning utility from lower to higher. We also designed a confidence-aware pacing function and combined it with our CBLS to investigate the benefits of various curricula. The proposed techniques are validated on four robotic surgery datasets of multi-class, multi-label classification, captioning, and segmentation tasks. We also investigate the robustness of our method by corrupting validation data into different severity levels. Our extensive analysis shows that the proposed method improves prediction accuracy and robustness.
translated by 谷歌翻译
Temporal reasoning is the task of predicting temporal relations of event pairs with corresponding contexts. While some temporal reasoning models perform reasonably well on in-domain benchmarks, we have little idea of the systems' generalizability due to existing datasets' limitations. In this work, we introduce a novel task named TODAY that bridges this gap with temporal differential analysis, which as the name suggests, evaluates if systems can correctly understand the effect of incremental changes. Specifically, TODAY makes slight context changes for given event pairs, and systems need to tell how this subtle contextual change will affect temporal relation distributions. To facilitate learning, TODAY also annotates human explanations. We show that existing models, including GPT-3, drop to random guessing on TODAY, suggesting that they heavily rely on spurious information rather than proper reasoning for temporal predictions. On the other hand, we show that TODAY's supervision style and explanation annotations can be used in joint learning and encourage models to use more appropriate signals during training and outperform across several benchmarks. TODAY can also be used to train models to solicit incidental supervision from noisy sources such as GPT-3 and moves farther towards generic temporal reasoning systems.
translated by 谷歌翻译
State-of-the-art 3D semantic segmentation models are trained on the off-the-shelf public benchmarks, but they often face the major challenge when these well-trained models are deployed to a new domain. In this paper, we propose an Active-and-Adaptive Segmentation (ADAS) baseline to enhance the weak cross-domain generalization ability of a well-trained 3D segmentation model, and bridge the point distribution gap between domains. Specifically, before the cross-domain adaptation stage begins, ADAS performs an active sampling operation to select a maximally-informative subset from both source and target domains for effective adaptation, reducing the adaptation difficulty under 3D scenarios. Benefiting from the rise of multi-modal 2D-3D datasets, ADAS utilizes a cross-modal attention-based feature fusion module that can extract a representative pair of image features and point features to achieve a bi-directional image-point feature interaction for better safe adaptation. Experimentally, ADAS is verified to be effective in many cross-domain settings including: 1) Unsupervised Domain Adaptation (UDA), which means that all samples from target domain are unlabeled; 2) Unsupervised Few-shot Domain Adaptation (UFDA) which means that only a few unlabeled samples are available in the unlabeled target domain; 3) Active Domain Adaptation (ADA) which means that the selected target samples by ADAS are manually annotated. Their results demonstrate that ADAS achieves a significant accuracy gain by easily coupling ADAS with self-training methods or off-the-shelf UDA works.
translated by 谷歌翻译
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
translated by 谷歌翻译
In this paper, we discuss an imitation learning based method for reducing the calibration error for a mixed reality system consisting of a vision sensor and a projector. Unlike a head mounted display, in this setup, augmented information is available to a human subject via the projection of a scene into the real world. Inherently, the camera and projector need to be calibrated as a stereo setup to project accurate information in 3D space. Previous calibration processes require multiple recording and parameter tuning steps to achieve the desired calibration, which is usually time consuming process. In order to avoid such tedious calibration, we train a CNN model to iteratively correct the extrinsic offset given a QR code and a projected pattern. We discuss the overall system setup, data collection for training, and results of the auto-correction model.
translated by 谷歌翻译