Pavement Distress Recognition (PDR) is an important step in pavement inspection and can be powered by image-based automation to expedite the process and reduce labor costs. Pavement images are often high-resolution, with a low ratio of distressed to non-distressed areas. Advanced approaches exploit these properties by dividing images into patches and exploring discriminative features in scale space. However, these approaches usually suffer from information loss during image resizing and from low efficiency due to complex learning frameworks. In this paper, we propose a novel and efficient method for PDR. A light network named the Kernel Inversed Pyramidal Resizing Network (KIPRN) is introduced for image resizing and can be flexibly plugged into an image classification network as a pre-network to exploit resolution and scale information. In KIPRN, pyramidal convolution and kernel inversed convolution are specifically designed to mine discriminative information across different feature granularities and scales. The mined information is passed along to the resized images to yield an informative image pyramid that assists the image classification network in PDR. We applied our method to three well-known Convolutional Neural Networks (CNNs) and evaluated it on a large-scale pavement image dataset named CQU-BPDD. Extensive results demonstrate that KIPRN generally improves the pavement distress recognition of these CNN models and that the simple combination of KIPRN and EfficientNet-B3 significantly outperforms the state-of-the-art patch-based method in both performance and efficiency.
We consider the problem of automatically generating stories in multiple languages. Compared to prior work in monolingual story generation, crosslingual story generation allows for more universal research on story planning. We propose to use Prompting Large Language Models with Plans to study which plan is optimal for story generation. We consider 4 types of plans and systematically analyse how the outputs differ for different planning strategies. The study demonstrates that formulating the plans as question-answer pairs leads to more coherent generated stories while the plan gives more control to the story creators.
We propose a method that leverages graph neural networks, multi-level message passing, and unsupervised training to enable real-time prediction of realistic clothing dynamics. Whereas existing methods based on linear blend skinning must be trained for specific garments, our method is agnostic to body shape and applies to tight-fitting garments as well as loose, free-flowing clothing. Our method furthermore handles changes in topology (e.g., garments with buttons or zippers) and material properties at inference time. As one key contribution, we propose a hierarchical message-passing scheme that efficiently propagates stiff stretching modes while preserving local detail. We empirically show that our method outperforms strong baselines quantitatively and that its results are perceived as more realistic than state-of-the-art methods.
Many visualization techniques have been created to help explain the behavior of convolutional neural networks (CNNs), but they largely consist of static diagrams that convey limited information. Interactive visualizations can provide richer insights and allow users to explore a model's behavior more easily; however, they are typically specific to a particular model and not easily reusable. We introduce Visual Feature Search, a novel interactive visualization that is generalizable to any CNN and can easily be incorporated into a researcher's workflow. Our tool allows a user to highlight an image region and search for images from a given dataset with the most similar CNN features. It supports searching through large image datasets with an efficient cache-based search implementation. We demonstrate how our tool elucidates different aspects of model behavior by performing experiments on supervised, self-supervised, and human-edited CNNs. We also release a portable Python library and several IPython notebooks to enable researchers to easily use our tool in their own experiments. Our code can be found at https://github.com/lookingglasslab/VisualFeatureSearch.
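The core idea of searching a dataset for the features most similar to a highlighted region can be sketched as a nearest-neighbor lookup over precomputed feature vectors. The snippet below is only an illustration under stated assumptions: the names `cosine_similarity`, `search_similar`, and the in-memory `cache` dictionary are hypothetical and are not the API of the released library, and cosine similarity is an assumed choice of metric that the abstract does not specify.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar(
    query: Sequence[float],
    cache: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank cached feature vectors by similarity to the query region's features.

    `cache` is a hypothetical stand-in for precomputed CNN features,
    keyed by image name, so the dataset is not re-encoded on every search.
    """
    scored = [(name, cosine_similarity(query, feat)) for name, feat in cache.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-D "features"; real CNN features would have many more dimensions.
    toy_cache = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
    for name, score in search_similar([1.0, 0.1], toy_cache, top_k=2):
        print(name, round(score, 3))
```

Precomputing and caching the dataset features is what keeps each query cheap in this sketch: a search reduces to one pass of dot products over stored vectors.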
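Recent neuroimaging studies that aim to predict brain disorders with modern machine learning methods typically include a single modality and rely on supervised, over-parameterized models. However, a single modality provides only a limited view of the highly complex brain. Critically, supervised models in clinical settings lack accurate diagnostic labels for training. Coarse labels do not capture the long-tailed spectrum of brain disorder phenotypes, which leads to a loss of generalizability and makes the models less useful in diagnostic settings. This work presents a novel multi-scale coordinated framework for learning multiple representations from multimodal neuroimaging data. We propose a general taxonomy of inductive biases to capture unique and joint information in multimodal self-supervised fusion. The taxonomy forms a family of decoder-free models with reduced computational complexity that capture multi-scale relationships between local and global representations of the multimodal inputs. We conduct a comprehensive evaluation of the taxonomy using functional and structural magnetic resonance imaging (MRI) data across a spectrum of Alzheimer's disease phenotypes and show that self-supervised models reveal disorder-relevant brain regions and multimodal links without access to labels during pre-training. The proposed multimodal self-supervised learning yields representations that improve classification performance for both modalities. The accompanying rich and flexible unsupervised deep learning framework captures complex multimodal relationships and provides predictive performance that meets or exceeds that of narrower supervised classification analyses. We present exhaustive quantitative evidence of how this framework can significantly advance our search for missing links in complex brain disorders.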
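Contrastive learning relies on the assumption that positive pairs contain related views, e.g., images or co-occurring multimodal signals of a video, that share certain underlying information about an instance. But what if this assumption is violated? The literature suggests that contrastive learning produces suboptimal representations in the presence of noisy views, e.g., false positive pairs with no apparent shared information. In this work, we propose a new contrastive loss function that is robust against noisy views. We provide rigorous theoretical justification by showing connections to robust symmetric losses for noisy binary classification and by establishing a new contrastive bound based on the Wasserstein distance measure. The proposed loss is completely modality-agnostic and a simple drop-in replacement for the InfoNCE loss, which makes it easy to apply to existing contrastive frameworks. We show that our approach provides consistent improvements on image, video, and graph contrastive learning benchmarks that exhibit a variety of real-world noise patterns.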
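For orientation, the standard InfoNCE loss that the proposed loss is meant to replace can be sketched as follows. This is the well-known baseline, not the robust loss from the paper, whose exact form the abstract does not give. The function name `info_nce`, the use of precomputed similarity scores, and the default temperature are assumptions made for illustration.

```python
import math
from typing import Sequence


def info_nce(pos_score: float, neg_scores: Sequence[float], temperature: float = 0.1) -> float:
    """InfoNCE loss for one anchor: -log softmax of the positive among all candidates.

    pos_score:  similarity between the anchor and its positive view
    neg_scores: similarities between the anchor and each negative sample
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    # A well-aligned positive gives a small loss ...
    print(info_nce(0.9, [0.1, 0.0]))
    # ... while a positive that looks like a negative (e.g. a noisy view) gives a large one.
    print(info_nce(0.1, [0.9, 0.8]))
```

The second call illustrates why noisy views matter: a false positive pair with low similarity produces a large loss, so the objective pushes hard to pull together views that share no information.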
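We carefully compare two model-free control algorithms, Evolution Strategies (ES) and Proximal Policy Optimization (PPO), with receding horizon model predictive control (MPC) for operating simulated, price-responsive water heaters. Four MPC variants are considered: a one-shot controller with perfect forecasting that yields optimal control; a limited-horizon controller with perfect forecasting; a mean-forecast-based controller; and a two-stage stochastic programming controller using historical scenarios. In all cases, the MPC models for water temperature and electricity price are exact; only water demand is uncertain. For comparison, ES and PPO learn neural-network-based policies by interacting directly with the simulated environment under the same scenarios used by MPC. All methods are then evaluated on a separate one-week continuation of the demand time series. We demonstrate that optimal control for this problem is challenging, requiring more than 8 hours of lookahead for MPC with perfect forecasting to attain the minimum cost. Despite this challenge, both ES and PPO learn good general-purpose policies that outperform the mean-forecast and two-stage stochastic MPC controllers in terms of average cost and are orders of magnitude faster at computing actions. We show that ES in particular can exploit parallelism, learning a policy in under 90 seconds using 1150 CPU cores.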
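To make the ES side of the comparison concrete, here is a minimal sketch of a generic Evolution Strategies update with antithetic (mirrored) Gaussian perturbations. It is not the authors' implementation: the function name `evolution_strategies`, the hyperparameter defaults, and the toy quadratic reward standing in for the negative operating cost of the simulated water heater are all assumptions.

```python
import random
from typing import Callable, List, Sequence


def evolution_strategies(
    objective: Callable[[List[float]], float],
    theta: Sequence[float],
    sigma: float = 0.1,
    lr: float = 0.05,
    population: int = 50,
    iterations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Maximize `objective` with a basic ES gradient estimate (hypothetical defaults)."""
    rng = random.Random(seed)
    params = list(theta)
    pairs = population // 2  # each perturbation is evaluated at +eps and -eps
    for _ in range(iterations):
        grad = [0.0] * len(params)
        for _ in range(pairs):
            eps = [rng.gauss(0.0, 1.0) for _ in params]
            # These two evaluations are independent of every other pair,
            # which is what makes ES easy to spread across many CPU cores.
            r_plus = objective([p + sigma * e for p, e in zip(params, eps)])
            r_minus = objective([p - sigma * e for p, e in zip(params, eps)])
            for i, e in enumerate(eps):
                grad[i] += (r_plus - r_minus) * e
        step = lr / (2.0 * sigma * pairs)
        params = [p + step * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    target = [1.0, -2.0]

    def toy_reward(x: List[float]) -> float:
        # Toy stand-in for "negative cost": highest when x equals target.
        return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))

    print(evolution_strategies(toy_reward, [0.0, 0.0]))
```

Because each perturbed evaluation only needs the current parameters and a random seed, the inner loop parallelizes with very little communication, which is consistent with the abstract's point that ES can exploit a large number of CPU cores.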
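To predict rare extreme events using deep neural networks, one encounters the so-called small data problem, because even long-term observations often contain few extreme events. Here, we investigate a model-assisted framework in which the training data is obtained from numerical simulations, as opposed to observations, with adequate samples from extreme events. However, to ensure that the trained networks are applicable in practice, the training is not performed on the full simulation data; instead, we only use a small subset of observable quantities that can be measured in practice. We investigate the feasibility of this model-assisted framework on three different dynamical systems (the Rössler attractor, the FitzHugh-Nagumo model, and a turbulent fluid flow) and three different deep neural network architectures (feed-forward, long short-term memory, and reservoir computing). In each case, we study the prediction accuracy, robustness to noise, reproducibility under repeated training, and sensitivity to the type of input data. In particular, we find long short-term memory networks to be the most robust to noise and to yield relatively accurate predictions while requiring minimal fine-tuning of the hyperparameters.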
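Generative models trained with Differential Privacy (DP) can be used to generate synthetic data while minimizing privacy risks. We analyze the impact of DP on underrepresented classes/subgroups of data, specifically studying: 1) the size of classes/subgroups in the synthetic data, and 2) the accuracy of classification tasks run on them. We also evaluate the effect of various levels of imbalance and privacy budgets. Our analysis uses three state-of-the-art DP models (PrivBayes, DP-WGAN, and PATE-GAN) and shows that DP yields opposite size distributions in the generated synthetic data. It affects the gap between the majority and minority classes/subgroups: in some cases by reducing it (a "Robin Hood" effect), and in others by increasing it (a "Matthew" effect). Either way, this leads to (similar) disparate impacts on the accuracy of classification tasks on the synthetic data, disproportionately affecting the underrepresented parts of the data. Consequently, training models on synthetic data risks treating different subpopulations unevenly, which can lead to unreliable or unfair conclusions.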
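Intuitively, one would expect the accuracy of a trained neural network's prediction on a test sample to correlate with how densely that sample is surrounded by seen training samples in representation space. In this work, we provide theoretical justification and experiments supporting this hypothesis. We propose an error function for piecewise linear neural networks that takes a local region in the network's input space and outputs the smooth empirical training error, which is an average of the empirical training errors of other regions weighted by network representation distance. A bound on the expected smooth error for each region scales inversely with the training sample density in representation space. Empirically, we verify that this bound is a strong predictor of the inaccuracy of the network's predictions on test samples. For unseen test sets, including those with out-of-distribution samples, ranking test samples by their local region's error bound and discarding the samples with the highest bounds raises prediction accuracy on image classification datasets by 20% in absolute terms.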