地震波的频域模拟在地震反演中起着重要作用,但在大型模型中仍然具有挑战性。作为有效的深度学习方法,最近提出的物理知识的神经网络(PINN)在解决广泛的偏微分方程(PDES)方面取得了成功的应用,并且在这方面仍然有改进的余地。例如,当PDE系数不平滑并描述结构复合介质时,PINN可能导致溶液不准确。在本文中,我们通过使用PINN而不是波方程来求解频域中的声学和Visco声学散射的场波方程,以消除源奇异性。我们首先说明,当在损失函数中未实现边界条件时,非平滑速度模型导致波场不准确。然后,我们在PINN的损耗函数中添加了完美匹配的层(PML)条件,并设计了二次神经网络,以克服PINN中非平滑模型的有害影响。我们表明,PML和二次神经元改善了结果和衰减,并讨论了这种改进的原因。我们还说明,在波场模拟中训练的网络可用于预先训练PDE-Coeff及时改变后另一个波场模拟的神经网络,并相应地提高收敛速度。当两次连续迭代或两个连续的实验之间的模型扰动时,这种预训练策略应在迭代全波形反转(FWI)和时置目标成像中找到应用。
translated by 谷歌翻译
复杂的流量分析,例如加密的流量分析和未知的恶意软件检测,强调需要进行高级方法来分析网络流量。使用固定模式,签名匹配和检测网络流量中已知模式的规则的传统方法已被AI(人工智能)驱动算法取代。但是,缺乏高性能AI网络特定的框架使得不可能在网络工作负载中部署基于AI的实时处理。在本文中,我们描述了流量分析开发工具包(TADK)的设计,这是一个针对基于AI的网络工作负载处理的行业标准框架。 TADK可以在数据中心到边缘的网络设备中基于实时的AI网络工作负载处理,而无需专门硬件(例如GPU,神经处理单元等)。我们已经在商品WAF和5G UPF中部署了TADK,评估结果表明,Tadk可以在流量功能提取时达到每个核心最多35.3Gbps的吞吐量,每核6.5Gbps在流量分类中,并且可以减少SQLI/XSS检测到下降至4.5us每个请求的精度比固定模式解决方案更高。
translated by 谷歌翻译
Efficient use of the space in an elevator is very necessary for a service robot, due to the need for reducing the amount of time caused by waiting for the next elevator. To provide a solution for this, we propose a hybrid approach that combines reinforcement learning (RL) with voice interaction for robot navigation in the scene of entering the elevator. RL provides robots with a high exploration ability to find a new clear path to enter the elevator compared to traditional navigation methods such as Optimal Reciprocal Collision Avoidance (ORCA). The proposed method allows the robot to take an active clear path action towards the elevator whilst a crowd of people stands at the entrance of the elevator wherein there are still lots of space. This is done by embedding a clear path action (voice prompt) into the RL framework, and the proposed navigation policy helps the robot to finish tasks efficiently and safely. Our model approach provides a great improvement in the success rate and reward of entering the elevator compared to state-of-the-art navigation policies without active clear path operation.
translated by 谷歌翻译
注意机制在点云分析中发挥了越来越重要的作用,并且渠道注意是热点之一。通过这么多的频道信息,神经网络难以筛选有用的信道信息。因此,提出了一种自适应信道编码机制以在本文中捕获信道关系。它通过明确地编码其特征信道之间的相互依赖来提高网络生成的表示的质量。具体地,提出了一种通道 - 明智的卷积(通道-Chim)以自适应地学习坐标和特征之间的关系,以便编码信道。与流行的重量方案不同,本文提出的通道CONN实现了卷积操作的适应性,而不是简单地为频道分配不同的权重。对现有基准的广泛实验验证了我们的方法实现了艺术的状态。
translated by 谷歌翻译
变压器在各种计算机视觉地区发挥着越来越重要的作用,并且在点云分析中也取得了显着的成就。由于它们主要专注于点亮变压器,因此本文提出了一种自适应通道编码变压器。具体地,被设计为对频道的通道卷积旨在对信道进行编码。它可以通过捕获坐标和特征之间的潜在关系来编码特征通道。与简单地为每个通道分配注意重量相比,我们的方法旨在自适应地对信道进行编码。此外,我们的网络采用了邻域搜索方法的低级和高级双语义接收领域,以提高性能。广泛的实验表明,我们的方法优于三个基准数据集的最先进的点云分类和分段方法。
translated by 谷歌翻译
受到深入学习的巨大成功通过云计算和边缘芯片的快速发展的影响,人工智能研究(AI)的研究已经转移到计算范例,即云计算和边缘计算。近年来,我们目睹了在云服务器上开发更高级的AI模型,以超越传统的深度学习模型,以造成模型创新(例如,变压器,净化家庭),训练数据爆炸和飙升的计算能力。但是,边缘计算,尤其是边缘和云协同计算,仍然在其初期阶段,因为由于资源受限的IOT场景,因此由于部署了非常有限的算法而导致其成功。在本调查中,我们对云和边缘AI进行系统审查。具体而言,我们是第一个设置云和边缘建模的协作学习机制,通过彻底的审查使能够实现这种机制的架构。我们还讨论了一些正在进行的先进EDGE AI主题的潜在和实践经验,包括预先训练模型,图形神经网络和加强学习。最后,我们讨论了这一领域的有希望的方向和挑战。
translated by 谷歌翻译
最近,深度神经网络在3D点云分类方面取得了显着成就。然而,现有的分类方法主要在理想化点云上实施,并在非理想情况下遭受重大降解的每种性能。为了处理该Prob-LEM,提出了一个名为双邻居深度融合网络(DNDFN)的特征表示学习方法,以用作非理想点云分类任务的改进点云编码器。 DNDFN利用称为TN学习的培训邻域学习方法来捕获全局关键邻域。然后,全球邻居与当地邻居融合,以帮助网络实现更强大的推理能力。此外,提出了一个信息传输卷积(IT-CONV)为DNDFN学习点对对之间的边缘信息,并使特征传输过程受益。 IT-CONV中的信息传输类似于图中的信息的传播,其使DNDF​​N更靠近人工理工模式。关于现有基准的广泛实验尤其是非理想的数据集验证了DNDFN和DNDFN实现了最先进的效果。
translated by 谷歌翻译
Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in terms of generalization by synthetic forgery data augmentation. In this work, we explore another path for improving the generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting on specific forgery types. Specifically, in our method, a teacher network takes as input the face images and generates an attention map of the deep features by a diverse multihead attention ViT. The attention map is used to guide a student network to focus on the low-attended features by reducing the highly-attended deep features. A deep feature mixup strategy is also proposed to synthesize forgeries in the feature domain. Experiments demonstrate that, without data augmentation, our method is able to achieve promising performances on unseen forgeries and highly compressed data.
translated by 谷歌翻译
In this work, we investigate improving the generalizability of GAN-generated image detectors by performing data augmentation in the fingerprint domain. Specifically, we first separate the fingerprints and contents of the GAN-generated images using an autoencoder based GAN fingerprint extractor, followed by random perturbations of the fingerprints. Then the original fingerprints are substituted with the perturbed fingerprints and added to the original contents, to produce images that are visually invariant but with distinct fingerprints. The perturbed images can successfully imitate images generated by different GANs to improve the generalization of the detectors, which is demonstrated by the spectra visualization. To our knowledge, we are the first to conduct data augmentation in the fingerprint domain. Our work explores a novel prospect that is distinct from previous works on spatial and frequency domain augmentation. Extensive cross-GAN experiments demonstrate the effectiveness of our method compared to the state-of-the-art methods in detecting fake images generated by unknown GANs.
translated by 谷歌翻译
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decodert takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text inputs, to decode different pixel-level and token-level outputs in the same semantic space. With such a novel design, X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks. Further, our design enables seamless interactions across tasks at different granularities and brings mutual benefits by learning a common and rich pixel-level visual-semantic understanding space, without any pseudo-labeling. After pretraining on a mixed set of a limited amount of segmentation data and millions of image-text pairs, X-Decoder exhibits strong transferability to a wide range of downstream tasks in both zero-shot and finetuning settings. Notably, it achieves (1) state-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets; (2) better or competitive finetuned performance to other generalist and specialist models on segmentation and VL tasks; and (3) flexibility for efficient finetuning and novel task composition (e.g., referring captioning and image editing). Code, demo, video, and visualization are available at https://x-decoder-vl.github.io.
translated by 谷歌翻译