自从Navier Stokes方程的推导以来,已经有可能在数值上解决现实世界的粘性流问题(计算流体动力学(CFD))。然而,尽管中央处理单元(CPU)的性能取得了迅速的进步,但模拟瞬态流量的计算成本非常小,时间/网格量表物理学仍然是不现实的。近年来,机器学习(ML)技术在整个行业中都受到了极大的关注,这一大浪潮已经传播了流体动力学界的各种兴趣。最近的ML CFD研究表明,随着数据驱动方法的训练时间和预测时间之间的间隔增加,完全抑制了误差的增加是不现实的。应用ML的实用CFD加速方法的开发是剩余的问题。因此,这项研究的目标是根据物理信息传递学习制定现实的ML策略,并使用不稳定的CFD数据集验证了该策略的准确性和加速性能。该策略可以在监视跨耦合计算框架中管理方程的残差时确定转移学习的时间。因此,我们的假设是可行的,即连续流体流动时间序列的预测是可行的,因为中间CFD模拟定期不仅减少了增加残差,还可以更新网络参数。值得注意的是,具有基于网格的网络模型的交叉耦合策略不会损害计算加速度的仿真精度。在层流逆流CFD数据集条件下,该模拟加速了1.8次,包括参数更新时间。此可行性研究使用了开源CFD软件OpenFOAM和开源ML软件TensorFlow。
translated by 谷歌翻译
In robotics and computer vision communities, extensive studies have been widely conducted regarding surveillance tasks, including human detection, tracking, and motion recognition with a camera. Additionally, deep learning algorithms are widely utilized in the aforementioned tasks as in other computer vision tasks. Existing public datasets are insufficient to develop learning-based methods that handle various surveillance for outdoor and extreme situations such as harsh weather and low illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS) containing more than 500,000 image pairs and the first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). This is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks to the best of our knowledge. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study are available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.
translated by 谷歌翻译
Springs are efficient in storing and returning elastic potential energy but are unable to hold the energy they store in the absence of an external load. Lockable springs use clutches to hold elastic potential energy in the absence of an external load but have not yet been widely adopted in applications, partly because clutches introduce design complexity, reduce energy efficiency, and typically do not afford high-fidelity control over the energy stored by the spring. Here, we present the design of a novel lockable compression spring that uses a small capstan clutch to passively lock a mechanical spring. The capstan clutch can lock up to 1000 N force at any arbitrary deflection, unlock the spring in less than 10 ms with a control force less than 1 % of the maximal spring force, and provide an 80 % energy storage and return efficiency (comparable to a highly efficient electric motor operated at constant nominal speed). By retaining the form factor of a regular spring while providing high-fidelity locking capability even under large spring forces, the proposed design could facilitate the development of energy-efficient spring-based actuators and robots.
translated by 谷歌翻译
This paper proposes a new regularization algorithm referred to as macro-block dropout. The overfitting issue has been a difficult problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization by preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN). Rather than applying dropout to each unit, we apply random dropout to each macro-block. This algorithm has the effect of applying different drop out rates for each layer even if we keep a constant average dropout rate, which has better regularization effects. In our experiments using Recurrent Neural Network-Transducer (RNN-T), this algorithm shows relatively 4.30 % and 6.13 % Word Error Rates (WERs) improvement over the conventional dropout on LibriSpeech test-clean and test-other. With an Attention-based Encoder-Decoder (AED) model, this algorithm shows relatively 4.36 % and 5.85 % WERs improvement over the conventional dropout on the same test sets.
translated by 谷歌翻译
The cone-beam computed tomography (CBCT) provides 3D volumetric imaging of a target with low radiation dose and cost compared with conventional computed tomography, and it is widely used in the detection of paranasal sinus disease. However, it lacks the sensitivity to detect soft tissue lesions owing to reconstruction constraints. Consequently, only physicians with expertise in CBCT reading can distinguish between inherent artifacts or noise and diseases, restricting the use of this imaging modality. The development of artificial intelligence (AI)-based computer-aided diagnosis methods for CBCT to overcome the shortage of experienced physicians has attracted substantial attention. However, advanced AI-based diagnosis addressing intrinsic noise in CBCT has not been devised, discouraging the practical use of AI solutions for CBCT. To address this issue, we propose an AI-based computer-aided diagnosis method using CBCT with a denoising module. This module is implemented before diagnosis to reconstruct the internal ground-truth full-dose scan corresponding to an input CBCT image and thereby improve the diagnostic performance. The external validation results for the unified diagnosis of sinus fungal ball, chronic rhinosinusitis, and normal cases show that the proposed method improves the micro-, macro-average AUC, and accuracy by 7.4, 5.6, and 9.6% (from 86.2, 87.0, and 73.4 to 93.6, 92.6, and 83.0%), respectively, compared with a baseline while improving human diagnosis accuracy by 11% (from 71.7 to 83.0%), demonstrating technical differentiation and clinical effectiveness. This pioneering study on AI-based diagnosis using CBCT indicates denoising can improve diagnostic performance and reader interpretability in images from the sinonasal area, thereby providing a new approach and direction to radiographic image reconstruction regarding the development of AI-based diagnostic solutions.
translated by 谷歌翻译
The recent advent of play-to-earn (P2E) systems in massively multiplayer online role-playing games (MMORPGs) has made in-game goods interchangeable with real-world values more than ever before. The goods in the P2E MMORPGs can be directly exchanged with cryptocurrencies such as Bitcoin, Ethereum, or Klaytn via blockchain networks. Unlike traditional in-game goods, once they had been written to the blockchains, P2E goods cannot be restored by the game operation teams even with chargeback fraud such as payment fraud, cancellation, or refund. To tackle the problem, we propose a novel chargeback fraud prediction method, PU GNN, which leverages graph attention networks with PU loss to capture both the players' in-game behavior with P2E token transaction patterns. With the adoption of modified GraphSMOTE, the proposed model handles the imbalanced distribution of labels in chargeback fraud datasets. The conducted experiments on two real-world P2E MMORPG datasets demonstrate that PU GNN achieves superior performances over previously suggested methods.
translated by 谷歌翻译
In this study, we propose a lung nodule detection scheme which fully incorporates the clinic workflow of radiologists. Particularly, we exploit Bi-Directional Maximum intensity projection (MIP) images of various thicknesses (i.e., 3, 5 and 10mm) along with a 3D patch of CT scan, consisting of 10 adjacent slices to feed into self-distillation-based Multi-Encoders Network (MEDS-Net). The proposed architecture first condenses 3D patch input to three channels by using a dense block which consists of dense units which effectively examine the nodule presence from 2D axial slices. This condensed information, along with the forward and backward MIP images, is fed to three different encoders to learn the most meaningful representation, which is forwarded into the decoded block at various levels. At the decoder block, we employ a self-distillation mechanism by connecting the distillation block, which contains five lung nodule detectors. It helps to expedite the convergence and improves the learning ability of the proposed architecture. Finally, the proposed scheme reduces the false positives by complementing the main detector with auxiliary detectors. The proposed scheme has been rigorously evaluated on 888 scans of LUNA16 dataset and obtained a CPM score of 93.6\%. The results demonstrate that incorporating of bi-direction MIP images enables MEDS-Net to effectively distinguish nodules from surroundings which help to achieve the sensitivity of 91.5% and 92.8% with false positives rate of 0.25 and 0.5 per scan, respectively.
translated by 谷歌翻译
在这项工作中,我们提出了一个具有结构性图形的新型不确定性感知对象检测框架,其中节点和边缘分别用对象及其空间语义相似性表示。具体而言,我们旨在考虑对象之间的关系,以有效地将它们背景化。为了实现这一目标,我们首先检测对象,然后测量其语义和空间距离以构建对象图,然后由图形神经网络(GNN)表示,用于完善对象的视觉CNN特征。但是,精炼CNN功能和每个对象的检测结果效率低下,可能不需要,因为其中包括不确定性低的正确预测。因此,我们建议通过将表示形式从某些对象(源)转移到有向图上的不确定对象(目标)来处理不确定的对象,而且还仅在对象上改善CNN功能,因为对象被认为是不确定的,其代表性输出来自GNN。此外,我们通过在不确定的物体上给予更大的权重来计算训练损失,以专注于改善不确定的对象预测,同时保持某些对象的高性能。我们将模型称为对象检测(UAGDET)的不确定性感知图网络。然后,我们在实验中验证了我们的大规模空中图像数据集,即DOTA,该数据集由大量对象组成,这些对象在图像中具有很小至大的对象,在该图像上,我们的对象可以改善现有对象检测网络的性能。
translated by 谷歌翻译
最近的深度学习模型在言语增强方面已经达到了高性能。但是,获得快速和低复杂模型而没有明显的性能降解仍然是一项挑战。以前的知识蒸馏研究对言语增强无法解决这个问题,因为它们的输出蒸馏方法在某些方面不符合语音增强任务。在这项研究中,我们提出了基于特征的蒸馏多视图注意转移(MV-AT),以在时域中获得有效的语音增强模型。基于多视图功能提取模型,MV-AT将教师网络的多视图知识传输到学生网络,而无需其他参数。实验结果表明,所提出的方法始终提高瓦伦蒂尼和深噪声抑制(DNS)数据集的各种规模的学生模型的性能。与基线模型相比,使用我们提出的方法(一种用于有效部署的轻巧模型)分别使用了15.4倍和4.71倍(FLOPS),与具有相似性能的基线模型相比,Many-S-8.1GF分别达到了15.4倍和4.71倍。
translated by 谷歌翻译
随着计算机视觉和NLP的进步,视觉语言(VL)正在成为研究的重要领域。尽管很重要,但研究领域的评估指标仍处于开发的初步阶段。在本文中,我们提出了定量度量的“符号分数”和评估数据集“人类难题”,以评估VL模型是否理解像人类这样的图像。我们观察到,VL模型没有解释输入图像的整体上下文,而是对形成本地上下文的特定对象或形状显示出偏差。我们旨在定量测量模型在理解环境中的表现。为了验证当前现有VL模型的功能,我们将原始输入图像切成零件并随机放置,从而扭曲了图像的全局上下文。我们的论文讨论了每个VL模型在全球环境上的解释水平,并解决了结构特征如何影响结果。
translated by 谷歌翻译