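Single-view reconstruction of hand-object interaction is challenging because occlusions cause severe loss of observations. This paper proposes a physics-based method to better resolve the ambiguities in the reconstruction. It first proposes a force-based dynamic model that not only recovers the unobserved contacts but also solves for plausible contact forces. Next, a confidence-based slide prevention scheme is proposed, which combines the kinematic confidences and the contact forces to jointly model static and sliding contact motion. Qualitative and quantitative experiments show that the proposed technique reconstructs physically plausible and more accurate hand-object interactions and estimates plausible contact forces in real time with a single RGBD sensor.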
translated by 谷歌翻译
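Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) are promising passive devices that enable full-space coverage by transmitting and reflecting the incident signal at the same time. As they are a new paradigm in wireless communications, analyzing the coverage and capacity performance of STAR-RISs has become essential but challenging. To solve the coverage and capacity optimization (CCO) problem in STAR-RIS-assisted networks, a multi-objective proximal policy optimization (MO-PPO) algorithm is proposed, which handles long-term benefits better than conventional optimization algorithms. To strike a balance between the objectives, the MO-PPO algorithm provides a set of optimal solutions that form a Pareto front (PF), where any solution on the PF is regarded as an optimal result. Moreover, to improve the performance of the MO-PPO algorithm, two update strategies are investigated: the action-value-based update strategy (AVUS) and the loss-function-based update strategy (LFUS). For the AVUS, the improvement is to integrate the action values of coverage and capacity and then update the loss function. For the LFUS, the improvement is only to assign dynamic weights to the coverage and capacity loss functions, where the weights are computed by a min-norm solver at every update. Numerical results show that the investigated update strategies outperform fixed-weight MO optimization algorithms in different cases, including different numbers of sample grids, numbers of STAR-RISs, numbers of elements in the STAR-RISs, and sizes of the STAR-RISs. In addition, STAR-RIS-assisted networks achieve better performance than conventional wireless networks without STAR-RISs. Finally, with the same bandwidth, millimeter wave provides higher capacity than sub-6 GHz, but with smaller coverage.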
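To make the Pareto front notion above concrete, the following is a minimal sketch of extracting the non-dominated set from candidate solutions. It is not the MO-PPO algorithm; the function name `pareto_front` and the (coverage, capacity) values are hypothetical and chosen only for illustration, assuming both objectives are to be maximized.

```python
def pareto_front(points):
    """Return the points that no other point dominates (both objectives maximized)."""
    front = []
    for p in points:
        # q dominates p if q is at least as good in both objectives
        # and strictly better in at least one of them.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (coverage, capacity) pairs, not taken from the paper.
candidates = [(0.9, 1.0), (0.7, 2.0), (0.6, 1.5), (0.5, 3.0), (0.9, 0.8)]
front = pareto_front(candidates)
print(front)
```

Every point kept in `front` trades coverage against capacity in a way no other candidate improves on in both objectives, which is why any of them can be regarded as an optimal result.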
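A novel reconfigurable intelligent surface (RIS)-aided multi-robot network is proposed, in which multiple mobile robots are served by an access point (AP) through non-orthogonal multiple access (NOMA). The goal is to maximize the sum-rate over the whole trajectories of the multi-robot system by jointly optimizing the trajectories and NOMA decoding orders of the robots, the phase-shift coefficients of the RIS, and the power allocation of the AP, subject to constraints on the robots' positions and the quality of service (QoS) of each robot. To tackle this problem, an integrated machine learning (ML) scheme is proposed, which combines a long short-term memory (LSTM)-autoregressive integrated moving average (ARIMA) model with a dueling double deep Q-network (D$^{3}$QN) algorithm. For predicting the initial and final positions of the robots, LSTM-ARIMA overcomes the vanishing-gradient problem for non-stationary and non-linear data sequences. For jointly determining the phase-shift matrix and the robots' trajectories, D$^{3}$QN is invoked to address the problem of action-value overestimation. Under the proposed scheme, each robot obtains a globally optimal trajectory based on the maximum sum-rate over the whole trajectory, which shows that the robots pursue long-term benefits in whole-trajectory design. Numerical results show that: 1) the LSTM-ARIMA model provides high-accuracy predictions; 2) the proposed D$^{3}$QN algorithm achieves fast average convergence; 3) an RIS with higher-resolution bits offers a larger trajectory sum-rate than one with lower-resolution bits; and 4) RIS-NOMA networks achieve superior network performance compared with their RIS-aided orthogonal counterparts.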
Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has and what we can do with an object. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances. To support OCL, we build a densely annotated knowledge base that includes extensive labels for three levels of object concept (category, attribute, affordance) and the causal relations among the three levels. By analyzing the causal structure of OCL, we present a baseline, Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations. In experiments, OCRN effectively infers the object knowledge while following the causalities well. Our data and code are available at https://mvig-rhos.com/ocl.
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE. Most existing efforts have focused on directly extracting potentially useful information from images (such as pixel-level features, identified objects, and associated captions). However, such extraction processes may not be knowledge-aware, resulting in information that may not be highly relevant. In this paper, we propose a novel Multi-modal Retrieval based framework (MoRe). MoRe contains a text retrieval module and an image-based retrieval module, which retrieve knowledge related to the input text and image, respectively, from the knowledge corpus. Next, the retrieval results are sent to the textual and visual models, respectively, for predictions. Finally, a Mixture of Experts (MoE) module combines the predictions from the two models to make the final decision. Our experiments show that both our textual model and visual model can achieve state-of-the-art performance on four multi-modal NER datasets and one multi-modal RE dataset. With MoE, the model performance can be further improved, and our analysis demonstrates the benefits of integrating both textual and visual cues for such tasks.
Spatiotemporal traffic data imputation is of great significance in intelligent transportation systems and data-driven decision-making processes. To accurately reconstruct partially observed traffic data, we assert the importance of characterizing both global and local trends in traffic time series. In the literature, substantial prior work has demonstrated the effectiveness of exploiting the low-rank property of traffic data with matrix/tensor completion models. In this study, we first introduce a Laplacian kernel into temporal regularization to characterize local trends in traffic time series, which can be formulated as a circular convolution. Then, we develop a low-rank Laplacian convolutional representation (LCR) model by combining the nuclear norm of a circulant matrix with the Laplacian temporal regularization; the model is proved to fit a unified framework that admits a fast Fourier transform solution with relatively low time complexity. Through extensive experiments on several traffic datasets, we demonstrate the superiority of LCR for imputing traffic time series with various behaviors (e.g., data noise and strong/weak periodicity). Compared with the existing baseline models, the proposed LCR model is an efficient and effective solution to large-scale traffic data imputation. The adapted datasets and Python implementation are publicly available at https://github.com/xinychen/transdim.
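As an illustration of how a Laplacian-style temporal regularizer can be written as a circular convolution and evaluated with the fast Fourier transform, here is a minimal NumPy sketch. The kernel layout (value 2·tau at index 0 and -1 at the tau nearest neighbours on each side, wrapping around), the helper names, and the toy series are assumptions made for this example; they are not taken from the authors' repository.

```python
import numpy as np


def laplacian_kernel(T, tau=1):
    """Assumed kernel of length T: 2*tau at index 0, -1 at tau neighbours each side."""
    ell = np.zeros(T)
    ell[0] = 2 * tau
    ell[1:tau + 1] = -1   # neighbours at lags 1..tau
    ell[-tau:] = -1       # wrapped neighbours at lags -1..-tau
    return ell


def circular_convolve_fft(a, b):
    """Circular convolution via the convolution theorem, O(T log T)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))


def circular_convolve_direct(a, b):
    """Direct O(T^2) circular convolution, used only as a reference."""
    T = len(a)
    return np.array([sum(a[k] * b[(t - k) % T] for k in range(T)) for t in range(T)])


# Toy periodic series standing in for a traffic time series.
x = np.sin(2 * np.pi * np.arange(12) / 12)
ell = laplacian_kernel(len(x), tau=1)

y_fft = circular_convolve_fft(ell, x)
y_direct = circular_convolve_direct(ell, x)
print(np.allclose(y_fft, y_direct))
```

With `tau=1`, each output entry equals `2*x[t] - x[t-1] - x[t+1]` with wrap-around indexing, so the result is small wherever the series varies smoothly; this is the sense in which such a kernel characterizes local trends.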
Swarm learning (SL) is an emerging and promising decentralized machine learning paradigm that has achieved high performance in clinical applications. SL removes the central structure required in federated learning by combining edge computing with a blockchain-based peer-to-peer network. While results are promising under the assumption of independent and identically distributed (IID) data across participants, SL suffers from performance degradation as the degree of non-IID data increases. To address this problem, we propose a generative augmentation framework for swarm learning called SL-GAN, which augments the non-IID data by generating synthetic data from participants. SL-GAN trains generators and discriminators locally and periodically aggregates them via a randomly elected coordinator in the SL network. Under standard assumptions, we theoretically prove the convergence of SL-GAN using stochastic approximations. Experimental results demonstrate that SL-GAN outperforms state-of-the-art methods on three real-world clinical datasets: Tuberculosis, Leukemia, and COVID-19.
Recently, Vehicle-to-Everything (V2X) cooperative perception has attracted increasing attention. Infrastructure sensors play a critical role in this research field; however, how to find the optimal placement of infrastructure sensors is rarely studied. In this paper, we investigate the problem of infrastructure sensor placement and propose a pipeline that can efficiently and effectively find optimal installation positions for infrastructure sensors in a realistic simulated environment. To better simulate and evaluate LiDAR placement, we establish a Realistic LiDAR Simulation library that can simulate the unique characteristics of different popular LiDARs and produce high-fidelity LiDAR point clouds in the CARLA simulator. By simulating point cloud data for different LiDAR placements, we can evaluate the perception accuracy of these placements using multiple detection models. Then, we analyze the correlation between the point cloud distribution and perception accuracy by calculating the density and uniformity of regions of interest. Experiments show that the placement of infrastructure LiDAR can heavily affect the accuracy of perception. We also analyze the correlation between perception performance in the region of interest and LiDAR point cloud distribution, and validate that density and uniformity can be indicators of performance.
The problem of broad practical interest in spatiotemporal data analysis, i.e., discovering interpretable dynamic patterns from spatiotemporal data, is studied in this paper. Towards this end, we develop a time-varying reduced-rank vector autoregression (VAR) model whose coefficient matrices are parameterized by low-rank tensor factorization. Benefiting from the tensor factorization structure, the proposed model can simultaneously achieve model compression and pattern discovery. In particular, the proposed model allows one to characterize nonstationarity and time-varying system behaviors underlying spatiotemporal data. To evaluate the proposed model, extensive experiments are conducted on various spatiotemporal data representing different nonlinear dynamical systems, including fluid dynamics, sea surface temperature, USA surface temperature, and NYC taxi trips. Experimental results demonstrate the effectiveness of modeling spatiotemporal data and characterizing spatial/temporal patterns with the proposed model. In the spatial context, the spatial patterns can be automatically extracted and intuitively characterized by the spatial modes. In the temporal context, the complex time-varying system behaviors can be revealed by the temporal modes in the proposed model. Thus, our model lays an insightful foundation for understanding complex spatiotemporal data in real-world dynamical systems. The adapted datasets and Python implementation are publicly available at https://github.com/xinychen/vars.
Neural-symbolic computing aims at integrating robust neural learning and sound symbolic reasoning into a single framework, so as to leverage the complementary strengths of these two seemingly unrelated (maybe even contradictory) AI paradigms. The central challenge in neural-symbolic computing is to unify the formulation of neural learning and symbolic reasoning into a single framework with common semantics, that is, to seek a joint representation between a neural model and a logical theory that can support the basic grounding learned by the neural model and also adhere to the semantics of the logical theory. In this paper, we propose differentiable fuzzy $\mathcal{ALC}$ (DF-$\mathcal{ALC}$) for this role, as a neural-symbolic representation language with the desired semantics. DF-$\mathcal{ALC}$ unifies the description logic $\mathcal{ALC}$ and neural models for symbol grounding; in particular, it infuses an $\mathcal{ALC}$ knowledge base into neural models through differentiable concept and role embeddings. We define a hierarchical loss for the constraint that the grounding learned by neural models must be semantically consistent with $\mathcal{ALC}$ knowledge bases. We also find that capturing the semantics in grounding solely by maximizing satisfiability cannot revise grounding rationally. We further define a rule-based loss for DF adapting to symbol grounding problems. The experimental results show that DF-$\mathcal{ALC}$ with the rule-based loss can improve the performance of image object detectors in an unsupervised manner, even in low-resource situations.