能够创建一个可以与人类就他们所观看的东西进行有意义的对话的系统,这将是一项技术壮举。针对该目标的设置作为视频对话任务表示,要求系统在正在进行的对话框中对问题产生自然话语。该任务带来了伟大的视觉,语言和推理挑战,如果没有适当的表示方案,可以轻松克服支持高级推理的视频和对话。为了应对这些挑战,我们提出了一个新的以对象为中心的视频对话框架,该框架支持神经推理称为成本 - 代表时空中有关对象的对话。在这里,视频中的动态时空视觉内容首先解析为对象轨迹。鉴于此视频抽象,成本维护并跟踪与对象相关的对话框状态,这些对话框在收到新问题后会更新。对象相互作用是动态和条件地推断出每个问题的,并且它们是它们之间关系推理的基础。成本还保留了以前答案的历史记录,这允许检索相关的以对象为中心的信息以丰富答案形成过程。然后,语言生产以逐步进行,进入当前话语,现有对话和当前问题的背景。我们评估了DSTC7和DSTC8基准的成本,证明了其对最先进的竞争力。
translated by 谷歌翻译
Unmanned aerial vehicle (UAV) swarms are considered as a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and the ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent Reinforcement Learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such us reconfigurable intelligent surface (RIS), virtual reality (VR), semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL enabled UAV swarms. In summary, this survey provides a comprehensive survey of various DL applications for UAV swarms in extensive scenarios.
translated by 谷歌翻译
Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on the novel insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
translated by 谷歌翻译
This study proposes an approach for establishing an optimal multihop ad-hoc network using multiple unmanned aerial vehicles (UAVs) to provide emergency communication in disaster areas. The approach includes two stages, one uses particle swarm optimization (PSO) to find optimal positions to deploy UAVs, and the other uses a behavior-based controller to navigate the UAVs to their assigned positions without colliding with obstacles in an unknown environment. Several constraints related to the UAVs' sensing and communication ranges have been imposed to ensure the applicability of the proposed approach in real-world scenarios. A number of simulation experiments with data loaded from real environments have been conducted. The results show that our proposed approach is not only successful in establishing multihop ad-hoc routes but also meets the requirements for real-time deployment of UAVs.
translated by 谷歌翻译
This paper is devoted to the numerical resolution of McKean-Vlasov control problems via the class of mean-field neural networks introduced in our companion paper [25] in order to learn the solution on the Wasserstein space. We propose several algorithms either based on dynamic programming with control learning by policy or value iteration, or backward SDE from stochastic maximum principle with global or local loss functions. Extensive numerical results on different examples are presented to illustrate the accuracy of each of our eight algorithms. We discuss and compare the pros and cons of all the tested methods.
translated by 谷歌翻译
Unsupervised object discovery aims to localize objects in images, while removing the dependence on annotations required by most deep learning-based methods. To address this problem, we propose a fully unsupervised, bottom-up approach, for multiple objects discovery. The proposed approach is a two-stage framework. First, instances of object parts are segmented by using the intra-image similarity between self-supervised local features. The second step merges and filters the object parts to form complete object instances. The latter is performed by two CNN models that capture semantic information on objects from the entire dataset. We demonstrate that the pseudo-labels generated by our method provide a better precision-recall trade-off than existing single and multiple objects discovery methods. In particular, we provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.
translated by 谷歌翻译
Semantic communication (SemCom) and edge computing are two disruptive solutions to address emerging requirements of huge data communication, bandwidth efficiency and low latency data processing in Metaverse. However, edge computing resources are often provided by computing service providers and thus it is essential to design appealingly incentive mechanisms for the provision of limited resources. Deep learning (DL)- based auction has recently proposed as an incentive mechanism that maximizes the revenue while holding important economic properties, i.e., individual rationality and incentive compatibility. Therefore, in this work, we introduce the design of the DLbased auction for the computing resource allocation in SemComenabled Metaverse. First, we briefly introduce the fundamentals and challenges of Metaverse. Second, we present the preliminaries of SemCom and edge computing. Third, we review various incentive mechanisms for edge computing resource trading. Fourth, we present the design of the DL-based auction for edge resource allocation in SemCom-enabled Metaverse. Simulation results demonstrate that the DL-based auction improves the revenue while nearly satisfying the individual rationality and incentive compatibility constraints.
translated by 谷歌翻译
Air pollution is an emerging problem that needs to be solved especially in developed and developing countries. In Vietnam, air pollution is also a concerning issue in big cities such as Hanoi and Ho Chi Minh cities where air pollution comes mostly from vehicles such as cars and motorbikes. In order to tackle the problem, the paper focuses on developing a solution that can estimate the emitted PM2.5 pollutants by counting the number of vehicles in the traffic. We first investigated among the recent object detection models and developed our own traffic surveillance system. The observed traffic density showed a similar trend to the measured PM2.5 with a certain lagging in time, suggesting a relation between traffic density and PM2.5. We further express this relationship with a mathematical model which can estimate the PM2.5 value based on the observed traffic density. The estimated result showed a great correlation with the measured PM2.5 plots in the urban area context.
translated by 谷歌翻译
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
translated by 谷歌翻译
A large number of empirical studies on applying self-attention models in the domain of recommender systems are based on offline evaluation and metrics computed on standardized datasets, without insights on how these models perform in real life scenarios. Moreover, many of them do not consider information such as item and customer metadata, although deep-learning recommenders live up to their full potential only when numerous features of heterogeneous types are included. Also, typically recommendation models are designed to serve well only a single use case, which increases modeling complexity and maintenance costs, and may lead to inconsistent customer experience. In this work, we present a reusable Attention-based Fashion Recommendation Algorithm (AFRA), that utilizes various interaction types with different fashion entities such as items (e.g., shirt), outfits and influencers, and their heterogeneous features. Moreover, we leverage temporal and contextual information to address both short and long-term customer preferences. We show its effectiveness on outfit recommendation use cases, in particular: 1) personalized ranked feed; 2) outfit recommendations by style; 3) similar item recommendation and 4) in-session recommendations inspired by most recent customer actions. We present both offline and online experimental results demonstrating substantial improvements in customer retention and engagement.
translated by 谷歌翻译