This article presents a survey of literature in the area of Human-Robot Interaction (HRI), specifically on systems containing more than two agents (i.e., having multiple humans and/or multiple robots). We identify three core aspects of ``Multi-agent" HRI systems that are useful for understanding how these systems differ from dyadic systems and from one another. These are the Team structure, Interaction style among agents, and the system's Computational characteristics. Under these core aspects, we present five attributes of HRI systems, namely Team size, Team composition, Interaction model, Communication modalities, and Robot control. These attributes are used to characterize and distinguish one system from another. We populate resulting categories with examples from recent literature along with a brief discussion of their applications and analyze how these attributes differ from the case of dyadic human-robot systems. We summarize key observations from the current literature, and identify challenges and promising areas for future research in this domain. In order to realize the vision of robots being part of the society and interacting seamlessly with humans, there is a need to expand research on multi-human -- multi-robot systems. Not only do these systems require coordination among several agents, they also involve multi-agent and indirect interactions which are absent from dyadic HRI systems. Adding multiple agents in HRI systems requires advanced interaction schemes, behavior understanding and control methods to allow natural interactions among humans and robots. In addition, research on human behavioral understanding in mixed human-robot teams also requires more attention. This will help formulate and implement effective robot control policies in HRI systems with large numbers of heterogeneous robots and humans; a team composition reflecting many real-world scenarios.
translated by 谷歌翻译
在本文中,我们考虑了在具有多个自动机器人的系统中分配人类操作员协助的问题。每个机器人都需要完成独立任务,每个任务定义为一系列任务。在执行任务时,机器人可以自主操作,也可以由人类操作员远程执行,以更快地完成任务。我们表明,创建详细时间表的问题使系统的制造量最小化是NP-HARD。我们将问题提出为混合整数线性程序,可用于最佳地解决小到中等大小的问题实例。我们还开发了一种随时随地的算法,该算法利用问题结构来提供对操作员调度问题的快速和高质量解决方案,即使对于更大的问题实例也是如此。我们的关键见解是在贪婪创建的时间表中识别阻止任务,并迭代地删除这些块以提高解决方案的质量。通过数值模拟,我们证明了所提出的算法的好处是一种高于其他贪婪方法的有效且可扩展的方法。
translated by 谷歌翻译
在本文中,我们考虑在具有多个半自治机器人的系统中分配人类运营商的问题。每个机器人都需要执行独立的任务序列,经历了一次失败并在每个任务时陷入故障状态的可能性。如果需要,人类运营商可以帮助或漫游机器人。传统的MDP技术用于解决这些问题的面临可扩展性问题,因为具有机器人和运营商的数量的状态和行动空间的指数增长。在本文中,我们推出了操作员分配问题可转向的条件,从而实现了削弱指数启发式的使用。可以容易地检查条件以验证可索引性,我们表明他们持有广泛的兴趣问题。我们的主要洞察力是利用各个机器人的价值函数的结构,从而导致可以针对每个机器人的每个状态分开验证的条件。我们将这些条件应用于远程机器人监控系统中常见的两种转换。通过数值模拟,我们展示了削减指数政策作为近乎最佳和可扩展方法的功效,以实现现有的可扩展方法。
translated by 谷歌翻译
The ability for an agent to continuously learn new skills without catastrophically forgetting existing knowledge is of critical importance for the development of generally intelligent agents. Most methods devised to address this problem depend heavily on well-defined task boundaries, and thus depend on human supervision. Our task-agnostic method, Self-Activating Neural Ensembles (SANE), uses a modular architecture designed to avoid catastrophic forgetting without making any such assumptions. At the beginning of each trajectory, a module in the SANE ensemble is activated to determine the agent's next policy. During training, new modules are created as needed and only activated modules are updated to ensure that unused modules remain unchanged. This system enables our method to retain and leverage old skills, while growing and learning new ones. We demonstrate our approach on visually rich procedurally generated environments.
translated by 谷歌翻译
Implicit Neural Representations (INR) have recently shown to be powerful tool for high-quality video compression. However, existing works are limiting as they do not explicitly exploit the temporal redundancy in videos, leading to a long encoding time. Additionally, these methods have fixed architectures which do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats videos as groups of frames and fits separate networks to each group performing patch-wise prediction. This design shares computation within each group, in the spatial and temporal dimensions, resulting in reduced encoding time of the video. The video representation is modeled autoregressively, with networks fit on a current group initialized using weights from the previous group's model. To further enhance efficiency, we perform quantization of the network parameters during training, requiring no post-hoc pruning or quantization. When compared with previous works on the benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70 (in terms of PSNR) and the encoding speed by 12X, while maintaining the same compression rate. In contrast to prior video INR works which struggle with larger resolution and longer videos, we show that our algorithm is highly flexible and scales naturally due to its patch-wise and autoregressive designs. Moreover, our method achieves variable bitrate compression by adapting to videos with varying inter-frame motion. NIRVANA achieves 6X decoding speed and scales well with more GPUs, making it practical for various deployment scenarios.
translated by 谷歌翻译
We propose AnyTOD, an end-to-end task-oriented dialog (TOD) system with zero-shot capability for unseen tasks. We view TOD as a program executed by a language model (LM), where program logic and ontology is provided by a designer in the form of a schema. To enable generalization onto unseen schemas and programs without prior training, AnyTOD adopts a neuro-symbolic approach. A neural LM keeps track of events that occur during a conversation, and a symbolic program implementing the dialog policy is executed to recommend next actions AnyTOD should take. This approach drastically reduces data annotation and model training requirements, addressing a long-standing challenge in TOD research: rapidly adapting a TOD system to unseen tasks and domains. We demonstrate state-of-the-art results on the STAR and ABCD benchmarks, as well as AnyTOD's strong zero-shot transfer capability in low-resource settings. In addition, we release STARv2, an updated version of the STAR dataset with richer data annotations, for benchmarking zero-shot end-to-end TOD models.
translated by 谷歌翻译
Most research on task oriented dialog modeling is based on written text input. However, users interact with practical dialog systems often using speech as input. Typically, systems convert speech into text using an Automatic Speech Recognition (ASR) system, introducing errors. Furthermore, these systems do not address the differences in written and spoken language. The research on this topic is stymied by the lack of a public corpus. Motivated by these considerations, our goal in hosting the speech-aware dialog state tracking challenge was to create a public corpus or task which can be used to investigate the performance gap between the written and spoken forms of input, develop models that could alleviate this gap, and establish whether Text-to-Speech-based (TTS) systems is a reasonable surrogate to the more-labor intensive human data collection. We created three spoken versions of the popular written-domain MultiWoz task -- (a) TTS-Verbatim: written user inputs were converted into speech waveforms using a TTS system, (b) Human-Verbatim: humans spoke the user inputs verbatim, and (c) Human-paraphrased: humans paraphrased the user inputs. Additionally, we provided different forms of ASR output to encourage wider participation from teams that may not have access to state-of-the-art ASR systems. These included ASR transcripts, word time stamps, and latent representations of the audio (audio encoder outputs). In this paper, we describe the corpus, report results from participating teams, provide preliminary analyses of their results, and summarize the current state-of-the-art in this domain.
translated by 谷歌翻译
With the growing global deployment of carbon capture and sequestration technology to combat climate change, monitoring and detection of potential CO2 leakage through existing or storage induced faults are critical to the safe and long-term viability of the technology. Recent work on time-lapse seismic monitoring of CO2 storage has shown promising results in its ability to monitor the growth of the CO2 plume from surface recorded seismic data. However, due to the low sensitivity of seismic imaging to CO2 concentration, additional developments are required to efficiently interpret the seismic images for leakage. In this work, we introduce a binary classification of time-lapse seismic images to delineate CO2 plumes (leakage) using state-of-the-art deep learning models. Additionally, we localize the leakage region of CO2 plumes by leveraging Class Activation Mapping methods.
translated by 谷歌翻译
Indian e-commerce industry has evolved over the last decade and is expected to grow over the next few years. The focus has now shifted to turnaround time (TAT) due to the emergence of many third-party logistics providers and higher customer expectations. The key consideration for delivery providers is to balance their overall operating costs while meeting the promised TAT to their customers. E-commerce delivery partners operate through a network of facilities whose strategic locations help to run the operations efficiently. In this work, we identify the locations of hubs throughout the country and their corresponding mapping with the distribution centers. The objective is to minimize the total network costs with TAT adherence. We use Genetic Algorithm and leverage business constraints to reduce the solution search space and hence the solution time. The results indicate an improvement of 9.73% in TAT compliance compared with the current scenario.
translated by 谷歌翻译
Warning: this paper contains content that may be offensive or upsetting. In the current context where online platforms have been effectively weaponized in a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging both textual and visual SOTA models to represent the individual cases. We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance between example- and prototype-based methods, and between text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.
translated by 谷歌翻译