电缆在许多环境中无处不在,但容易出现自我闭合和结,使它们难以感知和操纵。挑战通常会随着电缆长度而增加:长电缆需要更复杂的松弛管理和策略,以促进可观察性和可及性。在本文中,我们专注于使用双边机器人自动弄清长达3米的电缆。我们开发了新的运动原语,以有效地解开长电缆和专门用于此任务的新型Gripper Jaws。我们提出了缠结操作(SGTM)的滑动和抓握,该算法将这些原始物与RGBD视觉构成迭代性毫无障碍。SGTM在隔离的外手上取消了67%的成功率,图8节和更复杂的配置上的50%。可以在https://sites.google.com/view/rss-2022-untangling/home上找到补充材料,可视化和视频。
translated by 谷歌翻译
最近的工作表明,2臂“ Fling”运动对于服装平滑可能是有效的。我们考虑单臂弹性运动。与几乎不需要机器人轨迹参数调整的2臂fling运动不同,单臂fling运动对轨迹参数很敏感。我们考虑一个单一的6多机器人臂,该机器人臂学习跨越轨迹以实现高衣覆盖率。给定服装抓握点,机器人在物理实验中探索了不同的参数化fling轨迹。为了提高学习效率,我们提出了一种粗到精细的学习方法,该方法首先使用多军匪徒(MAB)框架有效地找到候选动作,然后通过连续优化方法来完善。此外,我们提出了基于Fling Fall结果不确定性的新颖培训和执行时间停止标准。与基线相比,我们表明所提出的方法显着加速学习。此外,由于通过自学人员收集的类似服装的先前经验,新服装的MAB学习时间最多减少了87%。我们评估了6种服装类型:毛巾,T恤,长袖衬衫,礼服,汗衫和牛仔裤。结果表明,使用先前的经验,机器人需要30分钟以下的时间才能为达到60-94%覆盖率的新型服装学习一项动作。
translated by 谷歌翻译
使用单个参数化动态动作操纵可变形物体对蝇钓,宽毯和播放洗牌板等任务非常有用。此类任务作为输入所需的最终状态并输出一个参数化的开环动态机器人动作,它向最终状态产生轨迹。这对于具有涉及摩擦力的复杂动态的长地平轨迹尤其具有挑战性。本文探讨了平面机器人铸造的任务(PRC):其中握住电缆一端的机器人手腕的一个平面运动使另一端朝向所需的目标滑过平面。 PRC允许电缆达到机器人工作区以外的点,并在家庭,仓库和工厂中具有电缆管理的应用。为了有效地学习给定电缆的PRC策略,我们提出了Real2Sim2Real,一个自动收集物理轨迹示例的自我监督框架,以使用差分演进调谐动态模拟器的参数,生成许多模拟示例,然后使用加权学习策略模拟和物理数据的组合。我们使用三种模拟器,ISAAC健身房分段,ISAAC健身房 - 混合动力和Pybullet,两个功能近似器,高斯工艺和神经网络(NNS),以及具有不同刚度,扭转和摩擦的三个电缆。结果每条电缆的16个举出的测试目标表明,使用ISAAC健身房分段的NN PRC策略达到中位误差距离(电缆长度的百分比),范围为8%至14%,表现优于真实或仅培训的基线和政策。只有模拟的例子。 https://tinyurl.com/robotcast可以使用代码,数据和视频。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6 degree of freedom dynamics used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotor craft has a 3 dimensional map of its surroundings that is updated via limited range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
translated by 谷歌翻译
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack, with higher level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervision to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
translated by 谷歌翻译
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
translated by 谷歌翻译
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.
translated by 谷歌翻译
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
translated by 谷歌翻译
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
translated by 谷歌翻译