智能论文笔记

Virtual Elastic Objects

Hsiao-yu Chen , Edgar Tretschk , Tuur Stuyck , Petr Kadlecek , Ladislav Kavan , Etienne Vouga , Christoph Lassner

分类：计算机视觉

2022-01-12

我们呈现虚拟弹性物体（VEOS）：虚拟物体，不仅看起来像他们的真实同行，而且也表现得像他们一样，即使在进行新颖的互动时也是如此。实现这一挑战：不仅必须捕获对象，包括对它们上的物理力量，然后忠实地重建和呈现，而且还发现和模拟了合理的材料参数。要创建VEOS，我们构建了一个多视图捕获系统，捕获压缩空气流的影响下的物体。建立近期型号动态神经辐射区域的进步，我们重建了物体和相应的变形字段。我们建议使用可差异的基于粒子的模拟器来使用这些变形字段来查找代表性的材料参数，这使我们能够运行新的模拟。为了渲染模拟对象，我们设计了一种用神经辐射场将模拟结果集成的方法。结果方法适用于各种场景：它可以处理由非均匀材料组成的物体，具有非常不同的形状，它可以模拟与其他虚拟对象的交互。我们在各种力字段下使用12个对象的新收集的数据集介绍了我们的结果，这将与社区共享。

translated by 谷歌翻译

Online TSP with Predictions

Hsiao-Yu Hu , Hao-Ting Wei , Meng-Hsi Li , Kai-Min Chung , Chung-Shou Liao

分类：机器学习

2022-06-30

我们启动对在线路由问题进行预测的研究，这是受到学习效果算法领域的最新成果的启发。一个学习的在线算法，如果预测是准确的，同时否则可以维持理论保证，即使预测非常错误，则以黑盒方式纳入了预测，以胜过现有的算法。在这项研究中，我们特别开始研究经典的在线旅行推销员问题（OLTSP），其中未来的请求得到了预测。与以前其他研究中的预测模型不同，OLTSP中的每个实际请求与其到达时间和位置相关，可能与预测的每个实际请求不一致，这些预测会导致麻烦的情况。我们的主要结果是研究不同的预测模型和设计算法，以改善不同环境中最著名的结果。此外，我们将提出的结果概括为在线拨号问题。

translated by 谷歌翻译

Physion: Evaluating Physical Prediction from Vision in Humans and Machines

Daniel M. Bear , Elias Wang , Damian Mrowca , Felix J. Binder , Hsiao-Yu Fish Tung , R. T. Pramod , Cameron Holdaway , Sirui Tao , Kevin Smith , Fan-Yun Sun

分类：人工智能 | 计算机视觉

2021-06-15

尽管当前的视觉算法在许多具有挑战性的任务上都表现出色，但尚不清楚他们如何理解现实世界环境的物理动态。在这里，我们介绍了Physion，一种数据集和基准，用于严格评估预测物理场景如何随着时间而发展的能力。我们的数据集具有对各种物理现象的现实模拟，包括刚性和软体体碰撞，稳定的多对象配置，滚动，滑动和弹丸运动，因此比以前的基准提供了更全面的挑战。我们使用Physion来基准一套模型，其体系结构，学习目标，投入输出结构和培训数据各不相同。同时，我们在同一场景上获得了人类预测行为的精确测量，从而使我们能够直接评估任何模型能够近似人类行为的效果。我们发现，学习以对象为中心的表示的视觉算法通常优于那些没有人的表现，但仍未达到人类绩效。另一方面，绘制具有直接访问物理状态信息的神经网络的表现效果更好，并且做出与人类制作的预测更相似。这些结果表明，提取场景的物理表征是在视力算法中实现人类水平和类似人类的物理理解的主要瓶颈。我们已公开发布了所有数据和代码，以促进使用物理以完全可重现的方式对其他模型进行基准测试，从而使对视觉算法的进度进行系统的评估，这些算法像人们一样坚固地了解物理环境。

translated by 谷歌翻译

Generative appearance replay for continual unsupervised domain adaptation

Boqi Chen , Kevin Thandiackal , Pushpak Pati , Orcun Goksel

分类：计算机视觉 | 人工智能

2023-01-03

Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.

translated by 谷歌翻译

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi , Kai Qiao , Jian Chen , Shuai Yang , Jie Yang , Baojie Song , Linyuan Wang , Bin Yan

分类：计算机视觉

2023-01-03

The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.

translated by 谷歌翻译

Explaining Imitation Learning through Frames

Boyuan Zheng , Jianlong Zhou , Chunjie Liu , Yiqiao Li , Fang Chen

分类：机器学习 | 计算机视觉

2023-01-03

As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.

translated by 谷歌翻译

Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment

Liqun Lin , Yang Zheng , Weiling Chen , Chengdong Lan , Tiesong Zhao

分类：计算机视觉

2023-01-03

Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.

translated by 谷歌翻译

Risk-Averse MDPs under Reward Ambiguity

Haolin Ruan , Zhi Chen , Chin Pang Ho

分类：机器学习

2023-01-03

We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive the tractable reformulation for our model. In particular, we show that that the return-risk model can also account for risk from uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.

translated by 谷歌翻译

Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling

Penghao Wu , Li Chen , Hongyang Li , Xiaosong Jia , Junchi Yan , Yu Qiao

分类：计算机视觉

2023-01-03

Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.

translated by 谷歌翻译

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

Sirui Zhao , Huaying Tang , Xinglong Mao , Shifeng Liu , Hanqing Tao , Hao Wang , Tong Xu , Enhong Chen

分类：计算机视觉

2023-01-03

As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.

translated by 谷歌翻译