Smart Paper Notes

UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup

Zongbo Han, Zhipeng Liang, Fan Yang, Liu Liu, Lanqing Li, Yatao Bian, Peilin Zhao, Bingzhe Wu, Changqing Zhang, Jianhua Yao

Categories: Machine Learning

2022-09-19

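Subpopulation shift is widespread in real-world machine learning applications. It refers to training and test distributions that contain the same subpopulation groups but differ in subpopulation frequencies. Importance reweighting is a standard way to handle subpopulation shift: it imposes constant or adaptive sampling weights on each sample in the training dataset. However, some recent studies have recognized that most of these approaches fail to improve performance over empirical risk minimization, especially when applied to over-parameterized neural networks. In this work, we propose a simple yet practical framework, called uncertainty-aware mixup (UMIX), which mitigates overfitting in over-parameterized models by reweighting the "mixed" samples according to sample uncertainty. UMIX is equipped with a training-trajectory-based uncertainty estimate for each sample, which flexibly characterizes the subpopulation distribution. We also provide theoretical analysis to verify that UMIX achieves better generalization bounds than prior works. Further, we conduct extensive empirical studies across a wide range of tasks to validate the effectiveness of our method, both qualitatively and quantitatively.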

translated by Google Translate

Improving Out-of-Distribution Robustness via Selective Augmentation

Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, Chelsea Finn

Categories: Machine Learning

2022-01-02

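Machine learning algorithms typically assume that training and test examples are drawn from the same distribution. However, distribution shift is a common problem in real-world applications and can cause models to perform dramatically worse at test time. In this paper, we specifically consider the problems of domain shift and subpopulation shift (e.g., imbalanced data). While prior works often seek to explicitly regularize the model's internal representations and predictors to be domain-invariant, we instead aim to regularize the whole function without restricting the model's internal representations. This leads to a simple mixup-based technique, called LISA, which learns invariant functions via selective augmentation. LISA selectively interpolates samples either with the same label but different domains, or with the same domain but different labels. We analyze a linear setting and theoretically show how LISA leads to a smaller worst-group error. Empirically, we study the effectiveness of LISA on nine benchmarks ranging from subpopulation shifts to domain shifts, and we find that LISA consistently outperforms other state-of-the-art methods.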
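
The two selection rules described in the abstract above (same label with different domains, or same domain with different labels) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `lisa_pair`, `one_hot`, and `interpolate` are hypothetical, and the interpolation is the ordinary mixup convex combination with a caller-supplied coefficient `lam`.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer label as a one-hot list of floats."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def lisa_pair(samples, anchor_idx, strategy, rng):
    """Pick a partner index for selective interpolation.

    samples: list of (features, label, domain) tuples.
    strategy: "intra_label"  -> same label, different domain
              "intra_domain" -> same domain, different label
    Returns None when no valid partner exists.
    """
    _, y, d = samples[anchor_idx]
    if strategy == "intra_label":
        pool = [i for i, (_, yj, dj) in enumerate(samples) if yj == y and dj != d]
    elif strategy == "intra_domain":
        pool = [i for i, (_, yj, dj) in enumerate(samples) if dj == d and yj != y]
    else:
        raise ValueError("unknown strategy: " + strategy)
    return rng.choice(pool) if pool else None


def interpolate(xa, ya, xb, yb, lam):
    """Mixup: convex combination of two feature vectors and two soft labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(xa, xb)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(ya, yb)]
    return x, y


# Toy data: (features, label, domain). All values are made up for illustration.
samples = [([1.0, 1.0], 0, "A"), ([3.0, 3.0], 0, "B"), ([5.0, 5.0], 1, "A")]
rng = random.Random(0)
partner = lisa_pair(samples, 0, "intra_domain", rng)
mixed_x, mixed_y = interpolate(
    samples[0][0], one_hot(samples[0][1], 2),
    samples[partner][0], one_hot(samples[partner][1], 2),
    0.5,
)
```

In practice the coefficient is often drawn from a Beta distribution, as in standard mixup (for example `rng.betavariate(2.0, 2.0)`); the shape parameters here are an assumption, not a value taken from the abstract.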

Improving group robustness under noisy labels using predictive uncertainty

Dongpin Oh, Dae Lee, Jeunghyun Byun, Bonggun Shin

Categories: Machine Learning | Computer Vision

2022-12-14

Standard empirical risk minimization (ERM) can underperform on certain minority groups (e.g., waterbirds on land or landbirds on water) due to the spurious correlation between the input and its label. Several studies have improved the worst-group accuracy by focusing on high-loss samples. The hypothesis behind this is that such high-loss samples are spurious-cue-free (SCF) samples. However, these approaches can be problematic, since high-loss samples may also be samples with noisy labels in real-world scenarios. To resolve this issue, we utilize the predictive uncertainty of a model to improve the worst-group accuracy under noisy labels. To motivate this, we theoretically show that the high-uncertainty samples are the SCF samples in the binary classification problem. This theoretical result implies that predictive uncertainty is an adequate indicator for identifying SCF samples in a noisy-label setting. Motivated by this, we propose a novel ENtropy based Debiasing (END) framework that prevents models from learning the spurious cues while being robust to noisy labels. In the END framework, we first train the identification model to obtain the SCF samples from a training set using its predictive uncertainty. Then, another model is trained on the dataset augmented with an oversampled SCF set. The experimental results show that our END framework outperforms other strong baselines on several real-world benchmarks that consider both noisy labels and spurious cues.
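
The two steps named in the abstract above (select high-uncertainty samples, then oversample them) can be sketched with Shannon entropy as the uncertainty measure. This is a rough illustration only: the fixed `threshold`, the `copies` parameter, and all function names are assumptions introduced here, and the abstract does not say how the identification model's uncertainty is thresholded.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_high_uncertainty(prob_rows, threshold):
    """Indices of samples whose predictive entropy exceeds `threshold`."""
    return [
        i for i, probs in enumerate(prob_rows)
        if predictive_entropy(probs) > threshold
    ]


def oversample(dataset, selected, copies):
    """Return the dataset plus `copies` extra copies of each selected sample."""
    extra = [dataset[i] for i in selected for _ in range(copies)]
    return list(dataset) + extra


# Hypothetical softmax outputs of a first-stage (identification) model.
prob_rows = [[0.99, 0.01], [0.5, 0.5], [0.9, 0.1]]
scf_candidates = select_high_uncertainty(prob_rows, threshold=0.5)
augmented = oversample(["x0", "x1", "x2"], scf_candidates, copies=2)
```

Unlike a loss-based criterion, the entropy depends only on the predicted distribution and not on the (possibly noisy) label, which is the property the abstract relies on.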

Just Train Twice: Improving Group Robustness without Training Group Information

Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn

Categories:

2021-07-19

Standard training via empirical risk minimization (ERM) can produce models that achieve high accuracy on average but low accuracy on certain groups, especially in the presence of spurious correlations between the input and label. Prior approaches that achieve high worst-group accuracy, like group distributionally robust optimization (group DRO), require expensive group annotations for each training point, whereas approaches that do not use such group annotations typically achieve unsatisfactory worst-group accuracy. In this paper, we propose a simple two-stage approach, JTT, that first trains a standard ERM model for several epochs, and then trains a second model that upweights the training examples that the first model misclassified. Intuitively, this upweights examples from groups on which standard ERM models perform poorly, leading to improved worst-group performance. Averaged over four image classification and natural language processing tasks with spurious correlations, JTT closes 75% of the gap in worst-group accuracy between standard ERM and group DRO, while only requiring group annotations on a small validation set in order to tune hyperparameters.
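
The second-stage reweighting described above can be sketched in a few lines, taking the first-stage predictions as given. The names `jtt_weights`, `weighted_mean_loss`, and `upweight` are hypothetical, and normalizing by the total weight is a choice made here for readability rather than something stated in the abstract.

```python
def jtt_weights(labels, first_stage_preds, upweight):
    """Per-example weights for the second training stage.

    Examples the first-stage model misclassified get weight `upweight`;
    all other examples keep weight 1.0.
    """
    if len(labels) != len(first_stage_preds):
        raise ValueError("labels and predictions must have the same length")
    return [
        upweight if pred != label else 1.0
        for label, pred in zip(labels, first_stage_preds)
    ]


def weighted_mean_loss(losses, weights):
    """Weighted average of per-example losses (normalized by total weight)."""
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Toy example: two of four training points were misclassified in stage one.
labels = [0, 1, 1, 0]
stage_one_preds = [0, 0, 1, 1]
weights = jtt_weights(labels, stage_one_preds, upweight=5.0)
objective = weighted_mean_loss([1.0, 2.0, 1.0, 2.0], weights)
```

No group labels appear anywhere in this computation; per the abstract, they are needed only on a small validation set to tune hyperparameters such as the upweighting factor.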

BARACK: Partially Supervised Group Robustness With Guarantees

Nimit Sohoni, Maziar Sanjabi, Nicolas Ballas, Aditya Grover, Shaoliang Nie, Hamed Firooz, Christopher Ré

Categories: Machine Learning

2021-12-31

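While neural networks have shown remarkable success on classification tasks in terms of average-case performance, they often fail to perform well on certain groups of the data. Such group information may be expensive to obtain; thus, recent works in robustness and fairness have proposed ways to improve worst-group performance even when group labels are unavailable for the training data. However, these methods generally underperform methods that use group information at training time. In this work, we assume access to a small number of group labels alongside a larger dataset without group labels. We propose a simple two-step framework that uses this partial group information to improve worst-group performance: train a model to predict the missing group labels for the training data, and then use these predicted group labels in a robust optimization objective. Theoretically, we provide generalization bounds for our approach in terms of worst-group performance, showing how the generalization error scales with both the total number of training points and the number of training points with group labels. Empirically, our method outperforms baselines that do not use group information, even when only 1-33% of points have group labels. We provide ablation studies to support the robustness and extensibility of our framework.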

Boosted CVaR Classification

Runtian Zhai, Chen Dan, Arun Sai Suggala, Zico Kolter, Pradeep Ravikumar

Categories: Machine Learning | Machine Learning (Statistics)

2021-10-26

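Many modern machine learning tasks require models with high tail performance, i.e., high performance over the worst-off samples in the dataset. This problem has been widely studied in fields such as algorithmic fairness, class imbalance, and risk-sensitive decision making. A popular approach to maximizing a model's tail performance is to minimize the CVaR (Conditional Value at Risk) loss, which computes the average risk over the tails of the loss. However, for classification tasks where models are evaluated by the zero-one loss, we show that if the classifiers are deterministic, then the minimizer of the average zero-one loss also minimizes the CVaR zero-one loss, suggesting that CVaR loss minimization is not helpful without additional assumptions. We circumvent this negative result by minimizing the CVaR loss over randomized classifiers, for which the minimizers of the average zero-one loss and the CVaR zero-one loss are no longer the same, so minimizing the latter can lead to better tail performance. To learn such randomized classifiers, we propose the Boosted CVaR Classification framework, which is motivated by a direct relationship between CVaR and a classical boosting algorithm called LPBoost. Based on this framework, we design an algorithm called $\alpha$-AdaLPBoost. We empirically evaluate our proposed algorithm on four benchmark datasets and show that it achieves higher tail performance than deterministic model training methods.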
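
To make the negative result above concrete, the sketch below computes an empirical CVaR as the mean of the worst `alpha` fraction of per-sample losses. This discretized form (rounding the tail size up to a whole number of samples) is an assumption made here for illustration; the paper's exact definition may differ. With zero-one losses the value reduces to min(1, error rate / alpha) whenever `alpha * n` is an integer, so it is a monotone function of the average error.

```python
import math


def cvar(losses, alpha):
    """Empirical CVaR: mean of the worst ceil(alpha * n) losses.

    alpha in (0, 1] is the tail fraction; alpha = 1 gives the plain mean.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Small tolerance guards against float noise such as 0.1 * 30 > 3.
    k = max(1, math.ceil(alpha * len(losses) - 1e-9))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Zero-one losses of a deterministic classifier with error rate 0.25.
zero_one = [0.0, 0.0, 0.0, 1.0]
tail_risk = cvar(zero_one, 0.5)          # equals min(1, 0.25 / 0.5)
mean_risk = cvar(zero_one, 1.0)          # equals the average error
```

Because the tail risk here is determined by the error rate alone, lowering the average zero-one loss can never raise it, which is the intuition behind moving to randomized classifiers.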

Towards Group Robustness in the presence of Partial Group Labels

Vishnu Suresh Lokhande, Kihyuk Sohn, Jinsung Yoon, Madeleine Udell, Chen-Yu Lee, Tomas Pfister

Categories: Machine Learning | Artificial Intelligence | Computer Vision | Machine Learning (Statistics)

2022-01-10

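Learning invariant representations is an important requirement when training machine learning models on datasets that contain spurious correlations. These spurious correlations between input samples and target labels misdirect neural network predictions, resulting in poor performance on certain groups, especially minority groups. Robust training against these spurious correlations requires knowledge of the group membership of every sample. Such a requirement is impractical when labeling minority or rare groups is significantly laborious, or when the individuals comprising the dataset choose to conceal sensitive information. On the other hand, such data collection efforts do result in datasets that contain partially labeled group information. Recent works have tackled the fully unsupervised scenario, where no group labels are available. We therefore aim to fill the gap in the literature by tackling a more realistic setting that can leverage partially available sensitive or group information during training. First, we construct a constraint set and derive a high-probability bound for the group assignment to belong to that set. Second, we propose an algorithm that optimizes for the worst-off group assignments from the constraint set. Through experiments on image and tabular datasets, we show improvements in minority-group performance while preserving overall aggregate accuracy across groups.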

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang

Categories:

2019-11-20

Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization (a stronger-than-typical $\ell_2$ penalty or early stopping), we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
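
The quantity that group DRO minimizes, the worst-case loss over pre-defined groups, can be written down directly. The sketch below only evaluates that objective on given per-example losses; it is not the stochastic optimization algorithm mentioned in the abstract, and the function names are hypothetical.

```python
from collections import defaultdict


def group_mean_losses(losses, groups):
    """Average loss within each group, returned as {group_id: mean_loss}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        totals[group] += loss
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


def worst_group_loss(losses, groups):
    """Group DRO objective value: the largest per-group average loss."""
    return max(group_mean_losses(losses, groups).values())


# Toy example: the average loss hides a badly served group "b".
losses = [0.1, 0.3, 0.9, 1.1]
groups = ["a", "a", "b", "b"]
average = sum(losses) / len(losses)
worst = worst_group_loss(losses, groups)

# If every training loss is zero, the worst-group training loss is zero too,
# which is why the training objective alone cannot separate ERM from group DRO
# for models that fit the training data perfectly.
fitted = worst_group_loss([0.0, 0.0, 0.0, 0.0], groups)
```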

Avoiding spurious correlations via logit correction

Sheng Liu, Xu Zhang, Nitesh Sekhar, Yue Wu, Prateek Singhal, Carlos Fernandez-Granda

Categories: Machine Learning | Natural Language Processing | Computer Vision | Machine Learning (Statistics)

2022-12-02

Empirical studies suggest that machine learning models trained with empirical risk minimization (ERM) often rely on attributes that may be spuriously correlated with the class labels. Such models typically perform poorly at inference time on data lacking such correlations. In this work, we explicitly consider a situation where potential spurious correlations are present in the majority of training data. In contrast with existing approaches, which use the ERM model outputs to detect the samples without spurious correlations and then heuristically upweight or upsample those samples, we propose the logit correction (LC) loss, a simple yet effective improvement on the softmax cross-entropy loss, to correct the sample logit. We demonstrate that minimizing the LC loss is equivalent to maximizing the group-balanced accuracy, so the proposed LC could mitigate the negative impacts of spurious correlations. Our extensive experimental results further reveal that the proposed LC loss outperforms the SoTA solutions on multiple popular benchmarks by a large margin, an average 5.5% absolute improvement, without access to spurious attribute labels. LC is also competitive with oracle methods that make use of the attribute labels. Code is available at https://github.com/shengliu66/LC.

Class-Aware Universum Inspired Re-Balance Learning for Long-Tailed Recognition

Enhao Zhang, Chuanxing Geng, Songcan Chen

Categories: Computer Vision

2022-07-26

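Data augmentation for minority classes is an effective strategy for long-tailed recognition, and a large number of methods have been developed accordingly. Although these methods all ensure balance in sample quantity, the quality of the augmented samples is not always satisfactory for recognition, and they are prone to problems such as overfitting, lack of diversity, and semantic drift. To address these issues, we propose Class-aware Universum Inspired Re-balance Learning (CaUIRL) for long-tailed recognition, which endows the Universum with class-aware ability so as to re-balance individual minority classes in terms of both sample quantity and quality. In particular, we theoretically prove that the classifiers learned by CaUIRL are consistent with those learned under the balanced condition from a Bayesian perspective. In addition, we develop a higher-order mixup approach that can automatically generate class-aware Universum (CaU) data without resorting to any external data. Unlike the traditional Universum, the generated Universum additionally takes domain similarity, class separability, and sample diversity into account. Extensive experiments on benchmark datasets demonstrate the surprising advantages of our method; in particular, top-1 accuracy on minority classes is improved by 1.9% to 6% compared to the state-of-the-art method.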

SWAD: Domain Generalization by Seeking Flat Minima

Junbum Cha, Sanghyuk Chun, Kyungjae Lee, Han-Cheol Cho, Seunghyun Park, Yunsung Lee, Sungrae Park

Categories: Machine Learning | Computer Vision

2021-02-17

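Domain generalization (DG) methods aim to achieve generalizability to an unseen target domain by using only training data from the source domains. Although a variety of DG methods have been proposed, a recent study shows that under a fair evaluation protocol, called DomainBed, the simple empirical risk minimization (ERM) approach performs comparably to previous methods. Unfortunately, simply solving ERM on a complex, non-convex loss function can easily lead to sub-optimal generalizability by seeking sharp minima. In this paper, we theoretically show that finding flat minima results in a smaller domain generalization gap. We also propose a simple yet effective method, named Stochastic Weight Averaging Densely (SWAD), to find flat minima. SWAD finds flatter minima and suffers less from overfitting than vanilla SWA does, thanks to a dense and overfit-aware stochastic weight sampling strategy. SWAD shows state-of-the-art performance on five DG benchmarks, namely PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, with consistent and large margins of +1.6% on average in out-of-domain accuracy. We also compare SWAD with conventional generalization methods, such as data augmentation and consistency regularization methods, to verify that the remarkable performance improvements originate from seeking flat minima, not from better in-domain generalizability. Last but not least, SWAD is readily adaptable to existing DG methods without modification; the combination of SWAD and an existing DG method further improves DG performance. Source code is available at https://github.com/khanrc/swad.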
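
The "dense" part of the weight averaging described above can be illustrated by averaging every iterate in a window of the training trajectory, instead of one snapshot per epoch. This is a minimal sketch with hypothetical names; in particular, the window bounds `start` and `end` are simply passed in, whereas the abstract says the actual method chooses them in an overfit-aware way that is not reproduced here.

```python
def average_weights(snapshots):
    """Element-wise mean of a non-empty list of equal-length weight vectors."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    n = len(snapshots)
    return [sum(column) / n for column in zip(*snapshots)]


def dense_average(trajectory, start, end):
    """Average every iterate in trajectory[start:end].

    trajectory: list of weight vectors, one per training iteration.
    start, end: window bounds (assumed given; how to pick them is out of scope).
    """
    window = trajectory[start:end]
    if not window:
        raise ValueError("empty averaging window")
    return average_weights(window)


# Toy trajectory of 2-dimensional weights, one entry per iteration.
trajectory = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
averaged = dense_average(trajectory, 1, 3)
```

For real models the same averaging would be applied to each parameter tensor, typically as a running mean so that the individual iterates need not be stored.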

AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization

Bhargavi Paranjape, Pradeep Dasigi, Vivek Srikumar, Luke Zettlemoyer, Hannaneh Hajishirzi

Categories: Machine Learning | Artificial Intelligence | Natural Language Processing

2022-12-02

Models trained via empirical risk minimization (ERM) are known to rely on spurious correlations between labels and task-independent input features, resulting in poor generalization to distributional shifts. Group distributionally robust optimization (G-DRO) can alleviate this problem by minimizing the worst-case loss over a set of pre-defined groups over training data. G-DRO successfully improves performance of the worst group, where the correlation does not hold. However, G-DRO assumes that the spurious correlations and associated worst groups are known in advance, making it challenging to apply it to new tasks with potentially multiple unknown spurious correlations. We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization -- an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them. AGRO equips G-DRO with an adversarial slicing model to find a group assignment for training examples which maximizes worst-case loss over the discovered groups. On the WILDS benchmark, AGRO results in 8% higher model performance on average on known worst groups, compared to prior group discovery approaches used with G-DRO. AGRO also improves out-of-distribution performance on SST2, QQP, and MS-COCO -- datasets where potential spurious correlations are as yet uncharacterized. Human evaluation of AGRO groups shows that they contain well-defined, yet previously unstudied spurious correlations that lead to model errors.

Improved OOD Generalization via Conditional Invariant Regularizer

Mingyang Yi, Ruoyu Wang, Jiachen Sun, Zhenguo Li, Zhi-Ming Ma

Categories: Machine Learning

2022-07-14

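Recently, generalization on out-of-distribution (OOD) data with correlation shift has attracted great attention. Correlation shift is caused by spurious attributes that correlate with the class label, since the correlation between them may vary between training and test data. For this problem, we show that models that are conditionally independent of the spurious attributes, given the class label, are OOD generalizable. Based on this, we propose a metric, Conditional Spurious Variation (CSV), which controls the OOD generalization error, to measure such conditional independence. To improve OOD generalization, we regularize the training process with the proposed CSV. Under mild assumptions, our training objective can be formulated as a nonconvex-concave mini-max problem. An algorithm with a provable convergence rate is proposed to solve the problem. Extensive empirical results verify our algorithm's efficacy in improving OOD generalization.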

Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

Zhengzhuo Xu, Zenghao Chai, Chun Yuan

Categories: Computer Vision | Machine Learning

2021-11-06

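Real-world data universally confronts a severe class-imbalance problem and exhibits a long-tailed distribution, i.e., most labels are associated with limited instances. Naïve models supervised by such datasets prefer dominant labels, encounter a serious generalization challenge, and become poorly calibrated. We propose two novel methods from the prior perspective to alleviate this dilemma. First, we derive a balance-oriented data augmentation named Uniform Mixup (UniMix) to promote mixup in long-tailed scenarios, which adopts an advanced mixing factor and sampler in favor of the minority. Second, motivated by Bayesian theory, we identify the Bayes Bias (Bayias), an inherent bias caused by the inconsistency of the prior, and compensate for it as a modification to the standard cross-entropy loss. We further prove, both theoretically and empirically, that the proposed methods ensure classification calibration. Extensive experiments verify that our strategies contribute to a better-calibrated model, and that their combination achieves state-of-the-art performance on CIFAR-LT, ImageNet-LT, and iNaturalist 2018.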

Importance Tempering: Group Robustness for Overparameterized Models

Yiping Lu, Wenlong Ji, Zachary Izzo, Lexing Ying

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-09-19

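Although overparameterized models have shown success on many machine learning tasks, their accuracy can drop on a test distribution that differs from the training one. This accuracy drop still limits the application of machine learning in the wild. At the same time, importance weighting, a traditional technique for handling distribution shifts, has been shown both empirically and theoretically to have little or even no effect on overparameterized models. In this paper, we propose importance tempering to improve the decision boundary and achieve better results for overparameterized models. Theoretically, we show that the choice of group temperature can differ between the label shift and spurious correlation settings. We also prove that properly selected temperatures can resolve minority collapse in imbalanced classification. Empirically, we achieve state-of-the-art results on worst-group classification tasks using importance tempering.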

AutoBalance: Optimized Loss Functions for Imbalanced Data

Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, Samet Oymak

Categories: Machine Learning

2022-01-04

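Imbalanced datasets are commonplace in modern machine learning problems. The presence of under-represented classes or groups with sensitive attributes raises concerns about generalization and fairness. Such concerns are further exacerbated by the fact that large-capacity deep networks can perfectly fit the training data and appear to achieve perfect accuracy and fairness during training, yet perform poorly at test time. To address these challenges, we propose AutoBalance, a bi-level optimization framework that automatically designs a training loss function to optimize a blend of accuracy and fairness-seeking objectives. Specifically, a lower-level problem trains the model weights, and an upper-level problem tunes the loss function by monitoring and optimizing the desired objective over the validation data. Our loss design enables personalized treatment of classes and groups by employing a parametric cross-entropy loss and individualized data augmentation schemes. We evaluate the benefits and performance of our approach for the application scenarios of imbalanced and group-sensitive classification. Extensive empirical evaluations demonstrate the benefits of AutoBalance over state-of-the-art approaches. Our experimental findings are complemented by theoretical insights on loss function design and the benefits of the train-validation split. All code is available open-source.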

Contrastive Adapters for Foundation Model Group Robustness

Michael Zhang, Christopher Ré

Categories: Machine Learning

2022-07-14

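While large pretrained foundation models (FMs) have shown remarkable zero-shot classification robustness to dataset-level distribution shifts, their robustness to subpopulation or group shifts is relatively underexplored. We study this problem and find that FMs such as CLIP may not be robust to various group shifts. Across 9 robustness benchmarks, zero-shot classification with their embeddings results in gaps of up to 80.7 percentage points (pp) between average and worst-group accuracy. Unfortunately, existing methods to improve robustness require retraining, which can be prohibitively expensive on large foundation models. We also find that efficient ways to improve model inference (e.g., via adapters, lightweight networks with FM embeddings as inputs) do not consistently improve group robustness and can sometimes hurt it compared to zero-shot (e.g., increasing the accuracy gap by 50.1 pp on CelebA). We thus develop an adapter training strategy to effectively and efficiently improve FM group robustness. Our motivating observation is that while poor robustness results from groups in the same class being embedded far apart in the foundation model "embedding space", standard adapter training may not bring these points closer together. We therefore propose contrastive adapting, which trains adapters with contrastive learning to bring sample embeddings close to both their ground-truth class embeddings and other sample embeddings in the same class. Across the 9 benchmarks, our approach consistently improves group robustness, raising worst-group accuracy by 8.5 to 56.0 pp. Our approach is also efficient, doing so without any FM finetuning and with only a fixed set of frozen FM embeddings. On benchmarks such as Waterbirds and CelebA, this leads to worst-group accuracy comparable to state-of-the-art methods that retrain entire models, while training only $\leq$ 1% of the model parameters.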

On Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization and Beyond

Yuzhe Yang, Hao Wang, Dina Katabi

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-03-17

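Real-world data often exhibit imbalanced label distributions. Existing studies on data imbalance focus on single-domain settings, i.e., samples are from the same data distribution. However, natural data can originate from distinct domains, where a minority class in one domain could have abundant instances in other domains. We formalize the task of Multi-Domain Long-Tailed Recognition (MDLT), which learns from multi-domain imbalanced data, addresses label imbalance, domain shift, and divergent label distributions across domains, and generalizes to all domain-class pairs. We first develop the domain-class transferability graph and show that such transferability governs the success of learning in MDLT. We then propose BoDA, a theoretically grounded learning strategy that tracks the upper bound of transferability statistics and ensures balanced alignment and calibration across imbalanced domain-class distributions. We curate five MDLT benchmarks based on widely used multi-domain datasets and compare BoDA to twenty algorithms that span different learning strategies. Extensive and rigorous experiments verify the superior performance of BoDA. Further, as a byproduct, BoDA establishes a new state of the art on domain generalization benchmarks, highlighting the importance of addressing data imbalance across domains, which can be crucial for improving generalization to unseen domains. Code and data are available at: https://github.com/yyzharry/multi-domain-imbalance.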

Environment Inference for Invariant Learning

Elliot Creager, Jörn-Henrik Jacobsen, Richard Zemel

Categories:

2020-10-14

Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domain-invariant. An important assumption in this area is that the training examples are partitioned into "domains" or "environments". Our focus is on the more common setting where such partitions are not provided. We propose EIIL, a general framework for domain-invariant learning that incorporates Environment Inference to directly infer partitions that are maximally informative for downstream Invariant Learning. We show that EIIL outperforms invariant learning methods on the CMNIST benchmark without using environment labels, and significantly outperforms ERM on worst-group performance in the Waterbirds and CivilComments datasets. Finally, we establish connections between EIIL and algorithmic fairness, which enables EIIL to improve accuracy and calibration in a fair prediction problem.

RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-06-06

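Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data in complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these OOD states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.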
