Smart Paper Notes

Enhancing and Adversarial: Improve ASR with Speaker Labels

Wei Zhou , Haotian Wu , Jingjing Xu , Mohammad Zeineldeen , Christoph Lüscher , Ralf Schlüter , Hermann Ney

Category: Machine Learning

2022-11-11

ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, two opposite objectives that aim to increase or decrease domain variance, yielding domain-aware or domain-agnostic ASR, respectively. In this work, we study how best to apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient reversal layer for stable and effective adversarial training without tuning effort. Detailed analysis and experimental verification are conducted to identify the optimal positions in the ASR neural network (NN) at which to apply speaker enhancing and adversarial training. We also explore their combination for further improvement, achieving the same performance as i-vectors plus adversarial training. Our best speaker-based MTL achieves a 7% relative improvement on the Switchboard Hub5'00 set. We also investigate the effect of such speaker-based MTL on a cleaner dataset and with a weaker ASR NN.
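
The speaker-adversarial branch relies on a gradient reversal layer (GRL). The sketch below is a minimal, framework-free illustration of the standard GRL only, assuming a fixed scaling factor `lam`; the paper's adaptive GRL sets this scale automatically by a rule the abstract does not give, so it is not reproduced here. All function names are hypothetical. The sketch contrasts the gradient that reaches the shared encoder under speaker enhancing (plain MTL) and under speaker adversarial training.

```python
from typing import List

Vector = List[float]


def grl_forward(x: Vector) -> Vector:
    """Forward pass of a standard GRL: identity, features pass through unchanged."""
    return list(x)


def grl_backward(grad: Vector, lam: float) -> Vector:
    """Backward pass of a standard GRL: flip the sign and scale by lam.

    lam is a fixed hyperparameter here (assumption); the paper's adaptive
    variant would choose it automatically.
    """
    return [-lam * g for g in grad]


def encoder_gradient(grad_asr: Vector, grad_spk: Vector, lam: float,
                     adversarial: bool) -> Vector:
    """Gradient reaching the shared encoder from the ASR and speaker heads.

    adversarial=False: speaker enhancing, i.e. loss = L_asr + lam * L_spk.
    adversarial=True : speaker adversarial; the speaker head sits behind a GRL,
                       so the encoder is pushed to remove speaker information.
    """
    if adversarial:
        spk = grl_backward(grad_spk, lam)
    else:
        spk = [lam * g for g in grad_spk]
    return [a + s for a, s in zip(grad_asr, spk)]


if __name__ == "__main__":
    g_asr, g_spk = [0.5, -0.2], [0.1, 0.3]
    print(encoder_gradient(g_asr, g_spk, lam=0.5, adversarial=False))
    print(encoder_gradient(g_asr, g_spk, lam=0.5, adversarial=True))
```

In a real training setup this would be a custom autograd operation in the deep learning framework; the plain-list version only makes the sign flip explicit.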

Lawin Transformer: Improving Semantic Segmentation Transformer with Multi-Scale Representations via Large Window Attention

Haotian Yan , Chuang Zhang , Ming Wu

Category: Computer Vision

2022-01-05

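Multi-scale representations are crucial for semantic segmentation. The community has witnessed the flourishing of semantic segmentation convolutional neural networks (CNNs) that exploit multi-scale contextual information. Motivated by the strength of the vision transformer (ViT) in image classification, several semantic segmentation ViTs have recently been proposed; most of them achieve impressive results, but at the cost of computational efficiency. In this paper, we successfully introduce multi-scale representations into a semantic segmentation ViT via a window attention mechanism, further improving both performance and efficiency. To this end, we introduce large window attention, which allows a local window to query a much larger context window at only a small computational overhead. By regulating the ratio of the context area to the query area, we enable large window attention to capture contextual information at multiple scales. Moreover, we combine the framework of spatial pyramid pooling with large window attention, yielding a novel decoder named large window attention spatial pyramid pooling (LawinASPP) for semantic segmentation ViTs. Our resulting ViT, Lawin Transformer, is composed of an efficient hierarchical vision transformer (HVT) as the encoder and LawinASPP as the decoder. Empirical results show that Lawin Transformer offers improved efficiency compared with existing methods. Lawin Transformer further sets new state-of-the-art performance on Cityscapes (84.4% mIoU), ADE20K (56.2% mIoU), and COCO-Stuff. Code will be released at https://github.com/yan-hao-tian/lawin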

Multi-Centroid Representation Network for Domain Adaptive Person Re-ID

Yuhang Wu , Tengteng Huang , Haotian Yao , Chi Zhang , Yuanjie Shao , Chuchu Han , Changxin Gao , Nong Sang

Category: Computer Vision

2021-12-22

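Recently, many methods have tackled unsupervised domain adaptive person re-identification (UDA re-ID) through pseudo-label-based contrastive learning. During training, a uni-centroid representation is obtained by simply averaging all instance features from a cluster that shares the same pseudo label. However, because clustering results are imperfect, a cluster may contain images with different identities (label noise), which makes the uni-centroid representation inappropriate. In this paper, we present a novel Multi-Centroid Memory (MCM) that adaptively captures the different identity information within a cluster. MCM effectively alleviates label noise by selecting appropriate positive and negative centroids for each query image. Moreover, we propose two strategies to improve the contrastive learning process. First, we introduce a Domain-Specific Contrastive Learning (DSCL) mechanism that fully exploits within-domain information by comparing samples only from the same domain. Second, we propose Second-Order Nearest Interpolation (SONI) to obtain abundant and informative negative samples. We integrate MCM, DSCL, and SONI into a unified framework named the Multi-Centroid Representation Network (MCRN). Extensive experiments demonstrate the superiority of MCRN over state-of-the-art methods on multiple UDA re-ID tasks and on fully unsupervised re-ID tasks.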

A RL-based Policy Optimization Method Guided by Adaptive Stability Certification

Shengjie Wang , Fengbo Lan , Xiang Zheng , Yuxue Cao , Oluwatosin Oseni , Haotian Xu , Yang Gao , Tao Zhang

Category: Robotics | Machine Learning

2023-01-02

In contrast to control-theoretic methods, model-free reinforcement learning (RL) methods still lack a stability guarantee, which remains a significant problem. Jointly learning a policy and a Lyapunov function has recently become a promising approach to providing the whole system with a stability guarantee. However, the classical Lyapunov constraints introduced in prior work cannot stabilize the system during sampling-based optimization. We therefore propose the Adaptive Stability Certification (ASC), which enables the system to reach sampling-based stability. Because the ASC condition can search for the optimal policy heuristically, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm based on it. Meanwhile, our algorithm avoids a problem of current approaches, in which a variety of constraints are coupled into the optimization objective. When evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer stability constraint violations than previous studies.

Discovering Latent Knowledge in Language Models Without Supervision

Collin Burns , Haotian Ye , Dan Klein , Jacob Steinhardt

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-07

Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way. Specifically, we introduce a method for accurately answering yes-no questions given only unlabeled model activations. It works by finding a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values. We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models: across 6 models and 10 question-answering datasets, it outperforms zero-shot accuracy by 4% on average. We also find that it cuts prompt sensitivity in half and continues to maintain high accuracy even when models are prompted to generate incorrect answers. Our results provide an initial step toward discovering what language models know, distinct from what they say, even when we don't have access to explicit ground truth labels.
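
The consistency property described above can be turned into an unsupervised training objective. The sketch below assumes the form used by the paper's method (Contrast-Consistent Search): a probe maps the activations of a statement and of its negation to probabilities `p_pos` and `p_neg`, a consistency term asks `p_pos` to equal `1 - p_neg`, and a confidence term rules out the trivial answer of 0.5 for both. The linear probe on plain-Python vectors and the function names are illustrative assumptions; activation normalization and the probe's training loop are omitted.

```python
import math
from typing import List, Sequence


def probe(w: Sequence[float], b: float, h: Sequence[float]) -> float:
    """Linear probe on a hidden activation h: sigmoid(w . h + b)."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    return 1.0 / (1.0 + math.exp(-z))


def ccs_loss(p_pos: List[float], p_neg: List[float]) -> float:
    """Unsupervised loss averaged over a batch of (statement, negation) pairs."""
    total = 0.0
    for pp, pn in zip(p_pos, p_neg):
        consistency = (pp - (1.0 - pn)) ** 2  # a statement and its negation should sum to 1
        confidence = min(pp, pn) ** 2         # penalize the degenerate pp = pn = 0.5 solution
        total += consistency + confidence
    return total / len(p_pos)


def predict(pp: float, pn: float) -> bool:
    """Answer 'yes' if the averaged truth estimate exceeds 0.5."""
    return 0.5 * (pp + (1.0 - pn)) > 0.5
```

A perfectly consistent and confident pair such as (1.0, 0.0) gives zero loss, while the uninformative pair (0.5, 0.5) satisfies consistency and is penalized only by the confidence term.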

Knowledge is Power: Understanding Causality Makes Legal judgment Prediction Models More Generalizable and Robust

Haotian Chen , Lingwei Zhang , Fanchao Chen , Yang Yu

Category: Natural Language Processing | Artificial Intelligence

2022-11-06

Legal Judgment Prediction (LJP), which aims to predict a judgment based on fact descriptions, serves as legal assistance to ease the heavy workload of the limited number of legal practitioners. Most existing methods fine-tune various large-scale pre-trained language models (PLMs) on LJP tasks to obtain consistent improvements. However, we find that the state-of-the-art (SOTA) model makes judgment predictions based on wrong (or non-causal) information, which not only weakens the model's generalization capability but also results in severe social problems such as discrimination. Here, we analyze the causal mechanism that misleads the LJP model into learning spurious correlations, and then propose a framework that guides the model to learn the underlying causal knowledge in legal texts. Specifically, we first perform open information extraction (OIE) to refine the text into content with a high proportion of causal information, from which we generate a new set of data. Then, we design a model that learns the weights of the refined data and the raw data for LJP model training. Extensive experimental results show that our model is more generalizable and robust than the baselines and achieves new SOTA performance on two commonly used legal-specific datasets.

End-to-End Entity Detection with Proposer and Regressor

Xueru Wen , Changjiang Zhou , Haotian Tang , Luguang Liang , Yu Jiang , Hong Qi

Category: Natural Language Processing

2022-10-19

Named entity recognition (NER) is a traditional task in natural language processing. In particular, nested entity recognition has received extensive attention because nesting scenarios are widespread. Recent research transfers the well-established set-prediction paradigm from object detection to cope with entity nesting. However, these approaches are limited by manually created query vectors, which fail to adapt to the rich semantic information in the context. This paper presents an end-to-end entity detection approach with a proposer and a regressor to tackle these issues. First, the proposer uses a feature pyramid network to generate high-quality entity proposals. Then, the regressor refines the proposals to produce the final predictions. The model adopts an encoder-only architecture and thus benefits from rich query semantics, precise entity localization, and easy model training. Moreover, we introduce novel spatially modulated attention and progressive refinement for further improvement. Extensive experiments demonstrate that our model achieves strong performance on both flat and nested NER, setting a new state-of-the-art F1 score of 80.74 on the GENIA dataset and 72.38 on the WeiboNER dataset.

Automated Quality Controlled Analysis of 2D Phase Contrast Cardiovascular Magnetic Resonance Imaging

Emily Chan , Ciaran O'Hanlon , Carlota Asegurado Marquez , Marwenie Petalcorin , Jorge Mariscal-Harana , Haotian Gu , Raymond J. Kim , Robert M. Judd , Phil Chowienczyk , Julia A. Schnabel

Category: Computer Vision

2022-09-28

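Flow analysis using phase contrast cardiovascular magnetic resonance imaging (PC-CMR) enables the quantification of important parameters used in assessing cardiovascular function. An essential part of this analysis is identifying the correct CMR views and performing quality control (QC) to detect artefacts that could affect flow quantification. We propose a novel deep-learning-based framework for the fully automated analysis of flow from full CMR scans. It first performs the view selection and QC steps using two sequential convolutional neural networks, followed by automatic aorta and pulmonary artery segmentation to enable the quantification of key flow parameters. Accuracy values of 0.958 and 0.914 were obtained for view classification and QC, respectively. For segmentation, Dice scores were >0.969, and Bland-Altman plots indicated excellent agreement between manual and automatic peak flow values. In addition, we tested the pipeline on an external validation dataset, with results indicating its robustness. This work was carried out using multi-vendor clinical data consisting of 986 cases, indicating the potential for using this pipeline in a clinical setting.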
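
The two evaluation measures quoted above are standard. The sketch below computes the Dice score between two binary masks and the Bland-Altman bias with 95% limits of agreement (mean difference plus or minus 1.96 standard deviations) between manual and automatic measurements. The flat 0/1 mask format and the function names are assumptions for illustration; the abstract does not specify how scores were aggregated across cases.

```python
import statistics
from typing import Sequence, Tuple


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice score 2*|A intersect B| / (|A| + |B|) for two equal-length binary masks."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(mask_a) + sum(mask_b)
    return 1.0 if size == 0 else 2.0 * inter / size  # two empty masks agree perfectly


def bland_altman(manual: Sequence[float],
                 automatic: Sequence[float]) -> Tuple[float, float, float]:
    """Bias and 95% limits of agreement (bias +/- 1.96 sample SD of the differences)."""
    diffs = [a - m for m, a in zip(manual, automatic)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # requires at least two measurement pairs
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

For example, `bland_altman([100.0, 110.0], [102.0, 108.0])` returns a bias of 0.0 with limits at about plus and minus 5.54, since the two differences are +2 and -2.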

VREN: Volleyball Rally Dataset with Expression Notation Language

Haotian Xia , Rhys Tracy , Yun Zhao , Erwan Fraisse , Yuan-Fang Wang , Linda Petzold

Category: Machine Learning

2022-09-28

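This research has two goals. The first is to curate a large, information-rich dataset containing key, concise summaries of the players' actions and positions and of the back-and-forth travel patterns of the ball in professional and NCAA Div-I indoor volleyball games. Although several prior studies have created similar datasets for other sports (e.g., badminton and soccer), no such dataset has yet been created for indoor volleyball. The second goal is to introduce a volleyball descriptive language that fully describes the rally process in a game, and to apply this language to our dataset. Based on the curated dataset and our descriptive sports language, we introduce three tasks for automated volleyball action and tactic analysis: (1) volleyball rally prediction, which predicts the outcome of a rally and helps players and coaches improve decision-making in practice; (2) set type and hit type prediction, which helps coaches and players prepare more effectively for games; and (3) volleyball tactics and attacking-zone statistics, which provide advanced volleyball statistics and help coaches better understand the game and their opponents' tactics. We conducted case studies to show how the experimental results can provide insights to the volleyball analytics community. Furthermore, experimental evaluation on real-world data establishes a baseline for future research on and applications of our dataset and language. This study bridges the gap between indoor volleyball and computer science.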

Synthesize Efficient Safety Certificates for Learning-Based Safe Control using Magnitude Regularization

Haotian Zheng , Haitong Ma , Sifa Zheng , Shengbo Eben Li , Jianqiang Wang

Category: Robotics

2022-09-23

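Safety certificates based on energy functions can provide provable safety guarantees for safe control tasks on complex robotic systems. However, all recent studies on learning-based energy function synthesis consider only feasibility, which can cause over-conservativeness and result in less efficient controllers. In this work, we propose a magnitude regularization technique that improves the efficiency of safe controllers by reducing the conservativeness of the energy function while preserving the promising provable safety guarantees. Specifically, we quantify conservativeness by the magnitude of the energy function and reduce it by adding a magnitude regularization term to the synthesis loss. We propose the SafeMR algorithm, which uses reinforcement learning (RL) for synthesis to unify the learning processes of the safe controller and the energy function. Experimental results show that the proposed method does reduce the conservativeness of the energy function and outperforms the baselines in controller efficiency while ensuring safety.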
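
The abstract states that conservativeness is measured by the magnitude of the energy function and reduced by a regularization term added to the synthesis loss. A minimal sketch of such a term follows; using the mean absolute value as the magnitude, the weight `beta`, and the precomputed `base_loss` are all assumptions for illustration, not the paper's exact formulation.

```python
from typing import Sequence


def magnitude(phi_values: Sequence[float]) -> float:
    """Mean absolute value of the energy function over sampled states (assumed measure)."""
    return sum(abs(v) for v in phi_values) / len(phi_values)


def regularized_loss(base_loss: float, phi_values: Sequence[float], beta: float) -> float:
    """Synthesis loss plus a magnitude penalty.

    base_loss : feasibility/synthesis loss computed elsewhere (hypothetical input).
    beta      : regularization weight (hypothetical hyperparameter); beta = 0
                recovers the unregularized, feasibility-only objective.
    """
    return base_loss + beta * magnitude(phi_values)
```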
