Intelligent Paper Notes

Investigating Glyph Phonetic Information for Chinese Spell Checking: What Works and What's Next

Xiaotian Zhang, Yanjun Zheng, Hang Yan, Xipeng Qiu

Category: Natural Language Processing | Artificial Intelligence

2022-12-08

While pre-trained Chinese language models have demonstrated impressive performance on a wide range of NLP tasks, the Chinese Spell Checking (CSC) task remains a challenge. Previous research has explored using information such as glyphs and phonetics to improve the ability to distinguish misspelled characters, with good results. However, the generalization ability of these models is not well understood: it is unclear whether they incorporate glyph-phonetic information and, if so, whether this information is fully utilized. In this paper, we aim to better understand the role of glyph-phonetic information in the CSC task and suggest directions for improvement. Additionally, we propose a new, more challenging, and practical setting for testing the generalizability of CSC models. All code is made publicly available.

translated by Google Translate

SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking

Xiaotian Zhang, Hang Yan, Yu Sun, Xipeng Qiu

Category: Natural Language Processing | Artificial Intelligence

2022-10-31

Due to the ambiguity of homophones, Chinese Spell Checking (CSC) has widespread applications. Existing systems typically use BERT for text encoding. However, CSC requires the model to account for both phonetic and graphemic information. To adapt BERT to the CSC task, we propose a token-level self-distillation contrastive learning method. We use BERT to encode both the corrupted sentence and the corresponding correct sentence. Then, we use a contrastive learning loss to pull the hidden states of corrupted tokens closer to their counterparts in the correct sentence. On three CSC datasets, we confirm that our method provides a significant improvement over the baselines.
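
A minimal NumPy sketch of a token-level contrastive (InfoNCE-style) loss of the kind described above. It assumes cosine similarity with a temperature `tau`, treats the correct sentence's token at the same position as the positive, and uses the other positions of the correct sentence as negatives. The function name, `tau`, and this negative-sampling scheme are assumptions for illustration, not the authors' code; in real training both inputs would be BERT hidden states, and the correct-sentence states would typically act as fixed targets.

```python
import numpy as np


def token_contrastive_loss(h_corrupt, h_correct, positions=None, tau=0.1):
    """Hypothetical token-level InfoNCE loss.

    h_corrupt, h_correct: (seq_len, dim) hidden states of the corrupted sentence
    and of the corresponding correct sentence, aligned token by token.
    positions: optional indices of tokens to regularize (e.g. only corrupted ones).
    Positive for token i: h_correct[i]; negatives: h_correct[j] for j != i.
    """
    a = h_corrupt / np.linalg.norm(h_corrupt, axis=1, keepdims=True)
    b = h_correct / np.linalg.norm(h_correct, axis=1, keepdims=True)
    logits = a @ b.T / tau                                # cosine similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = np.diag(log_probs)                              # log-prob of the aligned token
    if positions is not None:
        pos = pos[np.asarray(positions)]
    return float(-pos.mean())


# Aligned hidden states give a much lower loss than misaligned ones.
h = np.eye(4)
aligned = token_contrastive_loss(h, h)
shifted = token_contrastive_loss(h, np.roll(h, 1, axis=0))
```

Restricting `positions` to the corrupted tokens mirrors the abstract's focus on "corrupted tokens' hidden states"; whether the paper also regularizes unchanged tokens is not stated here.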

Multimodal Learning with Channel-Mixing and Masked Autoencoder on Facial Action Unit Detection

Xiang Zhang, Huiyuan Yang, Taoyue Wang, Xiaotian Li, Lijun Yin

Category: Computer Vision

2022-09-25

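Recent studies have used multimodal data to build facial action unit (AU) detection models. However, because multimodal data are heterogeneous, multimodal representation learning has become one of the main challenges. On the one hand, it is difficult to extract relevant features from multiple modalities with only one feature extractor; on the other hand, previous studies have not fully explored the potential of multimodal fusion strategies. For example, early fusion usually requires all modalities to be present during inference, while late fusion and middle fusion increase the network size for feature learning. In contrast to the large body of work on late fusion, few works explore channel information in early fusion. This paper proposes a novel multimodal network called Multimodal Channel-Mixing (MCM) as a pre-trained model that learns robust representations to facilitate multimodal fusion. We evaluate the learned representations on the downstream task of automatic facial action unit detection. Specifically, MCM is a single-stream encoder network that uses a channel-mixing module in early fusion and requires only one modality in the downstream detection task. We also use a masked ViT encoder to learn features from the fused image and two ViT decoders to reconstruct the two modalities. We conducted extensive experiments on two public datasets, BP4D and DISFA, to evaluate the effectiveness and robustness of the proposed multimodal framework. The results show that our method is comparable or superior to state-of-the-art baseline methods.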
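
The abstract does not specify how channels are mixed. The sketch below assumes one simple scheme: each channel of the fused image is taken at random from one of two co-registered modalities, and the fused image is then masked MAE-style (random patches hidden) before it would enter the masked ViT encoder. Both functions, the 0.5 mixing probability, 16-pixel patches, and the 75% mask ratio are assumptions for illustration, not the MCM implementation.

```python
import numpy as np


def channel_mix(img_a, img_b, rng):
    """Hypothetical early-fusion channel mixing of two co-registered images.

    img_a, img_b: arrays of shape (C, H, W) from two modalities.
    Each output channel is copied from modality A or B with probability 0.5.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("modalities must be resized to the same (C, H, W)")
    from_a = rng.random(img_a.shape[0]) < 0.5
    fused = np.where(from_a[:, None, None], img_a, img_b)
    return fused, from_a


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """MAE-style random masking: True marks a patch hidden from the encoder."""
    n_h, n_w = height // patch, width // patch
    n_patches = n_h * n_w
    n_masked = int(round(n_patches * mask_ratio))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_masked]] = True
    return mask.reshape(n_h, n_w)


rng = np.random.default_rng(0)
rgb = rng.random((3, 224, 224))
other = rng.random((3, 224, 224))  # assumed second modality, e.g. a depth or thermal map
fused, from_rgb = channel_mix(rgb, other, rng)
mask = random_patch_mask(224, 224, patch=16, mask_ratio=0.75, rng=rng)
```

Because only one fused image enters the encoder, a single stream suffices at pre-training time, which is consistent with the abstract's claim that the downstream task needs only one modality.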

COEM: Cross-Modal Embedding for MetaCell Identification

Haiyi Mao, Minxue Jia, Jason Xiaotian Dou, Haotian Zhang, Panayiotis V. Benos

Category: Artificial Intelligence

2022-07-15

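Metacells are disjoint and homogeneous groups of single-cell profiles, representing discrete and highly granular cell states. Existing metacell algorithms tend to use only one modality to infer metacells, even though single-cell multi-omics datasets profile multiple molecular modalities within the same cell. Here, we propose Cross-Modal Embedding for MetaCell Identification (COEM), which uses an embedded space leveraging both scATAC-seq and scRNA-seq to perform aggregation, balancing the trade-off between fine resolution and sufficient sequencing coverage. COEM outperforms the state-of-the-art method SEACells by efficiently identifying accurate and well-separated metacells in datasets with continuous and discrete cell types. Furthermore, COEM significantly improves peak-to-gene association analysis and facilitates complex gene regulatory inference tasks.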
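
A minimal sketch of metacell aggregation on a joint embedding, assuming per-cell scRNA-seq and scATAC-seq embeddings are already computed. COEM's actual embedding and aggregation procedure are not described in this abstract, so plain concatenation of L2-normalized embeddings and k-means clustering are swapped in purely for illustration; all function names are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row (cell) to unit length so both modalities contribute equally."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)


def joint_embedding(rna_emb, atac_emb):
    """Hypothetical cross-modal embedding: concatenate normalized per-modality embeddings."""
    return np.hstack([l2_normalize(rna_emb), l2_normalize(atac_emb)])


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization (stand-in grouping step)."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def aggregate_counts(counts, labels, k):
    """Sum the raw counts of all cells assigned to the same metacell."""
    metacells = np.zeros((k, counts.shape[1]), dtype=counts.dtype)
    np.add.at(metacells, labels, counts)
    return metacells


rna = np.array([[1.0, 0.1], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]])
atac = np.array([[2.0, 0.0, 0.1], [2.0, 0.1, 0.0], [1.9, 0.0, 0.0],
                 [0.0, 0.0, 3.0], [0.1, 0.0, 3.0], [0.0, 0.1, 2.9]])
labels = kmeans(joint_embedding(rna, atac), k=2)
meta = aggregate_counts(np.ones((6, 4), dtype=int), labels, k=2)
```

Summing counts within a metacell is what raises sequencing coverage; choosing more, smaller metacells keeps finer resolution, which is the trade-off the abstract mentions.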

LordNet: Learning to Solve Parametric Partial Differential Equations without Simulated Data

Wenlei Shi, Xinquan Huang, Xiaotian Gao, Xinran Wei, Jia Zhang, Jiang Bian, Mao Yang, Tie-Yan Liu

Category: Machine Learning

2022-06-19

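Neural operators have proven to be powerful approximators of nonlinear operators between infinite-dimensional function spaces and are promising for accelerating the solution of partial differential equations (PDEs). However, they require large amounts of simulated data, which can be costly to produce, leading to a chicken-and-egg dilemma that limits their use in solving PDEs. To escape this dilemma, we propose a data-free paradigm in which the neural network learns physics directly from the mean squared residual (MSR) loss constructed from the discretized PDE. We investigate the physical information in the MSR loss and identify a challenge: the neural network must be able to model long-range entanglements in the spatial domain of the PDE, whose patterns vary across PDEs. We therefore propose the Low-rank decomposition network (LordNet), which is tunable and efficient at modeling various entanglements. Specifically, LordNet learns a low-rank approximation of the global entanglements with simple fully connected layers, extracting the dominant patterns at reduced computational cost. Experiments on solving Poisson's equation and the Navier-Stokes equation show that the physical constraints of the MSR loss improve the accuracy and generalization ability of the neural network. In addition, LordNet outperforms other modern neural network architectures on PDEs with the fewest parameters and the fastest inference speed. For the Navier-Stokes equation, the learned operator is 50 times faster than a finite-difference solver with the same computational resources.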
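
To make the data-free idea concrete: an MSR loss needs only the discretized PDE, not simulated solutions. A minimal sketch for the 2D Poisson equation $-\Delta u = f$, assuming a uniform grid, the standard 5-point finite-difference stencil, and boundary values taken from `u` itself. The paper's discretization and boundary handling may differ, and the function name is hypothetical. In training, `u` would be the network's prediction and this loss would be minimized directly.

```python
import numpy as np


def poisson_msr_loss(u, f, h):
    """Mean squared residual of -(u_xx + u_yy) = f, discretized with the 5-point
    stencil on the interior points of a uniform grid with spacing h.

    u: candidate solution on the grid (e.g. a network's output), shape (N, N).
    f: source term sampled on the same grid.
    No reference solution is needed: the loss only checks the discretized PDE.
    """
    laplacian = (
        u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2] - 4.0 * u[1:-1, 1:-1]
    ) / h**2
    residual = -laplacian - f[1:-1, 1:-1]
    return float(np.mean(residual**2))


# u = x^2 + y^2 has Laplacian 4, which the 5-point stencil reproduces exactly,
# so it solves -Δu = -4 and its MSR is zero up to rounding.
x = np.linspace(0.0, 1.0, 33)
X, Y = np.meshgrid(x, x, indexing="ij")
h = x[1] - x[0]
exact = poisson_msr_loss(X**2 + Y**2, -4.0 * np.ones_like(X), h)
wrong = poisson_msr_loss(np.zeros_like(X), np.ones_like(X), h)  # residual is -1 everywhere
```

Each residual couples a point only to its immediate neighbours, yet minimizing the loss over the whole grid propagates boundary information across the domain, which is one way to see why the network must capture long-range interactions.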

CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

Chinedu Innocent Nwoye, Deepak Alapatt, Tong Yu, Armine Vardazaryan, Fangfang Xia, Zixuan Zhao, Tong Xia, Fucang Jia, Yuxuan Yang, Hao Wang

Category: Computer Vision

2022-04-10

Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those details are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them and an in-depth analysis of the results, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
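
Methods in the challenge are compared by mean average precision (mAP) over triplet classes. A minimal sketch of a standard per-class AP averaged over classes, assuming a (frames × triplet classes) score matrix and binary labels. The official challenge evaluation (for instance, how it averages across videos or handles classes absent from a video) may differ, and the function names are illustrative.

```python
import numpy as np


def average_precision(scores, labels):
    """AP for one triplet class: mean of precision@k taken at the rank of each positive."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    hits = np.asarray(labels)[order].astype(bool)
    if not hits.any():
        return np.nan                     # class never occurs: AP is undefined
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision_at_k[hits].mean())


def mean_average_precision(score_matrix, label_matrix):
    """mAP over classes (columns), skipping classes with no positive frames."""
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(score_matrix.shape[1])]
    return float(np.nanmean(aps))


# Two hypothetical <instrument, verb, target> classes scored on three frames.
scores = np.array([[0.9, 0.9], [0.8, 0.8], [0.1, 0.1]])
labels = np.array([[1, 1], [1, 0], [0, 1]])
```

Here the first class is ranked perfectly (AP = 1) while the second has a positive ranked last (AP = 5/6), so the mAP is 11/12.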

MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark

Xiaotian Han, Quanzeng You, Chunyu Wang, Zhizheng Zhang, Peng Chu, Houdong Hu, Jiang Wang, Zicheng Liu

Category: Computer Vision

2021-11-30

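Multi-camera tracking systems are gaining popularity in applications that demand high-quality tracking results, such as frictionless checkout, because monocular multi-object tracking (MOT) systems often fail in cluttered and crowded environments due to occlusion. Multiple highly overlapping cameras can significantly alleviate this problem by recovering partial 3D information. However, the cost of creating high-quality multi-camera tracking datasets with diverse camera settings and backgrounds has limited dataset scale in this domain. In this paper, we provide a large-scale, densely labeled multi-camera tracking dataset in five different environments with the help of an auto-annotation system. The system uses overlapping and calibrated depth and RGB cameras to build a high-performance 3D tracker that automatically generates 3D tracking results. The 3D tracking results are projected into each RGB camera view using the camera parameters to create 2D tracking results. We then manually check and correct the 3D tracking results to ensure label quality, which is much cheaper than fully manual annotation. We conducted extensive experiments using two real-time multi-camera trackers and a person re-identification (ReID) model with different settings. This dataset provides a more reliable benchmark for multi-camera, multi-object tracking systems in cluttered and crowded environments. Moreover, our results show that adapting the trackers and ReID models on this dataset significantly improves their performance. Our dataset will be publicly released upon acceptance of this work.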
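
The step that turns 3D tracks into per-view 2D labels is a camera projection. A minimal sketch, assuming a standard pinhole model with intrinsics `K` and world-to-camera extrinsics `R`, `t`, and ignoring lens distortion. The dataset's actual calibration format and box conventions are not described here, and the function names are hypothetical.

```python
import itertools

import numpy as np


def project_points(points_world, K, R, t):
    """Project world points (N, 3) into one calibrated camera.

    K: 3x3 intrinsic matrix; R (3x3), t (3,): world-to-camera extrinsics.
    Returns pixel coordinates (N, 2) and a mask of points in front of the camera.
    """
    cam = points_world @ R.T + t               # world -> camera coordinates
    in_front = cam[:, 2] > 1e-9
    pix = cam @ K.T                            # camera -> homogeneous pixel coordinates
    z = np.where(in_front, pix[:, 2], 1.0)     # avoid dividing by zero for hidden points
    return pix[:, :2] / z[:, None], in_front


def box_from_3d(corners_world, K, R, t):
    """2D box (x_min, y_min, x_max, y_max) enclosing the projected 3D box corners."""
    uv, in_front = project_points(corners_world, K, R, t)
    if not in_front.all():
        return None  # box is (partly) behind the camera; skipped in this sketch
    x_min, y_min = uv.min(axis=0)
    x_max, y_max = uv.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
uv, ok = project_points(np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0]]), K, R, t)
corners = np.array(list(itertools.product((-0.5, 0.5), (-1.0, 1.0), (4.0, 6.0))))
box = box_from_3d(corners, K, R, t)
```

Projecting one corrected 3D track into every calibrated view yields consistent 2D labels across cameras, which is why checking in 3D is cheaper than annotating each view by hand.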

Learning to Rank Ace Neural Architectures via Normalized Discounted Cumulative Gain

Yuge Zhang, Quanlu Zhang, Li Lyna Zhang, Yaming Yang, Chenqian Yan, Xiaotian Gao, Yuqing Yang

Category: Computer Vision | Artificial Intelligence

2021-08-06

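One of the main challenges of neural architecture search (NAS) is to efficiently rank the performance of architectures. The mainstream evaluation of performance rankers uses ranking correlations (e.g., Kendall's tau), which pay equal attention to the entire space. However, the optimization goal of NAS is to identify the top architectures, paying less attention to the other architectures in the search space. In this paper, we show both empirically and theoretically that Normalized Discounted Cumulative Gain (NDCG) is a better metric for rankers. Subsequently, we propose a new algorithm, AceNAS, which directly optimizes NDCG via LambdaRank. It also leverages weak labels produced by weight-sharing NAS to pre-train the ranker, further reducing search cost. Extensive experiments on 12 NAS benchmarks and a large-scale search space show that our method consistently outperforms SOTA NAS methods, improving accuracy by 3.67% and reducing search cost by 8x.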
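
The argument for NDCG over Kendall's tau can be checked on a toy example. The sketch below assumes true accuracies have been bucketed into integer relevance tiers and uses the common $2^{rel} - 1$ gain with a $\log_2(\text{rank} + 1)$ discount; the paper's exact gain definition may differ, and LambdaRank itself is not implemented. The function names are illustrative.

```python
import numpy as np


def dcg(relevances):
    """Discounted cumulative gain with gain 2^rel - 1 and discount log2(rank + 1)."""
    rel = np.asarray(relevances, dtype=float)
    ranks = np.arange(1, rel.size + 1)
    return float(np.sum((2.0**rel - 1.0) / np.log2(ranks + 1)))


def ndcg_at_k(pred_scores, relevance, k):
    """NDCG@k of a ranker: order architectures by predicted score, then compare the
    true relevance of the top k against the ideal ordering."""
    relevance = np.asarray(relevance, dtype=float)
    order = np.argsort(-np.asarray(pred_scores), kind="stable")[:k]
    ideal = np.sort(relevance)[::-1][:k]
    return dcg(relevance[order]) / dcg(ideal)


def kendall_tau(pred_scores, relevance):
    """Kendall's tau-a (no tie correction): every pair of architectures counts equally."""
    n = len(pred_scores)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sign(pred_scores[i] - pred_scores[j]) * np.sign(relevance[i] - relevance[j])
    return total / (n * (n - 1) / 2)


relevance = [3, 2, 1, 0]             # architecture 0 is the best
perfect = [0.9, 0.8, 0.3, 0.1]
swap_top = [0.8, 0.9, 0.3, 0.1]      # ranker confuses the two best architectures
swap_bottom = [0.9, 0.8, 0.1, 0.3]   # ranker confuses the two worst architectures
```

Both imperfect rankers get the same Kendall's tau (2/3), but NDCG penalizes the mistake among the top architectures far more (about 0.84 versus 0.99), matching the NAS goal of finding the best candidates.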

Dynamic Speed Guidance for CAV Ramp Merging in Non-Cooperative Environment: An On-Site Experiment

Wei Ji, Yechi Ma, Guangzhang Cui, Xiaotian Qin, Wei Hua

Category: Robotics

2022-12-21

Ramp merging is a typical application of cooperative intelligent transportation systems (C-ITS). Vehicle trajectories perceived by roadside sensors are an important complement to the limited field of view of on-board perception. In this paper, a vehicle tracking and trajectory denoising algorithm is proposed to take full advantage of roadside cameras for vehicle trajectory and speed profile estimation. A dynamic speed guidance algorithm is proposed to help on-ramp vehicles merge smoothly into the mainline, even in a non-cooperative environment where mainline vehicles are not expected to slow down to accommodate on-ramp vehicles. On-site experiments were carried out in a merging area of the Hangzhou Belt Highway to test our prototype system, and simulation analysis shows that the proposed algorithm can achieve significant fuel savings during the ramp merging process.
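
A minimal sketch of the two steps the abstract names, under loud assumptions: positions are already mapped from camera pixels to distance along the lane, denoising is a centered moving average, speed comes from finite differences, and the guidance rule simply picks the constant speed that reaches the merge point when a chosen mainline gap arrives. None of these are the paper's actual algorithms; the names and the 5-sample window are hypothetical.

```python
import numpy as np


def smooth_positions(positions, window=5):
    """Centered moving average; 'valid' mode drops window-1 edge samples instead of biasing them."""
    kernel = np.ones(window) / window
    return np.convolve(positions, kernel, mode="valid")


def speed_profile(positions, dt, window=5):
    """Speed (m/s) from noisy longitudinal positions (m) sampled every dt seconds."""
    return np.gradient(smooth_positions(positions, window), dt)


def advisory_speed(dist_to_merge, time_to_gap, v_min, v_max):
    """Hypothetical guidance rule: constant speed that reaches the merge point
    when the chosen mainline gap arrives, clipped to the allowed speed range."""
    if time_to_gap <= 0:
        raise ValueError("time_to_gap must be positive")
    return float(np.clip(dist_to_merge / time_to_gap, v_min, v_max))


rng = np.random.default_rng(0)
t = np.arange(0.0, 10.0, 0.1)
clean_v = speed_profile(20.0 * t, dt=0.1)                           # vehicle at 20 m/s
noisy_v = speed_profile(20.0 * t + rng.normal(0.0, 0.5, t.size), dt=0.1)
target = advisory_speed(200.0, 10.0, v_min=5.0, v_max=25.0)        # 20 m/s
```

A wider smoothing window suppresses more camera noise but adds lag to the speed estimate; a real-time guidance loop would have to balance the two.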

Decomposable Sparse Tensor on Tensor Regression

Haiyi Mao, Jason Xiaotian Dou

Category: Machine Learning

2022-12-09

Most regularized tensor regression research focuses on tensor predictors with scalar responses, or on vector predictors with tensor responses. We consider sparse low-rank tensor-on-tensor regression, where the predictors $\mathcal{X}$ and the responses $\mathcal{Y}$ are both high-dimensional tensors. By demonstrating that the general inner product, or contracted product, with a unit-rank tensor can be decomposed into standard inner products and outer products, we show that the problem can be transformed into a tensor-to-scalar regression followed by a tensor decomposition. We therefore propose a fast solution based on stagewise search, composed of a contraction part and a generation part that are optimized alternately. We demonstrate that our method outperforms current methods in terms of accuracy and predictor selection by effectively incorporating structural information.
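
The key step in the abstract is that contracting a predictor with a unit-rank coefficient tensor factorizes. A small NumPy check of that identity for an order-2 predictor and an order-2 response, with arbitrary illustrative shapes: for $\mathcal{B} = a \circ b \circ c \circ d$, the contracted product over the predictor modes equals $(a^\top \mathcal{X} b)\,(c \circ d)$. This illustrates why, per the abstract, a scalar regression on $(a, b)$ followed by a decomposition can stand in for the full problem; the stagewise search itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
I1, I2, J1, J2 = 3, 4, 2, 5

X = rng.standard_normal((I1, I2))                 # one predictor tensor (order 2 here)
a, b = rng.standard_normal(I1), rng.standard_normal(I2)
c, d = rng.standard_normal(J1), rng.standard_normal(J2)

# Unit-rank coefficient tensor B = a ∘ b ∘ c ∘ d with shape (I1, I2, J1, J2)
B = np.einsum("i,j,k,l->ijkl", a, b, c, d)

# Contracted product over the predictor modes gives a (J1, J2) response tensor
Y_contracted = np.einsum("ij,ijkl->kl", X, B)

# Decomposition: scalar inner product <X, a ∘ b> = a^T X b, times the outer product c ∘ d
inner = a @ X @ b
Y_decomposed = inner * np.outer(c, d)

print(np.allclose(Y_contracted, Y_decomposed))   # True
```

The identity holds exactly because $\sum_{i,j} X_{ij} a_i b_j c_k d_l = (a^\top X b)\, c_k d_l$; the same argument extends to higher-order predictors and responses.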
