Smart Paper Notes

3D Cross Pseudo Supervision (3D-CPS): A semi-supervised nnU-Net architecture for abdominal organ segmentation

Yongzhi Huang, Hanwen Zhang, Yan Yan, Haseeb Hassan, Bingding Huang

Category: Computer Vision

2022-09-19

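Large curated datasets are necessary, but annotating medical images is a time-consuming, laborious, and expensive process. Recent supervised methods therefore focus on exploiting large amounts of unlabeled data, which is a challenging task in its own right. To address this problem, we propose 3D Cross Pseudo Supervision (3D-CPS), a semi-supervised network architecture based on nnU-Net that uses the Cross Pseudo Supervision method. We design a new nnU-Net-based preprocessing method and adopt a forced spacing setting strategy at inference to reduce inference time. In addition, we increase the semi-supervised loss weight linearly with each epoch to shield the model from low-quality pseudo-labels early in training. Our method achieves a mean Dice similarity coefficient (DSC) of 0.881 and a mean normalized surface distance (NSD) of 0.913 on the MICCAI FLARE2022 validation set (20 cases).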
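
The linear growth of the semi-supervised loss weight can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `cps_loss_weight`, `ramp_epochs`, and `max_weight` are hypothetical, and the abstract does not state the ramp length or the final weight.

```python
def cps_loss_weight(epoch: int, ramp_epochs: int, max_weight: float) -> float:
    """Linearly ramp the semi-supervised loss weight from 0 up to max_weight.

    ramp_epochs and max_weight are assumed hyperparameters, not values
    taken from the paper.
    """
    if ramp_epochs <= 0:
        return max_weight
    # Clamp the epoch into [0, ramp_epochs] so the weight stays in [0, max_weight].
    progress = min(max(epoch, 0), ramp_epochs) / ramp_epochs
    return max_weight * progress


def total_loss(supervised: float, cross_pseudo: float, epoch: int,
               ramp_epochs: int = 100, max_weight: float = 1.0) -> float:
    """Combine the supervised loss with the weighted cross pseudo supervision loss."""
    return supervised + cps_loss_weight(epoch, ramp_epochs, max_weight) * cross_pseudo
```

With a weight near zero in the first epochs, noisy pseudo-labels contribute little to the gradient; their influence grows only as the two networks become more reliable.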

translated by Google Translate

Improving Accuracy Without Losing Interpretability: A ML Approach for Time Series Forecasting

Yiqi Sun, Zhengxin Shi, Jianshen Zhang, Yongzhi Qi, Hao Hu, Zuojun Max Shen

Category: Machine Learning

2022-12-13

In time series forecasting, decomposition-based algorithms break aggregate data into meaningful components and are therefore appreciated for their particular advantages in interpretability. Recent algorithms often combine machine learning (hereafter ML) methodology with decomposition to improve prediction accuracy. However, incorporating ML is generally considered to inevitably sacrifice interpretability. In addition, existing hybrid algorithms usually rely on theoretical models with statistical assumptions and focus only on the accuracy of aggregate predictions, and thus suffer from accuracy problems, especially in component estimates. In response to these issues, this research explores the possibility of improving accuracy without losing interpretability in time series forecasting. We first quantitatively define interpretability for data-driven forecasts and systematically review the existing forecasting algorithms from the perspective of interpretability. Accordingly, we propose the W-R algorithm, a hybrid algorithm that combines decomposition and ML from a novel perspective. Specifically, the W-R algorithm replaces the standard additive combination function with a weighted variant and uses ML to modify the estimates of all components simultaneously. We mathematically analyze the theoretical basis of the algorithm and validate its performance through extensive numerical experiments. In general, the W-R algorithm outperforms all decomposition-based and ML benchmarks. Measured by P50_QL, the algorithm improves accuracy by a relative 8.76% on the practical sales forecasts of JD.com and by 77.99% on a public dataset of electricity loads. This research offers an innovative perspective on combining statistical and ML algorithms, and JD.com has implemented the W-R algorithm to make accurate sales predictions and guide its marketing activities.
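
The difference between the standard additive combination and a weighted variant can be illustrated with a small sketch. The component names and the fixed weights below are hypothetical; the abstract does not describe how the W-R algorithm obtains its weights or how ML adjusts the component estimates.

```python
def additive_forecast(components: dict) -> float:
    """Standard additive combination: the forecast is the plain sum of components."""
    return sum(components.values())


def weighted_forecast(components: dict, weights: dict) -> float:
    """Weighted variant: each component estimate is scaled before summation.

    In this sketch the weights are supplied by hand; in a hybrid method they
    would come from a fitted model (an assumption, not the paper's procedure).
    """
    return sum(weights[name] * value for name, value in components.items())


# Hypothetical decomposition of one forecast into three components.
components = {"trend": 100.0, "seasonal": 20.0, "residual": -5.0}
unit_weights = {"trend": 1.0, "seasonal": 1.0, "residual": 1.0}
damped_weights = {"trend": 1.0, "seasonal": 0.5, "residual": 0.0}
```

With unit weights the weighted form reduces to the additive one, so each component stays individually readable while the weights carry the correction.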


Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

Yang Jin, Yongzhi Li, Zehuan Yuan, Yadong Mu

Category: Computer Vision

2022-09-27

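Spatio-Temporal Video Grounding (STVG) focuses on retrieving the spatio-temporal tube of a specific object described by a free-form textual expression. Existing approaches mainly treat this complex task as a parallel frame-grounding problem and therefore suffer from two kinds of inconsistency: feature alignment inconsistency and prediction inconsistency. In this paper, we present an end-to-end one-stage framework, termed Spatio-Temporal Consistency-Aware Transformer (STCAT), to alleviate these issues. In particular, we introduce a novel multi-modal template as the global objective for this task, which explicitly constrains the grounding region and associates the predictions across all video frames. Moreover, to generate this template under sufficient video-textual perception, an encoder architecture is proposed for effective global context modeling. Thanks to these key designs, STCAT enjoys more consistent cross-modal feature alignment and tube prediction without relying on any pre-trained object detectors. Extensive experiments show that our method outperforms the previous state of the art on two challenging video benchmarks (VidSTG and HC-STVG), illustrating the superiority of the proposed framework in better understanding the association between vision and natural language. Code is publicly available at https://github.com/jy0205/stcat.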


Differentially Private Condorcet Voting

Zhechen Li, Ao Liu, Lirong Xia, Yongzhi Cao, Hanpin Wang

Category: Artificial Intelligence

2022-06-27

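Designing private voting rules is an important problem for trustworthy democracy. In this paper, under the framework of differential privacy, we propose three classes of randomized voting rules based on the well-known Condorcet method: the Laplacian Condorcet method ($CM^{LAP}_\lambda$), the exponential Condorcet method ($CM^{EXP}_\lambda$), and the randomized response Condorcet method ($CM^{RR}_\lambda$), where $\lambda$ represents the level of noise. By accurately estimating the errors introduced by the randomness, we show that $CM^{EXP}_\lambda$ is the most accurate mechanism in most cases. We prove that all of our rules satisfy absolute monotonicity, lexi-participation, probabilistic Pareto efficiency, the approximate probabilistic Condorcet criterion, and approximate SD-strategyproofness. In addition, $CM^{RR}_\lambda$ satisfies the (non-approximate) probabilistic Condorcet criterion, while $CM^{LAP}_\lambda$ and $CM^{EXP}_\lambda$ [...]. Finally, we regard differential privacy as a voting axiom and discuss its relations to other axioms.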
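
To make the idea of a noise-perturbed Condorcet rule concrete, the sketch below computes pairwise margins, adds Laplace noise to each, and then looks for a candidate who beats every other candidate. It is an illustration under stated assumptions only: the abstract does not define the three mechanisms, so the way noise enters here, the `scale` parameter (standing in for the noise level), and all function names are hypothetical, and no privacy guarantee is claimed for this code.

```python
import random
from itertools import combinations


def pairwise_margins(ballots, candidates):
    """margins[(a, b)] = voters ranking a above b minus voters ranking b above a."""
    margins = {}
    for a, b in combinations(candidates, 2):
        m = sum(1 if ballot.index(a) < ballot.index(b) else -1 for ballot in ballots)
        margins[(a, b)] = m
        margins[(b, a)] = -m
    return margins


def condorcet_winner(margins, candidates):
    """Return the candidate with a positive margin against all others, or None."""
    for a in candidates:
        if all(margins[(a, b)] > 0 for b in candidates if b != a):
            return a
    return None


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_condorcet_winner(ballots, candidates, scale, rng):
    """Perturb each pairwise margin with Laplace noise, then pick the winner (assumed design)."""
    margins = pairwise_margins(ballots, candidates)
    noisy = {}
    for a, b in combinations(candidates, 2):
        value = margins[(a, b)] + laplace_noise(scale, rng)
        noisy[(a, b)] = value
        noisy[(b, a)] = -value
    return condorcet_winner(noisy, candidates)


# Each ballot is a full ranking, most preferred candidate first.
ballots = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
candidates = ["a", "b", "c"]
```

A larger `scale` hides individual ballots better but makes it more likely that the noisy winner differs from the true Condorcet winner, which is the accuracy trade-off the abstract analyzes.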


FreSCo: Frequency-Domain Scan Context for LiDAR-based Place Recognition with Translation and Rotation Invariance

Yongzhi Fan, Xin Du, Jizhong Shen

Category: Robotics

2022-06-25

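Place recognition plays a crucial role in re-localization and loop closure detection for robots and vehicles. This paper seeks a well-defined global descriptor for LiDAR-based place recognition. Compared with local descriptors, global descriptors perform remarkably well in urban road scenes but are usually viewpoint-dependent. To this end, we propose a simple yet robust global descriptor, dubbed FreSCo, that decomposes the viewpoint difference during revisits and achieves both translation and rotation invariance by leveraging the Fourier transform and a circular shift technique. In addition, a fast two-stage pose estimation method is proposed to estimate the relative pose after place retrieval, using compact 2D point clouds extracted from the scene. Experiments show that FreSCo outperforms contemporaneous methods on sequences of different scenes from multiple datasets. The code will be publicly available at https://github.com/soytony/fresco.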
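
The property that makes the Fourier transform useful here is that the magnitude spectrum of a sequence does not change under a circular shift. The sketch below checks this on a hypothetical 1D ring of bin values; it is not the FreSCo descriptor itself, whose construction is not detailed in the abstract.

```python
import cmath


def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def circular_shift(signal, shift):
    """Rotate the sequence to the right by `shift` positions, wrapping around."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift]


# Hypothetical ring of bin values, e.g. one value per angular sector.
ring = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
rotated = circular_shift(ring, 3)

# The two rings differ element-wise, yet their magnitude spectra agree.
max_gap = max(abs(a - b) for a, b in zip(dft_magnitude(ring), dft_magnitude(rotated)))
```

A shift only multiplies each Fourier coefficient by a unit-modulus phase factor, so comparing magnitudes removes the dependence on where the ring starts.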


IKEA Object State Dataset: A 6DoF object pose estimation dataset and benchmark for multi-state assembly objects

Yongzhi Su, Mingxin Liu, Jason Rambach, Antonia Pehrson, Anton Berg, Didier Stricker

Category: Computer Vision

2021-11-16

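Utilizing 6DoF (degrees of freedom) pose information of an object and its components is critical for object state detection tasks. We present the IKEA Object State Dataset, which contains 3D models of IKEA furniture, RGBD videos of the assembly process, and the 6DoF poses of furniture parts together with their bounding boxes. The proposed dataset will be available at https://github.com/mxllmx/ikeaObjectstateTateDataSet.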


Rethinking Mobile Block for Efficient Neural Models

Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, Chengjie Wang

Category: Computer Vision

2023-01-03

This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models while balancing model accuracy and efficiency well.


PIE-QG: Paraphrased Information Extraction for Unsupervised Question Generation from Small Corpora

Dinesh Nagumothu, Bahadorreza Ofoghi, Guangyan Huang, Peter W. Eklund

Category: Natural Language Processing | Artificial Intelligence

2023-01-03

Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
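
The step from an extracted triple to question-answer pairs can be sketched as below. The question templates, the `Triple` type, and the wh-word parameters are hypothetical simplifications; the abstract does not specify how PIE-QG phrases its questions.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """A <subject, predicate, object> triple as produced by an OpenIE system."""
    subject: str
    predicate: str
    obj: str


def triple_to_qa_pairs(triple: Triple, subject_wh: str = "What",
                       object_wh: str = "what") -> List[Tuple[str, str]]:
    """Form two (question, answer) pairs from one triple.

    The first question hides the subject and keeps the predicate and object;
    the second hides the object and keeps the subject and predicate.
    The templates are illustrative assumptions only.
    """
    return [
        (f"{subject_wh} {triple.predicate} {triple.obj}?", triple.subject),
        (f"{triple.subject} {triple.predicate} {object_wh}?", triple.obj),
    ]


example = Triple("Marie Curie", "discovered", "polonium")
pairs = triple_to_qa_pairs(example, subject_wh="Who", object_wh="what")
```

Each pair keeps the answer span verbatim from the passage, which is what an extractive QA model needs for training.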


A New Perspective to Boost Vision Transformer for Medical Image Classification

Yuexiang Li, Yawen Huang, Nanjun He, Kai Ma, Yefeng Zheng

Category: Computer Vision | Artificial Intelligence

2023-01-03

Transformer has achieved impressive success in various computer vision tasks. However, most existing studies need to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement from ImageNet pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer on limited medical data, we propose an auxiliary difficulty ranking task: the Transformer must identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens so that it can simultaneously measure difficulty and maintain the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks: skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification over ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.


Analogical Inference Enhanced Knowledge Graph Embedding

Yao Zhen, Zhang Wen, Chen Mingyang, Huang Yufeng, Yang Yi, Chen Huajun

Category: Artificial Intelligence | Natural Language Processing

2023-01-03

Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGEs to infer inductively. To address this challenge, we resort to analogical inference and propose AnKGE, a novel and general self-supervised framework that enhances KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity level, relation level, and triple level. In AnKGE, we train an analogy function for each level of analogical inference; it takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability added by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights in the score function for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
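
Interpolating an analogy score with a base model score can be sketched as a weighted mix of the two. The convex form, the single scalar `weight`, and the function names below are assumptions for illustration; the abstract does not give AnKGE's score function or how its adaptive weights are computed.

```python
def interpolated_score(base_score: float, analogy_score: float, weight: float) -> float:
    """Mix the base KGE score with the analogy score.

    weight = 0 keeps only the base model, weight = 1 keeps only the analogy score.
    A fixed scalar weight is an assumption; the paper uses adaptive weights.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * base_score + weight * analogy_score


def rank_candidates(base_scores: dict, analogy_scores: dict, weight: float) -> list:
    """Rank candidate entities by their interpolated score, best first."""
    combined = {
        entity: interpolated_score(base_scores[entity], analogy_scores[entity], weight)
        for entity in base_scores
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores for two candidate tail entities of one query.
base_scores = {"Paris": 0.6, "Lyon": 0.7}
analogy_scores = {"Paris": 0.9, "Lyon": 0.2}
```

In this toy case the base model alone prefers "Lyon", while giving the analogy score equal weight moves "Paris" to the top, showing how analogical evidence can change a link prediction.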
