Intelligent Paper Notes

Prototypical few-shot segmentation for cross-institution male pelvic structures with spatial registration

Yiwen Li, Yunguan Fu, Iani Gayo, Qianye Yang, Zhe Min, Shaheer Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton

Category: Computer Vision

2022-09-12

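What makes few-shot learning desirable in medical image analysis is the efficient use of support image data, which are labelled to classify or segment new classes, a task that would otherwise require substantially more training images and expert annotations. This work describes a fully 3D prototypical few-shot segmentation algorithm, so that the trained network can be effectively adapted to clinically interesting structures that are absent in training, using only a few labelled images from a different institution. First, to compensate for the widely recognised spatial variability between institutions during episodic adaptation to novel classes, a novel spatial registration mechanism, consisting of a segmentation head and a spatial alignment module, is integrated into prototypical learning. Second, to assist training under the imperfect alignment observed in practice, a support mask conditioning module is proposed to further exploit the annotations available in the support images. Experiments are presented on segmenting eight anatomical structures important for interventional planning, using a dataset of 589 pelvic T2-weighted MR images acquired at multiple institutions. The results demonstrate the efficacy of each of the 3D formulation, the spatial registration, and the support mask conditioning, all of which contributed positively, independently or collectively. Compared with previously proposed 2D alternatives, few-shot segmentation performance improved with statistical significance, regardless of whether the support data come from the same or a different institution.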
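Prototypical few-shot segmentation in general builds a class prototype by masked average pooling of support features and labels each query voxel by its similarity to the prototypes. The following is a minimal pure-Python sketch of that generic step only, assuming per-voxel feature vectors have already been extracted by some encoder; it leaves out the paper's spatial registration and support mask conditioning modules, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def masked_average_pool(features: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """Average the feature vectors of the voxels where mask == 1."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no voxels")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def segment_query(
    support_feats: Sequence[Vector],
    support_mask: Sequence[int],
    query_feats: Sequence[Vector],
) -> List[int]:
    """Label each query voxel 1 (foreground) or 0 (background) by its closer prototype."""
    fg_proto = masked_average_pool(support_feats, support_mask)
    bg_proto = masked_average_pool(support_feats, [1 - m for m in support_mask])
    return [1 if cosine(q, fg_proto) > cosine(q, bg_proto) else 0 for q in query_feats]


# Toy 2-d features for four support voxels and two query voxels.
support_feats = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
support_mask = [1, 1, 0, 0]
query_feats = [[0.8, 0.2], [0.2, 0.7]]
print(segment_query(support_feats, support_mask, query_feats))  # [1, 0]
```

Cosine similarity is used here as one common choice; the abstract does not state which similarity measure the paper uses.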

translated by Google Translate

Sphere Face Model: A 3D Morphable Model with Hypersphere Manifold Latent Space

Diqiong Jiang, Yiwei Jin, Fanglue Zhang, Zhe Zhu, Yun Zhang, Ruofeng Tong, Min Tang

Category: Computer Vision

2021-12-04

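3D Morphable Models (3DMMs) are generative models of face shape and appearance. However, the shape parameters of traditional 3DMMs follow a multivariate Gaussian distribution, whereas identity embeddings follow a hypersphere distribution, and this conflict makes it challenging for face reconstruction models to preserve fidelity and shape consistency at the same time. To address this, we propose the Sphere Face Model (SFM), a novel 3DMM for monocular face reconstruction that preserves both fidelity and identity consistency. The core of SFM is a basis matrix that can be used to reconstruct 3D face shapes; the basis matrix is learned with a two-stage training approach that uses 3D training data in the first stage and 2D training data in the second. To resolve the distribution mismatch, we design a novel loss that gives the shape parameters a hyperspherical latent space. Extensive experiments show that SFM has high representation ability and good clustering performance in the shape parameter space. Moreover, it produces high-fidelity face shapes, and the shapes remain consistent under challenging conditions in monocular face reconstruction.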
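A 3DMM reconstructs a face as a mean shape plus a linear combination of basis vectors weighted by shape parameters. The sketch below shows that linear reconstruction together with an L2 normalization that puts the parameters on the unit hypersphere. Reading "hyperspherical latent space" as plain normalization is an assumption made for illustration: SFM learns its basis matrix in two stages and shapes the latent space with a dedicated loss, neither of which is reproduced here, and the toy dimensions and function names are hypothetical.

```python
import math
from typing import List, Sequence


def to_hypersphere(params: Sequence[float]) -> List[float]:
    """Project a shape-parameter vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(p * p for p in params))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction on the sphere")
    return [p / norm for p in params]


def reconstruct_shape(
    mean_shape: Sequence[float],
    basis: Sequence[Sequence[float]],
    params: Sequence[float],
) -> List[float]:
    """Linear morphable model: shape = mean_shape + sum_k params[k] * basis[k].

    mean_shape and each basis vector hold flattened (x, y, z) vertex coordinates.
    """
    if len(basis) != len(params):
        raise ValueError("need exactly one coefficient per basis vector")
    shape = list(mean_shape)
    for coeff, vec in zip(params, basis):
        for i, value in enumerate(vec):
            shape[i] += coeff * value
    return shape


# Toy model: 2 vertices (6 coordinates) and 2 basis vectors.
mean_shape = [0.0] * 6
basis = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]]
params = to_hypersphere([3.0, 4.0])  # [0.6, 0.8]
print(reconstruct_shape(mean_shape, basis, params))  # [0.6, 0.8, 0.0, 0.0, 0.0, 0.0]
```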

FFConv: Fast Factorized Convolutional Neural Network Inference on Encrypted Data

Yuxiao Lu, Jie Lin, Chao Jin, Zhe Wang, Min Wu, Khin Mi Mi Aung, Xiaoli Li

Category: Artificial Intelligence

2021-02-06

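Homomorphic encryption (HE), which allows computation on encrypted data (ciphertext) without first decrypting it, enables secure but slow convolutional neural network (CNN) inference for privacy-preserving applications in the cloud. One way to reduce inference latency is to pack multiple messages into a single ciphertext, which reduces the number of ciphertexts and supports massive parallelism of homomorphic multiply-accumulate (HMA) operations. Although this speeds up HECNN inference, the mainstream packing schemes, dense packing (DensePack) and convolution packing (ConvPack), still introduce expensive rotation overhead, which prolongs HECNN inference latency for deeper and wider CNN architectures. In this paper, we propose a low-rank factorization method named FFConv, dedicated to efficient ciphertext packing, that reduces both rotation overhead and HMA operations. FFConv approximates a d x d convolution layer with low-rank factorized convolutions, in which a d x d low-rank convolution with fewer channels is followed by a 1 x 1 convolution that restores the channels. The d x d low-rank convolution with DensePack requires significantly fewer rotation operations, while the rotation overhead of the 1 x 1 convolution is close to zero. To the best of our knowledge, FFConv is the first work able to reduce the rotation overhead incurred by both DensePack and ConvPack simultaneously, without introducing additional special blocks into the HECNN inference pipeline. Compared with the prior art LoLa and Falcon, our method reduces inference latency by 88% and 21%, respectively, with comparable accuracy on MNIST and CIFAR-10.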
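The saving from the factorization can be checked with weight-count arithmetic. A d x d convolution from C_in to C_out channels has d*d*C_in*C_out weights, while the factorized form (a d x d convolution down to r channels, then a 1 x 1 convolution back up to C_out) has d*d*C_in*r + r*C_out. The sketch below counts weights only (biases ignored) and does not model ciphertext packing or rotations; the layer sizes and the rank in the example are illustrative assumptions, not values from the paper.

```python
def conv_weights(d: int, c_in: int, c_out: int) -> int:
    """Weights in a standard d x d convolution from c_in to c_out channels (no bias)."""
    return d * d * c_in * c_out


def factorized_weights(d: int, c_in: int, c_out: int, rank: int) -> int:
    """Weights in a d x d conv to `rank` channels followed by a 1 x 1 conv to c_out."""
    return d * d * c_in * rank + rank * c_out


def compression_ratio(d: int, c_in: int, c_out: int, rank: int) -> float:
    """How many times fewer weights the factorized form uses."""
    return conv_weights(d, c_in, c_out) / factorized_weights(d, c_in, c_out, rank)


# Illustrative layer: 3 x 3 kernel, 64 -> 64 channels, factorized through 16 channels.
print(conv_weights(3, 64, 64))            # 36864
print(factorized_weights(3, 64, 64, 16))  # 10240
print(compression_ratio(3, 64, 64, 16))   # 3.6
```

The smaller the rank r relative to C_out, the larger the saving, at the cost of a coarser approximation of the original layer.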

A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge

Haodi Ma, Daisy Zhe Wang

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-03

Knowledge graphs (KGs) serve as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which have few known triples for training. In light of this, few-shot KG completion (FKGC), which draws on the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. We then systematically categorize and summarize existing work by the type of KG and the method used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.

Model-Driven Deep Learning for Non-Coherent Massive Machine-Type Communications

Zhe Ma, Wen Wu, Feifei Gao, Xuemin Shen

Category: Machine Learning

2023-01-02

In this paper, we investigate joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, in which data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Because of the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. We therefore propose a deep learning (DL) modified AMP network (DL-mAMPnet) that improves detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability of neural networks, thereby benefiting from the advantages of both. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further improve performance by exploiting the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of symbol error rate.
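Unfolding turns each iteration of an iterative algorithm into a network layer whose hyperparameters become trainable. The sketch below is plain real-valued AMP with soft thresholding for recovering a sparse x from y = A x, written so that each iteration uses its own threshold multiplier from a list. It does not model the paper's correlated sparsity pattern, large-scale fading coefficients, refinement module, or complex-valued signals; the helper `make_problem` and the fixed multiplier values are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def soft_threshold(v: float, theta: float) -> float:
    """Shrink v toward zero by theta (the denoiser used by classic AMP)."""
    return math.copysign(max(abs(v) - theta, 0.0), v)


def amp(A: Matrix, y: Sequence[float], alphas: Sequence[float]) -> List[float]:
    """Estimate a sparse x from y = A x; one iteration ("layer") per entry of alphas.

    A is m x n with columns of roughly unit norm (e.g. i.i.d. N(0, 1/m) entries).
    alphas[t] scales the threshold in iteration t; unfolding would make these trainable.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    z = list(y)  # residual, initialised to y because x starts at zero
    for alpha in alphas:
        # Threshold proportional to the estimated effective noise level ||z|| / sqrt(m).
        theta = alpha * math.sqrt(sum(r * r for r in z) / m)
        # Pseudo-data: current estimate plus the back-projected residual A^T z.
        pseudo = [x[j] + sum(A[i][j] * z[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(v, theta) for v in pseudo]
        # Onsager correction: (number of nonzeros / m) times the previous residual.
        onsager = sum(1 for v in x if v != 0.0) / m
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        z = [y[i] - Ax[i] + onsager * z[i] for i in range(m)]
    return x


def make_problem(m: int, n: int, k: int, seed: int = 0) -> Tuple[list, list, list]:
    """Hypothetical noiseless test instance: Gaussian A, k-sparse +/-1 signal, y = A x."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
    x = [0.0] * n
    for j in rng.sample(range(n), k):
        x[j] = rng.choice([-1.0, 1.0])
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return A, x, y
```

Keeping one multiplier per iteration is what makes the unfolded view natural: each entry of `alphas` becomes a per-layer parameter that training could adjust, while the update equations stay fixed.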

On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective

Ying Wen, Ziyu Wan, Ming Zhou, Shufang Hou, Zhe Cao, Chenyang Le, Jingxiao Chen, Zheng Tian, Weinan Zhang, Jun Wang

Category: Artificial Intelligence | Machine Learning

2022-12-24

Our situated environment is highly dynamic and full of uncertainty, which hinders the widespread adoption of machine-led Intelligent Decision-Making (IDM) in real-world scenarios. IDM should therefore be able to continuously learn new skills and generalize efficiently across a wider range of applications. IDM benefits from any new approach or theoretical breakthrough toward Artificial General Intelligence (AGI) that breaks down the barriers between tasks and applications. Recent research has thoroughly examined the Transformer architecture as a backbone foundation model and its generalization to various tasks, including computer vision, natural language processing, and reinforcement learning. We therefore argue that a foundation decision model (FDM) can be established by formulating various decision-making tasks as sequence decoding tasks using the Transformer architecture, which would be a promising way to advance IDM applications in more complex real-world tasks. In this paper, we elaborate on how a foundation decision model improves the efficiency and generalization of IDM. We also discuss potential applications of an FDM in multi-agent game AI, production scheduling, and robotics. Finally, through a case study, we demonstrate our realization of an FDM, DigitalBrain (DB1), with 1.2 billion parameters, which achieves human-level performance on 453 tasks, including text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 would be a baby step toward more autonomous and efficient real-world IDM applications.
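Formulating decision-making as sequence decoding means flattening trajectories into one token stream that a Transformer predicts autoregressively. The sketch below shows one simple flattening that interleaves observation tokens with action tokens and marks the action positions as prediction targets. The token layout, the toy token ids, and the function names are illustrative assumptions, not DB1's actual tokenization.

```python
from typing import List, Sequence, Tuple

Step = Tuple[Sequence[int], int]  # (observation tokens, action token)


def flatten_trajectory(steps: Sequence[Step]) -> Tuple[List[int], List[bool]]:
    """Interleave observation and action tokens into a single sequence.

    Returns the tokens and a parallel mask that is True at action positions,
    which are the positions a decoder would be trained to predict.
    """
    tokens: List[int] = []
    is_action: List[bool] = []
    for obs_tokens, action in steps:
        tokens.extend(obs_tokens)
        is_action.extend([False] * len(obs_tokens))
        tokens.append(action)
        is_action.append(True)
    return tokens, is_action


def action_targets(tokens: Sequence[int], is_action: Sequence[bool]) -> List[Tuple[List[int], int]]:
    """(context, target) pairs: everything before an action token predicts that token."""
    return [(list(tokens[:i]), tokens[i]) for i, flag in enumerate(is_action) if flag]


# Two steps; observation and action token ids are arbitrary toy values.
tokens, is_action = flatten_trajectory([([10, 11], 1), ([12, 13], 2)])
print(tokens)                             # [10, 11, 1, 12, 13, 2]
print(action_targets(tokens, is_action))  # [([10, 11], 1), ([10, 11, 1, 12, 13], 2)]
```

Because every task reduces to the same next-token objective, one decoder can in principle be trained across tasks as different as games and scheduling, which is the generalization argument the abstract makes.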

Generalized Decoding for Pixel, Image, and Language

Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan

Category: Computer Vision | Natural Language Processing

2022-12-21

We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes two types of queries as input: (i) generic non-semantic queries and (ii) semantic queries induced from text inputs, and decodes different pixel-level and token-level outputs in the same semantic space. With this novel design, X-Decoder is the first work to provide a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks. Further, the design enables seamless interaction across tasks at different granularities and brings mutual benefits by learning a common and rich pixel-level visual-semantic understanding space, without any pseudo-labeling. After pretraining on a mixed set of a limited amount of segmentation data and millions of image-text pairs, X-Decoder exhibits strong transferability to a wide range of downstream tasks in both zero-shot and finetuning settings. Notably, it achieves (1) state-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets; (2) finetuned performance better than or competitive with other generalist and specialist models on segmentation and VL tasks; and (3) flexibility for efficient finetuning and novel task composition (e.g., referring captioning and image editing). Code, demo, video, and visualization are available at https://x-decoder-vl.github.io.
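Decoding masks and labels in a shared semantic space is commonly done by scoring each decoded query against pixel embeddings (for the mask) and against text embeddings of candidate class names (for the label), so the vocabulary can change at inference time. The sketch below shows that generic open-vocabulary scoring step in isolation with toy embeddings; the function names are hypothetical and this is not X-Decoder's API or architecture.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def decode_query(
    query: Sequence[float],
    pixel_embeddings: Sequence[Sequence[float]],
    text_embeddings: Dict[str, Sequence[float]],
) -> Tuple[List[int], str]:
    """Turn one query embedding into a binary mask and an open-vocabulary label."""
    # Mask logit per pixel is a dot product; sigmoid(logit) > 0.5 exactly when logit > 0.
    mask = [1 if dot(query, p) > 0.0 else 0 for p in pixel_embeddings]
    # The label is the class-name embedding with the highest cosine similarity to the query.
    q = normalize(query)
    label = max(text_embeddings, key=lambda name: dot(q, normalize(text_embeddings[name])))
    return mask, label


pixels = [[2.0, 0.0], [-1.0, 0.5], [0.5, 0.1]]
texts = {"cat": [0.9, 0.1], "sky": [0.0, 1.0]}
print(decode_query([1.0, 0.0], pixels, texts))  # ([1, 0, 1], 'cat')
```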

Modeling Global Distribution for Federated Learning with Label Distribution Skew

Tao Sheng, Chengchao Shen, Yuan Liu, Yeyu Ou, Zhe Qu, Jianxin Wang

Category: Machine Learning | Computer Vision

2022-12-17

Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in the more general case, the label distributions differ among clients, a situation called "label distribution skew". Directly applying conventional federated learning without accounting for label distribution skew significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by label distribution skew. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so that the global model can be trained using global information about the data distribution without privacy leakage. Experimental results demonstrate that the proposed method significantly outperforms the state of the art on several public benchmarks. Code is available at https://github.com/Sheng-T/FedMGD.
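Label distribution skew is often simulated in federated learning experiments by drawing each class's split across clients from a Dirichlet distribution, where a smaller concentration parameter gives more skewed clients. The sketch below shows that partitioning step with only the standard library (a Dirichlet sample is a normalized vector of Gamma draws). It illustrates the problem setting, not FedMGD's global GAN, and the function names and parameters are assumptions.

```python
import random
from typing import Dict, List, Sequence


def dirichlet_sample(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Symmetric Dirichlet(alpha) sample: normalized independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_by_label(
    labels: Sequence[int], num_clients: int, alpha: float, seed: int = 0
) -> List[List[int]]:
    """Split sample indices across clients with per-class Dirichlet proportions.

    Smaller alpha concentrates each class on fewer clients (stronger label skew).
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        props = dirichlet_sample(alpha, num_clients, rng)
        start = 0
        for c in range(num_clients):
            # The last client takes whatever is left so every index is assigned once.
            if c == num_clients - 1:
                end = len(idxs)
            else:
                end = min(len(idxs), start + round(props[c] * len(idxs)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


labels = [0] * 50 + [1] * 50
clients = partition_by_label(labels, num_clients=4, alpha=0.5)
print([len(c) for c in clients])
```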

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, Shiyu Chang

Category: Computer Vision

2022-12-16

Generative models have been widely studied in computer vision. Recently, diffusion models have drawn substantial attention due to the high quality of their generated images. A key desired property of image generative models is the ability to disentangle different attributes, which should enable modification towards a style without changing the semantic content, and the modification parameters should generalize to different images. Previous studies have found that generative adversarial networks (GANs) are inherently endowed with such disentanglement capability, so they can perform disentangled image editing without re-training or fine-tuning the network. In this work, we explore whether diffusion models are also inherently equipped with such a capability. Our finding is that for stable diffusion models, by partially changing the input text embedding from a neutral description (e.g., "a photo of person") to one with style (e.g., "a photo of person with smile") while fixing all the Gaussian random noises introduced during the denoising process, the generated images can be modified towards the target style without changing the semantic content. Based on this finding, we further propose a simple, lightweight image editing algorithm in which the mixing weights of the two text embeddings are optimized for style matching and content preservation. This entire process only involves optimizing around 50 parameters and does not fine-tune the diffusion model itself. Experiments show that the proposed method can modify a wide range of attributes and outperforms diffusion-model-based image-editing algorithms that require fine-tuning. The optimized weights generalize well to different images. Our code is publicly available at https://github.com/UCSB-NLP-Chang/DiffusionDisentanglement.
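The editing step mixes a neutral and a styled text embedding with learned weights while the sampling noise stays fixed. The sketch below shows only the mixing arithmetic, assuming one weight per denoising step so that about 50 sampling steps would give about 50 parameters; that per-step indexing, the toy 2-d vectors, and the function names are assumptions for illustration, and the optimization objective for style matching and content preservation is not reproduced.

```python
from typing import List, Sequence


def mix_embeddings(neutral: Sequence[float], styled: Sequence[float], weight: float) -> List[float]:
    """Linear interpolation (1 - weight) * neutral + weight * styled."""
    return [(1.0 - weight) * n + weight * s for n, s in zip(neutral, styled)]


def embedding_schedule(
    neutral: Sequence[float], styled: Sequence[float], weights: Sequence[float]
) -> List[List[float]]:
    """One mixed text embedding per denoising step; weights[t] is the (assumed) weight for step t."""
    return [mix_embeddings(neutral, styled, w) for w in weights]


# Toy 2-d "embeddings"; real text embeddings are token-by-channel matrices.
neutral = [0.0, 2.0]
styled = [1.0, 0.0]
print(embedding_schedule(neutral, styled, [0.0, 0.5, 1.0]))
# [[0.0, 2.0], [0.5, 1.0], [1.0, 0.0]]
```

Only the weights would be optimized; the diffusion model and the fixed noise are left untouched, which is why the parameter count stays small.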

LegalRelectra: Mixed-domain Language Modeling for Long-range Legal Text Comprehension

Wenyue Hua, Yuchen Zhang, Zhe Chen, Josie Li, Melanie Weber

Category: Natural Language Processing

2022-12-16

The application of Natural Language Processing (NLP) to specialized domains, such as law, has recently received a surge of interest. As many legal services rely on processing and analyzing large collections of documents, automating such tasks with NLP tools has emerged as a key challenge. Many popular language models, such as BERT or RoBERTa, are general-purpose models with limited ability to process specialized legal terminology and syntax. In addition, legal documents may contain specialized vocabulary from other domains, such as medical terminology in personal injury text. Here, we propose LegalRelectra, a legal-domain language model trained on mixed-domain legal and medical corpora. We show that our model improves over general-domain and single-domain medical and legal language models when processing mixed-domain (personal injury) text. Our training architecture implements the Electra framework but uses Reformer instead of BERT for its generator and discriminator. We show that this improves the model's performance on long passages and results in better long-range text comprehension.
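In the Electra framework, a small generator fills in some positions and the discriminator is trained on replaced-token detection: for every position it predicts whether the token differs from the original. The sketch below builds those per-token targets. It uses uniform sampling from a toy vocabulary as a stand-in for the generator (LegalRelectra uses a Reformer-based generator), and the function names and replacement rate are assumptions.

```python
import random
from typing import List, Sequence


def corrupt(
    tokens: Sequence[str], vocab: Sequence[str], replace_rate: float, rng: random.Random
) -> List[str]:
    """Stand-in generator: resample each position from vocab with probability replace_rate."""
    return [rng.choice(vocab) if rng.random() < replace_rate else t for t in tokens]


def rtd_labels(original: Sequence[str], corrupted: Sequence[str]) -> List[int]:
    """Replaced-token detection targets: 1 where the token differs from the original.

    A sampled token that happens to equal the original is labelled 0 (original),
    which is how Electra treats correct generator guesses.
    """
    return [int(o != c) for o, c in zip(original, corrupted)]


original = ["the", "court", "held", "that", "the", "claim", "failed"]
corrupted = corrupt(original, ["judge", "patient", "court", "denied"], 0.3, random.Random(0))
print(corrupted)
print(rtd_labels(original, corrupted))
```

Because the discriminator receives a label at every position rather than only at masked ones, each training sequence yields more supervision, which is the usual motivation given for Electra-style pretraining.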
