Smart Paper Notes

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

Zhisheng Zhong , Jiequan Cui , Yibo Yang , Xiaoyang Wu , Xiaojuan Qi , Xiangyu Zhang , Jiaya Jia

Categories: Computer Vision | Machine Learning

2023-01-03

A recent study has identified a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
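To make the "simplex equiangular tight frame" concrete, the sketch below builds one with the standard closed form M = sqrt(K/(K-1)) (I - (1/K) 11^T) and measures the pairwise cosines between its vertices. It is only an illustration of the geometry, not the paper's regularizer; the function names `simplex_etf` and `cosine` are hypothetical, and the identity matrix is assumed for the otherwise arbitrary rotation.

```python
import math


def simplex_etf(num_classes):
    """Return the K vertices of a simplex equiangular tight frame in R^K.

    Uses M = sqrt(K / (K - 1)) * (I - (1/K) * ones), with the identity
    standing in for the arbitrary rotation (an assumption for simplicity).
    """
    k = num_classes
    if k < 2:
        raise ValueError("need at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Every vertex has unit norm and every pair has cosine -1 / (K - 1).
vertices = simplex_etf(4)
pairwise = [
    cosine(vertices[i], vertices[j])
    for i in range(4)
    for j in range(i + 1, 4)
]
```

With K = 4, every pairwise cosine equals -1/3, the most negative value that K unit vectors can share simultaneously; this is the "equiangular and maximally separated" structure the abstract refers to.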

translated by Google Translate

A Cooperative-Competitive Multi-Agent Framework for Auto-bidding in Online Advertising

Chao Wen , Miao Xu , Zhilin Zhang , Zhenzhe Zheng , Yuhui Wang , Xiangyu Liu , Yu Rong , Dong Xie , Xiaoyang Tan , Chuan Yu

Categories: Artificial Intelligence

2021-06-11

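In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous works designed auto-bidding tools from the view of a single agent, without modeling the mutual influence between agents. In this paper, we instead consider this problem from a distributed multi-agent perspective and propose a general $\underline{M}$ulti-$\underline{A}$gent reinforcement learning framework for $\underline{A}$uto-$\underline{B}$idding, namely MAAB, to learn the auto-bidding strategies. First, we investigate the competition and cooperation relations among auto-bidding agents and propose a temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully making a competition and cooperation trade-off among agents, we can reach an equilibrium state that guarantees not only the individual advertiser's utility but also the system performance (i.e., social welfare). Second, to avoid the potential collusion behavior of bidding low prices that underlies the cooperation, we further propose bar agents that set a personalized bidding bar for each agent and thereby alleviate the revenue degradation caused by cooperation. Third, to deploy MAAB in a large-scale advertising system, we propose a mean-field approach. By grouping advertisers that share the same objective into a mean auto-bidding agent, the interactions among large-scale advertisers are greatly simplified, making it practical to train MAAB efficiently. Extensive experiments on an offline industrial dataset and the Alibaba advertising platform demonstrate that our approach outperforms several baseline methods in terms of social welfare and revenue.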

Greedy-Step Off-Policy Reinforcement Learning

Yuhui Wang , Qingyuan Wu , Pengcheng He , Xiaoyang Tan

Categories: Machine Learning | Artificial Intelligence

2021-02-23

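Most policy evaluation algorithms are based on the theories of the Bellman expectation and optimality equations, which give rise to two popular approaches: policy iteration (PI) and value iteration (VI). However, in PI-based methods, multi-step bootstrapping is often at cross-purposes with off-policy learning because of the large variance of multi-step off-policy correction. In contrast, VI-based methods are naturally off-policy but are limited to one-step learning. This paper deduces a novel multi-step Bellman optimality equation by exploiting a latent structure of multi-step bootstrapping with the optimal value function. Via this new equation, we derive a new multi-step value iteration method that converges to the optimal value function with an exponential contraction rate of $\mathcal{O}(\gamma^n)$ but only linear computational complexity. Moreover, it naturally yields a suite of multi-step off-policy algorithms that can safely use data collected by arbitrary policies without correction. Experiments show that the proposed methods are reliable, easy to implement, and achieve state-of-the-art performance on a series of standard benchmark datasets.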
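As a rough illustration of what a multi-step optimality backup without off-policy correction might look like, the sketch below takes the maximum, over bootstrap lengths n = 1..N, of the n-step discounted return plus the discounted greedy value at the state reached. This form is an assumption based on the method's name and is not stated in the abstract; the function name `greedy_step_target` and its arguments are hypothetical.

```python
def greedy_step_target(rewards, next_state_values, gamma):
    """Assumed greedy multi-step target along one stored trajectory.

    rewards[t]            reward r_t received at step t (t = 0 .. N-1)
    next_state_values[t]  max_a Q(s_{t+1}, a), the bootstrap value after step t
    gamma                 discount factor in [0, 1]

    Returns max over n in 1..N of
        sum_{t<n} gamma^t * r_t + gamma^n * max_a Q(s_n, a).
    """
    if not rewards or len(rewards) != len(next_state_values):
        raise ValueError("rewards and next_state_values must be non-empty and aligned")
    best = float("-inf")
    discounted_sum = 0.0  # running n-step discounted reward sum
    discount = 1.0        # gamma^t before the step, gamma^n after it
    for reward, bootstrap in zip(rewards, next_state_values):
        discounted_sum += discount * reward
        discount *= gamma
        best = max(best, discounted_sum + discount * bootstrap)
    return best


# Three stored steps: the 2-step backup gives the largest target here.
example_target = greedy_step_target([1.0, 0.0, 2.0], [0.5, 3.0, 0.0], gamma=0.9)
```

A single pass over the trajectory evaluates all N candidate bootstrap lengths, which is consistent with the linear computational cost mentioned in the abstract. No importance weights appear, since the sketch only combines stored rewards with greedy value estimates.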

DDColor: Towards Photo-Realistic and Semantic-Aware Image Colorization via Dual Decoders

Xiaoyang Kang , Tao Yang , Wenqi Ouyang , Peiran Ren , Lingzhi Li , Xuansong Xie

Categories: Computer Vision

2022-12-22

Automatic image colorization is a particularly challenging problem. Because the problem is highly ill-posed and subject to multi-modal uncertainty, directly training a deep neural network usually leads to incorrect semantic colors and low color richness. Existing transformer-based methods can deliver better results but depend heavily on hand-crafted dataset-level empirical distribution priors. In this work, we propose DDColor, a new end-to-end method with dual decoders, for image colorization. More specifically, we design a multi-scale image decoder and a transformer-based color decoder. The former restores the spatial resolution of the image, while the latter establishes the correlation between semantic representations and color queries via cross-attention. The two decoders work together to learn semantic-aware color embeddings by leveraging multi-scale visual features. With the help of these two decoders, our method succeeds in producing semantically consistent and visually plausible colorization results without any additional priors. In addition, a simple but effective colorfulness loss is introduced to further improve the color richness of generated results. Our extensive experiments demonstrate that the proposed DDColor achieves significantly superior performance to existing state-of-the-art works both quantitatively and qualitatively. Codes will be made publicly available.

Discrimination, calibration, and point estimate accuracy of GRU-D-Weibull architecture for real-time individualized endpoint prediction

Xiaoyang Ruan , Liwei Wang , Michelle Mai , Charat Thongprayoon , Wisit Cheungpasitporn , Hongfang Liu

Categories: Machine Learning

2022-12-19

Real-time individual endpoint prediction has always been a challenging task but is of great clinical utility for both patients and healthcare providers. With 6,879 chronic kidney disease stage 4 (CKD4) patients as a use case, we explored the feasibility and performance of gated recurrent units with decay that model the Weibull probability density function (GRU-D-Weibull) as a semi-parametric longitudinal model for real-time individual endpoint prediction. GRU-D-Weibull has a maximum C-index of 0.77 at 4.3 years of follow-up, compared to 0.68 achieved by competing models. The L1-loss of GRU-D-Weibull is ~66% of XGB(AFT), ~60% of MTLR, and ~30% of AFT model at CKD4 index date. The average absolute L1-loss of GRU-D-Weibull is around one year, with a minimum of 40% Parkes serious error after index date. GRU-D-Weibull is not calibrated and significantly underestimates true survival probability. Feature importance tests indicate blood pressure becomes increasingly important during follow-up, while eGFR and blood albumin are less important. Most continuous features have non-linear/parabola impact on predicted survival time, and the results are generally consistent with existing knowledge. As a semi-parametric temporal model, GRU-D-Weibull shows several advantages: built-in parameterization of missing values, native support for asynchronously arriving measurements, the ability to output both probability and point estimates at an arbitrary time point for an arbitrary prediction horizon, and improved discrimination and point estimate accuracy after incorporating newly arrived data. Further research on its performance with more comprehensive input features and with in-process or post-process calibration is warranted to benefit CKD4 or similarly terminally ill patients.
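The abstract notes that a Weibull parameterization yields both probabilities and point estimates. The sketch below shows how this follows from the textbook two-parameter Weibull distribution, given a shape k and scale λ. How GRU-D-Weibull actually produces these parameters is not shown here; the function names and the example values are illustrative assumptions.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale)^shape): probability the endpoint occurs after t."""
    return math.exp(-((t / scale) ** shape))


def weibull_pdf(t, shape, scale):
    """f(t) = (k / lam) * (t / lam)^(k - 1) * exp(-(t / lam)^k), for t > 0."""
    ratio = t / scale
    return (shape / scale) * ratio ** (shape - 1) * math.exp(-(ratio ** shape))


def weibull_median(shape, scale):
    """Median survival time, one possible point estimate: lam * ln(2)^(1 / k)."""
    return scale * math.log(2.0) ** (1.0 / shape)


# Hypothetical parameters, e.g. as emitted by a model at one time step.
shape, scale = 1.5, 4.0                                   # time unit: years (assumed)
median_years = weibull_median(shape, scale)               # point estimate
prob_beyond_2y = weibull_survival(2.0, shape, scale)      # probability estimate
```

The median is one choice of point estimate; the mean or mode of the distribution could be used instead, and the abstract does not say which one the authors report.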

OASum: Large-Scale Open Domain Aspect-based Summarization

Xianjun Yang , Kaiqiang Song , Sangwoo Cho , Xiaoyang Wang , Xiaoman Pan , Linda Petzold , Dong Yu

Categories: Natural Language Processing

2022-12-19

Aspect or query-based summarization has recently attracted more attention, as it can generate differentiated summaries based on users' interests. However, existing datasets for aspect or query-based summarization either focus on specific domains, contain relatively few instances, or include only a few aspect types. Such limitations hinder further exploration in this direction. In this work, we take advantage of crowd-sourced knowledge on Wikipedia.org and automatically create a high-quality, large-scale open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its ability to support diverse aspect-based summary generation. To overcome the data scarcity problem in specific domains, we also perform zero-shot, few-shot, and fine-tuning experiments on seven downstream datasets. Specifically, zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.

MouseGAN++: Unsupervised Disentanglement and Contrastive Representation for Multiple MRI Modalities Synthesis and Structural Segmentation of Mouse Brain

Ziqi Yu , Xiaoyang Han , Shengjie Zhang , Jianfeng Feng , Tingying Peng , Xiao-Yong Zhang

Categories: Computer Vision

2022-12-04

Segmenting the fine structure of the mouse brain on magnetic resonance (MR) images is critical for delineating morphological regions, analyzing brain function, and understanding their relationships. Compared to a single MRI modality, multimodal MRI data provide complementary tissue features that can be exploited by deep learning models, resulting in better segmentation results. However, multimodal mouse brain MRI data is often lacking, making automatic segmentation of mouse brain fine structure a very challenging task. To address this issue, it is necessary to fuse multimodal MRI data to produce distinguished contrasts in different brain structures. Hence, we propose a novel disentangled and contrastive GAN-based framework, named MouseGAN++, to synthesize multiple MR modalities from single ones in a structure-preserving manner, thus improving the segmentation performance by imputing missing modalities and multi-modality fusion. Our results demonstrate that the translation performance of our method outperforms the state-of-the-art methods. Using the subsequently learned modality-invariant information as well as the modality-translated images, MouseGAN++ can segment fine brain structures with averaged dice coefficients of 90.0% (T2w) and 87.9% (T1w), respectively, achieving around +10% performance improvement compared to the state-of-the-art algorithms. Our results demonstrate that MouseGAN++, as a simultaneous image synthesis and segmentation method, can be used to fuse cross-modality information in an unpaired manner and yield more robust performance in the absence of multimodal data. We release our method as a mouse brain structural segmentation tool for free academic usage at https://github.com/yu02019.
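The segmentation scores above are Dice coefficients. As a reminder of what that metric measures, here is a minimal sketch for two binary masks flattened to lists. The function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks.

    pred, target: equal-length sequences of 0/1 (or False/True) labels.
    Returns 1.0 when both masks are empty (a convention assumed here).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping voxel out of two predicted and two reference voxels.
example_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

For a multi-structure segmentation, the score would typically be computed per structure and then averaged, which matches the abstract's wording "averaged dice coefficients".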

Boosting Point Clouds Rendering via Radiance Mapping

Xiaoyang Huang , Yi Zhang , Bingbing Ni , Teng Li , Kai Chen , Wenjun Zhang

Categories: Computer Vision

2022-10-27

Recent years have witnessed rapid development in NeRF-based image rendering due to its high quality. However, point cloud rendering is somewhat less explored. Compared to NeRF-based rendering, which suffers from dense spatial sampling, point cloud rendering is naturally less computation-intensive, which enables its deployment on mobile computing devices. In this work, we focus on boosting the image quality of point cloud rendering with a compact model design. We first analyze the adaptation of the volume rendering formulation to point clouds. Based on the analysis, we simplify the NeRF representation to a spatial mapping function that requires only a single evaluation per pixel. Further, motivated by ray marching, we rectify the noisy raw point clouds to the estimated intersections between rays and surfaces and use these as the queried coordinates, which avoids spatial frequency collapse and neighbor point disturbance. Composed of rasterization, spatial mapping, and refinement stages, our method achieves state-of-the-art performance on point cloud rendering, outperforming prior works by notable margins with a smaller model size. We obtain a PSNR of 31.74 on NeRF-Synthetic, 25.88 on ScanNet, and 30.81 on DTU. Code and data are publicly available at https://github.com/seanywang0408/RadianceMapping.
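For readers unfamiliar with the PSNR figures quoted above, the sketch below computes the standard peak signal-to-noise ratio in decibels between a rendered image and a reference, both flattened to lists of intensities. The function name `psnr` and the assumption that pixel values lie in [0, 1] are illustrative; the paper's exact evaluation protocol is not described in the abstract.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE) over flat pixel lists."""
    if not rendered or len(rendered) != len(reference):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01.
example_psnr = psnr([0.5, 0.5], [0.4, 0.6])
```

Because the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in mean squared error.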

Denoising of 3D MR images using a voxel-wise hybrid residual MLP-CNN model to improve small lesion diagnostic confidence

Haibo Yang , Shengjie Zhang , Xiaoyang Han , Botao Zhao , Yan Ren , Yaru Sheng , Xiao-Yong Zhang

Categories: Computer Vision

2022-09-28

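Small lesions in magnetic resonance imaging (MRI) images are crucial for the clinical diagnosis of many kinds of diseases. However, MRI quality is easily degraded by various kinds of noise, which can greatly affect the diagnostic accuracy for small lesions. Although some methods for denoising MR images have been proposed, task-specific denoising methods for improving the diagnostic confidence of small lesions are lacking. In this work, we propose a voxel-wise hybrid residual MLP-CNN model to denoise three-dimensional (3D) MR images with small lesions. We combine two basic deep learning architectures, MLP and CNN, to obtain an appropriate inductive bias for image denoising, and we integrate each output layer of the MLP and the CNN by adding residual connections to leverage long-range information. We evaluate the proposed method on 720 T2-FLAIR brain images with small lesions at different noise levels. The results show the superiority of our method over state-of-the-art methods in both quantitative and visual evaluations on the testing dataset. Moreover, two experienced radiologists agreed that at moderate and high noise levels, our method outperforms other methods in terms of the recovery of small lesions and overall image quality. The implementation of our method is available at https://github.com/laowangbobo/Residual_MLP_CNN_MIXER.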

Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

Shijing Si , Jianzong Wang , Xulong Zhang , Xiaoyang Qu , Ning Cheng , Jing Xiao

Categories: Artificial Intelligence | Machine Learning

2022-09-21

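Non-parallel multi-domain voice conversion methods such as StarGAN-VC have been widely applied in many scenarios. However, training these models is usually challenging because of their complicated adversarial network architectures. To address this, in this work we leverage state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. Our method, called SimSiam-StarGAN-VC, improves training stability and effectively prevents discriminator overfitting during training. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset, along with a user study, to validate the performance of our framework. Our experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods in terms of both objective and subjective metrics.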
