Recent years have witnessed rapid development in NeRF-based image rendering due to its high quality. However, point cloud rendering remains less explored. Compared to NeRF-based rendering, which suffers from dense spatial sampling, point cloud rendering is naturally less computation-intensive, which enables its deployment on mobile computing devices. In this work, we focus on boosting the image quality of point cloud rendering with a compact model design. We first analyze the adaptation of the volume rendering formulation to point clouds. Based on this analysis, we simplify the NeRF representation to a spatial mapping function which requires only a single evaluation per pixel. Further, motivated by ray marching, we rectify the noisy raw point clouds to the estimated intersections between rays and surfaces as queried coordinates, which avoids \textit{spatial frequency collapse} and neighbor-point disturbance. Composed of rasterization, spatial mapping, and refinement stages, our method achieves state-of-the-art performance on point cloud rendering, outperforming prior works by notable margins with a smaller model size. We obtain a PSNR of 31.74 on NeRF-Synthetic, 25.88 on ScanNet, and 30.81 on DTU. Code and data are publicly available at https://github.com/seanywang0408/RadianceMapping.
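To make the single-evaluation idea concrete, below is a minimal PyTorch sketch of such a spatial mapping; the `SpatialMapping` name, network width, and raw-coordinate inputs are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SpatialMapping(nn.Module):
    """Maps a per-pixel query coordinate (plus view direction) to RGB.

    Unlike NeRF, which integrates many samples along each ray, this
    mapping is evaluated once per pixel at the estimated ray-surface
    intersection recovered from the rasterized point cloud.
    """

    def __init__(self, in_dim=6, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, xyz, view_dir):
        # xyz: (N, 3) rectified surface points; view_dir: (N, 3) unit rays
        return self.mlp(torch.cat([xyz, view_dir], dim=-1))

# One network evaluation per pixel instead of NeRF's dense ray sampling.
pixels = torch.rand(4096, 3)          # stand-in for estimated intersections
dirs = torch.nn.functional.normalize(torch.rand(4096, 3), dim=-1)
rgb = SpatialMapping()(pixels, dirs)  # (4096, 3)
```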
Real-time dynamic environment perception is crucial for autonomous robots in crowded spaces. Although popular voxel-based mapping methods can efficiently represent 3D obstacles with arbitrarily complex shapes, they can hardly distinguish between static and dynamic obstacles, leading to limited obstacle-avoidance performance. While plenty of learning-based dynamic obstacle detection algorithms exist in autonomous driving, the limited computing resources of a quadrotor cannot achieve real-time performance with these methods. To address these issues, we propose a real-time dynamic obstacle tracking and mapping system for quadrotor obstacle avoidance using an RGB-D camera. The proposed system first utilizes depth images with an occupancy voxel map to generate potential dynamic obstacle regions as proposals. With the obstacle region proposals, a Kalman filter and our continuity filter are applied to track each dynamic obstacle. Finally, an environment-aware trajectory prediction method based on a Markov chain is proposed using the states of the tracked dynamic obstacles. We implemented the proposed system on a customized quadrotor with a navigation planner. Simulation and physical experiments show that our method can successfully track and represent obstacles in dynamic environments and safely avoid obstacles.
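As an illustration of the tracking step, here is a minimal constant-velocity Kalman filter over 3D obstacle positions; the noise matrices, sensor period, and the omission of the continuity filter are simplifying assumptions, not the paper's tuned setup.

```python
import numpy as np

# Minimal constant-velocity Kalman filter for one tracked obstacle.
# State x = [px, py, pz, vx, vy, vz]; measurements are 3D positions
# extracted from the depth-image region proposals.
dt = 0.05                                      # sensor period (s), assumed
F = np.eye(6); F[:3, 3:] = dt * np.eye(3)      # state transition
H = np.hstack([np.eye(3), np.zeros((3, 3))])   # we only observe position
Q = 0.01 * np.eye(6)                           # process noise (placeholder)
R = 0.05 * np.eye(3)                           # measurement noise (placeholder)

def kf_step(x, P, z):
    """One predict + update cycle given a position measurement z (3,)."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

x, P = np.zeros(6), np.eye(6)
for z in np.random.rand(10, 3):                # stand-in for detections
    x, P = kf_step(x, P, z)
```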
Navigating dynamic environments requires a robot to generate collision-free trajectories and actively avoid moving obstacles. Most previous works design path-planning algorithms based on a single map representation, such as a geometric, occupancy, or ESDF map. Although they have shown success in static environments, these methods cannot reliably handle static and dynamic obstacles simultaneously due to the limitations of their map representation. To address this problem, this paper proposes a gradient-based B-spline trajectory optimization algorithm utilizing the robot's onboard vision. The depth vision enables the robot to track and represent dynamic objects geometrically based on the voxel map. The proposed optimization first adopts a circle-based guide-point algorithm to approximate the cost and gradient of avoiding static obstacles. Then, with the vision-detected moving objects, our receding-horizon distance field is simultaneously used to prevent dynamic collisions. Finally, an iterative re-guide strategy is adopted to generate collision-free trajectories. Simulation and physical experiments demonstrate that our method can run in real time to safely navigate dynamic environments.
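The sketch below illustrates the gradient-based flavor of the approach with a toy collision cost over 2D B-spline control points (a uniform cubic B-spline stays inside the convex hull of its control points, so pushing control points away from an obstacle also pushes the trajectory away). The hinge cost and step size are assumptions; the paper's circle-based guide points and receding-horizon distance field are more involved.

```python
import numpy as np

# Toy gradient-based optimization of B-spline control points against
# a single circular obstacle.
obstacle, radius = np.array([1.0, 1.0]), 0.5

def collision_cost_grad(ctrl_pts):
    """Hinge-style cost penalizing control points inside the inflated circle."""
    diff = ctrl_pts - obstacle                  # (N, 2) offsets from center
    dist = np.linalg.norm(diff, axis=1)         # distances to obstacle center
    pen = np.maximum(radius - dist, 0.0)        # penetration depth
    cost = np.sum(pen ** 2)
    grad = -2.0 * pen[:, None] * diff / np.maximum(dist, 1e-9)[:, None]
    return cost, grad

ctrl = np.stack([np.linspace(0, 2, 8), np.linspace(0, 2, 8)], axis=1)
for _ in range(100):                            # plain gradient descent
    cost, grad = collision_cost_grad(ctrl)
    ctrl -= 0.1 * grad                          # control points move off obstacle
```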
We propose a multi-stage coarse-to-fine CNN framework, called SkullEngine, for high-resolution segmentation and large-scale landmark detection through a collaborative, integrated, and scalable JSD model and three segmentation and landmark-detection refinement models. We evaluated SkullEngine on a clinical dataset consisting of 170 CBCT/CT images for the tasks of segmenting two bony structures (midface and mandible) and detecting 175 clinically common landmarks on bones, teeth, and soft tissue.
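A generic two-stage coarse-to-fine inference skeleton of the kind such frameworks use is sketched below; the crop margin, downsampling factor, and placeholder networks are illustrative, not SkullEngine's actual models.

```python
import numpy as np

def coarse_to_fine(volume, coarse_net, fine_net, ds=4, margin=8):
    """Generic two-stage coarse-to-fine inference on a 3D volume.

    coarse_net runs on a downsampled volume to localize a region of
    interest; fine_net refines the segmentation at full resolution
    inside the cropped ROI.  Both nets are stand-ins for learned models.
    """
    coarse_mask = coarse_net(volume[::ds, ::ds, ::ds])       # low-res mask
    idx = np.argwhere(coarse_mask > 0.5) * ds                 # back to full res
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin, volume.shape)
    roi = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    fine_mask = fine_net(roi)                                 # high-res refinement
    out = np.zeros(volume.shape, dtype=fine_mask.dtype)
    out[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = fine_mask
    return out

vol = np.random.rand(64, 64, 64)
dummy = lambda v: (v > 0.7).astype(np.float32)  # placeholder "network"
seg = coarse_to_fine(vol, dummy, dummy)
```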
In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous works designed auto-bidding tools from the view of a single agent, without modeling the mutual influence among agents. In this paper, we instead consider the problem from a distributed multi-agent perspective and propose a general $\underline{M}$ulti-$\underline{A}$gent reinforcement learning framework for $\underline{A}$uto-$\underline{B}$idding, namely MAAB, to learn auto-bidding strategies. First, we investigate the competition and cooperation relations among auto-bidding agents and propose a temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully making a competition-cooperation trade-off among the agents, we can reach an equilibrium state that guarantees not only the utility of individual advertisers but also the system performance (i.e., social welfare). Second, to avoid the potential collusive behavior of bidding low prices under cooperation, we further propose bar agents to set a personalized bidding bar for each agent, which mitigates the revenue degradation caused by cooperation. Third, to deploy MAAB in large-scale advertising systems, we propose a mean-field approach. By grouping advertisers with the same objective as a mean auto-bidding agent, the interactions among large-scale advertisers are greatly simplified, making it possible to train MAAB efficiently. Extensive experiments on an offline industrial dataset and the Alibaba advertising platform demonstrate that our method outperforms several baseline methods in terms of social welfare and revenue.
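To give a feel for how a temperature can trade off competition against cooperation, here is a hypothetical credit-assignment toy: a softmax over per-agent contributions whose temperature interpolates between winner-takes-all and equal sharing. This is only a schematic reading of the abstract, not the paper's actual formulation.

```python
import numpy as np

def temperature_credit(values, temperature):
    """Temperature-regularized credit assignment (illustrative only).

    values[i] is agent i's estimated contribution.  A softmax with
    temperature tau interpolates between a fully competitive split
    (tau -> 0: winner takes all) and a fully cooperative one
    (tau -> inf: equal credit), so tau tunes the mixed
    cooperative-competitive trade-off.
    """
    v = np.asarray(values, dtype=float)
    w = np.exp((v - v.max()) / temperature)   # numerically stable softmax
    w /= w.sum()
    return w * v.sum()                        # redistributed total credit

print(temperature_credit([3.0, 1.0, 0.5], temperature=0.1))   # competitive
print(temperature_credit([3.0, 1.0, 0.5], temperature=10.0))  # cooperative
```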
A recent study has shown a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial for discriminating the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to this appealing structure in imbalanced semantic segmentation. Experimental results show that our method brings significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
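A minimal sketch of such a center regularizer is given below, assuming the target geometry is a simplex ETF in which all pairwise cosines between centered, normalized class means equal $-1/(K-1)$; the exact penalty used in the paper may differ.

```python
import torch

def etf_regularizer(centers):
    """Encourage class feature centers toward a simplex ETF geometry.

    In a simplex equiangular tight frame with K classes, all pairwise
    cosine similarities between centered, normalized class means equal
    -1/(K-1).  This penalty pushes the observed pairwise cosines toward
    that target; it is a simplified stand-in for the paper's regularizer.
    """
    K = centers.shape[0]
    c = centers - centers.mean(dim=0, keepdim=True)   # recenter
    c = torch.nn.functional.normalize(c, dim=1)       # unit norm
    cos = c @ c.t()                                   # (K, K) cosines
    target = torch.full_like(cos, -1.0 / (K - 1))
    target.fill_diagonal_(1.0)                        # self-similarity is 1
    return ((cos - target) ** 2).mean()

centers = torch.randn(20, 256, requires_grad=True)   # 20 classes, 256-d
loss = etf_regularizer(centers)
loss.backward()                                       # gradients flow to centers
```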
Automatic image colorization is a particularly challenging problem. Due to the high ill-posedness of the problem and multi-modal uncertainty, directly training a deep neural network usually leads to incorrect semantic colors and low color richness. Existing transformer-based methods can deliver better results but highly depend on hand-crafted dataset-level empirical distribution priors. In this work, we propose DDColor, a new end-to-end method with dual decoders for image colorization. More specifically, we design a multi-scale image decoder and a transformer-based color decoder. The former restores the spatial resolution of the image, while the latter establishes the correlation between semantic representations and color queries via cross-attention. The two decoders work together to learn semantic-aware color embeddings by leveraging multi-scale visual features. With the help of these two decoders, our method succeeds in producing semantically consistent and visually plausible colorization results without any additional priors. In addition, a simple but effective colorfulness loss is introduced to further improve the color richness of the generated results. Our extensive experiments demonstrate that the proposed DDColor achieves significantly superior performance to existing state-of-the-art works, both quantitatively and qualitatively. Code will be made publicly available.
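For intuition, a colorfulness loss can be built by negating a differentiable colorfulness score in the spirit of Hasler and Süsstrunk's metric, as in the sketch below; this is an assumed form, not necessarily DDColor's exact loss.

```python
import torch

def colorfulness(img):
    """Differentiable colorfulness score (Hasler-Suesstrunk style).

    img: (B, 3, H, W) RGB in [0, 1].  Higher is more colorful; a
    colorfulness *loss* can simply negate this so that training
    increases color richness.  A sketch, not DDColor's exact loss.
    """
    r, g, b = img[:, 0], img[:, 1], img[:, 2]
    rg = r - g                       # opponent channel 1
    yb = 0.5 * (r + g) - b           # opponent channel 2
    std = torch.sqrt(rg.var(dim=(1, 2)) + yb.var(dim=(1, 2)) + 1e-8)
    mean = torch.sqrt(rg.mean(dim=(1, 2)) ** 2 + yb.mean(dim=(1, 2)) ** 2)
    return std + 0.3 * mean          # per-image score, shape (B,)

loss = -colorfulness(torch.rand(2, 3, 64, 64)).mean()  # maximize colorfulness
```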
Real-time individual endpoint prediction has always been a challenging task, but one of great clinical utility for both patients and healthcare providers. With 6,879 chronic kidney disease stage 4 (CKD4) patients as a use case, we explored the feasibility and performance of gated recurrent units with decay that model a Weibull probability density function (GRU-D-Weibull) as a semi-parametric longitudinal model for real-time individual endpoint prediction. GRU-D-Weibull has a maximum C-index of 0.77 at 4.3 years of follow-up, compared to 0.68 achieved by competing models. The L1 loss of GRU-D-Weibull is ~66% of XGB(AFT), ~60% of MTLR, and ~30% of the AFT model at the CKD4 index date. The average absolute L1 loss of GRU-D-Weibull is around one year, with a minimum Parkes serious-error rate of 40% after the index date. GRU-D-Weibull is not well calibrated and significantly underestimates the true survival probability. Feature importance tests indicate that blood pressure becomes increasingly important during follow-up, while eGFR and blood albumin become less important. Most continuous features have a non-linear/parabolic impact on predicted survival time, and the results are generally consistent with existing knowledge. As a semi-parametric temporal model, GRU-D-Weibull offers built-in parameterization of missingness, native support for asynchronously arriving measurements, the capability to output both probability and point estimates at arbitrary time points for arbitrary prediction horizons, and improved discrimination and point-estimate accuracy after incorporating newly arrived data. Further research on its performance with more comprehensive input features and on in-process or post-process calibration is warranted to benefit CKD4 or similarly terminally-ill patients.
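The Weibull parameterization is what enables probability and point estimates at arbitrary horizons: given a per-patient shape $k$ and scale $\lambda$ emitted by the network, the survival probability $S(t)=\exp(-(t/\lambda)^k)$ and the median $\lambda(\ln 2)^{1/k}$ follow in closed form, as the sketch below shows (the parameter values are illustrative).

```python
import numpy as np

def weibull_survival(t, shape_k, scale_lam):
    """Survival probability S(t) = exp(-(t/lambda)^k) for a Weibull model."""
    return np.exp(-np.power(np.asarray(t, dtype=float) / scale_lam, shape_k))

def weibull_median(shape_k, scale_lam):
    """Median survival time lambda * ln(2)^(1/k), a natural point estimate."""
    return scale_lam * np.log(2.0) ** (1.0 / shape_k)

# Hypothetical per-patient parameters emitted by the network.
k, lam = 1.4, 5.2                           # shape, scale (years), illustrative
print(weibull_survival([1, 2, 5], k, lam))  # probabilities at 1, 2, 5 years
print(weibull_median(k, lam))               # point estimate in years
```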
Aspect- or query-based summarization has recently attracted more attention, as it can generate differentiated summaries based on users' interests. However, current datasets for aspect- or query-based summarization either focus on specific domains, contain relatively small-scale instances, or include only a few aspect types. Such limitations hinder further exploration in this direction. In this work, we take advantage of crowd-sourced knowledge on Wikipedia.org and automatically create a high-quality, large-scale, open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its ability to support diverse aspect-based summarization generation. To overcome the data scarcity problem in specific domains, we also perform zero-shot, few-shot, and fine-tuning experiments on seven downstream datasets. Specifically, the zero-/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect- or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.
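One plausible reading of the construction, sketched below with hypothetical field names, is to treat each section heading of a Wikipedia page as an aspect and the section body as the aspect-specific target summary; the actual OASum schema and filtering steps are not specified in the abstract.

```python
# Hypothetical sketch of a Wikipedia-based construction: each section
# heading serves as an aspect, the section body as the aspect-specific
# summary target, and the page body as the source document.  Field
# names here are illustrative, not OASum's schema.
def build_examples(page_title, source_text, sections):
    """sections: list of (heading, section_text) pairs from one page."""
    return [
        {
            "page": page_title,
            "document": source_text,
            "aspect": heading,
            "summary": section_text,
        }
        for heading, section_text in sections
        if section_text.strip()                 # drop empty sections
    ]

examples = build_examples(
    "Alan Turing",
    "Alan Mathison Turing was an English mathematician ...",
    [("Early life", "Turing was born in Maida Vale, London ..."),
     ("Career", "During WWII, Turing worked at Bletchley Park ...")],
)
```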
Segmenting the fine structure of the mouse brain on magnetic resonance (MR) images is critical for delineating morphological regions, analyzing brain function, and understanding their relationships. Compared to a single MRI modality, multimodal MRI data provide complementary tissue features that can be exploited by deep learning models, leading to better segmentation results. However, multimodal mouse brain MRI data are often lacking, making automatic segmentation of mouse brain fine structures a very challenging task. To address this issue, it is necessary to fuse multimodal MRI data to produce distinct contrasts in different brain structures. Hence, we propose a novel disentangled and contrastive GAN-based framework, named MouseGAN++, to synthesize multiple MR modalities from single ones in a structure-preserving manner, thus improving segmentation performance by imputing missing modalities and fusing multiple modalities. Our results demonstrate that the translation performance of our method outperforms the state of the art. Using the subsequently learned modality-invariant information as well as the modality-translated images, MouseGAN++ can segment fine brain structures with average Dice coefficients of 90.0% (T2w) and 87.9% (T1w), achieving around +10% performance improvement over state-of-the-art algorithms. Our results demonstrate that MouseGAN++, as a simultaneous image synthesis and segmentation method, can fuse cross-modality information in an unpaired manner and yield more robust performance in the absence of multimodal data. We release our method as a mouse brain structural segmentation tool for free academic usage at https://github.com/yu02019.
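The reported structure scores are Dice coefficients; for reference, a minimal implementation of the metric on binary masks follows.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks; the
    metric behind the reported 90.0% (T2w) and 87.9% (T1w) scores."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.random.rand(64, 64, 64) > 0.5   # stand-in predicted mask
b = np.random.rand(64, 64, 64) > 0.5   # stand-in ground-truth mask
print(dice_coefficient(a, b))          # ~0.5 for random masks
```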