Smart Paper Notes

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
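The abstract above names patch-based training as the most common answer to samples that are too large to process at once. As a minimal sketch of the tiling step only, assuming a regular grid with a final patch shifted back to cover the border (the function name `patch_origins` and the example sizes are hypothetical and not taken from the survey):

```python
from itertools import product


def patch_origins(shape, patch, stride):
    """Return the top-left corner of every patch needed to cover an array.

    shape, patch and stride are tuples with one entry per axis.
    The last patch along each axis is shifted back so it stays inside the array.
    """
    per_axis = []
    for size, p, s in zip(shape, patch, stride):
        if p > size:
            raise ValueError("patch is larger than the array along one axis")
        starts = list(range(0, size - p + 1, s))
        if starts[-1] != size - p:
            starts.append(size - p)  # extra patch so the border is covered
        per_axis.append(starts)
    return list(product(*per_axis))


# Hypothetical 3D volume of 512 x 512 x 200 voxels, tiled with 128 x 128 x 64 patches.
origins = patch_origins(shape=(512, 512, 200), patch=(128, 128, 64), stride=(128, 128, 64))
print(len(origins))  # 64 patches: 4 starts along each of the three axes
```

Each origin can then be used to slice one patch out of the full sample, so that only a single patch has to be held in memory at a time.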

Translated by Google Translate

GeneFormer: Learned Gene Compression using Transformer-based Context Modeling

Zhanbei Cui, Yu Liao, Tongda Xu, Yan Wang

Categories: Machine Learning

2022-12-16

With the development of gene sequencing technology, gene data has grown explosively, and its storage has become an important issue. Traditional gene data compression methods rely on general-purpose software such as gzip, which fails to exploit the interrelation of nucleotide sequences. Recently, many researchers have begun to investigate deep learning-based gene data compression methods. In this paper, we propose a transformer-based gene compression method named GeneFormer. Specifically, we first introduce a modified transformer structure to fully explore the nucleotide sequence dependency. Then, we propose fixed-length parallel grouping to accelerate the decoding of our autoregressive model. Experimental results on real-world datasets show that our method saves 29.7% in bit rate compared with the state-of-the-art method, and that its decoding speed is significantly faster than that of all existing learning-based gene compression methods.
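The abstract does not spell out how fixed-length parallel grouping works. One plausible reading, stated here purely as an assumption, is that the sequence is cut into consecutive groups of equal length and the groups are decoded side by side, so the number of sequential autoregressive steps equals the group length rather than the sequence length. The sketch below only builds such a decoding schedule; the name `parallel_decode_schedule` is hypothetical and this is not the authors' implementation:

```python
def parallel_decode_schedule(seq_len, group_len):
    """List, for each sequential step, the positions decoded together.

    Assumption: the sequence is cut into consecutive groups of `group_len`
    symbols, and step t decodes the t-th symbol of every group at once.
    """
    if group_len <= 0:
        raise ValueError("group_len must be positive")
    group_starts = range(0, seq_len, group_len)
    schedule = []
    for t in range(min(group_len, seq_len)):
        # Skip positions that fall past the end of a shorter final group.
        step = [start + t for start in group_starts if start + t < seq_len]
        schedule.append(step)
    return schedule


schedule = parallel_decode_schedule(seq_len=10, group_len=4)
for step, positions in enumerate(schedule):
    print(step, positions)
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6]
# 3 [3, 7]
```

Under this assumption, ten symbols are decoded in four sequential steps instead of ten. How much context each group may use from the others is a design choice the abstract leaves open.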

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory

Sicheng Li, Hao Li, Yue Wang, Yiyi Liao, Lu Yu

Categories: Computer Vision

2022-12-15

Neural Radiance Fields (NeRF) have demonstrated superior novel view synthesis performance but are slow at rendering. To speed up the volume rendering process, many acceleration methods have been proposed at the cost of large memory consumption. To push the frontier of the efficiency-memory trade-off, we explore a new perspective for accelerating NeRF rendering, leveraging the key fact that viewpoint changes are usually smooth and continuous during interactive viewpoint control. This allows us to use the information of preceding viewpoints to reduce both the number of rendered pixels and the number of points sampled along the rays of the remaining pixels. In our pipeline, a low-resolution feature map is first rendered by volume rendering; a lightweight 2D neural renderer is then applied to generate the output image at the target resolution, leveraging the features of the preceding and current frames. We show that the proposed method achieves competitive rendering quality while reducing rendering time with little memory overhead, enabling 30 FPS at 1080p image resolution with a low memory footprint.

Learning to See Through with Events

Lei Yu, Xiang Zhang, Wei Liao, Wen Yang, Gui-Song Xia

Categories: Computer Vision

2022-12-05

Although synthetic aperture imaging (SAI) can achieve the seeing-through effect by blurring out off-focus foreground occlusions while recovering in-focus occluded scenes from multi-view images, its performance often deteriorates under dense occlusions and extreme lighting conditions. To address this problem, this paper presents an Event-based SAI (E-SAI) method that relies on the asynchronous events, with extremely low latency and high dynamic range, acquired by an event camera. Specifically, the collected events are first refocused by a Refocus-Net module to align in-focus events while scattering out off-focus ones. Following that, a hybrid network composed of spiking neural networks (SNNs) and convolutional neural networks (CNNs) is proposed to encode the spatio-temporal information from the refocused events and reconstruct a visual image of the occluded targets. Extensive experiments demonstrate that the proposed E-SAI method achieves remarkable performance in dealing with very dense occlusions and extreme lighting conditions and produces high-quality images from pure events. Code and datasets are available at https://dvs-whu.cn/projects/esai/.

Geometry-Aware Network for Domain Adaptive Semantic Segmentation

Yinghong Liao, Wending Zhou, Xu Yan, Shuguang Cui, Yizhou Yu, Zhen Li

Categories: Computer Vision

2022-12-02

Measuring and alleviating the discrepancies between synthetic (source) and real-scene (target) data is the core issue in domain adaptive semantic segmentation. Though recent works have introduced depth information in the source domain to reinforce geometric and semantic knowledge transfer, they cannot extract the intrinsic 3D information of objects, including positions and shapes, based merely on 2D estimated depth. In this work, we propose a novel Geometry-Aware Network for Domain Adaptation (GANDA), leveraging more compact 3D geometric point cloud representations to shrink the domain gaps. In particular, we first use the auxiliary depth supervision from the source domain to obtain depth predictions in the target domain and thereby accomplish structure-texture disentanglement. Beyond depth estimation, we explicitly exploit 3D topology on the point clouds generated from RGB-D images for further coordinate-color disentanglement and pseudo-label refinement in the target domain. Moreover, to improve the 2D classifier in the target domain, we perform domain-invariant geometric adaptation from source to target and unify the 2D semantic and 3D geometric segmentation results in the two domains. Note that GANDA is plug-and-play in any existing UDA framework. Qualitative and quantitative results demonstrate that our model outperforms state-of-the-art methods on GTA5->Cityscapes and SYNTHIA->Cityscapes.

Human Performance Modeling and Rendering via Neural Animated Mesh

Fuqiang Zhao, Yuheng Jiang, Kaixin Yao, Jiakai Zhang, Liao Wang, Haizhao Dai, Yuhui Zhong, Yingliang Zhang, Minye Wu, Lan Xu

Categories: Computer Vision

2022-09-18

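We have recently seen tremendous progress in neural techniques for photo-realistic human modeling and rendering. However, integrating them into an existing mesh-based pipeline for downstream applications remains challenging. In this paper, we present a comprehensive neural approach for high-quality reconstruction, compression, and rendering of human performances from dense multi-view videos. Our core intuition is to bridge the traditional animated mesh workflow with a series of highly efficient neural techniques. We first introduce a neural surface reconstructor that generates high-quality surfaces in minutes. It combines implicit volumetric rendering of the truncated signed distance field (TSDF) with multi-resolution hash encoding. We further propose a hybrid neural tracker to generate animated meshes, which combines explicit non-rigid tracking with implicit dynamic deformation in a self-supervised framework. The former provides a coarse warping back into the canonical space, while the latter, implicit component further predicts displacements using 4D hash encoding, as in our reconstructor. We then discuss rendering schemes that use the obtained animated meshes, ranging from dynamic texturing to lumigraph rendering under various bandwidth settings. To strike an intricate balance between quality and bandwidth, we propose a hierarchical solution that first renders 6 virtual views covering the performer and then performs occlusion-aware neural texture blending. We demonstrate the efficacy of our approach in a variety of mesh-based applications and photo-realistic free-view experiences on various platforms, such as inserting virtual human performances into real environments through mobile AR or watching talent shows with VR headsets.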

Advanced Conditional Variational Autoencoders (A-CVAE): Towards interpreting open-domain conversation generation via disentangling latent feature representation

Ye Wang, Jingbo Liao, Hong Yu, Guoyin Wang, Xiaoxia Zhang, Li Liu

Categories: Natural Language Processing | Artificial Intelligence

2022-07-26

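Currently, end-to-end deep learning-based open-domain dialogue systems remain black-box models, which makes it easy for these data-driven models to generate irrelevant content. Specifically, because there is no prior knowledge to guide training, latent variables are entangled with different semantics in the latent space. To address this problem, this paper proposes to harness the generative model through a cognitive approach involving mesoscopic-scale feature disentanglement. In particular, the model integrates macro-level guided-category knowledge and micro-level open-domain dialogue data into training and brings the prior knowledge into the latent space, which enables the model to disentangle the latent variables at the mesoscopic scale. In addition, we propose a new metric for open-domain dialogue that can objectively evaluate the interpretability of the latent space distribution. Finally, we validate our model on different datasets and demonstrate experimentally that it generates higher-quality and more interpretable dialogues than other models.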

Physics-Informed Statistical Modeling for Wildfire Aerosols Process Using Multi-Source Geostationary Satellite Remote-Sensing Data Streams

Guanzhou Wei, Venkat Krishnan, Yu Xie, Manajit Sengupta, Yingchen Zhang, Haitao Liao, Xiao Liu

Categories: Machine Learning (Statistics)

2022-06-23

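Increasingly frequent wildfires significantly affect solar energy production, because the atmospheric aerosols generated by wildfires reduce the solar radiation reaching the earth. Atmospheric aerosols are measured by Aerosol Optical Depth (AOD), and AOD data streams can be retrieved and monitored by geostationary satellites. However, multi-source remote-sensing data streams often have heterogeneous characteristics, including different data missing rates, measurement errors, and systematic biases. To accurately estimate and predict the underlying AOD propagation process, there is both a practical need and a theoretical interest in a physics-informed statistical approach that models it by simultaneously utilizing, or fusing, heterogeneous multi-source satellite remote-sensing data streams. The proposed approach uses a spectral method to integrate the multi-source satellite data streams with the fundamental advection-diffusion equation that governs the AOD propagation process. A bias correction process is included in the statistical model to account for the bias of the physical model and the truncation error of the Fourier series. The proposed approach is applied to California wildfire AOD data streams obtained from the National Oceanic and Atmospheric Administration. Comprehensive numerical examples are provided to demonstrate the predictive capability and model interpretability of the proposed approach. Computer code has been made available on GitHub.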
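For orientation, the two ingredients named in the abstract above can be written in a generic textbook form. This is a sketch only: the symbols (field \xi, velocity \mathbf{v}, diffusivity D, source Q, coefficients \alpha_{\mathbf{k}}, retained wavenumber set \mathcal{K}) are assumptions chosen for illustration and need not match the notation or the exact model of the paper.

```latex
% Generic advection-diffusion equation for a scalar field \xi(\mathbf{s}, t)
\frac{\partial \xi(\mathbf{s}, t)}{\partial t}
  = -\,\mathbf{v}(\mathbf{s}, t)^{\top} \nabla \xi(\mathbf{s}, t)
    + \nabla \cdot \bigl[ D(\mathbf{s}, t)\, \nabla \xi(\mathbf{s}, t) \bigr]
    + Q(\mathbf{s}, t)

% Truncated Fourier (spectral) representation over a finite set of wavenumbers \mathcal{K}
\xi(\mathbf{s}, t) \approx \sum_{\mathbf{k} \in \mathcal{K}}
  \alpha_{\mathbf{k}}(t)\, e^{\, i\, \mathbf{k}^{\top} \mathbf{s}}
```

Restricting the sum to a finite set \mathcal{K} is what introduces the Fourier truncation error that the abstract says the bias correction process accounts for.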

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2022-06-09

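Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are still poorly characterized. To inform future research, prepare for disruptive new model capabilities, and mitigate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing on linguistics, childhood development, mathematics, common-sense reasoning, biology, physics, social bias, software development, and more. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.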

HairCLIP: Design Your Hair by Text and Reference Image

Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Zhentao Tan, Lu Yuan, Weiming Zhang, Nenghai Yu

Categories: Computer Vision

2021-12-09

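Hair editing is an interesting and challenging problem in computer vision and graphics. Many existing methods require sketches or masks as conditional inputs for editing, but these interactions are neither straightforward nor efficient. To free users from the tedious interaction process, this paper proposes a new hair editing interaction mode that can manipulate hair attributes individually or jointly based on text or reference images provided by the user. To this end, we encode the image and text conditions in a shared embedding space and propose a unified hair editing framework by leveraging the powerful image-text representation capability of the Contrastive Language-Image Pre-Training (CLIP) model. With carefully designed network structures and loss functions, our framework can perform high-quality hair editing in a disentangled manner. Extensive experiments demonstrate the superiority of our approach in terms of manipulation accuracy, visual realism of the editing results, and preservation of irrelevant attributes. The project repo is https://github.com/wty-ustc/hairclip.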
