Smart Paper Notes

Prediction and compression of lattice QCD data using machine learning algorithms on quantum annealer

Boram Yoon , Chia Cheng Chang , Garrett T. Kenyon , Nga T. T. Nguyen , Ermal Rrapaj

Category: Machine Learning

2021-12-03

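We present regression and compression algorithms for lattice QCD data that exploit the efficient binary optimization ability of quantum annealers. In the regression algorithm, we encode the correlation between the input and output variables into a sparse coding machine learning algorithm. The trained correlation pattern is used to predict lattice QCD observables on unseen lattice configurations from other observables measured on the lattice. In the compression algorithm, we define a mapping from lattice QCD data of floating-point numbers to the binary coefficients that reconstruct the input data from a set of basis vectors. Since the reconstruction is not exact, the mapping defines a lossy compression; however, a reasonably small number of binary coefficients is able to reconstruct the input vectors of lattice QCD data with a reconstruction error smaller than the statistical fluctuation. In both applications, we use D-Wave quantum annealers to solve the NP-hard binary optimization problems of the machine learning algorithms.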
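
The compression step described above can be illustrated with a minimal sketch. It assumes a small hand-picked basis (the paper learns its basis; that step is not shown) and uses a brute-force search as a stand-in for the D-Wave annealer. The function names `qubo_from_basis`, `solve_brute_force`, and `reconstruct` are hypothetical and chosen for illustration only.

```python
from itertools import product


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def qubo_from_basis(x, basis):
    """Build Q such that a^T Q a = ||x - sum_k a_k basis[k]||^2 - ||x||^2 for binary a."""
    k = len(basis)
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            q[i][j] = dot(basis[i], basis[j])
        # For binary variables a_i^2 == a_i, so the linear term folds into the diagonal.
        q[i][i] -= 2.0 * dot(basis[i], x)
    return q


def qubo_energy(q, a):
    """Evaluate the quadratic form a^T Q a."""
    n = len(a)
    return sum(q[i][j] * a[i] * a[j] for i in range(n) for j in range(n))


def solve_brute_force(q):
    """Exhaustive search over all binary vectors (stand-in for an annealer; small k only)."""
    return min(product((0, 1), repeat=len(q)), key=lambda a: qubo_energy(q, a))


def reconstruct(basis, a):
    """Lossy reconstruction: sum of the basis vectors selected by the binary coefficients."""
    dim = len(basis[0])
    return [sum(a[k] * basis[k][d] for k in range(len(basis))) for d in range(dim)]


# Illustrative data only: three basis vectors and one input vector.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 1.0]]
x = [1.4, 0.6, 1.1]
coeffs = solve_brute_force(qubo_from_basis(x, basis))
approx = reconstruct(basis, coeffs)
```

Here the input vector of three floating-point numbers is stored as three bits plus the shared basis, and the reconstruction differs from the input by a small residual, which is what makes the compression lossy.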


Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
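
For readers unfamiliar with the two practices the survey found underused, the following is a minimal sketch of K-fold splitting and of ensembling by averaging. It is generic and not taken from any surveyed submission; the function names `k_fold_indices` and `ensemble_mean` are hypothetical, and the split is contiguous without shuffling to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds and return (train, val) index lists."""
    # The first (n_samples % k) folds get one extra sample so that sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


def ensemble_mean(predictions):
    """Average per-sample scores across models (one list of scores per model)."""
    n_models = len(predictions)
    return [sum(scores) / n_models for scores in zip(*predictions)]


splits = k_fold_indices(10, 3)
averaged = ensemble_mean([[0.0, 1.0], [1.0, 0.5]])
```

Each sample appears in exactly one validation fold, so every training sample contributes to a held-out estimate; the models trained on the different folds can then be combined with `ensemble_mean`.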


MAGVIT: Masked Generative Video Transformer

Lijun Yu , Yong Cheng , Kihyuk Sohn , José Lezama , Han Zhang , Huiwen Chang , Alexander G. Hauptmann , Ming-Hsuan Yang , Yuan Hao , Irfan Essa

Category: Computer Vision

2022-12-10

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video into spatial-temporal visual tokens and propose an embedding method for masked video token modeling to facilitate multi-task learning. We conduct extensive experiments to demonstrate the quality, efficiency, and flexibility of MAGVIT. Our experiments show that (i) MAGVIT performs favorably against state-of-the-art approaches and establishes the best-published FVD on three video generation benchmarks, including the challenging Kinetics-600. (ii) MAGVIT outperforms existing methods in inference time by two orders of magnitude against diffusion models and by 60x against autoregressive models. (iii) A single MAGVIT model supports ten diverse generation tasks and generalizes across videos from different visual domains. The source code and trained models will be released to the public at https://magvit.cs.cmu.edu.


Perceive, Interact, Predict: Learning Dynamic and Static Clues for End-to-End Motion Prediction

Bo Jiang , Shaoyu Chen , Xinggang Wang , Bencheng Liao , Tianheng Cheng , Jiajie Chen , Helong Zhou , Qian Zhang , Wenyu Liu , Chang Huang

Categories: Computer Vision | Robotics

2022-12-05

Motion prediction is highly relevant to the perception of dynamic objects and static map elements in the scenarios of autonomous driving. In this work, we propose PIP, the first end-to-end Transformer-based framework which jointly and interactively performs online mapping, object detection and motion prediction. PIP leverages map queries, agent queries and mode queries to encode the instance-wise information of map elements, agents and motion intentions, respectively. Based on the unified query representation, a differentiable multi-task interaction scheme is proposed to exploit the correlation between perception and prediction. Even without human-annotated HD map or agent's historical tracking trajectory as guidance information, PIP realizes end-to-end multi-agent motion prediction and achieves better performance than tracking-based and HD-map-based methods. PIP provides comprehensive high-level information of the driving scene (vectorized static map and dynamic objects with motion information), and contributes to the downstream planning and control. Code and models will be released for facilitating further research.


Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation

Zesen Cheng , Pengchong Qiao , Kehan Li , Siheng Li , Pengxu Wei , Xiangyang Ji , Li Yuan , Chang Liu , Jie Chen

Category: Computer Vision

2022-11-22

Weakly supervised semantic segmentation is typically inspired by class activation maps, which serve as pseudo masks with class-discriminative regions highlighted. Although tremendous efforts have been made to recall precise and complete locations for each class, existing methods still commonly suffer from unsolicited Out-of-Candidate (OC) error predictions that do not belong to the label candidates. Such errors are avoidable, since the contradiction with image-level class tags is easy to detect. In this paper, we develop a group ranking-based Out-of-Candidate Rectification (OCR) mechanism in a plug-and-play fashion. First, we adaptively split the semantic categories into In-Candidate (IC) and OC groups for each OC pixel according to their prior annotation correlation and posterior prediction correlation. Then, we derive a differentiable rectification loss to force OC pixels to shift to the IC group. Incorporating our OCR with seminal baselines (e.g., AffinityNet, SEAM, MCTformer), we achieve remarkable performance gains on both Pascal VOC (+3.2%, +3.3%, +0.8% mIoU) and MS COCO (+1.0%, +1.3%, +0.5% mIoU) datasets with negligible extra training overhead, which justifies the effectiveness and generality of our OCR.
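
The detection step that the abstract calls easy can be sketched as follows: a pixel is Out-of-Candidate when its predicted class is not among the image-level labels. This sketch covers only that check, not the group ranking or the rectification loss. The function name `find_oc_pixels` is hypothetical, and treating class 0 as background that the caller adds to the candidates is an assumption of this example.

```python
def find_oc_pixels(pred_mask, candidate_labels):
    """Return (row, col) positions whose predicted class is outside the image-level label set."""
    candidates = set(candidate_labels)
    return [
        (r, c)
        for r, row in enumerate(pred_mask)
        for c, cls in enumerate(row)
        if cls not in candidates
    ]


# Illustrative 2x3 prediction: the image is tagged with class 1 only;
# class 0 is assumed to be background and is added by the caller.
pred_mask = [
    [0, 1, 1],
    [0, 7, 1],
]
oc_pixels = find_oc_pixels(pred_mask, candidate_labels=[0, 1])
```

The single pixel predicted as class 7 contradicts the image-level tags, so it is the only one a rectification loss would need to push back toward the candidate classes.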


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.


Towards Real World HDRTV Reconstruction: A Data Synthesis-based Approach

Zhen Cheng , Tao Wang , Yong Li , Fenglong Song , Chang Chen , Zhiwei Xiong

Category: Computer Vision

2022-11-06

Existing deep learning-based HDRTV reconstruction methods assume a single kind of tone mapping operator (TMO) as the degradation procedure to synthesize SDRTV-HDRTV pairs for supervised training. In this paper, we argue that, although traditional TMOs exploit efficient dynamic range compression priors, they have several drawbacks in modeling the realistic degradation: information over-preservation, color bias and possible artifacts, which make it hard for the trained reconstruction networks to generalize to real-world cases. To solve this problem, we propose a learning-based data synthesis approach to learn the properties of real-world SDRTVs by integrating several tone mapping priors into both network structures and loss functions. Specifically, we design a conditioned two-stream network with prior tone mapping results as guidance to synthesize SDRTVs by both global and local transformations. To train the data synthesis network, we form a novel self-supervised content loss to constrain different aspects of the synthesized SDRTVs at regions with different brightness distributions, and an adversarial loss to make the details more realistic. To validate the effectiveness of our approach, we synthesize SDRTV-HDRTV pairs with our method and use them to train several HDRTV reconstruction networks. We then collect two inference datasets containing labeled and unlabeled real-world SDRTVs, respectively. Experimental results demonstrate that the networks trained with our synthesized data generalize significantly better to these two real-world datasets than existing solutions.


Distilling Representations from GAN Generator via Squeeze and Span

Yu Yang , Xiaotian Cheng , Chang Liu , Hakan Bilen , Xiangyang Ji

Category: Computer Vision

2022-11-06

In recent years, generative adversarial networks (GANs) have been an actively studied topic and shown to successfully produce high-quality realistic images in various domains. The controllable synthesis ability of GAN generators suggests that they maintain informative, disentangled, and explainable image representations, but leveraging and transferring their representations to downstream tasks is largely unexplored. In this paper, we propose to distill knowledge from GAN generators by squeezing and spanning their representations. We squeeze the generator features into representations that are invariant to semantic-preserving transformations through a network before they are distilled into the student network. We span the distilled representation of the synthetic domain to the real domain by also using real training data to remedy the mode collapse of GANs and boost the student network performance in a real domain. Experiments justify the efficacy of our method and reveal its great significance in self-supervised representation learning. Code is available at https://github.com/yangyu12/squeeze-and-span.


MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

Bencheng Liao , Shaoyu Chen , Xinggang Wang , Tianheng Cheng , Qian Zhang , Wenyu Liu , Chang Huang

Categories: Computer Vision | Robotics

2022-08-30

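We present MapTR, a structured end-to-end framework for efficient online vectorized HD map construction. We propose a unified permutation-based modeling approach, i.e., modeling a map element as a point set with a group of equivalent permutations, which avoids the definition ambiguity of map elements and eases learning. We adopt a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical matching for map element learning. MapTR achieves the best performance and efficiency among existing vectorized map construction approaches on the nuScenes dataset. In particular, MapTR-nano runs at real-time inference speed (25.1 FPS) on an RTX 3090, 8× faster than the existing state-of-the-art camera-based method, while achieving 3.3 higher mAP. MapTR-tiny significantly outperforms the existing state-of-the-art multi-modality method by 13.5 mAP while being faster. Qualitative results show that MapTR maintains stable and robust map construction quality in complex and varied driving scenes. Abundant demos are available at https://github.com/hustvl/maptr to demonstrate its effectiveness in real-world scenarios. MapTR is of great application value in autonomous driving. Code will be released to facilitate further research and application.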
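
The idea of a point set with a group of equivalent permutations can be sketched as follows. The abstract does not spell out the permutation groups, so this example assumes that an open polyline is equivalent under reversal and that a closed polygon is equivalent under any cyclic shift in either direction. The function names and the point-wise L1 cost are hypothetical choices for illustration.

```python
def equivalent_permutations(points, closed):
    """Enumerate orderings of a point set that describe the same geometry."""
    pts = list(points)
    if not closed:
        # Open polyline: the two traversal directions are equivalent.
        return [pts, pts[::-1]]
    perms = []
    for shift in range(len(pts)):
        # Closed polygon: any start point, in either direction.
        rotated = pts[shift:] + pts[:shift]
        perms.append(rotated)
        perms.append(rotated[::-1])
    return perms


def pointwise_l1(a, b):
    """Sum of L1 distances between points paired by position."""
    return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))


def min_matching_distance(pred, target, closed):
    """Distance from a prediction to the closest equivalent ordering of the target."""
    return min(pointwise_l1(pred, perm) for perm in equivalent_permutations(target, closed))


line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted_square = square[2:] + square[:2]
```

Under this cost, a prediction that traces the same lane line in the opposite direction, or the same polygon from a different start point, is not penalized, which is the ambiguity the permutation-based modeling is meant to remove.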



Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

Haoxuan You , Luowei Zhou , Bin Xiao , Noel Codella , Yu Cheng , Ruochen Xu , Shih-Fu Chang , Lu Yuan

Categories: Computer Vision | Natural Language Processing

2022-07-26

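Large-scale multi-modal contrastive pre-training has proven useful for learning transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space. Typically, this has employed a separate encoder for each modality. However, recent work suggests that transformers can support learning across multiple modalities and allow knowledge sharing. Inspired by this, we investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we ask how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that position the proportion of shared parameters along a spectrum. Under the studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variants that separate more parameters. Additionally, we find that modality-specific parallel modules further improve performance. Experimental results show that the proposed MS-CLIP approach outperforms vanilla CLIP by up to 13% relative in zero-shot classification (pre-trained on YFCC-100M), while also supporting a reduction in parameters. In addition, our approach outperforms vanilla CLIP in linear probing on a collection of 24 downstream vision tasks. Furthermore, we find that sharing parameters leads to semantic concepts from different modalities being encoded more closely in the embedding space, facilitating the transfer of common semantic structure (e.g., attention patterns) from language to vision. Code is available at https://github.com/hxyou/msclip.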
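
As background for the contrastive pre-training mentioned above, here is a minimal sketch of a symmetric contrastive objective over paired image and text embeddings, in the style commonly associated with CLIP. The abstract does not state the loss, so the formulation, the temperature value, and the function names are assumptions; the parameter sharing that MS-CLIP studies happens inside the encoders and is not shown.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy; pair i matches pair i."""
    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    # Cosine similarity between every image and every text, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    n = len(logits)
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    image_to_text = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    text_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)


# Illustrative embeddings: aligned pairs versus deliberately swapped pairs.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each image is closest to its own caption and large when the pairs are swapped, which is the signal that pulls both modalities into one embedding space.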
