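Neural architecture search methods seek optimal candidates through efficient weight-sharing supernet training. However, recent studies show poor ranking consistency between the performance of stand-alone architectures and that of shared-weight networks. In this paper, we propose Prior-Guided One-shot NAS (PGONA) to strengthen the ranking correlation of supernets. Specifically, we first explore the effect of activation functions and propose a balanced sampling strategy based on the sandwich rule to alleviate weight coupling in the supernet. Then, FLOPs and Zen-Score are adopted to guide supernet training with a ranking correlation loss. Our PGONA ranked third in the supernet track of the CVPR 2022 Second Lightweight NAS Challenge. Code is available at https://github.com/pprp/cvpr2022-nas?competition-track1-3th-solution.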
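Ranking consistency of this kind is commonly quantified with a rank correlation coefficient such as Kendall's tau. The sketch below is a minimal illustration, not the authors' evaluation code: the function name `kendall_tau` and the accuracy values are hypothetical, and the variant shown (tau-a) applies no tie correction.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equally long score lists (no tie correction)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant when both scores order the two items the same way.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical accuracies for four architectures (illustrative numbers only).
supernet_acc = [0.61, 0.58, 0.66, 0.70]    # evaluated with shared supernet weights
standalone_acc = [0.72, 0.74, 0.75, 0.79]  # each architecture trained from scratch
tau = kendall_tau(supernet_acc, standalone_acc)
```

A value of 1 means the supernet ranks the architectures exactly as stand-alone training does; values near 0 mean the supernet ranking carries little information.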
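In the past decades, convolutional neural networks (CNNs) have achieved impressive success in computer vision. The image convolution operation helps CNNs perform well on image-related tasks. However, image convolution has high computational complexity and is hard to implement. This paper proposes CEMNet, which can be trained in the frequency domain. The main motivation of this research is that, based on the cross-correlation theorem, image convolution can be replaced by a straightforward element-wise multiplication in the frequency domain, which clearly reduces the computational complexity. We further introduce a weight fixation mechanism to alleviate over-fitting, and analyze the behavior of batch normalization, Leaky ReLU, and dropout in the frequency domain in order to design their counterparts for CEMNet. In addition, to handle the complex-valued inputs produced by the discrete Fourier transform, we design a two-branch network structure for CEMNet. Experimental results show that CEMNet achieves good performance on the MNIST and CIFAR-10 datasets.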
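The cross-correlation theorem behind this replacement can be checked numerically. The sketch below is a minimal NumPy illustration, not the network described above; it assumes circular (wrap-around) boundary handling and a real-valued kernel, and the function names are hypothetical.

```python
import numpy as np


def cross_correlation_direct(image, kernel):
    """Circular cross-correlation: out[i, j] = sum_{m,n} kernel[m, n] * image[i+m, j+n]."""
    out = np.zeros(image.shape)
    for m in range(kernel.shape[0]):
        for n in range(kernel.shape[1]):
            # A negative roll moves image[i+m, j+n] to position [i, j], wrapping around.
            out += kernel[m, n] * np.roll(image, shift=(-m, -n), axis=(0, 1))
    return out


def cross_correlation_fft(image, kernel):
    """Same result via one element-wise product of spectra."""
    padded = np.zeros(image.shape)
    padded[:kernel.shape[0], :kernel.shape[1]] = kernel  # zero-pad kernel to image size
    spectrum = np.conj(np.fft.fft2(padded)) * np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum))


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
direct = cross_correlation_direct(image, kernel)
via_fft = cross_correlation_fft(image, kernel)
```

The direct version costs one multiply-add per kernel element per output pixel, while the frequency-domain version needs a single element-wise product once the transforms are available.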
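In this paper, we propose MENAS, an efficient multi-trial evolution-based NAS method that requires less human intervention. Specifically, we propose an enlarged search space (MobileNet3-MT) for ImageNet-1K and improve search efficiency in two respects. First, MENAS jointly explores architectures and optimal pruned candidates (lottery tickets), gradually slimming the average model in the population. Each model is trained and replaced by its lottery ticket, instead of first searching for a cumbersome network and then pruning it. Second, we introduce individual weight sharing, which is dedicated to multi-trial NAS and aims to amortize training cost by sharing weights between parent and child networks. Compared with weight sharing in a supernet, individual weight sharing yields more reliable rank consistency and is easy to implement because it avoids sophisticated supernet training. Moreover, to keep the evolutionary process from being trapped in small models, we preserve a small proportion of the largest models when forming the parent population, which proves beneficial to model performance. Extensive experimental results demonstrate the superiority of MENAS. On the ImageNet-1K dataset, MENAS achieves 80.5% top-1 accuracy without knowledge distillation or a larger image resolution. Code and models will be made available.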
Geometry problem solving is a well-recognized testbed for evaluating the high-level multi-modal reasoning capability of deep models. In most existing works, the two main geometry problems, calculation and proving, are treated as two separate tasks, which hinders a deep model from unifying its reasoning capability across multiple math tasks. In essence, however, these two tasks have similar problem representations and overlapping math knowledge, which can improve the understanding and reasoning ability of a deep model on both tasks. Therefore, we construct a large-scale Unified Geometry problem benchmark, UniGeo, which contains 4,998 calculation problems and 9,543 proving problems. Each proving problem is annotated with a multi-step proof with reasons and mathematical expressions. The proof can be easily reformulated as a proving sequence that shares the same format as the annotated program sequence for calculation problems. We also present a unified multi-task Geometric Transformer framework, Geoformer, which tackles calculation and proving problems simultaneously in the form of sequence generation and shows that a unified formulation improves reasoning ability on both tasks. Furthermore, we propose a Mathematical Expression Pretraining (MEP) method that aims to predict the mathematical expressions in the problem solution, thus improving the Geoformer model. Experiments on UniGeo demonstrate that our proposed Geoformer obtains state-of-the-art performance, outperforming the task-specific model NGS by over 5.6% and 3.2% in accuracy on calculation and proving problems, respectively.
Effectively exploring the colors of reference exemplars and propagating them to each frame is vital for exemplar-based video colorization. In this paper, we present BiSTNet, which explores the colors of reference exemplars and uses them to colorize video through bidirectional temporal feature fusion guided by a semantic image prior. We first establish the semantic correspondence between each frame and the reference exemplars in deep feature space to extract color information from the exemplars. Then, to better propagate the exemplar colors into each frame and avoid inaccurately matched colors, we develop a simple yet effective bidirectional temporal feature fusion module. We note that color-bleeding artifacts usually appear around the boundaries of important objects in videos. To overcome this problem, we further develop a mixed expert block that extracts semantic information for modeling the object boundaries of frames, so that the semantic image prior can better guide the colorization process. In addition, we develop a multi-scale recurrent block to progressively colorize frames in a coarse-to-fine manner. Extensive experimental results demonstrate that the proposed BiSTNet performs favorably against state-of-the-art methods on the benchmark datasets. Our code will be made available at https://yyang181.github.io/BiSTNet/
We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems. The goal is to consecutively move a sequence of 3D objects with arbitrary shapes into a designated container with only partial observations of the object sequence. We also take physical realizability into account, including the physical dynamics and constraints of a placement. The packing policy should understand the 3D geometry of the object to be packed and make effective decisions to accommodate it in the container in a physically realizable way. We propose a Reinforcement Learning (RL) pipeline to learn the policy. The complex irregular geometry and imperfect object placement together lead to a huge solution space, and direct training in such a space is prohibitively data intensive. We instead propose a theoretically provable method for candidate action generation to reduce the action space of RL and the learning burden. A parameterized policy is then learned to select the best placement from the candidates. Equipped with an efficient method of asynchronous RL acceleration and a data preparation process of simulation-ready training sequences, a mature packing policy can be trained in a physics-based environment within 48 hours. Through extensive evaluation on a variety of real-life shape datasets and comparisons with state-of-the-art baselines, we demonstrate that our method outperforms the best-performing baseline on all datasets by at least 12.8% in terms of packing utility.
A critical challenge in multi-agent reinforcement learning (MARL) is for multiple agents to efficiently accomplish complex, long-horizon tasks. The agents often have difficulty cooperating on common goals, dividing complex tasks, and planning through several stages to make progress. We propose to address these challenges by guiding agents with programs designed for parallelization, since programs as a representation contain rich structural and semantic information and are widely used as abstractions for long-horizon tasks. Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over 10+ stages. E-MAPP integrates the structural information from a parallel program, promotes cooperative behaviors grounded in program semantics, and improves time efficiency via a task allocator. We conduct extensive experiments on a series of challenging, long-horizon cooperative tasks in the Overcooked environment. Results show that E-MAPP outperforms strong baselines by a large margin in terms of completion rate, time efficiency, and zero-shot generalization ability.
Hyperbolic space is emerging as a promising space for representation learning, owing to its exponentially growing volume. Compared with flat Euclidean space, curved hyperbolic space is far more ambient and embeddable, particularly for datasets with implicit tree-like architectures, such as hierarchies and power-law distributions. On the other hand, the structure of a real-world network is usually intricate, with some regions being tree-like, some being flat, and others being circular. Directly embedding networks with heterogeneous structure into a homogeneous embedding space unavoidably introduces inductive biases and distortions. Encouragingly, discrete curvature describes the local structure of a node and its surroundings well, which motivates us to explicitly investigate the information conveyed by the network topology for improving geometric learning. To this end, we explore the properties of the local discrete curvature of graph topology and the continuous global curvature of the embedding space. We further propose a Hyperbolic Curvature-aware Graph Neural Network, HCGNN. In particular, HCGNN uses the discrete curvature to guide message passing from the surroundings while adaptively adjusting the continuous curvature. Extensive experiments on node classification and link prediction tasks show that the proposed method outperforms various competitive models by a large margin on graph data with both high and low hyperbolicity. Case studies further illustrate the efficacy of discrete curvature in finding local clusters and alleviating the distortion caused by hyperbolic geometry.
Recent advances in neural rendering imply a future in which visual data is widely distributed by sharing NeRF model weights. However, while common visual data (images and videos) have standard approaches for embedding ownership or copyright information explicitly or subtly, the problem remains unexplored for the emerging NeRF format. We present StegaNeRF, a method for steganographic information embedding in NeRF renderings. We design an optimization framework that allows accurate extraction of hidden information from images rendered by NeRF while preserving their original visual quality. We perform experimental evaluations of our method under several potential deployment scenarios, and we further discuss the insights discovered through our analysis. StegaNeRF signifies an initial exploration into the novel problem of instilling customizable, imperceptible, and recoverable information in NeRF renderings, with minimal impact on the rendered images. Project page: https://xggnet.github.io/StegaNeRF/.
This paper describes the submission of the RoyalFlush neural machine translation system to the WMT 2022 translation efficiency task. Unlike the commonly used autoregressive translation system, we adopted a two-stage translation paradigm called Hybrid Regression Translation (HRT) to combine the advantages of autoregressive and non-autoregressive translation. Specifically, HRT first autoregressively generates a discontinuous sequence (e.g., making a prediction every $k$ tokens, $k>1$) and then fills in all previously skipped tokens at once in a non-autoregressive manner. Thus, we can easily trade off translation quality and speed by adjusting $k$. In addition, by integrating other modeling techniques (e.g., sequence-level knowledge distillation and a deep-encoder-shallow-decoder layer allocation strategy) and substantial engineering effort, HRT improves inference speed by 80% and achieves translation performance equivalent to that of its same-capacity AT counterpart. Our fastest system reaches 6k+ words/second in the GPU latency setting, estimated to be about 3.1x faster than last year's winner.
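The effect of $k$ on the number of sequential decoding steps can be illustrated with simple index bookkeeping. The sketch below is not the RoyalFlush implementation: the function names are hypothetical, the target length is assumed to be known in advance, and the choice to keep every $k$-th position (0-based positions k-1, 2k-1, ...) in the autoregressive pass is an assumption made for illustration.

```python
def hybrid_decoding_schedule(target_length, k):
    """Split target positions into an autoregressive pass and a one-shot fill-in pass."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    # Assumed layout: the autoregressive pass predicts every k-th position.
    autoregressive = list(range(k - 1, target_length, k))
    kept = set(autoregressive)
    # All remaining positions are filled in together, non-autoregressively.
    fill_in = [p for p in range(target_length) if p not in kept]
    return autoregressive, fill_in


def sequential_steps(target_length, k):
    """Sequential decoder calls: one per autoregressive token plus one fill-in call."""
    autoregressive, fill_in = hybrid_decoding_schedule(target_length, k)
    return len(autoregressive) + (1 if fill_in else 0)


ar_positions, fill_positions = hybrid_decoding_schedule(10, 2)
steps_k2 = sequential_steps(10, 2)
steps_k5 = sequential_steps(10, 5)
```

Under these assumptions, a 10-token target needs 6 sequential steps with $k=2$ and 3 with $k=5$, compared with 10 for fully autoregressive decoding; larger $k$ leaves more tokens to the parallel fill-in pass, which is where the quality/speed trade-off comes from.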