Learning efficient and interpretable policies has been a challenging task in reinforcement learning (RL), particularly in the visual RL setting with complex scenes. While neural networks have achieved competitive performance, the resulting policies are often over-parameterized black boxes that are difficult to interpret and deploy efficiently. More recent symbolic RL frameworks have shown that high-level domain-specific programming logic can be designed to handle both policy learning and symbolic planning. However, these approaches rely on coded primitives with little feature learning, and when applied to high-dimensional visual scenes, they can suffer from scalability issues and perform poorly when images have complex object interactions. To address these challenges, we propose \textit{Differentiable Symbolic Expression Search} (DiffSES), a novel symbolic learning approach that discovers discrete symbolic policies using partially differentiable optimization. By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions, while also incorporating the strengths of neural networks for feature learning and optimization. Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods, with a reduced amount of symbolic prior knowledge.
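The idea of optimizing a discrete symbolic expression through a differentiable relaxation can be sketched on a toy object-level predicate. This is only an illustration under invented assumptions: `ball_x`/`paddle_x` are hypothetical object features for a Pong-like game, and the sigmoid gate stands in for whatever relaxation DiffSES actually uses.

```python
import math

def soft_gate(x, temperature=0.1):
    # differentiable relaxation of the discrete predicate "x > 0"
    return 1.0 / (1.0 + math.exp(-x / temperature))

def soft_policy(ball_x, paddle_x, temperature=0.1):
    # continuous action score over object-level features; parameters of
    # the expression can be tuned by gradient descent in this form
    return soft_gate(ball_x - paddle_x, temperature)

def symbolic_policy(ball_x, paddle_x):
    # the discrete symbolic expression recovered after optimization
    return "RIGHT" if ball_x > paddle_x else "LEFT"
```

As the temperature shrinks, the soft policy approaches the discrete expression, which is the readable artifact one would keep.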
Recent advances in neural rendering imply a future of widespread visual data distribution through sharing NeRF model weights. However, while common visual data (images and videos) have standard approaches to embed ownership or copyright information explicitly or subtly, the problem remains unexplored for the emerging NeRF format. We present StegaNeRF, a method for steganographic information embedding in NeRF renderings. We design an optimization framework allowing accurate hidden information extraction from images rendered by NeRF, while preserving its original visual quality. We perform experimental evaluations of our method under several potential deployment scenarios, and we further discuss the insights discovered through our analysis. StegaNeRF signifies an initial exploration into the novel problem of instilling customizable, imperceptible, and recoverable information into NeRF renderings, with minimal impact on rendered images. Project page: https://xggnet.github.io/StegaNeRF/.
Virtual reality and augmented reality (XR) bring increasing demand for 3D content. However, creating high-quality 3D content requires tedious work that a human expert must do. In this work, we study the challenging task of lifting a single image to a 3D object and, for the first time, demonstrate the ability to generate a plausible 3D object with 360{\deg} views that correspond well with the given reference image. By conditioning on the reference image, our model can fulfill the everlasting curiosity for synthesizing novel views of objects from images. Our technique sheds light on a promising direction of easing the workflows for 3D artists and XR designers. We propose a novel framework, dubbed NeuralLift-360, that utilizes a depth-aware neural radiance representation (NeRF) and learns to craft the scene guided by denoising diffusion models. By introducing a ranking loss, our NeuralLift-360 can be guided with rough depth estimation in the wild. We also adopt a CLIP-guided sampling strategy for the diffusion prior to provide coherent guidance. Extensive experiments demonstrate that our NeuralLift-360 significantly outperforms existing state-of-the-art baselines. Project page: https://vita-group.github.io/NeuralLift-360/
Implicit Neural Representations (INRs), which encode continuous multi-media data via multi-layer perceptrons, have shown undeniable promise in various computer vision tasks. Despite many successful applications, editing and processing an INR remains intractable, as signals are represented by latent parameters of a neural network. Existing works manipulate such continuous representations via processing on their discretized instances, which breaks down the compactness and continuous nature of INRs. In this work, we present a pilot study on the question: how to directly modify an INR without explicit decoding? We answer this question by proposing an implicit neural signal processing network, dubbed INSP-Net, via differential operators on INRs. Our key insight is that spatial gradients of neural networks can be computed analytically and are invariant to translation, while mathematically we show that any continuous convolution filter can be uniformly approximated by a linear combination of high-order differential operators. With these two knobs, INSP-Net instantiates the signal processing operator as a weighted composition of computational graphs corresponding to the high-order derivatives of INRs, where the weighting parameters can be learned from data. Based on our proposed INSP-Net, we further build the first Convolutional Neural Network (CNN) that implicitly runs on INRs, named INSP-ConvNet. Our experiments validate the expressiveness of INSP-Net and INSP-ConvNet in fitting low-level image and geometry processing kernels (e.g. blurring, deblurring, denoising, inpainting, and smoothing) as well as for high-level tasks on implicit fields such as image classification.
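The core mechanism described above (analytic derivatives of a continuous network, combined linearly to realize a filter) can be illustrated on a toy one-dimensional INR. This is a hedged sketch, not the paper's code: the sinusoidal network with hand-derived closed-form derivatives and the heat-equation smoothing step are illustrative stand-ins (INSP-Net operates on real INRs via automatic differentiation).

```python
import math, random

random.seed(0)
# a tiny SIREN-style INR: f(x) = sum_i a_i * sin(w_i * x + b_i)
w = [random.gauss(0, 1) for _ in range(8)]
b = [random.gauss(0, 1) for _ in range(8)]
a = [random.gauss(0, 1) for _ in range(8)]

def inr(x):
    return sum(ai * math.sin(wi * x + bi) for ai, wi, bi in zip(a, w, b))

def inr_deriv(x, order):
    # the n-th derivative of sin(u) is sin(u + n*pi/2); the chain rule
    # contributes a factor w_i**n, so every derivative is closed-form
    return sum(ai * wi ** order * math.sin(wi * x + bi + order * math.pi / 2)
               for ai, wi, bi in zip(a, w, b))

def heat_smooth(x, t=0.01):
    # one filter expressed as a weighted sum of derivatives: an explicit
    # heat-equation smoothing step, f + t * f''
    return inr(x) + t * inr_deriv(x, 2)
```

In INSP-Net the fixed weights of such a combination are replaced by learned parameters, one per derivative order.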
Neural volumetric representations have shown that MLP networks can be trained with multi-view calibrated images to represent scene geometry and appearance, without explicit 3D supervision. Object segmentation can enrich many downstream applications based on the learned radiance field. However, introducing hand-crafted segmentation to define regions of interest in complex real-world scenes is non-trivial and expensive, as it requires per-view annotation. This paper explores self-supervised learning of object segmentation using NeRF for complex real-world scenes. Our framework, NeRF-SOS, couples object segmentation and neural radiance fields to segment objects in any view within a scene. By proposing a novel collaborative contrastive loss at both the appearance and geometry levels, NeRF-SOS encourages NeRF models to distill compact geometry-aware segmentation clusters from their density fields and the self-supervised pre-trained 2D visual features. The self-supervised object segmentation framework can be applied to various NeRF models, yielding both photo-realistic rendering results and convincing segmentations for indoor and outdoor scenes. Extensive results on the LLFF and Tanks & Temples datasets validate the effectiveness of NeRF-SOS. It consistently surpasses other image-based self-supervised baselines and even captures finer details than supervised Semantic-NeRF.
Vision Transformers (ViTs) have proven effective for 2D image understanding tasks when trained on large-scale image datasets; meanwhile, as a separate track, ViTs have been explored for modeling the 3D visual world, e.g., voxels or point clouds. However, with the growing hope that Transformers can become the "universal" modeling tool for heterogeneous data, ViTs for 2D and 3D tasks have so far adopted vastly different architecture designs that are hardly transferable. This invites an ambitious question: can we close the gap between the 2D and 3D ViT architectures? As a piloting study, this paper demonstrates the appealing promise of understanding the 3D visual world using a standard 2D ViT architecture, with only minimal customization at the input and output levels, without redesigning the pipeline. To build a 3D ViT from its 2D sibling, we "inflate" the patch embedding and token sequence, accompanied by new positional encoding mechanisms designed to match the geometry of 3D data. Compared with highly customized 3D-specific designs, the resulting "minimalist" 3D ViT, named Simple3D-Former, performs surprisingly strongly on popular 3D tasks such as object classification, point cloud segmentation, and indoor scene detection. It can hence serve as a strong baseline for new 3D ViTs. Moreover, we note that pursuing a unified 2D-3D ViT design has practical relevance beyond scientific curiosity. Specifically, we demonstrate that Simple3D-Former naturally enables exploiting the wealth of pre-trained weights from large-scale realistic 2D images (e.g., ImageNet), which can be plugged in to boost 3D task performance "for free".
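The "inflation" of a 2D patch embedding into 3D can be sketched with one common recipe: replicate the 2D kernel along a new depth axis and rescale so that a voxel patch that is constant along depth produces the same activation as the original 2D patch. This is a generic inflation scheme offered for illustration; the paper's exact mechanism may differ.

```python
def inflate_2d_kernel(kernel_2d):
    # replicate a (P x P) patch-embedding kernel along a new depth axis
    # and rescale by 1/P, preserving activations on depth-constant input
    p = len(kernel_2d)
    return [[[w / p for w in row] for row in kernel_2d] for _ in range(p)]

def embed_2d(patch, kernel):
    p = len(kernel)
    return sum(kernel[i][j] * patch[i][j] for i in range(p) for j in range(p))

def embed_3d(voxel, kernel_3d):
    p = len(kernel_3d)
    return sum(kernel_3d[d][i][j] * voxel[d][i][j]
               for d in range(p) for i in range(p) for j in range(p))
```

This activation-preserving property is what lets 2D pre-trained weights (e.g., from ImageNet) be reused as an initialization for the 3D model.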
Compared to discrete grid-based representations, representing visual signals with coordinate-based deep fully-connected networks has advantages in fitting complex details and solving inverse problems. However, acquiring such a continuous Implicit Neural Representation (INR) requires tedious per-scene training on the signal measurements, which limits its practicality. In this paper, we present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID) from a data collection and representing an INR as a functional combination of basis functions sampled from the dictionary. Our NID assembles a group of coordinate-based subnetworks which are tuned to span the desired function space. After training, one can instantly and robustly acquire an unseen scene representation by solving for the coding coefficients. To parallelize a large group of networks, we borrow the idea from Mixture-of-Experts (MoE) to design and train our network with a sparse gating mechanism. Our experiments show that NID can improve the reconstruction of 2D images or 3D scenes by 2 orders of magnitude with 98% less input data. We further demonstrate various applications of NID in image inpainting and occlusion removal, which are considered challenging with vanilla INRs. Our code is available at https://github.com/vita-group/neural-implitic-dict.
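The dictionary-plus-sparse-gating idea can be sketched with plain sinusoids standing in for the coordinate-based sub-networks. Everything here is an illustrative assumption: `basis`, `top_k_gate`, and `represent` are invented names, and in the actual framework both the dictionary atoms and the gating are learned.

```python
import math, random

random.seed(1)
B = 16  # dictionary size
freqs = [random.uniform(1.0, 10.0) for _ in range(B)]
phases = [random.uniform(0.0, 2.0 * math.pi) for _ in range(B)]

def basis(i, x):
    # stand-in for the i-th coordinate-based sub-network in the dictionary
    return math.sin(freqs[i] * x + phases[i])

def top_k_gate(coeffs, k):
    # MoE-style sparse gating: keep only the k largest coefficients
    keep = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))[:k]
    return {i: coeffs[i] for i in keep}

def represent(x, coeffs, k=4):
    # an unseen signal = sparse functional combination of dictionary atoms
    return sum(c * basis(i, x) for i, c in top_k_gate(coeffs, k).items())
```

With the dictionary frozen, fitting a new scene reduces to solving for `coeffs`, which is what makes acquisition of unseen representations fast.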
Neural Radiance Fields (NeRF) regress a neural-parameterized scene by differentiably rendering multi-view images with ground-truth supervision. However, when interpolating novel views, NeRF often yields inconsistent and visually non-smooth geometric results, which we attribute to a generalization gap between seen and unseen views. Recent advances in convolutional neural networks have demonstrated the promise of aggressive data augmentations, either random or learned, in enhancing both in-distribution and out-of-distribution generalization. Inspired by this, we propose Augmented NeRF (Aug-NeRF), which for the first time brings the power of robust data augmentations into regularizing NeRF training. In particular, our proposal learns to seamlessly blend worst-case perturbations into three distinct levels of the NeRF pipeline: (1) the input coordinates, to simulate imprecise camera parameters at image capture; (2) intermediate features, to smoothen the intrinsic feature manifold; and (3) pre-rendering outputs, to account for potential degradation factors in the multi-view image supervision. Extensive results demonstrate that Aug-NeRF effectively boosts NeRF performance in both novel view synthesis (up to 1.5 dB PSNR gain) and the underlying geometry reconstruction. Furthermore, thanks to the implicit smoothness priors injected by the three-level augmentations, Aug-NeRF can even recover scenes from heavily corrupted images, a highly challenging setting untackled before. Our code is available at https://github.com/vita-group/aug-nerf.
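Level (1), the worst-case perturbation of input coordinates, can be sketched with a one-dimensional stand-in for the rendering function. The signed finite-difference ascent below is an illustrative assumption, not the paper's procedure (Aug-NeRF computes perturbations with gradients inside the NeRF training loop).

```python
import math

def render(coord):
    # 1-D stand-in for NeRF's rendering at an input coordinate
    return math.sin(3.0 * coord)

def loss(coord, target):
    return (render(coord) - target) ** 2

def worst_case_perturbation(coord, target, eps=0.05, steps=5, lr=0.02):
    # signed-gradient ascent on the loss, with the perturbation kept
    # inside [-eps, eps]: an input-coordinate adversarial augmentation
    delta, h = 0.0, 1e-5
    for _ in range(steps):
        g = (loss(coord + delta + h, target) -
             loss(coord + delta - h, target)) / (2.0 * h)
        delta += lr if g > 0 else -lr
        delta = max(-eps, min(eps, delta))
    return delta
```

Training against such bounded worst-case offsets mimics imprecise camera parameters, which is the stated motivation for perturbing the coordinates.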
Estimating the travel time of a path is an essential topic for intelligent transportation systems. It serves as the foundation for real-world applications such as traffic monitoring, route planning, and taxi dispatching. However, building a model for such a data-driven task requires a large amount of users' travel information, which directly relates to their privacy and is thus unlikely to be shared. The non-independent and identically distributed (non-IID) trajectory data across data owners also makes a predictive model extremely challenging if we directly apply federated learning. Finally, previous work on travel time estimation does not consider the real-time traffic state of roads, which we argue can significantly influence the prediction. To address the above challenges, we introduce GOF-TTE, a generative online federated learning framework for travel time estimation for mobile user groups, which (i) uses a federated learning approach, allowing private data to be kept on client devices while training, and designs the global model shared by all clients as an online generative model to infer the real-time road traffic state; and (ii) besides sharing a base model at the server, adapts a fine-tuned personalized model for every client to study their personal driving habits, compensating for the residual error of the localized global-model prediction. We also apply simple privacy attacks to our framework and implement a differential privacy mechanism to further guarantee privacy safety. Finally, we conduct experiments on two real-world public taxi datasets of DiDi Chengdu and Xi'an. The experimental results demonstrate the effectiveness of our proposed framework.
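The global-plus-personalized split in (ii) can be sketched as a shared predictor plus a per-client residual corrector. The linear speed model, the `traffic_state` input, and the bias-only personalization are all invented for illustration and do not reflect the paper's actual architecture.

```python
def global_model(distance_km, traffic_state):
    # shared server-side predictor; traffic_state in [0, 1], 1 = congested
    base_speed = 30.0 * (1.0 - 0.5 * traffic_state)  # km/h
    return distance_km / base_speed * 60.0           # minutes

class PersonalizedClient:
    """Per-client residual model on top of the shared global prediction."""

    def __init__(self):
        self.bias = 0.0  # residual term fit to this driver's habits

    def predict(self, distance_km, traffic_state):
        return global_model(distance_km, traffic_state) + self.bias

    def local_update(self, distance_km, traffic_state, true_minutes, lr=0.1):
        # fit the residual on private trips; only the global model's
        # updates would be shared in the federated protocol
        err = true_minutes - self.predict(distance_km, traffic_state)
        self.bias += lr * err
```

The residual stays on the device, which is how personalization coexists with keeping private trip data local.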
Thanks to the rapid development of Internet of Things (IoT) technologies, many online web applications (e.g., Google Maps and Uber) estimate travel time from trajectory data collected by mobile devices. However, in practice, complex factors such as network communication and energy constraints make many trajectories collected at a low sampling rate. In this case, this paper aims to solve the problems of travel time estimation (TTE) and route recovery in sparse scenarios, which often yield uncertain labels for the travel time and the route between consecutively sampled GPS points. We formulate this problem as an inexact-supervision problem, in which the training data has coarse-grained labels, and jointly solve the tasks of TTE and route recovery. We argue that the two tasks are complementary to each other in the model-learning procedure and hold this relation: a more precise travel time estimate leads to better route inference, which in turn yields more accurate time estimation. Based on this assumption, we propose an EM algorithm that, for sparse trajectories, alternately estimates the travel time of inferred routes under weak supervision in the E-step, and retrieves routes according to the estimated travel times in the M-step. We conduct experiments on three real-world trajectory datasets and demonstrate the effectiveness of the proposed method.
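The alternation can be sketched on a single pair of sparsely sampled GPS points, where only the total travel time between them is observed (the coarse label) and the route and per-edge times are latent. The routes, edge names, and update rule below are invented for illustration; `time_step` plays the role of the E-step (estimate travel times on the inferred route) and `route_step` the M-step (retrieve the route from the current time estimates).

```python
# two hypothetical candidate routes between consecutive GPS samples
routes = {"A": ["e1", "e2"], "B": ["e3", "e4", "e5"]}
observed_total = 10.0  # the only label: total minutes between samples

# current per-edge travel-time estimates
edge_time = {"e1": 4.0, "e2": 7.0, "e3": 3.0, "e4": 3.0, "e5": 3.0}

def route_step():
    # retrieve the route whose predicted total best matches the label
    def mismatch(name):
        return abs(sum(edge_time[e] for e in routes[name]) - observed_total)
    return min(routes, key=mismatch)

def time_step(route, lr=0.5):
    # refine edge travel times on that route toward the observed total
    pred = sum(edge_time[e] for e in routes[route])
    err = (observed_total - pred) / len(routes[route])
    for e in routes[route]:
        edge_time[e] += lr * err

for _ in range(10):
    time_step(route_step())
```

Each pass makes the chosen route's predicted total agree better with the coarse label, which in turn sharpens the next route choice: the complementary relation the abstract describes.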