Although recent deep learning-based calibration methods can predict extrinsic and intrinsic camera parameters from a single image, their generalization remains limited by the number and distribution of training data samples. The huge computational and space requirement prevents convolutional neural networks (CNNs) from being implemented in resource-constrained environments. This challenge motivated us to learn a CNN gradually, by training new data while maintaining performance on previously learned data. Our approach builds upon a CNN architecture to automatically estimate camera parameters (focal length, pitch, and roll) using different incremental learning strategies to preserve knowledge when updating the network for new data distributions. Precisely, we adapt four common incremental learning, namely: LwF , iCaRL, LU CIR, and BiC by modifying their loss functions to our regression problem. We evaluate on two datasets containing 299008 indoor and outdoor images. Experiment results were significant and indicated which method was better for the camera calibration estimation.
translated by 谷歌翻译
从数字艺术到AR和VR体验,图像编辑和合成已经变得无处不在。为了生产精美的复合材料,需要对相机进行几何校准,这可能很乏味,需要进行物理校准目标。代替传统的多图像校准过程,我们建议使用深层卷积神经网络直接从单个图像中直接从单个图像中推断摄像机校准参数,例如音高,滚动,视场和镜头失真。我们使用大规模全景数据集中自动生成样品训练该网络,从而在标准L2误差方面产生了竞争精度。但是,我们认为将这种标准误差指标最小化可能不是许多应用程序的最佳选择。在这项工作中,我们研究了人类对几何相机校准中不准确性的敏感性。为此,我们进行了一项大规模的人类感知研究,我们要求参与者以正确和有偏见的摄像机校准参数判断3D对象的现实主义。基于这项研究,我们为摄像机校准开发了一种新的感知度量,并证明我们的深校准网络在标准指标以及这一新型感知度量方面都优于先前基于单像的校准方法。最后,我们演示了将校准网络用于多种应用程序,包括虚拟对象插入,图像检索和合成。可以在https://lvsn.github.io/deepcalib上获得我们方法的演示。
translated by 谷歌翻译
Conventionally, deep neural networks are trained offline, relying on a large dataset prepared in advance. This paradigm is often challenged in real-world applications, e.g. online services that involve continuous streams of incoming data. Recently, incremental learning receives increasing attention, and is considered as a promising solution to the practical challenges mentioned above. However, it has been observed that incremental learning is subject to a fundamental difficulty -catastrophic forgetting, namely adapting a model to new data often results in severe performance degradation on previous tasks or classes. Our study reveals that the imbalance between previous and new data is a crucial cause to this problem. In this work, we develop a new framework for incrementally learning a unified classifier, i.e. a classifier that treats both old and new classes uniformly. Specifically, we incorporate three components, cosine normalization, less-forget constraint, and inter-class separation, to mitigate the adverse effects of the imbalance. Experiments show that the proposed method can effectively rebalance the training process, thus obtaining superior performance compared to the existing methods. On CIFAR-100 and ImageNet, our method can reduce the classification errors by more than 6% and 13% respectively, under the incremental setting of 10 phases.
translated by 谷歌翻译
Modern machine learning suffers from catastrophic forgetting when learning new classes incrementally. The performance dramatically degrades due to the missing data of old classes. Incremental learning methods have been proposed to retain the knowledge acquired from the old classes, by using knowledge distilling and keeping a few exemplars from the old classes. However, these methods struggle to scale up to a large number of classes. We believe this is because of the combination of two factors: (a) the data imbalance between the old and new classes, and (b) the increasing number of visually similar classes. Distinguishing between an increasing number of visually similar classes is particularly challenging, when the training data is unbalanced. We propose a simple and effective method to address this data imbalance issue. We found that the last fully connected layer has a strong bias towards the new classes, and this bias can be corrected by a linear model. With two bias parameters, our method performs remarkably well on two large datasets: ImageNet (1000 classes) and MS-Celeb-1M (10000 classes), outperforming the state-of-the-art algorithms by 11.1% and 13.2% respectively.
translated by 谷歌翻译
Geometric camera calibration is often required for applications that understand the perspective of the image. We propose perspective fields as a representation that models the local perspective properties of an image. Perspective Fields contain per-pixel information about the camera view, parameterized as an up vector and a latitude value. This representation has a number of advantages as it makes minimal assumptions about the camera model and is invariant or equivariant to common image editing operations like cropping, warping, and rotation. It is also more interpretable and aligned with human perception. We train a neural network to predict Perspective Fields and the predicted Perspective Fields can be converted to calibration parameters easily. We demonstrate the robustness of our approach under various scenarios compared with camera calibration-based methods and show example applications in image compositing.
translated by 谷歌翻译
For a number of tasks, such as 3D reconstruction, robotic interface, autonomous driving, etc., camera calibration is essential. In this study, we present a unique method for predicting intrinsic (principal point offset and focal length) and extrinsic (baseline, pitch, and translation) properties from a pair of images. We suggested a novel method where camera model equations are represented as a neural network in a multi-task learning framework, in contrast to existing methods, which build a comprehensive solution. By reconstructing the 3D points using a camera model neural network and then using the loss in reconstruction to obtain the camera specifications, this innovative camera projection loss (CPL) method allows us that the desired parameters should be estimated. As far as we are aware, our approach is the first one that uses an approach to multi-task learning that includes mathematical formulas in a framework for learning to estimate camera parameters to predict both the extrinsic and intrinsic parameters jointly. Additionally, we provided a new dataset named as CVGL Camera Calibration Dataset [1] which has been collected using the CARLA Simulator [2]. Actually, we show that our suggested strategy out performs both conventional methods and methods based on deep learning on 8 out of 10 parameters that were assessed using both real and synthetic data. Our code and generated dataset are available at https://github.com/thanif/Camera-Calibration-through-Camera-Projection-Loss.
translated by 谷歌翻译
The counting task, which plays a fundamental rule in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects with various densities. Existing object counting tasks are designed for a single object class. However, it is inevitable to encounter newly coming data with new classes in our real world. We name this scenario as \textit{evolving object counting}. In this paper, we build the first evolving object counting dataset and propose a unified object counting network as the first attempt to address this task. The proposed model consists of two key components: a class-agnostic mask module and a class-increment module. The class-agnostic mask module learns generic object occupation prior via predicting a class-agnostic binary mask (e.g., 1 denotes there exists an object at the considering position in an image and 0 otherwise). The class-increment module is used to handle new coming classes and provides discriminative class guidance for density map prediction. The combined outputs of class-agnostic mask module and image feature extractor are used to predict the final density map. When new classes come, we first add new neural nodes into the last regression and classification layers of this module. Then, instead of retraining the model from scratch, we utilize knowledge distilling to help the model remember what have already learned about previous object classes. We also employ a support sample bank to store a small number of typical training samples of each class, which are used to prevent the model from forgetting key information of old data. With this design, our model can efficiently and effectively adapt to new coming classes while keeping good performance on already seen data without large-scale retraining. Extensive experiments on the collected dataset demonstrate the favorable performance.
translated by 谷歌翻译
基于正规化的方法有利于缓解类渐进式学习中的灾难性遗忘问题。由于缺乏旧任务图像,如果分类器在新图像上产生类似的输出,它们通常会假设旧知识得到很好的保存。在本文中,我们发现他们的效果很大程度上取决于旧课程的性质:它们在彼此之间容易区分的课程上工作,但可能在更细粒度的群体上失败,例如,男孩和女孩。在SPIRIT中,此类方法将新数据项目投入到完全连接层中的权重向量中跨越的特征空间,对应于旧类。由此产生的预测在细粒度的旧课程上是相似的,因此,新分类器将逐步失去这些课程的歧视能力。为了解决这个问题,我们提出了一种无记忆生成的重播策略,通过直接从旧分类器生成代表性的旧图像并结合新的分类器培训的新数据来保留细粒度的旧阶级特征。为了解决所产生的样本的均化问题,我们还提出了一种分集体损失,使得产生的样品之间的Kullback Leibler(KL)发散。我们的方法最好是通过先前的基于正规化的方法补充,证明是为了易于区分的旧课程有效。我们验证了上述关于CUB-200-2011,CALTECH-101,CIFAR-100和微小想象的设计和见解,并表明我们的策略优于现有的无记忆方法,并具有清晰的保证金。代码可在https://github.com/xmengxin/mfgr获得
translated by 谷歌翻译
很少有人提出了几乎没有阶级的课程学习(FSCIL),目的是使深度学习系统能够逐步学习有限的数据。最近,一位先驱声称,通常使用的基于重播的课堂学习方法(CIL)是无效的,因此对于FSCIL而言并不是首选。如果真理,这对FSCIL领域产生了重大影响。在本文中,我们通过经验结果表明,采用数据重播非常有利。但是,存储和重播旧数据可能会导致隐私问题。为了解决此问题,我们或建议使用无数据重播,该重播可以通过发电机综合数据而无需访问真实数据。在观察知识蒸馏的不确定数据的有效性时,我们在发电机培训中强加了熵正则化,以鼓励更不确定的例子。此外,我们建议使用单速样标签重新标记生成的数据。这种修改使网络可以通过完全减少交叉渗透损失来学习,从而减轻了在常规知识蒸馏方法中平衡不同目标的问题。最后,我们对CIFAR-100,Miniimagenet和Cub-200展示了广泛的实验结果和分析,以证明我们提出的效果。
translated by 谷歌翻译
在学习新知识时,班级学习学习(CIL)与灾难性遗忘和无数据CIL(DFCIL)的斗争更具挑战性,而无需访问以前学过的课程的培训数据。尽管最近的DFCIL作品介绍了诸如模型反转以合成以前类的数据,但由于合成数据和真实数据之间的严重域间隙,它们无法克服遗忘。为了解决这个问题,本文提出了有关DFCIL的关系引导的代表学习(RRL),称为R-DFCIL。在RRL中,我们引入了关系知识蒸馏,以灵活地将新数据的结构关系从旧模型转移到当前模型。我们的RRL增强DFCIL可以指导当前的模型来学习与以前类的表示更好地兼容的新课程的表示,从而大大减少了在改善可塑性的同时遗忘。为了避免表示和分类器学习之间的相互干扰,我们在RRL期间采用本地分类损失而不是全球分类损失。在RRL之后,分类头将通过全球类平衡的分类损失进行完善,以解决数据不平衡问题,并学习新课程和以前类之间的决策界限。关于CIFAR100,Tiny-Imagenet200和Imagenet100的广泛实验表明,我们的R-DFCIL显着超过了以前的方法,并实现了DFCIL的新最新性能。代码可从https://github.com/jianzhangcs/r-dfcil获得。
translated by 谷歌翻译
尽管最近的基于学习的校准方法可以从单个图像预测外部和内在的相机参数,但这些方法的准确性在Fisheye图像中劣化。这种劣化是由实际投影和预期投影之间的不匹配引起的。为了解决这个问题,我们提出了一种通用相机模型,具有解决各种类型的失真。我们的通用摄像机模型用于通过相机投影的闭合形式计算基于学习的方法。同时恢复旋转和鱼眼失真,我们提出了一种使用相机模型的基于学习的校准方法。此外,我们提出了一种损失函数,可以减轻四种外在和内在相机参数的误差幅度的偏差。广泛的实验表明,我们所提出的方法在两种大型数据集和由现成的Fisheye相机捕获的图像上表现优于传统方法。此外,我们是第一位分析基于学习的方法的性能的研究人员,使用各种类型的搁板摄像机的投影。
translated by 谷歌翻译
在这个不断变化的世界中,必须不断学习新概念的能力。但是,深层神经网络在学习新类别时会遭受灾难性的遗忘。已经提出了许多减轻这种现象的作品,而其中大多数要么属于稳定性困境,要么陷入了过多的计算或储存开销。受到梯度增强算法的启发,以逐渐适应目标模型和上一个合奏模型之间的残差,我们提出了一种新颖的两阶段学习范式寄养物,使该模型能够适应新的类别。具体而言,我们首先动态扩展新模块,以适合原始模型的目标和输出之间的残差。接下来,我们通过有效的蒸馏策略删除冗余参数和特征尺寸,以维护单个骨干模型。我们在不同的设置下验证CIFAR-100和Imagenet-100/1000的方法寄养。实验结果表明,我们的方法实现了最先进的性能。代码可在以下网址获得:https://github.com/g-u-n/eccv22-foster。
translated by 谷歌翻译
持续深度学习的领域是一个新兴领域,已经取得了很多进步。但是,同时仅根据图像分类的任务进行了大多数方法,这在智能车辆领域无关。直到最近才提出了班级开展语义分割的方法。但是,所有这些方法都是基于某种形式的知识蒸馏。目前,尚未对基于重播的方法进行调查,这些方法通常在连续的环境中用于对象识别。同时,尽管无监督的语义分割的域适应性获得了很多吸引力,但在持续环境中有关域内收入学习的调查并未得到充分研究。因此,我们工作的目的是评估和调整已建立的解决方案,以连续对象识别语义分割任务,并为连续语义分割的任务提供基线方法和评估协议。首先,我们介绍了类和域内的分割的评估协议,并分析了选定的方法。我们表明,语义分割变化的任务的性质在减轻与图像分类相比最有效的方法中最有效。特别是,在课堂学习中,学习知识蒸馏被证明是至关重要的工具,而在域内,学习重播方法是最有效的方法。
translated by 谷歌翻译
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model-a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and Im-ageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
translated by 谷歌翻译
Recent neural radiance field (NeRF) representation has achieved great success in the tasks of novel view synthesis and 3D reconstruction. However, they suffer from the catastrophic forgetting problem when continuously learning from streaming data without revisiting the previous training data. This limitation prohibits the application of existing NeRF models to scenarios where images come in sequentially. In view of this, we explore the task of incremental learning for neural radiance field representation in this work. We first propose a student-teacher pipeline to mitigate the catastrophic forgetting problem. Specifically, we iterate the process of using the student as the teacher at the end of each incremental step and let the teacher guide the training of the student in the next step. In this way, the student network is able to learn new information from the streaming data and retain old knowledge from the teacher network simultaneously. Given that not all information from the teacher network is helpful since it is only trained with the old data, we further introduce a random inquirer and an uncertainty-based filter to filter useful information. We conduct experiments on the NeRF-synthetic360 and NeRF-real360 datasets, where our approach significantly outperforms the baselines by 7.3% and 25.2% in terms of PSNR. Furthermore, we also show that our approach can be applied to the large-scale camera facing-outwards dataset ScanNet, where we surpass the baseline by 60.0% in PSNR.
translated by 谷歌翻译
源自建筑环境的规律性的线性透视图可用于在线重新校准内在和外在的摄像机参数,但是由于场景中的不规则性,线段估计和背景混乱中的不确定性,这些估计值可能是不可靠的。在这里,我们通过四个计划来应对这一挑战。首先,我们使用PanoContext全景图数据集[27]来策划一个新颖而逼真的平面投影数据集,这些数据集在广泛的场景,焦距和相机姿势上。其次,我们使用这个新颖的数据集和YorkurbandB [4]来系统地评估文献中经常发现的线性透视偏差度量,并表明偏差度量和可能性模型的选择对可靠性具有巨大的影响。第三,我们使用这些发现来创建一个用于在线摄像机校准的新型系统,我们称之为fr,并表明它的表现优于先前的最新状态,从而大大减少了估计的摄像机旋转和焦距的错误。我们的第四个贡献是一种新颖有效的方法来估计不确定性,可以通过战略性地选择用于重新校准的哪种框架来大大提高对性能至关重要的应用程序的在线可靠性。
translated by 谷歌翻译
基于深度学习的方法在3D对象检测任务中显示出显着性能。然而,当在逐步学习新类时,它们遭受了最初训练的课程的灾难性表现下降,而无需重新审视旧数据。这种“灾难性忘记”现象阻碍了现实世界场景中的3D对象检测方法的部署,其中需要连续学习系统。在本文中,我们研究了未开发的但重要的类增量3D对象检测问题,并提出了第一种解决方案 - SDCOT,一种新型静态动态共同教学方法。我们的SDCOT通过静态教师减轻了灾难性的旧课程,这为新样本中的旧课程提供了伪注释,并通过用蒸馏损失提取先前的知识来规范电流模型。与此同时,SDCOT一致地通过动态教师从新数据中了解基础知识。我们对两个基准数据集进行了广泛的实验,并在几个增量学习场景中展示了我们SDCOT对基线方法的卓越性能。
translated by 谷歌翻译
人类的持续学习(CL)能力与稳定性与可塑性困境密切相关,描述了人类如何实现持续的学习能力和保存的学习信息。自发育以来,CL的概念始终存在于人工智能(AI)中。本文提出了对CL的全面审查。与之前的评论不同,主要关注CL中的灾难性遗忘现象,本文根据稳定性与可塑性机制的宏观视角来调查CL。类似于生物对应物,“智能”AI代理商应该是I)记住以前学到的信息(信息回流); ii)不断推断新信息(信息浏览:); iii)转移有用的信息(信息转移),以实现高级CL。根据分类学,评估度量,算法,应用以及一些打开问题。我们的主要贡献涉及I)从人工综合情报层面重新检查CL; ii)在CL主题提供详细和广泛的概述; iii)提出一些关于CL潜在发展的新颖思路。
translated by 谷歌翻译
深度学习模型在逐步学习新任务时遭受灾难性遗忘。已经提出了增量学习,以保留旧课程的知识,同时学习识别新课程。一种典型的方法是使用一些示例来避免忘记旧知识。在这种情况下,旧类和新课之间的数据失衡是导致模型性能下降的关键问题。由于数据不平衡,已经设计了几种策略来纠正新类别的偏见。但是,他们在很大程度上依赖于新旧阶层之间偏见关系的假设。因此,它们不适合复杂的现实世界应用。在这项研究中,我们提出了一种假设不足的方法,即多粒性重新平衡(MGRB),以解决此问题。重新平衡方法用于减轻数据不平衡的影响;但是,我们从经验上发现,他们将拟合新的课程。为此,我们进一步设计了一个新颖的多晶正式化项,该项使模型还可以考虑除了重新平衡数据之外的类别的相关性。类层次结构首先是通过将语义或视觉上类似类分组来构建的。然后,多粒性正则化将单热标签向量转换为连续的标签分布,这反映了基于构造的类层次结构的目标类别和其他类之间的关系。因此,该模型可以学习类间的关系信息,这有助于增强新旧课程的学习。公共数据集和现实世界中的故障诊断数据集的实验结果验证了所提出的方法的有效性。
translated by 谷歌翻译
Lifelong learning has attracted much attention, but existing works still struggle to fight catastrophic forgetting and accumulate knowledge over long stretches of incremental learning. In this work, we propose PODNet, a model inspired by representation learning. By carefully balancing the compromise between remembering the old classes and learning new ones, PODNet fights catastrophic forgetting, even over very long runs of small incremental tasks -a setting so far unexplored by current works. PODNet innovates on existing art with an efficient spatialbased distillation-loss applied throughout the model and a representation comprising multiple proxy vectors for each class. We validate those innovations thoroughly, comparing PODNet with three state-of-the-art models on three datasets: CIFAR100, ImageNet100, and ImageNet1000. Our results showcase a significant advantage of PODNet over existing art, with accuracy gains of 12.10, 6.51, and 2.85 percentage points, respectively. 5
translated by 谷歌翻译