Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications but are now rapidly becoming mainstream in consumer devices. We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors with as few as one pixel. The key idea is to flood illuminate large scene patches (or the entire scene) with a pulsed light source and measure the time-resolved reflected light by integrating over the entire illuminated area. The one-dimensional measured temporal waveform, called the transient, encodes both distances and albedos at all visible scene points and as such is an aggregate proxy for the scene's 3D geometry. We explore the viability and limitations of the transient waveforms by themselves for recovering scene information, and also when combined with traditional RGB cameras. We show that plane estimation can be performed from a single transient and that using only a few more it is possible to recover a depth map of the whole scene. We also show two proof-of-concept hardware prototypes that demonstrate the feasibility of our approach for compact, mobile, and budget-limited applications.
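As a rough illustration of how a single-pixel transient aggregates distance and albedo, the sketch below bins the return from each scene point by its round-trip time of flight. The inverse-square intensity falloff, the 1 ns bin width, and the name `simulate_transient` are assumptions made for this example; they are not taken from the paper's forward model.

```python
C = 299_792_458.0  # speed of light in m/s


def simulate_transient(depths, albedos, bin_width_s, num_bins):
    """Sum per-point returns into a 1D histogram over round-trip time."""
    transient = [0.0] * num_bins
    for depth, albedo in zip(depths, albedos):
        round_trip = 2.0 * depth / C          # time for light to go out and back
        bin_index = int(round_trip / bin_width_s)
        if 0 <= bin_index < num_bins:
            # Assumed radiometric model: return scales with albedo / depth^2.
            transient[bin_index] += albedo / (depth * depth)
    return transient


# Toy scene: three points on a surface at 1 m, two points on a surface at 2 m.
depths = [1.0, 1.0, 1.0, 2.0, 2.0]
albedos = [0.5, 0.5, 0.5, 0.8, 0.8]
transient = simulate_transient(depths, albedos, bin_width_s=1e-9, num_bins=20)
peaks = [i for i, value in enumerate(transient) if value > 0.0]
print(peaks)
```

In this toy scene the two surfaces land in two separate time bins, and the height of each peak mixes how many points lie at that distance with how reflective they are. That mixing is why a single transient is only an aggregate proxy for geometry.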
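Scene inference under low light is a challenging problem due to severe noise in the captured images. One way to reduce noise is to use a longer exposure during capture. However, in the presence of motion (scene or camera motion), longer exposures lead to motion blur, resulting in a loss of image information. This creates a trade-off between two kinds of image degradation: motion blur (due to long exposure) versus noise (due to short exposure), referred to in this paper as a dual image corruption pair. With the rise of cameras capable of capturing multiple exposures of the same scene simultaneously, it is possible to overcome this trade-off. Our key observation is that although the amount and nature of the degradation vary across these captures, the semantic content remains the same in all of them. To this end, we propose a method that leverages these multi-exposure captures for robust inference under low light and motion. Our method builds on a feature consistency loss that encourages similar results from the individual captures, and uses an ensemble of their final predictions for robust visual recognition. We demonstrate the effectiveness of our approach on simulated images as well as real captures with multiple exposures, and on the tasks of object detection and image classification.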
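The two ingredients named above, a feature consistency loss and an ensemble of per-exposure predictions, can be sketched as follows. The mean-squared-error form of the loss, the plain averaging of class probabilities, and both function names are assumptions for illustration; the abstract does not specify the exact formulation.

```python
def consistency_loss(features_a, features_b):
    """Mean squared difference between features of two exposures (assumed form)."""
    return sum((x - y) ** 2 for x, y in zip(features_a, features_b)) / len(features_a)


def ensemble_prediction(prob_lists):
    """Average class probabilities over exposures and return (label, averages)."""
    count = len(prob_lists)
    num_classes = len(prob_lists[0])
    averaged = [sum(p[k] for p in prob_lists) / count for k in range(num_classes)]
    label = max(range(num_classes), key=averaged.__getitem__)
    return label, averaged


# Hypothetical features and class probabilities for a short (noisy) and a
# long (blurry) exposure of the same scene.
loss = consistency_loss([1.0, 2.0], [1.0, 4.0])
label, averaged = ensemble_prediction([[0.6, 0.4], [0.3, 0.7]])
print(loss, label)
```

During training, a term like `consistency_loss` would be added to the task loss so that both exposures map to similar features. At inference time, the averaged probabilities let a confident capture outweigh a degraded one.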
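Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.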
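Interactive object understanding, or what we can do to objects and how, is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. We demonstrate that observing what human hands interact with, and how, can provide both the relevant data and the necessary supervision. Attending to hands readily localizes and stabilizes active objects for learning and reveals the places where interactions with objects occur. Analyzing the hands shows what we can do to objects and how. We apply these basic principles on the EPIC-KITCHENS dataset and successfully learn state-sensitive features and object affordances (regions of interaction and afforded grasps), purely by observing hands in egocentric videos.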
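Video data is often repetitive; for example, the content of adjacent frames is usually strongly correlated. Such repetition occurs at multiple levels of complexity, from low-level pixel values to textures and high-level semantics. We propose Event Neural Networks (EvNets), a novel class of networks that leverage this repetition to achieve considerable computation savings for video inference tasks. A defining characteristic of EvNets is that each neuron has state variables that provide it with long-term memory, which allows low-cost inference even in the presence of significant camera motion. We show that it is possible to transform virtually any conventional neural network into an EvNet. We demonstrate the effectiveness of our method on several state-of-the-art neural networks for both high- and low-level visual processing, including pose recognition, object detection, optical flow, and image enhancement. Compared with conventional networks, we observe up to an order-of-magnitude reduction in computational cost (2-20x), with minimal reduction in model accuracy.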
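To make the idea of stateful neurons concrete, the sketch below shows a generic delta-update linear layer: it remembers the last input it acted on and its accumulated output, and it only spends multiplications on inputs that changed by more than a threshold. This is a simplified stand-in chosen for illustration, not the update rule used by EvNets, and the class name `DeltaLinear` and its fields are hypothetical.

```python
class DeltaLinear:
    """Linear layer that updates its output only for inputs that changed."""

    def __init__(self, weights, threshold=0.0):
        self.weights = weights                      # weights[j][i]: input i -> output j
        self.threshold = threshold
        self.last_input = [0.0] * len(weights[0])   # per-input memory
        self.output = [0.0] * len(weights)          # accumulated output state
        self.multiplies = 0                         # running count of multiplications

    def forward(self, frame):
        for i, value in enumerate(frame):
            delta = value - self.last_input[i]
            if abs(delta) > self.threshold:
                # Propagate only the change, then remember the new value.
                for j, row in enumerate(self.weights):
                    self.output[j] += row[i] * delta
                    self.multiplies += 1
                self.last_input[i] = value
        return list(self.output)


layer = DeltaLinear([[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]])
first = layer.forward([1.0, 1.0, 1.0])    # every input is new: 6 multiplications
second = layer.forward([1.0, 1.0, 2.0])   # one input changed: 2 multiplications
print(first, second, layer.multiplies)
```

With a threshold of zero the output matches a dense matrix-vector product on every frame, while the cost of the second frame drops from 6 multiplications to 2. Raising the threshold would trade some accuracy for further savings.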
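Indirect time-of-flight (iToF) cameras are a promising depth sensing technology. However, they are prone to errors caused by multi-path interference (MPI) and low signal-to-noise ratio (SNR). Traditional methods, after denoising, mitigate MPI by estimating a transient image that encodes depths. Recently, data-driven methods that jointly denoise and mitigate MPI, without using an intermediate transient representation, have become the state of the art. In this paper, we propose to revisit the transient representation. Using data-driven priors, we interpolate/extrapolate iToF frequencies and use them to estimate the transient image. Given that direct ToF (dToF) sensors capture transient images, we name our method iToF2dToF. The transient representation is flexible. It can be integrated with rule-based depth sensing algorithms that are robust to low SNR and can handle ambiguous scenarios that arise in practice (e.g., specular MPI, optical cross-talk). We demonstrate the benefits of iToF2dToF over previous methods in real depth sensing scenarios.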
Existing federated classification algorithms typically assume the local annotations at every client cover the same set of classes. In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients have a common goal, which is to build a global classification model to identify the union of these classes. Such heterogeneity in client class sets poses a new challenge: how to ensure different clients are operating in the same latent space so as to avoid drift after aggregation? We observe that the classes can be described in natural language (i.e., class names) and these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
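The matching formulation can be illustrated with a minimal sketch in which a sample is assigned to the class whose representation is most similar to the sample's representation. The use of cosine similarity, the two-dimensional toy vectors, and the function names are assumptions for this example; in the described method both representations would come from learned encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_by_matching(data_repr, class_reprs):
    """Return the class name whose representation best matches the sample."""
    return max(class_reprs, key=lambda name: cosine(data_repr, class_reprs[name]))


# Hypothetical class representations, standing in for label-encoder outputs.
class_reprs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "bird": [0.7, 0.7]}

known = classify_by_matching([0.9, 0.2], class_reprs)
# A client that only annotates "cat" and "dog" can still pseudo-label a sample
# as "bird" because the class representations are shared across clients.
unaware = classify_by_matching([0.6, 0.8], class_reprs)
print(known, unaware)
```

Because class names are shared, every client scores samples against the same set of class vectors. This is what allows a client to produce labels for classes it never annotated locally.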
The rise in data has led to the need for dimension reduction techniques, especially in the area of non-scalar variables, including time series, natural language processing, and computer vision. In this paper, we specifically investigate dimension reduction for time series through functional data analysis. Current methods for dimension reduction in functional data are functional principal component analysis and functional autoencoders. These are limited to linear mappings or to scalar representations of the time series, which is inefficient. In real data applications, the nature of the data is much more complex. We propose a non-linear function-on-function approach consisting of a functional encoder and a functional decoder. It uses continuous hidden layers made of continuous neurons to learn the structure inherent in functional data, which addresses the aforementioned limitations of the existing approaches. Our approach gives a low-dimensional latent representation by reducing the number of functional features as well as the number of timepoints at which the functions are observed. The effectiveness of the proposed model is demonstrated through multiple simulations and real data examples.
Landing an unmanned aerial vehicle (UAV) on top of an unmanned surface vehicle (USV) in harsh open waters is a challenging problem, owing to forces that can damage the UAV due to a severe roll and/or pitch angle of the USV during touchdown. To tackle this, we propose a novel model predictive control (MPC) approach enabling a UAV to land autonomously on a USV in these harsh conditions. The MPC employs a novel objective function and an online decomposition of the oscillatory motion of the vessel to predict, attempt, and accomplish the landing during near-zero tilt of the landing platform. The nonlinear prediction of the motion of the vessel is performed using visual data from an onboard camera. Therefore, the system does not require any communication with the USV or a control station. The proposed method was analyzed in numerous robotics simulations in harsh and extreme conditions and further validated in various real-world scenarios.
Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity
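For readers unfamiliar with the nDCG metric quoted above, the sketch below computes it for one document: sentences are ranked by predicted popularity, and the discounted gain of that ranking is divided by the gain of the ideal ranking. It uses the common linear-gain form with a log2 rank discount; whether the paper uses this exact variant or a cutoff is not stated in the abstract, so treat the details as assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(true_scores, predicted_scores):
    """nDCG of the ranking induced by predicted_scores, judged by true_scores."""
    order = sorted(range(len(true_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_scores[i] for i in order]
    ideal_dcg = dcg(sorted(true_scores, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical popularity labels for three sentences and two sets of forecasts.
true_popularity = [3.0, 2.0, 1.0]
perfect = ndcg(true_popularity, [0.9, 0.5, 0.1])    # same order as the labels
reversed_rank = ndcg(true_popularity, [0.1, 0.5, 0.9])  # exactly reversed order
print(perfect, round(reversed_rank, 2))
```

A forecast that orders sentences exactly as the labels do scores 1.0, and the score falls as popular sentences are pushed down the ranking.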