Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square-root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.
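As a concrete illustration of the update analyzed above, here is a minimal NumPy sketch of the perturbed-observation ensemble Kalman update in a toy linear-Gaussian setting (the dimensions, observation operator, and noise covariance are assumptions for illustration, not the paper's setup):

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Ensemble Kalman update with perturbed observations.

    X : (d, N) prior ensemble, y : (k,) observation,
    H : (k, d) linear observation operator, R : (k, k) noise covariance.
    """
    N = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / (N - 1)                        # sample prior covariance
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)   # Kalman gain
    # each ensemble member sees an independently perturbed observation
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X + K @ (y[:, None] + eps - H @ X)      # posterior ensemble

rng = np.random.default_rng(0)
d, k, N = 10, 3, 40
X = rng.normal(size=(d, N))        # prior ensemble
H = rng.normal(size=(k, d))        # observation operator
R = 0.1 * np.eye(k)                # observation noise covariance
y = rng.normal(size=k)             # observed data
X_post = enkf_update(X, y, H, R, rng)
```

The square-root and localized variants compared in the paper replace the perturbation step with a deterministic transform or taper the sample covariance entrywise, respectively.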
We consider the Bayesian calibration of models of self-assembly using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in the equilibrium structures of block copolymers (BCPs), we introduce auxiliary variables to represent this aleatoric uncertainty. These variables, however, result in an integrated likelihood for high-dimensional image data that is typically intractable to evaluate. We tackle this challenging Bayesian inference problem using a likelihood-free approach based on measure transport together with summary statistics of the image data. We also show that expected information gains (EIGs) from the data about the model parameters can be computed at no significant additional cost. Lastly, we present a numerical case study based on the Ohta-Kawasaki model for diblock copolymer thin-film self-assembly and top-down microscopy characterization. For calibration, we introduce several domain-relevant energy- and Fourier-based summary statistics and quantify their informativeness using the EIG. We demonstrate the power of the proposed approach for studying the effect of data corruption and experimental-design choices on the calibration results.
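The transport-based approach above yields EIGs essentially for free; as a generic point of comparison, the standard nested Monte Carlo EIG estimator can be sketched on a toy linear-Gaussian model where the EIG has a closed form (the model, noise level, and sample sizes below are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5                        # observation noise std (toy model)
n_outer, n_inner = 2000, 2000

def log_lik(y, theta):
    """Log density of y ~ N(theta, sigma^2)."""
    return -0.5 * ((y - theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

theta = rng.normal(size=n_outer)               # outer prior draws
y = theta + sigma * rng.normal(size=n_outer)   # simulated observations

theta_in = rng.normal(size=n_inner)            # inner prior draws for the evidence
# log p(y_i) ~= log mean_j p(y_i | theta'_j)   (nested Monte Carlo)
log_ev = np.array([np.log(np.mean(np.exp(log_lik(yi, theta_in)))) for yi in y])

# EIG = E[ log p(y|theta) - log p(y) ]
eig_mc = np.mean(log_lik(y, theta) - log_ev)
eig_exact = 0.5 * np.log(1 + 1 / sigma**2)     # closed form for this Gaussian model
```

Nested Monte Carlo is biased and needs many inner samples per outer sample, which is exactly the cost the measure-transport formulation sidesteps.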
Neural operators have gained significant attention recently for their ability to approximate high-dimensional parametric maps between function spaces. At present, only parametric function approximation has been addressed in the neural operator literature. In this work we investigate incorporating parametric derivative information into neural operator training; this information can improve function approximations, and in addition it can be used to improve the approximation of derivatives with respect to the parameter, which is often the key to scalable solution of high-dimensional outer-loop problems (e.g., Bayesian inverse problems). Parametric Jacobian information is formally intractable to incorporate due to its high dimensionality; to address this concern we propose strategies based on reduced SVDs, randomized sketching, and the use of reduced-basis surrogates. All of these strategies require only $O(r)$ Jacobian actions to construct sample Jacobian data, and they allow us to reduce the linear algebra and memory costs associated with Jacobian training from the product of the input and output dimensions down to $O(r^2)$, where $r$ is the dimension associated with the reduction technique. Numerical results for parametric PDE problems demonstrate that adding derivative information to the training problem can significantly improve the parametric map approximation, particularly given few data. When Jacobian actions are cheap relative to evaluations of the parametric map, this information can be economically substituted for parametric map data. Additionally, we show that Jacobian error approximations improve significantly with the introduction of Jacobian training data. These results open the door to the use of derivative-informed neural operators (DINOs) in outer-loop algorithms, where they can amortize the additional training-data cost via repeated evaluations.
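A rough NumPy sketch of the randomized-sketching idea for compressing Jacobian data, assuming access only to Jacobian-vector (and adjoint) actions; the toy linear map and its decay rate are assumptions for illustration, not the paper's PDE setting:

```python
import numpy as np

rng = np.random.default_rng(2)
n_out, n_in, r = 200, 300, 20

# Toy parametric map with rapidly decaying singular values; its Jacobian is A.
G = rng.normal(size=(n_out, n_in))
Uf, _, Vft = np.linalg.svd(G, full_matrices=False)
s = 2.0 ** -np.arange(len(Uf))
A = (Uf * s) @ Vft

def jvp(v):
    """Jacobian-vector product: the only access to A we assume."""
    return A @ v

# Randomized sketch: r + 10 Jacobian actions capture the output range.
Y = np.column_stack([jvp(v) for v in rng.normal(size=(r + 10, n_in))])
Q, _ = np.linalg.qr(Y)                 # output reduced basis
B = Q.T @ A                            # stands in for r + 10 adjoint (vJp) actions
_, _, Vt = np.linalg.svd(B, full_matrices=False)
V = Vt[:r].T                           # input reduced basis
# Reduced Jacobian from r more Jacobian actions: O(r) actions total.
J_r = Q.T @ np.column_stack([jvp(V[:, i]) for i in range(r)])

err = np.linalg.norm(Q @ J_r @ V.T - A) / np.linalg.norm(A)
```

The reduced Jacobian `J_r` is what enters the training loss, so the linear algebra per sample scales with $r^2$ rather than with the full input-output dimensions.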
We propose a surrogate framework for learning high-dimensional parametric maps from limited training data. The need for parametric surrogates arises in the many applications that require repeated queries of complex computational models. These applications include "outer-loop" problems such as Bayesian inverse problems, optimal experimental design, and optimal design and control under uncertainty, as well as real-time inference and control problems. Many high-dimensional parametric maps admit low-dimensional structure, which can be exploited by map-informed reduced bases of the inputs and outputs. Exploiting this property, we formulate a framework for learning low-dimensional approximations of such maps by adaptively constructing ResNet approximations between reduced bases of their inputs and outputs. Motivated by recent approximation theory that views ResNets as discretizations of control flows, we prove a universal approximation property of our proposed adaptive projected ResNet framework, which motivates a related iterative algorithm for the ResNet construction. This strategy represents a confluence of approximation theory and algorithm, as both make use of sequentially minimizing flows. In numerical examples we show that the framework achieves remarkably high accuracy given very few training data, making it a desirable surrogate strategy when only minimal computational investment in training-data generation is possible.
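A minimal forward-pass sketch of a projected ResNet surrogate of this kind, with the residual layers read as Euler steps of a flow. The reduced bases and weights here are random, untrained placeholders; in the framework above both are constructed adaptively from data:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_out, r, layers, h = 500, 400, 8, 4, 0.25

# Placeholder reduced bases (in practice map-informed, e.g. via active
# subspaces or POD); here random orthonormal columns for illustration.
V_in, _ = np.linalg.qr(rng.normal(size=(d_in, r)))
V_out, _ = np.linalg.qr(rng.normal(size=(d_out, r)))

# Untrained residual-block weights, one pair per layer.
W1 = rng.normal(size=(layers, r, r)) / np.sqrt(r)
W2 = rng.normal(size=(layers, r, r)) / np.sqrt(r)
b = np.zeros((layers, r))

def surrogate(m):
    z = V_in.T @ m                    # project input to reduced coordinates
    for k in range(layers):
        # z <- z + h * W2 tanh(W1 z + b): one Euler step of the control flow
        z = z + h * W2[k] @ np.tanh(W1[k] @ z + b[k])
    return V_out @ z                  # lift back to the output space

q = surrogate(rng.normal(size=d_in))
```

All learning happens in the $r$-dimensional latent space, which is what keeps the training-data requirements small.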
We present a minimax-optimal learner for the problem of learning predictors robust to adversarial examples at test time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by Montasser, Hanneke, and Srebro (2019), as well as of a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, through a key technical contribution that may be of independent interest: the global one-inclusion graph, which generalizes the classical one-inclusion graph due to Haussler, Littlestone, and Warmuth (1994). Finally, as a byproduct, we identify a dimension characterizing, qualitatively and quantitatively, which classes of predictors $\mathcal{H}$ are robustly learnable. This resolves an open problem due to Montasser et al. (2019), and closes a (potentially) infinite gap between the established upper and lower bounds on the sample complexity of adversarially robust learning.
This study examines short-term traffic flow prediction through a computational filtering approach, the Kalman filtering technique (KFT). Short-term traffic forecasting is an important tool for traffic management and transportation system operation. Short-term traffic flow predictions can be used for travel time estimation by route guidance and advanced traveler information systems. Although KFT has been tested on homogeneous traffic, its efficiency in heterogeneous traffic has not yet been studied. This study was conducted on Mirpur Road in Dhaka, near the Sobhanbag Mosque. The stream contains a heterogeneous mix of traffic, which implies uncertainty in prediction. The proposed method is implemented in Python using the pykalman library, which addresses this uncertainty within the KFT framework. The data were derived from a three-hour traffic count of vehicles. Following the Geometric Design Standards Manual published by the Roads and Highways Department (RHD) of Bangladesh in 2005, the heterogeneous traffic flow was converted into equivalent passenger car units (PCU). The PCU values obtained from five-minute aggregation were then used as the dataset for the proposed model. The mean absolute percentage error (MAPE) of the proposed model is 14.62, indicating that the KFT model predicts reasonably well. The root mean square percentage error (RMSPE) is 18.73%, which is less than 25%; hence the model is acceptable. The developed model has an R² value of 0.879, indicating that it can explain 87.9% of the variability in the dataset. If data were collected over a longer period, the R² value could approach 1.0.
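The study uses the pykalman library; as a self-contained illustration of the same idea, here is a plain-NumPy random-walk Kalman filter applied to a synthetic five-minute PCU series, together with the MAPE metric used for evaluation (the noise variances and the synthetic series are assumptions, not the field data):

```python
import numpy as np

def kalman_filter_1d(z, q=25.0, r=100.0):
    """Random-walk Kalman filter for a scalar series (e.g. 5-minute PCU counts).

    q: process-noise variance, r: measurement-noise variance (assumed values).
    Returns the one-step-ahead predictions for z[1:].
    """
    x, p = z[0], 1.0            # initial state estimate and variance
    preds = []
    for obs in z[1:]:
        p = p + q               # predict: random-walk state transition
        preds.append(x)
        k = p / (p + r)         # update: Kalman gain
        x = x + k * (obs - x)
        p = (1 - k) * p
    return np.array(preds)

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return 100 * np.mean(np.abs((actual - pred) / actual))

# Synthetic 5-minute PCU series standing in for the three-hour field count.
rng = np.random.default_rng(4)
pcu = 300 + 30 * np.sin(np.linspace(0, 6, 36)) + rng.normal(0, 10, 36)
pred = kalman_filter_1d(pcu)
err = mape(pcu[1:], pred)
```

With pykalman, the hand-written loop would be replaced by a `KalmanFilter` object whose noise parameters can be fit by expectation-maximization rather than assumed.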
Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM). Existing work has combined these in simple "retrieve-then-read" pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate-Search-Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that the LM and RM can handle more reliably. We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings, establishing in early evaluations new state-of-the-art in-context learning results and delivering 37-200%, 8-40%, and 80-290% relative gains against vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous self-ask pipeline, respectively.
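For reference, the "retrieve-then-read" baseline that DSP generalizes can be sketched with stub models; `toy_rm`, `toy_lm`, and the tiny corpus below are hypothetical placeholders for illustration, not the DSP API or any real retriever/LM:

```python
from typing import Callable, List

corpus = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Berlin is the capital of Germany.",
]

def toy_rm(query: str, k: int) -> List[str]:
    """Hypothetical retriever: rank the corpus by keyword overlap."""
    words = set(query.lower().replace("?", "").split())
    ranked = sorted(corpus,
                    key=lambda p: -len(words & set(p.lower().rstrip(".").split())))
    return ranked[:k]

def toy_lm(prompt: str) -> str:
    """Hypothetical frozen-LM stand-in; a real system calls an actual model."""
    return "stub-answer"

def retrieve_then_read(question: str, rm: Callable[[str, int], List[str]],
                       lm: Callable[[str], str], k: int = 2) -> str:
    """Baseline pipeline: retrieve once, insert passages into the prompt, read."""
    passages = rm(question, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return lm(f"{context}\n\nQuestion: {question}\nAnswer:")

answer = retrieve_then_read("What is the capital of France?", toy_rm, toy_lm)
```

DSP replaces this single retrieve-and-prompt step with multi-stage programs that pass natural-language text back and forth between the RM and LM (demonstrating, searching, then predicting).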
Over the last decade, an approach that has gained a lot of popularity to tackle non-parametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of probability distributions. The main goal of our work is to understand the optimality of two-sample tests constructed based on this approach. First, we show that the popular MMD (maximum mean discrepancy) two-sample test is not optimal in terms of the separation boundary measured in Hellinger distance. Second, we propose a modification to the MMD test based on spectral regularization by taking into account the covariance information (which is not captured by the MMD test) and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Third, we propose an adaptive version of the above test which involves a data-driven strategy to choose the regularization parameter and show the adaptive test to be almost minimax optimal up to a logarithmic factor. Moreover, our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples. Through numerical experiments on synthetic and real-world data, we demonstrate the superior performance of the proposed test in comparison to the MMD test.
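A minimal sketch of the baseline this work improves upon: the (biased, V-statistic) Gaussian-kernel MMD two-sample statistic with a permutation-based threshold. The bandwidth and the one-dimensional toy data are illustrative assumptions:

```python
import numpy as np

def mmd2(X, Y, bw=1.0):
    """Biased squared MMD between 1-d samples X and Y, Gaussian kernel."""
    Z = np.concatenate([X, Y])
    K = np.exp(-((Z[:, None] - Z[None, :]) ** 2) / (2 * bw**2))
    n = len(X)
    return K[:n, :n].mean() + K[n:, n:].mean() - 2 * K[:n, n:].mean()

def permutation_test(X, Y, n_perm=500, seed=0):
    """Return the MMD^2 statistic and its permutation p-value."""
    rng = np.random.default_rng(seed)
    stat = mmd2(X, Y)
    Z = np.concatenate([X, Y])
    null = []
    for _ in range(n_perm):
        rng.shuffle(Z)                       # relabel the pooled sample
        null.append(mmd2(Z[:len(X)], Z[len(X):]))
    return stat, np.mean(np.array(null) >= stat)

rng = np.random.default_rng(5)
X = rng.normal(0.0, 1.0, 100)
Y = rng.normal(1.0, 1.0, 100)                # mean-shifted alternative
stat, pval = permutation_test(X, Y)
```

The spectral-regularized test proposed above keeps this permutation calibration but replaces the plain kernel mean embedding with a covariance-weighted one, which is what shrinks the separation boundary.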
When annotators label data, a key metric for quality assurance is inter-annotator agreement (IAA): the extent to which annotators agree on their labels. Though many IAA measures exist for simple categorical and ordinal labeling tasks, relatively little work has considered more complex labeling tasks, such as structured, multi-object, and free-text annotations. Krippendorff's alpha, best known for use with simpler labeling tasks, does have a distance-based formulation with broader applicability, but little work has studied its efficacy and consistency across complex annotation tasks. We investigate the design and evaluation of IAA measures for complex annotation tasks, with evaluation spanning seven diverse tasks: image bounding boxes, image keypoints, text sequence tagging, ranked lists, free text translations, numeric vectors, and syntax trees. We identify the difficulty of interpretability and the complexity of choosing a distance function as key obstacles in applying Krippendorff's alpha generally across these tasks. We propose two novel, more interpretable measures, showing they yield more consistent IAA measures across tasks and annotation distance functions.
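A sketch of the distance-based formulation of Krippendorff's alpha discussed above, simplified to the complete-data case (every annotator labels every unit, so no pairability weighting is needed); the interval distance is just one example of the pluggable distance function whose choice the paper identifies as a key obstacle:

```python
import numpy as np
from itertools import combinations

def kripp_alpha(labels, dist):
    """Distance-based Krippendorff's alpha, assuming no missing labels.

    labels: (n_units, n_annotators) array-like; dist: symmetric distance fn.
    alpha = 1 - D_o / D_e (observed vs. expected disagreement).
    """
    labels = np.asarray(labels)
    # Observed disagreement: average distance between labels within a unit.
    within = [dist(a, b) for row in labels for a, b in combinations(row, 2)]
    # Expected disagreement: average distance over all pooled label pairs.
    pooled = labels.ravel()
    between = [dist(a, b) for a, b in combinations(pooled, 2)]
    return 1 - np.mean(within) / np.mean(between)

def interval(a, b):
    """Interval-scale distance: squared difference."""
    return (a - b) ** 2

ratings = [[1, 1, 2], [2, 2, 2], [4, 4, 5], [5, 5, 5]]  # 4 units, 3 annotators
a = kripp_alpha(ratings, interval)
```

For the complex tasks studied here, `dist` would instead compare bounding boxes, keypoints, tag sequences, rankings, translations, vectors, or trees, and the resulting alpha values can become hard to interpret consistently across choices of `dist`.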
Generating a chain of thought (CoT) can increase large language model (LLM) performance on a wide range of tasks. Zero-shot CoT evaluations, however, have been conducted primarily on logical tasks (e.g. arithmetic, commonsense QA). In this paper, we perform a controlled evaluation of zero-shot CoT across two sensitive domains: harmful questions and stereotype benchmarks. We find that using zero-shot CoT reasoning in a prompt can significantly increase a model's likelihood to produce undesirable output. Without future advances in alignment or explicit mitigation instructions, zero-shot CoT should be avoided on tasks where models can make inferences about marginalized groups or harmful topics.