This short report reviews the current state of research and methodology on theoretical and practical aspects of Artificial Neural Networks (ANN). It was prepared to gather the state-of-the-art knowledge needed to construct complex, hypercomplex and fuzzy neural networks. The report reflects the individual interests of the authors and by no means can be treated as a comprehensive review of the ANN discipline. Given how fast this field is developing, a detailed review is currently impossible within a reasonable number of pages. The report is an outcome of the meeting of the Project 'The Strategic Research Partnership for the mathematical aspects of complex, hypercomplex and fuzzy neural networks', held at the University of Warmia and Mazury in Olsztyn, Poland, in September 2022.
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a generative model. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy uniformly over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
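Here is a minimal sketch of the "optimism plus pessimism" selection idea. It rests on heavy simplifying assumptions:

- a horizon-1 problem with a one-dimensional state;
- a Gaussian-process (kernel ridge) posterior with an RBF kernel;
- hypothetical names and parameters (`select_state_by_gap`, `beta`, `noise`).

The next state to query is the one where the optimistic (upper) and pessimistic (lower) value estimates disagree most. In this one-step case the gap is simply 2·beta·sigma, so the rule reduces to choosing the most uncertain state. In the full algorithm both bounds come from kernelized Bellman backups over the horizon, which this sketch does not implement.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between two 1-D arrays of states."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def select_state_by_gap(x_obs, y_obs, candidates, beta=2.0, noise=1e-2):
    """Hypothetical helper: return the candidate state with the widest
    optimistic/pessimistic gap, plus the gap for every candidate.

    Horizon-1 simplification: both bounds come from a GP posterior fitted
    to (x_obs, y_obs); no Bellman backup is performed.
    """
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_star = rbf_kernel(candidates, x_obs)            # shape (m, n)
    mu = K_star @ np.linalg.solve(K, y_obs)           # posterior mean
    # Posterior variance k(x, x) - k_*^T K^{-1} k_*, where k(x, x) = 1 for RBF.
    v = np.linalg.solve(K, K_star.T)                  # shape (n, m)
    var = np.clip(1.0 - np.sum(K_star * v.T, axis=1), 0.0, None)
    sigma = np.sqrt(var)
    upper = mu + beta * sigma                         # optimistic estimate
    lower = mu - beta * sigma                         # pessimistic estimate
    gap = upper - lower
    return candidates[int(np.argmax(gap))], gap


x_obs = np.array([0.0, 1.0])
y_obs = np.array([0.0, 1.0])
candidates = np.array([0.0, 0.5, 1.0, 3.0])
best, gap = select_state_by_gap(x_obs, y_obs, candidates)
print(best)
```

With observations at states 0 and 1, the distant candidate 3.0 has the widest gap and is the state selected for the next query.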
In this paper, we present strong baselines for the task of Feedback Comment Generation for Writing Learning. Given a sentence and an error span, the task is to generate a feedback comment explaining the error. Sentences and feedback comments are both in English. We experiment with LLMs and also create multiple pseudo datasets for the task, investigating how they affect the performance of our system. We present our results for the task along with an extensive analysis of the generated comments, with the aim of aiding future studies in feedback comment generation for English language learners.
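One common way to show the error span to a sequence-to-sequence model is to wrap it in marker tokens inside the sentence itself. The sketch below makes two assumptions: spans are character offsets, and the markers are the hypothetical tags `<e>` and `</e>`. The abstract does not state which input format the authors actually used.

```python
def mark_error_span(sentence: str, start: int, end: int,
                    open_tag: str = "<e>", close_tag: str = "</e>") -> str:
    """Wrap sentence[start:end] in (hypothetical) marker tags."""
    if not 0 <= start < end <= len(sentence):
        raise ValueError("invalid character span")
    return sentence[:start] + open_tag + sentence[start:end] + close_tag + sentence[end:]


marked = mark_error_span("He go to school every day.", 3, 5)
print(marked)
```

The marked string, optionally prefixed with a task instruction, would be the model input. The reference feedback comment would be the target.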
Foundation models can be disruptive for future AI development because they scale up deep learning in terms of model size and the breadth and size of training data. These models achieve state-of-the-art performance (often after further adaptation) on a variety of tasks in domains such as natural language processing and computer vision. Foundation models exhibit a novel emergent behavior: in-context learning enables users to provide a query and a few examples from which a model derives an answer without having been trained on such queries. Additionally, homogenization of models might replace a myriad of task-specific models with fewer, very large models controlled by a few corporations, leading to a shift in power and control over AI. This paper provides a short introduction to foundation models. It contributes a crisp distinction between foundation models and prior deep learning models, a history of machine learning leading up to foundation models, an elaboration on socio-technical aspects (i.e., organizational issues and end-user interaction), and a discussion of future research.
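In-context learning works entirely through the input text: the demonstrations and the query are concatenated into one prompt, and the model's weights are not updated. The sketch below shows this. The "Input:/Output:" layout and the function name are illustrative assumptions, not a format prescribed by any particular model.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Concatenate (input, output) demonstrations and a query into one prompt.

    The model only conditions on this text; no training step is involved.
    """
    blocks = [instruction] if instruction else []
    for x, y in examples:
        blocks.append(f"Input: {x}\nOutput: {y}")
    blocks.append(f"Input: {query}\nOutput:")   # the model continues from here
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt([("cheese", "fromage"), ("house", "maison")], "cat",
                               instruction="Translate English to French.")
print(prompt)
```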
Incorporating computed tomography (CT) reconstruction operators into differentiable pipelines has proven beneficial in many applications. Such approaches usually focus on the projection data and keep the acquisition geometry fixed. However, precise knowledge of the acquisition geometry is essential for high-quality reconstruction results. In this paper, the differentiable formulation of fan-beam CT reconstruction is extended to the acquisition geometry. This allows gradient information to be propagated from a loss function on the reconstructed image into the geometry parameters. As a proof-of-concept experiment, this idea is applied to rigid motion compensation. The cost function is parameterized by a trained neural network that regresses an image quality metric from the motion-affected reconstruction alone. Using the proposed method, we are the first to optimize such an autofocus-inspired algorithm based on analytical gradients. The algorithm reduces the MSE by 35.5% and improves the SSIM by 12.6% relative to the motion-affected reconstruction. Beyond motion compensation, we see further use cases for our differentiable method in scanner calibration and in hybrid techniques employing deep models.
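The gradient propagation described above is the chain rule applied through the reconstruction operator. The worked form below uses assumed, illustrative notation:

- p is the projection data;
- θ are the geometry parameters (for rigid motion compensation, for example, per-view rotations and translations);
- R is the differentiable fan-beam reconstruction operator;
- Q_φ is the trained network that maps a reconstruction to a predicted quality metric.

It assumes Q_φ outputs a quantity to be minimized. If the network predicts a score to maximize, the sign of the update flips.

```latex
\begin{aligned}
f &= R(p;\theta), \qquad \mathcal{L}(\theta) = Q_\phi\big(R(p;\theta)\big),\\
\frac{\partial \mathcal{L}}{\partial \theta_i}
  &= \sum_{\mathbf{x}} \frac{\partial Q_\phi}{\partial f(\mathbf{x})}\,
     \frac{\partial f(\mathbf{x})}{\partial \theta_i},\\
\theta &\leftarrow \theta - \eta\, \nabla_\theta \mathcal{L}(\theta).
\end{aligned}
```

Here x ranges over image pixels and η is a step size. The first factor is obtained by backpropagation through the network. The second factor is the derivative of the reconstruction with respect to the geometry, which the differentiable formulation makes available analytically.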
Temporal data such as time series are often observed at irregular intervals, which is a challenging setting for existing machine learning methods. To tackle this problem, we view such data as samples from some underlying continuous function. We then define a diffusion-based generative model that adds noise from a predefined stochastic process while preserving the continuity of the resulting underlying function. A neural network is trained to reverse this process, which allows us to sample new realizations from the learned distribution. We define suitable stochastic processes as noise sources and introduce novel denoising and score-matching models on processes. Further, we show how to apply this approach to multivariate probabilistic forecasting and imputation tasks. Through extensive experiments, we demonstrate that our method outperforms previous models on synthetic and real-world datasets.
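Here is a minimal sketch of a forward noising step in which the noise is drawn from a stochastic process rather than as independent per-point noise. It makes three assumptions:

- the noise process is a Gaussian process with an RBF kernel (one possible choice);
- the marginal takes the familiar DDPM form sqrt(alpha_bar)·x0 + sqrt(1 − alpha_bar)·eps;
- the helper names are hypothetical.

Because eps is correlated across nearby time points, the noisy sample remains a smooth function even when the observation times are irregular.

```python
import numpy as np


def rbf_gram(t, lengthscale=0.1):
    """RBF covariance matrix between (possibly irregular) time points t."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def sample_gp_noise(t, rng, lengthscale=0.1, jitter=1e-6):
    """Draw eps ~ GP(0, k) at times t using a Cholesky factor of the Gram matrix."""
    K = rbf_gram(t, lengthscale) + jitter * np.eye(len(t))   # jitter keeps K positive definite
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(len(t))


def forward_noise(x0, t, alpha_bar, rng, lengthscale=0.1):
    """Noisy sample x_n = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, eps from a GP."""
    eps = sample_gp_noise(t, rng, lengthscale)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps


rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 20))      # irregular observation times
x0 = np.sin(2 * np.pi * t)
xn, eps = forward_noise(x0, t, 0.5, rng)
```

A denoising network would then be trained to predict eps (or the score) from xn, t and the diffusion step.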
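In neural networks, task-relevant information is represented jointly by groups of neurons. However, the specific way in which this information is distributed among individual neurons is not well understood: while parts of it can only be obtained from specific single neurons, other parts are carried redundantly or synergistically by multiple neurons. We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these different contributions. From this, we introduce the measure of "Representational Complexity", which quantifies the difficulty of accessing information spread across multiple neurons. We show how this complexity can be applied directly to smaller layers. For larger layers, we propose subsampling and coarse-graining procedures and prove corresponding bounds on the latter. Empirically, for quantized deep neural networks solving the MNIST task, we observe that representational complexity decreases both through successive hidden layers and over the course of training. Overall, we propose representational complexity as a principled and interpretable summary statistic for analyzing the structure of neural representations.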
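The textbook XOR example shows what "synergistic" information means. The target carries a full bit of information about the pair of neurons jointly, yet each neuron on its own carries none. This sketch only computes plain mutual information. A full PID needs an additional redundancy measure to split the joint information into redundant, unique and synergistic parts, and that measure is not shown here.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """I(A;B) in bits, treating each (a, b) pair in the list as equally likely."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum((c / n) * log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
               for (a, b), c in p_ab.items())


# Two binary "neurons" with uniformly random activity; the target is their XOR.
samples = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

i_x1 = mutual_information([(x1, y) for x1, _, y in samples])           # neuron 1 alone
i_x2 = mutual_information([(x2, y) for _, x2, y in samples])           # neuron 2 alone
i_joint = mutual_information([((x1, x2), y) for x1, x2, y in samples]) # both together
print(i_x1, i_x2, i_joint)  # 0.0 0.0 1.0
```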
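Background: Machine learning (ML) systems rely on data to make predictions, and compared with traditional software systems they have many additional components, such as data processing pipelines, serving pipelines, and model training. Existing research on software maintenance has studied issue-reporting needs and resolution processes for different types of issues, such as performance and security issues. However, ML systems have specific classes of faults, and reporting ML issues requires domain-specific information. Because of the differing characteristics of ML and traditional software engineering systems, we do not know to what extent the reporting needs differ, or to what extent these differences affect the issue resolution process. Objective: Our goal is to investigate whether the distribution of resolution time differs between ML and non-ML issues, and whether the resolution time differs for certain categories of ML issues. We further investigate the fix size of ML and non-ML issues. Method: We extract issue reports, pull requests, and code files from recently active applied ML projects on GitHub, and use an automatic approach to separate ML issues from non-ML issues. We manually label these issues using a known deep learning bug taxonomy. We measure the resolution time and fix size of ML and non-ML issues on a controlled sample and compare the distributions for each category.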
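The snippet below is a minimal sketch of the measurement step: computing resolution time per issue and summarizing it by category. Its assumptions are:

- each issue is a dict with `category`, `created_at` and `closed_at` fields, and the timestamps use the ISO 8601 form that GitHub's issue data provides;
- unresolved issues are skipped;
- the helper names are hypothetical.

Comparing the full distributions (for example with a rank-based test) would be the next step. The abstract does not say which test the study uses.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"   # e.g. "2022-01-01T00:00:00Z"


def resolution_days(created_at: str, closed_at: str) -> float:
    """Resolution time of one issue, in days."""
    delta = (datetime.strptime(closed_at, TIMESTAMP_FORMAT)
             - datetime.strptime(created_at, TIMESTAMP_FORMAT))
    return delta.total_seconds() / 86400


def median_resolution_by_category(issues):
    """Median resolution time (days) per category; open issues are ignored."""
    by_category = defaultdict(list)
    for issue in issues:
        if issue.get("closed_at"):   # unresolved issues have no resolution time
            by_category[issue["category"]].append(
                resolution_days(issue["created_at"], issue["closed_at"]))
    return {cat: median(days) for cat, days in by_category.items()}
```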
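Strategy training is a multidisciplinary rehabilitation approach that teaches skills to reduce disability among people with cognitive impairments after a stroke. In randomized controlled clinical trials, strategy training has been shown to be a more feasible and efficacious intervention for promoting independence than traditional rehabilitation approaches. A standardized fidelity assessment is used to measure adherence to treatment principles by examining guided and directed verbal cues in video recordings of rehabilitation sessions. Although the fidelity assessment for detecting guided and directed verbal cues is valid for single-site studies, it can become labor-intensive, time-consuming, and expensive in large multi-site pragmatic trials. To address this challenge to the widespread implementation of strategy training, we leveraged natural language processing (NLP) techniques to automate the strategy training fidelity assessment, i.e., to automatically identify guided and directed verbal cues in video recordings of rehabilitation sessions. We developed a rule-based NLP algorithm, a long short-term memory (LSTM) model, and a bidirectional encoder representations from transformers (BERT) model for this task. The BERT model achieved the best performance, with an F1 score of 0.8075. The findings of this study hold broad promise for psychology and rehabilitation intervention research and practice.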
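Below is a toy sketch of a rule-based cue classifier together with the per-class F1 score used to evaluate it. The patterns are hypothetical heuristics, not the study's rules. They assume that guided cues prompt the patient to work out the answer (often as questions) and that directed cues tell the patient what to do. The averaging scheme behind the reported F1 of 0.8075 is not stated in the abstract.

```python
import re

# Hypothetical heuristics (assumptions, not the study's rules).
GUIDED_PATTERNS = [r"\?\s*$", r"^(what|how|why|which)\b"]
DIRECTED_PATTERNS = [r"^(try|put|use|remember)\b", r"\byou (should|need to)\b"]


def classify_cue(utterance: str) -> str:
    """Label an utterance as 'guided', 'directed', or 'other'."""
    text = utterance.strip().lower()
    if any(re.search(p, text) for p in GUIDED_PATTERNS):
        return "guided"
    if any(re.search(p, text) for p in DIRECTED_PATTERNS):
        return "directed"
    return "other"


def f1_score(gold, pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```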
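IceCube is a cubic-kilometre array of optical sensors for detecting atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of IceCube data. Reconstructing and classifying events is challenging because of the detector geometry, the inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively small number of signal photons produced per event. To address this challenge, IceCube events can be represented as point-cloud graphs, with a graph neural network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range with the current state-of-the-art maximum likelihood techniques used in IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR) compared with current IceCube methods. Alternatively, at a fixed signal efficiency the GNN reduces the FPR by more than a factor of 8 (to below half a percent). For the reconstruction of energy, direction and interaction vertex, the resolution improves by 13%-20% on average compared with current maximum likelihood techniques. When run on a GPU, the GNN can process IceCube events at a rate approaching the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.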
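To treat an event as a point-cloud graph, each recorded pulse becomes a node and edges connect nearby pulses. A k-nearest-neighbour graph over sensor positions is one common way to build the edges. The sketch below is built on several assumptions:

- the edge construction uses positions only;
- the value of k is arbitrary;
- the helper name is hypothetical;
- in practice node features would also include pulse time and charge.

It is not necessarily the construction used in the study.

```python
import math


def knn_edges(hits, k=3):
    """Directed edges (i, j) linking each hit to its k nearest other hits.

    hits: list of (x, y, z) sensor positions for the pulses of one event.
    """
    edges = []
    for i, a in enumerate(hits):
        # Sort the other hits by Euclidean distance; ties break on index.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(hits) if j != i)
        edges.extend((i, j) for _, j in dists[:k])
    return edges


hits = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
print(knn_edges(hits, k=1))  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```

The resulting edge list, together with per-node features, is the input a GNN layer would pass messages over.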