Smart Paper Notes

Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning

Kaizhe Jin , Adrian Rubio-Solis , Ravik Nain , Tochukwu Onyeogulu , Amirul Islam , Salman Khan , Izzeddin Teeti

Categories: Machine Learning | Artificial Intelligence

2022-09-12

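Activities in the operating room (OR) usually differ from those in other typical working environments. In particular, surgeons are frequently subject to multiple psycho-organizational constraints that may negatively affect their health and performance. This is commonly attributed to an increase in the associated cognitive workload (CWL), which results from dealing with unexpected and repetitive tasks, large amounts of information, and potentially risky cognitive overload. This paper proposes two machine learning approaches for the multimodal recognition of CWL across four different surgical tasks. First, a model based on the concept of transfer learning is used to determine whether a surgeon is experiencing any CWL. Second, a convolutional neural network (CNN) uses this information to identify the different types of CWL associated with each surgical task. The proposed multimodal approach considers adjacent signals from electroencephalography (EEG), functional near-infrared spectroscopy (fNIRS), and pupil diameter. Concatenating the signals allows complex correlations to be captured in terms of both time (temporal) and channel location (spatial). Data were collected with the Multi-Sensing AI Environment for Surgical Task & Role Optimisation platform (Maestro) developed at Harms Lab. To compare the performance of the proposed method, a number of state-of-the-art machine learning techniques were also implemented. Tests show that the proposed model achieves a precision of 93%.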
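
The signal concatenation described in this abstract can be illustrated with a minimal sketch. It assumes each modality has already been resampled to a common number of time steps and is stored as a `(channels, time)` array. The function name, the channel counts, and the per-channel z-scoring are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def concatenate_modalities(eeg, fnirs, pupil):
    """Stack (channels, time) windows from each modality along the channel axis."""
    # Assumption: all modalities were resampled to the same number of time steps.
    n_steps = {x.shape[1] for x in (eeg, fnirs, pupil)}
    if len(n_steps) != 1:
        raise ValueError("all modalities must share the same number of time steps")

    def zscore(x):
        # Normalise each channel so modalities with different units are comparable.
        std = x.std(axis=1, keepdims=True)
        return (x - x.mean(axis=1, keepdims=True)) / np.where(std == 0, 1.0, std)

    return np.concatenate([zscore(eeg), zscore(fnirs), zscore(pupil)], axis=0)

rng = np.random.default_rng(0)
window = concatenate_modalities(
    rng.normal(size=(32, 256)),  # hypothetical 32 EEG channels
    rng.normal(size=(16, 256)),  # hypothetical 16 fNIRS channels
    rng.normal(size=(2, 256)),   # hypothetical left/right pupil diameter
)
print(window.shape)  # (50, 256)
```

The resulting 2D array keeps channel location on one axis and time on the other, which is the layout a 2D CNN would need in order to pick up spatial and temporal correlations jointly.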

Translated by Google Translate

Lumen Shape Reconstruction using a Soft Robotic Balloon Catheter and Electrical Impedance Tomography

James Avery , Mark Runciman , Cristina Fiani , Elena Monfort Sanchez , Saina Akhond , Zhuang Liu , Kirill Aristovich , George Mylonas

Categories: Robotics

2022-07-25

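Incorrectly sized balloon catheters can lead to increased post-surgical complications, yet even with preoperative imaging, correct selection remains a challenge. With limited feedback during surgery, it is difficult to verify correct deployment. We propose using integrated impedance measurements and electrical impedance tomography (EIT) imaging to assess the deformation of the balloon and determine the size and shape of the surrounding lumen. Previous work using single impedance measurements, or pressure data and analytical models, demonstrated high sizing accuracy but assumed a circular cross-section. Here we extend these methods by adding multiple electrodes to detect elliptical and occluded lumens and to obtain EIT images that localize deformations. Using a 14 Fr (5.3 mm) catheter as an example, numerical simulations were performed to find the optimal electrode configuration of two rings of 8 electrodes spaced 10 mm apart. The simulations predicted that the maximum detectable aspect ratio decreases from 0.9 for a 14 mm balloon to 0.5 at 30 mm. The sizing and ellipticity detection results were verified experimentally. A prototype robotic balloon catheter was constructed to automatically inflate a compliant balloon while simultaneously recording EIT and pressure data. Data were collected in experiments replicating stenotic vessels with elliptical and asymmetric profiles, and lumen enlargement during angioplasty. After calibration, the system was able to correctly localize the occlusion and detect aspect ratios of 0.75. EIT images further localized the occlusion and visualized the dilation of the lumen during balloon inflation.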

An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation

Kevin Moran , Ali Yachnes , George Purnell , Junayed Mahmud , Michele Tufano , Carlos Bernal-Cárdenas , Denys Poshyvanyk , Zach H'Doubler

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2023-01-03

Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer learned lessons and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.

Neural Point Catacaustics for Novel-View Synthesis of Reflections

Georgios Kopanas , Thomas Leimkühler , Gilles Rainer , Clément Jambon , George Drettakis

Categories: Computer Vision

2023-01-03

View-dependent effects such as reflections pose a substantial challenge for image-based and neural rendering algorithms. Above all, curved reflectors are particularly hard, as they lead to highly non-linear reflection flows as the camera moves. We introduce a new point-based representation to compute Neural Point Catacaustics allowing novel-view synthesis of scenes with curved reflectors, from a set of casually-captured input photos. At the core of our method is a neural warp field that models catacaustic trajectories of reflections, so complex specular effects can be rendered using efficient point splatting in conjunction with a neural renderer. One of our key contributions is the explicit representation of reflections with a reflection point cloud which is displaced by the neural warp field, and a primary point cloud which is optimized to represent the rest of the scene. After a short manual annotation step, our approach allows interactive high-quality renderings of novel views with accurate reflection flow. Additionally, the explicit representation of reflection flow supports several forms of scene manipulation in captured scenes, such as reflection editing, cloning of specular objects, reflection tracking across views, and comfortable stereo viewing. We provide the source code and other supplemental material on https://repo-sam.inria.fr/fungraph/neural_catacaustics/

Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning

Aritra Mitra , George J. Pappas , Hamed Hassani

Categories: Machine Learning | Artificial Intelligence

2023-01-03

In large-scale machine learning, recent works have studied the effects of compressing gradients in stochastic optimization in order to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in large-scale, multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? In this paper, we investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our main technical contribution is to show that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. We then extend our results significantly to nonlinear stochastic approximation algorithms and multi-agent settings. In particular, we prove that for multi-agent TD learning, one can achieve linear convergence speedups in the number of agents while communicating just $\tilde{O}(1)$ bits per agent at each time step. Our work is the first to provide finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our analysis hinges on studying the drift of a novel Lyapunov function that captures the dynamics of a memory variable introduced by error feedback.
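
To make the idea of a compressed TD update with error feedback concrete, here is a minimal sketch. It assumes TD(0) with linear function approximation, a top-k sparsifier as the compression operator, and a small hypothetical 3-state Markov reward process. The function names, step size, and example chain are illustrative choices, not the authors' exact algorithm or experimental setup.

```python
import numpy as np

def top_k(v, k):
    """Compression operator: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_td0(P, r, Phi, gamma=0.9, alpha=0.05, k=1, steps=20000, seed=0):
    """TD(0) with linear features, a compressed update, and error feedback."""
    rng = np.random.default_rng(seed)
    n_states, dim = Phi.shape
    theta = np.zeros(dim)
    memory = np.zeros(dim)  # accumulates whatever the compressor dropped so far
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n_states, p=P[s])  # Markovian sampling
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        g = td_error * Phi[s]                  # uncompressed TD direction
        h = top_k(memory + g, k)               # compress direction plus past error
        memory = memory + g - h                # store the new compression error
        theta = theta + alpha * h
        s = s_next
    return theta

# Hypothetical 3-state chain with 2-dimensional features.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])
theta = ef_td0(P, r, Phi, steps=5000)
print(theta)
```

The memory variable is what distinguishes error feedback from plain compression: coordinates that were dropped in one step are added back before the next compression, so no part of the update direction is discarded permanently.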

Robust Average-Reward Markov Decision Processes

Yue Wang , Alvaro Velasquez , George Atia , Ashley Prater-Bennette , Shaofeng Zou

Categories: Machine Learning | Artificial Intelligence

2023-01-02

In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor $\gamma$ goes to $1$, and moreover, when $\gamma$ is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
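
The discounted approximation mentioned in this abstract can be sketched with a small robust value iteration. This sketch assumes a finite uncertainty set of candidate kernels, with the worst case taken independently for each state-action pair. That is a simplifying assumption made here; the abstract does not specify the structure of the uncertainty set, and the function name and toy example are hypothetical.

```python
import numpy as np

def robust_value_iteration(kernels, reward, gamma, tol=1e-10, max_iter=100000):
    """Robust discounted value iteration.

    kernels: array (m, S, A, S) of m candidate transition kernels.
    reward:  array (S, A).
    """
    v = np.zeros(reward.shape[0])
    for _ in range(max_iter):
        expected = kernels @ v                     # shape (m, S, A)
        # Worst case over the uncertainty set, independently per (s, a).
        q = reward + gamma * expected.min(axis=0)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=1)

# Toy example: two states, one action. Under "stay" each state is absorbing;
# under "leak" state 0 moves to the zero-reward state 1.
stay = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
leak = np.array([[[0.0, 1.0]], [[0.0, 1.0]]])
kernels = np.stack([stay, leak])
reward = np.array([[1.0], [0.0]])
v, policy = robust_value_iteration(kernels, reward, gamma=0.9)
print(v)  # [1. 0.]
```

In this toy example the adversary always picks the leaking kernel in state 0, so the robust value there is just the immediate reward. Scaling by `1 - gamma` gives `0.1`, which shrinks toward the worst-case average reward of 0 as `gamma` approaches 1.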

Segmentation based tracking of cells in 2D+time microscopy images of macrophages

Seol Ah Park , Tamara Sipka , Zuzana Kriva , George Lutfalla , Mai Nguyen-Chi , Karol Mikula

Categories: Computer Vision

2023-01-02

The automated segmentation and tracking of macrophages during their migration are challenging tasks due to their dynamically changing shapes and motions. This paper proposes a new algorithm to achieve automatic cell tracking in time-lapse microscopy macrophage data. First, we design a segmentation method employing space-time filtering, local Otsu's thresholding, and the SUBSURF (subjective surface segmentation) method. Next, the partial trajectories for cells overlapping in the temporal direction are extracted in the segmented images. Finally, the extracted trajectories are linked by considering their direction of movement. The segmented images and the trajectories obtained with the proposed method are compared with those of semi-automatic segmentation and manual tracking. The proposed tracking achieved 97.4% accuracy on macrophage data under challenging conditions such as feeble fluorescence intensity, irregular shapes, and motion of macrophages. We expect that the automatically extracted trajectories of macrophages can provide evidence of how macrophages migrate depending on their polarization modes in situations such as wound healing.
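
As a rough illustration of the thresholding step, the sketch below implements the standard global Otsu criterion in NumPy. The abstract describes a local variant, which would presumably apply the same criterion inside image windows; that windowing, the function name, and the synthetic image are assumptions made here and are not taken from the paper.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the threshold that maximises the between-class variance."""
    hist, edges = np.histogram(image, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weight_bg = np.cumsum(hist)                # pixels at or below each bin
    weight_fg = weight_bg[-1] - weight_bg      # pixels above each bin
    cum_sum = np.cumsum(hist * centers)
    valid = (weight_bg > 0) & (weight_fg > 0)  # avoid empty classes
    mean_bg = cum_sum[valid] / weight_bg[valid]
    mean_fg = (cum_sum[-1] - cum_sum[valid]) / weight_fg[valid]
    between_var = weight_bg[valid] * weight_fg[valid] * (mean_bg - mean_fg) ** 2
    return centers[valid][np.argmax(between_var)]

# Synthetic frame: dark background with one bright 10x10 "cell".
image = np.full((64, 64), 0.1)
image[20:30, 20:30] = 0.9
t = otsu_threshold(image)
mask = image > t
print(mask.sum())  # 100
```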

On the Geometry of Reinforcement Learning in Continuous State and Action Spaces

Saket Tiwari , Omer Gottesman , George Konidaris

Categories: Machine Learning | Artificial Intelligence

2022-12-29

Advances in reinforcement learning have led to its successful application in complex tasks with continuous state and action spaces. Despite these advances in practice, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces by employing a geometric lens. Central to our work is the idea that the transition dynamics induce a low dimensional manifold of reachable states embedded in the high-dimensional nominal state space. We prove that, under certain conditions, the dimensionality of this manifold is at most the dimensionality of the action space plus one. This is the first result of its kind, linking the geometry of the state space to the dimensionality of the action space. We empirically corroborate this upper bound for four MuJoCo environments. We further demonstrate the applicability of our result by learning a policy in this low dimensional representation. To do so we introduce an algorithm that learns a mapping to a low dimensional representation, as a narrow hidden layer of a deep neural network, in tandem with the policy using DDPG. Our experiments show that a policy learnt this way performs on par or better for four MuJoCo control suite tasks.

Effects of Data Geometry in Early Deep Learning

Saket Tiwari , George Konidaris

Categories: Machine Learning | Artificial Intelligence

2022-12-29

Deep neural networks can approximate functions on different types of data, from images to graphs, with varied underlying structure. This underlying structure can be viewed as the geometry of the data manifold. By extending recent advances in the theoretical understanding of neural networks, we study how a randomly initialized neural network with piece-wise linear activation splits the data manifold into regions where the neural network behaves as a linear function. We derive bounds on the density of the boundaries of linear regions and the distance to these boundaries on the data manifold. This leads to insights into the expressivity of randomly initialized deep neural networks on non-Euclidean data sets. We empirically corroborate our theoretical results using a toy supervised learning problem. Our experiments demonstrate that the number of linear regions varies across manifolds and that the results hold with changing neural network architectures. We further demonstrate how the complexity of linear regions is different on the low dimensional manifold of images as compared to the Euclidean space, using the MetFaces dataset.
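
One simple way to probe the linear regions discussed here is to walk along a one-dimensional curve and count how often the ReLU activation pattern changes. The sketch below does this for a straight segment and a randomly initialized fully connected network. The sampling-based counting, the He-style initialization, and all names are assumptions for illustration, not the authors' procedure. Because it relies on sampling, the count is a lower-bound estimate of the number of regions crossed.

```python
import numpy as np

def count_linear_regions_on_segment(weights, biases, x0, x1, n_samples=10000):
    """Estimate how many linear regions the segment from x0 to x1 passes through."""
    ts = np.linspace(0.0, 1.0, n_samples)[:, None]
    h = x0 + ts * (x1 - x0)                    # sampled points, shape (n_samples, d)
    patterns = []
    for W, b in zip(weights, biases):
        pre = h @ W.T + b
        patterns.append(pre > 0)               # which ReLU units are active
        h = np.maximum(pre, 0.0)
    codes = np.concatenate(patterns, axis=1)   # full activation pattern per sample
    changes = np.any(codes[1:] != codes[:-1], axis=1)
    return 1 + int(changes.sum())

# Randomly initialized network with two hidden ReLU layers of width 16.
rng = np.random.default_rng(0)
sizes = [2, 16, 16]
weights = [rng.normal(scale=np.sqrt(2.0 / m), size=(n, m))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(scale=0.1, size=n) for n in sizes[1:]]
n_regions = count_linear_regions_on_segment(
    weights, biases, np.array([-1.0, 0.0]), np.array([1.0, 0.0]))
print(n_regions)
```

Replacing the straight segment with points sampled along a curve on a data manifold would give the analogous count on that manifold.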

Multi-Realism Image Compression with a Conditional Generator

Eirikur Agustsson , David Minnen , George Toderici , Fabian Mentzer

Categories: Computer Vision | Machine Learning

2022-12-28

By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to produce a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.
