Smart Paper Notes

Law to Binary Tree -- An Formal Interpretation of Legal Natural Language

Ha-Thanh Nguyen , Vu Tran , Ngoc-Cam Le , Thi-Thuy Le , Quang-Huy Nguyen , Le-Minh Nguyen , Ken Satoh

Category: Natural Language Processing

2022-12-16

Knowledge representation and reasoning in law are essential to automating legal analysis and decision-making tasks. In this paper, we propose a new approach based on legal science, specifically legal taxonomy, for representing and reasoning with legal documents. Our approach interprets the regulations in legal documents as binary trees, which helps legal reasoning systems make decisions and resolve logical contradictions. The advantages of this approach are twofold. First, legal reasoning can be performed directly on the binary tree representation of the regulations. Second, the binary tree representation is more understandable than existing sentence-based representations. We provide an example of how our approach can be used to interpret the regulations in a legal document.
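
One way to picture the binary-tree idea is a decision tree whose internal nodes ask a yes/no question about the facts of a case and whose leaves carry a legal conclusion. The sketch below is an illustration under that assumption only: the `Node` class, the `decide` function, and the toy rule about contracts signed by minors are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Internal nodes hold a yes/no condition; leaves hold a conclusion."""
    label: str
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.yes is None and self.no is None


def decide(node: Node, facts: dict) -> str:
    """Walk the tree, branching on the truth value of each condition."""
    while not node.is_leaf():
        # A condition missing from the facts is treated as false.
        node = node.yes if facts.get(node.label, False) else node.no
    return node.label


# Hypothetical toy rule: a contract signed by a minor is voidable
# unless a legal guardian consented.
rule = Node(
    "party_is_minor",
    yes=Node("guardian_consented", yes=Node("valid"), no=Node("voidable")),
    no=Node("valid"),
)

print(decide(rule, {"party_is_minor": True, "guardian_consented": False}))
```

Because every condition has exactly two outgoing branches, each set of facts follows a single path to a single conclusion, which is one plausible reading of why a tree form makes contradictions easier to spot.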

Translated by Google Translate

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

DeepAoANet: Learning Angle of Arrival from Software Defined Radios with Deep Neural Networks

Zhuangzhuang Dai , Yuhang He , Tran Vu , Niki Trigoni , Andrew Markham

Categories: Machine Learning | Robotics

2021-12-01

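Direction finding and positioning systems based on RF signals are significantly affected by multipath propagation, particularly in indoor environments. Existing algorithms (e.g., MUSIC) perform poorly at resolving the angle of arrival (AoA) in the presence of multipath or when operating in a weak-signal regime. We note that digitally sampled RF front ends allow easy analysis of signals and their delayed components. Low-cost software-defined radio (SDR) modules enable channel state information (CSI) extraction across a wide spectrum, motivating the design of an enhanced AoA solution. We propose a deep learning approach that derives the AoA from a single snapshot of SDR multichannel data. We compare and contrast deep-learning-based angle classification and regression models for accurately estimating up to two AoAs. We have implemented the inference engines on different platforms to extract AoAs in real time, demonstrating the computational tractability of our approach. To demonstrate the utility of our approach, we collected IQ (in-phase and quadrature component) samples from a four-element uniform linear array (ULA) in various line-of-sight (LOS) and non-line-of-sight (NLOS) environments, and published the dataset. Our proposed method demonstrates excellent reliability in determining the number of impinging signals and achieves mean absolute AoA errors of less than $2^{\circ}$.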
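
For context on what a single array snapshot contains, the sketch below recovers one AoA from the phase step between neighbouring ULA elements. It is a classical phase-difference baseline, not the paper's deep learning model, and it assumes one noise-free plane wave, half-wavelength element spacing, and a particular sign convention. The function names are hypothetical.

```python
import cmath
import math


def steering_sample(theta_deg: float, element: int, spacing_wl: float = 0.5) -> complex:
    """Noise-free IQ sample at one ULA element for a plane wave from theta."""
    phase = 2 * math.pi * spacing_wl * element * math.sin(math.radians(theta_deg))
    return cmath.exp(1j * phase)


def aoa_from_snapshot(snapshot: list, spacing_wl: float = 0.5) -> float:
    """Estimate one AoA (degrees) from the mean phase step between neighbours."""
    # Summing x[k+1] * conj(x[k]) averages the phase step over element pairs.
    acc = sum(b * a.conjugate() for a, b in zip(snapshot, snapshot[1:]))
    delta_phi = cmath.phase(acc)
    return math.degrees(math.asin(delta_phi / (2 * math.pi * spacing_wl)))


snapshot = [steering_sample(20.0, k) for k in range(4)]  # four-element ULA
print(round(aoa_from_snapshot(snapshot), 3))
```

With two overlapping paths the phase step no longer maps to a single angle, which is the multipath case where such a baseline breaks down and a learned model becomes attractive.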

Robust Multi-Robot Coverage of Unknown Environments using a Distributed Robot Swarm

Vu Phi Tran , Matthew A. Garratt , Kathryn Kasmarik , Sreenatha G. Anavatti

Category: Robotics

2021-11-29

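Area exploration and coverage are key capabilities in mobile robotics. Most available research assumes global, long-range communication and centralized cooperation. This paper proposes a novel swarm-based coverage control algorithm that relaxes these assumptions. The algorithm combines two elements: swarm rules and a frontier search algorithm. Inspired by natural systems made up of large numbers of simple agents (e.g., schooling fish, flocking birds, swarming insects), the first element uses three simple rules to maintain the swarm formation in a distributed manner. The second element provides a means of selecting promising regions to explore (and cover) by minimizing a cost function that involves the agents' relative positions. We tested the performance of our approach on heterogeneous and homogeneous groups of mobile robots in different environments. We measure both coverage performance and the swarm formation statistics that allow the group to maintain communication. Through a series of comparison experiments, we show that the proposed strategy outperforms recently proposed map coverage methods and the conventional artificial potential field approach in terms of the percentage of cell coverage, turnaround, and safe paths, while maintaining a formation that permits short-range communication.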
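
The frontier-search element can be illustrated with a small occupancy grid: a frontier cell is a known free cell next to unexplored space, and the robot heads for the frontier with the lowest cost. The sketch below is a minimal illustration, not the paper's algorithm. The cost used here is plain Euclidean distance from the robot, which is an assumption standing in for the paper's cost function, and all names are hypothetical.

```python
import math

UNKNOWN, FREE, OBSTACLE = -1, 0, 1


def frontier_cells(grid):
    """Return free cells that touch at least one unknown 4-neighbour."""
    rows, cols = len(grid), len(grid[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN
                   for nr, nc in neighbours):
                found.append((r, c))
    return found


def pick_frontier(grid, robot):
    """Choose the frontier cell with the lowest (assumed) distance cost."""
    return min(frontier_cells(grid), key=lambda cell: math.dist(cell, robot))


grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OBSTACLE, UNKNOWN],
    [FREE, FREE, FREE],
]
print(pick_frontier(grid, robot=(2, 0)))
```

In a swarm setting each robot would evaluate such a cost using only locally available information, which fits the abstract's aim of avoiding centralized cooperation.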

A New Look and Convergence Rate of Federated Multi-Task Learning with Laplacian Regularization

Canh T. Dinh , Tung T. Vu , Nguyen H. Tran , Minh N. Dao , Hongyu Zhang

Category: Machine Learning

2021-02-14

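Non-independent and identically distributed (non-IID) data across clients is regarded as the key factor that degrades the performance of federated learning (FL). Several approaches to handling non-IID data, such as personalized FL and federated multi-task learning (FMTL), are of great interest to the research community. In this work, we first formulate the FMTL problem using Laplacian regularization, which explicitly leverages the relationships among the clients' models for multi-task learning. We then introduce a new view of the FMTL problem, showing for the first time that the formulated FMTL problem can be used for both conventional FL and personalized FL. We also propose two algorithms, FedU and dFedU, to solve the formulated FMTL problem in communication-centralized and decentralized schemes, respectively. Theoretically, we prove that the convergence rates of both algorithms achieve linear speedup for strongly convex objectives and sublinear speedup for nonconvex objectives. Experimentally, we show that our algorithms outperform the conventional algorithm FedAvg in FL settings, MOCHA in FMTL settings, and pFedMe and Per-FedAvg in personalized FL settings.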
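
A Laplacian regularizer penalizes the distance between the models of clients that are linked in a relationship graph. The sketch below assumes the penalty has the common form (eta / 2) * sum over k, l of a_kl * ||w_k - w_l||^2; that exact form, the function names, and the toy two-client setup are assumptions for illustration, not the paper's FedU implementation.

```python
def laplacian_penalty(models, weights, eta):
    """(eta / 2) * sum over k, l of a_kl * ||w_k - w_l||^2."""
    total = 0.0
    for k, w_k in enumerate(models):
        for l, w_l in enumerate(models):
            sq_dist = sum((x - y) ** 2 for x, y in zip(w_k, w_l))
            total += weights[k][l] * sq_dist
    return 0.5 * eta * total


def regularization_step(models, weights, eta, lr):
    """One gradient-descent step on the penalty alone (local losses left out)."""
    updated = []
    for k, w_k in enumerate(models):
        grad = [0.0] * len(w_k)
        for l, w_l in enumerate(models):
            # Model k appears in the double sum once via a_kl and once via a_lk.
            coeff = eta * (weights[k][l] + weights[l][k])
            for i in range(len(w_k)):
                grad[i] += coeff * (w_k[i] - w_l[i])
        updated.append([x - lr * g for x, g in zip(w_k, grad)])
    return updated


models = [[0.0, 0.0], [2.0, 0.0]]   # two clients, two parameters each
weights = [[0.0, 1.0], [1.0, 0.0]]  # the clients are linked to each other
before = laplacian_penalty(models, weights, eta=0.5)
models = regularization_step(models, weights, eta=0.5, lr=0.1)
after = laplacian_penalty(models, weights, eta=0.5)
print(before, after)
```

Setting every weight to zero removes the coupling and leaves purely local training, while large weights pull all clients toward one shared model. That range is one intuition for how a single formulation could span both conventional and personalized FL.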

Noise-robust classification with hypergraph neural network

Nguyen Trinh Vu Dang , Loc Tran , Linh Tran

Categories: Machine Learning (Statistics) | Machine Learning

2021-02-03

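This paper presents a novel version of the hypergraph neural network method and uses it to solve the noisy label learning problem. First, we apply the PCA dimensionality reduction technique to the feature matrices of the image datasets in order to reduce the "noise" and redundant features in those matrices. Then, the classic graph-based semi-supervised learning method, the classic hypergraph-based semi-supervised learning method, the graph neural network, the hypergraph neural network, and our proposed hypergraph neural network are used to solve the noisy label learning problem. The accuracies of these five methods are evaluated and compared. Experimental results show that the hypergraph neural network methods achieve the best performance as the noise level increases. Moreover, the hypergraph neural network methods are at least as good as the graph neural network.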

Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong , Duc-Minh Luong , Xuan Bach D. Le , David Lo , Nhat-Hoa Tran , Bui Quang-Huy , Quyet-Thang Huynh

Category: Machine Learning

2023-01-03

In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. If our approach fails to identify an overfitting patch based on invariants, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is threefold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; instead, it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
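
The two invariant-based conditions can be sketched with plain set operations. The sketch below is a simplification, not INVALIDATOR's implementation: invariants are modelled as strings, "correct specifications" are assumed to be invariants shared by the buggy and developer-patched programs, and "erroneous behaviors" are assumed to be invariants that hold only on the buggy program. The function name and the example invariants are hypothetical.

```python
def is_overfitting(buggy: set, developer_fixed: set, apr_patched: set) -> bool:
    """Apply the two invariant-based overfitting conditions."""
    # Assumed definition: behavior that the developer's fix preserved.
    correct_specs = buggy & developer_fixed
    # Assumed definition: behavior that the developer's fix removed.
    error_behaviors = buggy - developer_fixed

    violates_spec = not correct_specs <= apr_patched  # condition (1)
    keeps_error = bool(error_behaviors & apr_patched)  # condition (2)
    return violates_spec or keeps_error


buggy = {"x >= 0", "result == null"}
developer_fixed = {"x >= 0", "result != null"}

plausible_patch = {"x >= 0", "result != null"}
keeps_bug_patch = {"x >= 0", "result == null"}
breaks_spec_patch = {"result != null"}

for patch in (plausible_patch, keeps_bug_patch, breaks_spec_patch):
    print(is_overfitting(buggy, developer_fixed, patch))
```

A patch that passes both checks is not proven correct. In the abstract's description, such undecided cases fall through to the learned syntax-based classifier.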

Neural Collapse in Deep Linear Network: From Balanced to Imbalanced Data

Hien Dang , Tan Nguyen , Tho Tran , Hung Tran , Nhat Ho

Categories: Machine Learning | Machine Learning (Statistics)

2023-01-01

Modern deep neural networks have achieved superhuman performance in tasks ranging from image classification to game play. Surprisingly, these varied and complex systems, with their massive numbers of parameters, exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. (2020). Recent papers have shown theoretically that the global solutions of the network training problem under a simplified "unconstrained feature model" exhibit this phenomenon. We take a step further and prove that Neural Collapse occurs in deep linear networks under the popular mean squared error (MSE) and cross-entropy (CE) losses. Furthermore, we extend our research to imbalanced data for the MSE loss and present the first geometric analysis of Neural Collapse in this setting.

A Machine Learning Case Study for AI-empowered echocardiography of Intensive Care Unit Patients in low- and middle-income countries

Xochicale Miguel , Thwaites Louise , Yacoub Sophie , Pisani Luigi , Tran Huy Nhat Phung , Kerdegari Hamideh , King Andrew , Gomez Alberto

Category: Machine Learning

2022-12-30

We present a Machine Learning (ML) case study to illustrate the challenges of clinical translation for a real-time AI-empowered echocardiography system with data from ICU patients in LMICs. The case study covers data preparation, curation, and labelling of 2D ultrasound videos from 31 ICU patients in LMICs, as well as model selection, validation, and deployment of three thinner neural networks to classify the apical four-chamber view (4CV). Results of the ML heuristics showed promising implementation, validation, and application of thinner networks to classify 4CV with limited datasets. We conclude by noting the need for (a) datasets with greater diversity of demographics and diseases, and (b) further investigation of thinner models that can run on low-cost hardware for clinical translation in ICUs in LMICs. The code and other resources to reproduce this work are available at https://github.com/vital-ultrasound/ai-assisted-echocardiography-for-low-resource-countries.

Proof of Swarm Based Ensemble Learning for Federated Learning Applications

Ali Raza , Kim Phuc Tran , Ludovic Koehl , Shujun Li

Categories: Machine Learning | Artificial Intelligence

2022-12-28

Ensemble learning combines results from multiple machine learning models in order to provide a better, optimised predictive model with reduced bias and variance and improved predictions. However, in federated learning it is not feasible to apply centralised ensemble learning directly because of privacy concerns. Hence, a mechanism is required to combine the results of local models to produce a global model. Most distributed consensus algorithms, such as Byzantine fault tolerance (BFT), do not normally perform well in such applications. This is because, in such methods, the predictions of some peers are disregarded, so a majority of peers can win without even considering the other peers' decisions. Additionally, the confidence score of each peer's result is not normally taken into account, although it is an important feature to consider for ensemble learning. Moreover, the problem of a tie event is often left unaddressed by methods such as BFT. To fill these research gaps, we propose PoSw (Proof of Swarm), a novel distributed consensus algorithm for ensemble learning in a federated setting, inspired by particle-swarm-based algorithms for solving optimisation problems. The proposed algorithm is theoretically proved to always converge in a relatively small number of steps and has mechanisms to resolve tie events while trying to achieve sub-optimum solutions. We experimentally validated the performance of the proposed algorithm using ECG classification as an example application in healthcare, showing that the ensemble learning model outperformed all local models and even the FL-based global model. To the best of our knowledge, the proposed algorithm is the first attempt to reach consensus over the output results of distributed models trained using federated learning.
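
The gaps named above (ignored confidence scores and unresolved ties) can be seen in a toy comparison of two aggregation rules. The sketch below is not PoSw and has nothing swarm-based in it. It contrasts a plain majority vote with a simple confidence-weighted vote, and the function names, tie-breaking rule, and peer predictions are all hypothetical.

```python
from collections import defaultdict


def majority_vote(predictions):
    """Count labels only; confidence is ignored and a tie returns None."""
    counts = defaultdict(int)
    for label, _confidence in predictions:
        counts[label] += 1
    best = max(counts.values())
    winners = [label for label, n in counts.items() if n == best]
    return winners[0] if len(winners) == 1 else None


def confidence_weighted_vote(predictions):
    """Sum confidence per label; break score ties by the most confident peer."""
    scores = defaultdict(float)
    top = defaultdict(float)
    for label, confidence in predictions:
        scores[label] += confidence
        top[label] = max(top[label], confidence)
    return max(scores, key=lambda label: (scores[label], top[label]))


# Hypothetical (label, confidence) outputs from four local ECG classifiers.
peers = [("normal", 0.55), ("abnormal", 0.95), ("normal", 0.51), ("abnormal", 0.90)]

print(majority_vote(peers))
print(confidence_weighted_vote(peers))
```

Here the label counts are tied, so the majority rule cannot decide, while the confidence-weighted rule favours the label backed by the more certain peers.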
