Smart Paper Notes

Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong , Duc-Minh Luong , Xuan Bach D. Le , David Lo , Nhat-Hoa Tran , Bui Quang-Huy , Quyet-Thang Huynh

Category: Machine Learning

2023-01-03

In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, and it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains error behaviors of the original buggy program. In case our approach fails to determine an overfitting patch based on invariants, INVALIDATOR utilizes a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated but instead relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experiment results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
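The two-stage decision described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: invariants are modeled as plain strings, the way `correct_spec` and `error_behaviors` are derived from the two invariant sets is an assumption made for this example, and `syntactic_says_overfitting` is a hypothetical stand-in for the trained syntax-based model.

```python
from typing import Callable, FrozenSet


def assess_patch(
    buggy_inv: FrozenSet[str],
    developer_inv: FrozenSet[str],
    patched_inv: FrozenSet[str],
    syntactic_says_overfitting: Callable[[], bool],
) -> str:
    """Return "overfitting" or "correct" for an APR-generated patch."""
    # Assumption: invariants shared by the buggy and developer-patched
    # programs approximate the correct specification.
    correct_spec = buggy_inv & developer_inv
    # Assumption: invariants that only the buggy program satisfies
    # approximate its error behaviors.
    error_behaviors = buggy_inv - developer_inv

    # Condition (1): the patch drops part of the correct specification.
    violates_spec = not correct_spec <= patched_inv
    # Condition (2): the patch still exhibits an error behavior.
    keeps_errors = bool(error_behaviors & patched_inv)
    if violates_spec or keeps_errors:
        return "overfitting"

    # Invariants were inconclusive, so fall back to the syntactic model.
    return "overfitting" if syntactic_says_overfitting() else "correct"


buggy = frozenset({"x > 0", "y == null"})
developer = frozenset({"x > 0", "y != null"})
candidate = frozenset({"x > 0", "y == null"})  # keeps the buggy invariant
print(assess_patch(buggy, developer, candidate, lambda: False))
```

The fallback is passed in as a callable so that the (potentially expensive) learned model runs only when the invariant checks cannot decide.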

Translated by Google Translate

VulCurator: A Vulnerability-Fixing Commit Detector

Truong Giang Nguyen , Thanh Le-Cong , Hong Jin Kang , Xuan-Bach D. Le , David Lo

Category: Artificial Intelligence

2022-09-07

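Open-source software (OSS) vulnerability management has become increasingly important as the number of discovered OSS vulnerabilities grows over time. Monitoring vulnerability-fixing commits is part of the standard process for preventing vulnerability exploitation. However, manually detecting vulnerability-fixing commits is time-consuming because of the potentially large number of commits to review. Recently, many techniques have been proposed to automatically detect vulnerability-fixing commits using machine learning. These solutions either (1) do not use deep learning or (2) use deep learning on only limited sources of information. This paper proposes VulCurator, a tool that leverages richer sources of information, including commit messages, code changes, and issue reports, for vulnerability-fixing commit classification. Our experimental results show that VulCurator outperforms the state-of-the-art baselines in terms of F1-score. The VulCurator tool is publicly available at https://github.com/ntgiang71096/vfdetector and https://zenodo.org/record/7034132#.yw3mn-xbzdi.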


AutoPruner: Transformer-Based Call Graph Pruning

Thanh Le-Cong , Hong Jin Kang , Truong Giang Nguyen , Stefanus Agus Haryono , David Lo , Xuan-Bach D. Le , Huynh Quyet Thang

Category: Artificial Intelligence

2022-09-07

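Constructing a static call graph requires trade-offs between soundness and precision. Unfortunately, program analysis techniques for constructing call graphs are usually imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture information from the call graph by extracting structural features for use in a random forest classifier. It then removes the edges that are predicted to be false positives. Although machine learning models have shown improvements, they remain limited because they do not consider source code semantics and thus often cannot effectively distinguish true positives from false positives. In this paper, we present a novel call graph pruning technique, AutoPruner, for eliminating false positives in call graphs via both statistical semantic and structural analysis. Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus to represent source code based on descriptions of its semantics. Next, the model is used to extract semantic features from the functions related to each edge in the call graph. AutoPruner uses these semantic features together with the structural features extracted from the call graph to classify each edge via a feed-forward neural network. Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines, improving F-measure by up to 13% in identifying false-positive edges in a static call graph.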


SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Bao Hieu Tran , Thanh Le-Cong , Huu Manh Nguyen , Duc Anh Le , Thanh Hung Nguyen , Phi Le Nguyen

Category: Computer Vision | Machine Learning

2022-01-01

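In the last decades, scene text recognition has gained worldwide attention from both the academic community and actual users because of its importance in a wide range of applications. Despite achievements in optical character recognition, scene text recognition remains challenging due to inherent problems such as distortions or irregular layouts. Most existing approaches mainly leverage recurrent or convolutional neural networks. However, while recurrent neural networks (RNNs) usually suffer from slow training speed due to sequential computation and encounter problems such as vanishing gradients or bottlenecks, CNNs face a trade-off between complexity and performance. In this paper, we introduce SAFL, a self-attention-based neural network model with focal loss for scene text recognition, to overcome the limitations of existing approaches. Using focal loss instead of negative log-likelihood helps the model focus more on training with low-frequency samples. Moreover, to deal with distorted and irregular text, we exploit a Spatial Transformer Network (STN) to rectify the text before passing it to the recognition network. We perform experiments to compare the performance of the proposed model with seven benchmarks. The numerical results show that our model achieves the best performance.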
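To make the contrast between focal loss and negative log-likelihood concrete, here is a minimal sketch of the standard focal loss for a single prediction, FL(p) = -(1 - p)^gamma * log(p), where p is the probability the model assigns to the true class. The value `gamma=2.0` is an assumption chosen for illustration; the abstract does not state the setting used in SAFL.

```python
import math


def negative_log_likelihood(p_true: float) -> float:
    """Standard NLL for the probability assigned to the true class."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss: NLL scaled by (1 - p_true) ** gamma.

    gamma is a hypothetical setting here; gamma = 0 recovers plain NLL.
    """
    return ((1.0 - p_true) ** gamma) * negative_log_likelihood(p_true)


# A confidently correct ("easy") prediction versus a poorly predicted one.
for p in (0.9, 0.1):
    print(p, negative_log_likelihood(p), focal_loss(p))
```

The modulating factor shrinks the loss of well-classified samples far more than that of poorly classified ones, so rare or hard samples contribute a larger share of the gradient.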


Usability and Aesthetics: Better Together for Automated Repair of Web Pages

Thanh Le-Cong , Xuan Bach D. Le , Quyet-Thang Huynh , Phi-Le Nguyen

Category: Neural and Evolutionary Computing

2022-01-01

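With the recent explosive growth of mobile devices such as smartphones and tablets, guaranteeing a consistent web page appearance across all environments has become a significant problem. This is simply because it is hard to keep track of how a web page looks on the different sizes and types of devices that render it. Therefore, fixing the inconsistent appearance of web pages can be difficult, and the cost incurred can be huge, for example, poor user experience and the resulting financial loss. Recently, automated web repair techniques have been proposed to automatically resolve inconsistent web page appearance, focusing on improving usability. However, the generated patches tend to disrupt the layout of the web page, rendering the repaired page aesthetically unpleasing, for example, with distorted images or misaligned components. In this paper, we propose an automated repair approach for web pages based on meta-heuristic algorithms that can assure both usability and aesthetics. The key novelty that empowers our approach is a novel fitness function that allows us to optimistically evolve buggy web pages to find the best solution that optimizes usability and aesthetics at the same time. Empirical evaluations show that our approach is able to successfully resolve mobile-friendliness problems in 94% of the evaluation subjects, significantly outperforming state-of-the-art baseline techniques in terms of both usability and aesthetics.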


Toward the Analysis of Graph Neural Networks

Thanh-Dat Nguyen , Thanh Le-Cong , ThanhVu H. Nguyen , Xuan-Bach D. Le , Quyet-Thang Huynh

Category: Machine Learning

2022-01-01

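Graph Neural Networks (GNNs) have recently emerged as a powerful framework for graph-structured data. They have been applied to many problems such as knowledge graph analysis, social network recommendation, and even Covid19 detection and vaccine development. However, unlike for other deep neural networks such as Feed-Forward Neural Networks (FFNNs), few analyses such as verification and property inference exist for GNNs, potentially because of their dynamic behavior: a GNN can take arbitrary graphs as input, whereas an FFNN takes only fixed-size numerical vectors as input. This paper proposes an approach to analyzing GNNs by converting them into FFNNs and reusing existing FFNN analyses. We discuss various designs to ensure the scalability and accuracy of the conversion. We illustrate our method on a case study of node classification. We believe that our method opens new research directions for understanding and analyzing GNNs.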


An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects

Thanh Vinh Vo , Arnab Bhattacharyya , Young Lee , Tze-Yun Leong

Category: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2023-01-01

We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients and hence requires no prior information about the similarity measures. The heterogeneous causal effects can be estimated without sharing the raw training data among the sources, thus minimizing the risk of privacy leaks. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
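As background for the Random Fourier Features mentioned in the abstract above, the sketch below shows the generic technique only: approximating a Gaussian (RBF) kernel by an inner product of randomized cosine features. It does not reproduce the paper's adaptive transfer algorithm, and the function names, bandwidth `sigma`, and feature count are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def make_rff(dim: int, n_features: int, sigma: float,
             seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """Build a random feature map z with z(x) . z(y) ~= exp(-|x-y|^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 1/sigma^2), phases from U(0, 2*pi).
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x: Sequence[float]) -> List[float]:
        return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return features


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Exact Gaussian kernel, used here only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


phi = make_rff(dim=2, n_features=5000, sigma=1.0)
x, y = [0.3, -0.2], [0.1, 0.4]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
print(approx, rbf_kernel(x, y, 1.0))
```

Because the kernel becomes an ordinary inner product of finite feature vectors, a loss written in terms of the kernel can be expressed as a sum over explicit features, which is the property that makes per-source decomposition plausible.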


Multisensor Data Fusion for Reliable Obstacle Avoidance

Thanh Nguyen Canh , Truong Son Nguyen , Cong Hoang Quach , Xiem HoangVan , Manh Duong Phung

Category: Robotics

2022-12-26

In this work, we propose a new approach that combines data from multiple sensors for reliable obstacle avoidance. The sensors include two depth cameras and a LiDAR arranged so that they can capture the whole 3D area in front of the robot and a 2D slide around it. To fuse the data from these sensors, we first use an external camera as a reference to combine data from two depth cameras. A projection technique is then introduced to convert the 3D point cloud data of the cameras to its 2D correspondence. An obstacle avoidance algorithm is then developed based on the dynamic window approach. A number of experiments have been conducted to evaluate our proposed approach. The results show that the robot can effectively avoid static and dynamic obstacles of different shapes and sizes in different environments.
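One simple way to convert a 3D point cloud into a 2D representation is to keep, for each bearing around the robot, the range of the nearest point within a height band. The sketch below illustrates that idea only; the abstract does not specify the projection the authors use, and the coordinate frame (x forward, y left, z up), the height limits, and the bin count are all assumptions.

```python
import math
from typing import Iterable, List, Tuple

Point3D = Tuple[float, float, float]


def project_to_scan(points: Iterable[Point3D], n_bins: int = 360,
                    z_min: float = 0.0, z_max: float = 1.0) -> List[float]:
    """Collapse 3D points into a planar scan of nearest ranges per bearing bin.

    Bins with no point keep the value math.inf (no obstacle seen).
    """
    scan = [math.inf] * n_bins
    for x, y, z in points:
        # Ignore points outside the (assumed) height band of the robot body.
        if not z_min <= z <= z_max:
            continue
        distance = math.hypot(x, y)
        bearing = math.atan2(y, x) % (2.0 * math.pi)
        index = int(bearing / (2.0 * math.pi) * n_bins) % n_bins
        # Keep only the closest obstacle seen in this direction.
        scan[index] = min(scan[index], distance)
    return scan


cloud = [(1.0, 1.0, 0.5), (0.5, 0.5, 0.2), (-1.0, 1.0, 0.5), (1.0, 1.0, 5.0)]
print(project_to_scan(cloud, n_bins=4))
```

A scan in this form can be merged with 2D LiDAR ranges by taking the minimum per bin, which keeps the nearest obstacle reported by any sensor.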


Front-door Adjustment via Style Transfer for Out-of-distribution Generalisation

Toan Nguyen , Kien Do , Duc Thanh Nguyen , Bao Duong , Thin Nguyen

Category: Computer Vision | Artificial Intelligence

2022-12-06

Out-of-distribution (OOD) generalisation aims to build a model that can generalise its learnt knowledge well from source domains to an unseen target domain. However, current image classification models often perform poorly in the OOD setting because they learn spurious statistical correlations during training. From a causality-based perspective, we formulate the data generation process in OOD image classification using a causal graph. On this graph, we show that the prediction P(Y|X) of a label Y given an image X in statistical learning is formed by both the causal effect P(Y|do(X)) and spurious effects caused by confounding features (e.g., background). Since the spurious features are domain-variant, the prediction P(Y|X) becomes unstable on unseen domains. In this paper, we propose to mitigate the spurious effect of confounders using front-door adjustment. In our method, the mediator variable is hypothesized to be the semantic features that are essential for determining the label of an image. Inspired by the capability of style transfer in image generation, we interpret the combination of the mediator variable with different generated images in the front-door formula and propose novel algorithms to estimate it. Extensive experimental results on widely used benchmark datasets verify the effectiveness of our method.
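For reference, the standard front-door adjustment formula that the abstract above refers to is given below for discrete variables. The symbol Z for the mediator and x' for the summation variable over images are notation chosen here; the abstract itself does not name them.

```latex
P\bigl(Y \mid do(X = x)\bigr)
  = \sum_{z} P(Z = z \mid X = x)
    \sum_{x'} P\bigl(Y \mid X = x', Z = z\bigr)\, P(X = x')
```

The inner sum averages the label prediction over other images x' that share the same mediator value z, which is the term the abstract describes as combining the mediator variable with different generated images.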


QC-StyleGAN -- Quality Controllable Image Generation and Manipulation

Dat Viet Thanh Nguyen , Phong Tran The , Tan M. Dinh , Cuong Pham , Anh Tuan Tran

Category: Computer Vision | Artificial Intelligence

2022-12-02

The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
