Smart Paper Notes

FedLesScan: Mitigating Stragglers in Serverless Federated Learning

Mohamed Elzohairy , Mohak Chadha , Anshul Jindal , Andreas Grafberger , Jianfeng Gu , Michael Gerndt , Osama Abboud

Category: Machine Learning

2022-11-10

Federated Learning (FL) is a machine learning paradigm that enables the training of a shared global model across distributed clients while keeping the training data local. While most prior work on designing systems for FL has focused on using stateful, always-running components, recent work has shown that components in an FL system can greatly benefit from the usage of serverless computing and Function-as-a-Service technologies. To this end, distributed training of models with serverless FL systems can be more resource-efficient and cheaper than conventional FL systems. However, serverless FL systems still suffer from the presence of stragglers, i.e., slow clients due to their resource and statistical heterogeneity. While several strategies have been proposed for mitigating stragglers in FL, most methodologies do not account for the particular characteristics of serverless environments, i.e., cold-starts, performance variations, and the ephemeral stateless nature of the function instances. Towards this, we propose FedLesScan, a novel clustering-based semi-asynchronous training strategy, specifically tailored for serverless FL. FedLesScan dynamically adapts to the behaviour of clients and minimizes the effect of stragglers on the overall system. We implement our strategy by extending an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate our strategy using the 2nd generation Google Cloud Functions with four datasets and varying percentages of stragglers. Results from our experiments show that, compared to other approaches, FedLesScan reduces training time and cost by an average of 8% and 20%, respectively, while utilizing clients better, with an average increase in the effective update ratio of 17.75%.

translated by Google Translate

Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing

Robert Tinn , Hao Cheng , Yu Gu , Naoto Usuyama , Xiaodong Liu , Tristan Naumann , Jianfeng Gao , Hoifung Poon

Category: Natural Language Processing | Machine Learning

2021-12-15

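Motivation: A perennial challenge for biomedical researchers and clinical practitioners is keeping up with the rapid growth of publications and medical notes. Natural language processing (NLP) has emerged as a promising direction for taming information overload. In particular, large neural language models facilitate transfer learning by pretraining on unlabeled text, as exemplified by the success of BERT models in various NLP applications. However, fine-tuning such models for an end task remains challenging, especially with small labeled datasets, which are common in biomedical NLP. Results: We conduct a systematic study of fine-tuning stability in biomedical NLP. We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains. Large models have the potential to attain better performance, but increasing model size also exacerbates fine-tuning instability. We therefore conduct a comprehensive exploration of techniques for addressing fine-tuning instability. We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications. Specifically, freezing lower layers is helpful for standard BERT-BASE models, while layerwise decay is more effective for BERT-LARGE and ELECTRA models. For low-resource text similarity tasks such as BIOSSES, reinitializing the top layer is the optimal strategy. Overall, domain-specific vocabulary and pretraining facilitate more robust models for fine-tuning. Based on these findings, we establish a new state of the art on a wide range of biomedical NLP applications. Availability and implementation: To facilitate progress in biomedical NLP, we release our state-of-the-art pretrained and fine-tuned models: https://aka.ms/blurb.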
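
Two of the stabilization techniques this abstract names, freezing lower layers and layerwise learning-rate decay, can be illustrated with a minimal sketch. The function names, the 12-layer depth, the base learning rate of 2e-5, and the decay factor of 0.9 are illustrative assumptions, not settings taken from the paper, and modelling a frozen layer as a learning rate of zero is a simplification.

```python
def layerwise_learning_rates(num_layers, base_lr, decay):
    """Return one learning rate per layer, index 0 being the lowest layer.

    The top layer keeps base_lr; each layer below it is scaled by
    `decay` one additional time (hypothetical helper for illustration).
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def freeze_lower_layers(rates, num_frozen):
    """Set the learning rate of the lowest `num_frozen` layers to zero."""
    return [0.0 if i < num_frozen else lr for i, lr in enumerate(rates)]


# Assumed example values: 12 encoder layers, base rate 2e-5, decay 0.9.
rates = layerwise_learning_rates(num_layers=12, base_lr=2e-5, decay=0.9)
frozen = freeze_lower_layers(rates, num_frozen=4)
```

In a training framework, such per-layer rates would typically be passed to the optimizer as separate parameter groups; which layers to freeze and how strongly to decay are choices the abstract does not specify.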

FedLess: Secure and Scalable Federated Learning Using Serverless Computing

Andreas Grafberger , Mohak Chadha , Anshul Jindal , Jianfeng Gu , Michael Gerndt

Category: Machine Learning

2021-11-05

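The traditional approach to deep learning (DL) requires training data to be collected and processed at a central server, which is often challenging in privacy-sensitive domains such as healthcare. To address this, a new learning paradigm called Federated Learning (FL) has been proposed that brings the potential of DL to these domains while addressing privacy and data ownership issues. FL enables remote clients to learn a shared ML model while keeping the data local. However, conventional FL systems face several challenges, such as scalability, complex infrastructure management, and wasted compute and incurred costs due to idle clients. These challenges of FL systems closely align with the core problems that serverless computing and Function-as-a-Service (FaaS) platforms aim to solve. These include rapid scalability, no infrastructure management, automatic scaling to zero for idle clients, and a pay-per-use billing model. To this end, we present a novel system and framework for serverless FL, called FedLess. Our system supports multiple commercial and self-hosted FaaS providers and can be deployed in the cloud, on-premise in institutional data centers, and on edge devices. To the best of our knowledge, we are the first to enable FL across a large fabric of heterogeneous FaaS providers while providing important features such as security and differential privacy. We demonstrate with comprehensive experiments that models for different tasks can easily be trained successfully across up to 200 client functions and more using our system. Furthermore, we demonstrate the practical viability of our approach by comparing it against a traditional FL system and show that it can be cheaper and more resource-efficient.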

A Comparative Study of Image Disguising Methods for Confidential Outsourced Learning

Sagar Sharma , Yuechun Gu , Keke Chen

Category: Machine Learning

2022-12-31

Large training data and expensive model tweaking are standard features of deep learning for images. As a result, data owners often utilize cloud resources to develop large-scale complex models, which raises privacy concerns. Existing solutions are either too expensive to be practical or do not sufficiently protect the confidentiality of data and models. In this paper, we study and compare novel image disguising mechanisms, DisguisedNets and InstaHide, aiming to achieve a better trade-off among the level of protection for outsourced DNN model training, the expenses, and the utility of data. DisguisedNets are novel combinations of image blocktization, block-level random permutation, and two block-level secure transformations: random multidimensional projection (RMT) and AES pixel-level encryption (AES). InstaHide is an image mixup and random pixel flipping technique (Huang et al., 2020). We have analyzed and evaluated them under a multi-level threat model. RMT provides a better security guarantee than InstaHide under the Level-1 adversarial knowledge, with well-preserved model quality. In contrast, AES provides a security guarantee under the Level-2 adversarial knowledge, but it may affect model quality more. The unique features of image disguising also help us to protect models from model-targeted attacks. We have done an extensive experimental evaluation to understand how these methods work in different settings for different datasets.
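
The first two ingredients the abstract lists for DisguisedNets, image blocktization and block-level random permutation, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the image is a plain 2D list of pixel values, and a seeded `random.Random` stands in for the secret key, which is not cryptographically secure. The RMT and AES block transformations are not shown.

```python
import random


def split_into_blocks(image, block):
    """Split a 2D list (H x W) into block x block tiles in row-major order."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    return [
        [row[c:c + block] for row in image[r:r + block]]
        for r in range(0, h, block)
        for c in range(0, w, block)
    ]


def join_blocks(blocks, width, block):
    """Reassemble row-major tiles into a 2D list of the given width."""
    per_row = width // block
    image = []
    for start in range(0, len(blocks), per_row):
        band = blocks[start:start + per_row]
        for y in range(block):
            image.append([px for tile in band for px in tile[y]])
    return image


def permute_blocks(image, block, key):
    """Shuffle the tiles with a key-seeded permutation; return image and order."""
    blocks = split_into_blocks(image, block)
    order = list(range(len(blocks)))
    random.Random(key).shuffle(order)
    shuffled = [blocks[i] for i in order]
    return join_blocks(shuffled, len(image[0]), block), order


def restore_blocks(disguised, block, order):
    """Invert permute_blocks given the same permutation order."""
    shuffled = split_into_blocks(disguised, block)
    blocks = [None] * len(order)
    for position, source in enumerate(order):
        blocks[source] = shuffled[position]
    return join_blocks(blocks, len(disguised[0]), block)


# Assumed toy input: a 4 x 4 image with distinct pixel values, 2 x 2 blocks.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
disguised, order = permute_blocks(image, block=2, key=2022)
```

Permutation alone keeps every pixel value intact and only moves tiles around, which is why the abstract pairs it with a block-level transformation.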

Translating Text Synopses to Video Storyboards

Xu Gu , Yuchong Sun , Feiyue Ni , Shizhe Chen , Ruihua Song , Boyuan Li , Xiang Cao

Category: Computer Vision

2022-12-31

A storyboard is a roadmap for video creation which consists of shot-by-shot images to visualize key plots in a text synopsis. Creating video storyboards, however, remains challenging: it requires not only associating high-level text with images but also long-term reasoning to make transitions smooth across shots. In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS), which aims to retrieve an ordered sequence of images to visualize the text synopsis. We construct a MovieNet-TeViS benchmark based on the public MovieNet dataset. It contains 10K text synopses, each paired with keyframes that are manually selected from the corresponding movies by considering both relevance and cinematic coherence. We also present an encoder-decoder baseline for the task. The model uses a pretrained vision-and-language model to improve high-level text-image matching. To improve coherence in long-term shots, we further propose to pre-train the decoder on large-scale movie frames without text. Experimental results demonstrate that our proposed model significantly outperforms other models in creating text-relevant and coherent storyboards. Nevertheless, there is still a large gap compared to human performance, suggesting room for promising future work.

Pontryagin Optimal Controller via Neural Networks

Chengyang Gu , Yize Chen

Category: Machine Learning

2022-12-30

Solving real-world optimal control problems is a challenging task, as the system dynamics can be highly non-linear or include nonconvex objectives and constraints, while in some cases the dynamics are unknown, making it hard to numerically solve for the optimal control actions. To deal with such modeling and computation challenges, in this paper, we integrate Neural Networks with Pontryagin's Minimum Principle (PMP) and propose a computationally efficient framework, NN-PMP. The resulting controller can be implemented for systems with unknown and complex dynamics. It can not only utilize the accurate surrogate models parameterized by neural networks, but also efficiently recover the optimality conditions along with the optimal action sequences via PMP conditions. A toy example on a nonlinear Martian Base operation along with a real-world lossy energy storage arbitrage example demonstrates that our proposed NN-PMP is a general and versatile computation tool for finding optimal solutions. Compared with solutions provided by the numerical optimization solver with approximated linear dynamics, NN-PMP achieves more efficient system modeling and higher performance in terms of control objectives.
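
For reference, the PMP conditions the abstract relies on can be written in their standard continuous-time form for a fixed horizon $T$ and free terminal state. The notation (state $x$, control $u$, costate $\lambda$, dynamics $f$, running cost $\ell$, terminal cost $\phi$) is assumed here and not taken from the paper, which may use a discrete-time variant; in the setting the abstract describes, $f$ would be the neural-network surrogate of the dynamics.

```latex
\begin{aligned}
&\min_{u(\cdot)} \; \int_0^T \ell\bigl(x(t), u(t)\bigr)\,dt + \phi\bigl(x(T)\bigr)
  \quad \text{s.t.} \quad \dot{x}(t) = f\bigl(x(t), u(t)\bigr), \; x(0) = x_0, \\
&H(x, u, \lambda) = \ell(x, u) + \lambda^{\top} f(x, u), \\
&\dot{x}^{*}(t) = \frac{\partial H}{\partial \lambda} = f\bigl(x^{*}(t), u^{*}(t)\bigr), \\
&\dot{\lambda}^{*}(t) = -\frac{\partial H}{\partial x}\bigl(x^{*}(t), u^{*}(t), \lambda^{*}(t)\bigr),
  \qquad \lambda^{*}(T) = \frac{\partial \phi}{\partial x}\bigl(x^{*}(T)\bigr), \\
&u^{*}(t) = \arg\min_{u} \, H\bigl(x^{*}(t), u, \lambda^{*}(t)\bigr).
\end{aligned}
```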

NeMo: 3D Neural Motion Fields from Multiple Video Instances of the Same Action

Kuan-Chieh Wang , Zhenzhen Weng , Maria Xenochristou , Joao Pedro Araujo , Jeffrey Gu , C. Karen Liu , Serena Yeung

Category: Computer Vision

2022-12-28

The task of reconstructing 3D human motion has wide-ranging applications. The gold standard motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware, and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing the multi-view MoCap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion, thus making exciting applications like motion analysis and motion-driven animation accessible to the general public. However, the performance of existing HMR methods degrades when the video contains challenging and dynamic motion that is not in the existing MoCap datasets used for training. This reduces their appeal, as dynamic motion is frequently the target in 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field. It is optimized to represent the underlying 3D motions across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo using 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action, and show that NeMo achieves better 3D reconstruction compared to various baselines.

Quality at the Tail

Zhengxin Yang , Wanling Gao , Chunjie Luo , Lei Wang , Jianfeng Zhan

Category: Machine Learning | Artificial Intelligence | Computer Vision

2022-12-25

Practical applications employing deep learning must guarantee inference quality. However, we found that the inference quality of state-of-the-art and state-of-the-practice in practical applications has a long tail distribution. In the real world, many tasks have strict requirements for the quality of deep learning inference, such as safety-critical and mission-critical tasks. The fluctuation of inference quality seriously affects its practical applications, and the quality at the tail may lead to severe consequences. State-of-the-art and state-of-the-practice with outstanding inference quality designed and trained under loose constraints still have poor inference quality under constraints with practical application significance. On the one hand, the neural network models must be deployed on complex systems with limited resources. On the other hand, safety-critical and mission-critical tasks need to meet more metric constraints while ensuring high inference quality. We coin a new term, "tail quality," to characterize this essential requirement and challenge. We also propose a new metric, "X-Critical-Quality," to measure the inference quality under certain constraints. This article reveals factors contributing to the failure of using state-of-the-art and state-of-the-practice algorithms and systems in real scenarios. Therefore, we call for establishing innovative methodologies and tools to tackle this enormous challenge.

Do DALL-E and Flamingo Understand Each Other?

Hang Li , Jindong Gu , Rajat Koner , Sahand Sharifzadeh , Volker Tresp

Category: Computer Vision | Machine Learning

2022-12-23

A major goal of multimodal research is to improve machine understanding of images and text. Tasks include image captioning, text-to-image generation, and vision-language representation learning. So far, research has focused on the relationships between images and text. For example, captioning models attempt to understand the semantics of images, which are then transformed into text. An important question is: which annotation best reflects a deep understanding of image content? Similarly, given a text, what is the best image that can present the semantics of the text? In this work, we argue that the best text or caption for a given image is the text that would generate the image most similar to that image. Likewise, the best image for a given text is the image that results in the caption best aligned with the original text. To this end, we propose a unified framework that includes both a text-to-image generative model and an image-to-text generative model. Extensive experiments validate our approach.
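
The two selection criteria stated in this abstract can be written compactly. The symbols are assumptions introduced here for illustration and do not come from the paper: $G$ is the text-to-image model, $C$ the image-to-text (captioning) model, and $s_{\mathrm{img}}$, $s_{\mathrm{txt}}$ are unspecified similarity measures between images and between texts.

```latex
\begin{aligned}
t^{*}(x) &= \arg\max_{t} \; s_{\mathrm{img}}\bigl(G(t),\, x\bigr)
  && \text{(best caption for image } x\text{)}, \\
x^{*}(t) &= \arg\max_{x} \; s_{\mathrm{txt}}\bigl(C(x),\, t\bigr)
  && \text{(best image for text } t\text{)}.
\end{aligned}
```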

GAN-based Domain Inference Attack

Yuechun Gu , Keke Chen

Category: Machine Learning | Artificial Intelligence

2022-12-22

Model-based attacks can infer training data information from deep neural network models. These attacks heavily depend on the attacker's knowledge of the application domain, e.g., using it to determine the auxiliary data for model-inversion attacks. However, attackers may not know what the model is used for in practice. We propose a generative adversarial network (GAN) based method to explore likely or similar domains of a target model -- the model domain inference (MDI) attack. For a given target (classification) model, we assume that the attacker knows nothing but the input and output formats and can use the model to derive the prediction for any input in the desired form. Our basic idea is to use the target model to affect a GAN training process for a candidate domain's dataset that is easy to obtain. We find that the target model may distract the training procedure less if the domain is more similar to the target domain. We then measure the distraction level with the distance between GAN-generated datasets, which can be used to rank candidate domains for the target model. Our experiments show that the auxiliary dataset from an MDI top-ranked domain can effectively boost the result of model-inversion attacks.
