Vanilla联合学习(FL)依赖于集中的全球聚合机制,并假设所有客户都是诚实的。这使得FL减轻单一失败和不诚实客户的挑战。由于FL和区块链的好处(例如,民主,激励性和不变性),FL的设计理念中的这些即将到来的挑战呼吁基于区块链的联邦学习(BFL)。但是,香草BFL中的一个问题是,它的功能不会以动态的方式遵循采用者的需求。此外,Vanilla BFL依赖于无法验证的客户的自我报告的贡献,例如数据大小,因为在FL中不允许检查客户的原始数据是否存在隐私问题。我们设计和评估了一种新型的BFL框架,并以更大的灵活性和激励机制(称为Fair-BFL)解决了香草BFL中确定的挑战。与现有作品相反,Fair-BFL通过模块化设计提供了前所未有的灵活性,使采用者可以按照动态的方式调整其业务需求的能力。我们的设计说明了BFL量化每个客户对全球学习过程的贡献的能力。这种量化提供了一个合理的指标,可以在联邦客户之间分配奖励,并帮助发现可能毒害全球模型的恶意参与者。
translated by 谷歌翻译
Transformer models have achieved great success across many NLP problems. However, previous studies in automated ICD coding concluded that these models fail to outperform some of the earlier solutions such as CNN-based models. In this paper we challenge this conclusion. We present a simple and scalable method to process long text with the existing transformer models such as BERT. We show that this method significantly improves the previous results reported for transformer models in ICD coding, and is able to outperform one of the prominent CNN-based methods.
translated by 谷歌翻译
Our paper aims to analyze political polarization in US political system using Language Models, and thereby help candidates make an informed decision. The availability of this information will help voters understand their candidates views on the economy, healthcare, education and other social issues. Our main contributions are a dataset extracted from Wikipedia that spans the past 120 years and a Language model based method that helps analyze how polarized a candidate is. Our data is divided into 2 parts, background information and political information about a candidate, since our hypothesis is that the political views of a candidate should be based on reason and be independent of factors such as birthplace, alma mater, etc. We further split this data into 4 phases chronologically, to help understand if and how the polarization amongst candidates changes. This data has been cleaned to remove biases. To understand the polarization we begin by showing results from some classical language models in Word2Vec and Doc2Vec. And then use more powerful techniques like the Longformer, a transformer based encoder, to assimilate more information and find the nearest neighbors of each candidate based on their political view and their background.
translated by 谷歌翻译
Object movement identification is one of the most researched problems in the field of computer vision. In this task, we try to classify a pixel as foreground or background. Even though numerous traditional machine learning and deep learning methods already exist for this problem, the two major issues with most of them are the need for large amounts of ground truth data and their inferior performance on unseen videos. Since every pixel of every frame has to be labeled, acquiring large amounts of data for these techniques gets rather expensive. Recently, Zhao et al. [1] proposed one of a kind Arithmetic Distribution Neural Network (ADNN) for universal background subtraction which utilizes probability information from the histogram of temporal pixels and achieves promising results. Building onto this work, we developed an intelligent video surveillance system that uses ADNN architecture for motion detection, trims the video with parts only containing motion, and performs anomaly detection on the trimmed video.
translated by 谷歌翻译
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs. This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. Tackling these shortcomings, Tensor Neural Networks (TNN) demonstrate that they can provide significant parameter savings while attaining the same accuracy as compared to the classical Dense Neural Network (DNN). In addition, we also show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we also introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
translated by 谷歌翻译
As language models have grown in parameters and layers, it has become much harder to train and infer with them on single GPUs. This is severely restricting the availability of large language models such as GPT-3, BERT-Large, and many others. A common technique to solve this problem is pruning the network architecture by removing transformer heads, fully-connected weights, and other modules. The main challenge is to discern the important parameters from the less important ones. Our goal is to find strong metrics for identifying such parameters. We thus propose two strategies: Cam-Cut based on the GradCAM interpretations, and Smooth-Cut based on the SmoothGrad, for calculating the importance scores. Through this work, we show that our scoring functions are able to assign more relevant task-based scores to the network parameters, and thus both our pruning approaches significantly outperform the standard weight and gradient-based strategies, especially at higher compression ratios in BERT-based models. We also analyze our pruning masks and find them to be significantly different from the ones obtained using standard metrics.
translated by 谷歌翻译
The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.
translated by 谷歌翻译
Independence testing is a fundamental and classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) allow stopping earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. It is well known that classical batch tests are not tailored for streaming data settings, since valid inference after data peeking requires correcting for multiple testing, but such corrections generally result in low power. In this paper, we design sequential kernelized independence tests (SKITs) that overcome such shortcomings based on the principle of testing by betting. We exemplify our broad framework using bets inspired by kernelized dependence measures such as the Hilbert-Schmidt independence criterion (HSIC) and the constrained-covariance criterion (COCO). Importantly, we also generalize the framework to non-i.i.d. time-varying settings, for which there exist no batch tests. We demonstrate the power of our approaches on both simulated and real data.
translated by 谷歌翻译
Explainable Artificial Intelligence (AI) in the form of an interpretable and semiautomatic approach to stage grading ocular pathologies such as Diabetic retinopathy, Hypertensive retinopathy, and other retinopathies on the backdrop of major systemic diseases. The experimental study aims to evaluate an explainable staged grading process without using deep Convolutional Neural Networks (CNNs) directly. Many current CNN-based deep neural networks used for diagnosing retinal disorders might have appreciable performance but fail to pinpoint the basis driving their decisions. To improve these decisions' transparency, we have proposed a clinician-in-the-loop assisted intelligent workflow that performs a retinal vascular assessment on the fundus images to derive quantifiable and descriptive parameters. The retinal vessel parameters meta-data serve as hyper-parameters for better interpretation and explainability of decisions. The semiautomatic methodology aims to have a federated approach to AI in healthcare applications with more inputs and interpretations from clinicians. The baseline process involved in the machine learning pipeline through image processing techniques for optic disc detection, vessel segmentation, and arteriole/venule identification.
translated by 谷歌翻译
Gradient-based first-order convex optimization algorithms find widespread applicability in a variety of domains, including machine learning tasks. Motivated by the recent advances in fixed-time stability theory of continuous-time dynamical systems, we introduce a generalized framework for designing accelerated optimization algorithms with strongest convergence guarantees that further extend to a subclass of non-convex functions. In particular, we introduce the \emph{GenFlow} algorithm and its momentum variant that provably converge to the optimal solution of objective functions satisfying the Polyak-{\L}ojasiewicz (PL) inequality, in a fixed-time. Moreover for functions that admit non-degenerate saddle-points, we show that for the proposed GenFlow algorithm, the time required to evade these saddle-points is bounded uniformly for all initial conditions. Finally, for strongly convex-strongly concave minimax problems whose optimal solution is a saddle point, a similar scheme is shown to arrive at the optimal solution again in a fixed-time. The superior convergence properties of our algorithm are validated experimentally on a variety of benchmark datasets.
translated by 谷歌翻译