我们介绍Samanantar,是最大的公开可用的并行Corpora Collection,用于指示语言。该集合中的英语和11个上线语言之间总共包含4970万句对(来自两种语言系列)。具体而言,我们从现有的公共可用并行基层编译1240万句对,另外,从网络上挖掘3740万句对,导致4倍增加。我们通过组合许多语料库,工具和方法来挖掘网站的并行句子:(a)Web爬行单格式语料库,(b)文档OCR,用于从扫描的文档中提取句子,(c)用于对齐句子的多语言表示模型,以及(d)近似最近的邻居搜索搜索大量句子。人类评估新矿业的Corpora的样本验证了11种语言的高质量平行句子。此外,我们使用英语作为枢轴语言,从英式并行语料库中提取所有55个指示语言对之间的834百万句子对。我们培训了跨越Samanantar上所有这些语言的多语种NMT模型,这在公开可用的基准上表现出现有的模型和基准,例如弗洛雷斯,建立萨曼塔尔的效用。我们的数据和模型可在Https://indicnlp.ai4bharat.org/samanantar/上公开提供,我们希望他们能够帮助推进NMT和Multibingual NLP的研究。
translated by 谷歌翻译
Real-world datasets exhibit imbalances of varying types and degrees. Several techniques based on re-weighting and margin adjustment of loss are often used to enhance the performance of neural networks, particularly on minority classes. In this work, we analyze the class-imbalanced learning problem by examining the loss landscape of neural networks trained with re-weighting and margin-based techniques. Specifically, we examine the spectral density of Hessian of class-wise loss, through which we observe that the network weights converge to a saddle point in the loss landscapes of minority classes. Following this observation, we also find that optimization methods designed to escape from saddle points can be effectively used to improve generalization on minority classes. We further theoretically and empirically demonstrate that Sharpness-Aware Minimization (SAM), a recent technique that encourages convergence to a flat minima, can be effectively used to escape saddle points for minority classes. Using SAM results in a 6.2\% increase in accuracy on the minority classes over the state-of-the-art Vector Scaling Loss, leading to an overall average increase of 4\% across imbalanced datasets. The code is available at: https://github.com/val-iisc/Saddle-LongTail.
translated by 谷歌翻译
Gradient-based first-order convex optimization algorithms find widespread applicability in a variety of domains, including machine learning tasks. Motivated by the recent advances in fixed-time stability theory of continuous-time dynamical systems, we introduce a generalized framework for designing accelerated optimization algorithms with strongest convergence guarantees that further extend to a subclass of non-convex functions. In particular, we introduce the \emph{GenFlow} algorithm and its momentum variant that provably converge to the optimal solution of objective functions satisfying the Polyak-{\L}ojasiewicz (PL) inequality, in a fixed-time. Moreover for functions that admit non-degenerate saddle-points, we show that for the proposed GenFlow algorithm, the time required to evade these saddle-points is bounded uniformly for all initial conditions. Finally, for strongly convex-strongly concave minimax problems whose optimal solution is a saddle point, a similar scheme is shown to arrive at the optimal solution again in a fixed-time. The superior convergence properties of our algorithm are validated experimentally on a variety of benchmark datasets.
translated by 谷歌翻译
Power grids, across the world, play an important societal and economical role by providing uninterrupted, reliable and transient-free power to several industries, businesses and household consumers. With the advent of renewable power resources and EVs resulting into uncertain generation and highly dynamic load demands, it has become ever so important to ensure robust operation of power networks through suitable management of transient stability issues and localize the events of blackouts. In the light of ever increasing stress on the modern grid infrastructure and the grid operators, this paper presents a reinforcement learning (RL) framework, PowRL, to mitigate the effects of unexpected network events, as well as reliably maintain electricity everywhere on the network at all times. The PowRL leverages a novel heuristic for overload management, along with the RL-guided decision making on optimal topology selection to ensure that the grid is operated safely and reliably (with no overloads). PowRL is benchmarked on a variety of competition datasets hosted by the L2RPN (Learning to Run a Power Network). Even with its reduced action space, PowRL tops the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) at an aggregate level, while also being the top performing agent in the L2RPN WCCI 2020 challenge. Moreover, detailed analysis depicts state-of-the-art performances by the PowRL agent in some of the test scenarios.
translated by 谷歌翻译
Neuromorphic computing using biologically inspired Spiking Neural Networks (SNNs) is a promising solution to meet Energy-Throughput (ET) efficiency needed for edge computing devices. Neuromorphic hardware architectures that emulate SNNs in analog/mixed-signal domains have been proposed to achieve order-of-magnitude higher energy efficiency than all-digital architectures, however at the expense of limited scalability, susceptibility to noise, complex verification, and poor flexibility. On the other hand, state-of-the-art digital neuromorphic architectures focus either on achieving high energy efficiency (Joules/synaptic operation (SOP)) or throughput efficiency (SOPs/second/area), resulting in poor ET efficiency. In this work, we present THOR, an all-digital neuromorphic processor with a novel memory hierarchy and neuron update architecture that addresses both energy consumption and throughput bottlenecks. We implemented THOR in 28nm FDSOI CMOS technology and our post-layout results demonstrate an ET efficiency of 7.29G $\text{TSOP}^2/\text{mm}^2\text{Js}$ at 0.9V, 400 MHz, which represents a 3X improvement over state-of-the-art digital neuromorphic processors.
translated by 谷歌翻译
Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. When applying these advancements to tabular data researchers deal with data heterogeneity, variations in customer engagement history or the sheer volume of enterprise datasets. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history and other interactions into a generic representation of a customer's association with the business. We then evaluate these embeddings as features to train multiple models spanning a variety of applications. CASPR, Customer Activity Sequence-based Prediction and Representation, applies Transformer architecture to encode activity sequences to improve model performance and avoid bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
We propose a novel deep neural network architecture to learn interpretable representation for medical image analysis. Our architecture generates a global attention for region of interest, and then learns bag of words style deep feature embeddings with local attention. The global, and local feature maps are combined using a contemporary transformer architecture for highly accurate Gallbladder Cancer (GBC) detection from Ultrasound (USG) images. Our experiments indicate that the detection accuracy of our model beats even human radiologists, and advocates its use as the second reader for GBC diagnosis. Bag of words embeddings allow our model to be probed for generating interpretable explanations for GBC detection consistent with the ones reported in medical literature. We show that the proposed model not only helps understand decisions of neural network models but also aids in discovery of new visual features relevant to the diagnosis of GBC. Source-code and model will be available at https://github.com/sbasu276/RadFormer
translated by 谷歌翻译
The presence of bias in deep models leads to unfair outcomes for certain demographic subgroups. Research in bias focuses primarily on facial recognition and attribute prediction with scarce emphasis on face detection. Existing studies consider face detection as binary classification into 'face' and 'non-face' classes. In this work, we investigate possible bias in the domain of face detection through facial region localization which is currently unexplored. Since facial region localization is an essential task for all face recognition pipelines, it is imperative to analyze the presence of such bias in popular deep models. Most existing face detection datasets lack suitable annotation for such analysis. Therefore, we web-curate the Fair Face Localization with Attributes (F2LA) dataset and manually annotate more than 10 attributes per face, including facial localization information. Utilizing the extensive annotations from F2LA, an experimental setup is designed to study the performance of four pre-trained face detectors. We observe (i) a high disparity in detection accuracies across gender and skin-tone, and (ii) interplay of confounding factors beyond demography. The F2LA data and associated annotations can be accessed at http://iab-rubric.org/index.php/F2LA.
translated by 谷歌翻译
This paper presents a 1-D convolutional graph neural network for fault detection in microgrids. The combination of 1-D convolutional neural networks (1D-CNN) and graph convolutional networks (GCN) helps extract both spatial-temporal correlations from the voltage measurements in microgrids. The fault detection scheme includes fault event detection, fault type and phase classification, and fault location. There are five neural network model training to handle these tasks. Transfer learning and fine-tuning are applied to reduce training efforts. The combined recurrent graph convolutional neural networks (1D-CGCN) is compared with the traditional ANN structure on the Potsdam 13-bus microgrid dataset. The achievable accuracy of 99.27%, 98.1%, 98.75%, and 95.6% for fault detection, fault type classification, fault phase identification, and fault location respectively.
translated by 谷歌翻译