Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
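The released checkpoints are hosted on the Hugging Face Hub under the bigscience organization. A minimal sketch of prompting one of them with the transformers library, using the small bloom-560m variant (the full 176B model follows the same API but requires multi-GPU hosting):

```python
# Sketch: greedy generation with a BLOOM checkpoint via Hugging Face transformers.
# "bigscience/bloom-560m" is a small released variant; swap in "bigscience/bloom"
# for the full 176B model if you have the hardware to serve it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Translate to French: I love open science."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```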
Innovations in computer vision algorithms for satellite image analysis can enable us to explore global challenges such as urbanization and land use change at the planetary level. However, domain shift is a common problem when trying to replicate the models that drive these analyses in new areas, particularly in the developing world. If a model is trained with imagery and labels from one location, it usually does not generalize well to new locations where the content of the imagery and the data distribution differ. In this work, we consider the setting in which we have a single large satellite imagery scene over which we want to solve an applied problem: building footprint segmentation. Here, we do not necessarily need to worry about creating a model that generalizes past the borders of our scene, but can instead train a local model. We show that surprisingly few labels are needed to solve the building segmentation problem with very high-resolution (0.5 m/px) satellite imagery in this setting. Our best model, trained with just 527 sparse polygon annotations (equivalent to 1500 x 1500 densely labeled pixels), achieved a recall of 0.87 on held-out footprints and an R2 of 0.93 at the window level. We apply our model to high-resolution imagery of Amman, Jordan, in a case study on urban change detection.
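The two headline numbers are per-footprint recall and window-level R2 on building counts. A minimal sketch of how such a window-level R2 can be computed with scikit-learn; the count arrays are hypothetical placeholders, not the paper's data:

```python
# Sketch: window-level R^2 between true and predicted building counts.
# The arrays below are illustrative placeholders, not the paper's data.
import numpy as np
from sklearn.metrics import r2_score

true_counts = np.array([12, 7, 0, 25, 3, 18])  # ground-truth buildings per window
pred_counts = np.array([11, 8, 1, 23, 3, 20])  # model-predicted buildings per window

print("window-level R^2:", r2_score(true_counts, pred_counts))
```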
The rapid development of renewable energy, particularly solar photovoltaics (PV), is critical for mitigating climate change. Accordingly, India has set an ambitious target of installing 500 gigawatts of solar capacity by 2030. Given the large footprint projected to meet renewable energy targets, the potential for land use conflicts over lands of environmental value is high. To accelerate solar development, land use planners will need access to up-to-date, accurate geospatial information on PV infrastructure. In this work, we develop a spatially explicit machine learning model to map utility-scale solar projects across India using freely available satellite imagery, with a mean accuracy of 92%. Our model predictions were validated by human experts to obtain a dataset of 1363 solar PV farms. Using this dataset, we measure the solar footprint across India and quantify the degree of land cover modification associated with the development of PV infrastructure. Our analysis indicates that over 74% of solar development in India was built on land cover types that have natural ecosystem preservation or agricultural value.
Remotely sensed geospatial data are critical for applications including precision agriculture, urban planning, disaster monitoring and response, and climate change research. Given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery available, deep learning methods are particularly promising for many remote sensing tasks. However, the variance in data collection methods and the handling of geospatial metadata make the application of deep learning methodology to remotely sensed data nontrivial. For example, satellite imagery often includes additional spectral bands beyond red, green, and blue, and must be joined to other geospatial data sources that can have differing coordinate systems, bounds, and resolutions. To help realize the potential of deep learning for remote sensing applications, we introduce TorchGeo, a Python library for integrating geospatial data into the PyTorch deep learning ecosystem. TorchGeo provides data loaders for a variety of benchmark datasets, composable datasets for generic geospatial data sources, samplers for geospatial data, and transforms that work with multispectral imagery. TorchGeo is also the first library to provide pre-trained models for multispectral satellite imagery (e.g., models that use all bands of the Sentinel-2 satellite), allowing transfer learning on downstream remote sensing tasks with limited labeled data. We use TorchGeo to create reproducible benchmark results on existing datasets, and benchmark our proposed approach for preprocessing geospatial imagery on the fly. TorchGeo is open source and available on GitHub: https://github.com/microsoft/torchgeo.
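The composable-dataset and sampler design described above can be sketched with the pattern from the project's documentation. Class and argument names (Sentinel2, CDL, RandomGeoSampler, stack_samples) follow the documented examples at the time of writing and may differ across library versions:

```python
# Sketch of the TorchGeo pattern: compose a multispectral raster source with a
# label source, sample patches geospatially, and feed a standard PyTorch DataLoader.
# Names follow the project's documented examples; APIs may vary by version.
from torch.utils.data import DataLoader
from torchgeo.datasets import CDL, Sentinel2, stack_samples
from torchgeo.samplers import RandomGeoSampler

imagery = Sentinel2(paths="data/sentinel2")     # multispectral imagery
labels = CDL(paths="data/cdl", download=True)   # Cropland Data Layer land cover labels
dataset = imagery & labels                      # intersection: aligned imagery/label pairs

sampler = RandomGeoSampler(dataset, size=256, length=1000)  # random square patches
loader = DataLoader(dataset, batch_size=16, sampler=sampler, collate_fn=stack_samples)

for batch in loader:
    images, masks = batch["image"], batch["mask"]
    # train a segmentation model on (images, masks) here
    break
```

Reprojection and alignment across differing coordinate systems, bounds, and resolutions happen inside the intersection dataset, which is exactly the nontriviality the library aims to hide.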
We show that patch-based models, such as epitomes, can have performance superior to the state of the art in semantic segmentation and label super-resolution, which uses deep convolutional neural networks. We derive a novel training algorithm that allows learning from very large datasets, and derive a label super-resolution algorithm as a statistical inference algorithm over epitomic representations. We illustrate our methods on land cover mapping and medical image analysis tasks.
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet been sufficiently examined. Employing electroencephalography signals and band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains accuracies of 0.84 and 0.88 and AUCs of 0.87 and 0.93 for the Theta and Gamma bands, respectively.
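The phase-based Granger causality used here is specialized, but the source/sink aggregation can be illustrated with standard Granger causality from statsmodels as a stand-in. The channel count, lag range, and synthetic signals below are placeholders:

```python
# Sketch: pairwise Granger causality over EEG channels, aggregated per channel
# into "source" (outgoing) and "sink" (incoming) strengths. Standard Granger
# causality stands in for the paper's phase-based variant; data is synthetic.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n_channels, n_samples, maxlag = 4, 500, 5
eeg = rng.standard_normal((n_channels, n_samples))  # placeholder EEG signals

strength = np.zeros((n_channels, n_channels))  # strength[i, j]: evidence that j -> i
for i in range(n_channels):
    for j in range(n_channels):
        if i == j:
            continue
        # column order [effect, cause]: tests whether channel j Granger-causes channel i
        res = grangercausalitytests(np.column_stack([eeg[i], eeg[j]]), maxlag=maxlag)
        p = min(res[lag][0]["ssr_ftest"][1] for lag in range(1, maxlag + 1))
        strength[i, j] = 1.0 - p  # crude score: higher means stronger causal evidence

as_source = strength.sum(axis=0)  # total outgoing influence per channel
as_sink = strength.sum(axis=1)    # total incoming influence per channel
print("source strengths:", as_source)
print("sink strengths:", as_sink)
```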
Cohn and Umans proposed a framework for developing fast matrix multiplication algorithms based on embedding computation in certain group algebras. In subsequent work with Kleinberg and Szegedy, they connected this to the search for combinatorial objects called strong uniquely solvable puzzles (strong USPs). We begin a systematic computer-aided search for these objects. We develop and implement constraint-based algorithms built on reductions to $\mathrm{SAT}$ and $\mathrm{IP}$ to verify that puzzles are strong USPs and to search for large strong USPs. We produce tight bounds on the maximum size of a strong USP for width $k \le 5$, construct puzzles of small width that are larger than those of previous work, and improve the upper bounds on strong USP size for $k \le 12$. Although our work only deals with puzzles of small-constant width, the strong USPs we find imply matrix multiplication algorithms that run in $O(n^\omega)$ time with exponent $\omega \le 2.66$. While our algorithms do not beat the fastest known algorithms, our work provides evidence and, perhaps, a path to finding families of strong USPs that imply matrix multiplication algorithms more efficient than those currently known.
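The SAT/IP reductions replace an exponential enumeration over triples of row permutations. A brute-force sketch of the check being offloaded, following the strong-USP definition from Cohn-Kleinberg-Szegedy-Umans as we recall it (verify against the paper before relying on it):

```python
# Brute-force sketch of the strong-USP property. A puzzle is a set of rows over
# {1,2,3}^k; it is a strong USP if for every triple of row permutations
# (pi1, pi2, pi3) that are not all equal, some row r and column c have
# *exactly two* of: pi1(r)[c] == 1, pi2(r)[c] == 2, pi3(r)[c] == 3.
# (Definition as recalled here; the paper's reductions replace this enumeration.)
from itertools import permutations

def is_strong_usp(puzzle):
    rows, width = range(len(puzzle)), len(puzzle[0])
    for p1 in permutations(rows):
        for p2 in permutations(rows):
            for p3 in permutations(rows):
                if p1 == p2 == p3:
                    continue  # all-equal triples are exempt by definition
                ok = any(
                    (puzzle[p1[r]][c] == 1)
                    + (puzzle[p2[r]][c] == 2)
                    + (puzzle[p3[r]][c] == 3) == 2
                    for r in rows for c in range(width)
                )
                if not ok:
                    return False  # found a violating permutation triple
    return True

# Toy width-2 puzzle (illustrative input, not an example from the paper).
print(is_strong_usp([(1, 3), (2, 1)]))
```

The triple loop is $O((s!)^3)$ in the puzzle size $s$, which is exactly why the authors turn to SAT and IP solvers for larger puzzles.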
Nonconvex-nonconcave minimax optimization has been the focus of intense research over the last decade due to its broad applications in machine learning and operations research. Unfortunately, most existing algorithms cannot be guaranteed to converge and always suffer from limit cycles. Their global convergence relies on certain conditions that are difficult to check, including but not limited to the global Polyak-\L{}ojasiewicz condition, the existence of a solution satisfying the weak Minty variational inequality, and the $\alpha$-interaction dominant condition. In this paper, we develop the first provably convergent algorithm, called the doubly smoothed gradient descent ascent method, which gets rid of limit cycles without requiring any additional conditions. We further show that the algorithm has an iteration complexity of $\mathcal{O}(\epsilon^{-4})$ for finding a game stationary point, which matches the best iteration complexity of single-loop algorithms under nonconvex-concave settings. The algorithm presented here opens up a new path for designing provably convergent algorithms for nonconvex-nonconcave minimax optimization problems.
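The limit-cycle failure is easy to reproduce: simultaneous gradient descent ascent on the bilinear objective $f(x, y) = xy$ spirals away from the stationary point $(0, 0)$ instead of converging. A minimal sketch of that failure mode (not the paper's doubly smoothed method):

```python
# Sketch: vanilla simultaneous gradient descent ascent (GDA) on f(x, y) = x * y.
# The update matrix [[1, -eta], [eta, 1]] has eigenvalues 1 +/- i*eta with
# modulus > 1, so iterates spiral outward around (0, 0) rather than converging.
# This reproduces the failure the paper's algorithm avoids; it is not that algorithm.
x, y, eta = 1.0, 1.0, 0.1
for _ in range(100):
    gx, gy = y, x                       # grad_x f = y, grad_y f = x
    x, y = x - eta * gx, y + eta * gy   # descent in x, ascent in y (simultaneous)
print(x, y)  # far from (0, 0): cycling/divergence instead of convergence
```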
Customers are rapidly turning to social media for customer support. While brand agents on these platforms are motivated and well-intentioned in helping and engaging with customers, their efforts are often ignored if their initial response to the customer does not match the specific tone, style, or topic the customer is hoping to receive. The length of a conversation can reflect the effort and quality of the initial response made by a brand toward collaborating with and helping consumers, even when the overall sentiment of the conversation might not be very positive. Thus, through this study, we aim to bridge this critical gap in the existing literature by analyzing the content and stylistic aspects of language, such as expressed empathy, psycho-linguistic features, dialogue tags, and metrics quantifying the personalization of utterances, that can influence the engagement of an interaction. This paper demonstrates that we can predict engagement using initial customer and brand posts.
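A minimal sketch of the prediction setup, with generic TF-IDF features standing in for the empathy, psycho-linguistic, dialogue-tag, and personalization features described; the posts and labels are hypothetical:

```python
# Sketch: predicting engagement from initial customer and brand posts.
# TF-IDF + logistic regression is a generic stand-in for the study's stylistic
# and psycho-linguistic features; the examples below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pairs = [
    "customer: my order never arrived [SEP] brand: so sorry! DMing you now to fix this",
    "customer: app keeps crashing [SEP] brand: please see our FAQ page",
]
engaged = [1, 0]  # 1 = conversation continued, 0 = customer disengaged

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(pairs, engaged)
print(clf.predict(["customer: refund still missing [SEP] brand: checking on this for you now!"]))
```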
Objective. The impact of social determinants of health (SDoH) on patients' healthcare quality and the disparity is well-known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured in free-text clinical notes, but there are limited methods for automatically extracting them. We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically. Materials and Methods. The study uses the N2C2 Shared Task data, which was collected from two sources of clinical notes: MIMIC-III and University of Washington Harborview Medical Centers. It contains 4480 social history sections with full annotation for twelve SDoHs. In order to handle the issue of overlapping entities, we developed a novel marker-based NER model. We used it in a multi-stage pipeline to extract SDoH information from clinical notes. Results. Our marker-based system outperformed the state-of-the-art span-based models at handling overlapping entities based on the overall Micro-F1 score performance. It also achieved state-of-the-art performance compared to the shared task methods. Conclusion. The major finding of this study is that the multi-stage pipeline effectively extracts SDoH information from clinical notes. This approach can potentially improve the understanding and tracking of SDoHs in clinical settings. However, error propagation may be an issue, and further research is needed to improve the extraction of entities with complex semantic meanings and low-resource entities using external knowledge.
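The paper's marker-based NER model is specialized, but the underlying idea of wrapping candidate spans with marker tokens before classification, which also makes overlapping entities representable, can be sketched generically. The [E1]/[E2] token names are our own illustration, not the paper's scheme:

```python
# Sketch: wrap two candidate entity spans with marker tokens before feeding a
# classifier. Inserts are applied right-to-left so earlier character offsets
# stay valid; overlapping spans simply nest. Marker names are illustrative.
def mark_spans(text, span1, span2):
    """Wrap two (start, end) character spans with [E1]/[E2] marker tokens."""
    inserts = [(span1[0], "[E1]"), (span1[1], "[/E1]"),
               (span2[0], "[E2]"), (span2[1], "[/E2]")]
    for pos, token in sorted(inserts, reverse=True):
        text = text[:pos] + token + text[pos:]
    return text

note = "Patient smokes one pack per day and lives alone."
# Overlapping spans: "smokes" (8, 14) inside "smokes one pack per day" (8, 31).
print(mark_spans(note, (8, 14), (8, 31)))
# -> Patient [E1][E2]smokes[/E1] one pack per day[/E2] and lives alone.
```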