Conventional methods for human motion synthesis are either deterministic or struggle with the trade-off between motion diversity and motion quality. In response to these limitations, we introduce MoFusion, i.e., a new denoising-diffusion-based framework for high-quality conditional human motion synthesis that can generate long, temporally plausible, and semantically accurate motions based on a range of conditioning contexts (such as music and text). We also present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework through our scheduled weighting strategy. The learned latent space can be used for several interactive motion editing applications -- like inbetweening, seed conditioning, and text-based editing -- thus, providing crucial abilities for virtual character animation and robotics. Through comprehensive quantitative evaluations and a perceptual user study, we demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature. We urge the reader to watch our supplementary video and visit https://vcai.mpi-inf.mpg.de/projects/MoFusion.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
销售预测是许多管理决策的先决条件,如生产规划,物资资源规划和供应链中的预算。促销是最重要的商业策略之一,通常用于提高销售。虽然促销对于产生需求具有吸引力,但通常难以预测其存在需求。在过去的几十年中,已经开发了几种量化模型来预测销售,包括统计和机器学习模型。但是,这些方法可能不足以解释可能影响销售的所有内部和外部因素。因此,由于咨询专家已被证明通过提供上下文信息,因此已采用定量模型以及定量方法。这些模型正在广泛使用,以考虑可能导致销售快速变化的因素,例如在促销期间。在本文中,我们的目标是利用贝叶斯网络预测促销销售,其中包括价格,促销类型和产品位置的因素和产品位置影响销售。我们选择开发BN模型,因为BN模型基本上具有与因果形式相结合的各种定性和定量因素的能力,使其成为促销期间销售预测的有吸引力的工具。这可用于在本案例研究的背景下调整公司的促销策略。我们收集来自销售澳大利亚产品的零售商的特定产品的销售数据。我们为此产品开发贝叶斯网络,并通过实证分析验证我们的结果。本文证实,BNS可以有效地用于预测销售,特别是在促销期间。最终,我们提供一些研究途径,用于使用BNS预测销售。
translated by 谷歌翻译
人工神经网络(ANN)能够学习,纠正错误和将大量原始数据转化为治疗和护理的有用医疗决策,这增加了增强患者安全和护理质量的普及。因此,本文审查了ANN的关键作用为患者医疗保健决策提供有价值的见解和有效的疾病诊断。我们彻底审查了现有文献中的不同类型的ANN,以便为复杂应用程序进行高级ANNS适配。此外,我们还调查Ann的各种疾病诊断和治疗的进步,例如病毒,皮肤,癌症和Covid-19。此外,我们提出了一种名为ConxNet的新型深度卷积神经网络(CNN)模型,用于提高Covid-19疾病的检测准确性。 ConxNet经过培训并使用不同的数据集进行测试,它达到了超过97%的检测精度和精度,这明显优于现有型号。最后,我们突出了未来的研究方向和挑战,例如算法的复杂性,可用数据,隐私和安全性,以及与ANN的生物传染集成。这些研究方向需要大幅关注改善医疗诊断和治疗应用的ANN的范围。
translated by 谷歌翻译
联合学习(FL)可以通过各种不同远程数据源的机器学习模型的分布式计算,而无需将任何单独的数据传输到集中位置。这导致改进的模型的完全性,并且随着更多来源和较大的数据集被添加到联合中的计算和计算的有效缩放。然而,最近的成员攻击表明,当模型参数或摘要统计数据与中央站点共享时,有时可以泄露或推断出私有或敏感的个人数据,需要改进的安全解决方案。在这项工作中,我们提出了一种使用全同性全相治(FHE)的安全FL框架。具体而言,我们使用CKKS构造,近似浮点兼容方案,这些方案受益于密文包装和重新扫描。在我们对大型脑MRI数据集的评估中,我们使用建议的安全流动框架来培训深度学习模型,以预测分布式MRI扫描的一个人的年龄,一个共同的基准测试任务,并证明在学习表现中没有降级在加密和非加密的联合模型之间。
translated by 谷歌翻译
Diabetic Retinopathy (DR) is considered one of the primary concerns due to its effect on vision loss among most people with diabetes globally. The severity of DR is mostly comprehended manually by ophthalmologists from fundus photography-based retina images. This paper deals with an automated understanding of the severity stages of DR. In the literature, researchers have focused on this automation using traditional machine learning-based algorithms and convolutional architectures. However, the past works hardly focused on essential parts of the retinal image to improve the model performance. In this paper, we adopt transformer-based learning models to capture the crucial features of retinal images to understand DR severity better. We work with ensembling image transformers, where we adopt four models, namely ViT (Vision Transformer), BEiT (Bidirectional Encoder representation for image Transformer), CaiT (Class-Attention in Image Transformers), and DeiT (Data efficient image Transformers), to infer the degree of DR severity from fundus photographs. For experiments, we used the publicly available APTOS-2019 blindness detection dataset, where the performances of the transformer-based models were quite encouraging.
translated by 谷歌翻译
This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP), and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related and non-relevant social posts while LETT is a Named Entity Recognition (NER) task and aims at the extraction of location information from the text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, Distil BERT, and ALBERT obtaining an F1-score of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models namely BERT, RoBERTa, and Distil BERTA obtaining an F1-score of 0.6256, 0.6744, and 0.6723, respectively.
translated by 谷歌翻译
Objective: Despite numerous studies proposed for audio restoration in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation, ignoring other artifacts. Moreover, assuming a noisy or reverberant environment with limited number of fixed signal-to-distortion ratio (SDR) levels is a common practice. However, real-world audio is often corrupted by a blend of artifacts such as reverberation, sensor noise, and background audio mixture with varying types, severities, and duration. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics to enhance the quality of restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational-GANs are used with generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets corrupted with a random blend of artifacts each with a random severity to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods. Significance: This is a pioneer study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio whilst achieving an unprecedented level of performance for a wide SDR range and artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source codes and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository1.
translated by 谷歌翻译
Uncertainty quantification is crucial to inverse problems, as it could provide decision-makers with valuable information about the inversion results. For example, seismic inversion is a notoriously ill-posed inverse problem due to the band-limited and noisy nature of seismic data. It is therefore of paramount importance to quantify the uncertainties associated to the inversion process to ease the subsequent interpretation and decision making processes. Within this framework of reference, sampling from a target posterior provides a fundamental approach to quantifying the uncertainty in seismic inversion. However, selecting appropriate prior information in a probabilistic inversion is crucial, yet non-trivial, as it influences the ability of a sampling-based inference in providing geological realism in the posterior samples. To overcome such limitations, we present a regularized variational inference framework that performs posterior inference by implicitly regularizing the Kullback-Leibler divergence loss with a CNN-based denoiser by means of the Plug-and-Play methods. We call this new algorithm Plug-and-Play Stein Variational Gradient Descent (PnP-SVGD) and demonstrate its ability in producing high-resolution, trustworthy samples representative of the subsurface structures, which we argue could be used for post-inference tasks such as reservoir modelling and history matching. To validate the proposed method, numerical tests are performed on both synthetic and field post-stack seismic data.
translated by 谷歌翻译
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between the image reconstruction fidelity and image editing quality remains an open challenge. The low-rate latent spaces are limited in their expressiveness power for high-fidelity reconstruction. On the other hand, high-rate latent spaces result in degradation in editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode. This enables preserving image details in reconstruction. To achieve high-quality editing, we learn how to transform the residual features for adapting to manipulations in latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle consistency losses. We run extensive experiments and compare our method with state-of-the-art inversion methods. Qualitative metrics and visual comparisons show significant improvements. Code: https://github.com/hamzapehlivan/StyleRes
translated by 谷歌翻译