我向已知的数学问题提出了一个深入的加强学习(RL)解决方案,称为新闻温丹主模型,这旨在考虑到概率的需求分布。为了反映更现实和复杂的情况,需求分布可以改变本周不同的日子,从而改变了最佳行为。我使用了一个双延迟的深度确定性政策梯度代理(写为完全原始代码)与演员和批评网络来解决这个问题。该代理能够学习与问题的分析解决方案一致的最佳行为,并且可以识别本周不同日期的单独概率分布并相应地行事。
translated by 谷歌翻译
多机构增强学习(MARL)是训练在共同环境中独立起作用的自动化系统的强大工具。但是,当个人激励措施和群体激励措施分歧时,它可能导致次优行为。人类非常有能力解决这些社会困境。在MAL中,复制自私的代理商中的这种合作行为是一个开放的问题。在这项工作中,我们借鉴了经济学正式签约的想法,以克服MARL代理商之间的动力分歧。我们提出了对马尔可夫游戏的增强,在预先指定的条件下,代理商自愿同意约束依赖状态依赖的奖励转移。我们的贡献是理论和经验的。首先,我们表明,这种增强使所有完全观察到的马尔可夫游戏的所有子游戏完美平衡都表现出社会最佳行为,并且鉴于合同的足够丰富的空间。接下来,我们通过表明最先进的RL算法学习了我们的增强术,我们将学习社会最佳政策,从而补充我们的游戏理论分析。我们的实验包括经典的静态困境,例如塔格·亨特(Stag Hunt),囚犯的困境和公共物品游戏,以及模拟交通,污染管理和共同池资源管理的动态互动。
translated by 谷歌翻译
从我们生命的最早几年开始,人类使用语言来表达我们的信念和欲望。因此,能够与人造代理讨论我们的偏好将实现价值一致性的核心目标。然而,今天,我们缺乏解释这种灵活和抽象语言使用的计算模型。为了应对这一挑战,我们考虑在线性强盗环境中考虑社会学习,并询问人类如何传达与行为的偏好(即奖励功能)。我们研究两种不同类型的语言:指令,提供有关所需政策的信息和描述,这些信息提供了有关奖励功能的信息。为了解释人类如何使用这些形式的语言,我们建议他们推理出已知和未知的未来状态:对当前的说明优化,同时描述对未来进行了推广。我们通过扩展奖励设计来考虑对国家的分配来形式化此选择。然后,我们定义了一种务实的听众,该代理人通过推理说话者如何表达自己来侵犯说话者的奖励功能。我们通过行为实验来验证我们的模型,表明(1)我们的说话者模型预测了自发的人类行为,并且(2)我们的务实的听众能够恢复其奖励功能。最后,我们表明,在传统的强化学习环境中,务实的社会学习可以与个人学习相结合并加速。我们的发现表明,从更广泛的语言中的社会学习,特别是,扩大了该领域的目前对指示的关注,以包括从描述中学习 - 是一种有前途的价值一致性和强化学习的有前途的方法。
translated by 谷歌翻译
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computationally efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2- armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
translated by 谷歌翻译
Modelling and forecasting real-life human behaviour using online social media is an active endeavour of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory that could be used to gauge and predict social behaviour. During the last decade, the user base of Twitter has been growing and becoming more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.
translated by 谷歌翻译
Existing federated classification algorithms typically assume the local annotations at every client cover the same set of classes. In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients have a common goal which is to build a global classification model to identify the union of these classes. Such heterogeneity in client class sets poses a new challenge: how to ensure different clients are operating in the same latent space so as to avoid the drift after aggregation? We observe that the classes can be described in natural languages (i.e., class names) and these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
translated by 谷歌翻译
This is paper for the smooth function approximation by neural networks (NN). Mathematical or physical functions can be replaced by NN models through regression. In this study, we get NNs that generate highly accurate and highly smooth function, which only comprised of a few weight parameters, through discussing a few topics about regression. First, we reinterpret inside of NNs for regression; consequently, we propose a new activation function--integrated sigmoid linear unit (ISLU). Then special charateristics of metadata for regression, which is different from other data like image or sound, is discussed for improving the performance of neural networks. Finally, the one of a simple hierarchical NN that generate models substituting mathematical function is presented, and the new batch concept ``meta-batch" which improves the performance of NN several times more is introduced. The new activation function, meta-batch method, features of numerical data, meta-augmentation with metaparameters, and a structure of NN generating a compact multi-layer perceptron(MLP) are essential in this study.
translated by 谷歌翻译
The existing methods for video anomaly detection mostly utilize videos containing identifiable facial and appearance-based features. The use of videos with identifiable faces raises privacy concerns, especially when used in a hospital or community-based setting. Appearance-based features can also be sensitive to pixel-based noise, straining the anomaly detection methods to model the changes in the background and making it difficult to focus on the actions of humans in the foreground. Structural information in the form of skeletons describing the human motion in the videos is privacy-protecting and can overcome some of the problems posed by appearance-based features. In this paper, we present a survey of privacy-protecting deep learning anomaly detection methods using skeletons extracted from videos. We present a novel taxonomy of algorithms based on the various learning approaches. We conclude that skeleton-based approaches for anomaly detection can be a plausible privacy-protecting alternative for video anomaly detection. Lastly, we identify major open research questions and provide guidelines to address them.
translated by 谷歌翻译
The Government of Kerala had increased the frequency of supply of free food kits owing to the pandemic, however, these items were static and not indicative of the personal preferences of the consumers. This paper conducts a comparative analysis of various clustering techniques on a scaled-down version of a real-world dataset obtained through a conjoint analysis-based survey. Clustering carried out by centroid-based methods such as k means is analyzed and the results are plotted along with SVD, and finally, a conclusion is reached as to which among the two is better. Once the clusters have been formulated, commodities are also decided upon for each cluster. Also, clustering is further enhanced by reassignment, based on a specific cluster loss threshold. Thus, the most efficacious clustering technique for designing a food kit tailored to the needs of individuals is finally obtained.
translated by 谷歌翻译
Machine learning-based segmentation in medical imaging is widely used in clinical applications from diagnostics to radiotherapy treatment planning. Segmented medical images with ground truth are useful for investigating the properties of different segmentation performance metrics to inform metric selection. Regular geometrical shapes are often used to synthesize segmentation errors and illustrate properties of performance metrics, but they lack the complexity of anatomical variations in real images. In this study, we present a tool to emulate segmentations by adjusting the reference (truth) masks of anatomical objects extracted from real medical images. Our tool is designed to modify the defined truth contours and emulate different types of segmentation errors with a set of user-configurable parameters. We defined the ground truth objects from 230 patient images in the Glioma Image Segmentation for Radiotherapy (GLIS-RT) database. For each object, we used our segmentation synthesis tool to synthesize 10 versions of segmentation (i.e., 10 simulated segmentors or algorithms), where each version has a pre-defined combination of segmentation errors. We then applied 20 performance metrics to evaluate all synthetic segmentations. We demonstrated the properties of these metrics, including their ability to capture specific types of segmentation errors. By analyzing the intrinsic properties of these metrics and categorizing the segmentation errors, we are working toward the goal of developing a decision-tree tool for assisting in the selection of segmentation performance metrics.
translated by 谷歌翻译