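We study the optimal design of experimental studies that have pre-treatment outcome data available. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units. Many commonly used approaches fit this formulation, including the difference-in-means estimator and a variety of synthetic-control techniques. We propose several methods for choosing the set of treated units in conjunction with the weights. Observing the NP-hardness of the problem, we introduce a mixed-integer programming formulation that selects both the treatment and control sets and the unit weights. We prove that these proposed approaches lead to qualitatively different experimental units being selected for treatment. We use simulations based on publicly available data from the US Bureau of Labor Statistics that show improvements in mean squared error and statistical power when compared with simple and commonly used alternatives such as randomized trials.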
translated by Google Translate
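The sketch below illustrates the estimator family described above by computing the difference between the weighted average outcomes of treated and control units. It is a minimal sketch that assumes weights are normalized within each group. The function name `weighted_effect_estimate` is hypothetical, and the mixed-integer program that actually chooses the sets and weights is not shown.

```python
from typing import Sequence


def weighted_effect_estimate(
    outcomes: Sequence[float],
    treated: Sequence[bool],
    weights: Sequence[float],
) -> float:
    """Weighted mean outcome of treated units minus that of control units.

    Weights are normalized within each group, so uniform weights recover
    the plain difference-in-means estimator (hypothetical helper).
    """
    t_weight = sum(w for w, t in zip(weights, treated) if t)
    c_weight = sum(w for w, t in zip(weights, treated) if not t)
    if t_weight <= 0 or c_weight <= 0:
        raise ValueError("each group needs positive total weight")
    t_mean = sum(w * y for y, w, t in zip(outcomes, weights, treated) if t) / t_weight
    c_mean = sum(w * y for y, w, t in zip(outcomes, weights, treated) if not t) / c_weight
    return t_mean - c_mean
```

With uniform weights this reduces to difference-in-means. Synthetic-control-style methods correspond to non-uniform weights on the control units.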
Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also provide a new information-theoretic lower bound that implies our bounds are tight for group DRO. Empirically, too, our algorithms outperform known methods.
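The stochastic algorithms themselves are not reproduced here. As a hedged illustration of two objectives named above, the sketch evaluates the empirical CVaR of a batch of per-example losses and the group DRO objective (the worst group-average loss). Both function names are hypothetical. The convention that `alpha` is the fraction of worst-case samples is an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def empirical_cvar(losses: Sequence[float], alpha: float) -> float:
    """Mean of the worst alpha-fraction of losses (0 < alpha <= 1).

    When alpha * n is not an integer, the boundary sample gets fractional weight.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if not losses:
        raise ValueError("losses must be non-empty")
    ordered = sorted(losses, reverse=True)
    tail_mass = alpha * len(ordered)  # number of (possibly fractional) tail samples
    k = math.floor(tail_mass)
    total = sum(ordered[:k])
    if tail_mass > k:
        total += (tail_mass - k) * ordered[k]
    return total / tail_mass


def group_dro_objective(losses: Sequence[float], groups: Sequence[Hashable]) -> float:
    """Worst average loss over the groups present in `groups`."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for loss, g in zip(losses, groups):
        sums[g] += loss
        counts[g] += 1
    return max(sums[g] / counts[g] for g in sums)
```

With `alpha = 1` the CVaR reduces to the ordinary average loss. As `alpha` shrinks it focuses on the hardest examples.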
Synergetic use of sensors for soil moisture retrieval is attracting considerable interest because different sensors offer different advantages. Integrating active, passive, and optical data could be a comprehensive way to exploit these advantages when preparing soil moisture maps. Pixel-based methods are typically used for multi-sensor fusion. Since different applications need soil moisture maps at different scales, pixel-based approaches are limited for this purpose. Object-based image analysis, which uses image objects instead of pixels, could help meet this need. This paper proposes a segment-based image fusion framework to evaluate the possibility of preparing a multi-scale soil moisture map by integrating Sentinel-1, Sentinel-2, and Soil Moisture Active Passive (SMAP) data. The results confirmed that the proposed methodology improved soil moisture estimation at different scales by up to 20% compared with the pixel-based fusion approach.
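The sketch below makes the pixel-versus-object distinction concrete. It averages a per-pixel raster (for example, a resampled sensor band) within each image segment, so every pixel of a segment carries the segment mean. It assumes a segmentation label map is already available. The function name is hypothetical, and the code covers only the aggregation step, not the paper's fusion or soil moisture model. Coarser or finer segmentations of the same scene would yield maps at different scales.

```python
import numpy as np


def segment_means(values, segments) -> np.ndarray:
    """Replace each pixel value by the mean value of its segment.

    `values` and `segments` must have the same shape; `segments` holds
    integer (or other hashable) segment labels per pixel.
    """
    values = np.asarray(values, dtype=float)
    segments = np.asarray(segments)
    if values.shape != segments.shape:
        raise ValueError("values and segments must have the same shape")
    # inverse maps every pixel to the index of its segment label
    _, inverse = np.unique(segments, return_inverse=True)
    flat_inverse = inverse.ravel()
    sums = np.bincount(flat_inverse, weights=values.ravel())
    counts = np.bincount(flat_inverse)
    means = sums / counts
    return means[flat_inverse].reshape(values.shape)
```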
Autonomous drones, or unmanned aerial vehicles (UAVs), have shown great advantages over earlier methods in supporting urgent scenarios such as search and rescue (SAR) and wildfire detection. In these operations, search efficiency, measured as the time spent to find the target, is crucial. As time passes, the survivability of a missing person decreases, and wildfire management becomes more difficult, with disastrous consequences. In this work, we consider a scenario in which a drone must search for and detect a missing person (e.g., a hiker or a mountaineer) or a potential fire spot in a given area. To obtain the shortest path to the target, a general framework is provided to model the problem of target detection when the target's location is known probabilistically. To this end, two algorithms are proposed: path planning and target detection. The path planning algorithm is based on Bayesian inference. Target detection is performed by a residual neural network (ResNet) trained on the image dataset captured by the drone as well as existing pictures and datasets on the web. Through simulation and experiment, the proposed path planning algorithm is compared with two benchmark algorithms. It is shown that the proposed algorithm significantly decreases the average mission time.
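The abstract does not give the update rule, so the following is a generic Bayesian search sketch, not the authors' algorithm. The area is discretized into cells, each with a prior probability of containing the target. Each unsuccessful search of a cell, with an assumed detection probability `p_detect`, is folded into the belief with Bayes' rule. All names are hypothetical. A real planner would also weigh flight distance between cells rather than greedily picking the most probable one.

```python
from typing import List


def update_after_miss(belief: List[float], cell: int, p_detect: float) -> List[float]:
    """Posterior over target location after searching `cell` without a detection.

    Bayes' rule: P(target in j | miss) is proportional to P(miss | target in j) * P(target in j),
    where the miss likelihood is (1 - p_detect) for j == cell and 1 otherwise.
    """
    evidence = 1.0 - belief[cell] * p_detect  # total probability of a miss
    posterior = [p / evidence for p in belief]
    posterior[cell] = belief[cell] * (1.0 - p_detect) / evidence
    return posterior


def greedy_next_cell(belief: List[float]) -> int:
    """Pick the cell with the highest posterior probability (ignores travel cost)."""
    return max(range(len(belief)), key=lambda j: belief[j])
```

Searching a likely cell without success shifts probability mass toward the remaining cells. This is what steers the next leg of the search.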
Segmentation of regions of interest (ROIs) for identifying abnormalities is a leading problem in medical imaging. Using machine learning (ML) for this problem generally requires manually annotated ground-truth segmentations, which demand extensive time and resources from radiologists. This work presents a novel weakly supervised approach that uses binary image-level labels, which are much simpler to acquire, to effectively segment anomalies in medical magnetic resonance (MR) images without ground-truth annotations. We train a binary classifier using these labels and use it to derive seeds indicating regions likely and unlikely to contain tumors. These seeds are used to train a generative adversarial network (GAN) that converts cancerous images to healthy variants. The healthy variants are then used together with the seeds to train an ML model that generates effective segmentations. This method produces segmentations that achieve Dice coefficients of 0.7903, 0.7868, and 0.7712 on the MICCAI Brain Tumor Segmentation (BraTS) 2020 dataset for the training, validation, and test cohorts, respectively. We also propose a weakly supervised means of filtering the segmentations, removing a small subset of poorer segmentations to retain a large subset of high-quality segmentations. The proposed filtering further improves the Dice coefficients to up to 0.8374, 0.8232, and 0.8136 for training, validation, and test, respectively.
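The Dice coefficient used to report these results measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch for binary masks follows. Returning 1.0 when both masks are empty is an assumed convention, not one stated in the abstract.

```python
import numpy as np


def dice_coefficient(pred, target) -> float:
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)
```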
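Training machine learning (ML) models to segment tumors and other anomalies in medical images is an increasingly popular area of research, but it generally requires manually annotated ground-truth segmentations, which take considerable time and resources to create. This work proposes a pipeline of ML models that uses easily acquired binary classification labels to segment ROIs without requiring ground-truth annotations. We trained the pipeline on 2D magnetic resonance imaging (MRI) brain scans from the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2020 dataset, using labels that indicate the presence of high-grade glioma (HGG) tumors. Our pipeline also introduces a novel variant of deep-learning-based superpixel generation that can be guided by clustered superpixels and simultaneously trains a superpixel clustering model. On our test set, our pipeline's segmentations achieve a Dice coefficient of 61.7%. This is a substantial improvement over the 42.8% Dice coefficient obtained with the popular Local Interpretable Model-agnostic Explanations (LIME) method.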
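Training models on data obtained from randomized experiments is ideal for making good decisions. However, randomized experiments are often time-consuming, expensive, risky, infeasible, or unethical, and decision makers have little choice but to rely on observational data collected under historical policies when training models. This raises several questions. The first is which decision policies perform best in practice. The second is how different data collection protocols affect the performance of the various policies trained on that data. The third is how robust policy performance is to changes in problem characteristics, such as action- or reward-specific delays in observing outcomes. We aim to answer such questions for the problem of optimizing sales channel allocations at LinkedIn, where sales accounts (leads) must be allocated to one of three channels with the goal of maximizing the number of successful conversions over a period of time. A key problem feature is the stochastic delay in observing allocation outcomes, whose distribution depends on both the channel and the outcome. We built a discrete-time simulation that handles these problem features and used it to evaluate: a) a historical rule-based policy; b) a supervised machine learning policy (XGBoost); and c) multi-armed bandit (MAB) policies. These were compared under different scenarios involving: i) the data collected for training (observational vs. randomized); ii) lead conversion scenarios; and iii) delay distributions. Our simulation results indicate that LinUCB, a simple MAB policy, consistently outperforms the other policies, achieving an 18-47% lift relative to the rule-based policy.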
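The sketch below follows the standard disjoint form of LinUCB for readers unfamiliar with it. Each arm has its own ridge-regression model, and arms are scored by predicted reward plus an exploration bonus scaled by `alpha`. It is not the authors' implementation. The class name, feature vectors, and `alpha` value are assumptions. Under delayed feedback, `update` would simply be called once a lead's outcome is observed.

```python
from typing import List, Sequence

import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (e.g., per sales channel)."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0) -> None:
        self.alpha = alpha
        # A_a = I + sum x x^T and b_a = sum r x for each arm a
        self.A: List[np.ndarray] = [np.eye(dim) for _ in range(n_arms)]
        self.b: List[np.ndarray] = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, features: Sequence[float]) -> int:
        """Return the arm with the highest upper confidence bound for this context."""
        x = np.asarray(features, dtype=float)
        scores = []
        for A, b in zip(self.A, self.b):
            theta = np.linalg.solve(A, b)  # ridge estimate of the arm's reward weights
            bonus = np.sqrt(x @ np.linalg.solve(A, x))  # uncertainty in direction x
            scores.append(theta @ x + self.alpha * bonus)
        return int(np.argmax(scores))

    def update(self, arm: int, features: Sequence[float], reward: float) -> None:
        """Fold an observed (possibly delayed) reward into the chosen arm's model."""
        x = np.asarray(features, dtype=float)
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

For a three-channel setting one would construct `LinUCB(n_arms=3, dim=d)` for lead features of dimension `d`.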
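Generalizability is the ultimate goal of machine learning (ML) image classifiers, for which noise and limited dataset size are major concerns. We tackle these challenges by using the framework of deep multitask learning (DMTL) and incorporating image depth estimation as an auxiliary task. On a customized, depth-augmented derivative of the MNIST dataset, we show that a) multitask loss functions are the most effective way to implement DMTL, b) limited dataset size primarily contributes to classification inaccuracy, and c) depth estimation is mostly affected by noise. To further validate these results, we manually labeled the NYU Depth V2 dataset for scene classification tasks. As a contribution to the field, we have made the data, with scene labels, publicly available as an open-source dataset in Python-native format. Our experiments on MNIST and NYU-Depth-V2 show that DMTL improves the generalizability of classifiers when the dataset is noisy and the number of examples is limited.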
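A multitask loss of the kind mentioned above is typically a weighted sum of per-task losses. The NumPy sketch below combines softmax cross-entropy for classification with mean squared error for the auxiliary depth map. The choice of MSE, the fixed `depth_weight`, and the function names are assumptions for illustration, not the paper's stated configuration.

```python
import numpy as np


def softmax_cross_entropy(logits, label: int) -> float:
    """Negative log-probability of `label` under a softmax over `logits`."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # shift for numerical stability; does not change the softmax
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])


def multitask_loss(logits, label: int, depth_pred, depth_true, depth_weight: float = 1.0) -> float:
    """Classification loss plus a weighted auxiliary depth-regression (MSE) loss."""
    cls_loss = softmax_cross_entropy(logits, label)
    diff = np.asarray(depth_pred, dtype=float) - np.asarray(depth_true, dtype=float)
    depth_loss = float(np.mean(diff ** 2))
    return cls_loss + depth_weight * depth_loss
```

Setting `depth_weight` to zero recovers single-task classification, which makes the auxiliary task's contribution easy to ablate.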
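Generalization analyses of deep learning typically assume that training converges to a fixed point. However, recent results indicate that in practice the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points. Our main contribution is a notion of statistical algorithmic stability (SAS), which extends classical algorithmic stability to non-convergent algorithms, together with a study of its connection to generalization. This ergodic-theoretic approach leads to new insights compared with traditional optimization and learning theory perspectives. We prove that the stability of the time-asymptotic behavior of a learning algorithm relates to its generalization. We also demonstrate empirically how loss dynamics can provide clues to generalization performance. Our findings provide evidence that networks that "train stably generalize better," even when training continues indefinitely and the weights do not converge.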
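Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, pro-environmental action, political engagement, and even participation in violent protests. Various computational methods in natural language processing (NLP) have been used to detect moral sentiment in textual data. However, achieving better performance on such subjective tasks requires large amounts of hand-annotated training data. Previous corpora annotated for moral sentiment have proven valuable and have generated new insights both within NLP and across the social sciences, but they have been limited to Twitter. To advance our understanding of the role of moral rhetoric, we present the Moral Foundations Reddit Corpus, a collection of 16,123 Reddit comments curated from 12 distinct subreddits. Each comment was hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework. We use a range of methods, such as cross-domain classification and knowledge transfer, to provide baseline moral-sentence classification results for this new corpus.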