For prostate cancer patients, the Gleason score is one of the most important prognostic factors and can determine therapy independently of stage. However, Gleason scoring relies on subjective microscopic examination of tumor morphology and has poor reproducibility. Here we present a deep learning system (DLS) for Gleason scoring of prostatectomy whole-slide images. Our system was developed using 112 million pathologist-annotated image patches from 1,226 slides and evaluated on an independent validation dataset of 331 slides, where the reference standard was established by genitourinary specialist pathologists. On the validation dataset, 29 general pathologists achieved a mean accuracy of 0.61, while the DLS achieved a significantly higher diagnostic accuracy of 0.70 (p = 0.002) and trended towards better patient risk stratification in correlations with clinical follow-up data. Our approach could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. The DLS also goes beyond the current Gleason system by characterizing and quantifying tumor morphology at a finer level, providing an opportunity to refine the Gleason system itself.
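The abstract does not spell out how patch-level predictions become a slide-level grade. As a rough illustration only, a patch-based pipeline typically classifies each patch into a Gleason pattern and then summarizes the patch calls into a grade group; the sketch below is a hypothetical simplification, and the probability layout, area-based primary/secondary rule, and grade-group mapping are assumptions, not the paper's method.

```python
import numpy as np

def slide_grade_group(patch_probs: np.ndarray) -> int:
    """Toy slide-level Gleason grade group from per-patch pattern probabilities.

    patch_probs: (n_patches, 4) probabilities for [benign, pattern 3,
    pattern 4, pattern 5] per patch (assumed layout).
    Returns an ISUP-style grade group, 0 meaning benign.
    """
    calls = patch_probs.argmax(axis=1)              # hard call per patch
    tumor = calls[calls > 0]                        # drop benign patches
    if tumor.size == 0:
        return 0
    frac = np.array([(tumor == k).mean() for k in (1, 2, 3)])  # area share of patterns 3/4/5
    order = np.argsort(frac)[::-1]
    primary = order[0] + 3                          # most common pattern
    secondary = order[1] + 3 if frac[order[1]] > 0 else primary
    if primary + secondary <= 6:
        return 1                                    # Gleason 3+3 or lower
    if primary + secondary == 7:
        return 2 if primary == 3 else 3             # 3+4 -> GG2, 4+3 -> GG3
    return 4 if primary + secondary == 8 else 5     # 8 -> GG4, 9-10 -> GG5
```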
Specifying complex task behaviour while ensuring good robot performance can be difficult for untrained users. We study a framework for users to specify rules for acceptable behaviour in a shared environment such as an industrial facility. Since non-expert users might have little intuition about how their specification affects the robot's performance, we design a learning system that interacts with the user to find the best solution. Using active preference learning, we iteratively show alternative paths that the robot could take on an interface. From the user's feedback ranking the alternatives, we learn the weights the user places on each part of their specification. We extend the user model from our previous work to a discrete Bayesian learning model and introduce a greedy algorithm for proposing alternatives that operates on equivalence regions of user weights. We prove that with this algorithm the revision active learning process converges on the user-optimal path. In simulations of realistic industrial environments, we demonstrate the convergence and robustness of our approach.
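The abstract only names the ingredients of the learner. A minimal sketch of one discrete Bayesian update from a single pairwise comparison is shown below, assuming a noisily rational user who prefers the path with lower weighted cost; the candidate weight grid, linear cost model, and rationality parameter `beta` are illustrative assumptions, not the authors' model.

```python
import numpy as np

def bayes_preference_update(prior, features_a, features_b, user_prefers_a,
                            weight_grid, beta=5.0):
    """One discrete Bayesian update from a single pairwise path comparison.

    prior:        shape (K,) probability over K candidate weight vectors
    features_a/b: shape (D,) feature counts (e.g. rule violations) of paths A and B
    weight_grid:  shape (K, D) candidate weight vectors
    beta:         rationality of the simulated user (higher = more deterministic)
    """
    cost_a = weight_grid @ features_a          # cost of path A under each hypothesis
    cost_b = weight_grid @ features_b          # cost of path B under each hypothesis
    # Noisily-rational (logistic) likelihood that the user picks A over B.
    p_a = 1.0 / (1.0 + np.exp(-beta * (cost_b - cost_a)))
    likelihood = p_a if user_prefers_a else 1.0 - p_a
    posterior = prior * likelihood
    return posterior / posterior.sum()
```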
Hypothesis testing plays a central role in statistical inference and is used in many settings where privacy concerns are paramount. This work answers a basic question about privately testing simple hypotheses: given two distributions $P$ and $Q$, and a privacy level $\varepsilon$, how many i.i.d. samples are needed to distinguish $P$ from $Q$ subject to $\varepsilon$-differential privacy, and what kind of tests have optimal sample complexity? Specifically, we characterize this sample complexity up to constant factors in terms of the structure of $P$ and $Q$ and the privacy level $\varepsilon$, and show that it is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman-Pearson lemma for private hypothesis testing. We also apply our result to private change-point detection. Our characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, so our results have applications even when privacy is not a primary concern.
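As a rough illustration of the kind of test the abstract refers to, the sketch below clamps each sample's log-likelihood ratio to a bounded range and adds Laplace noise calibrated to the sensitivity of the clamped sum before thresholding. The clamp bound and threshold are placeholders, not the paper's constants, and the paper's actual test is a more carefully randomized variant.

```python
import numpy as np

def private_llr_test(samples, log_p, log_q, eps, clamp=1.0, threshold=0.0,
                     rng=np.random.default_rng()):
    """Toy epsilon-DP simple hypothesis test via a noisy, clamped log-likelihood ratio.

    samples:      iterable of observations
    log_p, log_q: functions returning log P(x) and log Q(x)
    eps:          differential-privacy parameter
    clamp:        per-sample clamp on the log-likelihood ratio (placeholder value)
    Returns True if the test favours P, False if it favours Q.
    """
    llr = np.array([log_p(x) - log_q(x) for x in samples])
    llr = np.clip(llr, -clamp, clamp)           # bound each sample's contribution
    # Changing one sample moves the sum by at most 2*clamp, so Laplace(2*clamp/eps)
    # noise makes the released statistic eps-differentially private.
    noisy_stat = llr.sum() + rng.laplace(scale=2 * clamp / eps)
    return noisy_stat > threshold
```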
We present a convolutional neural network (CNN) that can, for the first time, make pixel-level predictions on image data recorded by a liquid argon time projection chamber (LArTPC). We describe the network design, training techniques, and software tools developed to train this network. The goal of this work is to develop a complete deep-neural-network-based data reconstruction chain for the MicroBooNE detector. We show the first demonstration of the network's effectiveness on real LArTPC data using MicroBooNE collection-plane images. The demonstration is performed on stopping-muon and $\nu_\mu$ charged-current neutral-pion data samples.
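The abstract does not describe the architecture; purely as an illustration of pixel-level prediction on 2D detector images, a minimal encoder-decoder segmentation model in PyTorch might look like the sketch below. The layer widths, the three pixel classes, and the image size are assumptions, not the MicroBooNE network.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy encoder-decoder for pixel-level classification of wire-plane images."""
    def __init__(self, n_classes: int = 3):   # e.g. background / track / shower (assumed)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, n_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):                      # x: (batch, 1, H, W) image
        return self.decoder(self.encoder(x))   # per-pixel class scores (batch, C, H, W)

# Example: one per-pixel cross-entropy training step on a dummy image.
model = TinySegNet()
image = torch.randn(1, 1, 64, 64)
labels = torch.randint(0, 3, (1, 64, 64))
loss = nn.CrossEntropyLoss()(model(image), labels)
loss.backward()
```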
We prove that, when the model's predictions on the training set are close to the true conditional distribution of labels given the inputs, the intrinsic minibatch gradient noise drives the stationary distribution of natural gradient descent towards a Bayesian posterior near local minima as the learning rate $\epsilon \rightarrow 0$. The temperature of this posterior, $T \approx \epsilon N/(2B)$, is controlled by the learning rate, the training set size $N$, and the batch size $B$. However, minibatch NGD is not parameterization invariant, so we introduce "stochastic natural gradient descent", which preserves parameterization invariance by introducing a multiplicative bias into the stationary distribution. We interpret this bias as the well-known Jeffreys prior. To support our claims, we show that the distribution of samples drawn from NGD approaches the Laplace approximation when $T = 1$. Furthermore, the test loss of ensembles drawn with NGD falls rapidly as we increase the batch size up to $B \approx \epsilon N/2$, while above this point the test loss is constant or rises slowly.
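The two quantities the abstract relates are easy to make concrete; this is just the stated formula restated in code with illustrative numbers, not the paper's experiments.

```python
def ngd_temperature(eps: float, n_train: int, batch: int) -> float:
    """Temperature T ~= eps * N / (2 * B) of the NGD stationary distribution."""
    return eps * n_train / (2.0 * batch)

# At T = 1 the implied batch size is B ~= eps * N / 2, matching the point
# beyond which the abstract reports ensemble test loss stops improving.
eps, n_train = 0.1, 50_000                    # illustrative values
critical_batch = eps * n_train / 2            # = 2500 here
assert abs(ngd_temperature(eps, n_train, critical_batch) - 1.0) < 1e-12
```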
Recent work has explored the problem of autonomous navigation by imitating a teacher and learning an end-to-end policy, which directly predicts controls from raw images. However, these approaches tend to be sensitive to mistakes by the teacher and do not scale well to other environments or vehicles. To this end, we propose Observational Imitation Learning (OIL), a novel imitation learning variant that supports online training and automatic selection of optimal behavior by observing multiple imperfect teachers. We apply our proposed methodology to the challenging problems of autonomous driving and UAV racing. For both tasks, we utilize the Sim4CV simulator [18] that enables the generation of large amounts of synthetic training data and also allows for online learning and evaluation. We train a perception network to predict waypoints from raw image data and use OIL to train another network to predict controls from these waypoints. Extensive experiments demonstrate that our trained network outperforms its teachers, conventional imitation learning (IL) and reinforcement learning (RL) baselines, and even humans in simulation.
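The abstract leaves the selection mechanism unspecified. The sketch below only illustrates the general idea of keeping, at each online step, the best-scoring action among several imperfect teachers as the supervision target; the scoring function and the update interface are hypothetical stand-ins, not OIL's actual criterion.

```python
import numpy as np

def observational_imitation_step(state, teachers, score_fn, learner, optimizer_step):
    """One hypothetical online step: observe several imperfect teachers,
    keep the action the scoring function rates best, and fit the learner to it.

    teachers:       list of policies mapping state -> action
    score_fn:       rates (state, action), higher is better (e.g. predicted
                    proximity to the next waypoint); an assumed stand-in
    learner:        policy being trained, mapping state -> action
    optimizer_step: callable(state, target_action) doing one supervised update
    """
    candidates = [t(state) for t in teachers]
    scores = [score_fn(state, a) for a in candidates]
    target = candidates[int(np.argmax(scores))]    # best observed behaviour
    optimizer_step(state, target)                  # imitate only the selected action
    return learner(state)                          # act with the current learner
```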
It is common practice to decay the learning rate. Here we show one can usually obtain the same learning curve on both training and test sets by instead increasing the batch size during training. This procedure is successful for stochastic gradient descent (SGD), SGD with momentum, Nesterov momentum, and Adam. It reaches equivalent test accuracies after the same number of training epochs, but with fewer parameter updates, leading to greater parallelism and shorter training times. We can further reduce the number of parameter updates by increasing the learning rate $\epsilon$ and scaling the batch size $B \propto \epsilon$. Finally, one can increase the momentum coefficient $m$ and scale $B \propto 1/(1-m)$, although this tends to slightly reduce the test accuracy. Crucially, our techniques allow us to repurpose existing training schedules for large batch training with no hyper-parameter tuning. We train ResNet-50 on ImageNet to $76.1\%$ validation accuracy in under 30 minutes.
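A minimal sketch of the substitution the abstract describes: wherever a schedule would divide the learning rate by a factor, multiply the batch size by the same factor instead, keeping the ratio $\epsilon/B$ (and hence the SGD noise scale) unchanged. The base values, decay factor, and hardware cap below are placeholders, not the paper's settings.

```python
def batch_size_schedule(base_lr, base_batch, decay_epochs, decay_factor=5,
                        max_batch=None):
    """Convert a step-wise learning-rate decay schedule into a batch-size schedule.

    At each scheduled decay, multiply the batch size by `decay_factor` instead of
    dividing the learning rate, which keeps the noise scale g ~= eps*N/B constant.
    If a `max_batch` cap is reached, fall back to decaying the learning rate.
    """
    lr, batch = base_lr, base_batch
    schedule = [(0, lr, batch)]
    for epoch in decay_epochs:
        if max_batch is None or batch * decay_factor <= max_batch:
            batch *= decay_factor
        else:
            lr /= decay_factor
        schedule.append((epoch, lr, batch))
    return schedule

# e.g. a ResNet-style schedule with decays at epochs 30/60/80 and a hardware cap
print(batch_size_schedule(0.1, 256, [30, 60, 80], decay_factor=5, max_batch=8192))
```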
We consider two questions at the heart of machine learning; how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work responds to Zhang et al. (2016), who showed deep neural networks can easily memorize randomly labeled training data, despite generalizing well on real labels of the same inputs. We show that the same phenomenon occurs in small linear models. These observations are explained by the Bayesian evidence, which penalizes sharp minima but is invariant to model parameterization. We also demonstrate that, when one holds the learning rate fixed, there is an optimum batch size which maximizes the test set accuracy. We propose that the noise introduced by small mini-batches drives the parameters towards minima whose evidence is large. Interpreting stochastic gradient descent as a stochastic differential equation, we identify the "noise scale" $g = \epsilon (\frac{N}{B} - 1) \approx \epsilon N/B$, where $\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size. Consequently the optimum batch size is proportional to both the learning rate and the size of the training set, $B_{opt} \propto \epsilon N$. We verify these predictions empirically.
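The noise-scale formula translates directly into code; the numbers below are illustrative, not the paper's experiments.

```python
def noise_scale(eps: float, n_train: int, batch: int) -> float:
    """SGD noise scale g = eps * (N / B - 1) ~= eps * N / B for B << N."""
    return eps * (n_train / batch - 1)

# The optimum batch size scales as B_opt proportional to eps * N: doubling the
# learning rate while doubling the batch size leaves the noise scale unchanged.
print(noise_scale(0.1, 50_000, 128))   # ~39.0
print(noise_scale(0.2, 50_000, 256))   # ~38.9, essentially the same noise scale
```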
Automating the navigation of unmanned aerial vehicles (UAVs) in diverse scenarios has gained much attention in recent years. However, teaching UAVs to fly in challenging environments remains an unsolved problem, mainly due to the lack of training data. In this paper, we train a deep neural network to predict UAV controls from raw image data for the task of autonomous UAV racing in a photo-realistic simulation. Training is done through imitation learning with data augmentation to allow for the correction of navigation mistakes. Extensive experiments demonstrate that our trained network (when sufficient data augmentation is used) outperforms state-of-the-art methods and flies more consistently than many human pilots. Additionally, we show that our optimized network architecture can run in real-time on embedded hardware, allowing for efficient on-board processing critical for real-world deployment. From a broader perspective, our results underline the importance of extensive data augmentation techniques to improve robustness in end-to-end learning setups.
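The abstract does not detail the augmentation. The sketch below only illustrates the common recipe of perturbing recorded states sideways and pairing them with controls that steer back toward the original trajectory; the state/control dictionary layout, offset range, and correction gain are assumptions, not the paper's procedure.

```python
def augment_with_corrections(states, controls, offsets=(-1.0, -0.5, 0.5, 1.0),
                             gain=0.4):
    """Toy augmentation: for every recorded (state, control) pair, add copies
    shifted laterally by `offsets` metres whose steering is nudged back toward
    the original path. `gain` is an assumed correction strength.
    """
    aug_states, aug_controls = list(states), list(controls)
    for s, c in zip(states, controls):
        for d in offsets:
            shifted = dict(s, lateral=s["lateral"] + d)        # displaced viewpoint
            corrected = dict(c, steer=c["steer"] - gain * d)   # steer back to the path
            aug_states.append(shifted)
            aug_controls.append(corrected)
    return aug_states, aug_controls
```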
Usually bilingual word vectors are trained "online". Mikolov et al. showed they can also be found "offline", whereby two pre-trained embeddings are aligned with a linear transformation, using dictionaries compiled from expert knowledge. In this work, we prove that the linear transformation between two spaces should be orthogonal. This transformation can be obtained using the singular value decomposition. We introduce a novel "inverted softmax" for identifying translation pairs, with which we improve the precision @1 of Mikolov's original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian. Orthogonal transformations are more robust to noise, enabling us to learn the transformation without expert bilingual signal by constructing a "pseudo-dictionary" from the identical character strings which appear in both languages, achieving 40% precision on the same test set. Finally, we extend our method to retrieve the true translations of English sentences from a corpus of 200k Italian sentences with a precision @1 of 68%.
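The two components named in the abstract, an orthogonal map obtained from the SVD and the inverted softmax, can be sketched directly in numpy; the matrix shapes and the inverse temperature `beta` are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def orthogonal_map(X_src, Y_tgt):
    """Orthogonal transformation aligning source vectors to target vectors.

    X_src, Y_tgt: (n_pairs, d) embeddings of dictionary (or pseudo-dictionary)
    word pairs. Returns the orthogonal W minimizing ||X_src @ W - Y_tgt||_F
    (the Procrustes solution via the SVD).
    """
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

def inverted_softmax(translated_queries, targets, beta=10.0):
    """Inverted-softmax translation scores.

    Entry [i, j] is the probability that target j translates query i, with the
    normalization running over the queries for each target (the 'inverted'
    direction), which suppresses hub targets that are close to everything.
    """
    sims = translated_queries @ targets.T              # (n_q, n_t) similarities
    expd = np.exp(beta * sims)
    return expd / expd.sum(axis=0, keepdims=True)      # normalize over queries

# Usage sketch (hypothetical variable names):
# W = orthogonal_map(X_en, Y_it)                 # learn map from a seed dictionary
# P = inverted_softmax(X_en_test @ W, Y_it)      # rank Italian candidates per query
# best = P.argmax(axis=1)
```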