HyperParameter优化是机器学习中的一种无处不在的挑战,训练型模型的性能在其有效选择时依赖于大致依赖。虽然为此目的存在丰富的工具,但目前在差分隐私(DP)的约束下,目前没有实际的超参数选择方法。我们研究鉴于差异私立机器学习的诚实的封锁,其中,在整体隐私预算中占了超代调优的过程。为此,我们)显示标准的组合工具在许多设置中优于更高级的技术,ii)经验和理论上展示了学习率和剪辑规范率HyperParameters,III之间的内在联系,表明DPADAM等自适应优化器享有显着的优势在诚实的HyperParameter调整过程中,IV)借鉴了DP设置中ADAM的新颖限制行为,以设计新的更高效的优化器。
translated by 谷歌翻译
Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training. Prior works typically address this issue by using auxiliary information (e.g., public data) to boost the effectiveness of adaptive optimization. In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to better realize the benefits of adaptivity. Theoretically, we provide convergence guarantees for our method for both convex and non-convex problems, and analyze trade-offs between delay and privacy noise reduction. Empirically, we explore DP^2 across several real-world datasets, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines and match the performance of state-of-the-art optimization methods that require auxiliary data.
translated by 谷歌翻译
自适应优化方法已成为许多机器学习任务的默认求解器。不幸的是,适应性的好处可能会在具有不同隐私的训练时降低,因为噪声增加了,以确保隐私会降低自适应预处理的有效性。为此,我们提出了ADADP,这是一个使用非敏感的侧面信息来预处梯度的一般框架,从而可以在私有设置中有效使用自适应方法。我们正式显示ADADPS减少了获得类似隐私保证所需的噪声量,从而提高了优化性能。从经验上讲,我们利用简单且随时可用的侧面信息来探索实践中ADADP的性能,与集中式和联合设置中的强大基线相比。我们的结果表明,ADADP平均提高了准确性7.7%(绝对) - 在大规模文本和图像基准上产生最先进的隐私性权衡权衡。
translated by 谷歌翻译
Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality. * Google.† OpenAI. Work done while at Google.
translated by 谷歌翻译
每个例子梯度剪辑是一个关键算法步骤,可实现对深度学习模型的实用差异私有(DP)培训。但是,剪辑规范$ r $的选择对于在DP下实现高精度至关重要。我们提出了一个易于使用的替代品,称为Autoclipping,它消除了任何DP优化器(包括DP-SGD,DP-ADAM,DP-LAMB等)调整$ R $的需求。自动变体与现有的DP优化器一样私有和计算效率,但不需要DP特定的超参数,因此使DP培训与标准的非私人培训一样适合。我们在非凸vex设置中对自动DP-SGD进行了严格的融合分析,这表明它具有与标准SGD相匹配的渐近收敛速率。我们还展示了各种语言和视觉任务,这些任务自动剪辑优于或匹配最新的,并且可以轻松使用对现有代码库的最小更改。
translated by 谷歌翻译
深度神经网络(DNNS)铰接对大型数据集的可用性的最新成功;但是,对此类数据集的培训经常为敏感培训信息构成隐私风险。在本文中,我们的目标是探讨生成模型和梯度稀疏性的力量,并提出了一种可扩展的隐私保留生成模型数据标准。与标准展示隐私保留框架相比,允许教师对一维预测进行投票,在高维梯度向量上投票在隐私保存方面具有挑战性。随着需要尺寸减少技术,我们需要在(1)之间的改进之间导航精致的权衡空间,并进行SGD收敛的放缓。为了解决这一点,我们利用通信高效学习,并通过将顶-K压缩与相应的噪声注入机构相结合,提出一种新的噪声压缩和聚集方法TopAGG。理论上,我们证明了DataLens框架保证了其生成数据的差异隐私,并提供了其收敛性的分析。为了展示DataLens的实际使用情况,我们对不同数据集进行广泛的实验,包括Mnist,Fashion-Mnist和高维Celeba,并且我们表明,DataLens显着优于其他基线DP生成模型。此外,我们改进了所提出的Topagg方法,该方法是DP SGD培训的主要构建块之一,并表明它能够在大多数情况下实现比最先进的DP SGD方法更高的效用案件。我们的代码在HTTPS://github.com/ai-secure/datalens公开提供。
translated by 谷歌翻译
我们考虑使用迷你批量梯度进行差异隐私(DP)的培训模型。现有的最先进的差异私有随机梯度下降(DP-SGD)需要通过采样或洗机来获得最佳隐私/准确性/计算权衡的隐私放大。不幸的是,在重要的实际情况下,精确采样和洗牌的精确要求可能很难获得,特别是联邦学习(FL)。我们设计和分析跟随 - 正规的领导者(DP-FTRL)的DP变体,其比较(理论上和经验地)与放大的DP-SGD相比,同时允许更灵活的数据访问模式。DP-FTRL不使用任何形式的隐私放大。该代码可在https://github.com/google-Research/federated/tree/master/dp_ftrl和https://github.com/google-reesearch/dp-ftrl处获得。
translated by 谷歌翻译
差异隐私(DP)提供了正式的隐私保证,以防止对手可以访问机器学习模型,从而从提取有关单个培训点的信息。最受欢迎的DP训练方法是差异私有随机梯度下降(DP-SGD),它通过在训练过程中注入噪声来实现这种保护。然而,以前的工作发现,DP-SGD通常会导致标准图像分类基准的性能显着降解。此外,一些作者假设DP-SGD在大型模型上固有地表现不佳,因为保留隐私所需的噪声规范与模型维度成正比。相反,我们证明了过度参数化模型上的DP-SGD可以比以前想象的要好得多。将仔细的超参数调整与简单技术结合起来,以确保信号传播并提高收敛速率,我们获得了新的SOTA,而没有额外数据的CIFAR-10,在81.4%的81.4%下(8,10^{ - 5}) - 使用40 -layer wide-Resnet,比以前的SOTA提高了71.7%。当对预训练的NFNET-F3进行微调时,我们在ImageNet(0.5,8*10^{ - 7})下达到了83.8%的TOP-1精度。此外,我们还在(8,8 \ cdot 10^{ - 7})下达到了86.7%的TOP-1精度,DP仅比当前的非私人SOTA仅4.3%。我们认为,我们的结果是缩小私人图像分类和非私有图像分类之间准确性差距的重要一步。
translated by 谷歌翻译
在联合学习(FL)设置中具有用户级差异隐私(例如,DP联合平均)培训神经网络的现有方法涉及通过*将其绘制到某些常量值的贡献限制每个用户的模型更新的贡献。但是,没有好处*先验*跨任务和学习设置的剪切规范设置:更新规范分布取决于模型架构和丢失,每个设备上的数据量,客户端学习率以及可能各种其他参数。我们提出了一种方法,其中代替固定剪切范围,一个剪辑到更新规范分布的指定定量位的值,其中定量位的值本身估计在线,具有差异隐私。该方法紧密地追踪量级,使用可忽略的隐私预算,与其他联合学习技术相容,例如压缩和安全聚合,并具有DP-Fedivg的直接联合DP分析。实验表明,适应性剪辑到中位更新规范的适应性剪辑跨越一系列现实的联合学习任务,有时甚至优于在后敏感中选择的最佳固定剪辑,而无需调整任何剪切的超参数。
translated by 谷歌翻译
Differential privacy (DP) provides a formal privacy guarantee that prevents adversaries with access to machine learning models from extracting information about individual training points. Differentially private stochastic gradient descent (DPSGD) is the most popular training method with differential privacy in image recognition. However, existing DPSGD schemes lead to significant performance degradation, which prevents the application of differential privacy. In this paper, we propose a simulated annealing-based differentially private stochastic gradient descent scheme (SA-DPSGD) which accepts a candidate update with a probability that depends both on the update quality and on the number of iterations. Through this random update screening, we make the differentially private gradient descent proceed in the right direction in each iteration, and result in a more accurate model finally. In our experiments, under the same hyperparameters, our scheme achieves test accuracies 98.35%, 87.41% and 60.92% on datasets MNIST, FashionMNIST and CIFAR10, respectively, compared to the state-of-the-art result of 98.12%, 86.33% and 59.34%. Under the freely adjusted hyperparameters, our scheme achieves even higher accuracies, 98.89%, 88.50% and 64.17%. We believe that our method has a great contribution for closing the accuracy gap between private and non-private image classification.
translated by 谷歌翻译
我们考虑一个顺序设置,其中使用单个数据集用于执行自适应选择的分析,同时确保每个参与者的差别隐私丢失不超过预先指定的隐私预算。此问题的标准方法依赖于限制所有个人对所有个人的隐私损失的最坏情况估计,以及每个单一分析的所有可能的数据值。然而,在许多情况下,这种方法过于保守,特别是对于“典型”数据点,通过参与大部分分析产生很少的隐私损失。在这项工作中,我们基于每个分析中每个人的个性化隐私损失估计的价值,给出了更严格的隐私损失会计的方法。实现我们设计R \'enyi差异隐私的过滤器。过滤器是一种工具,可确保具有自适应选择的隐私参数的组合算法序列的隐私参数不超过预先预算。我们的过滤器比以往的$(\ epsilon,\ delta)$ - rogers等人的差别隐私更简单且更紧密。我们将结果应用于对嘈杂渐变下降的分析,并显示个性化会计可以实用,易于实施,并且只能使隐私式权衡更紧密。
translated by 谷歌翻译
梯度泄漏攻击被认为是深度学习中的邪恶隐私威胁之一,因为攻击者在迭代培训期间隐蔽了梯度更新,而不会影响模型培训质量,但又使用泄漏的梯度逐步重建敏感培训数据,具有高攻击成功率。虽然具有差异隐私的深度学习是发布具有差异隐私保障的深度学习模型的违法标准,但我们展示了具有固定隐私参数的差异私有算法易受梯度泄漏攻击的影响。本文调查了差异隐私(DP)的梯度泄漏弹性深度学习的替代方法。首先,我们分析了差异隐私的深度学习的现有实现,它使用固定噪声方差使用固定隐私参数将恒定噪声对所有层中的梯度注入恒定噪声。尽管提供了DP保证,但该方法遭受了低精度,并且很容易受到梯度泄漏攻击。其次,通过使用动态隐私参数,我们提出了一种梯度泄漏弹性深度学习方法,差异隐私保证。与导致恒定噪声方差导致的固定参数策略不同,不同的动态参数策略存在替代技术,以引入自适应噪声方差和自适应噪声注入,其与差别私有模型训练期间的梯度更新的趋势紧密对齐。最后,我们描述了四个互补指标来评估和比较替代方法。
translated by 谷歌翻译
A major direction in differentially private machine learning is differentially private fine-tuning: pretraining a model on a source of "public data" and transferring the extracted features to downstream tasks. This is an important setting because many industry deployments fine-tune publicly available feature extractors on proprietary data for downstream tasks. In this paper, we use features extracted from state-of-the-art open source models to solve benchmark tasks in computer vision and natural language processing using differentially private fine-tuning. Our key insight is that by accelerating training, we can quickly drive the model parameters to regions in parameter space where the impact of noise is minimized. In doing so, we recover the same performance as non-private fine-tuning for realistic values of epsilon in [0.01, 1.0] on benchmark image classification datasets including CIFAR100.
translated by 谷歌翻译
Differentially private deep learning has recently witnessed advances in computational efficiency and privacy-utility trade-off. We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. To reduce the compute time overhead of private learning, we show that \emph{per-layer clipping}, where the gradient of each neural network layer is clipped separately, allows clipping to be performed in conjunction with backpropagation in differentially private optimization. This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest. While per-layer clipping with constant thresholds tends to underperform standard flat clipping, per-layer clipping with adaptive thresholds matches or outperforms flat clipping under given training epoch constraints, hence attaining similar or better task performance within less wall time. To explore the limits of scaling (pretrained) models in differentially private deep learning, we privately fine-tune the 175 billion-parameter GPT-3. We bypass scaling challenges associated with clipping gradients that are distributed across multiple devices with \emph{per-device clipping} that clips the gradient of each model piece separately on its host device. Privately fine-tuning GPT-3 with per-device clipping achieves a task performance at $\epsilon=1$ better than what is attainable by non-privately fine-tuning the largest GPT-2 on a summarization task.
translated by 谷歌翻译
通过确保学习算法中的差异隐私,可以严格降低大型模型记忆敏感培训数据的风险。在本文中,我们为此目的研究了两种算法,即DP-SGD和DP-NSGD,它们首先剪辑或归一化\ textIt \ textIt {每样本}梯度以绑定灵敏度,然后添加噪声以使精确信息混淆。我们通过两个常见的假设分析了非凸优化设置中这两种算法的收敛行为,并实现了$ \ nathcal {o} \ left(\ sqrt [4] {\ frac {\ frac {d \ log(1/\ delta) )} {n^2 \ epsilon^2}} \ right)$ $ d $ - 二维模型,$ n $ samples和$(\ epsilon,\ delta)$ - dp,它改进了以前的改进在较弱的假设下的界限。具体而言,我们在DP-NSGD中引入了一个正规化因素,并表明它对融合证明至关重要,并巧妙地控制了偏见和噪声权衡。我们的证明故意处理针对私人环境指定的按样本梯度剪辑和标准化。从经验上讲,我们证明这两种算法达到了相似的最佳准确性,而DP-NSGD比DP-SGD更容易调整,因此在计算调整工作时可能有助于进一步节省隐私预算。
translated by 谷歌翻译
在本文中,我们研究了差异化的私人经验风险最小化(DP-erm)。已经表明,随着尺寸的增加,DP-MER的(最坏的)效用会减小。这是私下学习大型机器学习模型的主要障碍。在高维度中,某些模型的参数通常比其他参数更多的信息是常见的。为了利用这一点,我们提出了一个差异化的私有贪婪坐标下降(DP-GCD)算法。在每次迭代中,DP-GCD私人沿梯度(大约)最大条目执行坐标梯度步骤。从理论上讲,DP-GCD可以通过利用问题解决方案的结构特性(例如稀疏性或准方面的)来改善实用性,并在早期迭代中取得非常快速的进展。然后,我们在合成数据集和真实数据集上以数值说明。最后,我们描述了未来工作的有前途的方向。
translated by 谷歌翻译
隐私和沟通效率是联邦神经网络培训中的重要挑战,并将它们组合仍然是一个公开的问题。在这项工作中,我们开发了一种统一高度压缩通信和差异隐私(DP)的方法。我们引入基于相对熵编码(REC)到联合设置的压缩技术。通过对REC进行微小的修改,我们获得了一种可怕的私立学习算法,DP-REC,并展示了如何计算其隐私保证。我们的实验表明,DP-REC大大降低了通信成本,同时提供与最先进的隐私保证。
translated by 谷歌翻译
虽然在巨大数据上培训的机器学习模型导致了几个领域的断路器,但由于限制数据的访问,他们在隐私敏感域中的部署仍然有限。在私有数据上具有隐私约束的生成模型可以避免此挑战,而是提供对私有数据的间接访问。我们提出DP-Sinkhorn,一种新的最优传输的生成方法,用于从具有差异隐私的私有数据学习数据分布。 DP-Sinkhorn以差别私人方式在模型和数据之间的模型和数据之间最小化陷阱的分歧,将计算上有效的近似值,并在模型和数据之间使用新技术来控制梯度估计的偏差差异的偏差折衷。与现有的培训方法不同,差异私人生成模型主要基于生成的对抗网络,我们不依赖于对抗性目标,这令人惊叹的难以优化,特别是在隐私约束所施加的噪声存在下。因此,DP-Sinkhorn易于训练和部署。通过实验,我们改进了多种图像建模基准的最先进,并显示了差异私有的信息RGB图像综合。项目页面:https://nv-tlabs.github.io/dp-sinkhorn。
translated by 谷歌翻译
Differential privacy is a strong notion for privacy that can be used to prove formal guarantees, in terms of a privacy budget, , about how much information is leaked by a mechanism. However, implementations of privacy-preserving machine learning often select large values of in order to get acceptable utility of the model, with little understanding of the impact of such choices on meaningful privacy. Moreover, in scenarios where iterative learning procedures are used, differential privacy variants that offer tighter analyses are used which appear to reduce the needed privacy budget but present poorly understood trade-offs between privacy and utility. In this paper, we quantify the impact of these choices on privacy in experiments with logistic regression and neural network models. Our main finding is that there is a huge gap between the upper bounds on privacy loss that can be guaranteed, even with advanced mechanisms, and the effective privacy loss that can be measured using current inference attacks. Current mechanisms for differentially private machine learning rarely offer acceptable utility-privacy trade-offs with guarantees for complex learning tasks: settings that provide limited accuracy loss provide meaningless privacy guarantees, and settings that provide strong privacy guarantees result in useless models.
translated by 谷歌翻译
我们重新审视使​​用公共数据来改善差异私有(DP)模型培训的隐私/实用权折衷的问题。在这里,公共数据是指没有隐私问题的辅助数据集。我们考虑与私人培训数据相同的分发的公共数据。对于凸损失,我们表明镜子血清的变体提供了与模型的维度($ p $)的人口风险保证。具体地,我们将镜像血液应用于由公共数据生成的丢失作为镜像映射,并使用私有(敏感)数据生成的丢失的DP梯度。为了获得维度独立性,我们需要$ g_q ^ 2 \ leq p $公共数据样本,其中$ g_q $是损失功能各向同性的量度。我们进一步表明,我们的算法具有天然的“噪音稳定性”属性:如果围绕当前迭代公共损失,请以$ V $的方向满足$ \ alpha_v $ -strong凸性,然后使用嘈杂的渐变而不是确切的渐变偏移我们的下一次迭代$ v $ v $比例为$ 1 / alpha_v $(与DP-SGD相比,换档是各向同性的)。在前作品中的类似结果必须使用预处理器矩阵形式的公共数据明确地学习几何图形。我们的方法也适用于非凸损失,因为它不依赖于凸起假设以确保DP保证。我们通过显示线性回归,深度学习基准数据集(Wikitext-2,Cifar-10和Emnist)以及联合学习(StackOverflow)来证明我们的算法的经验效果。我们表明,我们的算法不仅显着改善了传统的DP-SGD和DP-FedAVG,它没有访问公共数据,而且还可以改善DP-SGD和DP-FedAVG对已与公众预先培训的模型数据开始。
translated by 谷歌翻译