Differential privacy is crucial for the real-world deployment of statistical and machine learning algorithms with rigorous privacy guarantees. The earliest statistical queries for which differential privacy mechanisms were developed concern the release of the sample mean. In geometric statistics, the sample Fréchet mean represents one of the most fundamental statistical summaries, as it generalizes the sample mean to data belonging to nonlinear manifolds. In that spirit, the only geometric statistical query for which a differential privacy mechanism has been developed so far is the release of the sample Fréchet mean: the Riemannian Laplace mechanism was recently proposed to privatize the Fréchet mean on complete Riemannian manifolds. In many fields, the manifold of symmetric positive definite (SPD) matrices is used to model data spaces, including in medical imaging, where privacy requirements are critical. We propose a novel, simple and fast mechanism, the tangent Gaussian mechanism, to compute a differentially private Fréchet mean on the SPD manifold endowed with the log-Euclidean Riemannian metric. We show that our new mechanism achieves a quadratic utility improvement, in terms of the data dimension, over the current and only available baseline. Our mechanism is also simpler in practice, as it does not require any expensive Markov chain Monte Carlo (MCMC) sampling, and it is computationally faster by multiple orders of magnitude, as confirmed by extensive experiments.
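As a rough illustration of the idea (not the paper's exact mechanism or privacy calibration): under the log-Euclidean metric, the Fréchet mean of SPD matrices reduces to an average in the matrix-log tangent space, where symmetric Gaussian noise can be added before mapping back. The noise scale `sigma` below is an unspecified placeholder, not the calibrated scale the paper derives.

```python
import numpy as np

def spd_log(M):
    # matrix logarithm of an SPD matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    # matrix exponential of a symmetric matrix (result is SPD)
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def tangent_gaussian_frechet_mean(spd_matrices, sigma, rng):
    """Sketch: log-Euclidean Fréchet mean with Gaussian noise
    added in the tangent (matrix-log) space. `sigma` is a placeholder,
    not a privacy-calibrated noise scale."""
    mean_log = np.mean([spd_log(M) for M in spd_matrices], axis=0)
    d = mean_log.shape[0]
    noise = rng.normal(scale=sigma, size=(d, d))
    noise = (noise + noise.T) / 2.0  # symmetrize so the output stays SPD
    return spd_exp(mean_log + noise)
```

Note that the output is SPD by construction, since the exponential of any symmetric matrix is SPD; the noise never pushes the released mean off the manifold.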
We introduce the π-test, a privacy-preserving algorithm for testing statistical independence between data distributed across multiple parties. Our algorithm relies on privately estimating the distance correlation between datasets, a quantitative measure of independence introduced by Székely et al. [2007]. We establish additive and multiplicative error bounds on the utility of our differentially private test, which we believe will find applications in a variety of distributed hypothesis testing settings involving sensitive data.
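The quantity being privatized is the sample distance correlation. A minimal single-machine sketch of the non-private statistic, restricted to univariate data for brevity (this is not the paper's multi-party protocol), is:

```python
import numpy as np

def _centered_dist(x):
    # pairwise distance matrix, double-centered (Székely et al., 2007)
    D = np.abs(x[:, None] - x[None, :])
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples; 0 means (empirical)
    independence, 1 means a perfect monotone-linear relation."""
    A, B = _centered_dist(x), _centered_dist(y)
    dcov2 = (A * B).mean()                      # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    if dvar_x * dvar_y == 0:
        return 0.0
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))
```

Unlike Pearson correlation, this statistic is zero (in population) exactly when the variables are independent, which is what makes it usable as an independence test statistic.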
This paper presents a distributed learning solution for the Vision Transformer (ViT) architecture. Compared to convolutional neural network (CNN) architectures, ViTs typically have larger model sizes and are computationally more expensive, making federated learning (FL) ill-suited. Split learning (SL) can circumvent this problem by splitting the model and communicating the hidden representations at the split layer, also known as smashed data. Nonetheless, the smashed data of a ViT are as large as the input data, negating the communication efficiency of SL while still violating data privacy. To resolve these issues, we propose a new form of cut smashed data, obtained by randomly puncturing and compressing the original smashed data. Leveraging this, we develop a novel SL framework for ViT, CutMixSL, which communicates cut smashed data. CutMixSL not only reduces communication cost and privacy leakage, but also inherently performs CutMix data augmentation, improving accuracy and scalability. Simulations corroborate that CutMixSL outperforms baselines such as parallelized SL and hybrids that integrate it with FL.
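A toy sketch of the puncture-and-mix idea on split-layer token activations (the function names and masking scheme here are illustrative stand-ins, not the paper's actual construction):

```python
import numpy as np

def puncture(tokens, keep_prob, rng):
    """Randomly puncture a (num_tokens, dim) activation matrix: keep each
    token with probability keep_prob, zeroing the rest. A hypothetical
    stand-in for the paper's puncturing/compression of smashed data."""
    mask = rng.random(tokens.shape[0]) < keep_prob
    return tokens * mask[:, None], mask

def cutmix_smashed(tokens_a, tokens_b, rng):
    """Mix two clients' token sequences with a complementary token-level
    mask, loosely mirroring CutMix applied to smashed data."""
    mask = rng.random(tokens_a.shape[0]) < 0.5
    return np.where(mask[:, None], tokens_a, tokens_b)
```

The intuition: each client transmits only a punctured subset of tokens (less bandwidth, less leakage), and mixing tokens across clients yields a CutMix-style augmented batch for free on the server side.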
In recent years there has been great progress in decentralized learning over private data. Federated learning (FL) and split learning (SL) are its two spearheads, each with its own advantages and disadvantages, suited respectively to many client devices and to large models. To enjoy both benefits, hybrid methods such as SplitFed have lately emerged, yet their fundamentals remain elusive. In this work, we first identify the fundamental bottleneck of SL and thereby propose a scalable SL framework, coined SGLR. Under SGLR, the server broadcasts a common gradient averaged at the split layer, emulating FL without any additional communication across clients. Meanwhile, SGLR splits the learning rate into server-side and client-side rates and adjusts them separately to support many clients. Simulation results corroborate that SGLR achieves higher accuracy than other baseline SL methods, including SplitFed, which is even comparable with FL despite FL's higher energy and communication costs. As a secondary result, we observe, via mutual information, a greater leakage of sensitive information when using SGLR than with the baselines.
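The two mechanisms described above can be sketched in a few lines (a toy stand-in, not the paper's implementation): the server averages the split-layer gradients and sends every client the same averaged gradient, and the learning rate is decoupled into server-side and client-side values.

```python
import numpy as np

def broadcast_split_gradient(per_client_grads):
    """Average split-layer gradients across clients and broadcast the single
    averaged gradient back to all of them (SGLR-style sketch)."""
    avg = np.mean(per_client_grads, axis=0)
    return [avg for _ in per_client_grads]

def decoupled_update(server_w, client_w, g_server, g_client, lr_server, lr_client):
    # separate learning rates for the server-side and client-side segments
    return server_w - lr_server * g_server, client_w - lr_client * g_client
```

Because every client receives the same averaged gradient, the client-side segments are pulled toward a common model, emulating FL-style averaging without clients ever communicating with each other.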
Distributed deep learning frameworks such as federated learning (FL) and its variants are enabling personalized experiences across a wide range of web clients and mobile/IoT devices. However, FL-based frameworks are constrained by clients' computational resources, given the explosive growth in model parameters (e.g., billion-parameter models). Split learning (SL), a recent framework, reduces the client-side computational load by splitting model training between clients and a server. This flexibility is very useful in low-compute settings, but it usually comes at the cost of increased bandwidth consumption and can lead to sub-optimal convergence, especially when client data are heterogeneous. In this work, we introduce AdaSplit, which enables SL to scale efficiently to low-resource scenarios by reducing bandwidth consumption and improving performance across heterogeneous clients. To capture and benchmark this multi-dimensional nature of distributed deep learning, we also introduce the C3-score, a metric for evaluating performance under resource budgets. We validate the effectiveness of AdaSplit under limited resources through extensive experimental comparisons against strong federated and split learning baselines. We also present a sensitivity analysis of the key design choices in AdaSplit, which confirms its ability to provide adaptive trade-offs across variable resource budgets.
We introduce a differentially private method to measure nonlinear correlations between sensitive data hosted across two entities. We provide utility guarantees for our private estimator. Ours is, to the best of our knowledge, the first private estimator of nonlinear correlations in a multi-party setting. The important measure of nonlinear correlation we consider is the distance correlation. This work has direct applications to private feature screening, private independence testing, private k-sample tests, private multi-party causal inference and private data synthesis, in addition to exploratory data analysis. Code access: a link to publicly accessible code is provided in the supplementary file.
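For intuition, a generic ε-differentially-private release of a scalar statistic via the standard Laplace mechanism might look like the following. The `sensitivity` argument is a placeholder for a bound on how much the statistic can change when one record changes; the paper derives its own calibrated estimator and utility bounds, which this sketch does not reproduce.

```python
import numpy as np

def private_release(value, sensitivity, epsilon, rng):
    """ε-DP Laplace mechanism: release value + Laplace(sensitivity / epsilon)
    noise. `sensitivity` must upper-bound the statistic's change under a
    single-record change; it is an assumed input here, not derived."""
    return value + rng.laplace(scale=sensitivity / epsilon)
```

The released value is unbiased, so averaging many independent releases (at a corresponding privacy cost) recovers the true statistic, which is the usual intuition behind additive-error utility bounds.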
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
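The orchestration pattern described above can be sketched as one round of federated averaging on a toy linear-regression model (an illustrative FedAvg-style sketch, not any specific system from the survey): each client runs local SGD from the current global weights, and the server averages the results weighted by dataset size.

```python
import numpy as np

def fedavg_round(global_w, client_datasets, lr=0.1, local_steps=5):
    """One FedAvg-style round: clients train locally from the global
    weights on their own (X, y) data; the server averages the updated
    weights, weighted by client dataset size."""
    updated, sizes = [], []
    for X, y in client_datasets:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
            w -= lr * grad
        updated.append(w)
        sizes.append(len(y))
    return np.average(updated, axis=0, weights=np.array(sizes, dtype=float))
```

The training data never leave the clients; only model weights travel, which is the data-minimization principle the abstract refers to.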
Understanding the ambient scene is imperative for several applications such as autonomous driving and navigation. While obtaining real-world image data with per-pixel labels is challenging, existing accurate synthetic image datasets primarily focus on indoor spaces with fixed lighting and scene participants, thereby severely limiting their application to outdoor scenarios. In this work we introduce OmniHorizon, a synthetic dataset with 24,335 omnidirectional views comprising a broad range of indoor and outdoor spaces with buildings, streets, and diverse vegetation. Our dataset also accounts for dynamic scene components including lighting, different times of day, pedestrians, and vehicles. Furthermore, we demonstrate a learned synthetic-to-real cross-domain inference method for in-the-wild 3D scene depth and normal estimation using our dataset. To this end, we propose UBotNet, an architecture based on a UNet and a Bottleneck Transformer, to estimate scene-consistent normals. We show that UBotNet achieves significantly improved depth accuracy (4.6%) and normal estimation (5.75%) compared to several existing networks, such as U-Net with skip connections. Finally, we demonstrate in-the-wild depth and normal estimation on real-world images with UBotNet trained purely on our OmniHorizon dataset, showing the promise of the proposed dataset and network for scene understanding.
A large portion of today's world population suffers from vision impairments and wears prescription eyeglasses. However, eyeglasses cause additional bulk and discomfort when used with augmented and virtual reality headsets, thereby negatively impacting the viewer's visual experience. In this work, we remedy the need for prescription eyeglasses in Virtual Reality (VR) headsets by shifting the optical complexity entirely into software, and propose a prescription-aware rendering approach that provides sharper and more immersive VR imagery. To this end, we develop a differentiable display and visual perception model encapsulating display-specific parameters, the color perception and visual acuity of the human visual system, and the user-specific refractive errors. Using this differentiable visual perception model, we optimize the rendered imagery on the display using stochastic gradient-descent solvers. This way, we provide sharper images, without prescription glasses, for a person with vision impairments. We evaluate our approach on various displays, including desktops and VR headsets, and show significant quality and contrast improvements for users with vision impairments.
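The optimization loop can be sketched for a toy linear blur model standing in for the viewer's refractive error (the real system uses a much richer differentiable display and perception model; `blur_matrix` here is an assumed placeholder): we search for the image to display so that, after the optical blur, the perceived image matches the target.

```python
import numpy as np

def precorrect(target, blur_matrix, lr=0.5, steps=200):
    """Gradient-descent sketch of prescription-aware rendering: minimize
    ||B x - target||^2 over the displayed image x, where B models the
    viewer's optical blur. A real renderer would also constrain x to the
    displayable intensity range."""
    x = target.copy()
    for _ in range(steps):
        residual = blur_matrix @ x - target
        x -= lr * (2 * blur_matrix.T @ residual)  # gradient of ||Bx - t||^2
    return x
```

The displayed image `x` is deliberately "pre-distorted" so that the eye's blur undoes the distortion, which is the core idea of moving the optical correction into software.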
Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency -- given $n$ input points, most kernel-based algorithms need to materialize the full $n \times n$ kernel matrix before performing any subsequent computation, thus incurring $\Omega(n^2)$ runtime. Breaking this quadratic barrier for various problems has therefore been a subject of extensive research efforts. We break the quadratic barrier and obtain $\textit{subquadratic}$ time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from $\textit{weighted vertex}$ and $\textit{weighted edge sampling}$ on kernel graphs, $\textit{simulating random walks}$ on kernel graphs, and $\textit{importance sampling}$ on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in $\textit{sublinear}$ (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a $\textbf{9x}$ decrease in the number of kernel evaluations over baselines for LRA and a $\textbf{41x}$ reduction in the graph size for spectral sparsification.
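For intuition, here is the naive quadratic-time version of one of the primitives above, a single random-walk step on the (Gaussian) kernel graph. This reference version materializes an entire row of the kernel matrix; the paper's contribution is replacing such explicit row sums with KDE-based estimates so that each step runs in sublinear time.

```python
import numpy as np

def gaussian_kernel(X, i, j):
    # Gaussian kernel weight between points i and j (unit bandwidth)
    return np.exp(-np.sum((X[i] - X[j]) ** 2))

def sample_neighbor(X, i, rng):
    """One random-walk step on the kernel graph: move from node i to node j
    with probability proportional to k(x_i, x_j). Naive O(n) per step;
    the paper achieves sublinear time via KDE estimates of row sums."""
    n = len(X)
    row = np.array([gaussian_kernel(X, i, j) for j in range(n)])
    row[i] = 0.0  # exclude the self-loop
    return rng.choice(n, p=row / row.sum())
```

Chaining such steps simulates random walks on the kernel graph, which in turn drives the local clustering and spectral primitives listed in the abstract.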