We present DeepWiVe, the first end-to-end joint source-channel coding (JSCC) video transmission scheme that leverages the power of deep neural networks (DNNs) to directly map video signals to channel symbols, combining the video compression, channel coding, and modulation steps into a single neural transform. Our DNN decoder predicts the residual without distortion feedback, which improves video quality by accounting for occlusion/disocclusion and camera movements. We simultaneously train different bandwidth allocation networks to allow variable-bandwidth transmission, and we then train the bandwidth allocation network using reinforcement learning (RL) to optimize the allocation of the limited available channel bandwidth among video frames so as to maximize overall visual quality. Our results show that DeepWiVe can overcome the cliff effect that is prevalent in conventional separation-based digital communication schemes, and achieves graceful degradation under mismatch between the estimated and actual channel qualities. DeepWiVe outperforms H.264 video compression followed by low-density parity-check (LDPC) codes in all channel conditions by up to 0.0462 on average in terms of the multi-scale structural similarity index (MS-SSIM), while beating H.265 + LDPC by up to 0.0058 on average. We also illustrate the importance of optimizing bandwidth allocation in JSCC video transmission by showing that our optimal bandwidth allocation policy is superior to a naive uniform allocation. We believe this is an important step towards realizing the full potential of end-to-end optimized JSCC wireless video transmission systems that outperform current separation-based designs.
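For illustration, here is a minimal sketch of the core idea of mapping a frame directly to power-normalized complex channel symbols with a single neural transform. The layer sizes, the AWGN channel model, and all names are assumptions for this toy example, not the DeepWiVe architecture; the decoder is omitted.

```python
import torch
import torch.nn as nn

class JSCCEncoder(nn.Module):
    """Toy encoder: frame -> complex channel symbols with unit average power."""
    def __init__(self):
        super().__init__()
        # Two real feature maps are paired into one complex symbol (I and Q).
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.PReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.PReLU(),
            nn.Conv2d(64, 8, 3, stride=1, padding=1),
        )

    def forward(self, frame):
        z = self.net(frame).flatten(1)            # real-valued latent, shape (B, 2k)
        i, q = z.chunk(2, dim=1)
        x = torch.complex(i, q)                   # k complex channel symbols
        power = x.abs().pow(2).mean(dim=1, keepdim=True)
        return x / torch.sqrt(power)              # enforce E|x|^2 = 1 per channel use

def awgn(x, snr_db):
    """Complex AWGN channel, assuming unit signal power."""
    noise_var = 10 ** (-snr_db / 10)
    noise = torch.sqrt(torch.tensor(noise_var / 2)) * (
        torch.randn_like(x.real) + 1j * torch.randn_like(x.real))
    return x + noise

frame = torch.rand(1, 3, 64, 64)                  # toy video frame
y = awgn(JSCCEncoder()(frame), snr_db=10)         # what an (omitted) decoder would receive
```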
Recent works have shown that modern machine learning techniques can provide an alternative approach to the long-standing joint source-channel coding (JSCC) problem. Very promising initial results, superior to popular digital schemes that use separate source and channel codes, have been demonstrated for wireless image and video transmission using deep neural networks (DNNs). However, end-to-end training of such schemes requires a differentiable channel input representation, and previous works have therefore assumed that any complex value can be transmitted over the channel. This prevents the application of these codes in scenarios where the hardware or protocol can only admit a certain set of channel inputs prescribed by a digital constellation. In this paper, we propose DeepJSCC-Q, an end-to-end optimized JSCC solution that uses a finite channel input alphabet. We show that DeepJSCC-Q can achieve performance similar to that of prior works allowing arbitrary complex-valued channel inputs, especially when high modulation orders are available, and that its performance asymptotically approaches the unconstrained channel-input case as the modulation order increases. Importantly, DeepJSCC-Q preserves the graceful degradation of image quality under unpredictable channel conditions, a desirable property for deployment in mobile systems where the channel changes rapidly.
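For illustration, a hedged sketch of constraining a DeepJSCC encoder output to a finite constellation. The soft nearest-point assignment below is a common differentiable surrogate and is assumed here; it is not necessarily the exact quantization mechanism used by DeepJSCC-Q.

```python
import torch

def qam_constellation(order=16):
    """Unit-average-power square QAM points, e.g. 16-QAM."""
    m = int(order ** 0.5)
    levels = torch.arange(m, dtype=torch.float32) * 2 - (m - 1)   # e.g. [-3,-1,1,3]
    re, im = torch.meshgrid(levels, levels, indexing="ij")
    points = torch.complex(re, im).flatten()
    return points / torch.sqrt(points.abs().pow(2).mean())

def soft_quantize(x, points, temperature=1.0):
    """Map complex symbols x to a convex combination of constellation points."""
    d2 = (x.unsqueeze(-1) - points).abs().pow(2)        # squared distances to each point
    w = torch.softmax(-d2 / temperature, dim=-1)        # soft assignment weights
    return (w * points).sum(dim=-1)                     # hardens to nearest point as T -> 0

points = qam_constellation(16)
x = torch.randn(8, dtype=torch.cfloat)                  # toy continuous encoder output
x_tx = soft_quantize(x, points, temperature=0.1)        # near-discrete channel input
```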
Recent works have shown that the task of wireless image transmission can be learned using machine learning techniques. Very promising results in terms of image quality, superior to popular digital schemes that rely on the separation of source and channel coding, have been demonstrated by training an autoencoder with a non-trainable channel layer in the middle. However, these methods assume that any complex value can be transmitted over the channel, which prevents their application in scenarios where the hardware or protocol can only admit certain channel inputs, such as those defined by a digital constellation. Here, we propose DeepJSCC-Q, an end-to-end optimized joint source-channel coding scheme for wireless image transmission that is able to operate with a fixed channel-input alphabet. We show that DeepJSCC-Q can achieve performance similar to that of models using continuous-valued channel inputs. Importantly, it preserves the graceful degradation of image quality observed in existing works when channel conditions worsen, making DeepJSCC-Q more attractive for deployment in practical systems.
Most semantic communication systems leverage deep learning models to provide end-to-end transmission performance surpassing the established source and channel coding approaches. While research has so far mainly focused on architecture and model improvements, a model trained over a full dataset and ergodic channel responses is unlikely to be optimal for every test instance. Due to limitations on the model capacity and imperfect optimization and generalization, such learned models will be suboptimal especially when the testing data distribution or channel response is different from that in the training phase, as is likely to be the case in practice. To tackle this, in this paper, we propose a novel semantic communication paradigm by leveraging the deep learning model's overfitting property. Our model can, for instance, be updated after deployment, which can further lead to substantial gains in terms of the transmission rate-distortion (RD) performance. This new system is named adaptive semantic communication (ASC). In our ASC system, the ingredients of the wireless transmitted stream include both the semantic representations of the source data and the adapted decoder model parameters. Specifically, we take the overfitting concept to the extreme, proposing a series of ingenious methods to adapt the semantic codec or representations to an individual data or channel state instance. The whole ASC system design is formulated as an optimization problem whose goal is to minimize the loss function that is a tripartite tradeoff among the data rate, model rate, and distortion terms. The experiments (including a user study) verify the effectiveness and efficiency of our ASC system. Notably, the substantial gain of our overfitted coding paradigm can catalyze semantic communication upgrading to a new era.
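For illustration, a sketch of the tripartite tradeoff described above in equation form. The notation is assumed rather than taken from the paper: R_x is the rate of the transmitted semantic representation, R_m the rate of the adapted decoder-parameter update, D the end-to-end distortion, and beta, lambda the tradeoff weights.

```latex
\min_{\phi,\,\delta\theta}\;
  \underbrace{R_x(\phi)}_{\text{data rate}}
  \;+\; \beta\,\underbrace{R_m(\delta\theta)}_{\text{model rate}}
  \;+\; \lambda\,\underbrace{D\bigl(\phi,\;\theta_0+\delta\theta\bigr)}_{\text{distortion}}
```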
Communication systems to date have primarily aimed at reliably communicating bit sequences. This approach provides efficient engineering designs that are agnostic to the meaning of the messages or to the goal that the message exchange intends to achieve. Next-generation systems, however, can be enriched by folding message semantics and the goals of communication into their design. Furthermore, these systems can be made aware of the context in which communication takes place, providing avenues for novel design insights. This tutorial summarizes the efforts to date, starting from early adaptations through semantic-aware and task-oriented communications, covering the foundations, algorithms, and potential implementations. The focus is on approaches that use information theory to provide the foundations, as well as on the important role of learning in semantic and task-aware communications.
In this paper, we propose a new class of high-efficiency deep joint source-channel coding methods that closely adapt to the source distribution under a nonlinear transform, collected under the name nonlinear transform source-channel coding (NTSCC). In the considered model, the transmitter first learns a nonlinear analysis transform that maps the source data into a latent space, and then transmits the latent representation to the receiver via deep joint source-channel coding. Our model incorporates a prior on the latent representation, which effectively extracts the source semantic features and provides side information for source-channel coding. Unlike existing conventional deep joint source-channel coding methods, the proposed NTSCC essentially learns both the source latent representation and an entropy model as the prior on the latent representation. Accordingly, novel adaptive-rate transmission and hyperprior-aided codec refinement mechanisms are developed to upgrade deep joint source-channel coding. The whole system design is formulated as an optimization problem whose goal is to minimize the end-to-end transmission rate-distortion performance under established perceptual quality metrics. On simple example sources as well as test image sources, we find that the proposed NTSCC transmission method generally outperforms both analog transmission using standard deep joint source-channel coding and classical separation-based digital transmission. Notably, owing to its strong content-awareness, the proposed NTSCC method may support future semantic communications.
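For illustration, a hedged sketch of entropy-guided adaptive rate allocation in the spirit of NTSCC: the learned prior supplies an entropy estimate for each latent vector, and channel symbols are budgeted accordingly. The function name and the proportional rule are assumptions, not the paper's implementation.

```python
import torch

def allocate_channel_uses(neg_log_prob, total_symbols):
    """Split a channel-symbol budget across latent vectors by estimated entropy."""
    weights = neg_log_prob / neg_log_prob.sum()
    alloc = torch.floor(weights * total_symbols).long()
    remainder = int(total_symbols - alloc.sum())
    order = neg_log_prob.argsort(descending=True)    # highest-entropy latents first
    alloc[order[:remainder]] += 1                    # hand the leftover symbols to them
    return alloc                                     # integer symbols per latent vector

neg_log_prob = torch.tensor([2.1, 0.4, 5.6, 1.2, 3.7])   # toy entropy estimates (bits)
alloc = allocate_channel_uses(neg_log_prob, total_symbols=96)
assert alloc.sum() == 96                             # "harder" content gets more bandwidth
```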
State-of-the-art performance for many emerging edge applications is achieved by deep neural networks (DNNs). Often, these DNNs are location and time sensitive, and the parameters of a specific DNN must be delivered from an edge server to the edge device rapidly and efficiently to carry out time-sensitive inference tasks. In this paper, we introduce AirNet, a novel training and transmission method that allows efficient wireless delivery of DNNs under stringent transmit power and latency constraints. We first train the DNN with noise injection to counter the wireless channel noise. Then we employ pruning to reduce the network size to the available channel bandwidth, and perform knowledge distillation from a larger model to achieve satisfactory performance, despite pruning. We show that AirNet achieves significantly higher test accuracy compared to digital alternatives under the same bandwidth and power constraints. The accuracy of the network at the receiver also exhibits graceful degradation with channel quality, which reduces the requirement for accurate channel estimation. We further improve the performance of AirNet by pruning the network below the available bandwidth, and using channel expansion to provide better robustness against channel noise. We also benefit from unequal error protection (UEP) by selectively expanding more important layers of the network. Finally, we develop an ensemble training approach, which trains a whole spectrum of DNNs, each of which can be used at a different channel condition, resolving the otherwise impractical memory requirements.
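For illustration, a hedged sketch of noise-injection training in the spirit of AirNet: since the network weights themselves are delivered over a noisy channel, each training step runs the forward pass through a noisy copy of the parameters. The model, the per-tensor SNR handling, and the use of torch.func.functional_call (PyTorch 2.x) are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
snr_db = 10.0

def noisy_params(model, snr_db):
    """Return a parameter dict with AWGN scaled to each tensor's signal power."""
    out = {}
    for name, p in model.named_parameters():
        power = p.detach().pow(2).mean()
        sigma = torch.sqrt(power * 10 ** (-snr_db / 10))
        out[name] = p + sigma * torch.randn_like(p)   # noise is constant w.r.t. autograd
    return out

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
logits = functional_call(model, noisy_params(model, snr_db), (x,))
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                                       # gradients flow to the clean weights
opt.step()
```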
Semantic communications has attracted significant interest as it can substantially reduce the amount of data to be transmitted without losing critical information. Most existing works explore the semantic encoding and transmission of text, applying techniques from natural language processing (NLP) to interpret the meaning of the text. In this paper, we conceive semantic communications for image data, which is much richer in semantics and more bandwidth-demanding. We propose a reinforcement learning based adaptive semantic coding (RL-ASC) approach that encodes images beyond the pixel level. First, we define the semantic concept of image data, which includes the category, spatial arrangement, and visual features, as the representation unit, and propose a convolutional semantic encoder to extract semantic concepts. Second, we propose an image reconstruction criterion that evolves from traditional pixel-wise similarity to semantic similarity and perceptual performance. Third, we design a novel RL-based semantic bit allocation model, whose reward is the improvement in rate-semantic-perceptual performance after encoding a certain semantic concept at an adaptive quantization level. As a result, task-related information is properly preserved and reconstructed, while less important data is discarded. Finally, we propose a generative adversarial network (GAN) based semantic decoder that fuses local and global features via an attention module. Experimental results show that the proposed RL-ASC is robust to noise, reconstructs visually pleasing and semantically consistent images, and saves substantially on bit cost compared with standard codecs and other deep learning based image codecs.
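For illustration, a hedged sketch of the reward signal described above; the functional form and the rate weight are assumptions, not the paper's definition.

```python
def bit_allocation_reward(quality_before, quality_after, bits_spent, lam=1e-4):
    """Reward = semantic/perceptual quality gain minus a penalty on the bits spent."""
    return (quality_after - quality_before) - lam * bits_spent

# Toy example: refining the quantization level of one semantic concept raises the
# semantic similarity score from 0.71 to 0.83 at a cost of 640 extra bits.
r = bit_allocation_reward(0.71, 0.83, 640)   # 0.12 - 0.064 ≈ 0.056
```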
Ultra-reliable short-packet communication is a major challenge in future wireless networks with critical applications. To achieve ultra-reliable communications beyond 99.999%, this paper envisions a new interaction-based communication paradigm that exploits feedback from the receiver. We present AttentionCode, a new class of feedback codes leveraging deep learning (DL) technologies. The underpinnings of AttentionCode are three architectural innovations: AttentionNet, input restructuring, and adaptation to fading channels, accompanied by several training methods, including large-batch training, distributed learning, look-ahead optimizer, training-test signal-to-noise ratio (SNR) mismatch, and curriculum learning. The training methods can potentially be generalized to other wireless communication applications with machine learning. Numerical experiments verify that AttentionCode establishes a new state of the art among all DL-based feedback codes in both additive white Gaussian noise (AWGN) channels and fading channels. In AWGN channels with noiseless feedback, for example, AttentionCode achieves a block error rate (BLER) of $10^{-7}$ when the forward channel SNR is 0 dB for a block size of 50 bits, demonstrating the potential of AttentionCode to provide ultra-reliable short-packet communications.
Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both classical architecture in the conventional video compression method and the powerful nonlinear representation ability of neural networks, we propose the first end-to-end video compression deep model that jointly optimizes all the components for video compression. Specifically, learning based optical flow estimation is utilized to obtain the motion information and reconstruct the current frames. Then we employ two auto-encoder style neural networks to compress the corresponding motion and residual information. All the modules are jointly learned through a single loss function, in which they collaborate with each other by considering the trade-off between reducing the number of compression bits and improving quality of the decoded video. Experimental results show that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and be even on par with the latest standard H.265 in terms of MS-SSIM. Code is released at https://github.com/GuoLusjtu/DVC.
[Figure: visual comparison of reconstructed frames. (a) Original frame (Bpp / MS-SSIM); (b) H.264 (0.0540 Bpp / 0.945); (c) H.265 (0.082 Bpp / 0.960); (d) Ours (0.0529 Bpp / 0.961)]
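For illustration, a hedged sketch of the single joint rate-distortion loss that couples the motion and residual autoencoders with reconstruction quality; variable names and the lambda value are assumptions.

```python
import torch

def rd_loss(x, x_hat, bits_motion, bits_residual, num_pixels, lam=256.0):
    """lambda * distortion + total rate in bits per pixel (motion + residual streams)."""
    distortion = torch.mean((x - x_hat) ** 2)            # MSE; MS-SSIM is also common
    rate_bpp = (bits_motion + bits_residual) / num_pixels
    return lam * distortion + rate_bpp

x = torch.rand(1, 3, 64, 64)                             # toy original frame
x_hat = x + 0.01 * torch.randn_like(x)                   # toy reconstruction
loss = rd_loss(x, x_hat, bits_motion=800.0, bits_residual=2400.0, num_pixels=64 * 64)
```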
Conventional video compression (VC) methods are based on motion-compensated transform coding, and due to the combinatorial nature of the end-to-end optimization problem, the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized separately. Learned VC allows simultaneous end-to-end rate-distortion (R-D) optimized training of the nonlinear transform, motion, and entropy models. Most works on learned VC consider end-to-end optimization of a sequential video codec based on an R-D loss over pairs of successive frames. It is well known in conventional VC that bi-directional coding outperforms sequential compression because of its ability to use both past and future reference frames. This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that we achieve the best R-D results reported to date for learned VC schemes in both PSNR and MS-SSIM. Compared to conventional video codecs, the R-D performance of our end-to-end optimized codec outperforms the x265 and SVT-HEVC encoders ("veryslow" preset) in both PSNR and MS-SSIM, as well as the HM 16.23 reference software in MS-SSIM. We present ablations showing the performance gains due to the proposed novel tools, such as learned masking, flow-field subsampling, and temporal flow vector prediction. The models and instructions to reproduce our results can be found at https://github.com/makinyilmaz/lhbdc/
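For illustration, a small self-contained example (not from the paper) of the hierarchical bi-directional coding order that lets every B-frame use one past and one future decoded reference.

```python
def hierarchical_order(gop_size=8):
    """Return (frame_index, past_ref, future_ref) tuples in coding order."""
    order = [(0, None, None), (gop_size, 0, None)]     # I-frame, then the far anchor
    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append((mid, lo, hi))                    # B-frame between two references
        split(lo, mid)
        split(mid, hi)
    split(0, gop_size)
    return order

for frame, past, future in hierarchical_order(8):
    print(f"code frame {frame:>2}  refs: past={past}, future={future}")
```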
In video compression, coding efficiency is improved by reusing pixels from previously decoded frames via motion and residual compensation. We define two levels of hierarchical redundancy in video frames: 1) first-order: redundancy in pixel space, i.e., the similarity of pixel values across neighboring frames, which is effectively captured by motion and residual compensation; 2) second-order: redundancy in the motion and residual maps themselves, caused by the smooth motion in natural videos. While most of the existing neural video coding literature addresses first-order redundancy, we tackle the problem of capturing second-order redundancy in neural video codecs via predictors. We introduce generic motion and residual predictors that learn to extrapolate from previously decoded data. These predictors are lightweight and can be employed with most neural video codecs to improve their rate-distortion performance. Moreover, while RGB is the dominant colorspace in the neural video coding literature, we introduce general modifications for neural video codecs to accommodate the YUV420 colorspace and report results for YUV420. Our experiments show that using our predictors with a well-known neural video codec yields 38% and 34% bitrate savings in the RGB and YUV420 colorspaces, respectively, measured on the UVG dataset.
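For illustration, a hedged sketch of the second-order idea: because motion in natural video is smooth, the current motion field can be extrapolated from previously decoded ones, and only the prediction error needs to be coded. The constant-velocity rule below is an assumption; the paper learns its predictors.

```python
import torch

def predict_motion(prev_flows):
    """Constant-velocity extrapolation from the last two decoded flow fields."""
    return prev_flows[-1] + (prev_flows[-1] - prev_flows[-2])

flow_t0 = torch.zeros(1, 2, 64, 64)                 # decoded flow at t-2
flow_t1 = flow_t0 + 0.5                             # decoded flow at t-1 (smooth motion)
flow_t2 = flow_t0 + 1.0                             # true flow at t
pred = predict_motion([flow_t0, flow_t1])
residual = flow_t2 - pred                           # near zero, hence cheap to encode
print(residual.abs().mean())                        # tensor(0.) for this toy example
```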
We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. The optimal parameters are sent to the receiver along with the latent code. By entropy coding the parameter updates under a suitable mixture model, we ensure that the network parameters can be encoded efficiently. This instance-adaptive compression algorithm is agnostic to the choice of base model and has the potential to improve any neural video codec. On the UVG, HEVC, and Xiph datasets, our codec improves the performance of a low-latency scale-space flow model by 21% to 26% BD-rate savings, and of a state-of-the-art B-frame model by 17 to 20% BD-rate savings. We also demonstrate that instance-adaptive finetuning improves robustness to domain shift. Finally, our approach reduces the capacity requirements of compression models. We show that it achieves state-of-the-art performance even after reducing the network size by 72%.
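For illustration, a hedged sketch of instance-adaptive finetuning. The ToyCodec interface, the L1 proxy for the model-update rate, and all hyperparameters are assumptions in the spirit of the abstract, not the paper's method.

```python
import copy
import torch

class ToyCodec(torch.nn.Module):
    """Stand-in codec exposing the assumed (rate, distortion) interface."""
    def __init__(self):
        super().__init__()
        self.enc = torch.nn.Conv2d(3, 8, 3, padding=1)
        self.dec = torch.nn.Conv2d(8, 3, 3, padding=1)
    def forward(self, frames):
        latent = self.enc(frames)
        rate = latent.abs().mean()                              # toy rate proxy
        distortion = torch.mean((self.dec(latent) - frames) ** 2)
        return rate, distortion

def finetune_on_sequence(codec, frames, steps=50, lr=1e-4, beta=1e-4):
    base = copy.deepcopy(codec).requires_grad_(False)           # theta_0, known to receiver
    opt = torch.optim.Adam(codec.parameters(), lr=lr)
    for _ in range(steps):
        rate, distortion = codec(frames)
        # Proxy for the model-update rate: small, sparse deltas are cheap to entropy-code.
        delta_cost = sum((p - p0).abs().sum()
                         for p, p0 in zip(codec.parameters(), base.parameters()))
        loss = distortion + rate + beta * delta_cost
        opt.zero_grad(); loss.backward(); opt.step()
    return codec

adapted = finetune_on_sequence(ToyCodec(), torch.rand(4, 3, 32, 32))
```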
Recent advances in deep learning (DL) based joint source-channel coding (DeepJSCC) have led to a new paradigm of semantic communications. Two salient features of DeepJSCC-based semantic communication are the exploitation of semantic-aware features directly from the source signal and the discrete-time analog transmission (DTAT) of these features. Compared with conventional digital communication, semantic communication with DeepJSCC provides superior reconstruction performance at the receiver and degrades gracefully as channel quality deteriorates, but it also exhibits a large peak-to-average power ratio (PAPR) in the transmitted signal. An open question is whether the gains of DeepJSCC come from the additional freedom brought by the high-PAPR continuous-amplitude signal. In this paper, we address this question by exploring three PAPR reduction techniques in the application of image transmission. We confirm that the superior image reconstruction performance of DeepJSCC-based semantic communication can be retained while the PAPR of the transmitted signal is suppressed to an acceptable level. This observation is an important step towards implementing DeepJSCC in practical semantic communication systems.
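For illustration, a short sketch of the quantity being reduced: the PAPR of a time-domain signal obtained by OFDM-modulating Gaussian-like DeepJSCC symbols. The textbook IFFT mapping here is an assumption rather than the paper's exact waveform.

```python
import numpy as np

def papr_db(x_time):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    power = np.abs(x_time) ** 2
    return 10 * np.log10(power.max() / power.mean())

rng = np.random.default_rng(0)
symbols = (rng.standard_normal(1024) + 1j * rng.standard_normal(1024)) / np.sqrt(2)
x_time = np.fft.ifft(symbols) * np.sqrt(len(symbols))   # unit-average-power OFDM symbol
print(f"PAPR = {papr_db(x_time):.1f} dB")                # Gaussian-like signal -> high PAPR
```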
Using deep autoencoders (DAEs) for end-to-end communication in multiple-input multiple-output (MIMO) systems is a novel concept with significant potential. DAE-aided MIMO has been shown to outperform singular-value decomposition (SVD) based precoded MIMO in terms of bit error rate (BER). This paper proposes embedding the left and right singular vectors of the channel matrix into the DAE encoder and decoder to further improve the performance of MIMO spatial multiplexing. The SVD-embedded DAE largely outperforms theoretic linear precoding in terms of BER. This is significant because it shows that the proposed DAEs can go beyond the limits of current system design by treating the communication system as a single end-to-end optimization block. Based on the simulation results, at SNR = 10 dB the proposed SVD-embedded design reduces the BER at least 10-fold compared with the DAE without SVD, and by up to 18-fold compared with theoretic linear precoding. We attribute this to the fact that the proposed DAE can match its input and output to an adaptive modulation structure with finite-alphabet inputs. We also observe that adding residual connections to the DAE further improves performance.
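For illustration, a small sketch (not the paper's architecture) of what SVD embedding buys: precoding with the right-singular vectors and combining with the conjugate transpose of the left-singular vectors turns the MIMO channel into parallel eigen-channels around the learned autoencoder.

```python
import numpy as np

rng = np.random.default_rng(1)
nt = nr = 4
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
U, s, Vh = np.linalg.svd(H)

x = (rng.standard_normal(nt) + 1j * rng.standard_normal(nt)) / np.sqrt(2)  # toy DAE encoder output
tx = Vh.conj().T @ x                     # SVD precoding with V at the transmitter
rx = H @ tx                              # MIMO channel (noise omitted)
y = U.conj().T @ rx                      # SVD combining with U^H at the receiver
# Up to noise, y equals diag(s) @ x: the DAE sees decoupled streams with gains s.
print(np.allclose(y, s * x))             # True
```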
Along with the rapid rise of semantics-empowered communication (SemCom) research, an unprecedented and growing interest is being witnessed across a wide range of aspects (e.g., theories, applications, metrics and implementations) in both academia and industry. In this work, we primarily aim to provide a comprehensive survey on both the background and research taxonomy, as well as a detailed technical tutorial. Specifically, we start by reviewing the literature and answering the "what" and "why" questions in semantic transmissions. Afterwards, we present corresponding ecosystems, including theories, metrics, datasets and toolkits, on top of which the taxonomy for research directions is presented. Furthermore, we propose to categorize the critical enabling techniques by explicit and implicit reasoning-based methods, and elaborate on how they evolve and contribute to modern content & channel semantics-empowered communications. Besides reviewing and summarizing the latest efforts in SemCom, we discuss the relations with other communication levels (e.g., reliable and goal-oriented communications) from a holistic and unified viewpoint. Subsequently, in order to facilitate future developments and industrial applications, we also highlight advanced practical techniques for boosting semantic accuracy, robustness, and large-scale scalability, just to mention a few. Finally, we discuss the technical challenges that shed light on future research opportunities.
We present a method for compressing full-resolution video sequences with implicit neural representations. Each frame is represented as a neural network that maps coordinate positions to pixel values. We use a separate implicit network to modulate the coordinate inputs, enabling efficient motion compensation between frames. Together with a small residual network, this allows us to efficiently compress P-frames relative to the previous frame. We further lower the bitrate by storing the network weights with learned integer quantization. Our method, which we call implicit pixel flow (IPF), offers several simplifications over established neural video codecs: it does not require the receiver to have access to a pretrained neural network, does not use expensive interpolation-based warping operations, and does not require a separate training dataset. We demonstrate the feasibility of neural implicit compression on image and video data.
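For illustration, a minimal sketch of an implicit frame representation: a small MLP maps pixel coordinates to RGB values, so compressing the frame amounts to storing (quantized) network weights. Layer sizes and the training loop are illustrative assumptions, not the IPF architecture.

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 3), nn.Sigmoid())

H, W = 64, 64
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)     # one (x, y) row per pixel
frame = torch.rand(H * W, 3)                              # toy target frame

opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(200):                                      # overfit this single frame
    loss = nn.functional.mse_loss(mlp(coords), frame)
    opt.zero_grad(); loss.backward(); opt.step()

reconstruction = mlp(coords).reshape(H, W, 3)             # decode by evaluating the MLP
```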
Learned video compression has recently emerged as an important research topic for developing advanced video compression technologies, in which motion compensation is considered one of the most challenging problems. In this paper, we propose a learned video compression framework with a heterogeneous deformable compensation strategy (HDCVC) to tackle the unstable compression performance caused by single-size deformable kernels in the downsampled feature domain. More specifically, instead of utilizing optical flow or single-size kernel deformable alignment, the proposed algorithm extracts features from the two adjacent frames to estimate content-adaptive heterogeneous deformable (HetDeform) kernel offsets. Then we transform the reference features with the HetDeform convolution to accomplish motion compensation. Furthermore, we design a spatial-neighborhood-conditioned divisive normalization (SNCDN) to achieve more effective data Gaussianization combined with generalized divisive normalization. In addition, we propose a multi-frame enhanced reconstruction module that exploits context and temporal information to improve quality. Experimental results indicate that HDCVC achieves superior performance over recent state-of-the-art learned video compression approaches.
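For illustration, a hedged sketch of deformable-convolution motion compensation in the feature domain, using torchvision's deform_conv2d. The toy offset estimator and random kernel stand in for HDCVC's HetDeform kernels and offset network, which are considerably more elaborate.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

C, H, W = 32, 16, 16
ref_feat = torch.randn(1, C, H, W)                    # features of the reference frame
cur_feat = torch.randn(1, C, H, W)                    # features of the current frame

offset_net = nn.Conv2d(2 * C, 2 * 3 * 3, kernel_size=3, padding=1)   # 18 offset channels
weight = torch.randn(C, C, 3, 3) * 0.01               # deformable conv kernel (toy init)

offsets = offset_net(torch.cat([ref_feat, cur_feat], dim=1))
aligned = deform_conv2d(ref_feat, offsets, weight, padding=1)         # motion-compensated features
print(aligned.shape)                                   # torch.Size([1, 32, 16, 16])
```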
The current optical communication systems minimize bit or symbol errors without considering the semantic meaning behind digital bits, thus transmitting a lot of unnecessary information. We propose and experimentally demonstrate a semantic optical fiber communication (SOFC) system. Instead of encoding information into bits for transmission, semantic information is extracted from the source using deep learning. The generated semantic symbols are then directly transmitted through an optical fiber. Compared with the bit-based structure, the SOFC system achieved higher information compression and a more stable performance, especially in the low received optical power regime, and enhanced the robustness against optical link impairments. This work introduces an intelligent optical communication system at the human analytical thinking level, which is a significant step toward a breakthrough in the current optical communication architecture.
Video coding technologies have been continuously improving, delivering higher compression ratios at higher resolutions. However, state-of-the-art video coding standards, such as H.265/HEVC and Versatile Video Coding, are still designed under the assumption that the compressed video will be viewed by humans. With the tremendous advances and maturation of deep neural networks in solving computer vision tasks, more and more video is analyzed directly by deep neural networks without any human involvement. Such a conventional design of video coding standards is not optimal when the compressed video is consumed by computer vision applications. While the human visual system is consistently sensitive to content with high contrast, the impact of pixels on computer vision algorithms is driven by the specific computer vision task. In this paper, we explore and summarize video coding for computer vision tasks and the emerging video coding standard, Video Coding for Machines.