It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the test performance of the original dense models, but sometimes even slightly boost generalization performance. Theoretical understanding of such experimental observations is yet to be developed. This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. Specifically, this work considers a classification task for overparameterized two-layer neural networks, where the network is randomly pruned at different rates at initialization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance. More surprisingly, the generalization bound gets better as the pruning fraction gets larger. To complement this positive result, this work further shows a negative result: there exists a large pruning fraction such that while gradient descent is still able to drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing. This further suggests that pruning can change the feature learning process, which leads to the performance drop of the pruned neural network. To the best of our knowledge, this is the \textbf{first} generalization result for pruned neural networks, suggesting that pruning can improve the neural network's generalization.
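A minimal sketch of the pruning-at-initialization setup the abstract describes: each weight of a randomly initialized layer is independently zeroed out with probability `p` (the pruning fraction), and the resulting mask is kept fixed during training. The function and variable names are illustrative, not the paper's code.

```python
import numpy as np

def prune_at_init(W, p, rng):
    """Randomly zero out a fraction p of the weights before training."""
    mask = rng.random(W.shape) >= p   # keep each weight with prob. 1 - p
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.standard_normal((500, 100))          # first-layer weights, toy sizes
W_pruned, mask = prune_at_init(W, p=0.3, rng=rng)
sparsity = 1.0 - mask.mean()                 # empirical pruning fraction, ~0.3
```

Training then proceeds on `W_pruned` with gradients masked by `mask`, so pruned weights stay zero.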
Temporal action localization aims to predict the boundaries and categories of each action instance in untrimmed long videos. Most previous anchor- or proposal-based methods ignore the global-local context interaction across the entire video sequence. Moreover, their multi-stage designs cannot generate action boundaries and categories directly. To address the above problems, this paper proposes a novel end-to-end model, termed Adaptive Perception Transformer (AdaPerformer for short). Specifically, AdaPerformer explores a dual-branch multi-head self-attention mechanism. One branch takes care of global perception attention, which models the entire video sequence and aggregates globally relevant context. The other branch concentrates on local convolutional shift to aggregate intra-frame and inter-frame information through our bidirectional shift operation. The end-to-end nature produces the boundaries and categories of video actions without extra steps. Extensive experiments together with ablation studies are provided to reveal the effectiveness of our design. Our method achieves state-of-the-art accuracy on the THUMOS14 dataset (in terms of mAP@0.5, 42.6% mAP@0.7, and 62.7% mAP@avg) and obtains competitive performance on the ActivityNet-1.3 dataset with an average mAP of 36.1%. Code and models are available at https://github.com/soupero/adaperformer.
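A toy sketch of what a bidirectional shift over frame features looks like: half the channels are shifted one step forward in time and half backward, so each frame mixes intra- and inter-frame information. This is a generic stand-in under assumed shapes, not AdaPerformer's actual operator.

```python
import numpy as np

def bidirectional_shift(x):
    """Shift half the channels one step forward in time, half backward.

    x: (n_frames, n_channels) feature array."""
    out = np.zeros_like(x)
    c = x.shape[1] // 2
    out[1:, :c] = x[:-1, :c]    # forward shift: frame t sees frame t-1
    out[:-1, c:] = x[1:, c:]    # backward shift: frame t sees frame t+1
    return out

feats = np.arange(8.0).reshape(4, 2)   # 4 frames, 2 channels
shifted = bidirectional_shift(feats)
```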
Video object detection (VID) is challenging because of the high variation of object appearance and the diverse deterioration in some frames. On the positive side, compared with still images, detection in a certain frame of a video can draw support from other frames. Hence, how to aggregate features across different frames is crucial to the VID problem. Most existing aggregation algorithms are customized for two-stage detectors. However, detectors in this category are usually computationally expensive due to their two-stage nature. This work proposes a simple yet effective strategy to address the above concerns, which incurs only marginal overhead in exchange for a considerable gain in accuracy. Concretely, different from the traditional two-stage pipeline, we advocate placing the region-level selection after the one-stage detection, so as to avoid processing massive low-quality candidates. Besides, a novel module is constructed to evaluate the relationship between a target frame and its reference frames and to guide the aggregation. Extensive experiments and ablation studies are conducted to verify the efficacy of our design and reveal its superiority over other state-of-the-art VID approaches. Our YOLOX-based model can achieve promising performance (e.g., 87.5% AP50 at 30 FPS on the ImageNet VID dataset on a single 2080Ti GPU), making it attractive for large-scale or real-time applications. The implementation is simple; the demo code and models have been made available at https://github.com/yuhengsss/yolov.
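The core idea of placing region-level selection after one-stage detection can be illustrated by a simple confidence-based top-k filter over a frame's predictions, applied before any cross-frame aggregation. The function below is a hypothetical simplification, not the paper's implementation.

```python
import numpy as np

def select_regions(scores, boxes, top_k=3):
    """Keep only the top_k highest-confidence predictions of one frame,
    so cross-frame aggregation never touches low-quality candidates."""
    order = np.argsort(scores)[::-1][:top_k]
    return scores[order], boxes[order]

scores = np.array([0.1, 0.9, 0.4, 0.8, 0.2])
boxes = np.arange(20.0).reshape(5, 4)      # dummy (x1, y1, x2, y2) boxes
top_scores, top_boxes = select_regions(scores, boxes)
```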
Designing and generating new data under targeted properties has been attracting various critical applications, such as molecular design, image editing, and speech synthesis. Traditional hand-crafted approaches rely heavily on expert experience and intensive human effort, yet still suffer from insufficient scientific knowledge and low throughput to support effective and efficient data generation. Recently, the advancement of deep learning has given rise to expressive methods that can learn the underlying representations and properties of data. Such capability provides new opportunities for figuring out the mutual relationship between the structural patterns and functional properties of data, and for leveraging such a relationship to generate structural data with desired properties. This article provides a systematic review of this promising research area, commonly known as controllable deep data generation. First, potential challenges are raised and preliminaries are provided. Then controllable deep data generation is formally defined, a taxonomy of various techniques is proposed, and the evaluation metrics in this specific domain are summarized. After that, exciting applications of controllable deep data generation are introduced, and existing works are experimentally analyzed and compared. Finally, promising future directions of controllable deep data generation are highlighted and five potential challenges are identified.
In this work, we propose PolarBEV for vision-based uneven BEV representation learning. To adapt to the foreshortening effect of camera imaging, we rasterize the BEV space both angularly and radially, and introduce a polar embedding decomposition to model the associations among the polar grids. The polar grids are rearranged into an array-like regular representation for efficient processing. Besides, to determine the 2D-to-3D correspondence, we iteratively update the BEV surface based on a hypothetical plane and adopt height-based feature transformation. PolarBEV keeps real-time inference speed on a single 2080Ti GPU, and outperforms other methods on both BEV semantic segmentation and BEV instance segmentation. Thorough ablations are presented to validate the design. The code will be released at \url{https://github.com/superz-liu/polarbev}.
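The angular-and-radial rasterization can be pictured as binning BEV points into a polar grid of rings and wedges instead of a Cartesian grid. The sketch below is a generic illustration under assumed grid parameters, not PolarBEV's code.

```python
import numpy as np

def polar_rasterize(points, n_rings=4, n_wedges=8, max_radius=50.0):
    """Map BEV points (x, y) to (ring, wedge) indices of a polar grid."""
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x, y)                                  # radial distance
    theta = np.mod(np.arctan2(y, x), 2.0 * np.pi)       # angle in [0, 2*pi)
    ring = np.minimum((r / max_radius * n_rings).astype(int), n_rings - 1)
    wedge = np.minimum((theta / (2.0 * np.pi) * n_wedges).astype(int),
                       n_wedges - 1)
    return ring, wedge

pts = np.array([[1.0, 0.0], [0.0, 30.0], [-40.0, 0.0]])
rings, wedges = polar_rasterize(pts)
```

Because each wedge covers a fixed angle, nearby regions get fine spatial resolution while distant regions are coarser, matching how cameras sample the scene.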
Automatic product description generation for e-commerce has witnessed significant advancement in the past decade. Product copywriting aims to attract users' interest and improve user experience by highlighting product characteristics with textual descriptions. As the services provided by e-commerce platforms become diverse, it is necessary to dynamically adapt the patterns of automatically generated descriptions. In this paper, we report our experience in deploying an E-commerce Prefix-based Controllable Copywriting Generation (EPCCG) system into the JD.com e-commerce product recommendation platform. The development of the system consists of four main components: 1) copywriting aspect extraction; 2) weakly supervised aspect labeling; 3) text generation with a prefix-based language model; and 4) copywriting quality control. We conduct experiments to validate the effectiveness of the proposed EPCCG. In addition, we introduce the deployed architecture that cooperates with EPCCG in the real-time JD.com e-commerce recommendation platform, as well as the significant payoff since deployment.
Images captured in low-light environments often suffer from complex degradation. Simply adjusting the light inevitably leads to the burst of hidden noise and color distortion. To seek results with satisfactory lighting, cleanliness, and realism from degraded inputs, this paper presents a novel framework inspired by the divide-and-rule principle, which greatly alleviates the degradation entanglement. Assuming that an image can be decomposed into texture (with possible noise) and color components, one can specifically execute noise removal and color correction along with light adjustment. To this end, we propose to convert an image from the RGB space into a luminance-chrominance one. An adjustable noise suppression network is designed to eliminate the noise in the brightened luminance, with an estimated illumination map indicating the noise boosting level. The enhanced luminance further serves as guidance for the chrominance mapper to generate realistic colors. Extensive experiments are conducted to reveal the effectiveness of our design, and to demonstrate its superiority over state-of-the-art alternatives, both quantitatively and qualitatively, on several benchmark datasets. Our code is publicly available at https://github.com/mingcv/bread.
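The RGB-to-luminance-chrominance conversion that underpins the divide-and-rule decomposition can be sketched with the standard BT.601 luma weights; the paper's actual transform may differ, so treat this as an assumed illustration.

```python
import numpy as np

def rgb_to_luma_chroma(img):
    """Split an RGB image (values in [0, 1]) into luminance Y and
    chrominance (Cb, Cr), using BT.601 luma coefficients."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b       # luminance carries texture/noise
    cb = 0.5 * (b - y) / (1.0 - 0.114)          # blue-difference chroma
    cr = 0.5 * (r - y) / (1.0 - 0.299)          # red-difference chroma
    return y, cb, cr

gray = np.full((2, 2, 3), 0.5)                  # neutral gray patch
y, cb, cr = rgb_to_luma_chroma(gray)
```

Denoising and light adjustment then act on `y`, while color correction acts on `(cb, cr)`, so the two kinds of degradation are handled separately.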
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicates distortion patterns across different scales and aggravates the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression task, following the easy-to-hard law of the human learning process. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the comparison approaches, and that both the MS and PMT modules improve the model's performance.
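One common way to realize an easy-to-hard progression is to ramp the loss weight from an easier auxiliary task toward the harder regression task over training. The schedule below is a generic sketch of that idea, not the weighting PMT-IQA actually uses.

```python
def pmt_weights(step, total_steps):
    """Linearly shift loss weight from the easier task to the harder one."""
    lam = min(1.0, step / float(max(1, total_steps)))
    return {"easy": 1.0 - lam, "hard": lam}

# Early training emphasizes the easy task; late training the hard one.
schedule = [pmt_weights(s, 10) for s in (0, 5, 10)]
```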
Time-series anomaly detection is an important task and has been widely applied in industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels at low cost: it enables customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions, as time-series data is numerically continuous and difficult to understand. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection through only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically from only a few labeled examples. All of these techniques are complementary and can promote one another in a mutually reinforcing manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to demonstrate its practicality.
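A labeling function in this weak-supervision setting is a heuristic rule that votes "anomaly", "normal", or abstains on each point. The z-score rule below is a hypothetical example of the kind of function the system would generate automatically; the names and thresholds are assumptions.

```python
import numpy as np

ABSTAIN, NORMAL, ANOMALY = -1, 0, 1

def lf_spike(series, z_thresh=3.0):
    """Heuristic rule: points far from the series mean are anomalies,
    points close to it are normal, everything in between abstains."""
    z = (series - series.mean()) / (series.std() + 1e-8)
    labels = np.full(series.shape, ABSTAIN)
    labels[np.abs(z) < 1.0] = NORMAL
    labels[np.abs(z) > z_thresh] = ANOMALY
    return labels

series = np.zeros(101)
series[50] = 100.0                 # one obvious spike
labels = lf_spike(series)
```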
As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) with multiple modalities such as images. However, current MMEA algorithms all adopt KG-level modality fusion strategies and ignore modality differences among individual entities, which hurts robustness to potential noise in the modalities (e.g., unidentifiable images and relations). In this paper, we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, which dynamically predicts the mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed to address vague entity details. Extensive experimental results show that our model not only achieves SOTA performance in multiple training scenarios, including supervised, unsupervised, iterative, and low-resource settings, but also has limited parameters, optimistic speed, and good interpretability. Our code will be available soon.
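Instance-level fusion with predicted correlation coefficients can be sketched as a per-entity softmax over modality logits, producing a weighted sum of that entity's modality features. This is a generic illustration under assumed shapes, not MEAformer's transformer module.

```python
import numpy as np

def fuse_modalities(feats, logits):
    """Weight each entity's modality features by softmax'd correlation logits.

    feats:  (n_entities, n_modalities, dim) per-modality features
    logits: (n_entities, n_modalities) predicted correlation scores"""
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)        # per-entity modality weights
    return (w[..., None] * feats).sum(axis=1)    # (n_entities, dim)

feats = np.array([[[1.0, 0.0], [3.0, 2.0]]])     # 1 entity, 2 modalities, dim 2
fused = fuse_modalities(feats, np.zeros((1, 2)))  # equal logits -> plain mean
```

Because the weights are computed per entity, a noisy modality (e.g., an unidentifiable image) can be down-weighted for that entity alone rather than for the whole KG.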