Substitute-based recommendation is widely used in e-commerce to offer customers better alternatives. However, existing research typically relies on customer behavior signals such as co-view and view-but-purchase-another to capture the substitute relationship. Despite its intuitive soundness, we find that such an approach may ignore the functionality and characteristics of products. In this paper, we recast substitute recommendation as a language matching problem, taking product title descriptions as model input so that product functionality is taken into account. We design a new transformation method to de-noise the signals derived from production data. In addition, we consider multilingual support from an engineering point of view. Our proposed end-to-end transformer-based model succeeds in both offline and online experiments. It has been deployed on a large-scale e-commerce website for 11 marketplaces in 6 languages, and an online A/B experiment shows that it increases revenue by 19%.
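Deep learning has made great progress in object detection in images. The detection accuracy and computational cost of object detection depend on the spatial resolution of the image, which may be constrained by camera and storage considerations. Compression is often achieved by reducing the spatial resolution, the amplitude resolution, or sometimes both, each of which has well-known effects on performance. Detection accuracy also depends on the distance of the object of interest from the camera. Our work studies the impact of spatial and amplitude resolution, as well as object distance, on object detection accuracy and computational cost. We develop a resolution-adaptive variant of YOLOv5 (RA-YOLO), which adapts its feature pyramid and detection head to the spatial resolution of the input image. To train and evaluate this new method, we create a dataset of images with diverse spatial and amplitude resolutions by combining images from the TJU and Eurocity datasets and generating different resolutions through spatial resizing and compression. We first show that RA-YOLO achieves a good trade-off between detection accuracy and inference time across a range of spatial resolutions. We then use the proposed RA-YOLO model to evaluate the impact of spatial and amplitude resolution on object detection accuracy. We show that the optimal spatial resolution, i.e., the one yielding the highest detection accuracy, depends on the "tolerated" image size. We further evaluate the impact of object-to-camera distance on detection accuracy and show that higher spatial resolution enables a greater detection range. These results provide important guidelines for choosing image spatial resolution and compression settings in practical applications, based on available bandwidth, storage, required inference time, and/or required detection range.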
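Sketch-based 3D shape retrieval (SBSR) is an important yet challenging task that has attracted increasing attention in recent years. Existing methods address the problem in restricted settings that do not properly simulate real application scenarios. To mimic a realistic setting, this track adopts large-scale sketches drawn by amateurs with different levels of drawing skill, together with a variety of 3D shapes that include not only CAD models but also models scanned from real objects. We define two SBSR tasks and construct two benchmarks comprising more than 46,000 CAD models, 1,700 realistic models, and 145,000 sketches in total. Four teams participated in this track and submitted 15 runs for the two tasks, evaluated with 7 commonly used metrics. We hope that the benchmarks, the comparative results, and the open-source evaluation code will foster future research in the 3D object retrieval community.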
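Advanced wearable devices increasingly use high-resolution multi-camera systems. Because state-of-the-art neural networks for processing the resulting image data are computationally demanding, there is growing interest in using fifth-generation (5G) wireless connectivity and mobile edge computing to offload this processing to the cloud. To assess this possibility, this paper presents a detailed simulation and evaluation of 5G wireless offloading for object detection in a powerful new smart wearable called Vis4ion, designed for the blind and visually impaired (BVI). The current Vis4ion system is an instrumented book bag with high-resolution cameras, vision processing, and haptic and audio feedback. The paper considers uploading the camera data to a mobile edge cloud to perform real-time object detection and transmitting the detection results back to the wearable. To determine the video requirements, the paper evaluates the impact of video bit rate and resolution on object detection accuracy and range. The analysis uses a new street-scene dataset with labeled objects relevant to BVI navigation. The vision evaluation is combined with a detailed full-stack wireless network simulation to determine the distribution of throughput and delay, using real navigation paths and ray tracing from new high-resolution 3D models of an urban environment. For comparison, the wireless simulation considers both a standard 4G Long-Term Evolution (LTE) carrier and a high-rate 5G millimeter-wave (mmWave) carrier. The work thus provides a thorough and realistic assessment of edge computing with mmWave connectivity in an application with both high-bandwidth and low-latency requirements.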
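This paper studies the problem of learning causal structure from observational data. We reformulate the structural equation model (SEM) in a form parameterized by a binary graph adjacency matrix, and show that, if the original SEM is identifiable, then under mild conditions the binary adjacency matrix is identifiable up to a super-graph of the true causal graph. We then use the reformulated SEM to develop a causal structure learning method that can be trained efficiently with gradient-based optimization, by leveraging a smooth characterization of acyclicity and the Gumbel-Softmax approach to approximate the binary adjacency matrix. The learned entries are found to lie mostly near zero or one and can easily be thresholded to identify edges. We conduct experiments on synthetic and real datasets to validate the effectiveness of the proposed method, and show that it readily accommodates different smooth model functions and achieves substantially improved performance on most of the datasets considered.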
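The abstract names two ingredients without giving equations: a Gumbel-Softmax relaxation of the binary adjacency matrix and a smooth acyclicity characterization. The following is a minimal sketch of both, not the authors' implementation. It assumes each off-diagonal entry is a two-class Gumbel-Softmax between "edge" (learned logit) and "no edge" (logit 0). For the acyclicity term it assumes the NOTEARS-style function h(A) = tr(exp(A∘A)) - d, which is zero exactly for acyclic graphs. All function names are hypothetical.

```python
import math
import random


def _gumbel(rng):
    """Draw a standard Gumbel sample, avoiding log(0)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gumbel_softmax_edges(logits, tau=0.5, rng=random):
    """Relaxed binary adjacency (assumed form): two-class Gumbel-Softmax per
    entry, edge logit vs. a fixed 0 logit; self-loops are forced to 0."""
    d = len(logits)
    adj = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if i != j:
                # softmax([l + g1, 0 + g0] / tau)[0] == sigmoid((l + g1 - g0) / tau)
                adj[i][j] = _sigmoid((logits[i][j] + _gumbel(rng) - _gumbel(rng)) / tau)
    return adj


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def acyclicity(adj, terms=20):
    """Assumed NOTEARS-style penalty h(A) = tr(exp(A o A)) - d via a truncated
    Taylor series; it equals 0 exactly when the weighted graph has no cycles."""
    d = len(adj)
    sq = [[x * x for x in row] for row in adj]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)  # k = 0 term: tr(I) = d
    fact = 1.0
    for k in range(1, terms):
        power = _matmul(power, sq)
        fact *= k
        trace += sum(power[i][i] for i in range(d)) / fact
    return trace - d


def threshold(adj, cut=0.5):
    """Read off edges once the relaxed entries have settled near 0 or 1."""
    d = len(adj)
    return [(i, j) for i in range(d) for j in range(d) if adj[i][j] > cut]
```

In training, the logits would be optimized by gradient descent with h(A) added as a penalty or constraint. The sketch above shows only the forward computations, and it is this thresholding step that the abstract says is easy once entries sit near zero or one.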
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of performing complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes apart those from other clusters, by maximizing the cross-view cosine similarity of positive pairs and minimizing that of negative pairs. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
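The abstract describes the objective only qualitatively. The following is a hedged illustration of a cluster-guided cross-view cosine objective, not CCGC's actual loss. It rests on three assumptions of ours: each view's embeddings are averaged per cluster over high-confidence nodes only, the two views' centers of the same cluster form a positive pair, and centers of different clusters form negative pairs. The function names and the confidence threshold `tau` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def high_conf_centers(z, labels, keep, k):
    """Per-cluster mean embedding using only nodes flagged as high-confidence.
    Clusters with no such node are dropped (in the same order for both views)."""
    dim = len(z[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for vec, lab, ok in zip(z, labels, keep):
        if ok:
            counts[lab] += 1
            for t in range(dim):
                sums[lab][t] += vec[t]
    return [[s / counts[c] for s in sums[c]] for c in range(k) if counts[c] > 0]


def cluster_guided_loss(z1, z2, labels, confidence, k, tau=0.9):
    """Assumed objective: mean cross-view cosine over negative (different-cluster)
    center pairs minus mean cross-view cosine over positive (same-cluster) pairs."""
    keep = [c >= tau for c in confidence]
    c1 = high_conf_centers(z1, labels, keep, k)
    c2 = high_conf_centers(z2, labels, keep, k)
    m = len(c1)
    pos = sum(cosine(c1[i], c2[i]) for i in range(m)) / m
    neg_pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
    neg = (sum(cosine(c1[i], c2[j]) for i, j in neg_pairs) / len(neg_pairs)) if neg_pairs else 0.0
    return neg - pos
```

With two identical views whose clusters lie on orthogonal directions, the loss reaches its minimum of -1. Low-confidence nodes do not move the centers, which is the intended effect of filtering by clustering confidence.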
As one of the prevalent approaches to building automation systems, Imitation Learning (IL) has shown promising performance in a wide range of domains. However, despite considerable improvements in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain overall policy performance with respect to the frames in the demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conduct experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models relate to one another. The results show that R2RISE successfully distinguishes important frames in the demonstrations.
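The importance map described above follows the RISE pattern. Sample random binary masks over demonstration frames, score each mask by the environment return of a model retrained on the kept frames, and accumulate return-weighted masks. The minimal sketch below assumes a hypothetical callable `train_and_evaluate(mask)` that stands in for retraining the black-box IL model and returning its environment return. The normalization by `num_masks * keep_prob` follows the original RISE saliency method and may differ from R2RISE's.

```python
import random


def r2rise_importance(num_frames, train_and_evaluate, num_masks=200, keep_prob=0.5, seed=0):
    """RISE-style frame importance (sketch): importance[f] is the average
    environment return of retrained models, weighted by whether frame f was kept."""
    rng = random.Random(seed)
    importance = [0.0] * num_frames
    for _ in range(num_masks):
        # 1 = frame kept in the masked demonstration, 0 = frame removed
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = train_and_evaluate(mask)  # hypothetical: retrain IL model, return env return
        for f in range(num_frames):
            importance[f] += ret * mask[f]
    norm = num_masks * keep_prob  # expected number of masks that keep any given frame
    return [v / norm for v in importance]


# Toy stand-in: the policy only succeeds when frame 2 is present in the demonstrations.
def toy_train_and_evaluate(mask):
    return 1.0 if mask[2] else 0.0
```

With the toy evaluator, frame 2 scores about 1.0 and every other frame about 0.5, since other frames are credited only when they happen to co-occur with frame 2. In practice, each call to `train_and_evaluate` is a full retraining run, which is what makes the method expensive but model-agnostic.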
Research interest increasingly focuses on sequential recommender systems, which aim to model dynamic sequence representations precisely. However, the loss functions most commonly used in state-of-the-art sequential recommendation models have essential limitations. For example, Bayesian Personalized Ranking (BPR) loss suffers from the vanishing gradient problem caused by numerous negative samples and prediction biases; Binary Cross-Entropy (BCE) loss depends on the number of negative samples, so it is likely to ignore valuable negative examples and reduce training efficiency; and Cross-Entropy (CE) loss focuses only on the last timestamp of the training sequence, which makes poor use of sequence information and yields inferior user sequence representations. To avoid these limitations, in this paper we propose to compute a Cumulative Cross-Entropy (CCE) loss over the whole sequence. CCE is simple and direct, and it enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that applying CCE loss to three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements in full-ranking NDCG@5 of 125.63%, 69.90%, and 33.24%, respectively. With CCE, the models' performance curves on the test data rise rapidly with wall-clock time and stay above those of other loss functions for almost the entire training process.
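The difference between last-step CE and CCE comes down to which timestamps contribute to the loss. In both cases, a full softmax over all items is used, so no negative sampling is needed. The minimal sketch below assumes that CCE sums the per-step full-softmax cross-entropy over every (non-padded) position of the sequence. The function names and the optional `valid` padding mask are hypothetical, and the paper's exact reduction (sum vs. mean) is not stated in the abstract.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over the full item vocabulary."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def last_step_ce(step_scores, targets):
    """Conventional CE: only the final timestamp's prediction is supervised."""
    return -log_softmax(step_scores[-1])[targets[-1]]


def cumulative_ce(step_scores, targets, valid=None):
    """Assumed CCE form: CE accumulated over every timestamp of the sequence.
    `valid` optionally masks out padded positions."""
    if valid is None:
        valid = [True] * len(targets)
    return -sum(
        log_softmax(scores)[y]
        for scores, y, ok in zip(step_scores, targets, valid)
        if ok
    )


# Three timestamps, uniform scores over four items.
scores = [[0.0] * 4 for _ in range(3)]
targets = [1, 2, 3]
```

For the uniform example, last-step CE is log 4 while CCE is 3·log 4, so every position in the sequence now supplies a training signal. This is the "utilization of sequence information" argument in the abstract.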
Face anti-spoofing (FAS) is essential for securing face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured in 40 surveillance scenes. It contains 101 subjects from different age groups, with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network that alleviates the performance degradation caused by image quality in three ways: (1) an Image Quality Variable (IQV) module is introduced to recover discrimination-related image information by incorporating a super-resolution network; (2) generated sample pairs are used to simulate quality variance distributions, helping the contrastive learning strategy obtain robust feature representations under quality variation; and (3) a Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, extensive experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
Unsupervised domain adaptation (UDA) via deep learning has attracted considerable attention for tackling domain-shift problems caused by distribution discrepancies across domains. Existing UDA approaches depend heavily on access to source domain data, which is often limited in practice due to privacy protection, data storage and transmission costs, and computational burden. To tackle this issue, many source-free unsupervised domain adaptation (SFUDA) methods have recently been proposed, which transfer knowledge from a pre-trained source model to an unlabeled target domain without access to the source data. A comprehensive review of these SFUDA works is therefore of great significance. In this paper, we provide a timely and systematic literature review of existing SFUDA approaches from a technical perspective. Specifically, we categorize current SFUDA studies into two groups, white-box SFUDA and black-box SFUDA, and further divide them into finer subcategories based on the learning strategies they use. We also investigate the challenges of the methods in each subcategory, discuss the advantages and disadvantages of white-box and black-box SFUDA methods, list the commonly used benchmark datasets, and summarize popular techniques for improving the generalizability of models learned without source data. Finally, we discuss several promising future directions in this field.