Machine Translation Quality Estimation (QE) is the task of evaluating translation output in the absence of human-written references. Due to the scarcity of human-labeled QE data, previous works attempted to utilize the abundant unlabeled parallel corpora to produce additional training data with pseudo labels. In this paper, we demonstrate a significant gap between parallel data and real QE data: in QE data, the source side is strictly guaranteed to be original text and the target side to be translated text (namely translationese). Parallel data, however, is indiscriminate, and translationese may occur on either the source or the target side. We compare the impact of parallel data with different translation directions in QE data augmentation, and find that using the source-original part of the parallel corpus consistently outperforms its target-original counterpart. Moreover, since the WMT corpus lacks direction information for each parallel sentence, we train a classifier to distinguish source- and target-original bitext, and carry out an analysis of their differences in both style and domain. Together, these findings suggest using source-original parallel data for QE data augmentation, which brings a relative improvement of up to 4.0% and 6.4% compared to undifferentiated data on sentence- and word-level QE tasks, respectively.
Vision Transformer (ViT) has emerged as a competitive alternative to convolutional neural networks for various computer vision applications. Specifically, ViT multi-head attention layers make it possible to embed information globally across the overall image. Nevertheless, computing and storing such attention matrices incurs a cost that is quadratic in the number of patches, limiting the achievable efficiency and scalability and prohibiting more extensive real-world ViT applications on resource-constrained devices. Sparse attention has been shown to be a promising direction for improving hardware acceleration efficiency for NLP models. However, a systematic counterpart approach is still missing for accelerating ViT models. To close the above gap, we propose a first-of-its-kind algorithm-hardware codesigned framework, dubbed ViTALiTy, for boosting the inference efficiency of ViTs. Unlike sparsity-based Transformer accelerators for NLP, ViTALiTy unifies both low-rank and sparse components of the attention in ViTs. At the algorithm level, we approximate the dot-product softmax operation via first-order Taylor attention with row-mean centering as the low-rank component to linearize the cost of attention blocks, and further boost the accuracy by incorporating a sparsity-based regularization. At the hardware level, we develop a dedicated accelerator to better leverage the resulting workload and pipeline from ViTALiTy's linear Taylor attention, which requires the execution of only the low-rank component, to further boost the hardware efficiency. Extensive experiments and ablation studies validate that ViTALiTy offers boosted end-to-end efficiency (e.g., $3\times$ faster and $3\times$ more energy-efficient) under comparable accuracy, with respect to the state-of-the-art solution.
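To make the low-rank component concrete, the following is a minimal sketch of first-order Taylor attention with mean-centered keys, written in plain Python for a single head. It is an illustration under stated assumptions, not the ViTALiTy implementation: the function names, the single-head layout, and the omission of the sparse component are all choices made here. The sketch relies on two facts: softmax is unchanged when the same vector is subtracted from every key, and after replacing $\exp(x)$ with $1 + x$ the normalizer reduces to the number of tokens $n$, because the centered scores in each row sum to zero.

```python
import math

def softmax_attention(Q, K, V):
    """Reference softmax attention; cost grows quadratically with the token count."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z
                    for c in range(len(V[0]))])
    return out

def taylor_attention(Q, K, V):
    """First-order Taylor approximation with mean-centered keys (illustrative sketch).

    exp(x) is replaced by 1 + x, so the output for query q is
    (sum_j v_j + q @ (Kc^T V) / sqrt(d)) / n, which never forms the n x n matrix.
    """
    n, d, dv = len(K), len(K[0]), len(V[0])
    k_mean = [sum(k[i] for k in K) / n for i in range(d)]
    Kc = [[k[i] - k_mean[i] for i in range(d)] for k in K]  # centered keys
    # (d x dv) summary shared by all queries
    KV = [[sum(Kc[j][i] * V[j][c] for j in range(n)) for c in range(dv)]
          for i in range(d)]
    v_sum = [sum(v[c] for v in V) for c in range(dv)]
    scale = 1.0 / math.sqrt(d)
    return [[(v_sum[c] + scale * sum(q[i] * KV[i][c] for i in range(d))) / n
             for c in range(dv)]
            for q in Q]
```

The approximation is only close to softmax attention when the centered scores are small; for larger scores the two outputs diverge, which is consistent with the abstract's use of an additional sparsity-based regularization to recover accuracy.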
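Recent studies have validated the association between cardiovascular disease (CVD) risk and retinal fundus images. Combining deep learning (DL) with portable fundus cameras will enable CVD risk estimation in a variety of scenarios and advance the democratization of healthcare. However, significant issues remain to be solved. One of the top priorities is the camera discrepancy between the databases used as research material and the samples encountered in production environments. Most high-quality retinal image databases prepared for research are collected with high-end fundus cameras, and there is a significant domain discrepancy between different cameras. To fully explore the domain discrepancy issue, we first collect a paired dataset (FCP) containing pairs of fundus images of the same patients captured by a high-end TopCon retinal camera and a low-end Mediwork portable fundus camera. We then propose a cross-laterality feature alignment pre-training scheme and a self-attention camera adapter module to improve model robustness. The cross-laterality feature alignment training encourages the model to learn common knowledge from the left and right fundus images of the same patient and improves model generalization. Meanwhile, the device adaptation module learns a feature transformation from the target domain to the source domain. We conduct comprehensive experiments on both the UK Biobank database and our FCP data. The experimental results show that our proposed method improves both the CVD risk regression accuracy and the consistency of results across the two cameras. The code is available here: \url{https://github.com/linzhlalala/cvd-risk-lasike-base--on-retinal-fundus-images-images}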
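Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., imperceptible perturbations to the input can mislead DNNs trained on clean images into making wrong predictions. To tackle this, adversarial training is currently the most effective defense method; it augments the training set with adversarial samples generated on the fly. Interestingly, we discover for the first time that subnetworks with inborn robustness exist within randomly initialized networks without any model training, matching or surpassing the robust accuracy of adversarially trained networks, which indicates that adversarial training on model weights is not indispensable for adversarial robustness. We name such subnetworks Robust Scratch Tickets (RSTs); they are also efficient by nature. Unlike in the popular lottery ticket hypothesis, neither the original dense network nor the resulting subnetwork needs to be trained. To validate and understand this fascinating finding, we further conduct extensive experiments to study the existence and properties of RSTs under different models, datasets, sparsity patterns, and attacks, drawing insights into the relationship between DNNs' robustness and their initialization/overparameterization. Furthermore, we identify poor adversarial transferability between RSTs of different sparsity ratios drawn from the same randomly initialized dense network, and propose a technique that randomly switches between different RSTs as a novel defense method built on top of RSTs. We believe our findings about RSTs open up a new perspective for studying model robustness and extend the lottery ticket hypothesis.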
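In this paper, we study the importance of pruning in deep networks (DNs) and the yin and yang relationship between (1) pruning highly overparameterized DNs that have been trained from random initialization and (2) training small DNs that have been "cleverly" initialized. Since in most cases practitioners can only resort to random initialization, there is a strong need to develop a grounded understanding of DN pruning. The current literature remains largely empirical and lacks a theoretical understanding of how pruning affects DNs' decision boundaries, how to interpret pruning, and how to design corresponding principled pruning techniques. To address these questions, we propose to employ recent advances in the theoretical analysis of continuous piecewise affine (CPA) DNs. From this perspective, we are able to detect the early-bird (EB) ticket phenomenon, provide interpretability for current pruning techniques, and develop a principled pruning strategy. At each step of our study, we conduct extensive experiments to support our claims and results; although our main goal is to enhance the current understanding of DN pruning rather than to develop a new pruning method, our spline pruning criteria are on par with or even outperform state-of-the-art pruning methods in both layerwise and global pruning.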
Existing federated classification algorithms typically assume that the local annotations at every client cover the same set of classes. In this paper, we aim to lift this assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients share the common goal of building a global classification model that identifies the union of these classes. Such heterogeneity in client class sets poses a new challenge: how can we ensure that different clients operate in the same latent space so as to avoid drift after aggregation? We observe that the classes can be described in natural language (i.e., class names) and that these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
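The matching formulation can be sketched as follows: classification picks the class whose representation is most similar to the data representation, and a sample is pseudo-labeled with a class the client has no annotations for only when the similarity is high enough. This is a hedged illustration, not the paper's method: the vectors stand in for the outputs of the data encoder and label encoder, cosine similarity is an assumed choice of matching score, and `match_class`, `pseudo_label`, and the `threshold` parameter are hypothetical names introduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_class(data_vec, class_vecs):
    """Return the best-matching class name and the score of every class."""
    scores = {name: cosine(data_vec, vec) for name, vec in class_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores

def pseudo_label(data_vec, class_vecs, local_classes, threshold=0.8):
    """Annotate a sample with a locally-unaware class if it matches well enough.

    local_classes: class names this client already has annotations for.
    threshold: hypothetical similarity cutoff, an assumption of this sketch.
    """
    unaware = {n: v for n, v in class_vecs.items() if n not in local_classes}
    if not unaware:
        return None
    best, scores = match_class(data_vec, unaware)
    return best if scores[best] >= threshold else None
```

For example, a client that only annotates "cat" and "dog" could call `pseudo_label(x, class_vecs, {"cat", "dog"})` to obtain a label for a class such as "bird" that only other clients annotate, using class representations shared by all parties.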
Existing measures and representations for trajectories have two longstanding fundamental shortcomings: they are computationally expensive, and they cannot guarantee the "uniqueness" property of a distance function, i.e., $\mathrm{dist}(X,Y) = 0$ if and only if $X = Y$, where $X$ and $Y$ are two trajectories. This paper proposes a simple yet powerful way to represent trajectories and measure the similarity between two trajectories using a distributional kernel to address these shortcomings. It is a principled approach based on kernel mean embedding, which has a strong theoretical underpinning. It has three distinctive features in comparison with existing approaches. (1) A distributional kernel is used for the very first time for trajectory representation and similarity measurement. (2) It does not rely on point-to-point distances, which are used in most existing distances for trajectories. (3) It requires no learning, unlike existing learning and deep learning approaches. We show the generality of this new approach in three applications: (a) trajectory anomaly detection, (b) anomalous sub-trajectory detection, and (c) trajectory pattern mining. We identify that the distributional kernel has (i) a unique data-dependent property and the above uniqueness property, which are the key factors that lead to its superior task-specific performance; and (ii) a runtime that is orders of magnitude faster than existing distance measures.
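As a rough illustration of the kernel mean embedding idea, the sketch below treats each trajectory as a set of points, compares two trajectories by the average pairwise kernel value, and derives a distance from the embeddings. It makes several assumptions that are not from the abstract: a Gaussian point kernel with a fixed `bandwidth` is swapped in for the paper's data-dependent kernel, point order is ignored, and the function names are hypothetical. This naive form evaluates every point pair, so it does not reflect the runtime the abstract reports.

```python
import math

def gaussian_kernel(p, q, bandwidth=1.0):
    """Gaussian kernel between two points (an assumed stand-in kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def trajectory_similarity(X, Y, bandwidth=1.0):
    """Inner product of the kernel mean embeddings of point sets X and Y."""
    total = sum(gaussian_kernel(p, q, bandwidth) for p in X for q in Y)
    return total / (len(X) * len(Y))

def trajectory_distance(X, Y, bandwidth=1.0):
    """Distance between the two mean embeddings in the kernel's feature space."""
    sq = (trajectory_similarity(X, X, bandwidth)
          - 2.0 * trajectory_similarity(X, Y, bandwidth)
          + trajectory_similarity(Y, Y, bandwidth))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors
```

With this construction the distance between a trajectory and itself is zero, and two trajectories with different point distributions receive a positive distance, which is the behavior the uniqueness property asks for at the level of point distributions.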
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, \textit{\underline{K}nowledge-\underline{E}nhanced \underline{P}re-trained \underline{L}anguage \underline{M}odels} (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMs, present the applications of KEPLMs in downstream tasks, and discuss future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
Autonomous robotic surgery has advanced significantly based on analysis of visual and temporal cues in surgical workflow, but relational cues from domain knowledge remain under investigation. Complex relations in surgical annotations can be divided into intra- and inter-relations, both of which help autonomous systems comprehend surgical workflows. Intra- and inter-relations describe the relevance of various categories within a particular annotation type and the relevance of different annotation types, respectively. This paper aims to systematically investigate the importance of relational cues in surgery. First, we contribute the RLLS12M dataset, a large-scale collection of robotic left lateral sectionectomy (RLLS), by curating 50 videos of 50 patients operated on by 5 surgeons and annotating a hierarchical workflow, which consists of 3 inter- and 6 intra-relations, 6 steps, 15 tasks, and 38 activities represented as the triplet of 11 instruments, 8 actions, and 16 objects, totaling 2,113,510 video frames and 12,681,060 annotation entities. Correspondingly, we propose a multi-relation purification hybrid network (MURPHY), which incorporates novel relation modules to augment the feature representation by purifying relational features using the intra- and inter-relations embodied in annotations. The intra-relation module leverages an R-GCN to implant visual features in different graph relations, which are aggregated using a targeted relation purification with affinity information measuring label consistency and feature similarity. The inter-relation module is motivated by attention mechanisms to regularize the influence of relational features based on the hierarchy of annotation types from the domain knowledge. Extensive experimental results on the curated RLLS dataset confirm the effectiveness of our approach, demonstrating that relations matter in surgical workflow analysis.
Deep learning-based methods have achieved strong performance in image defogging. However, existing methods are mainly developed for land scenes and perform poorly on overwater foggy images, since overwater scenes typically contain large expanses of sky and water. In this work, we propose a Prior map Guided CycleGAN (PG-CycleGAN) for defogging images of overwater scenes. To promote the recovery of objects on the water, two loss functions are exploited for the network, where a prior map is designed to invert the dark channel and min-max normalization is used to suppress the sky and emphasize objects. However, due to the unpaired training set, the network may learn an under-constrained domain mapping from foggy to fog-free images, leading to artifacts and loss of details. Thus, we propose an intuitive Upscaling Inception Module (UIM) and a Long-range Residual Coarse-to-fine framework (LRC) to mitigate this issue. Extensive qualitative and quantitative comparisons demonstrate that the proposed method outperforms state-of-the-art supervised, semi-supervised, and unsupervised defogging approaches.
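One possible reading of the prior map is sketched below: compute the dark channel (the minimum intensity over color channels and a local window), invert it, and rescale the result to $[0, 1]$ with min-max normalization, so that bright, low-contrast sky regions receive low values and darker objects receive high values. This is an assumption-laden sketch rather than the PG-CycleGAN definition: the image layout (nested lists of RGB tuples in $[0, 1]$), the window `radius`, and the function names are hypothetical, and how the map enters the two loss functions is not shown.

```python
def dark_channel(image, radius=1):
    """Minimum over color channels and a (2*radius+1)-sized square window.

    image: H x W nested lists of (r, g, b) tuples with values in [0, 1].
    """
    h, w = len(image), len(image[0])
    pixel_min = [[min(image[y][x]) for x in range(w)] for y in range(h)]
    dark = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            row.append(min(pixel_min[j][i] for j in ys for i in xs))
        dark.append(row)
    return dark

def prior_map(image, radius=1):
    """Inverted dark channel, min-max normalized to [0, 1] (illustrative)."""
    inverted = [[1.0 - d for d in row] for row in dark_channel(image, radius)]
    lo = min(min(row) for row in inverted)
    hi = max(max(row) for row in inverted)
    if hi == lo:  # constant map: nothing to emphasize
        return [[0.0 for _ in row] for row in inverted]
    return [[(v - lo) / (hi - lo) for v in row] for row in inverted]
```

A map of this kind could serve as a per-pixel weight that concentrates a loss on objects rather than on sky and water, though the exact weighting used in the paper is not specified in the abstract.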