The lack of efficient segmentation methods and fully labeled datasets limits the comprehensive assessment of optical coherence tomography angiography (OCTA) microstructures such as the retinal vessel network (RVN) and the foveal avascular zone (FAZ), which are of great value in evaluating ophthalmic and systemic diseases. Here, we introduce an OCTA microstructure segmentation network (OMSN) that combines an encoder-decoder architecture with multi-scale skip connections and the split-attention-based residual network ResNeSt, attending specifically to OCTA microstructural features while facilitating better model convergence and feature representations. The proposed OMSN achieves excellent single-task and multi-task performance for RVN and/or FAZ segmentation. Notably, the multi-task models outperform the single-task models on the same dataset across the evaluation metrics. On this basis, a fully annotated retinal OCTA segmentation (FAROS) dataset is constructed semi-automatically, filling the gap left by the absence of a pixel-level, fully labeled OCTA dataset. The OMSN multi-task segmentation model retrained on FAROS further confirms its outstanding accuracy for simultaneous RVN and FAZ segmentation.
Translated by Google Translate
Because noise and data corruption are widespread, recovering the true regression parameters when a certain proportion of the response variables is corrupted is an essential task. Methods that address this problem often rely on robust least-squares regression, but few perform well when confronted with severe adaptive adversarial attacks. In many applications, prior knowledge is available from historical data or engineering experience. By incorporating this prior information into a robust regression method, we develop an effective robust regression method that can resist adaptive adversarial attacks. First, we propose the novel TRIP (hard Thresholding approach to Robust regression with sImple Prior) algorithm, which improves the breakdown point under adaptive adversarial attacks. Then, to improve robustness and reduce the estimation error caused by the inclusion of priors, we use the idea of Bayesian reweighting to construct the more robust BRHT (robust Bayesian Reweighting regression via Hard Thresholding) algorithm. We prove the theoretical convergence of the proposed algorithms under mild conditions, and extensive experiments show that our algorithms outperform the benchmark methods under different types of dataset attacks. Finally, we apply our methods to a data-recovery problem in a real-world application involving a space solar array, demonstrating their applicability.
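The general idea of hard-thresholding robust regression with a prior can be illustrated with a minimal sketch. This is a generic alternating scheme, not the TRIP or BRHT algorithm itself, whose details the abstract does not give. The function name, the ridge-style pull toward `w_prior`, and the assumption that the number of corrupted responses `n_corrupt` is known are all illustrative choices.

```python
import numpy as np

def hard_threshold_regression(X, y, n_corrupt, w_prior=None,
                              prior_weight=0.0, n_iter=50):
    """Alternate between a least-squares fit on the trusted samples and
    re-selecting the samples with the smallest residuals.

    w_prior and prior_weight add a ridge-style pull toward a prior guess;
    this is a hypothetical stand-in for a prior, not the paper's model.
    """
    n, d = X.shape
    if w_prior is None:
        w_prior = np.zeros(d)
    keep = np.arange(n)          # start by trusting every sample
    w = w_prior.copy()
    for _ in range(n_iter):
        Xs, ys = X[keep], y[keep]
        # Solve (Xs^T Xs + lam * I) w = Xs^T ys + lam * w_prior
        A = Xs.T @ Xs + prior_weight * np.eye(d)
        b = Xs.T @ ys + prior_weight * w_prior
        w = np.linalg.solve(A, b)
        # Hard thresholding: keep the n - n_corrupt smallest residuals
        residuals = np.abs(y - X @ w)
        new_keep = np.sort(np.argsort(residuals)[: n - n_corrupt])
        if np.array_equal(new_keep, keep):
            break                # trusted set is stable, so stop
        keep = new_keep
    return w, keep
```

With `prior_weight=0.0` this reduces to plain alternating least squares with hard thresholding. A positive weight biases the estimate toward `w_prior`, which is the kind of prior-induced estimation error the abstract says the Bayesian reweighting step is meant to reduce.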
Molecular representation learning is crucial for the problem of molecular property prediction, where graph neural networks (GNNs) serve as an effective solution due to their structure modeling capabilities. Since labeled data is often scarce and expensive to obtain, it is a great challenge for GNNs to generalize in the extensive molecular space. Recently, the training paradigm of "pre-train, fine-tune" has been leveraged to improve the generalization capabilities of GNNs. It uses self-supervised information to pre-train the GNN, and then performs fine-tuning to optimize the downstream task with just a few labels. However, pre-training does not always yield statistically significant improvement, especially for self-supervised learning with random structural masking. In fact, the molecular structure is characterized by motif subgraphs, which are frequently occurring and influence molecular properties. To leverage the task-related motifs, we propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT). MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt. The prompt effectively augments the molecular graph with meaningful motifs in the continuous representation space; this provides more structural patterns to aid the downstream classifier in identifying molecular properties. Extensive experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction, with or without a few fine-tuning steps.
Adversarial attacks can easily fool object recognition systems based on deep neural networks (DNNs). Although many defense methods have been proposed in recent years, most of them can still be adaptively evaded. One reason for this weak adversarial robustness may be that DNNs are supervised only by category labels and lack the part-based inductive bias of human recognition. Inspired by recognition-by-components, a well-known theory in cognitive psychology, we propose a novel object recognition model, ROCK (Recognizing Object by Components with human prior Knowledge). It first segments the parts of objects in an image, then scores the part segmentation results against predefined human prior knowledge, and finally outputs a prediction based on the scores. The first stage of ROCK corresponds to the process of decomposing objects into parts in human vision. The second stage corresponds to the decision process of the human brain. ROCK shows better robustness than classical recognition models across various attack settings. These results encourage researchers to rethink the rationality of the currently widely used DNN-based object recognition models and to explore the potential of part-based models, once important but recently ignored, for improving robustness.
We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the speech translation model can learn from both unlabeled and labeled data, especially when source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can deal with both normal and noisy text data. Our system sets a new state of the art on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.
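We present a novel paradigm for building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen pose and from any unseen viewpoint. Our method is based on a dynamic neural radiance field (NeRF) rigged by a mesh-based parametric 3D human model that serves as a geometry proxy. Previous methods usually rely on multi-view videos or accurate 3D geometry information as additional inputs; moreover, most methods suffer degraded quality when generalizing to unseen poses. We identify that the key to generalization is a good input embedding for querying the dynamic NeRF: a good input embedding should define an injective mapping in the full volumetric space, guided by the deformation of the surface mesh under pose variation. Based on this observation, we propose to embed the input query with its relationship to local surface regions spanned by a set of geodesic nearest neighbors on the mesh vertices. By including both position and relative distance information, our embedding defines a distance-preserving deformation mapping and generalizes well to unseen poses. To reduce the dependency on additional inputs, we first initialize per-frame 3D meshes using off-the-shelf tools and then propose a pipeline to jointly optimize the NeRF and refine the initial meshes. Extensive experiments show that our method can synthesize plausible human renderings under unseen poses and viewpoints.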
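Machine learning poses severe privacy concerns, as it has been shown that learned models can reveal sensitive information about their training data. Many works have investigated the effect of the widely adopted data augmentation (DA) and adversarial training (AT) techniques, termed data enhancement in this paper, on the privacy leakage of machine learning models. Such privacy effects are often measured by membership inference attacks (MIAs), which aim to identify whether a particular example belongs to the training set. We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results, as they are less likely to identify samples with higher privacy risks as members than samples with low privacy risks. To solve this problem, we deploy a recent attack that can capture the memorization degree of individual samples for evaluation. Through extensive experiments, we unveil non-trivial findings about the connections between three important properties of machine learning models: privacy, generalization gap, and adversarial robustness. We demonstrate that, unlike existing results, the generalization gap is not highly correlated with privacy leakage. Moreover, stronger adversarial robustness does not necessarily imply that the model is more susceptible to privacy attacks.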
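To make the notion of a membership inference attack concrete, here is a minimal sketch of one simple and commonly used baseline, the loss-threshold attack. It is shown only as an illustration of how an MIA decides membership. It is not claimed to be one of the attacks evaluated in the paper, and the function names and the fixed `threshold` parameter are assumptions.

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Predict 'member' (True) when the per-example loss is below threshold.

    Intuition: models tend to fit training examples more closely, so a low
    loss is taken as evidence of membership.
    """
    return np.asarray(losses) < threshold

def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of examples whose membership the attack guesses correctly."""
    pred_members = loss_threshold_mia(member_losses, threshold)
    pred_nonmembers = loss_threshold_mia(nonmember_losses, threshold)
    correct = pred_members.sum() + (~pred_nonmembers).sum()
    return correct / (len(member_losses) + len(nonmember_losses))
```

An aggregate score such as this accuracy averages over all samples, which is one way an attack can look weak overall while still saying little about the individual samples that are most at risk.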
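Visual relationship detection aims to detect the interactions between objects in an image. However, this task suffers from combinatorial explosion due to the variety of objects and interactions. Since the interactions associated with the same object are dependent, we explore the dependency among interactions to reduce the search space. We explicitly model objects and interactions with an interaction graph and then propose a message-passing-style algorithm to propagate contextual information. We therefore call the proposed method neural message passing (NMP). We further integrate language priors and spatial cues to rule out unrealistic interactions and to capture spatial interactions. Experimental results on two benchmark datasets demonstrate the superiority of our proposed method. Our code is available at https://github.com/phyllish/nmp.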
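Object detectors are vital to many modern computer vision applications. However, even state-of-the-art object detectors are not perfect. On two images that look similar to the human eye, the same detector can make different predictions because of small image distortions such as camera sensor noise and lighting changes. This problem is called inconsistency. Existing accuracy metrics do not properly account for inconsistency, and similar work in this area targets improvements only on artificial image distortions. We therefore propose a method that uses non-artificial video frames to measure object detection consistency over time, across frames. Using this method, we show that the consistency of modern object detectors ranges from 83.2% to 97.1% on different video datasets from the Multiple Object Tracking Challenge. Finally, we show that applying image distortion corrections such as .WEBP image compression and unsharp masking can improve consistency by as much as 5.1% with no loss in accuracy.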
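Convolutional neural networks (CNNs) are commonly used for computer vision. CNNs are computationally intensive and are challenging to deploy on power-constrained systems such as mobile and Internet-of-Things (IoT) devices. CNNs are computationally intensive because they indiscriminately compute many features on all pixels of the input image. We observe that, for a given computer vision task, images often contain pixels that are irrelevant to the task. For example, if the task is to look for cars, pixels in the sky are not very useful. We therefore propose modifying CNNs to operate only on relevant pixels in order to save computation and energy. We propose a method for studying three popular computer vision datasets and find that 48% of pixels are irrelevant. We also propose the focused convolution, which modifies a CNN's convolutional layers to reject the pixels that are marked irrelevant. On an embedded device, we observe no loss in accuracy, while inference latency, energy consumption, and multiply-add count are all reduced by about 45%.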
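The saving from skipping irrelevant pixels can be illustrated with a minimal sketch: a naive single-channel convolution that is evaluated only where a relevance mask is set. This is not the paper's focused convolution layer, whose design the abstract does not describe. The function name, the boolean `relevant` mask, the zero padding, and the choice to output zero at skipped pixels are all assumptions made for illustration.

```python
import numpy as np

def masked_conv2d(image, kernel, relevant):
    """Same-size 2D cross-correlation evaluated only where `relevant` is True.

    Assumes a single-channel image and an odd-sized kernel. Skipped pixels
    are left at zero. Returns the output map and the number of
    multiply-adds actually performed.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))  # zero padding
    out = np.zeros(image.shape, dtype=float)
    macs = 0
    for i, j in zip(*np.nonzero(relevant)):
        patch = padded[i:i + kh, j:j + kw]
        out[i, j] = np.sum(patch * kernel)
        macs += kh * kw                           # one MAC per kernel tap
    return out, macs
```

In this sketch the multiply-add count scales linearly with the number of relevant pixels, so masking out half of the image halves the work for that layer.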