本文介绍了持续的Weisfeiler-Lehman随机步行方案(缩写为PWLR),用于图形表示,这是一个新型的数学框架,可生成具有离散和连续节点特征的图形的可解释的低维表示。提出的方案有效地结合了归一化的Weisfeiler-Lehman程序,在图形上随机行走以及持续的同源性。因此,我们整合了图形的三个不同属性,即局部拓扑特征,节点度和全局拓扑不变,同时保留图形扰动的稳定性。这概括了Weisfeiler-Lehman过程的许多变体,这些变体主要用于嵌入具有离散节点标签的图形。经验结果表明,可以有效地利用这些表示形式与最新的技术产生可比较的结果,以分类具有离散节点标签的图形,并在对具有连续节点特征的人分类中增强性能。
translated by 谷歌翻译
我们提出了一个新的变压器模型,用于无监督学习骨架运动序列的任务。用于基于无监督骨骼的动作学习的现有变压器模型被了解到每个关节从相邻帧的瞬时速度没有全球运动信息。因此,该模型在学习全身运动和暂时遥远的关节方面的关注方面存在困难。此外,模型中尚未考虑人与人之间的互动。为了解决全身运动,远程时间动态和人与人之间的互动的学习,我们设计了一种全球和本地的注意机制,在其中,全球身体动作和本地关节运动相互关注。此外,我们提出了一种新颖的预处理策略,即多间隔姿势位移预测,以在不同的时间范围内学习全球和本地关注。提出的模型成功地学习了关节的局部动力学,并从运动序列中捕获了全局上下文。我们的模型优于代表性基准中明显边缘的最先进模型。代码可在https://github.com/boeun-kim/gl-transformer上找到。
translated by 谷歌翻译
类别不平衡数据的问题在于,由于少数类别的数据缺乏数据,分类器的泛化性能劣化。在本文中,我们提出了一种新的少数民族过度采样方法,通过利用大多数类作为背景图像的丰富背景来增加多元化的少数民族样本。为了使少数民族样本多样化,我们的主要思想是将前景补丁从少数级别粘贴到来自具有富裕环境的多数类的背景图像。我们的方法很简单,可以轻松地与现有的长尾识别方法结合。我们通过广泛的实验和消融研究证明了提出的过采样方法的有效性。如果没有任何架构更改或复杂的算法,我们的方法在各种长尾分类基准上实现了最先进的性能。我们的代码将在链接上公开提供。
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Thanks to the development of 2D keypoint detectors, monocular 3D human pose estimation (HPE) via 2D-to-3D uplifting approaches have achieved remarkable improvements. Still, monocular 3D HPE is a challenging problem due to the inherent depth ambiguities and occlusions. To handle this problem, many previous works exploit temporal information to mitigate such difficulties. However, there are many real-world applications where frame sequences are not accessible. This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection. Rather than exploiting temporal information, we alleviate the depth ambiguity by generating multiple 3D pose candidates which can be mapped to an identical 2D keypoint. We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector. By considering the correlation between human joints by replacing the conventional denoising U-Net with graph convolutional network, our approach accomplishes further performance improvements. We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets. Comprehensive experiments are conducted to prove the efficacy of the proposed method, and they confirm that our model outperforms state-of-the-art multi-hypothesis 3D HPE methods.
translated by 谷歌翻译
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
ICECUBE是一种用于检测1 GEV和1 PEV之间大气和天体中微子的光学传感器的立方公斤阵列,该阵列已部署1.45 km至2.45 km的南极的冰盖表面以下1.45 km至2.45 km。来自ICE探测器的事件的分类和重建在ICeCube数据分析中起着核心作用。重建和分类事件是一个挑战,这是由于探测器的几何形状,不均匀的散射和冰中光的吸收,并且低于100 GEV的光,每个事件产生的信号光子数量相对较少。为了应对这一挑战,可以将ICECUBE事件表示为点云图形,并将图形神经网络(GNN)作为分类和重建方法。 GNN能够将中微子事件与宇宙射线背景区分开,对不同的中微子事件类型进行分类,并重建沉积的能量,方向和相互作用顶点。基于仿真,我们提供了1-100 GEV能量范围的比较与当前ICECUBE分析中使用的当前最新最大似然技术,包括已知系统不确定性的影响。对于中微子事件分类,与当前的IceCube方法相比,GNN以固定的假阳性速率(FPR)提高了信号效率的18%。另外,GNN在固定信号效率下将FPR的降低超过8(低于半百分比)。对于能源,方向和相互作用顶点的重建,与当前最大似然技术相比,分辨率平均提高了13%-20%。当在GPU上运行时,GNN能够以几乎是2.7 kHz的中位数ICECUBE触发速率的速率处理ICECUBE事件,这打开了在在线搜索瞬态事件中使用低能量中微子的可能性。
translated by 谷歌翻译
本文回顾了AIM 2022上压缩图像和视频超级分辨率的挑战。这项挑战包括两条曲目。轨道1的目标是压缩图像的超分辨率,轨迹〜2靶向压缩视频的超分辨率。在轨道1中,我们使用流行的数据集DIV2K作为培训,验证和测试集。在轨道2中,我们提出了LDV 3.0数据集,其中包含365个视频,包括LDV 2.0数据集(335个视频)和30个其他视频。在这一挑战中,有12支球队和2支球队分别提交了赛道1和赛道2的最终结果。所提出的方法和解决方案衡量了压缩图像和视频上超分辨率的最先进。提出的LDV 3.0数据集可在https://github.com/renyang-home/ldv_dataset上找到。此挑战的首页是在https://github.com/renyang-home/aim22_compresssr。
translated by 谷歌翻译
在本文中,我们提出了一种使用CNN和变压器结构融合以提高图像分类性能的方法。对于CNN,可以很好地提取有关图像上局部区域的信息,但是限制了全局信息的提取。另一方面,变压器在相对全局的提取方面具有优势,但缺点是因为它需要大量的内存来进行本地特征值提取。在图像的情况下,它通过CNN转换为特征映射,每个特征映射的像素都被视为令牌。同时,将图像分为贴片区域,然后与将其视为令牌视图的变压器方法融合在一起。对于令牌与两个不同特征的融合,我们提出了三种方法:(1)具有平行结构的晚令融合,(2)早期令牌融合,(3)逐层中的令牌融合。在使用Imagenet 1K的实验中,提出的方法显示了最佳的分类性能。
translated by 谷歌翻译
磁共振图像的降解有益于提高低信噪比图像的质量。最近,使用深层神经网络进行DENOSING表现出了令人鼓舞的结果。但是,这些网络大多数都利用监督学习,这需要大量的噪声和清洁图像对的培训图像。获得训练图像,尤其是干净的图像,既昂贵又耗时。因此,已经开发了仅需要成对噪声浪费图像的噪声2Noise(N2N)之类的方法来减轻获得训练数据集的负担。在这项研究中,我们提出了一种新的自我监督的denoising方法Coil2Coil(C2C),该方法不需要获取干净的图像或配对的噪声浪费图像进行训练。取而代之的是,该方法利用了从分阶段阵列线圈中的多通道数据来生成训练图像。首先,它将多通道线圈图像分为两个图像,一个用于输入,另一个用于标签。然后,它们被处理以施加噪声独立性和敏感性归一化,以便它们可用于N2N的训练图像。为了推断,该方法输入了一个线圈组合的图像(例如DICOM图像),从而允许该方法的广泛应用。当使用合成噪声添加的图像进行评估时,C2C对几种自我监督方法显示了最佳性能,从而报告了与监督方法的可比结果。在测试DICOM图像时,C2C成功地将真实噪声降低,而没有显示误差图中的结构依赖性残差。由于不需要对清洁或配对图像进行额外扫描的显着优势,因此可以轻松地用于各种临床应用。
translated by 谷歌翻译