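Low-order functional ANOVA (fANOVA) models have appeared in the machine learning (ML) community under the guise of inherently interpretable machine learning. Explainable Boosting Machines, or EBM (Lou et al., 2013), and GAMI-Net (Yang et al., 2021) are two recently proposed ML algorithms for fitting functional main effects and second-order interactions. We propose a new algorithm, called GAMI-Tree, that is similar to EBM but has a number of features that lead to better performance. It uses model-based trees as base learners and incorporates a new interaction filtering method that is better at capturing the underlying interactions. In addition, our iterative training method converges to a model with better predictive performance, and the embedded purification ensures that interactions are hierarchically orthogonal to main effects. The algorithm does not need extensive tuning, and our implementation is fast and efficient. We use simulated and real datasets to compare the performance and interpretability of GAMI-Tree with EBM and GAMI-Net.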
translated by 谷歌翻译
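Because of their desirable properties, Shapley-related techniques have gained attention as both global and local interpretation tools. However, computing them with conditional expectations is computationally expensive, and the approximation methods suggested in the literature have limitations. This paper proposes using a surrogate model-based tree to compute Shapley and SHAP values based on conditional expectation. Simulation studies show that the proposed algorithm improves accuracy and unifies global Shapley and SHAP interpretation, while a thresholding method provides a way to trade off running time against accuracy.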
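To illustrate why exact Shapley computation becomes expensive, here is a minimal sketch of the textbook Shapley formula, which enumerates every subset of the other features for each feature. It is not the surrogate tree method proposed in the abstract. The function names and the toy value function are hypothetical, and the value function here is a hand-written set function rather than a conditional expectation of a model.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, value):
    """Exact Shapley values for a set function value(frozenset) -> float.

    The loop visits all 2**(n_features - 1) subsets per feature, which is
    the source of the exponential cost.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # marginal contribution of feature i when added to subset s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(s):
    """Hypothetical value function: two main effects plus one interaction."""
    v = 0.0
    if 0 in s:
        v += 1.0
    if 1 in s:
        v += 2.0
    if 0 in s and 1 in s:
        v += 1.0  # interaction, split equally between features 0 and 1
    return v


phi = shapley_values(3, toy_value)
print(phi)  # approximately [1.5, 2.5, 0.0]
```

The values sum to the difference between the value of the full feature set and the empty set, which is the efficiency property that makes Shapley values attractive for attribution.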
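Despite substantial progress in deep learning approaches to time-series reconstruction, no existing method is designed to uncover local activities with minute signal strength, because their contribution to the optimization loss is negligible. Such local activities, however, can signify important abnormal events in physiological systems, such as an extra focus triggering abnormal propagation of electrical waves in the heart. We discuss a novel technique for reconstructing such local activity that, while small in signal strength, is the cause of subsequent global activity with larger signal strength. Our central innovation is to approach this problem by explicitly modeling and disentangling how the latent state of a system is influenced by potential hidden internal interventions. In a novel neural formulation of state-space models (SSMs), we first introduce causal-effect modeling of the latent dynamics via a system of interacting neural ODEs that separately describe 1) the continuous-time dynamics of the internal intervention and 2) its effect on the trajectory of the system's native state. Because the intervention cannot be observed directly and must instead be disentangled from its observed subsequent effects, we integrate knowledge of the system's native intervention-free dynamics and infer the hidden intervention by assuming that it is responsible for the differences between the actually observed dynamics and the hypothetical intervention-free dynamics. We demonstrate a proof of concept of the proposed framework by reconstructing, from remote observations, ectopic foci that disrupt the course of normal cardiac electrical propagation.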
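Marking-level high-definition maps (HD maps) are of great significance for autonomous vehicles, especially in large-scale, appearance-changing scenarios where autonomous vehicles rely on markings for localization and on lanes for safe driving. In this paper, we propose a highly feasible framework for automatically building a marking-level HD map using a simple sensor setup (one or more monocular cameras). We optimize the positions of the marking corners to fit the results of marking segmentation and simultaneously optimize the inverse perspective mapping (IPM) matrix of the corresponding camera to obtain an accurate transformation from the front-view image to the bird's-eye view (BEV). In the quantitative evaluation, the constructed HD map almost attains centimeter-level accuracy, and the accuracy of the optimized IPM matrix is similar to that of manual calibration. The method can also be generalized to build HD maps in a broader sense by increasing the types of recognizable markings.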
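As a rough illustration of what an IPM matrix does, the sketch below applies a 3x3 homography to pixel coordinates and divides by the homogeneous coordinate to obtain bird's-eye-view positions. The function name and the example matrix are hypothetical and are not taken from the paper. The joint optimization of the matrix and the marking corners is not shown.

```python
def apply_homography(H, points):
    """Map (u, v) pixel points through a 3x3 homography H (nested lists)."""
    mapped = []
    for u, v in points:
        x = H[0][0] * u + H[0][1] * v + H[0][2]
        y = H[1][0] * u + H[1][1] * v + H[1][2]
        w = H[2][0] * u + H[2][1] * v + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under this homography")
        # perspective division turns homogeneous coordinates into 2D ones
        mapped.append((x / w, y / w))
    return mapped


# Hypothetical matrix with a perspective term in the last row; a real IPM
# matrix would come from calibration or from an optimization like the one
# described above.
H_example = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.002, 1.0],
]

bev_points = apply_homography(H_example, [(100.0, 500.0), (200.0, 0.0)])
print(bev_points)
```

Because of the perspective division, any nonzero scalar multiple of the matrix describes the same mapping, so an IPM matrix has eight degrees of freedom rather than nine.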
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification for the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, namely by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
Decompilation aims to transform a low-level programming language (LPL) (e.g., a binary file) into its functionally equivalent high-level programming language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach, named NeurDP, that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
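The abstract does not spell out how the GRN layer works. The sketch below assumes the commonly described three-step form: a per-channel global L2 norm over spatial positions, divisive normalization by the mean norm across channels, and a learnable calibration with a residual connection. The function name, the nested-list layout (height, width, channels), and the example values are illustrative assumptions, not the authors' implementation.

```python
import math


def global_response_norm(x, gamma, beta, eps=1e-6):
    """Assumed GRN form on a feature map x[h][w][c] with per-channel gamma, beta."""
    height, width, channels = len(x), len(x[0]), len(x[0][0])
    # 1) global aggregation: L2 norm of each channel over all spatial positions
    g = [
        math.sqrt(sum(x[i][j][c] ** 2 for i in range(height) for j in range(width)))
        for c in range(channels)
    ]
    # 2) divisive normalization: each channel's norm relative to the channel mean
    mean_g = sum(g) / channels
    n = [g_c / (mean_g + eps) for g_c in g]
    # 3) calibration with a residual connection back to the input
    return [
        [
            [gamma[c] * x[i][j][c] * n[c] + beta[c] + x[i][j][c]
             for c in range(channels)]
            for j in range(width)
        ]
        for i in range(height)
    ]


feature_map = [[[3.0, 1.0]]]  # 1x1 spatial grid, 2 channels (toy values)
out = global_response_norm(feature_map, gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(out)
```

Under this assumed form, channels with a larger global response are amplified relative to weaker ones, which is one way to read "inter-channel feature competition". With `gamma` and `beta` set to zero, the layer reduces to the identity.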
In this paper, we propose a novel framework dubbed peer learning to deal with the problem of biased scene graph generation (SGG). This framework uses predicate sampling and consensus voting (PSCV) to encourage different peers to learn from each other, improving model diversity and mitigating bias in SGG. To address the heavily long-tailed distribution of predicate classes, we propose to use predicate sampling to divide and conquer this issue. As a result, the model is less biased and makes more balanced predicate predictions. Specifically, one peer may not be sufficiently diverse to discriminate between different levels of predicate distributions. Therefore, we sample the data distribution based on frequency of predicates into sub-distributions, selecting head, body, and tail classes to combine and feed to different peers as complementary predicate knowledge during the training process. The complementary predicate knowledge of these peers is then ensembled utilizing a consensus voting strategy, which simulates a civilized voting process in our society that emphasizes the majority opinion and diminishes the minority opinion. This approach ensures that the learned representations of each peer are optimally adapted to the various data distributions. Extensive experiments on the Visual Genome dataset demonstrate that PSCV outperforms previous methods. We have established a new state-of-the-art (SOTA) on the SGCls task by achieving a mean of 31.6.