Smart Paper Notes

A Simplistic and Cost-Effective Design for Real-World Development of an Ambient Assisted Living System for Fall Detection and Indoor Localization: Proof of Concept

Nirmalya Thakur , Chia Y. Han

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2022-07-24

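Falls are highly common in the steadily growing global aging population and can have a variety of negative effects on health, well-being, and quality of life, including restricting the ability to perform Activities of Daily Living (ADLs), which are crucial for a person's sustenance. Timely assistance during a fall is essential, and it requires tracking the indoor location of the elderly across the diverse navigational patterns associated with ADLs so that the precise location of a fall can be detected. With the caregiver population decreasing worldwide, it is important that future intelligent living environments can detect falls during ADLs while also tracking the indoor location of the elderly in the real world. To address these challenges, this work proposes a cost-effective and simplistic design paradigm for an Ambient Assisted Living system that can capture the multimodal components of user behavior during ADLs that are needed to perform fall detection and indoor localization simultaneously in the real world. Proof-of-concept results from real-world experiments are presented to uphold the effective working of the system. Findings from two comparison studies with prior works are also presented to uphold the novelty of this work. The first comparison study shows how the proposed system outperforms prior works in indoor localization and fall detection in terms of the effectiveness of its software and hardware design. The second comparison study shows that the development cost of this system is the lowest among prior works in these fields that involved real-world development of the underlying systems, thereby upholding its cost-effective nature.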

Translated by Google Translate

Indoor Localization for Personalized Ambient Assisted Living of Multiple Users in Multi-Floor Smart Environments

Nirmalya Thakur , Chia Y. Han

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2022-07-19

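This paper presents a multifunctional, interdisciplinary framework that makes four scientific contributions to personalized Ambient Assisted Living (AAL), with a specific focus on addressing the different and dynamic needs of the diverse aging population in future smart living environments. First, it presents a probabilistic-reasoning-based mathematical approach to model all possible forms of user interaction for any activity arising from the user diversity of multiple users in such environments. Second, it presents a system that uses this approach together with a machine learning method to model individual user profiles and user-specific interactions in order to detect the dynamic indoor location of each specific user. Third, to address the need for highly accurate indoor localization systems that increase trust, reliance, and seamless user acceptance, the framework introduces a novel methodology in which two boosting approaches, Gradient Boosting and the AdaBoost algorithm, are integrated and applied to a decision-tree-based learning model to perform indoor localization. Fourth, the framework introduces two novel functionalities that give semantic context to indoor localization: detecting each user's floor-specific location, and tracking whether a specific user is inside or outside a given spatial region in a multi-floor indoor environment. These functionalities were tested on a dataset of localization-related Big Data collected from 18 different users who navigated 3 buildings comprising 5 floors and 254 indoor spatial regions. The results show that this approach to indoor localization for personalized AAL, which models each specific user, consistently achieves higher accuracy than the traditional approach of modeling an average user.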
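The boosting idea named in this abstract can be illustrated with a minimal sketch of AdaBoost over one-level decision trees (stumps). This is not the authors' pipeline or dataset: the toy one-dimensional data and the names `train_stump`, `adaboost`, and `predict` are assumptions made for illustration, and Gradient Boosting is not shown.

```python
import math

# Toy 1-D data (hypothetical): no single threshold separates the two classes.
XS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump.

    A stump predicts `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Fit `rounds` stumps, re-weighting misclassified samples after each one."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if x >= threshold else -polarity for x in xs]
        # Increase the weight of mistakes, decrease the weight of correct samples.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


ensemble = adaboost(XS, YS, rounds=3)
train_errors = sum(predict(ensemble, x) != y for x, y in zip(XS, YS))
print("training errors after 3 rounds:", train_errors)
```

On this toy data a single stump necessarily misclassifies some points, while a weighted vote of three stumps fits the training set; a real localization system would use multi-dimensional features and a held-out evaluation set.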

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Categories: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision

Rui Han , Qinglong Zhang , Chi Harold Liu , Guoren Wang , Jian Tang , Lydia Y. Chen

Categories: Computer Vision

2021-12-18

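Deep neural networks (DNNs) have become ubiquitous in mobile and embedded systems for image/object recognition and classification. The trend of executing multiple DNNs simultaneously exacerbates the existing difficulty of meeting stringent latency and accuracy requirements on resource-constrained mobile devices. Prior work sheds light on the accuracy-resource trade-off by scaling model sizes according to resource dynamics. However, such model scaling approaches face two pressing challenges: (i) a large exploration space of model sizes, and (ii) prohibitively long training time for different model combinations. In this paper, we present LegoDNN, a lightweight, block-grained scaling solution for running multi-DNN workloads in mobile vision systems. LegoDNN guarantees short model training time by extracting and training only a small number of common blocks in a DNN (e.g., 5 in VGG and 8 in ResNet). At run time, LegoDNN optimally combines the descendant models of these blocks to maximize accuracy under specific resource and latency constraints, while reducing switching overhead through smart block-level scaling of the DNN. We implement LegoDNN in TensorFlow Lite and extensively evaluate it against state-of-the-art techniques (FLOP scaling, knowledge distillation, and model compression) using a set of popular DNN models. The evaluation results show that LegoDNN provides 1,296x to 279,936x more options in model sizes without increasing training time, achieving up to a 31.74% improvement in inference accuracy and a 71.07% reduction in scaling energy consumption.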

Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization

X. Y. Han , Adrian S. Lewis

Categories: Machine Learning

2021-11-30

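For smooth, strongly convex objectives, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging: even when the objective is smooth at every iterate, the corresponding local models are unstable, and traditional remedies need unpredictably many cutting planes. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While designed with general objectives in mind, we are motivated by a "max-of-smooth" model that captures the subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon.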
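The classical smooth baseline mentioned in the first sentence can be checked numerically. The following is a minimal sketch of plain gradient descent, not the multipoint Survey Descent method; the quadratic objective, the constants `MU` and `L`, and the function names are assumptions chosen for illustration. For a function that is `MU`-strongly convex with an `L`-Lipschitz gradient, a step size of `1/L` shrinks the optimality gap by at least a factor of `1 - MU/L` per iteration.

```python
MU, L = 1.0, 10.0  # strong-convexity and smoothness constants (hypothetical)


def objective(x):
    """f(x) = 0.5 * (MU * x0^2 + L * x1^2); the minimum value is 0 at the origin."""
    return 0.5 * (MU * x[0] ** 2 + L * x[1] ** 2)


def gradient(x):
    """Gradient of the quadratic objective above."""
    return [MU * x[0], L * x[1]]


def gradient_descent(grad, x0, step, iters):
    """Plain gradient descent; returns every iterate, starting with x0."""
    x = list(x0)
    iterates = [list(x)]
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        iterates.append(list(x))
    return iterates


iterates = gradient_descent(gradient, [1.0, 1.0], step=1.0 / L, iters=50)
values = [objective(x) for x in iterates]

# Ratio of successive optimality gaps; linear convergence means it stays
# bounded away from 1 (here by 1 - MU/L).
ratios = [values[k + 1] / values[k] for k in range(len(values) - 1)]
print("worst per-step contraction:", max(ratios))
print("guaranteed bound 1 - MU/L: ", 1 - MU / L)
```

The nonsmooth setting studied in the paper is precisely where this simple picture breaks down, which is what motivates a multipoint iteration.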

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren , Fangyun Wei , Zheng Zhang , Han Hu

Categories: Computer Vision

2023-01-03

Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Categories: Computer Vision

2023-01-03

Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.

Muse: Text-To-Image Generation via Masked Generative Transformers

Huiwen Chang , Han Zhang , Jarred Barber , AJ Maschinot , Jose Lezama , Lu Jiang , Ming-Hsuan Yang , Kevin Murphy , William T. Freeman , Michael Rubinstein

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2023-01-02

We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io

Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation

Han Huang , Leilei Sun , Bowen Du , Weifeng Lv

Categories: Machine Learning

2023-01-01

Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps.

Tracing the Origin of Adversarial Attack for Forensic Investigation and Deterrence

Han Fang , Jiyi Zhang , Yupeng Qiu , Ke Xu , Chengfang Fang , Ee-Chien Chang

Categories: Computer Vision | Machine Learning

2022-12-31

Deep neural networks are vulnerable to adversarial attacks. In this paper, we take the role of investigators who want to trace the attack and identify the source, that is, the particular model from which the adversarial examples were generated. The techniques derived would aid forensic investigation of attack incidents and serve as a deterrent to potential attacks. We consider the buyers-seller setting where a machine learning model is to be distributed to various buyers and each buyer receives a slightly different copy with the same functionality. A malicious buyer generates adversarial examples from a particular copy $\mathcal{M}_i$ and uses them to attack other copies. From these adversarial examples, the investigator wants to identify the source $\mathcal{M}_i$. To address this problem, we propose a two-stage separate-and-trace framework. The model separation stage generates multiple copies of a model for the same classification task. This process injects unique characteristics into each copy so that the adversarial examples generated have distinct and traceable features. We give a parallel structure which embeds a "tracer" in each copy, and a noise-sensitive training loss to achieve this goal. The tracing stage takes in adversarial examples and a few candidate models, and identifies the likely source. Based on the unique features induced by the noise-sensitive loss function, we can effectively trace the potential adversarial copy by considering the output logits from each tracer. Empirical results show that it is possible to trace the origin of an adversarial example and that the mechanism can be applied to a wide range of architectures and datasets.
