Smart Paper Notes

A workflow for segmenting soil and plant X-ray CT images with deep learning in Googles Colaboratory

Devin A. Rippner , Pranav Raja , J. Mason Earles , Alexander Buchko , Mina Momayyezi , Fiona Duong , Dilworth Parkinson , Elizabeth Forrestel , Ken Shackel , Jeffrey Neyhart

Category: Computer Vision

2022-03-18

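X-ray micro-computed tomography (X-ray microCT) has enabled the characterization of properties and processes that take place in plants and soils at the micron scale. Despite the widespread use of this advanced technique, major limitations in both hardware and software restrict the speed and accuracy of image processing and data analysis. Recent advances in machine learning, specifically the application of convolutional neural networks to image analysis, have enabled rapid and accurate segmentation of image data. Yet challenges remain in applying convolutional neural networks to the analysis of environmentally and agriculturally relevant images. Specifically, there is a disconnect between the computer scientists and engineers who build these AI/ML tools and the potential end users in agricultural research, who may be unsure how to apply these tools in their work. Additionally, the computing resources required to train and apply deep learning models are unusual: they are more common in computer gaming systems or graphics design work than in traditional computational systems. To navigate these challenges, we developed a modular workflow for applying convolutional neural networks to X-ray microCT images, using low-cost resources in Google's Colaboratory web application. Here we present the results of the workflow, illustrating how parameters can be optimized to achieve the best results using example scans of walnut leaves, almond flower buds, and a soil aggregate. We expect that this framework will accelerate the adoption and use of emerging deep learning techniques within the plant and soil sciences.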

Translated by Google Translate

Shakebot: A Low-cost, Open-source Shake Table for Ground Motion Seismic Studies

Zhiang Chen , Devin Keating , Yash Shethwala , Aravind Adhith Pandian Saravanakumaran , Ramon Arrowsmith , Chris Madugo , Albert Kottke , Jnaneshwar Das

Category: Robotics

2022-12-21

Our earlier research built a virtual shake robot in simulation to study the dynamics of precariously balanced rocks (PBR), which are negative indicators of earthquakes in nature. The simulation studies need validation through physical experiments. For this purpose, we developed Shakebot, a low-cost (under $2,000), open-source shake table to validate simulations of PBR dynamics and facilitate other ground motion experiments. The Shakebot is a custom one-dimensional prismatic robotic system with perception and motion software developed using the Robot Operating System (ROS). We adapted affordable and high-accuracy components from 3D printers, particularly a closed-loop stepper motor for actuation and a toothed belt for transmission. The stepper motor enables the bed to reach a maximum horizontal acceleration of 11.8 m/s^2 (1.2 g), and velocity of 0.5 m/s, when loaded with a 2 kg scale-model PBR. The perception system of the Shakebot consists of an accelerometer and a high frame-rate camera. By fusing camera-based displacements with acceleration measurements, the Shakebot is able to carry out accurate bed velocity estimation. The ROS-based perception and motion software simplifies the transition of code from our previous virtual shake robot to the physical Shakebot. The reuse of the control programs ensures that the implemented ground motions are consistent for both the simulation and physical experiments, which is critical to validate our simulation experiments.
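
The abstract says bed velocity is estimated by fusing camera-based displacements with accelerometer readings, but it does not name the fusion algorithm. As an illustration only, the sketch below uses a simple complementary filter (an assumption, not the Shakebot implementation): integrated acceleration gives a smooth short-term prediction, and differentiated displacement corrects drift. The function name `fuse_velocity`, the blend factor `alpha`, and the synthetic 2 Hz motion are all hypothetical.

```python
import numpy as np

def fuse_velocity(displacement, acceleration, dt, alpha=0.98):
    """Complementary filter (assumed method, not taken from the paper).

    Blends velocity predicted from the accelerometer with velocity
    measured by differencing camera displacements.
    """
    displacement = np.asarray(displacement, dtype=float)
    acceleration = np.asarray(acceleration, dtype=float)
    velocity = np.zeros_like(displacement)  # start from rest (assumption)
    for k in range(1, len(displacement)):
        predicted = velocity[k - 1] + acceleration[k] * dt        # accelerometer path
        measured = (displacement[k] - displacement[k - 1]) / dt   # camera path
        velocity[k] = alpha * predicted + (1.0 - alpha) * measured
    return velocity

# Synthetic, noise-free 2 Hz sinusoidal bed motion (hypothetical test signal).
# Peak velocity is about 0.38 m/s and peak acceleration about 4.7 m/s^2,
# both inside the limits quoted in the abstract.
dt = 1e-3
t = np.arange(0.0, 2.0, dt)
amp, omega = 0.03, 2.0 * np.pi * 2.0
true_disp = amp * np.sin(omega * t)
true_vel = amp * omega * np.cos(omega * t)
true_acc = -amp * omega**2 * np.sin(omega * t)

est_vel = fuse_velocity(true_disp, true_acc, dt)
# Skip the first 0.5 s so the filter can forget its zero initial velocity.
max_err = np.max(np.abs(est_vel[500:] - true_vel[500:]))
```

With real sensors both inputs would be noisy, and `alpha` would trade accelerometer drift against camera differentiation noise; a Kalman filter is a common alternative for the same job.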

Studying Bias in GANs through the Lens of Race

Vongani H. Maluleke , Neerja Thakkar , Tim Brooks , Ethan Weber , Trevor Darrell , Alexei A. Efros , Angjoo Kanazawa , Devin Guillory

Categories: Computer Vision | Machine Learning

2022-09-06

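In this work, we study how the performance and evaluation of generative image models are affected by the racial composition of their training datasets. By examining and controlling the racial distributions in various training datasets, we are able to observe the effects of different training distributions on generated image quality and on the racial distribution of the generated images. Our results show that the racial composition of generated images successfully preserves that of the training data. However, we observe that truncation, a technique used to generate higher-quality images during inference, exacerbates racial imbalances in the data. Lastly, when examining the relationship between image quality and race, we find that the images of a given race with the highest perceived visual quality come from a distribution in which that race is well represented, and that annotators consistently prefer generated images of white people over those of Black people.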
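
The abstract does not spell out how truncation works. A minimal sketch, assuming the StyleGAN-style truncation trick (latent codes are pulled toward their mean by a factor `psi`), shows why it reduces sample diversity. The array sizes and the name `truncate_latents` are hypothetical, and no generator network is involved here.

```python
import numpy as np

def truncate_latents(w, w_mean, psi=0.7):
    """Truncation trick (assumed StyleGAN-style form).

    psi = 1 leaves the latents unchanged; psi = 0 collapses every
    sample onto the mean latent.
    """
    return w_mean + psi * (w - w_mean)

rng = np.random.default_rng(0)
w = rng.normal(size=(10_000, 8))      # stand-in latent codes
w_mean = w.mean(axis=0)

w_trunc = truncate_latents(w, w_mean, psi=0.5)
# Per-dimension spread after truncation relative to before.
spread_ratio = w_trunc.std(axis=0) / w.std(axis=0)
```

Because every latent moves toward the average, the spread shrinks by exactly `psi`. Samples far from the mean are the ones most affected, which is one plausible reading of why under-represented groups become rarer after truncation.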

Learning-based Monocular 3D Reconstruction of Birds: A Contemporary Survey

Seyed Mojtaba Marvasti-Zadeh , Mohammad N. S. Jahromi , Javad Khaghani , Devin Goodsman , Nilanjan Ray , Nadir Erbilgin

Category: Computer Vision

2022-07-10

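In nature, the collective behavior of animals, such as flying birds, is dominated by interactions between individuals of the same species. However, studying such behavior among bird species is a complex process that humans cannot carry out with conventional visual observation techniques such as focal sampling in the wild. For social animals such as birds, the mechanism of group formation can help ecologists understand the relationship between social cues and their visual characteristics over time (e.g., pose and shape). Recovering the varying pose and shape of flying birds, however, is a highly challenging problem. A widely adopted solution to this bottleneck is to extract pose and shape information from 2D images through 2D-to-3D correspondence. Recent advances in 3D vision have led to a number of impressive works on 3D shape and pose estimation, each with different pros and cons. To the best of our knowledge, this work is the first attempt to survey recent advances in monocular vision-based 3D bird reconstruction, giving both computer vision and biology researchers an overview of existing approaches and comparing their characteristics.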

CNN Architectures for Large-Scale Audio Classification

Shawn Hershey , Sourish Chaudhuri , Daniel P. W. Ellis , Jort F. Gemmeke , Aren Jansen , R. Channing Moore , Manoj Plakal , Devin Platt , Rif A. Saurous , Bryan Seybold

Category:

2016-09-29

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.

A Tutorial on Parametric Variational Inference

Jens Sjölund

Categories: Machine Learning (Statistics) | Machine Learning

2023-01-03

Variational inference uses optimization, rather than integration, to approximate the marginal likelihood, and thereby the posterior, in a Bayesian model. Thanks to advances in computational scalability made in the last decade, variational inference is now the preferred choice for many high-dimensional models and large datasets. This tutorial introduces variational inference from the parametric perspective that dominates these recent developments, in contrast to the mean-field perspective commonly found in other introductory texts.
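
To make "optimization rather than integration" concrete, here is a minimal sketch of parametric variational inference on a toy model chosen for this note (it is not taken from the tutorial): prior z ~ N(0, 1), likelihood x_i ~ N(z, 1), and a Gaussian approximation q(z) = N(m, s^2). For this model the evidence lower bound (ELBO) and its gradients are available in closed form, so plain gradient ascent on (m, log s) suffices, and the exact posterior N(sum(x)/(n+1), 1/(n+1)) is known for comparison. The function name and step-size choice are hypothetical.

```python
import numpy as np

def fit_gaussian_vi(x, lr=0.3, steps=2000):
    """Fit q(z) = N(m, s^2) by gradient ascent on the ELBO.

    Toy model (assumed for illustration): z ~ N(0, 1), x_i ~ N(z, 1).
    Up to a constant, the ELBO is
        -0.5 * sum((x_i - m)^2 + s^2) - 0.5 * (m^2 + s^2) + log(s).
    """
    x = np.asarray(x, dtype=float)
    n, total = len(x), x.sum()
    step = lr / (n + 1)          # scale the step by the curvature in m
    m, rho = 0.0, 0.0            # rho = log(s) keeps s positive
    for _ in range(steps):
        s2 = np.exp(2.0 * rho)
        grad_m = total - (n + 1) * m        # d ELBO / d m
        grad_rho = 1.0 - (n + 1) * s2       # d ELBO / d rho
        m += step * grad_m
        rho += step * grad_rho
    return m, np.exp(rho)

x = np.array([0.8, 1.3, 0.4, 1.9, 1.1])
m, s = fit_gaussian_vi(x)

# Exact posterior for this conjugate model, used only as a check.
exact_mean = x.sum() / (len(x) + 1)
exact_std = 1.0 / np.sqrt(len(x) + 1)
```

Because the true posterior here is itself Gaussian, the optimum of the ELBO coincides with it. In non-conjugate models the gradients are usually estimated by sampling, and q only approximates the posterior.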

A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge

Haodi Ma , Daisy Zhe Wang

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2023-01-03

Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly added relations, which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Category: Computer Vision

2023-01-03

Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.
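
The abstract does not specify how class centers are computed from support masks or how they re-weight query features. As a rough illustration only, the sketch below uses masked average pooling followed by a per-channel sigmoid gate, a generic pattern in few-shot segmentation. It is an assumption, not RefT's module, and the names `masked_class_center` and `reweight_query` are hypothetical.

```python
import numpy as np

def masked_class_center(support_feat, support_mask):
    """Average support features over the masked region.

    support_feat: (C, H, W) feature map; support_mask: (H, W) binary mask.
    Returns a (C,) class center.
    """
    mask = support_mask.astype(float)
    area = mask.sum()
    if area == 0:
        raise ValueError("support mask is empty")
    return (support_feat * mask).sum(axis=(1, 2)) / area

def reweight_query(query_feat, center):
    """Scale each query channel by a sigmoid gate of the class center
    (the gating form is an assumption made for this sketch)."""
    gate = 1.0 / (1.0 + np.exp(-center))
    return query_feat * gate[:, None, None]

# Tiny example: 2 channels, 2x2 support map, mask selects the diagonal.
support_feat = np.zeros((2, 2, 2))
support_feat[0] = [[1.0, 3.0], [5.0, 7.0]]
support_feat[1] = [[2.0, 2.0], [2.0, 2.0]]
support_mask = np.array([[1, 0], [0, 1]])

center = masked_class_center(support_feat, support_mask)  # channel 0: (1 + 7) / 2
query_feat = np.ones((2, 3, 3))
weighted = reweight_query(query_feat, center)
```

The mask keeps background pixels out of the class center, so the gate reflects the object itself rather than its surroundings.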

I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation

Haoyu Ma , Xiangru Lin , Yizhou Yu

Category: Computer Vision

2023-01-03

Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5 → Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
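
The headline number above is reported in mIoU (mean intersection over union). For readers unfamiliar with the metric, here is a minimal sketch of the standard confusion-matrix computation. It is a generic definition, not the authors' evaluation code, and it omits the ignore-label handling that benchmark scripts usually include.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(
        num_classes * target + pred, minlength=num_classes**2
    ).reshape(num_classes, num_classes)
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp   # TP + FP + FN per class
    valid = union > 0                              # skip classes that never occur
    return (tp[valid] / union[valid]).mean()

pred = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 1, 1, 1, 2, 0])
score = mean_iou(pred, target, num_classes=3)
# Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is 0.5.
```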

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi , Kai Qiao , Jian Chen , Shuai Yang , Jie Yang , Baojie Song , Linyuan Wang , Bin Yan

Category: Computer Vision

2023-01-03

The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
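
The abstract mentions keeping the 20 user property features with the greatest information gain. A minimal sketch of that kind of selection for discrete features follows. The toy data, the function names, and the assumption that features are already discrete are all hypothetical; the MGTAB pipeline itself is not described in the abstract, and continuous properties would need binning first.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """H(labels) minus the entropy of labels conditioned on the feature."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    conditional = 0.0
    for value in np.unique(feature):
        subset = labels[feature == value]
        conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional

def top_k_features(X, y, k):
    """Return indices of the k columns of X with the highest information gain."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-gains, kind="stable")
    return order[:k], gains

# Toy data: column 0 equals the label, column 1 is independent of it,
# column 2 is constant.
y = np.array([0, 0, 1, 1])
X = np.array([
    [0, 0, 5],
    [0, 1, 5],
    [1, 0, 5],
    [1, 1, 5],
])
top, gains = top_k_features(X, y, k=1)
```

A feature that determines the label recovers the full label entropy (1 bit here), while independent or constant features score zero, so ranking by gain surfaces the most informative properties first.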
