Smart Paper Notes

Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction from High-Angle Video

Tianya T. Zhang, Ph.D. , Peter J. Jin, Ph.D. , Han Zhou , Benedetto Piccoli, Ph.D.

Categories: Computer Vision | Artificial Intelligence

2022-09-17

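Spatial-Temporal Map (STMap)-based methods have shown great potential for processing high-angle video for vehicle trajectory reconstruction, which can meet the needs of various data-driven modeling and imitation learning applications. In this paper, we develop the Spatial-Temporal Deep Embedding (STDE) model, which imposes parity constraints at both the pixel and instance levels to generate instance-aware embeddings for vehicle stripe segmentation on the STMap. At the pixel level, each pixel is encoded with its 8-neighbor pixels at different ranges, and this encoding is then used to guide a neural network in learning the embedding mechanism. At the instance level, a discriminative loss function is designed to pull pixels belonging to the same instance closer together and to push the mean values of different instances apart. The output of the spatial-temporal affinity is then optimized by the mutex-watershed algorithm to obtain the final clustering results. Based on segmentation metrics, our model outperforms five other baselines used for STMap processing and is robust to shadows, static noise, and overlapping. The designed model is applied to all public NGSIM US-101 videos to generate complete vehicle trajectories, indicating good scalability and adaptability. Finally, the strengths of the scanline method with STDE and future directions are discussed. The code, STMap dataset, and video trajectories are publicly available in an online repository. GitHub link: shorturl.at/jklt0.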
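
The instance-level idea in the abstract (pull pixels of one instance together, push instance means apart) can be illustrated with a small sketch. This is not the authors' implementation: the hinge form, the margins `delta_pull` and `delta_push`, and the function name are assumptions made for illustration, and plain Python lists stand in for network outputs.

```python
import math


def discriminative_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Pull/push loss over per-pixel embeddings (assumed hinge form, not the paper's exact loss)."""
    # Group the embedding vectors by instance label.
    groups = {}
    for emb, lab in zip(embeddings, labels):
        groups.setdefault(lab, []).append(emb)

    # Mean embedding of each instance.
    means = {
        lab: [sum(col) / len(members) for col in zip(*members)]
        for lab, members in groups.items()
    }

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Pull term: penalize pixels farther than delta_pull from their own instance mean.
    pull = 0.0
    for lab, members in groups.items():
        pull += sum(
            max(0.0, dist(e, means[lab]) - delta_pull) ** 2 for e in members
        ) / len(members)
    pull /= len(groups)

    # Push term: penalize pairs of instance means closer than 2 * delta_push.
    labs = sorted(means)
    pairs = [(a, b) for i, a in enumerate(labs) for b in labs[i + 1:]]
    push = 0.0
    if pairs:
        push = sum(
            max(0.0, 2 * delta_push - dist(means[a], means[b])) ** 2
            for a, b in pairs
        ) / len(pairs)
    return pull + push
```

With two tight, well-separated clusters both terms vanish; when two instances share the same mean, the push term dominates.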

Translated by Google Translate

Big Data Analytics for Network Level Short-Term Travel Time Prediction with Hierarchical LSTM and Attention

Tianya T. Zhang , Ying Ye

Categories: Machine Learning | Artificial Intelligence

2022-01-15

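Travel time data collected from widespread traffic monitoring sensors call for big data analytics tools to query, visualize, and identify meaningful traffic patterns. This paper uses a large-scale travel time dataset from the Caltrans Performance Measurement System (PeMS), which exceeds the capacity of traditional data processing and modeling tools. To handle this volume of data, the big data analytics engines Apache Spark and Apache MXNet are used for data wrangling and modeling. Seasonality and autocorrelation analyses are performed to explore and visualize trends in the time-varying data. Inspired by the success of hierarchical architectures in many artificial intelligence (AI) tasks, we consolidate the cell and hidden states passed from the low-level to the high-level LSTM with attention pooling, similar to how the human perception system operates. The designed hierarchical LSTM model can account for dependencies at different time scales to capture the spatial-temporal correlations of network-level travel time. A second self-attention module is then devised to connect the LSTM-extracted features to the fully connected layers, predicting travel time for all corridors rather than for a single link or route. Comparison results show that the Hierarchical LSTM with Attention (HierLSTMat) model gives the best predictions at the 30-minute and 45-minute horizons and can successfully forecast unusual congestion. The efficiency gained from the big data analytics tools is evaluated by comparing them with popular data science and deep learning frameworks.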
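
The autocorrelation step mentioned in the abstract can be sketched in a few lines. This is a generic sample autocorrelation, not the paper's Spark pipeline; the function name and the toy series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence of numbers at a non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Sum of squared deviations (the lag-0 autocovariance up to a factor of 1/n).
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Covariance between the series and a copy of itself shifted by `lag`.
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Toy travel-time series with a period of 4 samples (hypothetical data).
toy_series = [1, 2, 3, 2] * 6
```

For a series with a repeating pattern, the autocorrelation peaks at lags equal to the period, which is one simple way to spot daily or weekly seasonality in travel times.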

Spatial-Temporal Map Vehicle Trajectory Detection Using Dynamic Mode Decomposition and Res-UNet+ Neural Networks

Tianya T. Zhang , Peter J. Jin

Categories: Computer Vision

2022-01-13

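This paper presents a machine-learning-enhanced longitudinal scanline method for extracting vehicle trajectories from high-angle traffic cameras. The Dynamic Mode Decomposition (DMD) method is applied to extract vehicle strands by decomposing the Spatial-Temporal Map (STMap) into a sparse foreground and a low-rank background. A deep neural network named Res-UNet+ is designed by adapting two prevalent deep learning architectures. The Res-UNet+ network significantly improves the performance of STMap-based vehicle detection, and the DMD model offers insight into the evolution of the underlying spatial structures preserved by the STMap. The model outputs are compared with a previous image processing model and with mainstream deep neural networks for semantic segmentation. After thorough evaluation, the model is shown to be accurate and robust against many challenging factors. Finally, this paper addresses many of the quality issues found in the NGSIM trajectory data. The cleaned, high-quality trajectory data are published to support future theoretical and modeling research on traffic flow and microscopic vehicle control. The method is a reliable solution for video-based trajectory extraction and has wide applicability.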

Muse: Text-To-Image Generation via Masked Generative Transformers

Huiwen Chang , Han Zhang , Jarred Barber , AJ Maschinot , Jose Lezama , Lu Jiang , Ming-Hsuan Yang , Kevin Murphy , William T. Freeman , Michael Rubinstein

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2023-01-02

We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
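
The masked modeling task described above (predict randomly masked image tokens) can be illustrated by the data-preparation step alone. This sketch is not Muse's code: `MASK_ID`, the fixed `mask_ratio`, and the function name are hypothetical, and the text conditioning and the Transformer itself are omitted.

```python
import random

MASK_ID = -1  # hypothetical id reserved for the mask token


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of discrete image tokens with MASK_ID.

    Returns the masked sequence and a dict mapping each masked position
    to the original token the model would be trained to predict.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets


# Usage: mask half of a 16-token sequence (toy token ids).
rng = random.Random(0)
masked, targets = mask_tokens(list(range(100, 116)), 0.5, rng)
```

Because every masked position is predicted from the same input, all of them can be filled in during one forward pass, which is the source of the parallel-decoding efficiency the abstract contrasts with autoregressive models.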

Multi-Task Imitation Learning for Linear Dynamical Systems

Thomas T. Zhang , Katie Kang , Bruce D. Lee , Claire Tomlin , Sergey Levine , Stephen Tu , Nikolai Matni

Categories: Machine Learning

2022-12-01

We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x > k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
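
To see how the two terms of the bound trade off, the rate can be evaluated numerically. The sketch below drops the constants and logarithmic factors hidden by $\tilde{O}$, so the numbers are only indicative of trends; the function name and the example values are assumptions.

```python
def imitation_gap_rate(k, n_x, n_u, H, n_shared, n_target):
    """Return the two terms of k*n_x/(H*N_shared) + k*n_u/N_target.

    Constants and log factors hidden by the O-tilde notation are ignored.
    """
    representation_term = k * n_x / (H * n_shared)
    fine_tuning_term = k * n_u / n_target
    return representation_term, fine_tuning_term


# Hypothetical problem sizes: k=2, n_x=10, n_u=4.
few_sources = imitation_gap_rate(2, 10, 4, H=5, n_shared=100, n_target=50)
many_sources = imitation_gap_rate(2, 10, 4, H=50, n_shared=100, n_target=50)
```

Increasing the number of source policies `H` shrinks only the representation term; the fine-tuning term depends solely on the amount of target-task data.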

An Incremental Phase Mapping Approach for X-ray Diffraction Patterns using Binary Peak Representations

Dipendra Jha , K. V. L. V. Narayanachari , Ruifeng Zhang , Justin Liao , Denis T. Keane , Wei-keng Liao , Alok Choudhary , Yip-Wah Chung , Michael Bedzyk , Ankit Agrawal

Categories: Machine Learning | Computer Vision

2022-11-08

Despite the huge advancement in knowledge discovery and data mining techniques, the X-ray diffraction (XRD) analysis process has mostly remained untouched and still involves manual investigation, comparison, and verification. Due to the large volume of XRD samples from high-throughput XRD experiments, it has become impossible for domain scientists to process them manually. Recently, they have started leveraging standard clustering techniques to reduce the number of XRD pattern representations that require manual labeling and verification. Nevertheless, these standard clustering techniques do not handle problem-specific aspects such as peak shifting, adjacent peaks, background noise, and mixed phases, resulting in incorrect composition-phase diagrams that complicate further steps. Here, we leverage data mining techniques along with domain expertise to handle these issues. In this paper, we introduce an incremental phase mapping approach based on binary peak representations using a new threshold-based fuzzy dissimilarity measure. The proposed approach first applies an incremental phase computation algorithm to the discrete binary peak representation of XRD samples, followed by hierarchical clustering or manual merging of similar pure phases to obtain the final composition-phase diagram. We evaluate our method on the composition space of two ternary alloy systems: Co-Ni-Ta and Co-Ti-Ta. Our results are verified by domain scientists and closely resemble the manually computed ground-truth composition-phase diagrams. The proposed approach takes us closer to the goal of complete end-to-end automated XRD analysis.
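
The abstract does not define the binary peak representation or the fuzzy dissimilarity measure, so the following is only an illustrative stand-in: a pattern is binarized at local maxima above an intensity threshold, and two binary patterns are compared with a tolerance of a few bins to absorb peak shifting. All names, the local-maximum rule, and the `tolerance` parameter are assumptions, not the authors' definitions.

```python
def binary_peaks(intensities, threshold):
    """Mark interior local maxima whose intensity is at least `threshold`."""
    bits = [0] * len(intensities)
    for i in range(1, len(intensities) - 1):
        is_local_max = (
            intensities[i] > intensities[i - 1]
            and intensities[i] >= intensities[i + 1]
        )
        if intensities[i] >= threshold and is_local_max:
            bits[i] = 1
    return bits


def shift_tolerant_dissimilarity(a, b, tolerance=1):
    """Fraction of peaks in either pattern with no peak within `tolerance` bins in the other."""

    def unmatched(src, dst):
        count = 0
        for i, bit in enumerate(src):
            if bit:
                lo, hi = max(0, i - tolerance), min(len(dst), i + tolerance + 1)
                if not any(dst[lo:hi]):
                    count += 1
        return count

    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return (unmatched(a, b) + unmatched(b, a)) / total


# Two toy patterns whose peaks are shifted by one bin (hypothetical data).
pattern_a = binary_peaks([0, 1, 5, 1, 0, 0, 3, 0, 0], threshold=2)
pattern_b = binary_peaks([0, 0, 1, 5, 1, 0, 0, 3, 0], threshold=2)
```

With a one-bin tolerance the two shifted patterns are treated as identical, whereas an exact bin-by-bin comparison would count every peak as a mismatch.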

A Deep Learning Approach to Generating Photospheric Vector Magnetograms of Solar Active Regions for SOHO/MDI Using SDO/HMI and BBSO Data

Haodi Jiang , Qin Li , Zhihang Hu , Nian Liu , Yasser Abduallah , Ju Jing , Genwei Zhang , Yan Xu , Wynne Hsu , Jason T. L. Wang

Categories: Machine Learning

2022-11-04

Solar activity is usually caused by the evolution of solar magnetic fields. Magnetic field parameters derived from photospheric vector magnetograms of solar active regions have been used to analyze and forecast eruptive events such as solar flares and coronal mass ejections. Unfortunately, the most recent solar cycle 24 was relatively weak with few large flares, though it is the only solar cycle in which consistent time-sequence vector magnetograms have been available through the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) since its launch in 2010. In this paper, we look into another major instrument, namely the Michelson Doppler Imager (MDI) on board the Solar and Heliospheric Observatory (SOHO), which operated from 1996 to 2010. The data archive of SOHO/MDI covers the more active solar cycle 23 with many large flares. However, SOHO/MDI data contain only line-of-sight (LOS) magnetograms. We propose a new deep learning method, named MagNet, to learn from combined LOS magnetograms, Bx and By taken by SDO/HMI along with H-alpha observations collected by the Big Bear Solar Observatory (BBSO), and to generate vector components Bx' and By', which would form vector magnetograms with observed LOS data. In this way, we can expand the availability of vector magnetograms to the period from 1996 to the present. Experimental results demonstrate the good performance of the proposed method. To our knowledge, this is the first time that deep learning has been used to generate photospheric vector magnetograms of solar active regions for SOHO/MDI using SDO/HMI and H-alpha data.

Solar Flare Index Prediction Using SDO/HMI Vector Magnetic Data Products with Statistical and Machine Learning Methods

Hewei Zhang , Qin Li , Yanxing Yang , Ju Jing , Jason T. L. Wang , Haimin Wang , Zuofeng Shang

Categories: Machine Learning (Statistics)

2022-09-28

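Solar flares, especially M- and X-class flares, are often associated with coronal mass ejections (CMEs). They are the most important sources of space weather effects and can severely impact the near-Earth environment. It is therefore essential to forecast flares (especially X-class ones) to mitigate their destructive and hazardous consequences. Here, we introduce several statistical and machine learning methods to predict the flare index (FI) of an active region (AR), which quantifies the flare productivity of the AR by taking into account the numbers of flares of different classes within a certain time interval. Specifically, our sample includes 563 ARs that appeared on the solar disk from May 2010 to December 2017. The 25 magnetic parameters provided by the Space-weather HMI Active Region Patches (SHARP) from the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) characterize, by proxy, the coronal magnetic energy stored in ARs and are used as predictors. We investigate the relationship between these SHARP parameters and the FI of ARs with a machine learning algorithm (spline regression) and a resampling method (Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise, SMOGN for short). Based on the established relationship, we are able to predict the FI of a given AR within the next 1-day period. Compared with four other popular machine learning algorithms, our methods improve the accuracy of FI prediction, especially for large FI. In addition, we rank the importance of the SHARP parameters using the Borda count method, computed from the rankings produced by nine different machine learning methods.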
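
The Borda count aggregation mentioned at the end of the abstract can be sketched as follows. This is the textbook Borda count, not the authors' code; the tie-breaking rule (alphabetical) and the placeholder parameter names are assumptions, and real inputs would be the rankings of SHARP parameters produced by each learning method.

```python
def borda_count(rankings):
    """Aggregate several rankings (each ordered most to least important).

    An item in position p of a ranking of n items earns n - 1 - p points.
    Items are returned by total points, ties broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three hypothetical methods ranking three placeholder parameters.
example_rankings = [
    ["param_a", "param_b", "param_c"],
    ["param_b", "param_a", "param_c"],
    ["param_a", "param_c", "param_b"],
]
aggregated = borda_count(example_rankings)
```

Here `param_a` collects 5 points, `param_b` 3, and `param_c` 1, so the aggregated order follows the majority preference even though the individual rankings disagree.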

Searching a High-Performance Feature Extractor for Text Recognition Network

Hui Zhang , Quanming Yao , James T. Kwok , Xiang Bai

Categories: Computer Vision | Artificial Intelligence

2022-09-27

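The feature extractor plays a critical role in text recognition (TR), but customizing its architecture has been relatively little explored because manual tuning is expensive. In this work, inspired by the success of neural architecture search (NAS), we propose to search for suitable feature extractors. We design a domain-specific search space by exploring the principles behind good feature extractors. The space includes a 3D-structured space for the spatial model and a Transformer-based space for the sequential model. Because the space is huge and complexly structured, no existing NAS algorithm can be applied. We propose a two-stage algorithm to search the space efficiently. In the first stage, we cut the space into several blocks and progressively train each block with the help of an auxiliary head. In the second stage, we introduce a latency constraint and search for a sub-network from the trained supernet via natural gradient descent. In the experiments, a series of ablation studies is performed to better understand the designed space, the search algorithm, and the searched architectures. We also compare the proposed method with various state-of-the-art methods on both handwritten and scene TR tasks. Extensive results show that our approach achieves better recognition performance with lower latency.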

Hybrid Supervised and Reinforcement Learning for the Design and Optimization of Nanophotonic Structures

Christopher Yeung , Benjamin Pham , Zihan Zhang , Katherine T. Fountaine , Aaswath P. Raman

Categories: Machine Learning

2022-09-08

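From higher computational efficiency to enabling the discovery of novel and complex structures, deep learning has emerged as a powerful framework for the design and optimization of nanophotonic circuits and components. However, both data-driven and exploration-based machine learning strategies have limitations in their effectiveness for nanophotonic inverse design. Supervised machine learning approaches require large quantities of training data to produce high-performance models and, given the complexity of the design space, have difficulty generalizing beyond the training data. Unsupervised and reinforcement learning-based approaches, on the other hand, can involve very long training or optimization times. Here, we demonstrate a hybrid supervised and reinforcement learning approach to the inverse design of nanophotonic structures and show that this approach can reduce dependence on training data, improve the generalizability of model predictions, and shorten exploratory training times by orders of magnitude. The presented strategy thus addresses a number of contemporary deep learning challenges while opening the door to new design methodologies that leverage multiple classes of machine learning algorithms to produce more effective and practical solutions for photonic design.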
