Smart Paper Notes

Transfer Ranking in Finance: Applications to Cross-Sectional Momentum with Data Scarcity

Daniel Poh , Stephen Roberts , Stefan Zohren

Category: Machine Learning

2022-08-21

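Cross-sectional strategies are a classical and popular trading style, and recent high-performing variants incorporate sophisticated neural architectures. Although these strategies have been applied successfully in data-rich settings involving mature assets with long histories, deploying them on instruments with limited samples generally produces over-fitted models with degraded performance. In this paper, we introduce Fused Encoder Networks, a hybrid parameter-sharing transfer ranking model. The model fuses information extracted by an encoder-attention module operating on a source dataset with that of a similar but separate module focused on a smaller target dataset. Besides mitigating the scarcity of target data, the model's self-attention mechanism allows interactions among instruments to be taken into account, not only at the loss level during model training but also at inference time. Applied to the top ten cryptocurrencies by market capitalisation, Fused Encoder Networks outperform the reference benchmarks on most performance measures, delivering a three-fold increase in the Sharpe ratio relative to classical momentum and an improvement of roughly 50% over the benchmark model when transaction costs are excluded. The model continues to outperform the baselines even after accounting for the high transaction costs associated with cryptocurrencies.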
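
For orientation, the classical cross-sectional momentum baseline mentioned in the abstract can be sketched as a simple ranking rule. This is not the Fused Encoder Network itself. The function name, the equal weighting, and the asset labels and return figures are illustrative assumptions, not taken from the paper.

```python
def momentum_weights(trailing_returns, n_side=1):
    """Rank assets by trailing return; go long the top n_side and short the bottom n_side.

    trailing_returns: dict mapping an asset label to its past return (hypothetical inputs).
    Returns equal-weighted long/short positions that sum to zero.
    """
    if n_side < 1 or 2 * n_side > len(trailing_returns):
        raise ValueError("n_side must be at least 1 and at most half the number of assets")
    # Sort asset labels from the worst to the best trailing return.
    ranked = sorted(trailing_returns, key=trailing_returns.get)
    weights = {asset: 0.0 for asset in trailing_returns}
    for asset in ranked[-n_side:]:   # winners
        weights[asset] = 1.0 / n_side
    for asset in ranked[:n_side]:    # losers
        weights[asset] = -1.0 / n_side
    return weights


# Made-up trailing returns for four assets, for illustration only.
example = {"BTC": 0.30, "ETH": 0.10, "XRP": -0.05, "LTC": 0.02}
print(momentum_weights(example))
```

A learned ranking model would replace the sort on raw trailing returns with a score produced by the network; the long/short construction on top of the ranking can stay the same.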

translated by Google Translate

Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture

Kieran Wood , Sven Giegerich , Stephen Roberts , Stefan Zohren

Category: Machine Learning | Machine Learning (Statistics)

2021-12-16

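Deep learning architectures, specifically Deep Momentum Networks (DMNs) [1904.04912], have been found to be an effective approach to momentum and mean-reversion trading. However, some of the key challenges in recent years involve learning long-term dependencies, the degradation of performance when returns are considered net of transaction costs, and adapting to new market regimes, notably during the SARS-CoV-2 crisis. Attention mechanisms, or Transformer-based architectures, are a solution to these challenges because they allow the network to focus on significant time steps in the past and on longer-term patterns. We introduce the Momentum Transformer, an attention-based architecture that outperforms the benchmarks and is inherently interpretable, giving us greater insight into our deep learning trading strategy. Our model is an extension of the LSTM-based DMN, which directly outputs position sizes by optimising the network on a risk-adjusted performance metric such as the Sharpe ratio. We find that an attention-LSTM hybrid, decoder-only Temporal Fusion Transformer (TFT) style architecture is the best-performing model. In terms of interpretability, we observe remarkable structure in the attention patterns, with significant peaks of importance at momentum turning points. The time series is thus segmented into regimes, and the model tends to focus on previous time steps in similar regimes. We find that changepoint detection (CPD) [2105.13727], another technique for responding to regime change, can complement multi-headed attention, especially when we run CPD at multiple timescales. By adding an interpretable variable selection network, we observe how CPD helps our model move away from trading predominantly on daily returns data. We note that the model can intelligently switch between, and blend, classical strategies, basing its decision on patterns in the data.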
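
The abstract describes optimising a network directly on a risk-adjusted metric such as the Sharpe ratio. As a minimal sketch of that objective only (no network, no gradients), the loss can be written as the negative annualised Sharpe ratio of the strategy returns. The function names and the 252-period annualisation are assumptions made for illustration.

```python
import math


def annualised_sharpe(positions, asset_returns, periods_per_year=252):
    """Sharpe ratio of the strategy whose per-period return is position * asset return."""
    strategy = [p * r for p, r in zip(positions, asset_returns)]
    mean = sum(strategy) / len(strategy)
    variance = sum((x - mean) ** 2 for x in strategy) / len(strategy)
    if variance == 0.0:
        raise ValueError("strategy returns have zero variance; Sharpe ratio is undefined")
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def sharpe_loss(positions, asset_returns, periods_per_year=252):
    """Loss to minimise: the negative Sharpe ratio, so a lower loss means a better strategy."""
    return -annualised_sharpe(positions, asset_returns, periods_per_year)


# Made-up numbers: always fully long an asset with alternating 1% and 3% returns.
print(sharpe_loss([1, 1, 1, 1], [0.01, 0.03, 0.01, 0.03]))
```

In a deep learning framework the same expression would be computed on tensors so that gradients flow from the loss back to the position-sizing outputs.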

A transformer-based model for default prediction in mid-cap corporate markets

Kamesh Korangi , Christophe Mues , Cristián Bravo

Category: Machine Learning

2021-11-18

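In this paper, we study mid-cap companies, i.e. publicly traded companies with a market capitalisation of less than US $10 billion. Using a large dataset of US mid-cap companies observed over 30 years, we aim to predict the term structure of default probabilities over the medium term and to understand which data sources (i.e. fundamental, market or pricing data) contribute most to default risk. Whereas existing methods typically require data from different time periods to be aggregated first and turned into cross-sectional features, we frame the problem as a multi-label time-series classification problem. We adapt transformer models, state-of-the-art deep learning models originating in natural language processing, to the credit risk modelling setting. We also interpret the predictions of these models using attention heat maps. To optimise the model further, we present a custom loss function for multi-label classification and a novel multi-channel architecture with differential training that allows the model to use all input data efficiently. Our results show the superior performance of the proposed deep learning architecture, which yields a 13% improvement in AUC (Area Under the receiver operating characteristic Curve) over traditional models. We also show how to produce an importance ranking for the different data sources and temporal relationships using a Shapley approach specific to these models.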

Stock Market Prediction via Deep Learning Techniques: A Survey

Jinan Zou , Qingying Zhao , Yang Jiao , Haiyao Cao , Yanxi Liu , Qingsen Yan , Ehsan Abbasnejad , Lingqiao Liu , Javen Qinfeng Shi

Category: Artificial Intelligence

2022-12-24

Stock market prediction is a long-standing yet complex problem, studied across diverse research areas and application domains because of its non-linear, highly volatile and complex nature. Existing surveys on stock market prediction often focus on traditional machine learning methods instead of deep learning methods. Deep learning has come to dominate many domains and, in recent years, has gained much success and popularity in stock market prediction. This motivates us to provide a structured and comprehensive overview of the research on stock market prediction focusing on deep learning techniques. We present four elaborated subtasks of stock market prediction and propose a novel taxonomy to summarize the state-of-the-art models based on deep neural networks from 2011 to 2022. In addition, we also provide detailed statistics on the datasets and evaluation metrics commonly used in the stock market. Finally, we highlight some open issues and point out several future directions by sharing some new perspectives on stock market prediction.

Automatic Identification and Classification of Share Buybacks and their Effect on Short-, Mid- and Long-Term Returns

Thilo Reintjes

Category: Artificial Intelligence | Machine Learning

2022-09-26

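This thesis investigates share buybacks, specifically share buyback announcements. It addresses how to recognise such announcements, the excess return of share buybacks, and the prediction of returns after a share buyback announcement. We illustrate two NLP approaches for the automated detection of share buyback announcements. Even with small amounts of training data, we achieve an accuracy of up to 90%. The thesis uses these NLP methods to generate a large dataset of 57,155 share buyback announcements. By analysing this dataset, the thesis aims to show that most companies that announce a buyback underperform the MSCI World. A minority of companies, however, significantly outperform the MSCI World. This significant outperformance leads to a net gain when looking at the average across all companies. If the benchmark index is adjusted for the size of the companies, the average outperformance disappears and the majority underperforms. However, companies that announce a share buyback amounting to at least 1% of their market capitalisation were found to deliver, on average, a significant outperformance even against the adjusted benchmark. It was also found that companies that announce share buybacks in times of crisis fare better than the overall market. In addition, the generated dataset was used to train 72 machine learning models. This made it possible to find many strategies that achieve an accuracy of up to 77% and generate large excess returns. A variety of performance indicators could be improved across six different time frames, and a significant outperformance was identified. This was achieved by training several models for different tasks and time frames and by combining these models, producing a significant improvement by fusing weak learners into one strong learner.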

Robust machine learning pipelines for trading market-neutral stock portfolios

Thomas Wong , Mauricio Barahona

Category: Machine Learning

2022-12-30

The application of deep learning algorithms to financial data is difficult due to heavy non-stationarities which can lead to over-fitted models that underperform under regime changes. Using the Numerai tournament data set as a motivating example, we propose a machine learning pipeline for trading market-neutral stock portfolios based on tabular data which is robust under changes in market conditions. We evaluate various machine-learning models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks with and without simple feature engineering, as the building blocks for the pipeline. We find that GBDT models with dropout display high performance, robustness and generalisability with relatively low complexity and reduced computational cost. We then show that online learning techniques can be used in post-prediction processing to enhance the results. In particular, dynamic feature neutralisation, an efficient procedure that requires no retraining of models and can be applied post-prediction to any machine learning model, improves robustness by reducing drawdown in volatile market conditions. Furthermore, we demonstrate that the creation of model ensembles through dynamic model selection based on recent model performance leads to improved performance over baseline by improving the Sharpe and Calmar ratios. We also evaluate the robustness of our pipeline across different data splits and random seeds with good reproducibility of results.
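
Feature neutralisation, as mentioned in the abstract, is applied after prediction and needs no retraining. A minimal sketch of the plain (static) form follows: it removes from the predictions their linear projection onto the span of the feature columns. The dynamic variant evaluated in the paper is not reproduced here, and the function names and the `proportion` parameter are illustrative assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def neutralise(predictions, feature_columns, proportion=1.0):
    """Subtract `proportion` of the projection of predictions onto the feature span.

    feature_columns: list of columns, each with one value per prediction.
    """
    # Build an orthonormal basis of the feature span (modified Gram-Schmidt).
    basis = []
    for column in feature_columns:
        residual = list(column)
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * x for r, x in zip(residual, b)]
        norm = math.sqrt(dot(residual, residual))
        if norm > 1e-12:  # skip columns that are linearly dependent on earlier ones
            basis.append([r / norm for r in residual])
    # Remove the component of the predictions along each basis vector.
    neutral = list(predictions)
    for b in basis:
        coeff = dot(predictions, b)
        neutral = [n - proportion * coeff * x for n, x in zip(neutral, b)]
    return neutral


# Made-up predictions and two feature columns (a constant and a linear trend).
preds = [0.9, 0.1, 0.4, 0.6]
features = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
print(neutralise(preds, features))
```

With `proportion=1.0` the result has zero linear exposure to every listed feature; smaller values remove only part of the exposure.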

Deep Neural Networks and Tabular Data: A Survey

Vadim Borisov , Tobias Leemann , Kathrin Seßler , Johannes Haug , Martin Pawelczyk , Gjergji Kasneci

Category: Machine Learning

2021-10-05

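Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, adapting them to tabular data for inference or data generation tasks remains challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorise these methods into three groups: data transformations, specialised architectures, and regularisation models. For each group, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data, and we also provide an overview of strategies for explaining deep models on tabular data. Our first contribution is therefore to address the main research streams and existing methodologies in these areas, while highlighting relevant challenges and open research questions. Our second contribution is an empirical comparison of traditional machine learning methods with ten deep learning approaches across five popular real-world tabular datasets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that research progress on competitive deep learning models for tabular data is stagnating. To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data. As such, this work can be a valuable starting point to guide researchers and practitioners interested in deep learning with tabular data.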

Survey of Generative Methods for Social Media Analysis

Stan Matwin , Aristides Milios , Paweł Prałat , Amilcar Soares , François Théberge

Category: Machine Learning

2021-12-13

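This survey draws a broad, panoramic picture of the state of the art (SoTA) of research on generative methods for the analysis of social media data. It fills a gap, as existing survey articles are either narrower in scope or dated. We include two aspects that are currently gaining importance in mining and modelling social media: dynamics and networks. Social dynamics are important for understanding the spread of influence or disease, the formation of friendships, and so on. Networks, on the other hand, can capture various complex relationships, providing additional insight and identifying important patterns that would otherwise go unnoticed.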

From Multi-label Learning to Cross-Domain Transfer: A Model-Agnostic Approach

Jesse Read

Category: Machine Learning

2022-07-24

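In multi-label learning, a particular case of multi-task learning in which a single data point is associated with multiple target labels, it was widely assumed in the literature that the dependence among labels should be modelled explicitly to obtain the best accuracy. This premise led to a proliferation of methods for learning and predicting labels together, for example where the prediction for one label influences the predictions for other labels. Even though it is now acknowledged that in many contexts a model of dependence is not required for optimal performance, such models continue to outperform independent models in some of those very contexts, which suggests explanations for their performance other than label dependence, explanations that the literature has only recently begun to unravel. Leveraging and extending recent discoveries, we turn the original premise of multi-label learning on its head and approach the problem of joint modelling specifically in the absence of any measurable dependence among task labels, for example when the task labels come from separate problem domains. We carry insights from this study over to building an approach for transfer learning that challenges the long-held assumption that the transferability of tasks comes from measurements of similarity between the source and target domains or models. This allows us to design and test a transfer learning method that is model-driven rather than purely data-driven, and that is black-box and model-agnostic (any base model class can be considered). We show that, in essence, we can create task dependence based on source-model capacity. The results we obtain have important implications and provide clear directions for future work in both multi-label and transfer learning.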

A Comprehensive Survey on Transfer Learning

Fuzhen Zhuang , Zhiyuan Qi , Keyu Duan , Dongbo Xi , Yongchun Zhu , Hengshu Zhu , Hui Xiong , Qing He

Category:

2019-11-07

Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large number of target domain data can be reduced for constructing target learners. Due to the wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. Due to the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning researches, as well as to summarize and interpret the mechanisms and the strategies of transfer learning in a comprehensive way, which may help readers have a better understanding of the current research status and ideas. Unlike previous surveys, this survey paper reviews more than forty representative transfer learning approaches, especially homogeneous transfer learning approaches, from the perspectives of data and model. The applications of transfer learning are also briefly introduced. In order to show the performance of different transfer learning models, over twenty representative transfer learning models are used for experiments. The models are performed on three different datasets, i.e., Amazon Reviews, Reuters-21578, and Office-31. And the experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.

Proceedings of the 3rd International Workshop on Reading Music Systems

Jorge Calvo-Zaragoza , Alexander Pacha

Category: Computer Vision | Machine Learning

2022-12-01

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 3rd International Workshop on Reading Music Systems, held in Alicante on the 23rd of July 2021.

A Generic Methodology for the Statistically Uniform & Comparable Evaluation of Automated Trading Platform Components

Artur Sokolovsky , Luca Arnaboldi

Category: Machine Learning

2020-09-21

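Although machine learning approaches have been widely used in finance, with considerable success, these approaches remain bespoke to specific investigations and opaque in terms of explainability, comparability, and reproducibility. The primary objective of this research is to shed light on this field by providing a generic methodology that is investigation-agnostic and interpretable to financial markets practitioners, thus enhancing their efficiency, lowering barriers to entry, and increasing the reproducibility of experiments. The proposed methodology is showcased on two automated trading platform components, namely price levels, a well-known trading pattern, and a novel 2-step feature extraction method. The methodology relies on hypothesis testing, which is widely applied in other social and scientific disciplines, to evaluate concrete results effectively beyond simple classification accuracy. The main hypothesis was formulated to evaluate whether the selected trading pattern is suitable for use in a machine learning setting. Throughout the experiments, we found that the use of the considered trading pattern in a machine learning setting is only partially supported by the statistics, resulting in insignificant effect sizes (Rebound 7: $0.64 \pm 1.02$, Rebound 11: $0.38 \pm 0.98$, and Rebound 15: $1.05 \pm 1.16$), although the null hypothesis could be rejected. We showcase the generic methodology on a US futures market instrument and provide evidence that, with this methodology, informative metrics can easily be obtained beyond the more traditional performance and profitability metrics. This work is one of the first to apply such a rigorous, statistically backed approach to the field of financial markets, and we hope it may be a springboard for further research.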

Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection

Kieran Wood , Stephen Roberts , Stefan Zohren

Category: Machine Learning (Statistics) | Machine Learning

2021-05-28

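Momentum strategies are an important part of alternative investments and are at the heart of commodity trading advisors (CTAs). These strategies have, however, been found to have difficulty adjusting to rapid changes in market conditions, such as during the 2020 market crash. In particular, immediately after momentum turning points, where a trend reverses from an uptrend (downtrend) to a downtrend (uptrend), time-series momentum (TSMOM) strategies are prone to making bad bets. To improve the response to regime change, we introduce a novel approach in which we insert an online changepoint detection (CPD) module into a Deep Momentum Network (DMN) [1904.04912] pipeline, which uses an LSTM deep learning architecture to learn trend estimation and position sizing simultaneously. Furthermore, our model is able to optimise the way in which it balances 1) a slow momentum strategy, which exploits persisting trends but does not overreact to localised price moves, and 2) a fast mean-reversion strategy, which quickly flips its position and then swaps it back again to exploit localised price moves. Our CPD module outputs a changepoint location and a severity score, allowing our model to learn, in a data-driven manner, to respond to varying degrees of disequilibrium or to smaller and more localised changepoints. Over the period 1995-2020, the addition of the CPD module leads to an improvement in the Sharpe ratio of one third. The module is especially beneficial in periods of significant non-stationarity, and in particular over the most recent years (2015-2020), where the performance boost is approximately two thirds. This is interesting because traditional momentum strategies have underperformed in this period.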

Explainability of Text Processing and Retrieval Methods: A Critical Survey

Sourav Saha , Debapriyo Majumdar , Mandar Mitra

Category: Artificial Intelligence | Natural Language Processing

2022-12-14

Deep Learning and Machine Learning based models have become extremely popular in text processing and information retrieval. However, the non-linear structures present inside the networks make these models largely inscrutable. A significant body of research has focused on increasing the transparency of these models. This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods. More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking. The concluding section suggests some possible directions for future research on this topic.

Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification

Mostafa Shabani , Dat Thanh Tran , Juho Kanniainen , Alexandros Iosifidis

Category: Machine Learning

2022-07-23

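Deep learning models have become dominant in tackling financial time-series analysis problems, overturning conventional machine learning and statistical methods. Most often, a model trained for one market or security cannot be applied directly to another market or security because of inherent differences in market conditions. In addition, as markets evolve over time, existing models must be updated or new ones trained when new data become available. This scenario, which is inherent in most financial forecasting applications, naturally raises the following research question: how can a pre-trained model be adapted efficiently to a new set of data while retaining performance on the old data, especially when the old data are not accessible? In this paper, we propose a method that efficiently retains the knowledge available in a neural network pre-trained on a set of securities and adapts it to perform well on new ones. In our method, the prior knowledge encoded in the pre-trained network is preserved by keeping the existing connections fixed, and this knowledge is adjusted for the new securities through a set of augmented connections that are optimised using the new data. The auxiliary connections are constrained to be of low rank. This not only allows us to optimise rapidly for the new task but also reduces storage and run-time complexity during the deployment phase. The efficiency of our approach is validated empirically on the stock mid-price movement prediction problem using a large-scale limit order book dataset. Experimental results show that our approach enhances prediction performance and reduces the overall number of network parameters.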
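
The idea of keeping existing connections fixed and adding low-rank auxiliary connections can be illustrated on a single plain linear layer. The paper works with a bilinear network, so this is only a simplified analogue: the frozen weight matrix stays untouched, and a rank-r product of two small matrices is added to it. All names and values below are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def augmented_forward(frozen_w, aug_a, aug_b, x):
    """Compute (W + A B) x, where W is frozen and only A and B would be trained.

    frozen_w: d_out x d_in, aug_a: d_out x r, aug_b: r x d_in.
    """
    base = matvec(frozen_w, x)                   # contribution of the pre-trained weights
    low_rank = matvec(aug_a, matvec(aug_b, x))   # contribution of the augmented connections
    return [b + l for b, l in zip(base, low_rank)]


def extra_parameters(d_out, d_in, rank):
    """Number of new trainable values, versus d_out * d_in for a full new matrix."""
    return rank * (d_out + d_in)


# Made-up example: a 2 x 3 frozen layer with a rank-1 augmentation.
frozen_w = [[1, 0, 0], [0, 1, 0]]
aug_a = [[1], [2]]
aug_b = [[1, 1, 1]]
print(augmented_forward(frozen_w, aug_a, aug_b, [1, 2, 3]))
print(extra_parameters(2, 3, 1))
```

Because only the two small factors are stored per new task, the storage cost grows with the rank rather than with the full layer size, which matches the storage and run-time argument in the abstract.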

FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents

Bolun "Namir" Xia , Vipula D. Rawte , Mohammed J. Zaki , Aparna Gupta

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-06-14

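Unstructured data, especially text, continue to grow rapidly in various domains. In the financial sphere in particular, there is a wealth of accumulated unstructured financial data, such as the textual disclosure documents that companies submit regularly to regulatory agencies such as the Securities and Exchange Commission (SEC). These documents are typically very long and tend to contain valuable information about a company's performance. It is therefore of great interest to learn predictive models from these long textual documents, especially for forecasting numerical key performance indicators (KPIs). Although there has been great progress in pre-trained language models (LMs) that learn from very large corpora of text, they still struggle to produce effective representations of long documents. Our work addresses this critical need, namely how to develop better models that extract useful information from long textual documents and learn effective features that can leverage soft financial and risk information for text regression (prediction) tasks. In this paper, we propose and implement a deep learning framework that splits long documents into chunks and uses pre-trained LMs to process and aggregate the chunks into vector representations, followed by self-attention to extract valuable document-level features. We evaluate our model on a collection of 10-K public disclosure reports from US banks and on another dataset of reports submitted by US companies. Overall, our framework outperforms strong baseline methods for textual modelling as well as a baseline regression model that uses only numerical data. Our work provides better insight into how using pre-trained, domain-specific and fine-tuned long-input LMs to represent long documents can improve the quality of the representation of textual data and thereby help improve predictive analyses.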
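
The chunk-then-attend pipeline described in the abstract can be sketched in three steps: split a long token sequence into chunks, obtain one vector per chunk, and run self-attention over the chunk vectors before pooling. In this sketch the chunk vectors are supplied directly as made-up numbers, standing in for the output of a pre-trained LM, and the attention uses identity query, key and value projections. These are simplifying assumptions, not the framework's actual design.

```python
import math


def split_into_chunks(tokens, chunk_size):
    """Cut a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections."""
    dim = len(vectors[0])
    attended = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        attended.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)])
    return attended


def mean_pool(vectors):
    """Average the attended chunk vectors into one document-level vector."""
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(len(vectors[0]))]


print(split_into_chunks(list(range(7)), 3))
# Stand-in chunk embeddings (hypothetical values, one 2-d vector per chunk).
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(mean_pool(self_attention(chunk_vectors)))
```

The resulting document vector would then feed a regression head that predicts the numerical KPI.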

Reinforcement Learning for Systematic FX Trading

Gabriel Borrageiro , Nick Firoozye , Paolo Barucca

Category: Machine Learning

2021-10-10

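We explore online inductive transfer learning, transferring from a radial basis function network, whose hidden processing units are formed from Gaussian mixture models, to a direct, recurrent reinforcement learning agent. The agent is put to work in an experiment trading the major spot-market currency pairs, in which we account accurately for transaction and funding costs. These sources of profit and loss, including the price trends that occur in the currency markets, are made available to the agent through a quadratic utility, and the agent learns to target a position directly. We improve on earlier work by learning to target a risk position in an online transfer learning context. Our agent achieves an annualised portfolio information ratio of 0.52 with a compound return of 9.3%, net of execution and funding costs, over a 7-year test set. This is despite forcing the model to trade at the close of the trading day at 5 pm, when trading costs are statistically the most expensive.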

Transformers in Vision: A Survey

Salman Khan , Muzammal Naseer , Munawar Hayat , Syed Waqas Zamir , Fahad Shahbaz Khan , Mubarak Shah

Category:

2021-01-04

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequence as compared to recurrent networks e.g., Long short-term memory (LSTM). Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set-functions. Furthermore, the straightforward design of Transformers allows processing multiple modalities (e.g., images, videos, text and speech) using similar processing blocks and demonstrates excellent scalability to very large capacity networks and huge datasets. These strengths have led to exciting progress on a number of vision tasks using Transformer networks. This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline. We start with an introduction to fundamental concepts behind the success of Transformers i.e., self-attention, large-scale pre-training, and bidirectional feature encoding. We then cover extensive applications of transformers in vision including popular recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), generative modeling, multi-modal tasks (e.g., visual-question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement, and colorization) and 3D analysis (e.g., point cloud classification and segmentation). We compare the respective advantages and limitations of popular techniques both in terms of architectural design and their experimental value. Finally, we provide an analysis on open research directions and possible future works. 
We hope this effort will ignite further interest in the community to solve current challenges towards the application of transformer models in computer vision.

TNT-KID: Transformer-based Neural Tagger for Keyword Identification

Matej Martinc , Blaž Škrlj , Senja Pollak

Category: Natural Language Processing

2020-03-20

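With growing amounts of available textual data, the development of algorithms capable of automatically analysing, categorising and summarising these data has become a necessity. In this research, we present a novel algorithm for keyword identification, i.e. the extraction of one-word or multi-word phrases representing key aspects of a given document, called Transformer-based Neural Tagger for Keyword IDentification (TNT-KID). By adapting the transformer architecture to the specific task at hand and leveraging language model pre-training on a domain-specific corpus, the model overcomes deficiencies of both supervised and unsupervised state-of-the-art approaches, offering competitive and robust performance on a variety of datasets while requiring only a fraction of the manually labelled data needed by the best-performing systems. This study also offers a thorough error analysis, with valuable insights into the inner workings of the model, and an ablation study measuring the influence of specific components of the keyword identification workflow on overall performance.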

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly

Category:

2020-10-22

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
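
Turning an image into a sequence of patches, the input format the abstract refers to, can be sketched as follows. This covers only the patch-extraction step on a single-channel image stored as nested lists. The function name and the tiny 4x4 example are illustrative assumptions, and a real model would go on to embed each flattened patch before the transformer sees it.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping square patches.

    Patches are returned in row-major order: left to right, then top to bottom.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            patches.append(patch)
    return patches


# A made-up 4 x 4 "image" whose pixel values are 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
for patch in image_to_patches(image, 2):
    print(patch)
```

With 16x16 patches, as in the title, a 224x224 image would yield a sequence of 196 patches.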
