Smart Paper Notes

Forecasting financial markets with semantic network analysis in the COVID-19 crisis

A. Fronzetti Colladon , S. Grassi , F. Ravazzolo , F. Violante

Category: Natural Language Processing

2020-09-09

This paper uses a new textual data index for predicting stock market data. The index is applied to a large set of news to evaluate the importance of one or more general economic-related keywords appearing in the text. The index assesses the importance of the economic-related keywords, based on their frequency of use and semantic network position. We apply it to the Italian press and construct indices to predict Italian stock and bond market returns and volatilities in a recent sample period, including the COVID-19 crisis. The evidence shows that the index captures the different phases of financial time series well. Moreover, results indicate strong evidence of predictability for bond market data, both returns and volatilities, short and long maturities, and stock market volatility.

Evaluating and improving social awareness of energy communities through semantic network analysis of online news

C. Piselli , A. Fronzetti Colladon , L. Segneri , A. L. Pisello

Category: Natural Language Processing

2022-08-03

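The implementation of energy communities is a cross-disciplinary phenomenon with the potential to support the energy transition while fostering citizens' participation throughout the energy system and their exploitation of renewable energy. Online information sources play an important role in engaging people in this process and in raising their awareness of the associated benefits. In this view, this work analyzes online news data on energy communities to understand people's awareness and the importance the media give to the topic. We use the Semantic Brand Score (SBS) indicator as an innovative measure of semantic importance, combining social network analysis and text mining methods. Results show different importance trends for energy communities and for other energy- and society-related topics, and also allow their connections to be identified. Our approach provides evidence of information gaps and of possible actions that could promote a low-carbon energy transition.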

Benchmarking Machine Learning Models to Predict Corporate Bankruptcy

Emmanuel Alanis , Sudheer Chava , Agam Shah

Category: Machine Learning

2022-12-22

Using a comprehensive sample of 2,585 bankruptcies from 1990 to 2019, we benchmark the performance of various machine learning models in predicting financial distress of publicly traded U.S. firms. We find that gradient boosted trees outperform other models in one-year-ahead forecasts. Variable permutation tests show that excess stock returns, idiosyncratic risk, and relative size are the more important variables for predictions. Textual features derived from corporate filings do not improve performance materially. In a credit competition model that accounts for the asymmetric cost of default misclassification, the survival random forest is able to capture large dollar profits.

Forecast Evaluation in Large Cross-Sections of Realized Volatility

Christis Katsouris

Category: Machine Learning (Statistics) | Machine Learning

2021-12-09

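In this paper, we consider the forecast evaluation of realized volatility measures under cross-sectional dependence, using equal predictive accuracy testing procedures. When forecasting realized volatility, we evaluate the predictive accuracy of the model based on the augmented cross-section. Under the null hypothesis of equal predictive accuracy, the benchmark model is a standard HAR model, while under the alternative of non-equal predictive accuracy, the forecast model is an augmented HAR model estimated via LASSO shrinkage. We study the sensitivity of the forecasts to the model specification by incorporating a measurement error correction as well as cross-sectional jump component measures. The out-of-sample forecast evaluation of the models is assessed with numerical implementations.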
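
As a rough illustration of the benchmark mentioned in the abstract above, the following sketch builds the regressors of a standard HAR model (daily value plus weekly and monthly averages of realized volatility) and fits it by ordinary least squares. The 5-day and 22-day windows are the conventional choice and are an assumption here, as are the function names `har_design` and `fit_har`; the paper's LASSO-augmented cross-sectional variant is not reproduced.

```python
import numpy as np

def har_design(rv, weekly=5, monthly=22):
    """Build HAR regressors and one-step-ahead targets from a realized-volatility series.

    Window lengths (5 and 22 trading days) are assumed, not taken from the paper.
    """
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    # t is the last day whose information is used; rv[t + 1] is the target.
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week = rv[t - weekly + 1 : t + 1].mean()    # average over the last 5 days
        month = rv[t - monthly + 1 : t + 1].mean()  # average over the last 22 days
        rows.append([1.0, daily, week, month])      # leading 1.0 is the intercept
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)

def fit_har(rv):
    """Estimate (intercept, daily, weekly, monthly) coefficients by least squares."""
    X, y = har_design(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

A one-step-ahead forecast is then the dot product of the latest regressor row with the fitted coefficients.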

Stock Market Prediction via Deep Learning Techniques: A Survey

Jinan Zou , Qingying Zhao , Yang Jiao , Haiyao Cao , Yanxi Liu , Qingsen Yan , Ehsan Abbasnejad , Lingqiao Liu , Javen Qinfeng Shi

Category: Artificial Intelligence

2022-12-24

Stock market prediction has been a traditional yet complex problem researched within diverse research areas and application domains due to its non-linear, highly volatile and complex nature. Existing surveys on stock market prediction often focus on traditional machine learning methods instead of deep learning methods. Deep learning has dominated many domains and has gained much success and popularity in stock market prediction in recent years. This motivates us to provide a structured and comprehensive overview of the research on stock market prediction focusing on deep learning techniques. We present four elaborated subtasks of stock market prediction and propose a novel taxonomy to summarize the state-of-the-art models based on deep neural networks from 2011 to 2022. In addition, we provide detailed statistics on the datasets and evaluation metrics commonly used in the stock market. Finally, we highlight some open issues and point out several future directions by sharing some new perspectives on stock market prediction.

Stock Market Prediction using Natural Language Processing -- A Survey

Om Mane , Saravanakumar kandasamy

Category: Machine Learning

2022-08-26

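The stock market is a network that provides a platform for almost all major economic transactions. While investing in the stock market is a good idea, investing in individual stocks may not be, especially for the casual investor. Smart stock picking requires in-depth research and plenty of dedication. Predicting stock values offers enormous arbitrage profit opportunities. The appeal of finding a solution has prompted researchers to look for ways past problems such as volatility, seasonality, and time dependence. This paper surveys recent literature on natural language processing and machine learning techniques used to predict stock market movements. Its main contributions are a sophisticated categorization of many recent articles and an account of recent research trends in stock market prediction and related areas.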

Forecasting Crude Oil Price Using Event Extraction

Jiangwei Liu , Xiaohong Huang

Category: Machine Learning | Artificial Intelligence | Natural Language Processing

2021-11-14

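Research on crude oil price forecasting has attracted tremendous attention from scholars and policymakers because of its significant effect on the global economy. Besides supply and demand, crude oil prices are largely influenced by various factors such as economic development, financial markets, conflicts, wars, and political events. Most previous research treats crude oil price forecasting as a time series or econometric variable prediction problem. Although recent studies have considered the effects of real-time news events, most of them use raw news headlines or topic models to extract text features without exploring the event information in depth. In this study, a novel crude oil price forecasting framework, AGESL, is proposed to deal with this problem. In our approach, an open-domain event extraction algorithm is used to extract underlying related events, and a text sentiment analysis algorithm is used to extract sentiment from large-scale news. A deep neural network then integrates the news event features, sentiment features, and historical price features to predict future crude oil prices. Empirical experiments are performed on West Texas Intermediate (WTI) crude oil price data, and the results show that our approach obtains superior performance compared with several benchmark methods.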

Empirical Asset Pricing via Ensemble Gaussian Process Regression

Damir Filipović , Puneet Pasricha

Category: Machine Learning

2022-12-02

We introduce an ensemble learning method based on Gaussian Process Regression (GPR) for predicting conditional expected stock returns given stock-level and macro-economic information. Our ensemble learning approach significantly reduces the computational complexity inherent in GPR inference and lends itself to general online learning tasks. We conduct an empirical analysis on a large cross-section of US stocks from 1962 to 2016. We find that our method dominates existing machine learning models statistically and economically in terms of out-of-sample $R$-squared and Sharpe ratio of prediction-sorted portfolios. Exploiting the Bayesian nature of GPR, we introduce the mean-variance optimal portfolio with respect to the predictive uncertainty distribution of the expected stock returns. It appeals to an uncertainty averse investor and significantly dominates the equal- and value-weighted prediction-sorted portfolios, which outperform the S&P 500.

Enhanced Bayesian Neural Networks for Macroeconomics and Finance

Niko Hauzenberger , Florian Huber , Karin Klieber , Massimiliano Marcellino

Category: Machine Learning (Statistics)

2022-11-09

We develop Bayesian neural networks (BNNs) that can model generic nonlinearities and time variation for (possibly large sets of) macroeconomic and financial variables. From a methodological point of view, we allow for a general specification of networks that can be applied to either dense or sparse datasets and that combines various activation functions, a possibly very large number of neurons, and stochastic volatility (SV) for the error term. From a computational point of view, we develop fast and efficient estimation algorithms for the general BNNs we introduce. From an empirical point of view, we show both with simulated data and with a set of common macro and financial applications that our BNNs can be of practical use, particularly so for observations in the tails of the cross-sectional or time series distributions of the target variables.

EmTract: Investor Emotions and Market Behavior

Domonkos Vamossy , Rolf Skog

Category: Natural Language Processing

2021-12-07

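We develop a tool that extracts emotions from social media text data. Our methodology has three main advantages. First, it is tailored to the financial context; second, it incorporates key aspects of social media data, such as non-standard phrases, emojis, and emoticons; and third, it operates by sequentially learning a latent representation that includes features such as word order, word usage, and local context. This tool, along with a user guide, is available at: https://github.com/dvamossy/mtract. Using EmTract, we explore the relationship between investor emotions expressed on social media and asset prices. We document a number of interesting insights. First, we confirm some of the findings of controlled laboratory experiments relating investor emotions to asset price movements. Second, we show that investor emotions are predictive of daily price movements. These effects are larger when volatility or short interest is higher, and when institutional ownership or liquidity is lower. Third, increased investor enthusiasm prior to the IPO contributes to the large first-day return and long-run underperformance of IPO stocks. To corroborate our results, we provide a number of robustness checks, including the use of an alternative emotion model. Our findings reinforce the intuition that emotions and market dynamics are closely related, and highlight the importance of considering investor emotions when assessing a stock's short-term value.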

Yield Spread Selection in Predicting Recession Probabilities: A Machine Learning Approach

Jaehyuk Choi , Desheng Ge , Kyu Ho Kang , Sungbin Sohn

Category: Machine Learning (Statistics)

2021-01-23

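The literature on using the yield curve to predict recessions typically uses the spread between the 10-year and three-month Treasury yields without verifying this choice of pair. This study investigates whether the predictive ability of the spread can be improved by letting a machine learning algorithm identify the best maturity pair and coefficients. Our comprehensive analysis shows that, despite the likelihood gain, the machine learning approach does not significantly improve the prediction, owing to estimation error. This result is robust to the forecasting horizon, control variables, sample period, and oversampling of the recession observations. Our finding supports the use of the 10-year minus three-month spread.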

Data-Centric Epidemic Forecasting: A Survey

Alexander Rodríguez , Harshavardhan Kamarthi , Pulak Agarwal , Javen Ho , Mira Patel , Suchet Sapre , B. Aditya Prakash

Category: Machine Learning

2022-07-19

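The COVID-19 pandemic has highlighted the importance of epidemic forecasting for decision makers in multiple domains, from public health to the economy as a whole. Although forecasting epidemic progression is often conceptualized as analogous to weather forecasting, it has some key differences and remains a non-trivial task. The spread of disease is subject to multiple confounding factors spanning human behavior, pathogen dynamics, and weather and environmental conditions. Research interest has been fueled by the increased availability of rich data sources capturing previously unobservable facets, and by initiatives from government public health and funding agencies. This has resulted, in particular, in a body of work on "data-centered" solutions that have the potential to enhance our forecasting capabilities by leveraging non-traditional data sources as well as recent innovations in AI and machine learning. This survey examines various data-driven methodological and practical advances and introduces a conceptual framework for navigating them. First, we enumerate the large number of epidemiological datasets and novel data streams relevant to epidemic forecasting, which capture factors such as symptomatic online surveys, retail and commerce, mobility, genomics data, and more. Next, we discuss methods and modeling paradigms, focusing on recent data-driven statistical and deep learning methods as well as on the novel class of hybrid models that combine the domain knowledge of mechanistic models with the effectiveness and flexibility of statistical approaches. We also discuss experiences and challenges that arise in the real-world deployment of these forecasting systems, including the use of forecast information. Finally, we highlight some challenges and open problems found across the forecasting pipeline.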

Factor-augmented tree ensembles

Filippo Pellegrino

Category: Machine Learning (Statistics) | Machine Learning

2021-11-27

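This article proposes an extension of standard time-series regression tree modelling to handle predictors that show irregularities such as missing observations, periodic patterns in the form of seasonality and cycles, and non-stationary trends. In doing so, the approach also makes it possible to enrich the information set used in tree-based autoregressions via unobserved components. Furthermore, this manuscript illustrates a relevant approach to controlling overfitting, based on ensemble learning and recent developments in the jackknife literature. This is strongly beneficial when the number of observed time periods is small, and advantageous compared with benchmark resampling methods. Empirical results show the benefits of predicting equity squared returns via factor-augmented tree ensembles, relative to simpler benchmarks. As a by-product, the approach makes it possible to study the real-time importance of economic news for equity volatility.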

Adaptive Learning on Time Series: Method and Financial Applications

Parley Ruogu Yang , Ryan Lucas , Camilla Schelpe

Category: Machine Learning (Statistics)

2021-10-21

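We formally introduce a time series statistical learning method, called Adaptive Learning, capable of handling model selection, out-of-sample forecasting, and interpretation in a noisy environment. Through simulation studies, we demonstrate that the method can outperform traditional model selection techniques such as AIC and BIC in the presence of regime switching, and that it facilitates window size determination when the data generating process is time-varying. Empirically, we use the method to forecast the S&P 500 across multiple forecast horizons, employing information from the VIX curve and the yield curve. We find that Adaptive Learning models are generally on par with, if not better than, the best parametric models when evaluated in terms of MSE, while also outperforming under cross-validation. We present a financial application of the learning results and an interpretation of the learning regime during the 2020 market crash. These studies can be extended both in a statistical direction and in terms of financial applications.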

American Hate Crime Trends Prediction with Event Extraction

Songqiao Han , Hailiang Huang , Jiangwei Liu , Shengsheng Xiao

Category: Natural Language Processing | Artificial Intelligence

2021-11-09

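Social media platforms may provide a potential space for discourse that contains hate speech and, even worse, can act as a propagation mechanism for hate crimes. The FBI's Uniform Crime Reporting (UCR) Program collects hate crime data and releases a statistical report every year. These statistics provide information for determining national hate crime trends. They can also provide valuable holistic and strategic insight for law enforcement agencies, or justify specific legislation for lawmakers. However, the reports are mostly released the following year and lag behind many immediate needs. Recent research mainly focuses on hate speech detection in social media text or on empirical studies of the impact of a confirmed crime. This paper proposes a framework that first uses text mining techniques to extract hate crime events from New York Times news, then uses the results to help predict American national-level and state-level hate crime trends. Experimental results show that our method significantly improves prediction performance compared with time series or regression methods that lack event-related factors. Our framework broadens the methods available for national-level and state-level hate crime trend prediction.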

Design interpretable experience of dynamical feed forward machine learning model for forecasting NASDAQ

Pouriya Khalilian , Sara Azizi , Mohammad Hossein Amiri , Javad T. Firouzjaee

Category: Artificial Intelligence

2022-12-22

The National Association of Securities Dealers Automated Quotations (NASDAQ) is an American stock exchange. It is one of the most valuable stock economic indices in the world and is located in New York City \cite{pagano2008quality}. The stock market is volatile and is influenced by economic indicators such as crude oil, gold, and the dollar; NASDAQ shares are likewise affected and have a volatile and chaotic nature \cite{firouzjaee2022lstm}. In this article, we examine the effect of oil, the dollar, gold, and stock market volatility on the economic market, and then the effect of these indicators on NASDAQ stocks. We then analyze the feedback effect of past NASDAQ stock prices on the current price. Using PCA and a linear regression algorithm, we design an optimal dynamic learning experience for modeling these stocks. The results obtained from the quantitative analysis are consistent with the results of the qualitative analysis of economic studies, and the modeling done with the optimal dynamic experience of machine learning justifies the current price of NASDAQ shares.

Ask "Who", Not "What": Bitcoin Volatility Forecasting with Twitter Data

M. Eren Akbiyik , Mert Erkul , Killian Kaempf , Vaiva Vasiliauskaite , Nino Antulov-Fantulin

Category: Machine Learning

2021-10-27

Understanding the variations in trading price (volatility), and its response to exogenous information, is a well-researched topic in finance. In this study, we focus on finding stable and accurate volatility predictors for a relatively new asset class of cryptocurrencies, in particular Bitcoin, using deep learning representations of public social media data obtained from Twitter. For our experiments, we extracted semantic information and user statistics from over 30 million Bitcoin-related tweets, in conjunction with 15-minute frequency price data over a horizon of 144 days. Using this data, we built several deep learning architectures that utilized different combinations of the gathered information. For each model, we conducted ablation studies to assess the influence of different components and feature sets on the prediction accuracy. We found statistical evidence for the hypotheses that: (i) temporal convolutional networks perform significantly better than both classical autoregressive models and other deep learning-based architectures in the literature, and (ii) tweet author meta-information, even detached from the tweet itself, is a better predictor of volatility than the semantic content and tweet volume statistics. We demonstrate how different information sets gathered from social media can be utilized in different architectures and how they affect the prediction results. As an additional contribution, we make our dataset public for future research.

Should Bank Stress Tests Be Fair?

Paul Glasserman , Mike Li

Category: Machine Learning (Statistics) | Machine Learning

2022-07-27

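Regulatory stress tests have become the primary tool for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios under shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask: what is a fair aggregation of individually tailored models? We argue that simply pooling data across banks treats banks equally but suffers from two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to the implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue that estimating and then discarding centered bank fixed effects is preferable to simply ignoring differences across banks. We present evidence that the overall impact can be material. We also discuss extensions to nonlinear models.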

A Comprehensive Review of Visual-Textual Sentiment Analysis from Social Media Networks

Israa Khalaf Salman Al-Tameemi , Mohammad-Reza Feizi-Derakhshi , Saeed Pashazadeh , Mohammad Asadpour

Category: Natural Language Processing | Artificial Intelligence

2022-07-05

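Social media networks have become a significant aspect of people's lives, serving as a platform for their ideas, opinions, and emotions. Consequently, automated sentiment analysis (SA) is critical for recognizing people's feelings in ways that other information sources cannot. The analysis of these feelings has revealed various applications, including brand evaluations, YouTube film reviews, and healthcare applications. As social media continues to develop, people post a massive amount of information in different forms, including text, photos, audio, and video. Traditional SA algorithms have therefore become limited, as they do not consider the expressiveness of other modalities. By including such characteristics from various material sources, these multimodal data streams provide new opportunities for optimizing the expected results beyond text-based SA. Our study focuses on the forefront field of multimodal SA, which examines visual and textual data posted on social media networks. Many people are more likely to use this kind of information to express themselves on these platforms. To serve as a resource for academics in this rapidly growing field, we present a comprehensive overview of textual and visual SA, including data pre-processing, feature extraction techniques, sentiment benchmark datasets, and the efficacy of multiple classification methodologies suited to each field. We also briefly introduce the most frequently used data fusion strategies and summarize existing research on visual-textual SA. Finally, we highlight the most significant challenges and investigate several important sentiment applications.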

Forecast combinations: an over 50-year review

Xiaoqian Wang , Rob J Hyndman , Feng Li , Yanfei Kang

Category: Machine Learning (Statistics)

2022-05-09

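Forecast combinations have flourished in the forecasting community and, in recent years, have become part of the mainstream of forecasting research and activities. Combining multiple forecasts produced for a single (target) series is now widely used to improve accuracy by integrating information gathered from different sources, thereby mitigating the risk of identifying a single "best" forecast. Combination schemes have evolved from simple combination methods without estimation to sophisticated methods involving time-varying weights, nonlinear combinations, correlations among components, and cross-learning. They include combining point forecasts and combining probabilistic forecasts. This paper provides an up-to-date review of the extensive literature on forecast combinations, together with references to available open-source software implementations. We discuss the potential and limitations of various methods and highlight how these ideas have developed over time. Some important issues concerning the utility of forecast combinations are also surveyed. Finally, we conclude with current research gaps and potential insights for future research.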
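
To make the idea of combining point forecasts concrete, here is a minimal sketch of two of the simplest schemes: an equal-weight average, which needs no estimation, and weights proportional to the inverse of each model's past mean squared error. The function names and the array layout (one row per model) are assumptions for illustration and are not taken from the review or from any particular software package.

```python
import numpy as np

def equal_weight_combination(forecasts):
    """Average forecasts across models; `forecasts` has shape (n_models, n_periods)."""
    return np.asarray(forecasts, dtype=float).mean(axis=0)

def inverse_mse_weights(past_forecasts, past_actuals):
    """Weights proportional to 1 / MSE of each model on past data.

    Assumes every model has a strictly positive past MSE.
    """
    errors = np.asarray(past_forecasts, dtype=float) - np.asarray(past_actuals, dtype=float)
    mse = (errors ** 2).mean(axis=1)   # one MSE per model (row)
    inv = 1.0 / mse
    return inv / inv.sum()             # normalize so the weights sum to one

def weighted_combination(forecasts, weights):
    """Combine forecasts of shape (n_models, n_periods) with one weight per model."""
    return np.asarray(weights, dtype=float) @ np.asarray(forecasts, dtype=float)
```

For example, a model whose past MSE is a quarter of another's receives four times its weight under the inverse-MSE scheme.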
