Smart Paper Notes

The DEBS 2022 Grand Challenge: Detecting Trading Trends in Financial Tick Data

Sebastian Frischbier , Jawad Tahir , Christoph Doblander , Arne Hormann , Ruben Mayer , Hans-Arno Jacobsen

Category: Machine Learning

2022-06-23

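The DEBS Grand Challenge (GC) is an annual programming competition open to practitioners from both academia and industry. The GC 2022 edition focuses on real-time complex event processing of high-volume tick data provided by Infront Financial Technology GmbH. The goal of the challenge is to efficiently compute specific trend indicators and to detect patterns in these indicators, like those used by real-life traders to decide on buying or selling in financial markets. The data set Trading Data used for benchmarking contains 289 million tick events from approximately 5500+ financial instruments traded on the three major exchanges Amsterdam (NL), Paris (FR), and Frankfurt am Main (GER) over the course of a full week in 2021. The data set is publicly available. In addition to correctness and performance, submissions must explicitly focus on reusability and practicability. Hence, participants must address specific non-functional requirements and are asked to build upon open-source platforms. This paper describes the required scenario and the data set Trading Data, defines the queries of the problem statement, and explains the enhancements made to the evaluation platform Challenger, which handles data distribution, dynamic subscriptions, and remote evaluation of the submissions.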
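The abstract does not name the trend indicators or the patterns to be detected. As a rough illustration only, the sketch below assumes an exponential moving average (EMA) as the indicator and a fast/slow EMA crossover as the pattern; the function names, the spans, and the price series are hypothetical and are not taken from the challenge specification.

```python
from typing import List, Sequence, Tuple


def ema_series(prices: Sequence[float], span: int) -> List[float]:
    """Return the exponential moving average of `prices`.

    Uses the common smoothing factor alpha = 2 / (span + 1) and seeds
    the average with the first price (both are assumptions).
    """
    alpha = 2.0 / (span + 1)
    out: List[float] = []
    ema = None
    for price in prices:
        ema = price if ema is None else alpha * price + (1 - alpha) * ema
        out.append(ema)
    return out


def crossovers(
    prices: Sequence[float], fast_span: int = 3, slow_span: int = 5
) -> List[Tuple[int, str]]:
    """Detect points where the fast EMA crosses the slow EMA.

    Returns (index, kind) pairs, where kind is "bullish" when the fast
    EMA moves above the slow EMA and "bearish" when it moves below.
    """
    fast = ema_series(prices, fast_span)
    slow = ema_series(prices, slow_span)
    events: List[Tuple[int, str]] = []
    for i in range(1, len(prices)):
        prev = fast[i - 1] - slow[i - 1]
        curr = fast[i] - slow[i]
        if prev <= 0 < curr:
            events.append((i, "bullish"))
        elif prev >= 0 > curr:
            events.append((i, "bearish"))
    return events


if __name__ == "__main__":
    # Hypothetical closing prices for one instrument.
    print(crossovers([10, 9, 8, 7, 8, 9, 10, 11]))
```

A streaming implementation would keep only the last EMA value per instrument instead of whole series, which matters when hundreds of millions of tick events are processed.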


Architecture of Automated Crypto-Finance Agent

Ali Raheman , Anton Kolonin , Ben Goertzel , Gergely Hegykozi , Ikram Ansari

Category: Artificial Intelligence

2021-07-16

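We present a cognitive architecture for an autonomous agent for active portfolio management in decentralized finance, involving activities such as asset selection, portfolio balancing, liquidity provision, and trading. A partial implementation of the architecture is provided, together with preliminary results and conclusions.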


Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy

Christoph Bergmeir , Frits de Nijs , Abishek Sriramulu , Mahdi Abolghasemi , Richard Bean , John Betts , Quang Bui , Nam Trong Dinh , Nils Einecke , Rasul Esmaeilbeigi

Category: Artificial Intelligence

2022-12-21

Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim to foster and facilitate research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.


Collaborative Multiobjective Evolutionary Algorithms in search of better Pareto Fronts. An application to trading systems

Francisco J. Soltero , Pablo Fernández-Blanco , J. Ignacio Hidalgo

Category: Neural and Evolutionary Computing

2022-11-04

Technical indicators use graphic representations of data sets by applying various mathematical formulas to financial time series of prices. These formulas comprise a set of rules and parameters whose values are not necessarily known and depend on many factors: the market in which it operates, the size of the time window, and others. This paper focuses on the real-time optimization of the parameters applied for analyzing time series of data. In particular, we optimize the parameters of technical and financial indicators and propose other applications, such as glucose time series. We propose the combination of several Multi-objective Evolutionary Algorithms (MOEAs). Unlike other approaches, this paper applies a set of different MOEAs, collaborating to construct a global Pareto Set of solutions. Solutions for financial problems seek high returns with minimal risk. The optimization process is continuous and occurs at the same frequency as the investment time interval. This technique permits the application of non-dominated solutions obtained with different MOEAs simultaneously. Experimental results show that this technique increases the returns of the commonly used Buy & Hold strategy and other multi-objective strategies, even for daily operations.
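The idea of pooling solutions from several MOEAs into one global Pareto set can be sketched as a non-dominated filter over the union of the individual fronts. This is a minimal illustration, not the authors' collaboration scheme, which the abstract does not detail. The two objectives (maximize return, minimize risk) follow the abstract; the function names and the sample values are hypothetical.

```python
from typing import Iterable, List, Tuple

# A solution is summarized by its objective values: (expected_return, risk).
Solution = Tuple[float, float]


def dominates(a: Solution, b: Solution) -> bool:
    """True if `a` is at least as good as `b` on both objectives
    (higher return, lower risk) and strictly better on at least one."""
    ret_a, risk_a = a
    ret_b, risk_b = b
    no_worse = ret_a >= ret_b and risk_a <= risk_b
    strictly_better = ret_a > ret_b or risk_a < risk_b
    return no_worse and strictly_better


def pareto_front(solutions: Iterable[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    pool = list(solutions)
    return [s for s in pool if not any(dominates(o, s) for o in pool)]


def merge_fronts(*fronts: Iterable[Solution]) -> List[Solution]:
    """Pool the fronts produced by different algorithms, drop exact
    duplicates (keeping first-seen order), and filter again."""
    pooled = [s for front in fronts for s in front]
    unique = list(dict.fromkeys(pooled))
    return pareto_front(unique)


if __name__ == "__main__":
    front_a = [(0.10, 0.20), (0.05, 0.10)]  # hypothetical output of one MOEA
    front_b = [(0.08, 0.12), (0.04, 0.15)]  # hypothetical output of another
    print(merge_fronts(front_a, front_b))
```

The pairwise filter is quadratic in the number of pooled solutions, which is usually acceptable for the front sizes evolutionary algorithms produce.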


Development of a hybrid method for stock trading based on TOPSIS, EMD and ELM

Elivelto Ebermam , Helder Knidel , Renato A. Krohling

Category: Neural and Evolutionary Computing

2022-06-14

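Deciding when to buy or sell a stock is not an easy task, because the market is hard to predict and is influenced by political and economic factors. Methodologies based on computational intelligence have therefore been applied to this challenging problem. In this work, stocks are ranked every day by the technique for order preference by similarity to ideal solution (TOPSIS) using technical analysis criteria, and the most suitable stock is selected for purchase. Even so, on some days TOPSIS may make an incorrect selection. To improve the selection, another method should be used. A hybrid model composed of empirical mode decomposition (EMD) and extreme learning machine (ELM) is therefore proposed. EMD decomposes the series into several sub-series, from which the main component (trend) is extracted. This component is processed by the ELM, which predicts the next element of the component. If the value predicted by the ELM is greater than the last value, the purchase of the stock is confirmed. The method was applied to a universe of 50 stocks in the Brazilian market. The selection made by TOPSIS showed promising results compared with random selection and with the return generated by the Bovespa index. Confirmation with the EMD-ELM hybrid model was able to increase the percentage of profitable trades.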
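The TOPSIS ranking step can be sketched as follows. This is a minimal version of the standard TOPSIS procedure (vector normalization, weighting, distance to the ideal and anti-ideal solutions), not the authors' implementation. The two criteria used in the example (a return-like benefit criterion and a volatility-like cost criterion), the weights, and the values are hypothetical, since the abstract does not list the technical analysis criteria.

```python
import math
from typing import List, Sequence


def topsis(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
) -> List[float]:
    """Return the TOPSIS closeness score of each alternative (row).

    `benefit[j]` is True when larger values of criterion j are better
    and False when smaller values are better. Scores lie in [0, 1];
    a higher score means closer to the ideal solution. Assumes every
    column has a non-zero entry and the alternatives are not all equal.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    # Ideal and anti-ideal points, column by column.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


if __name__ == "__main__":
    # Rows: three hypothetical stocks; columns: (momentum, volatility).
    stocks = [[0.05, 0.30], [0.02, 0.10], [0.05, 0.10]]
    scores = topsis(stocks, weights=[0.5, 0.5], benefit=[True, False])
    best = max(range(len(scores)), key=scores.__getitem__)
    print(best, scores)
```

In the paper's pipeline the top-ranked stock would then be passed to the EMD-ELM confirmation step, which is not shown here.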


The Role of "Live" in Livestreaming Markets: Evidence Using Orthogonal Random Forest

Ziwei Cong , Jia Liu , Puneet Manchanda

Categories: Machine Learning (Statistics) | Machine Learning

2021-07-04

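A common belief about the growing medium of livestreaming is that its value lies in its "live" component. We examine this belief by comparing how the price elasticity of demand for live events varies before, on the day of, and after the livestream. We do so using unique and rich data from a large livestreaming platform that allows consumers to purchase the recorded version of a livestream after the stream has ended. A challenge in our context is the presence of high-dimensional confounders whose relationships with the treatment policy (i.e., price) and the outcome of interest (i.e., demand) are complex and only partially known. We address this challenge by using a generalized Orthogonal Random Forest framework for heterogeneous treatment effect estimation. We find significant temporal dynamics in the price elasticity of demand over the entire event life cycle. Specifically, demand becomes less price sensitive over time up to the livestream day and is inelastic on that day. In the post-livestream period, demand for the recorded version is still price sensitive, but much less so than in the pre-livestream period. We further show that this temporal variation in price elasticity is driven by the quality uncertainty inherent in such events and by the opportunity for real-time interaction with content creators during the livestream.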


Novel Modelling Strategies for High-frequency Stock Trading Data

Xuekui Zhang , Yuying Huang , Ke Xu , Li Xing

Category: Machine Learning

2022-11-30

Full electronic automation in stock exchanges has recently become popular, generating high-frequency intraday data and motivating the development of near real-time price forecasting methods. Machine learning algorithms are widely applied to mid-price stock predictions. Processing raw data as inputs for prediction models (e.g., data thinning and feature engineering) can primarily affect the performance of the prediction methods. However, researchers rarely discuss this topic. This motivated us to propose three novel modelling strategies for processing raw data. We illustrate how our novel modelling strategies improve forecasting performance by analyzing high-frequency data of the Dow Jones 30 component stocks. In these experiments, our strategies often lead to statistically significant improvement in predictions. The three strategies improve the F1 scores of the SVM models by 0.056, 0.087, and 0.016, respectively.


Orchestrating Collaborative Cybersecurity: A Secure Framework for Distributed Privacy-Preserving Threat Intelligence Sharing

Juan R. Trocoso-Pastoriza , Alain Mermoud , Romain Bouyé , Francesco Marino , Jean-Philippe Bossuat , Vincent Lenders , Jean-Pierre Hubaux

Category: Artificial Intelligence

2022-09-06

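Cyber Threat Intelligence (CTI) sharing is an important activity for reducing information asymmetries between attackers and defenders. However, this activity presents challenges due to the tension between data sharing and confidentiality, which results in information retention and often leads to a free-rider problem. As a result, the information that is shared represents only the tip of the iceberg. Current literature assumes access to centralized databases containing all the information, but this is not always feasible because of the aforementioned tension. This results in unbalanced or incomplete datasets, which require techniques to expand them. We show how these techniques lead to biased results and misleading performance expectations. We propose a novel framework for extracting CTI from distributed data on incidents, vulnerabilities, and indicators of compromise, and demonstrate its use in several practical scenarios in conjunction with the Malware Information Sharing Platform (MISP). Policy implications for CTI sharing are presented and discussed. The proposed system relies on an efficient combination of privacy-enhancing technologies and federated processing. This lets organizations stay in control of their CTI and minimize the risk of exposure or leakage, while enabling the benefits of sharing, more accurate and representative results, and more effective predictive and preventive defenses.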


A Survey on Concept Drift in Process Mining

Denise Maria Vecino Sato , Sheila Cristiana de Freitas , Jean Paul Barddal , Edson Emilio Scalabrin

Category: Machine Learning

2021-12-03

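Concept drift in process mining (PM) is a challenge because classical methods assume that processes are in a steady state, i.e., that events share the same process version. We conducted a systematic literature review on the intersection of these areas. We thus review concept drift in process mining and put forward a taxonomy of existing techniques for drift detection and online process mining in evolving environments. Existing works show that (i) PM still focuses primarily on offline analysis, and (ii) the assessment of concept drift techniques in processes is cumbersome due to the lack of common evaluation protocols, datasets, and metrics.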


Analyzing social media with crowdsourcing in Crowd4SDG

Carlo Bono , Mehmet Oğuz Mülâyim , Cinzia Cappiello , Mark Carman , Jesus Cerquides , Jose Luis Fernandez-Marquez , Rosy Mondardini , Edoardo Ramalli , Barbara Pernici

Category: Artificial Intelligence

2022-08-04

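Social media have the potential to provide timely information about emergency situations and sudden events. However, finding relevant information among the millions of posts published every day can be difficult, and developing a data analysis project usually requires time and technical skills. This study presents an approach that provides flexible support for analyzing social media, particularly during emergencies. Different use cases in which social media analysis can be adopted are introduced, and the challenges of retrieving information from large sets of posts are discussed. The focus is on analyzing the images and text contained in social media posts, and on a set of automatic data processing tools for filtering and classifying content with a human-in-the-loop approach to support the data analyst. Such support includes feedback and suggestions for configuring the automated tools, as well as crowdsourcing to gather input from citizens. The results are validated by discussing three case studies developed within the Crowd4SDG H2020 European project.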


When Creators Meet the Metaverse: A Survey on Computational Arts

Lik-Hang Lee , Zijun Lin , Rui Hu , Zhengya Gong , Abhishek Kumar , Tangyao Li , Sijia Li , Pan Hui

Categories: Artificial Intelligence | Machine Learning

2021-11-26

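The metaverse, an enormous virtual-physical cyberspace, has brought unprecedented opportunities for artists to blend every corner of our physical surroundings with digital creativity. This article conducts a comprehensive survey of computational arts, in which seven critical topics are relevant to the metaverse and describe novel artworks in blended virtual-physical realities. The topics first cover the building elements of the metaverse, e.g., virtual scenes and characters, and auditory and textual elements. Next, several remarkable types of novel creations are reflected upon, such as immersive arts, robotic arts, and other user-centric approaches. Finally, we propose several research agendas: democratizing computational arts, digital privacy and safety for metaverse artists, ownership recognition for digital artworks, technological challenges, and so on. The survey also serves as introductory material for artists and metaverse technologists who want to begin creating in the realm of surrealistic cyberspace.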


A survey on concept drift adaptation

Category:

Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, overview the most representative, distinct and popular techniques and algorithms, discuss evaluation methodology of adaptive algorithms, and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to concept drift adaptation for researchers, industry analysts and practitioners.


Is it a great Autonomous FX Trading Strategy or you are just fooling yourself

Murilo Sibrao Bernardini , Paulo Andre Lima de Castro

Category: Artificial Intelligence

2021-01-15

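In this paper, we propose a method for evaluating autonomous trading strategies that provides realistic expectations regarding a strategy's long-term performance. This method addresses many pitfalls that currently fool even experienced software developers and researchers, not to mention the customers who purchase these products. We present the results of applying our method to several well-known autonomous trading strategies, which are used to manage a diverse selection of financial assets. The results show that many of these published strategies are far from being reliable vehicles for financial investment. Our method exposes the difficulties involved in building a reliable, long-term strategy, and provides a means to select the most promising potential strategy by establishing minimal periods and requirements for the test executions. Many developers create software to buy and sell financial assets autonomously, and some of these programs show great performance when simulated with historical price series (commonly called backtests). Nevertheless, when these strategies are used in real markets (or on data not used in their training or evaluation), they often perform very poorly. The proposed method can be used to evaluate potential strategies. In this way, the method helps to tell whether you really have a great trading strategy or are just fooling yourself.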


Deep Learning-Driven Edge Video Analytics: A Survey

Renjie Xu , Saiedeh Razavi , Rong Zheng

Categories: Computer Vision | Machine Learning

2022-11-28

Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. A dedicated venue for collecting and summarizing the latest advances of EVA is highly desired by the community. Besides, the basic concepts of EVA (e.g., definition, architectures, etc.) are ambiguous and neglected by these surveys due to the rapid development of this domain. A thorough clarification is needed to facilitate a consensus on these concepts. To fill in these gaps, we conduct a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.


Automatic Identification and Classification of Share Buybacks and their Effect on Short-, Mid- and Long-Term Returns

Thilo Reintjes

Categories: Artificial Intelligence | Machine Learning

2022-09-26

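This thesis investigates share buybacks, specifically share buyback announcements. It addresses how to recognize such announcements, the excess return of share buybacks, and the prediction of returns after a share buyback announcement. We illustrate two NLP approaches for the automated detection of share buyback announcements. Even with small amounts of training data, we achieve an accuracy of up to 90%. The thesis uses these NLP methods to generate a large dataset of 57,155 share buyback announcements. By analyzing this dataset, the thesis aims to show that most companies that announce a buyback underperform the MSCI World. A minority of companies, however, significantly outperform the MSCI World. This significant outperformance leads to a net gain when looking at the average over all companies. If the benchmark index is adjusted for the size of the respective companies, the average outperformance disappears and the majority underperforms even more. However, companies that announce a share buyback with a volume of at least 1% of their market capitalization were found to deliver a significant outperformance on average, even against the adjusted benchmark. Companies that announce share buybacks in times of crisis were also found to fare better than the overall market. Additionally, the generated dataset was used to train 72 machine learning models. This made it possible to find many strategies that achieve an accuracy of up to 77% and generate large excess returns. A variety of performance indicators could be improved across six different time frames, and a significant outperformance was identified. This was achieved by training several models for different tasks and time frames and by combining these models, generating significant improvements by fusing weak learners into one strong learner.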


mt5se: An Open Source Framework for Building Autonomous Traders

Paulo André Lima de Castro

Category: Artificial Intelligence

2021-01-20

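Autonomous trading robots have been studied in the field of artificial intelligence for quite some time. Many AI techniques have been tested for building autonomous agents able to trade financial assets. These initiatives include traditional neural networks, fuzzy logic, and reinforcement learning, but also more recent approaches such as deep neural networks and deep reinforcement learning. Many developers claim to be successful in creating robots that perform well when their execution is simulated on historical price series. However, when these robots are used in real markets, they often perform poorly in terms of risk and return. In this paper, we propose an open-source framework, called mt5se, that supports the development, backtesting, live testing, and real operation of autonomous traders. We built and tested several traders using mt5se. The results indicate that it may help the development of better traders. Furthermore, we discuss the simple architecture used in many studies and propose an alternative multi-layer architecture. This architecture separates the two main concerns of a portfolio manager (PM): price prediction and capital allocation. More than achieving high accuracy, a PM should increase profits when it is right and reduce losses when it is wrong. Moreover, price prediction is highly dependent on an asset's nature and history, while capital allocation depends only on the analyst's prediction performance and the assets' correlation. Finally, we discuss some promising technologies in the area.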


A Modular Framework for Reinforcement Learning Optimal Execution

Fernando de Meer Pardo , Christoph Auth , Florin Dascalu

Category: Machine Learning

2022-08-11

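In this article, we develop a modular framework for applying reinforcement learning to the problem of optimal trade execution. The framework is designed with flexibility in mind, in order to ease the implementation of different simulation setups. Rather than focusing on agents and optimization methods, we focus on the environment and break down the requirements for simulating optimal trade execution under a reinforcement learning framework, such as data pre-processing, construction of observations, action processing, child order execution, and simulation. We give examples of each component, explore the difficulties that their individual implementations and the interactions between them entail, and discuss the different phenomena that each component induces in the simulation, highlighting the divergences between the simulation and the behavior of a real market. We showcase our modular implementation through a setup that follows a Time-Weighted Average Price (TWAP) order submission schedule, allows the agent to place limit orders exclusively, simulates their execution by iterating over snapshots of the Limit Order Book (LOB), and calculates rewards as the $ improvement over the price achieved by a TWAP benchmark algorithm following the same schedule. We also develop evaluation procedures that incorporate iterative re-training and evaluation of a given agent over intervals of a training horizon, mimicking how an agent may behave when it is continuously retrained as new market data becomes available, and emulating the monitoring practices that algorithm providers are bound to perform under current regulatory frameworks.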
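The TWAP schedule and the dollar-improvement reward can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the framework's code: the parent order is split into equal child orders, and the reward for one slice is the price difference against the benchmark multiplied by the quantity. The function names, the `side` parameter, and the sample prices are hypothetical.

```python
from typing import List


def twap_schedule(total_qty: int, n_slices: int) -> List[int]:
    """Split a parent order into `n_slices` near-equal child orders.

    Any remainder is spread over the first slices so that the child
    quantities always sum to `total_qty`.
    """
    base, remainder = divmod(total_qty, n_slices)
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]


def dollar_improvement(
    qty: int, agent_price: float, benchmark_price: float, side: str = "buy"
) -> float:
    """Dollar improvement of the agent over the benchmark for one slice.

    For a buy, paying less than the benchmark is an improvement; for a
    sell, receiving more than the benchmark is an improvement.
    """
    sign = 1.0 if side == "buy" else -1.0
    return sign * (benchmark_price - agent_price) * qty


if __name__ == "__main__":
    children = twap_schedule(1000, 3)
    print(children)
    # Hypothetical average fill prices of the agent and the benchmark.
    print(dollar_improvement(children[0], agent_price=9.98, benchmark_price=10.00))
```

In a full environment the agent and benchmark prices would come from simulating fills against LOB snapshots, which is the part the abstract identifies as the main source of divergence from a real market.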


Machine Learning Application Development: Practitioners' Insights

Md Saidur Rahman , Foutse Khomh , Alaleh Hamidi , Jinghui Cheng , Giuliano Antoniol , Hironori Washizaki

Category: Machine Learning

2021-12-31

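Nowadays, intelligent systems and services are becoming increasingly popular thanks to recent breakthroughs in Artificial Intelligence (AI) and Machine Learning (ML). However, machine learning meets software engineering not only with promising potential but also with some inherent challenges. Despite some recent research efforts, we still do not have a clear understanding of the challenges of developing ML-based applications or of current industry practices. Moreover, it is unclear where software engineering researchers should focus their efforts to better support ML application developers. In this paper, we report on a survey that aimed to understand the challenges and best practices of ML application development. We synthesize the results obtained from 80 practitioners (with diverse skills, experience, and application domains) into 17 findings that outline challenges and best practices for ML application development. Practitioners involved in the development of ML-based software systems can leverage the summarized best practices to improve the quality of their systems. We hope that the reported challenges will inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.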


A Generic Methodology for the Statistically Uniform & Comparable Evaluation of Automated Trading Platform Components

Artur Sokolovsky , Luca Arnaboldi

Category: Machine Learning

2020-09-21

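Although machine learning approaches have been widely and very successfully used in the field of finance, these approaches remain bespoke to specific investigations and opaque in terms of explainability, comparability, and reproducibility. The primary objective of this research is to shed light on this field by providing a generic methodology that is investigation-agnostic and interpretable to a financial markets practitioner, thus enhancing their efficiency, reducing barriers to entry, and increasing the reproducibility of experiments. The proposed methodology is showcased on two automated trading platform components, namely price levels, a well-known trading pattern, and a novel 2-step feature extraction method. The methodology relies on hypothesis testing, which is widely applied in other social and scientific disciplines, to effectively evaluate concrete results beyond simple classification accuracy. The main hypothesis was formulated to evaluate whether the selected trading pattern is suitable for use in a machine learning setting. Throughout the experiments, we found that the use of the considered trading pattern in a machine learning setting is only partially supported by statistics, resulting in insignificant effect sizes (Rebound 7 - $0.64 \pm 1.02$, Rebound 11 $0.38 \pm 0.98$, and Rebound 15 - $1.05 \pm 1.16$), but allowing the rejection of the null hypothesis. We showcase the generic methodology on a US futures market instrument and provide evidence that it easily yields informative metrics beyond the more traditional performance and profitability metrics. This work is one of the first to apply such a rigorous, statistically backed approach to the field of financial markets, and we hope it may be a springboard for more research.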


Analyzing the State of Computer Science Research with the DBLP Discovery Dataset

Lennart Küll

Category: Natural Language Processing

2022-12-01

The number of scientific publications continues to rise exponentially, especially in Computer Science (CS). However, current solutions to analyze those publications restrict access behind a paywall, offer no features for visual analysis, limit access to their data, only focus on niches or sub-fields, and/or are not flexible and modular enough to be transferred to other datasets. In this thesis, we conduct a scientometric analysis to uncover the implicit patterns hidden in CS metadata and to determine the state of CS research. Specifically, we investigate trends of the quantity, impact, and topics for authors, venues, document types (conferences vs. journals), and fields of study (compared to, e.g., medicine). To achieve this we introduce the CS-Insights system, an interactive web application to analyze CS publications with various dashboards, filters, and visualizations. The data underlying this system is the DBLP Discovery Dataset (D3), which contains metadata from 5 million CS publications. Both D3 and CS-Insights are open-access, and CS-Insights can be easily adapted to other datasets in the future. The most interesting findings of our scientometric analysis include that i) there has been a stark increase in publications, authors, and venues in the last two decades, ii) many authors only recently joined the field, iii) the most cited authors and venues focus on computer vision and pattern recognition, while the most productive prefer engineering-related topics, iv) the preference of researchers to publish in conferences over journals dwindles, v) on average, journal articles receive twice as many citations compared to conference papers, but the contrast is much smaller for the most cited conferences and journals, and vi) journals also get more citations in all other investigated fields of study, while only CS and engineering publish more in conferences than journals.
