Spatio-Temporal Attack Course-of-Action (COA) Search Learning for Scalable and Time-Varying Networks

Haemin Lee

^{†}

, Seok Bin Son

^{†}

, Won Joon Yun

^{†}

, Joongheon Kim

^{†}

, Soyi Jung

^{‡}

, and Dong Hwa Kim

\lx@sectionsign

^{†}

Department of Electrical and Computer Engineering, Korea University, Seoul, Republic of Korea

^{‡}

School of Software, Hallym University, Chuncheon, Republic of Korea

\lx@sectionsign

Agency for Defense Development (ADD), Seoul, Republic of Korea
{haemin2,lydiasb,ywjoon95,joongheon}@korea.ac.kr, sjung@hallym.ac.kr, dhkim@add.re.kr

Abstract

One of the key topics in network security research is the autonomous COA (Couse-of-Action) attack search method. Traditional COA attack search methods that passively search for attacks can be difficult, especially as the network gets bigger. To address these issues, new autonomous COA techniques are being developed, and among them, an intelligent spatial algorithm is designed in this paper for efficient operations in scalable networks. On top of the spatial search, a Monte-Carlo (MC)-based temporal approach is additionally considered for taking care of time-varying network behaviors. Therefore, we propose a spatio-temporal attack COA search algorithm for scalable and time-varying networks.

I Introduction

As the network becomes more complex and expanded, the importance of security for the network has increased [1, 2]. Especially, security accidents in the network have become more frequent, and its importance has become obviously higher.

In this situation, it is meaningful to identify the attackers’ behaviors; and the corresponding research have been widely and actively conducted in the security and privacy literature with the name of attack course-of-action (COA).

In order to design the attack COA search algorithms, many modern artificial intelligent methods can be utilized such as time-varying dynamic optimization and heuristic search [3, 4, 5, 6, 7, 8, 9, 10, 11, 12], deep and distributed learning [13, 14, 15, 16, 17], and reinforcement learning with and without multi-agent cooperation [18, 19, 20, 21]. Among these algorithms, deep learning-based approaches are widely and actively used in modern research trends, however, these approaches are not considered in this paper because it works very well if the attack patterns are similar to the training data patterns. Therefore, the training-based algorithms cannot be scalable. In order to design and implement the scalable attack COA algorithms, a novel spatial algorithm inspired by heuristic search is proposed in this paper. By defining appropriate heuristic functions for attack COA search and path planning, our proposed spatial attack COA search algorithm can work for any kinds of networks. Furthermore, we additionally consider Monte Carlo (MC)-based tree search in order to consider time-varying network behaviors. Therefore, the final form of our proposed attack COA search algorithm is spatio-temporal.

The rest of this paper is organized as follows. Sec. II introduces our proposed spatio-temporal attack COA search algorithm in scalable and time-varying networks. Sec. III presents algorithm execution results over two-types of networks. Sec. IV concludes this paper and presents future research directions.

Ii Spatio-Temporal Attack COA Optimization

Our proposed spatio-temporal attack COA optimization algorithm consists of two parts, i.e., (i) spatial intelligent attack COA heuristic for scalable networks (refer to Sec. II-A) and (ii) MC-based tree search for time-varying networks (refer to Sec. II-B), respectively.

Ii-a Spatial Intelligent Attack COA Heuristic for Scalable Networks

Our proposed spatial intelligent attack COA heuristic is for computing optimal attack COA routes. Our proposed algorithm walks through the intermediate network nodes from the attack source node to attack target node which can maximize the impacts of attacks. For this purpose, the value of attack should be numerically defined.

For the purpose, our learning heuristic function can be defined as follows,

f (n) = g (n) + h (n)

(1)

where $f (n)$ is defined as the value of attack at current intermediate network node $n$ . In addition, $g (n)$ stands for the summation of values of attacks from the attack source node to current intermediate network node $n$ . Lastly, $h (h)$ stands for the approximated value of attack from current intermediate network node $n$ to our attack target node.

Here, $g (n)$ is a calculated value which is the sum of the passed edges to arrive at current node, while $h (n)$ is the approximated value for the valuation of target node which can regarded as a heuristic function. To find the optimal path, the heuristic function $h (n)$ needs to be well-defined. In this paper, we used vulnerability information for all nodes using the Common Vulnerability Scoring System (CVSS) [22].

This CVSS is a framework used to rank the characteristics and severity of vulnerabilities and a exploitable weaknesses. In CVSS, base score reflects the harmfulness of the vulnerability itself, and the exploitability score reflects the feasibility of exploiting that particular vulnerability which has a maximum value of 10. To define the vulnerability score we used exploitability score to weight the significance of the base score considering how feasible it is to use a certain vulnerability [23].

We first assign the values of source node and target node as 0.01 and 100, respectively. Assuming that if any node with higher vulnerability score will lead to higher probability to reach the target node, we assigned individual vulnerability score to the nodes that exploits a vulnerability using $S c o r e_{v u l}$ (2) to the node with higher vulnerability score, as follows,

S c o r e_{v u l} = b a s e S c o r e \times \frac{e x p l o i t a b i l i t y S c o r e}{10} .

(2)

In addition, for the nodes that access a files or execute a code, we assign the value of 1.5 since such actions are critical. For others, we assign 0.

Here, $g (n)$ evaluates the attack value along the path from source node to current node, which is the sum of the edge weights between the visited nodes. We also use CVSS information for every connected edge weights considering the vulnerability of adjacent nodes and assigned -1 if not connected.

Based on this defined learning heuristic function $f (n)$ over given networks, adaptive tree search should be conducted in terms of highest $f (n)$ value consideration [24] to select the optimal path which makes more effective and damaging attack.

Ii-B MC-based Tree Search for Time-Varying Networks

After computing the optimal attack COA paths, our proposed spatio-temporal algorithm additionally conducts MC-based tree search for taking care of time-varying situations over networks [25]. By using the MC-based tree search, the proposed algorithm can identify whether the nearby solutions can be more valuable in terms of attack COA in the scalable and time-varying networks.

Iii Results with Two Example Networks

In this paper, two types of networks are considered for evaluating the proposed spatio-temporal attack COA search optimization learning algorithm where the network topology illustrations are in Fig. 1 and Fig. 2.

In this paper, a multi-host multi-stage vulnerability analysis tool (MULVAL) which is one of logic-based security analyzers is used [26]. Note that this is one of widely utilized extendable attack graph analysis tools. This MULVAL generates attack graphs depending on network topology such as Fig. 1 and Fig. 2. As a result, an attack graph is generated. Attack graph is a structured representation of the vulnerabilities present in the system and their interactions with each other. The attack graph is a graph created by the technique of predicting the path for an attacker to break into the system. The attack graph expresses all connections within the network in the form of a graph.It also expresses all possible attack paths based on the Shodan data used in Mulval, vulnerability information such as CVSS Score, and environmental information on the network such as IDS(Intrusion detection system) rules and firewalls.

Fig. 3 and Fig. 4 are the attack graphs generated from the network topology of Fig. 1 and Fig. 2, respectively. In the Tables of Fig. 3 and Fig. 4, various vulnerability information is described, e.g., network connection information, ids rule, and etc. In addition, it can be confirmed that the attack graph is created using the information in the Tables. There are various attack paths in the attack graphs, and it is not known which of these paths is the most optimal attack paths. Therefore, finding the optimal attack paths in the attack graphs is critical.

As a result, we confirm that the optimal attack paths can be found using our proposed spatio-temporal attck COA optimization in scalable and time-varying networks those are abstracted as attack graphs. As shown in the red arrows in Fig. 3, the optimal attack path is 18 → 16 → 15 → 14 → 13 → 11 → 10 → 9 → 8 → 6 → 5 → 4 → 3 → 2 → 1. Likewise, we can also confirm that the optimal attack COA path in Fig. 4 is as follows: 23 → 21 → 20 → 19 → 18 → 16 → 15 → 14 → 13 → 11 → 10 → 9 → 8 → 6 → 5 → 4 → 3 → 2 → 1, using our proposed spatio-temporal algorithm.

Figure 3: Optimal Attack COA Paths Network Topology (Example 1)

Figure 4: Optimal Attack COA Paths Network Topology (Example 2)

Iv Conclusions and Future Work

In this paper, a novel spatio-temporal attack COA learning algorithm is proposed and evaluated in scalable and time-varying networks. In order to deal with scalable networks, intelligent and adaptive search-based techniques are utilized for designing and implementing our proposed spatial algorithm with appropriate heuristic function definition. In addition, in orrder to deal with time-varying network behaviors, Monte-Carlo (MC) tree search based adaptation is also additionally considered. Therefore, our proposed spatio-temporal attack COA learning algorithm is designed. Furthermore, we confirm that our proposed algorithm works well in two different types of networks.

As future research directions, our proposed spatio-temporal attack COA learning algorithm should be compared with the other well-known attack COA search methods. Furthermore, considering partially observable networks is also required for more practical applications.

Acknowledgment

This work was supported by the Agency for Defense Development under the contract UI210009XD. J. Kim is a corresponding author of this paper.

References

[1] M. Saad, J. Choi, D. Nyang, J. Kim, and A. Mohaisen, “Toward characterizing blockchain-based cryptocurrencies for highly accurate predictions,” IEEE Systems Journal, vol. 14, no. 1, pp. 321–332, March 2020.
[2] N.-N. Dao, T. V. Phan, U. Sa’ad, J. Kim, T. Bauschert, D.-T. Do, and S. Cho, “Securing heterogeneous IoT with intelligent DDoS attack behavior learning,” IEEE Systems Journal, pp. 1–10, 2021.
[3] M. Choi, J. Kim, and J. Moon, “Wireless video caching and dynamic streaming under differentiated quality requirements,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 6, pp. 1245–1257, June 2018.
[4] N.-N. Dao, D.-N. Vu, W. Na, J. Kim, and S. Cho, “SGCO: Stabilized green crosshaul orchestration for dense IoT offloading services,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 11, pp. 2538–2548, November 2018.
[5] J. Koo, J. Yi, J. Kim, M. A. Hoque, and S. Choi, “Seamless dynamic adaptive streaming in LTE/Wi-Fi integrated network under smartphone resource constraints,” IEEE Transactions on Mobile Computing, vol. 18, no. 7, pp. 1647–1660, July 2019.
[6] M. Choi, J. Kim, and J. Moon, “Dynamic power allocation and user scheduling for power-efficient and delay-constrained multiple access networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 10, pp. 4846–4858, October 2019.
[7] M. Choi, A. No, M. Ji, and J. Kim, “Markov decision policies for dynamic video delivery in wireless caching networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 12, pp. 5705–5718, December 2019.
[8] M. Choi, A. F. Molisch, and J. Kim, “Joint distributed link scheduling and power allocation for content delivery in wireless caching networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 12, pp. 7810–7824, December 2020.
[9] M. Choi, A. F. Molisch, D.-J. Han, D. Kim, J. Kim, and J. Moon, “Probabilistic caching and dynamic delivery policies for categorized contents and consecutive user demands,” IEEE Transactions on Wireless Communications, vol. 20, no. 4, pp. 2685–2699, April 2021.
[10] J. Yi, S. Kim, J. Kim, and S. Choi, “Supremo: Cloud-assisted low-latency super-resolution in mobile devices,” IEEE Transactions on Mobile Computing, pp. 1–1, 2021.
[11] S. Jung, J. Kim, M. Levorato, C. Cordeiro, and J.-H. Kim, “Infrastructure-assisted on-driving experience sharing for millimeter-wave connected vehicles,” IEEE Transactions on Vehicular Technology, pp. 1–1, 2021.
[12] S. Jung, J. Kim, and J.-H. Kim, “Intelligent active queue management for stabilized QoS guarantees in 5G mobile networks,” IEEE Systems Journal, pp. 1–10, 2021.
[13] J. Park, S. Samarakoon, A. Elgabli, J. Kim, M. Bennis, S.-L. Kim, and M. Debbah, “Communication-efficient and distributed learning over wireless networks: Principles and applications,” Proceedings of the IEEE, vol. 109, no. 5, pp. 796–819, May 2021.
[14] A. Malik, J. Kim, K. S. Kim, and W.-Y. Shin, “A personalized preference learning framework for caching in mobile networks,” IEEE Transactions on Mobile Computing, vol. 20, no. 6, pp. 2124–2139, June 2021.
[15] M. Shin, J. Kim, and M. Levorato, “Auction-based charging scheduling with deep learning framework for multi-drone networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 5, pp. 4235–4248, May 2019.
[16] D. Kim, D. Kwon, L. Park, J. Kim, and S. Cho, “Multiscale LSTM-based deep learning for very-short-term photovoltaic power generation forecasting in smart city energy management,” IEEE Systems Journal, vol. 15, no. 1, pp. 346–354, March 2021.
[17] U. Meteriz, N. Fazil Yildiran, J. Kim, and D. Mohaisen, “Understanding the potential risks of sharing elevation information on fitness applications,” in Proc. IEEE 40th International Conference on Distributed Computing Systems (ICDCS), 2020, pp. 464–473.
[18] S. Jung, W. J. Yun, M. Shin, J. Kim, and J.-H. Kim, “Orchestrated scheduling and multi-agent deep reinforcement learning for cloud-assisted multi-UAV charging systems,” IEEE Transactions on Vehicular Technology, vol. 70, no. 6, pp. 5362–5377, June 2021.
[19] M. Shin and J. Kim, “Randomized adversarial imitation learning for autonomous driving,” in Proc. International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 4590–4596.
[20] M. Shin, D.-H. Choi, and J. Kim, “Cooperative management for PV/ESS-enabled electric vehicle charging stations: A multiagent deep reinforcement learning approach,” IEEE Transactions on Industrial Informatics, vol. 16, no. 5, pp. 3493–3503, May 2020.
[21] W. J. Yun, S. Park, J. Kim, M. Shin, S. Jung, A. Mohaisen, and J.-H. Kim, “Cooperative multi-agent deep reinforcement learning for reliable surveillance via autonomous multi-UAV control,” IEEE Transactions on Industrial Informatics, pp. 1–1, 2022.
[22] K. S. P. Mell and S. Romanosky, “A complete guide to the common vulnerability scoring system version 2.0.” 2007.
[23] Z. Hu, R. Beuran, and Y. Tan, “Automated penetration testing using deep reinforcement learning,” in 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). IEEE, 2020, pp. 2–10.
[24] G. Dong, F. Yang, K.-L. Tsui, and C. Zou, “Active balancing of Lithium-Ion batteries using graph theory and A-star search algorithm,” IEEE Transactions on Industrial Informatics, vol. 17, no. 4, pp. 2587–2599, 2021.
[25] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–43, 2012.
[26] X. Ou, S. Govindavajhala, A. W. Appel et al., “Mulval: A logic-based network security analyzer.” in USENIX security symposium, vol. 8. Baltimore, MD, 2005, pp. 113–128.