Leap-frog neural network for learning the symplectic evolution from partitioned data

Xin Li

^{1}

, Jian Li

^{2, 3}

and Zhihong Jeff Xia

^{4}

^{1}

Department of Statistics and Data Science, Southern University of Science and Technology of China. No 1088, Xueyuan Rd., Xili, Nanshan District,
Shenzhen, Guangdong, 518055, PR China,

^{2}

School of Astronomy and Space Science, Nanjing University, 163 Xianlin Avenue, Nanjing 210023, PR China,

^{3}

Key Laboratory of Modern Astronomy and Astrophysics in Ministry of Education, Nanjing University, Nanjing 210023, PR China

^{4}

Department of Mathematics, Northwestern University, 2033 Sheridan Road, Evanston, IL 60208, USA E-mail: ljian@nju.edu.cn

Accepted 1988 December 15. Received 1988 December 14; in original form 1988 October 11

Abstract

For the Hamiltonian system, this work considers the learning and prediction of the position ( $q$ ) and momentum ( $p$ ) variables generated by a symplectic evolution map. Similar to Chen & Tao (2021), the symplectic map is represented by the generating function. In addition, we develop a new learning scheme by splitting the time series ( $q_{i}$ , $p_{i}$ ) into several partitions, and then train a leap-frog neural network (LFNN) to approximate the generating function between the first (i.e. initial condition) and one of the rest partitions. For predicting the system evolution in a short timescale, the LFNN could effectively avoid the issue of accumulative error. Then the LFNN is applied to learn the behavior of the 2:3 resonant Kuiper belt objects, in a much longer time period, and there are two significant improvements on the neural network constructed in our previous work (Li et al., 2022): (1) conservation of the Jacobi integral ; (2) highly accurate prediction of the orbital evolution. We propose that the LFNN may be useful to make the prediction of the long time evolution of the Hamiltonian system.

keywords:

celestial mechanics – Kuiper belt: general – planets and satellites: dynamical evolution and stability – methods: miscellaneous

^†^†pagerange: Leap-frog neural network for learning the symplectic evolution from partitioned data–LABEL:lastpage^†^†pubyear: 2002

1 Introduction

Neural networks were designed for pattern recognition by learning the proper function between covariates (i.e. inputs) and variates (i.e. outputs) as a black-box (McCulloch & Pitts, 1943; Rosenblatt, 1958). The aim of artificial neural networks (ANNs) is to solve complicated problems by the method inspired from human brain. The ANN, which consists of an interconnected network, can learn from a large amount of experience data by updating the weights on its connections. Such a strategy has been efficiently applied to a wide range of pattern recognition problems in science and industry.

Time series prediction has become an intensive area of academic research. Traditional methods have constructed parametric models informed by domain expertise, e.g. auto-regressive (AR) (Box & Jenkins, 1976), exponential smoothing (Winters, 1960; Gardner, 1985) and structural time series models (Harvey, 1990). After that, machine learning methods provide a way to learn temporal dynamics in a data-driven manner (Ahmed et al., 2010). The significant increase in computing power has an enormous effect on the area of the time series forecasting, as machine learning now plays a key role in this regard.

Traditionally, differential equations are widely used for most physical models. In order to predict the time series, the modern machine learning—or, more precisely, the deep ANN (i.e., with more than one hidden layer)—is trained to represent continuous dynamical systems (E, 2017). One of the major tasks in deep learning is to learn the solutions of the differential equations as time series; and to go a step further, another advanced task it to learn the differential equations themselves, representing by a class of functions.

In the field of astronomy and astrophysics, the method of machine learning has already been widely applied to exoplanet discovery (Schanche et al., 2019; Armstrong, Gamper & Damoulas, 2021), galaxy classification (Zhang et al., 2019; Baqui et al., 2021; Vavilova et al., 2021), research of dark matter (Agarwal, Davé & Bassett, 2018; Lucie-Smith, Peiris & Pontzen, 2019; Petulante et al., 2021). However, for the classical n-body problem in celestial mechanics, the relative study did not start until very recently. Greydanus, Dzamba & Yosinski (2019) studied integrable Hamiltonian systems where the motion is regular everywhere, e.g. the two-body problem. They designed “Hamiltonian neural networks (HNN)” to learn the laws of physics directly from data, and also the conservation of the energy-like quantities on a long timescale. They applied the trained HNN to predict the position and momentum variables ${q, p}$ of future orbits. Nevertheless, their method doesn’t work very well on the 3-body problem, which is actually non-integrable. A few months later, Breen et al. (2020) declared that they could solve the general 3-body problem by means of training a deep ANN. For the planar case, they demonstrated that the deep ANN can provide solutions as accurate as those from the numerical integration within much less computational time. While we notice that their best-performing ANN is trained for trajectories with a relatively short evolution time of $T = 3.9$ time units.

The fundamental difficulty in learning the motion of the 3-body problem is that, the system could be chaotic, i.e. the slight changes in the initial conditions can lead to significantly different long-term trajectories (Poincaré, 1892). In the framework of the planar restricted 3-body problem, we studied the behaviors of objects in the 2:3 mean motion resonance with Neptune, which can be considered as an intermediate case of regular motion (Li et al., 2022). By learning the solutions of the differential equations, in terms of the orbital elements, our designed ANN can predict the trajectories of the 2:3 resonators as long as a entire resonant period, over $\sim 20000$ yr. Initially, we tried to use the position and momentum variables as in previous works, but we found that the predicted solutions can not guarantee the conservation of the Jacobi integral. Then we turned to use the orbital elements as training data, because in this way we actually assumed that the system is nearly the perturbed two-body problem. With this trick, we could predict the long-term trajectory along which the Jacobi integral is almost conserved. However, the weakness of employing the orbital elements is that, the semimajor axis and eccentricity are always on the order of 0.1-1, while the other two angles could increase to the values $≫ 1$ when the evolution time is long, then the difference in the orders of magnitudes of these quantities would not be suitable for the training process. So we were wondering, if our training data set comprising position and momentum variables, which are always on the same order of magnitude, how to make sure the Jacobi integral is conserved for the predicted orbits, i.e. keeping the system symplectic.

Very recently, Chen & Tao (2021) considered the learning and prediction of the non-linear time series generated by a latent symplectic map. They studied the Hamiltonian systems whose solution flows give such symplectic maps. They predicted the future trajectory by representing the symplectic map via a generating function, which is approximated by the so-called generating function neural network (GFNN). Chen & Tao (2021) has compared their method with some other works which perform quite well in learning the Hamiltonian systems, e.g. HNN (Greydanus, Dzamba & Yosinski, 2019) and an independent work (Bertalan et al., 2019), SRNN (Chen et al., 2020), SympNets (Jin et al., 2020), and Lutter, Ritter & Peters (2019), Toth et al. (2020), Zhong, Dey & Chakraborty (2020), Wu, Qin & Xiu (2020), Xiong et al. (2021), etc. All of these networks, except SympNets, are designed to learn some quantities that produce the Hamiltonian vector field. While the GFNN applies generating function along the evolution step by step, and the results show that it performs better on several examples compared to the other works mentioned above; while an intrinsic issue is that, the accumulation of error grows linearly in comparison with the ground truth obtained by the numerical simulation.

We then are thinking about that whether we could develop the idea “generating function” for learning the symplectic evolution from the position and momentum variables. We are about to make a major adjustment to Chen & Tao (2021): for the trajectories in the training set, we divide each trajectory into several partitions, and then deduce the generating function between the first and one of the rest partition elements. In this way, the machine could learn a class of generating functions for the entire trajectory, i.e. the time series evolution. Since each of the partition elements, at different time windows, is equally (symplectically) mapped from the first one, the accuracy of our prediction should be kept in a similar level but not increase with time. As we will show later, for the same data set of the 3-body problem provided by Chen & Tao (2021), the overall loss of our method can achieve a value that 2-3 times smaller than that of the GFNN. At this point, we may solve the problem raised in our previous work (Li et al., 2022), that we can use the training data consisting of the positions $q$ and the momenta $p$ to predict the long future orbits with the required accuracy, and also make sure the evolution is symplectic, i.e. keeping the Jacobi integral constant.

The rest of this paper is organised as follows: in Section 2, we introduce the mathematical background about linear Hamiltonian systems, including the symplectic map and generating function, and how to design the ANN to predict the future time series. In Section 3, for a dynamical model in the framework of planar restricted 3-body problem, we compare the performance of our designed ANN with the GFNN from Chen & Tao (2021). In Section 4, we apply this ANN to the evolution of the non-resonant and resonant Kuiper belt objects (KBOs), and show that the orbit prediction does have stable error even for quite a long time interval, and the Jacobi integral is also well conserved. The conclusions and discussion are presented in Section 5.

2 Methods

2.1 Hamiltonian system

We first review some basic mathematical background of Hamiltonian systems (Meyer, Hall & Offin, 2009). Let $g l (m, R)$ denote the set of all $m \times m$ non-singular matrices with entries in the real number field $R$ . Let $z$ be a coordinate vector in $R^{2 n}$ , $I$ be a time interval in $R$ , and the matrix $K : I \to g l (2 n, R)$ be continuous and symmetric. Then a Hamiltonian system can be described by $2 n$ ordinary differential equations:

˙ z = J \frac{\partial H}{\partial z} = J K (t) z = A (t) z,

(1)

where $H = H (z, t)$ is called the Hamiltonian of the system, and $J$ is the $2 n \times 2 n$ Poisson matrix:

J = [\begin{matrix} 0 & - I_{n} I_{n} & 0 \end{matrix}],

(2)

where $I_{n}$ is the $n \times n$ identity matrix.

Let $t_{0} \in I$ be the initial time point. From the theory of differential equations, if the right function $A (t) z$ is $C^{1}$ , or the Lipschitz condition in the variable $z$ is satisfied, given the initial condition $z_{0} = z (t_{0}) \in R^{2 n}$ , there exists a unique solution $z (t, t_{0}, z_{0})$ of equation (1) for $t \in I$ . We say the time evolution of $z$ is symplectic if and only if the Jacobi matrix of the transformation $z^{'}$ satisfies

(z^{'}) J (z^{'})^{T} = J .

(3)

As for the Hamilton’s equation (1), this condition is naturally satisfied.

In the current work, the latent evolution of a Hamiltonian system is considered. Let

z (t) = (q (t), p (t)),

(4)

where $q$ is called the coordinate and $p$ is called momentum, and these two variables are said to be conjugated. Then the system of equation (1) can be written in the form

˙ p (t) = - \frac{\partial H}{\partial q (t)}, ˙ q (t) = \frac{\partial H}{\partial p (t)} .

(5)

Assuming each time series $(q (t), p (t))$ is sampled at discretized time points, with a time step $h (> 0)$ , then we have

q_{i} = q (t_{i}), p_{i} = p (t_{i}) (t_{i} = i h) .

(6)

Now we consider the one step mapping

ϕ : (q (t_{i}), p (t_{i})) \to (q (t_{i + 1}), p (t_{i + 1})), \forall t_{i} .

(7)

Since the time series $(q (t), p (t))$ is symplectic, the corresponding mapping $ϕ$ is symplectic.

In our previous work (Li et al., 2022), the solution of the restricted 3-body problem is learned by multi-layer neural networks. We tried to train the ANN using the data of positions and velocities (equivalent to $q$ and $p$ , respectively). Although the prediction could be very closed to the ground truth data, the Jacobi integral seems to vary greatly along the orbital evolution, i.e. the mapping $ϕ$ fails to be symplectic. While we notice that the values of positions and velocities have the same order of magnitude, and this may be a positive feature in machine learning.

Now, instead of learning $q$ and $p$ directly, similar to Chen & Tao (2021) we turn to learn a class of mappings $ϕ$ . Then the time series $(q (t_{i}), p (t_{i}))$ can be derived step-by-step via the symplectic mapping $ϕ$ , thus the symplectic structure of the system evolution can be naturally kept. In addition, we need to control the accuracy of prediction of $q$ and $p$ in a small scale. Indeed, we can also say $ϕ$ in equation (7) is a canonical transformation, then we will adopt the classical technique “generating function” in the following.

2.2 Symplectic mapping and generating function

For clarity, here we consider the multidimensional vectors [ $q$ , $p$ ], where $q, p \in R^{n}$ . Corresponding to the 2 $n$ Hamiltonian system, $q$ is the position variable, $q_{i}$ is the conjugate momentum variable. Then the standard symplectic form is defined to be

Ω = d q \land d p,

(8)

where the operator $\land$ is called the exterior product or wedge product. Let $Q$ and $P$ be new variables, and we now investigate substitutions of the form

Q = Q (q, p), P = P (q, p) .

(9)

The change of variables in equation (9) is volume-preserving if and only if

d q \land d p = d Q \land d P,

(10)

which actually indicates that the transformation $(q, p) \to (Q, P)$ is symplectic. By equation (10), we can easily have

d p \land d q + d Q \land d P = 0.

(11)

This is sufficient to make sure the form $p d q + Q d P$ is exact, i.e. if we can find a function $F$ that satisfies

d F (q, P) = p d q + Q d P .

(12)

Since the differential form of $F$ can be written as

d F = \frac{\partial F}{\partial q} d q + \frac{\partial F}{\partial P} d P,

(13)

we can obtain the change of variables from $(q, p)$ to $(Q, P)$ , by

p = \frac{\partial F}{\partial q} (q, P), Q = \frac{\partial F}{\partial P} (q, P) .

(14)

By the implicit function theorem, when the Hessian matrix $\frac{\partial^{2} F}{\partial q \partial P}$ is non-singular, the equations in (14) are solvable for $P$ as a function of $q$ and $p$ and for $p$ as a function of $Q$ and $P$ . So equation (14) gives a straightforward way to construct the symplectic transformation, and the function $F$ is usually called generating function. For more details, the reader is referred to Chapter 6 in Meyer, Hall & Offin (2009).

2.3 Leap-frog neural network (LFNN)

In Chen & Tao (2021), the authors designed the network GFNN to learn the generating function, in order to obtain a symplectic mapping from one time point to the next. Then step by step, the system can continue to evolve with time. More precisely, for the one time step evolution, the mapping can be described as:

(q_{0}, p_{0}) \to (Q_{1}, P_{1}),

then denoting

{~ q}_{1} = Q_{1}, {~ p}_{1} = P_{1},

the next step is

({~ q}_{1}, {~ p}_{1}) \to (Q_{2}, P_{2}) .

This process could go on until

({~ q}_{N - 1}, {~ p}_{N - 1}) \to (Q_{N}, {% P}_{N}),

Here, the subscript $i$ corresponds to the $i$ -th time point $t_{i}$ in equation (7). We note that, when $N$ is large, i.e. for the long time evolution, the above prediction of the time series could cause severe accumulative error.

Figure 1: The structure of our designed ANN. The observed data consisting of the position $q$ and momentum $p$ is the input. The neural network contains 6 fully connected layers and learns to conserve a quantity which approximates the modified generating function $S$ . Then we use $S$ to predict the future orbit points ( $Q$ , $P$ ) by the symplectic evolution map.

Suppose we have the training set containing a time series in the interval $[t_{0}, t_{99}]$ , i.e. for $N = 99$ , with a fixed time step of $h$ . Different from Chen & Tao (2021), now we consider a finite partition of this time series, which is adopted to be 4 here. Thus we would get 4 subsets of the time series $(q_{i}, p_{i})$ :

Partition 0 : i = 0, \dots, 24, i.e. t \in [t_{0}, t_{24}],

Partition 1 : i = 25, \dots, 49, i.e. t \in [t_{25}, t_{49}],

Partition 2 : i = 50, \dots, 74, i.e. t \in [t_{50}, t_{74}],

Partition 3 : i = 75, \dots, 99, i.e. t \in [t_{75}, t_{99}] .

Figure 2: We split the time series $(q_{0}, p_{0}), \dots, (q_{99}, p_{99})$ at the respective epochs $t \in [t_{0}, \dots, t_{99}]$ into 4 partitions. Partition $0$ is set to be the initial condition, which is used to make the prediction for each of the rest partitions ( $1$ , $2$ and $3$ ) separately. The prediction is achieved by learning the modified generating function $S$ as we illustrated in Fig. 1. Since each $S^{j} (j = 1, 2, 3)$ represents the symplectic transformation between Partition $0$ and Partition $j$ , we have the advantage that the accumulative error could not accumulate from Partition $1$ to Partition $3$ .

Our new idea is to learn the generating function $F$ from the respective partitioned data, then $F$ is used to predict the time series. This process is visualized in Figs. 1 and partition. Let Partition $0$ be the initial condition, we first study the generating function $F^{1}$ between Partition $0$ and Partition $1$ . When we have $F^{1} (q, P) = q \cdot P + S^{1} (q, P)$ , and $\frac{\partial^{2} F^{1}}{\partial q \partial P}$ is non-singular, we can learn $S^{1}$ by the neural network with respect to the loss function:

\begin{matrix} L_{S^{1}} & = \frac{1}{25} 24 \sum i = 0 (| | \partial_{P} S^{1} (q, P) - (Q - q) | |_{2}^{2} + | | \partial_{q} S^{1} (q, P) - (p - P) | |_{2}^{2}) . \end{matrix}

Then, Partition $1$ can be generated by:

{\begin{matrix} p = P + \partial_{q} S^{1} (q, P) Q = q + \partial_{P} S^{1} (q, P) \end{matrix},

i.e.

{\begin{matrix} p_{i} = p_{i + 25} + \partial_{q_{i}} S^{1} (q_{i}, p_{i + 25}) q_{i + 25} = q_{i} + \partial_{p_{i + 25}} S^{1} (q_{i}, p_{i + 25}) \end{matrix}, i = 0, . . ., 24,

and accordingly we would obtain the symplectic mappings

(q_{0}, p_{0}) \to (q_{25}, p_{25}), \dots, (q_{24}, p_{24}) \to (q_{49}, p_{49}) .

(15)

If the generating function $S^{1}$ is approximated with error $\leq ϵ$ in the first derivative, by the designed ANN, the $L_{2}$ norm of the deviation between the predicted sequence

{({~ q}_{25}, {~ p}_{25}), \dots, ({~ q}_{49}, ~ p_{49})}

and the ground truth sequence

{(q_{25}, p_{25}), \dots, (q_{49}, p_{49})}

satisfies, for any $i \in [25, 49]$ ,

| | p_{i} - {~ p}_{i} | |_{2} \leq C \cdot ϵ,

and

| | q_{i} - {~ q}_{i} | |_{2} \leq C \cdot ϵ,

where $C$ is some constant.

It should be noted that, if we are predicting Partition $1$ by the method we described above, we need to know the momenta $p_{25}, \dots, p_{49}$ . Since we do not have the ground truth for $p_{i} (i \geq 25)$ , only the approximate values ${~ p}_{25}, \dots, {~ p}_{49}$ could be estimated. In general, there are two ways to obtain them. The first one is that, starting with the initial condition $Partition 0 : (q_{0}, p_{0}), \dots, (q_{24}, p_{24})$ , we use the Newton’s method to solve the following nonlinear equations:

{~ p}_{25} = p_{0} - \partial_{q_{0}} S^{1} (q_{0}, {~ p}_{25}),

⋮

{~ p}_{49} = p_{24} - \partial_{q_{24}} S^{1} (q_{24}, ~ p_{49}) .

By this procedure we could get ${~ p}_{25}, \dots, {~ p}_{49}$ . The other way is more direct, as we can just train an additional fully connected neural network to predict these approximate momenta by using the supervised learning like we did in Li et al. (2022).

Exactly the same, the Partition $2$ and Partition $3$ can also be predicted from Partition $0$ , respectively, by introducing the individual generating functions $F^{2} (q, P) = q \cdot P + S^{2} (q, P)$ and $F^{3} (q, P) = q \cdot P + S^{3} (q, P)$ . Then we could obtain the remaining two groups of symplectic mappings

(q_{0}, p_{0}) \to (q_{50}, p_{50}), \dots, (q_{24}, p_{24}) \to (q_{74}, p_{74}) .

(16)

and

(q_{0}, p_{0}) \to (q_{75}, p_{75}), \dots, (q_{24}, p_{24}) \to (q_{99}, p_{99}) .

(17)

In summary, for Partition $1$ , $2$ and $3$ , if the respect generating functions $S^{1}$ , $S^{2}$ and $S^{3}$ are approximated with errors $\leq ϵ$ , the loss between the ground truth and the prediction satisfies the following inequalities:

| | p_{i} - {~ p}_{i} | |_{2} \leq C \cdot ϵ,

and

| | q_{i} - {~ q}_{i} | |_{2} \leq C \cdot ϵ,

(18)

where $i \in [25, 99]$ .

The learning scheme designed above would be called the leap-frog neural network (LFNN). The main advantage of training the LFNN is that, we may avoid the issue of accumulative error since the symplectic mappings (15), (16) and (17) actually have no iteration, as we will show in the Section 3 by comparing our results with Chen & Tao (2021). Besides, the global error is rather small and the predictions of the time series in more general cases seem quite accurate.

2.4 Global Error formula for prediction

The considered sequence $(q_{i}, p_{i})$ has 100 time points in total (i.e. $i = 0$ - $99$ ), and is divided into 4 partitions. Thus, for each of the mappings (15), (16) and (17), the LFNN model is predicting a time series with a length of 25, e.g. $i = 25$ - $49$ for $Partition 1$ ; in addition, since we will later consider the planar dynamical models, either $q_{i}$ or $p_{i}$ has two components.

Then for each of the partitions ( $j = 1, 2, 3$ ), we can define two $25 \times 2$ matrices as:

Q_{s i m}^{j} (25 \times 2) = {q_{25 \cdot j}, q_{25 \cdot j + 1} . . ., q_{25 \cdot j + 24}},

P_{s i m}^{j} (25 \times 2) = {p_{25 \cdot j}, p_{25 \cdot j + 1} . . ., p_{25 \cdot j + 24}},

where the subscript “sim” denote the ground truth data from numerical simulations. Similarly, we can also define the matrices predicted by the LFNN as:

Q_{L F N N}^{j} (25 \times 2) = {{~ q}_{25 \cdot j}, {~ q}_{25 \cdot j + 1} . . ., {~ q}_{25 \cdot j + 24}},

P_{L F N N}^{j} (25 \times 2) = {{~ p}_{25 \cdot j}, {~ p}_{25 \cdot j + 1} . . ., {~ p}_{25 \cdot j + 24}} .

If we have the $25 \times 2$ matrices $A$ and $B$ , the distance between them can be defined as:

D (A, B) = \frac{1}{25} 25 \sum k = 1 \frac{\sqrt{\sum_{l = 1}^{2} (A_{k l} - B_{k l})^{2}}}{2} .

Then, for a specific partition, we can calculate the average loss between prediction ( $Q_{L F N N}^{j}$ and $P_{L F N N}^{j}$ ) and the ground truth ( $Q_{s i m}^{j}$ and $P_{s i m}^{j}$ ). When we consider just one trajectory, the averaged losses are:

	$l o s s (Q^{j}) = D (Q_{L F N N}^{j} (25 \times 2), Q_{s i m}^{j} (25 \times 2)),$
	$l o s s (P^{j}) = D (P_{L F N N}^{j} (25 \times 2), P_{s i m}^{j} (25 \times 2)) .$		(19)

And when we consider $M$ trajectories, the average losses are:

	$L O S S (Q^{j}) = \frac{\sum_{m = 1}^{M} l o s s (Q_{m}^{j})}{M},$
	$L O S S (P^{j}) = \frac{\sum_{m = 1}^{M} l o s s (P_{m}^{j})}{M} .$		(20)

Hereafter, without the superscript $j$ , we refer to the loss calculated for all the three partitions, i.e. $i = 25$ - $99$ .

The said losses will be monitored on both the training and validation sets, and the the hyperparameters of our LFNN would be refined to make sure the global errors can decrease to small values. As a results, the precision of prediction may reach an acceptable level. In the next section, we will evaluate the prediction errors for Partitions $1$ , $2$ and $3$ , respectively.

3 Learning improvement

Partition $j$	Time interval	Network	$l o s s (Q^{j})$
$j = 1$	$[t_{25}, t_{49}]$	GFNN	0.003530
		LFNN	0.003334
$j = 2$	$[t_{50}, t_{74}]$	GFNN	0.005018
		LFNN	0.004090
$j = 3$	$[t_{75}, t_{99}]$	GFNN	0.007859
		LFNN	0.003086

Table 1: For the prediction of the test orbit from the same dynamical model, the averaged losses

l o s s (Q^{j})

(see equation (2.4)) from the GFNN in Chen & Tao (2021) and the LFNN in this work, on the

j

-th partition illustrated in Fig. 2.

Having desinged the LFNN, we now apply this network to a dynamical model considered in Chen & Tao (2021), i.e. the planar circular restricted 3-body problem (PCR3BP). And we expect to see the improvement on the GFNN for learning the motion of this system.

In the framework of PCR3BP, the primary $m_{1}$ and the secondary $m_{2}$ are moving on circular orbits about their common centre of mass $O$ . These two massive bodies exert gravitational forces on the third massless particle but they are not affected by the particle. The motions of all these three bodies take place in the same plane. It is customary to choose non-dimensional parameters: the total mass of the system $(m_{1} + m_{2}) = 1$ ; and for the primary and secondary, the mutual distance $a_{N} = 1$ , and their angular velocity $n_{N} = 1$ .

Considering the synodic coordinate system $(x, y)$ , which has the origin located at $O$ but is rotating at a uniform rate $n_{N}$ in the counterclockwise direction, then the primary and secondary always lie along on the $x$ -axis at $(- μ, 0)$ and $(1 - μ, 0)$ , respectively. Here, the mass ratio $μ = m_{2} / (m_{1} + m_{2})$ is the only adjustable parameter in the PCR3BP, e.g. for the system of Earth-Moon, Sun-planet or binary stars. Within one of these systems, the equations of the motion of the third massless particle can be written as

	$¨ x - 2 ˙ y$	$=$	$\frac{\partial U}{\partial x},$
	$¨ y + 2 ˙ x$	$=$	$\frac{\partial U}{\partial y},$		(21)

where the “pseudo-potential” $U = U (x, y)$ is

U = \frac{1}{2} (x^{2} + y^{2}) + \frac{1 - μ}{r_{1}} + \frac{μ}{r_{2}},

(22)

and $r_{1}$ and $r_{2}$ are the particle’s distances to the primary and secondary, respectively:

	$r_{1}^{2}$	$=$	$(x + μ)^{2} + y^{2},$
	$r_{2}^{2}$	$=$	$(x + μ - 1)^{2} + y^{2} .$		(23)

It is well known that there is only one integral of the differential system (21), i.e. the Jacobi integral

C (x, y, ˙ x, ˙ y) = 2 U (x, y) - ({˙ x}^{2} + {˙ y}^{2}),

(24)

which is equivalent to the Hamiltonian quality of the PCR3BP. The preservation of $C$ guarantees that the orbital evolution of the particle is symplectic.

We briefly review the settings of the 3-body system in Chen & Tao (2021) for complement: (1) the primary and secondary have the equal masses, i.e., $μ = 0.5$ , and they form a binary; (2) for the particle, let $q = (x, y)$ be the position and $p = (˙ x, ˙ y)$ be the corresponding conjugate momentum; (3) for an orbit staring with $q = (5, 0)$ and $p = (0, \frac{1}{\sqrt{4.5}})$ , a number of time points have been recorded. Then these time points are adopted to be the initial conditions $q_{0}$ and $p_{0}$ to build the orbit database for the machine learning. In this way, $10^{5}$ and $10^{2}$ orbits are generated for the training set and the validation set, respectively; these ground truth orbits are numerical calculated by the 4th order Runge-Kutta integrator with a step size of $10^{- 3}$ ; (4) for each orbit in the training and validation sets, a step of $h = 0.1$ between two successive time points is adopted.

We notice that the two massive bodies locates at $(- 0.5, 0)$ and $(0.5, 0)$ , i.e. the distances to $O$ are 0.5, while the particle is about ten times farther away from $O$ . Then such a hierarchical triple system can be deemed to an approximate 2-body system, where the particle is orbiting the center of mass $O$ . This implies that the evolution of the particle is in the non-chaotic regime of PCR3BP, and the regular motion could be easily learned by the neural network.

All of our experiments are taken in PYTORCH on a desktop CPU workstation. Similar to Chen & Tao (2021), in order to learn the generating function, we construct the LFNN consists of 6 fully connected layers. There are respectively 200, 100, 50, 20 neurons in each hidden layers, where the activation function $tanh$ is used. We set the batch size to be 200, and the Adam optimizer is applied.

As described in Section 2.3, we split the sequence $(q_{i}, p_{i}) (i \in [0, 99])$ into 4 partitions. Given the initial condition in the time interval $t \in [t_{0}, t_{24}]$ (i.e. Partition $0$ ), we are going to compare our results in the later evolution time of $t \in [t_{25}, t_{99}]$ (i.e. Partitions $1$ , $2$ and $3$ ) with Chen & Tao (2021). We first make the prediction on $Partition 1$ , when the generating function $S^{1}$ has been learned, we would apply Newton’s method for

P + \partial_{q} S^{1} (q, P) - p = 0.

(25)

Let $p = p_{0}$ and $P =_{25}$ , and the initial point of the iteration (25) be $p_{24}$ , we can get ${~ p}_{25}$ . Then in the formula

Q = q + \partial_{P} S^{1} (q, P),

(26)

when we set $q = q_{0}$ , then we can have ${~ q}_{25} = Q$ . In the next step, let $p = p_{1}$ and $q = q_{1}$ , we can obtain $P =_{26}$ and $Q = {~ q}_{26}$ ; here the initial point for Newton’s method (25) is taken to be ${~ p}_{25}$ . This process continues until $i = 49$ , and the prediction on $Partition 1$ is achieved.

Independently, we then consider the prediction on $Partition 2$ . Also assume $Partition 0$ to be initial data, we repeat the above process to generate the symplectic mappings 16. Similarly, the symplectic mappings 17 for $Partition 3$ can be obtained as well.

After finishing the network training, we pick the same test orbit with $q_{0} = (5.01, - 0.01), p_{0} = (0, \frac{1}{\sqrt{4.51}})$ as considered in Chen & Tao (2021). For this test orbit, the averaged losses of $Q^{j}$ from both our best-trained LFNN and the GFNN are presented in Table 1.

It is obvious that, our method has quite small losses among all the 3 partitions; and most importantly, these losses corresponding to different periods of the orbit prediction are nearly the same, around 0.003-0.004. For the early evolution on Partition $1$ , i.e. $t \in [t_{25}, t_{49}]$ , the GFNN has the comparable loss to our results, about 0.0035. While for the later evolution on Partition $2$ , i.e. $t \in [t_{50}, t_{74}]$ , the GFNN’s loss increases to about 0.0050, which is $\sim 43$ per cent larger. And, for the end of the evolution on Partition $3$ , i.e. $t \in [t_{75}, t_{99}]$ , this loss continues to increase, up to a much larger value of 0.0079. From this point of view, the error of GFNN’s prediction seems to grow with time, while our LFNN’s prediction has nearly a stationary error. Thus our results show an advantage to predicting the long time evolution via the multiple-steps learning.

4 Applications

Beyond the orbit of Neptune, there exists a disk of small bodies orbiting the Sun, known as the Kuiper belt. By now, thousands of the Kuiper belt objects (KBOs) have been observed, and they can be classified by their orbital characteristics: the resonant, the classical (i.e. non-resonant), the scattered, and the detached objects. The studies of the KBOs’ dynamical evolution may provide important clues to the early history of our Solar system.

We now intend to use the designed LFNN to learn the behavior of the KBOs outside or in the mean motion resonance (MMR) with Neptune. In the framework of the PCR3BP, we refer to the Sun with mass $m_{1}$ as the primary, the planet Neptune with mass $m_{2}$ as the secondary, and a KBO as the third body. So in the equations of the motion of the KBO particle, i.e. equation (21), the mass ratio $μ \approx 5.146 \times 10^{- 5}$ . Then we can generate the training and validation data by numerical simulations. Still, we take 100 time points for each KBO’s orbit, for a certain evolution time.

4.1 Non-resonant KBOs

The classical KBOs have semimajor axes roughly between 42 AU to 48 AU, and are defined to be objects outside Neptune’s MMRs. In the PCR3BP, we consider the particles having initial non-dimensional semimajor axes $a = 1.44$ . Since the distance between the Sun and Neptune, i.e. $\sim 30$ AU, is set to be 1, the particles’ actually semimajor axes can be calculated to be about 43.2 AU. At this location, none of the MMRs with order lower than 8th exists. Besides, the KBO particles are placed on the nearly circular orbits with eccentricities $e = 0.01$ , to additionally make sure that they are non-resonant.

As in Section 3.1, the ground truth orbits are generated by using the 4th order Runge-Kutta integrator; and the the numbers of the orbits in the training and validation sets are $10^{5}$ and $10^{2}$ , respectively. For the 100 time points with a step of $h = 0.1$ , the total evolution time is 10 in the non-dimensional unit. According to Kepler’s third law, the orbital period of a particle is $\sqrt[3]{1.44 / 1} \times 2 π \approx 7.1$ . It suggests that the particles complete more than 1 full orbit over the evolution time of 10.

For the said particles starting with $a = 1.44$ and $e = 0.01$ , their initial longitudes of perihelia $ϖ$ are set to be $60^{\circ}$ , while the mean longitudes $λ$ are randomly selected between 0 and $2 π$ . Given such initial conditions, we generate the training and validation samples, which are out of the resonance regions.

Then we are about to re-trained our best LFNN model built on the generating function. Bearing in mind that, in order to make the symplectic transformations via the obtained generating function $S (q, P)$ , we need to know the new momentum $P$ , i.e. ${{~ p}_{25}, \dots, {~ p}_{99}}$ . If we apply the Newton’s method to the 100 validation orbits, in order to get the losses of $P$ and $Q$ , it is not easy to find the proper values of $P$ . This difficulty comes from the fact that, when we adopt the leap-frog learning, the gradient of the generating function could be not accurate enough. As we mentioned in Section 2.3, alternatively we could train an additional supervised neural network to learn the approximate $P$ . Then by $Q = q + \partial_{P} S (q, P)$ , the new position $Q$ can be predicted.

In Fig. 3, for each of the 100 trajectories from the validation set, we plot averaged losses of the predicted positions $~ q$ (i.e. loss(Q), see blue dots) and momenta $~ p$ (i.e. loss(P), see yellow dots). We find that the maximal loss is less than 0.0009, which is associated to $~ q$ . We further notice that the losses of $~ p$ are generally even more smaller, on the order of loss $(Q) \sim$ 0.0001-0.0005. Comparing with the case of “binary + particle” in Section 3.1, where loss $(Q)$ is an order of magnitude larger, the LFNN seems to work better on learning the evolution of non-resonant KBOs.

Figure 3: Learning the motion of non-resonant KBOs by the LFNN. For the time step of $h = 0.1$ , the averaged losses of the predicted orbits of the 100 validation samples. The blue dots refer to the position loss $l o s s (Q)$ , and the yellow dots indicate the momentum loss $l o s s (P)$ , as defined by equation (2.4).

4.2 Resonant KBOs

The population of resonant KBOs is believed to be a solid evidence for the orbital migration of Jovian planets. The study of the resonant KBOs is very important for us to understand the early shake-up of the outer Solar system, and a lot of works have been carried out even since early 1990s (Malhotra, 1993, 1995; Hahn & Malhotra, 2005; Li, Zhou & Sun, 2014; Nesvorný & Vokrouhlický, 2016; Pike & Lawler, 2017; Lawler et al., 2019). For this reason, in (Li et al., 2022) we provide a machine learning method that can fast predict the orbital evolution of the KBOs in the 2:3 MMR with Neptune. In the framework of the PCR3BR, since the trajectories of the 2:3 resonators are restricted on the invariant tori, their motions are regular and were learned rather accurately.

As we stated in Section 1, we intended to use the position $q$ and momentum $p$ as training data because the same order of magnitude of these variables could be more suitable for the machine learning. However, for $q$ and $p$ , our designed ANN in Li et al. (2022) can not keep the system symplectic by conserving the Jacobi integral. Now we would test how the new LFNN performs on predicting the time series $(q_{i}, p_{i})$ .

For the samples particles, their initial conditions are exactly the same as we used in (Li et al., 2022): $a_{0} = 1.31$ (the location of the nominal 2:3 resonance), $e_{0} = 0.1$ (for regular motion everywhere in the phase space), $ϖ_{0} = 60^{\circ}$ (unchanged by rotating the coordinate system); and, the initial resonant angles $σ_{0}$ are chosen in the region of $[180^{\circ} - 120^{\circ}, 180^{\circ} + 120^{\circ}]$ , where $180^{\circ}$ is the stable libration centre. Then the initial value of $λ$ can be determined from a given $σ$ by

σ = 3 λ - 2 λ_{N} - ϖ,

(27)

where $λ_{N}$ is the mean longitude of Neptune. For a given set of the orbital elements $(a, e, ϖ, λ)$ , the initial conditions of the resonators are generated in terms of the position $x$ and velocity $v$ . Then we numerically propagate the equations of motion, i.e., equation (21), by adopting the 8th-order Runge-Kutta integrator with a time step of $\sim 0.01$ yr and the local error tolerance of $10^{- 20}$ .

In order to learn the behaviour of the resonant KBOs, the time evolution should cover at least a full libration cycle. For the 2:3 resonators, the libration period is about 20000 yr, corresponding to the non-dimensional time of $\sim 762.5$ . While for the two applications considered before, the evolution time is only 10. This is the largest difference in predicting the particle’s orbit by our LFNN. In order to fulfill this task, we here increase the time step $h$ to be 9.65. Then the 100 time points lead to a total of evolution time of 965, which corresponds to the period of 25000 yr that we adopted in (Li et al., 2022). We additionally note that, the LFNN uses Partition $0$ as the initial condition to predict Partitions $1$ , $2$ and $3$ , this is also consistent to our previous work that utilize the known first 1/4 period of the orbit to predict the subsequent 3/4 period data.

Figure 4: Similar to Fig. 3, but for predicting the motion of resonant KBOs in the LFNN model. By adopting a very large time step of $h = 9.65$ , the total evolution time of the resonant KBOs is 25000 yr, which is longer than a full libration period. Here, the horizontal axis shows the initial resonant angle $σ_{0}$ from $60^{\circ}$ to $300^{\circ}$ , centered at the $180^{\circ}$ resonant centre.

orbital elements. For the case of e = 0.1, three examples of the temporal evolution of Jacobi integral for particle with $σ_{0} = 60^{\circ}$ from the validation set. The initial values for the first 6250 yrs are provided by the numerical integration (black dots). For the following 18750 yrs, the orbital parameters are predicted by the trained LFNN (green dots) and calculated by current method (red dots).

Similar to the LFNN designed for the non-resonant KBOs, but for the time series $(q_{i}, p_{i}) = (x_{i}, v_{i})$ with a much larger step $h$ , our new machine-learning method is further applied to the 2:3 resonant KBOs. The losses on the validation set for 100 orbits with different $σ_{0}$ are presented in Fig. 4. The blue and yellow dots indicate $l o s s (Q)$ and $l o s s (P)$ , respectively, calculated for the 75 predicted time points $({~ q}_{25}, {~ p}_{25}), \dots, ({~ q}_{99}, {~ p}_{99})$ . We can see that: (1) the maximum loss( $Q$ ) $\sim 0.08$ and loss( $P$ ) $\sim 0.05$ are at both ends of the curves; (2) there are two valleys where the loss reaches the minimum of the order of $\sim 0.01$ ; (3) there are two peaks as the local maximum. The profile of the loss curve looks very similar to that in Fig. 3 from Li et al. (2022). While, although the magnitude of the loss shown here is also similar, we can not directly compare the performance of the LFNN with the ANN in our previous work. This is because the data sequences are consisted of positions and velocities, but not orbital elements any more.

To show the improvement of the LFNN, we pick the sample “tp1” starting with $σ_{0} = 60^{\circ}$ , which has the largest loss of loss( $Q \sim 0.08$ ) and loss( $P \sim 0.05$ ) (see Fig. 4). Then we present in Fig. 5 the evolution for this sample from the validation set, and the associated orbital parameters can be simply obtained from the position and velocity variables. The black dots represent the ground truth values calculated by numerical integration. While the red dots denote the prediction made by the LFNN for the time period from $t_{25}$ to $t_{99}$ , where $t_{99}$ corresponds to the real time of 25000 yr. As a comparison, we apply the fully connected ANN in (Li et al., 2022) to an additional prediction, and the outputs of this ANN are indicated by the green dots.

The upper panel of Fig. 5 shows the time evolution of the Jacobi integral $C$ for the sample tp1. It is quite clear that the values of $C$ deduced from our previous method (green dots) seem to considerably depart from the ground truth (black dots), indicating that the conservation of $C$ can hardly hold. But, with regard to our new LFNN, the approximate $C$ values (red dots) are different from the ground truth values only by less than 0.01 at any time point. Therefore, the first improvement of the LFNN on predicting the motion of the 2:3 resonators is that, the symplecticity of the orbital evolution can be well maintained.

As shown in the lower four panels of Fig. 5, the second improvement is the increased accuracy on the predictions of the orbital elements $a$ , $e$ , $ϖ$ and $M$ ( $M = λ - ϖ$ is the mean anomaly). We can see that, the red dots are much closer to the black dots representing the ground truth than the green dots, especially for the slow variables $a$ and $e$ . In this sense, the designed LFNN can help to significantly reduce the errors for making the long time prediction, and thus is better than the simple fully connected ANN.

In a brief summary, we address that the LFNN works well even when the step $h$ of the time series $(q_{i}, p_{i})$ is relatively large.

Figure 5: For the validation sample tp1 starting with $σ_{0} = 60^{\circ}$ (see Fig. 4), the temporal evolution of Jacobi integral (upper panel) and the orbital elements $a$ , $e$ , $ϖ$ , $M$ (lower four panels). The black dots refer to the ground truth calculated by the numerical simulations. Provided the first 1/4 period of the ground truth orbit as the initial condition, the following 3/4 period data is respectively predicted by the new LFNN (red dots) and the fully connected ANN in Li et al. (2022) (green dots). As in the framework of PCR3BP, the semimajor axis values, $a$ , are in respect with Neptune’s semimajor axis $a_{N} \sim 30$ AU.

5 Conclusions

To improve our previous study of machine learning prediction for the 3-body problem, in which we considered a case of regular motion in the Hamiltonian system (Li et al., 2022), this paper is devoted to maintain the symplecticity of the system’s evolution. In the framework of PCR3BP, Li et al. (2022) tried to train a fully connected ANN to directly learn the time series $(q_{i}, p_{i})$ , where $q$ is the position and $p$ is the conjugate momentum (equivalent to the velocity $v$ ) here. However, the Jacobi constant can not be conserved for the predicted orbits.

Inspired by the GFNN in Chen & Tao (2021), considering the discrete time points $(q_{i}, p_{i})$ ( $i = 0$ -99), the current work learns the symplectic evolution map represented by the generating function. In mathematics, the method of generating function can naturally lead to the symplectic transformation from one time point to another. As an improvement, we further split the 100 time points into 4 partitions, i.e. $Partition 0$ : $(q_{0}, p_{0}), \dots, (q_{24}, p_{24})$ , $Partition 1$ : $(q_{25}, p_{25}) \dots, (q_{49}, p_{49})$ , $Partition 2$ : $(q_{50}, p_{50}) \dots, (q_{74}, p_{74})$ , and $Partition 3$ : $(q_{75}, p_{75}) \dots, (q_{99}, p_{99})$ . Then we build the symplectic evolution maps, from the initial condition of $Partition 0$ to each of $Partitions 1$ , $2$ and $3$ , respectively by the generating functions $S^{1}$ , $S^{2}$ and $S^{3}$ . When the generating functions are approximated by the neural network, we can get three independent groups of the flow maps (15), (16) and (17). This new structure of machine learning is named leap-frog neural network (LFNN).

The designed LFNN is first applied to the same PCR3BP model considered in Chen & Tao (2021), i.e. a massless particle orbiting around a binary with equal masses. For learning the time series $(q_{i}, p_{i})$ , we adopt the same step of $h = 0.1$ as used by the GFNN. The results show that, unlike the prediction of the GFNN, which has an error growing with time, our LFNN’s prediction has a stationary error at rather a small value. We then argue that the LFNN could avoid the issue of accumulative error, and thus may be helpful to increase the accuracy for making the long time prediction. This serves as the main advantage of our LFNN.

As an application of the designed LFNN, we have evaluated the prediction on the behavior of the KBOs beyond the orbit of Neptune. For the non-resonant KBOs, their orbital periods are about 7.1 in the non-dimensional unit. Thus, as in the case of ”binary + particle”, a small time step of $h = 0.1$ for the 100 time points is also adopted to train the network, leading to the evolution time of 10 which is longer than the said period of 1 full orbit. As expected, such a $h$ can result in very accurate prediction, with the loss on the order of $O (0.0001)$ .

But for the 2:3 resonant KBOs, we need to know the evolution over at least a full libration cycle of about 20000 yr. Thus we increase the time step to $h = 9.65$ , which leads to a total evolution time of 25000 yr as we used in Li et al. (2022). Although this $h$ becomes almost 100 time larger, our LFNN can still well work. By comparing with prediction of the ANN used in Li et al. (2022), the LFNN has two significant improvements: (1) the predicted orbits could conserve the Jacobi integral at quite a high level; (2) the predictions of the orbital elements $a$ , $e$ , $ϖ$ and $M$ are much more accurate, even for the validation sample with maximum loss. These results supports our argument that the LFNN may be useful to learn the long time evolution of the Hamiltonian system.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 11973027, 11933001 and 11601159), and National Key R&D Program of China (2019YFA0706601).

Data Availability

The data underlying this article are available in the article and in its online supplementary material.

References

Agarwal, Davé & Bassett (2018) Agarwal S., Davé R., Bassett B. A., 2018, MNRAS, 478, 3410
Ahmed et al. (2010) Ahmed N., et al.,2010, Econometric Reviews, 29(5-6):594-621.
Armstrong, Gamper & Damoulas (2021) Armstrong D. J., Gamper J., Damoulas T., 2021, MNRAS, 504, 5327
Baqui et al. (2021) Baqui P. O. et al., 2021, A&A, 645, A87
Bertalan et al. (2019) Bertalan, T., et al., 2019, Journal of Nonlinear Science, 29(12): 121107
Box & Jenkins (1976) Box G, Jenkins G., Time Series Analysis: Forecasting and Control. Holden-Day; 1976.
Breen et al. (2020) Breen P. G., et al., 2020, MNRAS, 494, 2465
Chen et al. (2020) Chen, Z., et al., 2020, In International Conference on Learning Representations (ICLR), preprint arXiv:1909.13334
Chen & Tao (2021) Chen R., Tao M., 2021, in Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July
E (2017) E W., 2017, Communications in Mathematics and Statistics, 5(1):1-11;
Gardner (1985) Gardner J., 1985, Journal of Forecasting, 4(1):1-28.
Gladman et al. (2008) Gladman B., Marsden B. G., Vanlaerhoven C., 2008, in Barucci M. A., Boehnhardt H., Cruikshank D. P., Morbidelli A., eds, Nomenclature in the Outer Solar System, In The Solar System Beyond Neptune, University of Arizona Press, Tucson, p. 43
Gallardo (2006) Gallardo T., 2006, Icarus, 184, 29
Greydanus, Dzamba & Yosinski (2019) Greydanus S., Dzamba M., Yosinski J., 2019, preprint (arXiv:1906.01563v3)
Harvey (1990) Harvey A.,1990, Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press.
Hahn & Malhotra (2005) Hahn J. M., Malhotra R., 2005, AJ, 130, 2392
Jin et al. (2020) Jin, P., et al., 2020, Neural Networks, 132:166-179
Khain et al. (2020) Khain T. et al., 2020, AJ, 159, 133
Koon et al. (2000) Koon W., et al.,2020, Chaos: An Interdisciplinary Journal of Nonlinear Science, 10(2): 427-469,
Li, Zhou & Sun (2014) Li J., Zhou L.-Y., Sun Y.-S., 2014, MNRAS, 437, 215
Lawler et al. (2019) Lawler et al., 2019, AJ, 157, 253
Li, Holman & Tao (2016) Li G., Holman M. & Tao M., 2016, 831(1): 96
Li et al. (2022) Li X., et al., 2022, MNRAS, 511, 2218-2228
Lutter, Ritter & Peters (2019) Lutter M., Ritter C., and Peters J., 2019, In International Conference on Learning Representations
Lucie-Smith, Peiris & Pontzen (2019) Lucie-Smith L., Peiris H. V., Pontzen A., 2019, MNRAS, 490, 331
Meyer, Hall & Offin (2009) Meyer K., Hall G., Offin D.: 2009, Introduction to Hamiltonian dynamical systems and the N-body problem, 2nd Edition, Springer, Berlin,p45-63
Malhotra (1993) Malhotra R., 1993, Nature, 365, 819
Malhotra (1995) Malhotra R., 1995, AJ, 110, 420
McCulloch & Pitts (1943) McCulloch W. S., Pitts W. , 1943, The Bulletin of Mathematical Biophysics, 5, 115
Nesvorný & Vokrouhlický (2016) Nesvorný D., Vokrouhlický D., 2016, ApJ, 825, 94
Petulante et al. (2021) Petulante A. et al., 2021, MNRAS, 504, 248
Pike & Lawler (2017) Pike R. E., Lawler S. M., 2017, AJ, 154, 171
Poincaré (1892) Poincaré H., 1892, Les methodes nouvelles de la mecanique celeste, Gauthier Villars, Paris, Vols. 1-3
Quarles et al. (2020) Quarles B., et al., 2020, The Astronomical Journal, 159(3):80,
Rosenblatt (1958) Rosenblatt F., 1958, Psychol. Rev., 65, 386
Schanche et al. (2019) Schanche N. et al., 2019, MNRAS, 483, 5534
Toth et al. (2020) Toth P., et al., 2020, In International Conference on Learning Representations (ICLR)
Winters (1960) Winters P., 1960, Management Science, 6(3):324-342.
Wu, Qin & Xiu (2020) Wu K., Qin T., and Xiu D., SIAM Journal on Scientific Computing, 42(6): A3704-A3729.
Xiong et al. (2021) Xiong S., et al., 2021, In International Conference on Learning Representations (ICLR)
Vavilova et al. (2021) Vavilova I. B. et al., 2021, A&A, 648, A122
Zhang et al. (2019) Zhang K. et al., 2019, ApJ, 883, 63
Zhong, Dey & Chakraborty (2020) Zhong Y., Dey B., and Chakraborty, A.,2020, International Conference on Learning Representations(ICLR)