Accelerating differential evolution algorithm with Gaussian sampling based on estimating the convergence points

1^st Rui Zhong Graduate School of Information Science and Technology
Hokkaido University
Sapporo, Japan
2^nd Masaharu Munetomo Information Initiative Center
Hokkaido University
Sapporo, Japan

Abstract

In this paper, we propose a simple strategy for estimating the convergence point approximately by averaging the elite sub-population. Based on this idea, we derive two methods, which are ordinary averaging strategy, and weighted averaging strategy. We also design a Gaussian sampling operator with the mean of the estimated convergence point with a certain standard deviation. This operator is combined with the traditional differential evolution algorithm (DE) to accelerate the convergence. Numerical experiments show that our proposal can accelerate the DE on most functions of 28 low-dimensional test functions on the CEC2013 Suite, and our proposal can easily be extended to combine with other population-based evolutionary algorithms with a simple modification.

Estimation of convergence point, Gaussian sampling, Acceleration, Averaging strategy

\UseRawInputEncoding

I Introduction

As an important part of artificial intelligence, evolutionary computation has achieved great success in solving continuous[2], large-scale[15], constraint[13], and multi-objective optimization problems[3] in the past few decades. However, according to the No Free Lunch Theorem (NFLT)[14], there is no optimization algorithm that can solve all optimization problems perfectly. NFLT proves that the average performance of any pair of algorithms $A$ and $B$ is identical on all possible problems. Therefore, if an algorithm performs well on a certain class of problems, it must pay for that with performance degradation on the remaining problems, since this is the only way for all algorithms to have the same performance on average across all functions. This hypothesis has a great impact on the scientific community for the notion that there is no universal optimizer. In addition, a single algorithm often cannot balance the exploration and the exploitation well. For several reasons, hybrid algorithms or memetic algorithms appears, the hybrid algorithm was first proposed to solve the Traveling Salesman Problem (TSP) by modifying the genetic algorithm with a local search operator[9]. However, NFLT also limits the hybrid algorithms with identical average performance on all possible problems, thus the features of the problem become the starting point for designing a suitable optimization algorithm.

Since hybrid algorithms were not proposed as specific optimization algorithms, but as a broad class of algorithms inspired by the diffusion of ideas and composed of multiple existing operators, the community started showing increasing attention toward these algorithmic structures as a general guideline for addressing specific problems.

In addition, many studies[1, 4] show that it is a promising strategy to find trusting regions or elites in the fitness landscape, which can be fully applied to guide the direction of evolution well. Murata et al[8]. first proposed that a mathematical method could be used to calculate the global optimum using the information of two subsequent generations. Yu et al[16]. proposed that in the differential evolution algorithm, the differential vector from the parent individual to its offspring (or the vector from an individual with poor fitness to an individual with higher fitness) is defined as the moving vector, and the convergence point is estimated according to the moving vector.

In this paper, we employ a simple strategy to approximately estimate the convergence point, which is roughly estimated by the averaging strategy and weighted averaging strategy of the elite sub-population. Besides, after each generation optimized by DE, we apply a Gaussian sampling operator with the mean of the estimated convergence point. In experiment design, we evaluate our proposal with 28 benchmark functions from the CEC 2013 Suite[7]. Finally, we summarize our work and provide some open topics for future discussions.

The rest of the paper is organized as follows. Section II includes preliminaries and related works. Section III provides a detailed description of our proposal. Section IV provides the numerical experiments and experiments analysis. Finally, Section V concludes the paper and shows the future directions.

Ii Preliminaries and Related works

In this section, we first introduce the preliminaries of this work, including the Random Search (RS), Genetic Algorithm (GA), Differential Evolution Algorithm (DE), Evolution Strategy (ES), and Particle Swarm Optimization (PSO). Finally, we introduce the original estimation of a convergence point.

Ii-a Preliminaries

Ii-A1 Rs

RS was proposed by Rastrigin in 1963, and an early introduction to RS with basic mathematical analysis was given in the paper [10]. The principle of RS is to iteratively move to better positions in the search space, these positions are randomly sampled around the current optimum. RS is an optimization method that does not require the gradients of the fitness landscape, so RS can be employed for discrete or differentiable problems. The Pseudocode of RS is shown in Algorithm 1.

Input:

D i m e n s i o n : D; S e a r c h s p a c e : S; G e n e r a t i o n : T

Output:

B e s t s o l u t i o n : E

Function RS( $D, S, T$ ):

t \leftarrow 0

▶

(Solution initialization)

P_{t} \leftarrow Initial (D, S)

F P \leftarrow Evaluate (P_{t})

E \leftarrow P_{t}

while $t < T$ and not convergence do

O_{t} \leftarrow Sampling (P_{t})

F O \leftarrow Evaluate (O_{t})

if $F O > F P$ then

P_{t} \leftarrow O_{t}

E \leftarrow O_{t}

end if

t \leftarrow t + 1

end while

return E

Algorithm 1 RS

Ii-A2 Ga

GA[5] simulates the biological evolution process of chromosomes with selection, crossover, and mutation. Chromosomes present problem solutions and are evaluated based on the fitness function to select parents. Selection is an important process for choosing parents to produce the offspring and can affect the convergence of GA, Mutation increases the diversity of the population by randomly modifying the genes in the chromosome with a certain probability. The Pseudocode of GA is shown in Algorithm 2

Input:

D i m e n s i o n : D; S e a r c h s p a c e : S; P o p u l a t i o n s i z e : P S; G e n e r a t i o n : T

Output:

B e s t s o l u t i o n : E

Function GA( $D, S, P S, T$ ):

t \leftarrow 0

▶

(Population initialization)

P_{t} \leftarrow Initial (D, S, P S)

F_{t} \leftarrow Evaluate (P_{t})

E \leftarrow bestIndividual (P_{t}, F_{t})

while $t < T$ and not convergence do

▶

(Selection, Crossover and Mutation)

O_{t} \leftarrow Selection (P_{t}, F_{t})

O_{t} \leftarrow Crossover (O_{t})

P_{t + 1} \leftarrow Mutation (O_{t})

F_{t + 1} \leftarrow Evaluate (P_{t + 1})

E \leftarrow bestIndividual (P_{t + 1}, F_{t + 1})

t \leftarrow t + 1

end while

return E

Algorithm 2 GA

Ii-A3 De

DE was first proposed by Storn and Price in 1995[12]. DE has been applied to solve many complex optimization problems, and the procedure is similar to the GA, DE mainly includes mutation, crossover, and selection. The Pseudocode of DE is shown in Algorithm 3.

Input:

D i m e n s i o n : D; S e a r c h s p a c e : S; P o p u l a t i o n s i z e : P S; G e n e r a t i o n : T

Output:

B e s t s o l u t i o n : E

Function DE( $D, S, P S, T$ ):

t \leftarrow 0

▶

(Population initialization)

P_{t} \leftarrow Initial (D, S, P S)

F_{t} \leftarrow Evaluate (P_{t})

E \leftarrow bestIndividual (P_{t}, F_{t})

while $t < T$ and not convergence do

▶

(Selection, Crossover and Mutation)

O_{t} \leftarrow Mutation (P_{t}, F_{t})

O_{t} \leftarrow Crossover (O_{t})

F_{t + 1} \leftarrow Evaluate (O_{t})

P_{t + 1} \leftarrow Selection (O_{t}, F_{t + 1})

E \leftarrow bestIndividual (P_{t + 1}, F_{t + 1})

t \leftarrow t + 1

end while

return E

Algorithm 3 DE

Mutation is an essential process in DE. We briefly introduce 2 common mutation strategies:

	$DE/rand/1 : V_{i} (g) = X_{p 1} (g) + F \cdot (X_{p 2} (g) - X_{p 3} (g))$
	$DE/best/1 : V_{i} (g) = X_{b e s t} (g) + F \cdot (X_{p 1} (g) - X_{p 2} (g))$

$X_{p i} (g)$ is the different individuals randomly selected from the current population, $X_{b e s t} (g)$ is the best individual in the current population. $F$ is the scaling factor. The differential vectors between individuals are calculated in the mutation is the origin of the name of DE.

Ii-A4 Es

ES was proposed in 1963[11], as an optimization algorithm, ES imitates the mechanism of biological evolution. Under the hypothesis that no matter what changes occur in genes, the results or traits always follow a Gaussian distribution with zero mean and a certain variance. Although ES employs mutation and crossover to generate offspring which is similar to GA, ES emphasizes the phenotype rather than the genotype, and ES also applies the real coding instead of binary coding in GA. The Pseudocode of ES is shown in Algorithm 4.

Input:

D i m e n s i o n : D; S e a r c h s p a c e : S; P o p u l a t i o n s i z e : P S; G e n e r a t i o n : T

Output:

B e s t s o l u t i o n : E

Function ES( $D, S, P S, T$ ):

t \leftarrow 0

▶

(Population initialization)

P_{t} \leftarrow Initial (D, S, P S)

F P_{t} \leftarrow Evaluate (P_{t})

E \leftarrow bestIndividual (P_{t}, F_{t})

while $t < T$ and not convergence do

▶

(Selection, Crossover and Mutation)

O_{t} \leftarrow Selection (P_{t})

O_{t} \leftarrow Crossover (O_{t})

O_{t} \leftarrow Mutation (O_{t})

F O_{t} \leftarrow Evaluate (O_{t})

P_{t + 1}, F P_{t + 1} \leftarrow Offspring (O_{t}, F O_{t}, P_{t}, F P_{t})

E \leftarrow bestIndividual (P_{t + 1}, F P_{t + 1})

t \leftarrow t + 1

end while

return E

Algorithm 4 ES

Ii-A5 Pso

PSO simulates the behavior of bird flocking, fish schooling, and swarming theory to realize the optimization process[6]. The original version of PSO as developed by the authors comprises a very simple concept, and paradigms can be implemented in a few lines of computer code. It requires only primitive mathematical operators and is computationally inexpensive in terms of both memory requirements and speed. The Pseudocode of PSO is shown in Algorithm 5.

Input:

D i m e n s i o n : D; S e a r c h s p a c e : S; P o p u l a t i o n s i z e : P S; G e n e r a t i o n : T

Output:

B e s t s o l u t i o n : g b e s t

Function PSO( $D, S, P S, T$ ):

t \leftarrow 0

▶

(Initialize the position and velocity)

X_{t}, V_{t} \leftarrow Initial (D, S, P S)

for $i = 0 t o P S$ do

p b e s t_{i} \leftarrow X_{t, i}

end for

F_{t} \leftarrow Evaluate (X_{t})

g b e s t \leftarrow bestIndividual (X_{t}, F_{t})

while $t < T$ and not convergence do

▶

(Optimization) for $i = 0 t o P S$ do

X_{t + 1, i} \leftarrow Update (X_{t, i}, V_{t, i})

V_{t + 1, i} \leftarrow Update (V_{t, i}, p b e s t_{i}, g b e s t)

end for

F_{t + 1} \leftarrow Evaluate (X_{t + 1})

for $i = 0 t o P S$ do

if $F_{t + 1, i} > F_{p b e s t, i}$ then

p b e s t_{i} \leftarrow X_{t + 1, i}

end if

if $F_{p b e s t, i} > F_{g b e s t, i}$ then

g b e s t_{i} \leftarrow p b e s t_{i}

end if

end for

t \leftarrow t + 1

end while

return g b e s t

Algorithm 5 PSO

We update the position and velocity by Eq (1) in lines 13 and 14 of the Algorithm 5

	$x_{i d}^{k + 1} = x_{i d}^{k} + v_{i d}^{k + 1}$		(1)
	$v_{i d}^{k + 1} = w v_{i d}^{k} + c_{1} r_{1} (p_{i d, p b e s t}^{k} - x_{i d}^{k}) +$
	$c_{2} r_{2} (p_{d, g b e s t}^{k} - x_{i d}^{k})$

$i$ denotes the $i^{t h}$ individual, $d$ denotes in the $d^{t h}$ dimension, $k$ denotes in the $k^{t h}$ generation. In the velocity update equation, $w$ is a inertia weight, $c_{1}$ and $c_{2}$ are learning factor, $r_{1}$ and $r_{2}$ are random number.

Ii-B Estimation of a convergence point

The concept of estimation of a convergence point was first proposed by Murata. The paper[8] hypothesize that: in a population-based optimization algorithm, all individuals are moving towards the global optimal solution. According to this hypothesis, the nearest point to the extensions of all their movement directions should locate near the global optimal point. Fig.1 shows how the estimation of a convergence point works. But in practice, the estimated point is not exactly on the global optimum due to incorrect directions or inaccurate directions of population movements. However, it is highly expected that the estimation point is close to the global optimum at least for unimodal functions.

Fig. 1: Moving vector $d_{i} (= o_{i} - p_{i})$ is calculated from a parent (worse) individual $p_{i}$ and its offspring (better) $o_{i}$ . The $⋆$ is the estimated convergence point.

Let us derive how to estimate the convergence point mathematically. First, we define parent (worse) individual $p_{i}$ , offspring (better) $o_{i}$ , and moving vector $d_{i}$ as describe in Fig 1. The unit direction vector of $d_{i}$ is given as $d_{0 i} = \frac{d_{i}}{| | d_{i} | |}$ ,i.e., $d_{0 i}^{T} d_{0 i} = 1$ . $X$ denotes the estimated convergence point, and $p_{i} + t_{i} d_{i}$ represents the expansion from parent individual $p_{i}$ with the direction $d_{i}$ . $L (X, t_{i})$ in Eq (2) becomes the minimum.

min (L (X, t_{i})) = min (n \sum i = 1 | | p_{i} + t_{i} d_{i} - X | |^{2})

(2)

As the minimum line segment from the convergence point $X$ to the expansion line segments is the orthogonal projection from $X$ , we can apply the Eq (3) into Eq (2) to remote $t_{i}$ .

d_{i}^{T} (p_{i} + t_{i} d_{i} - X) = 0 (o r t h o g o n a l c o n d i t i o n)

(3)

Finally, the convergence point $X$ can be calculated by Eq (4). See detail expansion of equations in paper [8].

ˆ X = {n \sum i = 1 (I_{d} - d_{0 i} d_{0 i}^{T})}_{d}^{- 1} {n \sum i = 1 (I_{d} - d_{0 i} d_{0 i}^{T}) p_{i}}

(4)

Iii Proposal

In this Section, we will introduce our proposal in detail. First, we propose two approximate methods for estimating the convergence points with Gaussian sampling, which are shown in Fig 2

Fig. 2: (a).elite sub-population averaging strategy for approximating the convergence point. (b).elite sub-population weighted averaging strategy for approximating the convergence point. The orange circle is the Gaussian sampling space with the mean of estimated convergence point.

First, we proposed a hypothesis that there is a higher probability for a optimum exists around an elite sub-population (individuals with higher fitness). Based on this hypothesis, we select the elite sub-population from the individual population of each generation from DE, and apply the averaging strategy and the weighted averaging strategy to approximately estimate the convergence point. Algorithm 6 and Algorithm 7 show the procedure of averaging strategy and weighted averaging strategy.

Input:

P o p u l a t i o n : P; E l i t e r a t e : r; D i m e n s i o n : D

Output:

E s t i m a t e d c o n v e r g e n c e p o i n t : C

Function AS( $P, r$ ):

E P \leftarrow bestIndividuals (P, r)

s \leftarrow size (E P)

for $i = 0 t o D$ do

C_{i} \leftarrow 0

for $j = 0 t o s$ do

C_{i} \leftarrow C_{i} + E P_{i, j}

end for

C_{i} \leftarrow C_{i} / s

end for

return C

Algorithm 6 Averaging strategy

Input:

P o p u l a t i o n : P; E l i t e r a t e : r; D i m e n s i o n : D

Output:

E s t i m a t e d c o n v e r g e n c e p o i n t : C

Function WAS( $P, r$ ):

E P \leftarrow bestIndividuals (P, r)

E F \leftarrow Evaluate (E P)

s \leftarrow size (E P)

F \leftarrow 0

for $i = 0 t o s$ do

F \leftarrow F + E F_{i}

end for

▶

(Calculate the weights) for $i = 0 t o s$ do

w_{i} \leftarrow E F_{i} / F

end for

for $i = 0 t o D$ do

C_{i} \leftarrow 0

for $j = 0 t o s$ do

C_{i} \leftarrow C_{i} + w_{j} * E P_{i, j}

end for

return C

Algorithm 7 Weighted averaging strategy

In Algorithm 7, we calculate the weights based on normalized fitness of individuals. After the convergence point is estimated, we apply a Gaussian sampling operator with the mean of the convergence point and a certain standard deviation $σ$ to find $k$ individuals. Then we select the best $k$ individuals among 1 estimated convergence point and $k$ individuals found by Gaussian sampling operator and worst $k$ individuals generated by DE as a part of offspring to participate the optimization.

Iv Numerical Experiment

Here, we define shortened names of our proposal:

(a) P1: proposal with averaging strategy and Gaussian sampling operator.

(b) P2: proposal with weighted averaging strategy and Gaussian sampling operator.

We apply 28 benchmark functions from the CEC2013 Suite[7] with independent 30 trial runs in this evaluation experiment. Table I shows the details of these functions.

Func.	Types	Characteristics	Optimum
$f_{1}$	Uni	Sphere function	-1400
$f_{2}$		Rotated high conditioned elliptic function	-1300
$f_{3}$		Rotated Bent Cigar function	-1200
$f_{4}$		Rotated discus function	-1100
$f_{5}$		Different powers function	-1000
$f_{6}$	Multi	Rotated Rosenbrock fs function	-900
$f_{7}$		Rotated Schaffers function	-800
$f_{8}$		Rotated Ackley fs function	-700
$f_{9}$		Rotated Weierstrass function	-600
$f_{10}$		Rotated Griewank fs function	-500
$f_{11}$		Rastrigin fs function	-400
$f_{12}$		Rotated Rastrigin fs function	-300
$f_{13}$		Non-continuous rotated Rastrigin fs function	-200
$f_{14}$		Schwefel fs function	-100
$f_{15}$		Rotated Schwefel fs function	100
$f_{16}$		Rotated Katsuura function	200
$f_{17}$		Lunacek bi-Rastrigin function	300
$f_{18}$		Rotated Lunacek bi-Rastrigin function	400
$f_{19}$		Expanded Griewank fs plus Rosenbrock fs function	500
$f_{20}$		Expanded Schaffer fs $f_{6}$ function	600
$f_{21}$	Comp	Composition function 1 ( $n$ = 5, rotated)	700
$f_{22}$		Composition function 2 ( $n$ = 3, unrotated)	800
$f_{23}$		Composition function 3 ( $n$ = 3, rotated)	900
$f_{24}$		Composition function 4 ( $n$ = 3, rotated)	1000
$f_{25}$		Composition function 5 ( $n$ = 3, rotated)	1100
$f_{26}$		Composition function 6 ( $n$ = 5, rotated)	1200
$f_{27}$		Composition function 7 ( $n$ = 5, rotated)	1300
$f_{28}$		Composition function 8 ( $n$ = 5, rotated)	1400

TABLE I: CEC2013 Suite: Uni=unimodal, Multi=multimodal, Comp=Composition

And we compare our proposal with RS, GA, DE, ES and PSO. The parameter of these algorithm is shown in Table II

Alg.	Parameter	Value
Common parameters	dimension	2-D, 10-D, 30-D
	population size	50 * dimension
	max evaluation times	1000 * dimension
	search space	[-100, 100] * dimension
RS	sampling strategy	random sampling
GA	strategy	Elitist GA
	crossover rate	0.5
	mutation rate	0.1
DE	strategy	DE/current-to-best/1
	scale factor	0.7
	crossover rate	0.9
ES	strategy	(1+1)-ES
ES	mutation strength	0.2
PSO	$w$	0.9
PSO	$c_{1}, c_{2}$	2
Proposal	strategy	DE/current-to-best/1
	scale factor	0.7
	crossover rate	0.9
	elite proportion	0.05
	$σ$ in Gaussian sampling	5

TABLE II: The parameter of comparison methods

At the end of optimization, we apply the Kruskal-Wallis test and the Holm multiple comparison test on the fitness values. Table III shows their results of the statistical tests. In Holm multiple comparison test, if our proposal is significant better than the second-best algorithm or original DE, we apply $≫$ (p $<$ 0.01) and $>$ (p $<$ 0.05) to denote the significance, $\approx$ represents there is no significance between two comparing methods.

Func.	2-D					10-D					30-D
Func.	P1	P2	P1 with DE	P2 with DE	P1 with P2	P1	P2	P1 with DE	P2 with DE	P1 with P2	P1	P2	P1 with DE	P2 with DE	P1 with P2
$f_{1}$	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $>$ P2	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{2}$	P1 $≫$ DE	P2 $≫$ DE	P1 $≫$ DE	P2 $≫$ DE	P2 $≫$ P1	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{3}$	P1 $≫$ DE	P2 $≫$ DE	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{4}$	P1 $≫$ DE	P2 $≫$ DE	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $>$ DE	P2 $>$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{5}$	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $>$ DE	P2 $>$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{6}$	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $>$ DE	P2 $>$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{7}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{8}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2
$f_{9}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $>$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2
$f_{10}$	P1 $≫$ DE	P2 $≫$ DE	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $\approx$ P1	PSO $\approx$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{11}$	P1 $\approx$ PSO	P2 $>$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $\approx$ GA	P2 $\approx$ GA	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{12}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{13}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{14}$	P1 $\approx$ GA	P2 $\approx$ GA	P1 $\approx$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $≫$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	GA $≫$ P1	GA $≫$ P2	P1 $\approx$ DE	P2 $>$ DE	P1 $\approx$ P2
$f_{15}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $>$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{16}$	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	P1 $\approx$ GA	P2 $\approx$ GA	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	P1 $\approx$ ES	P2 $\approx$ ES	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2
$f_{17}$	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $≫$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $≫$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2
$f_{18}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $>$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{19}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	P1 $≫$ PSO	P2 $>$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{20}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $>$ DE	P2 $>$ DE	P1 $\approx$ P2
$f_{21}$	P1 $≫$ PSO	P2 $≫$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $>$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{22}$	P1 $\approx$ GA	P2 $\approx$ GA	P1 $≫$ DE	P2 $≫$ DE	P1 $>$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	GA $≫$ P1	GA $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{23}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2
$f_{24}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $\approx$ ES	P2 $\approx$ ES	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{25}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $>$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2
$f_{26}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	P1 $\approx$ GA	P2 $>$ GA	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $\approx$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{27}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $≫$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $\approx$ DE	P2 $≫$ DE	P1 $\approx$ P2
$f_{28}$	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $\approx$ DE	P2 $\approx$ DE	P1 $\approx$ P2	P1 $\approx$ PSO	P2 $\approx$ PSO	P1 $>$ DE	P2 $≫$ DE	P1 $\approx$ P2	PSO $≫$ P1	PSO $≫$ P2	P1 $\approx$ DE	P2 $≫$ DE	P1 $\approx$ P2

TABLE III: CEC2013 Suite: Uni=unimodal, Multi=multimodal, Comp=Composition

From the experiment results, we can see that our proposal can significantly accelerate the DE in most test functions on the CEC2013 Suite, and P1 has no siginificant difference with P2 in most functions. In the 2-D space, our proposal is competitive with the comparing 5 methods, while in the 10-D and 50-D space, although PSO outperforms our proposal combined with DE on most test functions, this is mainly due to the limited performance of DE on these functions, our proposal can still accelerate the DE in optimization. Different from the method proposed by Yu et al[16], it is unnecessary for our proposal to calculate the moving vector, so it has stronger scalability for our proposal combined with other evolutionary algorithms.

V Conclusion

In this paper, we propose a simple strategy to estimate the convergence point approximately. Based on the averaging strategy, ordinary averaging strategy and weighted averaging strategy are derived. We also design a Gaussian sampling operator based on the estimated convergence point and combine it with traditional DE. Numerical experiments show our proposal can improve the performance of traditional DE. Although our proposal combined with DE is worse than PSO in some functions with 10-D and 50-D, the main reason is the performance of DE is limited in these functions, and our proposal has strong scalability that can be extended to all population-based heuristic algorithms in theory.

In future research, a well-performed local search operator replacing the Gaussian sampling is a promising topic, such as the introduction of CMA-ES or the adaptive evolution strategy. And in this paper, we only apply our proposal in low dimensional test functions. In future research, we will combine our proposal with cooperative co-evolution to solve large-scale optimization problems. Finally, the estimation of a convergence point combined with the Gaussian sampling operator has a broad prospect to solve optimization problems and can be easily extended with other population-based evolutionary algorithms.

Vi Acknowledgement

This work was supported by JSPS KAKENHI Grant Number JP20K11967.

References

[1] A. Abdolmaleki, B. Price, N. Lau, L. P. Reis, and G. Neumann (2017) Deriving and improving cma-es with information geometric trust regions. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 657–664. Cited by: §I.
[2] M. A. Ahandani, N. P. Shirjoposh, and R. Banimahd (2010) Three modified versions of differential evolution algorithm for continuous optimization. Soft Computing 15 (4), pp. 803–830. Cited by: §I.
[3] K. Deb (2011) Multi-objective optimisation using evolutionary algorithms: an introduction. In Multi-objective evolutionary optimisation for product design and manufacturing, pp. 3–34. Cited by: §I.
[4] F. Fröhlich and P. K. Sorger (2022) Fides: reliable trust-region optimization for parameter estimation of ordinary differential equation models. PLOS Computational Biology 18 (7), pp. e1010322. Cited by: §I.
[5] J. H. Holland (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press. Cited by: §II-A2.
[6] J. Kennedy and R. Eberhart (1995) Particle swarm optimization. In Proceedings of ICNN’95 - International Conference on Neural Networks, Vol. 4, pp. 1942–1948 vol.4. External Links: Document Cited by: §II-A5.
[7] J. Liang, B. Qu, P. Suganthan, and A. Hernández-Díaz (2013-01) Problem definitions and evaluation criteria for the cec 2013 special session on real-parameter optimization. Technical Report 201212, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou China, pp. . Cited by: §I, §IV.
[8] N. Murata, R. Nishii, H. Takagi, and Y. Pei (2015-09-10) Analytical estimation of the convergence point of populations. In 2015 IEEE Congress on Evolutionary Computation, CEC 2015 - Proceedings, pp. 2619–2624. Cited by: §I, §II-B, §II-B.
[9] M. Norman, P. Moscato, and L. Plata (1991) A competitive-cooperative approach to complex combinatorial search. Cited by: §I.
[10] L. A. Rastrigin (1963) Convergence of random search method in extremal control of multi-parameter systems. Avtomat.i Telemekh, pp. 1467–1473. Cited by: §II-A1.
[11] L. B. Slobodkin (1964) The strategy of evolution. American scientist 52 (3), pp. 342–357. Cited by: §II-A4.
[12] R. Storn (1996) On the usage of differential evolution for function optimization. In Proceedings of north american fuzzy information processing, pp. 519–523. Cited by: §II-A3.
[13] F. Wang, W. Wang, T. Feng, and H. Li (2015) Parallel test sheets generation using differential evolution algorithm with constraint effective encoding and either-or mutation. In 2015 2nd International Conference on Information Science and Control Engineering, pp. 376–379. Cited by: §I.
[14] D.H. Wolpert and W.G. Macready (1997) No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1 (1), pp. 67–82. External Links: Document Cited by: §I.
[15] Z. Yang, K. Tang, and X. Yao (2007) Differential evolution for high-dimensional function optimization. pp. 3523–3530. Cited by: §I.
[16] J. Yu, Y. Li, Y. Pei, and H. Takagi (2019-05) Accelerating evolutionary computation using a convergence point estimated by weighted moving vectors. Complex and Intelligent Systems 6, pp. . External Links: Document Cited by: §I, §IV.