Accelerating differential evolution algorithm with Gaussian sampling based on estimating the convergence points

1st Rui Zhong Graduate School of Information Science and Technology
Hokkaido University
Sapporo, Japan
   2nd Masaharu Munetomo Information Initiative Center
Hokkaido University
Sapporo, Japan
Abstract

In this paper, we propose a simple strategy for estimating the convergence point approximately by averaging the elite sub-population. Based on this idea, we derive two methods, which are ordinary averaging strategy, and weighted averaging strategy. We also design a Gaussian sampling operator with the mean of the estimated convergence point with a certain standard deviation. This operator is combined with the traditional differential evolution algorithm (DE) to accelerate the convergence. Numerical experiments show that our proposal can accelerate the DE on most functions of 28 low-dimensional test functions on the CEC2013 Suite, and our proposal can easily be extended to combine with other population-based evolutionary algorithms with a simple modification.

Estimation of convergence point, Gaussian sampling, Acceleration, Averaging strategy
\UseRawInputEncoding

I Introduction

As an important part of artificial intelligence, evolutionary computation has achieved great success in solving continuous[2], large-scale[15], constraint[13], and multi-objective optimization problems[3] in the past few decades. However, according to the No Free Lunch Theorem (NFLT)[14], there is no optimization algorithm that can solve all optimization problems perfectly. NFLT proves that the average performance of any pair of algorithms and is identical on all possible problems. Therefore, if an algorithm performs well on a certain class of problems, it must pay for that with performance degradation on the remaining problems, since this is the only way for all algorithms to have the same performance on average across all functions. This hypothesis has a great impact on the scientific community for the notion that there is no universal optimizer. In addition, a single algorithm often cannot balance the exploration and the exploitation well. For several reasons, hybrid algorithms or memetic algorithms appears, the hybrid algorithm was first proposed to solve the Traveling Salesman Problem (TSP) by modifying the genetic algorithm with a local search operator[9]. However, NFLT also limits the hybrid algorithms with identical average performance on all possible problems, thus the features of the problem become the starting point for designing a suitable optimization algorithm.

Since hybrid algorithms were not proposed as specific optimization algorithms, but as a broad class of algorithms inspired by the diffusion of ideas and composed of multiple existing operators, the community started showing increasing attention toward these algorithmic structures as a general guideline for addressing specific problems.

In addition, many studies[1, 4] show that it is a promising strategy to find trusting regions or elites in the fitness landscape, which can be fully applied to guide the direction of evolution well. Murata et al[8]. first proposed that a mathematical method could be used to calculate the global optimum using the information of two subsequent generations. Yu et al[16]. proposed that in the differential evolution algorithm, the differential vector from the parent individual to its offspring (or the vector from an individual with poor fitness to an individual with higher fitness) is defined as the moving vector, and the convergence point is estimated according to the moving vector.

In this paper, we employ a simple strategy to approximately estimate the convergence point, which is roughly estimated by the averaging strategy and weighted averaging strategy of the elite sub-population. Besides, after each generation optimized by DE, we apply a Gaussian sampling operator with the mean of the estimated convergence point. In experiment design, we evaluate our proposal with 28 benchmark functions from the CEC 2013 Suite[7]. Finally, we summarize our work and provide some open topics for future discussions.

The rest of the paper is organized as follows. Section II includes preliminaries and related works. Section III provides a detailed description of our proposal. Section IV provides the numerical experiments and experiments analysis. Finally, Section V concludes the paper and shows the future directions.

Ii Preliminaries and Related works

In this section, we first introduce the preliminaries of this work, including the Random Search (RS), Genetic Algorithm (GA), Differential Evolution Algorithm (DE), Evolution Strategy (ES), and Particle Swarm Optimization (PSO). Finally, we introduce the original estimation of a convergence point.

Ii-a Preliminaries

Ii-A1 Rs

RS was proposed by Rastrigin in 1963, and an early introduction to RS with basic mathematical analysis was given in the paper [10]. The principle of RS is to iteratively move to better positions in the search space, these positions are randomly sampled around the current optimum. RS is an optimization method that does not require the gradients of the fitness landscape, so RS can be employed for discrete or differentiable problems. The Pseudocode of RS is shown in Algorithm 1.

Input:
Output:
Function RS():
       (Solution initialization) while  and not convergence do
             if  then
                  
             end if
            
       end while
      
Algorithm 1 RS

Ii-A2 Ga

GA[5] simulates the biological evolution process of chromosomes with selection, crossover, and mutation. Chromosomes present problem solutions and are evaluated based on the fitness function to select parents. Selection is an important process for choosing parents to produce the offspring and can affect the convergence of GA, Mutation increases the diversity of the population by randomly modifying the genes in the chromosome with a certain probability. The Pseudocode of GA is shown in Algorithm 2

Input:
Output:
Function GA():
       (Population initialization) while  and not convergence do
             (Selection, Crossover and Mutation)
       end while
      
Algorithm 2 GA

Ii-A3 De

DE was first proposed by Storn and Price in 1995[12]. DE has been applied to solve many complex optimization problems, and the procedure is similar to the GA, DE mainly includes mutation, crossover, and selection. The Pseudocode of DE is shown in Algorithm 3.

Input:
Output:
Function DE():
       (Population initialization) while  and not convergence do
             (Selection, Crossover and Mutation)
       end while
      
Algorithm 3 DE

Mutation is an essential process in DE. We briefly introduce 2 common mutation strategies:

is the different individuals randomly selected from the current population, is the best individual in the current population. is the scaling factor. The differential vectors between individuals are calculated in the mutation is the origin of the name of DE.

Ii-A4 Es

ES was proposed in 1963[11], as an optimization algorithm, ES imitates the mechanism of biological evolution. Under the hypothesis that no matter what changes occur in genes, the results or traits always follow a Gaussian distribution with zero mean and a certain variance. Although ES employs mutation and crossover to generate offspring which is similar to GA, ES emphasizes the phenotype rather than the genotype, and ES also applies the real coding instead of binary coding in GA. The Pseudocode of ES is shown in Algorithm 4.

Input:
Output:
Function ES():
       (Population initialization) while  and not convergence do
             (Selection, Crossover and Mutation)
       end while
      
Algorithm 4 ES

Ii-A5 Pso

PSO simulates the behavior of bird flocking, fish schooling, and swarming theory to realize the optimization process[6]. The original version of PSO as developed by the authors comprises a very simple concept, and paradigms can be implemented in a few lines of computer code. It requires only primitive mathematical operators and is computationally inexpensive in terms of both memory requirements and speed. The Pseudocode of PSO is shown in Algorithm 5.

Input:
Output:
Function PSO():
       (Initialize the position and velocity) for  do
            
       end for
       while  and not convergence do
             (Optimization) for  do
                  
             end for
             for  do
                   if  then
                        
                   end if
                  if  then
                        
                   end if
                  
             end for
            
       end while
      
Algorithm 5 PSO

We update the position and velocity by Eq (1) in lines 13 and 14 of the Algorithm 5

(1)

denotes the individual, denotes in the dimension, denotes in the generation. In the velocity update equation, is a inertia weight, and are learning factor, and are random number.

Ii-B Estimation of a convergence point

The concept of estimation of a convergence point was first proposed by Murata. The paper[8] hypothesize that: in a population-based optimization algorithm, all individuals are moving towards the global optimal solution. According to this hypothesis, the nearest point to the extensions of all their movement directions should locate near the global optimal point. Fig.1 shows how the estimation of a convergence point works. But in practice, the estimated point is not exactly on the global optimum due to incorrect directions or inaccurate directions of population movements. However, it is highly expected that the estimation point is close to the global optimum at least for unimodal functions.

Moving vector
Fig. 1: Moving vector is calculated from a parent (worse) individual and its offspring (better) . The is the estimated convergence point.

Let us derive how to estimate the convergence point mathematically. First, we define parent (worse) individual , offspring (better) , and moving vector as describe in Fig 1. The unit direction vector of is given as ,i.e., . denotes the estimated convergence point, and represents the expansion from parent individual with the direction . in Eq (2) becomes the minimum.

(2)

As the minimum line segment from the convergence point to the expansion line segments is the orthogonal projection from , we can apply the Eq (3) into Eq (2) to remote .

(3)

Finally, the convergence point can be calculated by Eq (4). See detail expansion of equations in paper [8].

(4)

Iii Proposal

In this Section, we will introduce our proposal in detail. First, we propose two approximate methods for estimating the convergence points with Gaussian sampling, which are shown in Fig 2

(a).elite sub-population averaging strategy for approximating the convergence point. (b).elite sub-population weighted averaging strategy for approximating the convergence point. The orange circle is the Gaussian sampling space with the mean of estimated convergence point.
Fig. 2: (a).elite sub-population averaging strategy for approximating the convergence point. (b).elite sub-population weighted averaging strategy for approximating the convergence point. The orange circle is the Gaussian sampling space with the mean of estimated convergence point.

First, we proposed a hypothesis that there is a higher probability for a optimum exists around an elite sub-population (individuals with higher fitness). Based on this hypothesis, we select the elite sub-population from the individual population of each generation from DE, and apply the averaging strategy and the weighted averaging strategy to approximately estimate the convergence point. Algorithm 6 and Algorithm 7 show the procedure of averaging strategy and weighted averaging strategy.

Input:
Output:
Function AS():
       for  do
             for  do
                  
             end for
            
       end for
      
Algorithm 6 Averaging strategy
Input:
Output:
Function WAS():
       for  do
            
       end for
       (Calculate the weights) for  do
            
       end for
      for  do
             for  do
                  
             end for
            
       end for
      
Algorithm 7 Weighted averaging strategy

In Algorithm 7, we calculate the weights based on normalized fitness of individuals. After the convergence point is estimated, we apply a Gaussian sampling operator with the mean of the convergence point and a certain standard deviation to find individuals. Then we select the best individuals among 1 estimated convergence point and individuals found by Gaussian sampling operator and worst individuals generated by DE as a part of offspring to participate the optimization.

Iv Numerical Experiment

Here, we define shortened names of our proposal:

(a) P1: proposal with averaging strategy and Gaussian sampling operator.

(b) P2: proposal with weighted averaging strategy and Gaussian sampling operator.

We apply 28 benchmark functions from the CEC2013 Suite[7] with independent 30 trial runs in this evaluation experiment. Table I shows the details of these functions.

Func. Types Characteristics Optimum
Uni Sphere function -1400
Rotated high conditioned elliptic function -1300
Rotated Bent Cigar function -1200
Rotated discus function -1100
Different powers function -1000
Multi Rotated Rosenbrock fs function -900
Rotated Schaffers function -800
Rotated Ackley fs function -700
Rotated Weierstrass function -600
Rotated Griewank fs function -500
Rastrigin fs function -400
Rotated Rastrigin fs function -300
Non-continuous rotated Rastrigin fs function -200
Schwefel fs function -100
Rotated Schwefel fs function 100
Rotated Katsuura function 200
Lunacek bi-Rastrigin function 300
Rotated Lunacek bi-Rastrigin function 400
Expanded Griewank fs plus Rosenbrock fs function 500
Expanded Schaffer fs function 600
Comp Composition function 1 ( = 5, rotated) 700
Composition function 2 ( = 3, unrotated) 800
Composition function 3 ( = 3, rotated) 900
Composition function 4 ( = 3, rotated) 1000
Composition function 5 ( = 3, rotated) 1100
Composition function 6 ( = 5, rotated) 1200
Composition function 7 ( = 5, rotated) 1300
Composition function 8 ( = 5, rotated) 1400
TABLE I: CEC2013 Suite: Uni=unimodal, Multi=multimodal, Comp=Composition

And we compare our proposal with RS, GA, DE, ES and PSO. The parameter of these algorithm is shown in Table II

Alg. Parameter Value
Common parameters dimension 2-D, 10-D, 30-D
population size 50 * dimension
max evaluation times 1000 * dimension
search space [-100, 100] * dimension
RS sampling strategy random sampling
GA strategy Elitist GA
crossover rate 0.5
mutation rate 0.1
DE strategy DE/current-to-best/1
scale factor 0.7
crossover rate 0.9
ES strategy (1+1)-ES
mutation strength 0.2
PSO 0.9
2
Proposal strategy DE/current-to-best/1
scale factor 0.7
crossover rate 0.9
elite proportion 0.05
in Gaussian sampling 5
TABLE II: The parameter of comparison methods

At the end of optimization, we apply the Kruskal-Wallis test and the Holm multiple comparison test on the fitness values. Table III shows their results of the statistical tests. In Holm multiple comparison test, if our proposal is significant better than the second-best algorithm or original DE, we apply (p0.01) and (p0.05) to denote the significance, represents there is no significance between two comparing methods.

Func. 2-D 10-D 30-D
P1 P2 P1 with DE P2 with DE P1 with P2 P1 P2 P1 with DE P2 with DE P1 with P2 P1 P2 P1 with DE P2 with DE P1 with P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 DE P2 DE P1 DE P2 DE P2 P1 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 DE P2 DE P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 DE P2 DE P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 DE P2 DE P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 DE P2 DE P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 GA P2 GA P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 GA P2 GA P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 GA P1 GA P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2
P1 DE P2 DE P1 DE P2 DE P1 P2 P1 GA P2 GA P1 DE P2 DE P1 P2 P1 ES P2 ES P1 DE P2 DE P1 P2
P1 DE P2 DE P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 GA P2 GA P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 GA P1 GA P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 ES P2 ES P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 GA P2 GA P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
P1 PSO P2 PSO P1 DE P2 DE P1 P2 P1 PSO P2 PSO P1 DE P2 DE P1 P2 PSO P1 PSO P2 P1 DE P2 DE P1 P2
TABLE III: CEC2013 Suite: Uni=unimodal, Multi=multimodal, Comp=Composition

From the experiment results, we can see that our proposal can significantly accelerate the DE in most test functions on the CEC2013 Suite, and P1 has no siginificant difference with P2 in most functions. In the 2-D space, our proposal is competitive with the comparing 5 methods, while in the 10-D and 50-D space, although PSO outperforms our proposal combined with DE on most test functions, this is mainly due to the limited performance of DE on these functions, our proposal can still accelerate the DE in optimization. Different from the method proposed by Yu et al[16], it is unnecessary for our proposal to calculate the moving vector, so it has stronger scalability for our proposal combined with other evolutionary algorithms.

V Conclusion

In this paper, we propose a simple strategy to estimate the convergence point approximately. Based on the averaging strategy, ordinary averaging strategy and weighted averaging strategy are derived. We also design a Gaussian sampling operator based on the estimated convergence point and combine it with traditional DE. Numerical experiments show our proposal can improve the performance of traditional DE. Although our proposal combined with DE is worse than PSO in some functions with 10-D and 50-D, the main reason is the performance of DE is limited in these functions, and our proposal has strong scalability that can be extended to all population-based heuristic algorithms in theory.

In future research, a well-performed local search operator replacing the Gaussian sampling is a promising topic, such as the introduction of CMA-ES or the adaptive evolution strategy. And in this paper, we only apply our proposal in low dimensional test functions. In future research, we will combine our proposal with cooperative co-evolution to solve large-scale optimization problems. Finally, the estimation of a convergence point combined with the Gaussian sampling operator has a broad prospect to solve optimization problems and can be easily extended with other population-based evolutionary algorithms.

Vi Acknowledgement

This work was supported by JSPS KAKENHI Grant Number JP20K11967.

References

  • [1] A. Abdolmaleki, B. Price, N. Lau, L. P. Reis, and G. Neumann (2017) Deriving and improving cma-es with information geometric trust regions. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 657–664. Cited by: §I.
  • [2] M. A. Ahandani, N. P. Shirjoposh, and R. Banimahd (2010) Three modified versions of differential evolution algorithm for continuous optimization. Soft Computing 15 (4), pp. 803–830. Cited by: §I.
  • [3] K. Deb (2011) Multi-objective optimisation using evolutionary algorithms: an introduction. In Multi-objective evolutionary optimisation for product design and manufacturing, pp. 3–34. Cited by: §I.
  • [4] F. Fröhlich and P. K. Sorger (2022) Fides: reliable trust-region optimization for parameter estimation of ordinary differential equation models. PLOS Computational Biology 18 (7), pp. e1010322. Cited by: §I.
  • [5] J. H. Holland (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press. Cited by: §II-A2.
  • [6] J. Kennedy and R. Eberhart (1995) Particle swarm optimization. In Proceedings of ICNN’95 - International Conference on Neural Networks, Vol. 4, pp. 1942–1948 vol.4. External Links: Document Cited by: §II-A5.
  • [7] J. Liang, B. Qu, P. Suganthan, and A. Hernández-Díaz (2013-01) Problem definitions and evaluation criteria for the cec 2013 special session on real-parameter optimization. Technical Report 201212, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou China, pp. . Cited by: §I, §IV.
  • [8] N. Murata, R. Nishii, H. Takagi, and Y. Pei (2015-09-10) Analytical estimation of the convergence point of populations. In 2015 IEEE Congress on Evolutionary Computation, CEC 2015 - Proceedings, pp. 2619–2624. Cited by: §I, §II-B, §II-B.
  • [9] M. Norman, P. Moscato, and L. Plata (1991) A competitive-cooperative approach to complex combinatorial search. Cited by: §I.
  • [10] L. A. Rastrigin (1963) Convergence of random search method in extremal control of multi-parameter systems. Avtomat.i Telemekh, pp. 1467–1473. Cited by: §II-A1.
  • [11] L. B. Slobodkin (1964) The strategy of evolution. American scientist 52 (3), pp. 342–357. Cited by: §II-A4.
  • [12] R. Storn (1996) On the usage of differential evolution for function optimization. In Proceedings of north american fuzzy information processing, pp. 519–523. Cited by: §II-A3.
  • [13] F. Wang, W. Wang, T. Feng, and H. Li (2015) Parallel test sheets generation using differential evolution algorithm with constraint effective encoding and either-or mutation. In 2015 2nd International Conference on Information Science and Control Engineering, pp. 376–379. Cited by: §I.
  • [14] D.H. Wolpert and W.G. Macready (1997) No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1 (1), pp. 67–82. External Links: Document Cited by: §I.
  • [15] Z. Yang, K. Tang, and X. Yao (2007) Differential evolution for high-dimensional function optimization. pp. 3523–3530. Cited by: §I.
  • [16] J. Yu, Y. Li, Y. Pei, and H. Takagi (2019-05) Accelerating evolutionary computation using a convergence point estimated by weighted moving vectors. Complex and Intelligent Systems 6, pp. . External Links: Document Cited by: §I, §IV.