Accelerating differential evolution algorithm with Gaussian sampling based on estimating the convergence points
Abstract
In this paper, we propose a simple strategy for estimating the convergence point approximately by averaging the elite sub-population. Based on this idea, we derive two methods, which are ordinary averaging strategy, and weighted averaging strategy. We also design a Gaussian sampling operator with the mean of the estimated convergence point with a certain standard deviation. This operator is combined with the traditional differential evolution algorithm (DE) to accelerate the convergence. Numerical experiments show that our proposal can accelerate the DE on most functions of 28 low-dimensional test functions on the CEC2013 Suite, and our proposal can easily be extended to combine with other population-based evolutionary algorithms with a simple modification.
I Introduction
As an important part of artificial intelligence, evolutionary computation has achieved great success in solving continuous[2], large-scale[15], constraint[13], and multi-objective optimization problems[3] in the past few decades. However, according to the No Free Lunch Theorem (NFLT)[14], there is no optimization algorithm that can solve all optimization problems perfectly. NFLT proves that the average performance of any pair of algorithms and is identical on all possible problems. Therefore, if an algorithm performs well on a certain class of problems, it must pay for that with performance degradation on the remaining problems, since this is the only way for all algorithms to have the same performance on average across all functions. This hypothesis has a great impact on the scientific community for the notion that there is no universal optimizer. In addition, a single algorithm often cannot balance the exploration and the exploitation well. For several reasons, hybrid algorithms or memetic algorithms appears, the hybrid algorithm was first proposed to solve the Traveling Salesman Problem (TSP) by modifying the genetic algorithm with a local search operator[9]. However, NFLT also limits the hybrid algorithms with identical average performance on all possible problems, thus the features of the problem become the starting point for designing a suitable optimization algorithm.
Since hybrid algorithms were not proposed as specific optimization algorithms, but as a broad class of algorithms inspired by the diffusion of ideas and composed of multiple existing operators, the community started showing increasing attention toward these algorithmic structures as a general guideline for addressing specific problems.
In addition, many studies[1, 4] show that it is a promising strategy to find trusting regions or elites in the fitness landscape, which can be fully applied to guide the direction of evolution well. Murata et al[8]. first proposed that a mathematical method could be used to calculate the global optimum using the information of two subsequent generations. Yu et al[16]. proposed that in the differential evolution algorithm, the differential vector from the parent individual to its offspring (or the vector from an individual with poor fitness to an individual with higher fitness) is defined as the moving vector, and the convergence point is estimated according to the moving vector.
In this paper, we employ a simple strategy to approximately estimate the convergence point, which is roughly estimated by the averaging strategy and weighted averaging strategy of the elite sub-population. Besides, after each generation optimized by DE, we apply a Gaussian sampling operator with the mean of the estimated convergence point. In experiment design, we evaluate our proposal with 28 benchmark functions from the CEC 2013 Suite[7]. Finally, we summarize our work and provide some open topics for future discussions.
The rest of the paper is organized as follows. Section II includes preliminaries and related works. Section III provides a detailed description of our proposal. Section IV provides the numerical experiments and experiments analysis. Finally, Section V concludes the paper and shows the future directions.
Ii Preliminaries and Related works
In this section, we first introduce the preliminaries of this work, including the Random Search (RS), Genetic Algorithm (GA), Differential Evolution Algorithm (DE), Evolution Strategy (ES), and Particle Swarm Optimization (PSO). Finally, we introduce the original estimation of a convergence point.
Ii-a Preliminaries
Ii-A1 Rs
RS was proposed by Rastrigin in 1963, and an early introduction to RS with basic mathematical analysis was given in the paper [10]. The principle of RS is to iteratively move to better positions in the search space, these positions are randomly sampled around the current optimum. RS is an optimization method that does not require the gradients of the fitness landscape, so RS can be employed for discrete or differentiable problems. The Pseudocode of RS is shown in Algorithm 1.
Ii-A2 Ga
GA[5] simulates the biological evolution process of chromosomes with selection, crossover, and mutation. Chromosomes present problem solutions and are evaluated based on the fitness function to select parents. Selection is an important process for choosing parents to produce the offspring and can affect the convergence of GA, Mutation increases the diversity of the population by randomly modifying the genes in the chromosome with a certain probability. The Pseudocode of GA is shown in Algorithm 2
Ii-A3 De
DE was first proposed by Storn and Price in 1995[12]. DE has been applied to solve many complex optimization problems, and the procedure is similar to the GA, DE mainly includes mutation, crossover, and selection. The Pseudocode of DE is shown in Algorithm 3.
Mutation is an essential process in DE. We briefly introduce 2 common mutation strategies:
is the different individuals randomly selected from the current population, is the best individual in the current population. is the scaling factor. The differential vectors between individuals are calculated in the mutation is the origin of the name of DE.
Ii-A4 Es
ES was proposed in 1963[11], as an optimization algorithm, ES imitates the mechanism of biological evolution. Under the hypothesis that no matter what changes occur in genes, the results or traits always follow a Gaussian distribution with zero mean and a certain variance. Although ES employs mutation and crossover to generate offspring which is similar to GA, ES emphasizes the phenotype rather than the genotype, and ES also applies the real coding instead of binary coding in GA. The Pseudocode of ES is shown in Algorithm 4.
Ii-A5 Pso
PSO simulates the behavior of bird flocking, fish schooling, and swarming theory to realize the optimization process[6]. The original version of PSO as developed by the authors comprises a very simple concept, and paradigms can be implemented in a few lines of computer code. It requires only primitive mathematical operators and is computationally inexpensive in terms of both memory requirements and speed. The Pseudocode of PSO is shown in Algorithm 5.
Ii-B Estimation of a convergence point
The concept of estimation of a convergence point was first proposed by Murata. The paper[8] hypothesize that: in a population-based optimization algorithm, all individuals are moving towards the global optimal solution. According to this hypothesis, the nearest point to the extensions of all their movement directions should locate near the global optimal point. Fig.1 shows how the estimation of a convergence point works. But in practice, the estimated point is not exactly on the global optimum due to incorrect directions or inaccurate directions of population movements. However, it is highly expected that the estimation point is close to the global optimum at least for unimodal functions.
Let us derive how to estimate the convergence point mathematically. First, we define parent (worse) individual , offspring (better) , and moving vector as describe in Fig 1. The unit direction vector of is given as ,i.e., . denotes the estimated convergence point, and represents the expansion from parent individual with the direction . in Eq (2) becomes the minimum.
(2) |
Iii Proposal
In this Section, we will introduce our proposal in detail. First, we propose two approximate methods for estimating the convergence points with Gaussian sampling, which are shown in Fig 2
First, we proposed a hypothesis that there is a higher probability for a optimum exists around an elite sub-population (individuals with higher fitness). Based on this hypothesis, we select the elite sub-population from the individual population of each generation from DE, and apply the averaging strategy and the weighted averaging strategy to approximately estimate the convergence point. Algorithm 6 and Algorithm 7 show the procedure of averaging strategy and weighted averaging strategy.
In Algorithm 7, we calculate the weights based on normalized fitness of individuals. After the convergence point is estimated, we apply a Gaussian sampling operator with the mean of the convergence point and a certain standard deviation to find individuals. Then we select the best individuals among 1 estimated convergence point and individuals found by Gaussian sampling operator and worst individuals generated by DE as a part of offspring to participate the optimization.
Iv Numerical Experiment
Here, we define shortened names of our proposal:
(a) P1: proposal with averaging strategy and Gaussian sampling operator.
(b) P2: proposal with weighted averaging strategy and Gaussian sampling operator.
We apply 28 benchmark functions from the CEC2013 Suite[7] with independent 30 trial runs in this evaluation experiment. Table I shows the details of these functions.
Func. | Types | Characteristics | Optimum |
---|---|---|---|
Uni | Sphere function | -1400 | |
Rotated high conditioned elliptic function | -1300 | ||
Rotated Bent Cigar function | -1200 | ||
Rotated discus function | -1100 | ||
Different powers function | -1000 | ||
Multi | Rotated Rosenbrock fs function | -900 | |
Rotated Schaffers function | -800 | ||
Rotated Ackley fs function | -700 | ||
Rotated Weierstrass function | -600 | ||
Rotated Griewank fs function | -500 | ||
Rastrigin fs function | -400 | ||
Rotated Rastrigin fs function | -300 | ||
Non-continuous rotated Rastrigin fs function | -200 | ||
Schwefel fs function | -100 | ||
Rotated Schwefel fs function | 100 | ||
Rotated Katsuura function | 200 | ||
Lunacek bi-Rastrigin function | 300 | ||
Rotated Lunacek bi-Rastrigin function | 400 | ||
Expanded Griewank fs plus Rosenbrock fs function | 500 | ||
Expanded Schaffer fs function | 600 | ||
Comp | Composition function 1 ( = 5, rotated) | 700 | |
Composition function 2 ( = 3, unrotated) | 800 | ||
Composition function 3 ( = 3, rotated) | 900 | ||
Composition function 4 ( = 3, rotated) | 1000 | ||
Composition function 5 ( = 3, rotated) | 1100 | ||
Composition function 6 ( = 5, rotated) | 1200 | ||
Composition function 7 ( = 5, rotated) | 1300 | ||
Composition function 8 ( = 5, rotated) | 1400 |
And we compare our proposal with RS, GA, DE, ES and PSO. The parameter of these algorithm is shown in Table II
Alg. | Parameter | Value |
Common parameters | dimension | 2-D, 10-D, 30-D |
population size | 50 * dimension | |
max evaluation times | 1000 * dimension | |
search space | [-100, 100] * dimension | |
RS | sampling strategy | random sampling |
GA | strategy | Elitist GA |
crossover rate | 0.5 | |
mutation rate | 0.1 | |
DE | strategy | DE/current-to-best/1 |
scale factor | 0.7 | |
crossover rate | 0.9 | |
ES | strategy | (1+1)-ES |
mutation strength | 0.2 | |
PSO | 0.9 | |
2 | ||
Proposal | strategy | DE/current-to-best/1 |
scale factor | 0.7 | |
crossover rate | 0.9 | |
elite proportion | 0.05 | |
in Gaussian sampling | 5 |
At the end of optimization, we apply the Kruskal-Wallis test and the Holm multiple comparison test on the fitness values. Table III shows their results of the statistical tests. In Holm multiple comparison test, if our proposal is significant better than the second-best algorithm or original DE, we apply (p0.01) and (p0.05) to denote the significance, represents there is no significance between two comparing methods.
Func. | 2-D | 10-D | 30-D | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
P1 | P2 | P1 with DE | P2 with DE | P1 with P2 | P1 | P2 | P1 with DE | P2 with DE | P1 with P2 | P1 | P2 | P1 with DE | P2 with DE | P1 with P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 DE | P2 DE | P1 DE | P2 DE | P2 P1 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 DE | P2 DE | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 DE | P2 DE | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 DE | P2 DE | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 DE | P2 DE | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 GA | P2 GA | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 GA | P2 GA | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | GA P1 | GA P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | |
P1 DE | P2 DE | P1 DE | P2 DE | P1 P2 | P1 GA | P2 GA | P1 DE | P2 DE | P1 P2 | P1 ES | P2 ES | P1 DE | P2 DE | P1 P2 | |
P1 DE | P2 DE | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 GA | P2 GA | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | GA P1 | GA P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 ES | P2 ES | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 GA | P2 GA | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 | |
P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | P1 PSO | P2 PSO | P1 DE | P2 DE | P1 P2 | PSO P1 | PSO P2 | P1 DE | P2 DE | P1 P2 |
From the experiment results, we can see that our proposal can significantly accelerate the DE in most test functions on the CEC2013 Suite, and P1 has no siginificant difference with P2 in most functions. In the 2-D space, our proposal is competitive with the comparing 5 methods, while in the 10-D and 50-D space, although PSO outperforms our proposal combined with DE on most test functions, this is mainly due to the limited performance of DE on these functions, our proposal can still accelerate the DE in optimization. Different from the method proposed by Yu et al[16], it is unnecessary for our proposal to calculate the moving vector, so it has stronger scalability for our proposal combined with other evolutionary algorithms.
V Conclusion
In this paper, we propose a simple strategy to estimate the convergence point approximately. Based on the averaging strategy, ordinary averaging strategy and weighted averaging strategy are derived. We also design a Gaussian sampling operator based on the estimated convergence point and combine it with traditional DE. Numerical experiments show our proposal can improve the performance of traditional DE. Although our proposal combined with DE is worse than PSO in some functions with 10-D and 50-D, the main reason is the performance of DE is limited in these functions, and our proposal has strong scalability that can be extended to all population-based heuristic algorithms in theory.
In future research, a well-performed local search operator replacing the Gaussian sampling is a promising topic, such as the introduction of CMA-ES or the adaptive evolution strategy. And in this paper, we only apply our proposal in low dimensional test functions. In future research, we will combine our proposal with cooperative co-evolution to solve large-scale optimization problems. Finally, the estimation of a convergence point combined with the Gaussian sampling operator has a broad prospect to solve optimization problems and can be easily extended with other population-based evolutionary algorithms.
Vi Acknowledgement
This work was supported by JSPS KAKENHI Grant Number JP20K11967.
References
- [1] (2017) Deriving and improving cma-es with information geometric trust regions. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 657–664. Cited by: §I.
- [2] (2010) Three modified versions of differential evolution algorithm for continuous optimization. Soft Computing 15 (4), pp. 803–830. Cited by: §I.
- [3] (2011) Multi-objective optimisation using evolutionary algorithms: an introduction. In Multi-objective evolutionary optimisation for product design and manufacturing, pp. 3–34. Cited by: §I.
- [4] (2022) Fides: reliable trust-region optimization for parameter estimation of ordinary differential equation models. PLOS Computational Biology 18 (7), pp. e1010322. Cited by: §I.
- [5] (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press. Cited by: §II-A2.
- [6] (1995) Particle swarm optimization. In Proceedings of ICNN’95 - International Conference on Neural Networks, Vol. 4, pp. 1942–1948 vol.4. External Links: Document Cited by: §II-A5.
- [7] (2013-01) Problem definitions and evaluation criteria for the cec 2013 special session on real-parameter optimization. Technical Report 201212, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou China, pp. . Cited by: §I, §IV.
- [8] (2015-09-10) Analytical estimation of the convergence point of populations. In 2015 IEEE Congress on Evolutionary Computation, CEC 2015 - Proceedings, pp. 2619–2624. Cited by: §I, §II-B, §II-B.
- [9] (1991) A competitive-cooperative approach to complex combinatorial search. Cited by: §I.
- [10] (1963) Convergence of random search method in extremal control of multi-parameter systems. Avtomat.i Telemekh, pp. 1467–1473. Cited by: §II-A1.
- [11] (1964) The strategy of evolution. American scientist 52 (3), pp. 342–357. Cited by: §II-A4.
- [12] (1996) On the usage of differential evolution for function optimization. In Proceedings of north american fuzzy information processing, pp. 519–523. Cited by: §II-A3.
- [13] (2015) Parallel test sheets generation using differential evolution algorithm with constraint effective encoding and either-or mutation. In 2015 2nd International Conference on Information Science and Control Engineering, pp. 376–379. Cited by: §I.
- [14] (1997) No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1 (1), pp. 67–82. External Links: Document Cited by: §I.
- [15] (2007) Differential evolution for high-dimensional function optimization. pp. 3523–3530. Cited by: §I.
- [16] (2019-05) Accelerating evolutionary computation using a convergence point estimated by weighted moving vectors. Complex and Intelligent Systems 6, pp. . External Links: Document Cited by: §I, §IV.